About Online Matters


Funniest SEO Keywords I’d Love to Optimize

As I’ve mentioned before, I’m a lover of words. So I’m taking a short break from the serious work of online marketing to enjoy the beauty of our language and to contemplate how a single letter can not only change a word’s meaning but add humor. After all, in SEO, words are our business.

It seems The Washington Post runs a regular feature called The Style Invitational, in which it invites readers to change a single letter in a word to create a new, non-existent word and then supply a definition for it. There is also a variant of the game in which readers are asked to give a humorous definition for an existing word. Both produce hilarious results, but I’m going to focus on the former. Here’s an example:

ignoranus: An individual who is both stupid and an asshole
(Pardon the offensive language. I’m quoting it.)

These will have you doubled over in laughter, but since we are in the SEO game, I thought it would be fun to pull the exact match volumes from the Google AdWords Keyword Tool. It turns out – surprise, surprise! – that people actually search on these new terms. So for all of the SEO experts in the room, you now have data to justify creating pages, content, and tags for these words. The words are shown in the table below. By the way, I am in stitches that ‘bozone’ and ‘karmageddon’ are the two most-searched-for terms. Like, wow. I mean, like really? Dude, it’s as if some karmic word God has touched all humans with the ability to recognize the truly funny. Enjoy!

Word | Definition | Exact Match Volume
Ignoranus (n.) | An individual who is both stupid and an asshole | 480
Cashtration (n.) | The act of buying a house, which renders the subject financially impotent for an indefinite period of time | 91
Intaxication (n.) | Euphoria at getting a tax refund, which lasts until you realize it was your money to start with | 110
Reintarnation (n.) | Coming back to life as a hillbilly | 210
Bozone (n.) | The substance surrounding stupid people that stops bright ideas from penetrating. The bozone layer, unfortunately, shows little sign of breaking down in the near future | 1,900
Foreploy (n.) | Any misrepresentation of yourself for the purpose of getting laid | 46
Giraffiti (n.) | Vandalism spray-painted very, very high | 390
Sarchasm (n.) | The gulf between the author of sarcastic wit and the person who doesn’t get it | 1,000
Inoculatte (n.) | To take coffee intravenously when you are running late | 36
Osteopornosis (n.) | A degenerate disease | 46
Karmageddon (n.) | It’s like, when everybody is sending off all these really bad vibes, right? And then, like, the Earth explodes and it’s like, a serious bummer | 1,600
Decafalon (n.) | The grueling event of getting through the day consuming only things that are good for you | 210
Glibido (n.) | All talk and no action | 58
Dopeler Effect (n.) | The tendency of stupid ideas to seem smarter when they come at you rapidly | 63
Arachnoleptic Fit (n.) | The frantic dance performed just after you’ve accidentally walked through a spider web | 91
Beelzebug (n.) | Satan in the form of a mosquito that gets into your bathroom at 3 in the morning and cannot be cast out | 170
Caterpallor (n.) | The color you turn after finding half a worm in the fruit you’re eating | 16

Technical SEO: Introduction to Site Load Times and Natural Search Rankings

It is one of those nights. Those pesky technicolor dreams woke me up at 2:30 and wouldn’t let me go back to sleep. But under the heading of “turning lemons into lemonade,” at least I have some extra time to write my blog, even as I am piled high with end-of-month deadlines.

Today’s topic is part of my Technical SEO series (I just named it that – now I have to go back and change all my titles and meta tags…sigh): site load times and whether or not they affect how you rank in the SERPs. It is another one of those topics that came out of SMX East. In this case it was Maile Ohye, Senior Support Engineer at Google, who spoke to the issue. Maile is a wonderfully knowledgeable evangelist for Google. I have seen her speak at many shows, and her presentations are always clear and contain good, actionable techniques for improving your rankings in Google’s SERPs. I am not alone in thinking her knowledgeable. Stephan Spencer, one of the guys I most look up to in SEO, thought enough of Maile to interview her in August of 2007, and she was also recently interviewed by SEOMoz, another leading light in the industry (and if you haven’t used their pro tools, then you are one arrow short of a full quiver for your SEO work).

So when Maile says “stuff,” I listen. In her talk at SMX East, she noted that poor site load times (we are talking about something between good and absolutely horrible) could harm your rankings in Google’s search results. Let me define the problem, then try to explain what Maile was referring to, and finally give my take on all of this.

Basic Concepts of Site Loading Times for Getting Indexed

On the one hand, the fact that site load times affect search rankings isn’t news. Still, let’s take some time to lay a bit of foundation, because the how of site speed affecting search rankings didn’t really hit me until Maile’s talk. It’s one of those things that is obvious once you think about it, but it doesn’t really come top of mind when you are focused on specific tasks in an SEO project. It’s a “given” in the background of your work. Unless the site is so horribly slow that it is obviously hurting the user experience, you really don’t think about load times when you are focusing on keywords and meta tags. The site works; move on.

But that’s not really true from the perspective of the search bots.   Google and the other engines have to crawl billions of pages on the web on a regular basis, bring that information back, and then index it.  Some pages can be crawled infrequently, but as more of the web moves to more real-time information due to social media, the bots have to crawl more sites in real time in order to provide good results.  But there are only so many bots and so much time to crawl these billions of pages.  So if you are Google, you write your bots with algorithms that allocate this scarce resource most efficiently and, hopefully, fairly. 

How would you or I do this?  Well, if I were writing a bot, the first thing I would give it is a time limit based on the size of the site.  That’s only fair.  If you have the ability to create more content, bravo.  I want to encourage that, because it is beneficial to the community of searchers.  So all other factors being equal (e.g. site loading time), I want to allocate time to ensure all your pages get into the index.  There is also the issue of search precision and relevance: I want all that content indexed so I can present the best results to searchers.   

Of course, I can’t just set a time limit based on the number of pages.  What if one site has long pages and another one short, pithy pages (clearly not mine!)?  What if one site has lots of images or other embedded content while another does not?  My algorithm has to be pretty sophisticated to determine these factors on the fly and adapt its baseline timeout settings to new information about a site as it crawls it.

The next algorithm I would include would have to do with the frequency at which you update your data.  The more often you update, the more often I need to have my bot come back and crawl the changed pages on your site. 

Another set of algorithms would have to do with spam. From the perspective of my limited resource and search precision, I don’t want to include pages in my index that are clearly designed only for the search engines, that exist only to spam links, or that contain nothing but PPC ads and no relevant information for the searcher.

You get the picture. I only have a limited window of time to capture continually changing data from the web in order for the data in my index to be reasonably fresh. Therefore I’ve got to move mountains (of data) in a very short period of time with only so many processing cycles to apply. And the variables I have to control for in my algorithms are numerous and, in many cases, not black and white.

This is where site load times come in. If a site is large but slow, should it be allocated as much time as it needs to be indexed? Do I have enough processing cycles to put up with the fact that it takes three times as long as a similar site to crawl? Is it fair, given a scarce resource, to allocate time to a slow site if it means I can’t index five other better-performing sites in my current window of opportunity? Does it optimize search precision and the relevance of the results I can show to searchers? And last but not least, as one of the guardians of the Web, is poor site performance something I want to encourage from the perspective of user experience and making the Web useful for as many people as possible? Let’s face it: if the web is really slow, people won’t use it, and there will be fewer eyeballs available to view the ads from which I stand to make money.

Hello? Are you there? Can you say “zero tolerance?” And from the perspective of the universal search engines, there is also my favorite radio station – “WIFM”: What’s In it For Me? Answer: nothing good. That is why, as one example, Google has made page load times a factor in the AdWords Quality Score.

So, in the extreme case (let’s say a page takes 30 seconds to load), the bots won’t crawl most, if any, of the site.  The engines can’t afford the time and don’t want to encourage a poor user experience.  So you are ignored – which means you never get into the indexes.

When Is a Page’s or Site’s Loading Time Considered Slow?

What is an “extreme case?” I have looked that up, and the answer is not a fixed number. Instead, for Google, the concept of “slow loading” is relative. In Google’s own words:

The threshold for a ‘slow-loading’ landing page is the regional average plus three seconds.

The regional average is based on the location of the server hosting your website. If your website is hosted on a server in India, for example, your landing page’s load time will be compared to the average load time in that region of India. This is true even if your website is intended for an audience in the United States.

Two things to note about how we determined the threshold: 

  • We currently calculate load time as the time it takes to download the HTML content of your landing page. HTML load time is typically 10% to 30% of a page’s total load time. A three-second difference from the regional average, therefore, likely indicates a much larger disparity.
  • We measure load time from a very fast internet connection, so most users will experience a slower load time than we do.

Moreover, Google has a sliding scale with which it grades a site. The following quote applies to AdWords and landing pages, but my guess is that similar algorithms and grading are used in determining how often and how long a site is crawled:

A keyword’s load time grade is based on the average load time of the landing pages in the ad group and of any landing pages in the rest of the account with the same domain. If multiple ad groups have landing pages with the same domain, therefore, the keywords in all these ad groups will have identical load time grades.

Two things to note:

  • When determining load time grade, the AdWords system follows destination URLs at both the ad and keyword level and evaluates the final landing page.
  • If your ad group contains landing pages with different domains, the keywords’ load time grades will be based on the domain with the slowest load time. All the keywords in an ad group will always have the same load time grade.
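As a practical aside: since the HTML download time described in the quotes above is what gets measured, one of the easiest Apache-side levers is to compress your text responses. The following is a minimal sketch, not a definitive recipe, and it assumes your host has mod_deflate (and, optionally, mod_expires) enabled for use in a .htaccess file:

# Compress HTML, CSS, and JavaScript responses before they leave the server
<IfModule mod_deflate.c>
    AddOutputFilterByType DEFLATE text/html text/plain text/css application/javascript
</IfModule>

# Let browsers cache static images so repeat visits load faster
<IfModule mod_expires.c>
    ExpiresActive On
    ExpiresByType image/png "access plus 1 month"
    ExpiresByType image/jpeg "access plus 1 month"
</IfModule>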

We’ll stop here for today. Next time, we’ll talk about what happens in the nether regions between fast and clearly slow.


.htaccess Grammar Tutorial – .htaccess Special Characters

One thing this blog promises is to provide information about anything online that someone coming new to the business of online marketing needs to know. The whole point being: my pain is your gain. Well, I have had some real pain lately around .htaccess file rewrite rules, so I want to ease that pain for everyone else with a .htaccess grammar tutorial for beginners.

What is a .htaccess file and Why Do I Care?

A .htaccess file is a type of configuration file for Apache servers only; if you are working with Microsoft IIS, this tutorial does not apply. There are several ways an Apache web server can be configured. Webmasters who have write access to the Apache directories can edit the server-level configuration files (especially httpd.conf), which is preferable in many cases because server-level directives allow for more powerful command structures and tend to run faster than .htaccess files.
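In fact, whether Apache honors your .htaccess files at all is decided at that server level by the AllowOverride directive. Here is a minimal sketch of what that looks like in httpd.conf, using a hypothetical directory path:

# In httpd.conf (server-level access required); the path below is hypothetical
<Directory "/var/www/mysite">
    # "All" tells Apache to honor .htaccess files in this directory tree;
    # "None" makes Apache ignore them entirely
    AllowOverride All
</Directory>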

Why you and I care about .htaccess files is that many of us run in a hosted environment where we do not have access to Apache directories.  In many cases we run on a shared server with other websites.  In these cases, the only way to control the configuration of the Apache web server is to use a .htaccess file.

A .htaccess file applies to the directory in which it is placed and to everything below it. For site-wide rules, put it in the root directory of the site to which it applies.

Why would I want to control the configuration of the Apache server? Well, the most likely scenario is that you have moved, deleted, or renamed pages, and you don’t want to lose the authority those pages have gained with the search engines – the authority that gives you good placement in the SERPs. You do this through what are called redirects, which tell the server that if someone requests a specific URL like http://www.onlinematters.com/oldpage.htm, it should automatically map that request to an existing URL such as http://www.onlinematters.com/seo.htm. Another common reason to have a .htaccess file is to redirect to a custom error page when someone types in a bad URL.
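To make that concrete, here is a minimal sketch of what those two jobs might look like in a .htaccess file. The 301 redirect maps the old URL from the example above to the new one; the error page (/404.htm) is a hypothetical file you would create yourself:

# Permanently redirect the old page to its replacement (301 = Moved Permanently)
Redirect 301 /oldpage.htm http://www.onlinematters.com/seo.htm

# Serve a custom error page when a requested URL does not exist
# (assumes you have created /404.htm on the site)
ErrorDocument 404 /404.htm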

.htaccess Files are Touchy

.htaccess files are very powerful and, like most computer communications, are very exacting in the grammar they use to communicate with the Apache server. The slightest syntax error (like a missing space) can result in severe server malfunction. Thus it is crucial to make backup copies of everything related to your site (including any original .htaccess files) before working with your .htaccess. It is also important to check your entire website thoroughly after making any changes. If any errors or other problems are encountered, use your backups immediately to restore the original configuration while you test your .htaccess files.

Is There a Place I Can Check the Grammar of My .htaccess File?

I asked this question at SMX West 2009 at a panel on Apache server configuration and 301 redirects (301 Redirect, How Do I Love You? Let Me Count The Ways). The speakers were Alex Bennert, in-house SEO, Wall Street Journal; Jordan Kasteler, Co-Founder, SearchandSocial.com; Carolyn Shelby from CShel; Stephan Spencer, Founder & President, Netconcepts; and Jonah Stein, Founder, ItsTheROI. These are all serious SEO players – so they would know if anyone would. When the question was asked, they all looked puzzled and then said, "I just test it live on my staging server." I have spent hours looking for a .htaccess grammar checker and have yet to find anything with any real horsepower. So seemingly the only options for checking your .htaccess grammar are to test it on your staging or live server, or to find a friend or Apache guru who can review what you have done.

Basic .htaccess Character Set

We’re going to start this overview of .htaccess grammar with a review of the core character definitions (which was probably the hardest documentation for me to find – you’d think everyone would start with “the letters” of the alphabet, but believe it or not, they don’t). In the next post, we will construct basic statements with these character sets so you can see them in action. After that, we’ll move into multipage commands.

#
the # instructs the server to ignore the line; it is used for comments. Each comment line requires its own #. It is good practice to use only letters, numbers, dashes, and underscores in comments, as this helps avoid potential server parsing errors.
 
[C]
Chain: instructs server to chain the current rule with the previous rule.
 
[E=variable:value]
Environmental Variable: instructs the server to set the environmental variable "variable" to "value".
 
[F]
Forbidden: instructs the server to return a 403 Forbidden to the client. 
 
[G]
Gone: instructs the server to deliver Gone (no longer exists) status message. 
 
[L]
Last Rule: instructs the server to stop processing further rewrite rules once the current rule has been applied.
 
[N]
Next: instructs Apache to re-run the rewriting process from the first rule, using the rewritten URL produced by the current rule.
 
[NC]
No Case: defines any associated argument as case-insensitive. i.e., "NC" = "No Case".
 
[NE]
No Escape: instructs the server to parse output without escaping characters.
 
[NS]
No Subrequest: instructs the server to skip the rule if the current request is an internal sub-request.
 
[OR]
Or: specifies a logical "or" that ties two expressions together such that either one proving true will cause the associated rule to be applied.
 
[P]
Proxy: instructs server to handle requests by mod_proxy
 
[PT]
Pass Through: instructs mod_rewrite to pass the rewritten URL back to Apache for further processing.  
 
[QSA]
Append Query String: directs server to add the query string to the end of the expression (URL).
 
[R]
Redirect: instructs Apache to issue a redirect, causing the browser to request the rewritten/modified URL.
 
[S=x]
Skip: instructs the server to skip the next "x" number of rules if a match is detected.
 
[T=MIME-type]
Mime Type: declares the mime type of the target resource.
 
[]
specifies a character class, in which any character within the brackets will be a match. e.g., [xyz] will match either an x, y, or z.
 
[]+
character class in which any combination of items within the brackets will be a match. e.g., [xyz]+ will match any number of x’s, y’s, z’s, or any combination of these characters.
 
[^]
specifies not within a character class. e.g., [^xyz] will match any character that is neither x, y, nor z.
 
[a-z]
a dash (-) between two characters within a character class ([]) denotes the range of characters between them. e.g., [a-zA-Z] matches all lowercase and uppercase letters from a to z.
 
a{n}
specifies an exact number, n, of the preceding character. e.g., x{3} matches exactly three x’s.
 
a{n,}
specifies n or more of the preceding character. e.g., x{3,} matches three or more x’s.
 
a{n,m}
specifies a range of numbers, between n and m, of the preceding character. e.g., x{3,7} matches three, four, five, six, or seven x’s.
 
()
used to group characters together, thereby considering them as a single unit. e.g., (perishable)?press will match press, with or without the perishable prefix.
 
^
denotes the beginning of a regex (regex = regular expression) test string. i.e., the match must start with the character(s) that follow the ^.
 
$
denotes the end of a regex test string. i.e., the match must end with the character(s) that precede the $.
 
 ?
declares as optional the preceding character. e.g., monzas? will match monza or monzas, while mon(za)? will match either mon or monza. i.e., x? matches zero or one of x.
 
!
declares negation. e.g., “!string” matches everything except “string”.
 
.
a dot (or period) indicates any single arbitrary character.
 
-
as the substitution in a rewrite rule, the dash instructs the server not to rewrite the URL (no substitution), as in “...domain.com.* - [F]”.
 
+
matches one or more of the preceding character. e.g., G+ matches one or more G’s, while “.+” will match one or more characters of any kind.
 
*
matches zero or more of the preceding character. e.g., use “.*” as a wildcard.
 
|
declares a logical “or” operator. for example, (x|y) matches x or y.
 
\
escapes special characters ( ^ $ ! . * | ). e.g., use “\.” to indicate/escape a literal dot.
 
\.
indicates a literal dot (escaped).
 
/*
zero or more slashes.
 
.*
zero or more arbitrary characters.
 
^$
defines an empty string.
 
^.*$
the standard pattern for matching everything.
 
[^/.]
defines one character that is neither a slash nor a dot.
 
[^/.]+
defines one or more characters, none of which is a slash or a dot.
 
http://
this is a literal statement — in this case, the literal character string, “http://”.
 
^domain.*
defines a string that begins with the term “domain”, which may then be followed by any number of any characters.
 
^domain\.com$
defines the exact string “domain.com”.
 
-d
tests if string is an existing directory
 
-f
tests if string is an existing file
 
-s
tests if the file in the test string exists and has non-zero size
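As a quick preview of how these pieces fit together (the next post will build statements step by step), here is a minimal sketch of a common canonicalization rule, assuming mod_rewrite is available and using onlinematters.com purely as an illustration. Note how it uses ^, $, \., (.*), [NC], and [R=301,L] from the definitions above:

# Redirect the bare domain to its www version, preserving the requested path
RewriteEngine On
RewriteCond %{HTTP_HOST} ^onlinematters\.com$ [NC]
RewriteRule ^(.*)$ http://www.onlinematters.com/$1 [R=301,L]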

 

Redirection Header Codes

  • 301 – Moved Permanently
  • 302 – Moved Temporarily
  • 403 – Forbidden
  • 404 – Not Found
  • 410 – Gone
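To tie these codes back to the flags defined earlier, here is a minimal sketch (again assuming mod_rewrite is available; the paths are hypothetical and used only for illustration):

# Return 410 Gone for a page that has been removed for good
RewriteEngine On
RewriteRule ^old-promo\.htm$ - [G,L]

# Return 403 Forbidden for anything under a private directory
RewriteRule ^private/ - [F,L]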

 

 

 

 


From the Trenches: Overview of SEO Project Implementation

If you look on the About page of my blog, you’ll see that one of the key audiences I am concerned with is search marketers who, for one reason or another, came late to the game. While I have been doing online products and marketing since 1992 (think about that one for a second…), I did come late to the search marketing party, because at the time these markets evolved I hired people to sweat the details of day-to-day implementation. I was actually pretty knowledgeable and could do many things myself that most CMOs couldn’t do – e.g., develop extensive keyword lists and upload them into the AdWords Editor, or write VB scripts – but I was still a long way from all the intricacies of the practice.

And let’s start with that as my first statement on the science of online marketing: developing online strategies is relatively easy. It is in the details and intricacies of implementation that great search marketers make their bones. Details come in many forms and, in the interest of time, I will not categorize them here. We’ll do that at the end of the “From the Trenches” series. In the meantime, we’ll just work through them for each area that I cover.

The initial portion of this series will focus on Search Engine Optimization, since this is a very hot topic in the current economy. The approach – given this is a blog – will be to do relatively short modules on one subject within each major topic. Each module will begin with the name of the section and then the topic at hand (e.g., Keyword Analysis – Building the Initial Keyword Universe). I am going to add presentation material in the form of audio PowerPoints, which will provide somewhat more extensive coverage of each topic. How long will the presentations be? Not sure yet. We’ll have to try it out and see – after all, I’m learning how to present in blog mode just as you are learning how to learn in blog mode.

The sections for basic SEO will run as follows:

  • Introduction to SEO
  • Keyword Analysis
  • Site Architecture Issues
  • On-Page Content and Meta Data
  • Link Building
  • Combining the Basics into an SEO Program

Looking forward to these sessions.  I expect to start them shortly – once I get the presentation technology set up. 
