About Online Matters

Posts Tagged ‘SEO training’

Technical SEO: Site Loading Times and SEO Rankings Part 2

In my last post, I discussed the underlying issues regarding site loading times and SEO rankings.  What I tried to do was help the reader understand why site loading times are important from the perspective of someone designing a search engine that has to crawl billions of pages.  The post also outlines a few of the structures that they would have to put in place to accurately and effectively crawl all the pages they need in a limited time with limited processing power.  I also tried to show that a search engine like Google has a political and economic agenda in ensuring fast sites, not just a technical agenda.  Google wants as many people/eyeballs on the web as possible, so it is to their advantage to ensure that web sites provide a good user experience.  As a result, they feel quite justified in penalizing sites that do not have good speed/performance characteristics.

As you would expect, the conclusion is that if your site is hugely slow you will not get indexed and will not rank in the SERPs.  What is “hugely slow”?  Google has indicated that slow is a relative notion, determined by the loading times typical of sites in your geographical region.  Having said that, relative or not, from an SEO perspective I wouldn’t want to have a site where pages take more than 10 seconds on average to load.  We have found from the sites we have tested and built that average page load times higher than approximately 10 seconds have a significant impact on being indexed.  From a UE perspective, there is some interesting data showing that the limit of visitors’ patience is about 6-8 seconds.  Google has studied this data, so it would probably prefer to set its threshold in that region.  But I doubt it can.  Many small sites are not that sophisticated, do not know these kinds of rules, and do not know how to check or evaluate their site loading times.  Besides this, there are often problems with hosts that cause servers to run slowly at times.  Google has to take that into account as well.  So I believe the timeout has to be substantially higher than 6-8 seconds, but 10 seconds as a crawl limit is a guess.

I have yet to see a definitive statement by anyone as to what the absolute limit is for site speed before indexing ceases altogether (if you have a reference, please post it in the comments).  I’m sure that if a bot comes to a first page and it exceeds the bot’s timeout threshold in the algorithm, your site won’t get spidered at all.  But once the bot gets by the first page, it has to do an on-going computation of average page loading times for the site to determine if the average exceeds the built-in threshold, so at least a few pages would have to be crawled in that case. 

Now here’s where it gets interesting.  What happens between fast (let’s say load times under 1-2 seconds, which is actually pretty slow but is a number Matt Cutts, in the video below, indicates is OK) and the timeout limit?  And how important is site speed as a ranking signal?  Let’s answer one question at a time.

When a site is slow but not slow enough to hit any built-in timeout limits (not tied to the number of pages), a couple of things can happen.   We do know that Google allocates bot time by the number of pages on the site and the number of pages it has to index/re-index.  So for a small site that performs poorly, it is likely that most of the pages will get indexed.  Likely, but not a guarantee.  It all depends on the cumulative time lag versus the average that a site creates. If a site is large, then you can almost guarantee that some pages will not be indexed, as the cumulative time lag will ultimately hit the threshold set by the bots for a site of that number of pages. By definition, some of your content will not get ranked and you will not get the benefit of that content in your rankings.

As an aside, by the way, there has been a lot of confusion around the <meta name="revisit-after"> tag.  The revisit-after meta tag takes this form: <meta name="revisit-after" content="5 days">.  This tag supposedly tells the bots how often to come back to the site to reindex this specific page (in this case, every 5 days).  The idea is that you can improve the crawlability of your site by telling the bots not to index certain pages all the time, but only some of the time.  I became aware of this tag at SMX East, when one of the “authorities” on SEO mentioned it as usable for this purpose.  The trouble is that, from everything I have read, the tag is completely unsupported by any of the major engines, and was only supported by one tiny search engine (SearchBC) many years ago.

But let’s say you are one of the lucky sites where the site runs slowly but all the pages do get indexed.  Do Google or any of the other major search engines use the site’s performance as a ranking signal?  In other words, all your pages are in the index, so you would expect them to be ranked based on the quality of their content and their authority derived from inbound links, site visits, time-on-site, and other typical ranking signals.  Performance, you might reason, is not a likely candidate for a ranking signal and isn’t important.

If you thought that, then you were wrong.  Historically, Google has said, and Matt Cutts reiterates this in the video below, that site load times do not influence search rankings.  But while that may be true now, it may not be in the near future.  And this is where Maile Ohye’s comments took me by surprise.  In a small group session at SMX East 2009, Maile was asked about site performance and rankings.  She indicated that for the “middle ground” sites that are indexing but loading slowly, site performance may already be used to influence rankings.  Who is right?  I can’t say.  These are both highly respected professionals who choose their words carefully.

[Embedded video: Matt Cutts of Google discussing site speed and search rankings]

Whatever is true, Google is sending us signals that this change is coming.  Senior experts like Matt and Maile don’t say these things lightly.  These are well-considered and probably approved positions that they are asked to take.  This is Google’s way of preventing us from getting mad when the change occurs.  Google has the fallback of saying “we warned you this could happen.”  Which, from today’s viewpoint, means it will happen.

Conclusion: Start working on your site performance now, as it will be important for SEO rankings later. 

Oh and, by the way, your user experience will just happen to be better, which is clearly the real reason to fix site performance. 

And it isn’t only Google that may make this change.  Engineers from Yahoo! recently filed a patent with the title “Web Document User Experience Characterization Methods and Systems” which bears on this topic.  Let me quote paragraph 21:

With so many websites and web pages being available and with varying hardware and software configurations, it may be beneficial to identify which web documents may lead to a desired user experience and which may not lead to a desired user experience. By way of example but not limitation, in certain situations it may be beneficial to determine (e.g., classify, rank, characterize) which web documents may not meet performance or other user experience expectations if selected by the user. Such performance may, for example, be affected by server, network, client, file, and/or like processes and/or the software, firmware, and/or hardware resources associated therewith. Once web documents are identified in this manner the resulting user experience information may, for example, be considered when generating the search results.

It does not appear that Yahoo! has implemented any aspect of this patent yet, and who knows what the Bing agreement will mean for site performance and search.  But clearly this is a “problem” that the search engine muftis have set their sights on, and I would expect that if Google does implement it, others will follow.


Technical SEO: Introduction to Site Load Times and Natural Search Rankings

It is one of those nights.  Those pesky technicolor dreams woke me up at 2:30 and wouldn’t let me go back to sleep.  But under the heading “turning lemons into lemonade,” at least I have some extra time to write my blog even as I am piled high with the end of month deadlines. 

Today’s topic is part of my Technical SEO series (I just named it that – now I have to go back and change all my titles and meta tags…sigh) – site load times and whether or not they affect how you rank in the SERPs.  It is another one of those topics that came out of SMX East.  In this case it was Maile Ohye, Senior Support Engineer at Google, who spoke to this issue.  Maile is a wonderfully knowledgeable evangelist for Google.  I have seen her speak at many shows.  Her presentations are always clear and contain good, actionable techniques for improving your rankings in Google’s SERPs.  I am not alone in thinking her knowledgeable.  Stephan Spencer, one of the guys I most look up to in SEO, thought enough of Maile to interview her in August of 2007, and she was also recently interviewed by SEOMoz, another leading light in the industry (and if you haven’t used their pro tools, then you are one arrow short of a full quiver for your SEO work).

So when Maile says “stuff,” I listen.  In her talk at SMX East, she noted that poor site load times (we are talking something between good and absolutely horrible) could harm your rankings in Google search results.  Let me define the problem, then try to explain what Maile was referring to, and finally give my take on all this.

Basic Concepts of Site Loading Times for Getting Indexed

On the one hand, the fact that site loading times affect search rankings isn’t news.  But let’s take some time to lay a bit of foundation, because the how of site speed affecting search rankings didn’t really hit me until Maile’s talk.  It’s one of those things that is obvious once you think about it, but it doesn’t really come top of mind when you are focused on specific tasks in an SEO project.  It’s a “given” in the background of your work.  Unless the site is so horribly slow that it is obviously impacting the user experience, you really don’t think about load times when you are focusing on keywords and meta tags.  The site works, move on.

But that’s not really true from the perspective of the search bots.   Google and the other engines have to crawl billions of pages on the web on a regular basis, bring that information back, and then index it.  Some pages can be crawled infrequently, but as more of the web moves to more real-time information due to social media, the bots have to crawl more sites in real time in order to provide good results.  But there are only so many bots and so much time to crawl these billions of pages.  So if you are Google, you write your bots with algorithms that allocate this scarce resource most efficiently and, hopefully, fairly. 

How would you or I do this?  Well, if I were writing a bot, the first thing I would give it is a time limit based on the size of the site.  That’s only fair.  If you have the ability to create more content, bravo.  I want to encourage that, because it is beneficial to the community of searchers.  So all other factors being equal (e.g. site loading time), I want to allocate time to ensure all your pages get into the index.  There is also the issue of search precision and relevance: I want all that content indexed so I can present the best results to searchers.   

Of course, I can’t just set a time limit based on the number of pages.  What if one site has long pages and another one short, pithy pages (clearly not mine!)?  What if one site has lots of images or other embedded content while another does not?  My algorithm has to be pretty sophisticated to determine these factors on the fly and adapt its baseline timeout settings to new information about a site as it crawls it.

The next algorithm I would include would have to do with the frequency at which you update your data.  The more often you update, the more often I need to have my bot come back and crawl the changed pages on your site. 

Another set of algorithms would have to do with spam.  From the perspective of my limited resources and search precision, I don’t want to include pages in my index that are clearly designed only for the search engines, that are link spam, or that contain only PPC ads and no relevant information for the searcher.

You get the picture.  I only have a limited window of time to capture continually changing data from the web in order for the data in my index to be reasonably fresh.  Therefore I’ve got to move mountains (of data) in a very short period of time with only so many processing cycles to apply.  And the variables I have to control for in my algorithms are numerous and, in many cases, not black and white.

This is where site load times come in.  If a site is large but slow, should it be allocated as much time as it needs to be indexed?  Do I have enough processing cycles to put up with the fact that it takes three times as long as a similar site to crawl?  Is it fair, given a scarce resource, to allocate time to a slow site if it means I can’t index five other better-performing sites in my current window of opportunity?  Does it optimize search precision and the relevance of results I can show to searchers?  And last but not least, as one of the guardians of the Web, is poor site performance something I want to encourage from the perspective of user experience and making the Web useful for as many people as possible?  Let’s face it, if the web is really slow, people won’t use it, and there will be fewer eyeballs available to view an ad from which I stand to make money.

Hello?  Are you there?  Can you say “zero tolerance?”  And from the perspective of the universal search engines, there is also my favorite radio station – “WIFM.”  What’s In it For Me?  Answer: nothing good.  That is why Google has made page load times a factor in Adwords Quality Score, as an example.

So, in the extreme case (let’s say a page takes 30 seconds to load), the bots won’t crawl most, if any, of the site.  The engines can’t afford the time and don’t want to encourage a poor user experience.  So you are ignored – which means you never get into the indexes.

When Is a Page’s or Site’s Loading Time Considered Slow?

What is an “extreme case?”  I have looked that up, and the answer is not a fixed number.  Instead, for Google, the concept of “slow loading” is relative.  Here is how Google describes it (in the context of AdWords landing pages):

The threshold for a ‘slow-loading’ landing page is the regional average plus three seconds.

The regional average is based on the location of the server hosting your website. If your website is hosted on a server in India, for example, your landing page’s load time will be compared to the average load time in that region of India. This is true even if your website is intended for an audience in the United States.

Two things to note about how we determined the threshold: 

  • We currently calculate load time as the time it takes to download the HTML content of your landing page. HTML load time is typically 10% to 30% of a page’s total load time. A three-second difference from the regional average, therefore, likely indicates a much larger disparity.
  • We measure load time from a very fast internet connection, so most users will experience a slower load time than we do.
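To make the thresholds concrete (using made-up numbers): if the regional average HTML load time is two seconds, a landing page whose HTML takes more than five seconds to download would be graded slow.  And since HTML is typically only 10% to 30% of total load time, five seconds of HTML could easily correspond to a total page load time of 15 seconds or more.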

Moreover, Google has a sliding scale with which it grades a site.  The following quote applies to Adwords and landing pages, but my guess is that similar algorithms and grading are used in determining how often and how long a site is crawled:

A keyword’s load time grade is based on the average load time of the landing pages in the ad group and of any landing pages in the rest of the account with the same domain. If multiple ad groups have landing pages with the same domain, therefore, the keywords in all these ad groups will have identical load time grades.

Two things to note:

  • When determining load time grade, the AdWords system follows destination URLs at both the ad and keyword level and evaluates the final landing page.
  • If your ad group contains landing pages with different domains, the keywords’ load time grades will be based on the domain with the slowest load time. All the keywords in an ad group will always have the same load time grade.

We’ll stop here for today.  Next time, we’ll talk about what happens in the nether regions between fast and clearly slow.


Why Search Engine Optimization Matters

Yesterday, a reasonably well-known blogger, Derek Powazek, let out a rant against the entire SEO industry.  Against my strongest desire to give his article any further validation in the search engine rankings (where it now ranks #10), it gets a link here, because at the end of the day the Web is about transparency and I truly believe that any argument must win out in the realm of ideas.  The article, and the responses both on his website and on SearchEngineLand, upset me hugely for a number of reasons:

  1. The tone was so angry and demeaning.  As I get older (and I hope wiser), I want to speak in a way that bridges differences and heals breaches, not stokes the fire of discord.
     
  2. I believe the tone was angry in order to evoke strong responses, which build links, which in turn rank the post high in the search engines.  Link building is a tried-and-true, legitimate SEO practice, and using it here invalidates Derek’s entire argument that understanding and implementing a well-thought-out SEO program is so much flim-flam. Even more important to me, do we need to communicate in angry rants in order to get attention in this information- and message-overwhelmed universe?  Is that what we’ve come to?  I sure hope not.
     
  3. The article’s advice about user experience coming first was right (and has my 100% agreement).  But its assumptions about SEO, and therefore its conclusions, were incorrect.
     
  4. The article’s erroneous conclusions will hurt a number of people who could benefit from good SEO advice.  THAT is probably the thing that saddens me most – it will send people off in a direction that will hurt them and their businesses substantially.  Good SEO is not a game.  It has business implications and by giving bad advice, Derek is potentially costing a lot of good people money that they need to feed their families in these tough times.
     
  5. The number of responses in agreement with his blog was overwhelming relative to the number that did not agree.  That also bothered me – that the perception of our industry is such that so many people feel our work does not serve a legitimate purpose.
     
  6. The comments on Danny Sullivan’s response to Derek were few, but they were also pro-SEO (of course).  Which means that the two communities represented in these articles aren’t talking to each other in any meaningful way.  You agree with Derek, comment to him.  You agree with Danny, comment there.  Like attracts like, but it doesn’t ultimately lead to the two communities bridging their differences.

I, too, started to make comments on both sites.  But my comments rambled (another one of those prerogatives I maintain in this 140-character world), and so it became apparent that I would need to create a blog entry to respond to the article – which I truly do not want to do because, frankly, I really don’t want to "raise the volume" of this disagreement between SEO believers and SEO heretics.  But I have some things to say that no one else is saying, and it goes to the heart of the debate on why SEO IS important and is absolutely not the same thing as good user experience or web development.

So to Danny, to Derek, and to all the folks who have entered this debate, I  hope you find my comments below useful and, if not, my humble apologies for wasting your valuable time.

Good site design is about the user experience. I started my career in online and software UE design when that term was an oxymoron.  My first consulting company, started in 1992, was inspired by David Kelley, my advisor at Stanford, CEO of IDEO (one of the top design firms in the world), and now founder and head of the Stanford School of Design.  I was complaining to David about the horrible state of user interfaces in software and saying that we needed an industry initiative to wake people up.  His response was "If it’s that bad, go start a company to fix it."  Which I did.  That company built several products that won awards for their innovative user experience. 

That history, I hope, gives credibility to my next statement: I have always believed, and will always believe, that good site experience trumps anything else you do.  Design the site for your customer first.  Create a "natural" conversation with them as they flow through the site and you will keep loyal customers.

Having said that, universal search engines do not "think" like human beings.  They are neither as fast nor as capable of understanding loosely organized data.  They work according to algorithms that attempt to mimic how we think, but they are a long way from actually achieving it.  These algorithms, as well as the underlying structures used to make them effective, must also run in an environment of limited processing power (even with all of Google’s server farms) relative to the volume of information, so they also make trade-offs between accuracy and speed.  Examples of these structures are biword indices and positional indices.  I could go into the whole theory of information architecture, but suffice it to say that a universal search engine needs help in interpreting content in order to determine relevance. 

Meta data is one area that has evolved to help the engines do this.  So, first and foremost, the search engines expect and need us to include data especially for them – data that has nothing to do with the end user experience and everything to do with being found relevant and precise.  This is the simplest form of SEO.  There are two points here:

  1. Who is going to decide what content goes into these tags? Those responsible for the user experience?  I think not.  The web developers? Absolutely positively not.  It is marketing and those who position the business who make these decisions.
     
  2. But how does marketing know how a search engine thinks?  Most do not.  And there are real questions of expertise here, albeit for this simple example small ones that marketers can (and are) learning.  What words should I use for the search engines to consider a page relevant, and which of them then go into the meta data?  For each meta data field, what is the best structure for the information?  How many marketers, for example, know that a title tag should only be 65 characters long, that a description tag needs to be limited to 150 characters, that the words in anchor text are a critical signaling factor to the search engines, or that alt-text on an image can help a search engine understand the relevance of a page to a specific keyword/search?  How many know the data from the SEOMoz Survey of SEO Ranking Factors showing that the best place to put a keyword in a title tag for search engine relevance is in first position, and that the relevance drops off in an exponential manner the further back in the title the keyword sits?  On this last point, there isn’t one client who hasn’t asked me for advice.  They don’t and can’t track the industry and changes in the algorithms closely enough to follow this.  They need SEO experts to help them – trained and experienced professionals from the SEO industry – and this is just the simplest of SEO issues.

How about navigation?  If you do not build good navigational elements into deeper areas of the site (especially on large sites) specifically for search engines, and/or you build them in a way that a search engine can’t follow (e.g. by the use of Javascript in the headers or flash as the single navigation mechanism throughout the site), then the content won’t get indexed and the searcher won’t find it.  Why are good search-specific navigational elements so important?  It comes back to limited processing power and time.  Each search engine has only so much time and power to crawl the billions of pages on the web – numbers that grow every day, and where existing pages can change not just every day but every minute.  These engines set rules about how much time they will spend crawling a site, and if your site is too hard to crawl or too slow, many pages will not make it into the indices and the searcher, once again, will never find what could be hugely relevant content.

Do UE designers or web developers understand these rules at a high level?  Many now know not to use Javascript in the headers, to be careful how they use flash and, if they do use it in the navigation, to have alternate navigational elements that help the bots crawl the site quickly.  Is this about user experience?  Only indirectly.  It is absolutely positively about search engine optimization, however, and it is absolutely valid in terms of assuring that relevant content gets put in front of a searcher.

Do UE designers or web developers understand the gotchas with these rules?  Unlikely.  Most work in one organization with one site (or a limited number of sites).  They haven’t seen the actual results of good and bad navigation across 20 or 50 or 100 sites and learned from hard experience what is a best practice.  They need an SEO expert, someone from the SEO  industry, to help guide them.  

Now let’s talk about algorithms.  Algorithms, as previously mentioned, are an attempt (and a crude one based on our current understanding of search) at mimicking how searchers (or with personalization a single searcher) think so that searches return relevant results to that searcher.  If you write just for people, and structure your pages just for readers, you are doing your customers a disservice because what a human can understand as relevant and what a search engine can grasp of meaning and relevance are not the same.  You might write great content for people on the site, but if a search engine can’t understand its relevance, a searcher who cares about that content will never find it. 

Does that mean you sacrifice the user experience to poor writing?  Absolutely, positively, without qualification not.  But within the structure of good writing and a good user experience, you can design a page that helps/signals the search engines, with their limited time and ability to understand content, what keywords are relevant to that page. 

Artificial constraint, you say? How is that different than the constraints I have when trying to get my message across with a good user experience in a data sheet?  How is that different when I have 15 minutes to get a story across in a presentation to my executive staff in a way that is user friendly and clear in its messaging?  Every format, every channel for marketing has constraints.  The marketer’s (not the UE designer’s and not the web developer’s) job is to communicate effectively within those constraints. 

Does a UE designer or the web developer understand how content is weighted to create a ranking score for a specific keyword within a specific search engine?  Do they know how position on the page relates to how the engines consider relevance? Do they understand how page length affects the weighting?  Take this example.  If I have two pages, the second of which contains two exact copies of the content on the first page, which is more relevant?  From a search engine’s perspective they are equally relevant, but if a search engine just counted all the words on the second page, it would rank higher.  A fix is needed.

One way that many search engines compensate for page length differences is through something called pivoted document length normalization (write me if you want a further explanation).  How do I know this?  Because I am a search engine professional who spends time every day learning his trade, reading on information architecture and studying the patents filed by the major search engines to understand how the technology of search can or may be evolving.  Because – since I can’t know exactly what algorithms are currently being used –  I run tests on real sites to see the impact of various content elements on ranking.  Because I do competitive analysis on other industry sites to see what legitimate, white hat techniques they have used and content they have created (e.g. videos on a youtube channel that then point to their main site) to signal the relevance of their content to the search engines. 

And to Derek’s point, what happens when the algorithms change?  Who is there watching the landscape for any change, like an Indian scout in a hunting party looking for the herd of buffalo?  Who can help interpret the change and provide guidance on how to adapt content to maintain the best signals of relevance for a keyword to the search engines?  Derek makes this sound like an impossible task and a lot of hocus-pocus.  It isn’t and it’s not.  Professional SEO consultants do this for their clients all the time, by providing good maintenance services.  They help their clients’ content remain relevant, and hopefully ranking high in the SERPs, in the face of constant change. 

So to ask again, do UE designers or product managers understand these issues around content?  At some high level they may (a lot don’t).  Do web developers? Maybe, but most don’t because they don’t deal in content – it is just filler that the code has to deal with (it could be lorem ipsum for their purposes).  Do any of these folks in their day-to-day struggles to do their jobs under tight time constraints have the time to spend, as I do, learning and understanding these subtleties or running tests? Absolutely, positively not.  They need an SEO professional to counsel them so that they make the right design, content and development choices.

I’ll stop here.  I pray I’ve made my point calmly and with a reasoned argument.  Please let me know.  I’m not Danny Sullivan, Vanessa Fox, Rand Fishkin, or Stephan Spencer, to name a few of our industry’s leading lights.  I’m just a humble SEO professional who adores his job and wants to help his clients rank well with their relevant business information.  My clients seem to like me and respect what I do, and that gives me an incredible amount of satisfaction and joy. 

I’m sorry, Derek.  I respect your viewpoint and I know that you truly believe what you are saying.  But as an honest, hard-working SEO professional, I couldn’t disagree with you more.


.htaccess Grammar Tutorial – .htaccess Special Characters

One thing this blog promises is to provide information about anything online that someone coming new to the business of online marketing needs to know.  The whole point being: my pain is your gain.  Well, I have had some real pain lately around .htaccess file rewrite rules, and I wanted to ease that pain for others with a .htaccess grammar tutorial for beginners.

What is a .htaccess file and Why Do I Care?

A .htaccess file is a type of configuration file for Apache servers only (if you are working with Microsoft IIS, this tutorial does not apply).  There are several ways an Apache web server can be configured.  Webmasters who have write access to the Apache directories can edit a series of files (especially httpd.conf) that allow them to configure the server directly, which is preferable in many cases because it allows for more powerful command structures and tends to run faster than .htaccess files.

Why you and I care about .htaccess files is that many of us run in a hosted environment where we do not have access to Apache directories.  In many cases we run on a shared server with other websites.  In these cases, the only way to control the configuration of the Apache web server is to use a .htaccess file.

A .htaccess file can be placed in any directory (it applies to that directory and everything below it), but for site-wide rules like the ones discussed here it should be put in the root directory of the site to which it applies.

Why would I want to control the configuration of the Apache server?  Well, the most likely scenario is that you have moved pages, deleted pages, or renamed pages, and you don’t want to lose the authority they have gained with the search engines that gives you good placement in the SERPs.  You do this through what are called redirects, which tell the server that if someone requests a specific URL like http://www.onlinematters.com/oldpage.htm, it should automatically map that request to an existing URL such as http://www.onlinematters.com/seo.htm.  Another common reason to have a .htaccess file is to provide a redirect to a custom error page when someone types in a bad URL.
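As a rough sketch (using the example URLs above, plus a hypothetical custom error page at /404.htm), those two uses might look like this in a .htaccess file:

# Permanently (301) redirect the old page to its replacement
Redirect 301 /oldpage.htm http://www.onlinematters.com/seo.htm

# Send requests for bad URLs to a custom error page
ErrorDocument 404 /404.htm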

.htaccess Files are Touchy

.htaccess files are very powerful and, like most computer communications, are very exacting in the grammar they use to communicate with the Apache server.  The slightest syntax error (like a missing space) can result in severe server malfunction.  Thus it is crucial to make backup copies of everything related to your site (including any original .htaccess files) before working with your .htaccess.  It is also important to check your entire website thoroughly after making any changes.  If any errors or other problems are encountered, use your backups immediately to restore the original configuration while you test your .htaccess files.

Is There a Place I Can Check the Grammar of My .htaccess File?

I asked this question at SMX West 2009 at a panel on Apache server configuration and 301 redirects (301 Redirect, How Do I Love You? Let Me Count The Ways).  The speakers were Alex Bennert, In House SEO, Wall Street Journal; Jordan Kasteler, Co-Founder, SearchandSocial.com; Carolyn Shelby from CShel; Stephan Spencer, Founder & President, Netconcepts; and Jonah Stein, Founder, ItsTheROI.  These are all serious SEO players – so they would know if anyone would.  When the question got asked, they all looked puzzled and then said "I just test it live on my staging server."  I have spent hours looking for a .htaccess grammar checker and have yet to find anything that has any real horsepower.  So seemingly the only options to check your .htaccess grammar are either to test it on your staging or live server, or to find a friend or Apache guru who can review what you have done.

Basic .htaccess Character Set

We’re going to start this overview of .htaccess grammar with a review of the core character definitions (which is probably the hardest documentation I’ve had to find.  You’d think everyone would start with "the letters"  of the alphabet, but believe it or not, they don’t).  In the next post, we will then construct basic statements with these character sets so you can see them in action.  After that, we’ll move into multipage commands. 

#
the # instructs the server to ignore the line. Used for comments. Each comment line requires its own #. It is good practice to use only letters, numbers, dashes, and underscores in comments, as this will help avoid potential server parsing errors.
 
[C]
Chain: instructs server to chain the current rule with the previous rule.
 
[E=variable:value]
Environmental Variable: instructs the server to set the environmental variable "variable" to "value".
 
[F]
Forbidden: instructs the server to return a 403 Forbidden to the client. 
 
[G]
Gone: instructs the server to deliver Gone (no longer exists) status message. 
 
[L]
Last rule: instructs the server to stop rewriting after the preceding directive is processed.
 
[N]
Next: instructs Apache to rerun the rewrite rule until all rewriting directives have been achieved.
 
[NC]
No Case: defines any associated argument as case-insensitive. i.e., "NC" = "No Case".
 
[NE]
No Escape: instructs the server to parse output without escaping characters.
 
[NS]
No Subrequest: instructs the server to skip the directive if the request is an internal sub-request.
 
[OR]
Or: specifies a logical "or" that ties two expressions together such that either one proving true will cause the associated rule to be applied.
 
[P]
Proxy: instructs server to handle requests by mod_proxy
 
[PT]
Pass Through: instructs mod_rewrite to pass the rewritten URL back to Apache for further processing.  
 
[QSA]
Append Query String: directs server to add the query string to the end of the expression (URL).
 
[R]
Redirect: instructs Apache to issue a redirect, causing the browser to request the rewritten/modified URL.
 
[S=x]
Skip: instructs the server to skip the next "x" number of rules if a match is detected.
 
[T=MIME-type]
Mime Type: declares the mime type of the target resource.
 
[]
specifies a character class, in which any character within the brackets will be a match. e.g., [xyz] will match either an x, y, or z.
 
[]+
character class in which any combination of items within the brackets will be a match. e.g., [xyz]+ will match any number of x’s, y’s, z’s, or any combination of these characters.
 
[^]
specifies not within a character class. e.g., [^xyz] will match any character that is neither x, y, nor z.
 
[a-z]
a dash (-) between two characters within a character class ([]) denotes the range of characters between them. e.g., [a-zA-Z] matches all lowercase and uppercase letters from a to z.
 
a{n}
specifies an exact number, n, of the preceding character. e.g., x{3} matches exactly three x’s.
 
a{n,}
specifies n or more of the preceding character. e.g., x{3,} matches three or more x’s.
 
a{n,m}
specifies a range of numbers, between n and m, of the preceding character. e.g., x{3,7} matches three, four, five, six, or seven x’s.
 
()
used to group characters together, thereby considering them as a single unit. e.g., (perishable)?press will match press, with or without the perishable prefix.
 
^
denotes the beginning of a regex (regex = regular expression) test string; i.e., the test string must begin with the character(s) that follow the ^.
 
$
denotes the end of a regex (regex = regular expression) test string; i.e., the test string must end with the character(s) that precede the $.
 
 ?
declares as optional the preceding character. e.g., monzas? will match monza or monzas, while mon(za)? will match either mon or monza. i.e., x? matches zero or one of x.
 
!
declares negation. e.g., “!string” matches everything except “string”.
 
.
a dot (or period) indicates any single arbitrary character.
 
-
instructs “not to” rewrite the URL, as in “...domain.com.* - [F]”.
 
+
matches one or more of the preceding character. e.g., G+ matches one or more G’s, while .+ will match one or more characters of any kind.
 
*
matches zero or more of the preceding character. e.g., use “.*” as a wildcard.
 
|
declares a logical “or” operator. for example, (x|y) matches x or y.
 
\
escapes special characters ( ^ $ ! . * | ). e.g., use “\.” to indicate/escape a literal dot.
 
\.
indicates a literal dot (escaped).
 
/*
zero or more slashes.
 
.*
zero or more arbitrary characters.
 
^$
defines an empty string.
 
^.*$
the standard pattern for matching everything.
 
[^/.]
defines one character that is neither a slash nor a dot.
 
[^/.]+
defines any number of characters which contains neither slash nor dot.
 
http://
this is a literal statement — in this case, the literal character string, “http://”.
 
^domain.*
defines a string that begins with the term “domain”, which may then be followed by any number of any characters.
 
^domain\.com$
defines the exact string “domain.com”.
 
-d
tests if string is an existing directory
 
-f
tests if string is an existing file
 
-s
tests if the file named in the test string exists and has a size greater than zero
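To see how several of these pieces fit together before the next post, here is a minimal, hypothetical sketch (the pattern and file extensions are made up purely for illustration):

RewriteEngine On
# Uses ^, $, (), [^/.], + and \. as defined above: match a request such as
# "page.htm" (no slashes or dots in the name) and rewrite it to the .php version.
# $1 holds whatever the parentheses captured; [NC] ignores case, [L] makes this the last rule.
RewriteRule ^([^/.]+)\.htm$ /$1.php [NC,L]

The next post will build up statements like this one piece at a time.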

 

Redirection Header Codes

  • 301 – Moved Permanently
  • 302 – Moved Temporarily
  • 403 – Forbidden
  • 404 – Not Found
  • 410 – Gone

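For reference, here is a hedged sketch of how these status codes pair with the rewrite flags defined earlier (the paths are hypothetical):

RewriteEngine On
# 301 Moved Permanently: send an old directory to its new home
RewriteRule ^old-dir/(.*)$ /new-dir/$1 [R=301,L]
# 410 Gone: tell browsers and bots that a retired page no longer exists
RewriteRule ^retired-page\.htm$ - [G]
# 403 Forbidden: block direct requests to a private folder
RewriteRule ^private/ - [F]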

From the Trenches: Overview of SEO Project Implementation

If you look on the About page of my blog, you’ll see that one of the key audiences I am concerned with is search marketers who for one reason or another came late to the game. While I have been doing online products and marketing since 1992 (think about that one for a second…), I did come late to the search marketing party because at the time these markets evolved I hired people to sweat the details of day-to-day implementation. I was actually pretty knowledgeable and could do many things myself that most CMOs couldn’t do – e.g. develop extensive keyword lists and upload them into the Adwords Editor, or write VB scripts – but I was still a long way away from all the intricacies of the practice.

And let’s start with that as my first statement on the science of online marketing: developing online strategies is relatively easy. It is in the details/intricacies of implementation that great search marketers make their bones. Details come in many forms and, in the interest of time, I will not go into categorizing these. We’ll do that at the end of the series on “From the Trenches.” In the meantime, we’ll just work through them for each area that I’ll cover.

The initial portion of this series will focus on Search Engine Optimization, since this is a very hot topic in the current economy.  The approach – given this is a blog – will be to do relatively short modules on one subject within each major topic.  Each module will begin with the name of the section and then the topic at hand (e.g. Keyword Analysis – Building the Initial Keyword Universe).  I am going to add presentation material in the form of audio powerpoints, which will provide a bit more extensive coverage of each topic.  How long will the presentations be?  Not sure yet.  We’ll have to try it out and see – after all, I’m learning how to present in blog mode just as you are learning how to learn in blog mode. 

The sections for basic SEO will run as follows:

  • Introduction to SEO
  • Keyword Analysis
  • Site Architecture Issues
  • On-Page Content and Meta Data
  • Link Building
  • Combining the Basics into an SEO Program

Looking forward to these sessions.  I expect to start them shortly – once I get the presentation technology set up. 
