About Online Matters

Posts Tagged ‘SEO’

Web Site Latency and Performance Issues – Part 6

Taking up where we left off in part 5…

In the last post, we had just moved aboutonlinematters.com to a privately hosted standalone server and seen a substantial decrease in web site latency. We had seen our rating in Google Page Speed improve from being better than 43% of similar websites to about 53% of sites. So, a great improvement. But we were still showing a lot of issues in ySlow and the Google Page Speed tool. These fell into three categories:

  • Server Configuration. This involves optimizing settings on our Apache web server: enabling gzip for file compression, applying entity tags, adding Expires headers, turning on keep-alive, and splitting components across domains.
  • Content Compression. This involves items like compressing images, javascript, and css, specifying image sizes, and reducing the number of DOM elements.
  • Reducing External Calls. This involves combining all external css and javascript files into a single file, using cookieless domains, minimizing DNS lookups and redirects, as well as optimizing the order and style of scripts.

We decided to attack the web site latency issues in stages, starting with the elements that were easiest to fix (server configuration) and leaving the most difficult (reducing external calls) until last.

Server Configuration Issues

In their simplest form, server configuration issues related to web site latency have to do with settings on a site’s underlying web server, such as Apache.   For larger enterprise sites, server configuration issues cover a broader set of technical topics, including load balancing across multiple servers and databases as well as the use of a content delivery network.  This section is only going to cover the former, and not the latter, as they relate to web site latency.

With Apache (and Microsoft IIS), the server settings we care about can be managed and tracked through a page’s HTTP headers.  Thus, before we get into the settings we specifically care about, we need to have a discussion of what HTTP headers are and why they are important.

HTTP Headers

HTTP headers are part of the HTTP protocol: fields that carry certain types of data and instructions and that are either:

  • included in a request from a web client/browser, or
  • sent by the server along with a response to a browser.

HTTP headers carry information in both directions. A client or browser makes a request to the server for a web page or other resource, usually a file or dynamic output from a server-side script, and sends request headers along with it. The server, in turn, sends its own HTTP headers along with its response to the browser or client.

As SEOs, we care about HTTP headers because our request from the client to the server will return information about various elements of server configuration that may impact web site latency and performance. These elements include:

  • Response status: 200 is a valid response from the server.
  • Date of the request.
  • Server details: type, configuration, and version numbers, for example the PHP version.
  • Cookies: cookies set on your system for the domain.
  • Last-Modified: only available if set on the server; usually the time the requested file was last modified.
  • Content-Type: text/html is an HTML web page, text/xml an XML file.

There are two kinds of requests. A HEAD request returns only the header information from the server. A GET request returns both the header information and file content exactly as a browser would request the information. For our purposes, we only care about HEAD requests. Here is an example of a request:

Request (headers sent):
HEAD / HTTP/1.0
Host: www.aboutonlinematters.com
Connection: Close

And here is what we get back in its simplest form using the Trellian Firefox Toolbar:

Response: HTTP/1.1 200 OK
Date: Sun, 04 Apr 2010 00:17:06 GMT
Server: Apache/2.2.3 (CentOS)
X-Powered-By: PHP/5.3.0
X-Pingback: http://www.aboutonlinematters.com/xmlrpc.php
Link: <http://wp.me/DbBZ>; rel=shortlink
Content-Encoding: gzip
Cache-Control: max-age=31536000
Expires: Mon, 04 Apr 2011 00:17:06 GMT
Content-Type: text/html; charset=UTF-8
Transfer-Encoding: Chunked
Proxy-Connection: Keep-alive
x-ua-compatible: IE=EmulateIE7

Different tools will return different header information depending on the specific requests made in the calling script. For example, Live HTTP Headers, a plugin for Firefox, provides detailed header request and response information for every element on a page (it basically breaks out each GET and shows you the actual response that comes back from the server). This level of detail will prove helpful later when we undertake deep analysis to reduce external server requests. But for now, what is shown here is adequate for the purposes of our analysis.

For a summary of HTTP header requests and response codes, click here. But for now, let's get back to configuring our Apache server to reduce web site latency.

Apache Server Settings Related to Site Latency

Enabling Gzip Compression

Web site latency substantially improves when the amount of data that has to flow between the server and the browser is kept to a minimum. I believe I've read somewhere that image requests account for 80% of the load time of most web pages, so just following good image-handling practices (covered in a later installment) can substantially improve web site latency and page loading times. However, manually compressing images is painful and time-consuming. Moreover, there are other types of files – Javascript and CSS are the most common – that can also be compressed.

Designers of web servers identified this problem early on and provided a built-in tool on their servers for compressing files moving between the server and the browser.  Starting with HTTP/1.1, web clients indicate support for compression by including the Accept-Encoding header in the HTTP request.

Accept-Encoding: gzip, deflate

If the web server sees this header in the request, it may compress the response using one of the methods listed by the client. The web server notifies the web client of this via the Content-Encoding header in the response.

Content-Encoding: gzip

Gzip remains the most popular and effective compression method. It was developed by the GNU project and standardized by RFC 1952. The other common format is deflate, but it is less effective and less popular.

Gzipping generally reduces the response size by about 70%.  Approximately 90% of today’s Internet traffic travels through browsers that claim to support gzip. If you use Apache, the module configuring gzip depends on your version: Apache 1.3 uses mod_gzip while Apache 2.x uses mod_deflate.
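For Apache 2.x, a minimal mod_deflate setup might look something like the sketch below. This is only an illustration: it assumes mod_deflate is loaded and compresses a handful of common text-based MIME types, which you would adjust for your own site.

<IfModule mod_deflate.c>
  # Compress common text-based responses (add or remove MIME types to suit your site)
  AddOutputFilterByType DEFLATE text/html text/plain text/css text/xml
  AddOutputFilterByType DEFLATE application/javascript application/x-javascript
</IfModule>

Images should not be added to this list; they are already compressed formats and are better handled with the image-specific techniques covered later in the series.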

Configuring Entity Tags

Web servers and browsers use entity tags (ETags) to determine whether a component in the browser's cache, such as an image or script (both examples of an "entity"), matches the one on the origin server. An ETag is a simple string, surrounded by quotation marks, that uniquely identifies a specific version of the component. The origin server specifies the component's ETag using the ETag response header.

HTTP/1.1 200 OK
Last-Modified: Sun, 04 Apr 2010 00:37:48 GMT
Etag: "1896-bf9be880"
Expires: Mon, 04 Apr 2011 00:37:48 GMT

Later, if the browser has to validate a component, it uses the If-None-Match header to pass the ETag back to the origin server. If the ETags match, a 304 status code is returned.

GET http://www.aboutonlinematters.com/wp-content/plugins/web-optimizer/cache/f39a292fcf.css?1270299922 HTTP/1.1
Host: www.aboutonlinematters.com
If-Modified-Since: Sun, 04 Apr 2010 00:37:48 GMT
If-None-Match: "1896-bf9be880"
HTTP/1.1 304 Not Modified

ETags can impact site latency because they are typically constructed using attributes that make them unique to a specific server. ETags won’t match when a browser gets the original component from one server and later tries to validate that component on a different server, which is a fairly standard scenario on Web sites that use a cluster of servers to handle requests. By default, both Apache and IIS embed data in the ETag that dramatically reduces the odds of the validity test succeeding on web sites with multiple servers. If the ETags don’t match, the web client doesn’t receive the small, fast 304 response that ETags were designed for.  Instead,  they get a normal 200 response along with all the data for the component.  This isn’t a problem for small sites hosted on a single server. But it is a substantial problem for sites with multiple servers using Apache or IIS with the default ETag configuration.  Web clients see higher web site latency, web servers have a higher load,  bandwidth consumption is high, and proxies aren’t caching content efficiently.
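For reference, Apache 2.2's default behavior is roughly equivalent to the following directive (later Apache versions changed the default):

FileETag INode MTime Size

It is the inode component that varies from machine to machine, which is why identical files served from two servers in a cluster end up with different ETags.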

So when a site does not benefit from the flexible validation model provided by ETags, it's better to just remove the ETag altogether. In Apache, this is done by simply adding the following line to your Apache configuration file:

FileETag none
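Some configurations also explicitly strip any ETag that a backend or module might still emit. Assuming mod_headers is enabled, that looks like:

Header unset ETag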

Expires Headers

The Expires header makes components returned in an HTTP response cacheable. This avoids unnecessary HTTP requests on page views after the initial visit, because components downloaded during that first visit, for example images and script files, remain in the browser's local cache and do not have to be downloaded again on subsequent requests. Expires headers are most often used with images, but they should be used on all components, including scripts, stylesheets, and Flash components.

Browsers (and proxies) use a cache to reduce the number and size of HTTP requests, making web pages load faster.  The Expires header in the HTTP response tells the client how long a component can be cached. This far future Expires header

Expires: Thu, 15 Apr 2020 20:00:00 GMT

tells the browser that this response won’t be stale until April 15, 2020.
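HTTP/1.1 clients can also use the Cache-Control header for the same purpose. Our own response above already carries one:

Cache-Control: max-age=31536000

which tells the browser it may cache the page for 31,536,000 seconds, or one year.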

Apache uses the ExpiresDefault directive to set an expiration date relative to the current date. So for example:

ExpiresDefault "access plus 10 years"

sets the Expires date 10 years out from the time of the request.
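In practice the directive usually lives inside a mod_expires block and is varied by content type. Here is a sketch, assuming mod_expires is enabled; the specific lifetimes are illustrative only, not a recommendation:

<IfModule mod_expires.c>
  ExpiresActive On
  # Images rarely change, so give them long lifetimes
  ExpiresByType image/jpeg "access plus 1 year"
  ExpiresByType image/png "access plus 1 year"
  # CSS and javascript tend to change more often
  ExpiresByType text/css "access plus 1 month"
  ExpiresByType application/javascript "access plus 1 month"
  ExpiresDefault "access plus 1 day"
</IfModule>

One caution: if you set far future dates on CSS or javascript, you need a way to force a refresh when those files change, such as renaming the file or adding a version string to its URL.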

A far future Expires header only helps on page views after a user has already visited the site for the first time, and only until the cache is cleared. Therefore the impact of this performance improvement depends on how often users hit your pages with a primed cache. In the case of About Online Matters, we still do not get lots of visitors, so you would expect this change to have little effect on our measured performance and, indeed, that proved to be true.

Keep Alive Connections

The Keep-Alive extension to HTTP/1.0 and the persistent connection feature of HTTP/1.1 provide long-lived HTTP sessions which allow multiple requests to be sent over the same TCP connection. Instead of opening a new connection for every object on a page, the browser can request and retrieve multiple objects over a single connection. Opening a TCP connection requires a three-way handshake, and TCP's built-in congestion control restricts the available bandwidth at the start of every new connection, so making multiple requests over a single connection reduces the number of times that startup cost is paid. As a result, in some cases, enabling keep-alive on an Apache server has been shown to result in an almost 50% speedup in latency times for HTML documents with many images. To enable keep-alive, add the following line to your Apache configuration:

KeepAlive On
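Two related directives usually travel with it. The values below are illustrative, not a recommendation; check what your distribution ships with and tune for your own traffic:

# Allow up to 100 requests per connection; close idle connections after 5 seconds
MaxKeepAliveRequests 100
KeepAliveTimeout 5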

Is The Configuration Correct?

When I make these various changes to the server configuration, how can I verify they have actually been implemented? This is where the HTTP headers come into play. Let's take a look at the prior response we got from www.aboutonlinematters.com when we made a HEAD request:

Response: HTTP/1.1 200 OK
Date: Sun, 04 Apr 2010 00:17:06 GMT
Server: Apache/2.2.3 (CentOS)
X-Powered-By: PHP/5.3.0
X-Pingback: http://www.aboutonlinematters.com/xmlrpc.php
Link: <http://wp.me/DbBZ>; rel=shortlink
Content-Encoding: gzip
Cache-Control: max-age=31536000
Expires: Mon, 04 Apr 2011 00:17:06 GMT
Content-Type: text/html; charset=UTF-8
Transfer-Encoding: Chunked
Proxy-Connection: Keep-alive
x-ua-compatible: IE=EmulateIE7

The Content-Encoding, Cache-Control/Expires, and Proxy-Connection lines show that gzip, Expires headers, and keep-alive have been implemented on our server. ETags won't show in this set of responses because ETags are associated with a specific entity on a page. They show instead in tools that provide detailed analysis of HTTP requests and responses, such as Live HTTP Headers or Charles. No ETags should be visible in an HTTP request or response if FileETag none has been implemented.
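If you prefer the command line, curl (covered in the tool list in the next post) can pull the same headers; the -I flag asks for headers only:

curl -I http://www.aboutonlinematters.com/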

Results

We made the changes in two steps. First we activated gzip compression and Expires headers and removed ETags. These changes produced only a negligible improvement in overall web site latency. Then we implemented the keep-alive setting. Almost immediately, our site latency rating in the Google Page Speed tool improved from being better than 53% of similar sites to being better than 61%.

We’ll stop there for today and pickup on content compression in the next installment.


Web Site Latency and Performance Tools

It is back to the blogstone. And once again, I have broken my own rule about writing long posts infrequently. This one is a continuation of my previous posts on improving web site performance. What especially motivated me to go back to this topic was a request I received from Justified on my site performance posts:

My fellow classmates use your blogs as our reference materials. We look out for more interesting articles from your end about the same topic. Even the future updates about this topic would be of great help.

What a nice compliment. I wouldn't be a very good marketer if I didn't respect the wishes of my "customers." So, I continue the series on web site performance issues and my saga to improve the performance of this blog. Having said that, a number of things have happened since that last post.

First, as noted in a previous post, I had the opportunity to go to SMX West earlier this month. While there, I attended a session titled "Diagnosing Technical SEO Issues," with Adam Audette, Patrick Bennett, Gabe Gayhart, and Brian Ussery as the panelists. One thing I learned is that the term "site performance" has a general usage different from what I am covering here. Site performance is usually defined as including:

  • How easy a site is to crawl.
  • Infrastructure issues, including URL structures, template coding, directory structures,  and file naming conventions.
  • Latency issues such as html redirects, http headers, image compression, and all the other items I have been covering in this series.

The point is, what this series of posts is about is only one element of site performance: web site latency and response times as seen by Google and other search engines. In the future, I will use that more precise term in these posts. I have to decide – for purposes of rankings – whether to change the names of my posts, the URLs, and all the core meta data to reflect this change, or whether I will stay with web site performance as the keyword I want to optimize for. That decision will probably be made based on the keyword search volumes as shown in the Google AdWords Keyword Tool. (Actually, I have now changed the keyword I am optimizing for to web site latency, as I am testing some theories I have on page optimization in the SERPs that have nothing to do with site performance. So it just goes to show…)

Second, as also noted in the third post in this series on web site latency, Google has announced and deployed a new web site performance tool within Google Webmaster Tools, as well as a Firefox/Firebug plugin. So in order to continue to explore the topic of AboutOnlineMatters site latency, I need to cover that tool. But then we get into the whole issue of the core set of site performance tools to use for evaluating site latency issues. We already discussed and showed our results from Pingdom's latency analysis tool, but there are many more, some of them providing similar analysis and, as I was bemused to discover, often providing differing results for the same items.

So what I’ve decided to do is to provide some discussion of web site latency and performance tools and toolbars before we get back to analyzing AboutOnlineMatters, and then I can show how I used the tools to debug my site latency issues.

Here are the tools I plan to cover, and just so you know, I may cover some or all of them in Flash/video, which would be a first for this blog. Although I'm not a big video fan (I can take in more info more quickly by reading), I know many people prefer that format, so I want to try and accommodate them along with my current readers.

Tool Function
Charles A desktop application that provides an HTTP proxy / HTTP monitor / reverse proxy, enabling a developer to view all of the HTTP and SSL/HTTPS traffic between their machine and the Internet. This includes requests, responses, and the HTTP headers (which contain the cookies and caching information). A great tool for understanding what calls/requests are being made and how they impact web site latency.
curl [url] curl is a downloadable command line tool for transferring data with URL syntax.
Dynamic Drive Image Optimizer is a web-based service that lets you easily optimize your gifs, animated gifs, jpgs, and pngs so they load as fast as possible on your site. It provides versions of an image in a range of file sizes (for the same image dimensions) by decreasing the DPI of the image. It also easily converts from one image type to another. Upload size limit is 300 kB.
Firebug Firebug is a Firefox plugin that provides a number of tools for developers and technical SEO work, including web site latency and performance analysis. I will cover many of the plugins later, if I get the chance. In the meantime, take a look at this article at Web Resources Depot to find a good list of useful Firebug plugins.
Google Page Speed Page Speed is an open-source Firefox/Firebug add-on that performs several tests on a site's web server configuration and front-end code. It provides a comprehensive report and score on issues that can affect web site latency, as well as recommendations for improving it. This is how Google sees your web site latency, and it is the first tool you should run to understand whether you have web site performance problems from Google's perspective, which over time will have a larger impact on your rankings.
HttpWatch HttpWatch is a desktop (downloadable) HTTP viewer and debugger that integrates with IE and Firefox to provide seamless HTTP and HTTPS monitoring without leaving the browser window. It is similar in functionality to Charles.
JSMIN JSMin is a Javascript minifier. Basically, it acts as a filter which removes comments and unnecessary whitespace from JavaScript files. It typically reduces file size by half, resulting in faster downloads. It also encourages a more expressive programming style because it eliminates the download cost of clean, literate self-documentation. JSMin can be downloaded as an MS-DOS .exe file or as source code that can be compiled.
Live HTTP Headers A Firefox toolbar plugin that allows you to view the HTTP headers of a page while browsing. Analysis of headers is important to understand whether certain key functions/libraries that affect web site latency and performance, like gzip, are active on the web server serving up pages.
Lynx A downloadable text browser that allows you to view your site as the search crawlers do. Also a way of ensuring that people with text-only browsers can use the site – however this is a pretty minimal use nowadays.
NetExport NetExport is a Firebug 1.5 extension that allows exporting all collected and computed data from the Firebug Net panel. The structure of the created file uses the HTTP Archive 1.1 (HAR) format, which is based on JSON.
Dean Edward’s Packer A web-based JavaScript compressor.
Pingdom Full Page Test Pingdom’s Full Page Test is a web-based tool that loads a complete HTML page including all objects (images, CSS, JavaScripts, RSS, Flash and frames/iframes). It mimics the way a page is loaded in a web browser. The load time of all objects is shown visually with time bars.
ShowSlow ShowSlow is an open source tool that helps monitor various web site latency and performance metrics over time. It captures the results of YSlow and Google Page Speed rankings and graphs them, to help you understand how various changes to your site affect its performance. This is a great tool to see how the two tools results compare, but also to understand which items they are analyzing. Showslow can be run from within your Firefox/Firebug toolbar or be installed on your server. Be forewarned, to run it on your toolbar you will need to make some settings changes to the about:config page and your results will show publicly on www.showslow.com.
Site-perf.com Site-Perf.com is another performance analysis tool that visually displays web page load times. It is similar to Pingdom’s Full Page Test Tool, although it provides a little bit more detail and better explanations of what the load times mean. It also has a network performance test tool that is handy in understanding what portion of your web site latency and performance issues are coming from your host rather than from the site – and let me tell you that can be a lifesaver as you watch your performance go from great to lousy to great again. The page test tool provides an accurate, realistic, and helpful estimation of your site’s loading speed. The script fully emulates natural browser behavior downloading your page with all the images, CSS, JS and other files, just like a regular user.
Smush.it Smush.it runs as a web service or as a Firebug plugin that comes with ySlow V2. It uses optimization techniques specific to each image format to remove unnecessary bytes from image files. It is a "lossless" tool, which means it optimizes the images without changing their look or visual quality. After Smush.it runs on a web page it reports how many bytes would be saved by optimizing the page's images and provides a downloadable zip file with the minimized image files.
Wave Toolbar The WAVE Toolbar provides button options and a menu that will modify the current web page to reveal the underlying page structure information so you can visualize where web site latency issues may be occurring. It also has a built in text-browser comparable to Lynx.
Web Page Test webpagetest.org is a hosted service that provides a detailed test and review of web site latency and performance issues. It is probably the most complete single tool I have found for getting an overview of what is happening with your website. I like this better than ySlow or ShowSlow, but I would still use Google Page Speed as well, since that is how Googlebot sees web site performance.
Web Developer Toolbar If you do any web work, this is the one must-have plug-in for FireFox. It contains a series of developer tools that let you visualize various web page elements and determine if there are html, css, or javascript errors. This is just one of its many functions.
Webo Site Speedup Webo Site Speedup deserves special mention. It is not so much a tool as a fix. It comes as an installable application for your web server or as a plugin for WordPress or Joomla. There is a free community edition and a premium edition with extra features that costs $99. It performs a range of functions to reduce web site latency significantly, including compression of images/css/javascript, combining multiple css or javascript files into a single file, moving javascript to the bottom of the page rather than the top, and minifying javascript, among numerous other functions.
wget wget is a free utility for the non-interactive download of files from the web. It runs in the background (so you can be doing other things) and supports http, https, and ftp protocols, as well as retrieval through http proxies. You can use it, for example, to create a local version of a remote website, fully recreating that site’s directory structure.
Xenu Link Sleuth Xenu Link Sleuth spiders web sites looking for broken links. Link verification is done on 'normal' links, images, frames, backgrounds, and local image maps. It displays a continuously updated list of URLs which you can sort by different criteria.
ySlow ySlow, developed by Yahoo!, is a FireFox/FireBug plugin. It is a general purpose web site latency and performance optimizer. It analyzes a variety of factors impacting web site latency, provides reports, and makes suggestions for fixes. This has been the most commonly used tool for analyzing web site performance until now.
YUI Compressor The YUI Compressor, developed by Yahoo!, is a JavaScript minifier designed to be 100% safe and yield a higher compression ratio than most other tools. It is part of the YUI library. The YUI Library is a set of utilities and controls, written with JavaScript and CSS, for building richly interactive web applications using techniques such as DOM scripting, DHTML and AJAX. YUI is available under a BSD license and is free for all uses.

Comparing General Purpose SEO Toolbars

Note to readers: click on the link to download a copy of my analysis of general purpose SEO toolbars.

One of my more unusual pet peeves in SEO is the number of general purpose SEO toolbars available for use. Actually not so much the number, but the fact that each one claims to be the best of the best, each one has its champions in the community, and each reports slightly different data for certain metrics. Which one is best for me and in what situations? Which provides the most accurate data?

I have been asked so many times by customers “What is the best general purpose toolbar for Firefox?” that I finally decided to do a detailed comparison, if nothing else to satisfy my curiosity about:

  • how similar the features/functions of these toolbars are.
  • how accurate they are (e.g., do they all report the same numbers for similar analyses).
  • which one is best in which situation.

I could spend days writing reviews of these toolbars, but very few people would read them, and I doubt the added information would prove all that useful versus downloading and trying the tools. SEOs are, by definition, experimenters. They prefer to test rather than read or guess. On the other hand, a high-level visual summary, which can provide a sense of each tool's coverage as well as its focus and strengths, would probably help those evaluating tools, so they know which features to explore in which tool.

Ergo the table below, which shows a feature-by-feature comparison of six popular SEO toolbars for Firefox – FoxySEO Tool, SEOMoz, SEOBook, SEOpen, SEOQuake, and SEO for Firefox. Hopefully the categories and line items are self-explanatory. If not, and I get enough comments, then I will add an addendum explaining the line items – but that would be painful and probably not add a lot of value for most of the audience.

No doubt, readers are going to be upset because I didn't get to their favorite toolbar, and for that my humble apologies. In the process of this research, I found several more general purpose SEO toolbars, like ToolbarBrowser, which need review. I will have to cover those in a separate post, and I will update the downloadable pdf comparing all of the toolbars as I extend the research.

What I found interesting:

  1. There was no single metric/line item that all six toolbars analyzed except the number of backlinks to the site in Yahoo! Site Explorer.
  2. Five of the six tools included metrics for PageRank of the current page, pages in the Google cache, number of backlinks to the page in Site Explorer, DMOZ Entries, Keyword Density, and Meta Tag Analysis. These were the most widely shared metrics.
  3. Each tool has different strengths and weaknesses. For example:
    • SEOBook’s toolbar clearly has a much deeper set of tools for keyword analysis compared to the others.
    • FoxySEO tool has a broader coverage of metrics in the social media, site performance, and indexing domains.
    • SEOQuake and several others provide the set of analyses for every line item in the SERPs, which is very handy for competitive analysis.
    • SEOMoz, on the other hand, has fewer tools but they cover areas not included in other toolbars, as well as including metrics unique to SEOMoz – e.g. MOZRank, MozTrust. The toolbar also links to the SEOMoz Pro tools on the SEOMoz website, where you can run other, deeper analyses. Clearly, its purpose is not to perform all the analysis “on page” but rather to give a high-level view from which further data can be gleaned by using the more detailed SEOMoz tools. However, those tools (and the toolbar) are only available to paid SEOMoz members.
  4. As far as accuracy goes, when checked, most of the tools reported consistent data on everything but link-related metrics, where the data tended to vary widely. Yahoo Site Explorer links probably had the least variation, but when you looked at bookmarks on social media sites or directory entries in DMOZ or Yahoo, you could see results that differ by a factor of 10 in some cases. What is even more interesting, when I manually went to some of these sites and typed in the same query, I actually got a set of results that differed from the data I got in SEOQuake and FoxySEO Tool.

So which toolbar is right for you depends on:

  • What metrics you feel are important to you.
  • What services you subscribe to.
  • How cluttered you like your Firefox screen.
  • How much performance degradation you are willing to tolerate. In the case of tools which provide metrics for every line in the SERPs, the wait time can be substantial.

How Can You Tell Someone Is an SEO Tool Addict? Look at Their Firefox.

As you can see from the picture, as a self-confessed tool addict I have all of them running – which explains why I never use Firefox for anything but SEO work. Advantage to Chrome for regular browsing and social media work, at least until all the SEO plugins port over to Chrome.

Category/Feature Foxy SEO Tool SEOMoz SEOBook SEOpen SEOQuake SEO for Firefox
General
Site Found Date/Age + + +
Google Page PageRank + + + + +
Google Site PageRank +
Google Cache + + + + +
Google Info +
Google Site +
Google Similar Sites +
Quarkbase Info +
Error 404 page check +
robots.txt check + + +
robots.txt viewer +
W3C Validation + +
Site Header Check +
XML Sitemap Checker + +
Links to SEOMoz tools +
Show Nofollow tags + + +
Web Server Type +
Wayback Machine Archives +
Traffic Measures
Alexa + + + + +
Bing +
Google Trends +
Quantcast + +
SEMRush rank +
SEMRush price of CPC +
SEMRush Traffic + +
Compete.com rank + + + + +
Compete.com uniques + + + +
Google Trends +
Search spider simulator +
Site Performance
Network World Response +
Page Loading Test + +
Ping Test +
DNS Test +
Geo Location +
IP Address + + +
IP Search +
IP Neighbors +
My Server Header + +
My IP Information +
Copyscape +
Internet Archive +
WhoIs + + + + +
Proxy View +
Measures of Content Indexing
Google Pages Indexed + + + +
Google Search Domain +
Google Images +
Yahoo Pages Indexed + + + +
Yahoo Search Domain +
Bing Pages Indexed + + +
Bing Images +
Bing Search Domain +
Google Webmaster Tools – Top Searches +
Ask Search Domain +
Link Analysis/Metrics
Google Links + +
Google Webmaster Tools – Backlinks +
Site Explorer Backlinks Site + + + + + +
Site Explorer Backlinks Page + + + + +
Site Explorer for this Site +
Site Explorer for this Page +
Site Explorer .edu links + +
Site Explorer .gov links + +
Site Explorer .mil links +
Bing Links + + +
Bing Site +
Alexa Backlinks to Site +
Blog links +
Majestic SEO Linkdomain + +
Internal links to page + +
mozRank of page +
mozTrust of Page +
Domains linking to site +
Root domains linking to page +
mozrank of subdomain +
Domain mozRank
Domain mozTrust
Social Media Visibility Metrics
Google News +
Google Blog +
Google Groups +
Yahoo News +
Ask News +
Bing News +
Bloglines +
Delicious + + + +
Digg + + + +
Digg Popular Stories + +
MySpace +
Stumbleupon + + +
Technorati + + +
Twitter + + +
Wikipedia + +
Yahoo Answers +
youtube +
Tools to Bookmark in Social Media
Reddit +
Facebook +
Mixx +
MySpace +
Propeller +
Squidoo +
Stumbleupon +
Technorati +
Twitter +
Yahoo! Buzz +
Set Bookmarks in Search Engines
Ask +
AOL +
Google +
Live +
Yahoo! +
Directory Entry Metrics
About.com +
DMOZ + + + + +
Google Directory +
Yahoo! Directory + + + +
Best of the Web + +
Business.com +
Keyword Analysis Tools
Keyword Density Checker + + + + +
Keyword Importance +
Keyword List Generator +
Keyword List Cleaner +
Keyword Highlighter +
Keyword Typo Generator +
Meta Tag Analysis + + + + +
Shows images alt text +
SEMRush Domain Report + +
Domain Name Search +
Google Adwords CPC +
Google Adwords Traffic Estimator +
Google Sponsored Links +
Google Adwords Keyword Tool +
Google Search-based Keyword Tool + +
Google Trends Searches + +
Google Insights Search + +
Google Suggest +
Keyword Discovery +
Quintura
SEOBook Keyword Suggestion Tool +
Wordpot +
Wordtracker + +
Yahoo! Search Results +
Rankings
Keyword Rank Checker +
Site Comparison Tool +
Other
Google Translate +

SEO: The Long Tail IS Valuable for Small Keyword Markets

Well, I’m back after the holidays. Here I thought the holidays would be slow and I’d get a chance to write every day. Turns out that was a bad assumption. I was busy as heck with clients who needed it “yesterday.” That’s different than the beginning of the year – so maybe that’s a sign the economy is really improving.

Today’s post is about whether the long-tail is useful for companies with small keyword universes. This became of interest to me when a colleague of mine, Steven Ebin, suggested that this strategy was useful even for small search markets, which had not been my experience.

I define a small keyword universe as one that is less than 1,000 core keywords. The short-tail is defined as keywords that represent 60% of traffic (in my experience usually about 10-15 keywords for small universes, although that can vary substantially). The mid-tail is defined as keywords that represent the next 25% and the long-tail is defined as keywords that represent the balance. I find that many small customers have core keyword universes of this size and distribution, although size of the keyword universe really ties to the specific market, not the size of customer. Be that as it may, my experience is that there is a correlation between size of company and size of keyword market.

Let’s assume that the overall number of monthly clicks for a keyword market, including the long-tail search terms, is 2,200,000 visits.

Next, we know that not everyone clicks – so we need to ignore that traffic. Some aged results reported for AOL in the Webmaster World forums suggest that 46% of searchers don’t click through.

We then have to make some assumptions about where we will rank in each of the positional categories. For purposes of this analysis, I'll assume that an above-average SEO can get an average position of 5 for the short-tail terms, an average position of 3 for the mid-tail terms, and an average position of 2 for the long-tail terms.

We also have to estimate the clickthrough rates for results in each of these positions in the SERPs. This data varies widely. One study from Cornell University suggests that 56% of searchers who click at all click on the first result. Some calculations by Jay Geiger put that figure at 42.3%, and the reported AOL statistics from the source mentioned above put it at 23%.

Figure: Clickthrough Rates Based on Position in Search Results

We also have to make some assumptions about conversion rates. Estimating conversion rates is hard because it can vary so much by industry, but let’s assume we have a 0.5% conversion rate for our short-tail terms, 1% for our mid-tail terms, and 3% for our long-tail terms. This gives us a weighted-average conversion rate (based on the percentages in each part of the tail) of 1.0%, which is not unrealistic for organic traffic and may even be low. If you play with these numbers, you will see that a reasonable variation in conversion rate assumption on the long-tail doesn’t change the results of the analysis.
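As a quick check, the weighted average follows directly from the 60/25/15 traffic split defined earlier:

(0.60 × 0.5%) + (0.25 × 1.0%) + (0.15 × 3.0%) = 0.30% + 0.25% + 0.45% = 1.0%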

Why increase the conversion rate so much for the long tail? Long-tail searches are more “perfected.” The fact that people type in more specific keyword strings indicates that they are further down the decision-making cycle, and thus the conversion rates are significantly higher. The “spread” that I have assumed for the model is actually conservative. We have often seen conversion rate differentials between the short- and long-tail that are even greater.

When we run the numbers, using all three sets of clickthrough rates, we get the following results:

Figure: Results of Analysis on Long-Tail Keywords

This analysis shows that the long-tail can potentially double the business for a small company.

We can also look at this same analysis using the length of the keyword string. How does keyword string length relate to location in the tail? Are long keyword strings (3+ words) equivalent to the “long tail”? The answer is “no” – in fact the ratios of searches for length of tail versus length of keyword string are almost the inverse (60/25/15 vs 20/24/56). However, since we could also segment our keyword universe based on length of keyword string and develop a traffic strategy based on that, let’s look at the analysis that way.

Hitwise completed research on the percentage of searches based on number of terms in the keyword string in January 2009 with these results:

Figure: Searches By Number of Terms, from Hitwise, January 2009

If you add up the searches with 3+ terms in the keyword, you get 56.06% – so a pretty substantial amount of traffic, but not as high as the 70% I have heard from others.

We also need to adapt the conversion rates to maintain an average of 1% across all three categories, so that the comparison between the two approaches is "apples-to-apples." When you run the analysis based on length of keyword string, with conversion rates adjusted to yield the same total number of conversions as in the prior case, you get the following results:

Figure: Results of Analysis for Small Keyword Markets, By String Length

In this case, the results are even more dramatic, with the number of conversions for keywords of three or more terms dwarfing the one- and two-keyword strings. Even if you take the conversion rate for the three+ keyword strings down to 0.8% (the same as for two-keyword search strings), the conversions are still almost double those of the one- and two-keyword string categories combined.

So the answer to the question is an absolute “Yes” – the long-tail can be a very valuable source of business even for small keyword universes.


Search Engines: Social Media, Author Rank and SEO

In my previous discussions of social media, channel architectures, and branding, I discussed the fact that I am manic about locking down my online brand (onlinematters), because there seems to be some relationship in the universal search engines between the number of posts I make, the number of sites I post from under a specific username, and how my posts rank. It is as if some measure of trust is given to an author the more he publishes from different sites and the more people see, read, and link to what he has written. I am not talking about authority given to the actual content written by the author – that is the core of search. I am talking instead about using the author's behavior and success as a content producer to change where his content ranks for any given search result on a specific search term. It is similar, in many ways, to what happened in the Vincent release, where brand became a more important ranking factor. In this case, the author and the brand are synonymous, and when the brand is highly valued, those results would, under my hypothesis, be given an extra boost in the rankings.

This was an instinct call, and while I believed I had data to support the theory, I had no research suggesting that an underlying algorithm had even been considered or created to measure this phenomenon in universal search.

I thus considered myself twice lucky while doing my weekly reading on the latest patents to find one that indicates someone is thinking about the issue of "author rank."  On October 29th, Jaya Kawale and Aditya Pal of Yahoo!  applied for a patent with the name "Method and Apparatus for Rating User Generated Content in Search Results."  The abstract reads as follows:

Generally, a method and apparatus provides for rating user generated content (UGC) with respect to search engine results. The method and apparatus includes recognizing a UGC data field collected from a web document located at a web location. The method and apparatus calculates: a document goodness factor for the web document; an author rank for an author of the UGC data field; and a location rank for web location. The method and apparatus thereby generates a rating factor for the UGC field based on the document goodness factor, the author rank and the location rank. The method and apparatus also outputs a search result that includes the UGC data field positioned in the search results based on the rating factor.

Let's see if we can't put this into English comprehensible to the common search geek. Kawale and Pal want to collect data on three specific ranking factors and combine them into a single, weighted ranking factor that is then used to influence the rank ordering of what they term "User Generated Content," or UGC. The authors note that the typical ranking factors in search engines today are not suitable for ranking UGC: UGC items are fairly short, they generally do not have links to or from them (rendering backlink-based analysis unhelpful), and spelling mistakes are quite common. Thus a new set of factors is needed to adequately index and rank UGC.

The first issue the patent/algorithm has to deal with is defining what the term UGC includes.  The patent specifically mentions "blogs, groups, public mailing lists, Q & A services, product reviews, message boards, forums and podcasts, among other types of content." The patent does not specifically mention social media sites, but those are clearly implied. 

The second issue is to determine which sites should be scoured for UGC. UGC sites are not always easy to identify. An example would be a directory in which people rank references with a 5-star rating, and that rating is the only user input. Is this site easy to identify as a site with UGC? Not really, but somehow the search engine must decide whether this site is within its valid universe. Clearly, some mechanism for categorizing sites with UGC needs to exist, and while Kawale and Pal use the example of blog search as covering a limited universe of sites, their patent does not give any indication of how sites are to be chosen for inclusion in the crawl process.

Now we come to the ranking factors.  The three specific ranking factors proposed by Kawale and Pal are:

  • Document Goodness.  The Document Goodness Factor is based on at least one (and possibly more) of the following attributes of the document itself: a user rating; a frequency of posts before and after the document is posted; a document's contextual affinity with a parent document; a page click/view number for the document; assets in the document; document length; length of a thread in which the document lies; and goodness of a child document. 
  • Author Rank.  The Author Rank is a measure of the author's authority in the social media realm on a subject, and is based on one or more of the following attributes:  a number of relevant posted messages; a number of irrelevant posted messages; a total number of root documents posted by the author within a prescribed time period; a total number of replies or comments made by the author; and a number of groups to which the author is a member.
  • Location Rank.  Location Rank is a measure of the authority of the site in the social media realm.  It can be based on one or more of the following attributes: an activity rate in the web location; a number of unique users in the web location; an average document goodness factor of documents in the web location; an average author rank of users in the web location; and an external rank of the web location.

These ranking factors are not used directly as calculated. They are "normalized" for elements like document length and then combined by some mechanism to create a single UGC ranking factor.

The main thing to note – and the item that caught my attention, obviously – is Author Rank. Note that it has ranking factors that correspond with what I have been hypothesizing exist in the universal search engines. That is to say, search results are not ranked only by the content on the page, but also by the authority of the author who has written them, as determined by how many posts that author has made, how many sites he has made them on, how many groups he or she belongs to, and so on.

Can I say for certain that any algorithm like this has been implemented?  Absolutely not.  But my next task has to be to design an experiment to see if we can detect a whiff of it in the ether.  I'll keep you informed.
