About Online Matters

SEO: The Long Tail IS Valuable for Small Keyword Markets

Well, I’m back after the holidays. Here I thought the holidays would be slow and I’d get a chance to write every day. Turns out that was a bad assumption. I was busy as heck with clients who needed it “yesterday.” That’s different than the beginning of the year – so maybe that’s a sign the economy is really improving.

Today’s post is about whether a long-tail strategy is useful for companies with small keyword universes. The question came up when a colleague of mine, Steven Ebin, suggested that the strategy was useful even for small search markets, which had not been my experience.

I define a small keyword universe as one with fewer than 1,000 core keywords. The short-tail consists of the keywords that account for 60% of traffic (in my experience usually about 10-15 keywords for small universes, although that can vary substantially). The mid-tail consists of the keywords that account for the next 25%, and the long-tail consists of the keywords that account for the remaining 15%. I find that many small customers have core keyword universes of this size and distribution, although the size of a keyword universe really depends on the specific market, not on the size of the customer. Be that as it may, my experience is that there is a correlation between size of company and size of keyword market.
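To make the 60/25/15 definition concrete, here is a minimal sketch of how a keyword list with traffic counts could be split into the three parts of the tail. The function name and the sample keywords and visit counts are hypothetical, and the boundary rule (a keyword goes into the bucket where its cumulative share starts) is one reasonable convention, not necessarily the one used in this analysis.

```python
from typing import Dict, List


def segment_tail(
    keyword_traffic: Dict[str, int],
    short_share: float = 0.60,
    mid_share: float = 0.25,
) -> Dict[str, List[str]]:
    """Split keywords into short-, mid- and long-tail buckets by traffic share.

    Keywords are ranked by traffic, highest first. Each keyword goes into the
    bucket in which its cumulative share *starts*: short tail while the share
    of traffic already counted is below short_share, mid tail while it is below
    short_share + mid_share, and long tail for everything after that.
    """
    total = sum(keyword_traffic.values())
    ranked = sorted(keyword_traffic.items(), key=lambda kv: kv[1], reverse=True)
    tails: Dict[str, List[str]] = {"short": [], "mid": [], "long": []}
    counted = 0
    for keyword, visits in ranked:
        share_before = counted / total
        if share_before < short_share:
            tails["short"].append(keyword)
        elif share_before < short_share + mid_share:
            tails["mid"].append(keyword)
        else:
            tails["long"].append(keyword)
        counted += visits
    return tails


if __name__ == "__main__":
    # HYPOTHETICAL monthly visits for a tiny keyword universe.
    sample = {
        "seo": 550,
        "seo tools": 250,
        "seo tools for small business": 120,
        "long tail keyword research tips": 80,
    }
    print(segment_tail(sample))
```

In a real universe of several hundred keywords, the short tail would come out as the 10-15 head terms and the long tail as the hundreds of low-volume terms at the end of the ranked list.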

Let’s assume that the overall number of monthly searches for a keyword market, including the long-tail search terms, is 2,200,000.

Next, we know that not everyone clicks, so we need to exclude that traffic. Some older results for AOL reported in the Webmaster World forums suggest that 46% of searchers don’t click through.

We then have to make some assumptions about where we will rank in each of the positional categories. For purposes of this analysis, I’ll assume that an above-average SEO can achieve an average position of 5 for the short-tail terms, 3 for the mid-tail terms, and 2 for the long-tail terms.

We also have to estimate the clickthrough rates for results in each of these positions in the SERPs. This data varies widely. One study from Cornell University suggests that 56% of searchers who click choose the first result. Calculations by Jay Geiger put that figure at 42.3%, and the AOL statistics from the Webmaster World source mentioned above put it at 23%.

Clickthrough Rates Based on Position in Search Results

We also have to make some assumptions about conversion rates. Estimating conversion rates is hard because they vary so much by industry, but let’s assume a 0.5% conversion rate for our short-tail terms, 1% for our mid-tail terms, and 3% for our long-tail terms. Weighting each rate by its part of the tail’s share of traffic (60/25/15) gives an average conversion rate of 1.0%, which is not unrealistic for organic traffic and may even be low. If you play with these numbers, you will see that a reasonable variation in the long-tail conversion rate assumption doesn’t change the conclusion of the analysis.
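The 1.0% weighted average follows directly from the shares and rates above: 0.60 × 0.5% + 0.25 × 1% + 0.15 × 3% = 0.30% + 0.25% + 0.45% = 1.0%. A short sketch of that check, using only the figures from the text:

```python
# Traffic share and assumed conversion rate for each part of the tail,
# taken from the assumptions in the text.
TAIL_ASSUMPTIONS = {
    "short": (0.60, 0.005),
    "mid": (0.25, 0.01),
    "long": (0.15, 0.03),
}


def weighted_conversion_rate(assumptions):
    """Average conversion rate, weighting each rate by its traffic share."""
    return sum(share * rate for share, rate in assumptions.values())


rate = weighted_conversion_rate(TAIL_ASSUMPTIONS)
print(f"Weighted-average conversion rate: {rate:.2%}")  # 1.00%
```

Changing the long-tail rate in `TAIL_ASSUMPTIONS` is a quick way to try the "play with these numbers" sensitivity test.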

Why increase the conversion rate so much for the long tail? Long-tail searches are more “perfected.” The fact that people type in more specific keyword strings indicates that they are further down the decision-making cycle, and thus the conversion rates are significantly higher. The “spread” that I have assumed for the model is actually conservative. We have often seen conversion rate differentials between the short- and long-tail that are even greater.

When we run the numbers using all three sets of clickthrough rates, we get the following results:

Results of Analysis on Long-Tail Keywords

This analysis shows that the long-tail can potentially double the business for a small company.
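The arithmetic behind a results table like this can be sketched as a simple funnel: monthly searches, minus the 46% of searchers who never click, split by tail share, multiplied by the clickthrough rate at the assumed average position, multiplied by the conversion rate. The search volume, no-click rate, tail shares, positions, and conversion rates below come from the post; the `CTR_BY_POSITION` values are hypothetical placeholders, not figures from any of the three studies cited, so substitute whichever CTR curve you trust. The output is illustrative and is not meant to reproduce the results above.

```python
from dataclasses import dataclass


@dataclass
class TailSegment:
    name: str
    traffic_share: float    # share of all searches in this part of the tail
    avg_position: int       # assumed average ranking position
    conversion_rate: float  # assumed conversion rate for visitors


MONTHLY_SEARCHES = 2_200_000
NO_CLICK_RATE = 0.46  # AOL figure cited in the post

# HYPOTHETICAL share of clicking searchers who choose each position.
CTR_BY_POSITION = {2: 0.12, 3: 0.09, 5: 0.05}

SEGMENTS = [
    TailSegment("short", 0.60, 5, 0.005),
    TailSegment("mid", 0.25, 3, 0.01),
    TailSegment("long", 0.15, 2, 0.03),
]


def conversions_by_segment(searches, no_click_rate, ctr_by_position, segments):
    """Expected monthly conversions for each part of the tail."""
    clicking_searches = searches * (1 - no_click_rate)
    results = {}
    for seg in segments:
        visits = clicking_searches * seg.traffic_share * ctr_by_position[seg.avg_position]
        results[seg.name] = visits * seg.conversion_rate
    return results


if __name__ == "__main__":
    results = conversions_by_segment(
        MONTHLY_SEARCHES, NO_CLICK_RATE, CTR_BY_POSITION, SEGMENTS
    )
    for name, conversions in results.items():
        print(f"{name:>5}-tail conversions: {conversions:,.0f}")
```

Running it once per CTR curve (one dictionary per study) mirrors the "all three sets of clickthrough rates" approach described above.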

We can also look at this same analysis using the length of the keyword string. How does keyword string length relate to location in the tail? Are long keyword strings (3+ words) equivalent to the “long tail”? The answer is “no.” In fact, the split of searches by position in the tail and the split by keyword string length are almost the inverse of each other (60/25/15 vs. 20/24/56). However, since we could also segment our keyword universe by length of keyword string and develop a traffic strategy based on that, let’s look at the analysis that way.
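Segmenting by string length is even simpler than segmenting by tail position, because each keyword's bucket depends only on its own word count. A minimal sketch, assuming terms are whitespace-separated words; the function name and sample data are hypothetical:

```python
from typing import Dict


def share_by_term_count(keyword_searches: Dict[str, int]) -> Dict[str, float]:
    """Return each length bucket's share of total searches."""
    totals = {"1 term": 0, "2 terms": 0, "3+ terms": 0}
    for keyword, searches in keyword_searches.items():
        n_terms = len(keyword.split())  # whitespace-separated words
        if n_terms <= 1:
            bucket = "1 term"
        elif n_terms == 2:
            bucket = "2 terms"
        else:
            bucket = "3+ terms"
        totals[bucket] += searches
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {bucket: 0.0 for bucket in totals}
    return {bucket: count / grand_total for bucket, count in totals.items()}


if __name__ == "__main__":
    # HYPOTHETICAL monthly searches, chosen to land on a 20/24/56 split.
    sample = {
        "seo": 200,
        "seo tools": 240,
        "seo tools for small business": 300,
        "long tail keyword research": 260,
    }
    print(share_by_term_count(sample))
```

Note that the buckets here cut across the tail: a three-word phrase can easily be a high-volume short-tail term.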

In January 2009, Hitwise published research on the percentage of searches by number of terms in the keyword string, with these results:

Searches By Number of Terms from Hitwise January 2009

If you add up the searches with 3+ terms in the keyword string, you get 56.06%: a pretty substantial amount of traffic, but not as high as the 70% I have heard from others.

We also need to adjust the conversion rates so that they still average 1% across all three categories, keeping the comparison between the two approaches “apples-to-apples.” When you run the analysis by length of keyword string, with conversion rates chosen to produce the same number of conversions as in the prior case, you get the following results:
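Holding the weighted average at 1% means that once two of the three rates are chosen, the third is fixed. A sketch of that calculation, using the approximate 20/24/56 shares and the 0.8% two-term rate from this post; the 0.5% one-term rate is a hypothetical input, not a figure from the analysis, so the result only shows how the constraint works.

```python
from typing import Dict


def solve_remaining_rate(
    target_average: float,
    shares: Dict[str, float],
    known_rates: Dict[str, float],
    unknown_bucket: str,
) -> float:
    """Conversion rate for `unknown_bucket` that makes the share-weighted
    average across all buckets equal `target_average`."""
    known_part = sum(shares[b] * rate for b, rate in known_rates.items())
    rate = (target_average - known_part) / shares[unknown_bucket]
    if rate < 0:
        raise ValueError("known rates already exceed the target average")
    return rate


SHARES = {"1 term": 0.20, "2 terms": 0.24, "3+ terms": 0.56}  # from the text
KNOWN_RATES = {
    "1 term": 0.005,   # HYPOTHETICAL: not stated in the post
    "2 terms": 0.008,  # the 0.8% figure mentioned for two-term strings
}

rate_3plus = solve_remaining_rate(0.01, SHARES, KNOWN_RATES, "3+ terms")
print(f"3+ term conversion rate needed: {rate_3plus:.2%}")  # 1.26%
```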

Results of Analysis for Small Keyword Markets, By String Length

In this case, the results are even more dramatic, with conversions from keyword strings of three or more terms dwarfing those from one- and two-term strings. Even if you take the conversion rate for the three-plus-term strings down to 0.8% (the same as for two-term strings), their conversions are still almost double those of the one- and two-term categories combined.

So the answer to the question is an absolute “Yes” – the long-tail can be a very valuable source of business even for small keyword universes.


Technical SEO: Analyzing Site Loading Times

Well, it’s after Thanksgiving and I finally get back to the blog. Feels good. This is the next installment about site performance analysis and how to deal with a site with worrisomely slow page loading times. It turns out I had a case study right under my nose: this site, the OnlineMatters blog. Recently, I showed a client my site and watched as it loaded, and loaded, and… loaded. I was embarrassed but also frustrated. I had just finished my pieces on site performance and knew that this behavior was going to cause my rankings in the SERPs to drop, even before Google releases Caffeine. While I am not trying to publish this blog to a mass audience (to do that I would need to write every day), I still want to rank well on keywords I care about. Given what I do, it’s an important proof point for customers and prospects.

So I am going to take advantage of this negative and turn it into a positive for you. You will see how to perform various elements of site analysis by watching me debug this blog in near real time. Yesterday, I spent three hours working through the issues, and I am not done yet. So this first piece will take us about halfway there. But even now you can learn a lot from my struggles.

The first step was to find out just how bad the problem was. One way to do this is to use Pingdom’s Full Page Analysis tool, which not only tests page loading speeds but also visualizes which parts of the page are causing problems. Pingdom publishes an explanation of how to use the tool, and you should read it before trying to interpret the results for your site. Here is what I got back when I ran the test:

Pingdom Site Performance Test Results for aboutonlinematters.com

A load time of 11.9 seconds? Ouch! Since Pingdom runs the test on its own servers, the result is not influenced by my sometimes unpredictably slow Comcast connection.

Pingdom showed over 93 items loading with the home page, the vast majority of them images (a partial listing is shown below). For several of them (lines 1, 36, 39, 40, 41, 54), a significant part of the load time occurred during rendering (that is, after the element had been downloaded into the browser), as indicated by the blue part of the bar. But in the majority of cases, the load time was mainly caused either by the time from the first request to the server until the content began downloading (the yellow portion of the bar), or by the time from downloading to the start of rendering (the green portion). This suggested that:

  1. I had too big a page, because the download time for all the content to the browser was very long.
  2. I might have a server bandwidth problem.

But rather than worrying about item 2, which would require a more extensive fix – either an upgrade in service or a new host – I decided to see how far I could get with some simple site fixes.
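The visual triage above (which colored segment dominates each bar) can also be done on numbers, once you transcribe the timings from a waterfall chart. A minimal sketch, assuming three phases named after the reading of the chart given above (wait before downloading, download, render); all item names and timings are hypothetical, not measurements from this site.

```python
from collections import Counter
from typing import Dict, List, Tuple

PHASES = ("wait", "download", "render")

# HYPOTHETICAL timings in seconds, transcribed by hand from a waterfall chart:
# (item, wait until content starts downloading, download, render).
WATERFALL: List[Tuple[str, float, float, float]] = [
    ("index.html", 0.2, 0.3, 0.9),
    ("header.png", 0.7, 1.4, 0.05),
    ("style.css", 1.2, 0.8, 0.1),
    ("icon-facebook.png", 0.8, 0.3, 0.02),
]


def summarize_waterfall(
    rows: List[Tuple[str, float, float, float]]
) -> Tuple[Counter, Dict[str, float]]:
    """Count which phase dominates each item and total the time per phase."""
    dominant: Counter = Counter()
    totals = {phase: 0.0 for phase in PHASES}
    for _item, *timings in rows:
        for phase, seconds in zip(PHASES, timings):
            totals[phase] += seconds
        # The phase with the longest time is the one to attack for this item.
        dominant[PHASES[timings.index(max(timings))]] += 1
    return dominant, totals


if __name__ == "__main__":
    dominant, totals = summarize_waterfall(WATERFALL)
    print("Items dominated by each phase:", dict(dominant))
    print("Total seconds per phase:", {p: round(t, 2) for p, t in totals.items()})
```

If wait and download dominate most items, the usual suspects are page weight and the server's bandwidth, which is the same pair of conclusions reached above.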

Pingdom Details for www.aboutonlinematters.com Site Performance Analysis

The first obvious thing to fix was the size of the home page, which was 485 KB: very heavy. I tend to write long (no kidding?) and add several images to posts, so it seemed only natural to reduce the number of entries on my home page from the 10 I had it set for. I set the number in WordPress to 5 entries, saved the changes, and ran the test again.

Miracle of miracles: my page now weighed 133 KB (respectable), had 72 total objects, and downloaded in six seconds. That was a reduction in load time of almost 50% for one simple switch.
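A quick check of the "almost 50%" figure, using only the before and after numbers reported in this post (11.9 s to 6 s, 485 KB to 133 KB):

```python
def percent_reduction(before: float, after: float) -> float:
    """Percentage by which `after` is smaller than `before`."""
    return (before - after) / before * 100


# Figures reported in the post, before and after cutting the home page to 5 entries.
print(f"Load time:   {percent_reduction(11.9, 6.0):.1f}% lower")  # 49.6% lower
print(f"Page weight: {percent_reduction(485, 133):.1f}% lower")   # 72.6% lower
```

Page weight fell much further than load time, an early hint that size was not the only factor at work.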

Improved Site Performance for aboutonlinematters.com

Good, but not great.  My page loading time was still 6 seconds – it needed to be below 2.  So more work was needed.

If you look at the picture above, you can just make out that some of the slowest-loading files (between 4 and 6 of them) were .css or JavaScript files. Since these files are part of WordPress, I chose to let them go for the moment and move on to the next obvious class of files: images. Images usually represent 80% of page loading times, so this was the next obvious place to look. There were between 6 and 10 files, mainly .png files, that were adding substantially to download times. Most of these were a core part of the template I was using (e.g., header.png), so they affected the whole site and, more importantly, had been part of the blog before I ever made one entry. The others were the icons in my Add-to-Any toolbar, which also showed on every post on the site.

Since I had developed the template myself using Artisteer when I was relatively new to WordPress, I hypothesized that an image compression tool might make a substantial improvement for little effort.

Fortunately, the YSlow Firefox plugin, a site performance analyzer we will examine in my next entry, includes Smush.it, an image compression tool created by Yahoo!. It is easy to use, shows exactly how much bandwidth it saves, produces all the compressed files at the push of a single button, and delivers excellent output quality.

So I ran the tool (sadly, I did not keep a screenshot of the first run, but a sample output is below). Smush.it reduced overall image size by about 8% and significantly compressed the template elements, so I downloaded the smushed images and uploaded them to the site.
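Before uploading optimized images, it is worth confirming the savings yourself, especially the overall figure, which should be computed on total bytes rather than by averaging per-file percentages. A minimal sketch, assuming you keep the originals and the downloaded optimized files in two separate folders with matching file names; the function names are hypothetical and not part of YSlow or Smush.it.

```python
from pathlib import Path
from typing import Dict, List, Tuple

Row = Tuple[str, int, int, float]


def collect_sizes(original_dir: str, optimized_dir: str) -> Dict[str, Tuple[int, int]]:
    """Map file name -> (original bytes, optimized bytes) for files in both folders."""
    sizes = {}
    for original in sorted(Path(original_dir).iterdir()):
        optimized = Path(optimized_dir) / original.name
        if original.is_file() and optimized.is_file():
            sizes[original.name] = (original.stat().st_size, optimized.stat().st_size)
    return sizes


def compression_report(sizes: Dict[str, Tuple[int, int]]) -> Tuple[List[Row], float]:
    """Per-file savings (%) and overall savings (%) measured on total bytes."""
    rows: List[Row] = []
    total_before = total_after = 0
    for name, (before, after) in sizes.items():
        saving = (1 - after / before) * 100 if before else 0.0
        rows.append((name, before, after, saving))
        total_before += before
        total_after += after
    overall = (1 - total_after / total_before) * 100 if total_before else 0.0
    return rows, overall
```

Usage: `rows, overall = compression_report(collect_sizes("images/original", "images/smushed"))`, where both folder names are placeholders for wherever you keep the two sets of files.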

Image Compression Output from Smush.it

As you can see below, my home page was now 89.8 KB, but my load time had increased to 8.8 seconds! Note on the right of the image that several prior runs had confirmed the earlier 6-second load time. So either compression did not help or some other factor was at play.

In fact, rendering times had dropped from measurable amounts (e.g., 0.5 seconds) to milliseconds, so the smaller files had improved rendering performance. Download times, however, had increased, once again pointing to my host. But before going there, I wanted to see if there were any other elements on the site I could manipulate to improve performance.

More in the next post. BTW, as I go to press this morning, my page load time is 5.1 seconds, a new record. Nothing on the site has changed, so I increasingly suspect my hosting provider and feel I need a virtual private server.

NOTE: Even more important, as I go to press, Google has just announced that it is adding a site performance tool to Google Webmaster Tools in anticipation of site performance becoming a ranking factor.

 


Search Engines: Social Media, Author Rank and SEO

In my previous discussions of social media, channel architectures, and branding, I mentioned that I am manic about locking down my online brand (onlinematters) because there seems to be some relationship in the universal search engines between how my posts rank and both the number of posts I make and the number of sites I post from under a specific username. It is as if an author earns some measure of trust the more he publishes from different sites and the more people see, read, and link to what he has written. I am not talking about authority given to the actual content written by the author; that is the core of search. I am talking instead about using the author's behavior and success as a content producer to change where his content ranks in the results for a specific search term. It is similar, in many ways, to what happened in the Vince update, where brand became a more important ranking factor. In this case, the author and the brand are synonymous, and when the brand is highly valued, those results would, under my hypothesis, be given an extra boost in the rankings.

This was a gut call, and while I believed I had data to support the theory, I had no research showing that an algorithm to measure this phenomenon in universal search had even been considered, let alone created.

So I considered myself doubly lucky when, during my weekly reading on the latest patents, I found one indicating that someone is thinking about the issue of "author rank." On October 29th, Jaya Kawale and Aditya Pal of Yahoo! applied for a patent titled "Method and Apparatus for Rating User Generated Content in Search Results." The abstract reads as follows:

Generally, a method and apparatus provides for rating user generated content (UGC) with respect to search engine results. The method and apparatus includes recognizing a UGC data field collected from a web document located at a web location. The method and apparatus calculates: a document goodness factor for the web document; an author rank for an author of the UGC data field; and a location rank for web location. The method and apparatus thereby generates a rating factor for the UGC field based on the document goodness factor, the author rank and the location rank. The method and apparatus also outputs a search result that includes the UGC data field positioned in the search results based on the rating factor.

Let's see if we can't put this into English comprehensible to the common search geek. Kawale and Pal want to collect data on three specific ranking factors and combine them into a single weighted rating factor, which is then used to position what they term "User Generated Content" (UGC) within search results. The authors note that the ranking factors typically used by search engines today are not suitable for ranking UGC: UGC items are usually short, they generally have no links to or from them (rendering back-link analysis unhelpful), and spelling mistakes are common. Thus a new set of factors is needed to adequately index and rank UGC.

The first issue the patent/algorithm has to deal with is defining what the term UGC includes.  The patent specifically mentions "blogs, groups, public mailing lists, Q & A services, product reviews, message boards, forums and podcasts, among other types of content." The patent does not specifically mention social media sites, but those are clearly implied. 

The second issue is determining which sites should be scoured for UGC. UGC sites are not always easy to identify. Consider a directory where users rate listings on a 5-star scale and that rating is the only user input. Is this easy to identify as a site with UGC? Not really, but somehow the search engine must decide whether the site belongs in its valid universe. Clearly, some mechanism for categorizing sites with UGC needs to exist, and while Kawale and Pal use blog search as an example of a limited universe of sites, their patent gives no indication of how sites are to be chosen for inclusion in the crawl process.

Now we come to the ranking factors.  The three specific ranking factors proposed by Kawale and Pal are:

  • Document Goodness.  The Document Goodness Factor is based on at least one (and possibly more) of the following attributes of the document itself: a user rating; a frequency of posts before and after the document is posted; a document's contextual affinity with a parent document; a page click/view number for the document; assets in the document; document length; length of a thread in which the document lies; and goodness of a child document. 
  • Author Rank. The Author Rank is a measure of the author's authority on a subject in the social media realm, and is based on one or more of the following attributes: a number of relevant posted messages; a number of irrelevant posted messages; a total number of root documents posted by the author within a prescribed time period; a total number of replies or comments made by the author; and a number of groups of which the author is a member.
  • Location Rank.  Location Rank is a measure of the authority of the site in the social media realm.  It can be based on one or more of the following attributes: an activity rate in the web location; a number of unique users in the web location; an average document goodness factor of documents in the web location; an average author rank of users in the web location; and an external rank of the web location.
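The patent lists the Author Rank attributes but, as summarized here, not how they are weighed. The sketch below is one hypothetical way to turn them into a score: the share of on-topic posts plus log-damped activity counts. The weights, the log damping, and the use of a relevance ratio are all assumptions for illustration, not the method described in the patent application.

```python
import math
from dataclasses import dataclass


@dataclass
class AuthorActivity:
    """The Author Rank attributes listed in the patent application."""
    relevant_posts: int
    irrelevant_posts: int
    root_documents_in_window: int  # root documents posted within the time period
    replies_or_comments: int
    group_memberships: int


# HYPOTHETICAL weights: the patent names the attributes, not how to weigh them.
WEIGHTS = {"relevance": 2.0, "root_documents": 1.0, "replies": 0.5, "groups": 0.25}


def author_rank(a: AuthorActivity) -> float:
    """Toy Author Rank: relevance ratio plus log-damped activity counts."""
    total_posts = a.relevant_posts + a.irrelevant_posts
    # Share of the author's messages that are on topic; irrelevant posts pull it down.
    relevance = a.relevant_posts / total_posts if total_posts else 0.0
    # log1p damps raw counts so sheer volume cannot dominate the score.
    return (
        WEIGHTS["relevance"] * relevance
        + WEIGHTS["root_documents"] * math.log1p(a.root_documents_in_window)
        + WEIGHTS["replies"] * math.log1p(a.replies_or_comments)
        + WEIGHTS["groups"] * math.log1p(a.group_memberships)
    )


if __name__ == "__main__":
    engaged = AuthorActivity(40, 10, 12, 30, 5)
    spammy = AuthorActivity(5, 45, 2, 1, 1)
    print(f"engaged: {author_rank(engaged):.2f}, spammy: {author_rank(spammy):.2f}")
```

Under these assumptions an active, on-topic author who belongs to several groups outscores a high-volume but mostly off-topic one, which is the behavior the hypothesis in this post would predict.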

These ranking factors are not used directly as calculated. They are "normalized" for elements like document length and then combined by some mechanism into a single UGC rating factor.
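Since the combining mechanism is not spelled out, here is one simple possibility, sketched under stated assumptions: each factor is rescaled to the 0-1 range across the candidate results (min-max scaling, which is not the same as the patent's normalization for document length), then combined as a weighted sum. The weights and the sample documents are hypothetical.

```python
from typing import Any, Dict, List, Sequence, Tuple

FACTORS = ("goodness", "author_rank", "location_rank")


def min_max(values: Sequence[float]) -> List[float]:
    """Rescale values to [0, 1]; all-equal inputs map to 0.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def rate_ugc(
    results: List[Dict[str, Any]],
    weights: Tuple[float, float, float] = (0.5, 0.3, 0.2),  # HYPOTHETICAL weights
) -> List[Tuple[str, float]]:
    """Combine the three rescaled factors into one rating and sort by it."""
    if not results:
        return []
    normalized = {f: min_max([r[f] for r in results]) for f in FACTORS}
    rated = []
    for i, r in enumerate(results):
        rating = sum(w * normalized[f][i] for w, f in zip(weights, FACTORS))
        rated.append((r["id"], rating))
    return sorted(rated, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # HYPOTHETICAL candidate documents with raw factor scores.
    docs = [
        {"id": "a", "goodness": 0.9, "author_rank": 2.0, "location_rank": 10},
        {"id": "b", "goodness": 0.4, "author_rank": 6.0, "location_rank": 50},
        {"id": "c", "goodness": 0.1, "author_rank": 1.0, "location_rank": 5},
    ]
    print(rate_ugc(docs))
```

In this made-up example, document "b" outranks "a" despite a lower goodness score because its author and location ranks are much higher, the kind of author-driven boost discussed in this post.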

The main thing to note, and obviously the item that caught my attention, is Author Rank. Note that it uses ranking factors that correspond to what I have been hypothesizing exists in the universal search engines. That is to say, search results are ranked not only by the content on the page but also by the authority of the author who wrote them, as determined by how many posts that author has made, how many sites he has made them on, how many groups he or she belongs to, and so on.

Can I say for certain that any algorithm like this has been implemented?  Absolutely not.  But my next task has to be to design an experiment to see if we can detect a whiff of it in the ether.  I'll keep you informed.
