Posts Tagged ‘Google’
Technical SEO: Site Loading Times and SEO Rankings Part 2
In my last post, I discussed the underlying issues regarding site loading times and SEO rankings. What I tried to do was help the reader understand why site loading times are important from the perspective of someone designing a search engine that has to crawl billions of pages. The post also outlined a few of the structures such a designer would have to put in place to crawl all the pages they need accurately and effectively, in a limited time and with limited processing power. I also tried to show that a search engine like Google has a political and economic agenda in ensuring fast sites, not just a technical one. Google wants as many people/eyeballs on the web as possible, so it is to its advantage to ensure that websites provide a good user experience. As a result, Google feels quite justified in penalizing sites that do not have good speed/performance characteristics.
As you would expect, the conclusion is that if your site is hugely slow, it will not get indexed and will not rank in the SERPs. What is "hugely slow"? Google has indicated that slow is a relative notion, determined by the loading times typical of sites in your geographical region. Having said that, relative or not, from an SEO perspective I wouldn't want a site whose pages take more than 10 seconds on average to load. In the sites we have tested and built, we have found that average full-page load times above roughly 10 seconds significantly hurt indexing. From a user experience (UE) perspective, there is some interesting data suggesting that the limit of visitors' patience is about 6-8 seconds. Google has studied this data, so it would probably prefer to set its threshold in that region. But I doubt it can. Many small sites are not that sophisticated: they do not know these kinds of rules, and they do not know how to check or evaluate their site loading times. Besides this, problems with hosts often cause servers to run slowly at times, and Google has to take that into account as well. So I believe the timeout has to be substantially higher than 6-8 seconds, but 10 seconds as a crawl limit is a guess.
I have yet to see a definitive statement from anyone about the absolute speed limit beyond which indexing ceases altogether (if you have a reference, please post it in the comments). I'm sure that if a bot comes to a first page and that page exceeds the bot's timeout threshold, your site won't get spidered at all. But once the bot gets past the first page, it has to keep an ongoing computation of the site's average page loading time to determine whether that average exceeds the built-in threshold, so at least a few pages would have to be crawled in that case.
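Since nobody outside Google knows the real rule, here is a minimal Python sketch of the bot behavior described above, purely as an illustration. The function `crawl_site` and its parameters are hypothetical: the 10-second `timeout` is just the guess from the previous paragraph, and `min_pages=3` stands in for "at least a few pages."

```python
def crawl_site(load_times, timeout=10.0, min_pages=3):
    """Return how many pages a hypothetical bot fetches before stopping.

    load_times: per-page load times in seconds, in crawl order.
    timeout:    assumed threshold in seconds (the ~10 s guess above).
    min_pages:  assumed number of pages sampled before the average is trusted.
    """
    if not load_times or load_times[0] > timeout:
        # Nothing to crawl, or the very first page timed out: no spidering.
        return 0
    total = 0.0
    for crawled, seconds in enumerate(load_times, start=1):
        total += seconds
        running_avg = total / crawled
        # Act on the running average only after a few pages have been sampled.
        if crawled >= min_pages and running_avg > timeout:
            return crawled  # pages fetched before the bot gave up
    return len(load_times)  # the whole site was crawled


print(crawl_site([12.0, 1.0, 1.0]))              # 0: first page too slow
print(crawl_site([2.0, 3.0, 4.0, 5.0]))          # 4: fast enough throughout
print(crawl_site([8.0, 12.0, 14.0, 20.0, 1.0]))  # 3: average passes 10 s on page 3
```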
Now here's where it gets interesting. What happens between fast (let's say loading times under 1-2 seconds; that is actually pretty slow, but Matt Cutts indicates in the video below that it is OK) and the timeout limit? And how important is site speed as a ranking signal? Let's answer one question at a time.
When a site is slow, but not slow enough to hit any built-in timeout limit (one not tied to the number of pages), a couple of things can happen. We do know that Google allocates bot time based on the number of pages on the site and the number of pages it has to index/re-index. So for a small site that performs poorly, it is likely that most of the pages will get indexed. Likely, but not guaranteed. It all depends on how much cumulative time lag the site's pages create relative to the average. If a site is large, you can almost guarantee that some pages will not be indexed, because the cumulative time lag will eventually hit the threshold the bots set for a site with that number of pages. Pages that are not indexed cannot be ranked, so some of your content will not count and you will lose the benefit of it in your rankings.
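A toy model of this crawl-budget idea makes the small-site/large-site difference concrete. Everything here is an assumption for illustration: the hypothetical `pages_indexed` function gives each site a fixed base allowance (`base_seconds`) plus a per-page allowance (`seconds_per_page`), and stops once the cumulative fetch time would overrun that budget. Google has not published any such formula.

```python
def pages_indexed(load_times, base_seconds=60.0, seconds_per_page=1.0):
    """Toy crawl budget: count pages fetched before the time budget runs out.

    load_times:       per-page load times in seconds, in crawl order.
    base_seconds:     hypothetical fixed allowance every site gets.
    seconds_per_page: hypothetical extra allowance per page on the site.
    """
    budget = base_seconds + len(load_times) * seconds_per_page
    elapsed = 0.0
    indexed = 0
    for seconds in load_times:
        if elapsed + seconds > budget:
            break  # the next fetch would overrun this site's budget
        elapsed += seconds
        indexed += 1
    return indexed


print(pages_indexed([3.0] * 20))    # 20: small, slow site
print(pages_indexed([3.0] * 1000))  # 353: large site, same speed
print(pages_indexed([1.0] * 1000))  # 1000: large, faster site
```

With these made-up numbers, a 20-page site averaging 3 seconds per page is fully indexed, while a 1,000-page site at the same speed gets only 353 pages in. The fixed base allowance is what gives small sites slack; under a purely proportional budget, every site at a given speed would lose the same fraction of its pages.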
As an aside, there has been a lot of confusion around the revisit-after meta tag, which looks like this:
<meta name="revisit-after" content="5 days">
This tag supposedly tells the bots how often to come back to the site to reindex this specific page (in this case 5 days). The idea is that you can improve the crawlability of your site by telling the bots not to index certain pages all the time, but only some of the time. I became aware of this tag at SMX East, when one of the "authorities" on SEO mentioned it as usable for this purpose. The trouble is that, from everything I have read, the tag is completely unsupported by any of the major engines, and was only supported by one tiny search engine (SearchBC) many years ago.
But let's say you are one of the lucky sites where the site runs slowly but all the pages do get indexed. Do Google or any of the other major search engines use the site's performance as a ranking signal? In other words, all my pages are in the index. So you would expect that they would be ranked based on the quality of their content and their authority derived from inbound links, site visits, time-on-site, and other typical ranking signals. Performance is not a likely candidate for a ranking signal and isn't important.
If you thought that, then you were wrong. Historically, Google has said, and Matt Cutts reiterates this in the video below, that site load times do not influence search rankings. But while that may be true now, it may not be in the near future. And this is where Maile's comments took me by surprise. In a small group session at SMX East 2009, Maile was asked about site performance and rankings. She indicated that for the "middle ground" sites that are indexing but loading slowly, site performance may already be used to influence rankings. Who is right, I can't say. These are both highly respected professionals who choose their words carefully.
Whatever the truth, Google is sending us signals that this change is coming. Senior experts like Matt and Maile don't say these things lightly; these are well-considered, and probably approved, positions that they are asked to take. This is Google's way of preventing us from getting mad when the change occurs: Google has the fallback of saying "we warned you this could happen." From today's viewpoint, that means it will happen.
Conclusion: Start working on your site performance now, as it will be important for SEO rankings later.
Oh and, by the way, your user experience will just happen to be better, which is clearly the real reason to fix site performance.
And it isn't only Google that may make this change. Engineers from Yahoo! recently filed a patent with the title "Web Document User Experience Characterization Methods and Systems" which bears on this topic. Let me quote paragraph 21:
With so many websites and web pages being available and with varying hardware and software configurations, it may be beneficial to identify which web documents may lead to a desired user experience and which may not lead to a desired user experience. By way of example but not limitation, in certain situations it may be beneficial to determine (e.g., classify, rank, characterize) which web documents may not meet performance or other user experience expectations if selected by the user. Such performance may, for example, be affected by server, network, client, file, and/or like processes and/or the software, firmware, and/or hardware resources associated therewith. Once web documents are identified in this manner the resulting user experience information may, for example, be considered when generating the search results.
It does not appear that Yahoo! has implemented any aspect of this patent yet, and who knows what the Bing agreement will mean for site performance and search. But clearly this is a "problem" the search engine muftis have set their eyes on, and I would expect that if Google does implement it, others will follow.
The Economics of Twitter for Advertisers, Part 2
Let’s continue our discussion of Twitter economics.
The average Twitterer has 549 followers. This figure is skewed by corporate accounts (such as the travel sites we are looking at) and news sites that have very large numbers of followers. I have gone through a number of accounts to determine a realistic average to use, and I am going to assume 200 followers. Our experience is that, in the first generation of followers, 10% pass along an offer (the theory behind this is also quite enlightening, but I will not cover it here). For subsequent generations the rate is much lower, usually in the 2-5% range. We mentioned previously that 15,000 is the average number of followers for the Big 3 sites (Expedia, Orbitz, Travelocity). The calculation therefore looks something like the following:
15,000 (followers)
+ (15,000 * .1 * 200) = 300,000 (first generation pass along)
+ (300,000 * .02 * 200) = 1,200,000 (second generation pass along)
= 1,515,000 (total number of individuals)
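The same arithmetic as a small Python function, so the assumptions are easy to vary. The name `total_reach` is made up for this sketch; the defaults are the figures above (200 followers per person, a 10% first-generation and a 2% second-generation pass-along rate).

```python
def total_reach(followers, avg_followers=200, pass_rates=(0.10, 0.02)):
    """Direct followers plus everyone reached by each pass-along generation.

    Each entry in pass_rates is the share of the current generation that
    passes the offer on; each person who does so exposes avg_followers
    new people, who become the next generation.
    """
    total = followers
    generation = followers
    for rate in pass_rates:
        generation = generation * rate * avg_followers
        total += generation
    return total


print(round(total_reach(15_000)))  # 1515000
```

Passing `pass_rates=(0.10, 0.05)` tries the top of the 2-5% range for the second generation, which raises the total to roughly 3.3 million individuals.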
The number of impressions is then this base of 1,515,000 multiplied by the number of offers “seen”. Expedia seems to be making offers every five minutes, as does Hotwire (they must have set up some kind of automated feed into their Twitter accounts). Travelocity and Orbitz seem to be making offers once a day (or even less). The big unknown is how many offers the average follower actually sees. Followers aren’t always online, or if online, they are doing other things and their attention is not focused on Twitter. Or they are on Twitter, but the offer doesn’t register through the noise of all the other tweets. Without any really good data, I will assume that each individual “sees” two offers/month, which I hope is a conservative number.
This means that the total number of impressions is 1,515,000 * 24 (two offers/month * 12 months) = 36,360,000 per year.
Given this number of impressions, what is the potential economic impact for Expedia, Orbitz, and Travelocity? Typical conversion rates on these sites run 3-5% according to various published data I have seen. But this is not a situation where someone has either typed in a keyword or clicked on an ad that appears when a keyword is typed in. This is much more of a grazing situation. Many offers are made, but only a few are relevant to any specific individual. So the response rates look more like email, and yet they are even smaller. Why? Because while the first generation is signed up to receive notifications (the equivalent of an email subscription, in this case), the second and third generations are not. Our first benchmark is therefore an email conversion rate from the initial mailing, calculated as follows (I am ignoring losses due to bad addresses, since that is not an issue for online accounts, although see below for the related issue of dormant accounts):
# of impressions * open rate * conversion rate
Average open rates for good email campaigns are 10-12%. Conversion rates vary, but let’s assume 2%, a number that comes from my experience with email campaigns. That would yield the equivalent of a .2% conversion rate for the first generation (10% open rate * 2% conversion rate). But for the second and third generations, the response would be substantially smaller, maybe .1% or even as low as .05%. Since the first generation is such a small number of individuals, I will use .1% as the conversion rate for the entire base of impressions.
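To check that using .1% for the whole base is reasonable, here is a short Python sketch that weights each group's conversion rate by its size. The `blended_conversion` helper is hypothetical; the group sizes come from the reach calculation above (15,000 followers plus 300,000 + 1,200,000 = 1,500,000 pass-along recipients), and the rates come from this paragraph.

```python
def blended_conversion(segments):
    """Size-weighted conversion rate across (individuals, rate) segments."""
    total = sum(count for count, _ in segments)
    return sum(count * rate for count, rate in segments) / total


first_gen_rate = 0.10 * 0.02  # 10% open rate x 2% conversion = 0.2%
segments = [
    (15_000, first_gen_rate),  # followers who signed up for the feed
    (1_500_000, 0.001),        # pass-along recipients at the assumed .1%
]
print(f"{blended_conversion(segments):.4%}")  # 0.1010%
```

Because the followers are only about 1% of the base, their higher rate barely moves the blend. Swapping the 0.001 for 0.0005 brings the blended rate down to roughly 0.05%, so .1% sits at the optimistic end of the range given above.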
The last pieces of data we need are the number of trips per individual per year, the number of tickets purchased per trip, and the average revenue to the travel agency from each ticket purchased. Again, I am going to use data that is fairly well known in the travel business. These are gross averages and do not take into account a number of variables, such as the type of travel (business vs. personal), destination (domestic vs. international), and type of flier (managed vs. unmanaged).
Number of trips per year: 2
Average number of tickets purchased/trip: 2.2
Avg revenue per ticket to agency: $25
So now let’s do the annual revenue calculation for the economic impact of Twitter for a large online travel agency:
36,360,000 (impressions) * .001 (conversion rate) * 2 (trips/year) * 2.2 (tickets/trip) * $25 (revenue/ticket) = $3,999,600
For the big travel agencies, which have around $1B in annual revenue, this is small (.4% of revenue), but it isn’t chump change either.
Before I close, one other issue needs to be explored: dormant accounts. The model presented assumes that every individual who is following, or who receives a retweet or direct message, is an “active” Twitter user. But as many of us know from our own experience, people set up a Twitter account and then never go back to it, or visit it only rarely. I call these dormant accounts. A lot has been written on this topic; just type “dormant twitter accounts” into Google. Nicholas Carlson recently wrote a post for BusinessInsider.com titled “60% Of Twitter Users Quit After A Month”. Carlson cites Oprah (@oprah) as an example of someone who has become “bored” with Twitter and reports that Nielsen Online estimates that 60% of Twitter users quit after a month. The post goes on to say that the 60% number may be misleading, as Nielsen only measures Twitter usage on Twitter.com and not from mobile use or apps like TweetDeck. Given that this data is pretty consistent with other social media sites, and that a lot of tweeting happens off twitter.com, I think we can safely assume that the dormancy rate for Twitter is 50%.
In this case, our approximately $4mm in annual revenue has now become $2mm in annual revenue.
Not huge, but I think we could say that the ROI on the costs associated with maintaining a corporate Twitter account for this purpose is probably pretty spectacular.
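Putting the whole chain together (reach, then impressions, then revenue, then the dormancy haircut) gives the short Python sketch below. The function `annual_revenue` and its parameter names are made up for this sketch; the defaults are the figures used above, and `dormancy_rate` simply scales the result the same way the 50% assumption does.

```python
def annual_revenue(reach, offers_seen_per_month=2, conversion_rate=0.001,
                   trips_per_year=2, tickets_per_trip=2.2,
                   revenue_per_ticket=25.0, dormancy_rate=0.0):
    """Annual agency revenue from Twitter offers, following the model above."""
    impressions = reach * offers_seen_per_month * 12  # offers seen per year
    conversions = impressions * conversion_rate
    revenue = conversions * trips_per_year * tickets_per_trip * revenue_per_ticket
    # Dormant accounts never see the offers, so scale revenue down accordingly.
    return revenue * (1 - dormancy_rate)


print(round(annual_revenue(1_515_000)))                     # 3999600
print(round(annual_revenue(1_515_000, dormancy_rate=0.5)))  # 1999800
```

Keeping each assumption as a named parameter makes it easy to see which inputs matter most; the offers-seen and conversion-rate guesses, for example, scale the result linearly.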
I do not doubt that this post will cause a lot of discussion/controversy (at least I hope it will), and I look forward to all feedback.
Matt Cutts, Nofollow, and the Consistently Inconsistent
I have avoided (like the plague) weighing in on the tempest Matt Cutts unleashed at SMX Advanced in June regarding Google’s change in how it treats the nofollow attribute for PageRank sculpting. I have avoided it for two reasons:
- In my mind, more has been made of it than its true impact on people’s rankings warrants.
- As far as I’m concerned, in general (and note those two words), the nofollow attribute is a last resort and a crutch for less-than-optimal internal cross-linking around thematic clusters. When internal cross-linking is done right, I don’t believe the nofollow attribute is that impactful.
Bruce Clay had a great show on Webmaster Radio on the subject of the nofollow controversy, and basically he was of the same opinion as me. There are also many more heavyweights who have weighed in than I care to name. So adding my comments to the mix isn’t all that helpful to my readers or to the SEO community generally.
But today, while searching for help on undoing 301 redirects, I found this section of a 2007 article on the SEOMoz blog that provides some historical context for these conversations, so I thought I’d share it here. My compliments to Rand Fishkin of SEOMoz for this content:
“2. Does Google recommend the use of nofollow internally as a positive method for controlling the flow of internal link love?
A) Yes – webmasters can feel free to use nofollow internally to help tell Googlebot which pages they want to receive link juice from other pages
(Matt’s precise words were: The nofollow attribute is just a mechanism that gives webmasters the ability to modify PageRank flow at link-level granularity. Plenty of other mechanisms would also work (e.g. a link through a page that is robot.txt’ed out), but nofollow on individual links is simpler for some folks to use. There’s no stigma to using nofollow, even on your own internal links; for Google, nofollow’ed links are dropped out of our link graph; we don’t even use such links for discovery. By the way, the nofollow meta tag does that same thing, but at a page level.)
B) Sometimes – we don’t generally encourage this behavior, but if you’re linking to user-generated content pages on your site whose content you may not trust, nofollow is a way to tell us that.
C) No – nofollow is intended to say “I don’t editorially vouch for the source of this link.” If you’re placing un-trustworthy content on your site, that can hurt you whether you use nofollow to link to those pages or not.”
Just some interesting background as you consider the current debate.
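The mechanism Matt describes in answer A can be illustrated with a minimal sketch of the simplified textbook PageRank formula (damping factor 0.85, computed by power iteration). It is not Google's production algorithm, and it models only the 2007 behavior quoted above: nofollow'ed links are removed from the graph before ranks are computed, so a page's PageRank is split only among its followed links. The `pagerank` function and the four-page example site are hypothetical.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Simplified textbook PageRank on {page: [(target, is_nofollow), ...]}.

    Per the 2007 description, nofollow'ed links are dropped from the
    link graph before any rank is computed.
    """
    pages = set(links)
    for outlinks in links.values():
        pages.update(target for target, _ in outlinks)
    # Keep followed links only; nofollow'ed edges leave the graph entirely.
    graph = {p: [t for t, nofollow in links.get(p, []) if not nofollow]
             for p in pages}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, targets in graph.items():
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling page: spread its rank evenly across all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


site = {
    "home": [("products", False), ("about", False), ("login", True)],
    "products": [("home", False)],
    "about": [("home", False)],
    "login": [("home", False)],
}
sculpted = pagerank(site)
all_followed = pagerank({p: [(t, False) for t, _ in out] for p, out in site.items()})
print(round(sculpted["products"], 2), round(all_followed["products"], 2))  # 0.24 0.17
```

In this toy site, nofollowing the login link means the home page splits its PageRank two ways instead of three, so products (and about) end up with roughly 0.24 instead of roughly 0.17. That redistribution is what PageRank sculpting relied on.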
Google's Orion and Vincent
But of course, I don’t want to ignore the previous Vincent update – as that was the connection to post #1.
Orion first. Actually, Google did not announce “Orion,” which is a search technology it purchased in 2006 along with its college-student developer, Ori Allon. But my guess is that, thanks to Greg Sterling’s new article with that title, the term “Orion Release” will stick. Here’s how Danny Sullivan described the technology back in April 2006:
It sounds like Allon mainly developed an algorithm useful in pulling out better summaries of web pages. In other words, if you did a search, you’d be likely to get back extracted sections of pages most relevant to your query.
Ori himself wrote the following in his press release:
Orion finds pages where the content is about a topic strongly related to the key word. It then returns a section of the page, and lists other topics related to the key word so the user can pick the most relevant.
Google actually announced two changes:
Longer Snippets. When users enter queries of more than three words, the Google results will now contain more lines of text in order to provide more information and context. As a reminder, a snippet is the few lines of text that appear below the dark blue title of a search result. Google’s research must have shown that regular-length snippets were not giving searchers enough information to decide which result best matched their longer search terms, since Google’s stated intent is to provide enhanced information that improves the searcher’s ability to judge the relevance of items listed in the SERPs.
Having said this, I don’t see any difference. My slav…. I mean my 12-year-old son (who has been doing keyword analysis since he was 10, so he is no slouch at this) ran ten tests on Google to see if we could find a difference (I won’t detail all the one- and two- vs. three-plus-word combinations we tried; if you want the list, leave a comment or send a tweet to arthurofsun and I will forward it to you). Shown below are the results for France Travel vs. France Travel Guides for Northern France:

As you can see, there is absolutely no difference in snippet length for the two searches - and this was universally true across all the searches we ran. So I’m not sure – I wonder if Ori Allon, who wrote the post, could help us out on this one.
Also, I am somewhat confused. If you type in more keywords, the search engine has more information with which to determine the relevance of a result. So why would I need more information? Where I need more information is a search of fewer than three keywords, which will return a broad set of results that I will need to filter based on the information contained in a longer snippet.
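The longer-snippet rule can be sketched in a few lines of Python, purely as an illustration. The sketch scores each sentence of a page by how many query words it shares and returns more sentences when the query has more than three words. Using sentences as a stand-in for "lines," the scoring method, and the two- versus three-sentence limits are all assumptions; this is not how Orion or Google builds snippets, and `make_snippet` is a hypothetical name.

```python
import re


def _words(text):
    """Lowercased alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def make_snippet(page_text, query, short_limit=2, long_limit=3):
    """Build a snippet from the sentences that share the most query terms.

    Queries of more than three words get long_limit sentences instead of
    short_limit; both limits are assumptions standing in for "lines".
    """
    query_words = _words(query)
    terms = set(query_words)
    limit = long_limit if len(query_words) > 3 else short_limit
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", page_text.strip()) if s]
    scores = [len(terms & set(_words(s))) for s in sentences]
    # Highest-scoring sentences first; sorted() is stable, so ties keep page order.
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    # Drop sentences with no query terms, then restore page order.
    chosen = sorted(i for i in ranked[:limit] if scores[i] > 0)
    return " ".join(sentences[i] for i in chosen)


page = ("Paris is the capital of France. Northern France has long beaches. "
        "Travel guides cover Normandy. The weather for hiking is mild.")
print(make_snippet(page, "France Travel"))
print(make_snippet(page, "France Travel Guides for Northern France"))
```

The two-word query gets a two-sentence snippet and the six-word query gets three sentences, which is the behavior Google announced, whether or not it shows up in practice.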
Enhanced Search Associations. The bigger enhancement, and the one that seems most likely to derive from the original Orion technology, is enhanced associations between keywords. Basically, if you type in a keyword (Ori uses the example “principles of physics”), the new algorithms understand that there are other related ideas I may be interested in, like “Big Bang” or “Special Relativity.” Google has implemented this by putting a set of related keywords at the bottom of the first SERP, which you may click on. When you click one, it returns a new set of search results based on that keyword. Why at the bottom of the first SERP? My hypothesis is that if searchers have gone to the bottom of the page, they haven’t found what they are looking for. So this is the right place in the user experience to prompt them with related keywords that may be more relevant to the content they are seeking.
From my perspective, this feels like the “People who liked this item also bought…” widget on most comparison shopping sites (which I know something about, having been the head of marketing for SHOP.COM). I’m not saying there is anything wrong with this; I’m just trying to make an analogy to the type of user experience Google is trying to create.
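Both the related-keyword list and the "also bought" widget can be approximated with a generic technique, co-occurrence counting: topics that frequently appear alongside the query topic are suggested as related. The sketch below applies it to a made-up set of documents tagged with topic phrases. It illustrates the general idea only, not the Orion algorithm or anything Google has described, and `related_topics` is a hypothetical name.

```python
from collections import Counter


def related_topics(query_topic, documents, top_n=2):
    """Topics that most often co-occur with query_topic across documents."""
    counts = Counter()
    for topics in documents:
        if query_topic in topics:
            # dict.fromkeys de-duplicates while keeping order, so ties are stable.
            counts.update(t for t in dict.fromkeys(topics) if t != query_topic)
    return [topic for topic, _ in counts.most_common(top_n)]


docs = [
    ["principles of physics", "special relativity", "big bang"],
    ["principles of physics", "special relativity", "quantum mechanics"],
    ["principles of physics", "special relativity"],
    ["big bang", "cosmology"],
    ["principles of physics", "big bang"],
]
print(related_topics("principles of physics", docs))  # ['special relativity', 'big bang']
```

Raw counts like these favor topics that are simply common everywhere; a production system would presumably normalize for that, which is one place the broad-versus-specific trade-offs discussed below could come in.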
Shown below is an example of enhanced search associations from a search on the broad term “credit derivatives in the USA”:
As I expected, the term “credit default swaps” (the major form of credit derivative) shows up as an associated keyword. What I was surprised not to see in the list was any reference to the International Swaps and Derivatives Association (ISDA), the organization that has developed the standards and rules by which most derivatives are created. It does, however, show up for the search on the keyword “credit default swap.” I’d be curious to understand exactly how the algorithm has been tuned to make trade-offs between broad concepts (i.e., credit derivatives, which is a category) and very focused concepts (i.e., credit default swap, which is a specific product). Maybe I can get Ori to opine on that as well, but most likely that comes under the category of secret sauce.
Anyway, it’s fascinating, and it certainly shows that Google continues to advance the state of information retrieval (IR).
Well, I’ll just have to leave the Vincent release until tomorrow. Something else happened this morning I need to do a quick entry about. Sigh…..

