Posts Tagged ‘site load times’
Well, it’s Monday. A good Monday with some interesting insights.
I will continue with tool reviews going forward, but I’m finding that I need to document our work on our website performance as we go along or else we lose the data from the intermediate steps, and there have already been several that have been implemented. So let me bring you up to speed.
After my last post about the site and reviewing the data from the Google Site Performance tab in Google Webmaster Tools, I was able to visualize (see the image) what was going on. As the image shows, performance jumped around substantially from mid-September, when I started the blog, until early-to-mid December. These jumps did not coincide in any major way with the debugging and latency improvements that I had been working on – except in December, around the time of my last post. That change seemed to have cut my latency in half – which was what Pingdom had shown. So perhaps I was moving in the right direction.
Things continued to improve steadily through January – even though I had not changed any further settings. This again suggested that being hosted on a shared server – whose performance my ISP might improve or degrade at any time – was the reason for the unpredictable performance changes, good or bad. But then in mid-January, I started to see a jump in latency times again.
At the same time, I wanted to continue debugging AboutOnlineMatters site latency and implement some of the changes recommended by YSlow, such as gzip compression, entity tags, and expires headers. To do that, I needed direct access to my Apache server. Given these two facts, I decided that it was time to remove the server as a factor and host the blog myself.
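For readers who have that kind of access, the Apache directives involved look roughly like the sketch below. This is a hedged illustration, not the exact configuration I used: it assumes mod_deflate and mod_expires are compiled into the server, and the content types and cache lifetimes are placeholders.

```shell
# Sketch of an .htaccess enabling gzip compression and expires headers,
# and disabling entity tags (assumes mod_deflate and mod_expires exist;
# all values below are illustrative placeholders)
cat > .htaccess <<'EOF'
<IfModule mod_deflate.c>
  AddOutputFilterByType DEFLATE text/html text/css application/javascript
</IfModule>
<IfModule mod_expires.c>
  ExpiresActive On
  ExpiresByType image/png "access plus 1 month"
  ExpiresByType text/css  "access plus 1 week"
</IfModule>
# With far-future expires headers in place, ETags add little and can
# cause cache misses across load-balanced servers, so many guides
# suggest turning them off
FileETag None
EOF
```

On a shared host you typically cannot change these at the server level, which is exactly why direct Apache access mattered here.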
On February 6, we moved the site onto our own hosted setup. This is basically a dedicated server (we do have a few other small sites running, but they use insignificant server resources) and I have direct access to all the configuration settings. From that time forward, as the chart shows, site latency has decreased steadily until it is now close to its historical lows.
I’ll leave it there for now – following my rule of short posts. We’ll pick up the next steps I took tomorrow.
It is back to the blogstone. And once again, I have broken my own rule about writing long posts infrequently. This one is a continuation of my previous posts on improving web site performance. What especially motivated me to go back to this topic was a request I received from Justified on my site performance posts:
My fellow classmates use your blogs as our reference materials. We look out for more interesting articles from your end about the same topic. Even the future updates about this topic would be of great help.
What a nice compliment. I wouldn’t be a very good marketer if I didn’t respect the wishes of my ‘customers.’ So, I continue the series on web site performance issues and my saga to improve the performance of this blog. Having said that, a number of things have happened since that last post.
First, as noted in a previous post, I had the opportunity to go to SMX West earlier this month. While there, I attended a session titled “Diagnosing Technical SEO Issues”, with Adam Audette, Patrick Bennett, Gabe Gayhart, and Brian Ussery as the panelists. One thing I learned is that the term “site performance” has a general usage different from what I am covering here. Site performance is usually defined as including:
- How easy a site is to crawl.
- Infrastructure issues, including URL structures, template coding, directory structures, and file naming conventions.
- Latency issues such as html redirects, http headers, image compression, and all the other items I have been covering in this series.
The point is, this series of posts covers only one element of site performance: web site latency and response times as seen by Google and other search engines. In the future, I will use this more precise term in these posts. I have to decide – for purposes of rankings – whether to change the names of my posts, the URLs, and all the core meta data to reflect this change, or whether to stay with web site performance as the keyword I want to optimize for. That decision will probably be made based on the keyword search volumes as shown in the Google AdWords Keyword Tool. (Actually, I have now changed the keyword I am optimizing for to web site latency, as I am testing some theories I have on page optimization in the SERPs that have nothing to do with site performance. So it just goes to show…)
Second, as also noted in the third post in this series on web site latency, Google has announced and deployed a new web site performance tool within Google Webmaster Tools, as well as a Firefox/Firebug plugin. So in order to continue to explore the topic of AboutOnlineMatters site latency, I need to cover that tool. But then we get into the whole issue of the core set of site performance tools to use for evaluating site latency issues. We already discussed and showed our results from Pingdom’s latency analysis tool, but there are many more, some of them providing similar analysis and, as I was bemused to discover, often producing differing results for the same items.
So what I’ve decided to do is to provide some discussion of web site latency and performance tools and toolbars before we get back to analyzing AboutOnlineMatters, and then I can show how I used the tools to debug my site latency issues.
Here are the tools I plan to cover, and just so you know, I may cover some or all of them in Flash/video, which would be a first for this blog. Although I’m not a big video fan (I can take in more info more quickly by reading), I know many people prefer that format, so I want to try and accommodate them along with my current readers.
|Charles||A desktop application that provides an HTTP proxy / HTTP monitor / reverse proxy, enabling a developer to view all of the HTTP and SSL/HTTPS traffic between their machine and the Internet. This includes requests, responses, and the HTTP headers (which contain the cookies and caching information). A great tool for understanding what calls/requests are being made and how they impact web site latency.|
|curl [url]||curl is a downloadable command line tool for transferring data with URL syntax.|
|dynamic drive||Image Optimizer is a web-based service that lets you easily optimize your GIFs, animated GIFs, JPGs, and PNGs so they load as fast as possible on your site. It provides images in a range of file sizes (for the same image dimensions) by decreasing the DPI of the image. It also easily converts from one image type to another. Upload size limit is 300 kB.|
|Firebug||Firebug is a Firefox plugin that provides a number of tools for developers and technical SEO work, including web site latency and performance analysis. I will cover many of the plugins later, if I get the chance. In the meantime, take a look at this article at Web Resources Depot to find a good list of useful Firebug plugins.|
|Google Page Speed||Page Speed is an open-source Firefox/Firebug add-on that performs several tests on a site’s web server configuration and front-end code. It provides a comprehensive report and score on issues that can affect web site latency, as well as recommendations for improving site latency. This is how Google sees your web site latency, and it is the first tool you should run to understand whether you have web site performance problems from Google’s perspective, which over time will have a larger impact on your rankings.|
|HttpWatch||HttpWatch is a desktop (downloadable) HTTP viewer and debugger that integrates with IE and Firefox to provide seamless HTTP and HTTPS monitoring without leaving the browser window. It is similar in functionality to Charles.|
|Live HTTP headers||A Firefox toolbar plugin that allows you to view the http headers of a page while browsing. Analysis of headers is important for understanding whether certain key functions/libraries that affect web site latency and performance, like gzip, are active on the web server serving up pages.|
|Lynx||A downloadable text browser that allows you to view your site as the search crawlers do. Also a way of ensuring that people with text-only browsers can use the site – however this is a pretty minimal use nowadays.|
|NetExport||NetExport is a Firebug 1.5 extension that allows exporting all collected and computed data from the Firebug Net panel. The structure of the created file uses the HTTP Archive 1.1 (HAR) format (based on JSON).|
|ShowSlow||ShowSlow is an open source tool that helps monitor various web site latency and performance metrics over time. It captures the results of YSlow and Google Page Speed rankings and graphs them, to help you understand how various changes to your site affect its performance. This is a great tool for seeing how the two tools’ results compare, and also for understanding which items they are analyzing. ShowSlow can be run from within your Firefox/Firebug toolbar or be installed on your server. Be forewarned: to run it in your toolbar you will need to make some settings changes on the about:config page, and your results will show publicly on www.showslow.com.|
|Site-perf.com||Site-Perf.com is another performance analysis tool that visually displays web page load times. It is similar to Pingdom’s Full Page Test Tool, although it provides a little bit more detail and better explanations of what the load times mean. It also has a network performance test tool that is handy in understanding what portion of your web site latency and performance issues are coming from your host rather than from the site – and let me tell you that can be a lifesaver as you watch your performance go from great to lousy to great again. The page test tool provides an accurate, realistic, and helpful estimation of your site’s loading speed. The script fully emulates natural browser behavior downloading your page with all the images, CSS, JS and other files, just like a regular user.|
|Smush.it||Smush.it runs as a web service or as a Firebug plugin that comes with YSlow V2. It uses optimization techniques specific to each image format to remove unnecessary bytes from image files. It is a “lossless” tool, which means it optimizes the images without changing their look or visual quality. After Smush.it runs on a web page, it reports how many bytes would be saved by optimizing the page’s images and provides a downloadable zip file with the minimized image files.|
|Wave Toolbar||The WAVE Toolbar provides button options and a menu that will modify the current web page to reveal the underlying page structure information so you can visualize where web site latency issues may be occurring. It also has a built in text-browser comparable to Lynx.|
|Web Page Test||webpagetest.org is a hosted service that provides a detailed test and review of web site latency and performance issues. It is probably the most complete single tool I have found for getting an overview of what is happening with your website. I like this better than YSlow or ShowSlow, but I would still use Google Page Speed as that is how Googlebot sees web site performance.|
|wget||wget is a free utility for the non-interactive download of files from the web. It runs in the background (so you can be doing other things) and supports http, https, and ftp protocols, as well as retrieval through http proxies. You can use it, for example, to create a local version of a remote website, fully recreating that site’s directory structure.|
|Xenu Link Sleuth||Xenu Link Sleuth spiders web sites looking for broken links. Link verification is done on ‘normal’ links, images, frames, backgrounds, and local image maps. It displays a continuously updated list of URLs which you can sort by different criteria.|
|YSlow||YSlow, developed by Yahoo!, is a Firefox/Firebug plugin. It is a general-purpose web site latency and performance optimizer. It analyzes a variety of factors impacting web site latency, provides reports, and makes suggestions for fixes. This has been the most commonly used tool for analyzing web site performance until now.|
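One note after this list: for quick spot checks, plain curl covers a surprising amount of the same ground as the monitoring tools above. The sketch below uses a file:// URL purely as a stand-in so the command runs anywhere; point it at your own page to see the real DNS, connect, and time-to-first-byte components of latency.

```shell
# Print curl's timing breakdown for a request; the file:// target is a
# stand-in, so substitute a real URL such as your home page
curl -s -o /dev/null \
  -w 'dns=%{time_namelookup}s connect=%{time_connect}s ttfb=%{time_starttransfer}s total=%{time_total}s\n' \
  file:///dev/null
# To check whether gzip is active on a real server, inspect the headers:
#   curl -sI -H 'Accept-Encoding: gzip' <url> | grep -i '^content-encoding'
```

Tools like Charles, HttpWatch, and Live HTTP headers show the same header and timing data interactively, but the command-line form is handy for scripting repeated measurements.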
In my last post, I discussed the underlying issues regarding site loading times and SEO rankings. What I tried to do was help the reader understand why site loading times are important from the perspective of someone designing a search engine that has to crawl billions of pages. The post also outlines a few of the structures that they would have to put in place to accurately and effectively crawl all the pages they need in a limited time with limited processing power. I also tried to show that a search engine like Google has a political and economic agenda in ensuring fast sites, not just a technical agenda. Google wants as many people/eyeballs on the web as possible, so it is to their advantage to ensure that web sites provide a good user experience. As a result, they feel quite justified in penalizing sites that do not have good speed/performance characteristics.
As you would expect, the conclusion is that if your site is hugely slow you will not get indexed and will not rank in the SERPs. What is “hugely slow”? Google has indicated that slow is a relative notion, determined based on the loading times typical of sites in your geographical region. Having said that, relative or not, from an SEO perspective I wouldn’t want a site where pages take more than 10 seconds on average to load. We have found from the sites we have tested and built that average load times higher than approximately 10 seconds to completely load a page have a significant impact on getting indexed. From a UE perspective, there is some interesting data that the limit on visitors’ patience is about 6-8 seconds. Google has studied this data, so it would probably prefer to set its threshold in that region. But I doubt it can. Many small sites are not that sophisticated, do not know these kinds of rules, and do not know how to check or evaluate their site loading times. Besides this, there are often problems with hosts that cause servers to run slowly at times. Google has to take that into account as well. So I believe the timeout has to be substantially higher than 6-8 seconds – but 10 seconds as a crawl limit is a guess.
I have yet to see a definitive statement by anyone as to what the absolute limit is for site speed before indexing ceases altogether (if you have a reference, please post it in the comments). I’m sure that if a bot comes to a first page and it exceeds the bot’s timeout threshold in the algorithm, your site won’t get spidered at all. But once the bot gets by the first page, it has to do an on-going computation of average page loading times for the site to determine if the average exceeds the built-in threshold, so at least a few pages would have to be crawled in that case.
Now here’s where it gets interesting. What happens between fast (let’s say < 1-2 second loading times – actually fairly slow, but a number Matt Cutts indicates is OK in the video below) and the timeout limit? And how important is site speed as a ranking signal? Let’s answer one question at a time.
When a site is slow but not slow enough to hit any built-in timeout limits (not tied to the number of pages), a couple of things can happen. We do know that Google allocates bot time by the number of pages on the site and the number of pages it has to index/re-index. So for a small site that performs poorly, it is likely that most of the pages will get indexed. Likely, but not a guarantee. It all depends on the cumulative time lag versus the average that a site creates. If a site is large, then you can almost guarantee that some pages will not be indexed, as the cumulative time lag will ultimately hit the threshold set by the bots for a site of that number of pages. By definition, some of your content will not get ranked and you will not get the benefit of that content in your rankings.
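To make the cumulative-lag idea concrete, here is a toy sketch. The budget and per-page times are invented numbers for illustration only; nothing about Google’s real crawl allocation is public. The point is simply that with a fixed time budget, a slow site runs out of budget before its later pages are ever reached.

```shell
# Toy crawl-budget model: crawl pages in order, stop when the time
# budget is exhausted (all numbers are invented for illustration)
crawl() {
  budget=$1; shift
  crawled=0
  for t in "$@"; do
    budget=$((budget - t))
    [ "$budget" -lt 0 ] && break
    crawled=$((crawled + 1))
  done
  echo "$crawled"
}
# Same 10-unit budget: a 1s/page site gets all 10 pages in,
# while a 3s/page site gets only the first 3
echo "fast site pages crawled: $(crawl 10 1 1 1 1 1 1 1 1 1 1)"
echo "slow site pages crawled: $(crawl 10 3 3 3 3 3 3 3 3 3 3)"
```

Scale the page counts up by a few orders of magnitude and the same arithmetic explains why large, slow sites are almost guaranteed to have unindexed content.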
As an aside, there has been a lot of confusion around the <meta name="revisit-after"> tag. The revisit-after meta tag takes this form: <meta name="revisit-after" content="5 days">.
This tag supposedly tells the bots how often to come back to the site to reindex this specific page (in this case 5 days). The idea is that you can improve the crawlability of your site by telling the bots not to index certain pages all the time, but only some of the time. I became aware of this tag at SMX East, when one of the “authorities” on SEO mentioned it as usable for this purpose. The trouble is that, from everything I have read, the tag is completely unsupported by any of the major engines, and was only supported by one tiny search engine (SearchBC) many years ago.
But let’s say you are one of the lucky sites where the site runs slowly but all the pages do get indexed. Do Google or any of the other major search engines use the site’s performance as a ranking signal? With all your pages in the index, you would expect them to be ranked based on the quality of their content and their authority derived from inbound links, site visits, time-on-site, and other typical ranking signals – performance, you would think, is not a likely candidate for a ranking signal and isn’t important.
If you thought that, then you were wrong. Historically, Google has said, and Matt Cutts reiterates this in the video below, that site load times do not influence search rankings. But while that may be true now, it may not be in the near future. And this is where Maile’s comments took me by surprise. In a small group session at SMX East 2009, Maile was asked about site performance and rankings. She indicated that for the “middle ground” sites that are indexing but loading slowly, site performance may already be used to influence rankings. Who is right, I can’t say. These are both highly respected professionals who choose their words carefully.
Whatever is true, Google is sending us signals that this change is coming. Senior experts like Matt and Maile don’t say these things lightly; their statements are well considered and probably approved positions that they are asked to take. This is Google’s way of preventing us from getting mad when the change occurs – Google has the fallback of saying “we warned you this could happen.” Which, from today’s viewpoint, means it will happen.
Conclusion: Start working on your site performance now, as it will be important for SEO rankings later.
Oh and, by the way, your user experience will just happen to be better, which is clearly the real reason to fix site performance.
And it isn’t only Google that may make this change. Engineers from Yahoo! recently filed a patent with the title “Web Document User Experience Characterization Methods and Systems” which bears on this topic. Let me quote paragraph 21:
With so many websites and web pages being available and with varying hardware and software configurations, it may be beneficial to identify which web documents may lead to a desired user experience and which may not lead to a desired user experience. By way of example but not limitation, in certain situations it may be beneficial to determine (e.g., classify, rank, characterize) which web documents may not meet performance or other user experience expectations if selected by the user. Such performance may, for example, be affected by server, network, client, file, and/or like processes and/or the software, firmware, and/or hardware resources associated therewith. Once web documents are identified in this manner the resulting user experience information may, for example, be considered when generating the search results.
It does not appear Yahoo! has implemented any aspect of this patent yet, and who knows what the Bing agreement will mean for site performance and search. But clearly this is a “problem” that the search engine muftis have set their eyes on, and I would expect that if Google does implement it, others will follow.
It is one of those nights. Those pesky technicolor dreams woke me up at 2:30 and wouldn’t let me go back to sleep. But under the heading “turning lemons into lemonade,” at least I have some extra time to write my blog even as I am piled high with the end of month deadlines.
Today’s topic is part of my Technical SEO series (I just named it that – now I have to go back and change all my titles and meta tags…sigh) – site load times and whether or not they affect how you rank in the SERPs. It is another one of those topics that came out of SMX East. In this case it was Maile Ohye, Senior Support Engineer at Google, who spoke to this issue. Maile is a wonderfully knowledgeable evangelist for Google. I have seen her speak at many shows. Her presentations are always clear and contain good, actionable techniques for improving your rankings in Google’s SERPs. I am not alone in thinking her knowledgeable. Stephan Spencer, one of the guys I most look up to in SEO, thought enough of Maile to interview her in August of 2007, and she was also recently interviewed by SEOmoz, another leading light in the industry (and if you haven’t used their pro tools, then you are one arrow short of a full quiver for your SEO work).
So when Maile says “stuff,” I listen. In her talk at SMX East, she noted that poor site load times (we are talking something between good and absolutely horrible) could harm your rankings in Google search results. Let me define the problem, then try to explain what Maile was referring to, and finally give my take on all this.
Basic Concepts of Site Loading Times for Getting Indexed
On the one hand, that site loading times affect search rankings isn’t news. But let’s take some time to lay a bit of foundation, because the how of site speed affecting search rankings didn’t really hit me until Maile’s talk. It’s one of those things that is obvious once you think about it, but it doesn’t really come top of mind when you are focused on specific tasks in an SEO project. It’s a “given” in the background of your work. Unless the site is so horribly slow that it is obviously impacting the user experience, you really don’t think about load times when you are focusing on keywords and meta tags. The site works; move on.
But that’s not really true from the perspective of the search bots. Google and the other engines have to crawl billions of pages on the web on a regular basis, bring that information back, and then index it. Some pages can be crawled infrequently, but as more of the web moves to more real-time information due to social media, the bots have to crawl more sites in real time in order to provide good results. But there are only so many bots and so much time to crawl these billions of pages. So if you are Google, you write your bots with algorithms that allocate this scarce resource most efficiently and, hopefully, fairly.
How would you or I do this? Well, if I were writing a bot, the first thing I would give it is a time limit based on the size of the site. That’s only fair. If you have the ability to create more content, bravo. I want to encourage that, because it is beneficial to the community of searchers. So all other factors being equal (e.g. site loading time), I want to allocate time to ensure all your pages get into the index. There is also the issue of search precision and relevance: I want all that content indexed so I can present the best results to searchers.
Of course, I can’t just set a time limit based on the number of pages. What if one site has long pages and another one short, pithy pages (clearly not mine!)? What if one site has lots of images or other embedded content while another does not? My algorithm has to be pretty sophisticated to determine these factors on the fly and adapt its baseline timeout settings to new information about a site as it crawls it.
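As a thought experiment, such a per-site budget might be parameterized something like the sketch below. Every weight here is invented; this is only to show how page count and average page weight could feed into a single time allowance, the way the paragraph above describes.

```shell
# Hypothetical crawl-time budget: base time per page, scaled up for
# heavier pages (the formula and weights are invented for illustration)
site_budget() {
  pages=$1; avg_kb=$2
  # one time unit per page, plus one extra unit per 100KB of average page weight
  echo $((pages * (1 + avg_kb / 100)))
}
echo "small, light site budget: $(site_budget 50 50)"
echo "large, heavy site budget: $(site_budget 5000 300)"
```

A real crawler would also adapt these numbers on the fly as it discovers how long and how heavy a site’s pages actually are, which is the sophistication the paragraph above points at.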
The next algorithm I would include would have to do with the frequency at which you update your data. The more often you update, the more often I need to have my bot come back and crawl the changed pages on your site.
Another set of algorithms would have to do with spam. From the perspective of my limited resource and search precision, I don’t want to include pages in my index that are clearly designed only for the search engines, that are link spammers, or that may only contain PPC ads and have no relevant information for the searcher.
You get the picture. I only have a limited window of time to capture continually changing data from the web in order for the data in my index to be reasonably fresh. Therefore I’ve got to move mountains (of data) in a very short period of time with only so many processing cycles to apply. And the variables I have to control for in my algorithms are numerous and, in many cases, not black and white.
This is where site load times come in. If a site is large but slow, should it be allocated as much time as it needs to be indexed? Do I have enough processing cycles to put up with the fact that it takes three times as long as a similar site to be crawled? Is it fair, given a scarce resource, to allocate time to a slow site if it means I can’t index five other better-performing sites in my current window of opportunity? Does it optimize search precision and the relevance of results I can show to searchers? And last but not least, as one of the guardians of the Web, is poor site performance something I want to encourage from the perspective of user experience and making the Web useful for as many people as possible? Let’s face it: if the web is really slow, people won’t use it, and there will be fewer eyeballs available to view an ad from which I stand to make money.
Hello? Are you there? Can you say “zero tolerance?” And from the perspective of the universal search engines, there is also my favorite radio station – “WIFM,” What’s In It For Me? Answer: nothing good. That is why Google has made page load times a factor in AdWords Quality Score, as an example.
So, in the extreme case (let’s say a page takes 30 seconds to load), the bots won’t crawl most, if any, of the site. The engines can’t afford the time and don’t want to encourage a poor user experience. So you are ignored – which means you never get into the indexes.
When Is a Page’s or Site’s Loading Time Considered Slow?
What is an “extreme case?” I have looked that up and the answer is not a fixed number. Instead, for Google, the concept of “slow loading” is relative.
The threshold for a ‘slow-loading’ landing page is the regional average plus three seconds.
The regional average is based on the location of the server hosting your website. If your website is hosted on a server in India, for example, your landing page’s load time will be compared to the average load time in that region of India. This is true even if your website is intended for an audience in the United States.
Two things to note about how we determined the threshold:
- We currently calculate load time as the time it takes to download the HTML content of your landing page. HTML load time is typically 10% to 30% of a page’s total load time. A three-second difference from the regional average, therefore, likely indicates a much larger disparity.
- We measure load time from a very fast internet connection, so most users will experience a slower load time than we do.
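Working those two notes through with made-up numbers shows how forgiving the threshold really is. Suppose the regional average HTML load time is 2 seconds (an illustrative figure, not a published one): the “slow” threshold is then 5 seconds of HTML load, and if HTML is only about 20% of total page load, that implies roughly 25 seconds of total load time before a landing page is flagged as slow.

```shell
# Back-of-envelope on Google's stated threshold: HTML threshold equals
# the regional average plus 3s; total load is estimated assuming HTML
# is ~20% of the total (the 2s regional average is a made-up figure)
regional_avg=2
threshold=$((regional_avg + 3))
est_total=$((threshold * 100 / 20))
echo "HTML threshold: ${threshold}s; implied total page load: ~${est_total}s"
```

In other words, by the time you trip this threshold, real users on ordinary connections have long since given up.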
Moreover, Google has a sliding scale with which it grades a site. The following quote applies to AdWords and landing pages, but my guess is similar algorithms and grading are used in determining how often and how long a site is crawled:
A keyword’s load time grade is based on the average load time of the landing pages in the ad group and of any landing pages in the rest of the account with the same domain. If multiple ad groups have landing pages with the same domain, therefore, the keywords in all these ad groups will have identical load time grades.
Two things to note:
- When determining load time grade, the AdWords system follows destination URLs at both the ad and keyword level and evaluates the final landing page.
- If your ad group contains landing pages with different domains, the keywords’ load time grades will be based on the domain with the slowest load time. All the keywords in an ad group will always have the same load time grade.
We’ll stop here for today. Next time, we’ll talk about what happens in the nether regions between fast and clearly slow.