Posts Tagged ‘Technical SEO’
Well, it’s Monday. A good Monday with some interesting insights.
I will continue with tool reviews going forward, but I’m finding that I need to document our work on our website’s performance as we go along, or else we lose the data from the intermediate steps – and several of those steps have already been implemented. So let me bring you up to speed.
After my last post about the site, and after reviewing the data from the Site Performance tab in Google Webmaster Tools, I was able to visualize (see the image) what was going on. As the image shows, performance jumped around substantially from mid-September, when I started the blog, until early-to-mid December. These jumps did not coincide in any major way with the debugging and latency improvements that I had been working on – except in December, around the time of my last post. That change seemed to have cut my latency in half, which was what Pingdom had shown. So perhaps I was moving in the right direction.
Things continued to improve steadily through January, even though I had not changed any further settings. This again suggested that being hosted on a shared server, whose performance my ISP might tune up or down at any time, was the reason for the unpredictable performance changes, good or bad. But then in mid-January, I started to see a jump in latency times again.
At the same time, I wanted to continue debugging AboutOnlineMatters site latency and implement some of the changes recommended by YSlow, such as gzip compression, entity tags, and expires headers. To do that, I needed direct access to my Apache server. Given these two facts, I decided that it was time to remove the server as a factor and host the blog myself.
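For readers wondering what those YSlow changes actually look like, they map to a handful of Apache directives. Here is a minimal sketch of such a configuration (it assumes mod_deflate, mod_expires, and mod_headers are enabled; the MIME types and cache lifetimes are illustrative choices, not the values from this site):

```apache
# Compress text-based responses before sending (requires mod_deflate)
AddOutputFilterByType DEFLATE text/html text/css application/javascript

# Set far-future Expires headers so browsers cache static assets (requires mod_expires)
ExpiresActive On
ExpiresByType image/png  "access plus 1 month"
ExpiresByType image/jpeg "access plus 1 month"
ExpiresByType text/css   "access plus 1 week"

# Disable entity tags, which can defeat caching behind multiple servers (requires mod_headers)
FileETag None
Header unset ETag
```

These directives can go in the main server config or, on most shared hosts that allow overrides, in an .htaccess file.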
On February 6, we moved the site onto our own hosted setup. This is basically a dedicated server (we do have a few other small sites running, but they use insignificant server resources), and I have direct access to all the configuration settings. From that time forward, as the chart shows, site latency has decreased continually until it is now close to its historical lows.
I’ll leave it there for now – following my rule of short posts. We’ll pick up the next steps I took tomorrow.
Well, it’s after Thanksgiving and I finally get back to the blog. Feels good. This is the next installment about site performance analysis and how to deal with a site with worrisomely slow page loading times. It turns out I had a case study right under my nose: this site, the OnlineMatters blog. Recently, I showed a client my site and watched as it loaded, and loaded, and… loaded. I was embarrassed but also frustrated. I had just finished my pieces on site performance and knew that this behavior was going to cause my rankings in the SERPs to drop, even before Google releases Caffeine. While I am not trying to publish this blog to a mass audience – to do that I would need to write every day – I still want to rank well on keywords I care about. Given what I do, it’s an important proof point for customers and prospects.
So I am going to take advantage of this negative and turn it into a positive for you. You will see how to perform various elements of site analysis by watching me debug this blog in near real time. Yesterday, I spent three hours working through the issues, and I am not done yet. So this first piece will take us about halfway there. But even now you can learn a lot from my struggles.
The first step was to find out just how bad the problem was. The way to do this is to use Pingdom’s Full Page Analysis tool. This tool not only tests page loading speeds but also visualizes which parts of the page are causing problems. An explanation of how to use the tool can be found here, and you should read it before trying to interpret the results for your site. Here is what I got back when I ran the test:
A load time of 11.9 seconds? Ouch! Since Pingdom runs this test from its own servers, the measurement is not influenced by my sometimes unpredictably slow Comcast connection.
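Pingdom measures from its own servers, but a rough first-order check of download time is easy to script yourself. Here is a minimal Python sketch; note that it only times the base HTML document, not the images, scripts, or rendering that a full-page test includes, so it will read far faster than Pingdom’s number:

```python
import time
import urllib.request

def time_download(url):
    """Return (seconds elapsed, bytes received) for fetching one URL.

    This measures only the raw document download, not the images,
    scripts, or render time that a full-page test like Pingdom reports.
    """
    start = time.perf_counter()
    with urllib.request.urlopen(url) as resp:
        body = resp.read()
    return time.perf_counter() - start, len(body)

# Example (hypothetical URL):
# elapsed, size = time_download("http://www.example.com/")
# print(f"{size} bytes in {elapsed:.2f}s")
```

Running this a few times across the day also gives a crude picture of how variable a shared host’s response times are.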
Pingdom showed that over 93 items loaded with the home page, the vast majority of them images (a partial listing is shown below). For several items (lines 1, 36, 39, 40, 41, 54), a significant part of the load time occurred during rendering, that is, after the element had been downloaded into the browser; this is indicated by the blue part of the bar. But in the majority of cases, the load time was mainly the time from the first request to the server until the content began downloading (the yellow portion of the bar), or from the start of downloading until rendering began (the green portion). This suggested that
- I had too big a page, because the download time for all the content to the browser was very long.
- I might have a server bandwidth problem.
But rather than worrying about item 2, which would require a more extensive fix – either an upgrade in service or a new host – I decided to see how far I could get with some simple site fixes.
The first obvious thing to fix was the size of the home page, which was 485 KB – very heavy. I tend to write long (no kidding?) and add several images to posts, so it seemed only natural to reduce the number of entries on my home page below the 10 I had it set for. I set the allowable number in WordPress to 5 entries, saved the changes, and ran the test again.
Miracle of miracles: My page now weighed 133 KB (respectable), had 72 total objects, and downloaded in six seconds. That was a reduction in load time by almost 50% for one simple switch.
Good, but not great. My page loading time was still 6 seconds – it needed to be below 2. So more work was needed.
Since I had developed the template myself using Artisteer when I was relatively new to WordPress, its images had probably never been optimized, so I hypothesized that an image compression tool might make a substantial improvement for little effort.
Fortunately, the YSlow Firefox plugin, a site performance analyzer we will examine in my next entry, includes Smush.it, an image compression tool created by Yahoo!. It is easy to use, identifies and shows just how much bandwidth it saves, produces all the compressed files at the push of a single button, and delivers excellent output quality.
So I ran the tool (I sadly did not keep a screenshot of the first run, but a sample output is below), and Smush.it reduced image sizes overall by about 8% and significantly compressed the template elements. So I downloaded the smushed images and uploaded them to the site.
As you can see below, my home page was now 89.8 KB, but my load time had increased to 8.8 seconds! Note, on the right of the image, that several prior runs confirmed the earlier 6-second load time. So either compression did not help, or some other factor was at play.
In fact, the actual rendering times had dropped from measurable amounts (e.g., 0.5 seconds) to milliseconds, so the smaller file sizes had improved rendering performance. Download times, however, had increased – once again pointing to my host. But before going there, I wanted to see if there were any other elements on the site I could manipulate to improve performance.
More in the next post. BTW, as I go to press this morning, my site speed was 5.1 seconds – a new, positive record. Nothing has changed – so more and more I’m suspecting my ISP and feeling I need a virtual private server.
NOTE: Even more important: as I go to press, Google has just announced that it is adding a site performance tool to Google Webmaster Tools, in anticipation of site performance becoming a ranking factor.
In my last post, I discussed the underlying issues regarding site loading times and SEO rankings. What I tried to do was help the reader understand why site loading times are important from the perspective of someone designing a search engine that has to crawl billions of pages. The post also outlines a few of the structures that they would have to put in place to accurately and effectively crawl all the pages they need in a limited time with limited processing power. I also tried to show that a search engine like Google has a political and economic agenda in ensuring fast sites, not just a technical agenda. Google wants as many people/eyeballs on the web as possible, so it is to their advantage to ensure that web sites provide a good user experience. As a result, they feel quite justified in penalizing sites that do not have good speed/performance characteristics.
As you would expect, the conclusion is that if your site is hugely slow, you will not get indexed and will not rank in the SERPs. What is “hugely slow”? Google has indicated that slow is a relative notion, determined by the loading times typical of sites in your geographical region. Having said that, relative or not, from an SEO perspective I wouldn’t want a site where pages take more than 10 seconds on average to load. We have found from the sites we have tested and built that average load times higher than approximately 10 seconds per page have a significant impact on being indexed. From a user-experience perspective, there is some interesting data suggesting the limit on visitors’ patience is about 6-8 seconds. Google has studied this data, so it would probably prefer to set its threshold in that region. But I doubt it can. Many small sites are not that sophisticated, do not know these kinds of rules, and do not know how to check or evaluate their site loading times. Beyond this, there are often problems with hosts that cause servers to run slowly at times. Google has to take that into account as well. So I believe that the timeout has to be substantially higher than 6-8 seconds – but 10 seconds as a crawl limit is a guess.
I have yet to see a definitive statement by anyone as to what the absolute limit is for site speed before indexing ceases altogether (if you have a reference, please post it in the comments). I’m sure that if a bot comes to a first page that exceeds the timeout threshold in its algorithm, your site won’t get spidered at all. But once the bot gets past the first page, it has to do an ongoing computation of average page loading times for the site to determine whether the average exceeds the built-in threshold, so at least a few pages would have to be crawled in that case.
Now here’s where it gets interesting. What happens between fast (let’s say loading times under 1-2 seconds – actually pretty slow, but a number Matt Cutts, in the video below, indicates is OK) and the timeout limit? And how important is site speed as a ranking signal? Let’s answer one question at a time.
When a site is slow but not slow enough to hit any built-in timeout limits (not tied to the number of pages), a couple of things can happen. We do know that Google allocates bot time by the number of pages on the site and the number of pages it has to index/re-index. So for a small site that performs poorly, it is likely that most of the pages will get indexed. Likely, but not a guarantee. It all depends on the cumulative time lag versus the average that a site creates. If a site is large, then you can almost guarantee that some pages will not be indexed, as the cumulative time lag will ultimately hit the threshold set by the bots for a site of that number of pages. By definition, some of your content will not get ranked and you will not get the benefit of that content in your rankings.
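The crawl-budget behavior speculated about above can be sketched as a toy model. Everything here is an assumption for illustration – the 10-second average threshold is the guess from earlier in the post, and the stopping rule is hypothetical, not anything Google has published:

```python
def crawl_with_budget(page_load_times, avg_threshold=10.0):
    """Toy model of a bot that tracks a running average of page load
    times and stops crawling once that average exceeds a threshold.

    The 10-second default is the guess from the text, not a
    documented Google value.
    """
    crawled = []
    total = 0.0
    for load_time in page_load_times:
        total += load_time
        crawled.append(load_time)
        if total / len(crawled) > avg_threshold:
            break  # remaining pages go unindexed
    return crawled

# A fast site gets fully crawled; a slow one is cut off partway:
# crawl_with_budget([1, 2, 1, 3])    -> all four pages crawled
# crawl_with_budget([9, 12, 11, 2]) -> stops after the second page
```

On this model, the larger the site, the more chances the running average has to tip over the threshold, which matches the intuition that big, slow sites are the ones most likely to end up with unindexed pages.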
As an aside, there has been a lot of confusion around the <meta name="revisit-after"> tag. The revisit-after meta tag takes this form: <meta name="revisit-after" content="5 days">.
This tag supposedly tells the bots how often to come back to the site to reindex this specific page (in this case 5 days). The idea is that you can improve the crawlability of your site by telling the bots not to index certain pages all the time, but only some of the time. I became aware of this tag at SMX East, when one of the “authorities” on SEO mentioned it as usable for this purpose. The trouble is that, from everything I have read, the tag is completely unsupported by any of the major engines, and was only supported by one tiny search engine (SearchBC) many years ago.
But let’s say you are one of the lucky sites where the site runs slowly but all the pages do get indexed. Do Google or any of the other major search engines use the site’s performance as a ranking signal? In other words, all my pages are in the index, so you would expect them to be ranked based on the quality of their content and the authority derived from inbound links, site visits, time-on-site, and other typical ranking signals. On that reasoning, performance is not a likely candidate for a ranking signal and isn’t important.
If you thought that, then you were wrong. Historically, Google has said – and Matt Cutts reiterates this in the video below – that site load times do not influence search rankings. But while that may be true now, it may not be in the near future. And this is where Maile Ohye’s comments took me by surprise. In a small group session at SMX East 2009, Maile was asked about site performance and rankings. She indicated that for the “middle ground” sites that are indexing but loading slowly, site performance may already be used to influence rankings. Who is right, I can’t say. These are both highly respected professionals who choose their words carefully.
Whatever is true, Google is sending us signals that this change is coming. Senior experts like Matt and Maile don’t say these things lightly. Their statements are well considered and probably approved positions that they are asked to take. This is Google’s way of preventing us from getting mad when the change occurs: Google has the fallback of saying “we warned you this could happen.” Which, from today’s viewpoint, means it will happen.
Conclusion: Start working on your site performance now, as it will be important for SEO rankings later.
Oh and, by the way, your user experience will just happen to be better, which is clearly the real reason to fix site performance.
And it isn’t only Google that may make this change. Engineers from Yahoo! recently filed a patent with the title “Web Document User Experience Characterization Methods and Systems” which bears on this topic. Let me quote paragraph 21:
With so many websites and web pages being available and with varying hardware and software configurations, it may be beneficial to identify which web documents may lead to a desired user experience and which may not lead to a desired user experience. By way of example but not limitation, in certain situations it may be beneficial to determine (e.g., classify, rank, characterize) which web documents may not meet performance or other user experience expectations if selected by the user. Such performance may, for example, be affected by server, network, client, file, and/or like processes and/or the software, firmware, and/or hardware resources associated therewith. Once web documents are identified in this manner the resulting user experience information may, for example, be considered when generating the search results.
It does not appear Yahoo! has implemented any aspect of this patent yet, and who knows what the Bing agreement will mean for site performance and search. But clearly this is a “problem” that the search engine muftis have set their eyes on, and I would expect that if Google does implement it, others will follow.