Archive for the ‘Yahoo’ Category
Back from SMX Advanced London, where I got a chance to speak on “SEO, Search, and Reputation Management,” and SMX Advanced 2010 in Seattle, where I got to relax and just take in the knowledge.
So here for all who could not attend, is a summary of three of the sessions I attended on the first day of SMX Advanced 2010. I only get so much time to blog…working guy you know. I’ll do my best to post the rest, but no promises.
SEO for Google versus Bing
Janet Miller, Searchmojo
- From heatmap studies, it appears people “see” Bing and Google SERPs in pretty much the same way. The “hotspots” are pretty similar.
- Not surprising: average pages/visit and time on site are higher for Bing than Google – but that has always been true from my perspective
- Bing does not currently accept video or news sitemaps.
- On Google you can edit sitelinks in Webmaster Tools; in Bing you cannot.
- Geolocation results show pretty much the same in both sets of results.
- One major difference: Google shopping is free for ecommerce sites to submit; Bing only has a paid option for now.
- Bing lets you share results (social sharing) on Facebook, Twitter, and email; Google does not. But the sharing links point back to the images on Bing, not to the original images on your site. You also have to grant access to Bing on Facebook.
- Bing allows “document preview” when you roll over the entry. It will also play videos in preview mode – but only those on YouTube. If you look at the behavior, information from the page shows up. To optimize the presentation of that information, Bing takes information in this order:
- H1 tag first – if title tag and h1 tag don’t match, it takes the H1 tag
- First paragraphs of information
- To add contact info, add that information to that page. Bing is really good about recognizing contact information that is on a page.
- To disable “document preview” enter the following
- Add this meta tag to the page: <meta name="msnbot" content="nopreview">
- Or send this HTTP response header with the page (it is a response header, not a robots.txt line): X-Robots-Tag: nopreview
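X-Robots-Tag directives are normally delivered as HTTP response headers, so they are set by whatever serves the page. As a minimal sketch, assuming a Python WSGI application (the `page_app` name and the page body are hypothetical, purely for illustration), the header could be attached like this:

```python
def page_app(environ, start_response):
    """Hypothetical WSGI app that serves one page with the nopreview header."""
    body = (
        b"<html><head><title>Example</title></head>"
        b"<body><h1>Example</h1></body></html>"
    )
    headers = [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
        # The directive from the notes above, sent as a response header.
        ("X-Robots-Tag", "nopreview"),
    ]
    start_response("200 OK", headers)
    return [body]
```

To try it locally, the app can be passed to `wsgiref.simple_server.make_server` from the Python standard library. On a real site the same header would more likely be set in the web server configuration.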
Rand Fishkin: Ranking Factor Correlations: Google versus Bing
As usual, Rand brought his array of statistical knowledge to bear to compare how Bing and Google react to different ranking signals. Here are the takeaways:
Overall Summary of Correlations with Ranking, in Order of Importance
- Number of linking root domains
- An exact match of .com domain name with desired keyword
- Linking domains with an exact match in the TLD name
- Any exact match of the domain name with the desired keyword
- Number of inbound links
- An exact match of .com domain name with desired keyword
- Linking domains with an exact match in the TLD name
- Number of linking root domains
- Any exact match of the domain name with the desired keyword
- Number of inbound links
Domain Names as Ranking Factors
- Exact match domains remain powerful ranking signals in both engines (anchor text could be a factor, too).
- Hyphenated versions of domain names are less powerful, though when they do appear, they appear more frequently (more times on a page) in Bing (G: 271 vs. B: 890).
- Just having keywords in the domain name has substantial positive correlation with high rankings.
- If you really want to rank on a keyword, make sure you get exactmatchname.com as your domain.
- Other exact match domains may still help, but don’t have as high correlation.
- Keywords in subdomains are not nearly as powerful as in root domain name (no surprise).
- Bing may be rewarding subdomain keywords less than before (though G: 673 vs. B: 1394).
- On alternate TLD extensions:
- Bing appears to give substantially more weight to these than Google.
- Matt Cutts’ claim that Google does not differentiate between .gov, .info and .edu appears accurate.
- The .org TLD has a surprisingly high correlation with high rankings, but you can attribute this to elements of their authority – more links, more non-commercial links, less spam.
- Don’t forget that the exact match .com is still probably a very good thing (at least own it).
- Shorter URLs are likely a good best practice (especially on Bing).
- Long domains may not be ideal, but aren’t awful.
On-Page Keyword Usage
- Google rankings seem to be much more highly correlated with on-page keyword usage than for Bing.
- The alt attribute of images shows significant correlation as an on-page ranking factor. (I always thought so and it’s one of the elements most SEO newbies miss.)
- Putting keywords in URLs is likely a best practice.
- Everyone optimizes titles (G: 11,115 vs. B: 11,143). Differentiating here is hard.
- (Simplistic) on-page optimization isn’t a huge factor.
- Raw content length (length of page and number of times the keyword is mentioned on the page) seems to have only a marginal correlation with rankings.
Link Counts and Link Diversity
- Links are likely still a major part of the algorithms, with Bing having a slightly higher correlation.
- Bing may be slightly more naïve in their usage of link data than Google, but better than before.
- Diversity of link sources remains more important than raw link quantity.
- Many anchor text links from the same domain likely don’t add much value.
- Anchor text links from diverse domains, however, appear highly correlated.
- Bing seems more Google-like than in the past in handling exact match anchor links (this is a surprise!).
- Bing’s stereotype holds true: homepages are more favored in top results vs. Google.
Twitter, Real-Time Search, and Real-Time SEO
Steve Langville – Mint.com
Steve had a lot of interesting points, and I thought his approach to real-time was one of the most sophisticated I had heard.
- One element of his strategy is what I like to call “Merchandising Real-Time Search.” Basically someone at Mint has a merchandising calendar of important dates/topics in consumers’ financial lives (e.g. tax time) and also watches for hot topics that could impact a consumer’s sense of money (e.g. new credit card legislation). Mint then has a team that can create new content on that topic that is likely to generate word-of-mouth. At that point, they push the content out and then energize their communities on Facebook, Twitter, etc. by promoting the content to them. This generates buzz and visits back to mint.com.
- Mint has also created Mint Answers, its own Yahoo Answers-like site where people ask and answer questions on financial topics. The result is a lot of user generated content on Mint.com on critical keywords that yields high ranking in the SERPs.
- Mint also developed a Twitter aggregator widget around personal finance and put this as a section on their site. Twitter’s community managers then retweeted these folks who then signed up for @mint and began retweeting @mint tweets. According to Steve, the amplification effect was huge.
As always, Danny had some really interesting insights to add about real-time search. I will honestly say that many times I still think Danny, like many search marketers, thinks “transactionally” about search, as compared to consumer marketers who think about having an on-going “conversation” with a customer. (More on that notion later.) But in this case, Danny really showed why he is known as an industry visionary:
- Search marketing means being visible wherever someone has overtly expressed a need or desire. It is more than web; more than keywords. An example is mobile apps – search by another name – so I guess he agrees with Steve Jobs on that one.
- This was uniquely insightful. Whereas normal search is a many-to-many platform where anonymous individuals post content whose authority grows based on “good” links that are added over time, real-time search is a one-to-one platform where clearly identified people post questions or comments and get responses. Authority comes from the level of active engagement, not links. I had never heard real-time described this way, and it is a succinct but very sophisticated definition of real-time search.
- You can use conversations to identify folks interested in what you need. Not a new concept, but good to repeat. So if you have a service that sells vacuum cleaners, search for “anyone know vacuum cleaners” and the folks who have an interest are now identified and you can respond to them.
- Get a gift by giving a gift. That’s the fundamental currency of social media. Danny answered 42 questions from people who didn’t know him, didn’t follow him. He got no complaints and 10 thank yous.
- Recency versus Relevancy. Anyone doing real-time gets this – that authority can come from having high-quality information or having reasonably high quality information in a very short time frame – in other words, sometimes the recency of news makes it more worthy of attention than something older but more thought out. Danny believes that as Twitter matures (and maybe the entire real-time search business – that wasn’t clear), relevancy is going to get a higher relative weighting, so that relevant results will get more hang time in the SERPs.
I have trouble summarizing all of Chris’s talk – and it was a very good talk – because so much of what he talked about was covered in my notes from other speakers. So here are the unique points from his chat:
- You have to decide how you resource Twitter and other sites. Questions to ask for your strategy
- Consumers First: What are consumers saying about your site/company already? How might they use your Twitter content? Develop representative Personas of consumers who would engage with you on Twitter.
- Time/Investment: How much time do you have to devote to Twittering? Do you devote someone to spend time daily reading/responding to Tweets?
- Goals: What are some advantageous things you could accomplish by interacting with consumers in real-time?
- Strategy will decide whether you hire a full-time person, part-time person, or use automation.
- Use OAuth for API integration as it shows the application the visitor used as an appended data point
- Convert your Google News feeds to RSS to make them easier to subscribe to by members of your community
- A great tool for small business social media management is www.closely.com which auto-creates a social action page for every offer a company makes on Twitter and Facebook
- Be brief but really clear about the main point of your Tweets. Include a call to action, as Tweets with one are retweeted at a much higher rate.
- Tweets with “please” were retweeted ~5.5% of the time versus 0.5% for random Tweets.
- 98% of usernames on Twitter are 12 characters or less. So make your tweets no longer than 125 characters to allow for RT addition with username.
- Top 10 common words in retweets are (in order of most mentioned): you, Twitter, please, retweet, post, blog, social, free, media, help.
- Use custom features in URL shorteners to include your desired keyword on which to rank in the shortened URL.
- Resources to check out:
- http://collective-thoughts.com/2009/02/20/social-bites-like-sound-bites-butdifferent/ (Retweet tips – perfect “social bite”)
- The Science of ReTweets:
John Shehata – Advanced Internet
I loved John’s presentation because it confirmed many of the same conclusions I had reached about real-time search and reported on at SMX Advanced in London. Key points:
- The ranking factors for real-time search are very different. They include:
- User (author) authority (My comment: not just one site but across every site on which the author publishes).
- How fresh that author’s content continues to be.
- Number of followers.
- The quality of follows and how they act on the author’s content (is it retweeted often? Is it stumbled? Does someone flow it into their RSS feed? How often? How quickly?).
- URL real-time resolution.
- It is not about how many followers you have but how reputable (authoritative) your followers are. (This is what I call Authorank and like PageRank it is passed from authoritative follower to those they follow.)
- You earn reputation, and then you give reputation. If lots of people follow you, and then you follow someone–then even though this [new person] does not have lots of followers, his tweet is deemed valuable because his followers are themselves followed widely.
- Other possible ranking factors:
- Recent Activity: Google pays more attention to accounts with more activity?
- User name: keywords in your user name might also help.
- Age: since age plays a big role in Google search engine ranking, it’s possible that more established Twitter accounts will outrank the newer ones.
- External links: links to your @account from (reputable) non-social media sites should boost reputation as far as Google is concerned.
- Tweet Quantity: the more you tweet, the better chance you’ve got to be seen in Google real-time search results.
- Ratios of followed vs follow: a close ratio between the two can raise a red flag.
- Lists: it might also matter in how many lists you appear.
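The idea above that reputation is earned from followers and then passed on to the accounts you follow can be illustrated with a PageRank-style iteration over a follow graph. This is only a sketch under stated assumptions: the `author_rank` function, the damping value, and the sample accounts are all hypothetical, and nothing here is known to match what any search engine actually computes.

```python
def author_rank(follows, damping=0.85, iterations=50):
    """Toy PageRank-style score over a follow graph (illustrative only).

    `follows` maps each account to the set of accounts it follows.
    Reputation flows from a follower to the accounts it follows.
    """
    accounts = set(follows)
    for followed in follows.values():
        accounts.update(followed)
    n = len(accounts)
    rank = {account: 1.0 / n for account in accounts}

    for _ in range(iterations):
        new_rank = {account: (1.0 - damping) / n for account in accounts}
        for follower in accounts:
            followed = follows.get(follower, ())
            if followed:
                # Split this account's reputation among those it follows.
                share = damping * rank[follower] / len(followed)
                for target in followed:
                    new_rank[target] += share
            else:
                # Accounts that follow no one spread their score evenly.
                share = damping * rank[follower] / n
                for account in accounts:
                    new_rank[account] += share
        rank = new_rank
    return rank


# Hypothetical graph: "star" has three followers and follows "newcomer";
# "other" also has exactly one follower, but an unknown one ("lurker").
follows = {
    "fan1": {"star"},
    "fan2": {"star"},
    "fan3": {"star"},
    "star": {"newcomer"},
    "lurker": {"other"},
}
ranks = author_rank(follows)
```

In this toy graph "newcomer" and "other" each have a single follower, yet "newcomer" ends up with the higher score because its one follower is itself widely followed, which is the behavior the notes describe.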
Tactics to follow:
- Encourage retweets by tweeting content of 120 characters or less so you can save room for the RT @ Username that is added when someone passes along your message to their followers.
- Tools to identify hot trends: Google Hot Trends, Google Insights, Google News, Bing xRank, Surchur, Crowdeye, Oneriot.
- Same advice as Steve Langville – plan for seasonal keyword trends.
- Don’t update multiple accounts, reTweet instead.
- Connect your social profiles.
- Attract reputable, topically-related followers.
- Write keyword-rich tweets whenever possible, without sounding spammy:
- Do not create content with multiple buzzing terms.
- Do not abuse shortening services for spam links.
- Do not go overboard using Twitter #hashtags – Search Engines will eliminate your tweet from search if you use too many because it “looks bad.”
- Spammy looking tweet streams will be eliminated from search.
- Don’t use same IP address for different twitter accounts.
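The 120-character guideline in the tactics above is simple arithmetic on the tweet length limit. A minimal sketch, assuming the 140-character limit in force at the time and a retweet prefix of the form `RT @username: ` (both are assumptions, as are the function names, and clients format retweets differently):

```python
TWEET_LIMIT = 140  # assumed per-tweet character limit


def retweet_budget(username, limit=TWEET_LIMIT):
    """Characters left for the message after an 'RT @username: ' prefix."""
    prefix = "RT @" + username + ": "
    return limit - len(prefix)


def fits_retweet(text, username, limit=TWEET_LIMIT):
    """True if `text` can be retweeted with the prefix without truncation."""
    return len(text) <= retweet_budget(username, limit)


# A 12-character username uses 18 characters of prefix under this format.
budget = retweet_budget("a" * 12)
```

Under these assumptions a 12-character username leaves 122 characters, so a tweet of 120 characters or less survives a retweet intact.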
Show Me The Links
This was a great session with a HUGE number of ideas for getting new links. And each person talked about a very different philosophy towards link building and their tactics reflected those philosophies. Let’s see if I can capture them:
- Philosophy centers on using easily created and highly valued visual or viral content:
- Creating Infographics – they work very well. An example – a “where does the money go from the 2008 stimulus bill” infographic generated 29,000 links.
- Writing guest blog posts whose content is highly viral for others. Embed a link to your site as the source. You give the gift of traffic to them, you get links as a gift in return.
- More traditional link building
- 50% is content development and promotion. The big example he used on this was the Google April Fools’ Day prank about Google opening an SEO shop. It got picked up as a “real” story by Newswire 27 days after the post, went viral, and generated 800 backlinks.
- 20% is blog post and article placement.
- 10% is basic link development.
- 20% is targeted link requests to those few critical high-value sites. There are NO magic bullets here – it takes creativity and just good old-fashioned hard work and persistence. But the rewards can be substantial.
- Use badges with your URL embedded that benefit the person who puts them on their site (e.g. “a gold star” validation).
- Write testimonials for other folks.
- Write on sites that want good content and can deliver an audience.
- Answer questions on answer sites where you have the expertise.
- Make it easy to link to you by providing the information to potential linkers.
Focused on B2B link building tactics:
- Backlink trolling from competitors – but also look for sites that your competitors aren’t on – you want your own authoritative link network.
- Don’t ignore the .us TLD. There are lots of good possible link sites with decent authority there.
- Look at associations that provide ways to link to their members. Search for member lists, restrict your search to .org and add in relevant keyword phrases to filter for your related groups.
- Look at dead sites with broken links – see who is linking to them. Once you have identified a dead internet page do a linkdomain: search on Yahoo to identify sites still linking to the dead site.
- Free links from resources, directories, or “where to buy” sites.
- Bloggers: cultivate alliances and relationships with other sites and blogs, particularly bloggers who like to do interviews.
- You have all this content that you generate as a normal part of your business. Use it.
- Use dapper.net to create RSS feeds of your blog content
- Joost de Valk has a WordPress plugin at http://yoast.com/wordpress/rss-footer/ which lets you add an extra line of content to articles in your feed, defaulting to “Post from” and then a link(s) back to your blog, with your blog’s name as its anchor text.
- Use RSS feeds from news sources to identify media leads to speak with as part of your PR work.
- Content syndication: podcasts, white papers, living stories, news streams and user generated content (e.g. guest blogging) are still hot. Infographics, short articles, individual blogs, and Wikipedia are not.
- Widget Bait: basic widgets that you can build on widgetbox are getting somewhat passé but still have some value. You need to do more advanced versions – information aggregation widgets seem to work very well right now. Make people come to you to download them.
- Microsites: the old link wheels are worthless at this point – the engines have figured those out and treat them similarly to link spam sites. Those with good content – e.g. blogs or sites with good content – work. One option is to buy an established site and then rebrand it.
In my last post, I discussed the underlying issues regarding site loading times and SEO rankings. What I tried to do was help the reader understand why site loading times are important from the perspective of someone designing a search engine that has to crawl billions of pages. The post also outlines a few of the structures that they would have to put in place to accurately and effectively crawl all the pages they need in a limited time with limited processing power. I also tried to show that a search engine like Google has a political and economic agenda in ensuring fast sites, not just a technical agenda. Google wants as many people/eyeballs on the web as possible, so it is to their advantage to ensure that web sites provide a good user experience. As a result, they feel quite justified in penalizing sites that do not have good speed/performance characteristics.
As you would expect, the conclusion is that if your site is hugely slow you will not get indexed and will not rank in the SERPs. What is “hugely slow”? Google has indicated that slow is a relative notion and is determined based on the loading times typical of sites in your geographical region. Having said that, relative or not, from an SEO perspective I wouldn’t want to have a site where pages are taking more than 10 seconds on average to load. On the sites we have tested and built, we have found that pages averaging more than approximately 10 seconds to load completely see a significant impact on being indexed. From a UE perspective, there is some interesting data that the limit on visitors’ patience is about 6-8 seconds. Google has studied this data, so it would probably prefer to set its threshold in that region. But I doubt it can. Many small sites are not that sophisticated, do not know these kinds of rules, and do not know how to check or evaluate their site loading times. Besides this, there are often problems with hosts that cause servers to run slowly at times. Google has to take that into account, as well. So I believe that the timeout has to be substantially higher than 6-8 seconds, but 10 seconds as a crawl limit is a guess.
I have yet to see a definitive statement by anyone as to what the absolute limit is for site speed before indexing ceases altogether (if you have a reference, please post it in the comments). I’m sure that if a bot comes to a first page and it exceeds the bot’s timeout threshold in the algorithm, your site won’t get spidered at all. But once the bot gets by the first page, it has to do an on-going computation of average page loading times for the site to determine if the average exceeds the built-in threshold, so at least a few pages would have to be crawled in that case.
Now here’s where it gets interesting. What happens between fast (let’s say < 1-2 second loading times, although this is actually pretty slow but a number Matt Cutts in the video below indicates is ok) and the timeout limit? And how important is site speed as a ranking signal? Let’s answer one question at a time.
When a site is slow but not slow enough to hit any built-in timeout limits (not tied to the number of pages), a couple of things can happen. We do know that Google allocates bot time by the number of pages on the site and the number of pages it has to index/re-index. So for a small site that performs poorly, it is likely that most of the pages will get indexed. Likely, but not a guarantee. It all depends on the cumulative time lag versus the average that a site creates. If a site is large, then you can almost guarantee that some pages will not be indexed, as the cumulative time lag will ultimately hit the threshold set by the bots for a site of that number of pages. By definition, some of your content will not get ranked and you will not get the benefit of that content in your rankings.
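To make the cumulative-time argument concrete, here is a toy simulation. It is a sketch only: the `pages_crawled` function, the fixed time budget, and the 10-second first-page timeout are hypothetical stand-ins for the guesses above, not a description of how any real bot allocates its time.

```python
def pages_crawled(load_times, time_budget, first_page_timeout=10.0):
    """Count the pages a hypothetical bot fetches before its time runs out.

    load_times: seconds each page takes to load, in crawl order.
    time_budget: total seconds the bot is assumed to allot to the site.
    first_page_timeout: if the first page is slower than this, the bot
        is assumed to abandon the site without crawling anything.
    """
    if not load_times or load_times[0] > first_page_timeout:
        return 0
    elapsed = 0.0
    count = 0
    for seconds in load_times:
        if elapsed + seconds > time_budget:
            break  # cumulative lag has exhausted the budget
        elapsed += seconds
        count += 1
    return count


# Two hypothetical 20-page sites given the same 30-second budget.
fast_site = pages_crawled([1.0] * 20, time_budget=30.0)
slow_site = pages_crawled([4.0] * 20, time_budget=30.0)
```

With these made-up numbers the fast site has all 20 pages fetched, while the slow site gets only 7 before the budget runs out, which is the effect described above: the slower the pages, the more content is left out of the index.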
As an aside, by the way, there has been a lot of confusion around the <meta name="revisit-after"> tag. The revisit-after meta tag takes this form: <meta name="revisit-after" content="5 days">.
This tag supposedly tells the bots how often to come back to the site to reindex this specific page (in this case 5 days). The idea is that you can improve the crawlability of your site by telling the bots not to index certain pages all the time, but only some of the time. I became aware of this tag at SMX East, when one of the “authorities” on SEO mentioned it as usable for this purpose. The trouble is that, from everything I have read, the tag is completely unsupported by any of the major engines, and was only supported by one tiny search engine (SearchBC) many years ago.
But let’s say you are one of the lucky sites where the site runs slowly but all the pages do get indexed. Do Google or any of the other major search engines use the site’s performance as a ranking signal? In other words, all my pages are in the index. So you would expect that they would be ranked based on the quality of their content and their authority derived from inbound links, site visits, time-on-site, and other typical ranking signals. Performance is not a likely candidate for a ranking signal and isn’t important.
If you thought that, then you were wrong. Historically, Google has said, and Matt Cutts reiterates this in the video below, that site load times do not influence search rankings. But while that may be true now, it may not be in the near future. And this is where Maile’s comments took me by surprise. In a small group session at SMX East 2009, Maile was asked about site performance and rankings. She indicated that for the “middle ground” sites that are indexing but loading slowly, site performance may already be used to influence rankings. Who is right, I can’t say. These are both highly respected professionals who choose their words carefully.
Whatever is true, Google is sending us signals that this change is coming. Senior experts like Matt and Maile don’t say these things lightly. They are well considered and probably approved positions that they are asked to take. This is Google’s way of preventing us from getting mad when the change occurs. Google has the fallback of saying “we warned you this could happen.” Which from today’s viewpoint means it will happen.
Conclusion: Start working on your site performance now, as it will be important for SEO rankings later.
Oh and, by the way, your user experience will just happen to be better, which is clearly the real reason to fix site performance.
And it isn’t only Google that may make this change. Engineers from Yahoo! recently filed a patent with the title “Web Document User Experience Characterization Methods and Systems” which bears on this topic. Let me quote paragraph 21:
With so many websites and web pages being available and with varying hardware and software configurations, it may be beneficial to identify which web documents may lead to a desired user experience and which may not lead to a desired user experience. By way of example but not limitation, in certain situations it may be beneficial to determine (e.g., classify, rank, characterize) which web documents may not meet performance or other user experience expectations if selected by the user. Such performance may, for example, be affected by server, network, client, file, and/or like processes and/or the software, firmware, and/or hardware resources associated therewith. Once web documents are identified in this manner the resulting user experience information may, for example, be considered when generating the search results.
It does not appear Yahoo! has implemented any aspect of this patent yet, and who knows what the Bing agreement will mean for site performance and search. But clearly this is a “problem” that the search engine muftis have set their eyes on and I would expect that if Google does implement it, others will follow.