About Online Matters

Archive for the ‘Search Engines’ Category

Notes from First Day of SMX Advanced 2010

Back from SMX Advanced London, where I got a chance to speak on “SEO, Search, and Reputation Management,” and SMX Advanced 2010 in Seattle, where I got to relax and just take in the knowledge.

So here, for all who could not attend, is a summary of three of the sessions I attended on the first day of SMX Advanced 2010.  I only get so much time to blog…working guy, you know.  I’ll do my best to post the rest, but no promises.

SEO for Google versus Bing

Janet Miller, Searchmojo

  • From heatmap studies, it appears people “see” Bing and Google SERPs in pretty much the same way.  The “hotspots” are pretty similar.
  • Not surprising: average pages/visit and time on site are higher for Bing than for Google – but that has always been true from my perspective.
  • Bing does not currently accept video or news sitemaps.
  • On Google you can edit sitelinks in Webmaster tools, in Bing you cannot.
  • Geolocation results show pretty much the same in both sets of results.
  • One major difference:  Google shopping is free for ecommerce sites to submit; Bing only has a paid option for now.
  • Bing lets you share results (social sharing) on Facebook, Twitter, and email; Google does not.  But the sharing links point back to the images on Bing, not to the original images on your site.  You also have to grant Bing access on Facebook.
  • Bing shows a “document preview” when you roll over an entry.  It will also play videos in preview mode – but only those on YouTube.  If you look at the behavior, information from the page shows up in the preview.  To optimize how that information is presented, Bing pulls information in this order:
    • H1 tag first – if the title tag and H1 tag don’t match, it takes the H1 tag
    • First paragraphs of information
    • To surface contact info in the preview, include it on the page itself.  Bing is really good about recognizing contact information on a page:
      • Address
      • Phone
      • Email
    • To disable “document preview,” use one of the following (a short sketch follows this list):
      • Add this meta tag to the page: <meta name="msnbot" content="nopreview">
      • Or send it as an HTTP response header: X-Robots-Tag: nopreview (note this is an HTTP header, not a robots.txt directive)
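
      If you want to see where that header would actually be set, here is a minimal sketch, assuming a Python/Flask stack (my choice purely for illustration; the session did not mention any particular server), that sends X-Robots-Tag: nopreview with every response:

        # Minimal sketch: disable Bing's document preview via an HTTP header.
        # Assumes Flask; adapt the same idea to whatever server or framework you run.
        from flask import Flask

        app = Flask(__name__)

        @app.route("/")
        def home():
            # The in-page alternative is: <meta name="msnbot" content="nopreview">
            return "<html><head><title>Example</title></head><body>Hello</body></html>"

        @app.after_request
        def add_nopreview_header(response):
            # Tells msnbot not to build a document preview for this URL.
            response.headers["X-Robots-Tag"] = "nopreview"
            return response

        if __name__ == "__main__":
            app.run()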

      Rand Fishkin: Ranking Factor Correlations: Google versus Bing

      As usual, Rand brought his array of statistical knowledge to bear to compare how Bing and Google react to different ranking signals.  Here are the takeaways:

      Overall Summary of Correlations with Ranking, in Order of Importance

      Bing (in order of importance):
      1. Number of linking root domains
      2. An exact match of .com domain name with desired keyword
      3. Linking domains with an exact match in the TLD name
      4. Any exact match of the domain name with the desired keyword
      5. Number of inbound links

      Google (in order of importance):
      1. An exact match of .com domain name with desired keyword
      2. Linking domains with an exact match in the TLD name
      3. Number of linking root domains
      4. Any exact match of the domain name with the desired keyword
      5. Number of inbound links
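
      A quick note on what “correlation with ranking” means here: Rand’s numbers are rank correlations between a factor’s value and SERP position, measured across many queries.  A rough sketch of that calculation with made-up numbers (my own illustration, not Rand’s actual code or data set):

        # Rough sketch: Spearman rank correlation between one ranking factor and SERP position.
        # The numbers below are invented purely for illustration.
        from scipy.stats import spearmanr

        serp_positions = [1, 2, 3, 4, 5]                 # position 1 is the best result
        linking_root_domains = [420, 310, 290, 120, 45]  # hypothetical factor values

        # Negate positions so "higher is better" for both series; a positive coefficient
        # then means the factor tends to rise as rankings improve.
        corr, p_value = spearmanr([-p for p in serp_positions], linking_root_domains)
        print(f"Spearman correlation: {corr:.2f} (p={p_value:.3f})")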

      Domain Names as Ranking Factors

      • Exact match domains remain powerful ranking signals in both engines (anchor text could be a factor, too).
      • Hyphenated versions of domain names are less powerful, though when they do appear, they appear more frequently (more times on a page) in Bing (G: 271 vs. B: 890).
      • Just having keywords in the domain name has substantial positive correlation with high rankings.
      • If you really want to rank on a keyword, make sure you get the exact-match .com domain (exactmatchname.com).
      • Other exact match domains may still help, but don’t have as high correlation.
      • Keywords  in subdomains are not nearly as powerful as in root domain name (no surprise).
      • Bing may be rewarding subdomain keywords less than before (though G: 673 vs. B: 1394).
      • On alternate TLD extensions:
        • Bing appears to give substantially more weight to these than Google.
        • Matt Cutts’ claim that Google does not differentiate between .gov, .info and .edu appears accurate.
        • The .org TLD has a surprisingly high correlation with high rankings, but you can attribute this to elements of those sites’ authority – more links, more non-commercial links, less spam.
        • Don’t forget the exact match data: a .com is still probably a very good thing (at the very least, own it).
        • Shorter URLs are likely a good best practice (especially on Bing).
        • Long domains may not be ideal, but aren’t awful.

      On-Page Keyword Usage

      • Google rankings seem to be much more highly correlated with on-page keyword usage than for Bing.
      • The alt attribute of images shows significant correlation as an on-page ranking factor. (I always thought so and it’s one of the elements most SEO newbies miss.)
      • Putting keywords  in URLs is likely a best practice.
      • Everyone optimizes titles (G: 11,115 vs. B: 11,143).  Differentiating here is hard.
      • (Simplistic) on-page optimization isn’t a huge factor.
      • Raw content length (length of page and number of times the keyword is mentioned on the page) seems to have only a marginal correlation with rankings.

      Link Counts and Link Diversity

      • Links are likely still a major part of the algorithms, with Bing having a slightly higher correlation.
      • Bing may be slightly more naïve in their usage of link data than Google, but better than before.
      • Diversity of link sources remains more important than raw link quantity.
      • Many anchor text links from the same domain likely don’t add much value.
      • Anchor text links from diverse domains, however, appear highly correlated.
      • Bing seems more Google-like than in the past in handling exact match anchor links (this is a surprise!).

      Home Pages

      • Bing’s stereotype holds true: homepages are more favored in top results vs. Google.

      Twitter, Real-Time Search, and Real-Time SEO

      Steve Langville – Mint.com

      Steve had a lot of interesting points, and I thought his approach to real-time was one of the most sophisticated I had heard.

      1. One element of his strategy is what I like to call “Merchandising Real-Time Search.”  Basically, someone at Mint keeps a merchandising calendar of important dates/topics in consumers’ financial lives (e.g. tax time) and also watches for hot topics that could affect a consumer’s sense of money (e.g. new credit card legislation).  Mint then has a team that can create new content on that topic that is likely to generate word of mouth.  At that point, they push the content out and energize their communities on Facebook, Twitter, etc. by promoting the content to them.  This generates buzz and visits back to Mint.com.
      2. Mint has also created Mint Answers, its own Yahoo Answers-like site where people ask and answer questions on financial topics.  The result is a lot of user-generated content on Mint.com around critical keywords, which yields high rankings in the SERPs.
      3. Mint also developed a Twitter aggregator widget around personal finance and put it in a section of their site.  Twitter’s community managers then retweeted these folks, who then signed up for @mint and began retweeting @mint tweets.  According to Steve, the amplification effect was huge.

      Danny Sullivan

      As always, Danny had some really interesting insights to add about real-time search.  I will honestly say that many times I still think Danny, like many search marketers, thinks “transactionally” about search, as compared to consumer marketers, who think about having an ongoing “conversation” with a customer.  (More on that notion later.)  But in this case, Danny really showed why he is known as an industry visionary:

      • Search marketing means being visible wherever someone has overtly expressed a need or desire.  It is more than the web; more than keywords.  An example is mobile apps (search by another name), so I guess he agrees with Steve Jobs on that one.
      • This was uniquely insightful. Whereas normal search is a many-to-many platform where anonymous individuals post content whose authority grows based on “good” links that are added over time, real-time search is a one-to-one platform where clearly identified people post questions or comments and get responses.  Authority comes from the level of active engagement, not links.  I had never heard real-time described this way, and it is a succinct but very sophisticated definition of real-time search.
      • You can use conversations to identify folks interested in what you offer. Not a new concept, but good to repeat.  So if you sell vacuum cleaners, search for “anyone know vacuum cleaners” and the folks who have an interest are now identified and you can respond to them.
      • Get a gift by giving a gift. That’s the fundamental currency of social media. Danny answered 42 questions from people who didn’t know him and didn’t follow him.  He got no complaints and 10 thank-yous.
      • Recency versus Relevancy. Anyone doing real-time gets this – that authority can come from having high-quality information or having reasonably high quality information in a very short time frame – in other words, sometimes the recency of news makes it more worthy of attention than something older but more thought out.  Danny believes that as Twitter matures (and maybe the entire real-time search business – that wasn’t clear), relevancy is going to get a higher relative weighting, so that relevant results will get more hang time in the SERPs.

      Chris Silver-Smith

      I have trouble summarizing all of Chris’s talk – and it was a very good talk – because so much of what he talked about was covered in my notes from other speakers.  So here are the unique points from his chat:

      • You have to decide how you resource Twitter and other sites.  Questions to ask when forming your strategy:
        • Consumers first: What are consumers saying about your site/company already? How might they use your Twitter content? Develop representative personas of consumers who would engage with you on Twitter.
        • Time/investment: How much time do you have to devote to Twitter? Can you dedicate someone to spend time daily reading and responding to tweets?
        • Goals: What are some advantageous things you could accomplish by interacting with consumers in real time?
      • Your strategy will determine whether you hire a full-time person, a part-time person, or use automation.
      • Use OAuth for API integration, as it shows which application the visitor used as an appended data point.
      • Convert your Google News feeds to RSS to make them easier for members of your community to subscribe to.
      • A great tool for small-business social media management is www.closely.com, which auto-creates a social action page for every offer a company makes on Twitter and Facebook.
      • Be brief but really clear about the main point of your tweets. Include a call to action, as tweets with one are retweeted at a much higher rate.

      John Shehata – Advanced Internet

      I loved John’s presentation because it confirmed many of the same conclusions I had reached about real-time search and reported on at SMX Advanced in London.  Key points:

      • The ranking factors for real-time search are very different. They include:
        • User (author) authority (My comment:  not just one site but across every site  on which the author publishes).
        • How fresh that author’s content continues to be.
        • Number of followers.
        • The quality of followers and how they act on the author’s content (is it retweeted often?  Is it stumbled?  Does someone flow it into their RSS feed?  How often?  How quickly?).
        • URL real-time resolution.
        • It is not about how many followers you have but how reputable (authoritative) your followers are.  (This is what I call Authorank, and like PageRank it is passed from authoritative followers to those they follow – see the toy sketch after this list.)
        • You earn reputation, and then you give reputation. If lots of people follow you, and then you follow someone, then even though this [new person] does not have lots of followers, his tweets are deemed valuable because his followers are themselves followed widely.
        • Other possible ranking factors:
          • Recent activity: Google may pay more attention to accounts with more activity.
          • User name: keywords in your user name might also help.
          • Age: since age plays a big role in Google search engine ranking, it’s possible that more established Twitter accounts will outrank the newer ones.
          • External links: links to your @account from (reputable) non-social media sites should boost reputation as far as Google is concerned.
          • Tweet Quantity: the more you tweet, the better chance you’ve got to be seen in Google real-time search results.
          • Ratio of followers to following: a suspiciously close ratio between the two can raise a red flag.
          • Lists: it might also matter in how many lists you appear.
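
      To make the Authorank analogy a bit more concrete, here is a toy sketch of authority flowing across a follow graph, PageRank-style.  This is entirely my own illustration (invented accounts, invented damping factor), not John’s code and not any engine’s real algorithm:

        # Toy "Authorank": authority passes from followers to the accounts they follow.
        follows = {
            # follower -> accounts they follow (all names invented)
            "alice": {"mint", "bob"},
            "bob": {"mint"},
            "carol": {"alice"},
            "mint": set(),
        }

        accounts = list(follows)
        authority = {a: 1.0 / len(accounts) for a in accounts}  # start everyone equal
        damping = 0.85

        for _ in range(30):  # iterate until the scores roughly settle
            new_authority = {a: (1 - damping) / len(accounts) for a in accounts}
            for follower, followed in follows.items():
                if not followed:
                    continue
                share = damping * authority[follower] / len(followed)
                for account in followed:
                    # An authoritative follower passes part of its score along.
                    new_authority[account] += share
            authority = new_authority

        for account, score in sorted(authority.items(), key=lambda kv: -kv[1]):
            print(f"{account}: {score:.3f}")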

      Tactics to follow:

      • Encourage retweets by tweeting content of 120 characters or less, so you save room for the “RT @username” that gets added when someone passes your message along to their followers.
      • Tools to identify hot trends: Google Hot Trends, Google Insights, Google News, Bing xRank, Surchur, Crowdeye, Oneriot.
      • Same advice as Steve Langville – plan for seasonal keyword trends.
      • Don’t post the same update to multiple accounts; retweet instead.
      • Connect your social profiles.
      • Attract reputable, topically-related followers.
      • Write keyword-rich tweets whenever possible, without sounding spammy:
        • Do not create content with multiple buzzing terms.
        • Do not abuse shortening services for spam links.
        • Do not go overboard with Twitter #hashtags – search engines will drop your tweet from search if you use too many, because it “looks bad.”
        • Spammy looking tweet streams will be eliminated from search.
        • Don’t use same IP address for different twitter accounts.

      Show Me The Links

      This was a great session with a HUGE number of ideas for getting new links.  And each person talked about a very different philosophy towards link building and their tactics reflected those philosophies.  Let’s see if I can capture them:

      Chris Bennett

      • Philosophy centers on using easily created and highly valued visual or viral content:
        • Creating Infographics – they work very well.  An example – a “where does the money go from the 2008 stimulus bill” infographic generated 29,000 links.
        • Writing guest blog posts whose content is highly viral for others.  Embed a link to your site as the source.  You give them the gift of traffic; you get links as a gift in return.

      Arnie Kuenn

      • More traditional link building
        • 50% is content development and promotion.  The big example he used was the April Fools’ Day prank about Google opening an SEO shop.  It got picked up as a “real” story by a newswire 27 days after the post, went viral, and generated 800 backlinks.
        • 20% is blog post and article placement.
        • 10% is basic link development.
        • 20% is targeted link requests to those few critical high-value sites. There are NO magic bullets here – it takes creativity and just good old-fashioned hard work and persistence.  But the rewards can be substantial.

      Gil Reich

      • Use badges with your URL embedded that benefit the people who put them on their sites (e.g. a “gold star” validation).
      • Write testimonials for other folks.
      • Write on sites that want good content and can deliver an audience.
      • Answer questions on answer sites where you have the expertise.
      • Make it easy to link to you by providing the information to potential linkers.

      Roger Montti

      Focused on B2B link building tactics:

      • Trawl competitors’ backlinks – but also look for sites your competitors aren’t on; you want your own authoritative link network.
      • Don’t ignore the .us TLD.  There are lots of good potential link sites with decent authority there.
      • Look at associations that provide ways to link to their members.  Search for member lists, restrict your search to .org and add in relevant keyword phrases to filter for your related groups.
      • Look at dead sites with broken links and see who is linking to them.  Once you have identified a dead page, do a linkdomain: search on Yahoo to identify the sites still linking to it.
      • Free links from resources, directories, or “where to buy” sites.
      • Bloggers: cultivate alliances and relationships with other sites and blogs, particularly bloggers who like to do interviews.

      Debra Masteler

      • You have all this content that you generate as a normal part of your business.  Use it.
        • Use dapper.net to create RSS feeds of your blog content
        • Joost de Valk has a WordPress plugin at http://yoast.com/wordpress/rss-footer/ which lets you add an extra line of content to articles in your feed, defaulting to “Post from” followed by a link back to your blog, with your blog’s name as the anchor text.
        • Use RSS feeds from news sources to identify media leads to speak with as part of your PR work.
        • Content syndication: podcasts, white papers, living stories, news streams, and user-generated content (e.g. guest blogging) are still hot.  Infographics, short articles, individual blogs, and Wikipedia are not.
        • Widget Bait: basic widgets that you can build on widgetbox are getting somewhat passé but still have some value.   You need to do more advanced versions – information aggregation widgets seem to work very well right now.  Make people come to you to download them.
        • Microsites: the old link wheels are worthless at this point – the engines have figured those out and treat them similarly to link spam sites.  Those with good content – e.g. blogs or sites with good content – work.  One option is to buy an established site and then rebrand it.

Google TV: TV Advertisers Should Be Mad as Hell

The big announcement at Google I/O last week was the release of details about Google TV.  And it should make TV advertisers of all stripes angry and concerned.  VERY concerned.  So much so, in fact, that they should be actively seeking technologies and business models to deflect/prevent what is effectively an advertising power play by Google.

Google TV is the latest attempt to merge the television experience with a web-based TV (also called IPTV) experience on the television set (as compared to bringing TV to the web, as say a SlingBox does).  There have been numerous attempts to bring the Web to the television, going back all the way to 1996 when Steve Perlman, Bruce Leak and Phil Goldman brought to market the WebTV set-top box, marketed by both Sony and Philips.  (Find a list of TV/Internet hybrids in the next post.)  None of these has been particularly successful, for numerous reasons:

  • Most require an extra set-top box that is expensive (Google is no different.  As an example of technology that uses the consumer’s  computer or laptop as the interface to the TV, see Kylo).
  • The experience doesn’t truly integrate.  You either watch the web-based offerings or Live TV, but not both at the same time.  In many cases, the box is meant for the delivery of movies or TV shows on-demand, as compared to being broadcast in real-time.  The Roku/Netflix platform is an example of this.  PopBox is another example, but they also deliver more content – websites, social media experiences from Facebook and Twitter, images, YouTube videos, games, and music from sources like Photobucket and Pandora.
  • The interface requires a separate remote control, which adds another layer of complexity to the consumer experience.

None of these really impacts the effectiveness of a “single” broadcast TV advertisement in any meaningful way.  They are separate experiences from broadcast television and, as a rule, they do not take away from live TV viewership.  Some amount of consumers’ time is given to the Internet and movies on-demand nowadays.  Whether I interface with that experience through my computer screen or TV screen doesn’t change the amount of time I spend in an “online” mode versus a TV viewing mode, and it does not impact my current behavior around the TV ads themselves.

Google TV has come up with a different approach which, at least during an initial search, overlays the Internet on top of the television experience (see first image).  When it overlays, the interface is transparent so you can see your TV behind the browser interface that lets you search for the shows and information you want.

Google TV transparent interface

There are other times when the interface switches completely and the TV experience is put on hold while the viewer interacts with Web content (see second image), which is more like the experiences of the current generation of web-to-TV offerings.  But the difference here is how easy and seamless Google TV makes it to switch between all three of these user experiences – live TV, TV in the background, and Internet-only.  The other difference, and one of critical import for this article, is that Google intends to sell advertising within the Google TV platform.  Where, how, and how much are still to be determined.

Google TV solid menu interface

Google has definitely come up with something unique that I believe will be very compelling to television viewers as it now truly integrates the television and web experiences for the first time.

But if the consumer loves this, traditional TV advertisers should hate it.

Today, television advertising is a $70B market versus $25B for Internet and mobile search advertising.

US Advertising Market Revenues 2009

In 2010, TV advertising is projected to grow 4.3%, or $3 billion, on a base of $70.2 billion. This compares to non-search online advertising, which is projected to grow 12.9%, or $1.6 billion, on a base of $12.2 billion during this period.  So even though Internet advertising is growing at a faster rate, television advertising’s real-dollar growth is roughly twice that of Internet advertising.
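
A quick back-of-the-envelope check of those figures, using only the numbers cited above:

    # Back-of-the-envelope check of the growth figures cited in this paragraph.
    tv_base, tv_growth_rate = 70.2, 0.043             # TV advertising, $B base and projected growth
    online_base, online_growth_rate = 12.2, 0.129     # non-search online advertising

    tv_dollars = tv_base * tv_growth_rate             # ~3.0 ($B)
    online_dollars = online_base * online_growth_rate # ~1.6 ($B)

    print(f"TV real-dollar growth: ${tv_dollars:.1f}B")
    print(f"Online real-dollar growth: ${online_dollars:.1f}B")
    print(f"Ratio: {tv_dollars / online_dollars:.1f}x")  # ~1.9x, i.e. roughly twice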

Google knows this.  Rishi Chandra, the  product manager for Google TV, mentioned the $70B factoid at Google’s I/O conference last week.  Moreover, Google also gets that despite the fact consumers are spending an increasing number of hours per day online,  television viewership is at an all-time high, with 180mm US consumers watching TV for over 5 hours/day on average.  Rishi mentioned this, as well. The guys at the Googleplex are no dummies. As the old saying goes, they can see a mountain in time. In this case, the mountain they want to cash in on is TV advertising.

What both Google and the advertisers also know is that television advertising is broken.  In 1987 an advertiser could reach 80% of viewers by airing a 30-second spot only three times. Today, that same commercial would have to air 150 times to reach 80% of viewers.[1] The rapid decline of TV ad viewership is due to the “TiVo” effect and today’s viewers’ multi-tasking habits – texting, phoning, emailing and web surfing while watching TV.  Brands are urgently seeking a solution to reengage viewers of their TV commercials as brand expenditures on television dwarf what they spend on all other ad mediums.

Now Google would argue that it has found the solution, and from its perspective I truly think they believe this.  The Google culture is driven by data and metrics.  Current television advertising with its lack of performance measurement is anathema to a Googler’s mindset.   If you are a fanatic about data-driven marketing, Google TV “solves” this problem because of its ability to bring CPC and other easily measurable formats into the web-based part of the new integrated television experience.

Interesting, and correct as far as it goes.  But wrong – and I mean dead wrong – from the perspective of television advertisers, who spend roughly three times as many dollars on television as on online advertising because it is still the most potent means of getting a message to the consumer.  Moreover, it is a power play by Google to disconnect the brand advertisers from their traditional advertising providers and drive them, willingly and like lemmings, onto the Google platform(s), thus giving Google an even stronger power position relative to advertisers.

Let’s think about this.  Television advertising is already much less effective than it used to be.  Now along comes Google TV with its overlay and ability to seamlessly move away from the live television experience.  Let’s say you are a viewer watching Lost, that you are using Google TV, and you have left your laptop in the other room because – heck – you don’t need a two-screen solution to access the Internet during live television now that you have Google TV.  Something on the show triggers you to want to look up some factoid on the web at a Lost fan site.  You plan to type in “Lost fan site.”

When are you going to type this in?  During the time the episode is airing?  Absolutely not.  You’re not going to want to miss one minute because Hurley is about to tell Jack his real name.    Or take another example – a sports case.  Are you going to put the potential touchdown play in background mode while you look up Brett Favre’s completion percentage in third down and long situations?  Absolutely, positively not, to the extent that the sports fan is thinking “don’t you dare touch the remote or there will be one less thumb in this family.”

No.  You are going to switch to the Internet experience when the television ads come on and you can safely move away from the live broadcast to find what you need before your show comes back on.

There is another interesting fact that only makes this seem a more likely behavior on the part of cross-platform TV viewers, at least the early adopters of Google TV.  In a recent study of US Online TV Viewership by Comscore involving 1,800 subjects, a majority (67 percent) of cross-platform (TV and online) viewers preferred online TV viewing because it has less interference from commercials[2].  Since these folks are the likely early adopters of Google TV, the tendency to move away from live TV during commercials will be very strong.

So what does Google TV do?  It makes the television ad spend of the major brands even less effective than it is today.  Because Google TV still provides an interruptive experience, it actually encourages cross-platform viewers who wish to increase the “information content” of their viewing experience from web-based channels to do so at the exact time that advertisers least want them to.

There is an even bigger implication of this for brand advertisers.  In order to keep the cross-platform consumer’s attention as they move away from viewing television ads, the brand advertisers will be forced to place their ads on the web-based portion of the Google TV interface.  And to a certain extent this makes sense going back to our previous point about measurability of TV advertising.  The Google platform is measurable and consumers more and more are becoming habituated to interacting with web-based CPC or banner advertising.  So the TV advertiser keeps the attention of the cross-platform viewer during the commercial break in the show and gets better metrics.  It’s an obvious win-win for both Google and the advertiser, and a very seductive business proposition to marketing executives looking for better measurability around TV advertising.

But for TV advertisers, Google TV is the equivalent of the poisoned apple given to Snow White.  As Google TV penetrates households, more and more TV viewers will become habituated to the dual-use experience and will spend more and more time on the Google platform during broadcast television advertising pods.  And despite the fact I haven’t said much about mobile in this article until now, Google TV will also move onto the mobile platform and will provide an even more integrated experience for the consumer across the two screens, with a whole host of implications for the two-screen experience that I won’t discuss here.  Given the timing of historical consumer behavior transitions in the television market, this could take ten years. But over that time, Google will take a larger and larger share of the current $70B TV advertising market and the $2.7B mobile advertising market.  This means that as much as an advertiser is currently dependent on Google for web advertising, they will become even more dependent on the single provider that is Google because of its reach in these other channels.

If you as an advertiser aren’t concerned about the implications of this for your business, where Google can set effectively monopoly prices you pay for ads across every major advertising platform you have, you should be.  You should be very concerned and mad as hell at this attempt to manipulate your advertising dollars even further into the maw of the machine that Google has become.

If I were a brand advertiser right now, I would be talking to my peers and looking for a second-platform solution from someone who can constrain this power play by Google before it becomes a fait accompli.  If I were Yahoo or Microsoft, I’d be developing or investing in a prototype of something I could show to brand advertisers today and get them to invest in strategically, in order to prevent Google from locking up this market before it is too late.


[1] “Advertising is Dead, Long Live Advertising” Himpe, 2008

[2] Yuki, Tania “Comscore Study of US Online TV Viewership.” http://www.comscore.com/Press_Events/Press_Releases/2010/4/Viewers_Indicate_Higher_Tolerance_for_Advertising_Messaging_while_Watching_Online_TV_Episodes


Search Engines: Social Media, Author Rank and SEO

In my previous discussions of social media, channel architectures, and branding, I discussed the fact that I am manic about locking down my online brand (onlinematters) because there seems to be some relationship in the universal search engines between the number of posts/the number of sites that I post from under a specific username and how my posts rank.  It is as if there is some measure of trust given to an author the more he publishes from different sites and the more people see/read/link to what he has written.  I am not talking about authority given to the actual content written by the author – that is the core of search.  I am talking instead about using the author's behavior and success as a content producer to change where his content ranks for any given search result on a specific search term.  It is similar, in many ways, to what happened in the Vincent release where brand became a more important ranking factor.  In this case, the author and the brand are synonymous and when the brand is highly valued, then those results would, under my hypothesis, be given an extra boost in the rankings.

This was an instinct call, and while I believed I had data to support the theory, I had no research to prove that perhaps an underlying algorithm had been considered/created to measure this phenomenon in universal search. 

I thus considered myself twice lucky while doing my weekly reading on the latest patents to find one that indicates someone is thinking about the issue of "author rank."  On October 29th, Jaya Kawale and Aditya Pal of Yahoo!  applied for a patent with the name "Method and Apparatus for Rating User Generated Content in Search Results."  The abstract reads as follows:

Generally, a method and apparatus provides for rating user generated content (UGC) with respect to search engine results. The method and apparatus includes recognizing a UGC data field collected from a web document located at a web location. The method and apparatus calculates: a document goodness factor for the web document; an author rank for an author of the UGC data field; and a location rank for web location. The method and apparatus thereby generates a rating factor for the UGC field based on the document goodness factor, the author rank and the location rank. The method and apparatus also outputs a search result that includes the UGC data field positioned in the search results based on the rating factor.

Let's see if we can't put this into English comprehensible to the common search geek.  Kawale and Pal want to collect data on three specific ranking factors and combine these into a single, weighted rating factor that is then used to influence the rank ordering of what they term "User Generated Content," or UGC.  The authors note that the typical ranking factors in search engines today are not suitable for ranking UGC: UGC items are fairly short, they generally do not have links to or from them (rendering back-link-based analysis unhelpful), and spelling mistakes are quite common.  Thus a new set of factors is needed to adequately index and rank UGC.

The first issue the patent/algorithm has to deal with is defining what the term UGC includes.  The patent specifically mentions "blogs, groups, public mailing lists, Q & A services, product reviews, message boards, forums and podcasts, among other types of content." The patent does not specifically mention social media sites, but those are clearly implied. 

The second issue is to determine what sites should be scoured for UGC.  UGC sites are not always easy to identify.  An example would be a directory in which people rank references with a 5-star rating, where that is the only user input.  Is this site easy to identify as a site with UGC?  Not really, but somehow the search engine must decide whether this site is within its valid universe.  Clearly, some mechanism for categorizing sites with UGC needs to exist, and while Kawale and Pal use the example of blog search as covering a limited universe of sites, their patent does not give any indication of how sites are to be chosen for inclusion in the crawl process.

Now we come to the ranking factors.  The three specific ranking factors proposed by Kawale and Pal are:

  • Document Goodness.  The Document Goodness Factor is based on at least one (and possibly more) of the following attributes of the document itself: a user rating; a frequency of posts before and after the document is posted; a document's contextual affinity with a parent document; a page click/view number for the document; assets in the document; document length; length of a thread in which the document lies; and goodness of a child document. 
  • Author Rank.  The Author Rank is a measure of the author's authority in the social media realm on a subject, and is based on one or more of the following attributes: a number of relevant posted messages; a number of irrelevant posted messages; a total number of root documents posted by the author within a prescribed time period; a total number of replies or comments made by the author; and a number of groups to which the author is a member.
  • Location Rank.  Location Rank is a measure of the authority of the site in the social media realm.  It can be based on one or more of the following attributes: an activity rate in the web location; a number of unique users in the web location; an average document goodness factor of documents in the web location; an average author rank of users in the web location; and an external rank of the web location.

These ranking factors are not used directly as calculated.  They are "normalized" for elements like document length and then combined through some mechanism to create a single UGC rating factor. 
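
To make the patent's structure concrete, here is a minimal sketch of what such a combined rating factor might look like.  The weights, the normalization, and the idea of a simple weighted sum are all my own guesses for illustration; the patent does not specify how the three factors are combined:

    # Illustrative sketch of a combined UGC rating factor (weights and scales invented).
    from dataclasses import dataclass

    @dataclass
    class UGCSignals:
        document_goodness: float  # user rating, clicks, thread length, etc., normalized to 0-1
        author_rank: float        # relevant posts, replies, group memberships, normalized to 0-1
        location_rank: float      # site activity, unique users, external rank, normalized to 0-1

    def rating_factor(s: UGCSignals,
                      w_doc: float = 0.4,
                      w_author: float = 0.35,
                      w_location: float = 0.25) -> float:
        # Hypothetical weighted combination of the three normalized factors.
        return w_doc * s.document_goodness + w_author * s.author_rank + w_location * s.location_rank

    # Example: a short forum answer by a prolific author on an active Q&A site.
    print(rating_factor(UGCSignals(document_goodness=0.6, author_rank=0.8, location_rank=0.7)))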

The main thing to note – and the item that caught my attention, obviously – is Author Rank.  Note that it has ranking factors that correspond with what I have been hypothesizing exist in the universal search engines.  That is to say, search results are not ranked only by the content on the page, but by the authority of the author who has written them, as determined by how many posts that author has made, how many sites he has made them on, how many groups he or she belongs to, and so on.

Can I say for certain that any algorithm like this has been implemented?  Absolutely not.  But my next task has to be to design an experiment to see if we can detect a whiff of it in the ether.  I'll keep you informed.


Technical SEO: Site Loading Times and SEO Rankings Part 2

In my last post, I discussed the underlying issues regarding site loading times and SEO rankings.  What I tried to do was help the reader understand why site loading times are important from the perspective of someone designing a search engine that has to crawl billions of pages.  The post also outlines a few of the structures that they would have to put in place to accurately and effectively crawl all the pages they need in a limited time with limited processing power.  I also tried to show that a search engine like Google has a political and economic agenda in ensuring fast sites, not just a technical agenda.  Google wants as many people/eyeballs on the web as possible, so it is to their advantage to ensure that web sites provide a good user experience.  As a result, they feel quite justified in penalizing sites that do not have good speed/performance characteristics.

As you would expect, the conclusion is that if your site is hugely slow you will not get indexed and will not rank in the SERPs.  What is “hugely slow”?  Google has indicated that slow is a relative notion, determined by the loading times typical of sites in your geographical region.  Having said that, relative or not, from an SEO perspective I wouldn’t want a site where pages take more than 10 seconds on average to load.  We have found from the sites we have tested and built that average load times above approximately 10 seconds to completely load a page have a significant impact on being indexed.  From a UE perspective, there is some interesting data that the limit on visitors’ patience is about 6-8 seconds.  Google has studied this data, so it would probably prefer to set its threshold in that region.  But I doubt it can.  Many small sites are not that sophisticated, do not know these kinds of rules, and do not know how to check or evaluate their site loading times.  Besides this, there are often problems with hosts that cause servers to run slowly at times.  Google has to take that into account as well.  So I believe the timeout has to be substantially higher than 6-8 seconds, but 10 seconds as a crawl limit is a guess.
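
If you want a rough sense of where your own site falls relative to that 10-second guess, here is a minimal sketch using Python's requests library and a few placeholder URLs (substitute your own).  It measures server response plus download time, not full browser rendering, so treat the numbers as a lower bound:

    # Rough sketch: measure average fetch time for a handful of pages on a site.
    import time
    import requests

    urls = [
        "http://www.example.com/",
        "http://www.example.com/about",
        "http://www.example.com/blog",
    ]

    timings = []
    for url in urls:
        start = time.perf_counter()
        requests.get(url, timeout=30)  # give up after 30s, well past any sane threshold
        elapsed = time.perf_counter() - start
        timings.append(elapsed)
        print(f"{url}: {elapsed:.2f}s")

    print(f"Average load time: {sum(timings) / len(timings):.2f}s")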

I have yet to see a definitive statement by anyone as to what the absolute limit is for site speed before indexing ceases altogether (if you have a reference, please post it in the comments).  I’m sure that if a bot comes to a first page and it exceeds the bot’s timeout threshold in the algorithm, your site won’t get spidered at all.  But once the bot gets by the first page, it has to do an on-going computation of average page loading times for the site to determine if the average exceeds the built-in threshold, so at least a few pages would have to be crawled in that case. 

Now here’s where it gets interesting.  What happens between fast (let’s say < 1-2 second loading times, although this is actually pretty slow but a number Matt Cutts in the video below indicates is ok) and the timeout limit?  And how important is site speed as a ranking signal?  Let’s answer one question at a time.

When a site is slow but not slow enough to hit any built-in timeout limits (not tied to the number of pages), a couple of things can happen.   We do know that Google allocates bot time by the number of pages on the site and the number of pages it has to index/re-index.  So for a small site that performs poorly, it is likely that most of the pages will get indexed.  Likely, but not a guarantee.  It all depends on the cumulative time lag versus the average that a site creates. If a site is large, then you can almost guarantee that some pages will not be indexed, as the cumulative time lag will ultimately hit the threshold set by the bots for a site of that number of pages. By definition, some of your content will not get ranked and you will not get the benefit of that content in your rankings.
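
A toy way to picture that trade-off (my own simplification, not Google's actual crawl logic): give the crawler a fixed time budget for your site and see how many pages it gets through at different average load times:

    # Toy simulation: pages crawled within a fixed crawl-time budget (all numbers invented).
    def pages_crawled(page_load_times, budget_seconds):
        crawled, elapsed = 0, 0.0
        for load_time in page_load_times:
            if elapsed + load_time > budget_seconds:
                break  # budget exhausted; remaining pages go unindexed this visit
            elapsed += load_time
            crawled += 1
        return crawled

    site_pages = 500
    budget = 600.0  # pretend the crawler allots this site ten minutes per visit

    for avg_load in (1.0, 3.0, 8.0):
        crawled = pages_crawled([avg_load] * site_pages, budget)
        print(f"{avg_load:.1f}s/page -> {crawled} of {site_pages} pages crawled")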

As an aside, by the way, there has been a lot of confusion around the <meta name="revisit-after"> tag.  The revisit-after meta tag takes this form: <meta name="revisit-after" content="5 days">.
This tag supposedly tells the bots how often to come back and reindex this specific page (in this case, every 5 days).  The idea is that you can improve the crawlability of your site by telling the bots not to index certain pages all the time, but only some of the time.  I became aware of this tag at SMX East, when one of the “authorities” on SEO mentioned it as usable for this purpose.  The trouble is that, from everything I have read, the tag is completely unsupported by any of the major engines and was only supported by one tiny search engine (SearchBC) many years ago. 

But let’s say you are one of the lucky sites where the site runs slowly but all the pages do get indexed.  Do Google or any of the other major search engines use the site’s performance as a ranking signal?  In other words, all my pages are in the index.  So you would expect that they would be ranked based on the quality of their content and their authority derived from inbound links, site visits, time-on-site, and other typical ranking signals.  Performance is not a likely candidate for a ranking signal and isn’t important. 

If you thought that, then you were wrong. Historically, Google has said, and Matt Cutts reiterates this in the video below, that site load times do not influence search rankings.  But while that may be true now, it may not be in the near future.  And this is where Maile’s comments took me by surprise.  In a small group session at SMX East 2009, Maile was asked about site performance and rankings.  She indicated that for the “middle ground” sites that are indexing but loading slowly, site performance may already be used to influence rankings.  Who is right, I can’t say.  These are both highly respected professionals who choose their words carefully. 

(Embedded video: Matt Cutts discussing whether site speed affects Google rankings.)

Whatever is true, Google is sending us signals that this change is coming.  Senior experts like Matt and Maile don’t say these things lightly.  They are well-considered and probably approved positions that they are asked to take.  This is Google’s way of preventing us from getting mad when the change occurs.  Google has the fallback of saying “we warned you this could happen.”  Which, from today’s viewpoint, means it will happen.

Conclusion: Start working on your site performance now, as it will be important for SEO rankings later. 

Oh and, by the way, your user experience will just happen to be better, which is clearly the real reason to fix site performance. 

And it isn’t only Google that may make this change.  Engineers from Yahoo! recently filed a patent with the title “Web Document User Experience Characterization Methods and Systems” which bears on this topic.  Let me quote paragraph 21:

With so many websites and web pages being available and with varying hardware and software configurations, it may be beneficial to identify which web documents may lead to a desired user experience and which may not lead to a desired user experience. By way of example but not limitation, in certain situations it may be beneficial to determine (e.g., classify, rank, characterize) which web documents may not meet performance or other user experience expectations if selected by the user. Such performance may, for example, be affected by server, network, client, file, and/or like processes and/or the software, firmware, and/or hardware resources associated therewith. Once web documents are identified in this manner the resulting user experience information may, for example, be considered when generating the search results.

It does not appear that Yahoo! has implemented any aspect of this patent yet, and who knows what the Bing agreement will mean for site performance and search.  But clearly this is a “problem” that the search engine muftis have set their eyes on, and I would expect that if Google does implement it, others will follow.


Why Search Engine Optimization Matters

Yesterday, a reasonably well-known blogger, Derek Powazek (whose article, against my strongest desire to give it any further validation in the search engine rankings, where it now ranks #10, gets a link here because at the end of the day the Web is about transparency and I truly believe that any argument must win out in the realm of ideas), let out a rant against the entire SEO industry.  The article, and the responses both on his website and on SearchEngineLand, upset me hugely for a number of reasons:

  1. The tone was so angry and demeaning.  As I get older (and I hope wiser), I want to speak in a way that bridges differences and heals breaches, not stokes the fire of discord.
     
  2. I believe the tone was angry in order to evoke strong responses, in order to build links, in order to rank high in the search engines.  Link building is a tried-and-true, legitimate SEO practice, so his using it undermines the entire argument Derek makes that understanding and implementing a well-thought-out SEO program is so much flim-flam. Even more important to me: do we need to communicate in angry rants in order to get attention in this information- and message-overwhelmed universe?  Is that what we’ve come to?  I sure hope not.
     
  3. The article’s advice about user experience coming first was right (and has my 100% agreement).  But its assumptions about SEO, and therefore its conclusions, were incorrect.
     
  4. The article’s erroneous conclusions will hurt a number of people who could benefit from good SEO advice.  THAT is probably the thing that saddens me most – it will send people off in a direction that will hurt them and their businesses substantially.  Good SEO is not a game.  It has business implications and by giving bad advice, Derek is potentially costing a lot of good people money that they need to feed their families in these tough times.
     
  5. The number of responses in agreement with his blog was overwhelming relative to the number that did not agree.  That also bothered me – that the perception of our industry is such that so many people feel our work does not serve a legitimate purpose.
     
  6. The comments on Danny Sullivan’s response to Derek were few, but they were also pro-SEO (of course).  Which means that the two communities represented in these articles aren’t talking to each other in any meaningful way.  You agree with Derek, you comment to him.  You agree with Danny, you comment there.  Like attracts like, but it doesn’t ultimately lead to the two communities bridging their differences.

I, too, started to make comments on both sites.  But my comments rambled (another one of those prerogatives I maintain in this 140-character world), and so it became apparent that I would need to create a blog entry to respond to the article – which I truly do not want to do because, frankly, I really don’t want to "raise the volume" of this disagreement between SEO believers and SEO heretics.  But I have some things to say that no one else is saying, and it goes to the heart of the debate on why SEO IS important and is absolutely not the same thing as good user experience or web development.

So to Danny, to Derek, and to all the folks who have entered this debate, I  hope you find my comments below useful and, if not, my humble apologies for wasting your valuable time.

Good site design is about the user experience. I started my career in online and software UE design when that term was an oxymoron.  My first consulting company, started in 1992, was inspired by David Kelley, my advisor at Stanford, CEO of IDEO (one of the top design firms in the world), and now founder and head of the Stanford School of Design.  I was complaining to David about the horrible state of user interfaces in software and saying that we needed an industry initiative to wake people up.  His response was "If it’s that bad, go start a company to fix it."  Which I did.  That company built several products that won awards for their innovative user experience. 

That history, I hope, gives credibility to my next statement: I have always believed, and will always believe, that good site experience trumps anything else you do.  Design the site for your customer first.  Create a "natural" conversation with them as they flow through the site and you will keep loyal customers.

Having said that, universal search engines do not "think" like human beings.  They are neither as fast nor as capable of understanding loosely organized data as we are.  They work according to algorithms that attempt to mimic how we think, but they are a long way from actually achieving it.  These algorithms, as well as the underlying structures used to make them effective, must also run in an environment of limited processing power (even with all of Google’s server farms) relative to the volume of information, so they have also made trade-offs between accuracy and speed.  Examples of these structures are biword indices and positional indices.  I could go into the whole theory of information architecture, but suffice it to say that a universal search engine needs help in interpreting content in order to determine relevance. 
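
For readers who haven't run into those terms, here is a tiny sketch of a positional index (my own toy example, not any engine's implementation): for each word, it records which documents contain it and at which positions, which is what lets an engine answer phrase queries quickly:

    # Tiny sketch of a positional inverted index: term -> {doc_id: [positions]}.
    from collections import defaultdict

    docs = {
        1: "seo is about relevance and user experience",
        2: "good user experience trumps everything else",
    }

    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for position, term in enumerate(text.split()):
            index[term][doc_id].append(position)

    # Phrase lookup for "user experience": "experience" must appear one position after "user".
    hits = [
        doc_id
        for doc_id in index["user"]
        if doc_id in index["experience"]
        and any(p + 1 in index["experience"][doc_id] for p in index["user"][doc_id])
    ]
    print(hits)  # -> [1, 2]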

Meta data is one area that has evolved to help the engines do this.  So, first and foremost, the search engines expect and need us to include data especially for them that has nothing to do with the end-user experience and everything to do with being found relevant and precise.  This is the simplest form of SEO.  There are two points here:

  1. Who is going to decide what content goes into these tags? Those responsible for the user experience?  I think not.  The web developers? Absolutely positively not.  It is marketing and those who position the business who make these decisions.
     
  2. But how does marketing know how a search engine thinks?  Most do not.  And there are real questions of expertise here, albeit for this simple example, small ones that marketers can (and are) learning.  What words should I use for the search engines to consider a page relevant, that then go into the meta data?  For each meta data field, what is the best structure for the information?  How many marketers, for example, know that a title tag should only be about 65 characters long, that a description tag needs to be limited to about 150 characters, that the words in anchor text are a critical signaling factor to the search engines, or that alt text on an image can help a search engine understand the relevance of a page to a specific keyword/search?  How many know the data from the SEOmoz survey of SEO ranking factors showing that the best place to put that keyword in a title tag for search engine relevance is in first position, and that the relevance drops off in an exponential manner the further back in the title the keyword sits?  On this last point, there isn’t one client who hasn’t asked me for advice.  They don’t and can’t track the industry and changes in the algorithms closely enough to follow this.  They need SEO experts to help them – members of the trained and experienced professionals in the SEO industry – and this is just the simplest of SEO issues.  (A small sketch of the simplest of these checks follows this list.)
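
As a concrete example of the simplest of these checks, here is a minimal sketch that flags a page whose title or meta description runs past the guideline lengths mentioned above.  It uses the BeautifulSoup library and a hard-coded sample page, both my own choices for illustration:

    # Minimal sketch: flag title/description tags that exceed common length guidelines.
    from bs4 import BeautifulSoup

    TITLE_LIMIT = 65        # guideline discussed above
    DESCRIPTION_LIMIT = 150

    html = """
    <html><head>
      <title>Why Search Engine Optimization Matters for Small Business Owners Everywhere</title>
      <meta name="description" content="A short description of the page.">
    </head><body>...</body></html>
    """

    soup = BeautifulSoup(html, "html.parser")
    title = (soup.title.string or "").strip() if soup.title else ""
    desc_tag = soup.find("meta", attrs={"name": "description"})
    description = (desc_tag.get("content") or "").strip() if desc_tag else ""

    if len(title) > TITLE_LIMIT:
        print(f"Title is {len(title)} characters; consider trimming to ~{TITLE_LIMIT}.")
    if len(description) > DESCRIPTION_LIMIT:
        print(f"Description is {len(description)} characters; consider trimming to ~{DESCRIPTION_LIMIT}.")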

How about navigation?  If you do not build good navigational elements into deeper areas of the site (especially large sites) that are specifically for search engines and/or you build it in a way that a search engine can’t follow (e.g. by the use of Javascript in the headers or flash in a single navigation mechanism throughout the site), then the content won’t get indexed and the searcher won’t find it.  Why are good search-specific navigational elements so important?  It comes back to limited processing power and time.  Each search engine has only so much time and power to crawl the billions of pages on the web, numbers that grow every day and where existing pages can change not just every day but every minute.  These engines set rules about how much time they will spend crawling a site and if your site is too hard to crawl or too slow, many pages will not make it into the indices and the searcher, once again, will never find what could be hugely relevant content.

Do UE designers or web developers understand these rules at a high level?  Many now know not to use Javascript in the headers, to be careful how they use flash and, if they do use it in the navigation, to have alternate navigational elements that help the bots crawl the site quickly.  Is this about user experience?  Only indirectly.  It is absolutely positively about search engine optimization, however, and it is absolutely valid in terms of assuring that relevant content gets put in front of a searcher.

Do UE designers or web developers understand the gotchas with these rules?  Unlikely.  Most work in one organization with one site (or a limited number of sites).  They haven’t seen the actual results of good and bad navigation across 20 or 50 or 100 sites and learned from hard experience what is a best practice.  They need an SEO expert, someone from the SEO  industry, to help guide them.  

Now let’s talk about algorithms.  Algorithms, as previously mentioned, are an attempt (and a crude one based on our current understanding of search) at mimicking how searchers (or with personalization a single searcher) think so that searches return relevant results to that searcher.  If you write just for people, and structure your pages just for readers, you are doing your customers a disservice because what a human can understand as relevant and what a search engine can grasp of meaning and relevance are not the same.  You might write great content for people on the site, but if a search engine can’t understand its relevance, a searcher who cares about that content will never find it. 

Does that mean you sacrifice the user experience to poor writing?  Absolutely, positively, without qualification not.  But within the structure of good writing and a good user experience, you can design a page that helps/signals the search engines, with their limited time and ability to understand content, what keywords are relevant to that page. 

Artificial constraint, you say? How is that different than the constraints I have when trying to get my message across with a good user experience in a data sheet?  How is that different when I have 15 minutes to get a story across in a presentation to my executive staff in a way that is user friendly and clear in its messaging?  Every format, every channel for marketing has constraints.  The marketer’s (not the UE designer’s and not the web developer’s) job is to communicate effectively within those constraints. 

Does a UE designer or web developer understand how content is weighted to create a ranking score for a specific keyword within a specific search engine?  Do they know how position on the page relates to how the engines assess relevance? Do they understand how page length affects the weighting?  Take this example.  If I have two pages, the second of which contains two exact copies of the content on the first page, which is more relevant?  From a search engine’s perspective they are equally relevant, but if the engine just counted keyword occurrences, the second page would rank higher.  A fix is needed.

One way that many search engines compensate for page length differences is through something called pivoted document length normalization (write me if you want a further explanation).  How do I know this?  Because I am a search engine professional who spends time every day learning his trade, reading on information architecture and studying the patents filed by the major search engines to understand how the technology of search can or may be evolving.  Because – since I can’t know exactly what algorithms are currently being used –  I run tests on real sites to see the impact of various content elements on ranking.  Because I do competitive analysis on other industry sites to see what legitimate, white hat techniques they have used and content they have created (e.g. videos on a youtube channel that then point to their main site) to signal the relevance of their content to the search engines. 
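
For the curious, here is a rough sketch of the idea behind pivoted document length normalization.  This is my own simplified rendering of the classic formulation from the information retrieval literature, not any engine's actual implementation: instead of dividing a relevance score by the document's raw length, you divide by a blend of the document's length and the collection average, so longer pages are penalized less brutally:

    # Rough sketch of pivoted document length normalization (simplified).
    # The score is divided by a blend of the doc's length and the average length,
    # so long documents are not penalized as harshly as with raw length division.

    def pivoted_norm(doc_length: float, avg_length: float, slope: float = 0.25) -> float:
        return (1.0 - slope) * avg_length + slope * doc_length

    def normalized_score(raw_term_score: float, doc_length: float, avg_length: float) -> float:
        return raw_term_score / pivoted_norm(doc_length, avg_length)

    avg = 500.0  # average words per page across the collection (made up)
    # Same raw keyword score on a 500-word page vs. a 1500-word page:
    print(normalized_score(10.0, 500.0, avg))   # baseline page
    print(normalized_score(10.0, 1500.0, avg))  # longer page, moderated penalty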

And to Derek’s point, what happens when the algorithms change?  Who is there watching the landscape for any change, like an Indian scout in a hunting party looking for the herd of buffalo?  Who can help interpret the change and provide guidance on how to adapt content to maintain the best signals of relevance for a keyword to the search engines?  Derek makes this sound like an impossible task and a lot of hocus-pocus.  It isn’t and it’s not.  Professional SEO consultants do this for their clients all the time by providing good maintenance services.  They help their clients’ content remain relevant, and hopefully ranking high in the SERPs, in the face of constant change.

So to ask again, do UE designers or product managers understand these issues around content?  At some high level they may (a lot don’t).  Do web developers? Maybe, but most don’t because they don’t deal in content – it is just filler that the code has to deal with (it could be lorem ipsum for their purposes).  Do any of these folks in their day-to-day struggles to do their jobs under tight time constraints have the time to spend, as I do, learning and understanding these subtleties or running tests? Absolutely, positively not.  They need an SEO professional to counsel them so that they make the right design, content and development choices.

I’ll stop here.  I pray I’ve made my point calmly and with a reasoned argument.  Please let me know.  I’m not Danny Sullivan, Vanessa Fox, Rand Fishkin, or Stephan Spencer, to name a few of our industry’s leading lights.  I’m just a humble SEO professional who adores his job and wants to help his clients rank well with their relevant business information.  My clients seem to like me and respect what I do, and that gives me an incredible amount of satisfaction and joy. 

I’m sorry, Derek.  I respect your viewpoint and I know that you truly believe what you are saying.  But as an honest, hard-working SEO professional, I couldn’t disagree with you more.
