
Search Engines: Social Media, Author Rank and SEO

In my previous discussions of social media, channel architectures, and branding, I mentioned that I am manic about locking down my online brand (onlinematters). The reason is that the universal search engines seem to relate how my posts rank to the number of posts I publish, and the number of sites I publish them on, under a specific username. It is as if an author earns some measure of trust the more he publishes from different sites and the more people see, read, and link to what he has written. I am not talking about the authority given to the content itself; that is the core of search. I am talking instead about using the author's behavior and success as a content producer to change where his content ranks for a given search term. It is similar in many ways to what happened in Google's Vince update, where brand became a more important ranking factor. In this case the author and the brand are synonymous, and when the brand is highly valued, its results would, under my hypothesis, get an extra boost in the rankings.

This was an instinct call. While I believed I had data to support the theory, I had no research showing that anyone had actually considered or created an underlying algorithm to measure this phenomenon in universal search.

I thus considered myself twice lucky, while doing my weekly reading on the latest patents, to find one indicating that someone is thinking about the issue of "author rank." On October 29th, Jaya Kawale and Aditya Pal of Yahoo! applied for a patent named "Method and Apparatus for Rating User Generated Content in Search Results." The abstract reads as follows:

Generally, a method and apparatus provides for rating user generated content (UGC) with respect to search engine results. The method and apparatus includes recognizing a UGC data field collected from a web document located at a web location. The method and apparatus calculates: a document goodness factor for the web document; an author rank for an author of the UGC data field; and a location rank for web location. The method and apparatus thereby generates a rating factor for the UGC field based on the document goodness factor, the author rank and the location rank. The method and apparatus also outputs a search result that includes the UGC data field positioned in the search results based on the rating factor.

Let's see if we can't put this into English comprehensible to the common search geek. Kawale and Pal want to collect data on three specific ranking factors and combine them into a single, weighted ranking factor that is then used to influence the rank ordering of what they term "User Generated Content," or UGC. The authors note that the typical ranking factors in search engines today are not suitable for ranking UGC. UGC items are fairly short, they generally have no links to or from them (rendering back-link analysis unhelpful), and spelling mistakes are quite common. Thus a new set of factors is needed to adequately index and rank UGC.

The first issue the patent/algorithm has to deal with is defining what the term UGC includes.  The patent specifically mentions "blogs, groups, public mailing lists, Q & A services, product reviews, message boards, forums and podcasts, among other types of content." The patent does not specifically mention social media sites, but those are clearly implied. 

The second issue is to determine which sites should be scoured for UGC. UGC sites are not always easy to identify. An example would be a directory in which people rate references on a 5-star scale, where that rating is the only user input. Is this site easy to identify as a site with UGC? Not really, but somehow the search engine must decide whether the site falls within its valid universe. Clearly, some mechanism for categorizing sites with UGC needs to exist. While Kawale and Pal use blog search as an example that covers a limited universe of sites, their patent gives no indication of how sites are to be chosen for inclusion in the crawl process.

Now we come to the ranking factors.  The three specific ranking factors proposed by Kawale and Pal are:

  • Document Goodness.  The Document Goodness Factor is based on at least one (and possibly more) of the following attributes of the document itself: a user rating; a frequency of posts before and after the document is posted; a document's contextual affinity with a parent document; a page click/view number for the document; assets in the document; document length; length of a thread in which the document lies; and goodness of a child document. 
  • Author Rank.  The Author Rank is a measure of the author's authority in the social media realm on a subject, and is based on one or more of the following attributes: a number of relevant posted messages; a number of irrelevant posted messages; a total number of root documents posted by the author within a prescribed time period; a total number of replies or comments made by the author; and a number of groups of which the author is a member.
  • Location Rank.  Location Rank is a measure of the authority of the site in the social media realm.  It can be based on one or more of the following attributes: an activity rate in the web location; a number of unique users in the web location; an average document goodness factor of documents in the web location; an average author rank of users in the web location; and an external rank of the web location.
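
To make the Author Rank inputs concrete, here is a minimal Python sketch. The patent lists the attributes but, as far as I can tell, not how they are weighted, so the class name, the field names, the weights, and the squashing constant below are all my own assumptions and not the patent's formula.

```python
from dataclasses import dataclass


@dataclass
class AuthorStats:
    """Hypothetical container for the Author Rank attributes listed above."""
    relevant_posts: int
    irrelevant_posts: int
    root_documents: int   # root documents posted within the prescribed period
    replies: int          # replies or comments made by the author
    groups: int           # groups of which the author is a member


def author_rank(stats: AuthorStats) -> float:
    """Toy score in [0, 1); every weight here is an illustrative assumption."""
    raw = (
        1.0 * stats.relevant_posts
        - 1.0 * stats.irrelevant_posts   # irrelevant posts count against the author
        + 0.5 * stats.root_documents
        + 0.25 * stats.replies
        + 0.25 * stats.groups
    )
    raw = max(raw, 0.0)         # clamp a net-negative record to zero
    return raw / (raw + 10.0)   # squash into [0, 1) with an arbitrary constant


# Made-up numbers for two authors, purely for illustration.
prolific = AuthorStats(relevant_posts=40, irrelevant_posts=2,
                       root_documents=10, replies=20, groups=4)
casual = AuthorStats(relevant_posts=3, irrelevant_posts=1,
                     root_documents=1, replies=2, groups=1)
```

With these invented weights, `author_rank(prolific)` comes out higher than `author_rank(casual)`, which is the behavior my hypothesis would predict. Whether a real engine weights the attributes anything like this is unknown.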

These ranking factors are not used directly as calculated.  They are "normalized" for elements like document length and then combined in some mechanism to create a single UGC ranking factor. 
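
As a rough illustration of "normalize, then combine," here is a small Python sketch. It is not the patent's method: the patent talks about normalizing for elements like document length, whereas I have swapped in simple min-max scaling as a stand-in, and the function names, the weights, and the sample scores are all assumptions of mine.

```python
def min_max_normalize(values):
    """Rescale raw scores to [0, 1] so factors on different scales are comparable."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def ugc_rating(goodness, author, location, weights=(0.4, 0.3, 0.3)):
    """Weighted sum of three already-normalized factors; the weights are assumed."""
    w_g, w_a, w_l = weights
    return w_g * goodness + w_a * author + w_l * location


# Hypothetical raw scores per post: (document goodness, author rank, location rank)
posts = {
    "post_a": (12.0, 3.0, 50.0),
    "post_b": (8.0, 9.0, 20.0),
    "post_c": (2.0, 1.0, 80.0),
}

names = list(posts)
# Normalize each factor across all posts (column-wise), then regroup per post.
columns = [min_max_normalize(col) for col in zip(*posts.values())]
ratings = {name: ugc_rating(*factors) for name, factors in zip(names, zip(*columns))}

# Highest combined rating first, as it would be ordered in the search results.
ranked = sorted(ratings, key=ratings.get, reverse=True)
```

The point of the sketch is only that no single factor decides the order: a post on a strong site by a weak author can still lose to a post with a better balance of all three.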

The main thing to note, and obviously the item that caught my attention, is Author Rank. Note that it has ranking factors that correspond with what I have been hypothesizing exists in the universal search engines. That is to say, search results are ranked not only by the content on the page, but also by the authority of the author who wrote it, as determined by how many posts that author has made, how many sites he or she has made them on, how many groups he or she belongs to, and so on.

Can I say for certain that any algorithm like this has been implemented?  Absolutely not.  But my next task has to be to design an experiment to see if we can detect a whiff of it in the ether.  I'll keep you informed.

