Posts Tagged ‘algorithms’
A slight detour in our discussion of methods of geolocation in mobile for a comment about algorithms.
Many, if not most, companies with any intelligent automation talk about their algorithms. There's an algorithm for optimizing pricing, an algorithm for selecting a target audience, an algorithm for beating the casino at blackjack, etc. You get the point. Marketers almost always want to include the term in their collateral. Why? Because the implication of the word "algorithm" today is that algorithms are hard, manipulate huge amounts of data, and require a lot of complex math – and thus, by inference, that the company is both smarter than the average bear and the owner of unique intellectual property that makes its products or services better than the next guy's.
On my teams, the word algorithm is verboten in describing what we do. Algorithms are tools – nothing more than step-by-step procedures for calculations. And yes, for the technologists in the crowd, I am aware the definition can be a tad more precise. But that is just the point – the word algorithm has been used so much and applied to such a wide range of situations (e.g., "Mark had an algorithm by which he determined which route to take to the office during rush hour") that it has become effectively meaningless.
On my teams, we use the word model, because what we do is model human behavior. We look at data to understand how people act, what they value, what they believe. We then hypothesize what that data means in terms of the motivations and internal beliefs/processes that lead to those behaviors. Basically, we are data-driven virtual psychologists trying to understand what is going on in the 'black box' of the human mind based on what we can see – the inputs into the box and the outputs from it.
The hardest part of our job is not the math or the calculation process, but asking the right questions. As I gain more experience in this arena, I see that this is where most data scientists miss the mark. They are so caught up in the math that they forget about (or don't understand) the real issue. After all, guys (and it is mainly guys) with a highly dominant left brain don't really grok the emotion their work is trying to uncover. This is especially true – and this is not a sexist comment – when we are talking about the emotions of women shoppers. And at the end of the day, it is the sentiments of a human being that we really want to understand.
For example, we see that two people go to a Starbucks every day and both drink three cups of coffee. However, one person goes repeatedly to the same Starbucks, while the other goes to numerous ones around their city throughout the day. What would cause that difference? Hypothesis: one is a stay-at-home mom/worker who takes a run or walk every morning and stops for coffee; the other is a service professional moving between customer offices. Or maybe the second person is a pizza delivery person. Each of these hypotheses is tested with various calculations against the data and is either validated or not. At that point we have a guess at who each person might be.
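The hypothesis-testing step above can be sketched as a toy classifier. Everything here – the segment labels, the distinct-store threshold, and the shape of the visit records – is an illustrative assumption, not a real production model:

```python
from collections import Counter

def classify_visit_pattern(visits):
    """Guess a behavioral segment from location variety.

    `visits` is a list of (person_id, store_id) daily visit records.
    The labels and the threshold of 3 stores are illustrative only.
    """
    stores = Counter(store for _, store in visits)
    distinct = len(stores)
    if distinct == 1:
        return "routine-bound (e.g., stay-at-home parent on a fixed walk)"
    elif distinct >= 3:
        return "mobile professional (e.g., service rep between client sites)"
    return "mixed pattern"

# Three weeks of data for two hypothetical people:
same_store = [("a", "sb_main_st")] * 21                  # always the same store
many_stores = [("b", f"sb_{i % 5}") for i in range(21)]  # rotates among 5 stores
```

A real model would of course weigh far more signals (time of day, dwell time, movement between visits), but the structure – observable behavior in, hypothesized segment out – is the same.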
Now we can look at other data and start making predictions about their attitudes and values. Let's say the person has been identified as a stay-at-home mom. What does that tell us about them? Well, we might guess they love being a parent enough to sacrifice some part of their career to have time with their kids. Alternatively, they might be driven by the fact that their spouse makes more money, so they have to be the member of the couple who makes a career sacrifice for the financial welfare of the family. Which means they are rational, but also willing (with regrets) to subsume their own needs to those of others. Either way, they might be frustrated with having to stay at home, and be responsive to an offer from a brand that shows it recognizes their frustration and offers them something uniquely for them and not the other members of the family. A spa day, for example. Or at least time to create their own relaxation time at home – because they can't go to a spa and leave the kids at home alone (because… they can't afford a nanny?).
So now what? I create a model that tries to capture and predict the behavior of someone with those attitudes and values. Yes, there is math underneath it. Yes, there is a step-by-step procedure – an algorithm – running underneath it. But I couldn't care less about the math – that's a tool. The important thing is to focus on how we think the black box of the human soul is working.
Using this model, I would predict a certain type of response to an ad that reflects these values, based on prior response rates to similar ads – maybe not targeted to exactly the same psychology, but let's say similar (without defining what 'similar' means in this case). Now we run the ad and see what happens. If the response rate exceeds my projected threshold, I will assume that my model of what is happening in the black box is right; if not, we go back to the drawing board.
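The validation step is the simplest part of the whole loop – compare the observed response rate to the model's projection. A minimal sketch, with all names and numbers purely illustrative:

```python
def model_validated(responses, impressions, projected_rate):
    """Return True if the observed ad response rate meets or exceeds
    the rate the model projected for this audience segment.

    A real campaign would also check statistical significance before
    declaring victory; this sketch only compares the raw rates.
    """
    observed = responses / impressions
    return observed >= projected_rate

# Say the model projected a 2% response rate for the targeted segment:
model_validated(260, 10_000, 0.02)  # 2.6% observed -> model survives
model_validated(100, 10_000, 0.02)  # 1.0% observed -> back to the drawing board
```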
So have you seen the latest Swiffer ads? Swiffer's value proposition: spend less time housecleaning and gain more time for yourself. In the ad, mom uses that gift of a home spa day that's been sitting on the shelf. When her kids come into the bathroom looking for her, she turns to reveal a cucumber mask in progress. The kids scream in fright and run out. Mom is not happy to have scared her kids, but she is also a bit smug because, for once, her needs came before theirs. I would bet this ad succeeds at engaging the audience just described because it appeals to their sentiments.
Let’s be clear, though. Success does not mean I really know what is happening in the black box – how the gears are arranged, what causes them to move, how fast they move. It’s just that whatever model I have created parallels the way the mechanics of the black box of a group of people work, so I assume I have got the model right. But later data may prove me wrong and, with further modeling and using better algorithms as tools, I may get better and better at paralleling the real psychology. But this is working at a very high level on a group of people. I can never really know what is going on in the black box of any individual’s mind, and even within the group, it varies from person to person.
The algorithm is not the model. It is a tool we use to build a model. Nothing more; nothing less. That’s why the term is verboten in my groups. Our focus must always be on the person, not the tool, or else we lose sight of our customers and can only see as far as our computer screens.
But of course, I don’t want to ignore the previous Vincent update – as that was the connection to post #1.
Orion first. Actually, Google did not announce "Orion" – which is a search technology it purchased in 2006, along with its college-student developer Ori Allon. But my guess is that thanks to Greg Sterling's new article containing that title, the term "Orion Release" will stick. Here's how Danny Sullivan described the technology back in April 2006:
It sounds like Allon mainly developed an algorithm useful in pulling out better summaries of web pages. In other words, if you did a search, you’d be likely to get back extracted sections of pages most relevant to your query.
Ori himself wrote the following in his press release:
Orion finds pages where the content is about a topic strongly related to the key word. It then returns a section of the page, and lists other topics related to the key word so the user can pick the most relevant.
Google actually announced two changes:
Longer Snippets. When users input queries of more than three words, Google results will now contain more lines of text in order to provide more information and context. As a reminder, a snippet is a search result that starts with a dark blue title and is followed by a few lines of text. Google's research must have shown that regular-length snippets were not giving searchers enough information to form a clear preference for a result based on their longer search term – after all, Google's stated intent is to provide enhanced information that improves the searcher's ability to determine the relevance of items listed in the SERPs.
Having said this, I don't see any difference. My slav…. I mean my 12-yo son (who has been doing keyword analysis since he was 10, so no slouch at this) ran ten tests on Google to see if we could find a difference (I won't detail all the one- and two-word vs. 3+ word combinations we tried – if you want the list, leave a comment or send a tweet to arthurofsun and I will forward it to you). But shown below are the results for "France Travel" vs. "France Travel Guides for Northern France":
As you can see, there is absolutely no difference in snippet length for the two searches - and this was universally true across all the searches we ran. So I’m not sure – I wonder if Ori Allon, who wrote the post, could help us out on this one.
Also, I am somewhat confused. If you type in more keywords, the search engine has more information by which to determine the relevance of a result – so why would I need a longer snippet there? Where I really need more information is with a short search of three or fewer keywords, which will return a broad set of results that I will need to filter based on the information contained in a longer snippet.
Enhanced Search Associations. The bigger enhancement – and the one that seems most likely to derive from the original Orion technology – is enhanced associations between keywords. Basically, if you type in a keyword – Ori uses the example "principles of physics" – then the new algorithms understand that there are other ideas related to it I may be interested in, like "Big Bang" or "Special Relativity." The way Google has implemented this is to put a set of related keywords at the bottom of the first SERP, which you may click on. When you click, it returns a new set of search results based on the keyword you clicked. Why at the bottom of the first SERP? My hypothesis would be that if the searcher has gone to the bottom of the page, it means they haven't found what they are looking for. So this is the right place in the user experience to prompt them with related keywords that they may find more relevant to the content they are seeking.
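To make the idea of keyword association concrete, here is a toy sketch based on simple document co-occurrence counts. This is emphatically not Google's actual method – Orion's internals are secret sauce – just the simplest possible illustration of "terms that travel together":

```python
from collections import Counter

def related_terms(docs, query_term, top_n=3):
    """Rank terms by how often they co-occur with `query_term`
    across a corpus of documents.

    A toy illustration of keyword association; a real system would
    use stemming, phrase detection, and far better relevance scoring.
    """
    co_counts = Counter()
    for doc in docs:
        terms = set(doc.lower().split())
        if query_term in terms:
            co_counts.update(terms - {query_term})
    return [term for term, _ in co_counts.most_common(top_n)]

# Tiny made-up corpus:
docs = [
    "physics relativity gravity",
    "physics relativity quantum",
    "cooking pasta sauce",
]
related_terms(docs, "physics")  # "relativity" ranks first (co-occurs twice)
```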
From my perspective, this feels like the “People who liked this item also bought…” widget on most comparison shopping sites (which I know something about, having been the head of marketing for SHOP.COM.) I’m not saying there is anything wrong with this – I’m just trying to make an analogy to the type of user experience Google is trying to create.
Shown below is an example of enhanced search associations from a search on the broad term "credit derivatives in the USA":
As I expected, the term "credit default swaps" – which is the major form of credit derivative – shows as an associated keyword. What I did not see in the list – and was surprised by – was any reference to the International Swaps and Derivatives Association (ISDA), which is the organization that has developed the standards and rules by which most derivatives are created. It does, however, show up for the search on the keyword "credit default swap." I'd be curious to understand just how the algorithm has been tuned to make trade-offs between broad concepts (i.e., credit derivatives, which is a category) and very focused concepts (i.e., credit default swap, which is a specific product). Maybe I can get Ori to opine on that as well, but most likely that comes under the category of secret sauce.
Anyway, fascinating and it certainly shows that Google continues to evolve the state of IR.
Well, I’ll just have to leave the Vincent release until tomorrow. Something else happened this morning I need to do a quick entry about. Sigh…..