By Mayank Bawa in Analytics, Blogroll, Business analytics, Business intelligence on May 20, 2008

I’ve remarked in an earlier post that the usage of data is changing and new applications are on the horizon. Over the past few years, we’ve observed or invented quite a few interesting design patterns for business processes that use data.

There are no books or tutorials for these new applications, and they are certainly not being taught in the classrooms of today. So I figured I’d share some of these design patterns on our blog.

Let me start with a design pattern that we internally call “The Automated Feedback Loop”. I didn’t invent it but I’ve seen it being applied successfully at search engines during my research days at Stanford University. I certainly think there is a lot of power that remains to be leveraged from this design principle in other verticals and applications.

Consider a search engine. Users ask keyword queries. The search engine ranks documents that match the queries and provides 10 results to the user. The user clicks one of these results, perhaps comes back and clicks another result, and then does not come back.

How do search engines improve themselves? One key way is by recording the number of times users clicked or ignored a result page. They also record the speed with which a user returned from that page to continue his exploration. The quicker the user returned, the less relevant the page was for the user's query. The relevancy of a page now becomes a factor in the ranking function itself for future queries.
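
To make the loop concrete, here is a minimal sketch (Python, purely illustrative; not a description of how any real search engine is implemented) of click and dwell-time feedback being folded back into a ranking score. The counters, the bounce threshold, and the blending weight alpha are all assumptions.

```python
from collections import defaultdict

# Hypothetical feedback counters, keyed by (query, doc_id).
impressions = defaultdict(int)    # times a result was shown for a query
clicks = defaultdict(int)         # times it was clicked
quick_returns = defaultdict(int)  # clicks where the user bounced straight back

def record_feedback(query, doc_id, clicked, dwell_seconds, bounce_threshold=10):
    """Log one interaction; a fast return counts against the page."""
    impressions[(query, doc_id)] += 1
    if clicked:
        clicks[(query, doc_id)] += 1
        if dwell_seconds < bounce_threshold:
            quick_returns[(query, doc_id)] += 1

def feedback_score(query, doc_id):
    """Click-through rate, discounted by how often users bounced right back."""
    shown = impressions[(query, doc_id)]
    if shown == 0:
        return 0.0
    ctr = clicks[(query, doc_id)] / shown
    bounce_rate = quick_returns[(query, doc_id)] / max(clicks[(query, doc_id)], 1)
    return ctr * (1.0 - bounce_rate)

def rank(query, candidates, base_score, alpha=0.3):
    """Blend a static relevance score with the accumulated user feedback."""
    return sorted(
        candidates,
        key=lambda d: (1 - alpha) * base_score(query, d)
                      + alpha * feedback_score(query, d),
        reverse=True,
    )
```

Every query served both answers the user and quietly improves the next answer.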

[Figure: The Automated Feedback Loop]

So here is an interesting feedback loop. We offered options (search results) to the user, and the user provided us feedback (came back or not) on how good one option was compared to the others. We then used this knowledge to adapt and improve future options. The more the user engages, the more everyone wins!

This same pattern could hold true in a lot of consumer-facing applications that provide consumers with options.

Advertising networks, direct marketing companies, and social networking sites are taking consumer feedback into account. However, this feedback loop in most companies today is manual and not automated. Usually the optimization (adapting to user response) is done by domain experts who read historical reports from their warehouses, build an intuition of user needs and then apply their intuition to build a model that runs everything from marketing campaigns to supply chain processes.

Such a manual feedback loop has two significant drawbacks:

1. The process is expensive: it takes a lot of time and trial and error for humans to become experts, and as a result the experts are hard to find and worth their weight in gold.

2. The process is ineffective: humans can only think about a handful of parameters, so they optimize for the most popular products or processes (e.g., the “Top 5 products” or “Top 10 destinations”). Everything outside this comfort zone is left under-optimized.

Such a narrow focus on optimization is severely limiting. Incorporating only the Top 10 trends into future behavior is akin to a search engine saying that it will optimize for only the top 10 searches of the quarter. Google would be a far less valuable company then, and the world a less engaging place.

I strongly believe that there are rich dividends to be reaped if we can automate the feedback process in more consumer-facing areas. What about hotel selection, airline travel, and e-mail marketing campaigns? E-tailers, news sites (content providers), insurance companies, banks and media sites are all offering the consumer choices for his time and money. Why not build an automated feedback loop into every consumer-facing process to improve the consumer experience? The world will be a better place for both the consumer and the provider!


Comments:
Scott on May 20th, 2008 at 11:00 pm #

I just discovered your company & technology, and I’ve added your blog to my list of feeds. When I read Google’s paper about the Google Filesystem, I realized that their massively parallel data storage solution was the key enabler of their ability to analyze vast amounts of data (they were also quick to acknowledge that RAID is stupid!).

I’m currently in the ETL field, so I’m familiar with the technology you’re discussing… I’ll definitely be paying attention. Congrats on the MySpace implementation – 100TB is a great accomplishment.

MCF on May 21st, 2008 at 9:05 am #

Mayank -

Yes, ideally companies could automate this process. But I think there are a few significant barriers to this type of approach:

(1) Institutional – Managers and firms generally desire control (which is not per se negative). Take the example of product price testing. While we could rely on analytics to drive the pricing levels we show to users (based on demand and other factors), we instead take the analysis offline. In this case, there are qualitative factors that influence our decisions, e.g. overall product strategy, future bundling opportunities, etc. Oh, and distrust in automated technologies that directly affect the top line.

(2) Cost – These types of systems are (a) not easy to architect and (b) not easy to implement. Sure, companies like Google et al. w/ vast resources can see a positive ROI. But most startups (and even established companies) that focus on a vertical or do not have significant scale cannot allocate resources to a project with nebulous returns.

You make the point that the manual/human process is expensive, but do you have data to support that? E.g., the cost of implementing an automated system vs. hiring someone with domain knowledge?

I work in the data group at a startup so I deal w/ these issues on a day-to-day basis. I’d be interested to hear more of your thoughts.

Mayank on May 21st, 2008 at 1:52 pm #

Scott – Thank you.

Mayank on May 21st, 2008 at 2:01 pm #

MCF – good points.

(1) Control: The key here is to separate strategy from operational tactics. For example, e-tailers can have a strategy to give $5 off to customers who have abandoned their shopping cart, delivered either via email or after a period of inactivity. The operational piece is to iterate upon the messaging, the medium of contact (on-site, email), and the threshold at which the offer is extended (a sketch of such a loop follows below). So Control lives with the Intelligent Expert, and Operations lives in nimble processes.

(2) Cost: It doesn’t cost much to set up a feedback loop. It is more a matter of realizing its power and taking steps towards it. I’d not be surprised to learn that Google was already feeding user behavior into its ranking when it was a small company. Yep, it was a small company once, without the billions it makes now.
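
Here is a minimal sketch (Python, purely illustrative) of the operational piece described in point (1): a simple epsilon-greedy loop that picks among expert-approved offer variants and learns from conversions. The variant contents, field names, and epsilon value are all assumptions, not a description of any particular e-tailer's system.

```python
import random

# Hypothetical offer variants the expert has approved; the loop only decides
# which one to try next, based on observed conversions.
variants = [
    {"channel": "email",   "message": "$5 off if you complete your order today"},
    {"channel": "on-site", "message": "Still thinking it over? Here's $5 off"},
]
shown = [0] * len(variants)
converted = [0] * len(variants)

def pick_variant(epsilon=0.1):
    """Mostly exploit the best-performing variant, occasionally explore."""
    if sum(shown) == 0 or random.random() < epsilon:
        return random.randrange(len(variants))
    rates = [converted[i] / shown[i] if shown[i] else 0.0 for i in range(len(variants))]
    return max(range(len(variants)), key=rates.__getitem__)

def record_outcome(variant_index, did_convert):
    """Feed the consumer's response back into the loop."""
    shown[variant_index] += 1
    converted[variant_index] += 1 if did_convert else 0
```

The expert still sets the strategy (which variants are allowed at all); the loop only reallocates traffic among them as feedback accumulates.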

dm on May 23rd, 2008 at 6:02 am #

Mayank,
This sounds like what big companies do today using BI; however, there is still a human component. Someone has to read a report or look at a chart, see the data in PowerPoint, make a decision and then influence the system. Did anyone say backlog? So the premise is wonderful – fast queries on large datasets to optimize a user experience – I want that and so do many others. If I were buying something online, why can’t I see a simple graph displaying my average purchase at that store over the last 12 months, compared to the spending habits of four different segments based on my social network? Sign me up!

Well, there are the latency and cost factors, which Aster seems to have a new solution for. However much we may optimize technology, there is still a tipping point of sorts. It is fair to say the above information is used today for internal decision-making processes to expand share or increase profit. This is done inside the firewall. The question is: when does it become more profitable for companies to externalize this analysis to drive the customer experience?

Thanks – dm

John Donahue on June 3rd, 2008 at 1:31 pm #

Mayank –

Clearly there exists the potential to take what I will call “operational” optimization and bring it to the next level. I mean, relevancy is the name of the game in that whole bag, baby.

You seem to be indicating, and almost professing, the value of behavioral networks. The challenge will come not only in determining how to regress behavior in order to deliver more relevant content, but even more so in one’s ability to track sequential behavior to try to guess what will be of interest next. I believe one of your existing clients could speak on this topic far more than I could, though ;)

-jd

John Donahue on June 3rd, 2008 at 2:18 pm #

Just in case there is curiosity about my post: behavioral networks use feedback loops to determine user behavior, etc.

Sam on August 24th, 2008 at 9:34 pm #

Good ideas. Added links on my site.
