By George Candea in Analytics, Business analytics on July 14, 2008

In a recent interview with Wired magazine, IBM’s Wattenberg mentioned an interesting yardstick for data analytics: compare the data you give to a human with the sum total of the words that human will hear in a lifetime, which is less than 1 TB of text. Incidentally, 1 TB is also how big Gordon Bell thinks a lifetime of recorded daily minutiae would be. Bell now has MyLifeBits, the most extensive personal archive, in which he records all his e-mails, photographs, phone calls, Web pages visited, IM conversations, desktop activity (like which apps he ran and when), health records, books in his library, labels of the bottles of wine he enjoyed, etc. His collection grows at about 1 GB per month, amounting to ~1 TB over a lifetime, and that is roughly the volume of information the human brain is built for.
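The jump from 1 GB per month to ~1 TB per lifetime is easy to check with back-of-the-envelope arithmetic. The sketch below assumes an 80-year lifespan and decimal units (1 TB = 1,000 GB); neither figure comes from the interview, and the function name is purely illustrative.

```python
# Back-of-the-envelope check of the "1 GB/month is about 1 TB/lifetime" estimate.
GB_PER_MONTH = 1       # growth rate quoted for MyLifeBits
MONTHS_PER_YEAR = 12
LIFESPAN_YEARS = 80    # assumption, not a figure from the post
GB_PER_TB = 1000       # assumption: decimal terabytes


def lifetime_archive_tb(gb_per_month=GB_PER_MONTH, years=LIFESPAN_YEARS):
    """Return the archive size in TB after `years` of steady growth."""
    return gb_per_month * MONTHS_PER_YEAR * years / GB_PER_TB


print(lifetime_archive_tb())  # 0.96
```

Under these assumptions the archive lands just under 1 TB, which is consistent with the estimate above.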

Wattenberg offers an interesting perspective: human language is a form of compression (“Twelve words from Voltaire can hold a lifetime of experience”). This works because each phrase carries strong contextual information. MyLifeBits does not capture life experiences themselves; it provides the bits from which those experiences are built, through connections and interpretations.

Herein lies the challenge of data analytics: how to “compress” vast amounts of data into a small volume of information that the human brain can absorb, process, and act upon, and how to leverage context in delivering answers, recommendations, and insights. The Web brought data out into the open; search engines allowed us to ask questions of this data; analytics engines are now starting to allow precise and deep questions to be asked of otherwise overwhelming amounts of data. We, as an industry, are just entering the Neolithic of information history.

We need breakthroughs in visualization and, in particular, in the way we leverage the context of previous answers. Researchers at University College London are looking into how the hippocampus encodes spatial and episodic memories; they are going as far as analyzing fMRI (functional MRI) scans of the brain to extract the memories stored in that brain. In computerized data analytics, we face a comparatively simpler task: record all past answers, then leverage this context to communicate new results more effectively. That means understanding how the current answer relates to the previous one and delivering an interpretation of the delta. That’s where we would like to be, sooner rather than later.
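As a rough illustration of "record past answers, then interpret the delta," here is a minimal sketch. The `AnswerHistory` class, the `describe_delta` helper, and the assumption that each answer is a single number are all hypothetical simplifications, not a description of any particular analytics product.

```python
from dataclasses import dataclass, field


def describe_delta(previous, current):
    """Phrase the current answer relative to the previous one (hypothetical helper)."""
    if previous is None:
        return f"{current:g} (first answer, no earlier context)"
    change = current - previous
    if change == 0:
        return f"{current:g} (unchanged since the last answer)"
    direction = "up" if change > 0 else "down"
    if previous == 0:
        # A percentage change relative to zero is undefined, so report only the amount.
        return f"{current:g} ({direction} {abs(change):g} from 0)"
    pct = abs(change) / abs(previous) * 100
    return f"{current:g} ({direction} {abs(change):g}, {pct:.1f}% vs. the last answer)"


@dataclass
class AnswerHistory:
    """Records every numeric answer per question so new results can be read in context."""
    _answers: dict = field(default_factory=dict)

    def record(self, question, value):
        history = self._answers.setdefault(question, [])
        previous = history[-1] if history else None  # look up context before storing
        history.append(value)
        return describe_delta(previous, value)


history = AnswerHistory()
print(history.record("weekly revenue", 1200))
print(history.record("weekly revenue", 1500))
```

The second call reports the new value together with its change from the first, which is the kind of "interpretation of the delta" described above; real answers would of course be richer than a single number.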
