How to Answer Analytic Questions
By George Candea in Analytics, Business analytics on July 14, 2008
   

In a recent interview with Wired magazine, IBM’s Wattenberg mentioned an interesting yardstick for data analytics: compare the data you give to a human to the sum total of the words that human will hear in a lifetime, which is less than 1 TB of text. Incidentally, this 1 TB number is how big Gordon Bell thinks a lifetime of recording daily minutiae would be… Bell now has MyLifeBits, the most extensive personal archive, in which he records all his e-mails, photographs, phone calls, Web pages visited, IM conversations, desktop activity (like which apps he ran and when), health records, books in his library, labels of the bottles of wine he enjoyed, etc. His collection grows at about 1 GB / month, amounting to ~1 TB for a lifetime… and that’s what the human brain is built for.

Wattenberg offers an interesting perspective: human language is a form of compression (“Twelve words from Voltaire can hold a lifetime of experience”), because of the strong contextual information carried by each phrase. MyLifeBits does not capture life experiences themselves; it provides the bits from which those experiences are built, through connections and interpretations.

Herein lies the challenge of data analytics: how to “compress” vast amounts of data into a small volume of information that the human brain can absorb, process, and act upon, and how to leverage context in delivering answers, recommendations, and insights. The Web brought data out into the open; search engines allowed us to ask questions of that data; analytics engines are now starting to allow precise and deep questions to be asked of otherwise overwhelming amounts of data. We, as an industry, are just entering the Neolithic of information history.

We need breakthroughs in visualization and, in particular, in the way we leverage the context of previous answers. Researchers at University College London are studying how the hippocampus encodes spatial and episodic memories, going as far as analyzing fMRI (functional MRI) scans to extract the memories stored in a brain. In computerized data analytics we face a comparatively simple task: record all past answers and then leverage this context to communicate new results more effectively. Understand how the current answer relates to the previous one, and deliver an interpretation of the delta. That’s where we would like to be, sooner rather than later.



By Tasso Argyros in Analytics, Analytics tech, Blogroll, Database, Scalability on June 17, 2008
   

 

I’m delighted to bring a guest post to our blog this week. David Cheriton, one of Aster Data Systems’ angel investors, leads the Distributed Systems Group at Stanford University and is known for making some smart investments. Below is what David has to say about the need to address the network interconnect in MPP systems - we hope this spurs some interesting conversation!

“A cluster of commodity computer nodes clearly offers a very cost-effective means of tackling demanding large-scale applications such as data mining over large data sets. However, most applications require substantial communication. For example, consider a query that requires a join between three tables that share no common key to partition on (non-parallelizable query), a frequent case in analytics. In conventional architectures, such operations need to move huge amounts of data among different nodes and depend on the interconnect to deliver adequate performance.

The cost and performance impact of the interconnect required for the cluster to support this communication is often an unpleasant surprise, particularly without careful design of the cluster software. Yes, the cost of 10G Ethernet is coming down, both for switches and for NICs, and the IEEE is starting work on 100G Ethernet. However, the interconnect is, and will remain, an issue for several reasons.

First, in a parallelizable query you need to get data from one node to several others. The bandwidth out of this one node is limited by its NIC bandwidth, Bn. In a uniformly configured cluster each of the receiving nodes has the same NIC bandwidth Bn, so with K receivers each one receives at most Bn/K. Moreover, the actual performance of the cluster can be limited by data hotspots, where the demand for data from a given node far exceeds its NIC and/or memory bandwidth.

The inverse problem, often called the incast problem, arises when K nodes need to send data to a single node. Each can send at bandwidth Bn, for a total offered load of K*Bn, but the target node can only receive at Bn, i.e., 1/K of the offered load. The result can be congestion, packet drops from overflowing packet queues, and TCP timeouts and backoff, resulting in dramatically lower goodput than even Bn. I say “dramatically” because performance can collapse to 1/10 of what is expected, or worse, due to the packet drops, timeouts and retries that occur at the TCP level. In systems with as few as 10 nodes, connected via a Gigabit Ethernet interconnect, performance can deteriorate to under 10 MB per second per node! For larger numbers of nodes the problem becomes even worse.
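To make the arithmetic above concrete, here is a minimal back-of-envelope sketch in Python; the NIC speed and node counts are illustrative assumptions, not measurements of any particular cluster.

```python
# Back-of-envelope sketch of the fan-out and incast arithmetic described above.
# Numbers are illustrative assumptions, not measurements.

def fan_out_rate(nic_gbps, receivers):
    """One sender fanning out to K receivers: each receiver gets at most Bn/K."""
    return nic_gbps / receivers

def incast_offered_load(nic_gbps, senders):
    """K senders targeting one receiver offer K*Bn, but the target accepts only Bn."""
    return nic_gbps * senders

bn = 1.0  # Gbps per NIC (Gigabit Ethernet)
for k in (10, 50, 100):
    offered = incast_offered_load(bn, k)
    print(f"K={k:3d}: fan-out per receiver = {fan_out_rate(bn, k):.3f} Gbps, "
          f"incast offered load = {offered:.0f} Gbps vs. {bn:.0f} Gbps receive capacity "
          f"({offered / bn:.0f}x oversubscribed)")
```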

Phanishayee et al. have studied the incast problem. They show that TCP tuning does not help significantly. They observe that significantly larger switch buffering helps up to some scale, but that drives up the cost of the switches substantially. Besides some form of link-level flow control (which suffers from head-of-line blocking, is not generally available, and usually does not work between switches), the other solution is simply adding more NICs, or faster NICs, per node to increase the send and receive bandwidth.

Moreover, with k NICs per node, an N-node cluster now requires k*N ports, and thus a larger network to interconnect all the nodes. Large, fast networks are an engineering and operations challenge. The simplest switch is a single-chip shared-memory switch, which is limited by the memory and memory bandwidth available for buffering. For instance, a 24-port 10 Gbps switch requires roughly 30 Gbytes/sec of memory bandwidth, forcing the use of on-chip memory or off-chip SRAM - in either case rather limited in size, which aggravates the TCP performance problems. This memory-bandwidth demand tends to limit the size of shared-memory switches.

The next step up is a crossbar switch. In effect, each line card is a shared-memory switch, possibly splitting the send and receive sides, connected by a special interconnect, the crossbar. The cost per port increases because of the crossbar interconnect, the overall complexity of the system, and the lower volumes in which large-scale switches ship. In particular, each line card needs to solve the same congestion problems as above when sending through the interconnect to other line cards.

Scaling larger means building a multi-switch network. The conventional hierarchical multi-switch network introduces bottlenecks within the network itself, such as from the top-of-rack switch to the inter-rack switch, leading to packet loss inside the network. Various groups have proposed building Clos networks out of commodity GbE switches, but these require specialized routing support, complex configuration, and a larger number of components, leading to more failures, more complex failure behavior, and extra cost.

Overall, you can regard the problem as being k nodes of a cluster needing to read from and write to the memory of the other nodes. The network is just an intermediary trying to handle this aggregate of read and write traffic across all the nodes in the cluster, thus requiring expensive high-speed buffering because these actions are asynchronous/streamed. Given this aggregate demand, faster processors and faster NICs just make the challenge greater.

In summary, MPP databases are more MPP than databases, in the sense that for complex distributed queries the network (the major bottleneck in MPP systems) is a much harder problem than disk I/O (the major bottleneck in conventional database systems). Smart software that minimizes demands on the network and avoids hotspots and incast can achieve far more cost-efficient scaling of the cluster, and avoids dependence on complex (Clos) or non-sweet-spot networking technologies (i.e., non-Ethernet). It’s a great investment in software and processor cycles when the network is intrinsically a critical resource. In some sense, smart software in the nodes is the ultimate end-to-end solution, achieving good application performance by minimizing its dependence on the intermediary, the interconnect.”

- Prof. David Cheriton, Computer Science Dept., Stanford University

 



By Mayank Bawa in Analytics, Blogroll, Business analytics, Interactive marketing on June 11, 2008
   

I had the opportunity to work closely with Anand Rajaraman while at Stanford University and now at our company. Anand teaches the Data Mining class at Stanford as well, and recently he did a very instructive post on the observation that efficient algorithms on more data usually beat complex algorithms on small data. He followed it up with an elaboration post. Google also seems to believe in a similar philosophy.

I want to build upon that observation here. If you haven’t read the posts, do read them first. It is well-worth the time!

I propose that two forces are at work that help simple algorithms on big data beat complex algorithms on small data:

  1. The freedom of big data allows us to bring in related datasets that provide contextual richness.
  2. Simple algorithms allow us to identify small nuances by leveraging contextual richness in the data.

Let me expand my proposal using Internet Advertising Networks as an example.

Advertising networks essentially make a guess about a user’s intent and present an advertisement (creative) to the consumer. If the user is indeed interested, the user clicks through the creative to learn more.

Advertising networks are typically paid today on a CPC (Cost-Per-Click) model. There are stronger variants, CPL (Cost-Per-Lead) and CPA (Cost-Per-Acquisition), but the discussion applies to those just as it does to the simpler CPC model. There is also a simpler variant, CPM (cost per thousand impressions), but an advertiser ends up effectively computing a CPC anyway by tracking click-through rates for the money spent via the CPM model. The CPC model dictates that advertising networks do not make money unless the user clicks on a creative.
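As a quick illustration of that bookkeeping (all numbers below are hypothetical), an advertiser buying on CPM can back out an effective CPC from the observed click-through rate:

```python
# Hypothetical back-of-envelope: deriving an effective CPC from CPM spend
# and an observed click-through rate (CTR).

def effective_cpc(cpm_dollars, ctr):
    """CPM is the cost of 1,000 impressions; 1,000 impressions yield 1000*ctr clicks."""
    return cpm_dollars / (1000 * ctr)

# e.g., a $2.50 CPM buy at a 0.8% click-through rate
print(f"effective CPC = ${effective_cpc(2.50, 0.008):.2f}")  # about $0.31 per click
```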

Today, the best advertising networks have a click-through rate of less than 1%. In other words, advertising networks correctly interpret a user’s intentions 1% of the time; 99% of the time they are ineffective!

I find this statistic immensely liberating: it shows that even if we are correct just 1% of the time, the rewards are significant. ☺

Why is the click-through rate so low? I think it is because human behavior is difficult to predict. Even sophisticated algorithms (which are computationally practical only on small datasets) do a bad job of predicting human behavior. It is much more powerful to think of efficient algorithms that execute across larger, diverse datasets and exploit the richness inherent in the context to enable a higher click-through rate.

I’ve observed people in the field sample behavioral data to reduce their operating dataset. I submit that a 1% sample will lose the nuances and the context that can cause an uplift and growth in revenue.

For example, a content media site may find that 2% of the users who come in to read Sports stay on to read Finance articles. A 1% sample is certain to reduce this 2% population trait to a statistically insignificant portion of the sample. Should we or should we not derive this insight, to identify and engage the 2% by serving them better content? Similarly, an Internet retailer may find that 2% of the users who come in to buy a flat-panel TV have also bought video games recently. Should we or should we not act on this insight, to identify and engage the 2% by offering them better deals on games? Given that games are a high-margin product, the net effect on revenue via cross-sell could be higher than 2% in dollars. (The short sampling sketch at the end of this post makes this concrete.)

We often want to develop an algorithm that is provably correct under all circumstances. In a bid to satisfy this urge, we restrict our datasets to find a statistically significant model that is a good predictor. I associate that with a purist way of algorithm development that was drilled into us at school.

Anand’s observation is a call for practitioners to think simple, use context, and come up with rules that segment and win locally. It will be faster to develop, test and win on simple heuristics than to wait for a perfect “Aha!” that explains all things human.
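Here is that sampling sketch - a toy Python simulation with made-up population sizes, showing how few members of a 2% segment survive a 1% sample:

```python
# Toy illustration of the sampling argument above: in a 1% random sample,
# a 2% behavioral segment survives only as a handful of users, making it
# hard to detect or act on. All numbers here are invented.
import random

random.seed(7)

tv_buyers = 50_000     # assumed flat-panel TV buyers in the full dataset
cross_rate = 0.02      # 2% of them also bought video games recently
sample_rate = 0.01     # the analyst works on a 1% sample

full_cross = int(tv_buyers * cross_rate)             # 1,000 cross-sell candidates
buyers = [i < full_cross for i in range(tv_buyers)]  # True = also bought games

for trial in range(5):
    sample = random.sample(buyers, int(tv_buyers * sample_rate))
    print(f"trial {trial}: {sum(sample)} of {full_cross} game buyers "
          f"remain in a {len(sample)}-user sample")
```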



Visibility vs. Control
By George Candea in Administration, Blogroll, Manageability on May 27, 2008
   

When developing a system that is expected to take care of itself (self-managing, autonomic, etc.) the discussion of how much control to give users over the details of the system inevitably comes up. There is, however, a clear line between visibility and control.

Users want control primarily because they don’t have visibility into the reasons for a system’s behavior. Take, for instance, a database whose performance has suddenly dropped 3x. This can be due to someone running a crazy query, or another process on the same machine updating a filesystem index, or the battery of a RAID controller’s cache having run out and forcing all updates to be write-through, and so on. To figure out what is going on, the DBA would normally start poking around with ps, vmstat, mdadm, etc., and for this (s)he needs control. However, what the DBA really wants is visibility into the cause of the slowdown… the control needed to remedy the situation is minimal (kill a query, reboot, replace a battery, etc.).

To provide good visibility, one ought to expose why the system is doing something, not how it is doing it. Any system that self-manages must be able to explain itself when asked to do so. If a DB is slow, it should be able to provide a profile of the in-flight queries. If a clustered system reboots nodes frequently, it should be able to tell whether it is rebooting due to the same cause or a different one every time. If a node is taken offline, the system should be able to tell that it is because of a suspected failure of disk device /dev/sdc1 on that node. And so on… this is visibility.

We do see, however, very many systems and products that substitute control for visibility, for example by providing root access on the machines running the system. I believe this is mainly because the engineers themselves do not understand very well how the “how” turns into the “why”, i.e., they do not understand all the different paths that lead to poor system behavior.

Choosing to expose the why instead of the how influences the control knobs provided to users and administrators. Retrofitting complex systems to provide visibility instead of control is hard, so this really needs to be done from day one. What’s more, once customers get used to control, it becomes difficult for them to give it up in exchange for visibility, so the product must maintain the user-accessible controls for backward compatibility. This allows administrators to introduce unpredictable causes of system behavior (e.g., by allowing RAID recovery to be triggered at arbitrary times), which makes self-management that much harder and less accurate. Hence the need to build visibility in from day one and to minimize unnecessary control.



By Mayank Bawa in Analytics, Blogroll, Business analytics, Business intelligence on May 20, 2008
   

I’ve remarked in an earlier post that the usage of data is changing and new applications are on the horizon. Over the past few years, we’ve observed or invented quite a few interesting design patterns for business processes that use data.

There are no books or tutorials for these new applications, and they are certainly not being taught in the classrooms of today. So I figured I’d share some of these design patterns on our blog.

Let me start with a design pattern that we internally call “The Automated Feedback Loop”. I didn’t invent it but I’ve seen it being applied successfully at search engines during my research days at Stanford University. I certainly think there is a lot of power that remains to be leveraged from this design principle in other verticals and applications.

Consider a search engine. Users ask keyword queries. The search engine ranks documents that match the queries and provides 10 results to the user. The user clicks one of these results, perhaps comes back and clicks another result, and then does not come back.

How do search engines improve themselves? One key way is by recording the number of times users clicked on or ignored a result page. They also record how quickly a user returned from that page to continue exploring: the quicker the user returned, the less relevant the page was for the user’s query. The relevancy of a page then becomes a factor in the ranking function itself for future queries.

So here is an interesting feedback loop: we offered options (search results) to the user, and the user provided us feedback (came back or not) on how good one option was compared to the others. We then used this knowledge to adapt and improve future options. The more the user engages, the more everyone wins!
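A minimal sketch of this loop in Python; the scoring rules and dwell-time threshold below are made-up assumptions, intended only to show the shape of the pattern.

```python
# A toy sketch of the automated feedback loop: serve ranked options, record
# user feedback (click + dwell time), and fold it back into future rankings.
# The scoring rules and thresholds are invented for illustration.
from collections import defaultdict

relevance = defaultdict(float)  # learned feedback signal per (query, page)

def rank(query, candidates):
    # A real engine would combine this with its base ranking function;
    # here the learned signal is the whole score, to keep the sketch small.
    return sorted(candidates, key=lambda page: relevance[(query, page)], reverse=True)

def record_feedback(query, page, clicked, dwell_seconds):
    if not clicked:
        relevance[(query, page)] -= 0.1   # shown but ignored
    elif dwell_seconds < 10:
        relevance[(query, page)] -= 0.5   # clicked, then bounced right back
    else:
        relevance[(query, page)] += 1.0   # clicked and stayed: relevant

# One iteration of the loop: serve, observe, adapt.
pages = ["pageA", "pageB", "pageC"]
results = rank("flat panel tv", pages)
record_feedback("flat panel tv", results[0], clicked=True, dwell_seconds=4)    # quick bounce
record_feedback("flat panel tv", results[1], clicked=True, dwell_seconds=120)  # engaged
print(rank("flat panel tv", pages))  # the page users engaged with now ranks first
```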

This same pattern could hold true in a lot of consumer-facing applications that provide consumers with options.

Advertising networks, direct marketing companies, and social networking sites are taking consumer feedback into account. However, in most companies today this feedback loop is manual, not automated. Usually the optimization (adapting to user response) is done by domain experts who read historical reports from their warehouses, build an intuition of user needs, and then apply that intuition to build a model that runs everything from marketing campaigns to supply-chain processes.

Such a manual feedback loop has two significant drawbacks:

1. The process is expensive: it takes a lot of time, trial and error for humans to become experts, and as a result the experts are hard to find and worth their weight in gold.

2. The process is ineffective: humans can only reason about a handful of parameters, so they optimize for the most popular products or processes (e.g., “Top 5 products” or “Top 10 destinations”). Everything outside this comfort zone is left under-optimized.

Such a narrow focus on optimization is severely limiting. Incorporating only the Top 10 trends into future behavior is akin to a search engine saying that it will optimize for only the top 10 searches of the quarter. I am sure Google would be a far less valuable company then, and the world a less engaging place.

I strongly believe that there are rich dividends to be reaped if we can automate the feedback process in more consumer-facing areas. What about hotel selection, airline travel, and e-mail marketing campaigns? E-tailers, news (content providers), insurance, banks and media sites are all offering the consumer a choice for his time and money. Why not instill an automated feedback loop in all consumer-facing processes to improve consumer experience? The world will be a better place for both the consumer and the provider!



By Mayank Bawa in Blogroll, Statements on May 20, 2008
   

I am glad to share the news that one of our first customers, MySpace, has scaled their Aster nCluster enterprise data warehouse to more than 100 Terabytes of actual data.

It is not easy to cross the 100 TB barrier, especially when loads happen continuously and queries are relentless, as they are at MySpace.com.

Hala, Richard, Dan, Jim, Allen, and Aber, you have been awesome partners for us! It has been a great experience for Aster to work with you and we can see the reasons behind MySpace’s continued success. Your team is amazingly strong and capable and there is a clear sense of purpose. Tasso and I often remark that we need to replicate that culture in our company as we grow. At the end of the day, it is the culture and the strength of a team that makes a company successful.

And to everyone at Aster, you have been great from Day 1. It is impressive how a fresh perspective and a clean architecture can solve a tough technical challenge!

Thank you. And I wish everyone as much fun in the coming days!



By Tasso Argyros in Blogroll, Database, Manageability, Scalability on May 19, 2008
   

One of the most interesting, complex and perhaps overused terms in data analytics today is scalability. People constantly talk about “scaling problems” and “scalable solutions.” But what really makes a data analytics system “scalable”? Unfortunately, despite its importance, this question is rarely discussed so I wanted to post my thoughts here.

Any good definition of scalability needs to be multi-dimensional; there is no single system property that is enough to make a data analytics system scalable. But what are the dimensions that separate scalable from non-scalable systems? In my opinion the three most important are (a) data volume, (b) analytical power, and (c) manageability. Let me offer a couple of thoughts on each.

(a) Data Volume. This is definitely an important scale dimension because enterprises today generate huge amounts of data. For a shared-nothing MPP system this means supporting a sufficient number of nodes to accommodate the available data. Evolution in disk and server technology has made it possible to store tens of terabytes of data per node, so this scale dimension alone can be achieved even with a relatively small number of nodes.

(b) Analytical Power. This scale dimension is just as important as data volume, because storing large amounts of data by itself has little benefit; one needs to be able to extract deep insights from it to provide real business value. For non-trivial queries in a shared-nothing environment this presents two requirements. First, the system needs to accommodate a large number of nodes so that it has adequate processing power to execute complex analytics. Second, the system needs to scale its performance linearly as more nodes are added. The latter is particularly hard for queries that involve processing distributed state, such as distributed joins: really intelligent algorithms have to be in place, or else interconnect bottlenecks simply kill performance and the system is not truly scalable.

(c) Manageability. Scalability across the manageability dimension means that a system can scale up and keep operating at a large scale without armies of administrators or downtime. For an MPP architecture this translates to seamless incremental scalability, scalable replication and failover, and little if any requirement for human intervention during management operations. Despite popular belief, we believe manageability can be measured and we need to take such metrics into account when characterizing a system as scalable or non-scalable.

At Aster, we focus on building systems that scale across all dimensions. We believe that even if one dimension is missing our products do not deserve to be called scalable. And since this is such an important issue, I’ll be looking forward to more discussion around it!



By George Candea in Administration, Blogroll, Database, Manageability on May 17, 2008
   

I want databases that are as easy to manage as Web servers.

IT operations account for 50%-80% of today’s IT budgets and amount to tens of billions of dollars yearly(1). Poor manageability hurts the bottom line and reduces reliability, availability, and security.

Stateless applications, like Web servers, require little configuration, can be scaled through mere replication, and are reboot-friendly. I want to do that with databases too. But the way they’re built today, the number of knobs is overwhelming: the most popular DB has 220 initialization parameters and 1,477 tables of system parameters, while its “Administrator’s Guide” is 875 pages long(2).

What worries me is an impending manageability crisis, as large data repositories are proliferating at an astonishing pace… in 2003, large Internet services were already collecting >1 TB of clickstream data per day(3). Five years later we’re encountering businesses that want SQL databases to store >1 PB of data. PB-scale databases are by necessity distributed, since no DB can scale vertically to 1 PB; now imagine taking notoriously hard-to-manage single-node databases and distributing them…

How does one build a DB as easy to manage as a Web server? All real engineering disciplines use metrics to quantitatively measure progress toward a design goal, to evaluate how different design decisions impact the desired system property.

We ought to have a manageability benchmark, and the place to start is a concrete metric for manageability, one that is simple, intuitive, and applicable to a wide range of systems. Such a metric would not just measure; it would also guide developers in making day-to-day choices. It should tell engineers how close their system is to the manageability target. It should enable IT managers to evaluate and compare systems to each other. It should lay down a new criterion for competing in the market.

Here’s a first thought…

I think of system management as a collection of tasks the administrators have to perform to keep a system running in good condition (e.g., deployment, configuration, upgrades, tuning, backup, failure recovery). The complexity of a task is roughly proportional to the number of atomic steps Steps_i required to complete task i; the larger Steps_i, the more inter-step intervals, so the greater the opportunity for the admin to mess up. Installing an operating system, for example, has Steps_install in the 10s or 100s.

Efficiency of management operations can be approximated by the time T_i in seconds it takes the system to complete task i; the larger T_i, the greater the opportunity for unrelated failures to impact the atomicity of the management operation. For a trouble-free OS install, T_install is probably around 1-3 hours.

If N_i represents the number of times task i is performed during a time interval Δ_evaluation (e.g., 1 year) and N_total = N_1 + … + N_n, then task i’s relative frequency of occurrence is Frequency_i = N_i / N_total. Typical values for Frequency_i can be derived empirically or extracted from surveys(4),(5),(6). The less frequently one needs to manage a system, the better.

Manageability can now be expressed with a formula, with larger values of manageability being better:

manageability = Δ_evaluation / Σ_i ( Frequency_i × Steps_i^α × T_i )

This says that, the more frequently a system needs to be “managed,” the poorer its manageability. The longer each step takes, the poorer the manageability. The more steps involved in each management action, the poorer the manageability. The longer the evaluation interval, the better the manageability, because observing a system longer increases the confidence in the “measurement.”

While complexity and efficiency are system-specific, their relative importance is actually specific to a customer: an improvement in complexity may be preferred over an improvement in efficiency or vice-versa; this differentiated weighting is captured by α. I would expect α>2 in general, because having fewer, atomic steps is valued more from a manageability perspective than reducing task duration, since the former reduces the risk of expensive human mistakes and training costs, while the latter relates almost exclusively to service-level agreements.
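To get a feel for how such a metric might behave, here is a toy Python computation using a formula of the shape sketched above; the task lists, step counts, durations, and the choice of α are all invented for illustration.

```python
# Toy computation of a manageability metric of the form
#   manageability = evaluation_interval / sum_i(Frequency_i * Steps_i**alpha * T_i)
# All task data below (counts per year, steps, durations) are invented examples.

def manageability(tasks, alpha, evaluation_interval_s):
    """tasks: list of (N_i = times performed per interval, Steps_i, T_i in seconds)."""
    n_total = sum(n for n, _, _ in tasks)
    penalty = sum((n / n_total) * (steps ** alpha) * seconds
                  for n, steps, seconds in tasks)
    return evaluation_interval_s / penalty

YEAR = 365 * 24 * 3600.0

web_server = [(4, 3, 300),     # deploy/upgrade: few steps, quick
              (12, 1, 60)]     # restart: a single step
database   = [(4, 40, 7200),   # upgrade: many steps, long
              (52, 10, 1800),  # weekly tuning passes
              (2, 25, 14400)]  # failure recovery

for name, tasks in (("web server", web_server), ("database", database)):
    score = manageability(tasks, alpha=2.0, evaluation_interval_s=YEAR)
    print(f"{name:10s}: manageability = {score:,.0f}")
# The web server scores orders of magnitude higher under this toy metric,
# matching the intuition that stateless services are easier to manage.
```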

So would this metric work? Is there a simpler one that’s usable?



A taste of something new
By Mayank Bawa in Analytics, Analytics tech, Blogroll, Statements on May 15, 2008
   

Have you ever discovered a wonderful little restaurant off the beaten path? You know the kind of place. It’s not part of some corporate conglomerate. They don’t advertise. The food is fresh and the service is perfect – it feels like your own private oasis. Keeping it to yourself would just be wrong (even if you selfishly don’t want the place to get too crowded).

We’re happy to see a similar anticipation and word-of-mouth about some new ideas Aster is bringing to the data analytics market. Seems that good news is just too hard to keep to yourself.

We’re serving up something unique that we’ve been preparing for several years now. We’re just as excited to be bringing you this fresh approach.



How can I analyze all of this data?
By Tasso Argyros in Analytics, Analytics tech on May 8, 2008
   

Over the last couple of years I’ve talked to scores of companies that face data analytics problems and ask this question. From these discussions it was pretty clear that no existing infrastructure can really solve the problem of deriving deep insights from massive amounts of data for most enterprises. But why? And how do companies today try to cope with this issue?

I’ve seen three classes of “solutions” that companies attempt to implement in a desperate attempt to overcome their data analytics challenges. Let me try to describe what I’ve seen here.

“Solution” One. Vertical scale-up. If you are like most companies, database performance problems make your favorite hardware vendor’s sales rep lots of money every year! There is nothing new here. Ever since the 1960s, when the first data management systems came around, performance issues have been solved by buying much more expensive hardware. So here’s the obvious problem with this approach: cost. And here’s the non-obvious one: there is a limit to how much you can scale this way, and it is actually pretty low. (Question: what is the maximum number of CPUs you can buy in a high-end server? How does it compare to the average Google cluster?)

“Solution” Two. “Massively” parallel database clusters. Sometimes I’ve heard an argument that goes like this: “Why shouldn’t it be simple to build a farm of databases, just like we have farms of app servers or web servers?” Driven by this seemingly innocent question, you may try (or have tried) to put together clusters of databases to do analytics, either on your own or using one of the MPP products in the marketplace. This will work fine for small datasets *or* very simple queries (e.g., computing a sum of values). But, as any student of distributed systems knows, there is a reason why web servers scale so nicely: they are stateless! That’s why they’re so easy to deploy and scale. Databases, on the other hand, do have state. In fact, they have lots of it, perhaps several gigabytes per box. And, guess what, in analytics each query potentially needs access to all of it at once! So what works fine for a very small number of nodes or a small amount of data does nothing for slightly more complex queries and larger systems - which is probably the issue you were trying to solve in the first place.

By the way, all the solutions in the marketplace today solve the wrong problems. For instance, some optimize the disk I/O of individual nodes rather than overall system performance for complex queries, which is the real issue (e.g., “columnar” systems). Others allow fast execution of really simple queries but do nothing to make more complex ones go really quickly (e.g., “MPP” databases). None of these products can provide a solution that is even relevant to the hardest problems these systems face.

“Solution” Three. Write custom code. Why not? Google and Yahoo have done it pretty successfully! The only problem is, this approach is even more expensive than approach #1! Google has built a great infrastructure, but what is the cost to retain and compensate the best minds in the world who can develop and maintain your analytics? (Hint: it’s more than free snacks and soda.) I’ve frequently seen what starts as a simple, cheap solution for a single point problem evolve into a productivity nightmare, where each new data insight requires development time and specialized (thus expensive) skills. If you can afford that, fine. But I’ll bet you do not want to spend your most precious resources reinventing the wheel every time you need to run a new query, instead of doing what makes your company most successful.

The end result is that all of these approaches are pretty far from solving the real problem. Rather, the cost of becoming more competitive through data is currently huge - and it shouldn’t be! I believe that as soon as the right tools are built and made available, companies will immediately take advantage of them to be more competitive and successful. This is the upcoming data revolution that I see, and, frankly, it has been long overdue.
