I have commented before that a new class of applications is being written, applications that leverage data to act smarter and enable companies to deliver more powerful analytics. Operating a business today without serious insight into business data is not an option. Data volumes are growing explosively, applications are getting more data-heavy and more analytics-intensive, and companies are putting more demands on their data.
The traditional, 20-year-old data pipeline of Operational Data Stores (to pool data), Data Warehouses (to store data), Data Marts (to farm out data), Application Servers (to process data), and UI servers (to present data) is under severe strain, because we are expecting a lot of data to move from one tier to another. Application Servers pull data from databases for computation and push the results to the UI servers. But data is like a boulder: the larger the data, the greater the inertia, and therefore the more time and effort needed to move it from one system to another.
The resulting performance problems of moving 'big data' are so severe that application writers unconsciously compromise the quality of their analysis by avoiding "big data computations": they first reduce the "big data" to "small data" (via SQL-based aggregation, windowing, or sampling) and then perform computations on the "small data" or data samples.
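To make the compromise concrete, here is a small, purely illustrative Python sketch (the data, batch size, and percentile are invented for the example): reducing row-level data to batch averages before analysis, as a SQL GROUP BY might, smooths away exactly the tail behavior that a full-granularity computation would catch.

```python
import random
import statistics

random.seed(42)

# Simulated per-transaction latencies: mostly fast, with a heavy tail.
latencies = [random.expovariate(1.0) for _ in range(100_000)]

# Full-granularity answer: the true 99th-percentile latency over all rows.
full_p99 = statistics.quantiles(latencies, n=100)[98]

# "Small data" answer: first reduce to per-batch averages,
# then take the 99th percentile of those averages.
batch_avgs = [statistics.fmean(latencies[i:i + 1000])
              for i in range(0, len(latencies), 1000)]
reduced_p99 = statistics.quantiles(batch_avgs, n=100)[98]

# Averaging first destroys the tail, so the reduced estimate
# badly understates the true tail latency.
print(f"full-data p99: {full_p99:.2f}, pre-aggregated p99: {reduced_p99:.2f}")
```

The pre-aggregated estimate comes out several times smaller than the true tail latency, which is the quality compromise the pipeline's data-movement cost quietly encourages.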
The problem of 'big data' analysis will only grow more severe over the next 10 years as data volumes grow and applications demand more data granularity to model behavior and identify patterns, so as to better understand and serve their customers. To do this, you have to analyze all your available data. For the last 5 years, companies have routinely upgraded their data infrastructure every 12-18 months as data sizes double and the traditional pipeline buckles under the weight of larger data movement; they will be forced to keep doing so for the next 10 years if nothing fundamental changes.
Clearly, we need a new, sustainable solution to address this state of affairs.
The 'aha!' for big data management is to realize that the traditional data pipeline suffers from an architectural problem: it moves data to applications. That must change, so that applications move to the data.
I am very pleased to announce a new version of Aster Data nCluster that addresses this challenge head-on.
Moving applications to the data requires a fundamental change to the traditional database architecture: applications are co-located inside the database engine so that they can iteratively read, write, and update all the data. The new infrastructure acts as a 'Data-Application Server', managing both data and applications as first-class citizens. Like a traditional database, it provides a very strong data management layer. Like a traditional application server, it provides a very strong application processing framework. It co-locates applications with data, eliminating data movement from the database to the application server. At the same time, it keeps the two layers separate to ensure the right fault-tolerance and resource-management models: bad data will not crash the application and, vice versa, a bad application will not crash the database.
Our architecture and implementation ensure that applications do not have to be rewritten to make this transition. The application is pushed down into the Aster 4.0 system and transparently parallelized across the servers that store the relevant data. As a result, Aster Data nCluster 4.0 also delivers a 10x-100x boost in performance and scalability.
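As a rough illustration of the push-down idea (a toy sketch in plain Python, not Aster's actual API or implementation), a function is sent to each data partition and run where the rows live; only the small per-partition results cross the network, and a central step merges them:

```python
# Each "node" holds one partition of (user_id, amount) rows.
partitions = [
    [("u1", 10.0), ("u2", 5.0)],
    [("u1", 7.5), ("u3", 2.0)],
    [("u2", 1.0), ("u3", 8.0)],
]

def local_totals(rows):
    """Runs where the data lives: aggregate one partition's rows."""
    totals = {}
    for user, amount in rows:
        totals[user] = totals.get(user, 0.0) + amount
    return totals  # a small result, cheap to move

def merge(results):
    """Central step: combine the small per-partition results."""
    merged = {}
    for totals in results:
        for user, amount in totals.items():
            merged[user] = merged.get(user, 0.0) + amount
    return merged

# Simulated push-down; a real system would run local_totals in
# parallel on the servers that store each partition.
per_partition = [local_totals(p) for p in partitions]
print(merge(per_partition))  # {'u1': 17.5, 'u2': 6.0, 'u3': 10.0}
```

Only the per-partition dictionaries move between tiers, so the network cost scales with the size of the answer rather than the size of the data, which is the heart of moving applications to the data.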
Those using Aster Data's solution, including comScore, Full Tilt Poker, Telefonica I+D, and Enquisite, are testament to the benefits of this fundamental change. In each case, it was embedding the application with the data that enabled them to scale seamlessly and perform ultra-fast analysis.
The new release brings to fruition a major product roadmap milestone that we have been working toward for the last 4 years. There is a lot more innovation coming, and this milestone is significant enough that we issue a clarion call to everyone working on "big data applications": we need to move applications to the data, because the other way around is unsustainable in this new era.