Posted by Mayank in Data-Analytics Server, Statement on November 2, 2009

I had commented that a new set of applications are being written that leverage data to act smarter to enable companies to deliver more powerful analytic applications. Operating a business today without serious insight into business data is not an option. Data volumes are growing like wildfire, applications are getting more data-heavy and more analytics-intensive, and companies are putting more demands on their data.

The traditional 20-year-old data pipeline of Operational Data Stores (to pool data), Data Warehouses (to store data), Data Marts (to farm out data), Application Servers (to process data), and UIs (to present data) is under severe strain – because we are expecting a lot of data to move from one tier to the next. Application Servers pull data from Databases for computations and push the results of those computations to the UI servers. But data is like a boulder – the larger the data, the greater the inertia, and therefore the greater the time and effort needed to move it from one system to another.

The resulting performance problems of moving ‘big data’ are so severe that application writers unconsciously compromise the quality of their analysis by avoiding “big data computations” – they first reduce the “big data” to “small data” (via SQL-based aggregations/windowing/sampling) and then perform computations on “small data” or data samples.
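In practice, the compromise looks something like the following sketch in Python (all names and numbers here are illustrative, not taken from any particular product): a simple aggregate computed on a 1% sample comes out close to the full answer, which is exactly why the shortcut is tempting – but the rare patterns hiding in the other 99% of rows never reach the application at all.

```python
import random

# Simulate a "big data" table of per-user transaction amounts.
random.seed(42)
transactions = [random.uniform(1.0, 500.0) for _ in range(1_000_000)]

# The "small data" shortcut: draw a 1% sample and analyze only that.
sample = random.sample(transactions, k=10_000)
approx_mean = sum(sample) / len(sample)

# The full computation the application would ideally run on all rows.
true_mean = sum(transactions) / len(transactions)

# The sample is close on a simple aggregate, but rare events
# (fraud, outliers, long-tail behavior) can be lost entirely.
print(f"true mean   = {true_mean:.2f}")
print(f"sample mean = {approx_mean:.2f}")
```

The aggregate survives the sampling; the individual records – the granularity the analysis actually needed – do not.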

The problem of ‘big data’ analysis will only grow more severe over the next 10 years as data volumes grow and applications demand finer data granularity to model behavior and identify patterns, so as to better understand and serve their customers. To do this, you have to analyze all your available data. For the last 5 years, companies have routinely upgraded their data infrastructure every 12-18 months as data sizes double and the traditional data pipeline buckles under the weight of larger data movement - and they will be forced to keep doing so for the next 10 years if nothing fundamental changes.

Clearly, we need a new, sustainable solution to address this state of affairs.

The ‘aha!’ for big data management is to realize that the traditional data pipeline suffers from an architectural problem - moving data to applications - and that this must change to allow applications to move to the data.

I am very pleased to announce a new version of Aster Data nCluster that addresses this challenge head-on.

Moving applications to the data requires a fundamental change to the traditional database architecture: applications are co-located inside the database engine so that they can iteratively read, write and update all data. The new infrastructure acts as a ‘Data-Application Server’, managing both data and applications as first-class citizens. Like a traditional database, it provides a very strong data management layer. Like a traditional application server, it provides a very strong application processing framework. It co-locates applications with data, thus eliminating data movement from the Database to the Application Server. At the same time, it keeps the two layers separate to ensure the right fault-tolerance and resource-management models - bad data will not crash the application and, vice versa, a bad application will not crash the database.

Our architecture and implementation ensure that applications do not have to be re-written to make this transition. The application is pushed down into the Aster 4.0 system and transparently parallelized across the servers that store the relevant data. As a result, Aster Data nCluster 4.0 also delivers a 10x-100x boost in performance and scalability.
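Conceptually, the pattern can be sketched as follows (a toy illustration in Python, not Aster’s actual API): the application function is shipped to each partition and runs where the data lives, and only the small per-partition results travel back to be merged.

```python
from concurrent.futures import ThreadPoolExecutor

# Each "node" holds a partition of the data locally; here we fake
# four nodes with four in-memory lists (in a real system, each
# partition would live on a different server).
partitions = [list(range(i, 1_000_000, 4)) for i in range(4)]

def local_sum_and_count(partition):
    """Application logic executed where the data lives; only a tiny
    (sum, count) pair crosses the network, not the rows themselves."""
    return sum(partition), len(partition)

# "Push down" the function to every partition in parallel.
with ThreadPoolExecutor(max_workers=4) as pool:
    partials = list(pool.map(local_sum_and_count, partitions))

# Merge the small partial results into the final answer.
total = sum(s for s, _ in partials)
count = sum(c for _, c in partials)
print(f"mean over {count} rows = {total / count:.1f}")
```

The boulder stays put: a million rows are reduced to four pairs of numbers before anything moves between tiers.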

Those using Aster Data’s solution - including comScore, Full Tilt Poker, Telefonica I+D, and Enquisite - are a testament to the benefits of this fundamental change. In each case, it was the embedding of the application with the data that enabled them to scale seamlessly and perform ultra-fast analysis.

The new release brings to fruition a major product roadmap milestone that we’ve been working toward for the last 4 years. There is a lot more innovation coming - and this milestone is significant enough that we issue a clarion call to everyone working on “big data applications”: we need to move applications to the data, because the other way round is unsustainable in this new era.


Comments:
Jake Kaldenbaugh on November 11th, 2009 at 4:58 pm #

I like your aha! moment. I’ve been arguing that one of the major problems with current architectures is that each app is building its own data pool. Companies will be able to get a competitive advantage by making applications into logic and presentation layers over a single pool of all of their data.


surekha on November 17th, 2009 at 12:29 pm #

I am somewhat of mixed opinion on the “move application to data” concept. The reason is that, in the past, Stored Procs did try to manipulate data and house business logic in the database, but then the industry realized that separation of concerns was important. To state it another way, separating the business logic and rules from the data logic was key to ensuring that the less volatile business data was not commingled with the more volatile business logic/rules. To that effect, pushing application logic back to the data tier seems to violate the architectural tenet of separating the business rules from the data tier - or, for that matter, the presentation tier as well. On the other hand, I can relate to the problem we are seeing in terms of moving tons of data to the mid-tier or the application tier - as we routinely run out of memory on the mid-tier!!

I would love to hear your comments on how AsterData is able to address this concern and this dilemma.

Thanks.
surekha -

Mayank on November 19th, 2009 at 10:17 am #

Surekha,

I agree with you that “business logic” needs to be separate from “business data”.

As you point out, the major problem with Stored Procedures is that they put “business logic” deep inside the relational database stack. In fact, relational databases never supported Stored Procedures as elegantly as they supported declarative SQL queries. For relational databases, Stored Procedures are just functions on data - not full applications - that are too dependent on the properties of the data. Application Servers supported business logic a lot more elegantly than Stored Procedures did.

But having the logic too far outside the data infrastructure means we’ll keep running out of memory in the mid-tier for ‘big data applications’.

The correct architecture enables applications to be “close” to the data without being “subservient” to it. In other words, applications should have their own life and freedom (exactly as when they would be run in a separate application server) and yet not be too far from the data to make read/write accesses to it difficult.

The Data Blog: Aster Data Blog » Blog Archive » Partnering with SAS Institute for ‘Big Data’ Analysis on May 3rd, 2011 at 7:13 am #

[...] which Aster Data’s 4.0 release uniquely supports. Last week we announced the capability to fully push down analytics application logic inside our MPP database, so applications can now run inside the database, allowing analytics to be performed on massive data [...]
