Comments:
Hi Guys,
Your solution is very interesting. I am curious, though: I have seen a lot of dialogue about the scalability of your solution, and discussion of how analytics is a key component, but I have had difficulty figuring out how analytics is deployed with the solution. Are you presenting a data-discovery type of solution (such as that offered by Visual Site) or proposing a plug-in architecture with analytics partners (such as SAS)? I suspect I am missing something important here! Congratulations on a great launch, BTW.
Cheers, James.
Thanks for the comments/questions, James!
We allow you to store all your logs in one database and run queries against it. The logs can be augmented with contextual information (e.g., information about pages, users, geographies, etc.). You can then use SQL to process the data and generate reports. You can also use data-mining tools like SAS, SPSS, and R to analyze data in our database.
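To make the "logs plus context" idea concrete, here is a minimal sketch. It uses Python's built-in sqlite3 module purely as a stand-in, and the table names, column names, and sample rows are all hypothetical; none of it is taken from nCluster itself. It only shows the general pattern: raw log rows are joined to context tables and summarized with plain SQL.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with one log table and two context tables.

    All table names, columns, and rows are made up for illustration.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE page_views (user_id INTEGER, page_id INTEGER, viewed_at TEXT);
        CREATE TABLE users (user_id INTEGER PRIMARY KEY, country TEXT);
        CREATE TABLE pages (page_id INTEGER PRIMARY KEY, section TEXT);
    """)
    conn.executemany("INSERT INTO users VALUES (?, ?)",
                     [(1, "US"), (2, "DE"), (3, "US")])
    conn.executemany("INSERT INTO pages VALUES (?, ?)",
                     [(10, "news"), (11, "sports")])
    conn.executemany("INSERT INTO page_views VALUES (?, ?, ?)", [
        (1, 10, "2008-06-15T10:00:00"),
        (1, 11, "2008-06-15T10:05:00"),
        (2, 10, "2008-06-15T11:00:00"),
        (3, 10, "2008-06-15T12:00:00"),
    ])
    return conn


def views_by_country_and_section(conn):
    """Report: number of page views per (country, site section)."""
    return conn.execute("""
        SELECT u.country, p.section, COUNT(*) AS views
        FROM page_views AS v
        JOIN users AS u ON u.user_id = v.user_id
        JOIN pages AS p ON p.page_id = v.page_id
        GROUP BY u.country, p.section
        ORDER BY u.country, p.section
    """).fetchall()


if __name__ == "__main__":
    for row in views_by_country_and_section(build_demo_db()):
        print(row)
```

The same query shape applies whether the report is by geography, by user segment, or by page category; only the context table being joined changes.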
Nima on June 15th, 2008 at 7:12 pm #
Very interesting. How does your implementation differ from a distributed data system like Mnesia (Erlang)? I have read that MS-SQL 2008 clustering will use a similar approach. Have you guys heard differently?
Back to Mnesia: the creators have said that the service begins to fail at the petabyte level. Have you guys done such theoretical tests?
Regardless, congratulations, and great product!
Mayank on June 16th, 2008 at 5:05 am #
Nima – Mnesia and nCluster share many of the same goals (distribution, location transparency, ACID transactions, non-stop applications).
The biggest difference is in our support for normalization and fast joins.
Normalization helps limit data growth rates; denormalization causes values to be replicated multiple times, spurring data growth.
Denormalization is usually preferred as a way to avoid joins, since it is fair to say that joins are the slowest component in query execution. We do joins pretty fast, even in a distributed environment.
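The trade-off described above can be sketched in a few lines. This is an illustration only, again using Python's sqlite3 module as a stand-in with hypothetical tables and sample data; it says nothing about how nCluster stores data or executes joins. The denormalized table repeats each user's country on every log row, while the normalized layout stores it once per user and recovers the same rows with a join.

```python
import sqlite3

# Hypothetical sample data: two users, four page views.
USERS = [(1, "alice", "US"), (2, "bob", "DE")]
VIEWS = [(1, "/home"), (1, "/news"), (1, "/sports"), (2, "/home")]


def build(conn):
    """Load the same facts in a normalized and a denormalized layout."""
    conn.executescript("""
        CREATE TABLE users (user_id INTEGER PRIMARY KEY, name TEXT, country TEXT);
        CREATE TABLE views (user_id INTEGER, url TEXT);
        CREATE TABLE views_flat (user_id INTEGER, name TEXT, country TEXT, url TEXT);
    """)
    conn.executemany("INSERT INTO users VALUES (?, ?, ?)", USERS)
    conn.executemany("INSERT INTO views VALUES (?, ?)", VIEWS)
    by_id = {u[0]: u for u in USERS}
    # Denormalized copy: user attributes are repeated on every view row.
    conn.executemany(
        "INSERT INTO views_flat VALUES (?, ?, ?, ?)",
        [(uid, by_id[uid][1], by_id[uid][2], url) for uid, url in VIEWS],
    )


def joined_rows(conn):
    """Normalized layout: rebuild the wide rows with a join."""
    return conn.execute("""
        SELECT v.user_id, u.name, u.country, v.url
        FROM views AS v
        JOIN users AS u ON u.user_id = v.user_id
        ORDER BY v.user_id, v.url
    """).fetchall()


def flat_rows(conn):
    """Denormalized layout: the wide rows are read directly, no join needed."""
    return conn.execute("""
        SELECT user_id, name, country, url
        FROM views_flat
        ORDER BY user_id, url
    """).fetchall()


def stored_country_values(conn):
    """Count stored country values as (denormalized, normalized)."""
    flat = conn.execute("SELECT COUNT(country) FROM views_flat").fetchone()[0]
    normalized = conn.execute("SELECT COUNT(country) FROM users").fetchone()[0]
    return flat, normalized
```

Both layouts answer the same question, but the flat table stores one country value per view while the normalized layout stores one per user, so the gap widens as the log grows. The price of the normalized layout is the join, which is why join speed matters so much here.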
Nima on June 16th, 2008 at 12:52 pm #
I definitely get that part. The support for horizontal fragmentation is clearly the main benefit of this type of scaling approach, but isn’t there a scaling issue vertically?
Meaning, once you have a large enough data set vertically and a sufficiently horizontally fragmented schema, doesn’t the scaling fall apart due to the messaging overhead?
Keep in mind, this is in no way a reflection on your product, and to anyone reading this: I have no experience with any Asterdata product. I am simply stating characteristics of other distributed systems that are *somewhat* similar to your approach. I’m just curious and hopeful!
Thanks!