By Mayank Bawa in Blogroll, Statements on May 20, 2008
   

I am glad to share the news that one of our first customers, MySpace, has scaled their Aster nCluster enterprise data warehouse to more than 100 Terabytes of actual data.

It is not easy to cross the 100TB barrier, especially when loads happen continuously and queries are relentless, as they are at MySpace.com.

Hala, Richard, Dan, Jim, Allen, and Aber, you have been awesome partners for us! It has been a great experience for Aster to work with you, and we can see the reasons behind MySpace’s continued success. Your team is amazingly strong and capable, and there is a clear sense of purpose. Tasso and I often remark that we need to replicate that culture in our company as we grow. At the end of the day, it is the culture and the strength of a team that make a company successful.

And to everyone at Aster, you have been great from Day 1. It is impressive how a fresh perspective and a clean architecture can solve a tough technical challenge!

Thank you. And I wish everyone as much fun in the coming days!


Comments:
James Dutton on May 27th, 2008 at 5:40 am #

Hi Guys,
Your solution is very interesting. I am curious though – I have seen a lot of interesting dialogue regarding the scalability of your solution, and discussion of how analytics is a key component, but I have had difficulty figuring out how analytics is deployed with the solution. Are you presenting a data discovery type solution (such as that offered by Visual Site) or proposing a plug-in architecture with analytics partners (such as SAS)? I suspect I am missing something important here! Congratulations on a great launch, BTW.
Cheers, James.

Mayank Bawa on May 27th, 2008 at 9:38 am #

Thanks for the comments/questions, James!

We allow you to store all your logs in one database on which queries can be run. The logs can be augmented with contextual information (e.g., information about pages, users, geographies, etc.). We can then use SQL to process data and generate reports. You can also use data-mining tools like SAS, SPSS, and R to analyze data in our database.

Nima on June 15th, 2008 at 7:12 pm #

Very interesting. How does your implementation differ from a distributed data system like Mnesia (Erlang)? I have read that MS-SQL 2008 Clustering will use a similar approach. Have you guys heard different?

Back to Mnesia: the creators have said that the service begins to fail at the petabyte level. Have you guys done such theoretical tests?

Regardless, congratulations, and great product!

Mayank on June 16th, 2008 at 5:05 am #

Nima – Mnesia and nCluster share many of the same goals (distribution, location transparency, ACID transactions, non-stop applications).

The biggest difference is in our support for Normalization and fast Joins.

Normalization helps limit data growth rates; de-normalization causes values to be replicated multiple times spurring data growth.

Denormalization is preferred to avoid Joins; it is fair to say that Joins are the slowest component in query execution. We do joins pretty fast even in a distributed environment.
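To make the trade-off concrete, here is a minimal sketch using Python's built-in `sqlite3` module. It is not nCluster code; the table names, column names, and sample rows are made up for illustration. It stores the same page views in a normalized and a denormalized layout, then shows that the normalized layout needs a join to answer a per-country question while the denormalized one repeats the country value in every row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Normalized layout: each user's country is stored once;
# page views refer to the user by id. (Hypothetical schema.)
cur.execute("CREATE TABLE users (user_id INTEGER PRIMARY KEY, country TEXT)")
cur.execute("CREATE TABLE page_views (user_id INTEGER, page TEXT)")

# Denormalized layout: the country is copied into every page-view row.
cur.execute(
    "CREATE TABLE page_views_denorm (user_id INTEGER, country TEXT, page TEXT)"
)

users = [(1, "US"), (2, "IN")]
views = [(1, "/home"), (1, "/music"), (1, "/friends"), (2, "/home")]

cur.executemany("INSERT INTO users VALUES (?, ?)", users)
cur.executemany("INSERT INTO page_views VALUES (?, ?)", views)

country_of = dict(users)
cur.executemany(
    "INSERT INTO page_views_denorm VALUES (?, ?, ?)",
    [(uid, country_of[uid], page) for uid, page in views],
)

# Views per country: the normalized layout needs a join...
normalized_report = cur.execute(
    """
    SELECT u.country, COUNT(*)
    FROM page_views AS v
    JOIN users AS u ON u.user_id = v.user_id
    GROUP BY u.country
    ORDER BY u.country
    """
).fetchall()

# ...while the denormalized layout answers it from a single table.
denorm_report = cur.execute(
    """
    SELECT country, COUNT(*)
    FROM page_views_denorm
    GROUP BY country
    ORDER BY country
    """
).fetchall()

# How many times is a country value physically stored in each layout?
country_copies_normalized = cur.execute(
    "SELECT COUNT(country) FROM users"
).fetchone()[0]
country_copies_denorm = cur.execute(
    "SELECT COUNT(country) FROM page_views_denorm"
).fetchone()[0]

print(normalized_report, denorm_report)
print(country_copies_normalized, country_copies_denorm)
```

Both layouts give the same report, but the denormalized table stores one copy of the country per page view rather than one per user, and that gap widens as the log grows.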

Nima on June 16th, 2008 at 12:52 pm #

I definitely get that part. The support for horizontal fragmentation is clearly the main benefit of this type of scaling approach, but isn’t there a scaling issue vertically?

Meaning, once you have a large enough data set vertically and a sufficiently horizontally fragmented schema, doesn’t the scaling fall apart due to the messaging overhead?

Keep in mind, this is in no way a reflection of your product, and to anyone reading this, I have no experience with any Asterdata product. I am simply stating characteristics of other distributed systems that are *somewhat* similar to your approach. I’m just curious and hopeful!

Thanks!

