Archive for August, 2010

By Tasso Argyros in Analytics, Blogroll on August 10, 2010

Coming out of Stanford to start Aster Data five years back, my co-founders and I had to answer a lot of questions. What kind of an engineering team do we want to build? Do we want people experienced in systems or databases? Do we want to hire people from Oracle or another established organization? When you’re just starting a company, embarking on a journey that you know will have many turns, answers are not obvious.

What we ended up doing very early on is bet on intelligent, smart and adaptable engineers, as opposed to experience or a long resume. It turned out that this was the right thing to do because, as a startup, we had react to market needs and change our focus at a blink of an eye. Having a team of people that were used to tackling never-seen-before problems made us super-agile as a product organization. As the company grew, we ended up having a mix of people that combined expertise in certain areas and core engineering talent. But the culture of the company was set in stone even though we didn’t realize it: even today our interview process expects talent, intelligence and flexibility to be there and strongly complement the experience our candidates may have.

There are three things that are great about being an engineer at Aster Data:

Our Technology Stack is Really Tall.

We have people working right above the Kernel on filesystems, workload management, I/O performance, etc. We have many challenging problems that involve very large scale distributed systems - and I’m talking about the whole nine yards, including performance, reliability, manageability, and data management at scale. We have people working on database algorithms from the I/O stack to the SQL planner to no-SQL planners. And we have a team of people working on data mining and statistical algorithms on distributed systems (this is our “quant”? group since people there come with a background in physics as much as computer science). It’s really hard to get bored or stop learning here.

We Build Real Enterprise Software.

There’s a difference between the software one would write in a company like Aster Data versus a company like Facebook. Both companies write software for big data analysis. However, a company like Facebook solves their problem (a very big problem, indeed) for themselves and each engineer gets to work on a small piece of the pie. At Aster Data we write software for enterprises and due to our relatively small size each engineer makes a world of a difference. We also ship software to third-party people and they expect our software to be out-of-the-box resilient, reliable and easy to manage/debug. This makes the problem more challenging but also gives us great leverage: once we get something right, not one, nor two, but potentially hundreds or thousands of companies can benefit from our products. The impact of the work of each engineer at Aster Data is truly significant.

We’re Working on (Perhaps) the Biggest IT Revolution of the 21st Century.

Big Data. Analytics. Insights. Data Intelligence. Commodity hardware. Cloud/elastic data management. You name it. We have it. When we started Aster Data in 2005 we just wanted to help corporations analyze the mountains of data that they generate. We thought it was a critical problem for corporations if they wanted to remain competitive and profitable. But the size and importance of data grew beyond anyone’s expectations over the past few years. We can probably thank Google, Facebook and the other internet companies for demonstrating to the world what data analytics can do. Given the importance and impact of our work, there’s no ceiling on how successful we can become.

You’ve probably guessed it by now, but the reason I’m telling you all this is to also tell you that we’re hiring. If you think you have what it takes to join such an environment, I’d encourage you to apply. We get many applications daily so the best way to get an interview here is through a recommendation and referral. With tools like LinkedIn (who happens to be a customer) it’s really easy to explore your network. My LinkedIn profile is here, so see if we have a professional or academic connection. You can also look at our management team, board of directors, investors and advisors to see if there are any connections there. If there’s no common connection, feel free to email your resume to However, to stand out I’d encourage you to tell us a couple of words about what excites you about Aster Data, large scale distributed systems, databases, analytics and/or startups that work to revolutionize an industry, and why you think you’ll be successful here. Finally, take a look at the events we either organize or participate in - it’s a great way to meet someone from our team and explain why you’re excited to join our quest to revolutionize data management and analytics.

By Tasso Argyros in Analytics on August 9, 2010

Watching our customers use Aster Data to discover new insights and build new big data products is one of the most satisfying parts of my job. Having seen this process a few times, I found that it always has the same steps:

An Idea or Concept – Someone comes up with an idea of a hidden treasure that could be hidden in the data, e.g. a new customer segment that could be very profitable, a new pattern that reveals novel cases of fraud, or other event-triggered analysis.

Dataset – An idea based on data that doesn’t exist is like a great recipe without the ingredients. Hopefully the company has already deployed one or more big data repositories that have the necessary data in full detail (no summaries, sampling, etc). If that’s not the case, data has to be generated, captured and moved to a big data-analytics server, which is an MPP database with a fully integrated analytics engine, like Aster Data’s solution. It addresses both parts of the big data need – scalable data storage and data processing.

Iterative Experimentation – This is the fun part. In contrast to traditional reporting, where the idea translates almost automatically to a query or report (e.g.: I want to know average sales per store for the past 2 years), a big data product idea (e.g.: I want to know what is my most profitable customer segment) requires building an intuition about the data before coming up with the right answer. This can only be achieved by a large number of analytical queries using either SQL or MapReduce, and it’s the step where the analyst or data scientist builds their intuition and understanding of the dataset and of the hidden gems buried there.

Data Productization – Once iterative experimentation provides the data scientist with evidence of gold, the next step is to make the process repeatable so that its output can be systematically used by humans (e.g. marketing department) or systems (e.g. a credit card transaction clearing system that needs to identify fraudulent transactions). This requires not only a repeatable process but also data that’s certified to be of high quality and processing that can meet specific SLAs, always while using a hybrid of SQL and MapReduce for deep big data analysis

If you think about it, this process is similar to the process of coming up with a new product (software or otherwise). You start with an idea, you then get the first material and build a lot of prototypes. I’ve found that people who find an important and valuable data insight after a process of iterative experimentation feel the same satisfaction as an inventor who has just made a huge discovery. And the next natural step is to take that prototype, make it a repeatable manufacturing process and start using it in the real world.

In the “old”? world of simple reporting, the process of creating insights was straightforward. Respectively the value of the outcome (reports) was much lower and easily replicable by everyone. Big Data Analytics, on the other hand, require a touch of innovation and creativity, which is exactly why it is hard to replicate and why its results produce such important and sustainable advantages to businesses. I believe that Big Data Products are the next wave of corporate value creation and competitive differentiation.