Archive for September, 2008

21
Sep
A shout-out: Join us!
By Mayank Bawa in Blogroll, Statements on September 21, 2008

There has been a lot of turmoil this past week in Financial Services. Several good people had their projects stalled, or even lost their jobs, due to market forces beyond their control.

I’d like to call out to Quantitative Computer Scientists who have been affected. If you are good with data and know how to extract intelligence from it, we want you on our team!

We are hiring. You’ll have the chance to work with a number of our customers and help them do more with their data. You’ll bring a fresh perspective to the business processes at our customers; in turn, you’ll gain from learning about the business processes of various verticals: an invaluable education for when you want to return to Financial Services after the crisis has passed in a couple of years.

Drop us a note at careers [at] asterdata [dot] com. We’d love to hear from you!



15
Sep
Highlights from the Claremont Report on Database Research
By Tasso Argyros in Blogroll, Database on September 15, 2008

Dave Kellogg’s blog reminded me that the Claremont DB Research report was recently released. The Claremont report is the result of two days of discussion among some of the world’s greatest academics in databases, and it aims to identify and promote the most promising research directions in databases.

As I was reading the report, I realized that Aster Data is at the forefront of some of the most exciting database research topics. In particular, the report mentions four areas (out of a total of six) where Aster has been driving innovation very aggressively.

1. Revisiting database engines. MPP is the answer to Big Data, among other things.

2. Declarative programming for emerging platforms. MapReduce is explicitly mentioned here, with its potential in data management noted. This is a very important development, given that certain database academics (who participated in the report) have repeatedly shown their disdain for, and ignorance of, the topic.

3. Interplay of structured and unstructured data. This is an important area where MapReduce can play a huge role.

4. Cloud data services. Database researchers realize the potential of the cloud, both as a data management and a research tool. With our precision scaling feature, we are a strong fit for internal Enterprise clouds.

The world of databases is changing fast and this is an opportunity for us to provide the most cutting-edge database technology to our customers.

We’ve also found a lot of benefit from our strong ties with academia, by nature of our background and advisors, and we intend to strengthen these even more.



10
Sep
By Tasso Argyros in Blogroll, MapReduce on September 10, 2008

I am very excited about the power that In-Database MapReduce puts in the hands of the larger BI community. I’ll be leading a Night School session on In-Database MapReduce at the TDWI World Conference in November in New Orleans.

Please join me if you are interested in learning more about the MapReduce framework and its applications. I will introduce MapReduce from the basic principles, and then help build up your intuition. If we have time, I will even address why MapReduce is not UDF re-discovered. :-)

If you are unable to attend, or are eager to get started sooner, here are some MapReduce resources you may find informative: Aster’s whitepaper on In-Database MapReduce; Google Labs’ MapReduce research paper; Curt Monash’s post on Known Applications of MapReduce.

A great open-source project that illustrates the power of MapReduce, and that I’d like to commend and draw your attention to, is Apache’s Mahout project, which is building machine learning algorithms (classification, clustering, regression, dimension reduction, and evolutionary algorithms) on the MapReduce framework.

I am sure this is just a snippet of the MapReduce resources available. If you have some that you have found helpful, please share them in your comments. I will be happy to review and cover them in our TDWI Night School!



06
Sep
By Tasso Argyros in Blogroll, Database, MapReduce on September 6, 2008

In response to Aster’s In-Database MapReduce initiative, I’ve been asked the following question:

“How does Aster Data Systems compete with open source MapReduce implementations, such as Hadoop?”

My answer: we simply do not.

Hadoop and Google’s implementation of MapReduce are targeted at the development (coding) community. The primary interface of these systems is the command line, and the primary means of accessing data is through Java or Python code. There have been efforts to build higher-level interfaces on top of these systems, but they are usually limited, follow no existing standard, and are incompatible with the existing tool ecosystem.

Such tools are ideal for environments that are dominated by engineers, such as academic institutions, research labs or technology companies like Google/Yahoo that have a strong culture of in-house development (often hundreds of thousands of lines of code) to solve technical problems.

Most enterprises do not share the Google/Yahoo culture: each “build vs. buy” decision is carefully considered. Good engineering talent is a precious resource that is directed towards adding business value, not towards building infrastructure from the ground up. Data Services groups are universally under-staffed and consist of people who understand and leverage databases. As such, there are corporate governance expectations of any data management tool they use:

- it has to comply with applicable standards like ANSI-SQL,

- it needs to provide a set of tools that IT can use & manage, and

- it needs to be ecosystem-friendly (BI and data integration tools compatibility).

In such an environment, using Java or a developer-centric command line as the primary interface will increase the burden on the data services group and their IT counterparts.

I strongly believe that, while existing MapReduce tools are good for development organizations, they are totally inappropriate for the large majority of enterprise IT departments.

Our goal is not to build yet another tool for development groups, but rather to create a product that unleashes the power of MapReduce for the enterprise IT organization.

How can we achieve that?

First, we’ve developed Aster to be a super-fast, always-parallel database for large-scale data warehousing using SQL. Then we allow our customers and partners to extend SQL through a tightly integrated MapReduce functionality.

The person who develops our MapReduce functions naturally needs to be a developer; but the person using that functionality can be an analyst working in a standard BI tool (e.g., MicroStrategy, Business Objects, Pentaho) over an ODBC or JDBC connection!

Invoking MapReduce functions in Aster looks almost identical to writing standard SQL code. This way, the powerful MapReduce extensions developed by a small set of developers (either within an IT organization or by Aster itself) can be used by anyone with SQL skills, from within their existing tools.
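To make that concrete, here is an illustrative sketch of what such an invocation might look like. The function name (sessionize), the table, and the exact clause syntax are hypothetical, invented for this example rather than taken from Aster’s actual API; the point is only that the call reads like ordinary SQL.

```sql
-- Hypothetical sketch: "sessionize" stands in for a developer-written
-- SQL/MR function; table and column names are invented for illustration.
SELECT userid, session_id, COUNT(*) AS clicks
FROM sessionize(
       ON weblog_clicks
       PARTITION BY userid
       ORDER BY click_time
     )
GROUP BY userid, session_id;
```

A BI tool can issue a query like this through JDBC/ODBC just as it would any other SELECT; the analyst never sees the code that implements the function.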

Integrating MapReduce and SQL is not an easy job; we had to innovate on multiple levels to achieve it, e.g., by creating a new type of UDF that is both parallel and polymorphic, to make MapReduce extensions almost indistinguishable from standard SQL.

In summary, we have enabled:

- The flexible, parallel power of MapReduce to enable deep analytical insights that are impossible to express in standard SQL

- Seamless integration with ANSI standard SQL and all the rich commands, types, functions, etc. that are inherent in this well-known language

- Full JDBC/ODBC support ensures interoperability between Aster In-Database MapReduce and 3rd party database ecosystem tools like BI, reporting, advanced analytics (e.g., data mining), ETL, monitoring, scheduling, GUI administration, etc.

- SQL/MR functions: powerful plug-in operators that any non-engineer can easily plug into standard ANSI SQL to exploit the power of MapReduce analytic applications

- Polymorphism: unlike static, unreliable UDFs, SQL/MR functions unleash the power of run-time (dynamic) polymorphism for cost-efficient reusability. Built-in sandboxing ensures fault tolerance, avoiding the system crashes commonly experienced with UDFs

To conclude, it is important to understand that Aster nCluster is not yet another MapReduce implementation, nor does it compete with Hadoop for resources or audience.

Rather, Aster nCluster is the world’s most powerful database that breaks traditional SQL barriers, allowing Data Services groups and IT organizations to extract more knowledge out of their data.