Archive for the ‘Analytics’ Category

15
Mar
By Steve Wooledge in Analytics, MapReduce, Teradata Aster on March 15, 2012
   

Yesterday I presented at the Los Angeles Teradata User Group on the topic of “Data Science: Finding Patterns in Your Data More Quickly & Easily with MapReduce”. One point discussed was the common misconception that big data is only about volume. Volume is certainly part of the issue organizations are facing; however, the big story in big data is the complexity and additional processing required to make “unstructured” data actionable through analytics. This is where procedural frameworks like MapReduce can help. Here is a great post by Teradata’s own Bill Franks about unstructured data, which describes what unstructured data demands in the context of analytics.

As Franks notes, “the thought of using unstructured data really shouldn’t intimidate people as much as it often does.” Read more to learn why.
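To make the “additional processing” point concrete, here is a minimal sketch of the map and reduce steps applied to raw log lines. It is plain Python, not Aster's SQL-MapReduce or Hadoop; the log format, the sample lines, and all function names are assumptions made up for illustration.

```python
from collections import defaultdict

# Hypothetical raw log lines; the five-field format is an assumption.
LOG_LINES = [
    "2012-03-14 10:00:01 u1 GET /home",
    "2012-03-14 10:00:05 u2 GET /search?q=shoes",
    "2012-03-14 10:01:10 u1 GET /search?q=boots",
    "malformed line",
    "2012-03-14 10:02:00 u1 GET /cart",
]


def map_line(line):
    """Map step: parse one raw line and emit (user, 1); skip unparseable lines."""
    parts = line.split()
    if len(parts) != 5:
        return []
    _date, _time, user, _method, _path = parts
    return [(user, 1)]


def shuffle(pairs):
    """Group emitted values by key, as a framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(key, values):
    """Reduce step: collapse all values for one key into a single total."""
    return key, sum(values)


def page_views_per_user(lines):
    """Run map, shuffle, and reduce in sequence on a list of raw lines."""
    pairs = [pair for line in lines for pair in map_line(line)]
    return dict(reduce_counts(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(page_views_per_user(LOG_LINES))
```

The parsing logic lives in the map step, so imposing structure on messy input is just ordinary procedural code; a real framework would run the same two functions in parallel across many machines.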

 



21
Feb
By Tasso Argyros in Analytic platform, Analytics, Analytics tech, Database, MapReduce on February 21, 2012
   

It has been about seven years since Aster Data was founded, four years since our industry-first Enterprise SQL-MapReduce implementation (first commercial MapReduce offering) and three years since our first Big Data Summit event (the first “Big Data” event in the industry as far as I know). During this whole time, we have witnessed our technology investments take off together with the Big Data market – just think how many people had never even heard the word MapReduce three years ago, and how many swear by it today!

As someone who has been caught up in the Big Data wave since 2005, I can tell you that the stage of the market has changed significantly during this time – and with it, the challenges that Enterprise customers face. A few years ago, customers were realizing the challenges that piles of new types of data were bringing – big volumes (terabytes to petabytes) and new, complex types (multi-structured data such as weblogs, text, customer interaction data); but at the same time, the opportunities that the new analytical interfaces, like MapReduce, were enabling. Fast forward to today and most enterprises are trying to put together their Big Data strategies and make sense of what the market has to offer – and as a result there is a lot of market noise and confusion: it is usually not clear what use cases apply to traditional technologies versus new; how to reconcile existing technologies with new investments; and what types of projects will give them the highest ROI versus a long and painful failure.

Teradata and Teradata Aster have a high interest in customers being successful with Big Data challenges and technologies, because we believe that the growth of the market will translate into growth for us. Given Teradata’s history of being the #1 strategic advisor to customers around data management and analytics, we only want to offer the best solutions to our customers. This includes our products – which are recognized by Gartner as leading technologies in Data Warehousing and Big Data analytics – but also our expertise in helping customers use complementary solutions, like Hadoop, and making sure that the total solution works reliably and succeeds in tackling big business problems.

With this partnership, we are taking one more step in this direction. So we are announcing three things:

1. Teradata and Hortonworks will work together to jointly solve big challenges for our customers. This is a win/win for customers and the industry.

2. Our intent to do joint R&D to make it easier for customers that use products from Teradata and Hadoop to utilize these products together. This is important because every enterprise will look to combine new technologies with existing investments, and there is plenty of opportunity to do better.

3. A set of reference architectures that combine Teradata and Hadoop products to accelerate the implementation of Big Data projects. We hope that this will be a starting point that will save enterprises time and money when they embark on Big Data projects.

We believe that all three of the above points will translate into eliminating risks and unnecessary trial and error. We have enough collective experience to guide customers to avoid failed projects and traps. And by helping clear up some of the confusion in the big data market, we hope to accelerate its growth and the benefit to Enterprises that are looking to utilize their data to become more competitive and efficient.



29
Sep
By Tasso Argyros in Analytic platform, Analytics, MapReduce on September 29, 2011
   

One of the great things about starting your own company (if you’re lucky and your company does well) is that you take part in the evolution of a whole new market, from its nascent days to its heyday. This was the case with Aster and the “Big Data” market. Back when we started Aster, in 2005, MPP systems that could store and analyze data using off-the-shelf servers were still a pretty new concept. I also recall in 2008, when we first came out with our native in-database MapReduce support – and our SQL-MapReduce® technology – we had to explain to most people what MapReduce even was. In 2009, we came out with the first Big Data event series – “Big Data Summit” – because we knew we were doing something new and wanted a term to describe it. “Big Data” caught on more than we had imagined back then, and the rest is history. Product innovation was at the core of Aster’s existence, and we kept pushing ourselves and our product to become the best platform for enterprise-class data analytics using both SQL and MapReduce as first class citizens on one analytic platform.

Today there is a lot of innovation in the big data market. However, we see a “chasm” between the SQL technologies—which are very enterprise-friendly—and the new wave of open source big data or “NoSQL” software which is used extensively by engineering organizations. In the middle is a very large number of enterprises trying to understand how they can use these new technologies to push their analytical capabilities beyond purely SQL, while at the same time utilizing their existing investments in technologies and people. This is the problem that Aster solves.

With last week’s announcement, the launch of our Teradata Aster MapReduce solutions, which include Aster Database 5.0 software (formerly Aster nCluster) and our new Aster MapReduce Appliance, we bring to market the best answer for the organizations that are “caught in the middle.” Unlike SQL-only systems focused primarily on analyzing structured data, our database and appliance provide support for native MapReduce, which enables a new generation of analytics, such as digital marketing optimization, social graph analysis, fraud detection based on customer behavior, etc. Our newly extended libraries of pre-built MapReduce analytical functions allow such applications to be developed with significantly less time and cost versus other MapReduce technologies. And, unlike other MapReduce-based systems, we offer full SQL support, integration with all major BI and ETL vendors and a data adaptor to EDW systems that allows enterprises to utilize existing tools and skills to bring big data analytics to their businesses. Finally, with our new appliance, we leverage Teradata’s strength and engineering to provide a proven and performance-optimized system for businesses to start analyzing untapped diverse data while cutting down on time, cost and frustration!

As we move forward, Aster is committed to being the leader in SQL and MapReduce analytics for multi-structured data. Having spent 6 years in this market, we believe that it’s not just the coolest technologies that will win, but the ones that make it easier for business analysts and data scientists within organizations to solve their business problems and innovate with analytics. With the launch of our new Teradata Aster solutions – including the revamped SQL-MapReduce interfaces and the new Aster MapReduce appliance – we are pushing the state of the art in this direction (or as my marketing team likes to say – “bringing the science of data to the art of business”). :)



28
Jul
By Mayank Bawa in Analytic platform, Analytics on July 28, 2011
   

I wrote earlier that data is structured in multiple forms. In fact, it is the structure of data that allows applications to handle it “automatically” – as an automaton, i.e., programmatically – rather than relying on humans to handle it “semantically”.

Thus a search engine can search for words, propose completion of partially typed words, do spell checking, and suggest grammar corrections “automatically”.

In the last 30 years, we’ve built specialized systems to handle each data structure differently at scale. We index a large corpus of documents in a dedicated search engine for searches, we arrange lots of words in a publishing framework to compose documents, we store relational data in an RDBMS to do reporting, we store emails in an e-discovery platform to identify emails that satisfy a certain pattern, we build and store cubes in a MOLAP engine to do interactive analysis, and so on.

Each such system is a silo – it imposes a particular structure on big data, and then it leverages that structure to do its tasks efficiently at scale.

The silo approach imposes fragmentation of data assets. It is expensive to maintain these silos. It is inefficient for humans and programs to master these silos – they have to learn the nuances of each silo to become an expert in exploiting it. As a result, we have all kinds of data administrators – a cube expert, a text expert, a spreadsheet expert, and so on.

The state of data fragmentation reminds me of the “dedicated function machines” that pre-dated the “Personal Computer”. We used to have electronic type-writers that would create documents, calculators that would calculate formulae, fax machines that would transmit documents, even tax machines that would calculate taxes. All of these machines were booted to relic-status at a museum by a general-purpose computer – the functions were ported on top of its computing framework and the data was stored in its file system. The unity of all of these functions and its data on the general-purpose computer gave rise to “integration” benefits. It made tasks easier: we can now fill our tax forms in (structured form-based) PDF documents, do tax calculations, and file taxes by transmitting the document – all on one platform. Our productivity has gone up. Indeed, the assimilation of data is leading to net new tasks that were not possible before. We can let programs search for previous year’s filings, read the entries, and populate this year’s forms from previous year’s filing to minimize data-entry errors.

We have the same opportunity in front of us now in the field of big data. For too long, we have relegated functions that work on big data to isolated “dedicated function machines.” These dedicated function machines are bad because they are not “open.” Data in a search engine can only be “searched” – it cannot be analyzed for sentiments or plagiarism or edited to insert or remove references. The data is the same, but each of these tasks requires a “dedicated function machine.”

We have the option to build a general purpose machine for big data – a multi-structured big data platform – that allows multiple structures of data to co-exist on a single platform that is flexible enough to perform multiple functions on data.

Such a platform, for example, would allow us to analyze structured payments data to identify our valuable customers, interpret sentiments of calls they made to us, analyze the most common problem across negative sentiment interactions, and predict the loss in revenue that can be prevented by solving that problem and the cost of acquiring net new customers to overcome the losses. Without a multi-structure big data platform, the above workflow is a 12-18 month cycle performed by a cross-functional team of “dedicated function experts” (CFO group, Customer Support group, Products group, Marketing group) – a bureaucratic mess of project management that produces results too expensively, too infrequently and too inaccurately, making simplifying assumptions at each step as they cannot agree on even basic metrics.

An open “Multi-Structured Big Data Platform” would be hugely enabling and open up vast efficiency and functionality that we can’t imagine today.



25
May
By jonbock in Analytics, MapReduce on May 25, 2011
   

In case you missed the news, Aster Data just took another step to make SQL-MapReduce the best programming framework for big data analytics. The Aster Data SQL-MapReduce® Developer Portal is the first collaborative online developer community for SQL-MapReduce analytics, our framework for processing non-relational data and ultra-fast analytics. It builds on other efforts to enable MapReduce analytics including: Developer Center, a resource center for MapReduce and SQL-MapReduce developers; Aster Data Developer Express, the first integrated development environment for SQL-MapReduce; and Aster Data Analytic Foundation, a suite of ready-to-use SQL-MapReduce functions.

The Developer Portal gives our customers and partners a community for collaborating with peers to leverage the flexibility and power of SQL-MapReduce for analytics that were previously impossible or impractical. Data scientists, quantitative analysts, and developers from customers, partners, and Aster Data are using the portal to highlight insights and best practices, share analytic functions, and leverage the experience and knowledge of the community to easily harness the power of SQL-MapReduce for big data analytics.

The portal enables collaboration that is key in making it easy for our customers to become SQL-MapReduce experts so they can solve core business challenges. As Navdeep Alam, director of data architecture at Mzinga, said, the portal “will allow us the ability to share and leverage insights with others in using big data analytics to attain a deeper understanding of customers’ behavior and create competitive advantage for our business.”

We’re seeing strong interest in the Developer Portal from our current customers. Early activity and content on the portal includes discussions about using the GSL libraries, programming in .NET, and writing sessionization and sampling functions. We plan to expand on this with tutorials for additional functions over the next few months.

If you aren’t already a customer, we encourage you to get started at the Aster Data Developer Center, where you can get your hands on SQL-MapReduce by downloading Aster Data Developer Express for free and find links to other resources like www.mapreduce.org.  If you are an Aster Data customer, we encourage you to also register for access to the new SQL-MapReduce Developer Portal for additional content and learning.

We’re always interested in your feedback as to how we can better help developers learn about and use MapReduce and Aster Data’s SQL-MapReduce.  If you have any suggestions, please feel free to add them below in the comments.



02
Feb
By Tasso Argyros in Analytic platform, Analytics on February 2, 2011
   

In my previous post, I spoke about how strongly I feel that this is the year that the analytic platform will become its own distinct and unique category.  As the market as a whole realizes the value of integrated data and process management, in-database applications and in-database analytics, the “analytic platform”, or “analytic computing system”, or “data analytics server” (pick your name) will gain even more momentum, reaching critical mass this year.

In this process, you will see significant movement from vendors, first in their marketing collateral (as is always the case for followers in a technology space) and then in scrambling to cover their product gaps in the 5 categories that define a true analytic platform, which I mentioned in Part I of 2011: The Year of the Analytics Platform.

What took Aster Data 6+ years to build cannot be done overnight, or over a few releases (side note: if you are interested in software product development and haven’t read the Mythical Man-Month, now is a good time – it’s an all-time classic and explains this point very clearly), especially if the fundamental architecture is not there from day one.

But the momentum for the analytic platform category is there and, at this point, is irreversible. Part of this powerful trend is derived from the central place that analytics is taking in the enterprise and government. Analytics today is not a luxury, but a necessity for competitiveness. Every industry today is thinking about how to employ analytics to better understand its customers, cut costs, and increase revenues. For example, companies in the financial services sector, a fiercely competitive space, want to use the wealth of data they have to become more relevant to their customers and to increase customer satisfaction and retention rates. Governments’ use of data and analytics is one of the few last resorts against terrorism and cyber threats. In retail, the advent of the Internet, social networks, and globalization has increased competition and reduced margins. Using analytics to understand cross-channel behavior and preferences of consumers improves the returns of marketing campaigns and optimizes product pricing and placement, and can make the difference between red and black ink at the bottom of the balance sheet.



26
Jan
By Tasso Argyros in Analytic platform, Analytics, Database, MapReduce on January 26, 2011
   

When we kicked off Aster Data back in 2005, we envisioned building a product that would advance the state of the art in data management in two areas: (1) size and diversity of data and (2) depth of insight/analytics. My co-founders and I quickly realized that building just another database wouldn’t cut it. With yet-another-database, even if we enabled companies to more cost-effectively manage large data sizes, it was not going to be enough given the explosion in diverse data types and the massive need to process all of it. So we set out to build a new platform that would solve these challenges – what’s now commonly known as the ‘Big Data’ challenge.

Fast forward to 2008 when Aster Data led the way in putting massive parallel processing inside an MPP database, using MapReduce, to advance how you process massive amounts of diverse data. While this was fully aligned with our vision for managing hordes of diverse data and allowing deep data processing in a single platform, most thought it was intriguing but couldn’t quite see the light in terms of where the future was going. At one point, we thought of naming our product XAP – “extreme analytic platform” or “extreme analytic processing” as that’s what it was designed to do from day one. However, we thought better of it since we thought we would have to educate people too much on what an “analytic platform” was and how it was different from a traditional DBMS for data warehousing. Since we also were serving the data architects in organizations as well as the front-line business that demands better, faster analytics, we needed to use terminology that resonated with both.

Then, in the fall of 2009, with our flagship product Aster Data nCluster 4.0, we made further strides in running advanced analytics inside the database by including all the built-in application services (e.g., dynamic WLM, backup, monitoring, etc.) to go with it. At that time, we referred to it as a Data-Application Server – which our customers quickly started calling a Data-Analytics Server. I remember when analyst Jim Kobielus at Forrester said,

“It’s really innovative and I don’t use those terms lightly. Moving application logic into the data warehousing environment is ‘a logical next step’.”

And others said,

“The platform takes a different approach from traditional data warehouses, DBMS and data analytics solutions by housing data and applications together in one system, fully parallelizing both. This eradicates the need for movements of massive amounts of data and the problems with latency and restricted access that creates.”

What they started to fully appreciate and realize is that big data is not just about storing hordes of data, but rather about cracking the code on how to process all of it in deep ways, at blazing fast speeds.



19
Nov
By Barton George in Analytics, Cloud Computing on November 19, 2010
   

Barton George is Cloud Computing and Scale-Out Evangelist for Dell.

Today at a press conference in San Francisco we announced the general availability of our Dell cloud solutions. One of the solutions we debuted was the Dell Cloud Solution for Data Analytics, a combination of our PowerEdge C servers with Aster Data’s nCluster, a massively parallel processing database with an integrated analytics engine.

Earlier this week I stopped by Aster Data’s headquarters in San Carlos, CA and met up with their EVP of marketing, Sharmila Mulligan. I recorded this video where Sharmila discusses the Dell and Aster solution and the fantastic results a customer is seeing with it.

Some of the ground Sharmila covers:

  • What customer pain points and problems does this solution address (hint: organizations trying to manage huge amounts of both structured and unstructured data)
  • How Aster’s nCluster software is optimized for Dell PowerEdge C2100 and how it provides very high performance analytics as well as a cost effective way to store very large data.
  • (2:21) InsightExpress, a leading provider of digital marketing research solutions, has deployed the Dell and Aster analytics solution and has seen great results:
    • Up and running within 6 weeks
    • Queries that took 7-9 minutes now run in 3 seconds

Pau for now…




09
Nov
By Mayank Bawa in Analytics, MapReduce on November 9, 2010
   

It’s ironic how all of a sudden Vertica is changing its focus from being a column-only database to claiming to be an Analytic Platform.

If you’ve used an Analytic Platform you know it’s more than just bolting in a layer of analytic functions on top of a database. But that’s how Vertica claims it’s now a full-blown analytic platform when in fact their analytics capability is rather thin. For instance, their first layer is a pair of window functions (CTE and CCE). The CCE window function is used, for example, to do sessionization. Vertica has a blog post that posits sessionization as a major advanced analytic operation. In truth, Vertica’s sessionization is not analytics. It is a basic data preparation step that adds a session attribute to each clickstream event so that very simple session-level analytics can be performed.

What’s interesting is the CCE window function is simply a pre-built function – some might say just syntactic sugar – that combines the functionality of finite width SQL window functions (LEAD/LAG) with CASE expressions (WHEN condition THEN result). Nothing groundbreaking, to say the least!

For example, the CTE query referred to in a Vertica blog post can be rewritten very simply using SQL-99:

SELECT
    symbol,
    bid,
    timestamp,
    -- Running count of rows where bid > 10.6, kept separately for each symbol
    SUM(CASE WHEN bid > 10.6 THEN 1 ELSE 0 END)
        OVER (PARTITION BY symbol ORDER BY timestamp) AS window_id
FROM tickstore;
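To make the LEAD/LAG-plus-CASE point concrete, here is a minimal sketch of change-event style windowing written with only standard window functions. It runs through Python's built-in sqlite3 module (SQLite 3.25 or later is needed for window functions) purely so that it can be executed as-is. The sample rows, the `ts` column name (standing in for `timestamp`), and the helper name are assumptions for illustration, and the query is meant to mimic the behavior described above rather than reproduce Vertica's exact semantics.

```python
import sqlite3

# A window id that increments whenever bid differs from the previous row
# for the same symbol, built from LAG + CASE + a running SUM.
CCE_SQL = """
SELECT symbol, bid, ts,
       SUM(changed) OVER (PARTITION BY symbol ORDER BY ts) AS window_id
FROM (
    SELECT symbol, bid, ts,
           CASE
               WHEN LAG(bid) OVER (PARTITION BY symbol ORDER BY ts) IS NULL THEN 0
               WHEN bid <> LAG(bid) OVER (PARTITION BY symbol ORDER BY ts) THEN 1
               ELSE 0
           END AS changed
    FROM tickstore
) AS flagged
ORDER BY symbol, ts
"""

# Hypothetical tick data: (symbol, bid, ts)
TICKS = [
    ("XYZ", 10.0, 1),
    ("XYZ", 10.0, 2),
    ("XYZ", 10.5, 3),
    ("XYZ", 10.5, 4),
    ("XYZ", 11.0, 5),
    ("ABC", 5.0, 1),
    ("ABC", 5.0, 2),
]


def conditional_change_windows(rows):
    """Load rows into an in-memory table and return them with a window_id column."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE tickstore (symbol TEXT, bid REAL, ts INTEGER)")
        conn.executemany("INSERT INTO tickstore VALUES (?, ?, ?)", rows)
        return conn.execute(CCE_SQL).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in conditional_change_windows(TICKS):
        print(row)
```

The inner query flags rows where the value changed, and the outer running SUM turns those flags into window numbers, which is the same pattern as the CTE rewrite with one extra LAG step.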

The layering of custom pre-built functions has for a long time been the traditional way of adding functions to a database. The SQL-99 and SQL-2003 analytic functions themselves follow this tradition.

This problem is not limited to Vertica; it also applies to the giants of the market, Oracle and Microsoft for instance. Their approach leaves the customer at the mercy of the database vendor – pre-built analytic functions are hard-coded to every major release of the DBMS. There is no independence between the analytics layer and the DBMS – which real, well-architected analytic platforms need to have. Simply put, if you want to do a different sessionization semantic, you’ll have to wait for Vertica to build a whole new function.
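As a rough illustration of what “a different sessionization semantic” can mean when the logic is user-written rather than built in, the sketch below assigns sessions with a configurable inactivity timeout. It is plain Python and not Aster's SQL-MapReduce API; the function name, the (user, timestamp) event shape, and the 30-minute default are all assumptions chosen for the example.

```python
from itertools import groupby
from operator import itemgetter


def sessionize(events, timeout_seconds=1800):
    """Attach a per-user session number to (user, timestamp) click events.

    A new session starts when the gap since the user's previous event
    exceeds timeout_seconds. Changing the rule means editing this
    function rather than waiting for a new built-in.
    """
    result = []
    # Sort by user, then by time, so each user's events are contiguous and ordered.
    ordered = sorted(events, key=itemgetter(0, 1))
    for user, user_events in groupby(ordered, key=itemgetter(0)):
        session_id = 0
        previous_ts = None
        for _, ts in user_events:
            if previous_ts is not None and ts - previous_ts > timeout_seconds:
                session_id += 1
            result.append((user, ts, session_id))
            previous_ts = ts
    return result


if __name__ == "__main__":
    clicks = [("a", 0), ("a", 100), ("a", 5000), ("b", 50), ("a", 5100)]
    print(sessionize(clicks))
    print(sessionize(clicks, timeout_seconds=60))
```

Swapping the timeout rule for, say, a rule that also breaks sessions on a change of referrer would be a small change to the inner loop, which is the kind of independence from the DBMS release cycle argued for above.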



08
Nov
By Steve Wooledge in Analytics on November 8, 2010
   

One of the coolest parts of my job is seeing how companies use analytics to drive their business. The term “big data” has become somewhat of a superstar in the world of analytics recently, but it’s also the complexity and richness of the insights from that data which make it a “big data” challenge for companies to tackle with traditional data management infrastructures. It’s not just size that matters – it’s analytical power. That is to say, what you DO with data.

And it’s not just in Silicon Valley or on Wall Street. October marked one year of the “Big Data Summit” road show which we hosted across the US to offer high-level executives, data analytic practitioners, and analysts the opportunity to share best practices and exchange ideas about solving big data problems within their industries and organizations. It was a huge success, with an average of 80-100 people attending each summit in major cities including New York, Chicago, San Francisco, Dallas, and Washington DC. We are starting the tour again later this month in New York City on November 18 and are rebranding it “Data Analytics Summit,” again because of the feedback that it’s more about the application of data in analytics and applications within a specific business area or industry.

Here are some examples. The attendees at the summits have been providing us with interesting data through surveys. The attendees are from a variety of industries, from traditional retailers to bleeding-edge digital media companies. We asked respondents questions like, “What are the biggest opportunities for benefiting from big data within the market?” Let me know if you think they missed any big opportunities. Here are a few of our findings:

- Data exploration to discover new market opportunities: Nearly 30% of respondents thought that analyzing big data to find “the next big thing” was a huge opportunity. This supports the notion that data scientist will be one of the sexiest jobs in the future.

- Behavioral targeting: 16% of those surveyed called out the importance of establishing links between purchasing behavior and areas like advertising spend to better tailor budgets and promotional campaigns

- Social Network Analysis: 15% of those surveyed responded that using social network analysis to build a more complete profile of their customer base is a key business opportunity

- Monetizing Data: 15% of respondents say monetizing data is key for organizations seeking to unlock the hidden value within previously untapped assets

- Fraud Reduction and Risk Profiling: Distinguishing good customers from bad ones, for fraud reduction (10%) and risk profiling (10%), was identified as critical for financial institutions

Another general observation from attendees is that using sampled or aggregated data is no longer a viable business option for rich analytics and there is an urgent need to analyze all available data including structured and unstructured data.

What other areas do you see? Let us know what you think or if you have any questions on the statistics. I don’t claim to be an industry analyst, but it was fun to look at the breakdown of how various cities responded to the survey.