Archive for the ‘Business intelligence’ Category

26 Nov

Speaking of ending things on a high note, New York City on December 6th will play host to the final event in the Big Analytics 2013 Roadshow series. Big Analytics 2013 New York is taking place at the Sheraton New York Hotel and Towers in the heart of Midtown on bustling 7th Avenue.

As we reflect on the illustrious journey of the Big Analytics 2013 Roadshow – kicking off in San Francisco and traveling through major destinations including Atlanta, Dallas, Beijing, Tokyo and London before culminating in the Big Apple – it truly encapsulated today's appetite for collecting, processing, understanding and analyzing data.

Photo: Big Analytics Roadshow 2013 stops in Atlanta

Drawing business & technical audiences across the globe, the roadshow afforded the attendees an opportunity to learn more about the convergence of technologies and methods like data science, digital marketing, data warehousing, Hadoop, and discovery platforms. Going beyond the “big data” hype, the event offered learning opportunities on how technologies and ideas combine to drive real business innovation. Our unyielding focus on results from data is truly what made the events so successful.

Continuing the rich lineage of delivering quality Big Data information, the New York event promises to pack a tremendous amount of Big Data learning and education. The keynotes for the event include such industry luminaries as Dan Vesset, Program VP of Business Analytics at IDC; Tasso Argyros, Senior VP of Big Data at Teradata; and Peter Lee, Senior VP of Tibco Software.

Photo: Teradata Aster team at the Dallas Big Analytics Roadshow

The keynotes will be followed by three tracks: Big Data Architecture, Data Science & Discovery, and Data-Driven Marketing. Each of these tracks will feature industry luminaries like Richard Winter of WinterCorp, John O'Brien of Radiant Advisors and John Lovett of Web Analytics Demystified. They will be joined by vendor presentations from Shaun Connolly of Hortonworks, Todd Talkington of Tableau and Brian Dirking of Alteryx.

As with every Big Analytics event, New York presents an exciting opportunity to hear firsthand from leading organizations like Comcast, Gilt Groupe and Meredith Corporation on how they are using Big Data Analytics and Discovery to deliver tremendous business value.

In summary, the event promises to be nothing less than the Oscars of Big Data and will bring together the who’s who of the Big Data industry. So, mark your calendars, pack your bags and get ready to attend the biggest Big Data event of the year.



12 Nov

I've been working in the analytics and database market for 12 years. One of the most interesting pieces of that journey has been seeing how the market is ever-shifting. Both the technology and business trends during these short 12 years have massively changed not only the tech landscape today, but also the future evolution of analytic technology. From a "buzz" perspective, I've seen "corporate initiatives" and "big ideas" come and go: everything from "e-business intelligence," which was a popular term when I first started working at Business Objects in 2001, to corporate performance management (CPM) and "the balanced scorecard," from business process management (BPM) to "big data," and now the architectures and tools that everyone is talking about.

The one golden thread that ties each of these terms, ideas and innovations together is that each aims to solve the questions related to what we today call "big data." At the core of it all, we are searching for the right way to harness and understand the explosion of data and analytics that today's organizations face. People call this the "logical data warehouse," "big data architecture," "next-generation data architecture," "modern data architecture," "unified data architecture," or (as I just saw last week) "unified data platform." What is all the fuss about, and what is really new? My goal in this post and the next few will be to explain how the customers I work with are attacking the "big data" problem. We call it the Teradata Unified Data Architecture, but whatever you call it, the goals and concepts remain the same.

Mark Beyer from Gartner is credited with coining the term "logical data warehouse," and there is an interesting story and explanation behind it. A nice summary of the term is:

"The logical data warehouse is the next significant evolution of information integration because it includes ALL of its progenitors and demands that each piece of previously proven engineering in the architecture should be used in its best and most appropriate place. …"

And

"… The logical data warehouse will finally provide the information services platform for the applications of the highly competitive companies and organizations in the early 21st Century."

The idea of this next-generation architecture is simple: When organizations put ALL of their data to work, they can make smarter decisions.

It sounds easy, but as data volumes and data types explode, so does the need for more tools in your toolbox to help make sense of it all. Within your toolbox, data is NOT all nails and you definitely need to be armed with more than a hammer.

In my view, enterprise data architectures are evolving to let organizations capture more data. That data was previously untapped because the hardware costs required to store and process such enormous amounts of data were simply too high. However, the declining cost of hardware (thanks to Moore's law) has opened the door for more data (types, volumes, etc.) and processing technologies to be successful. But no single technology can be engineered and optimized for every dimension of analytic processing, including scale, performance and concurrent workloads.

Thus, organizations are creating best-of-breed architectures by taking advantage of new technologies and workload-specific platforms such as MapReduce, Hadoop, MPP data warehouses, discovery platforms and event processing, and combining them into a seamless, transparent and powerful analytic environment. This modern enterprise architecture enables users to get deep business insights and makes ALL data available to the organization, creating competitive advantage while lowering total system cost.

But why not just throw all your data into files and put a search engine like Google on top? Why not just build a data warehouse and extend it with support for "unstructured" data? Because, in the world of big data, the one-size-fits-all approach simply doesn't work.

Different technologies are more efficient at solving different analytical or processing problems. To steal an analogy from Dave Schrader, a colleague of mine, it's not unlike a hybrid car. The Toyota Prius can average 47 mpg with its hybrid (gas and electric) drivetrain vs. 24 mpg for a "typical" gas-only car – almost double! Yet you do not pay twice as much for the car.

How'd they do it? Toyota engineered a system that uses gas when you need to accelerate quickly (recharging the battery at the same time), electric power mostly when driving around town, and braking to recharge the battery.

Three components integrated seamlessly – the driver doesn't need to know how it works. It is the same idea with the Teradata UDA, a hybrid architecture for extracting the most insight per unit of time – at least doubling your insight capabilities at a reasonable cost. And business users don't need to know all of the gory details. Teradata builds analytic engines, much like the hybrid drivetrain Toyota builds, that are optimized and used in combination with different ecosystem tools, depending on customer preferences and requirements, within the overall data architecture.

In the case of the hybrid car, battery power and the braking system that recharges the battery are the "new innovations" combined with the gas-powered engine. Similarly, several innovations in data management and analytics are shaping the unified data architecture, such as discovery platforms and Hadoop. Each customer's architecture is different depending on requirements and preferences, but the Teradata Unified Data Architecture recommends three core components of a comprehensive architecture: a data platform (often called a "data lake"), a discovery platform and an integrated data warehouse. There are other components, such as event processing, search and streaming, that can be used in data architectures, but I'll focus on the three core areas in this blog post.

Data Lakes

In many ways, this is not unlike the operational data store we've seen between transactional systems and the data warehouse, but the data lake is bigger and less structured. Any file can be "dumped" in the lake with no attention to data integration or transformation. New technologies like Hadoop provide a file-based approach to capturing large amounts of data without requiring ETL in advance. This enables large-scale refining, structuring and exploration of data prior to downstream analysis in workload-specific systems, which are used to discover new insights and then move those insights into business operations for use by hundreds of end users and applications.
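To make "no ETL in advance" concrete, here is a minimal sketch of landing a raw log file in a Hadoop-based lake exactly as it arrives and only peeking at it later, when someone wants to refine it. The file names and directory layout are invented for illustration; the commands are the standard hdfs dfs CLI verbs.

```python
import subprocess

# Hypothetical paths: a day's worth of raw web logs landing in the "lake".
LOCAL_LOGS = "/var/log/web/clicks-2013-11-26.json"   # assumed local file
LAKE_DIR = "/data/lake/raw/web_clicks/2013/11/26"    # assumed HDFS directory

# 1. Dump the raw file into HDFS as-is: no schema, no integration, no transformation.
subprocess.run(["hdfs", "dfs", "-mkdir", "-p", LAKE_DIR], check=True)
subprocess.run(["hdfs", "dfs", "-put", "-f", LOCAL_LOGS, LAKE_DIR], check=True)

# 2. Explore the untouched data later, before deciding how to structure or refine it.
sample = subprocess.run(
    ["hdfs", "dfs", "-cat", LAKE_DIR + "/clicks-2013-11-26.json"],
    check=True, capture_output=True, text=True,
).stdout.splitlines()[:5]
for line in sample:
    print(line)  # eyeball a few raw records
```

The refinement and structuring happen downstream, on whatever schedule the analysis demands, rather than as a precondition for capturing the data.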

Discovery Platforms

Discovery platforms are a new class of workload-specific system, optimized to perform multiple analytic techniques in a single workflow, combining SQL with statistics, MapReduce, graph or text analysis to look at data from multiple perspectives. The goal is ultimately to provide more granular and accurate insights to users about their business. Discovery platforms enable a faster investigative analytical process to find new patterns in data and to identify types of fraud or consumer behavior that traditional data mining approaches may have missed.
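As a toy illustration of the "multiple techniques in a single workflow" idea (my own Python sketch, not Aster syntax), the snippet below chains a SQL-style filter with a simple graph analysis to surface rings of accounts that share devices and touch a suspiciously large transaction; the data and threshold are invented.

```python
import pandas as pd
import networkx as nx

# Invented transactions; in a discovery platform this would come from a SQL step.
txns = pd.DataFrame({
    "account": ["a1", "a2", "a2", "a3", "a4", "a5"],
    "device":  ["d1", "d1", "d2", "d2", "d3", "d3"],
    "amount":  [120.0, 80.0, 3000.0, 2950.0, 40.0, 55.0],
})

# Step 1 (SQL-style): filter down to suspiciously large transactions.
suspicious = set(txns.loc[txns["amount"] > 1000, "account"])

# Step 2 (graph-style): link accounts that share a device, then find connected rings.
g = nx.Graph()
for _, group in txns.groupby("device"):
    accounts = list(group["account"])
    g.add_edges_from((a, b) for i, a in enumerate(accounts) for b in accounts[i + 1:])

rings = [ring for ring in nx.connected_components(g) if len(ring) > 1]

# Step 3: keep only the rings touched by at least one suspicious account.
flagged = [ring for ring in rings if ring & suspicious]
print(flagged)  # accounts connected through shared devices and large transactions
```

The point is not the specific libraries but the shape of the workflow: each perspective on the same data feeds the next without leaving the investigation.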

Integrated Data Warehouses

With all the excitement about what's new, companies quickly forget the value of consistent, integrated data for reuse across the enterprise. The integrated data warehouse has become a mission-critical operational system that is the point of value realization, or "operationalization," for information. The data within a massively parallel data warehouse has been cleansed and provides a consistent source of data for enterprise analytics. By integrating relevant data from across the entire organization, a couple of key goals are achieved. First, organizations can answer the kind of sophisticated, impactful questions that require cross-functional analyses. Second, they can answer questions more completely by making relevant data available across all levels of the organization. Data lakes (Hadoop) and discovery platforms complement the data warehouse by enriching it with new data and new insights that can now be delivered to thousands of users and applications with consistent performance (i.e., they get the information they need quickly).

A critical part of incorporating these novel approaches to data management and analytics is putting new insights and technologies into production in reliable, secure and manageable ways for organizations.  Fundamentals of master data management, metadata, security, data lineage, integrated data and reuse all still apply!

The excitement of experimenting with new technologies is fading. More and more, our customers are asking us about ways to put the power of new systems (and the insights they provide) into large-scale operation and production. This requires unified system management and monitoring, intelligent query routing, metadata about incoming data and the transformations applied throughout the data processing and analytical process, and role-based security that respects and applies data privacy, encryption and other required policies. This is where I will spend a good bit of time in my next blog post.



12 Jun

Back in 2005, when we first founded Aster Data, our vision was to take some of the latest technology innovations – including MPP shared-nothing architectures; Linux-based commodity hardware; and novel analytical interfaces like Google’s MapReduce – and bring them to mainstream enterprises. This vision translated into a strategy focused not only on big data innovations, but also on delivering technologies that make big data viable for enterprise environments. SQL-MapReduce®, our industry-leading patented technology that combines standard SQL processing with a native MapReduce execution environment, is one example of how we make big data enterprise ready.

Today we have reached another major milestone in providing value to our customers by announcing a significant innovation: Aster SQL-H™, a seamless way to execute SQL and SQL-MapReduce on Apache™ Hadoop™ data.

This is a significant step forward from what was state-of-the-art until yesterday: a common DBMS-Hadoop connector operating at the physical layer. That approach meant that getting data from Hadoop into a database required a Hadoop expert in the middle to do the data cleansing and the data type translation. If the data was not 100% clean (which is the case in most circumstances), a developer was needed to get it into a consistent, proper form. Besides wasting the valuable time of that expert, this process meant that business analysts couldn't directly access and analyze data in Hadoop clusters. Other database connectors require duplicating the data into HDFS using proprietary formats, a cumbersome and expensive approach by any measure.

SQL-H, an industry-first, solves all those problems.

First, we have integrated Aster's metadata engine with Hadoop's emerging metadata standard, HCatalog. This means that data stored in Hadoop using Pig, Hive or HBase can be "seen" in an Aster system as if it were just another Aster view. The business implication is that a business analyst using standard SQL or a BI tool can have full and seamless access to Hadoop data through Aster's standard ODBC/JDBC connector and Aster's SQL engine. There is no need for a human in the middle to translate the data or ensure its consistency, and no need to file tickets or call up experts to get the data the business needs. Everything happens transparently, seamlessly and instantly. This is an industry first, since today all available Hadoop tools either do not provide well-optimized standard SQL interfaces, do not provide native BI compatibility, or require manual data translation and movement from Hadoop to a third-party system. None of these approaches is a viable option for SQL and BI execution on Hadoop data, which makes it hard for enterprises to get value from Hadoop.
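To picture what that looks like from the analyst's seat, here is a minimal sketch (my illustration, not product documentation) of issuing plain SQL over ODBC against a view whose rows actually live in Hadoop. The DSN, credentials, view and column names are all hypothetical.

```python
import pyodbc

# Hypothetical DSN for an Aster system that surfaces a Hive table through HCatalog.
conn = pyodbc.connect("DSN=aster_analytics;UID=analyst;PWD=*****")
cursor = conn.cursor()

# Ordinary SQL against what looks like just another view, even though the rows
# physically live in Hadoop and were never manually translated or copied over.
cursor.execute("""
    SELECT product_category, COUNT(*) AS orders
    FROM hadoop_web_orders          -- assumed Hive table exposed via HCatalog
    WHERE order_date >= '2012-01-01'
    GROUP BY product_category
    ORDER BY orders DESC
""")

for category, orders in cursor.fetchall():
    print(category, orders)

conn.close()
```

A BI tool would generate much the same SQL through the same ODBC/JDBC path, which is exactly why no Hadoop expert needs to sit in the middle.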

Second, SQL-H provides a high-performance, type-safe data connector that can take a SQL or SQL-MapReduce query involving Hadoop data, automatically select the minimum subset of data in Hadoop required to execute the query, and run the query on the Aster system. The performance of running SQL and SQL-MapReduce analytics in Aster is significantly higher than in Hadoop because (a) Aster can optimize data partitioning and distribution, thus reducing network transfers and overhead; (b) Aster's engine can keep statistics about the data and use them to optimize execution of both SQL and MapReduce; and (c) Aster's SQL queries are cost-based optimized, which means it can handle very complex SQL, including SQL produced by BI tools, very efficiently.

In addition, one can take advantage of SQL-H to apply the 50+ pre-built SQL-MapReduce apps that Teradata Aster provides to Hadoop data, doing big data analytics that are impossible in any other database, without having to write a single line of Java MapReduce code! These apps include functions for path and pattern analysis, statistics, graph and text analysis, and more.

Teradata Aster is committed to groundbreaking product innovation as the key strategy in maintaining our #1 position in the big analytics market. SQL-H is another important step that we expect will make Hadoop and big data analytics much more palatable for enterprise environments, allowing business analysts, SQL power-users & BI tool users to analyze Hadoop data without having to learn about Hadoop interfaces and code.

If you want to find out more, we'll be talking about SQL-H at the Hadoop Summit, on a webcast taking place June 21st, at the upcoming Big Analytics 2012 events in Chicago and New York, and at the annual Teradata Partners event.



21 Mar

The conversation around "big data" has been evolving beyond a technology discussion to focus on analytics and applications to the business. As such, we've worked with our partners and customers to expand the scope of the Big Data Summit events we initiated back in 2009 and have created Big Analytics 2012 – a new series of roadshow events kicking off in San Francisco on April 19, 2012.

According to previous attendees and market surveys, the greatest big data application opportunities in businesses are:

- Digital marketing applications such as multi-channel analytics and testing to better understand and engage your customers

- Using data science and analytics to explore and develop new markets or data-driven services

Companies like LinkedIn, Edmodo, eBay,  and others have effectively applied data science and analytics to take advantage of the new economics of data. And they are ready to share details of what they have learned along the way.

Big Analytics 2012 is a half-day event, is absolutely free to attend, and will include insight from industry insiders in two different tracks: Digital Marketing Optimization, and Data Science and Analytics. Big Analytics 2012 is a great way to meet and hear from your peers: executives who want to learn how to turn advanced analytics into a competitive advantage; interactive marketing innovators who want access to "game changing" insights for digital marketing optimization; enterprise architects and business intelligence professionals looking to provide big data infrastructure; and data scientists and business analysts who are responsible for developing new data-driven products or business insights.

Come to learn from the panel of experts and stay for an evening networking reception that will put you in touch with big data and analytics professionals from throughout the industry. Big Analytics 2012 will be coming soon to a city near you. Click here to learn more about the event and to register now.




19 Aug

I am curious: is anyone out there attending the TDWI World Conference in San Diego this week? If so, and you would like to meet up with me, please do drop me a line or comment below, as I will be in attendance. I'm of course very excited to be making the trip to sunny San Diego and hope to catch a glimpse of Ron Burgundy and the channel 4 news team! :-)

But of course it’s not all fun and games, as I’ll participate in one of TDWI’s famous Tool Talk evening sessions discussing data warehouse appliances. This should make for some great dialogue between me and other database appliance players, especially given the recent attention our industry has seen. I think Aster has a really different approach to analyzing big data and look forward to discussing exactly why.

For those interested in the talk, here are the details. Come on by and let's chat!
What: TDWI Tool Talk session on data warehouse appliances
When: Wednesday, August 20, 2008 @ 6:00 p.m.
Where: Manchester Grand Hyatt, San Diego, CA



05 Aug
By Mayank Bawa in Analytics, Blogroll, Business analytics, Business intelligence, Database on August 5, 2008

Today we are pleased to welcome Pentaho as a partner to Aster Data Systems. What this means is that our customers can now use Pentaho open-source BI products for reporting and analysis on top of Aster nCluster.

We have been working with Pentaho for some time on testing the integration between their BI products and our analytic database. We’ve been impressed with Pentaho’s technical team and the capabilities of the product they’ve built together with the open source community. Pentaho recently announced a new iPhone application which is darn cool!

I guess, by induction, Aster results can be seen on the iPhone too. :-)



25 Jul

Stuart announced yesterday that Microsoft has agreed to acquire DATAllegro. It is pretty clear Stuart and his team have worked hard for this day: it is heartening to see that hard work gets rewarded sooner or later. Congratulations, DATAllegro!

Microsoft is clearly acquiring DATAllegro for its technology. Indeed, Stuart says that DATAllegro will start porting away from Ingres to SQL Server once the acquisition completes. Microsoft’s plan is to provide a separate offering from its traditional SQL Server Clustering.

In effect, this event provides a second admission from a traditional database vendor that OLTP databases are not up to the task of large-scale analytics. The first admission came in the 1990s, when Sybase (ironically, the originator of the SQL Server code base) offered Sybase IQ as a separate product from its OLTP offering.

The market already knew this fact: the key point here is that Microsoft is waking up to the realization.

A corollary is that it must have been really difficult for the Microsoft SQL Server division to scale SQL Server for larger deployments. Clearly, Microsoft is an engineering shop, and the effort of integrating alien technology into their SQL Server code base must have been carefully evaluated in a build-vs-buy decision. The buy decision is a tacit admission that it is incredibly hard to scale their SQL Server offering, with its roots in a traditional OLTP database.

We can expect Oracle, IBM and HP to have similar problems scaling their 1980s code bases for the data volumes and query workloads of today's data warehousing systems. Will the market wait for their scaling efforts to come to fruition? Or will Oracle, IBM and HP soon acquire companies to improve their own scalability?

It is interesting to note that DATAllegro will be moving to an all-Microsoft platform. The acquisition could also be read as a defensive move by Microsoft. All of the large-scale data warehouse offerings today are based on Unix variants (Unix/Linux/Solaris), leading to an uncomfortable situation at some all-Microsoft shops that chose to run Unix-based data warehouse offerings because SQL Server would not scale. Microsoft needed an offering that could keep its enterprise-wide customers on Microsoft platforms.

Finally, there is a difference in philosophy between Microsoft's and DATAllegro's product offerings. Microsoft SQL Server has sought to cater to the lower end of the BI spectrum; DATAllegro has actively courted the higher end. Correspondingly, DATAllegro uses powerful servers, fast storage and an expensive interconnect to deliver a solution, while Microsoft SQL Server has sought to deliver a solution at a much lower cost. We can only wait and watch: will the algorithms of one philosophy work well in the infrastructure of the other?

At Aster Data Systems, we believe that the market dynamics will not change as a result of this acquisition: companies will want the best solutions to derive the most value from data. In the last decade, the Internet changed the world and old-market behemoths could not translate their might into the new market. In this decade, Data will produce a similar disruption.



20 May
By Mayank Bawa in Analytics, Blogroll, Business analytics, Business intelligence on May 20, 2008

I’ve remarked in an earlier post that the usage of data is changing and new applications are on the horizon. Over the past few years, we’ve observed or invented quite a few interesting design patterns for business processes that use data.

There are no books or tutorials for these new applications, and they are certainly not being taught in the classrooms of today. So I figured I’d share some of these design patterns on our blog.

Let me start with a design pattern that we internally call “The Automated Feedback Loop”. I didn’t invent it but I’ve seen it being applied successfully at search engines during my research days at Stanford University. I certainly think there is a lot of power that remains to be leveraged from this design principle in other verticals and applications.

Consider a search engine. Users ask keyword queries. The search engine ranks documents that match the queries and provides 10 results to the user. The user clicks one of these results, perhaps comes back and clicks another result, and then does not come back.

How do search engines improve themselves? One key way is by recording the number of times users clicked or ignored a result page. They also record the speed with which a user returned from that page to continue his exploration: the quicker the user returned, the less relevant the page was for the user's query. The relevancy of a page then becomes a factor in the ranking function itself for future queries.

So here is an interesting feedback loop: we offered options (search results) to the user, and the user provided us feedback (came back or not) on how good one option was compared to the others. We then used this knowledge to adapt and improve future options. The more the user engages, the more everyone wins!
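To make the loop concrete, here is a bare-bones sketch (mine, not how a production search engine is actually built) in which clicks and quick bounces recorded for each result feed back into the score used to rank future results; the weights and threshold are arbitrary.

```python
from collections import defaultdict

# Per-page feedback counters: clicks and "quick bounces" (the user came right back).
clicks = defaultdict(int)
bounces = defaultdict(int)

def record_feedback(page, clicked, dwell_seconds, bounce_threshold=10):
    """Log one user interaction with a search result."""
    if clicked:
        clicks[page] += 1
        if dwell_seconds < bounce_threshold:
            bounces[page] += 1  # a quick return suggests the page wasn't relevant

def feedback_score(page):
    """Fraction of clicks that were NOT quick bounces, smoothed for unseen pages."""
    c, b = clicks[page], bounces[page]
    return (c - b + 1) / (c + 2)

def rank(candidates, base_score):
    """Blend the original relevance score with the learned feedback signal."""
    return sorted(candidates,
                  key=lambda p: 0.7 * base_score[p] + 0.3 * feedback_score(p),
                  reverse=True)

# Tiny usage example with made-up pages and base relevance scores.
base = {"pageA": 0.9, "pageB": 0.8, "pageC": 0.4}
record_feedback("pageA", clicked=True, dwell_seconds=3)    # clicked, bounced fast
record_feedback("pageB", clicked=True, dwell_seconds=120)  # clicked and stayed
print(rank(["pageA", "pageB", "pageC"], base))             # pageB now outranks pageA
```

Every new interaction nudges the ranking automatically, with no analyst reading reports in between.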

This same pattern could hold true in a lot of consumer-facing applications that provide consumers with options.

Advertising networks, direct marketing companies and social networking sites are taking consumer feedback into account. However, in most companies today this feedback loop is manual, not automated. Usually the optimization (adapting to user response) is done by domain experts who read historical reports from their warehouses, build an intuition of user needs and then apply that intuition to build a model that runs everything from marketing campaigns to supply chain processes.

Such a manual feedback loop has two significant drawbacks:

1. The process is expensive: it takes a lot of time, trial and error for humans to become experts, and as a result the experts are hard to find and worth their weight in gold.

2. The process is ineffective: humans can only think about a handful of parameters, and they optimize for the most popular products or processes (e.g., "Top 5 products" or "Top 10 destinations"). Everything outside this comfort zone is left under-optimized.

Such a narrow focus on optimization is severely limiting. Incorporating only the Top 10 trends into future behavior is akin to a search engine saying it will optimize for only the top 10 searches of the quarter. I am sure Google would be a far less valuable company then, and the world a less engaging place.

I strongly believe that there are rich dividends to be reaped if we can automate the feedback process in more consumer-facing areas. What about hotel selection, airline travel and e-mail marketing campaigns? E-tailers, news sites (content providers), insurers, banks and media sites are all offering the consumer choices for his or her time and money. Why not instill an automated feedback loop in all consumer-facing processes to improve the consumer experience? The world will be a better place for both the consumer and the provider!