Archive for the ‘Analytics’ Category
Barton George is Cloud Computing and Scale-Out Evangelist for Dell.
Today at a press conference in San Francisco we announced the general availability of our Dell cloud solutions. One of the solutions we debuted was the Dell Cloud Solution for Data Analytics, a combination of our PowerEdge C servers with Aster Data’s nCluster, a massively parallel processing database with an integrated analytics engine.
Earlier this week I stopped by Aster Data’s headquarters in San Carlos, CA and met up with their EVP of marketing, Sharmila Mulligan. I recorded this video where Sharmila discusses the Dell and Aster solution and the fantastic results a customer is seeing with it.
Some of the ground Sharmila covers:
- What customer pain points and problems does this solution address (hint: organizations trying to manage huge amounts of both structured and unstructured data)
- How Aster’s nCluster software is optimized for Dell PowerEdge C2100 and how it provides very high performance analytics as well as a cost effective way to store very large data.
- (2:21) InsightExpress, a leading provider of digital marketing research solutions, has deployed the Dell and Aster analytics solution and has seen great results:
- Up and running within 6 weeks
- Queries that took 7-9 minutes now run in 3 seconds
Pau for now…
It’s ironic how all of a sudden Vertica is changing its focus from being a column-only database to claiming to be an Analytic Platform.
If you’ve used an Analytic Platform you know it’s more than just bolting in a layer of analytic functions on top of a database. But that’s how Vertica claims it’s now a full-blown analytic platform when in fact their analytics capability is rather thin. For instance, their first layer is a pair of window functions (CTE and CCE). The CCE window function is used, for example, to do sessionization. Vertica has a blog post that posits sessionization as a major advanced analytic operation. In truth, Vertica’s sessionization is not analytics. It is a basic data preparation step that adds a session attribute to each clickstream event so that very simple session-level analytics can be performed.
What’s interesting is the CCE window function is simply a pre-built function – some might say just syntactic sugar – that combines the functionality of finite width SQL window functions (LEAD/LAG) with CASE expressions (WHEN condition THEN result). Nothing groundbreaking, to say the least!
For example, the CTE query referred to in a Vertica blog post can be rewritten very simply using SQL-99:
-- Running count, per symbol in timestamp order, of rows where bid > 10.6
SELECT
    symbol, bid, timestamp,
    SUM(CASE WHEN bid > 10.6 THEN 1 ELSE 0 END)
        OVER (PARTITION BY symbol ORDER BY timestamp) AS window_id
FROM tickstore;
The layering of custom pre-built functions has for a long time been the traditional way of adding functions to a database. The SQL-99 and SQL-2003 analytic functions themselves follow this tradition.
The problem with this is not just with Vertica but also with the giants of the market, Oracle and Microsoft for instance. Their approach is that the customer is at the mercy of the database vendor – pre-built analytic functions are hard-coded to every major release of the DBMS. There is no independence between the analytics layer and the DBMS – which real, well-architected analytic platforms need to have. Simply put, if you want to do a different sessionization semantic, you’ll have to wait for Vertica to build a whole new function.
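The LAG-plus-CASE idea described above can be tried out in a few lines. What follows is only a sketch, not Vertica's or anyone else's implementation: it uses Python's built-in sqlite3 module (window functions need SQLite 3.25 or later), and the `clicks` table, its columns, the sample rows, and the 30-minute timeout are all assumptions made up for illustration.

```python
import sqlite3

# Hypothetical clickstream rows: (user_id, ts in seconds). Illustrative data only.
CLICKS = [
    ("u1", 0), ("u1", 100), ("u1", 2500), ("u1", 2600),
    ("u2", 50), ("u2", 4000),
]

# Inner query: flag a row as the start of a new session when the gap to the
# previous click of the same user exceeds the timeout (LAG + CASE).
# Outer query: a running SUM of those flags yields a per-user session number.
SESSIONIZE = """
SELECT user_id, ts,
       SUM(new_session) OVER (PARTITION BY user_id ORDER BY ts) AS session_id
FROM (
    SELECT user_id, ts,
           CASE WHEN ts - LAG(ts) OVER (PARTITION BY user_id ORDER BY ts) > :timeout
                THEN 1 ELSE 0 END AS new_session
    FROM clicks
)
ORDER BY user_id, ts
"""


def sessionize(rows, timeout=1800):
    """Return (user_id, ts, session_id) tuples for the given click rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE clicks (user_id TEXT, ts INTEGER)")
    conn.executemany("INSERT INTO clicks VALUES (?, ?)", rows)
    result = conn.execute(SESSIONIZE, {"timeout": timeout}).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in sessionize(CLICKS):
        print(row)
```

Changing the session semantic here (a different timeout, or a boundary triggered by some other condition) only means editing the CASE expression, which is the kind of flexibility the post argues a pre-built function does not give you.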
By Steve Wooledge in Analytics on November 8, 2010 |
One of the coolest parts of my job is seeing how companies use analytics to drive their business. The term “big data” has become somewhat of a superstar in the world of analytics recently, but it’s also the complexity and richness of the insights from that data which make it a “big data” challenge for companies to tackle with traditional data management infrastructures. It’s not just size that matters – it’s analytical power. That is to say, what you DO with data.
And it’s not just in Silicon Valley or on Wall Street. October marked one year of the “Big Data Summit” road show which we hosted across the US to offer high-level executives, data analytics practitioners, and analysts the opportunity to share best practices and exchange ideas about solving big data problems within their industries and organizations. It was a huge success with an average of 80-100 people attending each summit in major cities including New York, Chicago, San Francisco, Dallas, and Washington DC. We are starting the tour again later this month in New York City on November 18 and are rebranding it “Data Analytics Summit,” again because of the feedback that it’s more about the application of data in analytics and applications within a specific business area or industry.
Here are some examples. The attendees at the summits have been providing us with interesting data through surveys. The attendees are from a variety of industries, from traditional retailers to bleeding-edge digital media companies. We asked respondents questions like, “What are the biggest opportunities for benefiting from big data within the market?” Let me know if you think they missed any big opportunities. Here are a few of our findings:
- Data exploration to discover new market opportunities: Nearly 30% of respondents thought that analyzing big data to find “the next big thing” was a huge opportunity. This supports the notion that data scientist will be one of the sexiest jobs in the future.
- Behavioral targeting: 16% of those surveyed called out the importance of establishing links between purchasing behavior and areas like advertising spend to better tailor budgets and promotional campaigns.
- Social Network Analysis: 15% of those surveyed responded that using social network analysis to build a more complete profile of their customer base is a key business opportunity.
- Monetizing Data: 15% of respondents say monetizing data is key for organizations seeking to unlock the hidden value within previously untapped assets.
- Fraud Reduction and Risk Profiling: Distinguishing good customers from bad ones, for fraud reduction (10%) and risk profiling (10%), was identified as critical for financial institutions.
Another general observation from attendees is that using sampled or aggregated data is no longer a viable business option for rich analytics and there is an urgent need to analyze all available data including structured and unstructured data.

What other areas do you see? Let us know what you think or if you have any questions on the statistics. I don’t claim to be an industry analyst, but it was fun to look at the breakdown of how various cities responded to the survey.
In the recently announced nCluster 4.6 we continue to innovate and improve nCluster on many fronts to make it the high performance platform of choice for deep, high value analytics. One of the new features is a hybrid data store, which now gives nCluster users the option of storing their data in either a row or column orientation. With the addition of this feature, nCluster is the first data warehouse and analytics platform to combine a tightly integrated hybrid row- and column-based storage with SQL-MapReduce processing capabilities. In this post we’ll discuss the technical details of the new hybrid store as well as the nCluster customer workloads that prompted the design.
Row- and Column-store Hybrid
Let’s start with the basics of row and column stores. In a row store, all of the attribute values for a particular record are stored together in the same on-disk page. Put another way, each page contains one or more entire records. Such a layout is the canonical database design found in most database textbooks, as well as both open source and commercial databases. A column store flips this model around and stores values for only one attribute on each on-disk page. This means that to construct, say, an entire two-attribute record will require data from two different pages in a column store, whereas in a row-store the entire record would be found on only one page. If a query needs only one attribute in that same two-attribute table, then the column store will deliver more needed values per page read. The row store must read pages containing both attributes even though only one attribute is needed, wasting some I/O bandwidth on the unused attribute. Research has shown that for workloads where a small percentage of attributes in a table are required, a column oriented storage model can result in much more efficient I/O because only the required data is read from disk. As more attributes are used, a column store becomes less competitive with a row store because there is an overhead associated with combining the separate attribute values into complete records. In fact, for queries that access many (or all!) attributes of a table, a column store performs worse and is the wrong choice. Having a hybrid store provides the ability to choose the optimal storage for a given query workload.
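To make the I/O trade-off above concrete, here is a back-of-the-envelope sketch. It is a deliberately simplified model and says nothing about how nCluster actually lays out pages: the function names, the fixed attribute width, the page size, and the table dimensions are all assumptions chosen for illustration, and compression and record-reconstruction cost are ignored.

```python
def ceil_div(a, b):
    """Integer ceiling division."""
    return -(-a // b)


def row_store_pages(num_rows, num_attrs, attr_bytes, page_bytes):
    """Pages read by a full scan of a row store: every attribute travels along."""
    return ceil_div(num_rows * num_attrs * attr_bytes, page_bytes)


def column_store_pages(num_rows, attrs_needed, attr_bytes, page_bytes):
    """Pages read by a column store: only the columns the query touches."""
    return attrs_needed * ceil_div(num_rows * attr_bytes, page_bytes)


if __name__ == "__main__":
    # Assumed table: 1M rows, 20 attributes of 8 bytes each, 8 KB pages.
    rows, attrs, width, page = 1_000_000, 20, 8, 8192
    print("row store, any query:", row_store_pages(rows, attrs, width, page))
    for needed in (1, 2, 10, 20):
        print(f"column store, {needed} attrs:",
              column_store_pages(rows, needed, width, page))
```

Under these assumptions a query touching 2 of the 20 attributes reads about a tenth of the pages in the column layout, while a query touching all 20 reads no fewer pages than the row store and still has to stitch the records back together.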
Aster Data customers have a wide range of analytics use cases from simple reporting to advanced analytics such as fraud detection, data mining, and time series analysis. Reports typically ask relatively simple questions of data such as total sales per region or per month. Such queries tend to require only a few attributes and therefore benefit from columnar storage. In contrast, deeper analytics such as applying a fraud detection model to a large table of customer behaviors relies on applying that model to many attributes across many rows of data. In that case, a row store makes a lot more sense.
Clearly there are cases where having both a column and row store benefits an analytics workload, which is why we have added the hybrid data store feature to nCluster 4.6.
Performance Observations
What does the addition of a hybrid store mean for typical nCluster workloads? The performance improvements from reduced I/O can be considerable: a 5x to 15x speedup was typical in some in-house tests on reporting queries. These queries were generally simple reporting queries with a few joins and aggregation. Performance improvement on more complex analytics workloads, however, was highly variable, so we took a closer look at why. As one would expect (and a number of columnar publications demonstrate), we also find that queries that use all or almost all attributes in a table benefit little or are slowed down by columnar storage. Deep analytical queries in nCluster like scoring, fraud detection, and time series analysis tend to use a higher percentage of columns. Therefore, as a class, they did not benefit as much from columnar, but when these queries do use a smaller percentage of columns, choosing the columnar option in the hybrid store provided good speedup.
A further reason that these more complex queries benefit less from a columnar approach is Amdahl’s law. As we push more complex applications into the database via SQL-MapReduce, we see a higher percentage of query time spent running application code rather than reading or writing from disk. This highlights an important trend in data analytics: user CPU cycles per byte is increasing, which is one reason that deployed nCluster nodes tend to have a higher CPU per byte ratio than one might expect in a data warehouse. The takeaway message is that the hybrid store provides an important performance benefit for simple reporting queries. For analytical workloads that include a mix of ad hoc and simple reporting queries, performance is maximized by choosing the data orientation that is best suited to each workload.
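The Amdahl's law argument above can be put into numbers with a short sketch. The I/O fractions and the 10x I/O speedup below are assumed values for illustration, not measurements from nCluster.

```python
def amdahl_speedup(io_fraction, io_speedup):
    """Overall speedup when only the I/O share of query time gets faster.

    io_fraction: share of total query time spent on I/O (0.0 to 1.0).
    io_speedup:  factor by which that I/O share is accelerated.
    """
    return 1.0 / ((1.0 - io_fraction) + io_fraction / io_speedup)


if __name__ == "__main__":
    # Assumed: columnar storage makes the I/O portion 10x faster.
    for label, io_fraction in (("I/O-bound report", 0.9),
                               ("CPU-heavy analytics", 0.3)):
        print(f"{label}: {amdahl_speedup(io_fraction, 10):.2f}x overall")
```

With 90% of the time in I/O, a 10x faster read path gives a bit over 5x overall. With only 30% in I/O, the same improvement yields less than 1.4x, which is the effect described for SQL-MapReduce workloads that spend most of their time in application code.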
Implementation
The hybrid store is made possible by integrating a column store within the nCluster data storage and query-processing engine, which already used row-storage. The new column storage is tightly integrated with existing query processing and system services. This means that any query answerable by the existing Aster storage engine can now also be answered in our hybrid store, whether the data is stored in row or column orientation. Moreover, all SQL-MapReduce features, workload management, replication, fail-over, and cluster backup features are available to any data stored in the hybrid store.
Providing flexibility and high performance on a wide range of workloads makes Aster Data the best platform for high value analytics. To that end, we look forward to continuing development of the nCluster hybrid storage engine to further optimize row and column data access. Coupled with workload management and SQL-MapReduce, the new hybrid nCluster storage highlights Aster Data’s commitment to provide nCluster users with the most flexibility to make the most of their data.
Coming out of Stanford to start Aster Data five years back, my co-founders and I had to answer a lot of questions. What kind of an engineering team do we want to build? Do we want people experienced in systems or databases? Do we want to hire people from Oracle or another established organization? When you’re just starting a company, embarking on a journey that you know will have many turns, answers are not obvious.
What we ended up doing very early on is bet on intelligent and adaptable engineers, as opposed to experience or a long resume. It turned out that this was the right thing to do because, as a startup, we had to react to market needs and change our focus in the blink of an eye. Having a team of people that were used to tackling never-seen-before problems made us super-agile as a product organization. As the company grew, we ended up having a mix of people that combined expertise in certain areas and core engineering talent. But the culture of the company was set in stone even though we didn’t realize it: even today our interview process expects talent, intelligence and flexibility to be there and strongly complement the experience our candidates may have.
There are three things that are great about being an engineer at Aster Data:
Our Technology Stack is Really Tall.
We have people working right above the Kernel on filesystems, workload management, I/O performance, etc. We have many challenging problems that involve very large scale distributed systems – and I’m talking about the whole nine yards, including performance, reliability, manageability, and data management at scale. We have people working on database algorithms from the I/O stack to the SQL planner to no-SQL planners. And we have a team of people working on data mining and statistical algorithms on distributed systems (this is our “quant” group since people there come with a background in physics as much as computer science). It’s really hard to get bored or stop learning here.
We Build Real Enterprise Software.
There’s a difference between the software one would write in a company like Aster Data versus a company like Facebook. Both companies write software for big data analysis. However, a company like Facebook solves their problem (a very big problem, indeed) for themselves and each engineer gets to work on a small piece of the pie. At Aster Data we write software for enterprises and due to our relatively small size each engineer makes a world of a difference. We also ship software to third-party people and they expect our software to be out-of-the-box resilient, reliable and easy to manage/debug. This makes the problem more challenging but also gives us great leverage: once we get something right, not one, nor two, but potentially hundreds or thousands of companies can benefit from our products. The impact of the work of each engineer at Aster Data is truly significant.
We’re Working on (Perhaps) the Biggest IT Revolution of the 21st Century.
Big Data. Analytics. Insights. Data Intelligence. Commodity hardware. Cloud/elastic data management. You name it. We have it. When we started Aster Data in 2005 we just wanted to help corporations analyze the mountains of data that they generate. We thought it was a critical problem for corporations if they wanted to remain competitive and profitable. But the size and importance of data grew beyond anyone’s expectations over the past few years. We can probably thank Google, Facebook and the other internet companies for demonstrating to the world what data analytics can do. Given the importance and impact of our work, there’s no ceiling on how successful we can become.
You’ve probably guessed it by now, but the reason I’m telling you all this is to also tell you that we’re hiring. If you think you have what it takes to join such an environment, I’d encourage you to apply. We get many applications daily so the best way to get an interview here is through a recommendation and referral. With tools like LinkedIn (who happens to be a customer) it’s really easy to explore your network. My LinkedIn profile is here, so see if we have a professional or academic connection. You can also look at our management team, board of directors, investors and advisors to see if there are any connections there. If there’s no common connection, feel free to email your resume to jobs@asterdata.com. However, to stand out I’d encourage you to tell us a couple of words about what excites you about Aster Data, large scale distributed systems, databases, analytics and/or startups that work to revolutionize an industry, and why you think you’ll be successful here. Finally, take a look at the events we either organize or participate in – it’s a great way to meet someone from our team and explain why you’re excited to join our quest to revolutionize data management and analytics.
By Tasso Argyros in Analytics on August 9, 2010 |
Watching our customers use Aster Data to discover new insights and build new big data products is one of the most satisfying parts of my job. Having seen this process a few times, I found that it always has the same steps:
An Idea or Concept – Someone comes up with an idea of a treasure that could be hidden in the data, e.g. a new customer segment that could be very profitable, a new pattern that reveals novel cases of fraud, or other event-triggered analysis.
Dataset – An idea based on data that doesn’t exist is like a great recipe without the ingredients. Hopefully the company has already deployed one or more big data repositories that have the necessary data in full detail (no summaries, sampling, etc). If that’s not the case, data has to be generated, captured and moved to a big data-analytics server, which is an MPP database with a fully integrated analytics engine, like Aster Data’s solution. It addresses both parts of the big data need – scalable data storage and data processing.
Iterative Experimentation – This is the fun part. In contrast to traditional reporting, where the idea translates almost automatically to a query or report (e.g.: I want to know average sales per store for the past 2 years), a big data product idea (e.g.: I want to know what is my most profitable customer segment) requires building an intuition about the data before coming up with the right answer. This can only be achieved by a large number of analytical queries using either SQL or MapReduce, and it’s the step where the analyst or data scientist builds their intuition and understanding of the dataset and of the hidden gems buried there.
Data Productization – Once iterative experimentation provides the data scientist with evidence of gold, the next step is to make the process repeatable so that its output can be systematically used by humans (e.g. marketing department) or systems (e.g. a credit card transaction clearing system that needs to identify fraudulent transactions). This requires not only a repeatable process but also data that’s certified to be of high quality and processing that can meet specific SLAs, always while using a hybrid of SQL and MapReduce for deep big data analysis.
If you think about it, this process is similar to the process of coming up with a new product (software or otherwise). You start with an idea, you then get the first material and build a lot of prototypes. I’ve found that people who find an important and valuable data insight after a process of iterative experimentation feel the same satisfaction as an inventor who has just made a huge discovery. And the next natural step is to take that prototype, make it a repeatable manufacturing process and start using it in the real world.
In the “old” world of simple reporting, the process of creating insights was straightforward. Correspondingly, the value of the outcome (reports) was much lower and easily replicable by everyone. Big Data Analytics, on the other hand, requires a touch of innovation and creativity, which is exactly why it is hard to replicate and why its results produce such important and sustainable advantages for businesses. I believe that Big Data Products are the next wave of corporate value creation and competitive differentiation.
I have always enjoyed the subtle irony of someone trying to be impressive by saying “my data warehouse is X Terabytes” [muted: “and it’s bigger than yours”]! Why is this ironic? Because it describes a data warehouse, which is supposed to be all about data processing and analysis, using a storage metric. Having an obese 800 Terabyte system that may take hours or days to do just a single pass over the data is not impressive and definitely calls for a diet.
Surprisingly though, several vendors went down the path of making their data warehousing offerings fatter and fatter. Greenplum is a good example. Prior to Sun’s acquisition by Oracle, they were heavily pushing systems based on the Sun Thumper, a 48-disk-heavy 4U box that can store up to 100TBs/box. I was quite familiar with that box as it partly came out of a startup called Kealia that my Stanford advisor, David Cheriton, and Sun co-founder Andy Bechtolsheim had founded and then sold to Sun in 2004. I kept wondering, though, what a 50TB/CPU configuration has to do with data analytics.
After long deliberation I came to the conclusion that it has nothing to do with it. There were two reasons why people were interested in this configuration. First, there were some use cases that required “near-line storage”, a term that’s used to describe a data repository whose major purpose is to store data but also allows for basic & infrequent data access. In that respect, Greenplum’s software on top of the Sun Thumpers represented a cheap storage solution that offered basic data access and was very useful for applications where processing or analytics was not the main focus.
The second reason for the interest, though, is a tendency to drive DW projects towards an absolute low per-TB price to reduce costs. Experienced folks will recognize that such an approach leads to disaster, because (as mentioned above) analytics is more than just Terabytes. Perfectly low per-TB price using fat storage looks great on glossy paper but in reality it’s no good because nobody’s analytical problems are that simple.
The point here is that analytics has more to do with processing than with storage. It requires a fair number of balanced servers (thus good scalability & fault tolerance), CPU cycles, networking bandwidth, smart & efficient algorithms, fair amounts of memory to avoid thrashing, etc. It’s also about how much processing can be done in SQL, and how much of your analytics needs to use next-generation interfaces like MapReduce or pre-packaged in-database analytical engines. In the new decade on which we’re embarking, solving business problems like fraud, market segmentation & targeting, financial optimization, etc., requires much more than just cheap, overweight storage.
So, turning to the EMC/Greenplum news, I think such an acquisition makes sense, but in a specific way. It will lead to systems that live between storage and data warehousing, systems able to store data and also give the ability to retrieve it on an occasional basis or if the analysis required is trivial. But the problems Aster is excited about are those of advanced in-database analytics for rich, ad hoc querying, delivered through a full application environment inside an MPP database. It’s these problems that we see as opportunities to not only cut IT costs but also provide tremendous competitive advantages to our customers. And on that front, we promise to continue innovating and pushing the limits of technology as much as possible.
There is a lot of talk these days about relational vs. non-relational data. But what about analytics? Does it make sense to talk about relational and non-relational analytics?
I think it does. Historically, a lot of data analysis in the enterprise has been done with pure SQL. SQL-based analysis is a type of “relational analysis,” which I define as analysis done via a set-based declarative language like SQL. Note how SQL treats every table as a set of values; SQL statements are relational set operations; and any intermediate SQL results, even within the same query, need to follow the relational model. All these are characteristics of a relational analysis language. Although recent SQL standards define the language to be Turing Complete, meaning you can implement any algorithm in SQL, in practice implementing any computation that departs from the simple model of sets, joins, groupings, and orderings is severely sub-optimal, in terms of performance or complexity.
On the other hand, an interface like MapReduce is clearly non-relational in terms of its algorithmic and computational capabilities. You have the full flexibility of a procedural programming language, like C or Java; MapReduce intermediate results can follow any form; and the logic of a MapReduce analytical application can implement almost arbitrary formations of code flow and data structures. In addition, any MapReduce computation can be automatically extended to a shared-nothing parallel system, which implies the ability to crunch large amounts of data. So MapReduce is one version of “non-relational” analysis.
So Aster Data’s SQL-MapReduce becomes really interesting if you see it as a way of doing non-relational analytics on top of relational data. In Aster Data’s platform, you can store your data in a purely relational form. By doing that, you can use popular RDBMS mechanisms to achieve things like adherence to a data model, security, compliance, integration with ETL or BI tools, etc. The similarities, however, stop there, because you can then use SQL-MapReduce to do analytics that were never possible before in an RDBMS: they are MapReduce-based and non-relational, and they extend to TBs or PBs. That includes a large number of analytical applications like fraud detection, network analysis, graph algorithms, data mining, etc.
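As a rough illustration of non-relational analysis over relational rows, here is a toy, single-process map/reduce in Python. It is not Aster Data's SQL-MapReduce API, whose interface is not shown in this post; `map_reduce`, `by_user`, `click_path`, and the sample rows are hypothetical names and data invented for the sketch.

```python
from collections import defaultdict


def map_reduce(rows, mapper, reducer):
    """Toy map/reduce: map each row, group values by key, reduce each group."""
    groups = defaultdict(list)
    for row in rows:
        for key, value in mapper(row):
            groups[key].append(value)
    return {key: reducer(key, values) for key, values in groups.items()}


# Rows as they might come out of a relational table: (user, ts, page).
ROWS = [
    ("alice", 3, "checkout"), ("alice", 1, "home"), ("alice", 2, "cart"),
    ("bob", 1, "home"), ("bob", 2, "search"),
]


def by_user(row):
    """Map step: key each event by user; the value is an arbitrary tuple."""
    user, ts, page = row
    yield user, (ts, page)


def click_path(user, events):
    """Reduce step: procedural code building an ordered path, not a set."""
    return [page for _, page in sorted(events)]


if __name__ == "__main__":
    print(map_reduce(ROWS, by_user, click_path))
```

The input is plain relational rows, but the intermediate values and the output (an ordered list per user) do not need to follow the relational model. Because each key is reduced independently, the same shape of computation can be spread across shared-nothing nodes.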
Recently, a journalist called to ask about in-memory data processing, a very interesting subject. I always thought that in-memory processing will be more and more important as memory prices keep falling drastically. In fact, these days you can get 128GB of memory into a single system for less than $5K plus the server cost, not to mention that DDR3 and multiple memory controllers are giving a huge performance boost. And if you run software that can handle shared-nothing parallelism (MPP), your memory cost increases linearly, and systems with TBs of memory are possible.
So what do you do with all that memory? There are two classes of use cases that are emerging today. First is the case where you need to increase concurrent access to data with reduced latency. Tools like memcached offer in-memory caching that, used properly, can vastly improve latency and concurrency for large-scale OLTP applications like websites. Also the nice thing with object caching is that it scales well in a distributed way and people have built TB-level caches. Memory-only OLTP databases have started to emerge, such as VoltDB. And memory is used implicitly as a very important caching layer in open-source key-value products like Voldemort. We should only expect memory to play a more and more important role here.
The second way to use memory is to gain “processing flexibility” when doing analytics. The idea is to throw your data into memory (however much it fits, of course) without spending much time thinking how to do that or what queries you’ll need to run. Because memory is so fast, most simple queries will be executed at interactive times and also concurrency is handled well. European upstart QlikView exploits this fact to offer a memory-only BI solution which provides simple and fast BI reporting. The downside is its applicability to only 10s of GBs of data as Curt Monash notes.
By exploiting an MPP shared-nothing architecture, Aster Data has production clusters with TBs of total memory. Our software takes advantage of memory in two ways: first, it uses caching aggressively to ensure the most relevant data stays in memory; and when data is in memory, processing is much faster and more flexible. Secondly, MapReduce is a great way to utilize memory as it provides full flexibility to the programmer to use memory-focused data structures for data processing. In addition, Aster Data’s SQL-MapReduce provides tools to the user to encourage the development of memory-only MapReduce applications.
However, one shouldn’t fall into the trap of thinking that all analytics will be in-memory anytime soon. While memory is down to $30/GB, disk manufacturers have been busy increasing platter density and dropping their price to less than $0.06/GB. Given that the amount of data in the world grows faster than Moore’s law and memory, there will always be more data to be stored and analyzed than what fits into any amount of memory that an enterprise can use. In fact, most big data applications will have data sets that do not fit into memory because, while tools like memcached worry only about the present (e.g. current Facebook users), analytics need to worry about the past, as well – and that means much more data. So a multi-layer architecture will be the only cost-effective way of analyzing large amounts of data for some time.
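The price gap quoted above is easy to turn into a quick calculation. The sketch below uses the post's per-GB prices; the 100 TB data set size and the 1 TB = 1000 GB conversion are assumptions for illustration, and the figures cover raw capacity only, not servers, power, or replication.

```python
MEMORY_PER_GB = 30.00   # USD per GB of memory, as quoted in the post
DISK_PER_GB = 0.06      # USD per GB of disk, as quoted in the post


def storage_cost(terabytes, price_per_gb, gb_per_tb=1000):
    """Raw media cost in USD for holding `terabytes` of data."""
    return terabytes * gb_per_tb * price_per_gb


if __name__ == "__main__":
    size_tb = 100  # assumed data set size
    in_memory = storage_cost(size_tb, MEMORY_PER_GB)
    on_disk = storage_cost(size_tb, DISK_PER_GB)
    print(f"{size_tb} TB in memory: ${in_memory:,.0f}")
    print(f"{size_tb} TB on disk:   ${on_disk:,.0f}")
    print(f"memory/disk price ratio: {MEMORY_PER_GB / DISK_PER_GB:.0f}x")
```

At these prices memory costs roughly 500 times as much per GB as disk, which is why a layered design that keeps only the hottest data in memory stays attractive.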
One shouldn’t be discussing memory without mentioning solid-state disk products (like Aster Data partner company Fusion-io). SSDs are likely to be the surprise here given that their per-GB price is falling faster than that of disks (being a solid-state product that follows Moore’s law does help). In the next few years we’ll witness SSDs in read-intensive applications providing similar advantages to memory while accommodating much larger data sizes.
It has been a few weeks since we announced the Aster Analytics Center, so I think this is a good time to shed a little more light on what we are doing. Our goal is to make analytical work easier and faster to do on many types of data sets. We have already worked closely with many customers to architect solutions that solve their analytics challenges: fraud detection; complex security analysis to detect communication anomalies; graph analysis for social networks.
As part of the center, we are building an analytics infrastructure to make advanced analytics readily accessible to anyone using Aster Data. This includes making use of our SQL-MapReduce interface to do analysis that can’t easily be expressed in SQL, and often leads to huge performance gains. In addition, we are releasing a suite of functions built on Aster’s API for MapReduce that allows for easy invocation from within SQL. The suite includes, for example, novel tools to do sequence analysis, which is very useful for anyone trying to do pattern analysis. It’s important to note that many of our customers are already writing their own applications using this API and it’s really straightforward to get started. Incidentally, development for our Java API has just become very easy with our new SDK that uses a plug-in for Eclipse. Also, we are actively developing partnerships with analytic functions and solution providers.
I’d like to provide a brief background on why I’m so excited about what Aster is enabling and how this is indicative of a significant shift in how companies use and analyze their data. I first encountered Aster Data when I was at LinkedIn building analytically driven products with the large data sets that LinkedIn has amassed. Our team faced severe limitations with our standard warehouse, but with the introduction of the MPP Aster system we were suddenly able to analyze data much faster. Analyses that previously took 10 hours to run could suddenly run in 5 minutes. Our ability to think of an idea and get answers was no longer limited by the constraints of the equipment we owned but was instead bottlenecked by how quickly we could think. With a 10 hour wait-time you frequently forgot what you were working on or the stakeholder had moved on without doing a proper analysis. If you made a mistake or wanted to tweak your query you had to wait another 10 hours. With the Aster-enabled approach to analytic development, however, a whole new way of thinking emerged and we started to perform analyses we didn’t even think were previously possible. Having the ability to quickly iterate on an idea is invaluable when solving problems – the answers we got back helped guide business decisions and enabled better products on LinkedIn.
As a customer I worked directly with the Aster team on a number of problems and was amazed by their depth of knowledge of the challenges analytics practitioners face and their ability to innovate. Since joining the team, I’ve been pleased by Aster’s strong commitment to make analytics accessible to all. A scalable system that can do more with data will unleash a whole new set of capabilities for enterprises. I’m very excited that the field team has grown and we have attracted top-talent like ex-particle physicist Puneet Batra and data mining experts like Qi Su. Ajay Mysore, another member of the team, conducted master’s research on clustering algorithms. Our team lives and breathes data and is always ready for new challenges. Right now the field of analytics is undergoing a renaissance and it’s exciting to be working with a leader in the field of big data and advanced analytics.