I’ve remarked in an earlier post that the usage of data is changing and new applications are on the horizon. In the last couple of years, we’ve observed interesting design patterns for business processes that use data.
In a previous post, I outlined a design pattern that we call “The Automated Feedback Loop.” In this post, I want to outline a design pattern that we call “The (Iterative) Analytics Data Warehouse”.
The traditional well-understood design pattern of a data warehouse is a central (for Enterprise Data Warehouse) or departmental (for Data Marts) repository of data. Data is fed into the warehouse from ETL processes that pull data from a variety of sources. The data is organized in a data model that caters to 3 use-cases of the warehouse:
- Reports - A set of BI queries are run with regular frequency to monitor the state of the business. The target of the reports are business users who want to understand what happened. The goal is to keep them in touch with the pulse of the business.
- Exports - A set of export jobs are run with adhoc frequency to provide data sets for further analysis. The target of the exports are business analysts who want to optimize business practices. The goal is to provide them with true, quality-stamped data so that they can make confident optimization recommendations.
- Adhoc - A set of queries are run with adhoc frequency to detect or verify patterns that influence business events. The source of the queries are data scientists who want to understand and optimize business practices. The goal is to provide them with computation capabilities (good query interfaces, enough processing, memory and storage resources) to allow them to interact with the data.
The exports and adhoc tasks are transient tasks. Once the data analysts or data scientists find a pattern valuable to the business, that pattern is incorporated into a report so that business users can monitor that pattern on a frequent repeatable practice.
In a typical data warehouse, the bulk of tasks (~80%) are from [1] Reports. The remainder of 20% is from [2] Exports and [3] Adhoc.
Since Reports are frequent and generate known queries, the design of the data warehouse is done to cater to reporting. This includes data models, indexes, materialized views or derived tables – and other optimizations – to make the known Reporting queries go fast. Read the rest of this entry »

Mayank Bawa is the CEO and co-founder of