Research Papers on MapReduce, Data Warehousing, and Business Intelligence
This resource contains published academic papers written by Aster Data employees about the challenges facing the data community.
Analytic Power
- SQL-MapReduce: A practical approach to self-describing, polymorphic, and parallelizable user-defined functions
- A Scalable, Predictable Join Operator for Highly Concurrent Data Warehouses
- The Architecture of PIER: an Internet-Scale Query Processor
- Querying at Internet-Scale
- Turbo-charging Vertical Mining of Large Databases
- Minimizing View Sets without Losing Query-Answering Power
- SETS: Search Enhanced by Topic Segmentation
- Efficient and Extensible Algorithms for Multi Query Optimization
- Don't Trash your Intermediate Results, Cache 'em
- Materialized View Selection and Maintenance Using Multi-Query Optimization
- Pipelining in Multi-Query Optimization
- Garbage Collection in Object-Oriented Databases Using Transactional Cyclic Reference Counting
- OSQR: Overlapping Clustering of Query Results
- Enhanced Business Intelligence Using EROCS
- Efficient Subsequence Matching in Time Series Databases Under Time and Amplitude Transformations
- Extracting Cyber Communities Through Patterns
Availability
- Recursive Restartability: Turning the Reboot Sledgehammer into a Scalpel
- Reducing Recovery Time in a Small Recursively Restartable System
- Crash-Only Software
- The Price of Validity in Dynamic Networks
- Transience of Peers and Streaming Media
Manageability
- DART: Distributed Automated Regression Testing for Large-Scale Network Applications
- Combining Visualization and Statistical Analysis to Improve Operator Confidence and Efficiency for Failure Detection and Localization
- Predictable Software - A Shortcut to Dependable Computing?
- Middleware-based Replication: The Gaps Between Theory and Practice
Reliability
- A Utility-Centered Approach to Building Dependable Infrastructure Services
- ConfErr: A Tool for Assessing Resilience to Human Configuration Errors
- GnatDb: A Small-Footprint, Secure Database System
Scalability
- PlanetLab: An Overlay Testbed for Broad-coverage Services
- User-Centric Performance Analysis of Market-Based Cluster Batch Schedulers
- Distributed Hash Queues: Architecture and Design
- OnCall: Defeating Spikes with a Free-Market Application Cluster
- Online Balancing of Range-Partitioned Data with Applications to Peer-to-Peer Systems
- Peer-to-Peer Research at Stanford
- Symphony: Distributed Hashing in a Small World
- Locality-Aware Request Distribution in Cluster-based Network Servers
- Soft Timers: Efficient Microsecond Software Timer Support for Network Processing
- Efficient Support for P-HTTP in Cluster-Based Web Servers
- Cluster Reserves: A Mechanism for Resource Management in Cluster-based Network Servers
- The SawMill Framework for Virtual Memory Diversity
- Distributed File Organization with Scalable Cost/Performance
- A Performance Model of a Design for a Minimally Replicated Distributed Database for Database-Driven Telecommunications Services
- DERBY: A Memory Management System for Distributed Main Memory Databases
- Snowball: Scalable Storage on Networks of Workstations with Balanced Load
- Architectural Considerations for Efficient Software Execution on Parallel Microprocessors
Security
- Detecting identity-based attacks in wireless networks using signal prints
- MobiCom poster: public-key-based secure Internet access
- DoS and authentication in wireless public access networks
- Authenticity and availability in PIPE networks
- MIDAS: An Impact Scale for DDoS attacks
- Analyzing Large DDoS Attacks using Multiple Data Sources
- Reval: A Tool for Real-time Evaluation of DDoS Mitigation Strategies
The Best Insights Possible
- White Paper: A Revolutionary Approach for Advanced Analytics and Big Data Management
- White Paper: Deriving Deep Insights from Big Datasets
- Research Report: MapReduce and the Data Scientist


