Research Papers on MapReduce, Data Warehousing, Business Intelligence

This resource contains published academic papers written by Aster Data Systems employees about the challenges facing the data community.

Analytic Power

Availability

  • Recursive Restartability: Turning the Reboot Sledgehammer into a Scalpel
  • Reducing Recovery Time in a Small Recursively Restartable System
  • Crash-Only Software
  • Recovery-Oriented Computing: Building Multitier Dependability
  • Microreboot - A Technique for Cheap Recovery
  • Improving Availability with Recursive Microreboots: A Soft-State System Case Study
  • Autonomous Recovery in Componentized Internet Applications
  • Toward Self-Healing Multitier Services
  • The Price of Validity in Dynamic Networks
  • Transience of Peers and Streaming Media

Manageability

  • DART: Distributed Automated Regression Testing for Large-Scale Network Applications
  • Combining Visualization and Statistical Analysis to Improve Operator Confidence and Efficiency for Failure Detection and Localization
  • A Scalable, Sound, Eventually-Complete Algorithm for Deadlock Immunity
  • Predictable Software - A Shortcut to Dependable Computing?
  • Middleware-based Replication: The Gaps Between Theory and Practice

Reliability

Scalability

  • PlanetLab: An Overlay Testbed for Broad-coverage Services
  • User-Centric Performance Analysis of Market-Based Cluster Batch Schedulers
  • Distributed Hash Queues: Architecture and Design
  • OnCall: Defeating Spikes with a Free-Market Application Cluster
  • Online Balancing of Range-Partitioned Data with Applications to Peer-to-Peer Systems
  • Peer-to-Peer Research at Stanford
  • Symphony: Distributed Hashing in a Small World
  • Locality-Aware Request Distribution in Cluster-based Network Servers
  • Soft Timers: Efficient Microsecond Software Timer Support for Network Processing
  • Efficient Support for P-HTTP in Cluster-Based Web Servers
  • Cluster Reserves: A Mechanism for Resource Management in Cluster-based Network Servers
  • The SawMill Framework for Virtual Memory Diversity
  • Distributed File Organization with Scalable Cost/Performance
  • A Performance Model of a Design for a Minimally Replicated Distributed Database for Database-Driven Telecommunications Services
  • DERBY: A Memory Management System for Distributed Main Memory Databases
  • Snowball: Scalable Storage on Networks of Workstations with Balanced Load
  • Architectural Considerations for Efficient Software Execution on Parallel Microprocessors

Security
