Posted by George Candea in Administration, Blogroll, Database, Manageability on May 17, 2008

I want databases that are as easy to manage as Web servers.

IT operations account for 50%-80% of today’s IT budgets, amounting to tens of billions of dollars yearly(1). Poor manageability impacts the bottom line and reduces reliability, availability, and security.

Stateless applications, like Web servers, require little configuration, can be scaled through mere replication, and are reboot-friendly. I want to do that with databases too. But the way they’re built today, the number of knobs is overwhelming: the most popular DB has 220 initialization parameters and 1,477 tables of system parameters, while its “Administrator’s Guide” is 875 pages long(2).

What worries me is an impending manageability crisis, as large data repositories proliferate at an astonishing pace: in 2003, large Internet services were already collecting >1 TB of clickstream data per day(3). Five years later, we’re encountering businesses that want SQL databases to store >1 PB of data. PB-scale databases are by necessity distributed, since no single node can scale vertically to 1 PB; now imagine taking notoriously hard-to-manage single-node databases and distributing them…

How does one build a DB as easy to manage as a Web server? All real engineering disciplines use metrics to quantitatively measure progress toward a design goal, to evaluate how different design decisions impact the desired system property.

We ought to have a manageability benchmark, and the place to start is a concrete metric for manageability, one that is simple, intuitive, and applies to a wide range of systems. We don’t just use the metric to measure, but also to guide developers in making day-to-day choices. It should tell engineers how close their system is to the manageability target. It should enable IT managers to evaluate and compare systems to each other. It should lay down a new criterion for competing in the market.

Here’s a first thought…

I think of system management as a collection of tasks the administrators have to perform to keep a system running in good condition (e.g., deployment, configuration, upgrades, tuning, backup, failure recovery). The complexity of a task is roughly proportional to the number of atomic steps Steps_i required to complete task i; the larger Steps_i, the more inter-step intervals, so the greater the opportunity for the admin to mess up. Installing an operating system, for example, has Steps_install in the tens or hundreds.

Efficiency of management operations can be approximated by the time T_i in seconds it takes the system to complete task i; the larger T_i, the greater the opportunity for unrelated failures to impact atomicity of the management operation. For a trouble-free OS install, T_install is probably around 1-3 hours.

If N_i represents the number of times task i is performed during a time interval Δ_evaluation (e.g., 1 year) and N_total = N_1 + … + N_n, then task i’s relative frequency of occurrence is Frequency_i = N_i / N_total. Typical values for Frequency_i can be derived empirically or extracted from surveys(4),(5),(6). The less frequently one needs to manage a system, the better.

Manageability can now be expressed with a formula, with larger values of manageability being better:

Manageability = Δ_evaluation / Σ_i ( Frequency_i × (Steps_i)^α × T_i )

This says that, the more frequently a system needs to be “managed,” the poorer its manageability. The longer each step takes, the poorer the manageability. The more steps involved in each management action, the poorer the manageability. The longer the evaluation interval, the better the manageability, because observing a system longer increases the confidence in the “measurement.”

While complexity and efficiency are system-specific, their relative importance is actually specific to a customer: an improvement in complexity may be preferred over an improvement in efficiency or vice-versa; this differentiated weighting is captured by α. I would expect α>2 in general, because having fewer, atomic steps is valued more from a manageability perspective than reducing task duration, since the former reduces the risk of expensive human mistakes and training costs, while the latter relates almost exclusively to service-level agreements.
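To make the proposal concrete, here is a minimal sketch of the metric in Python, assuming the formula takes the form Δ_evaluation divided by the frequency-weighted sum of (Steps_i)^α × T_i. All task data below (step counts, durations, yearly counts) is invented purely for illustration.

```python
# Hypothetical sketch of the proposed manageability metric.
# Assumes: manageability = delta_eval / sum(freq_i * steps_i**alpha * t_i)

def manageability(tasks, alpha, delta_eval):
    """tasks: list of (steps, seconds, count) tuples, one per management task.
    Returns a unitless score; larger is better."""
    n_total = sum(count for _, _, count in tasks)
    # Frequency-weighted burden: each task contributes its relative
    # frequency times steps^alpha times duration.
    burden = sum((count / n_total) * steps**alpha * seconds
                 for steps, seconds, count in tasks)
    return delta_eval / burden

year = 365 * 24 * 3600  # evaluation interval: 1 year, in seconds

# Invented example workload for one DB deployment over a year:
tasks = [
    (40, 2 * 3600, 1),   # install: 40 steps, ~2 hours, once
    (5, 600, 52),        # weekly backup: 5 steps, 10 minutes
    (12, 1800, 4),       # quarterly upgrade: 12 steps, 30 minutes
]
print(manageability(tasks, alpha=2.0, delta_eval=year))
```

With α = 2, halving the step count of a task improves the score four times as much as halving its duration, which matches the intuition that fewer atomic steps matter more than faster ones.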

So would this metric work? Is there a simpler one that’s usable?


Valerie Henson on May 18th, 2008 at 11:04 am #

My experience is that surveys or evaluations or anything that involves people are very expensive, and nobody wants to pay for them. If we could design a manageability metric that didn’t require stopwatches and rooms with one-way mirrors, that would be a big improvement.

When we were designing the ZFS user interface, our principles were derived in large part by counterexample: the interface for the existing Solaris volume manager and UFS. One goal was to reduce the number of steps to the bare minimum required to express your intent. But another goal was to reduce the amount of “magic” in each step - bizarre command lines, long random-looking strings of mixed letters and numbers, required numerical parameters that could have been automatically deduced, hidden naming rules, etc. Reducing the number of steps is good, but reducing the complexity of each step is better.

At one point, the ZFS presentation had a great set of slides comparing UFS and ZFS administration. Those slides are excerpted below:

[slide images: side-by-side comparison of UFS and ZFS administration commands]
(Originals are hosted on a very slow site, http://www.filibeto.org/~aduritz/truetrue/solaris10/ZFS_SOSUG17oct2005_preso.pdf )

The metric I thought of immediately was, “How small does the font have to be to fit all the commands on one page?” The second thought was to measure the gzipped size of the total command set - gzip being an approximation for the complexity of the command set.
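Val’s gzip idea is easy to try out. Here is a small sketch, using hypothetical (and abbreviated) command sequences in the spirit of the UFS-vs-ZFS slides; the command lines are illustrative, not an exact transcript of either interface.

```python
# Sketch of the gzip proxy for command-set complexity: compress the
# full set of administration commands and use the compressed size as
# a rough complexity score (smaller = simpler).
import gzip

def complexity(commands):
    """Compressed size in bytes of the concatenated command set."""
    blob = "\n".join(commands).encode("utf-8")
    return len(gzip.compress(blob))

# Invented examples loosely modeled on mirrored-volume setup:
ufs_style = [
    "metadb -a -f c0t0d0s7",
    "metainit d10 1 1 c0t0d0s0",
    "metainit d20 1 1 c1t0d0s0",
    "metainit d0 -m d10 d20",
    "newfs /dev/md/rdsk/d0",
]
zfs_style = [
    "zpool create tank mirror c0t0d0 c1t0d0",
    "zfs create tank/home",
]
print(complexity(ufs_style), complexity(zfs_style))
```

Because gzip rewards both shorter command sets and more internally regular ones, it penalizes random-looking device names and magic numbers as well as sheer verbosity.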

George Candea on May 19th, 2008 at 2:25 am #

I like the simplicity of the gzip-based metric and the ZFS example is great. This will work well for management interfaces consisting of single-command actions (as is often the case for a file system) or for configuration files. At a first level of approximation, it works for single-step, non-interactive systems.

It seems though that stopwatches remain a fundamental component of a manageability benchmark for more general systems (like a database). A 1000-command DB could circumvent the gzip metric by converting to a 1-command interface, called admin. Once you type admin, the DB offers a wizard-like sequence of multiple-choice questions: change schema? do backup? configure users? …, followed by complete backup? incremental backup? … and so on for many more steps. I don’t think this 1-command system is any more manageable than the 1000-command version, yet it would score highly on a gzip-based metric: there is only one command, but every operation takes a long time and has many steps, with each additional step being a new opportunity for mistakes.

The point about expensive human studies is good though, and perhaps we, as an industry, ought to include stopwatches directly in the systems we build. Then we could be running the human study constantly in the field, on deployed systems, by having the systems themselves collect suitably-anonymized information on all their admin interactions. With a bit of analytics, this data provides all necessary info to measure the manageability of that system at that customer site; if the system is itself a database, the collection and analysis features could be already available.
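The built-in stopwatch could be as simple as an audit log of admin actions from which step counts and durations are mined later. The sketch below is a hypothetical illustration (the class and step names are invented), not a proposal for any particular DB’s interface.

```python
# Hypothetical sketch of the "stopwatch built into the system" idea:
# the system records each (anonymized) admin step with a timestamp,
# so task frequency, step counts, and durations can be measured on
# deployed systems instead of in a usability lab.
import time
from collections import defaultdict

class AdminAudit:
    def __init__(self):
        self.log = []  # (task, step, timestamp) tuples

    def record(self, task, step):
        self.log.append((task, step, time.time()))

    def summarize(self):
        """Per-task (step count, elapsed wall-clock seconds)."""
        stats = defaultdict(lambda: {"steps": 0, "start": None, "end": None})
        for task, step, ts in self.log:
            s = stats[task]
            s["steps"] += 1
            if s["start"] is None:
                s["start"] = ts
            s["end"] = ts
        return {t: (s["steps"], s["end"] - s["start"])
                for t, s in stats.items()}

audit = AdminAudit()
audit.record("backup", "choose-type")
audit.record("backup", "pick-target")
audit.record("backup", "confirm")
print(audit.summarize())
```

Aggregated across many sites, such logs would give exactly the Steps_i, T_i, and Frequency_i values the metric needs, without one-way mirrors.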

Ken Lewis on May 19th, 2008 at 3:04 am #

Your objection to the gzip-of-command-prompt-entries is fair. Can it not be overcome with a gzip-of-keyboard-entries recorded with the legitimate use of a keylogger?

Also, Val Henson’s suggestion works well where the commandline rules, but GUI complexity will probably need some measure of both mouse pixels traversed (which accounts for inherent spatial nature of GUI layout) and characters of text displayed (which measures how much attention and comprehension is required of the user).

Take care.

Eivind Kjørstad on May 21st, 2008 at 4:34 am #

These suggestions ignore what I frequently find takes the MOST time: knowing *precisely* what to do.

It’s little consolation that performing a certain task is achievable with a single command taking two simple options if it takes you a week of fiddling, experimenting and research to figure out the magic.

I think, ultimately, you’d need to let a representative sample of actual admins try performing a representative sample of tasks on a representative collection of hardware and then use the stopwatch. Val is right though, that is extremely costly.

Installing Ubuntu is a 12-step process; however, if you’re American and satisfied with the defaults, 10 of those steps consist of hitting “enter” (or clicking on “Next”) in a wizard interface.

In practice that is much more manageable, and much less likely to go wrong, than the same number of steps where each step requires esoteric and hard-to-find options.

It seems grossly unfair that figuring out that you need to do:

command sub-command -option1=foo -exclude=[complex-regexp] -replication=limited

should be 1 step.

And “press enter to confirm the already-selected, correct default” is -also- 1 step. In practice, the former can require a week, while the latter typically requires a second or two.

Val’s suggestion is better for routine tasks where it’s assumed that the investment in figuring out -what- to do has already been made. But no admins I know spend a large fraction of their time performing such tasks — if they did, they should write a shell script for them, which reduces ANY sequence of commands to a single command.

Emmanuel Cecchet on May 21st, 2008 at 4:55 am #

The various tasks involved in the administration of a RDBMS require various skills. I am not sure that task complexity can be captured in a single alpha, but it would probably require an alpha_i (per step).
A straightforward script that requires no or very little input from an admin is likely to always complete successfully in the same amount of time. However, more complex tasks requiring more skill will be more error-prone, and the impact of a failed admin operation should be captured by the manageability metric. Manageability should be lower for tasks requiring skill, or when a task failure impacts the availability of the system.
I am not sure also that a single global metric will really capture the system manageability. If I have a system that offers me only 1 management operation that can only display the system status in a very fast way and I was lucky enough to not have any failure during my Delta_evaluation, the system will score high on the Manageability metric.
Some tasks are only performed once, like install, whereas others are more frequent, like updates, or recurrent, like backups and day-to-day tasks. Like car reviews, which give separate scores for performance, style, interior… I would really consider a composed metric for Manageability that combines individual metrics from pre-identified domains (i.e., install, upgrade, backup, tuning, recovery, …). Each domain could have a different weight depending on the user’s needs or the kind of application being evaluated (database cluster, app server cluster…). Right now, a system that has a long install phase but very automated management later on could have the same Manageability score as a system that installs very fast but requires a lot of manual work for each admin task.
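Emmanuel’s composed metric is easy to sketch: one score per management domain, combined with customer-chosen weights. The domain scores and weights below are invented for illustration; in practice each per-domain score would itself come from a metric like the one proposed in the post.

```python
# Sketch of a composed manageability metric: a weighted sum of
# per-domain scores (each in [0, 1], higher = more manageable).

def composed_manageability(domain_scores, weights):
    """Weighted combination of per-domain manageability scores.
    Weights must sum to 1."""
    assert abs(sum(weights.values()) - 1.0) < 1e-9
    return sum(weights[d] * domain_scores[d] for d in domain_scores)

# Invented per-domain scores for some system:
scores = {"install": 0.4, "upgrade": 0.7, "backup": 0.9, "tuning": 0.3}
# Weights for a shop that rarely reinstalls but tunes constantly:
weights = {"install": 0.05, "upgrade": 0.25, "backup": 0.30, "tuning": 0.40}
print(composed_manageability(scores, weights))
```

Two systems with the same overall score can then still be distinguished by their per-domain breakdown, addressing the long-install-vs-automated-management example above.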

I have no good suggestion yet on how to capture the availability aspects. I prefer a system that performs a task more slowly if it can do it without downtime for my clients rather than a faster one that requires a system shutdown. Transactional management operations that can rollback in the presence of failures should somehow have a better score.
The current metric definition depends a lot on what happened during the evaluation period, which is not necessarily representative of the overall system Manageability. If this metric is to be used to compare systems, the set of operations or tasks to be performed during the evaluation period should be specified.

Like TPC has several benchmarks for different use cases, maybe there is not a single manageability metric but one per field of application?

George Candea on May 22nd, 2008 at 11:21 am #

Great points, folks! This got me thinking whether we need to distinguish usability from manageability. IMO, usability expresses the ease with which human subjects can employ a tool to achieve some objective. A system S’s management commands are a tool used to reach the objective of keeping S in good functioning order for its users. So one could say S’s manageability is the usability of its management functions. Alas, the usability folks don’t seem to have good benchmarks on par with TPC* or SPEC*.

One of the problems seems to be the variability in human actions; e.g., some admin may be vastly more capable at a given task than another, and vice versa. So perhaps the manageability benchmark should try to eliminate this variability from the measurement. One way to do that would be to focus solely on the expert sysadmin. An interface that is easy to learn often ends up being inefficient once you become an expert, so aiming for efficient management for experts doesn’t sound too bad.

An important issue you brought up is measuring the effects of management actions: dropping all tables is very different from changing the privileges of a single user. Perhaps the easiest way is to measure how expensive it is (in terms of time and money) to undo management actions: the more expensive it is, the less you want to see that happen; the more the system helps you undo your mistakes, the more manageable it is.

Identifying representative management workloads for specific classes of systems is also a good idea. Can we draft such a workload for databases, without splitting hairs too much? I’ll seed the list with the following:

- installation and deployment
- configuration for a given environment (users, connection to other DBs, etc.)
- performance tuning for a given workload
- software upgrades (includes patching)
- hardware upgrades
- scaling up/down (resizing)
- backup and recovery

At the end of the day, to turn manageability into something measurable (and thus quantitatively improvable), we must accept that certain aspects of the question will remain unsolved; we should just make sure we get the most important ones right. Are we?
