Purveyors of cloud storage services may be doing their customers, or themselves, a disservice by relying on imprecise metrics for billing, argued a researcher at a Usenix conference.
"Disk time is what costs, not I/Os or bytes, and that is what should be the metric in cloud storage systems," said Matthew Wachs, a researcher at Carnegie Mellon University, in a talk at the Usenix HotCloud workshop this week in Portland, Oregon.
Wachs, along with other researchers at Carnegie Mellon and VMware, investigated the topic in their Usenix paper, "Exertion-based billing for cloud storage access."
"Cloud storage access billing should be exertion-based, charging tenants for the costs actually induced by their I/O activities rather than an inaccurate proxy (e.g., byte or I/O count) for those costs," the paper said.
Today, IaaS (Infrastructure-as-a-Service) cloud storage providers such as Amazon or Google typically bill on two factors: the amount of data stored, and the amount of data transferred to and from the cloud, or I/O.
While charging for the amount of data stored is reasonable, Wachs contended, charging for I/O by volume is flawed, because it ignores the work expended to read that data from disk or write it to disk. The cost of handling those bits on disk may vary widely from one instance to another, Wachs pointed out.
"As a result, tenant bills for storage access may bear little to no relationship to the actual costs," the paper said.
Wachs mentioned a number of factors that can lead to this variance, the most prominent being the difference between random and sequential access on the disk.
In sequential access, data is written to or read from one portion of the disk in a continuous stream of bits. In random access, the disk head must jump around to different parts of the disk to read or write data.
The difference between these two types of workloads can be immense, Wachs said.
For instance, sequential access can achieve a throughput on an average disk of up to 63.5MB/s (megabytes per second), whereas random access manages only 1.5MB/s.
In practical terms, this disparity means that one customer executing lots of random reads and writes is using a lot more of the system's resources than another customer who may be accessing the same amount of data through sequential accesses, even though both customers are charged the same amount.
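The gap can be made concrete with a little arithmetic. The following is a minimal sketch, not the billing model from the paper: it takes the two throughput figures Wachs cited and compares a byte-count bill with a bill based on disk time. The function names, the 1,000MB workload and both prices are hypothetical values chosen purely for illustration, and dividing data volume by throughput is a simplification of how disk time would actually be measured.

```python
# Throughput figures quoted in the talk, in megabytes per second.
SEQUENTIAL_MB_PER_S = 63.5
RANDOM_MB_PER_S = 1.5


def disk_seconds(megabytes, throughput_mb_per_s):
    """Approximate disk time needed to move the data at a given throughput."""
    return megabytes / throughput_mb_per_s


def byte_based_bill(megabytes, price_per_mb):
    """Bill on data volume alone; the access pattern is ignored."""
    return megabytes * price_per_mb


def exertion_based_bill(megabytes, throughput_mb_per_s, price_per_disk_second):
    """Bill on the disk time the workload consumes."""
    return disk_seconds(megabytes, throughput_mb_per_s) * price_per_disk_second


# Hypothetical workload and prices, for illustration only.
data_mb = 1000.0
price_per_mb = 0.0001
price_per_disk_second = 0.001

sequential_time = disk_seconds(data_mb, SEQUENTIAL_MB_PER_S)
random_time = disk_seconds(data_mb, RANDOM_MB_PER_S)
time_ratio = random_time / sequential_time

# Both tenants move the same data, so a byte-count bill is the same for each.
sequential_byte_bill = byte_based_bill(data_mb, price_per_mb)
random_byte_bill = byte_based_bill(data_mb, price_per_mb)

sequential_exertion_bill = exertion_based_bill(
    data_mb, SEQUENTIAL_MB_PER_S, price_per_disk_second
)
random_exertion_bill = exertion_based_bill(
    data_mb, RANDOM_MB_PER_S, price_per_disk_second
)

print(f"Sequential disk time: {sequential_time:.1f}s")
print(f"Random disk time: {random_time:.1f}s")
print(f"Random/sequential disk time ratio: {time_ratio:.1f}")
print(f"Byte-based bills: {sequential_byte_bill:.4f} vs {random_byte_bill:.4f}")
print(
    f"Exertion-based bills: {sequential_exertion_bill:.4f} "
    f"vs {random_exertion_bill:.4f}"
)
```

Under these assumptions the random-access tenant occupies the disk roughly 42 times longer than the sequential tenant for the same volume of data, yet a byte-count bill treats the two identically.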
In the long run, this practice gives customers no incentive to adopt more efficient data transfer practices, and it financially penalizes those customers who already have such practices in place. It could also erode the profit margins of storage providers, who may not have accounted for these inefficiencies in their original plans.