Testing should also mirror the compressibility, repetitive patterns and deduplicability of real data. To understand how well pattern recognition performs in your environment, the test data must include data types representative of the applications that use the file storage.
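Before testing any array, you can roughly gauge how compressible and de-duplicable a sample of application data is by compressing it and hashing it in fixed-size chunks. The sketch below assumes zlib as a stand-in compressor and fixed 4KB chunks identified by SHA-256. Real arrays use their own algorithms, often with variable-size chunking, so treat the numbers as indicative only. The function name `estimate_data_reduction` is hypothetical.

```python
import hashlib
import zlib


def estimate_data_reduction(data: bytes, chunk_size: int = 4096) -> dict:
    """Rough, vendor-neutral estimate of how compressible and how
    de-duplicable a sample of application data is (illustrative only)."""
    # Compression: zlib stands in for whatever algorithm the array uses.
    compressed = zlib.compress(data, 6)
    compression_ratio = len(data) / max(len(compressed), 1)

    # Deduplication: split into fixed-size chunks and count unique digests.
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    unique = {hashlib.sha256(chunk).digest() for chunk in chunks}
    dedup_ratio = len(chunks) / max(len(unique), 1)

    return {"compression_ratio": compression_ratio, "dedup_ratio": dedup_ratio}
```

Running this over samples from each application's data set gives a first indication of which data types are likely to benefit from data reduction and which are not.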
In SAN environments, each application maintains its own metadata. From the viewpoint of storage traffic, metadata access looks just like application data access, except that the metadata region is typically a hot spot, accessed more frequently than the areas where application data is stored.
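One way to spot such a hot spot in a block trace is to group logical block addresses (LBAs) into fixed-size regions and flag regions accessed much more often than average. This sketch assumes the trace has already been reduced to a list of LBAs. The region size and the 4x threshold are arbitrary illustrative choices, and `find_hot_regions` is a hypothetical helper.

```python
from collections import Counter


def find_hot_regions(lbas, region_blocks=1024, factor=4.0):
    """Return the indices of LBA regions whose access count is at least
    `factor` times the mean count of the regions that were accessed."""
    # Map each accessed logical block address to a fixed-size region.
    counts = Counter(lba // region_blocks for lba in lbas)
    if not counts:
        return []
    # The mean is taken over touched regions only; untouched regions are ignored.
    mean = sum(counts.values()) / len(counts)
    return sorted(region for region, n in counts.items() if n >= factor * mean)
```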
To properly characterize block data access, you must understand the basic command mix, whether data is accessed sequentially or randomly, the I/O sizes, any hot spots, and the compressibility and deduplicability of the stored data. This is critical for flash storage deployments, because compression and inline deduplication are essential to making flash storage affordable. The workload model must take data types into account, both because these technologies can have a significant impact on performance and because vendors implement them in different ways.
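Part of this characterization can be sketched over a block I/O trace. The `IORecord` format below (operation, starting LBA, length in 512-byte blocks) is a hypothetical simplification of what tracing tools export. An I/O counts as sequential only when it starts exactly where the previous I/O ended, a deliberately simple heuristic.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class IORecord:
    op: str      # e.g. "read" or "write" (hypothetical trace format)
    lba: int     # starting logical block address
    blocks: int  # transfer length in blocks


def characterize(trace, block_bytes=512):
    """Summarize command mix, I/O size distribution and sequentiality."""
    n = len(trace)
    if n == 0:
        return {}
    mix = Counter(r.op for r in trace)
    sizes = Counter(r.blocks * block_bytes for r in trace)
    # An I/O is sequential if it begins where the previous one ended.
    sequential = sum(
        1 for prev, cur in zip(trace, trace[1:])
        if cur.lba == prev.lba + prev.blocks
    )
    return {
        "command_mix": {op: count / n for op, count in mix.items()},
        "io_size_histogram": dict(sizes),
        "sequential_fraction": sequential / (n - 1) if n > 1 else 0.0,
    }
```

In a trace with many interleaved streams, you would likely want to track sequentiality per LUN or per stream rather than globally as done here.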
Finally, load patterns show how much demand can fluctuate over time. To build a real-world workload model, you need to understand how the following key metrics vary over time: IOPS per NIC/HBA, IOPS per application, read and write IOPS, metadata IOPS, read, write and total bandwidth, data compressibility, and the number of open files.
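Several of these metrics can be derived from a timestamped event log by bucketing events into fixed intervals. This sketch assumes a hypothetical log of `(timestamp_s, op, nbytes)` tuples in which `op` is "read", "write" or "metadata". Per-NIC/HBA or per-application breakdowns would add a grouping key, while compressibility and open-file counts need other data sources.

```python
from collections import defaultdict


def metrics_over_time(events, interval_s=60.0):
    """Bucket (timestamp_s, op, nbytes) events into fixed intervals and
    return per-interval IOPS and bandwidth (bytes per second)."""
    buckets = defaultdict(lambda: {"read": 0, "write": 0, "metadata": 0,
                                   "read_bytes": 0, "write_bytes": 0})
    for ts, op, nbytes in events:
        bucket = buckets[int(ts // interval_s)]
        bucket[op] += 1  # op must be "read", "write" or "metadata"
        if op in ("read", "write"):
            bucket[op + "_bytes"] += nbytes
    result = []
    for idx in sorted(buckets):
        b = buckets[idx]
        result.append({
            "start_s": idx * interval_s,
            "read_iops": b["read"] / interval_s,
            "write_iops": b["write"] / interval_s,
            "metadata_iops": b["metadata"] / interval_s,
            "read_bw": b["read_bytes"] / interval_s,
            "write_bw": b["write_bytes"] / interval_s,
            "total_bw": (b["read_bytes"] + b["write_bytes"]) / interval_s,
        })
    return result
```

Plotting these series over a day or a month is what exposes the peaks (boot storms, end-of-period jobs) that a single average would hide.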
A number of products and vendor-supplied tools can extract this information, either directly from the storage devices or by observing network traffic. The data they collect forms the foundation of a workload model that accurately characterizes your workloads.
Running & analyzing the workload models
Once you have created an accurate workload model, the next step is to define the scenarios to be evaluated. You can start by running identical workloads against different vendors' products or different configurations and comparing the results directly. For example, most hybrid storage systems let you trade off the amount of installed flash against HDD capacity. Using a load-generating appliance to compare the latency and throughput of a 5% flash / 95% HDD configuration with those of a 20% flash / 80% HDD configuration usually produces surprising results.
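Once both configurations have been run, the comparison comes down to a few summary statistics per run. The sketch below assumes you can export per-I/O latency samples, the run duration and the bytes transferred from each run (the actual export format depends on the load generator). It uses nearest-rank percentiles, and all function names are hypothetical.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile, p in (0, 100]."""
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def summarize_run(latencies_ms, duration_s, bytes_moved):
    """Latency percentiles, IOPS and throughput for one test run."""
    return {
        "p50_ms": percentile(latencies_ms, 50),
        "p99_ms": percentile(latencies_ms, 99),
        "iops": len(latencies_ms) / duration_s,
        "throughput_mb_s": bytes_moved / duration_s / 1e6,
    }


def compare(baseline, candidate):
    """Relative change of each metric, candidate vs. baseline.
    A negative change in latency is an improvement."""
    return {k: (candidate[k] - baseline[k]) / baseline[k] for k in baseline}
```

Passing the 5% flash run as `baseline` and the 20% flash run as `candidate` shows how much latency and throughput the extra flash buys for that workload.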
After you have determined which products and configurations to evaluate, you can vary the access patterns, load patterns and environment characteristics. For example, what happens to performance during log-in or boot storms? At the end of the day or month? What if the file size distribution changes? What if the typical block size changes from 4KB to 8KB? What if the command mix becomes more metadata-intensive? What is the impact of cache misses?
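These what-if variations are easier to enumerate systematically when each one is expressed as a change to a baseline workload profile. This is a minimal sketch: the profile keys (`block_size`, `read_pct`, `metadata_pct`) are hypothetical placeholders for whatever parameters your load generator accepts.

```python
from itertools import product


def scenario_matrix(base, variations):
    """Expand a baseline workload profile into every combination of the
    varied parameters; the baseline dict itself is left unchanged."""
    keys = list(variations)
    scenarios = []
    for values in product(*(variations[k] for k in keys)):
        scenario = dict(base)            # copy the baseline profile
        scenario.update(zip(keys, values))
        scenarios.append(scenario)
    return scenarios
```

The number of scenarios grows multiplicatively with each varied parameter, which is one reason to automate the runs.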
All of these factors can be modeled and simulated in an automated fashion, allowing direct comparisons of IOPS, throughput and latency for each workload. With this information, you will know the breaking point of any variation that could affect response times.
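Locating a breaking point from automated runs can be as simple as stepping up the offered load and recording the last level at which latency stayed within a limit. The sketch assumes a hypothetical list of `(offered_iops, p99_latency_ms)` results from such a sweep and a latency limit you choose. It only analyzes measured results; it does not model the storage system.

```python
def breaking_point(sweep, latency_limit_ms):
    """Return the highest offered load (in sweep order) whose latency met
    the limit before the first violation, or None if even the lowest load
    exceeded it. Loads above the first violation are not considered."""
    last_ok = None
    for iops, latency_ms in sorted(sweep):
        if latency_ms > latency_limit_ms:
            break
        last_ok = iops
    return last_ok
```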