Thin Provisioning: When “Enough” Isn’t Enough

We’ve all heard the saying “You don’t know what you’ve got until it’s gone.” Imagine the following scenario: you need high-capacity, high-availability storage. You’ll never convince management to let you spend as much as you’d like to get as much as you need, and even if you do, there will always be a need for more down the road. Luckily, financial outlay in the modern enterprise has become much more manageable with the advent of hardware virtualization. Compared to just a couple of years ago, we’re able to save incredible amounts of money, in both OPEX and CAPEX, across hardware, deployment, and management costs. In the storage world, gone (nearly) are the days when a failing disk means a rough night, moving file shares is no longer (as much of) a headache, and increasing storage capacity is (almost) simply a matter of changing a setting. This is all enabled by storage virtualization.

Virtualizing your storage is a relatively old concept. We’ve been creating virtual hard drives, most commonly in the form of .VHD or .VMDK files, for use in virtual machines since Windows Server 2003 and the early versions of VMware’s products. These could already “thin provision,” in the sense that a dynamically expanding hard disk file would grow only as it was used. They were limited, however: in some instances you could not increase the virtual hard disk’s size as it filled up, and in others the OS was completely unaware that it was running on a “fake” hard drive. This could lead to overutilization, poor performance, and capacity problems born of poor planning.
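
To make the “dynamically expanding” idea concrete, here is a minimal sketch using a sparse file in Python, assuming a POSIX filesystem that supports sparse files (the file name is hypothetical). The file advertises a large logical size, but the filesystem only allocates blocks for the regions actually written, which is the same trick a dynamically expanding .VHD or .VMDK plays:

```python
import os

# Sketch of file-level thin provisioning via a sparse file (POSIX only:
# st_blocks is not available on Windows).
path = "thin_disk.img"             # hypothetical demo file
logical_size = 10 * 1024**3        # advertise 10 GiB

with open(path, "wb") as f:
    f.truncate(logical_size)       # set the logical size without writing any data
    f.write(b"guest data " * 100)  # only this small region consumes real blocks

st = os.stat(path)
print(f"Logical size:   {st.st_size / 1024**3:.2f} GiB")
print(f"Actual on disk: {st.st_blocks * 512 / 1024:.0f} KiB")  # st_blocks counts 512-byte units
os.remove(path)
```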

Now, however, we’ve got a technology that goes beyond simply faking a hard drive: block-level virtualization. This technology allows us to be much more flexible with our arrays, hypervisors, and storage appliances. Virtualizing storage at the block level means that, via software or hardware, we can place logical units of storage in almost any order, on almost any type of array or drive. Most relevant to this article, thin provisioning lets us present a large-capacity storage solution without actually having to install (and pay for!) all of the space up front. Like anything, though, there are pros and cons.
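
As a rough illustration of what “block-level” means here, the sketch below models allocate-on-write mapping in Python. The class and names are purely illustrative, not any vendor’s API: the volume advertises more logical blocks than the physical pool holds, and a physical block is assigned only when a logical block is first written.

```python
class ThinVolume:
    """Illustrative allocate-on-write block map (not a real product's API)."""

    def __init__(self, logical_blocks: int, physical_blocks: int):
        self.logical_blocks = logical_blocks               # capacity we advertise
        self.free_physical = list(range(physical_blocks))  # real blocks in the pool
        self.block_map = {}                                # logical -> physical

    def write(self, logical_block: int) -> int:
        if logical_block in self.block_map:    # already backed: overwrite in place
            return self.block_map[logical_block]
        if not self.free_physical:             # the dreaded "pool full" condition
            raise RuntimeError("physical pool exhausted: add capacity")
        physical = self.free_physical.pop()    # allocate on first write, in any order
        self.block_map[logical_block] = physical
        return physical

vol = ThinVolume(logical_blocks=1000, physical_blocks=100)  # 10:1 overcommit
vol.write(0)
vol.write(999)  # any logical block can land on any free physical block
print(f"{len(vol.block_map)} of {vol.logical_blocks} logical blocks are physically backed")
```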

Pros

  • First and foremost, costs are dramatically reduced. You can provision the capacity you’ll eventually need while buying only the physical storage you need now. This lowers your initial cost to implement and, down the road, turns what would be an ugly capital expense to upgrade or replace the array into a simple operating expense (or fixed-asset capital expense) to add capacity to the existing array. Either way you look at it, you’re saving money, and you’re able to be more flexible with your budget.
  • Ease of provisioning. You can expand and shrink volumes as necessary, as well as move storage and file shares quickly and easily.
  • Redundancy, resiliency and replication. Data integrity improves because you’re not necessarily tied to a single vendor’s RAID array, storage solution, or backup strategy.
  • Live migration from one logical storage unit to another is possible, which means you can fail over while the volume is in use.
  • Faster backups and easier VM/storage management, since data is moved only when it has to be.
  • Data reduction technologies like deduplication are possible at the hardware level.
  • Tiered storage solutions are possible, with the most commonly accessed “hot” blocks automatically moved to faster storage while less commonly accessed “cold” blocks are moved to slower, cheaper storage (see the sketch after this list).
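
The tiering behavior in that last item can be sketched in a few lines of Python. This is a toy policy with made-up numbers, not a real array’s algorithm: rank blocks by access count, keep the hottest on the fast tier, and demote the rest.

```python
from collections import Counter

def retier(access_counts: Counter, fast_slots: int):
    """Toy tiering policy: hottest blocks go to the fast tier, the rest to the slow tier."""
    ranked = [blk for blk, _ in access_counts.most_common()]
    return set(ranked[:fast_slots]), set(ranked[fast_slots:])

# Illustrative access counts for five blocks over some sampling window.
counts = Counter({"blk7": 120, "blk3": 95, "blk9": 4, "blk1": 2, "blk5": 1})
hot, cold = retier(counts, fast_slots=2)
print("fast tier (SSD):", sorted(hot))   # the two most-touched blocks
print("slow tier (HDD):", sorted(cold))  # everything else
```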

Cons

  • Explaining to management why you need to keep throwing money at an appliance or service that already has “all” the capacity that’s required.
  • It’s possible to end up with a very slow array. Certain products, like Microsoft’s Storage Spaces, will let you combine hard drives of different speeds and capacities; in most cases, this drags the entire array down to the speed of the slowest drive.
  • When the physical capacity of a thin-provisioned array is reached, most products will FREEZE the volume until more storage is added. This can wreak havoc on databases, mail servers and virtual machines (a monitoring sketch follows this list).
  • Maintenance tasks change completely. For example, certain file system or product tasks (like database defragmentation) can write to previously unallocated blocks and needlessly grow your thin volume.
  • Planning. This can’t be emphasized enough when moving toward a thin-provisioned solution. It’s easy to underestimate your needs and not buy enough storage, which manifests not only as a lack of capacity on the array but also as dramatically reduced performance once drive utilization climbs.
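
Because of that freeze-at-capacity behavior, monitoring the physical pool is not optional. The sketch below is a minimal example of the idea, assuming the pool is visible as a local mount point; the thresholds are illustrative, and against a real appliance you would query the vendor’s management interface instead:

```python
import shutil

WARN, CRITICAL = 0.70, 0.85  # illustrative thresholds; tune for your lead time on new disks

def check_pool(path: str) -> str:
    """Report how full the physical pool behind a thin volume is."""
    usage = shutil.disk_usage(path)
    used = usage.used / usage.total
    if used >= CRITICAL:
        return f"CRITICAL: pool {used:.0%} full, add capacity before the volume freezes"
    if used >= WARN:
        return f"WARNING: pool {used:.0%} full, order capacity now"
    return f"OK: pool {used:.0%} full"

print(check_pool("/"))  # run periodically from cron or Task Scheduler
```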

The major headache of a thin-provisioned appliance is user perception. Users may see a 3 TB array, but unless you remind them that it isn’t actually 3 TB (or plan for existing utilization), it will most likely start filling up immediately, and if you didn’t plan well enough, you will have problems right away. The back-of-the-envelope math below shows how quickly the gap appears.
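
Here is that math with illustrative numbers: a volume advertised at 3 TB, backed by 1 TB of physical disk, with some space already consumed.

```python
advertised_tb = 3.0  # what users see
physical_tb   = 1.0  # what actually exists today
used_tb       = 0.4  # already consumed

print(f"Overcommit ratio: {advertised_tb / physical_tb:.1f}:1")
print(f"Users think {advertised_tb - used_tb:.1f} TB is free, "
      f"but only {physical_tb - used_tb:.1f} TB of real headroom remains")
```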

Thin provisioning of your storage solutions is not something to be taken lightly. When done right, the net result is an efficient storage array that provides an appropriate amount of capacity and IOPS at an appropriate price for your environment. However, it’s easy to just run with the cost savings and forget the complications inherent in deploying such an advanced technology, and just as easy for your team to overlook the fact that even though the array says 3 TB, only 1 TB is available right now. Unless they’re careful, they literally may not know what they’ve got until it’s gone.