Sometimes the simplest questions are also the most profound.
"Why object storage? Why?" As any parent knows, or anyone who spends time with children, answering a child's "Why? Why? Why?" questions can be frustrating but also insightful. Sometimes the answer is clear and the child truly doesn't understand, or doesn't want to understand: "You need to go to bed because if you don't, you'll be tired and cranky in the morning for school."
The insights come when the answers reveal that the true reason is "You need to do this because this is the way I did it, and I haven't realized that times and technologies have changed." I fell into this trap recently with my high-school-aged son, who demanded to know why I insisted that he lug his textbooks with him on a trip last weekend to study for his midterms. After I calmed down and began to listen, he explained that all the practice problems were online, as were the study guides, and that he would access them from his iPad. To his credit, he studied very effectively the whole weekend without a single textbook.
As we move to rethink storage, we need to ask ourselves the same question: why are we doing things the way we are? Are there truly good reasons that still apply, or do we need a fresh perspective, untainted by legacy technologies, to realize that there is a better way?
Block- and file-based storage systems have been the workhorses of the data center for decades. Highly efficient, these systems are tuned to deliver maximum performance from the drives, using in-memory caches on both the storage server and the client to optimize access to data and to minimize the time wasted on slow network transfers. Data sharing is difficult and requires complex logic, typically at the application level, both because the data is updated so rapidly and because nobody wants to interfere with a system specifically tuned for data update, throughput, and rapid responses. Such systems became the backbone of stock trading, of credit card and other financial transaction processing, and of uses far beyond these, such as recording medical exam results and patient records and enabling video editing and computer animation.
But reread the prior paragraph with an eye on the question of "Why?" and something doesn't add up. Why do we need to avoid the network, when 1 Gbit/s and 10 Gbit/s networks allow far faster data transfers than is possible from any spinning media? Why do we tune for rapid transaction responses, when an MRI or other medical system may generate only hundreds of images per day? Why is data sharing achieved via complex application-level coordination, and not within the storage system, when my primary care and specialist doctors may both view the same medical image, when teams of animators are working together on a single video, or, in this age of social media, when thousands to millions of people are accessing the latest music, watching the latest movie trailers, and the like?
Technology has changed dramatically since block- and file-based systems were designed. Multi-core CPUs are far more powerful, so we can implement complex algorithms inside the storage system to manage shared access by applications. These same CPUs, together with fast, low-latency networks, have enabled distributed systems that scale out and provide far greater processing power than traditional systems. The focus of tuning has shifted from avoiding the network to avoiding the disk.
Object storage systems are designed around these principles, optimizing for the state of the world today. The typical user is accessing or updating data from a laptop, a tablet, or a mobile phone, where the throughput to an individual device is relatively modest, constrained by the wireless or cellular network, but where the aggregate bandwidth across many users can be staggering. Object systems are optimized to transfer data to many such users simultaneously: not only to the ~50 students in the two sections of my son's class, but also to the hundreds or thousands of students in my town (and the neighboring towns) who are all studying for their own midterms.
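To put rough numbers on "modest per device, staggering in aggregate," here is a back-of-the-envelope sketch. The per-device rate of 5 Mbit/s and the town-sized user counts are my own illustrative assumptions, not measurements; only the ~50 students figure comes from the example above.

```python
def aggregate_mbps(users: int, per_device_mbps: float) -> float:
    """Total bandwidth (Mbit/s) needed to serve every user at the same time."""
    return users * per_device_mbps


# Assumed, illustrative per-device rate over Wi-Fi or cellular (not measured).
PER_DEVICE_MBPS = 5.0

# 50 is the class size mentioned above; the larger counts are hypothetical
# stand-ins for "hundreds to thousands" of students.
scenarios = [
    ("one class, two sections", 50),
    ("one town", 1_000),
    ("town plus neighbors", 5_000),
]

for label, users in scenarios:
    total = aggregate_mbps(users, PER_DEVICE_MBPS)
    print(f"{label}: {users} users -> {total / 1000:.2f} Gbit/s")
```

Under these assumptions no single device asks for much, yet the largest scenario already needs more than a couple of 10 Gbit/s links can carry, which is why the design effort goes into serving many clients at once rather than one client quickly.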
Could traditional systems be used to address this same issue? With enough custom software, enough custom code on the client machines, support for all the different iOS, Android, and Windows versions, and the associated software distribution and maintenance, the problem could be addressed. The question is, "Why?"
To that end, perhaps the question we should be asking is not "Why object storage?" but "Why block and/or file storage?" What insights would that lead to in your own environment?