This raises a couple of interesting questions: Where is this Big Data going now in your storage environment? And how does that impact what you may want to do with it later? We were wondering about that here in ASD, and we went looking for some answers. We commissioned a study with the 451 Group to try to get a handle on what businesses are doing today. The study encompassed corporate IT departments (a majority of respondents at the Manager level or above) at over 100 enterprise companies.
One of the key questions we asked was: Which storage methods do you support today, or plan to support, for your Big Data implementation? In retrospect, the responses were not that surprising if you believe that Big Data is alive and active in the data center today and not some future ideal. The data shows that almost 50% said they store Big Data on an existing SAN, 30% indicated that it was on an existing NAS, and 30% said a cloud-based storage platform (multiple selections were accepted).
Question: Which storage methods do you support today or plan to support for your Big Data implementation?
So it looks like most folks are just parking that data wherever there is extra space as sort of a convenience play. We understand it; it's not like IT departments have a ton of money lying around to go buy a dedicated device for everything you think might be a good Big Data project in the future. Most of us are just like you: we want to do some level of analytics on that data (e.g., with Hadoop), but we are just not sure when we'll be able to dedicate resources to it, or when the business is going to demand it. The problem is that little voice in the back of your head saying, "You do know we are going to have to migrate all that data in order to do that, don't you?" That's when things can get messy. What's an IT pro to do?
In a perfect world, you could move the data to a dedicated SAN or some other storage platform, but the reality is that for many, that’s just too expensive. You could move it to a public cloud, which is cheap initially, but you would be doing a lot of “get & put” and that can add up fast. How about leaving it where it is? “That’s probably what’s going to end up happening!” you say. And it will just sit there for another year. But what if you could leave it where it sits and run your analytics on it?
The EMC ViPR software-defined storage platform is designed to do exactly this. At a high level, ViPR aggregates multi-vendor, heterogeneous storage into a unified storage platform. That platform, in turn, can be leveraged as a logical scale-out layer serving as the underlying infrastructure for a range of data services (like HDFS) that support collecting, managing, and utilizing unstructured content at massive scale. Data services are storage abstractions that reflect the combination of a data type (file, object, or block), access protocols (iSCSI, NFS, REST, etc.), and durability, availability, and security characteristics (snapshots, replication, etc.). In ViPR, block, file, object, and HDFS are all data services. These data services can be used to provide different semantic views of the same data: you can manipulate a file as a file or as an object without having to move the data to a different platform that supports that semantic.
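To make the "different semantic views of the same data" idea concrete, here is a minimal conceptual sketch in Python. This is not the ViPR API; the class and method names are hypothetical, and real block, file, and object services speak iSCSI, NFS, and REST rather than in-process calls. The point is simply that one copy of the underlying bytes can be read through file-style, object-style, and block-style accessors without moving or duplicating the data.

```python
# Conceptual sketch only (NOT the ViPR API): one backing store exposed
# through several "semantic views" -- file, object, and block -- with
# no copying of the underlying data.
class DataService:
    """Holds the raw bytes once; each accessor is just a different view."""

    BLOCK_SIZE = 4  # tiny block size, purely for illustration

    def __init__(self, data):
        self._data = data  # the single copy of the underlying content

    # File semantics: byte-range reads, in the spirit of POSIX pread()
    def read_file(self, offset=0, length=None):
        end = None if length is None else offset + length
        return self._data[offset:end]

    # Object semantics: whole-object GET plus metadata, REST/S3 style
    def get_object(self):
        return {"body": self._data, "content_length": len(self._data)}

    # Block semantics: fixed-size block reads by logical block address
    def read_block(self, lba):
        start = lba * self.BLOCK_SIZE
        return self._data[start:start + self.BLOCK_SIZE]


svc = DataService(b"big data!")
print(svc.read_file(0, 3))                 # file view of the first 3 bytes
print(svc.get_object()["content_length"])  # object view: size metadata
print(svc.read_block(1))                   # block view: second 4-byte block
```

All three calls read from the same `_data` buffer, which is the essence of the claim above: the semantic changes, the data does not.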
Now, instead of building a discrete analytics silo with dedicated infrastructure, the ViPR HDFS Data Service can leverage the existing ViPR virtualized storage environment and the backend storage platforms it utilizes. That means you can go ahead and start unlocking the Big Data advantage your competitors are still waiting for, and bring it to the business before they ask for it. Wouldn't that be a great way to start the year?