It feels slightly strange to be sitting in Scotland in December when it’s 12 degrees outside; we’re much more used to snow and a white Christmas. No doubt Global Warming has affected our weather systems here.
However, in the world of data it’s becoming a polar opposite (see what I did there?). Data continues to grow and get colder by the day. We’ve become a society of data hoarders; we continue to store everything, never accessing it but keeping it for that ‘just in case’ moment. This has led to the rise of ROT data: Redundant, Obsolete or Trivial content that is never accessed but continues to consume valuable resources.
A recent Veritas survey shows that only 14% of our data is accessed regularly, with a further 32% being classified as ROT data. The worrying statistic is the one I’ve not yet quoted; this means that the remaining 54% of our data is simply unknown and, like the majority of an iceberg, sits unseen below the surface.
This dark data may have business value, or may be valueless, but the crucial point is that it remains unknown. More worryingly, this dark data may contain personal customer information, non-compliant data or other high-risk corporate data, with the potential for critical risks at the core of a business.
Recent changes in legislation mean that Data Governance has to become more critical to business operations. Knowing the location of data and the content of repositories, and being able to search for and discover relevant data on demand, is placing new and unique challenges on IT operations, challenges that they have never previously faced.
Illuminating dark data is not easy. It requires elimination of ROT, it requires understanding of corporate data and what data may have business value, and it requires further understanding of legislation relative to the customer environment. Finally, the ability to find that needle in the haystack requires the use of tools and the knowledge to understand what you are looking for.
Having the ability to search across all data sets, and to apply filters to those searches, is not an easy task, but it is one that you will face at some point. Identifying the process and the tools is a mission that needs addressing now; by the time you are asked, it may be too late to avoid significant costs and the potential for large fines if data cannot be produced in a timely manner.
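As a very rough illustration of a first pass at spotting cold data, the Python sketch below walks a directory tree and lists files that have not been modified for a given number of days. It is not how any Veritas tool works; the function name `find_stale_files` and the 365-day threshold are assumptions made purely for this example, and modification time is used because access times are often not recorded reliably.

```python
import os
import time
from pathlib import Path

SECONDS_PER_DAY = 86400


def find_stale_files(root, max_age_days=365, now=None):
    """Return (path, size_bytes, idle_days) for files not modified in max_age_days.

    The name and the default threshold are hypothetical, chosen for illustration.
    """
    now = time.time() if now is None else now
    stale = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                info = path.stat()
            except OSError:
                # Files we cannot even stat stay dark; skip them rather than guess.
                continue
            idle_days = int((now - info.st_mtime) // SECONDS_PER_DAY)
            if idle_days > max_age_days:
                stale.append((path, info.st_size, idle_days))
    return stale
```

Calling `find_stale_files("/some/share", max_age_days=365)` on a path of your choosing gives a list of candidates to review. Age alone says nothing about business value or compliance risk, so a list like this is a starting point for the conversation, not a deletion list.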
The Data Iceberg is not melting, but at least we can understand the 54% not immediately visible to us. Our data hoarding exacerbates the problem; it’s time to shine a light in the darkness.
Now, where’s my sunglasses?
*Information has been sourced from the recent Veritas publication; The Databerg Report: See What Others Don’t
I’m allowed some Star Wars geekery occasionally!
With the imminent launch of the latest Star Wars movie I turned to thinking about the generation of images used in movies. We think less and less about the computer generated images we see in movies and simply accept them as part of the action, even though the Wow factor is still there.
We know that those buildings are not really destroyed; the Golden Gate Bridge has not really been devastated 20 times in movies recently, so we know it’s Computer Generated Imagery (CGI), but have we ever thought about the technology required to create these sequences?
Most important in this process is the role of the storage environment; it’s imperative to be able to process images quickly and to render them in a timeframe that minimises cost and production time.
This is one of the places that Flash-based storage arrays really shine; the ability to deliver output in a rapid fashion means that my Star Wars user experience happens in 2015, and not in several more years’ time.
Remember, the original Disney cartoons took several years to make but now several can be produced every year; Flash storage solutions are one of the key factors behind this.
Now, performance isn’t always everything, but in the film industry it can be.
Whilst I genuinely have no preference for technology vendors, occasionally there are some things you just have to highlight. One of these has been our recent testing of the HP StoreServ 20850 storage array. Having recently achieved world record results in the SPC-2 tests, the 20850 became an obvious candidate for Computacenter to evaluate whether the claims could be substantiated in a real world scenario.
The performance of this array has been blindingly fast, and it is one of the few which actually matches the vendor’s claims in terms of performance. Having tested several vendors’ solutions, we have seen the HP 20850 stand with the best of them in terms of both price and performance. Combining this with improved manageability makes the HP 20850 a compelling solution for customers across a wide range of applications, and supports customers in their move to the silicon datacentre.
The HP StoreServ represents a return to form for one of the major players in the storage industry, and is available for Customer Demonstration with a variety of either simulated workloads, or customer-specific tests utilising actual data, in the Computacenter Solution Centre based in Hatfield.
To (almost) quote Darth Vader: ‘HP StoreServ 20850 - The Force is Strong with This One’
For those that don’t know, my background is in Mathematics & Physics which, as a wise man once pointed out to me, is why I have OCD tendencies around numbers.
I like precision, I don’t like estimates or guesstimates, and I’m not a big fan of vendor spreadsheets that show how their technology will reduce your Capex or Opex and provide virtually immediate ROI, because we all know there are so many variables that they cannot possibly be particularly accurate.
If I followed these models ultimately I could go in ever-decreasing circles where I have ultimate performance, at little cost, with no footprint and it pays for itself before I’ve bought it. Hooray for that!
Back in my precise world it’s important that we know what is realistically achievable, and more importantly what is achievable in specific environments with specific applications. One thing we have learned is that whilst all storage technology may look similar from the outside, it doesn’t always perform in a similar manner. One thing I’m asked repeatedly is how to decide between vendor technologies and what is the optimal solution for customers.
The answer is not simple; there are many variables that can affect the performance of any storage environment, which is why for specific workloads there will be a solution that works better than others against specific criteria. When sizing storage solutions we need to look at a multitude of variables;
- Performance requirements in terms of IOPS, Latency & Bandwidth
- Read / Write ratios
- Application usage
- Block size in use
- Typical file sizes
- Whether compression is applicable, and how well data may compress
- Deduplication and how well data can be deduplicated
Now here comes the challenge; 64% of IT organisations don’t know their application storage I/O profiles & performance requirements, so they guess. The application owner may know the performance and capacity requirements closely, but adds extra to accommodate growth and ‘just to be safe’. The IT department takes the requirements and adds some more for growth and ‘just to be safe’ because ultimately we cannot have a new storage subsystem which does not deliver the required performance.
This means performance planning can be guesswork, with substantial under- or, more likely, over-provisioning, and the unseen costs of troubleshooting and administration providing more significant overheads than should be necessary.
The ultimate result of this can be a solution which meets all the performance requirements but is inefficient in terms of cost and utilisation.
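To put some rough numbers on the points above, here is a small Python sketch of two bits of sizing arithmetic: the bandwidth implied by an IOPS figure at a given block size, and how successive ‘just to be safe’ margins compound. The function names and every figure in it (20,000 IOPS, 8 KiB blocks, two 25% margins) are hypothetical values picked for illustration, not measurements from any customer or vendor.

```python
def throughput_mib_per_s(iops, block_size_kib):
    """Bandwidth (MiB/s) implied by an IOPS figure at a given block size (KiB)."""
    return iops * block_size_kib / 1024


def padded_requirement(base, margins):
    """Apply each safety margin in turn; margins multiply rather than add."""
    result = base
    for margin in margins:
        result *= 1 + margin
    return result


# Hypothetical workload: 20,000 IOPS at an 8 KiB block size.
measured_iops = 20000
print(throughput_mib_per_s(measured_iops, 8))  # 156.25

# The application owner adds 25%, then the IT department adds another 25%.
specified_iops = padded_requirement(measured_iops, [0.25, 0.25])
print(specified_iops)                  # 31250.0
print(specified_iops / measured_iops)  # 1.5625
```

Two modest margins already mean the array is specified for over 56% more than the workload needs, which is why starting from a measured profile matters more than any single safety factor.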
This is where Computacenter come in; working closely with our latest Partner LoadDynamix we can;
- ACQUIRE customer specific workloads and understand exactly the requirements
- MODEL workloads to understand the scale of solution required and ramp up workloads to find the tolerance of existing infrastructure
- GENERATE workloads against proposed storage platforms to ascertain optimal solution, and how many workloads can be supported on a platform
- ANALYSE the performance of proposed solutions with factual data, not vendor marketing figures
This approach provides an exact science for sizing the storage solution, and coupling it with Computacenter’s real world experience ensures my OCD tendencies can be fully satisfied.
The Computacenter / LoadDynamix Partnership announcement can be found here;
I like accuracy; working together with LoadDynamix we can achieve that not just for me, but more importantly for our customers and their users.
Coming Soon – Look out for the #BillAwards2015 announcing in December; want to know who wins these prestigious awards? Follow me on twitter @billmcgloin for all the answers
We’ve accepted that data has gravity and, like any large body, as it increases it draws an increasing number of applications, uses and more data towards it.
We’ve also accepted that applications have mass, growing in complexity through their evolution unless the painful decision to start again is taken.
Combine these factors with an increasing number of requests and an increasing request size, and suddenly you have a significant impact on access time, and the bandwidth available to move data takes a hit.
If these factors apply in day to day operations, then consider the impact when you have to move large quantities of data from one place to another; whether as part of a re-platforming operation, or as a move to an archive, or possibly as a move to the fabled Data Lake technology. Then data gravity and application mass can combine to have a seriously detrimental effect on the movement of data.
Whilst any admin can script the movement of data between platforms and numerous ‘free’ tools exist, the ability to move data rapidly and effectively between similar or dissimilar platforms, minimising any outages, working around locked files and ensuring file permissions and configurations remain complete, becomes crucial for customers. Neither internal nor external customers accept data outages; we have to be always on.
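For a sense of what the ‘any admin can script it’ approach looks like, here is a minimal Python sketch that copies a directory tree and records what moved and what didn’t (and why). The function name `migrate_tree` is hypothetical and this is in no way a substitute for a proper migration tool; it simply shows how little a basic script covers.

```python
import shutil
from pathlib import Path


def migrate_tree(source, target):
    """Copy every file under source into target; return (moved, failed) reports."""
    source, target = Path(source), Path(target)
    moved, failed = [], []
    for src in sorted(source.rglob("*")):
        if not src.is_file():
            continue  # directories are created on demand; empty ones are not copied
        dst = target / src.relative_to(source)
        try:
            dst.parent.mkdir(parents=True, exist_ok=True)
            # copy2 copies the data plus timestamps and permission bits,
            # but not file ownership or ACLs.
            shutil.copy2(src, dst)
            moved.append((src, dst))
        except OSError as exc:
            # Locked or unreadable files are recorded with the reason,
            # not silently dropped.
            failed.append((src, str(exc)))
    return moved, failed
```

Even this much gives you a list of what moved and a list of what failed with the error text. What it does not give you is retries for locked files, ownership and ACL handling, bandwidth control or any validation of the target, and that is where the real effort in a migration goes.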
In my career I have migrated PBs of data between storage arrays, and honestly it can be a long, dull and ultimately boring process, certainly not the sexy storage world we’ve come to know and love. Moving data was never something to sit and watch; it was always a case of kick it off and go for several cups of coffee (that may explain your caffeine addiction. Ed).
Now, however, things are finally changing. Computacenter has recently partnered with Data Dynamics to move file data more efficiently and effectively than previously possible. Through the use of the Data Dynamics StorageX toolset Computacenter can offer movement of data detailing what moved, where it moved and even what didn’t move (and why). It does this whilst reducing disruption, decreasing migration time by 600% and reducing network load.
Combining these features with the ability to validate the configuration of the target system makes for a very compelling case and ultimately becomes significantly less expensive to a business than the ‘free’ tools available.
Moving data is a weighty matter, but that doesn’t mean it should be stressful.
It was an interesting but different opening keynote session at EMC World in Las Vegas this morning; the noticeable absence of Joe Tucci pointing to future sessions as the federation moves forward.
The other thing of note was the inclusion of a significant number of hardware announcements; for a company moving rapidly towards a software defined world this was pretty unusual.
As the session opened David Goulden, CEO of EMC II, reminded us that software is the enabler of the connected world, how software enables connected devices and how our expectations of technology and data have changed.
Ultimately we have all become part of the Information Generation.
What followed therefore felt a bit strange; it started to feel a bit old-school EMC. As you would expect the product announcements made sense, but the continued focus on infrastructure was not what I expected.
The first major announcement was the launch of VX Rack, targeted to provide Hyper-Convergence at a datacenter scale. The numbers associated with the launch were admittedly pretty impressive; scaling from 4 to 1000 nodes, the VX Rack can provide up to 38PB and up to 240M IOPS. Whilst I’m still not sure why I would actually want 240M IOPS it is an impressive number, certainly more than the 24 IOPS I managed to tweet at the time (I’m blaming auto-correct).
Where VX Rack fits between VSpex Blue and VBlock I’ll endeavour to find out across the coming days and report back here.
The second major announcement was the release of Xtremio v4.0, dubbed ‘The Beast’ and launched by David Goulden and Guy Churchward to much fanfare including a caged ‘Beast’ being released into the wild.
With this release comes the availability of the 40TB X-Brick, and with up to 8 such Bricks in an array the overall raw total becomes 320TB. Free upgrades for customers with v3 arrays should be available at an unspecified time this year. With the hardware upgrades come enhancements to the array’s software, such as enhanced management, data protection and cloud integration functionality.
The third announcement was the release of the Data Domain 9500, promising 2x every feature of previous models.
In summary, this morning’s announcements seemed slightly disappointing for those of us that are regular attendees at this event. Enhanced hardware offerings are fine for what they are but hardly earth-shattering. Speaking with other attendees, I found this feeling of general disappointment was shared by many, but surely there is more to come across the week.
In customer meetings there are a few words I try to avoid using; I try to avoid the C**** word, I’m not keen on using B** D*** unless asked, and the current buzz around S****** D****** can mean so many different things to different people that it’s best to be very clear on precise definitions before starting conversations around it.
Invariably, however, these are terms that naturally come up in conversation; they are areas that challenge customers as we enter the fastest evolution of the IT industry that we have seen.
We’ve talked about Cloud (Knew you would use it at some point. Ed) for several years now and the adoption has grown, certainly over the last 18 months with usage now being mainstream.
Some of the recent announcements from vendors, particularly the messaging coming out of VMworld (Aug 2014) with VMware’s announcement of vCloud Air, may attract more users to this type of solution for their business challenges, and for many organisations the case for consumption of resource in this manner is compelling.
However, in recent conversations with customers I’ve noticed an interesting trend; whilst the need for Compute resource and Data capacity continues its unrelenting journey on an upwards curve, there is more selectivity about where these resources come from.
If data truly is the new natural resource, and the most valuable commodity in the world, then a noticeable trend is to keep that value close to home. Whilst customers are happy to consume compute resources from outside their core Datacenter, and even to have the application layer consumed ‘As a Service’, they are becoming increasingly keen to protect their data and house it locally.
There can be many valid reasons to keep data close to home; sovereignty, security, compliance and protection, but possibly Data is the Glue that holds the business operation together. Data Glue, now there’s an interesting concept; watch this space….
I was asked by a vendor recently why customers would purchase another vendor’s storage solution when theirs could offer both the functionality and performance at a lower price. Whilst there can be many answers to such a question, and I did supply several, the one that got me thinking was ‘Because they know it, they like it and because it has never let them down’. In other words the same reason that the chap asking the question was on his 5th BMW, he liked and trusted the brand.
However, in the wonderful world of data all you need to understand is that things ain’t what they used to be. More than perhaps ever before, success, maybe even survival, may depend on a company’s ability to cultivate loyal, maybe even devoted, repeat customers. The thing about loyalty is that it isn’t always as pure as we’d like it to be. Sometimes (ideally) it’s earned, but sometimes it can be all but forced upon us for technical or commercial reasons, or possibly the disruption is too excessive to consider change.
Storage vendors have become very good at offering competitive upgrade programs designed to retain the customer, and in general these work very well for both the vendor (who retains the footprint) and the customer (who gets a commercially compelling deal). Equally vendors can offer deals to replace another technology with their own, whilst enabling ease of migration between platforms. But therein lies the challenge; technology is always unique to a single storage vendor, and whilst the underlying disk technology may be the same, the connections and access methods are invariably incompatible.
Now, it’s very possible to carry out data-in-place upgrades to controllers, with the existing disk technology remaining in place, with this type of solution available from the majority of vendors, and that does encourage a sense of loyalty to vendors. Obviously the decision to change technology in an environment is a hard, or brave, decision to make, and certainly not an easy one. Therefore in many circumstances it simply becomes easier for customers to stick with what they have in place.
However, as I said earlier, times they are a changin’, and the world of data and storage is changing more rapidly than other areas, and these changes may make it harder for incumbent vendors to retain their existing customer base. As we enter the world where software is king and everything assumes the software defined banner, suddenly the need to stick with a previously preferred vendor disappears. Going further, moving the intelligence previously provided by the disk controllers to a software layer not only removes the need to have loyalty to a vendor, it also removes the need to be tied to dedicated block, file or object based arrays and to specific features of an existing array.
New and emerging vendors are already using commodity components and are providing their USP and value through their unique software, but even this can become compromised as the software layer evolves.
It’s a challenge that all existing storage vendors will have to face; this rise of commodity infrastructure in all its guises is coming fast and not stopping, and whilst this applies to all infrastructures it’s at the data layer where the most radical changes may occur.
As always, it’s a really interesting time to be working in storage and data.