Archive by Author | Bill McGloin

Avoiding the Doping Scandal in Storage Performance

Some Performance is More Equal than Others

For those who don’t know, my background is in Mathematics & Physics, which, as a wise man once pointed out to me, is why I have OCD tendencies around numbers.

I like precision. I don’t like estimates or guesstimates, and I’m not a big fan of vendor spreadsheets that show how their technology will reduce your Capex or Opex and provide virtually immediate ROI, because we all know there are so many variables involved that they cannot possibly be particularly accurate.

If I followed these models I could ultimately go round in ever-decreasing circles until I have ultimate performance, at little cost, with no footprint, and it pays for itself before I’ve even bought it. Hooray for that!

Back in my precise world it’s important that we know what is realistically achievable, and more importantly what is achievable in specific environments with specific applications. One thing we have learned is that whilst all storage technology may look similar from the outside, it doesn’t always perform in a similar manner. Something I’m asked repeatedly is how to decide between vendor technologies and what the optimal solution for a customer is.

The answer is not simple: there are many variables that can affect the performance of any storage environment, which is why, for specific workloads and specific criteria, one solution will work better than another. When sizing storage solutions we need to look at a multitude of variables:

  • Performance requirements in terms of IOPS, Latency & Bandwidth
  • Read / Write ratios
  • Application usage
  • Block size in use
  • Typical file sizes
  • Whether compression is applicable, and how well the data may compress
  • Deduplication and how well data can be deduplicated
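
As a rough illustration of how just a few of these variables interact, here is a minimal Python sketch; the workload figures and the classic RAID-5 write penalty of 4 are illustrative assumptions, not a sizing tool:

```python
# Rough storage sizing sketch: estimate back-end IOPS and bandwidth from a
# front-end workload profile. All figures below are illustrative assumptions.

def backend_iops(frontend_iops: float, read_ratio: float, write_penalty: int) -> float:
    """Back-end IOPS = reads + (writes x RAID write penalty)."""
    reads = frontend_iops * read_ratio
    writes = frontend_iops * (1 - read_ratio)
    return reads + writes * write_penalty

def bandwidth_mb_s(frontend_iops: float, block_size_kb: float) -> float:
    """Approximate throughput the workload generates, in MB/s."""
    return frontend_iops * block_size_kb / 1024

if __name__ == "__main__":
    iops = 50_000        # assumed front-end IOPS requirement
    read_ratio = 0.7     # assumed 70/30 read/write mix
    block_kb = 8         # assumed typical block size in KB
    raid5_penalty = 4    # classic RAID-5 penalty: 4 back-end I/Os per random write

    print(f"Back-end IOPS: {backend_iops(iops, read_ratio, raid5_penalty):,.0f}")
    print(f"Bandwidth:     {bandwidth_mb_s(iops, block_kb):,.0f} MB/s")
```

With those assumptions the sketch reports roughly 95,000 back-end IOPS and around 390MB/s of bandwidth; change any single input and both answers move, which is exactly why guessing is dangerous.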

Now here comes the challenge: 64% of IT organisations don’t know their application storage I/O profiles and performance requirements, so they guess. The application owner may know the performance and capacity requirements reasonably well, but adds extra to accommodate growth and ‘just to be safe’. The IT department takes those requirements and adds some more for growth and ‘just to be safe’, because ultimately we cannot have a new storage subsystem that does not deliver the required performance.

This means performance planning can be guesswork, with substantial under- or, more likely, over-provisioning, and the unseen costs of troubleshooting and administration adding more overhead than should be necessary.

The ultimate result of this can be a solution which meets all the performance requirements but is inefficient in terms of cost and utilisation.

This is where Computacenter come in; working closely with our latest partner, LoadDynamix, we can:

  • ACQUIRE customer specific workloads and understand exactly the requirements
  • MODEL workloads to understand the scale of solution required and ramp up workloads to find the tolerance of existing infrastructure
  • GENERATE workloads against proposed storage platforms to ascertain optimal solution, and how many workloads can be supported on a platform
  • ANALYSE the performance of proposed solutions with factual data, not vendor marketing figures

This approach provides an exact science for sizing the storage solution, and coupling it with Computacenter’s real-world experience ensures my OCD tendencies can be fully satisfied.

The Computacenter / LoadDynamix partnership announcement can be found here:

http://www.computacenter.com/news/151006_Load_DynamiX.asp

I like accuracy; working together with LoadDynamix we can achieve that not just for me, but more importantly for our customers and their users.

Coming Soon – Look out for the #BillAwards2015, announced in December; want to know who wins these prestigious awards? Follow me on Twitter @billmcgloin for all the answers.

Moving Data is a Weighty Matter

We’ve accepted that data has gravity, and like any large body, as it grows it draws an increasing number of applications, uses and yet more data towards it.

We’ve also accepted that applications have mass, growing in complexity as they evolve unless the painful decision to start again is taken.

Combine these factors with an increasing number of requests and an increasing request size, and suddenly access times suffer and the bandwidth available to move data takes a hit.

If these factors apply in day-to-day operations, then consider the impact when you have to move large quantities of data from one place to another, whether as part of a re-platforming exercise, a move to an archive, or possibly a move to the fabled data lake. Data gravity and application mass can then combine to have a seriously detrimental effect on the movement of that data.

Whilst any admin can script the movement of data between platforms, and numerous ‘free’ tools exist, the ability to move data rapidly and effectively between similar or dissimilar platforms (minimising outages, working around locked files, and ensuring file permissions and configurations remain intact) becomes crucial for customers. Neither internal nor external customers accept data outages; we have to be always on.

In my career I have migrated PBs of data between storage arrays, and honestly it can be a long, dull and ultimately boring process, certainly not the sexy storage world we’ve come to know and love. Moving data was never something to sit and watch; it was always a case of kicking it off and going for several cups of coffee (that may explain your caffeine addiction. Ed).
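
To put those cups of coffee into numbers, here is a back-of-the-envelope sketch; the data volume and link speeds are assumptions, and it ignores protocol overhead, small files and production traffic, all of which make real migrations slower:

```python
# Back-of-the-envelope: how long does it take to copy a petabyte?
# Assumes perfect line-rate transfer with zero protocol overhead,
# which real migrations never achieve.

def transfer_days(data_tb: float, link_gbit_s: float) -> float:
    bits = data_tb * 1e12 * 8              # decimal terabytes to bits
    seconds = bits / (link_gbit_s * 1e9)   # bits divided by bits-per-second
    return seconds / 86_400

for link in (1, 10, 40):
    print(f"1 PB over {link:>2} Gbit/s: {transfer_days(1000, link):6.1f} days")
```

Even over a flawless 10 Gbit/s link, a petabyte takes more than nine days to copy. That is a lot of coffee.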

Now, however, things are finally changing. Computacenter has recently partnered with Data Dynamics to move file data more efficiently and effectively than was previously possible. Using the Data Dynamics StorageX toolset, Computacenter can offer data movement that details what moved, where it moved to and even what didn’t move (and why). It does this whilst reducing disruption, speeding up migrations by as much as 600% and reducing network load.

Combine these features with the ability to validate the configuration of the target system and you have a very compelling case, one that ultimately proves significantly less expensive to a business than the ‘free’ tools available.

Moving data is a weighty matter, but that doesn’t mean it should be stressful.

EMC World Day 1 Announcement Summary

It was an interesting but different opening keynote session at EMC World in Las Vegas this morning, with the noticeable absence of Joe Tucci pointing to future sessions as the Federation moves forward.

The other thing of note was the inclusion of a significant number of hardware announcements; for a company moving rapidly towards a software-defined world this was pretty unusual.

As the session opened, David Goulden, CEO of EMC II, reminded us that software is the enabler of the connected world, how software enables connected devices, and how our expectations of technology and data have changed.

Ultimately we have all become part of the Information Generation.

What followed therefore felt a little strange; it started to feel like old-school EMC. As you would expect, the product announcements made sense, but the continued focus on infrastructure was not what I expected.

The first major announcement was the launch of VX Rack, targeted at providing hyper-convergence at datacenter scale. The numbers associated with the launch were admittedly pretty impressive: scaling from 4 to 1,000 nodes, VX Rack can provide up to 38PB and up to 240M IOPS. Whilst I’m still not sure why I would actually want 240M IOPS, it is an impressive number, certainly more than the 24 IOPS I managed to tweet at the time (I’m blaming auto-correct).

Where VX Rack fits between VSpex Blue and VBlock I’ll endeavour to find out across the coming days and report back here.

The second major announcement was the release of XtremIO v4.0, dubbed ‘The Beast’ and launched by David Goulden and Guy Churchward to much fanfare, including a caged ‘Beast’ being released into the wild.

With this release comes the availability of the 40TB X-Brick, and with up to 8 such bricks in an array the overall raw total becomes 320TB. Free upgrades for customers with v3 arrays should be available at an as-yet-unspecified time this year. Alongside the hardware upgrades come software enhancements within the array, such as improved management, data protection and cloud integration functionality.

The third announcement was the release of the Data Domain 9500, promising 2x the capabilities of previous models across the board.

In summary, this morning’s announcements seemed slightly disappointing for those of us who are regular attendees at this event. Enhanced hardware offerings are fine for what they are, but hardly earth-shattering. Speaking with other attendees, the feeling of general disappointment was widely shared, but surely there is more to come across the week.

The Rise of the Data-Only Datacenter?

In customer meetings there are a few words I try to avoid using. I try to avoid the C**** word; I’m not keen on using B** D*** unless asked; and the current buzz around S****** D****** can mean so many different things to different people that it’s best to be very clear on precise definitions before starting conversations around it.

Invariably, however, these are terms that naturally come up in conversation; they are areas that challenge customers as we enter the fastest evolution of the IT industry that we have seen.

We’ve talked about Cloud (Knew you would use it at some point. Ed) for several years now and adoption has grown, certainly over the last 18 months, with usage now being mainstream.

Some of the recent announcements from vendors, particularly the messaging coming out of VMworld (Aug 2014) around vCloud Air, may attract more users to this type of solution to their business challenges, and for many organisations the case for consuming resource in this manner is compelling.

However, in recent conversations with customers I’ve noticed an interesting trend: whilst the need for compute resource and data capacity continues its unrelenting journey on an upwards curve, there is more selectivity about where these resources come from.

If data truly is the new natural resource, and the most valuable commodity in the world, then a noticeable trend is to keep that value close to home. Whilst customers are happy to consume compute resources from outside their core datacenter, and even to consume the application layer ‘as a Service’, they are becoming increasingly keen to protect their data and house it locally.

There can be many valid reasons to keep data close to home: sovereignty, security, compliance and protection, but possibly it is simply that data is the glue that holds the business operation together. Data Glue; now there’s an interesting concept, watch this space…

Is Storage Brand Loyalty a Thing of the Past?

I was asked by a vendor recently why customers would purchase another vendor’s storage solution when theirs could offer the same functionality and performance at a lower price. Whilst there can be many answers to such a question, and I did supply several, the one that got me thinking was ‘because they know it, they like it, and it has never let them down’. In other words, the same reason the chap asking the question was on his fifth BMW: he liked and trusted the brand.

However, in the wonderful world of data all you need to understand is that things ain’t what they used to be. More than perhaps ever before, success, maybe even survival, may depend on a company’s ability to cultivate loyal, maybe even devoted, repeat customers. The thing about loyalty is that it isn’t always as pure as we’d like it to be. Sometimes (ideally) it’s earned, but sometimes it is all but forced upon us for technical or commercial reasons, or because the disruption of change is simply too great to contemplate.

Storage vendors have become very good at offering competitive upgrade programmes designed to retain the customer, and in general these work very well for both the vendor (who retains the footprint) and the customer (who gets a commercially compelling deal). Equally, vendors can offer deals to replace another technology with their own, whilst enabling ease of migration between platforms. But therein lies the challenge: technology is always unique to a single storage vendor, and whilst the underlying disk technology may be the same, the connections and access methods are invariably incompatible.

Now, it’s perfectly possible to carry out data-in-place upgrades to controllers, with the existing disk technology remaining where it is; this type of solution is available from the majority of vendors, and it does encourage a sense of loyalty. Obviously the decision to change technology in an environment is a hard, even brave, one to make. Therefore in many circumstances it simply becomes easier for customers to stick with what they have.

However, as I said earlier, times they are a-changin’, and the world of data and storage is changing more rapidly than most, and these changes may make it harder for incumbent vendors to retain their existing customer base. As we enter a world where software is king and everything assumes the software-defined banner, suddenly the need to stick with a previously preferred vendor disappears. Going further, moving the intelligence previously provided by the disk controllers into a software layer not only removes the need for loyalty to a vendor, it also removes the need to be tied to dedicated block, file or object-based arrays, and to the specific features of an existing array.

New and emerging vendors are already using commodity components and are providing their USP and value through their unique software, but even this can become compromised as the software layer evolves.

It’s a challenge that all existing storage vendors will have to face; this rise of commodity infrastructure in all its guises is coming fast and not stopping, and whilst this applies to all infrastructures it’s at the data layer where the most radical changes may occur.

As always, it’s a really interesting time to be working in storage and data.

So Much Data, So Little Information

Data is all around us. But does it inform us, entertain us, connect us, influence us and divide us? Or is it in danger of consuming us?

We all know data is growing rapidly, and all kinds of statistics exist about us creating more data in the last two years than in all of history before that. If all the words ever spoken by every human who ever lived were committed to text, they would consume just over 2.2 exabytes of capacity; in 2014 alone we will create more than 20x that figure.

On its own, much of this data may be a problem, or at best merely nice to have.

Let’s consider some raw data… 12. On its own it means nothing to us, other than it’s a number, some raw data.

What if we tie it in with another piece of data, °C? If we join the two separate pieces together we have a temperature, but it’s still really not much use to us.

But consider what happens if we add another two elements of potentially unrelated data, London and Tuesday; then suddenly we have:

12°C, London, Tuesday

Now we are able to use the data to make an informed decision; we know what to wear, how to travel and what to carry. Several pieces of data used together now have relevance; they have become information.

Let’s take another piece of raw data: what do you think of if I give you the word ‘Bridge’? A river?

What if I add playing cards? Suddenly you think of a card game.

What if I change cards to dentist? Something else again?

The second component of information is context; without context, data is of no use to us.

For data to be of use we need both relevance and context; when we have both, we can use the resulting information to make informed decisions. We can use it in four ways:

  • Descriptive – to understand what’s happened
  • Diagnostic – to understand why it’s happened
  • Predictive – to forecast what might happen
  • Prescriptive – to know what to do when it does happen

Sources of data without context and relevance are of no use; data with both context and relevance can be combined in ways we’ve not yet dreamed of.
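
For the programmatically minded, here is a toy sketch of the same idea (the record fields and the decision rule are entirely made up): a bare number only becomes usable information once relevance and context are attached.

```python
# Toy illustration only: a raw value becomes usable information once
# relevance (what it measures) and context (where and when) are attached.
# The field names and the decision rule are hypothetical.

from dataclasses import dataclass

@dataclass
class Information:
    value: float   # raw data: 12
    unit: str      # relevance: degrees Celsius
    place: str     # context: London
    day: str       # context: Tuesday

    def decision(self) -> str:
        return "Take a coat" if self.value < 15 else "Leave the coat at home"

raw = 12                                           # meaningless on its own
info = Information(raw, "°C", "London", "Tuesday")
print(info.decision())                             # now an informed decision
```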

Remember: the most valuable currency in the world is not money, and it’s not data; it’s information.

Multiples of bytes

Decimal (SI)
Value    Symbol   Name
1000     kB       kilobyte
1000²    MB       megabyte
1000³    GB       gigabyte
1000⁴    TB       terabyte
1000⁵    PB       petabyte
1000⁶    EB       exabyte
1000⁷    ZB       zettabyte
1000⁸    YB       yottabyte

Binary
Value    JEDEC           IEC
1024     KB (kilobyte)   KiB (kibibyte)
1024²    MB (megabyte)   MiB (mebibyte)
1024³    GB (gigabyte)   GiB (gibibyte)
1024⁴    –               TiB (tebibyte)
1024⁵    –               PiB (pebibyte)
1024⁶    –               EiB (exbibyte)
1024⁷    –               ZiB (zebibyte)
1024⁸    –               YiB (yobibyte)
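
The decimal/binary split in the table is also why a drive sold as ‘1TB’ shows up as roughly 0.91TiB in an operating system. A minimal sketch of the two conventions, purely for illustration:

```python
# Decimal (SI) vs binary (IEC) byte units: the same number of bytes gets a
# different label depending on whether each step divides by 1000 or 1024.

DECIMAL = ["kB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]
BINARY  = ["KiB", "MiB", "GiB", "TiB", "PiB", "EiB", "ZiB", "YiB"]

for power, (dec, binu) in enumerate(zip(DECIMAL, BINARY), start=1):
    ratio = 1024**power / 1000**power    # how much bigger each binary unit is
    print(f"1 {binu} = {ratio:.3f} {dec}")

# A drive marketed as 1 TB (10^12 bytes), viewed in binary units:
print(f"1 TB = {10**12 / 1024**4:.2f} TiB")   # ~0.91 TiB
```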

Every Generation Needs a New Revolution

Congratulations to anyone who spotted the above to be a quote from the third President of the USA, Thomas Jefferson; although he may have said it in 1803, the relevance remains today.

I’m no longer sure which generation I belong to. I come from an age when disk drives were measured in megabytes; nowadays we don’t talk in gigabytes, and some of us don’t even talk in terabytes any more. We know data is exploding and we know technology develops to cope with this; however, that’s evolution, not revolution.

I believe we are on the cusp of the next revolution in technology. To be the next big thing, it has to fundamentally change how we do things. It has to change how we look at and think about our world; it has to be revolutionary.

It used to be that we got excited by individual pieces of technology: maybe our first laptop, maybe our first 1TB drive, maybe our first smartphone that we just loved to hold and be seen with.

But whilst these may be considered revolutionary, they remain point solutions – they are single dimensional.

We’re moving into a multi-dimensional world of IT, from single-dimensional solutions to multi-dimensional solutions:

  • Where everything has an impact on everything else
  • Where every piece of technology has to interact with everything else

That’s just in business; what about the personal world, where your smartphone has to interact with your car, which has to interact with your microwave, which has to interact with your television, so that when you get home everything is in its place? How do you choose? And, more importantly, how do you control it all?

The problem with multi-dimensional solutions is that there are so many choices to be made. We are seeing the start of this wave now in the ‘software-defined’ world, where it gets harder to identify the components of a solution; but really, why should we care anyway?

So what do we do in this multi-dimensional, software-defined world of IT?

  • Should you ignore everyone and continue as you are, after all it works doesn’t it?
  • Maybe putting everything in the cloud and consuming as a service is the answer
  • Why not adopt all the new methods, be seen to be progressive but continue to do everything the same old way?
  • What if you adopt every new solution out there and change all your processes to get all that benefit you’ve been promised? How much disruption would that cause? And what if it doesn’t deliver?

It is a minefield out there, and as with all minefields it’s always good to have someone with experience to guide you through it. This is where Computacenter come in.

Every generation has an obligation to renew, reinvent, re-establish, re-create structures and redefine its realities for itself. Get ready for the next generation.