We’ve all seen the graphs showing data growing exponentially, and how much data has been created in the last two years, and how much data we are all going to create in the future, so it’s natural that we talk about the Volume (V1) of data.
Then there are the numerous sources of data, all different, that we have to deal with; our next accepted fact is that there is a large Variety (V2) of data coming in.
Next, we talk about the speed of data creation, the Velocity (V3). With new sources comes a new generation of data, presenting an ever-increasing problem for vendors and customers alike.
Also, multiple disparate sources of data lead to a Variance (V4) in quality and data types, in turn leading to further problems for staff.
However, the most important V is Value, the value quality data can bring to a business, the competitive advantage it bestows, and the ability to change the entire business direction.
We hear ‘Data is the New Oil’; it’s not a view I subscribe to. As oil needs to be refined and treated properly to become of use, so data must be treated properly to deliver true business value. Which is why I believe the Ps of data are much more important than the universally accepted Vs.
The first important P is PREPARATION. As oil needs to be refined to provide benefit, so data needs to be understood and nurtured to allow the data-driven business to exist. It’s critically important to understand data sources and the ultimate desired outcome. Once sources are identified, data needs to be ingested; this is where we get onto the topic of Edge Computing, but I’ll save that for next time.
Data then needs to be cleansed to ensure quality; erroneous and duplicate entries need to be removed. Driving true quality into data sets is crucial to delivering the desired outcome in the most efficient manner.
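To make the cleansing step a little more concrete, here is a minimal sketch in Python. It is an illustration only, not a description of any particular product: the function name `cleanse`, the field names `id` and `email`, and the rule that a blank required field makes an entry erroneous are all assumptions of mine.

```python
def cleanse(records, required=("id", "email")):
    """Return records with erroneous and duplicate entries removed.

    Assumption for this sketch: a record is 'erroneous' when any
    required field is missing or blank, and a 'duplicate' when its
    required fields match an earlier record after normalisation.
    """
    seen = set()
    clean = []
    for record in records:
        values = [record.get(field) for field in required]
        # Skip erroneous entries: a required field is missing or blank.
        if any(value is None or not str(value).strip() for value in values):
            continue
        # Normalise case and whitespace so trivial differences
        # do not hide a duplicate.
        key = tuple(str(value).strip().lower() for value in values)
        if key in seen:
            continue
        seen.add(key)
        clean.append(record)
    return clean


raw = [
    {"id": 1, "email": "a@example.com"},
    {"id": 1, "email": " A@Example.com "},  # duplicate of the first entry
    {"id": 2, "email": ""},                 # erroneous: blank email
    {"id": 3},                              # erroneous: email missing
    {"id": 4, "email": "d@example.com"},
]
print([record["id"] for record in cleanse(raw)])
```

Real data sets need far richer rules than this, but the shape is the same: decide what counts as erroneous, decide what counts as a duplicate, and apply both consistently before the data reaches the platform.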
Then the next P comes into play; PLATFORM. Prepared data needs to reside on the optimum platform to allow the creation of intelligent business Information. The platform should align with the criticality of the content to the business. This can be on-premises or Cloud as appropriate. Obviously some data needs to remain in a datacenter; however, Cloud-based platforms can now offer previously unachievable levels of performance. Whilst the Edge-Core-Cloud model still prevails, it is important to investigate options in terms of both performance and cost; ensuring user requirements are met is paramount.
Once PREPARED & PLATFORMED, the real benefits can be achieved; PRODUCTIVITY. This is where the only important V is relevant; business Value. The ability to use data, and its more important sibling, Information, to transform a business is at the true heart of digital transformation. There are simply too many use cases to list, but defining smart initiatives and involving key stakeholders across the business will allow organisations to identify and validate their use cases, and quickly gain a competitive advantage. They can become truly Data-Driven.
The Final P is probably the most important; PROFIT. The Data-Driven business will simply out-perform its competition. The ability to PROFIT from data does not necessarily have to be commercial; it can translate in many ways.
So, next time someone talks about volume and velocity, remember the four Ps are the real key to data value: Preparation, Platform, Productivity and Profit.
….and I didn’t even mention PROTECT. Until the next time…
See what I did there? TrAIn? AI, or to give it its Sunday name, Artificial Intelligence, is everywhere just now. Or rather, it is everywhere in the technology press, but we’re just at the cusp of it coming into and affecting our lives. No longer do we need to worry about Cloud or Big Data as our hype trends; now we have AI and her close friend IoT. (You forgot Machine Learning – Ed)
However, we are several years away from seeing the true impact of AI. The growth in the number of connected devices allows businesses to transform (you nearly got Digital Transformation in as well for those playing buzzword bingo – Ed) based on more, and more varied, sources of data. As the number of connected devices grows, so our Data Scientists can interpret this deluge of data and turn it into business Information.
(Image Source; The Connectivist)
At present, this is simply Machine Learning, not true Artificial Intelligence. Current decision-making technologies and outcomes are programmed by humans; there is no interpretation, with all outcomes driven by complex algorithms. We now have infrastructure solutions more capable than ever of processing information in real time. As an example, Sudoku; a typical person would take over two hours to complete a 4×4 puzzle, whereas Google Image Recognition Software can complete the same puzzle in 9 seconds.
Therefore we know advantages can be gained by systems completing tasks faster than humans can possibly consider. We have high-precision robots, we can translate signs simply by pointing our phones at them using image recognition, and we almost have driverless cars; however, the car doesn’t quite understand yet why I want to go via Domino’s to collect a pizza on the way home.
The gap to true AI is logic and reasoning; whilst robots can do a significant number of human tasks they will not know why, as they are following a set of instructions. Whilst we can possibly program logic into a robotic operation, can we get the same robot to comprehend moral issues? What if a self-driving car is out of control? Does it drive into a wall, risking its passengers, or hit a bus stop full of people? The moral issue has relevance in these situations, and has to be involved in the decision-making process.
So whilst we are still somewhat away from holidaying at Westworld[i] the current rate of technological advancement will see it arrive in the next several years. It will have a material impact on all our lives and we will see autonomous vehicles, enhanced customer services and a myriad of options we simply have not considered yet. The data we are generating today is already impacting development of future products and services, from healthcare to transport and everything in between.
The obvious concern is the Terminator scenario, where computers think for themselves and take over; as the recent well-publicised exchange between Mark Zuckerberg and Elon Musk showed, there are differing opinions on this. Whilst unlikely, the potential for computers to adopt human traits of emotion, aggression and protection exists, and it’s important that humans retain the ultimate off switch.
I’m not planning to work for Cyberdyne Systems or develop my own Skynet just yet, and I fully intend that any robot will work for me and not the other way about.
[i] Westworld, for those living in a bubble, is an American science fiction western thriller television series. The story takes place in the fictional Westworld, a technologically advanced Wild West–themed amusement park populated by android hosts. Westworld caters to high-paying guests, who may indulge in whatever they wish within the park, without fear of retaliation from the hosts, or so they thought.
As individuals we create increasing amounts of personal data. This data can be hugely valuable to businesses, allowing them to turn your raw data into valuable business information. Businesses use information you provide to target both you and people from similar backgrounds with whatever product they happen to be marketing.
The interesting question is who actually owns the data we provide. Who is responsible for the data we supply? In general, people naturally assume that businesses own this data and will protect it and use it responsibly. However, as we’ve seen recently, this is not always the case.
Recent data breaches, with Experian in the USA being a recent example, have shown that our personal information is not always as safe as we would like to hope. We get no visibility of how our data is being used, protected or what is done with it after we willingly supply it. With constant and increasing numbers of data breaches our data becomes more vulnerable. Remember data is more valuable than oil.
There are many examples of data misuse, ranging from nuisance phone calls to spam emails and unsolicited post. However, this may all be about to change. Under the forthcoming GDPR, businesses will become simply custodians of my data.
It’s important for organisations to realise that IT departments do not own the data; they simply provide the infrastructure to allow access to data through a series of applications. The business is responsible for the data held, and will have to treat it differently going forward to continue to get value from it.
Businesses will need to become more transparent in their dealings with external customers, showing what data is held and even why it remains held, and showing that agreement to hold the data was given, either verbally or through the dreaded tick box.
Inevitably this will lead to a change in business processes, which is why at Computacenter we have seen a rise in demand for data masking and anonymisation. This allows organisations to translate the data they hold into valuable information without the risk of items being personally identifiable.
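As a rough illustration of what masking can look like in practice, here is a minimal Python sketch that swaps identifying fields for keyed hashes using the standard library `hmac` and `hashlib` modules. The function names, the field names and the sample secret are my own assumptions. Strictly speaking this is pseudonymisation rather than full anonymisation: anyone holding the secret can recompute the tokens, so treat it as a starting point only.

```python
import hashlib
import hmac


def pseudonymise(value, secret):
    """Replace a value with a stable, keyed token (HMAC-SHA256, truncated)."""
    digest = hmac.new(secret, value.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def mask_record(record, secret, identifying=("name", "email")):
    """Return a copy of the record with identifying fields tokenised.

    Which fields count as identifying is an assumption for this sketch;
    in reality that decision belongs to the business, not to IT.
    """
    masked = dict(record)
    for field in identifying:
        if field in masked:
            masked[field] = pseudonymise(str(masked[field]), secret)
    return masked


secret = b"example-only-secret"  # hypothetical key, never hard-code a real one
customer = {"name": "Jane Smith", "email": "jane@example.com", "spend": 120}
print(mask_record(customer, secret))
```

Because the same input always produces the same token, masked records can still be joined and counted, which is what lets the data remain useful as business information.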
Possibly the most important thing for businesses to do over the coming months is to start to understand what data they have, what is valuable to them and can be translated to Information, what new or existing sources of data they have and how they treat it to ensure regulatory compliance.
My data belongs to me now. I may let organisations use it in return for a service I deem of value, but ultimately it is personal and belongs to me.
There’s a new sheriff in town…..
The smart office has become more common in workplaces across the country. The Digital workplace has evolved to make our workplaces more efficient and adaptable to the changing needs of users.
By incorporating smart devices such as motion sensors, thermostats, smart switches and cameras, organisations can reduce energy consumption, improve staff morale and improve productivity. Since commercial buildings account for around 40% of global energy consumption, embedding sensors in walls and ceilings can have a significant impact by ensuring resources such as lighting, heating or cooling are used only when staff are present.
These sensors can be connected to the company network and, using visualisation techniques, can provide a view of working patterns. In turn, this can lead to energy savings of between 20% and 40%. Whilst the cost of creating the smart office is not insignificant, the potential benefits for businesses can be realised in relatively short periods.
The rise and growth of these IoT devices continues exponentially and helps create efficiencies in floor space usage and space planning. These devices can improve the experience for workers and allow the creation of personalised workspaces where individual lighting and cooling can be controlled either by an App or by your smart desk.
However, this does not come without its privacy challenges. If your smart desk recognises you through RFID tagging as you approach, and creates your personalised settings, you are immediately engaged and hopefully more efficient.
The challenge comes with how much your desk then knows about you. Heat and motion sensors, RFID tags and proximity sensors mean that workers are potentially under constant surveillance. Sensors can track when people are at desks, moving around, present or not present; whether individual workers are happy with this level of surveillance remains to be seen.
Concerns are starting to be raised around what data may be being collected by sensors. We come back to the privacy paradox around what people are willing to sacrifice in terms of their privacy for convenience. Most data will be collected anonymously, but that does not preclude future use for other purposes. There is a fine line between efficiency and surveillance, as some organisations have found out to their cost.
We may be entering the age of the Smart Building, but it may find itself in competition with the smart human. Will the last person to leave switch off the lights? No need for that; the building will do it itself. It may not be Big Brother that is watching you; it may be Big Building.
I unlock my phone by looking at it in a meaningful way; it trusts me and unlocks. Despite the rumours I’ve yet to be able to unlock with a photo.
Both my face and yours have 83 data points that technology can recognise to ensure we are actually who we say we are. So if I can unlock my phone, what else can I do with my face? Over the past few years computers have become increasingly good at recognising faces by using these data points and by measuring the distance between them.
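For the curious, the idea of comparing distances between data points can be sketched in a few lines of Python. This is a toy illustration under my own assumptions, not how any phone or vendor actually does it: the landmark coordinates, the function names and the tolerance are all hypothetical, and real systems use many more points and far more sophisticated models.

```python
import math
from itertools import combinations


def signature(points):
    """Pairwise distances between landmarks, scaled by the largest one.

    Scaling makes the signature independent of how large the face
    appears in the image.
    """
    distances = [math.dist(a, b) for a, b in combinations(points, 2)]
    largest = max(distances)
    if largest == 0:
        raise ValueError("landmarks must not all be the same point")
    return [d / largest for d in distances]


def same_face(points_a, points_b, tolerance=0.05):
    """Compare two landmark sets; the tolerance is an arbitrary choice."""
    sig_a, sig_b = signature(points_a), signature(points_b)
    if len(sig_a) != len(sig_b):
        return False
    return all(abs(x - y) <= tolerance for x, y in zip(sig_a, sig_b))


enrolled = [(0, 0), (4, 0), (2, 3)]  # three made-up landmarks
closer = [(0, 0), (8, 0), (4, 6)]    # same proportions, nearer the camera
other = [(0, 0), (4, 0), (2, 9)]     # different proportions

print(same_face(enrolled, closer), same_face(enrolled, other))
```

The point of the sketch is simply that it is the proportions between the points, not the raw image, that get compared.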
We’re seeing solutions come to market to provide enhanced convenience to users, and also to provide surveillance capabilities to authorities. We’re already seeing developments in China around extensive use of facial recognition; walk up to the barrier at a train station and the gate opens for you, assuming your face resembles your national identity card. No worries if you’re feeling rough or having a bad hair day, there are sufficient data points to allow you through the barrier.
This negates the need for the widespread use of contactless cards that we currently see in the UK. It also has an impact on our banking regimes; as the technology advances we may see reduced demand for passwords and PINs, as we may simply look at the ATM and ask for cash; ‘Alexa, can I have £60 please?’
It’s already possible to transfer money using an app and your face to authorise. Again in China 120 million people have access to a mobile payment app using their face as credentials. It’s possible to both transfer money and also get a loan simply by using your face as identification.
Ticket touts are the current scourge of getting into concerts (something close to my heart), but if your ticket is matched to your face then there is no unauthorised secondary ticket market. Getting access to sporting events could be made easier for the fan, whilst saving costs for the club.
In addition, whilst surveillance is still considered a delicate subject, tracking of movement through a venue allows for efficiencies in access areas and the targeting of relevant services to individuals. It could also allow tracking of movement through public transport systems for improved customer experiences.
We’ve heard a lot about body-worn police cameras recently. Ultimately these could be linked to central resources for the identification of known criminals making our streets a safer place.
Cars could be enabled to recognise an authorised driver, meaning no stolen cars and no lost keys. The list goes on.
Obviously this relies on a few things. One of the reasons that China is a large market for this is its large national database for identification purposes. Whilst some in the Western world may not be comfortable with this, there is a decision to be made as to whether the benefits outweigh the use of your personal data – The Privacy Paradox applies.
It would also rely on suitably responsive infrastructure to support the use cases, but as the technology evolves you’ll soon be able to use public transport and buy goods, and when you walk into Starbucks they will no longer need to ask your name; you’ll be recognised as you walk in, and this time the cup will have your correct name on it.
Now where is that false beard?
It feels slightly strange sitting in Scotland in December when it’s 12 degrees outside; we’re much more used to snow and a white Christmas. No doubt Global Warming has affected our weather systems here.
However, in the world of data it’s becoming a polar opposite (see what I did there?). Data continues to grow and get colder by the day. We’ve become a society of data hoarders; we continue to store everything, never accessing it but keeping it for that ‘just in case’ moment. This has led to the rise of ROT data; Redundant, Obsolete or Trivial content that is never accessed but continues to consume valuable resources.
A recent Veritas survey shows that only 14% of our data is accessed regularly, with a further 32% being classified as ROT data. The worrying statistic is the one I’ve not yet quoted; this means that 54% of our data is simply unknown and, like the majority of an iceberg, sits unseen below our visibility.
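The arithmetic behind that 54% is simply what is left once the other two slices are removed, and it is worth applying to your own estate. The short Python sketch below does exactly that; the function name and the 1,000TB example estate are my own assumptions, while the 14% and 32% figures are the ones quoted above.

```python
def databerg_split(total_tb, accessed_pct=14, rot_pct=32):
    """Split a storage estate into accessed, ROT and dark data.

    The dark share is whatever remains after the accessed and ROT
    percentages are taken away.
    """
    dark_pct = 100 - accessed_pct - rot_pct
    return {
        "accessed_tb": total_tb * accessed_pct / 100,
        "rot_tb": total_tb * rot_pct / 100,
        "dark_tb": total_tb * dark_pct / 100,
        "dark_pct": dark_pct,
    }


# A hypothetical 1,000TB estate, using the survey percentages.
print(databerg_split(1000))
```

On those numbers, more than half of a 1,000TB estate would be data that nobody can vouch for.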
This dark data may have business value, or may be valueless, but the crucial point is that it remains unknown. More worryingly, this dark data may contain personal customer information, non-compliant data or other high-risk corporate data, with the potential for critical risks at the core of a business.
Recent legislation changes mean that Data Governance has to become more critical to business operations. The location of data, the content of repositories and the ability to search and discover data of relevance, upon demand, are placing new and unique challenges on IT operations, challenges that they have never previously faced.
Illuminating dark data is not easy. It requires elimination of ROT, it requires understanding of corporate data and what data may have business value, and it requires further understanding of legislation relevant to the customer environment. Finally, the ability to find that needle requires the use of tools and the knowledge to understand what you are looking for.
Having the ability to search across all data sets, and to apply filters to the searches, is not an easy task, but it is one that you will face at some point. Identifying the process and the tools is a mission that needs addressing now; by the time you are asked, it may be too late to avoid significant costs and the potential for large fines if data cannot be produced in a timely manner.
The Data Iceberg is not melting, but at least we can understand the 54% not immediately visible to us. Our data hoarding exacerbates the problem; time to shine a light in the darkness.
Now, where are my sunglasses?
*Information has been sourced from the recent Veritas publication; The Databerg Report: See What Others Don’t
I’m allowed some Star Wars geekery occasionally!
With the imminent launch of the latest Star Wars movie I turned to thinking about the generation of images used in movies. We think less and less about the computer generated images we see in movies, but are simply accepting of them as part of the action, even though the Wow factor is still there.
We know that those buildings are not really destroyed; the Golden Gate Bridge has not really been devastated 20 times in movies recently, so we know it’s Computer Generated Imagery (CGI), but have we ever thought about the technology required to create these sequences?
Most important in this process is the role of the storage environment; it’s imperative to be able to process images quickly and to be able to render images in a timeframe to minimise cost and production time.
This is one of the places that Flash-based storage arrays really shine; the ability to deliver output in a rapid fashion means that my Star Wars user experience happens in 2015, and not in several more years’ time.
Remember, the original Disney cartoons took several years to make, but now several can be produced every year; Flash storage solutions are one of the key factors behind this.
Now, performance isn’t always everything, but in the film industry it can be.
Whilst I genuinely have no preference for technology vendors, occasionally there are some things you just have to highlight. One of these has been our recent testing of the HP StoreServ 20850 storage array. Having recently achieved world record results in the SPC-2 tests, the 20850 became an obvious candidate for Computacenter to evaluate whether the claims could be substantiated in a real-world scenario.
The performance of this array has been blindingly fast, and is one of the few which actually matches the vendor’s claims in terms of performance. Having tested several vendors’ solutions, the HP 20850 has stood with the best of them in terms of both price and performance. Combining this with improved manageability makes the HP 20850 a compelling solution for customers across a wide range of applications, and supports customers in their move to the silicon datacentre.
The HP StoreServ represents a return to form for one of the major players in the storage industry, and is available for Customer Demonstration with a variety of either simulated workloads, or customer-specific tests utilising actual data, in the Computacenter Solution Centre based in Hatfield.
To (almost) quote Darth Vader; ‘HP StoreServ 20850- The Force is Strong with This One’
We’ve accepted that data has gravity and, like any large body, as it increases it draws an increasing number of applications, uses and more data towards it.
We’ve also accepted that applications have mass, growing in complexity through their evolution unless the painful decision to start again is taken.
Combine these factors with an increasing number of requests and an increasing request size, and suddenly you have a significant impact on access time, and the bandwidth available to move data takes a hit.
If these factors apply in day-to-day operations, then consider the impact when you have to move large quantities of data from one place to another, whether as part of a re-platforming operation, as a move to an archive, or possibly as a move to the fabled Data Lake. Then data gravity and application mass can combine to have a seriously detrimental effect on the movement of data.
Whilst any admin can script the movement of data between platforms, and numerous ‘free’ tools exist, the ability to move data rapidly and effectively between similar or dissimilar platforms, minimising any outages, working around locked files and ensuring file permissions and configurations remain complete, becomes crucial for customers. Neither internal nor external customers accept data outages; we have to be always on.
In my career I have migrated PBs of data between storage arrays, and honestly it can be a long, dull and ultimately boring process, certainly not the sexy storage world we’ve come to know and love. Moving data was never something to sit and watch; it was always a case of kick off and go for several cups of coffee (that may explain your caffeine addiction. Ed).
Now, however, things are finally changing. Computacenter has recently partnered with Data Dynamics to move file data more efficiently and effectively than previously possible. Through the use of the Data Dynamics StorageX toolset Computacenter can offer movement of data detailing what moved, where it moved and even what didn’t move (and why). It does this whilst reducing disruption, decreasing migration time by 600% and reducing network load.
Combining these features with the ability to validate the configuration of the target system makes for a very compelling case and ultimately becomes significantly less expensive to a business than the ‘free’ tools available.
Moving data is a weighty matter, but that doesn’t mean it should be stressful.
It was an interesting but different opening keynote session at EMC World in Las Vegas this morning; the noticeable absence of Joe Tucci points to how future sessions may look as the federation moves forward.
The other thing of note was the inclusion of a significant number of hardware announcements; for a company moving rapidly towards a software-defined world this was pretty unusual.
As the session opened, David Goulden, CEO of EMC II, reminded us that software is the enabler of the connected world, how software enables connected devices and how our expectations of technology and data have changed.
Ultimately we have all become part of the Information Generation.
What followed therefore felt a bit strange; it started to feel a bit old-school EMC. As you would expect the product announcements made sense, but the continued focus on infrastructure was not what I expected.
The first major announcement was the launch of VX Rack, targeted to provide Hyper-Convergence at a datacenter scale. The numbers associated with the launch were admittedly pretty impressive; scaling from 4 to 1000 nodes, the VX Rack can provide up to 38PB and up to 240M IOPS. Whilst I’m still not sure why I would actually want 240M IOPS, it is an impressive number, certainly more than the 24 IOPS I managed to tweet at the time (I’m blaming auto-correct).
Where VX Rack fits between VSpex Blue and VBlock I’ll endeavour to find out across the coming days and report back here.
The second major announcement was the release of XtremIO v4.0, dubbed ‘The Beast’ and launched by David Goulden and Guy Churchward to much fanfare including a caged ‘Beast’ being released into the wild.
With this release comes the availability of the 40TB X-Brick, and with up to 8 such Bricks in an array the overall raw total becomes 320TB. Free upgrades for customers with v3 arrays should be available at an unspecified time this year. With the hardware upgrades come enhancements to the array software, such as enhanced management, data protection and cloud integration functionality.
The third announcement was the release of the Data Domain 9500, promising 2x every feature of previous models.
In summary, this morning’s announcements seemed slightly disappointing for those of us that are regular attendees at this event. Enhanced hardware offerings are fine for what they are but hardly earth-shattering. Speaking with other attendees, I found this general sense of disappointment was widely shared, but surely there is more to come across the week.
In customer meetings there are a few words I try to avoid using; I try to avoid the C**** word, and I’m not keen on using B** D*** unless asked. The current buzz around S****** D****** can mean so many differing things to different people that it’s best to be very clear on precise definitions before starting conversations around it.
Invariably, however, these are terms that naturally come up in conversation; they are areas that challenge customers as we enter the fastest evolution of the IT industry that we have seen.
We’ve talked about Cloud (Knew you would use it at some point. Ed) for several years now and the adoption has grown, certainly over the last 18 months with usage now being mainstream.
Some of the recent announcements from vendors, particularly the messaging coming out of VMworld (Aug 2014) with the announcement of vCloud Air, may attract more users to this type of solution to their business challenges, and for many organisations the case for consumption of resource in this manner is compelling.
However, in recent conversations with customers I’ve noticed an interesting trend; whilst the need for Compute resource and Data capacity continues its unrelenting journey on an upwards curve, there is more selectivity about where these resources come from.
If data truly is the new natural resource, and the most valuable commodity in the world, then a noticeable trend is to keep that value close to home. Whilst customers are happy to consume compute resources from outside their core Datacenter, and even to consume the application layer ‘As a Service’, they are becoming increasingly keen to protect their data and house it locally.
There can be many valid reasons to keep data close to home; sovereignty, security, compliance and protection, but possibly Data is the Glue that holds the business operation together. Data Glue; now there’s an interesting concept. Watch this space….