Data science is going to involve trial and error, and the costs of failure have to be contained
Listening to the presentations at the Information Age Data Leadership conference yesterday, it was noticeable that the term ‘big data’ didn’t come up with the frequency some of us expected. It was even treated with a little disdain by a couple of speakers, portrayed as a term with more hype than substance behind it.
It reflected some of the lessons that emerged from the conference: that data has to be properly prepared to obtain clear insights; that it’s easier to prepare data held inside your enterprise; and that a lot of big data, despite all of its promises, resides outside the enterprise. The hard part in harnessing big data is not just to get at all that juicy information on the outside, but to get it into a shape from which you can produce something worthwhile.
It was notable that in the stand-out case study, on how Network Rail is learning a stream of valuable lessons from its data, the data came predominantly from inside the organisation, making it much easier to manage. And a strong impression to emerge from the day was that you can get the best results in the short term by limiting your ambition, looking at what you have and can reasonably use rather than making grand plans to tap into streams from the outside world.
I’m not one to write off the potential of big data; it’s the continuation of a trend of bringing together and analysing information that is already producing plenty of real value for business. But it is probably being talked up by its evangelists to the point where it will disappoint a lot of expectations in the short term.
Harnessing all that unwieldy data from outside the enterprise is going to be a massive task, made more difficult by the unstructured nature of a lot of the information, and it will take a long time for best practice to develop. The emergence of data science will probably provide answers over time, but the discipline is in its early days and there aren’t yet many data scientists around. It will probably be well into the next decade before it matures, and some people are going to waste a lot of time and money in unproductive big data projects before then.
Which is why expectations should be kept in check, and projects run on a small scale and not used for business-critical decisions until the techniques have been proven. Trial and error is inevitable in opening up any field of science, but data science is going to happen in the business world, not the controlled conditions of a laboratory, and it’s important that the errors are not too costly.
Business will benefit most from big data by allowing it to evolve, not letting it loose in a big bang.
Mark Say is a UK-based writer who covers the role of information management and technology in business. See www.marksay.co.uk