Data science is going
to involve trial and error, and the costs of failure have to be contained
Listening to the presentations at the Information Age Data Leadership
conference yesterday, it was noticeable that the term ‘big data’ didn’t come up
with the frequency some of us expected. It was even treated with a little
disdain by a couple of speakers, portrayed as a term with more hype than
substance behind it.
It reflected some of the lessons that emerged from the
conference: that data has to be properly prepared to obtain clear insights;
that it’s easier to prepare data held inside your enterprise; and that a lot of
big data, despite all of its promises, resides outside the enterprise. The hard
part in harnessing big data is not just to get at all that juicy information on
the outside, but to get it into a shape from which you can produce something
worthwhile.
It was notable that in the stand-out case study, on how Network
Rail is learning a stream of valuable lessons from its data, the information
came predominantly from inside the organisation, making it much easier to
manage. And
a strong impression to emerge from the day was that you can get the best
results in the short term by limiting your ambition, looking at what you have
and can reasonably use rather than making grand plans to tap into streams from
the outside world.
I’m not one to write off the potential of big data; it’s the
continuation of a trend of bringing together and analysing information that is
already producing plenty of real value for business. But it is probably being
talked up by its evangelists to the point where it will disappoint a lot of
expectations in the short term.
Harnessing all that unwieldy data from outside the
enterprise is going to be a massive task, made more difficult by the
unstructured nature of a lot of the information, and it will take a long time
for best practice to develop. The emergence of data science will probably
provide answers over time, but the discipline is in its early days and there
aren’t yet many data scientists around. It will probably be well into the next
decade before it matures, and some people are going to waste a lot of time and
money in unproductive big data projects before then.
Which is why expectations should be kept in check, and projects
run on a small scale and not used for business-critical decisions until the
techniques have been proven. Trial and error is inevitable in opening up any
field of science, but data science is going to happen in the business world,
not the controlled conditions of a laboratory, and it’s important that the
errors are not too costly.
Business will benefit most from big data by allowing it to
evolve, not letting it loose in a big bang.
Mark Say is a UK-based
writer who covers the role of information management and technology in
business. See www.marksay.co.uk