I came across an article on The Next Web about the possibilities of Big Data. I thought I would only look into it for two minutes, but got intrigued by the first question: “Who thinks Big Data is BS?”. The whole presentation then turned into a great perspective on what Big Data actually means for us. It demystifies the concept and puts it into perspective, clearly stating how it changes the way we solve problems today. Let me recap the key points besides giving you the video as well.
Backlash against Big Data is starting to happen
The first question of the video: who in this room thinks Big Data is BS? The mere fact that the question pops up suggests that some rejection of the hype around Big Data is around the corner. It is a natural part of the hype cycle, but I’m still glad that we are starting to break down the myth and dig out what Big Data is really useful for.
The scale has changed
Many people state that we are not doing many things differently. We are still processing data, and we always had data at hand, just not in a digital format. The phenomenon of Big Data is mostly about the fact that more and more of our data is being stored in a format which allows the processing of massive amounts of it. At first it seems that the only thing which has changed is the scale at which we do the same things with data that we always did. Nevertheless, having the scale change so dramatically opens up opportunities which were not there before.
Switch from quality to quantity
There is, though, one way in which the approach to data processing is changing with Big Data. Up until the boom of digitally stored and collected information, each bit was valuable. A clear focus was put on the quality of the data stored and structured, to maximize the value which could be gained from it. Due to the scale shift mentioned above, the focus has been shifting to the quantity of data. In many areas we have reached the critical mass of information to allow ourselves some “mess”.
With this much data we can afford errors and some quality issues, because the error they represent is insignificant; it is more efficient to collect even more data to suppress these errors than to focus on correcting them.
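A minimal sketch of this idea, using a made-up measurement scenario: the simulated readings include both Gaussian noise and a small fraction of outright faulty values, yet the estimate converges on the true value as the sample grows, without any cleanup.

```python
import random
import statistics

random.seed(0)
TRUE_VALUE = 10.0  # the quantity our hypothetical sensors try to measure

def noisy_sample(n):
    """Simulate n measurements: Gaussian noise, plus ~1% corrupted readings."""
    data = []
    for _ in range(n):
        if random.random() < 0.01:
            data.append(0.0)  # a faulty, clearly wrong reading, left as-is
        else:
            data.append(random.gauss(TRUE_VALUE, 2.0))
    return data

# No correction step: the "mess" is simply drowned out by sheer quantity.
for n in (100, 10_000, 100_000):
    estimate = statistics.fmean(noisy_sample(n))
    print(f"n={n:>7}: estimate={estimate:.3f}, error={abs(estimate - TRUE_VALUE):.3f}")
```

The residual bias from the faulty readings never fully disappears, but it is small and stable, which is the trade-off the quantity-over-quality argument accepts.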
Causality vs. Correlation
With a limited amount of data it was imperative to understand causalities in order to apply our learnings to actually solving problems or preventing them from happening. While the human mind is still geared toward this approach, the volume of data now enables us to relax this reflex.
Correlation will still not equal causality, but so many correlations can be drawn from this much information that we can predict and describe situations even without understanding what is causing them. If you know in advance that something is going to happen, you do not need to understand the reason behind it to use that information. This use could be preventing something, but also enhancing or magnifying the effect.
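To make this concrete, here is a small sketch with entirely made-up numbers: an outcome we care about correlates with a signal we can observe cheaply, so we fit an ordinary least-squares line and forecast from the signal alone, with no causal model of why the two move together.

```python
# Hypothetical weekly observations: a cheaply observed signal (x) and an
# outcome we want to anticipate (y). Only their co-movement is exploited.
x = [12, 15, 19, 23, 28, 31, 36, 40]   # e.g. search volume for a phrase
y = [29, 34, 41, 49, 60, 65, 74, 83]   # e.g. reported cases the next week

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Ordinary least squares: slope and intercept from the correlation alone.
slope = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
         / sum((xi - mean_x) ** 2 for xi in x))
intercept = mean_y - slope * mean_x

def predict(signal):
    """Forecast the outcome from the observed signal; no 'why' required."""
    return intercept + slope * signal

print(f"y ~ {intercept:.2f} + {slope:.2f} * x")
print(f"forecast for x=50: {predict(50):.1f}")
```

The forecast is usable for prevention or amplification either way; the model never explains the mechanism, exactly as the talk argues.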
A key skill arises from this: people who can let go of finding reasons and utilize the power of the web of correlations, seeing problems as a set of connecting trends instead of cause and consequence.
This approach takes us to the real opportunity: “datafying” the problem. That is, turning a problem into a mathematical, statistical challenge and solving it through the sheer amount of data and the connections within that data.
An excellent example of this is Google Translate. So much learning and understanding goes into having a human translate from even one language to another that simply programming it as a set of rules is impossible. Nevertheless Google Translate, even if not perfect, does a pretty good job. It is possible only because we learnt to collect and store the massive amount of data being channelled through the service by the people using it. Correlations are then drawn and refined after each and every translation (and correction of results).
Google Translate will never answer why something is to be translated a certain way. But thanks to the massive web of data and correlations, it does not have to.
This is the real possibility in Big Data: solving problems before understanding them fully by translating real world issues into statistical challenges.
I very much support the perspective that Big Data is not the first answer to everything, but it clearly opens up horizons never encountered before.
Also, the oft-mentioned skill gap is very visible in this video, because to make Big Data work you really need two kinds of people:
- Those who can solve problems through statistical correlations and probabilities, not only through cause-and-consequence connections
- Subject matter experts (be it in business, humanitarian aid, social science, etc.) who can build the bridge and translate problems into analytical situations
My humble opinion is that the second will be the harder to find, though that does not mean the first is easy. One more reason to start small and simple and build your skills to become one of these people.
Last, but not least, do check out the original video for more insights. Also I would be glad if you could share your perspective with me in the comments.