I received a tweet from Future Brands stating that only 0.5% of the world’s data is currently being analyzed. However shocking that sounds, I think it is true. I also believe that, despite all efforts, this figure is going to decrease, even though humanity will be able to analyze more data than ever before. Simply put, raw data is growing faster these days than our ability to analyze it.
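To see how the analyzed share can shrink while the analyzed amount keeps growing, here is a minimal sketch. Only the 0.5% starting point comes from the tweet; the growth rates (40% per year for data, 25% per year for analysis capacity) and the function name `analyzed_share` are purely illustrative assumptions, not measured figures.

```python
def analyzed_share(years, data0=100.0, analyzed0=0.5,
                   data_growth=0.40, analysis_growth=0.25):
    """Return (year, analyzed_amount, analyzed_share_percent) per year.

    data0 and analyzed0 are in arbitrary units, chosen so the starting
    share is 0.5%. Both growth rates are hypothetical assumptions.
    """
    rows = []
    for year in range(years + 1):
        data = data0 * (1 + data_growth) ** year
        analyzed = analyzed0 * (1 + analysis_growth) ** year
        rows.append((year, analyzed, analyzed / data * 100))
    return rows


if __name__ == "__main__":
    for year, analyzed, share in analyzed_share(5):
        print(f"year {year}: analyzed={analyzed:.2f} units, share={share:.3f}%")
```

Under these assumed rates the analyzed amount rises every year, yet the share falls every year, because the denominator grows faster than the numerator.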
Nevertheless, I do not think it is only an issue of sheer data volume. While the definition of Big Data is still debated, the 4 Vs are definitely present as a trend in today’s world, and all four contribute heavily to our inability to catch up.
- Volume – This is the simplest one to identify as an issue, but without the other 3 Vs it can actually be conquered: once you have the analysis methods in place, scaling the analysis depends only on computing capacity, if done right.
- Variety – This aspect is probably causing one of the biggest scalability issues in data analysis. Each type of data requires you to change your analytic or data processing methodology, and an increasing and unstable variety does not help. You need to keep developing systems to match the variety of sources and formats available, which is quite difficult to scale.
- Velocity – Managing real-time, super-fast information in high volumes is a technical problem, but from an analytics perspective it is less of an issue. How well you use velocity as an opportunity for real-time analytics and action, though, is a different matter. Old methods will let you analyze even fast incoming data, but with a delay; they will not give you the additional benefit of faster speed to insight.
- Veracity – Quality seems to be the other major issue when it comes to how much data is being analyzed. Many aspects of data cleaning are difficult to automate, especially when the types of data errors keep changing and increasing in variety. The way we solve this these days is that volume gives us the luxury of disregarding data, which is statistically sound, but it requires us to accept the new rule of big data: you will never get to 100% analyzability.
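The point about disregarding data being statistically sound can be illustrated with a small sketch. It assumes the records we keep are a simple random sample of the full data set, which is the condition under which the argument holds; if the discarded records differ systematically from the kept ones, the estimate would be biased. The synthetic population and the helper names `make_population` and `sample_mean` are hypothetical, made up for illustration.

```python
import random
import statistics


def make_population(size, seed=42):
    """Build a synthetic data set (assumed: normal, mean 50, std dev 10)."""
    rng = random.Random(seed)
    return [rng.gauss(50.0, 10.0) for _ in range(size)]


def sample_mean(population, sample_size, seed=0):
    """Estimate the population mean from a simple random sample."""
    rng = random.Random(seed)
    sample = rng.sample(population, sample_size)
    return statistics.mean(sample)


if __name__ == "__main__":
    population = make_population(200_000)
    full = statistics.mean(population)
    # Analyze only 1% of the records and compare with the full answer.
    estimate = sample_mean(population, 2_000)
    print(f"full mean: {full:.3f}, 1% sample estimate: {estimate:.3f}")
```

With a random 1% sample the estimate typically lands close to the full-data mean, which is why skipping most of the records can be acceptable for aggregate questions, though not for finding individual rare events.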