The term data has a longer history than computing. Its first occurrence is documented in 1646, when it denoted quantities in mathematics that were “given” (from the Latin verb “dare”, i.e. “to give”). The term later came to mean the result of a calculation or of an experiment.
People worked with data long before the term was coined: first with counting stones, where each stone represented an animal, then with the abacus and early machines such as Pascal’s calculator. Machines did not only accelerate calculations, however; they also automated work. The Jacquard machine accelerated the production of textiles using an early form of punched cards, and the Hollerith tabulating machine accelerated the processing of questionnaires during the 1890 US census. Konrad Zuse built one of the first computers, using relays, in order to perform statics calculations.
The main theme here is that the faster computers become and the less expensive storage gets, the more data is stored and processed. Add the internet, the increasing connectivity of data sources, and the rise of sensors, and the whole world has become a data-producing environment. The existence of data alone, however, does not mean that it can be exploited to generate an advantage. The smarter the use of data, the bigger the business advantage.
Data is not information, and information is not knowledge. This is an important distinction because data is used to derive information, and, in the best case, that information leads to knowledge. We will use a slightly different model of the data – information – knowledge triangle.
Big data is a buzzword without a generally accepted definition; some people understand it as data that is so complex or so massive that it is difficult to process in real time. Having said that, there are very few cases where data labeled as big data actually fits that description.
Next: What is Data Science?