What Is Data?

"Data" has become such a generalized and overused term that it is best defined within a particular context. So, we will discuss the term in the context of analytics.

What Is Data in the Context of Analytics?

Data is the food and water of analytics. Data is what is fed in to get insight out. There are many definitions of data out there. One of my favorites is the definition Google gives:

Facts and statistics collected together for reference or analysis.

Let’s break the definition down to take a closer look:

Facts and Statistics: Yes, that’s what data really is: facts and statistics. They can be about anything. A ledger book recording the day-to-day transactions at a local merchant's shop in China is a very good example of data. Even a simple ledger book may contain lots of facts: the customers of the shop, the amount and date of each transaction, the type of each transaction (whether it is a sale or a receipt), and so on.

Collected Together: Facts and statistics only make sense when they are collected together. Would it make any sense for the merchant to write each transaction on a small piece of paper and keep a drawer full of such slips in the shop? Processing yields meaningful, reliable knowledge only when there is enough raw data to process. For example, if we flip a coin once, all we know is whether we got heads or tails. That is certainly a single data point, but it cannot be used for much further processing. However, if we have flipped the same coin many times in the past and recorded the outcome of each toss in a diary, we can use that collected data to judge whether the coin is fair or biased.
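To make the coin example concrete, here is a minimal Python sketch. It assumes the diary is simply a list of "H" and "T" entries (the diary contents below are made up for illustration) and uses an exact two-sided binomial test as one reasonable way to check fairness. The function name fair_coin_p_value is hypothetical, not taken from any library.

```python
from math import comb

def fair_coin_p_value(heads: int, flips: int) -> float:
    """Exact two-sided binomial p-value under the assumption of a fair coin."""
    # Probability of each possible number of heads if the coin is fair.
    probs = [comb(flips, k) / 2 ** flips for k in range(flips + 1)]
    observed = probs[heads]
    # Add up every outcome that is at least as unlikely as the one we saw.
    # The small tolerance guards against floating-point ties.
    return sum(p for p in probs if p <= observed * (1 + 1e-9))

# Hypothetical diary of recorded tosses (illustrative data only).
diary = ["H"] * 80 + ["T"] * 20
heads = diary.count("H")

print("share of heads:", heads / len(diary))
print("p-value, whole diary:", fair_coin_p_value(heads, len(diary)))
print("p-value, single flip:", fair_coin_p_value(1, 1))
```

A single flip always gives a p-value of 1, so it tells us nothing about fairness. The collected diary, on the other hand, gives a very small p-value, which is strong evidence that this made-up coin is biased.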

For Reference or Analysis: Collecting, storing, and managing data is not free, and there is no point in keeping data for no reason. If data exists, it should not exist in isolation. Recalling the Chinese merchant example, if the merchant used paper slips instead of a ledger, it would be very difficult to search through the slips and find the right one when a customer comes back to return a purchase and the record is needed for reference. Even the word most often paired with ‘data’ is ‘analysis’, which itself implies that data exists to be analyzed.

A Few Things about Data

  • Most of the time, data is raw in nature and therefore not well processed in its initial form
  • By raw data, we mean data in the form in which it was captured, which is not of much use without additional processing. Processing is what extracts the value from the data and gives us knowledge or usability
  • When we say data in the context of analytics, we almost always imply that the data is in a machine-readable format
  • If it is not in a machine-readable format, it will be converted into one at some point during processing so that computer-assisted processing can be run on it
  • Data can be captured at any scale and by any relevant method. For example, data can be captured by hand on paper by a volunteer working on site during a natural disaster in South Asia, or it can be captured at very high speed and in very large volumes about the same disaster, from locations worldwide and at the same time, by the automated servers of Twitter and other social networking websites
  • The same data can be used for different purposes by different entities. For example, the Red Cross Society might use Twitter data to get emergency supplies to the crisis site in a timely manner, while the meteorology department might use the same data to improve its near-term weather forecasting model, and a fundraising organization might use it to raise funds more efficiently to support the victims
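As a rough illustration of what a machine-readable format buys us, here is a small Python sketch of the merchant's ledger stored as CSV text. The column names follow the facts mentioned earlier (customer, amount, date, and type of transaction), but the rows themselves and the helper names load_ledger and transactions_for are made up for this example.

```python
import csv
import io
from datetime import date

# Hypothetical ledger entries in CSV form (illustrative data only).
LEDGER_CSV = """customer,amount,date,type
Li Wei,120.50,2023-03-01,sale
Zhang Min,80.00,2023-03-01,receipt
Li Wei,45.25,2023-03-04,sale
"""

def load_ledger(text):
    """Turn raw CSV text into a list of typed records."""
    records = []
    for row in csv.DictReader(io.StringIO(text)):
        records.append({
            "customer": row["customer"],
            "amount": float(row["amount"]),
            "date": date.fromisoformat(row["date"]),
            "type": row["type"],
        })
    return records

def transactions_for(ledger, customer):
    """Find every record for one customer, e.g. when handling a return."""
    return [r for r in ledger if r["customer"] == customer]

ledger = load_ledger(LEDGER_CSV)
for record in transactions_for(ledger, "Li Wei"):
    print(record["date"], record["type"], record["amount"])
```

Once the ledger is in this form, finding a customer's past transactions is a one-line filter instead of a search through a drawer of paper slips, and the same records could just as easily be summed, grouped, or charted for a different purpose.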

Thus, data is all around us. It is found in all forms, generated by all kinds of systems, and used for all kinds of purposes. It is truly worth exploring further.

This post is part of the series An Introduction to the Data on my blog, DoData School of Analytics.

I hope you enjoyed reading. Please let me know.

Image Source: Flickr