DoingData

Data, Information or Insight?

“But where is the data?”
“We need some more information.”
“I got the insight!”

These are some common phrases that can be heard in a data-driven environment. Data! Information! Insight! To a business user these might all be the same thing; to an analyst, they definitely are not.

We tend to use vague terminology in our day-to-day lives. People often talk to me about data, sometimes about information, and only a few talk about insight. Sometimes they use these different terms to refer to the same thing, and sometimes they mean entirely different things.
Even though, strictly by definition, it may not seem necessary to distinguish among these three, it is important for a data analyst to understand the difference between data, information and insight.

Data

As we already discussed, data is a collection of facts and statistics, often in a very raw form. Databases are one very good example. Raw data is much larger and more detailed than what we want to see. We take some data out, process it, gain some knowledge and make some discoveries. Fundamentally, data is the input required to produce knowledge. In its raw form data may or may not be of much use, but it is the fuel that powers the engine of knowledge and discovery.

Example: Here is one week of consolidated log data for two users of a website. It records how much time, in seconds, each user spent on the website. It also has a column showing which IP address the user accessed the website from. For the sake of simplicity, we will keep our example limited to only two users.

This is a hypothetical example and will not resemble most real-world data sets, but that's okay. I only want to convey a thought process in the subsequent sections of the article.

Information

So what exactly are this knowledge and these discoveries we are talking about in the context of data? I will talk about information now. Information is indeed knowledge. When we know something, we have information. The next question is how information differs from data. The most important difference is that information is in a ready-to-consume form, whereas data is in raw form. Data can be aggregated, combined, filtered, sliced and diced, and sorted to produce the first pieces of information from it. Even very basic arithmetic operations like sums and averages have the power to produce very informative numbers.

Example: Analyzing the log data in the above example gives us some information:

  • An average user spends 11 minutes per day on the website
  • An average user accesses the website from 4 different IP addresses in a week
  • An average user spends 50% more time on the website on weekends than on weekdays
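To make the step from data to information concrete, here is a minimal sketch of how numbers like these could be computed from a raw log. The rows in `LOG`, the function names and the choice to average over active user-days are all assumptions made for illustration; they are not the data or the method behind the figures above, so the results will differ.

```python
from collections import defaultdict
from datetime import date

# Hypothetical log rows: (user, date, ip, seconds on site).
# Made up for illustration; NOT the data set described in the article.
LOG = [
    ("user_01", date(2024, 1, 1), "ip_01", 600),  # Monday
    ("user_01", date(2024, 1, 2), "ip_01", 540),  # Tuesday
    ("user_01", date(2024, 1, 6), "ip_02", 900),  # Saturday
    ("user_02", date(2024, 1, 1), "ip_07", 480),  # Monday
    ("user_02", date(2024, 1, 6), "ip_08", 720),  # Saturday
    ("user_02", date(2024, 1, 7), "ip_08", 660),  # Sunday
]


def _mean(values):
    values = list(values)
    return sum(values) / len(values)


def avg_minutes_per_user_day(log):
    """Average minutes spent per active (user, day) pair."""
    per_day = defaultdict(int)
    for user, day, _ip, seconds in log:
        per_day[(user, day)] += seconds
    return _mean(per_day.values()) / 60


def avg_distinct_ips_per_user(log):
    """Average number of distinct IP addresses seen per user."""
    ips = defaultdict(set)
    for user, _day, ip, _seconds in log:
        ips[user].add(ip)
    return _mean(len(s) for s in ips.values())


def weekend_uplift(log):
    """Relative extra time per active user-day on weekends vs weekdays.

    Assumes the log contains at least one weekday and one weekend row.
    """
    totals = {"weekday": defaultdict(int), "weekend": defaultdict(int)}
    for user, day, _ip, seconds in log:
        kind = "weekend" if day.weekday() >= 5 else "weekday"
        totals[kind][(user, day)] += seconds
    return _mean(totals["weekend"].values()) / _mean(totals["weekday"].values()) - 1


print(f"{avg_minutes_per_user_day(LOG):.1f} minutes per active user-day")
print(f"{avg_distinct_ips_per_user(LOG):.1f} distinct IPs per user")
print(f"{weekend_uplift(LOG):.0%} more time on weekends")
```

Nothing here goes beyond grouping, summing and averaging, which is the point: simple arithmetic over raw rows is already enough to turn data into information.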

Insight

IN(SIDE)+SIGHT

Behind the scenes | What’s really going on | Not so obvious | Hidden patterns in the data | Seeing data from a different angle | Results of intense data exploration and discovery | Thinking before and after analyzing the data. These are all my own definitions of insight.


Any information is just information until some action is taken based on that knowledge. Actionable insight derived from the data helps us take that action. Insight is more than just a summary; it is the hidden patterns in the data that cannot easily be seen. I do not mean to say that complicated algorithms are necessary to bring insight out of the data. Classification, segmentation, correlation, outlier detection and deep-dive analysis are the usual techniques for getting insights from data.

Example: If we look more closely at the data, we can get some more information:

  • user_01 is hardly using ip_01 during weekends, but a lot during weekdays
  • user_02 is hardly using ip_08 during weekdays, but a lot during weekends
  • user_02 is hardly using ip_07 during weekends, but a lot during weekdays

This gives us a deeper insight into user behavior and makes us think: are our users accessing the website from different systems and networks on different days? Maybe on weekdays they browse the website during office hours from office networks, and on weekends from home networks. Even though this might already be an obvious assumption, the data now gives us a good confirmation, or at least some intuition, about how users might be accessing the website.
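The kind of deep dive that surfaces such patterns can be as simple as splitting each user's time per IP address into weekday and weekend totals. The sketch below does that under stated assumptions: the rows in `LOG` are invented for illustration (they are not the article's data), and the 90% `threshold` for calling an IP "mostly weekday" or "mostly weekend" is an arbitrary choice.

```python
from collections import defaultdict
from datetime import date

# Hypothetical log rows: (user, date, ip, seconds on site).
# Made up for illustration; NOT the data set described in the article.
LOG = [
    ("user_01", date(2024, 1, 1), "ip_01", 600),  # Monday
    ("user_01", date(2024, 1, 2), "ip_01", 540),  # Tuesday
    ("user_01", date(2024, 1, 3), "ip_02", 60),   # Wednesday
    ("user_01", date(2024, 1, 6), "ip_02", 900),  # Saturday
    ("user_02", date(2024, 1, 1), "ip_07", 480),  # Monday
    ("user_02", date(2024, 1, 6), "ip_08", 720),  # Saturday
    ("user_02", date(2024, 1, 7), "ip_08", 660),  # Sunday
]


def seconds_by_user_ip_daytype(log):
    """Total seconds per (user, ip), split into weekday and weekend."""
    table = defaultdict(lambda: {"weekday": 0, "weekend": 0})
    for user, day, ip, seconds in log:
        kind = "weekend" if day.weekday() >= 5 else "weekday"
        table[(user, ip)][kind] += seconds
    return dict(table)


def skewed_pairs(table, threshold=0.9):
    """List (user, ip, day type) where one day type holds >= threshold of the time."""
    result = []
    for (user, ip), split in sorted(table.items()):
        total = split["weekday"] + split["weekend"]
        if total == 0:
            continue  # no usage recorded, nothing to classify
        for kind in ("weekday", "weekend"):
            if split[kind] / total >= threshold:
                result.append((user, ip, kind))
    return result


for user, ip, kind in skewed_pairs(seconds_by_user_ip_daytype(LOG)):
    print(f"{user} uses {ip} mostly on {kind}s")
```

A split like this only flags candidates. Whether a weekday-heavy IP really is an office network is still a hypothesis that needs checking against other evidence.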

Role of Data Visualization

I have been emphasizing right from the beginning that data visualization is the most important part of your data journey, because what is hidden in a large table of numbers is sometimes easily detectable in a visualization. Visualization techniques come in handy during the data discovery process and provide a new perspective on our data. Data visualization also makes it simpler to present a complex insight to your audience, who can then better grasp the insight and make decisions. We will discuss the role of data visualization in the overall data analysis and consumption process in more detail later on.

Example: 

Isn't it a lot easier not to miss some of those patterns in the data that we were talking about before? A visualization also throws more questions at us than a simple table can. And most importantly, everybody understands it.
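Even a very rough chart makes the point. As a sketch only, the snippet below draws a text bar chart of time spent per user, IP address and day type. The `ROWS` values and the `text_bars` helper are hypothetical and made up for illustration; a real analysis would normally use a proper charting tool.

```python
# Hypothetical totals in seconds, made up for illustration;
# NOT the data set described in the article.
ROWS = [
    ("user_01 ip_01 weekday", 1140),
    ("user_01 ip_01 weekend", 0),
    ("user_01 ip_02 weekday", 60),
    ("user_01 ip_02 weekend", 900),
]


def text_bars(rows, width=30):
    """Render (label, value) pairs as lines with proportional '#' bars."""
    peak = max(value for _label, value in rows)
    label_width = max(len(label) for label, _value in rows)
    lines = []
    for label, value in rows:
        # Scale each bar against the largest value; avoid dividing by zero.
        bar = "#" * round(width * value / peak) if peak else ""
        lines.append(f"{label.ljust(label_width)} | {bar} {value}")
    return lines


for line in text_bars(ROWS):
    print(line)
```

The long and short bars sitting next to each other make the weekday and weekend contrast visible at a glance, which is harder to see in a column of numbers.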

I would love to hear from you. Express your thoughts in comments below or write me an email. I’ll write back.