Data and Machine-Readable Format

Humans need to be pampered, data need to be processed.

I know that is a super-lame quote of mine, but it is true, at least the part about data. Data do need to be processed, and the processing is far more convenient, more accurate, and cheaper when the data is in a machine-readable format.

What Is a Machine-Readable Format

Data is in a machine-readable format when it is in a form that a computer can read, understand, and process. Let's understand this with an example. Suppose you went to a restaurant and ordered some food and a beer. When you paid the bill, the person at the billing desk printed a receipt like this, gave it to you, and kept another copy for the restaurant.

This receipt is in a human-readable format. Everyone understands the receipt and it serves its purpose well. You and the restaurant are both happy, and there are no worries. So far, so good.

But notice that this is not just a receipt. There is a lot of data on it: a date, a time, the products the restaurant sold, and the amounts charged. Now imagine that the restaurant owner plans to open a new branch and wants to know a few things for planning it:

  1. How much do we sell?
  2. What products do we sell?
  3. At what times do we sell?
  4. On what days of the week do we sell?
  5. What is the best price to sell at?
  6. Which combos sell best?
  7. What kind of people visit my restaurant?
  8. Where should I open my restaurant?

The first six questions can be answered straightforwardly from the data on the receipts, and the last two can be answered reasonably well once some external data, such as city population data, is added. So, in search of answers, she asks the restaurant manager for all the information, and to her disappointment he comes back with bundles of a few thousand receipts the restaurant has been storing. But wait! There is a problem. That is a lot of receipts, and they are in a human-readable format. It would take a lot of effort for a human to read them all, and it would be even harder to process the information by hand and come up with answers. In the end, all of those receipts were scanned by a computer, the information printed on them was converted into a machine-readable format, and someone analyzed it on a computer.

How convenient it would be if those receipts had also been stored somewhere in a machine-readable format. Maybe something like this:

[
 {
   "Particular":"Burger",
   "Amount":120
 },
 {
   "Particular":"Beer",
   "Amount":280
 },
 {
   "Particular":"Service Charge",
   "Amount":50
 },
 {
   "Particular":"Total",
   "Amount":450
 }
]

It is not easy, and often impossible, for computers to understand the language, formats, and ways in which we humans understand things. Therefore, data is stored in databases, computer systems, and servers in the format a computer understands best: a machine-readable format.
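
To give a rough idea of what "processed by a computer" can look like, here is a minimal Python sketch that reads the JSON receipt from the example and checks that the line items add up to the printed total. The function name summarize_receipt is my own invention for illustration, and the sketch assumes every receipt has exactly one "Total" row.

```python
import json

# The receipt from the example, stored as a JSON string.
receipt_json = """
[
  {"Particular": "Burger", "Amount": 120},
  {"Particular": "Beer", "Amount": 280},
  {"Particular": "Service Charge", "Amount": 50},
  {"Particular": "Total", "Amount": 450}
]
"""


def summarize_receipt(text):
    """Return (line_items, computed_total, printed_total) for one receipt.

    Assumption: the receipt contains exactly one row named "Total".
    """
    rows = json.loads(text)
    # Every row except "Total" is treated as a line item.
    items = {
        row["Particular"]: row["Amount"]
        for row in rows
        if row["Particular"] != "Total"
    }
    printed_total = next(
        row["Amount"] for row in rows if row["Particular"] == "Total"
    )
    return items, sum(items.values()), printed_total


items, computed_total, printed_total = summarize_receipt(receipt_json)
print(items)
print(computed_total == printed_total)
```

Doing the same check across a few thousand paper receipts would take a person days. With the data in this form it is a loop.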

JSON, XML, Databases

These are a few of the formats computers understand best. The format used in the example above is JSON. Other very popular options include XML and data stored in databases. Formats like JSON are well suited to storing unstructured and semi-structured data. Relational databases are good at handling structured data in tabular form.
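
For comparison, here is a small sketch of the same receipt written as XML and then loaded into a relational table, using only Python's standard library. The XML tag names and the table name receipt_items are made up for this illustration; a real billing system would have its own layout.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical XML layout for the receipt; the tag names are illustrative.
receipt_xml = """
<receipt>
  <item><particular>Burger</particular><amount>120</amount></item>
  <item><particular>Beer</particular><amount>280</amount></item>
  <item><particular>Service Charge</particular><amount>50</amount></item>
</receipt>
"""

# Parse the XML and pull out one (particular, amount) pair per item.
root = ET.fromstring(receipt_xml.strip())
rows = [
    (item.findtext("particular"), int(item.findtext("amount")))
    for item in root.iter("item")
]

# Store the same rows in a relational table (an in-memory SQLite database).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE receipt_items (particular TEXT, amount INTEGER)")
conn.executemany("INSERT INTO receipt_items VALUES (?, ?)", rows)

# Let the database compute the receipt total.
total = conn.execute("SELECT SUM(amount) FROM receipt_items").fetchone()[0]
conn.close()

print(rows)
print(total)
```

The receipt is the same in all three forms. Only the container changes, and each one can be read by a program without anyone retyping anything.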

Note: If you are reading this post as part of the series Introduction to Data, you really do not need to worry about all these technical terms. All you need to know is that data is required in machine-readable formats for the reasons listed in the next section.

Why Data Needs to Be in Machine-Readable Format

Well, that is a stupid question, I'd say. We know that data needs to be in a machine-readable format so that it can be read by a machine (that is, a computer). What is really worth thinking about is WHY data needs to be read by a machine. There are several good reasons. Here is a short list:

  1. It is difficult, costly, inconvenient, and in a lot of cases impossible for humans to read the data
  2. Humans make errors; machines (most often) don't
  3. Machines are better than humans at sharing information with others
  4. Computers need data to run applications properly
  5. Computers can identify hidden patterns in the data that humans cannot see
  6. Computers can do complex manipulations on the data
  7. Computers are much less prone to lose information than humans
  8. Computer labour is cheaper than human labour
  9. Computer labour is faster than human labour
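
As one small illustration of the "hidden patterns" point, here is a sketch that counts which products show up together on the same receipt, which is one way to approach the owner's question about combos. The receipts below are made-up sample data (Fries and Pizza are not from the example receipt), and count_pairs is a hypothetical helper name.

```python
from collections import Counter
from itertools import combinations

# Made-up receipts for illustration only: each inner list holds the
# products that appeared on one bill.
receipts = [
    ["Burger", "Beer"],
    ["Burger", "Beer", "Fries"],
    ["Pizza", "Beer"],
    ["Burger", "Fries"],
]


def count_pairs(receipts):
    """Count how often each pair of products appears on the same receipt."""
    pair_counts = Counter()
    for products in receipts:
        # Sort the products so ("Beer", "Burger") and ("Burger", "Beer")
        # are counted as the same pair; set() ignores repeated items.
        for pair in combinations(sorted(set(products)), 2):
            pair_counts[pair] += 1
    return pair_counts


pair_counts = count_pairs(receipts)
print(pair_counts.most_common(3))
```

With four receipts you could do this in your head. With a few thousand, this kind of counting is exactly where the computer earns its keep.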

I could go on and on... But I am sure you have understood what a machine-readable format is and why data needs to be in a format that can be read by a machine. I just wanted to give you an idea of that before we move on to the next post in the series, which explains How Data is Generated.


I hope this post was informative and you enjoyed reading it. Leave a comment or email me at ashksc[at]gmail[dot]com. You can also tweet me at @ashishyoungy.

