Gist:
This blog post describes why data, and big data, matter.
The Idea:
As programmers and developers, we always focus on building a great app that could be useful worldwide. So what is data, and why is it more important than the app itself?
Explanation:
According to Wikipedia, data are values of qualitative or quantitative variables belonging to a set of items. In other words, a datum in its atomic form is an entity that represents something. It is atomic and self-contained, and it does not care how it will be used. It stands for itself and nothing more. Information, on the other hand, is processed data, and the same data can be processed in different ways to represent different information. The data processor that sits between your data and your information is your application. At best, your application can only be as good as your data.
To understand this, let's take the example of a foodie app. The first dataset that comes to mind is restaurants and their menus. If we build the application on this dataset alone, it will of course be limited to listing data and basic search. When the time comes to add features, all you can do is bolt on social interactions, ratings, and a few more. In other words, the app accumulates software entropy, or software bloat, or whatever the cool kids call it these days.
Now take a different approach: what if we spend a good amount of time collecting data and understanding the knowledge base of the domain itself? My approach would be to first understand what the domain is actually about and how users relate to the domain itself, not to my application's goals. Remember, applications exist only to process data into information. The first thing that hits my cerebral cortex when I think of this application is food. Not restaurants, not their menus, but food items. Now think about it. Hundreds of thousands of edible food items are consumed all over the world. Each food item could be considered an atom, with its ingredients as the subatomic particles, and we could even list every edible ingredient in the world. Now try generating combinations (or even permutations) of all those ingredients, and suddenly you have tens of thousands of food items that nobody has ever thought of! Of course, you would find combinations that make no sense and seem horrifying, but after curation you will have a good list of possible food items. It is reasonable to assume that food items are the core link between the user and food. From there, we could keep enriching the data with more attributes such as calorie value, climate conditions, availability and location (which depend on climate), cost of production, and even chemical bonding if you are crazy enough! Then you have a good dataset to work on.
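The "atoms and subatomic particles" model above can be sketched in a few lines. This is a minimal illustration, not a prescribed schema: the class names `Ingredient` and `FoodItem`, the helpers `candidate_recipes` and `curate`, the calorie attribute, and the curation rule are all hypothetical, and the calorie figures are only illustrative. Choosing `k` ingredients out of `n` yields C(n, k) candidates, which grows very quickly; `itertools.permutations` could replace `combinations` if the order of ingredients (for example, cooking steps) matters.

```python
from dataclasses import dataclass
from itertools import combinations
from math import comb

@dataclass(frozen=True)
class Ingredient:
    """The 'subatomic particle': one edible ingredient."""
    name: str
    calories_per_100g: float  # hypothetical attribute; climate, cost, etc. would be added the same way

@dataclass(frozen=True)
class FoodItem:
    """The 'atom': a food item defined by its set of ingredients."""
    name: str
    ingredients: frozenset  # frozenset of Ingredient

def candidate_recipes(ingredients, size):
    """Yield every unordered combination of `size` ingredients as a frozenset."""
    for combo in combinations(ingredients, size):
        yield frozenset(combo)

def curate(candidates, is_sensible):
    """Keep only the candidates that pass a sanity check."""
    return [c for c in candidates if is_sensible(c)]

# Illustrative pantry; calorie values are approximate and only for the example.
PANTRY = [
    Ingredient("rice", 130.0),
    Ingredient("egg", 155.0),
    Ingredient("soy sauce", 53.0),
    Ingredient("chocolate", 546.0),
]

def no_chocolate_with_soy(candidate):
    # Hypothetical rule standing in for real (human) curation.
    names = {i.name for i in candidate}
    return not {"chocolate", "soy sauce"} <= names

curated = curate(candidate_recipes(PANTRY, 3), no_chocolate_with_soy)
# Once a curated combination is named, it becomes a food item in the core data set.
fried_rice = FoodItem("fried rice", curated[0])
print(comb(len(PANTRY), 3), "candidates,", len(curated), "kept")
```

With four ingredients taken three at a time there are four candidates, and the rule discards the two that pair chocolate with soy sauce. With thousands of real ingredients the candidate count explodes, which is why curation is part of building the data set rather than an afterthought.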
Although the data is detailed enough to start with, it means nothing to users on its own, because their goal is to find food they like that is available nearby. This is where the data processing layer, the so-called application, comes in: it processes the data into information that users can easily consume. You may ask: why do all of this to reach the same goal as the previous example? My answer would be that you are not looking at the dataset carefully. It gives you the opportunity to build countless applications over the same dataset: for example, an app that recommends new combinations for chefs to try cooking, an app that helps dieters find food they like while controlling their calorie intake, mood-based and weather-based food recommendations, and many more. Combine the data with the available tools of machine learning, information retrieval, data mining, and artificial intelligence in general, and suddenly you have a superhuman butler, chef, foodie friend, and so on. Above all, even if the dataset grows to several petabytes, you still won't feel the bloat, because you know its atomic state.
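To make "many applications over the same dataset" concrete, here is a sketch of two thin processing layers reading one shared list of food items: a dieter's view (calorie budget plus availability) and a chef's view (ingredient pairs from a pantry that no known food item combines yet). The field names `calories` and `regions`, the sample records, and the functions `diet_app` and `chef_app` are hypothetical; the point is only that both "apps" are small functions over an unchanged core.

```python
from dataclasses import dataclass
from itertools import combinations

@dataclass(frozen=True)
class FoodItem:
    name: str
    ingredients: frozenset[str]
    calories: int             # hypothetical: kcal per serving
    regions: frozenset[str]   # hypothetical: where the item is commonly available

# One shared core data set (illustrative records).
FOODS = [
    FoodItem("fried rice", frozenset({"rice", "egg", "soy sauce"}), 520, frozenset({"asia", "europe"})),
    FoodItem("omelette", frozenset({"egg", "butter"}), 230, frozenset({"europe", "americas"})),
    FoodItem("rice pudding", frozenset({"rice", "milk", "sugar"}), 310, frozenset({"europe"})),
]

def diet_app(foods, max_calories, region):
    """Dieter's view: items available in `region` within a calorie budget."""
    return [f.name for f in foods if f.calories <= max_calories and region in f.regions]

def chef_app(foods, pantry):
    """Chef's view: ingredient pairs from `pantry` that no known item combines yet."""
    known_pairs = set()
    for f in foods:
        known_pairs.update(frozenset(p) for p in combinations(sorted(f.ingredients), 2))
    return [pair for pair in combinations(sorted(pantry), 2) if frozenset(pair) not in known_pairs]

print(diet_app(FOODS, 400, "europe"))
print(chef_app(FOODS, {"rice", "egg", "milk"}))
```

Neither function modifies `FOODS`; a mood-based or weather-based recommender would be one more function of the same shape, which is the sense in which the core stays free of bloat.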
At this point, you can start listing restaurants and their menus and map those lists onto your food dataset. Your core stays the same regardless of what you build on top of it. To add user comments and other social interactions, you could even create a profile for each ingredient! People would start learning a lot about the ingredients they use: why they cost what they do, associated health issues, and so on. The same could be done for food items and for your other entities, such as restaurants.
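Mapping restaurant menus onto the core data set could look roughly like the sketch below: menu entries are normalized, resolved through an alias table to a canonical food item, and then indexed by ingredient so that each ingredient "profile" can list the restaurants serving it. The data set, the alias table, and the functions `normalize`, `map_menu`, and `restaurants_by_ingredient` are hypothetical; a real system would need far fuzzier matching than exact lookups.

```python
from collections import defaultdict

# Hypothetical core data set: canonical food name -> ingredients.
FOOD_DATASET = {
    "fried rice": {"rice", "egg", "soy sauce"},
    "omelette": {"egg", "butter"},
}

# Hypothetical alias table: how menus spell the canonical names.
ALIASES = {
    "egg fried rice": "fried rice",
    "chef's omelet": "omelette",
    "omelet": "omelette",
}

def normalize(dish):
    """Lowercase and collapse whitespace so menu spellings compare consistently."""
    return " ".join(dish.lower().split())

def map_menu(menu):
    """Split a menu into dishes linked to the core data set and dishes that are not."""
    linked, unlinked = {}, []
    for dish in menu:
        key = normalize(dish)
        canonical = ALIASES.get(key, key)
        if canonical in FOOD_DATASET:
            linked[dish] = canonical
        else:
            unlinked.append(dish)
    return linked, unlinked

def restaurants_by_ingredient(menus):
    """Build an ingredient -> restaurants index, the raw material for ingredient profiles."""
    index = defaultdict(set)
    for restaurant, menu in menus.items():
        linked, _ = map_menu(menu)
        for canonical in linked.values():
            for ingredient in FOOD_DATASET[canonical]:
                index[ingredient].add(restaurant)
    return index

linked, unlinked = map_menu(["Egg  Fried Rice", "Chef's Omelet", "Mystery Soup"])
print(linked, unlinked)
```

Unlinked dishes are kept rather than dropped, so they can be curated into the core data set later instead of forcing the core to bend around each restaurant's naming.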
As far as big data is concerned, this is one way of looking at it, as I understand it.
Conclusion:
There is nothing new in what I have described above. If you recall object-oriented programming, it is ideally the same idea: objects are supposed to be independent of each other, interacting only through an interface layer, or even through adapters if they are otherwise unrelated. Each object is described by some number of attributes.
Moral:
Put more effort into fine-tuning your data than into fine-tuning your app.