By Iliya Valchanov, 365 Data Science.
Data science. It is a buzzword that many have tried to define, with varying success.
Thinking about this problem takes one through all the other fields related to data science – business analytics, data analytics, business intelligence, advanced analytics, machine learning, and ultimately AI.
In tackling this issue, we at 365 Data Science realized that ‘absolute definitions’ of data science require a lot of ‘data science’ background to be understood, which is a recursive problem... The hypothesis here is that statisticians or programmers understand what data science is much more easily than, say, historians or linguists, as the former have been exposed to data science in one form or another.
This brings us to the idea that a ‘relative definition’ of data science may be much more useful, and here’s our proposed scheme.
It is an Euler diagram depicting all the above-mentioned fields. Each color represents a different field (mixed colors indicate intersections), and the diagram also includes a timeline and example uses.
Fig. 1: the position, size, and color of the rectangles show conceptual similarities and differences, not complexity
Since all this information can be overwhelming, let’s start from the beginning.
To avoid oversimplifying the issue, we will assume that the word ‘business’ needs no definition. Some examples of business activities are:
- Business case studies
- Qualitative analytics
- Preliminary data report(ing)
- Reporting with visuals
- Creating dashboards
- Sales forecasting
They are sitting comfortably in the blue rectangle.
Here’s where the actual Euler diagram starts. If we bring Data into the picture, we will have the two big fields and their intersection, or a total of three sections.
Given our initial choice of terms, we can move the last four terms into the intersection of Business and Data, currently represented as the purple area in the picture. That’s because ‘Preliminary data report(ing)’, ‘Reporting with visuals’, ‘Creating dashboards’, and ‘Sales forecasting’ are all data-driven business activities.
They contrast with ‘Business case studies’ and ‘Qualitative analytics’, as those are within Business but are based on past knowledge, experience, and behavior. All are important but, as you will see shortly, not really data science.
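For readers who think in code, the split above can be sketched with plain Python sets. This is only an illustration of the diagram's logic: the variable names are ours, and the set members are the six example activities listed earlier.

```python
# The six example activities, all of which start out in the Business field.
business = {
    "Business case studies",
    "Qualitative analytics",
    "Preliminary data report(ing)",
    "Reporting with visuals",
    "Creating dashboards",
    "Sales forecasting",
}

# The activities described above as data-driven.
data = {
    "Preliminary data report(ing)",
    "Reporting with visuals",
    "Creating dashboards",
    "Sales forecasting",
}

# The intersection of Business and Data (the purple area).
business_and_data = business & data

# What stays in Business only.
business_only = business - data

print(sorted(business_only))
```

The set difference leaves exactly the two activities that rely on past knowledge, experience, and behavior rather than on data.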
Analysis vs Analytics
Analysis refers to the process of segmenting your problem into easily digestible chunks that you can study individually and examine how they relate to each other.
Analytics, on the other hand, is the application of logical and computational reasoning to the component parts obtained in an analysis. In doing so, one looks for patterns and often explores what could be done with them in the future.
So instead of Business and Data, it is better to use Business Analytics and Data Analytics.
Before going any further, let’s introduce a timeline, as it turns out to be crucial for the subsequent segmentation.
We will employ three states – past, present, and future.
There will be a line that crosses the diagram indicating the present moment for any analytics problem. Everything on the left will refer to analytics looking backward, to the past that is. All that is on the right will refer to predictive analytics.
The last two sections of our analysis got the picture to this point.
‘Sales forecasting’ moved to the right, as its name implies a forward-looking analytics process. Broadly, ‘Qualitative analytics’ is the use of your intuition and experience to plan your next move – thus another term looking into the future.
For most readers that’s the pinnacle of the article. Data Science is a field that can’t do without data. Therefore, it is completely within the realm of Data Analytics.
What about its relationship to Business Analytics?
Well, it turns out that all that is Data Analytics and Business Analytics at the same time is indeed Data Science.
With one note, though. There exist data science processes that are not directly and immediately business analytics but are data analytics. For instance, ‘Optimization of Drilling Operations’ requires data science tools and techniques. Data scientists may well do that on a daily basis. However, although it sits in the domain of the ‘oil business’, we can’t really say that it is directly related to Business Analytics.
Building on the ‘relative definition’ notion, and to illustrate these points better, ‘Digital signal processing’ is an example of an activity that is part of Data Analytics, but is neither Data Science nor Business Analytics. Data, programming, and mathematics come into play, but not in the same way we would employ them in Data Science.
For consistency, let’s finish this off with the timeline – Data science is both on the left and on the right of the line (like the others).
Which brings us to the question: Is there a field which is only past-oriented?
Business Intelligence (BI) is the process of analyzing and reporting historical data.
Is it past-oriented? Not necessarily, but there are no predictive analytics involved. Regression, classification, and all the other typically predictive methods are a part of Data Science, but not BI. That’s where the line is drawn.
Moreover, Business Intelligence is entirely a subset of Data Science. Thus, when one is dealing with descriptive statistics, reporting or visualization of past events, she is doing both BI and data science.
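To make the line between the two a little more concrete, here is a minimal sketch in Python. The monthly sales figures are invented purely for illustration, and the straight-line fit is just one simple stand-in for the regression methods mentioned above. The first calculation summarizes the past (BI, and therefore also data science); the second extrapolates into the future (data science, but not BI).

```python
# Hypothetical monthly sales figures, invented for illustration only.
sales = [100, 110, 120, 130]
months = list(range(len(sales)))  # 0, 1, 2, 3

# Descriptive (BI-style): summarize what has already happened.
average_sales = sum(sales) / len(sales)

# Predictive (beyond BI): fit a least-squares line and extrapolate one month ahead.
mean_x = sum(months) / len(months)
mean_y = average_sales
slope = (
    sum((x - mean_x) * (y - mean_y) for x, y in zip(months, sales))
    / sum((x - mean_x) ** 2 for x in months)
)
intercept = mean_y - slope * mean_x
next_month_forecast = intercept + slope * len(sales)

print(average_sales)        # 115.0
print(next_month_forecast)  # 140.0
```

Both numbers come from the same historical data; what differs is whether the question being asked looks backward or forward.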
Machine Learning and AI
Here the definitions are going to be a bit vaguer, as explaining ML and AI in full would mean losing the focus of this article. Plus, there are many resources on what machine learning is, especially here on KDnuggets.
Artificial intelligence (AI) is any form of intelligence shown by a machine, which resembles natural (human) intelligence such as planning, learning, problem solving, etc.
Machine learning (ML) is the ability of machines to predict outcomes without being explicitly programmed to do so.
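As a toy illustration of ‘without being explicitly programmed’, the sketch below predicts client churn from a handful of labelled examples instead of a hand-written rule. The data is hypothetical, and the method (copying the label of the nearest known example) is simply one of the most basic learning techniques, chosen here for brevity.

```python
# Hypothetical labelled examples: (logins per month, did the client churn?)
examples = [(1, True), (2, True), (3, True), (10, False), (12, False), (15, False)]

def predict_churn(logins):
    """Predict churn by copying the label of the closest known example."""
    nearest = min(examples, key=lambda example: abs(example[0] - logins))
    return nearest[1]

print(predict_churn(2))   # True
print(predict_churn(11))  # False
```

Nowhere does the code state a rule such as ‘fewer than five logins means churn’; the prediction comes entirely from the examples, and it changes if the examples change.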
ML is an approach to AI; however, the two are often confused, as ML is actually the only viable path to AI that we, as humans, have developed so far. Therefore, when we are talking about real-life applications of AI that companies are using, we are actually referring to ML.
In our diagram the two terms fit in the following way.
Machine Learning is entirely within Data Analytics, as it cannot be performed without data. It also overlaps with Data Science, as it is one of the best tools in the data scientist’s arsenal. Finally, it also takes part in BI, as long as there are no predictive analytics involved.
Instances of ML in Data Science are ‘Client Retention’, ‘Fraud Prevention’, and ‘Creating real-time dashboards’ (also a part of BI). Prominent examples of ML include ‘Speech recognition’ and ‘Image recognition’. Both can be considered inside or outside Data Science, which is why we’ve placed them on the border.
To exhaust all relationships, ML is entirely within AI, but AI itself has subfields which are unrelated even to business and data analytics! One instance we’ve chosen is ‘Symbolic reasoning’.
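The relationships described so far can be checked with a small sketch using Python sets. It is a simplification under stated assumptions: each field is represented only by the example activities named in this article, the variable names are ours, and the two ‘border’ examples (speech and image recognition) are placed in ML only, since a plain set cannot express ‘on the border’.

```python
# Each field is represented only by example activities named in the article.
business_analytics = {"Business case studies", "Qualitative analytics", "Sales forecasting"}
business_intelligence = {"Creating real-time dashboards"}
data_science = {
    "Sales forecasting",
    "Creating real-time dashboards",
    "Client retention",
    "Fraud prevention",
    "Optimization of drilling operations",
}
machine_learning = {
    "Client retention",
    "Fraud prevention",
    "Creating real-time dashboards",
    "Speech recognition",  # border case, kept in ML only (assumption)
    "Image recognition",   # border case, kept in ML only (assumption)
}
data_analytics = data_science | machine_learning | {"Digital signal processing"}
artificial_intelligence = machine_learning | {"Symbolic reasoning"}

# BI is within Data Science, which is within Data Analytics.
assert business_intelligence <= data_science <= data_analytics
# Whatever is both Business Analytics and Data Analytics is Data Science.
assert (business_analytics & data_analytics) <= data_science
# ML is within Data Analytics and within AI, and overlaps Data Science and BI.
assert machine_learning <= data_analytics
assert machine_learning <= artificial_intelligence
assert machine_learning & data_science
assert machine_learning & business_intelligence
# The three 'outsider' examples sit where the text places them.
assert "Optimization of drilling operations" in data_science - business_analytics
assert "Digital signal processing" in data_analytics - data_science - business_analytics
assert "Symbolic reasoning" in artificial_intelligence - data_analytics - business_analytics
```

If every assertion passes, the example activities are at least consistent with the diagram as described; the sketch says nothing about activities that are not listed.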
The final field in our analysis is Advanced Analytics. It is not a data science term, but rather a marketing one. It is used to describe ‘not-so-easy-to-handle’ analytics. Subjectively, for a beginner everything in this diagram is advanced. While not the best term, it is definitely useful to aggregate all these ‘proper’ terms that we used throughout the article.
Removing AI and adding Advanced Analytics, that’s what we get.
In the lingo of this article, our analysis of advanced analytics is complete.
Here is the animated gif that compares these definitions.
Bio: Iliya Valchanov is a Co-founder at 365 Data Science.