Data Mining is the application of specific algorithms for extracting patterns from data.
For decades, data mining was done mainly with Statistics. In the 1990s, as Computer Science grew rapidly in popularity, people started doing Data Mining with Computer Science techniques as well.
Data Mining + Computer Science = Data Science
Data Science is an interdisciplinary field that uses scientific methods, processes, algorithms and systems to extract insights from data.
Fun fact: IBM pioneered the relational database. Edgar F. Codd proposed the relational model there in 1970.
There are three main career specializations within the Data Science field:
- Data Engineer
- Data Scientist
- Machine Learning Expert
The Data Engineer comes first in the process of creating value from data: it is the Data Engineer's job to collect and store the data. Next comes the Data Scientist, whose job is to take this data, clean it, and explore and visualize it (with statistics, graphs, charts, etc.). After that, the Machine Learning Expert applies machine learning algorithms to the data in order to extract insights from it.
Data Science Workflow
Data Scientists take a systematic approach to answering questions with data. The approach usually follows this pattern:
- Formulate Question
  - clear; scientifically testable
- Gather Data
- Clean Data
  - remove missing, incomplete, inaccurate data
- Explore & Visualize
  - helps to better understand the data
- Train Algorithm
- Evaluate
  - did the results answer our question?
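
The workflow above can be sketched end to end in a few lines of Python. This is only a minimal illustration, not a prescribed method: the question, the study-hours records, and the function names (`gather_data`, `clean_data`, `explore`, `train`, `evaluate`) are all made up for the example, and a simple least-squares line stands in for the "algorithm". It uses only the standard library.

```python
import statistics

# Step 1: formulate a clear, testable question (hypothetical example).
QUESTION = "Do more hours of study predict a higher exam score?"


def gather_data():
    """Step 2: gather data. These records are invented for illustration."""
    return [
        {"hours": 1.0, "score": 52.0},
        {"hours": 2.0, "score": 58.0},
        {"hours": None, "score": 61.0},   # missing value
        {"hours": 3.0, "score": 65.0},
        {"hours": 4.0, "score": None},    # incomplete record
        {"hours": 5.0, "score": 77.0},
        {"hours": 6.0, "score": 83.0},
        {"hours": 7.0, "score": 990.0},   # inaccurate: outside the 0-100 scale
    ]


def clean_data(records):
    """Step 3: drop missing, incomplete, or clearly inaccurate records."""
    cleaned = []
    for record in records:
        hours, score = record["hours"], record["score"]
        if hours is None or score is None:
            continue
        if not 0 <= score <= 100:
            continue
        cleaned.append((hours, score))
    return cleaned


def explore(pairs):
    """Step 4: summarize the data to understand it better."""
    scores = [score for _, score in pairs]
    return {
        "n": len(pairs),
        "mean_hours": statistics.mean(hours for hours, _ in pairs),
        "mean_score": statistics.mean(scores),
        "stdev_score": statistics.stdev(scores),
    }


def train(pairs):
    """Step 5: fit a least-squares line score = slope * hours + intercept."""
    mean_x = statistics.mean(x for x, _ in pairs)
    mean_y = statistics.mean(y for _, y in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def evaluate(pairs, slope, intercept):
    """Step 6: R^2, the share of score variance explained by the line."""
    mean_y = statistics.mean(y for _, y in pairs)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in pairs)
    ss_tot = sum((y - mean_y) ** 2 for _, y in pairs)
    return 1 - ss_res / ss_tot


records = gather_data()
pairs = clean_data(records)
summary = explore(pairs)
slope, intercept = train(pairs)
r_squared = evaluate(pairs, slope, intercept)

print(QUESTION)
print(f"kept {summary['n']} of {len(records)} records")
print(f"score = {slope:.2f} * hours + {intercept:.2f}, R^2 = {r_squared:.3f}")
```

To keep the sketch short, the line is evaluated on the same records it was fitted to. In practice the evaluation step would normally use data held out from training, and the exploration step would include plots rather than summary numbers alone.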