What is Data Science?

Data Mining is the application of specific algorithms for extracting patterns from data.
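As a rough illustration of "extracting patterns", the sketch below counts which pairs of items show up together most often in a set of shopping baskets. The baskets, the function name frequent_pairs, and the min_count threshold are all made up for this example; real data mining algorithms are considerably more sophisticated.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets, invented purely for illustration.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]


def frequent_pairs(transactions, min_count):
    """Return item pairs that occur together in at least min_count transactions."""
    counts = Counter()
    for items in transactions:
        # Sort the items so each pair is always counted under the same key.
        for pair in combinations(sorted(items), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_count}


# The "pattern" here is any pair bought together in three or more baskets.
patterns = frequent_pairs(baskets, min_count=3)
print(patterns)
```

With these baskets, the only pair that clears the threshold is bread and butter, which is the kind of regularity a pattern-mining algorithm is meant to surface.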

For decades, data mining was done using Statistics. In the 1990s, as Computer Science rapidly grew in popularity, people began applying Computer Science techniques to Data Mining.

Data Mining + Computer Science = Data Science

Data Science is an interdisciplinary field that uses scientific methods, processes, algorithms and systems to extract insights from data.

Fun fact: the relational database was pioneered at IBM, where E. F. Codd proposed the relational model in 1970.

There are three main career specializations within the Data Science field:

  1. Data Engineer
  2. Data Scientist
  3. Machine Learning Expert

The Data Engineer comes first in the process of creating value from data. It is the Data Engineer’s job to collect and store data. Then comes the Data Scientist, whose job is to take this data, clean it, and explore and visualize it (with statistics, graphs, charts, etc.). After that, the Machine Learning Expert can apply intelligent algorithms to the data in order to extract insights from it.

Data Science Workflow

Data Scientists take a systematic approach to answering questions with data. It usually follows this pattern:

  1. Formulate Question
    • clear; scientifically testable
  2. Gather Data
  3. Clean Data
    • remove or correct missing, incomplete, or inaccurate data
  4. Explore & Visualize
    • helps to better understand the data
  5. Train Algorithm
  6. Evaluate
    • did the results answer our question?
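
The six steps above can be sketched end to end in a few lines of Python. This is only a toy walk-through under stated assumptions: the question, the (hours, score) records, and the helper fit_line are invented for illustration, summary statistics stand in for real charts, and a simple least-squares line stands in for whatever algorithm a real project would train.

```python
from statistics import mean

# 1. Formulate Question (hypothetical): do more study hours predict a higher exam score?

# 2. Gather Data: made-up (hours, score) records; None marks a missing value.
raw = [(1, 52), (2, 55), (None, 70), (3, 61), (4, 64),
       (5, None), (6, 72), (7, 75), (8, 79)]

# 3. Clean Data: drop any record with a missing field.
clean = [(h, s) for h, s in raw if h is not None and s is not None]

# 4. Explore & Visualize: summary statistics stand in for charts here.
hours = [h for h, _ in clean]
scores = [s for _, s in clean]
print("records:", len(clean), "mean hours:", mean(hours), "mean score:", mean(scores))

# 5. Train Algorithm: fit a least-squares line on all but the last two records.
train, held_out = clean[:-2], clean[-2:]


def fit_line(points):
    """Return (slope, intercept) of the least-squares line through points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in points)
             / sum((x - x_bar) ** 2 for x in xs))
    return slope, y_bar - slope * x_bar


slope, intercept = fit_line(train)

# 6. Evaluate: mean absolute error of the predictions on the held-out records.
mae = mean(abs((slope * x + intercept) - y) for x, y in held_out)
print("slope:", round(slope, 2), "held-out MAE:", round(mae, 2))
```

A positive slope together with a small held-out error would suggest the answer to the question is yes for this toy data. Holding out the last two records is a simplification; a real project would more typically use a random split and a much larger dataset.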