Vraj Kotwala


What's the best way to learn Data Science as a beginner?

June 17, 2020


Getting started with Data Science (DS), Machine Learning (ML), and Deep Learning (DL): this roadmap will help you find your learning gaps.

First, understand the difference between the data science disciplines.

Step 0: Learn Python, and get comfortable with the Anaconda environment.
Websites like w3schools and Javatpoint will help you learn Python.
If you are on Windows, follow a guide for installing the Conda environment.

Step 1: Learn the basics of Probability.
Concepts like Frequency, Combinatorics, Permutations and Combinations, Bayesian Inference, Set Theory, and Discrete and Continuous Distributions (Poisson, Binomial, Standard Normal, Student's t, Chi-Squared, Exponential, Logistic).
This YouTube playlist will help you learn the above concepts.
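To make two of these concepts concrete, here is a minimal sketch using only the Python standard library. It computes a Binomial probability and applies Bayes' theorem. The function names and the test numbers (1% prior, 90% sensitivity, 5% false positive rate) are made up for illustration.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def bayes_posterior(prior: float, sensitivity: float, false_positive_rate: float) -> float:
    """P(condition | positive test) via Bayes' theorem."""
    # Total probability of a positive test (true positives + false positives)
    evidence = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / evidence


# Chance of exactly 2 heads in 4 fair coin flips: C(4, 2) / 2**4 = 6/16
print(binomial_pmf(2, 4, 0.5))  # 0.375

# Hypothetical test: 1% prior, 90% sensitivity, 5% false positive rate
print(round(bayes_posterior(0.01, 0.90, 0.05), 4))  # 0.1538
```

Notice how low the posterior is even with a fairly accurate test. That is the kind of intuition this step is meant to build.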

Step 2: Learn the basics of Statistics.
Data types, frequency distribution, mean, median, mode, histogram, skewness, variance, standard deviation, covariance, correlation, inferential stats, standard error, central limit theorem, confidence intervals, hypothesis testing, p-value.
Pick up a statistics book or watch videos on YouTube for each topic.
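As a quick way to check your understanding of these terms, the sketch below computes a few of them with Python's built-in `statistics` module. The sample values are made up, and the confidence interval uses the normal critical value (about 1.96) as a simplifying assumption; for a sample this small, a Student's t critical value would be more accurate.

```python
import statistics
from statistics import NormalDist

sample = [2.1, 2.5, 2.2, 3.0, 2.8, 2.4, 2.6, 2.9]  # made-up measurements

mean = statistics.mean(sample)
median = statistics.median(sample)
variance = statistics.variance(sample)  # sample variance (n - 1 denominator)
std_dev = statistics.stdev(sample)      # sample standard deviation

# Standard error of the mean and an approximate 95% confidence interval
std_error = std_dev / len(sample) ** 0.5
z = NormalDist().inv_cdf(0.975)  # normal critical value, roughly 1.96
ci_low, ci_high = mean - z * std_error, mean + z * std_error

print(f"mean={mean:.3f} median={median:.3f} sd={std_dev:.3f}")
print(f"95% CI for the mean: ({ci_low:.3f}, {ci_high:.3f})")
```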

Step 3: Learn Advanced Stats.
(Regression, Linear Regression, Logistic Regression, Cluster Analysis, Clustering, K-Means)
Again, pick up a statistics book or watch videos on YouTube for each topic.
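Linear regression is a good first topic here because it builds directly on covariance and variance from Step 2. Below is a minimal from-scratch sketch of simple (one-variable) least squares. The function name `fit_line` and the data points are made up for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x); the 1/n factors cancel out
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]  # made-up points lying exactly on y = 2x + 1
slope, intercept = fit_line(xs, ys)
print(slope, intercept)  # 2.0 1.0
```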

Step 4: Solve practical problems on Stats. (ML) (steep learning curve!!!)
Learn how the packages (NumPy, pandas, Scikit-Learn, Matplotlib, Statsmodels) differ from each other, and what each package offers.
Try to solve fundamental problems related to advanced stats (Regression, Linear Regression, Logistic Regression, Cluster Analysis, Clustering, K-Means) using the above-mentioned Python packages.
You can find problems on my GitHub repo.
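As one example of such a problem, here is a rough from-scratch K-Means sketch written with NumPy only. It is meant for learning how the algorithm works, not as a replacement for Scikit-Learn's `KMeans`. The function name `k_means`, its parameters, and the six data points are assumptions made for illustration.

```python
import numpy as np


def k_means(points, k, n_iters=20, seed=0):
    """Plain Lloyd's algorithm: assign to nearest centroid, then recompute."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iters):
        # Distance from every point to every centroid, shape (n_points, k)
        distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Move each centroid to the mean of its points (keep it if the cluster is empty)
        centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
    return centroids, labels


# Made-up data: two small, well-separated groups of 2D points
points = np.array([
    [1.0, 1.0], [1.2, 0.8], [0.8, 1.1],
    [5.0, 5.0], [5.2, 4.9], [4.8, 5.1],
])
centroids, labels = k_means(points, k=2)
print(centroids)
print(labels)
```

Once this makes sense, try the same data with Scikit-Learn and compare the results.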

You have to learn the syntax of the Python packages at the same time. Again, websites like Javatpoint will help. But I recommend referring to the official documentation for all the Python modules.
Here are the links to official documentation:

Google "stats problems" and you'll find a bunch of them or check my GitHub repo.

Good, now you have learnt the basics of Machine Learning. Machine Learning is *nothing* but statistics and estimations!!

Step 5: Mathematics.
Learn concepts like Matrices, Scalars, Vectors, Arrays, Tensors, and operations on matrices (addition, subtraction, transpose, dot product).
In terms of programming, a tensor is no different from an n-dimensional array.
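The operations listed in this step map almost one-to-one onto NumPy. Here is a small sketch with made-up values:

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

print(A + B)  # element-wise addition
print(A - B)  # element-wise subtraction
print(A.T)    # transpose: rows become columns
print(A @ B)  # matrix product: rows of A dotted with columns of B

v = np.array([1, 2, 3])
w = np.array([4, 5, 6])
print(np.dot(v, w))  # 1*4 + 2*5 + 3*6 = 32

# A rank-3 tensor is just a 3-dimensional array
T = np.zeros((2, 3, 4))
print(T.ndim, T.shape)  # 3 (2, 3, 4)
```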

Step 6: Deep Learning. (steep learning curve!!!)
Before moving ahead, you need to know the types of Machine Learning.
You also need to visualise how a Neural Network works. Thanks to @3blue1brown for creating this wonderful video.
The next thing you need to do is search for and learn these topics:

Objective Function
L2-norm Loss
Cross-Entropy
Gradient Descent

Now you are ready to build your first deep learning model using NumPy. But that's only for learning purposes. To solve real-world problems, we use packages like TensorFlow, Keras, PyTorch, etc.
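To show how the objective function, the L2-norm loss, and gradient descent fit together, here is a minimal NumPy sketch of the simplest possible model: a single linear layer with no activation. The synthetic data, the learning rate, and the number of iterations are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (an assumption for illustration): targets follow y = 2*x1 - 3*x2 + 5
inputs = rng.uniform(-1, 1, size=(200, 2))
targets = inputs @ np.array([[2.0], [-3.0]]) + 5.0

weights = np.zeros((2, 1))
bias = 0.0
learning_rate = 0.1

for epoch in range(500):
    outputs = inputs @ weights + bias  # forward pass
    deltas = outputs - targets
    loss = np.mean(deltas ** 2) / 2    # L2-norm loss (the objective function)
    # Gradients of the loss with respect to the weights and the bias
    grad_w = inputs.T @ deltas / len(inputs)
    grad_b = deltas.mean()
    # Gradient descent update: step against the gradient
    weights -= learning_rate * grad_w
    bias -= learning_rate * grad_b

print(weights.ravel(), bias, loss)
```

The learned weights and bias should end up close to 2, -3, and 5, the values used to generate the data.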

Now search for and learn these topics:

What is a Layer?
What is a Deep Net?
Non-Linearities and their Purpose
Activation Functions, Softmax Activation
Backpropagation
Overfitting and Underfitting
Validation
N-Fold Cross-Validation
Early Stopping
Momentum, Learning Rate Schedules
Standardization
Binary & One-Hot Encoding
Adaptive Learning Rate Schedules (AdaGrad and RMSprop)

This YouTube playlist will help you learn the above-mentioned concepts.
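A few of these topics are short enough to try in NumPy right away. The sketch below covers one-hot encoding, the softmax activation, cross-entropy loss, and standardization. The function names and the example numbers are made up for illustration.

```python
import numpy as np


def one_hot(labels, n_classes):
    """Turn integer class labels into one-hot rows."""
    return np.eye(n_classes)[labels]


def softmax(logits):
    """Convert raw scores to probabilities that sum to 1 per row."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # for numerical stability
    exps = np.exp(shifted)
    return exps / exps.sum(axis=1, keepdims=True)


def cross_entropy(probs, targets):
    """Mean cross-entropy between predicted probabilities and one-hot targets."""
    return -np.mean(np.sum(targets * np.log(probs + 1e-12), axis=1))


def standardize(features):
    """Rescale each column to zero mean and unit variance."""
    return (features - features.mean(axis=0)) / features.std(axis=0)


logits = np.array([[2.0, 1.0, 0.1], [0.5, 2.5, 0.3]])  # made-up network outputs
labels = np.array([0, 1])                              # made-up true classes
probs = softmax(logits)

print(probs.sum(axis=1))                         # each row sums to 1
print(cross_entropy(probs, one_hot(labels, 3)))  # lower is better
print(standardize(np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])))
```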

Step 7: Learn Tensorflow.
Refer to the official documentation only.
Scikit-learn (sklearn) offers only basic neural network support (a simple multilayer perceptron, with no GPU acceleration), so it is not suited to deep learning. It is useful for preprocessing and classical machine learning (e.g. clustering, random forests, etc.). Therefore, you must switch to dedicated deep learning packages like TensorFlow or PyTorch. I recommend TensorFlow for beginners.
Fact: In 2017, Keras was integrated into TensorFlow as tf.keras; within TensorFlow, Keras is an interface rather than a separate library. TensorFlow 2 uses the Keras syntax as its standard high-level API.

Step 8: MNIST dataset.
The MNIST dataset is the "Hello World" of Deep Learning and image recognition, and a model you must try as a beginner. Yann LeCun is one of the creators of the dataset.
The dataset provides 70,000 images (28x28 pixels) of handwritten digits (1 digit per image). Your goal is to write an algorithm that identifies the handwritten digits.
You can find the dataset here.
Wiki
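Before training anything, it helps to know what the data looks like once loaded. The sketch below uses random arrays with the same shape as MNIST images as a stand-in, since loading the real files depends on where you download them from. It shows typical preparation steps: flattening, scaling pixel values, and holding out a validation set. The sample count and the 10% split are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for MNIST: random 28x28 grayscale "images" with pixel values 0-255
# and random digit labels 0-9. Swap in the real dataset once downloaded.
images = rng.integers(0, 256, size=(1000, 28, 28), dtype=np.uint8)
labels = rng.integers(0, 10, size=1000)

# Flatten each image to a 784-long vector and scale pixels to [0, 1]
inputs = images.reshape(len(images), -1).astype(np.float32) / 255.0

# Hold out the last 10% of the samples for validation
n_validation = len(inputs) // 10
train_inputs, validation_inputs = inputs[:-n_validation], inputs[-n_validation:]
train_labels, validation_labels = labels[:-n_validation], labels[-n_validation:]

print(train_inputs.shape, validation_inputs.shape)  # (900, 784) (100, 784)
```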

I have created a GitHub repo where you can find all my data-science-related .ipynb files. You will find it here.

#DataScience
#Machine-Learning-Tutorial
#Python