What's the best way to learn Data Science as a beginner?
June 17, 2020
A roadmap for getting started with Data Science (DS), Machine Learning (ML), and Deep Learning (DL). It will also help you find your learning gaps.
Firstly, know the difference between all the data science disciplines.
Step 0: Learn Python, and get comfortable with the Anaconda environment.
Websites like w3schools and Javatpoint will help you learn Python.
For installation, see the guide to setting up a Conda environment on Windows.
Step 1: Learn the basics of Probability.
Concepts like Frequency, Combinatorics, Permutations and Combinations, Bayesian Inference, Set Theory, Discrete and Continuous Distributions (Poisson, Binomial, Standard Normal, Student's t, Chi-Squared, Exponential, Logistic).
This YouTube playlist will help you learn the above concepts.
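To make a few of these ideas concrete, here is a minimal sketch using only Python's standard library. The function names and the numbers in the examples (a 1% prior, a 99% hit rate, a 5% false-positive rate) are made up for illustration.

```python
import math


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p): k successes in n independent trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for X ~ Poisson(lam): k events when lam are expected on average."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def bayes_posterior(prior: float, hit_rate: float, false_positive_rate: float) -> float:
    """P(hypothesis | positive evidence) computed with Bayes' rule."""
    evidence = hit_rate * prior + false_positive_rate * (1 - prior)
    return hit_rate * prior / evidence


# Counting: choosing 2 items out of 5, with and without regard to order.
ordered_choices = math.perm(5, 2)    # permutations
unordered_choices = math.comb(5, 2)  # combinations

# Probability of exactly 2 heads in 4 fair coin flips.
two_heads = binomial_pmf(2, 4, 0.5)

# Hypothetical test: 1% prior, 99% hit rate, 5% false-positive rate.
posterior = bayes_posterior(0.01, 0.99, 0.05)

print(ordered_choices, unordered_choices, two_heads, round(posterior, 4))
```

Even with a very accurate test, the posterior stays well below 50% here because the prior is so small, which is the kind of intuition Bayesian inference is meant to build.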
Step 2: Learn the basics of Statistics.
Data types, frequency distribution, mean, median, mode, histogram, skewness, variance, standard deviation, covariance, correlation, inferential stats, standard error, central limit theorem, confidence intervals, hypothesis testing, p-value.
Pick up a statistics book or watch videos on YouTube for each topic.
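As a quick illustration of the descriptive measures above, here is a small sketch with Python's built-in `statistics` module. The sample values are invented, and the confidence interval uses a normal approximation, which is an assumption; for a sample this small a t-based interval would be the more careful choice.

```python
import math
import statistics

# Made-up sample, purely for illustration.
sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

mean = statistics.mean(sample)
median = statistics.median(sample)
mode = statistics.mode(sample)
variance = statistics.variance(sample)  # sample variance (divides by n - 1)
std_dev = statistics.stdev(sample)      # square root of the sample variance

# Standard error of the mean: how much the sample mean itself varies.
std_error = std_dev / math.sqrt(len(sample))

# Approximate 95% confidence interval for the mean (normal approximation).
z = statistics.NormalDist().inv_cdf(0.975)
ci_low = mean - z * std_error
ci_high = mean + z * std_error

print(mean, median, mode, round(variance, 3), (round(ci_low, 2), round(ci_high, 2)))
```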
Step 3: Learn Advanced Stats.
(Regression, Linear Regression, Logistic Regression, Cluster Analysis, Clustering, K-Means)
Again, pick up a statistics book or watch videos on YouTube for each topic.
Step 4: Solve practical problems on Stats. (ML) (steep learning curve!!!)
Learn how the packages (NumPy, pandas, Scikit-Learn, Matplotlib, Statsmodels) differ from each other, and what each package offers.
Try to solve fundamental problems related to advanced stats (Regression, Linear Regression, Logistic Regression, Cluster Analysis, Clustering, K-Means) using the above-mentioned Python packages.
You can find problems on my GitHub repo.
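For a feel of what such a problem looks like, here is a minimal linear regression sketch using only NumPy. The data is synthetic (a line with slope 2 and intercept 1 plus noise, chosen purely for illustration), and ordinary least squares via `np.linalg.lstsq` is just one of several ways to fit it; Scikit-Learn and Statsmodels wrap the same idea in higher-level interfaces.

```python
import numpy as np

# Synthetic data (an assumption for this example): y is roughly 2*x + 1.
rng = np.random.default_rng(seed=0)
x = np.linspace(0.0, 10.0, 50)
y = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=x.shape)

# Design matrix: one column for x, one column of ones for the intercept.
X = np.column_stack([x, np.ones_like(x)])

# Ordinary least squares: find coefficients minimising ||X @ coef - y||^2.
coef, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
slope, intercept = coef

# R^2: the share of the variance in y explained by the fitted line.
y_pred = X @ coef
ss_res = np.sum((y - y_pred) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r_squared = 1.0 - ss_res / ss_tot

print(f"slope={slope:.3f}, intercept={intercept:.3f}, R^2={r_squared:.3f}")
```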
You have to learn the syntax of the Python packages simultaneously. Again, websites like Javatpoint will help. But I recommend referring to the official documentation for all the Python modules.
Here are the links to official documentation:
- Numpy
- Pandas
- Matplotlib
- Scikit Learn
- StatsModels
Google "stats problems" and you'll find a bunch of them, or check my GitHub repo.
Good, now you have learnt the basics of Machine Learning. Machine Learning is *nothing* but statistics and estimations!!
Step 5: Mathematics.
Learn concepts like Matrices, Scalars, Vectors, Arrays, Tensors, operations on matrices (addition, subtraction, transpose, dot product).
In terms of programming, a tensor is no different from an n-dimensional array.
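A short NumPy sketch shows the idea: scalars, vectors, matrices, and higher-rank tensors are all `ndarray` objects that differ only in their number of dimensions. The variable names and values are arbitrary.

```python
import numpy as np

scalar = np.array(5.0)                        # 0 dimensions, shape ()
vector = np.array([1.0, 2.0, 3.0])            # 1 dimension,  shape (3,)
matrix = np.array([[1.0, 2.0], [3.0, 4.0]])   # 2 dimensions, shape (2, 2)
tensor = np.zeros((2, 3, 4))                  # 3 dimensions, shape (2, 3, 4)

other = np.array([[5.0, 6.0], [7.0, 8.0]])

added = matrix + other        # element-wise addition
subtracted = matrix - other   # element-wise subtraction
transposed = matrix.T         # rows become columns
product = matrix @ other      # matrix (dot) product
dot = np.dot(vector, vector)  # dot product of two vectors

for name, array in [("scalar", scalar), ("vector", vector),
                    ("matrix", matrix), ("tensor", tensor)]:
    print(name, "ndim =", array.ndim, "shape =", array.shape)
```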
Step 6: Deep Learning. (steep learning curve!!!)
Before moving ahead, you need to know the types of Machine Learning.
You also need to visualise how a Neural Network works.
Thanks to @3blue1brown for creating this wonderful video.
The next thing you need to do is search for and learn these topics:
- Objective Function
- L2-norm Loss
- Cross-Entropy
- Gradient Descent
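The topics in this list can be sketched in a few lines of NumPy. The function names and the toy objective below are illustrative assumptions, not a standard API: the L2-norm loss is written with the common factor of one half, and gradient descent is shown on a one-variable function whose minimum is known to be at x = 3.

```python
import numpy as np


def l2_norm_loss(y_pred, y_true):
    """Half the sum of squared differences (a typical regression objective)."""
    return 0.5 * np.sum((y_pred - y_true) ** 2)


def cross_entropy(probs, targets, eps=1e-12):
    """Mean cross-entropy between predicted probabilities and one-hot targets."""
    return -np.mean(np.sum(targets * np.log(probs + eps), axis=1))


def gradient_descent(grad_fn, start, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient to reduce the objective."""
    x = float(start)
    for _ in range(steps):
        x -= learning_rate * grad_fn(x)
    return x


# Toy objective f(x) = (x - 3)^2 with gradient f'(x) = 2 * (x - 3).
minimum = gradient_descent(lambda x: 2.0 * (x - 3.0), start=0.0)

print(l2_norm_loss(np.array([1.0, 2.0]), np.array([0.0, 0.0])))
print(cross_entropy(np.array([[0.5, 0.5]]), np.array([[1.0, 0.0]])))
print(minimum)
```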
Now you are ready to make your first deep learning model using NumPy.
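As a rough idea of what such a model can look like, here is a minimal sketch of a single linear layer trained with gradient descent on an L2-norm loss. The data, the target relationship (2*x1 - 3*x2 + 5), the learning rate, and the epoch count are all assumptions chosen for illustration; this is not necessarily how the linked model is built.

```python
import numpy as np

rng = np.random.default_rng(seed=42)
n_samples = 1000

# Synthetic inputs and targets (hypothetical relationship plus noise).
inputs = rng.uniform(-10.0, 10.0, size=(n_samples, 2))
true_weights = np.array([[2.0], [-3.0]])
noise = rng.uniform(-1.0, 1.0, size=(n_samples, 1))
targets = inputs @ true_weights + 5.0 + noise

# Start from small random weights and a zero bias.
weights = rng.uniform(-0.1, 0.1, size=(2, 1))
bias = 0.0
learning_rate = 0.02

for epoch in range(500):
    outputs = inputs @ weights + bias           # forward pass
    deltas = outputs - targets                  # prediction errors
    loss = np.sum(deltas**2) / (2 * n_samples)  # mean L2-norm loss

    # Gradient descent update for each parameter.
    weights -= learning_rate * (inputs.T @ deltas) / n_samples
    bias -= learning_rate * np.mean(deltas)

print("weights:", weights.ravel(), "bias:", bias, "loss:", loss)
```

The learned weights and bias should end up close to the values used to generate the data, which is an easy way to check that the training loop works.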
But that's only for learning purposes. To solve real-world problems, we use packages like TensorFlow, Keras, PyTorch, etc.
Now search for and learn these topics:
- What is a Layer?
- What is a Deep Net?
- Non-Linearities and their Purpose
- Activation Functions, Softmax Activation
- Backpropagation
- Overfitting and Underfitting
- Validation
- N-Fold Cross-Validation
- Early Stopping
- Momentum, Learning Rate Schedules
- Standardization
- Binary & One-Hot Encoding
- Adaptive Learning Rate Schedules (AdaGrad and RMSprop)
This YouTube playlist will help you learn the above-mentioned concepts.
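Three of the topics from the list (softmax activation, one-hot encoding, and standardization) are small enough to sketch directly in NumPy. The function names are illustrative, and the `standardize` helper assumes every feature column has a non-zero standard deviation.

```python
import numpy as np


def softmax(logits):
    """Turn each row of raw scores into probabilities that sum to 1."""
    shifted = logits - np.max(logits, axis=1, keepdims=True)  # numerical stability
    exps = np.exp(shifted)
    return exps / np.sum(exps, axis=1, keepdims=True)


def one_hot(labels, num_classes):
    """Encode integer class labels as rows with a single 1."""
    return np.eye(num_classes)[np.asarray(labels)]


def standardize(features):
    """Rescale each column to zero mean and unit standard deviation."""
    return (features - features.mean(axis=0)) / features.std(axis=0)


probs = softmax(np.array([[1.0, 2.0, 3.0], [0.0, 0.0, 0.0]]))
encoded = one_hot([0, 2], num_classes=3)
scaled = standardize(np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]))

print(probs)
print(encoded)
print(scaled)
```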
Step 7: Learn TensorFlow.
Refer to the official documentation only.
Scikit-learn offers only basic neural network support (for example, MLPClassifier, without GPU acceleration). It is more useful for preprocessing and classical ML algorithms (e.g. clustering, random forests).
Therefore, for deep learning you should switch to dedicated packages like TensorFlow or PyTorch. I recommend TensorFlow for beginners.
Fact: In 2017, Keras was integrated into TensorFlow as tf.keras, so within TensorFlow, Keras is a high-level interface rather than a separate library. TensorFlow 2 adopts Keras as its official high-level API, which is why TF2 code uses Keras syntax.
Step 8: MNIST dataset.
The MNIST dataset is the "Hello World" of Deep Learning and image recognition, and a model every beginner should try.
Yann LeCun is one of the creators of the dataset.
The dataset provides 70,000 images (28x28 pixels) of handwritten digits (1 digit per image). Your goal is to write an algorithm that identifies the digit in each image.
You can find the dataset here.
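Before feeding MNIST images to a model, they are typically flattened and scaled. The sketch below shows that preprocessing on a stand-in batch: the arrays are random noise with MNIST's shape and value range, not real MNIST data, and the batch size of 32 is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Stand-in batch with MNIST's layout: 28x28 grayscale images, labels 0-9.
# These are random values, NOT the real dataset.
images = rng.integers(0, 256, size=(32, 28, 28), dtype=np.uint8)
labels = rng.integers(0, 10, size=32)

# Flatten each image into a 784-element vector and scale pixels to [0, 1].
flat = images.reshape(len(images), -1).astype(np.float32) / 255.0

# One-hot encode the labels: one column per digit class.
targets = np.eye(10, dtype=np.float32)[labels]

print(flat.shape, targets.shape)
```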
Wiki
I have created a GitHub repo where you can find all my data-science-related .ipynb files.
You will find it here.