What's the best way to learn Data Science as a beginner?
June 17, 2020

This is a guide to getting started with Data Science (DS), Machine Learning (ML), and Deep Learning (DL). It will also help you find your learning gaps.

Firstly, know the difference between all the data science disciplines.

Step 0: Learn Python, and get comfortable with the Anaconda environment.
Websites like w3schools and Javatpoint will help you learn Python.
See this guide for installing a Conda environment on Windows.

Step 1: Learn the basics of Probability.
Concepts like frequency, combinatorics, permutations and combinations, Bayesian inference, set theory, and discrete and continuous distributions (Poisson, Binomial, Standard Normal, Student's t, Chi-Squared, Exponential, Logistic).
This YouTube playlist will help you learn the above concepts.
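
To get a feel for two of these concepts, here is a minimal sketch using only the Python standard library. It computes a Binomial probability and applies Bayes' theorem. The function names and the numbers (a fair coin, a test with 1% prevalence, 99% sensitivity, and a 5% false-positive rate) are made up for illustration.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def bayes_posterior(prior: float, likelihood: float, false_positive_rate: float) -> float:
    """P(A | B), given P(A), P(B | A) and P(B | not A)."""
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence


# Chance of exactly 2 heads in 4 flips of a fair coin.
two_heads = binomial_pmf(2, 4, 0.5)

# Hypothetical medical test (illustrative numbers only):
# 1% of people have the condition, the test catches 99% of them,
# and 5% of healthy people test positive anyway.
posterior = bayes_posterior(prior=0.01, likelihood=0.99, false_positive_rate=0.05)

print(two_heads)            # 0.375
print(round(posterior, 3))  # 0.167
```

Notice that even with a 99% accurate test, a positive result only means about a 17% chance of having the condition, because the condition is rare to begin with.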

Step 2: Learn the basics of Statistics.
Data types, frequency distribution, mean, median, mode, histogram, skewness, variance, standard deviation, covariance, correlation, inferential stats, standard error, central limit theorem, confidence intervals, hypothesis testing, p-value.
Pick up a statistics book or watch videos on YouTube for each topic.
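
As a quick check on the descriptive measures listed above, here is a small sketch with Python's built-in statistics module. The data values are made up for illustration.

```python
import math
import statistics as st

# Made-up sample of 8 observations.
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = st.mean(data)
median = st.median(data)
mode = st.mode(data)

# Population versions divide by n; sample versions divide by n - 1.
population_variance = st.pvariance(data)
population_std = st.pstdev(data)
sample_variance = st.variance(data)

# Standard error of the mean: sample standard deviation / sqrt(n).
standard_error = st.stdev(data) / math.sqrt(len(data))

print(mean, median, mode)
print(population_variance, population_std)
print(sample_variance, standard_error)
```

The gap between the population and sample variance comes from dividing by n versus n - 1, which is worth understanding before moving on to inferential stats.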

Step 3: Learn Advanced Stats.
(Regression, Linear Regression, Logistic Regression, Cluster Analysis, Clustering, K-Means)
Again, pick up a statistics book or watch videos on YouTube for each topic.

Step 4: Solve practical problems on Stats. (ML) (steep learning curve!!!)
Learn how the packages (NumPy, pandas, scikit-learn, Matplotlib, statsmodels) differ from each other, and what each package has to offer.
Try to solve fundamental problems related to advanced stats (Regression, Linear Regression, Logistic Regression, Cluster Analysis, Clustering, K-Means) using the above-mentioned Python packages.
You can find problems on my GitHub repo.

You have to learn the syntax of the Python packages at the same time. Again, websites like Javatpoint will help, but I recommend referring to the official documentation for all the Python modules.
Here are the links to official documentation:
  1. NumPy
  2. pandas
  3. Matplotlib
  4. scikit-learn
  5. statsmodels
Google "stats problems" and you'll find a bunch of them or check my GitHub repo.
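
To give an idea of what such a practice problem can look like, here is a minimal sketch of a linear regression and a K-Means clustering with NumPy and scikit-learn. The data is synthetic and generated on the spot (a line with slope 3 and intercept 2 plus noise, and two blobs of points), so treat the numbers as assumptions, not as a real dataset.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# --- Linear regression on synthetic data: y = 3x + 2 + noise ---
x = rng.uniform(0, 10, size=(100, 1))  # scikit-learn expects a 2-D feature array
y = 3.0 * x[:, 0] + 2.0 + rng.normal(0, 0.5, size=100)

model = LinearRegression().fit(x, y)
slope, intercept = model.coef_[0], model.intercept_
print(slope, intercept)  # should land close to 3 and 2

# --- K-Means on two well-separated synthetic blobs ---
blob_a = rng.normal(loc=0.0, scale=0.5, size=(50, 2))
blob_b = rng.normal(loc=5.0, scale=0.5, size=(50, 2))
points = np.vstack([blob_a, blob_b])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)
labels = kmeans.labels_
print(kmeans.cluster_centers_)  # one centre near (0, 0), one near (5, 5)
```

Here NumPy only creates and stacks the arrays, while scikit-learn does the fitting. Seeing that split in a small example makes it easier to understand what each package is for.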

Good, now you have learnt the basics of Machine Learning. Machine Learning is *nothing* but statistics and estimations!!

Step 5: Mathematics.
Learn concepts like Matrices, Scalars, Vectors, Arrays, Tensors, and operations on matrices (addition, subtraction, transpose, dot product).
In terms of programming, a tensor is no different from an n-dimensional array.
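
The operations above can be tried out in a few lines of NumPy. This is a small sketch with made-up values, showing that scalars, vectors, matrices, and tensors are all the same array type with a different number of dimensions.

```python
import numpy as np

scalar = np.array(5.0)                 # 0 dimensions
vector = np.array([1.0, 2.0, 3.0])     # 1 dimension
matrix_a = np.array([[1, 2], [3, 4]])  # 2 dimensions
matrix_b = np.array([[5, 6], [7, 8]])
tensor = np.zeros((2, 3, 4))           # 3 dimensions

print(scalar.ndim, vector.ndim, matrix_a.ndim, tensor.ndim)  # 0 1 2 3

total = matrix_a + matrix_b       # element-wise addition
difference = matrix_a - matrix_b  # element-wise subtraction
transposed = matrix_a.T           # rows become columns
product = matrix_a @ matrix_b     # matrix (dot) product

print(product)
# [[19 22]
#  [43 50]]
```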

Step 6: Deep Learning. (steep learning curve!!!)
Before moving ahead, you need to know the types of Machine Learning.
You also need to visualise how a Neural Network works. Thanks to @3blue1brown for creating this wonderful video.
The next thing you need to do is search for and learn these topics:
  1. Objective Function
  2. L2-norm Loss
  3. Cross-Entropy
  4. Gradient Descent
Now you are ready to make your first deep learning model using NumPy. But that's only for learning purposes. To solve real-world problems, we use packages like TensorFlow, Keras, PyTorch, etc.
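
A first NumPy model can be as small as the sketch below: a single linear layer trained with gradient descent on an L2-norm loss. It is only one possible way to do it. The data is synthetic (targets follow 2*x1 - 3*x2 + 5 plus noise), and the learning rate and number of epochs are assumptions that happen to work for this data.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic training data: targets = 2*x1 - 3*x2 + 5 + small noise.
inputs = rng.uniform(-10, 10, size=(1000, 2))
true_weights = np.array([2.0, -3.0])
true_bias = 5.0
targets = inputs @ true_weights + true_bias + rng.uniform(-1, 1, size=1000)

# Start from small random weights and a zero bias.
weights = rng.uniform(-0.1, 0.1, size=2)
bias = 0.0
learning_rate = 0.02  # assumed value; too large a rate makes the loss diverge

for epoch in range(500):
    outputs = inputs @ weights + bias  # forward pass
    deltas = outputs - targets         # prediction errors
    loss = np.mean(deltas**2) / 2      # L2-norm loss, averaged over samples

    # Gradient descent step: move against the gradient of the loss.
    weights -= learning_rate * (inputs.T @ deltas) / len(targets)
    bias -= learning_rate * deltas.mean()

print(weights, bias, loss)  # weights near [2, -3], bias near 5
```

The loop covers all four topics in the list: the objective function is the loss, the loss is an L2-norm, and the update rule is gradient descent. Cross-entropy would replace the L2-norm loss for classification problems.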

Now search for and learn these topics:
  1. What is a Layer?
  2. What is a Deep Net?
  3. Non-Linearities and their Purpose
  4. Activation Functions, Softmax Activation
  5. Backpropagation
  6. Overfitting and Underfitting
  7. Validation
  8. N-Fold Cross-Validation
  9. Early Stopping
  10. Momentum, Learning Rate Schedules
  11. Standardization
  12. Binary & One-Hot Encoding
  13. Adaptive Learning Rate Schedules (AdaGrad and RMSprop)
This YouTube playlist will help you learn the above-mentioned concepts.
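
A few of the topics above (one-hot encoding, softmax activation, standardization, plus the cross-entropy loss from the earlier list) fit in a short NumPy sketch. The function names and the example numbers are my own choices for illustration, not a standard API.

```python
import numpy as np


def one_hot(labels, num_classes):
    """Turn integer class labels into one-hot rows."""
    return np.eye(num_classes)[labels]


def softmax(logits):
    """Convert each row of raw scores into probabilities that sum to 1."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # for numerical stability
    exponentials = np.exp(shifted)
    return exponentials / exponentials.sum(axis=1, keepdims=True)


def cross_entropy(probabilities, targets):
    """Mean cross-entropy between predicted probabilities and one-hot targets."""
    return -np.mean(np.sum(targets * np.log(probabilities + 1e-12), axis=1))


def standardize(features):
    """Rescale each column to zero mean and unit standard deviation."""
    return (features - features.mean(axis=0)) / features.std(axis=0)


# Made-up outputs of a network for 2 samples and 3 classes.
logits = np.array([[2.0, 1.0, 0.1], [0.5, 2.5, 0.3]])
labels = np.array([0, 1])

probabilities = softmax(logits)
loss = cross_entropy(probabilities, one_hot(labels, 3))

# Made-up features on very different scales.
features = np.array([[1.0, 200.0], [2.0, 400.0], [3.0, 600.0]])
scaled_features = standardize(features)

print(probabilities)
print(loss)
print(scaled_features)
```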

Step 7: Learn Tensorflow.
Refer to the official documentation only.
Scikit-learn offers only a basic multi-layer perceptron and is not designed for deep learning (it has no GPU support). It is most useful for preprocessing and classical machine learning (e.g. clustering, random forests). Therefore, you must switch to a deep learning package like TensorFlow or PyTorch. As a beginner, I recommend TensorFlow.
Fact: In 2017, Keras was integrated into TensorFlow as tf.keras, so in that form Keras is an interface to TensorFlow rather than a separate library. TensorFlow 2 adopts Keras as its high-level API, which is why TF2 code uses Keras syntax.

Step 8: MNIST dataset.
The MNIST dataset is the "Hello World" of Deep Learning and image recognition, and a model you must try as a beginner. Yann LeCun is one of the creators of the dataset.
The dataset provides 70,000 images (28x28 pixels) of handwritten digits (1 digit per image). Your goal is to write an algorithm that identifies the digit in each image.
You can find the dataset here.
Wiki
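
Before building a model, it helps to see what shape the data takes. The sketch below uses random arrays as a stand-in for a batch of MNIST images (they are not real digits), assuming the usual layout of 28x28 grayscale pixels with values from 0 to 255 and labels from 0 to 9.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in batch with MNIST's layout: 32 random "images", not real digits.
images = rng.integers(0, 256, size=(32, 28, 28), dtype=np.uint8)
labels = rng.integers(0, 10, size=32)

# Scale pixel values from 0..255 down to 0..1.
scaled = images.astype(np.float32) / 255.0

# Flatten each 28x28 image into a vector of 784 inputs.
flat = scaled.reshape(len(scaled), -1)

# One-hot encode the 10 digit classes.
targets = np.eye(10)[labels]

print(flat.shape, targets.shape)  # (32, 784) (32, 10)
```

A simple network for MNIST therefore takes 784 inputs and produces 10 outputs, one per digit.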

I have created a GitHub repo where you can find all my data-science-related .ipynb files. You will find it here.


#DataScience
#Machine-Learning-Tutorial
#Python



