Useful Resources to Learn Industry Data Science and Machine Learning
I started working in Data Science after studying Economics and Mathematics. Apart from a few lines of R code I copy-pasted for my master’s thesis, I had no clue about Data Science. I learned everything by doing and studying on evenings and weekends.
Over three months, I studied Python and the mathematics behind Machine Learning for at least 15 hours a week on top of my full-time job.
This is a curated list of resources that I have been sharing with friends, co-workers, and students.
Note: Data Science is a wide field, so this is a non-exhaustive list. If other resources have helped you, please share them in the comment section!
Python for Data Science
When I had no idea where to start, a colleague shared a Udemy course by Jose Portilla: Python for Data Science and Machine Learning. This is a great introduction to the topic. It assumes little previous knowledge and covers the basics of both Machine Learning theory (see the sections below) and Python. If you were to use only one of the resources listed here, I would recommend this one.
Hands-On ML with Python by Aurélien Géron
If you are more of a book person, Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow by Aurélien Géron is a fantastic introduction to applied Machine Learning. It will walk you through the different steps in building a Machine Learning project in Python. Géron also covers a wide selection of Machine Learning models with clever visuals and clear explanations.
The Maths Behind Machine Learning
An Introduction to Statistical Learning (ISL) is a freely available book on the theory of Machine Learning, covering topics such as Model Evaluation, Selection and Regularisation. A print version is also available. The clear prose and explanations will equip you with the theoretical toolbox you need to understand the most widely used Machine Learning models. This is the textbook used in the Python for Data Science course mentioned above.
Note: Its code examples are written in R, which I would not recommend if you are starting out. However, the code sections are not central to the textbook.
SQL
To get a Data Science job in industry, you will most likely need SQL. Structured Query Language (SQL, pronounced “sequel” or “S, Q, L”) is widely used to query a range of data storage technologies.
To learn it, I would recommend the following:
- The Complete SQL Bootcamp Udemy course: another very good introductory course by Jose Portilla.
- A solid free option is the SQL tutorial website.
- SQL online games (yes, you read this correctly): sites like SQL Island (you may need to change the language) or SQL Murder Mystery are fun ways to experiment with the language.
- GPT-4/GenAI training: once you have a database like PostgreSQL installed on your computer (instructions in the Udemy course above), you can ask GPT-4 to give you SQL exercises. GPT-4 could even write the scripts to create the tables you need to complete the exercises and correct your mistakes.
By the end of your SQL training, you should aim to understand the following concepts:
- Join types and self joins
- Group by and aggregations
- CTEs (Common Table Expressions)
- Window functions (row numbers, rolling averages, shift/lag)
Don’t worry if these sound like gibberish right now; they will become clearer as you progress on your journey.
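To make the concepts above a little more concrete, here is a minimal sketch that runs one query combining a join, a group by with an aggregation, a CTE, and three window functions (row number, lag, and a rolling average). It uses Python's built-in sqlite3 module rather than PostgreSQL so that it runs without any installation. The customers and orders tables, their columns, and the data are made up for illustration, and the window functions assume the SQLite bundled with your Python is version 3.25 or later. Self joins are not covered here.

```python
import sqlite3

# In-memory database: nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Hypothetical tables and rows, invented purely for this example.
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER,
    order_day INTEGER,
    amount REAL
);
INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
INSERT INTO orders VALUES
    (1, 1, 1, 10.0),
    (2, 1, 1, 20.0),
    (3, 1, 2, 50.0),
    (4, 2, 1, 5.0),
    (5, 2, 3, 15.0);
""")

QUERY = """
-- CTE: total spend per customer per day (inner join + group by + SUM)
WITH daily AS (
    SELECT c.name AS name, o.order_day AS order_day, SUM(o.amount) AS total
    FROM orders AS o
    JOIN customers AS c ON c.id = o.customer_id
    GROUP BY c.name, o.order_day
)
SELECT
    name,
    order_day,
    total,
    -- Window functions: computed per customer, ordered by day
    ROW_NUMBER() OVER (PARTITION BY name ORDER BY order_day) AS day_rank,
    LAG(total) OVER (PARTITION BY name ORDER BY order_day) AS previous_total,
    AVG(total) OVER (
        PARTITION BY name ORDER BY order_day
        ROWS BETWEEN 1 PRECEDING AND CURRENT ROW
    ) AS rolling_avg
FROM daily
ORDER BY name, order_day;
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

Each printed row holds the customer name, the day, that day's total, the row number within the customer, the previous day's total (None for a customer's first day), and the average over the current and previous row. Changing the JOIN to a LEFT JOIN or widening the ROWS BETWEEN frame is a quick way to experiment with the other concepts in the list.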
Final Thoughts
Based on your existing knowledge or aspirations, you may want to skip or go deeper into certain areas. I hope that you will find these resources useful and wish you all the best on your Machine Learning journey!