Preface
During the last semester (September to December 2024), I audited the Machine Learning course at the University of Geneva, taught by Prof. Sebastian ENGELKE and offered to Statistics students. Although I was only an auditor from another major, the lectures and exercises clearly explained the basic concepts behind the different Machine Learning models and gave me a general picture of how Machine Learning works, so it is no longer a black box to me.
From my understanding, Machine Learning is an important concept that appears in many subjects, especially Statistics and data analysis. It is also a fundamental basis of Artificial Intelligence, which you have probably already heard about or even used. In this blog, I’d like to write a short tutorial that briefly introduces what Machine Learning is and what the steps of constructing a Machine Learning model are, based on the course given by Prof. Sebastian. Writing it also helps me review the material.
There are two main references for this course:
- 1: James, Witten, Hastie & Tibshirani. An Introduction to Statistical Learning, Springer.
- 2: Hastie, Tibshirani & Friedman. The Elements of Statistical Learning, Springer.
Algorithm and Model
Architectures, or algorithms, play a key role in the development of Machine Learning. The most famous Machine Learning example is ChatGPT. GPT stands for Generative Pretrained Transformer: the Transformer is the specific architecture (algorithm) used in this case, GPT is a model based on the Transformer, and ChatGPT is a specialized model that handles text and interacts with users. To construct a Machine Learning model, we need to understand the difference between models and architectures.
Here are the definitions of these two terms:
- An architecture (or algorithm) describes the overall design of an ML model and how it is laid out to solve a problem.
- A model is an application of a specific Machine Learning architecture, specialized to a particular problem.
Depending on the requirements and the problem to be solved, different methods are chosen to construct the specific model. It is also important to understand and identify which types of Machine Learning algorithm are used in various models.
As shown in Figure 1, there are three main branches of Machine Learning algorithms: Unsupervised Learning, Supervised Learning and Reinforcement Learning.
Fig 1. Machine Learning Algorithms and examples
An Unsupervised Learning algorithm is trained on unlabelled data. The goal is often to discover hidden patterns, structures and relationships in the data without any “correct answer” provided. In Supervised Learning, the algorithm is trained on labelled data: each training sample includes an input and the corresponding “correct answer” (output). Reinforcement Learning differs from the other two: it learns by interacting with its environment and receiving rewards or penalties for its actions.
Generally speaking, Supervised Learning learns how to map a given input to its output, while Unsupervised Learning learns to identify structure in the input without being given any output.
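To make the contrast concrete, here is a minimal sketch in plain Python. It is not taken from the course: the toy data, the nearest-centroid rule and the function names (`fit_centroids`, `predict`, `two_means`) are all assumptions chosen for illustration. The supervised part uses the labels to learn a rule, while the unsupervised part sees only the inputs and has to find the two groups by itself.

```python
def fit_centroids(xs, ys):
    """Supervised: use the given labels to average the inputs of each class."""
    groups = {}
    for x, y in zip(xs, ys):
        groups.setdefault(y, []).append(x)
    return {label: sum(v) / len(v) for label, v in groups.items()}


def predict(centroids, x):
    """Map an input to the label whose centroid is closest."""
    return min(centroids, key=lambda label: abs(centroids[label] - x))


def two_means(xs, n_iter=10):
    """Unsupervised: 1-D k-means with k=2, no labels are used."""
    c0, c1 = min(xs), max(xs)  # simple initialisation (an assumption)
    for _ in range(n_iter):
        g0 = [x for x in xs if abs(x - c0) <= abs(x - c1)]
        g1 = [x for x in xs if abs(x - c0) > abs(x - c1)]
        if not g0 or not g1:  # degenerate case, e.g. all points identical
            break
        c0, c1 = sum(g0) / len(g0), sum(g1) / len(g1)
    return c0, c1


# Hypothetical toy data: six 1-D inputs, with labels for the supervised case.
xs = [1.0, 1.2, 0.8, 5.0, 5.3, 4.7]
ys = ["small", "small", "small", "large", "large", "large"]

centroids = fit_centroids(xs, ys)      # learns from (input, output) pairs
label = predict(centroids, 1.1)        # maps a new input to an output
cluster_centres = two_means(xs)        # finds structure from inputs only
```

Note that the unsupervised result is just two cluster centres: the algorithm never learns the names "small" and "large", because no outputs were provided.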
Since these branches form the basis of Machine Learning, understanding the different architectures helps us choose the most suitable one for each problem when we do research with Machine Learning tools, and gives us an idea of how to tune its parameters.
The course given by Prof. Sebastian focused mainly on Supervised Learning, so this blog also mainly covers Supervised Learning algorithms and the steps of constructing them. There are two main branches of Supervised Learning: Classification and Regression. As shown in Figure 2, the two branches may share some algorithms, and each also has algorithms unique to it.
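K-nearest neighbours (KNN) is one example of an algorithm shared by both branches. The sketch below is only an illustration under my own assumptions (1-D inputs, absolute distance, hypothetical toy data and function names): the neighbour search is identical, and only the final step changes, a majority vote for Classification and an average for Regression.

```python
from collections import Counter


def k_nearest(train_x, x, k):
    """Indices of the k training points closest to x (1-D inputs)."""
    order = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return order[:k]


def knn_classify(train_x, train_y, x, k):
    """Classification: majority vote among the k nearest labels."""
    votes = Counter(train_y[i] for i in k_nearest(train_x, x, k))
    return votes.most_common(1)[0][0]


def knn_regress(train_x, train_y, x, k):
    """Regression: average of the k nearest numeric responses."""
    idx = k_nearest(train_x, x, k)
    return sum(train_y[i] for i in idx) / len(idx)


# Hypothetical toy data: the same inputs with a categorical and a numeric output.
train_x = [1.0, 2.0, 3.0, 6.0, 7.0, 8.0]
labels = ["a", "a", "a", "b", "b", "b"]
values = [1.1, 1.9, 3.2, 5.8, 7.1, 8.0]

predicted_class = knn_classify(train_x, labels, 2.5, k=3)
predicted_value = knn_regress(train_x, values, 6.5, k=2)
```

With a small `k` the prediction follows individual training points closely, and with a large `k` it is smoothed over many neighbours.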
Fig 2. Two main branches of Supervised Learning
Different Algorithms of Regression and Classification
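As mentioned above, Supervised Learning has two branches: Regression and Classification.
The following table lists several algorithms together with their model parameters, their tuning parameters, how each reaches high complexity, and how each is fitted.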
| Algorithm | Model Parameters | Tuning Parameters | How to Reach High Complexity | Method of Fitting |
|---|---|---|---|---|
| Linear Regression | $\vec{\beta} = (\beta_0, \beta_1, \dots, \beta_p) \in \mathbb{R}^{p+1}$ | | Large $p$ | Least squares with gradient descent |
| KNN | | $k = 1, 2, 3, \dots, n$ | Small $k$ | |
| Lasso/Ridge Regression | $\vec{\beta} = (\beta_0, \beta_1, \dots, \beta_p) \in \mathbb{R}^{p+1}$ | $\lambda \geq 0$ | Large $p$ & small $\lambda$ | Least squares with gradient descent |
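
As a rough illustration of the "least squares with gradient descent" entries, here is a minimal sketch in plain Python. It assumes the objective $\frac{1}{n}\sum_i (y_i - \beta_0 - \sum_j \beta_j x_{ij})^2 + \lambda \sum_{j \geq 1} \beta_j^2$, which is one common way to write Ridge Regression and may differ from the course's exact scaling. The function name `fit_ridge_gd`, the learning rate, the iteration count and the toy data are my own assumptions. Setting $\lambda = 0$ gives ordinary Linear Regression. The Lasso penalty is not covered here, because its absolute value term is not differentiable at zero and needs a different update.

```python
def fit_ridge_gd(X, y, lam=0.0, lr=0.05, n_iter=5000):
    """Gradient descent on mean squared error plus lam * sum(beta_j^2), j >= 1.

    beta[0] is the intercept and is not penalised (an assumption of this sketch).
    lam = 0 reduces to ordinary least squares.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * (p + 1)
    for _ in range(n_iter):
        grad = [0.0] * (p + 1)
        for xi, yi in zip(X, y):
            fitted = beta[0] + sum(b * v for b, v in zip(beta[1:], xi))
            resid = yi - fitted
            grad[0] += -2.0 * resid / n
            for j, v in enumerate(xi, start=1):
                grad[j] += -2.0 * resid * v / n
        for j in range(1, p + 1):
            grad[j] += 2.0 * lam * beta[j]  # gradient of the ridge penalty
        beta = [b - lr * g for b, g in zip(beta, grad)]
    return beta


# Hypothetical toy data generated from y = 1 + 2x with no noise.
X = [[0.0], [1.0], [2.0], [3.0], [4.0]]
y = [1.0, 3.0, 5.0, 7.0, 9.0]

ols = fit_ridge_gd(X, y, lam=0.0)    # tuning parameter switched off
ridge = fit_ridge_gd(X, y, lam=1.0)  # slope is shrunk towards zero
```

On this toy data the unpenalised fit recovers an intercept close to 1 and a slope close to 2, while $\lambda = 1$ shrinks the slope. This matches the table: a small $\lambda$ leaves the model more flexible, and a large $\lambda$ constrains it.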