# Machine Learning CheatSheet | A Quick Tour of Machine Learning Concepts, Fundamental Theory, and Common Algorithms

• Strengthen the foundational course in probability and statistics. The recommended textbook is the fourth edition of Probability and Statistics by Morris H. DeGroot and Mark J. Schervish.

• In the linear algebra course, strengthen the coverage of matrix analysis. The recommended textbook is Introduction to Linear Algebra by Gilbert Strang, who has long taught linear algebra at MIT; his online video lectures are considered classics. A follow-up course on matrix computations is also recommended, using Numerical Linear Algebra by Lloyd N. Trefethen and David Bau III as the textbook.

• Offer a machine learning course. There are many classic machine learning books, but most are not well suited as undergraduate textbooks. Fundamentals of Machine Learning for Predictive Data Analytics by John D. Kelleher, Brian Mac Namee, et al., recently published by MIT Press, or the third edition of Statistical Pattern Recognition by Andrew R. Webb and Keith D. Copsey, are reasonable choices for undergraduates. The course should also include a practical component in which students apply machine learning methods to specific problems.

• Offer a numerical optimization course, with the second edition of Numerical Optimization by Jorge Nocedal and Stephen J. Wright as the reference textbook; alternatively, offer numerical analysis, using Numerical Analysis by Timothy Sauer as the textbook.

• Strengthen the algorithms course by adding advanced topics such as randomized algorithms; the reference textbook is Probability and Computing: Randomized Algorithms and Probabilistic Analysis by Michael Mitzenmacher and Eli Upfal.

• In programming courses, add or strengthen material on parallel computing. Deep learning in particular usually requires GPU acceleration; Programming Massively Parallel Processors: A Hands-on Approach, Second Edition, by David B. Kirk and Wen-mei W. Hwu can serve as the textbook, and Nvidia's open course on CUDA computing on Udacity is another useful reference.

# Classification

|                   | Condition: A        | Condition: Not A    |
| ----------------- | ------------------- | ------------------- |
| Test says "A"     | True positive (TP)  | False positive (FP) |
| Test says "Not A" | False negative (FN) | True negative (TN)  |
• TP: the number of positive instances predicted as positive;

• FN: the number of positive instances predicted as negative;

• FP: the number of negative instances predicted as positive;

• TN: the number of negative instances predicted as negative;

• Precision ($P$) is defined as the fraction of instances predicted as positive that are truly positive: $P = \frac{TP}{TP + FP}$

• Recall ($R$) is defined as the fraction of actual positive instances that are correctly predicted as positive: $R = \frac{TP}{TP + FN}$

• F1 is a more comprehensive measure, defined as the harmonic mean of precision and recall: $F_1 = \frac{2PR}{P + R}$

• True Positives (TP): samples that are actually positive and classified as positive;

• False Positives (FP): samples that are actually negative but classified as positive;

• True Negatives (TN): samples that are actually negative and classified as negative;

• False Negatives (FN): samples that are actually positive but classified as negative;

$accuracy = \frac{TP+TN}{TP+TN+FP+FN}$

$recall = \frac{TP}{TP+FN}$
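As a quick sanity check on the definitions above, here is a minimal Python sketch (function and variable names are illustrative, not from any library) that computes precision, recall, F1, and accuracy from the four confusion-matrix counts:

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute precision, recall, F1 and accuracy from confusion-matrix counts."""
    precision = tp / (tp + fp)                           # P = TP / (TP + FP)
    recall = tp / (tp + fn)                              # R = TP / (TP + FN)
    f1 = 2 * precision * recall / (precision + recall)   # harmonic mean of P and R
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return precision, recall, f1, accuracy

# Example: 8 true positives, 2 false positives, 4 false negatives, 86 true negatives
print(classification_metrics(8, 2, 4, 86))   # (0.8, 0.666..., 0.727..., 0.94)
```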

# Algorithms

### Grouped By Learning Style

Supervised Learning

Unsupervised Learning

Semi-Supervised Learning

### Grouped By Similarity

Regression Algorithms

Regression is concerned with modelling the relationship between variables, and the model is iteratively refined using a measure of error in the predictions it makes.

Regression methods are a workhorse of statistics and have been co-opted into statistical machine learning. This may be confusing because we can use regression to refer to both the class of problem and the class of algorithm. Really, regression is a process.

The most popular regression algorithms are:

• Ordinary Least Squares Regression (OLSR)

• Linear Regression

• Logistic Regression

• Stepwise Regression

• Multivariate Adaptive Regression Splines (MARS)

• Locally Estimated Scatterplot Smoothing (LOESS)
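As a minimal illustration of ordinary least squares from the list above, the sketch below fits a line to synthetic data with NumPy's built-in least-squares solver; the data and numbers are made up for the example:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
y = 3.0 * x + 2.0 + rng.normal(scale=1.0, size=50)    # true slope 3, intercept 2, plus noise

X = np.column_stack([np.ones_like(x), x])             # design matrix with a bias column
beta, *_ = np.linalg.lstsq(X, y, rcond=None)          # least-squares fit of y ≈ X @ beta
print("intercept, slope:", beta)                      # should be close to (2, 3)
```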

Instance-based Algorithms

Instance-based learning models a decision problem with instances or examples of training data that are deemed important or required by the model.

Such methods typically build up a database of example data and compare new data to the database using a similarity measure in order to find the best match and make a prediction. For this reason, instance-based methods are also called winner-take-all methods and memory-based learning. Focus is put on representation of the stored instances and similarity measures used between instances.

The most popular instance-based algorithms are:

• k-Nearest Neighbour (kNN)

• Learning Vector Quantization (LVQ)

• Self-Organizing Map (SOM)

• Locally Weighted Learning (LWL)
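A bare-bones k-Nearest Neighbour classifier illustrates the idea of comparing new data against a stored database of examples using a similarity measure; this is a toy sketch on made-up data, not an optimized implementation:

```python
import numpy as np

def knn_predict(X_train, y_train, x_new, k=3):
    """Predict the label of x_new by majority vote among its k nearest training points."""
    dists = np.linalg.norm(X_train - x_new, axis=1)   # Euclidean distance to every stored example
    nearest = np.argsort(dists)[:k]                   # indices of the k closest examples
    votes = y_train[nearest]
    return np.bincount(votes).argmax()                # most common label among the neighbours

X_train = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
y_train = np.array([0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([0.8, 0.9])))   # -> 1
```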

Regularization Algorithms

An extension made to another method (typically regression methods) that penalizes models based on their complexity, favoring simpler models that are also better at generalizing.

I have listed regularization algorithms separately here because they are popular, powerful and generally simple modifications made to other methods.

The most popular regularization algorithms are:

• Ridge Regression

• Least Absolute Shrinkage and Selection Operator (LASSO)

• Elastic Net

• Least-Angle Regression (LARS)
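Ridge regression, the first item above, adds an L2 penalty to least squares and still admits a closed-form solution; here is a minimal NumPy sketch on synthetic data (names and values are illustrative):

```python
import numpy as np

def ridge_fit(X, y, lam=1.0):
    """Closed-form ridge regression: minimizes ||y - X w||^2 + lam * ||w||^2."""
    n_features = X.shape[1]
    # Solve (X^T X + lam * I) w = X^T y
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
w_true = np.array([1.0, 0.0, -2.0, 0.0, 0.5])
y = X @ w_true + rng.normal(scale=0.1, size=100)

print(ridge_fit(X, y, lam=0.1))   # coefficients shrunk slightly toward zero
```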

Decision Tree Algorithms

Decision tree methods construct a model of decisions made based on actual values of attributes in the data.

Decisions fork in tree structures until a prediction decision is made for a given record. Decision trees are trained on data for classification and regression problems. Decision trees are often fast and accurate and a big favorite in machine learning.

The most popular decision tree algorithms are:

• Classification and Regression Tree (CART)

• Iterative Dichotomiser 3 (ID3)

• C4.5 and C5.0 (different versions of a powerful approach)

• Chi-squared Automatic Interaction Detection (CHAID)

• Decision Stump

• M5

• Conditional Decision Trees
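A decision stump, one of the items above, is the simplest possible decision tree: a single threshold split on one feature. The sketch below greedily picks the split that minimizes misclassifications on a made-up 1-D dataset:

```python
import numpy as np

def fit_stump(x, y):
    """Find the threshold on a single feature that best separates two classes."""
    best = None
    for t in np.unique(x):
        pred = (x > t).astype(int)                           # predict 1 on the right side of the split
        errors = min((pred != y).sum(), (pred == y).sum())   # allow either side to be class 1
        if best is None or errors < best[1]:
            best = (t, errors)
    return best

x = np.array([1.0, 2.0, 3.0, 6.0, 7.0, 8.0])
y = np.array([0, 0, 0, 1, 1, 1])
print(fit_stump(x, y))   # split at 3.0 with zero errors
```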

Bayesian Algorithms

Bayesian methods are those that explicitly apply Bayes' Theorem to problems such as classification and regression.

The most popular Bayesian algorithms are:

• Naive Bayes

• Gaussian Naive Bayes

• Multinomial Naive Bayes

• Averaged One-Dependence Estimators (AODE)

• Bayesian Belief Network (BBN)

• Bayesian Network (BN)
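Gaussian Naive Bayes, listed above, applies Bayes' Theorem under the assumption that features are conditionally independent and normally distributed within each class; a small from-scratch sketch on toy data:

```python
import numpy as np

def gnb_fit(X, y):
    """Estimate per-class priors, feature means and variances."""
    params = {}
    for c in np.unique(y):
        Xc = X[y == c]
        params[c] = (len(Xc) / len(X), Xc.mean(axis=0), Xc.var(axis=0) + 1e-9)
    return params

def gnb_predict(params, x):
    """Pick the class with the highest log posterior under the independence assumption."""
    def log_post(prior, mean, var):
        return np.log(prior) - 0.5 * np.sum(np.log(2 * np.pi * var) + (x - mean) ** 2 / var)
    return max(params, key=lambda c: log_post(*params[c]))

X = np.array([[1.0, 2.0], [1.2, 1.8], [4.0, 5.0], [4.2, 4.8]])
y = np.array([0, 0, 1, 1])
params = gnb_fit(X, y)
print(gnb_predict(params, np.array([4.1, 5.1])))   # -> 1
```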

Clustering Algorithms

Clustering, like regression, describes both the class of problem and the class of methods.

Clustering methods are typically organized by modelling approach, such as centroid-based and hierarchical. All methods are concerned with using the inherent structures in the data to best organize it into groups of maximum commonality.

The most popular clustering algorithms are:

• k-Means

• k-Medians

• Expectation Maximisation (EM)

• Hierarchical Clustering
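k-Means, the first item above, alternates between assigning each point to its nearest centroid and recomputing each centroid as the mean of its assigned points; a minimal NumPy sketch on synthetic blobs:

```python
import numpy as np

def kmeans(X, k=2, n_iters=20, seed=0):
    """Lloyd's algorithm: alternate assignment and centroid-update steps."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]   # random initial centroids
    for _ in range(n_iters):
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)                           # assign each point to its nearest centroid
        centroids = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                              for j in range(k)])               # keep the old centroid if a cluster empties
    return centroids, labels

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.3, (20, 2)), rng.normal(3, 0.3, (20, 2))])   # two well-separated blobs
centroids, labels = kmeans(X, k=2)
print(centroids)   # one centroid near (0, 0), the other near (3, 3)
```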

Artificial Neural Network Algorithms

Artificial Neural Networks are models that are inspired by the structure and/or function of biological neural networks.

They are a class of pattern-matching methods commonly used for regression and classification problems, but are really an enormous subfield comprising hundreds of algorithms and variations for all manner of problem types.

Note that I have separated out Deep Learning from neural networks because of the massive growth and popularity in the field. Here we are concerned with the more classical methods.

The most popular artificial neural network algorithms are:

• Perceptron

• Back-Propagation

• Hopfield Network

• Radial Basis Function Network (RBFN)
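The perceptron, the first item above, is the simplest neural unit: a weighted sum followed by a threshold, with weights nudged whenever a training example is misclassified. A toy sketch on a small linearly separable dataset:

```python
import numpy as np

def perceptron_train(X, y, lr=0.1, n_epochs=20):
    """Classic perceptron rule: w <- w + lr * (target - prediction) * x."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(n_epochs):
        for xi, target in zip(X, y):
            pred = 1 if xi @ w + b > 0 else 0
            w += lr * (target - pred) * xi
            b += lr * (target - pred)
    return w, b

# Toy linearly separable data: class 1 when x0 + x1 is large
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [2.0, 2.0], [2.0, 3.0], [3.0, 2.0]])
y = np.array([0, 0, 0, 1, 1, 1])
w, b = perceptron_train(X, y)
print([(1 if x @ w + b > 0 else 0) for x in X])   # should reproduce y
```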

Deep Learning Algorithms

Deep Learning methods are a modern update to Artificial Neural Networks that exploit abundant cheap computation.

They are concerned with building much larger and more complex neural networks, and as commented above, many methods are concerned with semi-supervised learning problems where large datasets contain very little labelled data.

The most popular deep learning algorithms are:

• Deep Boltzmann Machine (DBM)

• Deep Belief Networks (DBN)

• Convolutional Neural Network (CNN)

• Stacked Auto-Encoders
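The convolution operation is the core building block of the CNNs listed above. As a rough, framework-free illustration (a single filter, not a full network), this sketch slides a 3×3 edge-detection kernel over a small synthetic image:

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Naive 2-D convolution (really cross-correlation, as in most DL frameworks), 'valid' padding."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.zeros((6, 6))
image[:, 3:] = 1.0                             # right half bright, left half dark
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)  # responds to vertical edges
print(conv2d_valid(image, sobel_x))            # large values along the column where the edge sits
```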

Support Vector Machines

Ensemble Algorithms

Ensemble methods are models composed of multiple weaker models that are independently trained and whose predictions are combined in some way to make the overall prediction.

Much effort is put into what types of weak learners to combine and the ways in which to combine them. This is a very powerful class of techniques and as such is very popular.

The most popular ensemble algorithms are:

• Boosting

• Bootstrapped Aggregation (Bagging)

• Stacked Generalization (blending)

• Gradient Boosting Machines (GBM)

• Gradient Boosted Regression Trees (GBRT)

• Random Forest
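Bootstrapped Aggregation (Bagging), listed above, trains each weak learner on a bootstrap resample of the data and combines their predictions by voting; the sketch below bags simple decision stumps (redefined here so the example is self-contained) on made-up 1-D data:

```python
import numpy as np

def fit_stump(x, y):
    """Best single-feature threshold split (the weak learner being bagged)."""
    best_t, best_err, best_sign = None, None, 1
    for t in np.unique(x):
        for sign in (1, -1):                            # allow either side of the split to be class 1
            pred = (x > t).astype(int) if sign == 1 else (x <= t).astype(int)
            err = (pred != y).sum()
            if best_err is None or err < best_err:
                best_t, best_err, best_sign = t, err, sign
    return best_t, best_sign

def bagged_predict(stumps, x_new):
    """Majority vote over the bootstrapped stumps."""
    votes = [(x_new > t) == (s == 1) for t, s in stumps]
    return int(np.mean(votes) > 0.5)

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0, 1, 30), rng.normal(3, 1, 30)])   # noisy 1-D data, two classes
y = np.concatenate([np.zeros(30, dtype=int), np.ones(30, dtype=int)])

stumps = []
for _ in range(25):                                # 25 bootstrap resamples
    idx = rng.integers(0, len(x), size=len(x))     # sample with replacement
    stumps.append(fit_stump(x[idx], y[idx]))

print(bagged_predict(stumps, 2.5), bagged_predict(stumps, -0.5))   # likely 1 and 0
```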

# Model Evaluation and Regularization

## VC Dimension

The VC dimension is a property of a class of functions: it is the largest number of samples for which the class can realize every possible labelling (i.e., shatter them). Once you have chosen a model and its corresponding features, you can roughly tell how large a dataset that combination of model and features is able to classify. In addition, the size of a function class's VC dimension also reflects how prone that class is to overfitting.
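For example, linear classifiers (half-planes) in $\mathbb{R}^2$ have VC dimension 3: any 3 points in general position can be labelled in all $2^3 = 8$ ways by some straight line, but no set of 4 points can be shattered, since the XOR-style labelling of 4 points is never linearly separable.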