# AI CheatSheet | AI

At the perception level, AI splits roughly into two areas. One is computer vision, which is comparatively mature: face recognition, object detection, and motion detection are all usable in real-world settings. The other is NLP: although Microsoft, Google, and others claim extremely high accuracy for their AI translation, it still does not work well in practice, multi-turn dialogue remains unsolved, and chatbots still struggle to hold a normal conversation with people.

# 知识领域

## Machine Learning | 机器学习

Machine learning can be framed as optimizing (maximizing or minimizing) an objective function. A typical workflow:

1. Data Collection
2. Data Preparation
3. Build Model
4. Train Model
5. Evaluation
6. Tune
7. Predict
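The steps above can be sketched end to end. This is a minimal illustration only, using invented synthetic data and a closed-form least-squares model in NumPy as a stand-in for "build and train":

```python
import numpy as np

# 1-2. Data collection & preparation: synthetic regression data, train/test split
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
true_w = np.array([1.5, -2.0, 0.5])          # hidden "true" parameters
y = X @ true_w + rng.normal(scale=0.1, size=100)
X_train, X_test = X[:80], X[80:]
y_train, y_test = y[:80], y[80:]

# 3-4. Build & train: here, training = minimizing squared error in closed form
w, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)

# 5. Evaluation: mean squared error on held-out data
mse = np.mean((X_test @ w - y_test) ** 2)

# 6-7. Tune (nothing to tune in this toy model) and predict on a new input
y_pred = np.array([1.0, 0.0, 0.0]) @ w
```

In a real project the "tune" step would loop back to model building with different hyperparameters; here the model has none, so the loop is trivial.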

## Deep Learning | 深度学习

Traditional statistical models do very well on structured data, i.e. tabular data, but have notoriously struggled with unstructured data like images, audio, and natural language. Neural networks that contain many layers of neurons embody the research that is popularly called Deep Learning. The key insight and property of deep neural networks that make them suitable for modeling unstructured data is that complex data, like images, generally have many layers of unique features that are composed to produce the data. As a classic example: images have edges which form the basis for textures, textures form the basis for simple objects, simple objects form the basis for more complex objects, and so on. In deep neural networks we aim to learn these many layers of composable features.


# Terminology | 通用概念

## Function | 函数

💡 Sigmoid $\sigma$ 💡

$S(x) = \frac{1}{1+e^{-x}}$

$S'(x) = S(x)\,[1 - S(x)]$

Geoff Hinton covered exactly this topic in his coursera course on neural nets. The problem with sigmoids is that as you reach saturation (values get close to 1 or 0), the gradients vanish. This is detrimental to optimization speed. Softmax doesn't have this problem, and in fact if you combine softmax with a cross entropy error function the gradients are just (z-y), as they would be for a linear output with least squares error.
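The claim that softmax plus cross-entropy gives a gradient of simply (z − y) can be checked numerically. A small sketch (the softmax function and the finite-difference check below are written from scratch for illustration):

```python
import numpy as np

def softmax(logits):
    # subtract the max for numerical stability before exponentiating
    e = np.exp(logits - logits.max())
    return e / e.sum()

logits = np.array([2.0, -1.0, 0.5])
y = np.array([0.0, 1.0, 0.0])            # one-hot target

# analytic gradient of cross-entropy loss w.r.t. the logits: z - y
analytic = softmax(logits) - y

# finite-difference estimate of the same gradient
eps = 1e-6
numeric = np.zeros_like(logits)
for i in range(len(logits)):
    d = np.zeros_like(logits)
    d[i] = eps
    loss_plus = -np.sum(y * np.log(softmax(logits + d)))
    loss_minus = -np.sum(y * np.log(softmax(logits - d)))
    numeric[i] = (loss_plus - loss_minus) / (2 * eps)
```

The two gradients agree to within finite-difference error, confirming the (z − y) form.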

```python
import numpy as np

def sigmoid(x, deriv=False):
    # When deriv=True, x is assumed to already be sigmoid(x),
    # so the derivative is x * (1 - x)
    if deriv:
        return x * (1 - x)
    return 1 / (1 + np.exp(-x))
```

## Networks | 网络

Gradient descent searches for a sequence of iterates that steadily decreases the objective:

$f(x^{t+1}) < f(x^t), \quad t = 0, 1, 2, \dots$

$f(x+\Delta x) \simeq f(x) + \Delta x^T \nabla f(x)$

$\Delta x = -{step} \nabla f(x)$

• When the objective function is convex, any local minimum is also the global minimum, so this method finds the optimum quickly.

• When the objective has multiple local minima, the search may get stuck in a local optimum; it is therefore worth restarting the search from several random initial points.

• When the objective has no minimum, the iteration may loop forever, so a maximum iteration count should be set.
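The update rule and the safeguards from the bullets above can be sketched together; this is a minimal illustration, with the step size, tolerance, and example objective chosen arbitrarily:

```python
import numpy as np

def gradient_descent(grad, x0, step=0.1, max_iter=1000, tol=1e-8):
    """Iterate x <- x - step * grad(x).

    max_iter guards against objectives with no minimum (the
    infinite-loop case above); tol stops early once the gradient
    is effectively zero.
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = np.asarray(grad(x), dtype=float)
        if np.linalg.norm(g) < tol:
            break
        x = x - step * g
    return x

# Convex example: f(x) = (x - 3)^2 with gradient 2(x - 3); minimum at x = 3
x_min = gradient_descent(lambda x: 2 * (x - 3), x0=[0.0])
```

For the non-convex case, the same function would be called from several random `x0` values and the best result kept.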

## Rectified Linear Units (ReLU)

The Sigmoid function's output is confined to [0, 1], whereas ReLU's output range is [0, ∞); in other words, Sigmoid suits logistic-regression-style outputs while ReLU is better at representing unbounded positive values. In deep learning, ReLU also does not suffer from the so-called Vanishing Gradient Problem.
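A minimal NumPy sketch of ReLU and its (sub)gradient, illustrating the [0, ∞) output range and the non-vanishing gradient for positive inputs:

```python
import numpy as np

def relu(x):
    # max(0, x): identity for positive inputs, zero otherwise
    return np.maximum(0, x)

def relu_deriv(x):
    # gradient is 1 for x > 0 and 0 for x < 0 (undefined at exactly 0;
    # zero is used here by convention)
    return (x > 0).astype(float)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
out = relu(x)
```

Unlike Sigmoid, the gradient for large positive inputs stays at exactly 1, which is why deep ReLU networks avoid gradient vanishing on the positive side.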

## Tanh

Tanh helps keep a network's activations within [-1, 1]. The closer the input is to 0, the larger the gradient, and the gradient lies in (0, 1], the same range as Sigmoid's gradient, which also helps avoid gradient bias.
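The two properties above, output bounded in [-1, 1] and gradient peaking at 0 within (0, 1], can be seen directly; a small sketch using NumPy's built-in `tanh`:

```python
import numpy as np

def tanh_deriv(x):
    # d/dx tanh(x) = 1 - tanh(x)^2; equals 1 at x = 0 and decays
    # toward 0 as |x| grows
    return 1 - np.tanh(x) ** 2

x = np.linspace(-3, 3, 7)
out = np.tanh(x)          # all values fall in [-1, 1]
g0 = tanh_deriv(np.array([0.0]))[0]   # gradient at 0
g2 = tanh_deriv(np.array([2.0]))[0]   # gradient further out
```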

## Softmax

The Softmax function is commonly placed at the end of a neural network to add classification capability. It performs multinomial logistic regression and can therefore be used for multi-class classification problems. Cross-entropy is usually used as its loss function.

## Dropout

Dropout is another way to avoid overfitting, and it approximately averages over exponentially many different network architectures. The method randomly drops units (both visible and hidden) in each layer; in practice, a fixed drop rate per layer is usually used.
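A minimal sketch of the "inverted dropout" variant commonly used in practice; the drop rate and the all-ones activation vector are arbitrary choices for illustration:

```python
import numpy as np

def dropout(activations, drop_rate, rng):
    """Inverted dropout: zero out each unit with probability drop_rate
    and rescale the survivors by 1 / (1 - drop_rate), so the expected
    activation is unchanged and no rescaling is needed at test time."""
    keep = rng.random(activations.shape) >= drop_rate
    return activations * keep / (1.0 - drop_rate)

rng = np.random.default_rng(0)
a = np.ones(10000)                      # a layer of dummy activations
dropped = dropout(a, drop_rate=0.5, rng=rng)
```

Each training step draws a fresh random mask, which is what makes a dropout network behave like an ensemble of many thinned sub-networks.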

## F1/F Score

Precision = True Positives / (True Positives + False Positives)

Recall = True Positives / (True Positives + False Negatives)

F1 = 2 * (Precision * Recall) / (Precision + Recall)
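A direct translation of these formulas into code, using made-up confusion counts for illustration:

```python
def f1_score(tp, fp, fn):
    # tp/fp/fn are true-positive, false-positive, false-negative counts
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# hypothetical example: 8 true positives, 2 false positives, 4 false negatives
# precision = 8/10 = 0.8, recall = 8/12 ≈ 0.667
score = f1_score(tp=8, fp=2, fn=4)
```

F1 is the harmonic mean of precision and recall, so it is pulled toward the lower of the two; here it lands between 0.667 and 0.8.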