Derivatives of functions of one variable

$f^{(n)}(x) = [f^{(n-1)}(x)]'$
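The recursive definition above can be checked symbolically. A minimal SymPy sketch (the function $\sin x$ is just an illustrative choice):

```python
# Sketch: the n-th derivative as repeated differentiation, f^(n) = (f^(n-1))'.
import sympy as sp

x = sp.symbols('x')
f = sp.sin(x)

d1 = sp.diff(f, x)       # first derivative: cos(x)
d2 = sp.diff(d1, x)      # differentiate again: -sin(x)
d3 = sp.diff(f, x, 3)    # same result as differentiating d2 once more
print(d2, d3)
```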

Functions of several variables and their partial derivatives

$\frac{\partial f}{\partial x}\bigg|_{(x_0,y_0)} = \lim_{\Delta x \to 0}\frac{f(x_0+\Delta x,y_0) - f(x_0,y_0)}{\Delta x}$

$\frac{\partial}{\partial x}(\frac{\partial f}{\partial x}) = \frac{\partial^{2} f}{\partial x^2}$
$\frac{\partial}{\partial y}(\frac{\partial f}{\partial x}) = \frac{\partial^{2} f}{\partial x \partial y}$
$\frac{\partial}{\partial x}(\frac{\partial f}{\partial y}) = \frac{\partial^{2} f}{\partial y \partial x}$
$\frac{\partial}{\partial y}(\frac{\partial f}{\partial y}) = \frac{\partial^{2} f}{\partial y^2}$
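The four second-order partials above can be computed symbolically. A small SymPy sketch (the polynomial $f = x^3y^2$ is an illustrative choice); note that for smooth functions the two mixed partials coincide:

```python
# Sketch: second-order partial derivatives with SymPy.
import sympy as sp

x, y = sp.symbols('x y')
f = x**3 * y**2

fxx = sp.diff(f, x, x)   # d/dx (df/dx) = 6*x*y**2
fxy = sp.diff(f, x, y)   # d/dy (df/dx)
fyx = sp.diff(f, y, x)   # d/dx (df/dy)
fyy = sp.diff(f, y, y)   # d/dy (df/dy) = 2*x**3

# For this smooth f the mixed partials agree.
print(fxy, fyx)
```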

Composite functions and the chain rule

$\frac{dz}{dx} = \frac{dz}{dy}\frac{dy}{dx}$

$\frac{\partial z}{\partial x_i} = \sum_{j}\frac{\partial z}{\partial y_j}\frac{\partial y_j}{\partial x_i}$

$\nabla_{\boldsymbol{x}}z = (\frac{\partial \boldsymbol{y}}{\partial \boldsymbol{x}})^T\nabla_{\boldsymbol{y}}z$

$\begin{bmatrix}\frac{\partial y_1}{\partial x_1} & \frac{\partial y_1}{\partial x_2} & \cdots & \frac{\partial y_1}{\partial x_m}\\ \frac{\partial y_2}{\partial x_1} & \frac{\partial y_2}{\partial x_2} & \cdots & \frac{\partial y_2}{\partial x_m}\\ \vdots & \vdots & \ddots &\vdots\\ \frac{\partial y_n}{\partial x_1} & \frac{\partial y_n}{\partial x_2} & \cdots & \frac{\partial y_n}{\partial x_m}\end{bmatrix}$
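The vector chain rule $\nabla_{\boldsymbol{x}}z = (\partial \boldsymbol{y}/\partial \boldsymbol{x})^T\nabla_{\boldsymbol{y}}z$ can be checked numerically. A NumPy sketch for a linear map $\boldsymbol{y} = \boldsymbol{Ax}$ (whose Jacobian is simply $\boldsymbol{A}$) and $z = \sum_j y_j^2$; the matrix and shapes are illustrative:

```python
# Sketch: gradient via the Jacobian transpose, grad_x z = J^T grad_y z.
import numpy as np

A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 3.0]])      # maps R^3 -> R^2; Jacobian dy/dx = A
x = np.array([1.0, -1.0, 2.0])

y = A @ x
grad_y = 2.0 * y                     # grad_y z for z = sum(y_j^2)
grad_x = A.T @ grad_y                # chain rule: (dy/dx)^T grad_y z

# Check one component against a finite difference.
eps = 1e-6
e0 = np.zeros(3); e0[0] = eps
fd = (np.sum((A @ (x + e0))**2) - np.sum((A @ x)**2)) / eps
print(grad_x[0], fd)
```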

$$\nabla$$ is the nabla (Hamilton) operator; for a function of two variables, $$\nabla = \frac{\partial}{\partial x}\vec{i}+\frac{\partial}{\partial y}\vec{j}$$ .

$\nabla_{\boldsymbol{x}} = (\frac{\partial}{\partial x_1},\frac{\partial}{\partial x_2},\cdots ,\frac{\partial}{\partial x_n})^T$

Gradients and gradient descent

The single-variable case

$f(x+\Delta x) \simeq f(x) + \Delta x f'(x)$
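The first-order approximation above can be checked numerically. A minimal sketch with $f(x) = x^2$ (chosen for illustration), where the approximation error is $O(\Delta x^2)$:

```python
# Sketch: first-order approximation f(x + dx) ~= f(x) + dx * f'(x),
# checked numerically for f(x) = x^2.
def f(x):
    return x * x

def fprime(x):
    return 2.0 * x

x, dx = 1.0, 0.01
approx = f(x) + dx * fprime(x)
exact = f(x + dx)
print(exact - approx)   # error is O(dx^2); here exactly dx^2 = 1e-4
```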

The multivariable case

In $$n$$-dimensional space, the difference between two points is a vector, and there are infinitely many possible directions. To converge to a minimum as quickly as possible, we should move in the direction along which the function value decreases fastest.

$(\frac{\partial f}{\partial \boldsymbol{x}}\big|_{\boldsymbol{x}=\boldsymbol{x}_0})^T\frac{\partial (\boldsymbol{x}_0 + \boldsymbol{l}_0 t)}{\partial t} = \nabla_{\boldsymbol{x}} f \cdot \boldsymbol{l}_0$

$\Delta \boldsymbol{x} = - \gamma \nabla f(\boldsymbol{x})$
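The update rule above can be sketched as a simple loop. A minimal example on an illustrative quadratic $f(\boldsymbol{x}) = (x_1-1)^2 + (x_2+2)^2$, whose minimum is at $(1,-2)$:

```python
# Sketch: gradient descent, x <- x - gamma * grad f(x), on a quadratic.
import numpy as np

def grad_f(x):
    # gradient of f(x) = (x1 - 1)^2 + (x2 + 2)^2
    return 2.0 * (x - np.array([1.0, -2.0]))

x = np.zeros(2)
gamma = 0.1                  # step size
for _ in range(200):
    x = x - gamma * grad_f(x)

print(x)                     # approaches the minimizer [1, -2]
```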

Neural networks and the BP algorithm

1. Compute the hidden-layer input:
$\boldsymbol{\alpha}=\boldsymbol{Vx}$
where $$\boldsymbol{V}$$ is a $$q\times d$$ matrix and $$\boldsymbol{\alpha}$$ is a $$q$$-dimensional vector.

2. Compute the hidden-layer output:
$\boldsymbol{b} = f(\boldsymbol{\alpha + \gamma})$

3. Compute the output-layer input:
$\boldsymbol{\beta } = \boldsymbol{Wb}$
where $$\boldsymbol{W}$$ is an $$l\times q$$ matrix.

4. Compute the output-layer output:
$\boldsymbol{y} = f(\boldsymbol{\beta + \theta})$

$\boldsymbol{y} = \boldsymbol{F}(\boldsymbol{x},\boldsymbol{V},\boldsymbol{\gamma},\boldsymbol{W},\boldsymbol{\theta})$

$J(\boldsymbol{V},\boldsymbol{\gamma},\boldsymbol{W},\boldsymbol{\theta})=\sum_{\boldsymbol{x}\in\mathbb{X}}\|\boldsymbol{y}^* - \boldsymbol{F}(\boldsymbol{x},\boldsymbol{V},\boldsymbol{\gamma},\boldsymbol{W},\boldsymbol{\theta})\|^2$

1. First, compute the gradient with respect to $$\boldsymbol{\theta}$$.
By the chain rule,
$\frac{\partial F}{\partial \boldsymbol{\theta}} = (\frac{\partial (\boldsymbol{\theta + \beta})}{\partial \boldsymbol{\theta}})^T\frac{\partial F}{\partial (\boldsymbol{\theta + \beta})} = \frac{\partial F}{\partial (\boldsymbol{\theta + \beta})}$
since the Jacobian of $$\boldsymbol{\theta + \beta}$$ with respect to $$\boldsymbol{\theta}$$ is the identity matrix.

2. Compute the gradient with respect to $$\boldsymbol{W}$$

$\frac{\partial F}{\partial \boldsymbol{W}} = (\frac{\partial \boldsymbol{ \beta}}{\partial \boldsymbol{W}})^T\frac{\partial F}{\partial \boldsymbol{ \beta}}$
where $\frac{\partial \boldsymbol{ \beta}}{\partial \boldsymbol{W}} = \begin{bmatrix}\frac{\partial \boldsymbol{ \beta}}{\partial w_{11}} & \frac{\partial \boldsymbol{ \beta}}{\partial w_{21}} &\cdots &\frac{\partial \boldsymbol{ \beta}}{\partial w_{l1}}\\ \frac{\partial \boldsymbol{ \beta}}{\partial w_{12}} & \frac{\partial \boldsymbol{ \beta}}{\partial w_{22}} &\cdots &\frac{\partial \boldsymbol{ \beta}}{\partial w_{l2}}\\ \vdots & \vdots & \ddots & \vdots\\ \frac{\partial \boldsymbol{ \beta}}{\partial w_{1q}} & \frac{\partial \boldsymbol{ \beta}}{\partial w_{2q}} &\cdots &\frac{\partial \boldsymbol{ \beta}}{\partial w_{lq}}\end{bmatrix}$
Each entry of this matrix is itself a vector. Substituting entry by entry gives
$\frac{\partial F}{\partial w_{ij}} = (\frac{\partial \boldsymbol{\beta}}{\partial w_{ij}})^{T} (\frac{\partial F}{\partial \boldsymbol{\beta}})$
and $$\frac{\partial F}{\partial \boldsymbol{\beta}}$$ can be computed in the same way as $$\frac{\partial F}{\partial \boldsymbol{\theta}}$$.

3. Compute the gradient with respect to $$\boldsymbol{\gamma}$$

$\frac{\partial F}{\partial \boldsymbol{\gamma}} = (\frac{\partial \boldsymbol{b}}{\partial \boldsymbol{\gamma}})^T(\frac{\partial F}{\partial \boldsymbol{b}})$
where $$\frac{\partial \boldsymbol{b}}{\partial \boldsymbol{\gamma}}$$ is easy to compute, and $\frac{\partial F}{\partial \boldsymbol{b}} = (\frac{\partial \boldsymbol{\beta}}{\partial \boldsymbol{b}})^T(\frac{\partial F}{\partial \boldsymbol{\beta}})\\ \frac{\partial \boldsymbol{\beta}}{\partial \boldsymbol{b}} = \boldsymbol{W}$

4. Compute the gradient with respect to $$\boldsymbol{V}$$

$\frac{\partial F}{\partial \boldsymbol{V}} = (\frac{\partial \boldsymbol{ \alpha}}{\partial \boldsymbol{V}})^T\frac{\partial F}{\partial \boldsymbol{ \alpha}}$
Again by the chain rule,
$\frac{\partial F}{\partial \boldsymbol{ \alpha}} = (\frac{\partial \boldsymbol{b}}{\partial \boldsymbol{\alpha}})^T(\frac{\partial \boldsymbol{F}}{\partial \boldsymbol{b}})$
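The four gradients derived above can be sketched end to end in NumPy, with a finite-difference check on one weight. The sigmoid activation, squared error for a single sample, shapes, and random data are all illustrative choices:

```python
# Sketch: backprop for the four gradients derived above (single sample,
# sigmoid activation, squared-error loss F = ||y* - y||^2).
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
d, q, l = 3, 4, 2
V, gamma = rng.normal(size=(q, d)), rng.normal(size=q)
W, theta = rng.normal(size=(l, q)), rng.normal(size=l)
x, y_star = rng.normal(size=d), rng.normal(size=l)

def loss(V, gamma, W, theta):
    b = sigmoid(V @ x + gamma)
    y = sigmoid(W @ b + theta)
    return np.sum((y_star - y) ** 2)

# Forward pass
alpha = V @ x
b = sigmoid(alpha + gamma)
beta = W @ b
y = sigmoid(beta + theta)

# Backward pass, mirroring steps 1-4 of the derivation
dF_dbeta = -2.0 * (y_star - y) * y * (1.0 - y)   # = dF/d(theta+beta)
grad_theta = dF_dbeta                            # step 1
grad_W = np.outer(dF_dbeta, b)                   # step 2: dF/dw_ij = dF/dbeta_i * b_j
dF_db = W.T @ dF_dbeta                           # (dbeta/db)^T dF/dbeta, dbeta/db = W
dF_dalpha = dF_db * b * (1.0 - b)
grad_gamma = dF_dalpha                           # step 3
grad_V = np.outer(dF_dalpha, x)                  # step 4

# Finite-difference check on one weight of W
eps = 1e-6
Wp = W.copy(); Wp[0, 0] += eps
fd = (loss(V, gamma, Wp, theta) - loss(V, gamma, W, theta)) / eps
print(grad_W[0, 0], fd)
```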

References

• Matrix calculus
• Zhou Zhihua (周志华). 机器学习 (Machine Learning). Tsinghua University Press, 2016.
• Goodfellow I, Bengio Y, Courville A. Deep Learning. The MIT Press, 2016.

posted on 2017-11-11 11:20 by 花老