Mostly copied from the Sina Weibo blog of MachineLearner, kept as a reference for my own machine learning study.

Matrix derivatives never seem to get taught anywhere: courses on matrices don't cover derivatives, and courses on derivatives don't cover matrices. Looking them up on Wikipedia and the like is a hassle. In practice, the most common case in machine learning is the derivative of a real-valued function $y$ with respect to a vector $\mathbf{x}$, defined as:

$$\frac{\partial y}{\partial \mathbf{x}}=\begin{bmatrix} \frac{\partial y}{\partial x_1}\\ \frac{\partial y}{\partial x_2}\\ \vdots\\ \frac{\partial y}{\partial x_n} \end{bmatrix}$$
The derivative of a real-valued function $y$ with respect to a matrix $\mathbf{X}$:
$$\frac{\partial y}{\partial \mathbf{X}}=\begin{bmatrix} \frac{\partial y}{\partial x_{11}} & \frac{\partial y}{\partial x_{12}} & \cdots & \frac{\partial y}{\partial x_{1n}}\\ \frac{\partial y}{\partial x_{21}} & \frac{\partial y}{\partial x_{22}} & \cdots & \frac{\partial y}{\partial x_{2n}}\\ \vdots & \vdots & \ddots & \vdots\\ \frac{\partial y}{\partial x_{n1}} & \frac{\partial y}{\partial x_{n2}} & \cdots & \frac{\partial y}{\partial x_{nn}} \end{bmatrix}$$
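To make the vector case concrete, here is a quick numerical check (my own sketch, not from the original post; the function $y=\sum_i \sin x_i$ is just an arbitrary example): a central finite-difference approximation of $\partial y/\partial \mathbf{x}$ should match the analytic gradient, here $\cos\mathbf{x}$ taken elementwise.

```python
import numpy as np

def numerical_gradient(f, x, eps=1e-6):
    """Approximate dy/dx for scalar-valued f via central differences,
    one component at a time, matching the column-vector definition above."""
    grad = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = eps
        grad[i] = (f(x + e) - f(x - e)) / (2 * eps)
    return grad

f = lambda x: np.sum(np.sin(x))          # y: R^n -> R
x = np.array([0.3, -1.2, 2.0])
print(numerical_gradient(f, x))          # ~ [0.955, 0.362, -0.416]
print(np.cos(x))                         # analytic gradient, same values
```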
The general recipe for supervised machine learning is: given an input $\mathbf{x}$, choose a model $f$ as the decision function and predict $\bar{y}=f(\mathbf{x})$. To obtain the parameters $\theta$ of $f$, define a loss function that measures how close the current prediction $\bar{y}$ is to the true value $y$; learning then amounts to finding the $\theta$ that minimizes the loss $L(f(\mathbf{x}), y)$. This is an optimization problem, and in practice it is solved with gradient-based methods such as gradient descent, conjugate gradient, and quasi-Newton methods.
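As a concrete instance of this recipe (a minimal sketch of my own, assuming a linear model and squared-error loss, with made-up toy data): plain gradient descent on $L(\theta)=\frac{1}{2m}\lVert X\theta-\mathbf{y}\rVert^2$, whose gradient is $\frac{1}{m}X^T(X\theta-\mathbf{y})$.

```python
import numpy as np

# Toy data (made up for illustration): y = X @ true_theta + noise
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
true_theta = np.array([1.0, -2.0, 0.5])
y = X @ true_theta + 0.01 * rng.normal(size=100)

theta = np.zeros(3)
lr = 0.1
for _ in range(500):
    grad = X.T @ (X @ theta - y) / len(y)  # gradient of the squared-error loss
    theta -= lr * grad                     # one gradient descent step

print(theta)  # converges close to true_theta
```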
The following identities are handy for derivations:
$$\frac{\partial \beta^T\mathbf{x}}{\partial \mathbf{x}}=\beta$$
$$\frac{\partial \mathbf{x}^T\mathbf{x}}{\partial \mathbf{x}}=2\mathbf{x}$$
$$\frac{\partial \mathbf{x}^T A\mathbf{x}}{\partial \mathbf{x}}=(A+A^T)\mathbf{x}$$
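A quick numerical sanity check of the last identity (again a sketch of my own, with a random non-symmetric $A$ so the $(A+A^T)$ form actually matters):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
A = rng.normal(size=(n, n))       # deliberately non-symmetric
x = rng.normal(size=n)

def quad(v):
    return v @ A @ v              # y = x^T A x

# Central finite differences, one coordinate at a time
eps = 1e-6
I = np.eye(n)
num = np.array([(quad(x + eps * I[i]) - quad(x - eps * I[i])) / (2 * eps)
                for i in range(n)])

ana = (A + A.T) @ x               # analytic gradient from the identity
print(np.allclose(num, ana, atol=1e-5))  # True
```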
Andrew Ng uses these identities involving the matrix trace:
$$\text{tr}(a)=a \qquad (a \text{ a scalar})$$
$$\text{tr}(AB)=\text{tr}(BA)$$
$$\text{tr}(ABC)=\text{tr}(CAB)=\text{tr}(BCA)$$
$$\frac{\partial\,\text{tr}(AB)}{\partial A}=B^T$$
$$\text{tr}(A)=\text{tr}(A^T)$$
$$\frac{\partial\,\text{tr}(ABA^TC)}{\partial A}=CAB+C^TAB^T$$
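These can also be checked numerically (my own sketch; random square matrices, with the last derivative verified entrywise by finite differences):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 3
A, B, C = (rng.normal(size=(n, n)) for _ in range(3))

# Cyclic property: tr(ABC) = tr(CAB) = tr(BCA)
t = np.trace(A @ B @ C)
print(np.isclose(t, np.trace(C @ A @ B)), np.isclose(t, np.trace(B @ C @ A)))

# d tr(A B A^T C) / dA = CAB + C^T A B^T, checked entrywise
def f(M):
    return np.trace(M @ B @ M.T @ C)

eps = 1e-6
num = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        E = np.zeros((n, n))
        E[i, j] = eps
        num[i, j] = (f(A + E) - f(A - E)) / (2 * eps)

ana = C @ A @ B + C.T @ A @ B.T
print(np.allclose(num, ana, atol=1e-5))  # True
```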
