This chapter focuses on the proof and computation of the SVD, along with its geometric significance and the low-rank approximation of matrices, which are closely connected to the applications of the SVD.
Definition
Let $A$ be any $m \times n$ matrix with real entries. Then $A$ can be expressed as the product of three matrices:

$$A = U \Sigma V^{T}$$
where:
$U$ is an $m \times m$ orthogonal matrix. Its columns $u_{i}$ are called the left singular vectors of $A$.

$\Sigma$ is an $m \times n$ diagonal matrix with non-negative real entries $\sigma_{i}$ on the diagonal, which are the singular values of $A$. By convention, $\sigma_{1} \geq \sigma_{2} \geq \ldots \geq \sigma_{p} \geq 0$, where $p = \min(m, n)$.
$V$ is an $n \times n$ orthogonal matrix. Its columns $v_{i}$ are called the right singular vectors of $A$.
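For concreteness, here is a minimal NumPy sketch (the matrix `A` below is an arbitrary made-up example) that computes the decomposition and checks that $A = U \Sigma V^{T}$:

```python
import numpy as np

A = np.array([[3.0, 1.0, 1.0],
              [-1.0, 3.0, 1.0]])         # a 2 x 3 example matrix

# full_matrices=True returns U as m x m and Vt as n x n,
# matching the definition above; s holds the singular values.
U, s, Vt = np.linalg.svd(A, full_matrices=True)

# Rebuild the m x n diagonal matrix Sigma from the singular values.
Sigma = np.zeros(A.shape)
Sigma[:len(s), :len(s)] = np.diag(s)

print(np.allclose(A, U @ Sigma @ Vt))    # True: A = U Sigma V^T
print(s)                                 # singular values in descending order
```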
Proof
Step 1: $A^{T} A$ is positive semidefinite (all of its eigenvalues are non-negative).
Proof: Let $A \in \mathbb{R}^{m \times n}$ and let $x$ be an $n$-dimensional column vector. Then
$$x^{T}\left(A^{T} A\right) x = (A x)^{T}(A x) = \|A x\|^{2} \geq 0.$$
So $A^{T} A$ is positive semidefinite.
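This argument is easy to check numerically. A minimal sketch, using an arbitrary random matrix and vector:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))
x = rng.standard_normal(3)

quad = x @ (A.T @ A) @ x                             # x^T (A^T A) x
print(np.isclose(quad, np.linalg.norm(A @ x) ** 2))  # True: equals |Ax|^2
print(quad >= 0)                                     # True: non-negative
```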
Step 2: Constructing an orthonormal basis of the column space of $A$.
Suppose $A$ is an $m \times n$ real matrix. Since $A^{T} A$ is a real symmetric matrix (diagonalizable, with orthogonal eigenvectors), let the $n$ eigenvalues of $A^{T} A$ be $\lambda_{1} \geq \lambda_{2} \geq \ldots \geq \lambda_{n} \geq 0$ ($A^{T} A$ is positive semidefinite, as proven in step 1). From the eigendecomposition of $A^{T} A$, there exists an $n \times n$ orthogonal matrix $V$ such that

$$V^{T}\left(A^{T} A\right) V = \operatorname{diag}\left(\lambda_{1}, \lambda_{2}, \ldots, \lambda_{n}\right).$$
Let $V = \left[v_{1} \; v_{2} \; \cdots \; v_{n}\right]$ (the columns are the eigenvectors of $A^{T} A$).
Since $V$ is an orthogonal matrix, for any $n$-dimensional column vector $x$ there is a unique set of values $\left(c_{1}, c_{2}, \cdots, c_{n}\right)$ such that

$$x = c_{1} v_{1} + c_{2} v_{2} + \cdots + c_{n} v_{n},$$

and hence $A x = c_{1} A v_{1} + \cdots + c_{n} A v_{n}$, so $\left\{A v_{1}, \cdots, A v_{n}\right\}$ spans the column space of $A$. Let $r = \operatorname{rank}(A)$. For $i > r$ we have $\lambda_{i} = 0$, and since $\left\|A v_{i}\right\|^{2} = v_{i}^{T} A^{T} A v_{i} = \lambda_{i}$, those $A v_{i}$ are zero; so $\left\{A v_{1}, \cdots, A v_{r}\right\}$ already spans the column space.
Moreover, for any $i, j$ with $i \neq j$, we have
$$\left(A v_{i}\right)^{T}\left(A v_{j}\right) = v_{i}^{T}\left(A^{T} A v_{j}\right) = \lambda_{j} v_{i}^{T} v_{j} = 0.$$
So $A v_{i} \perp A v_{j}$. That means $\left\{A v_{1}, A v_{2}, \cdots, A v_{r}\right\}$ is an orthogonal basis of the column space of $A$.
Computing the length of $A v_{i}$:
$$\left\|A v_{i}\right\| = \sqrt{\left\|A v_{i}\right\|^{2}} = \sqrt{v_{i}^{T} A^{T} A v_{i}} = \sqrt{\lambda_{i}\, v_{i}^{T} v_{i}} = \sqrt{\lambda_{i}}.$$
So $\left\{\frac{A v_{1}}{\sqrt{\lambda_{1}}}, \frac{A v_{2}}{\sqrt{\lambda_{2}}}, \cdots, \frac{A v_{r}}{\sqrt{\lambda_{r}}}\right\}$ is an orthonormal basis of the column space of $A$.
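A quick numeric sanity check of this construction (the random matrix below is an arbitrary choice with full column rank, so $r = n$ here):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 3))           # full column rank, so r = n = 3

# np.linalg.eigh returns eigenvalues in ascending order, so reverse them
# to get lambda_1 >= ... >= lambda_n and the matching eigenvector columns.
lam, V = np.linalg.eigh(A.T @ A)
lam, V = lam[::-1], V[:, ::-1]

Q = (A @ V) / np.sqrt(lam)                # columns are Av_i / sqrt(lambda_i)
print(np.allclose(Q.T @ Q, np.eye(3)))    # True: orthonormal columns
```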
The final step
For all $i \in\{1,2, \cdots, r\}$, let $u_{i} = \frac{A v_{i}}{\sqrt{\lambda_{i}}}$, and extend this set to an orthonormal basis $\left\{u_{1}, u_{2}, \cdots, u_{r}, u_{r+1}, \cdots, u_{m}\right\}$ of $\mathbb{R}^{m}$, where $u_{r+1}, \cdots, u_{m}$ are orthonormal basis vectors of the null space of $A^{T}$ (the orthogonal complement of the column space of $A$). Computing $A V$:

$$A V = \left[A v_{1} \; A v_{2} \; \cdots \; A v_{n}\right] = \left[\sqrt{\lambda_{1}}\, u_{1} \; \cdots \; \sqrt{\lambda_{r}}\, u_{r} \; 0 \; \cdots \; 0\right].$$
Let $U = \left[u_{1} \; u_{2} \; \cdots \; u_{m}\right]$ and $\sigma_{i} = \sqrt{\lambda_{i}}$, so that $A V = U \Sigma$. Since $V$ is orthogonal,

$$A = U \Sigma V^{-1} = U \Sigma V^{T}.$$
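Since the proof is constructive, it doubles as a naive algorithm. The sketch below follows it step by step; `svd_from_proof` is a hypothetical helper name, the sketch assumes $\operatorname{rank}(A) = m \leq n$ so that no basis extension is needed for $U$, and a serious implementation would avoid forming $A^{T} A$ explicitly for numerical reasons:

```python
import numpy as np

def svd_from_proof(A):
    # Build the SVD exactly as the proof does: eigendecompose A^T A,
    # set sigma_i = sqrt(lambda_i), u_i = Av_i / sigma_i.
    # Assumes rank(A) = m <= n, so no basis extension for U is needed.
    m, n = A.shape
    lam, V = np.linalg.eigh(A.T @ A)        # eigh: ascending eigenvalues
    lam, V = lam[::-1], V[:, ::-1]          # reorder: lambda_1 >= ... >= lambda_n
    sigma = np.sqrt(np.clip(lam, 0, None))  # clip tiny negatives from round-off
    U = (A @ V[:, :m]) / sigma[:m]          # u_i = Av_i / sigma_i
    Sigma = np.zeros((m, n))
    Sigma[:m, :m] = np.diag(sigma[:m])
    return U, Sigma, V

A = np.array([[3.0, 1.0, 1.0],
              [-1.0, 3.0, 1.0]])
U, Sigma, V = svd_from_proof(A)
print(np.allclose(A, U @ Sigma @ V.T))      # True: A = U Sigma V^T
```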
From the above proof, we have found that the singular values of $A$ are the square roots of the (largest $\min(m, n)$) eigenvalues of $A^{T} A$. Starting from the other side (with $A A^{T}$) and using the same method, it is easy to prove that the singular values of $A$ are also the square roots of the (largest $\min(m, n)$) eigenvalues of $A A^{T}$, and that $\left\{u_{1}, u_{2}, \cdots, u_{m}\right\}$ are the eigenvectors of $A A^{T}$. So the singular values and the singular vectors satisfy the following relations:
Right singular vectors: $A^{T} A v_{i} = \sigma_{i}^{2} v_{i}$.
Left singular vectors: $A A^{T} u_{i} = \sigma_{i}^{2} u_{i}$.
Connections between $u_{i}$ and $v_{i}$: $A v_{i} = \sigma_{i} u_{i}$ and $A^{T} u_{i} = \sigma_{i} v_{i}$.
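These relations can be verified directly against the decomposition returned by `numpy.linalg.svd`; a minimal check on an arbitrary random matrix:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 3))
U, s, Vt = np.linalg.svd(A)
u1, v1, s1 = U[:, 0], Vt[0, :], s[0]           # first singular triple

print(np.allclose(A.T @ A @ v1, s1**2 * v1))   # A^T A v_i = sigma_i^2 v_i
print(np.allclose(A @ A.T @ u1, s1**2 * u1))   # A A^T u_i = sigma_i^2 u_i
print(np.allclose(A @ v1, s1 * u1))            # A v_i = sigma_i u_i
print(np.allclose(A.T @ u1, s1 * v1))          # A^T u_i = sigma_i v_i
```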
Geometric significance
The singular value decomposition (SVD) is an extension of the eigenvalue decomposition (EVD). EVD finds the vectors (directions) that a matrix only scales (and/or reverses/erases) but does not rotate. SVD finds the orthogonal vectors (directions) that a matrix only scales and/or rotates without affecting the orthogonality between them. Similar to EVD, SVD can also be described as a rotation-stretching-rotation process:
The orthogonal matrix $V^{T}$ (or $V^{-1}$) rotates the orthogonal vectors onto the standard axes. (The core of Chapter 4.)
The diagonal matrix $\Sigma$ scales each coordinate by a singular value, with a change in the number of dimensions ($\mathbb{R}^{n} \rightarrow \mathbb{R}^{m}$).
The orthogonal matrix $U$ rotates the vectors within $\mathbb{R}^{m}$.
Here is a simple picture showing the process:
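In addition to the picture, the process can be traced numerically. A short sketch, reusing the example matrix from the definition, applies the three stages one at a time and confirms they compose to $A$:

```python
import numpy as np

A = np.array([[3.0, 1.0, 1.0],
              [-1.0, 3.0, 1.0]])
U, s, Vt = np.linalg.svd(A)
Sigma = np.zeros(A.shape)
Sigma[:len(s), :len(s)] = np.diag(s)

x = np.array([1.0, 2.0, 3.0])    # an arbitrary vector in R^n
step1 = Vt @ x                   # rotate onto the standard axes (still in R^3)
step2 = Sigma @ step1            # scale each coordinate, R^3 -> R^2
step3 = U @ step2                # rotate within R^2
print(np.allclose(step3, A @ x)) # True: the three stages compose to A
```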