Math - Linear Algebra
Linear Algebra is the branch of mathematics that studies vector spaces and linear transformations between vector spaces, such as rotating a shape, scaling it up or down, translating it (ie. moving it), etc.
Machine Learning relies heavily on Linear Algebra, so it is essential to understand what vectors and matrices are, what operations you can perform with them, and how they can be useful.
Before we start, let's ensure that this notebook works well in both Python 2 and 3:
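A minimal compatibility cell might look like this (the exact imports are an assumption, since the original cell is not shown):

```python
# Assumed Python 2/3 compatibility imports
from __future__ import division, print_function, unicode_literals
```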
Vectors
Definition
A vector is a quantity defined by a magnitude and a direction. For example, a rocket's velocity is a 3-dimensional vector: its magnitude is the speed of the rocket, and its direction is (hopefully) up. A vector can be represented by an array of numbers called scalars. Each scalar corresponds to the magnitude of the vector with regards to each dimension.
For example, say the rocket is going up at a slight angle: it has a vertical speed of 5,000 m/s, and also a slight speed towards the East at 10 m/s, and a slight speed towards the North at 50 m/s. The rocket's velocity may be represented by the following vector:
$$\textbf{velocity} = \begin{pmatrix} 10 \\ 50 \\ 5000 \end{pmatrix}$$
Note: by convention vectors are generally presented in the form of columns. Also, vector names are generally lowercase to distinguish them from matrices (which we will discuss below), and in bold (when possible) to distinguish them from simple scalar values.
A list of N numbers may also represent the coordinates of a point in an N-dimensional space, so it is quite frequent to represent vectors as simple points instead of arrows. A vector with 1 element may be represented as an arrow or a point on an axis, a vector with 2 elements is an arrow or a point on a plane, a vector with 3 elements is an arrow or point in space, and a vector with N elements is an arrow or a point in an N-dimensional space… which most people find hard to imagine.
Purpose
Vectors have many purposes in Machine Learning, most notably to represent observations and predictions. For example, say we built a Machine Learning system to classify videos into 3 categories (good, spam, clickbait) based on what we know about them. For each video, we would have a vector representing what we know about it, such as:
$$\textbf{video} = \begin{pmatrix} 10.5 \\ 5.2 \\ 3.25 \\ 7.0 \end{pmatrix}$$
This vector could represent a video that lasts 10.5 minutes, but only 5.2% of viewers watch it for more than a minute, it gets 3.25 views per day on average, and it was flagged 7 times as spam. As you can see, each axis may have a different meaning.
Based on this vector our Machine Learning system may predict that there is an 80% probability that it is a spam video, 18% that it is clickbait, and 2% that it is a good video. This could be represented as the following vector:
$$\textbf{class\_probabilities} = \begin{pmatrix} 0.80 \\ 0.18 \\ 0.02 \end{pmatrix}$$
Since we plan to do quite a lot of scientific calculations, it is much better to use NumPy's ndarray, which provides a lot of convenient and optimized implementations of essential mathematical operations on vectors (for more details about NumPy, check out the NumPy tutorial). For example:
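For instance, the video feature vector above could be created like this (a sketch; the variable name `video` is an assumption):

```python
import numpy as np

# Feature vector: [duration (min), % watching > 1 min, views/day, spam flags]
video = np.array([10.5, 5.2, 3.25, 7.0])
video
```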
array([ 10.5 ,  5.2 ,  3.25,  7.  ])
The size of a vector can be obtained using the size attribute:
The $i^{th}$ element of a vector $\textbf{v}$ is noted $\textbf{v}_i$.
Note that indices in mathematics generally start at 1, but in programming they usually start at 0. So to access $\textbf{video}_3$ programmatically, we need to use index 2:
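A quick sketch of both points, assuming the `video` array defined above:

```python
video.size   # 4: the number of elements in the vector
video[2]     # 3.25: the 3rd element (indices start at 0 in Python)
```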
Plotting vectors
To plot vectors we will use matplotlib, so let's start by importing it (for details about matplotlib, check the matplotlib tutorial):
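Something along these lines, with two example 2D vectors (the values of `u` and `v` are assumptions, chosen to be consistent with the norms and angles computed later in this notebook):

```python
import matplotlib.pyplot as plt

# Two example 2D vectors
u = np.array([2, 5])
v = np.array([3, 1])
```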
These vectors each have 2 elements, so they can easily be represented graphically on a 2D graph, for example as points:
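For example, a quick sketch using plt.scatter:

```python
x_coords, y_coords = zip(u, v)   # first the x coordinates, then the y coordinates
plt.scatter(x_coords, y_coords, color=["r", "b"])
plt.axis([0, 9, 0, 6])
plt.grid()
plt.show()
```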
Vectors can also be represented as arrows. Let's create a small convenience function to draw nice arrows:
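One possible implementation (a sketch, not necessarily the original helper) based on pyplot's arrow function:

```python
def plot_vector2d(vector2d, origin=[0, 0], **options):
    """Draw a 2D vector as an arrow starting at `origin`."""
    return plt.arrow(origin[0], origin[1], vector2d[0], vector2d[1],
                     head_width=0.2, head_length=0.3,
                     length_includes_head=True, **options)
```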
Now let's draw the vectors u and v as arrows:
Now let's define two 3D vectors and plot them using matplotlib's Axes3D:
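A sketch of how this can be done; the values of `a` and `b` are assumptions:

```python
from mpl_toolkits.mplot3d import Axes3D  # registers the "3d" projection

a = np.array([1, 2, 8])
b = np.array([5, 6, 3])

fig = plt.figure()
ax = fig.add_subplot(111, projection="3d")
ax.scatter(a[0], a[1], a[2], color="r")
ax.scatter(b[0], b[1], b[2], color="b")
plt.show()
```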
It is a bit hard to visualize exactly where in space these two points are, so let's add vertical lines. We'll create a small convenience function to plot a list of 3d vectors with vertical lines attached:
Norm
The norm of a vector $\textbf{u}$, noted $\left \Vert \textbf{u} \right \|$, is a measure of its length (its magnitude). The most common norm, and the one we will use here, is the Euclidean norm, defined as:

$\left \Vert \textbf{u} \right \| = \sqrt{\sum_i{\textbf{u}_i}^2}$

We could implement this easily in pure Python, using the formula above:
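A pure-Python sketch:

```python
def vector_norm(vector):
    """Euclidean norm: square root of the sum of the squared elements."""
    return sum(x ** 2 for x in vector) ** 0.5

print("||", u, "|| =", vector_norm(u))
```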
||[2 5] || = 5.385164807134504
However, it is much more efficient to use NumPy's norm function, available in the linalg (Linear Algebra) module:
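For example:

```python
import numpy.linalg as LA

LA.norm(u)   # 5.385164807134504
```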
Let's plot a little diagram to confirm that the length of vector $\textbf{u}$ is indeed about 5.4:
Looks about right!
Let's look at what vector addition looks like graphically:
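A sketch, computing the elementwise sum and drawing the three arrows with the helper defined above:

```python
print("u + v =", u + v)       # elementwise sum: array([5, 6])

plot_vector2d(u, color="r")
plot_vector2d(v, color="b")
plot_vector2d(u + v, color="g")
plt.axis([0, 9, 0, 7])
plt.grid()
plt.show()
```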
Vector addition is commutative, meaning that $\textbf{u} + \textbf{v} = \textbf{v} + \textbf{u}$.
Vector addition is also associative, meaning that $\textbf{u} + (\textbf{v} + \textbf{w}) = (\textbf{u} + \textbf{v}) + \textbf{w}$.
If you have a shape defined by a number of points (vectors), and you add a vector $\textbf{v}$ to all of these points, then the whole shape gets shifted by $\textbf{v}$. This is called a geometric translation.
Finally, subtracting a vector is like adding the opposite vector.
1.5 * [2 5] =
array([ 3. , 7.5])
Graphically, scalar multiplication results in changing the scale of a figure, hence the name scalar. The distance from the origin (the point at coordinates equal to zero) is also multiplied by the scalar. For example, let's scale up by a factor of k = 2.5:
As you might guess, dividing a vector by a scalar is equivalent to multiplying by the reciprocal of that scalar: $\dfrac{\textbf{u}}{\lambda} = \dfrac{1}{\lambda} \times \textbf{u}$
Scalar multiplication is commutative: $\lambda \times \textbf{u} = \textbf{u} \times \lambda$.
It is also associative: $\lambda_1 \times (\lambda_2 \times \textbf{u}) = (\lambda_1 \times \lambda_2) \times \textbf{u}$.
Finally, it is distributive over addition of vectors: $\lambda \times (\textbf{u} + \textbf{v}) = \lambda \times \textbf{u} + \lambda \times \textbf{v}$.
Zero, unit and normalized vectors
- A **zero-vector** is a vector full of 0s.
- A **unit vector** is a vector with a norm equal to 1.
- The **normalized vector** of a non-null vector $\textbf{u}$, noted $\hat{\textbf{u}}$, is the unit vector that points in the same direction as $\textbf{u}$. It is equal to: $\hat{\textbf{u}} = \dfrac{\textbf{u}}{\left \Vert \textbf{u} \right \|}$
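In NumPy, normalizing a vector might look like this (a sketch):

```python
u_normalized = u / LA.norm(u)   # divide each element by the vector's norm
u_normalized                    # its norm is now 1
```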
Dot product
Definition
The dot product (also called scalar product or inner product in the context of the Euclidean space) of two vectors $\textbf{u}$ and $\textbf{v}$ is noted $\textbf{u} \cdot \textbf{v}$ and it is defined as:

$\textbf{u} \cdot \textbf{v} = \left \Vert \textbf{u} \right \| \times \left \Vert \textbf{v} \right \| \times \cos(\theta)$

where $\theta$ is the angle between $\textbf{u}$ and $\textbf{v}$.

Another way to calculate the dot product is:

$\textbf{u} \cdot \textbf{v} = \sum_i{\textbf{u}_i \times \textbf{v}_i}$
In python
The dot product is pretty simple to implement:
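For example, a pure-Python sketch:

```python
def dot_product(v1, v2):
    """Sum of the elementwise products of the two vectors."""
    return sum(x * y for x, y in zip(v1, v2))

dot_product(u, v)   # 2*3 + 5*1 = 11
```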
But a much more efficient implementation is provided by NumPy with the dot function:
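For example, using the vectors defined above:

```python
np.dot(u, v)   # 11
```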
Equivalently, you can use the dot method of ndarrays:
Caution: the * operator will perform an elementwise multiplication, NOT a dot product:
Main properties
- The dot product is commutative: $\textbf{u} \cdot \textbf{v} = \textbf{v} \cdot \textbf{u}$.
- The dot product is only defined between two vectors, not between a scalar and a vector. This means that we cannot chain dot products: for example, the expression $\textbf{u} \cdot \textbf{v} \cdot \textbf{w}$ is not defined since $\textbf{u} \cdot \textbf{v}$ is a scalar and $\textbf{w}$ is a vector.
- This also means that the dot product is NOT associative: $(\textbf{u} \cdot \textbf{v}) \cdot \textbf{w} \ne \textbf{u} \cdot (\textbf{v} \cdot \textbf{w})$ since neither is defined.
- However, the dot product is associative with regards to scalar multiplication: $\lambda \times (\textbf{u} \cdot \textbf{v}) = (\lambda \times \textbf{u}) \cdot \textbf{v}$
- Finally, the dot product is distributive over addition of vectors: $\textbf{u} \cdot (\textbf{v} + \textbf{w}) = \textbf{u} \cdot \textbf{v} + \textbf{u} \cdot \textbf{w}$.
Calculating the angle between vectors
One of the many uses of the dot product is to calculate the angle between two non-zero vectors. Looking at the dot product definition, we can deduce the following formula:

$\theta = \arccos{\left(\dfrac{\textbf{u} \cdot \textbf{v}}{\left \Vert \textbf{u} \right \| \times \left \Vert \textbf{v} \right \|}\right)}$

Note that if $\textbf{u} \cdot \textbf{v} = 0$, it follows that $\theta = \dfrac{\pi}{2}$: in other words, if the dot product of two non-null vectors is zero, it means that they are orthogonal.
Let's use this formula to calculate the angle between $\textbf{u}$ and $\textbf{v}$ (in radians):
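A sketch of the computation, clipping cos θ into [-1, 1] as discussed in the note below:

```python
def vector_angle(u, v):
    cos_theta = u.dot(v) / LA.norm(u) / LA.norm(v)
    return np.arccos(np.clip(cos_theta, -1, 1))   # clip avoids arccos domain errors

theta = vector_angle(u, v)
print("angle =", theta, "radians")
```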
angle = 0.868539395286 radians
Note: due to small floating point errors, cos_theta may be very slightly outside the [-1, 1] interval, which would make arccos fail. This is why we clipped the value within the range, using NumPy's clip function.
Matrices
A matrix is a rectangular array of scalars (ie. any number: integer, real or complex) arranged in rows and columns, for example:

$\begin{bmatrix} 10 & 20 & 30 \\ 40 & 50 & 60 \end{bmatrix}$
You can also think of a matrix as a list of vectors: the previous matrix contains either 2 horizontal 3D vectors or 3 vertical 2D vectors.
Matrices are convenient and very efficient to run operations on many vectors at a time. We will also see that they are great at representing and performing linear transformations such as rotations, translations and scaling.
A much more efficient way is to use the NumPy library which provides optimized implementations of many matrix operations:
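For example, the 2×3 matrix above could be created like this (the variable name `A` is an assumption, but it matches how the matrix is referred to later):

```python
A = np.array([
    [10, 20, 30],
    [40, 50, 60]
])
A
```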
By convention matrices generally have uppercase names, such as $A$.
In the rest of this tutorial, we will assume that we are using NumPy arrays (type ndarray) to represent matrices.
Size
The size of a matrix is defined by its number of rows and number of columns, and it is noted $rows \times columns$. For example, the matrix above is a $2 \times 3$ matrix.
To get a matrix's size in NumPy:
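Using the shape attribute, for example:

```python
A.shape   # (2, 3): 2 rows, 3 columns
```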
Caution: the size attribute represents the number of elements in the ndarray, not the matrix's size:
Element indexing
The number located in the $i^{th}$ row and $j^{th}$ column of a matrix $X$ is sometimes noted $X_{i,j}$ or $X_{ij}$, but there is no fully standard notation.
However, in this notebook we will use the $X_{i,j}$ notation.
The $i^{th}$ row vector of $X$ is noted $X_{i,*}$. Similarly, the $j^{th}$ column vector is noted $X_{*,j}$. Both can be accessed easily in NumPy (keeping in mind that indices start at 0).
Note that the result is actually a one-dimensional NumPy array: there is no such thing as a vertical or horizontal one-dimensional array. If you need to actually represent a row vector as a one-row matrix (ie. a 2D NumPy array), or a column vector as a one-column matrix, then you need to use a slice instead of an integer when accessing the row or column, for example:
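A sketch using the matrix A defined above:

```python
A[1, :]     # 2nd row as a 1D array:         array([40, 50, 60])
A[1:2, :]   # 2nd row as a one-row matrix:   array([[40, 50, 60]])
A[:, 1:2]   # 2nd column as a one-column matrix
```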
An upper triangular matrix is a special kind of square matrix where all the elements below the main diagonal (top-left to bottom-right) are zero, for example:

$\begin{bmatrix} 4 & 9 & 2 \\ 0 & 5 & 3 \\ 0 & 0 & 6 \end{bmatrix}$
Similarly, a lower triangular matrix is a square matrix where all elements above the main diagonal are zero, for example:

$\begin{bmatrix} 1 & 0 & 0 \\ 7 & 2 & 0 \\ 4 & 8 & 3 \end{bmatrix}$
A triangular matrix is one that is either lower triangular or upper triangular.
A matrix that is both upper and lower triangular is called a diagonal matrix: all its elements are zero except those on the main diagonal, for example:

$\begin{bmatrix} 5 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 7 \end{bmatrix}$
You can construct a diagonal matrix using NumPy's diag function:
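For example:

```python
np.diag([4, 5, 6])   # 3x3 matrix with 4, 5, 6 on the main diagonal, 0 elsewhere
```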
If you pass a matrix to the diag function, it will happily extract the diagonal values:
Finally, the identity matrix of size $n$, noted $I_n$, is a diagonal matrix of size $n \times n$ with 1's on the main diagonal and 0's everywhere else.
NumPy's eye function returns the identity matrix of the desired size:
The identity matrix is often noted simply $I$ (without specifying its size) when its size is unambiguous given the context.
Adding matrices
If two matrices have the same size, they can be added together: addition is performed elementwise, so the result is a matrix of the same size where each element is the sum of the corresponding elements.
For example, let's create a second $2 \times 3$ matrix and add it to $A$:
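A sketch, with an assumed matrix B of the same size:

```python
B = np.array([
    [1, 2, 3],
    [4, 5, 6]
])
A + B   # elementwise sum: array([[11, 22, 33], [44, 55, 66]])
```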
Addition is commutative, meaning that $A + B = B + A$.
It is also associative, meaning that $A + (B + C) = (A + B) + C$.
Scalar multiplication
A matrix $M$ can be multiplied by a scalar $\lambda$. The result is a matrix of the same size where every element of $M$ is multiplied by $\lambda$.
A more concise way of writing this is: $(\lambda M)_{i,j} = \lambda (M)_{i,j}$
In NumPy, simply use the * operator to multiply a matrix by a scalar. For example:
Scalar multiplication is also defined on the right hand side, and gives the same result:
This makes scalar multiplication commutative.
It is also associative, meaning that $\alpha (\beta M) = (\alpha \times \beta) M$.
Finally, it is distributive over addition of matrices, meaning that $\lambda (Q + R) = \lambda Q + \lambda R$.
Matrix multiplication
So far, matrix operations have been rather intuitive. But multiplying matrices is a bit more involved.
A matrix $Q$ of size $m \times n$ can be multiplied by a matrix $R$ of size $n \times q$. The result $P$ is an $m \times q$ matrix where each element is computed as a sum of products:

$P_{i,j} = \sum_{k=1}^{n}{Q_{i,k} \times R_{k,j}}$

The element at position $i,j$ in the resulting matrix is the sum of the products of the elements in row $i$ of $Q$ by the elements in column $j$ of $R$.

You may notice that each element $P_{i,j}$ is the dot product of the row vector $Q_{i,*}$ and the column vector $R_{*,j}$.

So we can rewrite $P$ more concisely as: $P_{i,j} = Q_{i,*} \cdot R_{*,j}$
Let's multiply two matrices in NumPy, using ndarray's dot method:
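A sketch: the values of `D` below are assumptions, chosen to be consistent with the (truncated) output shown underneath:

```python
D = np.array([
    [ 2,  3,  5,  7],
    [11, 13, 17, 19],
    [23, 29, 31, 37]
])
E = A.dot(D)   # (2x3) · (3x4) -> a 2x4 matrix
E
```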
array([[ 930, 1160, 1320, 1560],
Let's check this result by looking at one element, just to be sure: each element of the result should equal the dot product of the corresponding row of A and column of D.
Looks good! You can check the other elements until you get used to the algorithm.
We multiplied a $2 \times 3$ matrix by a $3 \times 4$ matrix, so the result is a $2 \times 4$ matrix. The first matrix's number of columns has to be equal to the second matrix's number of rows; if we try to multiply the matrices in the other order, NumPy complains:

ValueError: shapes (3,4) and (2,3) not aligned: 4 (dim 1) != 2 (dim 0)
This illustrates the fact that matrix multiplication is NOT commutative: in general $Q \cdot R \ne R \cdot Q$.
In fact, the product in the reverse order is not even necessarily defined, as we just saw.
On the other hand, matrix multiplication is associative, meaning that $Q \cdot (R \cdot S) = (Q \cdot R) \cdot S$.
array([[21640, 28390, 27320, 31140, 13570],
array([[21640, 28390, 27320, 31140, 13570],
It is also distributive over addition of matrices, meaning that $Q \cdot (R + S) = Q \cdot R + Q \cdot S$.
array([[1023, 1276, 1452, 1716],
array([[1023, 1276, 1452, 1716],
The product of an $m \times n$ matrix $M$ by the identity matrix (of matching size) results in the same matrix $M$: $M \cdot I_n = I_m \cdot M = M$
This is generally written more concisely (since the size of the identity matrices is unambiguous given the context): $M \cdot I = I \cdot M = M$
For example:
Caution: NumPy's * operator performs elementwise multiplication, NOT a matrix multiplication:
The @ infix operator
Python 3.5 introduced the @ infix operator for matrix multiplication, and NumPy 1.10 added support for it. If you are using Python 3.5+ and NumPy 1.10+, you can simply write A @ D instead of A.dot(D), making your code much more readable (but less portable). This operator also works for vector dot products.
Note: Q @ R is actually equivalent to Q.__matmul__(R), which is implemented by NumPy as np.matmul(Q, R), not as Q.dot(R). The main difference is that matmul does not support scalar multiplication, while dot does, so you can write Q.dot(3), which is equivalent to Q * 3, but you cannot write Q @ 3 (more details).
Matrix transpose
The transpose of a matrix $M$ is a matrix noted $M^T$ such that the $i^{th}$ row in $M^T$ is equal to the $i^{th}$ column in $M$. In other words, $(M^T)_{i,j} = M_{j,i}$.
Obviously, if $M$ is an $m \times n$ matrix, then $M^T$ is an $n \times m$ matrix.
Note: there are a few other notations, such as $M^t$, $M′$, or ${}^t{M}$.
In NumPy, a matrix's transpose can be obtained simply using the T attribute:
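For example:

```python
A.T   # a 3x2 matrix: the rows of A become columns
```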
As you might expect, transposing a matrix twice returns the original matrix: $(M^T)^T = M$
Transposition is distributive over addition of matrices, meaning that $(Q + R)^T = Q^T + R^T$.
Moreover, $(Q \cdot R)^T = R^T \cdot Q^T$ (note the reversed order).
A symmetric matrix $M$ is defined as a matrix that is equal to its transpose: $M^T = M$.
The product of a matrix by its transpose is always a symmetric matrix, for example:
Converting 1D arrays to 2D arrays in NumPy
As we mentioned earlier, in NumPy (as opposed to Matlab, for example), 1D really means 1D: there is no such thing as a vertical 1D array or a horizontal 1D array. So you should not be surprised to see that transposing a 1D array does not do anything:
We want to convert $\textbf{u}$ into a row vector (a 2D array with a single row) before transposing it. There are a few ways to do this:
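A few equivalent sketches (assuming `u` is the 1D array defined earlier):

```python
u_row = np.array([u])   # wrap u in an extra pair of brackets -> shape (1, 2)
u[np.newaxis, :]        # explicit: add a new (vertical) axis, keep the data horizontal
u[np.newaxis]           # equivalent, a little less explicit
u[None]                 # shortest, but unclear (np.newaxis is just None)
```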
Notice the extra square brackets: this is a 2D array with just one row (ie. a 1x2 matrix). In other words it really is a row vector.
This is quite explicit: we are asking for a new vertical axis, keeping the existing data as the horizontal axis.
This is equivalent, but a little less explicit.
This is the shortest version, but you probably want to avoid it because it is unclear. The reason it works is that np.newaxis is actually equal to None, so this is equivalent to the previous version.
Ok, now let's transpose our row vector:
Great! We now have a nice column vector.
Rather than creating a row vector then transposing it, it is also possible to convert a 1D array directly into a column vector:
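For example:

```python
u[:, np.newaxis]   # shape (2, 1): a column vector
```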
Plotting a matrix
We have already seen that vectors can be represented as points or arrows in N-dimensional space. Is there a good graphical representation of matrices? Well, you can simply see a matrix as a list of vectors, so plotting a matrix results in many points or arrows. For example, let's create a $2 \times 4$ matrix P, holding four 2D column vectors, and plot them as points:
Of course we could also have stored the same 4 vectors as row vectors instead of column vectors, resulting in a $4 \times 2$ matrix (the transpose of P).
Since the vectors are ordered, you can see the matrix as a path and represent it with connected dots:
Or you can represent it as a polygon: matplotlib's Polygon class expects an $n \times 2$ array (one row per point) rather than a $2 \times n$ array, so we need to give it the transpose of P:
Geometric applications of matrix operations
We saw earlier that vector addition results in a geometric translation, vector multiplication by a scalar results in rescaling (zooming in or out, centered on the origin), and vector dot product results in projecting a vector onto another vector, rescaling and measuring the resulting coordinate.
Similarly, matrix operations have very useful geometric applications.
If we add a matrix full of identical vectors, we get a simple geometric translation:
Although matrices can only be added together if they have the same size, NumPy allows adding a row vector or a column vector to a matrix: this is called broadcasting and is explained in further details in the NumPy tutorial. We could have obtained the same result as above with:
array([[ 2.5,  3.5,  0.5,  4.1],
Scalar multiplication
Multiplying a matrix by a scalar results in all its vectors being multiplied by that scalar, so unsurprisingly, the geometric result is a rescaling of the entire figure. For example, let's rescale our polygon by a factor of 60% (zooming out, centered on the origin):
Matrix multiplication – Projection onto an axis
Matrix multiplication is more complex to visualize, but it is also the most powerful tool in the box.
Let's start simple, by defining a $1 \times 2$ matrix $U = \begin{bmatrix} 1 & 0 \end{bmatrix}$: this matrix represents the unit vector pointing along the horizontal axis.
Now let's look at the dot product $U \cdot P$:
array([[ 3. ,  4. ,  1. ,  4.6]])
These are the horizontal coordinates of the vectors in $P$: in other words, we just projected $P$ onto the horizontal axis.
We can actually project onto any other axis by just replacing $U$ with the unit vector pointing along that axis.
Good! Remember that the dot product of a unit vector and a matrix basically performs a projection on an axis and gives us the coordinates of the resulting points on that axis.
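The (truncated) output below suggests a matrix whose rows are unit vectors at 30° and 120°. A sketch of how such a matrix could be built (the name `V` is an assumption):

```python
angle30 = 30 * np.pi / 180     # 30° in radians
angle120 = 120 * np.pi / 180   # 120° in radians
V = np.array([
    [np.cos(angle30),  np.sin(angle30)],    # unit vector at 30°
    [np.cos(angle120), np.sin(angle120)]    # unit vector at 120°
])
V
```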
array([[ 0.8660254,  0.5      ],
Let's look at the product of this matrix and $P$:
array([[ 2.69807621,  5.21410162,  1.8660254 ,  4.23371686],
The first row is equal to the coordinates of the projection of $P$ onto the 30° axis, and the second row is the projection of $P$ onto the perpendicular axis at 120°.
Matrix $V$ therefore performs a rotation: it gives the coordinates of the points of $P$ in a coordinate system rotated by 30°.
Matrix multiplication – Other linear transformations
More generally, any linear transformation that maps n-dimensional vectors to m-dimensional vectors can be represented as an $m \times n$ matrix: applying the transformation to a vector $\textbf{u}$ amounts to computing the dot product of that matrix with $\textbf{u}$, and if we store several vectors as the columns of a matrix, the dot product applies the transformation to all of these vectors at once.
To summarize, the matrix on the left hand side of a dot product specifies what linear transformation to apply to the right hand side vectors. We have already shown that this can be used to perform projections and rotations, but any other linear transformation is possible. For example, here is a transformation known as a shear mapping:
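A typical 2D shear matrix looks like this (the name F_shear matches later references in this notebook, but the shear factor 1.5 is an assumption):

```python
F_shear = np.array([
    [1, 1.5],
    [0, 1]
])
```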
Let's look at how this transformation affects the unit square:
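A sketch, defining the unit square as four 2D column vectors and applying the shear matrix to all of them at once:

```python
Square = np.array([
    [0, 0, 1, 1],   # x coordinates of the 4 corners
    [0, 1, 1, 0]    # y coordinates of the 4 corners
])
F_shear.dot(Square)   # the sheared square
```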
Now let's look at a squeeze mapping:
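A squeeze mapping stretches one axis and shrinks the other by the same factor; for example, with a factor of 1.4 (consistent with the eigenvalues discussed later):

```python
F_squeeze = np.array([
    [1.4, 0],
    [0, 1 / 1.4]
])
```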
The effect on the unit square is:
Let's show a last one: reflection through the horizontal axis:
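Reflection through the horizontal axis simply negates the y coordinates (the name F_reflect matches the determinant example later on):

```python
F_reflect = np.array([
    [1,  0],
    [0, -1]
])
```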
Matrix inverse
Now that we understand that a matrix can represent any linear transformation, a natural question is: can we find a transformation matrix that reverses the effect of a given transformation matrix $F$? The answer is yes (provided the transformation loses no information): that matrix is called the inverse of $F$, noted $F^{-1}$.
For example, the rotation, the shear mapping and the squeeze mapping above all have inverse transformations. Let's demonstrate this on the shear mapping:
We applied a shear mapping, then applied a second transformation to the result and got the original figure back: that second matrix is the inverse of the shear matrix.
We defined the inverse matrix by hand this time, but NumPy also provides an inv function (in the linalg module) to compute a matrix's inverse, so we could have written instead:
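For example:

```python
F_inv_shear = LA.inv(F_shear)   # numerical inverse of the shear matrix
F_inv_shear
```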
Only square matrices can be inverted. This makes sense when you think about it: if you have a transformation that reduces the number of dimensions, then some information is lost and there is no way that you can get it back. For example, say you use a $2 \times 3$ matrix to project a 3D object onto a plane:
Looking at this image, it is impossible to tell whether this is the projection of a cube or the projection of a narrow rectangular object. Some information has been lost in the projection.
Even square transformation matrices can lose information. For example, consider this transformation matrix:
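A sketch of such a matrix (a projection onto the horizontal axis: it keeps x and zeroes out y; the name F_project is an assumption):

```python
F_project = np.array([
    [1, 0],
    [0, 0]
])
F_project.dot(Square)   # all points end up on the horizontal axis
```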
This transformation matrix performs a projection onto the horizontal axis. Our polygon gets entirely flattened out, so some information is entirely lost and it is impossible to go back to the original polygon using a linear transformation. In other words, this matrix has no inverse: such a square matrix is called a singular matrix.
Here is another example of a singular matrix. This one performs a projection onto the axis at a 30° angle above the horizontal axis:
But this time, due to floating point rounding errors, NumPy manages to calculate an inverse (notice how large the elements are, though):
As you might expect, the dot product of a matrix by its inverse results in the identity matrix: $M \cdot M^{-1} = M^{-1} \cdot M = I$
This makes sense since doing a linear transformation followed by the inverse transformation results in no change at all.
Another way to express this is that the inverse of the inverse of a matrix is the matrix itself: $((M)^{-1})^{-1} = M$
Also, the inverse of scaling a matrix by a factor of $\lambda$ is scaling by a factor of $\dfrac{1}{\lambda}$: $(\lambda \times M)^{-1} = \dfrac{1}{\lambda} \times M^{-1}$
Once you understand the geometric interpretation of matrices as linear transformations, most of these properties seem fairly intuitive.
A matrix that is its own inverse is called an involution. The simplest examples are reflection matrices, or a rotation by 180°, but there are also more complex involutions, for example imagine a transformation that squeezes horizontally, then reflects over the vertical axis and finally rotates by 90° clockwise. Pick up a napkin and try doing that twice: you will end up in the original position. Here is the corresponding involutory matrix:
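One possible involutory matrix of that kind (the exact values are an assumption); note that multiplying it by itself gives the identity matrix:

```python
F_involution = np.array([
    [0,    -2],
    [-1/2,  0]
])
F_involution.dot(F_involution)   # -> the 2x2 identity matrix
```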
Finally, a square matrix $H$ whose inverse is its own transpose is an orthogonal matrix: $H^{-1} = H^T$. Therefore: $H \cdot H^T = H^T \cdot H = I$
It corresponds to a transformation that preserves distances, such as rotations and reflections, and combinations of these, but not rescaling, shearing or squeezing. Let's check that the reflection matrix we defined above is indeed orthogonal:
Determinant
The determinant of a square matrix $M$, noted $\det(M)$ or $|M|$, is a value that can be computed recursively, for example by expanding along the first column:

$\det(M) = M_{1,1} \times \det(M^{(1,1)}) - M_{2,1} \times \det(M^{(2,1)}) + M_{3,1} \times \det(M^{(3,1)}) - \dots$

- Where $M^{(i,j)}$ is the matrix $M$ without row $i$ and column $j$.
For example, let's calculate the determinant of the following $3 \times 3$ matrix:
Using the method above, we get:
Now we need to compute the determinant of each of these smaller $2 \times 2$ matrices:
Now we can calculate the final result:
To get the determinant of a matrix, you can call NumPy's det function in the numpy.linalg module:
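For example, with an arbitrary 3×3 matrix (not necessarily the one used above):

```python
M = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 0]
])
LA.det(M)   # 27.0
```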
One of the main uses of the determinant is to determine whether a square matrix can be inverted or not: if the determinant is equal to 0, then the matrix cannot be inverted (it is a singular matrix), and if the determinant is not 0, then it can be inverted.
For example, let's compute the determinants of the projection matrices and the shear matrix we defined earlier:
That's right, the first projection matrix has a determinant of 0: it is singular, as we saw earlier.
This determinant is suspiciously close to 0: it really should be 0, but it's not due to tiny floating point errors. The matrix is actually singular.
Perfect! This matrix can be inverted as we saw earlier. Wow, math really works!
The determinant can also be used to measure how much a linear transformation affects surface areas: for example, the projection matrices above flatten everything onto a line, reducing surface areas to zero, which is why their determinant is 0.
We rescaled the polygon by a factor of 1/2 on both the vertical and horizontal axes, so the surface area of the resulting polygon is 1/4 of the original surface area, and indeed that is the determinant of the scaling matrix.
Correct!
The determinant can actually be negative, when the transformation results in a "flipped over" version of the original polygon (eg. a left hand glove becomes a right hand glove). For example, the determinant of the F_reflect matrix is -1 because the surface area is preserved but the polygon gets flipped over:
Composing linear transformations
Several linear transformations can be chained simply by performing multiple dot products in a row. For example, to perform a squeeze mapping followed by a shear mapping, just write:
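A sketch using the matrices defined above (remember that the right-most matrix is applied first):

```python
F_shear.dot(F_squeeze.dot(Square))   # squeeze the unit square first, then shear it
```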
Since the dot product is associative, the following code is equivalent:
Note that the order of the transformations is the reverse of the dot product order.
If we are going to perform this composition of linear transformations more than once, we might as well save the composition matrix like this:
From now on we can perform both transformations in just one dot product, which can lead to a very significant performance boost.
What if you want to perform the inverse of this double transformation? Well, if you squeezed and then you sheared, and you want to undo what you have done, it should be obvious that you should unshear first and then unsqueeze. In more mathematical terms, given two invertible (aka nonsingular) matrices $Q$ and $R$: $(Q \cdot R)^{-1} = R^{-1} \cdot Q^{-1}$
And in NumPy:
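A sketch checking that the two expressions give the same matrix:

```python
left = LA.inv(F_shear.dot(F_squeeze))            # (Q·R)^-1
right = LA.inv(F_squeeze).dot(LA.inv(F_shear))   # R^-1 · Q^-1
np.allclose(left, right)                         # True
```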
Singular Value Decomposition
It turns out that any $m \times n$ matrix $M$ can be decomposed into the dot product of three simple matrices:

- a rotation matrix $U$ (an $m \times m$ orthogonal matrix)
- a scaling & projecting matrix $\Sigma$ (an $m \times n$ diagonal matrix)
- and another rotation matrix $V^T$ (an $n \times n$ orthogonal matrix)

$M = U \cdot \Sigma \cdot V^T$
For example, let's decompose the shear transformation:
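Using NumPy's svd function (a sketch; note that it returns Σ as a 1D array of singular values, as discussed next):

```python
U, S_diag, V_T = LA.svd(F_shear)   # S_diag holds the singular values
S_diag
```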
Note that this is just a 1D array containing the diagonal values of Σ. To get the actual matrix Σ, we can use NumPy's diag function:
Now let's check that $U \cdot \Sigma \cdot V^T$ is indeed equal to F_shear:
It worked like a charm. Let's apply these transformations one by one (in reverse order) on the unit square to understand what's going on. First, let's apply the first rotation $V^T$:
Now let's rescale along the vertical and horizontal axes using $\Sigma$:
Finally, we apply the second rotation $U$:
And we can see that the result is indeed a shear mapping of the original unit square.
Eigenvectors and eigenvalues
An eigenvector of a square matrix $M$ (also called a characteristic vector) is a non-zero vector whose direction is unchanged by the linear transformation associated with $M$: it only gets scaled. More formally, it is any vector $\textbf{v}$ such that:

$M \cdot \textbf{v} = \lambda \times \textbf{v}$

where $\lambda$ is a scalar value called the eigenvalue associated with the vector $\textbf{v}$.
For example, any horizontal vector remains horizontal after applying the shear mapping (as you can see on the image above), so it is an eigenvector of the shear mapping matrix.
If we look at the squeeze mapping, we find that any horizontal or vertical vector keeps its direction (although its length changes), so all horizontal and vertical vectors are eigenvectors of the squeeze mapping matrix.
However, rotation matrices have no (real) eigenvectors at all, except if the rotation angle is 0° or 180°, in which case all non-zero vectors are eigenvectors.
NumPy's eig function returns the list of unit eigenvectors and their corresponding eigenvalues for any square matrix. Let's look at the eigenvectors and eigenvalues of the squeeze mapping matrix:
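A sketch (eig returns the eigenvalues and a matrix whose columns are the corresponding unit eigenvectors):

```python
eigenvalues, eigenvectors = LA.eig(F_squeeze)
eigenvalues    # array([ 1.4       ,  0.71428571])
eigenvectors   # columns are the unit eigenvectors
```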
Indeed the horizontal vectors are stretched by a factor of 1.4, and the vertical vectors are shrunk by a factor of 1/1.4 = 0.714…, so far so good. Let's look at the shear mapping matrix:
Wait, what!? We expected just one unit eigenvector, not two. The second vector is almost exactly the opposite of the first one: it lies on the same horizontal line, and the tiny difference is just due to floating point errors, so there really is only one eigenvector direction.
Trace
The trace of a square matrix $M$, noted $tr(M)$, is the sum of the values on its main diagonal: $tr(M) = \sum_i{M_{i,i}}$
The trace does not have a simple geometric interpretation (in general), but it has a number of properties that make it useful in many areas:
- …
It does, however, have a useful geometric interpretation in the case of projection matrices (such as the ones we saw above): it corresponds to the number of dimensions of the space onto which we project.
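In NumPy, the trace can be computed with np.trace (or the trace method), for example:

```python
np.trace(F_project)   # 1: this projection maps the plane onto a 1-dimensional line
```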
What next?
This concludes this introduction to Linear Algebra. Although these basics cover most of what you will need to know for Machine Learning, if you wish to go deeper into this topic, there are many options available: Linear Algebra books, Khan Academy lessons, or just Wikipedia pages.