How to get from high school math to cutting-edge ML/AI: a detailed 4-stage roadmap with links to the best learning resources that I’m aware of.
1) Foundational math. 2) Classical machine learning. 3) Deep learning. 4) Cutting-edge machine learning.
I recently talked to a number of people who work in software and want to get to the point where they can read serious ML/AI papers like Denoising Diffusion Probabilistic Models.
But even though they did well in high school math, even AP Calculus, maybe even learned some undergraduate math…
the math in these cutting-edge ML papers still looks like hieroglyphics.
So, how do you get from high school math to cutting-edge ML?
Here’s a 4-stage roadmap.
I’ll start by briefly describing all the stages – and then I’ll go back to each stage for a deep dive where I fully explain the rationale and point you to resources that you can use to guide your learning.
- Stage 1: Foundational Math. All the high school and university-level math that underpins machine learning. All of algebra, a lot of single-variable calculus / linear algebra / probability / statistics, and a bit of multivariable calculus.
- Stage 2: Classical Machine Learning. Coding up streamlined versions of basic regression and classification models, all the way from linear regression to small multi-layer neural networks.
- Stage 3: Deep Learning. Multi-layer neural networks with many parameters, where the architecture of the network is tailored to the specific kind of task you’re trying to get the model to perform.
- Stage 4: Cutting-Edge Machine Learning. Transformers, LLMs, diffusion models, and all the crazy stuff that’s coming out now, that captured your interest to begin with.
Note that I’ve spent the past 5+ years working on resources to support learners in stages 1-2, and there is a general lack of serious yet friendly resources in those stages, so I’m going to be including my own resources there (along with some others).
But in stages 3-4, all the resources I’ll reference are things I’m not affiliated with in any way.
Alright, let’s dive in!
Stage 1: Foundational Math
There’s a lot of math underpinning ML: all of high school algebra, a lot of single-variable calculus / linear algebra / probability / statistics, and a bit of multivariable calculus.
Last year, I spent some time mapping all this math out at a topic-by-topic level. In many of these math courses, some topics are absolutely essential to know for ML, while other topics are unnecessary.
For instance, in multivariable calculus:
- You absolutely must know how to compute gradients (they show up all the time in the context of gradient descent when training ML models), and you need to be solid on the multivariable chain rule, which underpins the backpropagation algorithm for training neural networks. (See the short sketch after this list.)
- But on the other hand, many other multivariable calculus topics like divergence, curl, spherical coordinates, and Stokes' theorem don’t really show up anywhere in ML.
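To make that first bullet concrete, here’s a minimal Python sketch (my own illustrative example – the function names and sample values are assumptions, not from any course): it computes the gradient of a tiny squared-error function by hand using the chain rule, then checks the result numerically.

```python
# Gradient of f(w1, w2) = (w1*x + w2 - y)^2 via the chain rule,
# checked against a central-difference numerical estimate.

def f(w1, w2, x=2.0, y=5.0):
    # squared error of a one-feature linear prediction
    return (w1 * x + w2 - y) ** 2

def grad_f(w1, w2, x=2.0, y=5.0):
    # chain rule: d/dw (u^2) = 2u * du/dw, with u = w1*x + w2 - y
    u = w1 * x + w2 - y
    return (2 * u * x, 2 * u)

def numerical_grad(w1, w2, h=1e-6):
    # central differences as a sanity check on the analytic gradient
    d1 = (f(w1 + h, w2) - f(w1 - h, w2)) / (2 * h)
    d2 = (f(w1, w2 + h) - f(w1, w2 - h)) / (2 * h)
    return (d1, d2)

print(grad_f(1.0, 1.0))          # analytic: (-8.0, -4.0)
print(numerical_grad(1.0, 1.0))  # numerical estimate, matches closely
```

This is exactly the computation that gradient descent repeats at scale when training a model: evaluate the gradient, then nudge the parameters in the opposite direction.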
Once I mapped out the list of required topics, we put together a Mathematics for Machine Learning course whose table of contents can be viewed here.
(To see the list of all 242 individual topics, click on “Content” and then click to expand the Unit boxes.)
If you’re working on Math Academy’s maximum-efficiency learning system, you can learn all this content in about 70 hours if you know your math up through single-variable calculus.
(And if you’re missing background knowledge, that’s totally fine – the diagnostic assessment will automatically figure out the prerequisite material that you need to learn in algebra, calculus, etc. and add it to your learning plan.
Even if you’ve completely forgotten all the math you learned and need to rebuild your foundations from the ground up, starting with fractions, we’ve got you covered.)
It’s also possible to learn these topics through free online resources. For instance,
- Khan Academy covers arithmetic up through high school math,
- OpenStax covers high school and a bit of early university math,
- MIT OpenCourseWare covers plenty of university math, and
- for any given math topic, there are plenty of notes online and usually plenty of videos on YouTube.
However, to make it all the way through this Stage 1 using free resources, you’ll have to piece together a hodgepodge of scattered educational content, and there will be a lot more unproductive friction that will not only slow you down but also make you more likely to get overwhelmed and give up.
(Personally, I self-studied a bunch of math subjects through MIT OpenCourseWare and a variety of textbooks when I was in high school. These were good resources and I came a long way with them, but for the amount of effort that I put into learning, I could have gone a lot further if my time were used more efficiently. For more info, see my post Why Not Just Learn from a Textbook?.)
For most people, this stage of foundational math learning is a make-or-break moment and the hardest part of their machine learning journey.
Learning the foundational math for machine learning is like learning how to read: achieving literacy opens up a world of further learning experiences, but without literacy, that world is completely closed off to you.
Stage 2: Classical Machine Learning
Once you have your math foundations in place, you’re ready to start coding up streamlined versions of basic ML models from linear regression to small multi-layer neural networks.
But don’t try to jump into the fancy cutting-edge ML models just yet – if you do, you’re going to be confused by a lot of patterns that come from classical machine learning.
Let me give a concrete example of what I mean by this. Let’s go back to that paper Denoising Diffusion Probabilistic Models.
If you look at the bottom of page 3, you’ll see that equation (8) looks something like the following, but with a lot more subscripts and function arguments (which I’ve left out for simplicity):
$$L = \mathbb{E}\left[\frac{1}{2\sigma^2}\,\big\lVert \tilde{\mu} - \mu_\theta \big\rVert^2\right] + C$$
Even if you know
- what the $\mathbb{E}$ means (“expectation” from probability/statistics),
- what the $\sigma$ means (“standard deviation,” also from probability/statistics),
- what the $\lVert \cdot \rVert$ means (“vector norm” from linear algebra and multivariable calculus),
- what the $C$ means (an arbitrary “constant of integration” that vanishes when we take the derivative, covered in calculus),
- and so on ...
the way these symbols are arranged here might look really confusing.
But if you know your classical machine learning, the equation immediately hits you as resembling a “loss function” that measures the average squared difference between two quantities.
(It’s no coincidence that the authors used the letter $L$ on the left-hand side of the equation – $L$ for “loss.”)
In classical ML, in that average squared difference between two quantities, one of those quantities comes from your model, and the other comes from a data set, so the loss function in a sense measures how “wrong” your model is in its predictions across the data set.
(Loosely speaking, the goal of “training” a ML model is to minimize the loss function, that is, to adjust the parameters of the model to minimize how “wrong” the model is about the data.)
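To ground this pattern in code, here’s a minimal sketch (my own, with illustrative variable names and made-up data) of a mean-squared-error loss over a small data set, driven toward its minimum by gradient descent on a linear model:

```python
# A streamlined loss-function-and-training loop: the loss measures the
# average squared difference between model predictions and the data,
# and gradient descent adjusts the parameters to shrink it.
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0])  # inputs from the data set
y = np.array([1.1, 2.9, 5.2, 7.1])  # observed targets (roughly y = 2x + 1)

def loss(w, b):
    # average squared difference between model output and data
    return np.mean((w * x + b - y) ** 2)

w, b, lr = 0.0, 0.0, 0.05
for _ in range(2000):
    err = w * x + b - y             # how "wrong" the model is on each point
    w -= lr * np.mean(2 * err * x)  # gradient of the loss w.r.t. w
    b -= lr * np.mean(2 * err)      # gradient of the loss w.r.t. b

print(w, b, loss(w, b))  # w near 2, b near 1, loss near its minimum
```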
My point here is that if you know about these modeling patterns from classical ML, then you’ll immediately have intuition about the equations you see in cutting-edge ML papers.
It’s like how, if a musician has practiced playing their scales and identifying musical keys, they can often jump into any song – even one they’ve never heard before – and improvise in a way that sounds good, because their understanding of underlying musical patterns has given them a level of “feeling” or “intuition” about the new musical structure in which they’re operating.
Similarly, an experienced software developer who knows all sorts of design patterns might be able to get a sense of how a piece of code operates just by noticing patterns in the high-level structure – even though they haven’t actually gone through line by line to understand the precise operations that are happening in the code.
If you know your classical machine learning, the same thing will happen to you when reading modern ML papers.
Just glancing at equations and diagrams, you’ll find meaningful features jumping out at you in intuitive ways.
Reading the paper will feel like looking at a “picture” where you get a high-level sense of what’s going on and then zoom in on the details afterwards.
But if you don’t know your classical ML, then good luck reading modern ML papers, because the authors are not going to spell all this out for you!
So, once you have your math foundations in place, how can you get up to speed on classical machine learning?
One problem with many ML courses is that they don’t have students implement the key models from scratch.
Now, I’m not saying you have to implement a full-fledged ML library with all the bells and whistles of scikit-learn, pytorch, tensorflow, keras, etc., …
but simply coding up a streamlined version of each basic ML model along the way from linear regression to small multi-layer neural networks will do wonders for your understanding.
This was the premise of a series of quantitative coding classes that I developed and taught from 2020-23.
- First, students implemented basic ML algorithms from scratch, coding up streamlined versions of polynomial and logistic regression, k-nearest neighbors, k-means clustering, and parameter fitting via gradient descent. (A minimal example in this spirit follows this list.)
- Then, they implemented more advanced ML algorithms such as decision trees and neural networks (still streamlined versions), and they reproduced academic research papers in artificial intelligence leading up to Blondie24, an AI computer program that taught itself to play checkers.
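To give a taste of what “streamlined versions” means here, the following is a minimal sketch in that spirit (my own example, not an excerpt from the book): k-nearest-neighbors classification in plain Python, with no ML libraries at all.

```python
# k-nearest-neighbors from scratch: classify a query point by majority
# vote among the k closest training points (squared Euclidean distance).
from collections import Counter

def knn_predict(train, query, k=3):
    """train: list of (features, label) pairs; query: a feature tuple."""
    def sq_dist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))
    # sort training points by distance to the query and keep the k nearest
    nearest = sorted(train, key=lambda pair: sq_dist(pair[0], query))[:k]
    # majority vote among the k nearest labels
    return Counter(label for _, label in nearest).most_common(1)[0][0]

train = [((0, 0), "a"), ((0, 1), "a"), ((1, 0), "a"),
         ((5, 5), "b"), ((5, 6), "b"), ((6, 5), "b")]
print(knn_predict(train, (1, 1)))  # "a"
print(knn_predict(train, (5, 4)))  # "b"
```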
I wrote a textbook for those courses, Introduction to Algorithms and Machine Learning, which includes all the instructional material and exercises that I used during the courses. It’s freely available here.
The textbook is self-contained, and once I finished it, my classes were almost entirely self-service. Students read the assigned chapter on their own and completed the exercises. In class, instead of giving traditional lessons, I answered questions and helped students debug their code.
(My students came in without much coding experience, so the book also has you implement canonical data structures and algorithms including sorting, searching, graph traversals, etc. –
you can skip this stuff if you already know it, but if you don’t know it, it would definitely be worth using these exercises to develop your algorithmic coding chops in general.
Even if you’ve been working in software for years – if you don’t have experience writing this sort of algorithm-heavy code, it would be a good idea to use this as an opportunity to get some practice.
It’s like how even beyond competitive weightlifting, most serious athletes still lift weights to develop supporting musculature and improve their overall athleticism, since it indirectly carries over to increase their sport-specific performance.)
IMPORTANT: it’s crucial to point out that the intended learning outcome of Introduction to Algorithms and Machine Learning is pretty general quantitative coding with a focus on ML/AI.
While the book provides you with solid foundations, you would still want to follow up with a proper ML course afterwards that goes through the whole zoo of ML algorithms.
For this, I would recommend working through Andrew Ng’s acclaimed Machine Learning Specialization on Coursera.
This is a sequence of 3 courses:
- Course 1: Supervised Machine Learning: Regression and Classification
- Course 2: Advanced Learning Algorithms
- Course 3: Unsupervised Learning, Recommenders, Reinforcement Learning
But make sure to do the quizzes and assignments! It doesn’t count if you just watch the videos. Passively “following along” is not the same as, or even remotely close to, actually learning.
Stage 3: Deep Learning
So much of modern machine learning is built upon deep learning, that is, multi-layer neural networks with many parameters, where the architecture of the network is tailored to the specific kind of task you’re trying to get the model to perform.
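To illustrate what “architecture tailored to the task” looks like in code, here are two minimal PyTorch model definitions (purely illustrative sketches – the layer sizes are arbitrary assumptions): a convolutional network that exploits the 2D spatial structure of images, versus a plain fully connected network for tabular data.

```python
import torch.nn as nn

# Images: convolutions and pooling exploit spatial structure.
image_model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(32, 10),  # e.g. a 10-class image classifier
)

# Tabular data: a generic multi-layer perceptron, no spatial assumptions.
tabular_model = nn.Sequential(
    nn.Linear(20, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 1),  # e.g. a single regression output
)
```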
If you had asked me several years ago how to spin up on deep learning, I wouldn’t have known what to tell you. While neural networks have been around for a while in the academic literature, deep learning didn’t really take off in popularity until around 2010, which means there wasn’t much demand for educational material until recently.
When I first got interested in deep learning, around 2013-14, the only way to spin up on it was to trudge through miscellaneous research papers, YouTube videos, and blog posts. It was extremely frustrating and inefficient.
Later in the 2010s, a number of deep learning textbooks came out – and while they definitely improved the state of deep learning education, they were seriously lacking concrete computational examples and exercises.
It was kind of like reading a long, thorough lecture, which was better than a hodgepodge of papers, but it was still pretty hard to learn from compared to, say, a well-scaffolded calculus textbook full of worked examples and hands-on exercises.
More recently, though, one deep learning textbook has caught my eye as a superb educational resource: Understanding Deep Learning by Simon J. D. Prince.
It’s freely available here, and I would highly recommend checking out this highlights reel demonstrating what makes the book so awesome.
I’ve read through a number of chapters myself. It’s a serious yet friendly textbook – remarkably detailed and full of visualizations, quick concrete algebraic/numerical examples and exercises, historical notes/references, and references to current work in the field.
Overall, an amazing resource for anyone who has foundational math chops and knowledge of classical ML, and has paid attention to deep learning making headlines through the last decade, but hasn’t kept up with all the technical details.
By the way, it’s not just me who loves this book. It blew up on HackerNews last year and has 4.8/5 stars across 97 reviews on Amazon – and if you read those Amazon reviews, it’s obvious that this book has made a tremendous positive impact on many people’s lives.
Afterwards, I would also recommend working through Jeremy Howard’s Practical Deep Learning for Coders (Part 1) on Fast.AI.
This course has a wealth of great hands-on, guided projects that go beyond textbook exercises.
Again, make sure to do the projects! It doesn’t count if you just watch the videos.
(To be clear: I would still recommend working through the exercises in the Understanding Deep Learning book so that you come into the Fast.AI projects with plenty of background knowledge.
Projects are great for pulling a bunch of knowledge together and solidifying your knowledge, but when you’re initially learning something for the very first time, it’s more efficient to do so in a lower-complexity, higher-scaffolding setting.
Otherwise, without a serious level of background knowledge, it’s easy to get overwhelmed by a project – and when you’re overwhelmed and spinning your wheels being confused without making much progress, that’s very inefficient for your learning.
By the way, this is an instance of the “expertise reversal effect” in the science of learning.)
Stage 4: Cutting-Edge Machine Learning
Finally, we reach the end destination: transformers, LLMs, diffusion models, and all the crazy stuff that’s coming out now, that captured your interest to begin with.
Guess what? You know how I said that when I first got interested in deep learning, around 2013-14, the only way to spin up on it was to trudge through miscellaneous research papers, YouTube videos, and blog posts – and it was extremely frustrating and inefficient?
Well, that’s just kind of how it goes when you reach the cutting edge of a field – and when you get to this “Stage 4,” that’s what you can expect (at least, for now, until the field matures further).
That said, even when you’re operating at the edge of human knowledge, there is still a way to optimize your approach.
I’ll quote my colleague Alex Smith, who recently posted about his own experience getting up to speed on his dissertation topic while doing a PhD in mathematics:
- "My biggest mistake when starting my doctoral research was taking a top-down approach. I focused my efforts on a handful of research papers on the frontier of my chosen field, even writing code to solve problems in these papers from day one. However, I soon realized I lacked many foundational prerequisites, making the first year exceptionally tough.
What I should have done was spend 3-6 months dissecting the hell out of all the key research papers and books written on the subject, starting from the very basics (from my knowledge frontier) and working my way up (the bottom-up approach)."
I know “cutting-edge” might sound like a single line where you cross from reading established textbooks to reading the latest arXiv preprint that blew up yesterday…
but in reality, the cutting edge is more of a “zone” where the amount of guidance gradually fades. Just like how, on a physical knife, you typically see the blade start to narrow a couple millimeters before the true edge.
At the start of the cutting edge, there may not be any textbooks – but there are typically miscellaneous guided resources like videos, blog posts, and foundational research papers and literature reviews that are easier to follow than the latest research papers all the way at the end of the cutting edge.
For a concrete example: if you want to work on cutting-edge LLM research, don’t start by immediately trying to build on the latest paper that’s come out.
Instead, start out with a literature search and collect the important papers, videos, blog posts, etc., that will bring you from the beginning of the edge of the field to the very end.
For instance, Andrej Karpathy’s Neural Networks: Zero to Hero series on YouTube would stand firmly in that pool of resources.
So would Jeremy Howard’s From Deep Learning Foundations to Stable Diffusion on Fast.AI.
Responses to Follow-Up Questions
I received some great follow-up questions about this post after it gained some traction on X/Twitter (August 2024). Feel free to contact me if you have any additional questions that aren’t addressed here.
Don’t you need software engineering skills as well?
Yes. This roadmap is directed at people who work in software and want to get into serious ML/AI, so I’m assuming that they’re coming in with SWE skills.
But I’m sure there are also lots of people reading who don’t have SWE experience – including a “dual space” of people who have plenty of background in math but none in SWE (I’m sure plenty of math majors fall into this category).
Which brings us to the next question:
Can you also provide a roadmap for learning the fundamentals of CS and coding specifically for doing cutting-edge ML/AI research?
I’d say that if you don’t know how to do this already, the first order of business would be to implement the canonical data structures and algorithms covered in Introduction to Algorithms and Machine Learning (sorting, searching, graph traversals, etc., basically the stuff you’d see in a typical Data Structures / Algorithms course). If you work through that textbook in full, you’ll naturally cover all that stuff in addition to the core foundations of classical machine learning.
After that, I think it’s easier to pick up the rest of CS/coding as needed along the way since – at least, in my experience – it’s less hierarchical than math.
Don’t get me wrong, there are many CS/coding skills that you’ll need to learn if you don’t know them already (off the top of my head: version control, interacting with databases and GPUs, various ML libraries/frameworks), but I think these sorts of things can be picked up on the fly because, unlike in a tall hierarchy like math, it’s much easier to identify what prerequisites you’re missing whenever there’s something you don’t know how to do.
Like, let’s say you get to a stage where you need to run a ML model on a GPU, and you don’t know how to hook it up to the GPU. That’s something you’ll need to learn, but it’s not like there’s a mountain of prerequisite knowledge leading up to that task. You don’t have to know much of the underlying theory behind how GPUs work. There are frameworks like PyTorch where you can define a model in terms of tensor operations and the framework handles the process of getting it running efficiently on a GPU. You just have to point the framework to the GPU, which is pretty much just a configuration thing.
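As a minimal sketch of that “configuration thing” (using PyTorch’s standard device API; the tiny model here is just a placeholder):

```python
import torch
import torch.nn as nn

# Pick the GPU if one is available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = nn.Linear(10, 1).to(device)      # move the model's parameters over
x = torch.randn(32, 10, device=device)   # create the batch on the same device
y = model(x)                             # PyTorch dispatches the GPU kernels
print(y.shape, y.device)
```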
Now, I should mention that the way I’m interpreting “ML/AI research” is that you’re focusing on building models that are new in a theoretical sense, as opposed to deploying / scaling up existing models. If you’re actually wanting to get into ML/AI deployment, or maxing out ML/AI compute power, then there’s going to be more CS/coding/software prereqs.
For instance, if you want to further improve the efficiency with which a model utilizes a GPU – like, making improvements to PyTorch itself or something like that – then obviously you’re going to have to know a lot more about low-level GPU programming. But I don’t think that stuff would really fall under the realm of “ML/AI research” proper; it would be more properly categorized as “developing supporting software for ML/AI” or something like that.
So, you don’t need to know graduate-level math for deep learning?
That’s right, the necessary math would be classified as undergraduate level. Now, I should point out that there are some theoretical ML researchers who are trying to build unifying theories of deep learning, and they sometimes use more advanced/abstract math in their theoretical frameworks than what I’ve talked about here. But that’s a separate thing.
In general, why are there so many arguments about how much math you need to learn for ML?
Yeah, the question of “how much math do I need for ML/AI” can be a bit polarizing: some people say “no math needed” while others say “you need everything including axiomatic set theory,” but really the most generally correct answer is somewhere in the middle.
And for any sort of argument where a nuanced middle ground has to be established, it seems necessary to explicitly lay out what the specific middle ground is – what should be included, what shouldn’t be included, and why. So that’s what I’ve tried to do here.
Do you think in some way the sheer number of things to study from the get go might be somewhat of a limiting factor, or discourage people? How should one balance just deep studying while also satisfying curiosity and keeping a pulse on the cutting-edge?
Yeah, I completely agree that the sheer mountain of work ahead of you can be discouraging, so it is definitely worth doing things to keep your interest alive: reading some articles about the cutting edge, toying around with new machine learning models/libraries, etc.
At a fundamental level, playing around is not as efficient for learning as actually building up your hard skills through deliberate practice. So you can’t spend all your time tinkering and expect to reach a high level of skill.
However, if you burn yourself out and lose motivation to engage in deliberate practice, and you just stop… then that’s even worse.
I think that ideally, you’d want to play around just enough to keep yourself motivated to keep building up your hard skills through deliberate practice, and you’d spend the rest of the time actually doing the latter.
To zoom out and drive this point home, I want to point out the common misconception that ability is something to be “unlocked” by curiosity (which seems easy), not something “built” by deliberate practice (which seems hard).
This misconception sounds so ridiculous when you imagine it coming from an athletic trainer: “You want to get really good at basketball? Forget about practice drills – you were born to ball; all you need to do to unlock your inner baller is come in with the right attitude and play some pick-up ball at the park.”
Now, I’m not against curiosity/interest. That’s not what I’m trying to say at all. But curiosity/interest does not itself build ability. Curiosity/interest motivates people to engage in deliberate practice, which is what builds ability.
I’m not saying curiosity/interest doesn’t help, I’m just saying it’s not what moves the needle directly. Deliberate practice is what moves the needle directly. Curiosity/interest “greases the wheels,” so to speak, but it’s not what actually moves the wheels.