How to get from high school math to cutting-edge ML/AI: a detailed 4-stage roadmap with links to the best learning resources that I’m aware of.
1) Foundational math. 2) Classical machine learning. 3) Deep learning. 4) Cutting-edge machine learning.
I recently talked to a number of people who work in software and want to get to the point where they can read serious ML/AI papers like Denoising Diffusion Probabilistic Models.
But even though they did well in high school math, even AP Calculus, maybe even learned some undergraduate math…
the math in these cutting-edge ML papers still looks like hieroglyphics.
So, how do you get from high school math to cutting-edge ML?
Here’s a 4-stage roadmap.
I’ll start by briefly describing all the stages – and then I’ll go back to each stage for a deep dive where I fully explain the rationale and point you to resources that you can use to guide your learning.
- Stage 1: Foundational Math. All the high school and university-level math that underpins machine learning. All of algebra, a lot of single-variable calculus / linear algebra / probability / statistics, and a bit of multivariable calculus.
- Stage 2: Classical Machine Learning. Coding up streamlined versions of basic regression and classification models, all the way from linear regression to small multi-layer neural networks.
- Stage 3: Deep Learning. Multi-layer neural networks with many parameters, where the architecture of the network is tailored to the specific kind of task you’re trying to get the model to perform.
- Stage 4: Cutting-Edge Machine Learning. Transformers, LLMs, diffusion models, and all the crazy stuff that’s coming out now, that captured your interest to begin with.
Note that I’ve spent the past 5+ years working on resources to support learners in stages 1-2, and there is a general lack of serious yet friendly resources in those stages, so I’m going to be including my own resources there (along with some others).
But in stages 3-4, all the resources I’ll reference are things I’m not affiliated with in any way.
Alright, let’s dive in!
Stage 1: Foundational Math
There’s a lot of math underpinning ML: all of high school algebra, a lot of single-variable calculus / linear algebra / probability / statistics, and a bit of multivariable calculus.
Last year, I spent some time mapping all this math out at a topic-by-topic level. In many of these math courses, some topics are absolutely essential to know for ML, while other topics are unnecessary.
For instance, in multivariable calculus:
- You absolutely must know how to compute gradients (they show up all the time in the context of gradient descent when training ML models), and you need to be solid on the multivariable chain rule, which underpins the backpropagation algorithm for training neural networks. (See the short sketch after this list.)
- But on the other hand, many other multivariable calculus topics like divergence, curl, spherical coordinates, and Stokes' theorem don’t really show up anywhere in ML.
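To make that concrete, here’s a minimal Python sketch (my own toy example, not tied to any particular course or library) of a one-parameter gradient-descent loop, where the gradient is computed by hand using the chain rule:

```python
# Toy example: fit w so that w * x matches y, by gradient descent.
x, y = 2.0, 10.0   # a single training example
w = 0.0            # the model parameter we're learning

for step in range(100):
    pred = w * x
    # Chain rule: d/dw (pred - y)^2 = 2 * (pred - y) * d(pred)/dw
    #                               = 2 * (pred - y) * x
    grad = 2 * (pred - y) * x
    w -= 0.01 * grad   # step downhill along the gradient

print(w)  # converges toward y / x = 5.0
```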
Once I mapped out the list of required topics, we put together a Mathematics for Machine Learning course whose table of contents can be viewed here.
(To see the list of all 242 individual topics, click on “Content” and then click to expand the Unit boxes.)
If you’re working on Math Academy’s maximum-efficiency learning system, you can learn all this content in about 70 hours if you know your math up through single-variable calculus.
(And if you’re missing background knowledge, that’s totally fine – the diagnostic assessment will automatically figure out the prerequisite material that you need to learn in algebra, calculus, etc. and add it to your learning plan.
Even if you’ve completely forgotten all the math you learned and need to rebuild your foundations from the ground up, starting with fractions, we’ve got you covered.)
It’s also possible to learn these topics through free online resources. For instance,
- Khan Academy covers arithmetic up through high school math,
- OpenStax covers high school and a bit of early university math,
- MIT OpenCourseWare covers plenty of university math, and
- for any given math topic, there are plenty of notes online and usually plenty of videos on YouTube.
However, to make it all the way through this Stage 1 using free resources, you’ll have to piece together a hodgepodge of scattered educational content, and there will be a lot more unproductive friction that will not only slow you down but also make you more likely to get overwhelmed and give up.
(Personally, I self-studied a bunch of math subjects through MIT OpenCourseWare and a variety of textbooks when I was in high school. These were good resources and I came a long way with them, but for the amount of effort that I put into learning, I could have gone a lot further if my time were used more efficiently. For more info, see my post Why Not Just Learn from a Textbook?.)
For most people, this stage of foundational math learning is a make-or-break moment and the hardest part of their machine learning journey.
Learning the foundational math for machine learning is like learning how to read: achieving literacy opens up a world of further learning experiences, but without literacy, that world is completely closed off to you.
Stage 2: Classical Machine Learning
Once you have your math foundations in place, you’re ready to start coding up streamlined versions of basic ML models from linear regression to small multi-layer neural networks.
But don’t try to jump into the fancy cutting-edge ML models just yet – if you do, you’re going to be confused by a lot of patterns that come from classical machine learning.
Let me give a concrete example of what I mean by this. Let’s go back to that paper Denoising Diffusion Probabilistic Models.
If you look at the bottom of page 3, you’ll see that equation (8) looks something like the following, but with a lot more subscripts and function arguments (which I’ve left out for simplicity):

$$L = \mathbb{E}\left[\frac{1}{2\sigma^2} \left\lVert \tilde{\mu} - \mu_\theta \right\rVert^2\right] + C$$

Even if you know

- what the $\mathbb{E}$ means (“expectation” from probability/statistics),
- what the $\sigma$ means (“standard deviation,” also from probability/statistics),
- what the $\lVert \cdot \rVert$ means (“vector norm” from linear algebra and multivariable calculus),
- what the $C$ means (arbitrary “constant of integration” that vanishes when we take the derivative, covered in calculus),
- and so on ...

the way these symbols are arranged here might look really confusing.
But if you know your classical machine learning, the equation immediately hits you as resembling a “loss function” that measures the average squared difference between two quantities.
(It’s no coincidence that the authors used the letter $L$ on the left-hand side of the equation – for “loss.”)
In classical ML, in that average squared difference between two quantities, one of those quantities comes from your model, and the other comes from a data set, so the loss function in a sense measures how “wrong” your model is in its predictions across the data set.
(Loosely speaking, the goal of “training” a ML model is to minimize the loss function, that is, to adjust the parameters of the model to minimize how “wrong” the model is about the data.)
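Here’s a hedged Python sketch of that loss-function pattern (the data and the one-parameter model are my own toy illustration, not anything from the paper): the loss measures the average squared difference between the model’s predictions and the observed data.

```python
# Toy data set of (input, observed output) pairs.
data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2)]

def model(x, w):
    return w * x  # a one-parameter model

def loss(w):
    # Average squared difference between predictions and data:
    # how "wrong" the model is across the data set.
    return sum((model(x, w) - y) ** 2 for x, y in data) / len(data)

print(loss(2.0))  # training would adjust w to make this as small as possible
```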
My point here is that if you know about these modeling patterns from classical ML, then you’ll immediately have intuition about the equations you see in cutting-edge ML papers.
It’s like how, if a musician has practiced playing their scales and identifying musical keys, they can often jump into any song – even one they’ve never heard before – and improvise in a way that sounds good, because their understanding of underlying musical patterns has given them a level of “feeling” or “intuition” about the new musical structure in which they’re operating.
Similarly, an experienced software developer who knows all sorts of design patterns might be able to get a sense of how a piece of code operates just by noticing patterns in the high-level structure – even though they haven’t actually gone through line by line to understand the precise operations that are happening in the code.
If you know your classical machine learning, the same thing will happen to you when reading modern ML papers.
Just glancing at equations and diagrams, you’ll find meaningful features jumping out at you in intuitive ways.
Reading the paper will feel like looking at a “picture” where you get a high-level sense of what’s going on and then zoom in on the details afterwards.
But if you don’t know your classical ML, then good luck reading modern ML papers, because the authors are not going to spell all this out for you!
So, once you have your math foundations in place, how can you get up to speed on classical machine learning?
One problem with many ML courses is that they don’t have students implement the key models from scratch.
Now, I’m not saying you have to implement a full-fledged ML library with all the bells and whistles of scikit-learn, pytorch, tensorflow, keras, etc., …
but simply coding up a streamlined version of each basic ML model along the way from linear regression to small multi-layer neural networks will do wonders for your understanding.
This was the premise of a series of quantitative coding classes that I developed and taught from 2020-23.
- First, students implemented basic ML algorithms from scratch, coding up streamlined versions of polynomial and logistic regression, k-nearest neighbors, k-means clustering, and parameter fitting via gradient descent. (There’s a small example of what “streamlined” means right after this list.)
- Then, they implemented more advanced ML algorithms such as decision trees and neural networks (still streamlined versions), and they reproduced academic research papers in artificial intelligence leading up to Blondie24, an AI computer program that taught itself to play checkers.
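To give a feel for what a “streamlined” from-scratch implementation looks like, here’s a minimal k-nearest-neighbors classifier in plain Python (my own sketch, not code from the course):

```python
from collections import Counter

def knn_predict(train, query, k=3):
    # train is a list of (feature_vector, label) pairs.
    # Sort by squared Euclidean distance to the query point.
    nearest = sorted(
        train,
        key=lambda pair: sum((a - b) ** 2 for a, b in zip(pair[0], query)),
    )[:k]
    # Majority vote among the k nearest labels.
    labels = [label for _, label in nearest]
    return Counter(labels).most_common(1)[0][0]

train = [([0, 0], "red"), ([1, 0], "red"), ([5, 5], "blue"), ([6, 5], "blue")]
print(knn_predict(train, [4.5, 4.5]))  # -> "blue"
```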
I wrote a textbook for those courses, Introduction to Algorithms and Machine Learning, which includes all the instructional material and exercises that I used during the courses. It’s freely available here.
The textbook is self-contained, and once I finished it, my classes were almost entirely self-service. Students read the assigned chapter on their own and completed the exercises. In class, instead of giving traditional lessons, I answered questions and helped students debug their code.
(My students came in without much coding experience, so the book also has you implement canonical data structures and algorithms including sorting, searching, graph traversals, etc. –
you can skip this stuff if you already know it, but if you don’t know it, it would definitely be worth using these exercises to develop your algorithmic coding chops in general.
Even if you’ve been working in software for years – if you don’t have experience writing this sort of algorithm-heavy code, it would be a good idea to use this as an opportunity to get some practice.
It’s like how even beyond competitive weightlifting, most serious athletes still lift weights to develop supporting musculature and improve their overall athleticism, since it indirectly carries over to increase their sport-specific performance.)
IMPORTANT: it’s crucial to point out that the intended learning outcome of Introduction to Algorithms and Machine Learning is pretty general quantitative coding with a focus on ML/AI.
While the book provides you with solid foundations, you would still want to follow up with a proper ML course afterwards that goes through the whole zoo of ML algorithms.
For this, I would recommend working through Andrew Ng’s acclaimed Machine Learning Specialization on Coursera.
This is a sequence of 3 courses:
- Course 1: Supervised Machine Learning: Regression and Classification
- Course 2: Advanced Learning Algorithms
- Course 3: Unsupervised Learning, Recommenders, Reinforcement Learning
But make sure to do the quizzes and assignments! It doesn’t count if you just watch the videos. Passively “following along” is not the same as, or even remotely close to, actually learning.
Stage 3: Deep Learning
So much of modern machine learning is built upon deep learning, that is, multi-layer neural networks with many parameters, where the architecture of the network is tailored to the specific kind of task you’re trying to get the model to perform.
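As a rough illustration of “architecture tailored to the task” (a hedged sketch assuming PyTorch; the layer sizes are arbitrary): a fully-connected network treats an image as a flat vector, while a convolutional network exploits its 2D structure.

```python
import torch.nn as nn

# Generic architecture: flattens a 28x28 image into a vector of 784 numbers.
mlp = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 128), nn.ReLU(),
    nn.Linear(128, 10),
)

# Image-tailored architecture: convolutional filters scan local 2D patches.
cnn = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),             # 28x28 -> 14x14
    nn.Flatten(),
    nn.Linear(16 * 14 * 14, 10),
)
```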
If you asked me several years ago how to spin up on deep learning, I wouldn’t know what to tell you. While neural networks have been around for a while in the academic literature, deep learning didn’t really take off in popularity until around 2010, which means there wasn’t much demand for educational material until recently.
如果几年前你问我如何快速掌握深度学习,我可能不知道该如何回答。虽然神经网络在学术文献中已经存在了一段时间,但深度学习直到 2010 年左右才真正开始流行,这意味着直到最近才对教育材料产生了大量需求。
When I first got interested in deep learning, around 2013-14, the only way to spin up on it was to trudge through miscellaneous research papers, YouTube videos, and blog posts. It was extremely frustrating and inefficient.
Later in the 2010s, a number of deep learning textbooks came out – and while they definitely improved the state of deep learning education, they were seriously lacking concrete computational examples and exercises.
It was kind of like reading a long, thorough lecture, which was better than a hodgepodge of papers, but it was still pretty hard to learn from compared to, say, a well-scaffolded calculus textbook full of worked examples and hands-on exercises.
More recently, though, one deep learning textbook has caught my eye as a superb educational resource: Understanding Deep Learning by Simon J. D. Prince.
It’s freely available here, and I would highly recommend checking out this highlights reel demonstrating what makes the book so awesome.
I’ve read through a number of chapters myself. It’s a serious yet friendly textbook – remarkably detailed and full of visualizations, quick concrete algebraic/numerical examples and exercises, historical notes/references, and references to current work in the field.
Overall, an amazing resource for anyone who has foundational math chops and knowledge of classical ML, and has paid attention to deep learning making headlines through the last decade, but hasn’t kept up with all the technical details.
By the way, it’s not just me who loves this book. It blew up on HackerNews last year and has 4.8/5 stars across 97 reviews on Amazon – and if you read those Amazon reviews, it’s obvious that this book has made a tremendous positive impact on many people’s lives.
Afterwards, I would also recommend working through Jeremy Howard’s Practical Deep Learning for Coders (Part 1) on Fast.AI.
This course has a wealth of great hands-on, guided projects that go beyond textbook exercises.
Again, make sure to do the projects! It doesn’t count if you just watch the videos.
(To be clear: I would still recommend working through the exercises in the Understanding Deep Learning book so that you come into the Fast.AI projects with plenty of background knowledge.
Projects are great for pulling a bunch of knowledge together and solidifying your knowledge, but when you’re initially learning something for the very first time, it’s more efficient to do so in a lower-complexity, higher-scaffolding setting.
Otherwise, without a serious level of background knowledge, it’s easy to get overwhelmed by a project – and when you’re overwhelmed and spinning your wheels being confused without making much progress, that’s very inefficient for your learning.
By the way, this is an instance of the “expertise reversal effect” in the science of learning.)
Stage 4: Cutting-Edge Machine Learning
Finally, we reach the end destination: transformers, LLMs, diffusion models, and all the crazy stuff that’s coming out now, that captured your interest to begin with.
Guess what? You know how I said that when I first got interested in deep learning, around 2013-14, the only way to spin up on it was to trudge through miscellaneous research papers, YouTube videos, and blog posts – and it was extremely frustrating and inefficient?
Well, that’s just kind of how it goes when you reach the cutting edge of a field – and when you get to this “Stage 4,” that’s what you can expect (at least, for now, until the field matures further).
That said, even when you’re operating at the edge of human knowledge, there is still a way to optimize your approach.
I’ll quote my colleague Alex Smith, who recently posted about his own experience getting up to speed on his dissertation topic while doing a PhD in mathematics:
- "My biggest mistake when starting my doctoral research was taking a top-down approach. I focused my efforts on a handful of research papers on the frontier of my chosen field, even writing code to solve problems in these papers from day one. However, I soon realized I lacked many foundational prerequisites, making the first year exceptionally tough.
"开始我的博士研究时,我最大的错误是采用了自上而下的方法。我将精力集中在所选领域前沿的少数几篇研究论文上,甚至从第一天就开始编写代码来解决这些论文中的问题。然而,我很快意识到我缺乏许多基础先决条件,这使得第一年异常艰难。
What I should have done was spend 3-6 months dissecting the hell out of all the key research papers and books written on the subject, starting from the very basics (from my knowledge frontier) and working my way up (the bottom-up approach)."
I know “cutting-edge” might sound like a single line where you cross from reading established textbooks to reading the latest arXiv preprint that blew up yesterday…
but in reality, the cutting edge is more of a “zone” where the amount of guidance gradually fades. Just like how, on a physical knife, you typically see the blade start to narrow a couple millimeters before the true edge.
At the start of the cutting edge, there may not be any textbooks – but there are typically miscellaneous guided resources like videos, blog posts, and foundational research papers and literature reviews that are easier to follow than the latest research papers all the way at the end of the cutting edge.
For a concrete example: if you want to work on cutting-edge LLM research, don’t start by immediately trying to build on the latest paper that’s come out.
Instead, start out with a literature search and collect the important papers, videos, blog posts, etc., that will bring you from the beginning of the edge of the field to the very end.
For instance, Andrej Karpathy’s Neural Networks: Zero to Hero series on YouTube would stand firmly in that pool of resources.
So would Jeremy Howard’s From Deep Learning Foundations to Stable Diffusion on Fast.AI.
Responses to Follow-Up Questions
I received some great follow-up questions about this post after it gained some traction on X/Twitter (August 2024). Feel free to contact me if you have any additional questions that aren’t addressed here.
Don’t you need software engineering skills as well?
Yes. This roadmap is directed at people who work in software and want to get into serious ML/AI, so I’m assuming that they’re coming in with SWE skills.
But I’m sure there are also lots of people reading who don’t have SWE experience – including a “dual space” of people who have plenty of background in math but none in SWE (I’m sure plenty of math majors fall into this category).
Which brings us to the next question:
Can you also provide a roadmap for learning the fundamentals of CS and coding specifically for doing cutting-edge ML/AI research?
I’d say that if you don’t know how to do this already, the first order of business would be to implement the canonical data structures and algorithms covered in Introduction to Algorithms and Machine Learning (sorting, searching, graph traversals, etc., basically the stuff you’d see in a typical Data Structures / Algorithms course). If you work through that textbook in full, you’ll naturally cover all that stuff in addition to the core foundations of classical machine learning.
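For a sense of what those exercises involve, here’s a minimal example of one such canonical algorithm, breadth-first search over an adjacency-list graph (my own sketch, not code from the book):

```python
from collections import deque

def bfs_order(graph, start):
    # Visit nodes level by level, recording the order of first visits.
    visited, order, queue = {start}, [], deque([start])
    while queue:
        node = queue.popleft()
        order.append(node)
        for neighbor in graph[node]:
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(neighbor)
    return order

graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
print(bfs_order(graph, "a"))  # ['a', 'b', 'c', 'd']
```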
After that, I think it’s easier to pick up the rest of CS/coding as needed along the way since – at least, in my experience – it’s less hierarchical than math.
Don’t get me wrong, there are many CS/coding skills that you’ll need to learn if you don’t know them already (off the top of my head: version control, interacting with databases and GPUs, various ML libraries/frameworks), but I think these sorts of things can be picked up on the fly because, unlike in a tall hierarchy like math, it’s much easier to identify what prerequisites you’re missing whenever there’s something you don’t know how to do.
Like, let’s say you get to a stage where you need to run a ML model on a GPU, and you don’t know how to hook it up to the GPU. That’s something you’ll need to learn, but it’s not like there’s a mountain of prerequisite knowledge leading up to that task. You don’t have to know much of the underlying theory behind how GPUs work. There are frameworks like PyTorch where you can define a model in terms of tensor operations and the framework handles the process of getting it running efficiently on a GPU. You just have to point the framework to the GPU, which is pretty much just a configuration thing.
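For instance, here’s roughly what that configuration looks like in PyTorch (a minimal sketch; the model and tensor shapes are arbitrary):

```python
import torch

# Pick the GPU if one is available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = torch.nn.Linear(10, 1).to(device)  # move the model's parameters
x = torch.randn(32, 10, device=device)     # create inputs on the same device
y = model(x)                               # the computation now runs there
```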
Now, I should mention that the way I’m interpreting “ML/AI research” is that you’re focusing on building models that are new in a theoretical sense, as opposed to deploying / scaling up existing models. If you’re actually wanting to get into ML/AI deployment, or maxing out ML/AI compute power, then there’s going to be more CS/coding/software prereqs.
For instance, if you want to further improve the efficiency with which a model utilizes a GPU – like, making improvements to PyTorch itself or something like that – then obviously you’re going to have to know a lot more about low-level GPU programming. But I don’t think that stuff would really fall under the realm of “ML/AI research” proper; it would be more properly categorized as “developing supporting software for ML/AI” or something like that.
So, you don’t need to know graduate-level math for deep learning?
That’s right, the necessary math would be classified as undergraduate level. Now, I should point out that there are some theoretical ML researchers who are trying to build unifying theories of deep learning, and they sometimes use more advanced/abstract math in their theoretical frameworks than what I’ve talked about here. But that’s a separate thing.
In general, why are there so many arguments about how much math you need to learn for ML?
Yeah, the question of “how much math do I need for ML/AI” can be a bit polarizing: some people say “no math needed” while others say “you need everything including axiomatic set theory,” but really the most generally correct answer is somewhere in the middle.
是的,"机器学习/人工智能需要多少数学"这个问题可能有点两极化:有人说"不需要数学",而其他人说"你需要学习包括公理化集合论在内的所有数学",但实际上最普遍正确的答案介于两者之间。
And for any sort of argument where a nuanced middle ground has to be established, it seems necessary to explicitly lay out what the specific middle ground is – what should be included, what shouldn’t be included, and why. So that’s what I’ve tried to do here.
Do you think in some way the sheer number of things to study from the get go might be somewhat of a limiting factor, or discourage people? How should one balance just deep studying while also satisfying curiosity and keeping a pulse on the cutting-edge?
Yeah, I completely agree that the sheer mountain of work ahead of you can be discouraging, so it is definitely worth doing things to keep your interest alive: reading some articles about the cutting edge, toying around with new machine learning models/libraries, etc.
At a fundamental level, playing around is not as efficient for learning as actually building up your hard skills through deliberate practice. So you can’t spend all your time tinkering and expect to reach a high level of skill.
However, if you burn yourself out and lose motivation to engage in deliberate practice, and you just stop… then that’s even worse.
I think that ideally, you’d want to play around just enough to keep yourself motivated to keep building up your hard skills through deliberate practice, and you’d spend the rest of the time actually doing the latter.
To zoom out and drive this point home, I want to point out the common misconception that ability is something to be “unlocked” by curiosity (which seems easy), not something “built” by deliberate practice (which seems hard).
This misconception sounds so ridiculous when you imagine it coming from an athletic trainer: “You want to get really good at basketball? Forget about practice drills – you were born to ball; all you need to do to unlock your inner baller is come in with the right attitude and play some pick-up ball at the park.”
Now, I’m not against curiosity/interest. That’s not what I’m trying to say at all. But curiosity/interest does not itself build ability. Curiosity/interest motivates people to engage in deliberate practice, which is what builds ability.
I’m not saying curiosity/interest doesn’t help, I’m just saying it’s not what moves the needle directly. Deliberate practice is what moves the needle directly. Curiosity/interest “greases the wheels,” so to speak, but it’s not what actually moves the wheels.