Perspective ∙ Volume 112, Issue 5, p698-717 ∙ March 06, 2024 ∙ Open access

Data science opportunities of large language models for neuroscience and biomedicine

Danilo Bzdok (1,3; corresponding author: danilobzdok@gmail.com), Andrew Thieme (2), Oleksiy Levkovskyy (2), Paul Wren (2), Thomas Ray (2), Siva Reddy (1,4,5)

Affiliations:
1. Mila - Quebec Artificial Intelligence Institute, Montreal, QC, Canada
2. Mindstate Design Labs, San Francisco, CA, USA
3. The Neuro - Montreal Neurological Institute (MNI), Department of Biomedical Engineering, McGill University, Montreal, QC, Canada
4. Facebook CIFAR AI Chair
5. ServiceNow Research

Abstract

Large language models (LLMs) are a new asset class in the machine-learning landscape. Here we offer a primer on defining properties of these modeling techniques. We then reflect on new modes of investigation in which LLMs can be used to reframe classic neuroscience questions to deliver fresh answers. We reason that LLMs have the potential to (1) enrich neuroscience datasets by adding valuable meta-information, such as advanced text sentiment, (2) summarize vast information sources to overcome divides between siloed neuroscience communities, (3) enable previously unthinkable fusion of disparate information sources relevant to the brain, (4) help deconvolve which cognitive concepts most usefully grasp phenomena in the brain, and much more.

Introduction

Language has more human information per bit than potentially any other form of data. Natural language processing (NLP) to analyze human text has come a long way. In the early days, simple language models like n-gram models (e.g., a 2-gram treats word-word combinations as unique entities) were used to study language and semantics with various goals. Language models have at times also been used to study various cognitive tasks like reading comprehension, language translation, and question answering. Researchers compared the performance of NLP models on these tasks with human performance to gain insights into human cognition, such as in the field of psycholinguistics.
The rise of deep learning after ∼2010 ignited the era of semantic “embeddings” in NLP modeling: single words, sentences, paragraphs, or entire documents could be encapsulated in a compact float-vector format that denotes meaning. Intuitively, such embeddings can be thought of as locations in a high-dimensional coordinate system that enable mapping of semantic entities (word sequences) relative to their contextual similarity.[1,2,3,4] The more two semantic entities denote similar contexts, the more similar their semantic embeddings will be. Using last-generation models like Word2Vec[5] and GloVe,[6] researchers started to use these interoperable semantic embedding representations to quantify the relationships in meaning, such as between words or sentences.

1. Mikolov, T., Sutskever, I., Chen, K., et al. Distributed representations of words and phrases and their compositionality. Adv. Neural Inf. Process. Syst. 2013; 26.
2. Le, Q., Mikolov, T. Distributed representations of sentences and documents. PMLR. 2014; 32:1188-1196.
3. Conneau, A., Kiela, D., Schwenk, H., et al. Supervised learning of universal sentence representations from natural language inference data. Preprint at arXiv. 2017.
4. McCann, B., Bradbury, J., Xiong, C., et al. Learned in translation: Contextualized word vectors. Adv. Neural Inf. Process. Syst. 2017.
5. Mikolov, T., Chen, K., Corrado, G., et al. Efficient estimation of word representations in vector space. Preprint at arXiv. 2013.
6. Pennington, J., Socher, R., Manning, C.D. Glove: Global vectors for word representation.
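To make the notion of embeddings concrete, the following minimal sketch compares pre-trained word vectors by cosine similarity. It assumes the gensim library and its downloadable 50-dimensional GloVe vectors ("glove-wiki-gigaword-50"); the example words are arbitrary.

# Minimal sketch: word embeddings as points in a semantic coordinate system.
import numpy as np
import gensim.downloader as api

vectors = api.load("glove-wiki-gigaword-50")  # KeyedVectors: word -> 50-d float vector

def cosine(u, v):
    # Cosine similarity: higher values mean more similar usage contexts.
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Words that occur in similar contexts receive similar embeddings.
print(cosine(vectors["neuron"], vectors["synapse"]))   # relatively high
print(cosine(vectors["neuron"], vectors["banana"]))    # relatively low
# Nearest neighbors in embedding space recover semantic relatedness.
print(vectors.most_similar("cortex", topn=5))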
Current LLMs are trained on more text than one human could read in hundreds or thousands of lifetimes. This allows them to perform feats like writing computer programming code, mathematics, planning, literature reviews and summations, or playing text-based games—many forms of emergent capabilities that even their developers did not anticipate.[7] Sometimes such models are used to study how the brain itself processes contextual information and how the human mind generates language (please see Goldstein et al.,[8] Caucheteux et al.,[9] and Schrimpf et al.[10] for excellent examples), although this will not concern us here. As the current paradigm shifts and trends exponentially, LLMs learn what are probably the most powerful internal representations of meaning to date.

7. Bubeck, S., Chandrasekaran, V., Eldan, R., et al. Sparks of artificial general intelligence: Early experiments with gpt-4. Preprint at arXiv. 2023.
8. Goldstein, A., Zada, Z., Buchnik, E., et al. Shared computational principles for language processing in humans and deep language models. Nat. Neurosci. 2022; 25:369-380.
9. Caucheteux, C., Gramfort, A., King, J.-R. Evidence of a predictive coding hierarchy in the human brain listening to speech. Nat. Hum. Behav. 2023; 7:430-441.
10. Schrimpf, M., Blank, I.A., Tuckute, G., et al. The neural architecture of language: Integrative modeling converges on predictive processing. Proc. Natl. Acad. Sci. USA. 2021; 118, e2105646118.
Human language mirrors human thought, which is why state-of-the-art NLP is likely to provide us with inherent advantages. In this perspective, we attempt to discuss impending implications for investigators in neuroscience and biomedicine.

Data science perspective on large language model solutions

Historically, convolutional deep neural networks have revived AI excitement since 2010–2012, and LLMs are currently fueling yet another wave of momentum in the AI ecosystems. Language modeling has recently made substantial leaps forward following the introduction of the transformer architecture (for example, Vaswani et al.[11] was cited >90,000 times in the first 5 years after publication), driving the current thrust in AI innovation. For example, GPT-2 consists of 24 transformer blocks and recent architectures are even deeper (some of which have not been disclosed). As an instance of what has recently been called “generative AI” (gen AI), the outcome from these algorithms is not a class (e.g., patients with disease versus healthy group), number (e.g., cognitive performance measure), or discrete category (e.g., brackets of yearly income), but a structured “content” like language (as well as images or audio formatted information), i.e., synthesizing or fantasizing new content from previously ingested content.

11. Vaswani, A., Shazeer, N., Parmar, N., et al. Attention is all you need. Adv. Neural Inf. Process. Syst. 2017; 30.
Doing away with much of the complexity in the previous deep-neural-network generation, transformers have become state-of-the-art in NLP (Figures 1 and 2). This simplified architecture is much more scalable than its predecessors, partly because this modeling setup also lends itself well to parallelization of computation workflows. In contrast to previous deep NLP solutions, in transformer architectures, the interdependencies between word tokens, close or far, are equally well captured. Also departing from certain previous neural network design, transformer models are feed-forward deep learning architectures that do not include explicit loops of processing. Instead, there are implicit loops created by contextualization of the already generated, previous text fed back into the LLM as input (“autoregression”). Unlike the previous LLM generations revolving around BERT (bidirectional encoder representations from transformers; using a word-masking logic as the modeling objective), generative pre-trained transformer (GPT) architectures, such as ChatGPT, allocate attention only to the left-hand reading context or previous word tokens during training, which results in its unidirectional mode of processing, specifically autoregressive in nature. So-called position encoding is a feature of these architectures that quantifies a notion of where a word appears in a sentence (not just an unsorted “bag of words”). In other words, GPT variants predict the next word token in a sequence based on the preceding word tokens. Because of its unidirectional nature, GPT LLMs do not “see” or “consider” subsequent tokens when predicting the next word—it is looking to the past, not the future, of a given sentence (with analogy to how a human reads a book).
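The next-word logic described above can be made concrete with a short sketch of greedy autoregressive generation. It assumes the Hugging Face transformers library with the publicly released GPT-2 checkpoint; the prompt is an arbitrary example.

# Sketch of left-to-right, token-by-token generation (greedy decoding).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

ids = tokenizer("The hippocampus supports", return_tensors="pt").input_ids
for _ in range(20):                      # generate 20 tokens, one at a time
    with torch.no_grad():
        logits = model(ids).logits       # scores over the vocabulary at each position
    next_id = logits[0, -1].argmax()     # only the last position matters: the next word
    ids = torch.cat([ids, next_id.view(1, 1)], dim=1)  # feed the output back in ("autoregression")
print(tokenizer.decode(ids[0]))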
Figure 1 The heart of the self-attention mechanism in transformer architectures like LLMs
Figure 2 The role of the self-attention layer in the transformer neural network architecture
It is the self-attention mechanism that is at the heart of transformer-augmented and GPT-like modeling architectures. This feature allows the model to assign varying degrees of importance to different segments of the input text sequence based on different “attention scores.” Dedicating attention to closer versus further-away word tokens turns out to be algorithmically identical—no iterative, step-by-step process is needed to involve information segments further away, as is required in earlier generations of deep learning architectures. In transformers, the mechanism of focusing on nearby or distant words in a sentence is handled in the same way, allowing the model to consider all parts of a sentence or text sequence simultaneously. In contrast to earlier neural networks, this does not require a sequential, “time-step” approach to process and integrate distant ends of the input. The computational complexity of the common implementation of the self-attention mechanism scales quadratically with sequence length.[12] Despite various improvements in attention mechanisms, most of them still struggle in certain use cases involving especially long sequences.[13] Each transformer layer can “see” all tokens in its scope at once. Nevertheless, the depth of recursive information processing is limited by the number of consecutive transformer layers, such as in levels of nested meaning of sentences or nested multiplication of number sequences.

12. Hassid, M., Peng, H., Rotem, D., et al. How much does attention actually attend? Questioning the Importance of Attention in Pretrained Transformers. Preprint at arXiv. 2022.
13. Tay, Y., Dehghani, M., Abnar, S., et al. Long range arena: A benchmark for efficient transformers. Preprint at arXiv. 2020.
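As a minimal illustration of the mechanism discussed above, the following NumPy sketch implements (causal) scaled dot-product self-attention on a toy input. The weight matrices are random stand-ins for learned parameters, and the quadratic cost in sequence length is visible in the token-by-token score matrix.

# Minimal NumPy sketch of (causal) scaled dot-product self-attention.
import numpy as np

def self_attention(X, W_q, W_k, W_v, causal=True):
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    scores = Q @ K.T / np.sqrt(K.shape[-1])         # (n_tokens, n_tokens): quadratic in length
    if causal:
        # GPT-style models attend only to the current and earlier tokens.
        mask = np.triu(np.ones_like(scores, dtype=bool), k=1)
        scores = np.where(mask, -np.inf, scores)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # attention scores sum to 1 per token
    return weights @ V                              # each output mixes all visible tokens

rng = np.random.default_rng(0)
n_tokens, d = 6, 8
X = rng.normal(size=(n_tokens, d))                  # one embedding vector per token
out = self_attention(X, *(rng.normal(size=(d, d)) for _ in range(3)))
print(out.shape)  # (6, 8): one contextualized vector per input token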
Moreover, current LLM architectures typically have several parallel attention mechanisms lined up in each of the consecutive transformer layers. This “multi-headed attention” (1) enables placement of a simultaneous parallel focus on several different aspects of the input sequence, expanding the breadth of complexities that can be captured overall, and (2) thus allows for several dimensions of semantic representation to be identified and extracted all at once (with some analogy to modeling different latent factor components like principal component analysis or autoencoder neural networks).[14]

14. Bzdok, D., Yeo, B.T.T. Inference in the age of big data: Future perspectives on neuroscience. Neuroimage. 2017; 155:549-564.

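A compact sketch of the multi-head idea, under the same toy assumptions as above (random weights standing in for learned parameters, NumPy only): the model dimension is split across several heads that attend in parallel, and their outputs are concatenated.

# Sketch of "multi-headed" attention via head splitting and re-joining.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def multi_head_attention(X, n_heads):
    n_tokens, d_model = X.shape
    d_head = d_model // n_heads
    rng = np.random.default_rng(0)
    outputs = []
    for _ in range(n_heads):
        W_q, W_k, W_v = (rng.normal(size=(d_model, d_head)) for _ in range(3))
        Q, K, V = X @ W_q, X @ W_k, X @ W_v
        A = softmax(Q @ K.T / np.sqrt(d_head))  # this head's own attention pattern
        outputs.append(A @ V)                   # (n_tokens, d_head)
    # Concatenating the heads restores the model dimension; a final learned
    # linear map (omitted here) would normally mix the heads.
    return np.concatenate(outputs, axis=-1)     # (n_tokens, d_model)

X = np.random.default_rng(1).normal(size=(6, 8))
print(multi_head_attention(X, n_heads=2).shape)  # (6, 8)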
A notable and practically relevant choice is the temperature parameter (a positive scalar value). This hyperparameter controls the degree of “creativity” in the model outputs, as a form of calibrating exploration versus exploitation. Setting a high temperature (e.g., >1) produces a softer probability distribution over candidate next words in the last model layer. This leads to intentionally more fuzzy, and thus potentially less accurate but also more creative, outputs. In contrast, a low temperature (e.g., <1) leads to a sharper probability distribution over output word relevances. In this mode of operation, the model becomes more deterministic, sticking closely to the most probable candidates in the output distribution, thereby reducing stochasticity in its responses.
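A small sketch of how the temperature parameter reshapes the output distribution; the logits are invented values standing in for the scores a model assigns to candidate next tokens.

# Temperature-scaled softmax over hypothetical next-token scores.
import numpy as np

def next_token_distribution(logits, temperature):
    scaled = np.asarray(logits, dtype=float) / temperature
    probs = np.exp(scaled - scaled.max())
    return probs / probs.sum()

logits = [4.0, 3.0, 1.0, 0.5]                    # hypothetical scores for four candidate tokens
print(next_token_distribution(logits, 0.5))      # low temperature: sharp, near-deterministic
print(next_token_distribution(logits, 1.0))      # plain softmax
print(next_token_distribution(logits, 2.0))      # high temperature: flatter, more "creative"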
Despite simple modeling objectives (e.g., BERT invoking word masking and GPT-3 invoking next-word prediction, while involving human feedback in the case of GPT-4/ChatGPT), by their sheer enormity, transformer-endowed architectures have sparked few-shot learning (i.e., learning based on few examples from a target task) and performance paradigms, inherently deriving aspects of a semantic world model in multiple settings.[15] These capabilities are at the core of the self-supervised modeling regime (cf. next sections). These secondary consequences led even the creators of these models to struggle for explanations behind the successes of LLMs.[16]

15. Wei, J., Tay, Y., Bommasani, R., et al. Emergent abilities of large language models. Preprint at arXiv. 2022.
16. OpenAI. GPT-4 Technical Report. Preprint at arXiv. 2023.


Emerging scaling laws of large language model solutions

What are the limits of scale? As a key driver of impact, LLMs rapidly yield higher-quality model instances with an increasing number of training observations. Having roughly 2–20 times more training word tokens than model parameters for model fitting has led to impressive performances on multiple occasions. From a data perspective, it is challenging to get a sense of the upper bound on available text, text-transformed, and text-transformable data. As one concrete consideration, the size of the entire text volume on the internet may today reach around 2 trillion word tokens, based on simple normative assumptions (1.2 billion websites × 1,500 words per website on average [according to ChatGPT query, September 2023]). From a model perspective, from 2018 to 2022, the sizes of LLMs have increased from ∼10^8 (e.g., ELMo, BERT-L) to ∼10^11 (e.g., PaLM) parameters to be estimated. As a first rule of thumb holding across many goals and applications, expanding the depth and width of the model (increasing the number of parameters) leads to clear performance improvements. Knowing how a model scales is of strategic value, as such insights inform decisions on resource allocation: how to prioritize compute budget, data troves, and model size.
More specifically, one comprehensive, widely regarded empirical study in the deep learning literature explored and carefully benchmarked seven orders of magnitude of model scale.[17] These investigators designed computational experiments that successfully converged on three key factors that determine model scaling: (1) the number of model parameters (N), (2) the amount of available data (D), and (3) the amount of available computation power (C) used for model estimation. In stark contrast, in these experiments, the model performance only mildly depended on the actual shape of the model architecture. Overfitting (i.e., over-adjustment to idiosyncrasies in the training data) appeared to be largely prevented by increasing N and D in parallel. In contrast, performance decay resulted if only N or only D was increased (but see Touvron et al.[18]), holding the respective other factor fixed. Finally, continued scale up of N, D, and C displayed patterns of diminishing returns, following a power law.

17. Kaplan, J., McCandlish, S., Henighan, T., et al. Scaling laws for neural language models. Preprint at arXiv. 2020.
18. Touvron, H., Lavril, T., Izacard, G., et al. Llama: Open and efficient foundation language models. Preprint at arXiv. 2023.
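The power-law behavior can be illustrated by fitting a saturating scaling curve to loss measurements across model sizes; the sketch below uses synthetic numbers purely for illustration and assumes SciPy for the curve fit.

# Fitting an empirical scaling curve, loss(N) = a * N^(-alpha) + c, to synthetic data.
import numpy as np
from scipy.optimize import curve_fit

def scaling_law(N, a, alpha, c):
    return a * N ** (-alpha) + c

# Synthetic observations standing in for measured validation losses at different model sizes N.
N = np.array([1e6, 1e7, 1e8, 1e9, 1e10])
loss = scaling_law(N, a=50.0, alpha=0.3, c=1.7) + np.random.default_rng(0).normal(0, 0.01, N.size)

params, _ = curve_fit(scaling_law, N, loss, p0=(10.0, 0.5, 1.0),
                      bounds=([0, 0, 0], [np.inf, 1.0, np.inf]))
a_hat, alpha_hat, c_hat = params
print(f"fitted exponent alpha ~ {alpha_hat:.2f}, irreducible loss ~ {c_hat:.2f}")
# Diminishing returns: each 10x increase in N shrinks the reducible loss by a constant factor.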
However, as a very recent development, rather than continuing the initial trend of increasing model size, LLMs have been shrunk back more and more in terms of parameter number.[18,19] Counterintuitively to many investigators, reducing the model size again, potentially aligning better with the actual amount of available data, boosted the model performance, loosened up memory requirements, and relieved the computational cost. These improvements may turn out to be critical for the applicability of LLM solutions to real-world problems and increase the potential, for example, of smart phones carrying dedicated LLMs in the years to come. In short, a nascent research stream indicates that more data are, relatively, more important than larger model sizes in terms of parameters, while both are driving factors, each by itself.

19. Hoffmann, J., Borgeaud, S., Mensch, A., et al. Training compute-optimal large language models. Preprint at arXiv. 2022.
Of note, measuring model performance critically depends on the investigator’s choice of evaluation metric.[20] These authors argue that changes in LLM behavior deemed “emergent” (abilities that are not readily apparent in smaller-scale models but are present in large-scale models[15]) may become apparent only due to researchers’ choice of certain evaluation metrics. Conversely, the authors[20] also showed that metric choice can induce seeming emergent abilities in diverse architectures and tasks. Hence, recent empirical investigations show[20] that changing metrics can weaken or strengthen signs of emergent abilities in LLM architectures as a function of model scale, with direct implications for AI safety and AI alignment.

20. Schaeffer, R., Miranda, B., Koyejo, S. Are emergent abilities of Large Language Models a mirage? Preprint at arXiv. 2023.
Overall, larger LLMs turn out to be more sample efficient than smaller LLMs in the fine-tuning or few-shot learning scenario. That is, paradoxically, the more model parameters need to be estimated, the fewer input data points are needed to achieve comparable performance. As in data science in general, increased data quality can always lead to further performance gains. Although it is important to acknowledge that neural network scaling laws are still almost entirely empirical at this point, these scaling behaviors show robust trends (but see Caballero et al.[21]). The expansion and explosion of LLM architectures were fueled by (1) the invention of transformers, which tend to vary only slightly across recent LLMs, (2) availability of abundant data sources, and (3) availability of compute power at scale. Of relevance to the next section, the specific architecture of the model (number of layers, layer dimension, etc.) is relatively inconsequential, particularly as the model size increases.

21. Caballero, E., Gupta, K., Rish, I., et al. Broken neural scaling laws. Preprint at arXiv. 2022.

Large language models exhibit unprecedented transfer learning capabilities

For deep learning tools to thrive, there is commonly a need for data abundance. However, many areas of neuroscience do not have massive data troves readily available, let alone the internet-scale kinds of datasets that fuel text and image analysis in the AI community. This discrepancy begs the tactical question: what kinds of abundant non-neuroscience data can be leveraged to port modeling solutions over to revisit and attack neuroscience problems?
Intuitively, “transfer learning” is a mode of data analytics that revolves around storing structured knowledge gained while solving one problem and applying it to a different but related problem. Transfer learning aims to improve performance on a similar, often more constrained, modeling task that is typically (severely) under-resourced. In the context of deep learning, this modeling regime typically refers to the practice of pre-training a model on a massive dataset as a starting point and then applying or refining (“fine-tuning” its model parameters by slight adjustments) this model on a smaller dataset pertaining to a specific task of actual interest (please see https://www.ruder.io/transfer-learning/ for a comprehensive source on fine-tuning techniques for LLMs). This agenda cashes in on the hypothesis that the features learned by the pre-trained model can serve as a general representation, beneficial for the target task. Historically, the success of transfer learning typically depended on a high degree of similarity between the pre-training and fine-tuning tasks (but see next section).
LLMs, and other transformer-carrying architectures, have shown beyond-expectations capability in transfer learning, thus revolutionizing NLP by expanding the scope of executable tasks. As a key inflection point, until recently, the dominant paradigm still consisted of supervised model pre-training on massive corpora. This requirement of large quantities of data points with high-quality annotations was vexing. High-quality labels are typically logistically challenging to obtain—severely limiting what kinds of data available on the internet and other sources could actually be used for effective pre-training and thus transfer learning. It is only now that unsupervised pre-training, which does not require accurate annotations for each data point, came into reach and generated previously unseen performance. This watershed event considerably expands the scope of data usable for pre-training of LLMs.
More formally, the more parameters that need to be estimated in an LLM, the slower the model development process. LLMs opened the door to new regimes of fine-tuning going beyond what pattern-learning algorithms could achieve before. Several approaches have been proposed to adapt a model to a new task while only updating or adding a relatively small number of parameters. One tactic consists of “freezing” (leaving unchanged) the parameters of several layers of a pre-trained LLM. This approach then adapts only a small fraction of adjustable parameters for the downstream task, thus avoiding “(catastrophic) forgetting” of originally extracted knowledge encapsulated in the initial model instance.
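A minimal sketch of this freezing tactic, assuming PyTorch and the Hugging Face transformers library with a standard BERT checkpoint; the choice of which layers to un-freeze is an illustrative assumption.

# Sketch of parameter "freezing": adapt only the top layers plus the task head.
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

# 1) Freeze every parameter of the pre-trained backbone.
for param in model.base_model.parameters():
    param.requires_grad = False

# 2) Un-freeze only the last two transformer layers; the new classification head stays trainable.
for layer in model.base_model.encoder.layer[-2:]:
    for param in layer.parameters():
        param.requires_grad = True

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable parameters: {trainable:,} of {total:,}")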
This logic can be extended during fine-tuning by adding new learnable layers within the LLM. Such “adapters”[22,23,24] can considerably reduce training time and compute costs on the target task. The selection of particularly high-quality data for the fine-tuning phase was shown to lead to competitive performance, with even fewer target task examples. LLMs proved remarkable in few-shot learning. At its extreme, zero-shot learning leveraging pre-trained LLMs (using a trained LLM on a new task, without providing examples for that new task) turned out to be proficient at a variety of downstream tasks out of the box, that is, even without adjustment of the pre-trained model.[25,26]

22. Houlsby, N., Giurgiu, A., Jastrzebski, S., et al. Parameter-efficient transfer learning for NLP. PMLR. 2019; 97:2790-2799.
23. Pfeiffer, J., Rücklé, A., Poth, C., et al. Adapterhub: A framework for adapting transformers. Preprint at arXiv. 2020.
24. Bapna, A., Arivazhagan, N., Firat, O. Simple, scalable adaptation for neural machine translation. Preprint at arXiv. 2019.
25. Radford, A., Wu, J., Child, R., et al. Language models are unsupervised multitask learners. OpenAI blog. 2019; 1:9.
26. Brown, T., Mann, B., Ryder, N., et al. Language models are few-shot learners. Adv. Neural Inf. Process. Syst. 2020; 33:1877-1901.

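As a concrete illustration of zero-shot use, the sketch below classifies a sentence against candidate labels supplied only at inference time. It assumes the Hugging Face transformers pipeline with the commonly used facebook/bart-large-mnli checkpoint; the example text and labels are invented.

# Zero-shot labeling: no task-specific training examples, only candidate labels at inference.
from transformers import pipeline

classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

text = ("After the second dose, the participant reported vivid geometric imagery "
        "and a pronounced sense of time dilation.")
candidate_labels = ["visual effects", "time perception", "mood elevation", "nausea"]

result = classifier(text, candidate_labels, multi_label=True)
for label, score in zip(result["labels"], result["scores"]):
    print(f"{label}: {score:.2f}")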
In short, the monstrosity of LLMs, encompassing billions of adjustable model parameters, unlocked the extraction of quintessential representations from massive text corpora, without the previously acute need for supervised label annotations. Unsupervised deep learning turned out to be much more scalable in practice. Hence, neuroscientists at organizations without the means to train LLMs from scratch can still benefit from state-of-the-art performance by refining already pre-trained models on target tasks of primary interest, with reduced data and compute budget requirements. LLMs can thus better identify deep hidden patterns, relationships, and context within text. This led to the capability of responding to human queries, generating creative novel content, and forming accurate outcome predictions.

Foundation models as computational LEGO bricks

Paradigmatically, LLMs are trained initially on massive text corpora, such as internet content and other public or private sources. This leads the model to develop and instantiate a general internal representation of semantic meaning, even across different languages, including syntax and grammar, although it is a matter of current debate to what extent LLMs develop an understanding of meaning.[20,27,28] Going much beyond that, the model learns an efflorescence of general facts, certain apparent reasoning abilities, and, possibly, a semantic world representation. The evolution of foundation models can perhaps be traced back to last-generation NLP models, before the transformer era (2017-), like Word2Vec[5] and GloVe,[6] which expressed words in continuous vector spaces (cf. introduction), hinting at the universality of spanned semantic spaces.

27. Xiang, J., Tao, T., Gu, Y., et al. Language Models Meet World Models: Embodied Experiences Enhance Language Models. Preprint at arXiv. 2023.
28. Berglund, L., Tong, M., Kaufmann, M., et al. The Reversal Curse: LLMs trained on "A is B" fail to learn "B is A". Preprint at arXiv. 2023.
By distilling and assimilating the quintessence from disparate expansive sources, a general-purpose representation is formed that encapsulates vast, compact, and dense human knowledge, as a form of prior for downstream modeling. It is not just memorization but information extraction and structurization. Philosophically, such a successful compression of information can mark a milestone toward refined predictions, in that successful prediction indicates a form of information compression. Similar to a shared infrastructure or platform, such an AI engine can then act as the bedrock on which a variety of tasks can be built, making many quantitative modeling workflows feasible, efficient, and scalable. These “LEGO bricks” can be thought of as foundations because many downstream applications can be constructed on top of them, like stacking building blocks. This fresh attitude to quantitative modeling is the strict opposite of training specialized models for deployment in narrow tasks.
It is possible to use thousands of GPUs training an LLM for weeks on trillions of word tokens, with a result that can be stored and deployed on a smartphone. As a crucial consequence for the future, foundational modeling frameworks provide universal computational units that will potentially democratize access to high-quality AI solutions across broad categories of investigators. This is all the more important for the neurosciences, because investigators tend to operate on smaller datasets than those in the core machine learning community. Similarly in biological research, even the Human Cell Atlas project has produced gene expression data for “only” ∼40 million human cells from ∼6,000 donor individuals at the time of this writing.
Bold innovation will emerge from creative ideas on how to put these baseline operation systems to use, to revisit and tackle classic research questions—applications that were categorically unthinkable and infeasible before current NLP technology. Enabling researchers across diverse domains to bootstrap common building blocks may also help boost comparability across studies and foster collaboration across teams, institutions, and geographies. The fruit of deep learning breakthroughs will be increasingly accessible in always more resource-constrained settings. It is likely that foundation models will change the face of bioinformatics in neuroscience and biomedicine in the near future.

Large language models for biological sequences

The inductive biases of LLM learning engines immediately appear appropriate not only for word sequences, but also for different kinds of biological sequences, presenting many unexplored opportunities. The human genome encodes for ∼20,000 genes, the segments of DNA that form the basis for protein synthesis in cells of the brain and other body parts. A natural proving ground, with direct relevance to the neurosciences, is the “central dogma of biology:” the one-directional flow of genetic information from (1) nucleotide sequences in DNA to (2) base sequences in messenger RNA to (3) amino acid sequences in protein products.
As a principal goal, geneticists wish to map this progression of genetic information, to link alterations in the DNA sequence itself to corresponding functional outcomes. To that end, Meta AI has presented a protein language model (Figure 3) that predicts phenotypic consequences from differences in genetic variants.[29] A 650-million-parameter model was used to infer the totality of ∼450 million possible missense variant effects in the human genome—each a switch in a single DNA nucleotide that leads to an amino acid swap in the downstream protein (pathogenic or benign). Such variants in DNA gene encoding are of special interest since they entail protein alterations that can be linked to disease mechanisms and possible therapeutic targets. Such approaches enable an exhaustive profiling of protein-disrupting damaging variants across the entire genome in humans and other organisms.

29. Brandes, N., Goldman, G., Wang, C.H., et al. Genome-wide prediction of disease variant effects with a deep protein language model. Nat. Genet. 2023; 55:1512-1522.
Figure 3 Protein language models to predict the functional consequences of genetic variants
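The log-likelihood-ratio idea behind such variant-effect scoring can be sketched as follows. The snippet assumes the fair-esm package and its pre-trained ESM-2 650M checkpoint; the sequence and the chosen substitution are toy examples, and the published scoring recipe differs in detail.

# Schematic variant scoring with a protein language model: compare the model's
# probability of the mutant versus wild-type amino acid at one position.
import torch
import esm

model, alphabet = esm.pretrained.esm2_t33_650M_UR50D()
model.eval()
batch_converter = alphabet.get_batch_converter()

wild_type = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"    # toy sequence for illustration
position, mut_aa = 10, "P"                        # hypothetical missense substitution
wt_aa = wild_type[position]

_, _, tokens = batch_converter([("protein", wild_type)])
with torch.no_grad():
    logits = model(tokens)["logits"]              # (1, seq_len + special tokens, vocab)
log_probs = torch.log_softmax(logits, dim=-1)

# Token 0 is a beginning-of-sequence marker, so residue i sits at index i + 1.
score = (log_probs[0, position + 1, alphabet.get_idx(mut_aa)]
         - log_probs[0, position + 1, alphabet.get_idx(wt_aa)])
print(f"{wt_aa}{position + 1}{mut_aa}: log-likelihood ratio = {score.item():.2f} "
      "(more negative suggests a more damaging substitution)")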
Can we automatically derive insights into the underlying cellular states and active biological pathways from RNA transcript expression data alone? At the level of single-cell RNA gene expression, an LLM was trained on 10 million cells (Figure 4), each cell containing expression values for a fraction of the approximately 20,000 human genes.[30] As a seminal example of a foundation model (cf. above), gene sets are modeled as making up biologically meaningful processes, analogous to how word sets make up meaningful sentences in language. By ingesting a mass of gene expression patterns, the model formed an internal representation of general principles of gene-gene relations and gene-cell relations. In addition to gene-specific tokens, special tokens were introduced to denote meta-information such as cell type, data batch, and experimental conditions like perturbations of signaling pathways and techniques used for RNA transcript sequencing. The authors also abolished the need for the input to be a sequence: they designed a mission-tailored attention mechanism to get a tight grip on cohesive co-occurrence regimes of expressed genes, akin to auto-regressive generation, based on iteratively predicting expression of new genes in sets, akin to the next-word prediction goal in sentence sequences. That is, the authors recast the inductive bias from the usual sequence-of-words-in-a-sentence logic into a bag-of-genes-in-a-cell logic to avoid a strict sequence requirement. Once established, the trained foundational LLM could then be fine-tuned and deployed with performance gains in a variety of different downstream tasks, including nuisance batch correction, cell type annotation, and prediction of targeted perturbation conditions. Such approaches show potential for leveraging self-supervised learning techniques to grasp complex single-cell mechanisms and use the ensuing internal embedding representations for integration across different organs and species.

30. Cui, H., Wang, C., Maan, H., et al. scGPT: Towards Building a Foundation Model for Single-Cell Multi-omics Using Generative AI. Preprint at bioRxiv. 2023.
Figure 4 Creating a foundation model of the “grammar” of transcriptome biology from exponentially growing single-cell genomics data
Going from the gene level to the level of 3D protein structure requires prediction of the ultimate 3D configuration from 1D amino acid sequences alone. The “protein folding problem” revolves around how information in our DNA compresses information about final protein forms. With >200 million protein structures in the database, AlphaFold2[31] is based on LLMs to capture protein sequence interactions between amino acid residues that are far away from each other along the protein backbone. In a brute-force shotgun learning approach, the authors showed that 1D sequence information does contain key information necessary to understand the complex process of how proteins actually fold in nature. At the protein-to-function level,[32] investigators trained 700-million-parameter 34-layer transformer models on 86 billion amino acids across 250 million protein sequences (UniParc database). The model-internal embedding representations were gleaned from just the sequence information itself. The trained model was found to instantiate knowledge relevant to the ultimate protein’s biochemical properties, elements of morphological structure in vivo, contact sites, and biological activity.

31. Jumper, J., Evans, R., Pritzel, A., et al. Highly accurate protein structure prediction with AlphaFold. Nature. 2021; 596:583-589.
32. Rives, A., Meier, J., Sercu, T., et al. Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences. Proc. Natl. Acad. Sci. USA. 2021; 118, e2016239118.
Taken together, capturing long-range interactions (i.e., tokens far apart from each other in the input sequence) turns out to be valuable to derive meaningful general principles not only in word sequences but also in different biological sequences. Nature appears to harbor underlying general rules that can now be exploited for extrapolating beyond actual sequence elements (e.g., nucleic acid, gene expression, amino acids) in the service of next-generation computational biology. The learned sequence embeddings can be used for various downstream research goals, including quality control procedures, grouping of biological entities, and enhancing phenotype predictions.
Moreover, LLMs serve as a platform that now enables advanced in silico models of the central dogma of biology (going from the DNA double helix to gene transcript expression to fully formed proteins). That is, once the LLM can accurately approximate the target system, reiterating trusted observations from previous rigorous experiments, investigators will be able to interrogate the LLM to extract new molecular insights about the target system and to identify broader driving biological mechanisms. We caution against drawing overly strict parallels between semantic language systems and molecular biology systems, given notable differences. Nevertheless, in the future, LLMs are in a unique position to help generate new sequences, never observed in the wild, that are biologically active.

Large language models for automated annotation

Neuroscience research often relies on accurate annotation for data elaboration, designing experiments, or interpretation of results. A recent study using classical NLP[33] explored links between brain response signals of subjects watching the movie Forrest Gump and the evolution of the movie story, that is, the constituent semantic facets that make up the film narrative (Figure 5). This study serves as a prime example of research that depends on pertinent high-quality annotations of data points. Regarding the brain recordings collected from the studyforrest database (https://www.studyforrest.org/data.html), 3,000 individual images of whole-brain neural activity were acquired from each of 15 subjects as they watched the story in the 2-h film unfold (25 such images per minute). To enrich the provided dataset, all of the scenes throughout the movie were enriched by computationally derived meta-information. To this end, text data were obtained from previously underexploited sources: time-locked subtitles and an auditory-only narrated version of the film oriented toward blind audiences that describes the events and scenes in the movie—the starting point for NLP-enabled data augmentation.

33. Yang, E., Milisav, F., Kopal, J., et al. The default network dominates neural responses to evolving movie stories. Nat. Commun. 2023; 14:4197.
Figure 5 Multi-modal brain-text integration using NLP based on movies
The combined scene-by-scene text information of Forrest Gump was captured as a bag-of-words matrix—the set of all unique words with their frequencies in a given time slice accompanying the entire movie. Latent semantic analysis was then used to deconvolve the scene-wise word statistics into unique semantic dimensions, to capture the underlying meaning and recurring themes in the story line. In parallel, in a classical top-down approach, human annotators (a group of students) manually attached tags to scenes by choosing among a set of 52 pre-defined “indicators” from the audiovisual version of the movie. These choices were based on the scenes’ emotional content, circumstances, and other aspects, predefined a priori to be relevant, based on existing knowledge. This classical approach, which emphasized (for example) the detailed characterization of human emotions based on natural subjective experiences of human observers, turned out to miss important nuances that the text-derived semantic meaning representations reflected well, showing the potential for future LLM approaches in naturalistic neuroscience.
Going beyond the status quo of manual annotation, the NLP approach (latent semantic analysis) enabled decomposition of the story into 200 semantic movie contexts, each with their scene-by-scene relevance. As a complement to human-derived emotion annotations, the semantic contexts provided a means to track the occurrence of characters (e.g., Lieutenant Dan), contexts (e.g., war), and scene properties (e.g., day versus night). Analysis of the integrated data revealed empirical connections between brain states and specific elements, concepts, and themes within scenes.[33] Hence, algorithmically derived semantic facets were more successful in combined movie-brain-text analysis than traditional approaches that rely on human a priori intuition to determine which aspects should be most important.
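A minimal sketch of this annotation pipeline, with invented scene descriptions standing in for the movie-derived text and scikit-learn supplying the bag-of-words and latent semantic analysis steps:

# Scene-wise text -> bag-of-words matrix -> latent semantic dimensions.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import TruncatedSVD

scene_texts = [
    "forrest runs across the football field",
    "lieutenant dan shouts during the storm at sea",
    "jenny and forrest walk home from school",
    "soldiers march through the rain in vietnam",
]

counts = CountVectorizer().fit_transform(scene_texts)   # scenes x unique words
lsa = TruncatedSVD(n_components=2, random_state=0)      # latent semantic analysis
scene_loadings = lsa.fit_transform(counts)              # scene-by-scene relevance of each context

print(scene_loadings.shape)   # (4 scenes, 2 semantic dimensions)
# Each column can then be related, time point by time point, to the brain recordings.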
LLMs present an opportunity to carry over knowledge and concepts from other areas of human activity into the process of how scientific research is conducted today. Pipelining annotation generation could dramatically enhance our capabilities to scale complex manual protocols such as those employed in image and video data, as in the study detailed above, but also many other forms of stimulus material, such as electronic health records, voice recordings, or the biometric outputs captured by wearable devices. Various other kinds of neuroscience-related data sources could be directly combined with brain signals on a single subject basis or at the group level. Historically, annotation of these data forms has required input from human experts, either directly or indirectly through tools dedicated to specific end-to-end input-output learning, such as neural networks trained to discern human emotions directly from visual data, or an electronic olfaction device engineered to estimate the subjective appeal of scent compounds based on their physical characteristics.[34]

34. Ye, Z., Liu, Y., Li, Q. Recent Progress in Smart Electronic Nose Technologies Enabled with Machine Learning Methods. Sensors. 2021; 21, 7620.
There are certain caveats associated with manual annotation in general, several of which LLMs can mitigate, including (1) high logistic and financial cost of manual effort, (2) ontological limitations of categorization systems used to derive annotation tags, (3) subjectivity from human annotators and subjectivity-based data, and (4) limited reproducibility.
Ultimately, as indicated above, due to the high cost required for their procurement, manually annotated vision and language datasets are relatively rare and often small in size (10,000–100,000 data points). In response to previous annotation data scarcity, numerous works[35,36,37] automatically scrape readily available paired vision-text data from the internet and other general-purpose sources. Now similar feats to those achieved in the image-text annotation domain can be achieved in text-text annotation scenarios. With LLMs, annotations can be automatically generated after model pre-training on a variety of data relevant to the annotation task at hand. As a hypothetical example, a biotechnology company is interested in tagging first-hand accounts of psychoactive drug experiences with labels indicating different subjective effects; pairs of such accounts and manually applied subjective effect tags can be used for fine-tuning of a foundation model employed by the company. Alternatively, LLMs such as GPT-4 can be prompted to perform this task without any additional training data, based on the assumption that their training sets provide enough context to discriminate between different subjective effect terms and examples thereof.

35. Alayrac, J.-B., Donahue, J., Luc, P., et al. Flamingo: a visual language model for few-shot learning. Adv. Neural Inf. Process. Syst. 2022; 35:23716-23736.
36. Sharma, P., Ding, N., Goodman, S., et al. Conceptual captions: A cleaned, hypernymed, image alt-text dataset for automatic image captioning. Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics. 2018.
37. Thomee, B., Shamma, D.A., Friedland, G., et al. YFCC100M: The new data in multimedia research. Commun. ACM. 2016; 59:64-73.
Phrases and sentences, like single words, can be automatically assigned information-rich semantic embeddings; the same is true for automatically (or manually) obtained annotations. Conversion of freeform textual information via LLM “encoders” to structured embedding vectors enables continuous quantification of discrete semantic elements. In a complementary manner, LLM “decoders” serve to transform embeddings back into language text. Preprocessing natural language as embeddings unlocks the door to new methodologies for probing correlations between distinct linguistic patterns and neural activities. Associating natural language data with neurological measurements is a step toward profound comprehension of the generation, perception, processing, and interpretation of language by the human brain. The quantitative representation of natural language text is the industry-standard intermediate form used in computational analysis, implying reproducibility and potential for more tunable and scalable augmentation. Language, serving as a tool to encapsulate information derived from the five human senses, affords the quantified representation of a diverse range of phenomena within human experience.
Once again touching on the world of image auto-annotation as a source of inspiration for text-annotation tasks, the tool RETfound is an innovative approach that addresses the image-to-text problem in the medical domain.[38] RETfound is a foundation model for labeling widely available retinal images with disease categories. It is designed to expedite diagnosis of diseases including cataract, central serous retinopathy, diabetic retinopathy, glaucoma, heart failure, macular dysfunction, myocardial infarction, Parkinson’s disease, stroke, and macular degeneration. The model architecture is based on the large vision transformer framework: an encoder is used to generate a high-resolution embedding space that can be used to differentiate between retinal image features, analogous to embeddings used by LLMs as a way of encoding semantics in natural language text.

38. Zhou, Y., Chia, M.A., Wagner, S.K., et al. A foundation model for generalizable disease detection from retinal images. Nature. 2023; 622:156-163.
RETfound’s decoder is used for image reconstruction, while the encoder is used to derive features for fine-tuning toward downstream disease prediction tasks. RETfound was pre-trained on 1.6M unlabelled retinal images via self-supervised learning—a paradigm where AI models learn to find patterns within a dataset without any additional training information. For example, if a neural network was trained in a self-supervised learning task using a training set consisting of pet images, the model would most likely learn to recognize shapes that correspond to cats, dogs, and other popular pets. The model knows how to distinguish between images of different types of pets, but it does not “know” that we call one group “cats” and other information that might be linked to the pets in the images. The same is true for RETfound in its pre-fine-tuned state: it can distinguish between distinct variations seen in retinal scan images, and this ability allows it to then be fine-tuned for particular disease detection tasks.
This fine-tuning was performed with specific expert-provided labels from a number of different datasets ranging in size. For example, the “OCTID” dataset, containing 470 retinal scans used to label conditions such as “normal,” “macular degeneration,” and “diabetic retinopathy,” and the Moorfields Eye Hospital-AlzEye dataset, which links ophthalmic imaging to the health records of 353,157 patients who attended this hospital between 2008 and 2018, were used for fine-tuning to orient RETfound toward wet-AMD prognosis.[39] With such comprehensive training, RETfound can be used to create text descriptions of retinal images based on predictions made in the context of pixel patterns in images from records generated by medical professionals. Hence, models like RETfound are designed to alleviate the annotation workload of experts, serving as inspiration for conceptual frameworks employing LLMs for similar purposes.

39. Wagner, S.K., Hughes, F., Cortina-Borja, M., et al. AlzEye: longitudinal record-level linkage of ophthalmic imaging and hospital admissions of 353 157 patients in London, UK. BMJ Open. 2022; 12:e058552.
Image formats can be used to capture the physical world, on the one hand, and activity of neurons in the brain, on the other hand. Alternatively, they can serve as experimental variables such as pictures used in experiments depending on visual stimuli to explore links between brain scans and such stimuli. In contrast to image formats, chemical structures and descriptors thereof can capture key aspects of brain chemistry, neurophysiology, neuropharmacology, and chemosensory stimuli. Simplified molecular-input line-entry system (SMILES) is a method of description for representing and semantically re-expressing chemical structures as text-based objects (Figure 6). SMILES was first conceived based on the principles of molecular graph theory to represent chemical structure with rigorous specification in a way that was well suited for machine processing.[40]

40. Weininger, D. SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules. J. Chem. Inf. Comput. Sci. 1988; 28:31-36.
Figure 6 Examples of paired chemical structures and SMILES sequences
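As a minimal illustration of SMILES as machine-readable text, the sketch below parses two common textbook SMILES strings with RDKit (an assumed, freely available cheminformatics library), canonicalizes them, and splits them into tokens of the kind a language model could consume; the regex tokenizer is a simplification, not a production scheme.

```python
# Sketch: SMILES strings are plain text that can be canonicalized and tokenized like
# any other sequence fed to a language model. RDKit is an assumed dependency, and the
# regex tokenizer is a simplified illustration rather than a production scheme.
import re
from rdkit import Chem

molecules = {
    "caffeine": "Cn1cnc2c1c(=O)n(C)c(=O)n2C",
    "dopamine": "NCCc1ccc(O)c(O)c1",
}

SMILES_TOKEN = re.compile(r"Cl|Br|\[[^\]]+\]|.")  # two-letter atoms, bracket atoms, or single characters

for name, smiles in molecules.items():
    mol = Chem.MolFromSmiles(smiles)        # parse the text into a molecular graph
    canonical = Chem.MolToSmiles(mol)       # re-emit one canonical text representation
    tokens = SMILES_TOKEN.findall(canonical)
    print(name, canonical, tokens[:12])     # "words" that a sequence model could ingest
```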
There are vast bodies of scientific literature containing chemical names, sometimes appearing in standardized form but oftentimes not. With the proper collection, curation, and integration strategy, a corpus combining chemical names and SMILES structures could be prepared to train an LLM or fine-tune a foundation model for the purpose of exploring potential predictive relationships between chemical structure and semantic content. If this can be achieved, going even one step further, the common embedding space could be connected to a generative model that outputs chemical structures on the basis of text inputs (e.g., “I would like to see novel chemical structures that will be able to enter the human CNS, please!”). In a not-so-distant future, such multi-modal LLMs could become a valuable partner to scientists to enhance the creative process of generating entirely new molecules with targeted properties, whether they be physical, chemosensory, or pharmacological.
Another potential use of a common embedding space between SMILES and natural language would be to analyze mixtures of chemicals as opposed to single chemicals. Just as interpretation of words and phrases appearing in natural language can be significantly influenced by their context, perception of odorant molecules present in chemosensory stimuli (naturally encountered as mixtures) is influenced by the combinations and concentrations of other mixture constituents. Furthermore, small molecules such as neurotransmitters, hormones, drugs, and toxins often act in tandem with their metabolites, impurities, and other biomolecules. These combined elements can exert biochemical and physiological effects in their surroundings, such as binding to target receptors or modulating signal transduction pathway activity. Hypothetically, the common latent embedding space of an LLM trained with SMILES and natural language could be used to navigate the complex, context-dependent multiplicity of action by chemicals and mixtures, of direct relevance to neuroscience.
Another issue presented by annotation, separate from its high cost, is that annotation relying on a predetermined ontology or classification system will be limited by the descriptive capability of that system. Typically, individuals who perform annotation tasks must be trained to properly use a given ontology for applying classifications to data points, as an attempt to mitigate the known challenges of inter-rater variability. Sometimes the training required to properly annotate data is extensive, and the annotators must be qualified as subject matter experts as opposed to lay persons. The embeddings generated by LLM encoders can potentially be "translated" to a set of terms within a targeted ontology using techniques like semantic similarity measurement or clustering.
If left untranslated, LLM embeddings offer a high level of semantic granularity that is not afforded by ontological classification. This specificity is valuable whenever researchers are interested in recording distinct outcomes, because it allows annotations to be categorized in whatever way is most relevant to the specific experiment at hand. As a straightforward hypothetical example (sketched below), one could (1) generate semantic embeddings from annotation labels or another experimental variable recorded via text, (2) generate embeddings from terms present in targeted ontologies, and then (3) calculate the cosine distance between the two sets of embeddings to identify the "nearest neighbor" ontology term for each text-based experimental variable. While such an approach might not afford the accuracy provided by subject matter experts, what it lacks in resolution it compensates for with objectiveness and operational consistency, increasing both the scalability and the reproducibility of annotation. On the other hand, the embeddings yielded by LLMs also provide researchers with a means to analyze annotated datasets, via clustering or more sophisticated techniques, to identify new classification systems.
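A minimal sketch of this three-step recipe is shown below, assuming the sentence-transformers library as a stand-in for an LLM encoder; the model name, annotations, and ontology terms are illustrative placeholders.

```python
# Sketch of the three-step recipe: embed free-text annotations, embed ontology terms,
# and map each annotation to its nearest ontology term by cosine similarity.
# The model name, annotations, and ontology terms are illustrative placeholders.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")   # any sentence-level encoder would do

annotations = ["actor looks tearful and withdrawn", "fast-paced chase through a market"]
ontology_terms = ["sadness", "fear", "excitement", "calm"]

A = model.encode(annotations, normalize_embeddings=True)     # (n_annotations, dim)
T = model.encode(ontology_terms, normalize_embeddings=True)  # (n_terms, dim)

similarity = A @ T.T                    # cosine similarity, since rows are unit-norm
nearest = similarity.argmax(axis=1)     # closest ontology term for each annotation

for text, idx in zip(annotations, nearest):
    print(f"{text!r} -> {ontology_terms[idx]}")
```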
Ideally, we will soon be able to assign expert-grade annotations by means of LLMs, even in the absence of close collaboration with a card-carrying domain expert. To make matters more interesting, once it has been demonstrated that LLMs can apply pre-existing ontologies for annotation in a manner comparable or superior to the performance of experts, we can turn to "expert LLMs" to help with the identification and validation of new terms and ontologies derived in a data-driven way. We can also examine the results of LLM-based annotation to challenge incumbent classification systems that were designed from limited heuristics. While rule-based solutions operate on explicit predefined criteria, black-box AI solutions (despite their opaque decision-making processes) often excel in handling vast and complex datasets,[41,42] achieving superior predictive accuracy where traditional methods might struggle. Using LLM-assisted annotations as a complementary approach to supplement legacy top-down (e.g., manual categorization by domain specialists) and standard rule-based (e.g., predefined algorithms for data point classification) solutions is one way to simultaneously leverage the knowledge that comes from expert experience and the new insights we can obtain from LLMs, letting the data truly "speak for themselves."
LLMs have been described as chameleons (https://karpathy.ai/lexicap/0215-large.html) or as enabling forms of "role play."[43]
They can take on the personality and adopt the thought or writing style of known persons or categories of persons with specific traits, such as Charlotte Brontë, Carl Sagan, or a neuroscientist. This capacity can be leveraged in a variety of ways. In some annotation tasks, it is beneficial to seek counsel with a panel of experts across multiple disciplines, as opposed to a panel of evaluators who all share the same background. Several LLMs assuming different “chameleon” stances could be used in parallel in an annotation task, analogous to a panel of human raters. The LLMs can be asked to take the position of different experts, personality types, professions, age groups, and cultural backgrounds. LLMs not only address the problems presented by the influence of individual subjectivity on annotation tasks, but they also simultaneously enable expression and manipulation of such subjectivity. LLMs can eliminate the transient fluctuations of ephemeral emotional states experienced by human annotators, and if called for they can introduce them in a controlled and repeatable way.
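A hedged sketch of such an "annotation panel" is given below. The complete() helper is a hypothetical placeholder for whichever chat-completion backend is available (a hosted API or a local model); the personas, label set, and majority-vote aggregation are likewise illustrative choices, and a temperature of 0 would keep each panelist's answers repeatable.

```python
# Sketch of an LLM "annotation panel": the same snippet is labeled under several personas
# and the labels are aggregated. complete(system, user) is a hypothetical wrapper around
# whichever chat-completion backend is available; run with temperature 0 for repeatability.
from collections import Counter

PERSONAS = [
    "You are a clinical psychologist annotating emotions in film scenes.",
    "You are a film critic annotating emotions in film scenes.",
    "You are a layperson in their twenties annotating emotions in film scenes.",
]

def complete(system: str, user: str) -> str:
    """Placeholder for a chat-completion call (hosted API or local model)."""
    raise NotImplementedError

def panel_annotate(snippet: str, labels: list[str]) -> str:
    prompt = (f"Scene description: {snippet}\n"
              f"Answer with exactly one label from: {', '.join(labels)}.")
    votes = [complete(system=persona, user=prompt).strip().lower() for persona in PERSONAS]
    return Counter(votes).most_common(1)[0][0]   # majority vote across the panel

# Usage, once complete() is wired to a real model:
# panel_annotate("Forrest waits alone at the bus stop in the rain", ["sad", "hopeful", "neutral"])
```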
There are many sources of inconsistency in the language used to describe neuroscience research as well as subjective experience. These vagaries fuel disagreement between researchers in the interpretation of annotations. The universal format of coherent semantic embedding spaces enables the capture and manipulation of vague or subjective language. Crucially, these representations are exactly repeatable across laboratories and other contexts of research and analysis, so long as the same LLM is used for the same task, with the same totality of previously estimated model parameters. From a practical standpoint, this feature should have a significant effect on the shareability of annotated data across investigators or labs, hopefully expanding the breadth and depth of downstream applications for auto-annotation of datasets via LLMs.
Separate individuals can annotate the same data differently, and even a single annotator’s responses may vary over time. LLMs offer a more stable and consistent form of annotation because they are trained on a broad dataset and are not influenced by subjective experiences. In place of the subjectivity that influences human performance on manual annotation tasks, LLMs have a nuanced mapping of linguistic context captured by the use of language in their training corpora. Off the shelf, LLMs can be thought of as approximations of the average mind of all internet users, "crowd-sourcing thought," since a large portion of their training corpora is derived from the internet. Alternatively, in the event that a foundation model does not appear to capture enough nuance to achieve a specific task, it could be fine-tuned to approximate the average mind based on a certain subset of websites or internet users.
There are typically subjective aspects to the process of manual annotation, especially when the object being annotated is itself experienced subjectively. In the example above, where students manually annotated scenes from Forrest Gump, they were asked to annotate the emotions they perceived to be expressed by the actors in the film. This task requires subjective interpretation of the emotions portrayed in the movie in the first place, on top of the fact that emotional experience is highly subjective in nature. The studyforrest dataset also includes annotations of the physical location in which each scene takes place.[33]
Even though these annotations (“night” vs. “day,” “inside” vs. “outside”) are largely objective judgments made by subject matter experts (two individuals with academic background in film), there is still room for subjective interpretation by the annotators, as exemplified by the operational definition of “day” as any scene that was illuminated by sunlight, as opposed to some other determining factor.
LLMs enable reconciliation between the world of subjective phenomena and objective measurement. The representation of semantic entities via LLM embeddings preserves the discrete subjective or contextual meaning in text such that it can be compared in a consistent way with other text. For example, imagine a scenario where sentences are collected from social media posts to be auto-annotated with labels indicating emotion to be used in a training set for an NLP model that predicts the emotion of social media users from their posts. No matter how unique each envisioned sentence is, the distance between their embeddings and the embeddings of terms such as “enthusiastic,” “depressed,” “nostalgic,” or “peaceful” can be calculated in a uniform fashion. Due to the fact that LLM training corpora capture a large volume of text describing subjective phenomena, more stable and consistent annotations yielded by LLMs can readily be used to characterize subjective experience-based data elements, without using subjective human judgment as part of the annotation process.
The use of LLMs to automate annotation tasks is not a stepwise improvement; it is a next-generation approach that can disrupt a mainstream practice that would otherwise continue to fall prey to subjectivity and other forms of idiosyncrasy. For instance, consider the task of annotating emotions in a collection of diary entries. If given to a group of human annotators, one might label a passage as “sad” based on their personal experiences and cultural background, while another might see it as “reflective” or “nostalgic.” Because LLMs are autoregressive, state dependent, and have hyper-parameters such as temperature (cf. previous section “data science perspective on large language model solutions”), there are reasons why the answers for an identical prompt may not necessarily be exactly the same. Nonetheless, if experimental conditions are held constant, answers from LLMs should be largely restricted to narrow regions of semantic space. In this way, the LLM may offer a level of objectivity and consistency that human annotators, with their inherent subjectivity and idiosyncrasies, simply cannot match.

Large language model for text summarization and knowledge integration

The wide-ranging field that is neuroscience touches various disciplines from physics to psychology. This wildly interdisciplinary field produces a myriad of rather separate experimental findings that can be overwhelming to integrate by human effort alone. Moreover, the breadth of the field often results in researchers working within a particular sub-community, focusing on narrowly specialized research areas, and potentially missing out on opportunities for cross-fertilization with other sub-disciplines. There may also be certain tasks that go beyond human cognitive ability, such as reading experimental results that contain immense numbers of datapoints or distilling the content of all major scientific publications from the past year. LLMs can help researchers absorb large bodies of text that would otherwise be challenging to read in a reasonable amount of time.
The capabilities of LLMs extend beyond typical text summarization tasks, where the text being gathered is presented as human-readable (albeit lengthy) natural language. LLM embeddings provide an objective quantification of subjective text (cf. previous paragraph) to resolve linguistic ambiguities and standardize outputs. Subjectivity-based text could be simple words or phrases, such as those used to capture emotions portrayed by actors in Forrest Gump[33] or those used to describe chemosensation of odor or flavor compounds.[44] Or it could be far more complex, as is the case for text used in psychedelic research.
The common expression "the psychedelic experience" is used in a way that implies a uniformity across "trips." In actuality, the psychedelic experience is full of nuance and variation rooted partially in the drug user's set and setting as well as partially in psychopharmacological differences between drugs. Understanding the underlying factors determining nuanced outcomes observed in psychedelic drug users should help us to understand whether certain drugs or varieties of subjective effects can be employed to treat specific conditions, just as the different experiences provided via ingestion of psilocybin and MDMA have each shown early success in the treatment of OCD and PTSD, respectively. To investigate such nuances, a recent study used NLP techniques to analyze 6,850 "trip reports" from psychedelic drug users (Figure 7). The objective of the study was to draw connections between subjective experiences, 27 qualitatively distinct drugs, and a set of 40 associated neurotransmitter receptors expressed in the human brain.[45]
The results of this study include detailed word lists of ranked relevance for semantic dimensions capturing major themes present in experience reports, derived via canonical correlation analysis (CCA).
Figure 7 Multi-modal receptor-text integration using NLP to reveal the mechanistic basis of psychedelic drug experience
Human interpretation of the complex themes captured by thousands of words in a particular order is quite difficult. Each word in the ranked lists provided via CCA carries its own potential for subjective interpretation. The varied range of potential interpretation is further widened by the context provided by neighboring terms, as well as the transition in general meaning captured by different subsections of the lists (i.e., top 1 percent versus top 5 percent). Despite the results being presented as dense lists of highlighted words, an LLM can seamlessly abstract away from these word sets by extracting semantic core themes from them, deriving shared higher-level categories of subjective effects elicited by psychedelic drugs. These higher-level categories can then be leveraged to interrogate new hypotheses for drug discovery platforms and experimental treatment approaches, in search of new psychedelic drugs with targeted subjective effects intended to treat specific conditions. Future use of LLMs highlights yet another opportunity for researchers to glean insights from complex, unstructured data that humans might find challenging to cope with.
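For readers unfamiliar with the analysis style, the sketch below shows the bare mechanics of CCA on synthetic data using scikit-learn: a text-derived matrix and a receptor-affinity matrix, one row per report, are projected onto paired canonical axes whose term weights can then be inspected. The shapes and variable names are illustrative and do not reproduce the published pipeline.

```python
# Bare mechanics of canonical correlation analysis (CCA) on synthetic stand-ins for the
# two data blocks: a text-derived matrix and a receptor-affinity matrix, one row per
# trip report. Shapes and names are illustrative, not the published study's pipeline.
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)
n_reports, n_terms, n_receptors = 500, 300, 40

X_text = rng.random((n_reports, n_terms))          # e.g., term or topic loadings per report
Y_receptor = rng.random((n_reports, n_receptors))  # e.g., receptor affinity profile of the drug taken

cca = CCA(n_components=5)
X_scores, Y_scores = cca.fit_transform(X_text, Y_receptor)   # paired canonical variates

# Terms with the largest absolute weights on the first canonical axis are candidate
# "theme" words of the kind an LLM could later summarize into higher-level categories.
top_terms = np.argsort(np.abs(cca.x_weights_[:, 0]))[::-1][:10]
print("indices of top-weighted terms on axis 1:", top_terms)
```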
Medically oriented LLMs, such as PMC-LLaMA (built on Meta's LLaMA),[46] offer a promising solution to the need for sifting through extensive text sources to aggregate and synthesize the essence of their meaning and informative value. By gathering and summarizing vast information landscapes, these models provide access to the quintessence, and perhaps elements of understanding, of complex topics. Specifically, PMC-LLaMA was designed to support individuals in navigating vast swaths of medical information by training on a massive corpus: 4.8M biomedical academic papers, 30K medical textbooks, as well as 202M tokens of medical question-answer pairs, rationales for decision making, and conversational dialogues. PMC-LLaMA was shown to produce reasonable and coherent responses in zero-shot assessments of medical knowledge prompts, for example, answering questions from patients about their urinary tract infections and in-depth exam questions about microbiology and pharmacology. When asked a multiple-choice question about a drug-drug interaction involving tuberculosis and hormonal birth control medications, PMC-LLaMA correctly indicated the mechanism of the drug-drug interaction and elaborated on the rationale used to arrive at the answer (CYP3A4 induction by the antibiotic rifampin leads to decreased concentrations of hormonal birth control, ultimately increasing the possibility of an unintentional pregnancy). PMC-LLaMA underscores the effectiveness of data-centric approaches in specialized domains and the value of domain-specific model tuning.[46]
Such impressive responses to prompt queries represent a scenario of machine-assisted human intelligence in which LLMs can be tailored to effectively educate users in specialized areas, highlighting the potentially far-reaching societal impact of such models and the importance of domain-specific model development.
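A minimal sketch of how such a medically tuned model might be queried in a zero-shot fashion with Hugging Face transformers is shown below; the model identifier is a placeholder rather than a verified checkpoint name, and the prompt wording is illustrative.

```python
# Sketch of a zero-shot multiple-choice query against a medically fine-tuned causal LLM
# via Hugging Face transformers. MODEL_ID is a placeholder, not a verified checkpoint
# name, and the prompt wording is illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "path-or-hub-id-of-a-medical-llm"   # e.g., a locally downloaded PMC-LLaMA checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

prompt = (
    "Question: A patient taking combined oral contraceptives starts rifampin for tuberculosis. "
    "What is the main interaction risk?\n"
    "A. Increased estrogen toxicity\n"
    "B. Reduced contraceptive efficacy\n"
    "C. QT prolongation\n"
    "D. No interaction\n"
    "Answer:"
)

inputs = tokenizer(prompt, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=64, do_sample=False)  # greedy decoding
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```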
As another concrete example of gains in everyday life, instead of plowing through a series of thick textbooks, a medical student preparing for an exam could query models like PMC-LLaMA for information on specific topics to cover a wide range of material in a more time-efficient way. Just as automation in industry made more time available for workers to accomplish other tasks, we can expect to see similar opportunities presented by LLM-based developments. However, the improvements will not all be simply life enhancing; many applications, such as an interactive LLM with access to patient electronic health records, could potentially be lifesaving. Unfortunately, a recent statistical investigation by Rodziewicz et al.[47] estimates that ∼400,000 hospitalized American patients experience some type of preventable harm each year, with roughly a quarter of such cases resulting in death. The lifesaving potential of AI in medicine can shine in several areas, such as (1) reducing the workload of medical professionals, so that they can use their time more efficiently to assess and treat their patients, and (2) acting as an early warning system that alerts to potential adverse events across the range of available treatment strategies.

Multi-source and multi-modal large language model synthesis

Over the last decades, the neurosciences have expanded into increasingly segmented silos of research activity. For example, Alzheimer's disease (AD) is studied in several largely disconnected research communities. Epidemiologists studying the etiology of AD in human population strata do not regularly talk to geneticists, practicing neurologists, brain-imaging investigators, or animal experimentalists. The geneticists studying genome-wide risk variants related to AD do not necessarily cross-reference or integrate existing knowledge from these other neuroscience communities either. The imaging neuroscientists devoted to structural and functional differences in the AD brain do not necessarily take into consideration aspects of epidemiological population stratification when designing and interpreting their studies, and so on and so forth. Each AD research community operates in what appears to be its own "bubble," with its own set of notable scientists, its own pool of commonly entertained hypotheses, and its distinct process of knowledge accumulation, yielding large quantities of papers published per year.
Given the increasing amount of research output every year, a single human is increasingly unable to read all these papers. Many areas of neuroscience research activity are siloed in similar ways. Such knowledge fragmentation is perhaps one of the biggest challenges of the scientific enterprise in the 21st century. LLMs now offer an opportunity to assimilate and translate expanding knowledge from several complementary viewpoints on a single neuroscience topic.
LLMs are also starting to be tailored toward the medical domain, with promising results in tasks like medical exams and record keeping. To date, AI in medicine has often been based on computer vision tasks, with limited integration of text, voice, and other kinds of information. Summarization and integration of various data sources through LLMs thus holds tremendous promise for advancing AI assistance for practicing healthcare professionals. Biosensors, genome profiles, medical records, patient testimonials, metabolic panels, and other laboratory assays are examples of potential data sources for building a multi-modal AI framework oriented toward elucidation of patient-personalized clinical pathways.[48] The potential of such AI solutions, in terms of the direct impact they could make on the lives of patients and the performance of medical professionals, is vast and has not been fully realized yet.[49]
Currently, the use of LLMs in tools designed to help lighten the annotation workload of professionals is also a subject of interest in the realm of medicine. Although the ethics of using LLMs in medicine and medical research are beginning to be discussed,[50] it is now becoming apparent that LLMs could be effective as adjuncts to processes that currently occupy a large amount of human time and effort, such as electronic health record creation and processing, as well as many other activities such as diagnosis and prognosis of disease.
As a next holy grail, what non-text data modalities can be made LLM actionable? Broadly, LLMs may very well be the first technology that can seamlessly combine structured and unstructured information, both dynamically and at scale. Moreover, ChatGPT and similar LLM variants have successfully aggregated disparate text sources from several languages, geographies, and cultures into a single model instance.
LLMs hold promise in bridging the gap between disparate kinds of information, most obviously perhaps computer vision (i.e., images) and language (i.e., text). As a recent example from the machine learning community, Alayrac et al.[35] demonstrated that including such additional modalities can improve language modeling. Flamingo models were trained on large-scale multimodal corpora, drawn from the internet, containing naturally contextualized text and image information. The ensuing few-shot learning capabilities can be adapted to a diversity of tasks involving image and video material. Subsequently, the models can be queried to enrich other image material by generating free-form content or answering predefined multiple-choice questions. Prompting a Flamingo model with task-specific examples can provide practical benefits in many settings, based on visually conditioned autoregressive text generation. As an early neuroscience example of reading out images from a subject's mind, one study used a Bayesian model to reconstruct natural images from brain activity measurements alone.[51]
Further, DALL-E/CLIP (made available in 2021/22 by OpenAI) was an early example of next-generation text-image fusion in generative AI, initially using a GPT-3 variant under the hood and aiming at ever more realistic images generated from user prompts. This multi-modal fusion engine can synthesize various forms and styles, such as realistic natural images, painting-like art and symbols, and internal models of design schemes, invoking real and imagined objects, scenes, and people, without close training examples (zero-shot learning). Its component CLIP (contrastive language-image pre-training) was trained on ∼400 million pairs of images and text captions from the internet. This model is used to subselect among images generated by DALL-E for optimal output generation. CLIP combines computer vision and NLP within a single network to deeply process, categorize, and generate text annotations for a wide array of images. Without a strict requirement for task-specific training, it can generalize beyond its specific training information toward new, never-encountered tasks (cf. transfer learning above). In the neuroscience context, several forms of "images" could potentially be ingested in future LLM frameworks, such as structural and functional MRI brain imaging, PET, and fNIRS, and more broadly also EEG/MEG-derived brain images.
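As a concrete reference point for this kind of text-image fusion, the sketch below scores one image against several candidate captions with a publicly released CLIP checkpoint via Hugging Face transformers; the image path and captions are placeholders.

```python
# Zero-shot image-text matching with a publicly released CLIP checkpoint: an image and
# several candidate captions are embedded into the same space and scored. The image file
# and captions are placeholders.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("example_scene.jpg")   # any local image file
captions = ["a brain scan", "a city street at night", "a dog playing in a park"]

inputs = processor(text=captions, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

probs = outputs.logits_per_image.softmax(dim=-1)   # how well the image matches each caption
for caption, p in zip(captions, probs[0].tolist()):
    print(f"{caption}: {p:.2f}")
```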
Hence, an important future research avenue will explore to what extent DALL-E/CLIP and similar emerging technologies can successfully extend from natural images to different modalities of brain "images." For example, the NeuroSynth database presents a bottom-up approach[52] that has automatically extracted the activation coordinates in 3D image space for >3,000 published articles of brain-imaging task experiments, together with the full text of those articles. As such, this initiative has completed the effort of assembling a corpus of image-description pairs that has already provided value to the neuroscience community through a web interface for user queries. In a parallel research stream, the BrainMap database[53,54] has devised a human-made ontology of mental categories at play in brain-imaging experiments in a top-down fashion. The description systems for cognitive phenomena were hand-designed by human domain experts. Here too, an existing effort has already aggregated image-description pairs that may serve as an attractive beachhead for training or refining state-of-the-art multi-modal LLMs. One idea would be to merge NeuroSynth and BrainMap based on the studies that are available in both databases, with expert definitions and full-text annotations complementing each other to enable LLM-empowered queries and perhaps reasoning across both kinds of brain-image meta-information. More broadly, such avenues aimed at transcending content types are especially promising because LLMs offer an unprecedented opportunity to fuse structured and unstructured information in a unified framework.
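A rough sketch of that merging idea, assuming hypothetical CSV exports and column names (the real NeuroSynth and BrainMap distributions use their own schemas), could look like the pandas join below.

```python
# Sketch of joining study-level records from the two databases on a shared identifier
# (e.g., PubMed ID), pairing activation coordinates with both free-text terms and expert
# ontology labels. File names and column names are hypothetical; the real NeuroSynth and
# BrainMap exports use their own schemas.
import pandas as pd

neurosynth = pd.read_csv("neurosynth_studies.csv")  # assumed columns: pmid, x, y, z, abstract_terms
brainmap = pd.read_csv("brainmap_studies.csv")      # assumed columns: pmid, behavioral_domain

merged = neurosynth.merge(brainmap, on="pmid", how="inner")  # studies present in both databases

# Each row now pairs coordinates with bottom-up text and top-down expert labels, a starting
# corpus for training or fine-tuning a multi-modal LLM.
print(merged[["pmid", "x", "y", "z", "abstract_terms", "behavioral_domain"]].head())
```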
Over the next few years, neuroscientists can systematically examine what kinds of brain-relevant information lend themselves to emerging modes of LLM integration. What kinds of neuroscience information can be tokenized, and how? Recent LLM research has shown promise for leveraging embeddings of amino acid blocks, genes and their mRNA transcripts, cells and cell types, phenotypes, and disease states. Potentially expanding their applicability in biology and medicine, LLMs may also be able to process token-transformed instances of brain-region activity, white-matter fiber pathway involvement, locations of brain structure change, frequency-band changes in EEG/MEG, or calcium imaging. In so doing, neuroscientists may bring together sequence semantics across datasets and biological perspectives, forming unified views of the brain. This agenda may demand model architecture innovations to represent these information layers. Alternatively, we can use the outputs of generously pre-trained LLMs as a form of distillation, encoding specific information modes for integration in subsequently trained smaller models toward an ultimate research goal. Concretely, the UK Biobank and other mega-datasets allow an LLM to associate genomic variant information and other molecular data with a variety of human health information. As a core aspiration for the intensely interdisciplinary neuroscience endeavor, LLMs can help us bridge the divide between disparate neuroscience communities and enable NLP models that amalgamate the world's knowledge mosaics.

Epistemological avenues to overcoming the current concept crisis

LLMs may provide an alternative toolkit that turns out to be valuable for auditing and editing the human-conceived notions that neuroscience investigators act on to understand the brain. It is important to appreciate that, especially in classical hypothesis-driven research, the entire research endeavor hinges on the pre-assumed validity of the cognitive and neural terms used to articulate the experimental research conditions. Yet, many cognitive or psychological terms in frequent usage have brittle definitions and cannot be directly observed in nature. Many human-expert-determined concepts in neuroscience may not denote "natural kinds" in that they do not carve out discrete neural circuits in nature. An overwhelming majority of concepts of cognitive processes were coined well before neuroscience emerged as a coherent discipline (around the middle of the 20th century), before brain function began to be understood. Further, certain behaviors or cognitive concepts may emerge only in carefully designed experiments in healthy subjects or in clinical conditions such as patients with localized brain lesions.[55] According to this view, neurocognitive processes can be decomposed during subject engagement in specific experimental tasks, as an avenue to revealing the mapping between brain and behavior. Perhaps it is time to put the validity of these concepts to the test with a disciplined, data-driven approach.
The intricacies neuroscientists face when articulating observations of phenomena in the brain appear closely related to Ludwig Wittgenstein's second major book, Philosophical Investigations (1953/2001). The later Wittgenstein argued that confusion introduced by human language itself is the origin of most philosophical problems. For example, in psychology, there is still no widely accepted definition of even simple words like "cognitive" and "emotional."[56,57]
Further, the brain network consistently recruited during theory-of-mind cognition, taking another individual's point of view, is also consistently involved in an array of diverse psychological processes, including moral thinking, autobiographical memory retrieval, and spatial navigation.[58,59,60]
Our legacy catalog of neurocognitive frameworks may not go in the right direction.[61]
For example, why do we implicitly expect that terms and notions of William James' opus magnum (Principles of Psychology, 1890) denote unique brain mechanisms? Further, when we encounter challenging-to-reconcile findings, we sometimes have the tendency to make up a new term instead of really getting at the core of the problem. Many neuroscience investigations take the outside-in approach[61]: they make up concepts first and then, only as a second step, go about locating or characterizing them in measures from the brain. This closely relates to what some authors have called "neo-phrenology": a reductionist tendency toward "overlocalization," attempting to map terms onto local geographies of the brain.[62]
While modern neuroimaging has shown that specific brain areas are indeed more active during certain tasks, the brain is highly interconnected, and many cognitive functions are distributed across networks. Thus, pinpointing a single “spot” for a complex function can be misleading.
The research focus should perhaps be placed on the actual responses of the brain, not the (reification of) human-invented terms themselves. Indeed, it is neurocognitive processes in the brain that give rise to behavior and cognition. In short, it remains elusive how and to what extent psychological terms map onto regional brain responses, and vice versa.[62,63,64]
For these reasons, some authors put forward that neuroscience is increasingly rich in data but remains poor in theory,[65] pointing to the acute need for new means of generating research hypotheses.
An analogous point can be made about definitions of brain disease, especially terms in psychiatry. The same notion is not uniquely related to a single mechanism, and the same mechanism does not often isolate a distinct diagnostic entity. This realization may be part of the reason why an identical drug class often helps alleviate symptoms for nominally separate psychiatric conditions. The DSM-5 and ICD-10 manuals catalog psychiatric diseases based on the judgment of selected experts. Moreover, funding agencies entrust scientists with funding commitments only if their research proposals' rationale and expected outcomes are firmly rooted in these human-made diagnostic categories. It has, however, become increasingly clear that pathophysiological processes in primary biology are exceedingly heterogeneous and mutually overlapping in degrees, even at the raw genetic level.[66]
Consequently, today’s description systems for mental health conditions help communication between practicing medical doctors but lack biological validity in research and predictability in clinical care.
Despite these evident shortcomings of incumbent description systems in neuroscience, there have been few attempts to build such a system of semantic notions in a bottom-up fashion. In a seminal study (Figure 8),[67] a data-led approach was devised to design a framework for neurocognitive categories by pooling across information from ∼20,000 brain-imaging papers in humans. Capitalizing on the accumulated data trove of >25 years of brain-imaging research, NLP algorithms mined the semantic content of the research articles, which was interfaced with >600,000 topographic locations from functional brain scans (fMRI, PET). Paying equal and simultaneous attention to both semantic principles and neural activity principles allowed a systematic integration of brain and behavior in a holistic approach. Among other benefits, this approach helps overcome the dilemma between forward inference (concept-to-brain reasoning) and reverse inference (brain-to-concept reasoning) that haunts the neuroscience enterprise.[62]
In empirical validation analyses, such a "computational ontology" was demonstrated to better reproduce the term-function links in new, unseen research articles than widely embraced description systems in neuroscience and psychiatry.
Figure 8 NLP tools to organize the existing knowledge of human cognition in a fully bottom-up fashion
Taken together, the narratives and stories that we use to describe the world shape the way we design our neuroscience experiments and interpret what we find. In neuroscience, true progress requires particular sensitivity to word usage, language hygiene, and variants in conceptualization. In a future of LLM-empowered neuroscience, we may be able to reformat the enshrined terminologies of psychological terms toward semantic frameworks of evidence-based mental categories, rather than perpetuating reified legacy terms from a previous historical era. Emerging LLM technologies can spark advances toward a biologically grounded redefinition of major brain disease nosology, cutting across diagnostic boundaries in a new era of evidence-based psychiatry, rather than relying on the judgment of selected experts alone. As Wittgenstein said, "The limits of my language mean the limits of my world."[68]

Conclusions

Biology has become "computable" over the last 5–10 years, for instance in the form of massive genetic databases combined with targeted CRISPR gene editing and machine-learning analytics, bringing us a step closer to an engineering discipline. Our demonstrated ability to generate biomolecular data troves eclipses our realized ambition to actually glean understanding from these systems: neuroscientists today are "drowning in information but starving for knowledge," as John Naisbitt wrote.[69]
LLMs bring a new set of opportunities to the game (but see Box 1). This model class shows that sheer statistical brute force can assist in demystifying the brain and disease by reading and generating biology, by crafting knowledge frameworks, and by unlocking never-before-accessible modes of information integration and interrogation at scale. Foundation models will probably serve to extract, synergize, and synthesize knowledge from and across siloed neuroscience domains ("bubbles"), a task that may or may not exceed human comprehension. Neuroscientists will need to open up to and embrace the uncomfortable possibility that the human brain is a biological system that goes beyond what human intelligence alone can fully grasp without the assistance of AI tools applied to big data.
BOX 1
Limitations of current LLM tools
Despite LLMs being perhaps the most rapidly evolving technology of all time, a number of challenges remain for today's incarnation of these models.
Hallucinations: Refers to the common issue where the model generates text or information that is not anchored in reality or the provided context. The model might generate plausible-sounding but incorrect or fabricated information, despite a deceptively confident tone. By design, an LLM generates text whether or not the model is certain of its output. Hence, current LLM variants may be inherently less well positioned for accurate and reliable information queries (e.g., giving exact paper references).[70]
Dependency on big data: LLMs have a hunger for vast quantities of input data. Large fractions of the internet have now been exploited for LLM development. Consequently, one might wonder whether we have already saturated our available training data. What are modes of future data generation for training ever-more powerful LLMs? One possibility is that last-generation LLMs will increasingly generate output data, on the internet or other venues, which will be fed back into next-generation LLMs. It is currently hard to anticipate what the ramifications of such a recursion scenario would be. As one possible consequence, solutions to performance benchmarks may increasingly contaminate the training data.
Resource hunger: Deploying LLMs requires a significant amount of computational power, information storage capacity, and energy consumption, probably also with a lasting environmental footprint.[71] For the goal of training LLMs from scratch, the necessary richness in compute and storage resources probably sidelines the large majority of institutions in industry, academia, and government on the planet.
Reasoning: Instances of this model class often lack common sense or the ability to understand and respond to novel situations that were not present in their training data. How do we make sure that LLMs act in line with human values (the so-called alignment problem)? Also, at times, these models may generate text that is not relevant or not fully aligned with the context provided to them. Part of the explanation is that LLMs perform fairly well on single-step reasoning tasks but face challenges in the sequential integration of consecutive reasoning steps.
Biases and other ethical considerations: Further, LLMs inherit the biases that may be present in the ingested datasets during training. The models can inadvertently generate harmful, offensive, or otherwise skewed outputs.[72] Reinforcement learning from human feedback, calibrating LLMs toward the kinds of answers that humans expect, may be part of the solution. Further, current LLMs do not necessarily work well across languages and cultures.[73]
Certification: Assigning watermarks, or deciding whether a text was LLM-generated or not, is probably challenging to impossible.
Lack of explainability: For users and developers alike, it remains difficult to understand why a given model generated a particular response, which is a significant limitation for applications that require interpretability and transparency, especially given increasing political pressure for white-box machine learning solutions (cf. GDPR law in the European Union). Closed-source LLMs further complicate this matter.
Diminishing returns in scaling: Continuing to increase the data quantity and compute/storage resources is already starting to hit regimes with diminishing returns. Alternative strategies for bringing the emergent abilities of LLMs to the next level will probably be required in the future.
From a broader societal perspective, the industrial revolution touched mostly blue-collar jobs. In contrast, the current LLM revolution will perhaps mostly touch white-collar jobs, including those of research workers in the neurosciences. Indeed, the unreasonable effectiveness of LLMs has been compared by venture capitalists and investors to the invention of fire as a tool, electricity, or the internet.

Acknowledgments

D.B. was supported by the Brain Canada Foundation, through the Canada Brain Research Fund, with the financial support of Health Canada, National Institutes of Health (NIH R01 AG068563A, NIH R01 DA053301-01A1, NIH R01 MH129858-01A1), the Canadian Institute of Health Research (CIHR 438531, CIHR 470425), the Healthy Brains Healthy Lives initiative (Canada First Research Excellence fund), Google (Research Award, Teaching Award), and by the CIFAR Artificial Intelligence Chairs program (Canada Institute for Advanced Research).

Declaration of interests

Four co-authors are employees at MindState Design Labs (A.T., O.L., P.W., and T.R.) and five are equity holders (D.B., A.T., O.L., P.W., and T.R.).

References

Mikolov, T. ∙ Sutskever, I. ∙ Chen, K. ...
Distributed representations of words and phrases and their compositionality
Adv. Neural Inf. Process. Syst. 2013; 26
Le, Q. ∙ Mikolov, T.
Distributed representations of sentences and documents
PMLR. 2014; 32:1188-1196
Conneau, A. ∙ Kiela, D. ∙ Schwenk, H. ...
Supervised learning of universal sentence representations from natural language inference data
Preprint at arXiv. 2017
McCann, B. ∙ Bradbury, J. ∙ Xiong, C. ...
Learned in translation: Contextualized word vectors
Adv. Neural Inf. Process. Syst. 2017;
Mikolov, T. ∙ Chen, K. ∙ Corrado, G. ...
Efficient estimation of word representations in vector space
Preprint at arXiv. 2013
Pennington, J. ∙ Socher, R. ∙ Manning, C.D.
Glove: Global vectors for word representation
Bubeck, S. ∙ Chandrasekaran, V. ∙ Eldan, R. ...
Sparks of artificial general intelligence: Early experiments with gpt-4
Preprint at arXiv. 2023
Goldstein, A. ∙ Zada, Z. ∙ Buchnik, E. ...
Shared computational principles for language processing in humans and deep language models
Nat. Neurosci. 2022; 25:369-380
Caucheteux, C. ∙ Gramfort, A. ∙ King, J.-R.
Evidence of a predictive coding hierarchy in the human brain listening to speech
Nat. Hum. Behav. 2023; 7:430-441
Schrimpf, M. ∙ Blank, I.A. ∙ Tuckute, G. ...
The neural architecture of language: Integrative modeling converges on predictive processing
Proc. Natl. Acad. Sci. USA. 2021; 118, e2105646118
Vaswani, A. ∙ Shazeer, N. ∙ Parmar, N. ...
Attention is all you need
Adv. Neural Inf. Process. Syst. 2017; 30
Hassid, M. ∙ Peng, H. ∙ Rotem, D. ...
How much does attention actually attend? Questioning the Importance of Attention in Pretrained Transformers
Preprint at arXiv. 2022
Tay, Y. ∙ Dehghani, M. ∙ Abnar, S. ...
Long range arena: A benchmark for efficient transformers
Preprint at arXiv. 2020
Bzdok, D. ∙ Yeo, B.T.T.
Inference in the age of big data: Future perspectives on neuroscience
Neuroimage. 2017; 155:549-564
Wei, J. ∙ Tay, Y. ∙ Bommasani, R. ...
Emergent abilities of large language models
Preprint at arXiv. 2022
OpenAI
GPT-4 Technical Report
Preprint at arXiv. 2023
Kaplan, J. ∙ McCandlish, S. ∙ Henighan, T. ...
Scaling laws for neural language models
Preprint at arXiv. 2020
Touvron, H. ∙ Lavril, T. ∙ Izacard, G. ...
Llama: Open and efficient foundation language models
Preprint at arXiv. 2023
Hoffmann, J. ∙ Borgeaud, S. ∙ Mensch, A. ...
Training compute-optimal large language models
Preprint at arXiv. 2022
Schaeffer, R. ∙ Miranda, B. ∙ Koyejo, S.
Are emergent abilities of Large Language Models a mirage?
Preprint at arXiv. 2023
Caballero, E. ∙ Gupta, K. ∙ Rish, I. ...
Broken neural scaling laws
Preprint at arXiv. 2022
Houlsby, N. ∙ Giurgiu, A. ∙ Jastrzebski, S. ...
Parameter-efficient transfer learning for NLP
PMLR. 2019; 97:2790-2799
Pfeiffer, J. ∙ Rücklé, A. ∙ Poth, C. ...
Adapterhub: A framework for adapting transformers
Preprint at arXiv. 2020
Bapna, A. ∙ Arivazhagan, N. ∙ Firat, O.
Simple, scalable adaptation for neural machine translation
Preprint at arXiv. 2019
Radford, A. ∙ Wu, J. ∙ Child, R. ...
Language models are unsupervised multitask learners
OpenAI blog. 2019; 1:9
Brown, T. ∙ Mann, B. ∙ Ryder, N. ...
Language models are few-shot learners
Adv. Neural Inf. Process. Syst. 2020; 33:1877-1901
Xiang, J. ∙ Tao, T. ∙ Gu, Y. ...
Language Models Meet World Models: Embodied Experiences Enhance Language Models
Preprint at arXiv. 2023
Berglund, L. ∙ Tong, M. ∙ Kaufmann, M. ...
The Reversal Curse: LLMs trained on "A is B" fail to learn "B is A".
Preprint at arXiv. 2023
Brandes, N. ∙ Goldman, G. ∙ Wang, C.H. ...
Genome-wide prediction of disease variant effects with a deep protein language model
Nat. Genet. 2023; 55:1512-1522
Cui, H. ∙ Wang, C. ∙ Maan, H. ...
scGPT: Towards Building a Foundation Model for Single-Cell Multi-omics Using Generative AI
Preprint at bioRxiv. 2023
Jumper, J. ∙ Evans, R. ∙ Pritzel, A. ...
Highly accurate protein structure prediction with AlphaFold
Nature. 2021; 596:583-589
Rives, A. ∙ Meier, J. ∙ Sercu, T. ...
Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences
Proc. Natl. Acad. Sci. USA. 2021; 118, e2016239118
Yang, E. ∙ Milisav, F. ∙ Kopal, J. ...
The default network dominates neural responses to evolving movie stories
Nat. Commun. 2023; 14:4197
Ye, Z. ∙ Liu, Y. ∙ Li, Q.
Recent Progress in Smart Electronic Nose Technologies Enabled with Machine Learning Methods
Sensors. 2021; 21, 7620
Alayrac, J.-B. ∙ Donahue, J. ∙ Luc, P. ...
Flamingo: a visual language model for few-shot learning
Adv. Neural Inf. Process. Syst. 2022; 35:23716-23736
Sharma, P. ∙ Ding, N. ∙ Goodman, S. ...
Conceptual captions: A cleaned, hypernymed, image alt-text dataset for automatic image captioning
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics. 2018;
Thomee, B. ∙ Shamma, D.A. ∙ Friedland, G. ...
YFCC100M: The new data in multimedia research
Commun. ACM. 2016; 59:64-73
Zhou, Y. ∙ Chia, M.A. ∙ Wagner, S.K. ...
A foundation model for generalizable disease detection from retinal images
Nature. 2023; 622:156-163
Wagner, S.K. ∙ Hughes, F. ∙ Cortina-Borja, M. ...
AlzEye: longitudinal record-level linkage of ophthalmic imaging and hospital admissions of 353 157 patients in London, UK
BMJ open. 2022; 12:e058552
Weininger, D.
SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules
J. Chem. Inf. Comput. Sci. 1988; 28:31-36
Bzdok, D. ∙ Ioannidis, J.P.
Exploration, inference, and prediction in neuroscience and biomedicine
Trends in neurosciences. 2019; 42:251-262
Bzdok, D. ∙ Engemann, D. ∙ Thirion, B.
Inference and prediction diverge in biomedicine
Patterns. 2020; 1:100119
Shanahan, M. ∙ McDonell, K. ∙ Reynolds, L.
Role play with large language models
Nature. 2023; 623:493-498
Sharma, A. ∙ Kumar, R. ∙ Ranjta, S. ...
SMILES to smell: decoding the structure–odor relationship of chemical compounds using the deep neural network approach
J. Chem. Inf. Model. 2021; 61:676-688
Ballentine, G. ∙ Friedman, S.F. ∙ Bzdok, D.
Trips and neurotransmitters: Discovering principled patterns across 6850 hallucinogenic experiences
Sci. Adv. 2022; 8, eabl6989
Wu, C. ∙ Zhang, X. ∙ Zhang, Y. ...
Pmc-llama: Further finetuning llama on medical papers
Preprint at arXiv. 2023
Rodziewicz, T.L. ∙ Houseman, B. ∙ Hipskind, J.E.
Medical Error Reduction and Prevention
StatPearls
StatPearls Publishing LLC., 2023
Hipp, R. ∙ Abel, E. ∙ Weber, R.J.
A Primer on Clinical Pathways
Hosp. Pharm. 2016; 51:416-421
Acosta, J.N. ∙ Falcone, G.J. ∙ Rajpurkar, P. ...
Multimodal biomedical AI
Nat. Med. 2022; 28:1773-1784
Harrer, S.
Attention is not all you need: the complicated case of ethically using large language models in healthcare and medicine
EBioMedicine. 2023; 90, 104512
Naselaris, T. ∙ Prenger, R.J. ∙ Kay, K.N. ...
Bayesian reconstruction of natural images from human brain activity
Neuron. 2009; 63:902-915
Yarkoni, T. ∙ Poldrack, R.A. ∙ Nichols, T.E. ...
Large-scale automated synthesis of human functional neuroimaging data
Nat. Methods. 2011; 8:665-670
Laird, A.R. ∙ Lancaster, J.L. ∙ Fox, P.T.
BrainMap: the social evolution of a human brain mapping database
Neuroinformatics. 2005; 3:65-78
Fox, P.T. ∙ Lancaster, J.L.
Opinion: Mapping context and content: the BrainMap model
Nat. Rev. Neurosci. 2002; 3:319-321
Krakauer, J.W. ∙ Ghazanfar, A.A. ∙ Gomez-Marin, A. ...
Neuroscience Needs Behavior: Correcting a Reductionist Bias
Neuron. 2017; 93:480-490
Pessoa, L.
On the relationship between emotion and cognition
Nat. Rev. Neurosci. 2008; 9:148-158
Van Overwalle, F.
A dissociation between social mentalizing and general reasoning
Neuroimage. 2011; 54:1589-1599
Bzdok, D. ∙ Schilbach, L. ∙ Vogeley, K. ...
Parsing the neural correlates of moral cognition: ALE meta-analysis on morality, theory of mind, and empathy
Brain Struct. Funct. 2012; 217:783-796
Dohmatob, E. ∙ Dumas, G. ∙ Bzdok, D.
Dark control: The default mode network as a reinforcement learning agent
Hum. Brain Mapp. 2020; 41:3318-3341
Spreng, R.N. ∙ Mar, R.A. ∙ Kim, A.S.N.
The common neural basis of autobiographical memory, prospection, navigation, theory of mind, and the default mode: a quantitative meta-analysis
J. Cogn. Neurosci. 2009; 21:489-510
György Buzsáki, M.
The brain from inside out
Oxford University Press, 2019
Poldrack, R.A.
Can cognitive processes be inferred from neuroimaging data?
Trends Cogn. Sci. 2006; 10:59-63
Laird, A.R. ∙ Fox, P.M. ∙ Eickhoff, S.B. ...
Behavioral interpretations of intrinsic connectivity networks
J. Cogn. Neurosci. 2011; 23:4022-4037
Mesulam, M.M.
From sensation to cognition
Brain. 1998; 121:1013-1052
Voytek, B.
The data science future of neuroscience theory
Nat. Methods. 2022; 19:1349-1350
Brainstorm Consortium ∙ Anttila, V. ∙ Bulik-Sullivan, B. ...
Analysis of shared heritability in common disorders of the brain
Science. 2018; 360, eaap8757
Beam, E. ∙ Potts, C. ∙ Poldrack, R.A. ...
A data-driven framework for mapping domains of human neurobiology
Nat. Neurosci. 2021; 24:1733-1744
Wittgenstein, L.
Philosophical Investigations
Basil Blackwell, 1958
Naisbitt, J.
Megatrends: ten new directions transforming our lives
Warner Books, 1988
Dziri, N. ∙ Milton, S. ∙ Yu, M. ...
On the origin of hallucinations in conversational models: Is it the datasets or the models?
Preprint at arXiv. 2022
Strubell, E. ∙ Ganesh, A. ∙ McCallum, A.
Energy and policy considerations for deep learning in NLP
Preprint at arXiv. 2019
Nadeem, M. ∙ Bethke, A. ∙ Reddy, S.
StereoSet: Measuring stereotypical bias in pretrained language models
Preprint at arXiv. 2020
Liu, F. ∙ Bugliarello, E. ∙ Ponti, E.M. ...
Visually grounded reasoning across languages and cultures
Preprint at arXiv. 2021
