Snapshot of https://www.chicagobooth.edu/review/turning-weak-signals-into-strong-predictions, saved on 2025-1-20 12:16.

In fields including computer science and data science, it is common practice in prediction tasks such as customer churn or image recognition to focus on the variables with the highest predictive power. This often involves identifying a few “strong” signals (such as user engagement metrics for churn prediction or edge detection features in image recognition) while discarding “weak” variables that contribute less to overall model accuracy.

But making accurate predictions in financial markets is notoriously challenging because the most easily exploitable opportunities for abnormal returns (alpha) have already been identified and capitalized on by sophisticated investors. This leaves financial datasets with weaker, more subtle signals—such as minor price inefficiencies or anomalies in trading patterns—which offer smaller potential gains and are far more difficult to detect.

Chicago Booth PhD student Zhouyu Shen and Booth’s Dacheng Xiu suggest that these weak signals provide an important opportunity and that discovering how to best make use of them has become critical for anyone looking to improve predictive accuracy. A commonly used prediction method can struggle with them, their research finds—while an older, less-used model outperformed in their tests.

Weak signals are prevalent in economic data. For example, changes to personal income, the unemployment rate, or corporate bond spreads may not seem relevant to someone trying to predict a move in industrial production. But such data could be helpful in combination, the researchers explain. After all, personal income changes are tied to consumer demand. Corporate bond spreads signal shifts in business borrowing costs. The unemployment rate provides a read on labor dynamics. Together, these variables could start to paint a more comprehensive picture of the factors influencing industrial production.

A prediction model that works for strong signals might not necessarily work for a data set full of subtle signals, however. In this case, which machine learning models can best capture faint patterns in high-dimensional data sets (those with a lot of variables)?


The common approach of focusing on strong signals and eliminating most weak signals to build predictive models has an advantage: It helps avoid overfitting, which occurs when a model becomes too tailored to its training data and loses the crucial ability to generalize to new, unseen data. However, when signals are weak, this selective process can lead to errors, undermining the benefits of a parsimonious (essentially simple) model by potentially excluding subtle yet valuable information or relying on incorrectly chosen signals.

To discover which ML methods remain effective at making use of subtle signals, the researchers employed an approach that combined theoretical work, simulations, and empirical analysis.

Regression is a popular technique for economic and financial forecasting, especially the least absolute shrinkage and selection operator (LASSO) model, which automatically weeds out weaker variables. Shen and Xiu compared LASSO with Ridge regression, an older method that has become somewhat out of fashion. They then extended their analysis to include tree-based ML models (random forest and gradient-boosted regression trees) and neural networks.
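
For readers who want the two penalties side by side, the textbook objectives can be written as below. This is the standard formulation rather than notation taken from Shen and Xiu's paper: the symbols y (outcomes), X (predictor matrix), β (coefficients), and λ (penalty strength) are assumptions introduced here for illustration.

```latex
\hat{\beta}^{\text{ridge}} = \arg\min_{\beta}\; \lVert y - X\beta \rVert_2^2 + \lambda \lVert \beta \rVert_2^2
\qquad
\hat{\beta}^{\text{lasso}} = \arg\min_{\beta}\; \tfrac{1}{2}\lVert y - X\beta \rVert_2^2 + \lambda \lVert \beta \rVert_1
```

The squared penalty in Ridge shrinks every coefficient toward zero without removing any of them, while the absolute-value penalty in LASSO can set coefficients exactly to zero, which is what "weeding out" a variable means here.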

LASSO works well when there is a mix of strong and weak signals, but it struggles with data sets that consist mostly of faint signals, as is often the case in economics and finance. In fact, the researchers find that its performance can be worse than ignoring the signals altogether. Ridge regression, on the other hand, tends to do a better job of leveraging the cumulative power of less prominent signals, according to the research.

To validate their theoretical findings, the researchers performed simulations and empirical analyses that applied the methods to six real-world datasets from finance, macroeconomics, and microeconomics. These included datasets used to predict equity returns (for both individual stocks and the broader market), forecast industrial production growth and global economic growth, and analyze crime rates and pro-plaintiff decisions.

Ridge regression consistently provided predictions with higher accuracy than LASSO in data sets dominated by weak signals. This suggests Ridge regression is a more reliable tool for economic and financial prediction in these scenarios, the researchers write. Ridge keeps all variables in the model but ensures that less relevant details don’t dominate the prediction, whereas LASSO eliminates the less impactful variables altogether. This resulted in LASSO missing the subtle yet collectively significant weak signals.
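
The contrast between keeping every variable and eliminating some can be tried out on simulated data. The following is a minimal sketch, not the researchers' code or experimental design: the data-generating process (100 predictors, each with a small coefficient), the penalty grids, and all function names are hypothetical choices made for illustration. Ridge is fitted in closed form and LASSO by plain coordinate descent, using only NumPy.

```python
import numpy as np


def ridge_fit(X, y, lam):
    """Closed-form Ridge: solve (X'X + lam * I) b = X'y. Keeps every variable."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)


def lasso_fit(X, y, lam, n_iter=100):
    """Coordinate descent for 0.5 * ||y - X b||^2 + lam * ||b||_1."""
    p = X.shape[1]
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)
    resid = y.astype(float)  # equals y - X @ beta while beta is all zeros
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of column j with the residual that excludes column j.
            rho = X[:, j] @ resid + col_sq[j] * beta[j]
            # Soft-thresholding: small coefficients are set exactly to zero.
            new = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            resid += X[:, j] * (beta[j] - new)
            beta[j] = new
    return beta


def mse(X, y, beta):
    """Mean squared prediction error of the linear predictor X @ beta."""
    return float(np.mean((y - X @ beta) ** 2))


def run_experiment(seed=0, n_train=200, n_val=200, n_test=1000, p=100):
    """Simulate an all-weak-signal data set (an assumption) and compare test MSE."""
    rng = np.random.default_rng(seed)
    beta_true = rng.normal(0.0, 0.1, size=p)  # hypothetical: every signal is weak

    def draw(n):
        X = rng.standard_normal((n, p))
        return X, X @ beta_true + rng.standard_normal(n)

    (Xtr, ytr), (Xva, yva), (Xte, yte) = draw(n_train), draw(n_val), draw(n_test)

    # Illustrative penalty grids; the best value is picked on the validation set.
    lam_max = np.max(np.abs(Xtr.T @ ytr))
    ridge_grid = [1.0, 10.0, 100.0, 1000.0]
    lasso_grid = [f * lam_max for f in (0.02, 0.05, 0.1, 0.2, 0.5, 1.0)]

    ridge_b = min((ridge_fit(Xtr, ytr, lam) for lam in ridge_grid),
                  key=lambda b: mse(Xva, yva, b))
    lasso_b = min((lasso_fit(Xtr, ytr, lam) for lam in lasso_grid),
                  key=lambda b: mse(Xva, yva, b))

    return {
        "ridge": mse(Xte, yte, ridge_b),
        "lasso": mse(Xte, yte, lasso_b),
        "zero": mse(Xte, yte, np.zeros(p)),  # benchmark that ignores all signals
        "lasso_nonzero": int(np.count_nonzero(lasso_b)),
    }


if __name__ == "__main__":
    for name, value in run_experiment().items():
        print(f"{name}: {value}")
```

The "zero" entry is the benchmark of ignoring the signals altogether, so comparing it with the other two test errors mirrors the comparison described in the article. The numbers depend on the seed, the sample sizes, and the penalty grids, so a single run illustrates the mechanism and is not evidence for the paper's results.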

The researchers’ findings highlight that in scenarios where all signals are weak, Ridge regression delivers more accurate predictions than models such as LASSO that are focused on pruning datasets down to only the strongest signals.

Random forest was the better of the tree-based methods when signals were weak, outperforming gradient-boosted regression trees. Neural networks, which avoid overfitting by applying certain penalties, performed better when these penalties prevented any single part of the model from having too much influence. This approach worked more effectively than methods such as LASSO, which use penalties to eliminate the influence of many model components entirely.

The research suggests that in a landscape where the obvious signals have been fully exploited, the real advantage lies in uncovering and utilizing the subtle, often overlooked patterns within the data. Shen and Xiu’s work finds that by embracing weak signals, researchers and practitioners alike can gain a more nuanced and comprehensive understanding of economic dynamics. Finding the appropriate ML method for a dataset is a gateway to recognizing the hidden value within seemingly inconsequential data points.
