
Unbox the Black-Box: Predict and Interpret YouTube Viewership Using Deep Learning

Jiaheng Xie, Yidong Chai, and Xiao Liu
Department of Accounting & MIS, Lerner College of Business & Economics, University of Delaware, Newark, DE, USA; Hefei University of Technology, Anhui, P.R. China; Department of Information Systems, Arizona State University, Tempe, AZ, USA

Abstract

As video-sharing sites emerge as a critical part of the social media landscape, video viewership prediction becomes essential for content creators and businesses to optimize influence and marketing outreach with minimum budgets. Although deep learning champions viewership prediction, it lacks interpretability, which is required by regulators and is fundamental to the prioritization of the video production process and to promoting trust in algorithms. Existing interpretable predictive models face the challenges of imprecise interpretation and negligence of unstructured data. Following the design-science paradigm, we propose a novel Precise Wide-and-Deep Learning (PrecWD) model to accurately predict viewership from unstructured video data and well-established features while precisely interpreting feature effects. PrecWD's prediction outperforms benchmarks in two case studies and achieves superior interpretability in two user studies. We contribute to the IS knowledge base by enabling precise interpretability in video-based predictive analytics and contribute a nascent design theory with generalizable model design principles. Our system is deployable to improve video-based social media presence.

KEYWORDS

Design science; deep learning; video prediction; analytics interpretability; unstructured data

Introduction

Social media is taking up a greater share of consumers' attention and time spent online, among which video-sharing sites, such as YouTube and Vimeo, are quickly claiming the crown. YouTube alone hosts over 2.6 billion active users and is projected to surpass text- and image-based social media platforms, such as Facebook and Instagram [53]. The soaring popularity of content in video format makes video-sharing sites an effective channel to disseminate information and share knowledge.
Content consumption on these social media platforms has been a phenomenon of interest in information systems (IS) and marketing research. Prior work has investigated the impact of digital content on improving sales [18] and boosting awareness of a brand or a product [22]. They also examined the factors that may increase consumption [35] and offered some insights for the design of digital contents [6]. These studies acknowledge that digital content consumption and its popularity are understudied [35]. Our study approaches this domain with an interpretable predictive analytics lens: viewership prediction. Viewership is the metric video-sharing sites use to pay their content creators, defined as the average daily views of a video. While viewership prediction offers immense implications, interpretation elevates such value: evaluating a learned model, prioritizing features, and building trust with domain experts and end users. Therefore, we propose an interpretable machine learning (ML) model to predict video viewership (narrative-based long-form videos) and interpret the predictors.

CONTACT Yidong Chai, chaiyd@hfut.edu.cn, Hefei University of Technology, P.O. 22, HFUT, 193 Tunxi Road, Hefei, Anhui, P.R. China, 230009.
Supplemental data for this article can be accessed online at https://doi.org/10.1080/07421222.2023.2196780
© 2023 Taylor & Francis Group, LLC
Murdoch et al. [49] show that predictive accuracy, descriptive accuracy, and relevancy form the three pillars of an interpretable ML model. Predictive accuracy measures the model's prediction quality. Descriptive accuracy assesses how well the model describes the data relationships learned by the prediction, or interpretation quality. Relevancy is defined as whether the model provides insight for the domain problem. In this study, both predictive and descriptive accuracy have relevancy to content creators, sponsors, and platforms.
For content creators, the high predictive accuracy of viewership improves the allocation of promotional funds. If the predicted viewership exceeds expectations, the promotional funds can be distributed to less popular videos where content enrichment is insufficient to gain organic views. Meanwhile, high predictive accuracy facilitates trustworthy interpretation, which offers effective actions for content creators. Video production requires novel shooting skills and sophisticated communication mindsets for elaborate audio-visual storytelling, which average content creators lack. The interpretation guides them through how to prioritize the customizable video features. For sponsors, before their sponsored video is published, high predictive accuracy enables them to estimate the return (viewership) relative to the sponsorship cost. If the return-cost ratio is unsatisfactory, the sponsors could request content enrichment. For platforms, viewership prediction helps control the influence of violative videos. Limited by time and resources, the current removal measures are far from sufficient, resulting in numerous high-profile violative videos infiltrating the public. YouTube projects the percentage of total views from violative videos as a measure of content quality [50]. To minimize the influence of violative videos, YouTube could rely on viewership prediction to prioritize the screening of potentially popular videos, among which violative videos can be banned before they reach the public.
Interpretable ML models can be broadly categorized as post hoc and model-based [49]. The general principle of these interpretable methods is to estimate the total effect, defined as the change of the outcome when a feature increases by one unit. Post hoc methods explain a black-box prediction model using a separate explanation model, such as Shapley Additive Explanations (SHAP) and Local Interpretable Model-agnostic Explanations (LIME) [43, 52]. However, these standalone explanation models could alter the total effect of the prediction model, since they possess different specifications [43]. Model-based methods address such a limitation by directly interpreting the same prediction model. The cutting-edge model-based interpretable methods include the Generalized Additive Model framework (GAM) and the Wide and Deep Learning framework (W&D). GAM is unable to model higher-order feature interactions and caters to small feature sets, limiting its applicability in this study. Addressing that, W&D combines a deep learning model with a linear model [13]. However, a few limitations persist for W&Ds. From the prediction perspective, W&Ds are restricted to structured data, constrained by the linear component. In video analytics, only using structured data hampers predictive accuracy. From the interpretation perspective, W&Ds fall short in producing the precise total effect, defined as the precise change of the prediction when a feature increases by one unit. They use the weights of the linear component (main effect) to approximate the total effect of the input on the prediction, even though the main effect and total effect largely differ.
To address the limitations of the existing interpretable methods, we propose a novel model-based interpretable model that learns from both structured features and unstructured video data and produces a precise interpretation, named Precise Wide-and-Deep Learning (PrecWD). This work contributes to data analytics methodology and IS design theory. First, we develop PrecWD that innovatively extends the W&D framework to provide a precise interpretation and perform unstructured data analysis. As our core contribution, the proposed interpretation component can precisely capture the total effect of each feature. A generative adversarial network learns the data distribution and facilitates such an interpretation process. Because of our interpretation component, we are able to add the unstructured component to extend the W&D framework to unstructured data analytics. Empirical evaluations from two case studies indicate that PrecWD outperforms black-box and interpretable models in viewership prediction. We design two user studies to validate the contribution of our precise interpretation component, which indicates PrecWD can provide better interpretability than state-of-the-art interpretable methods. Our feature interpretation results in improved trust and usefulness of the model.
Second, for design science information systems (IS) research, the successful model design offers indispensable design principles for model development: 1) Our interpretation as well as the user studies suggest generative models can assist the interpretation of predictive models; 2) Our ablation studies indicate raw unstructured data can complement crafted features in prediction. These design principles along with our model provide a "nascent design theory" that is generalizable to other problem domains. Our new interpretation component can be leveraged by other IS studies to provide precise interpretation for prediction tasks. The user studies can be adopted by IS research to evaluate interpretable ML methods. PrecWD is also a deployable information system for video-sharing sites. It is capable of predicting potentially popular videos and interpreting the factors. Video-sharing sites could leverage this system to actively monitor the predictors and manage viewership and content quality.

Literature Review

Video viewership prediction

This study focuses on YouTube, as it is the most successful video-sharing site since its establishment in 2005 and constitutes the largest share of Internet traffic [42]. We do not use short-form videos (e.g., TikTok and Reels), because most of them use trending songs as the background sound without narratives. Those background songs are templates provided by the platform and are irrelevant to the video content. The sound of most YouTube videos is the direct description of the video content, for which we designed many narrative-related features, such as sentiment and readability, that are relevant to video production. Viewership is essential for content creators, as it is the key metric used by video-sharing sites to pay them [42]. For the platforms, their user-generated content is constantly bombarded with violative videos, ranging from pornography, copyrighted material, and violent extremism to misinformation. YouTube has recently developed an AI system to prevent violative videos from spreading. This AI system's effectiveness in finding rule-breaking videos is evaluated with a metric called the violative view rate, which is the percentage of views coming from violative videos [50]. This disclosure shows that video viewership is a vital measure for YouTube to track popular videos, so that credible videos can be approved and violative videos can be banned from publication.
Recognizing the significance of video viewership prediction, prior studies have developed conventional ML and deep learning models [33, 39, 42, 57]. Although reaching sufficient predictive power, these models fail to provide actionable insights for business decision-making due to the lack of interpretability. Studies show that decision-makers exhibit an inherent distrust of automated predictive models, even if they are more accurate than humans [46, 47]. Interpretability can increase trust in predictions, expose hidden biases, and reduce vulnerability to adversarial attacks, thus a much-needed milestone for fully harnessing the power of ML in decision-making.

Interpretability definition and value proposition

The definition of interpretability varies based on the domain of interest [37]. Two main definitions exist in the area of business analytics, predictive analytics, and social media analytics. One definition is the degree to which a user can trust and understand the cause of a decision [46]. The interpretability of a model is higher if it is easier for a user to trust the model and trace back why a prediction was made. Molnar [47] notes that interpretable ML should make the behavior and predictions of ML understandable and trustworthy to humans. Under the first definition, interpretability is related to how well humans trust a model. The second definition suggests "AI is interpretable to the extent that the produced interpretation is able to maximize a user's target performance" [16]. Following this definition, Lee et al. [37] use usefulness to measure ML interpretability, as useful models lead to better decision-making performance.
Interpretability brings extensive value to ML and the business world. The most significant one is social acceptance, which is required to integrate algorithms into daily lives. Heider and Simmel [30] show that people attribute beliefs and intentions to abstract objects, so they are more likely to accept ML if their decisions are interpretable. As our society is progressing toward the integration with AI, new regulations have been imposed to require verifiability, accountability, and more importantly, full transparency of algorithm decisions. A key example is the European General Data Protection Regulation (GDPR), which was enforced to provide data subjects the right to an explanation of algorithm decisions [14].
For platforms interested in improving content quality management, an interpretable viewership prediction method not only identifies patterns of popular videos but also facilitates trust and transparency in their ML systems. For content creators, it takes significant time and effort to create such content. An interpretable viewership prediction method can recommend optimal prioritization of video features in the limited time.

Interpretable machine learning methods

We develop a taxonomy of the extant interpretable methods in Table 1 based on the data types, the type of algorithms (i.e., model-based and post hoc), the scope of interpretation (i.e., instance level and model level), and how they address interpretability.
Various forms of data have been used to train and develop interpretable ML methods, including tabular [28, 61], image [26, 56], and text [5]. The scope of interpretation can be at either instance- or model-level. An interpretable ML method can be either embedded in the neural network (model-based) or applied as an external model for explanation (post hoc) [49]. Post hoc methods build on the predictions of a black-box model and add ad hoc explanations. Any interpretable ML algorithm that directly interprets the original prediction model falls into the model-based category [5, 10, 28]. For most model-based algorithms, any change in the architecture needs alteration in the method or hyperparameters of the interpretable algorithm.
Post hoc methods can be backpropagation- or perturbation-based. The backpropagation-based methods rely on gradients that are backpropagated from the output layer to the input layer [75]. Yet, the most widely used are the perturbation-based methods. These methods generate explanations by iteratively probing a trained ML model with different inputs. These perturbations can be on the feature level by replacing certain features with zeros or random counterfactual instances. SHAP is the most popular perturbation-based post hoc method. It probes feature correlations by removing features in a game-theoretic framework [43]. LIME is another common perturbation-based method [52]. For an instance and its prediction, simulated randomly sampled data around the neighborhood of the input instance are generated. An explanation model is trained on this newly created dataset of perturbed instances to explain the prediction of the black-box model. SHAP and LIME are both feature-additive methods that use an explanation model $g$ that is a linear function of binary variables: $g(z') = \phi_0 + \sum_{i=1}^{M} \phi_i z'_i$ [43]. The prediction of this explanation model $g$ matches the prediction of the original model $f$ [43]. Eventually, $\phi_i$ can explain the attribution of each feature to the prediction.
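To make the perturbation mechanism concrete, below is a minimal sketch of perturbation-based post hoc explanation using SHAP's KernelExplainer; the toy predictor f and the data arrays are hypothetical stand-ins, not the models studied in this paper.

```python
# Minimal sketch of perturbation-based post hoc explanation with SHAP.
# The toy predictor f and the data below are hypothetical stand-ins.
import numpy as np
import shap

rng = np.random.default_rng(0)
X_background = rng.normal(size=(100, 4))  # reference data used for perturbations
X_explain = rng.normal(size=(5, 4))       # instances whose predictions we explain

def f(X):
    # Stand-in for any trained black-box prediction model f(x).
    return 2.0 * X[:, 0] + np.sin(X[:, 1]) + X[:, 2] * X[:, 3]

explainer = shap.KernelExplainer(f, X_background)
phi = explainer.shap_values(X_explain)  # one additive attribution phi_i per feature

# Feature-additivity: base value + sum of attributions approximates the prediction.
print(explainer.expected_value + phi.sum(axis=1))
print(f(X_explain))
```

Note that the attributions phi come from the fitted explanation model g, not from f itself, which is exactly the gap discussed next.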
These post hoc methods have pitfalls. Laugel et al. [36] showed that post hoc models risk producing explanations that result from artifacts learned by the explanation model instead of actual knowledge from the data. This is because the most popular post hoc models, SHAP and LIME, use a separate explanation model $g$ to explain the original prediction model $f$ [43]. The model specification of $g$ is fundamentally different from that of $f$. The feature contributions $\phi_i$ from $g$ are not the actual feature effects of the prediction model $f$. Therefore, the magnitude and direction of the total effect could be misinterpreted. Slack et al. [59] revealed that SHAP and LIME are vulnerable, because they can be arbitrarily controlled. Zafar and Khan [74] reported that the random perturbation that LIME utilizes results in unstable interpretations, even in a given model specification and prediction task.
Addressing the limitations of post hoc methods, model-based interpretable methods have a self-contained interpretation component that is faithful to the prediction. The model-based methods are usually based on two frameworks: the GAM and the W&D framework. GAM's outcome variable depends linearly on smooth functions of the predictors, and the interest focuses on inference about these smooth functions. However, for prediction tasks with many features, GAMs often require millions of decision trees to provide accurate results using additive algorithms. Also, depending on the model architecture, over-regularization reduces the accuracy of GAM. Many methods have improved GAMs: GA2M was proposed to improve the accuracy while maintaining the interpretability of GAMs [10]; NAM learns a linear combination of neural networks, each of which attends to a single feature, and each feature is parametrized by a neural network [5]. These networks are trained jointly and can learn complex-shape functions. Interpreting GAMs is easy, as the impact of a feature on the prediction does not rely on other features and can be understood by visualizing its corresponding shape function. However, GAMs are constrained by the feature size, because each feature is assumed independent and trained by a standalone model. When the feature size is large and feature interactions exist, GAMs struggle to perform well [5].
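As an illustration of the NAM idea, the following is a minimal PyTorch sketch in which each feature is parametrized by its own small network and the per-feature outputs are summed; the layer sizes are illustrative assumptions, not the configuration of NAM [5].

```python
# Minimal sketch of a neural additive model (NAM-style): one small network
# per feature, summed into the prediction. Sizes are illustrative only.
import torch
import torch.nn as nn

class FeatureNet(nn.Module):
    def __init__(self, hidden=16):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(1, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, x):          # x: (batch, 1), a single feature column
        return self.net(x)

class NAM(nn.Module):
    def __init__(self, num_features):
        super().__init__()
        self.shape_fns = nn.ModuleList(FeatureNet() for _ in range(num_features))
        self.bias = nn.Parameter(torch.zeros(1))

    def forward(self, x):          # x: (batch, num_features)
        contribs = [fn(x[:, j : j + 1]) for j, fn in enumerate(self.shape_fns)]
        return torch.cat(contribs, dim=1).sum(dim=1) + self.bias

model = NAM(num_features=8)
y_hat = model(torch.randn(32, 8))  # each shape function can be plotted in isolation
```

Because each shape function sees only its own feature, the sketch also makes the limitation visible: no term in the sum can capture an interaction between two features.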
The W&D framework addresses the low- and high-order feature interactions in interpreting the importance of features [13]. W&Ds were originally proposed to improve the performance of recommender systems by combining a generalized linear model (wide component) and a deep neural network (deep component) [13]. Since the generalized linear model is interpretable, as noted in Cheng et al. [13], this model has soon been recognized or used as an interpretable model in many applications [12, 28, 62, 68]. The wide component produces a weight for each feature, defined as the main effect, that can interpret the prediction. The deep component models high-order relations in the neural network to improve predictive accuracy. Since W&D, two categories of variants have emerged. The first category improves the predictive accuracy of W&D. Burel et al. [9] leveraged a convolutional neural network (CNN) in the deep component to identify information categories in crisis-related posts. Han et al. [29] used a CRF layer to merge the wide and deep components and predict named entities of words. The second category improves the interpretability of W&D. Chai et al. [12] leveraged W&D for text mining interpretation. Guo et al. [28] proposed piecewise W&D, where multiple regularizations are introduced to the total loss function to reduce the influence of the deep component on the wide component so that the weights of the wide component are closer to the total effect. Tsang et al. [62] designed an interaction component in W&D. This method was used to interpret the statistical interactions in housing price and rental bike count predictions.
The W&D and its variants still fall short in two aspects. First, W&D uses the learned weights $\mathbf{w}$ of the wide component (the main effect) to interpret the prediction. $\mathbf{w}$ only reflects the linear component, which is only a portion of the entire prediction model. Consequently, $\mathbf{w}$ is not the total effect of the joint model. For instance, the weight $w_j$ for feature $x_j$ does not imply that if $x_j$ increases by one unit, the viewership prediction would increase by $w_j$. The real feature interpretation for $x_j$ cannot be precisely reflected in $w_j$. This imprecise interpretation also occurs in post hoc methods and GAMs, due to their interpretation mechanisms. Post hoc methods, such as SHAP and LIME, use an independent explanation model as a proxy to explain the original prediction model. This separate explanation mechanism inherently cannot directly nor precisely interpret the original prediction model. GAMs interpret each feature independently using a standalone model, and thus cannot interpret the precise feature effect when all the features are used together. The precise total effect is critical, as it affects the weight (importance) of feature effects, which determines the feature importance order and effect direction. A correct feature importance order is essential for content creators to know which feature to prioritize. Given limited time, such an order informs content creators of the prioritization of the work process. In addition, $w_j$ is constant for all values of $x_j$, which assumes the feature effect is insensitive to changes of the feature value. This assumption does not hold in real settings. For instance, when a video is only minutes long, increasing one minute would significantly impact its predicted viewership. When a video is hours long, increasing one minute does not have a visible effect on predicted viewership.
Second, unstructured data are not compatible with the W&D framework. The existing W&D framework enforces the wide component and the deep component to share inputs and be trained jointly, such that the wide component can interpret the deep component. The wide component in W&D is a linear model: $y = \mathbf{w}^\top \mathbf{x} + b$, where $\mathbf{x} = [x_1, x_2, \ldots, x_d]$ is a vector of $d$ features, including raw input features and product-transformed features [13]. The raw input features are numeric, including continuous features and vectors of categorical features [13]. The product transformation is
$$\phi_k(\mathbf{x}) = \prod_{i=1}^{d} x_i^{c_{ki}}, \quad c_{ki} \in \{0, 1\},$$
where $x_i$ is the raw numeric input feature, and $c_{ki}$ indicates whether the $i$-th feature appears in the $k$-th transformation. Both raw input features and product-transformed features are structured data. This is due to the structured nature of the linear model. It is incapable of processing the unstructured videos in this study, thus significantly limiting W&D's performance in unstructured data analytics.
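For concreteness, a small sketch of the cross-product transformation above; the binary feature matrix and the interaction pattern are hypothetical examples.

```python
# Sketch of W&D's cross-product transformation phi_k(x) = prod_i x_i^{c_ki}.
# The binary features and the interaction pattern c are hypothetical examples.
import numpy as np

X = np.array([[1, 0, 1],
              [1, 1, 0]])            # rows: instances; columns: binary features
c = np.array([[1, 0, 1],             # c[k, i] = 1 if feature i is in transformation k
              [0, 1, 1]])

# phi[n, k] = prod over i of X[n, i] ** c[k, i]; it is 1 only when
# every feature selected by transformation k co-occurs in instance n
phi = np.prod(np.power(X[:, None, :], c[None, :, :]), axis=2)
print(phi)  # [[1, 0], [0, 0]]
```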

Generative models for synthetic sampling

In order to calculate the precise total effect, we develop a novel model-based interpretation method, which we will detail in the Proposed Approach section. Learning the data distribution is critical for the precise total effect with a model-based interpretation method, which can be facilitated by synthetic sampling with generative models.
Early forms of generative models date back to Bayesian Networks and Helmholtz machines. Such models are trained via an EM algorithm using variational inference or data augmentation [19]. Bayesian Networks require the knowledge of the dependency between each feature pair, which is useful for cases with limited features that have domain knowledge. When the feature size is large, constructing feature dependencies is infeasible and leads to poor performance. Recent years have seen developments in deep generative models. The emerging approaches, including Variational Autoencoders (VAEs), diffusion models, and Generative Adversarial Networks (GANs), have led to impressive results in various applications. Unlike Bayesian Networks, deep generative models do not require the knowledge of feature dependencies.
VAEs use an encoder to compress random samples into a low-dimensional latent space and a decoder to reproduce the original sample [55]. VAEs use variational inference to generate an approximation to a posterior distribution. Diffusion models include a forward diffusion stage and a reverse diffusion stage. In the forward diffusion stage, the input data is gradually perturbed over several steps by adding Gaussian noise. In the reverse stage, a model is tasked with recovering the original input data by learning to gradually reverse the diffusion process, step by step [15]. GANs are a powerful class of deep generative models consisting of two networks: a generator and a discriminator. These two networks form a contest where the generator produces high-quality synthetic data to fool the discriminator, and the discriminator distinguishes the generator's output from the real data. The deep learning literature suggests that the generator can learn the precise real data distribution as long as those two networks are sufficiently powerful. The resulting model is a generator that can closely approximate the real distribution.
Although VAEs, diffusion models, and GANs are all emergent generative models, GANs are preferred in this study. VAEs optimize a lower variational bound, whereas GANs have no such assumption. In fact, GANs do not deal with any explicit probability density estimation. The requirement of VAEs to learn explicit density estimation hinders their ability to learn the true posterior distribution. Consequently, GANs yield better results than VAEs [7]. Diffusion models are mostly tested in computer vision tasks, and their effectiveness in other contexts lacks strong evidence. Besides, diffusion models suffer from long sampling steps and slow sampling speed, which limits their practicality [15]. Without careful refinement, diffusion models may also introduce inductive bias, which is undesired compared to GANs.

Proposed approach

Problem formulation

Let $\mathcal{V} = \{v_1, \ldots, v_n\}$ denote a set of unstructured raw videos. Let $\{x_1, \ldots, x_m\}$ denote the structured video features. The feature values of a video $i$ are represented by $\mathbf{x}_i = [x_{i1}, \ldots, x_{im}]$. The input to our model is $\{\mathbf{x}_i, v_i\}_{i=1}^{n}$. Viewership is operationalized as the average daily views (ADV) of a video, computed as the view counts to date divided by the video age in days. A longitudinal study about YouTube viewership shows that the average daily views are stable in the long term [71]. In other words, "older videos are not forgotten as they get older" [71]. Many other studies have also used daily views as a variable of interest [23, 34, 71]. The intended practical use of our method is also to predict long-term viewership. For this purpose, ADV is an appropriate measure. The ADV of video $i$ is denoted as $y_i$. Our objective is to learn a model $f$ to predict $\hat{y}_i = f(\mathbf{x}_i, v_i)$, and to interpret the precise total effect of each feature $x_j$ on the output $\hat{y}$ in the given model and feature setting.
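As a concrete illustration of the outcome variable, the following minimal sketch computes ADV from view counts and publication dates; the column names and values are hypothetical.

```python
# Minimal sketch of the outcome variable: average daily views (ADV),
# i.e., views to date divided by video age in days. Data are hypothetical.
import pandas as pd

videos = pd.DataFrame({
    "video_id": ["a1", "b2"],
    "view_count": [120_000, 4_500],
    "published": pd.to_datetime(["2020-03-01", "2021-07-15"]),
})
snapshot_date = pd.Timestamp("2021-12-31")

age_days = (snapshot_date - videos["published"]).dt.days
videos["adv"] = videos["view_count"] / age_days  # y_i in the formulation above
```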

PrecWD model for video viewership prediction and interpretation

The proposed model builds upon the state-of-the-art W&D framework while addressing two challenges: 1) W&D cannot offer the precise total effect and its dynamic changes; 2) W&D can only process structured data. We propose PrecWD, consisting of the following subcomponents.

Piecewise linear component

Each feature $x_j$ captures a different aspect of a video. Within each feature, heterogeneity between different values exists. It is essential to consider the homogeneity among similar feature values and the heterogeneity across different feature values. Specifically, we need to differentiate the varied feature effects when the feature is at different values. We leverage a piecewise linear function in the linear component, which is adopted from Guo et al. [28]. For the $j$-th feature, let $a_j = \min_i x_{ij}$ and $b_j = \max_i x_{ij}$. We partition each feature into $T$ intervals: $[a_j, a_j + \delta_j), [a_j + \delta_j, a_j + 2\delta_j), \ldots, [a_j + (T-1)\delta_j, b_j]$, where $\delta_j = (b_j - a_j)/T$. The piecewise feature vector for the $i$-th data point is $\tilde{\mathbf{x}}_i$, obtained by encoding each $x_{ij}$ according to the interval it falls into. The output of this component is
$$y_i^{(1)} = \mathbf{w}^{(1)\top} \tilde{\mathbf{x}}_i + b^{(1)},$$
where $\mathbf{w}^{(1)}$ is the weight in the linear component, $b^{(1)}$ is the bias, and $y_i^{(1)}$ is the output.
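The following is a minimal sketch of one way to construct such a piecewise feature vector, using equal-width intervals and one-hot interval membership; the exact encoding in Guo et al. [28] may differ.

```python
# Sketch of a piecewise feature encoding: split each feature's range [a_j, b_j]
# into T equal-width intervals and one-hot encode interval membership.
# Guo et al. [28]'s exact encoding may differ; this is an illustrative variant.
import numpy as np

def piecewise_encode(X, T=10):
    n, m = X.shape
    a, b = X.min(axis=0), X.max(axis=0)       # a_j, b_j per feature
    delta = (b - a) / T                        # interval width delta_j
    safe = np.where(delta == 0, 1.0, delta)    # guard against constant features
    # interval index of each value, clipped so x = b_j falls in the last interval
    idx = np.clip(((X - a) // safe).astype(int), 0, T - 1)
    out = np.zeros((n, m * T))
    for j in range(m):
        out[np.arange(n), j * T + idx[:, j]] = 1.0
    return out

X_tilde = piecewise_encode(np.random.rand(100, 5), T=10)  # feeds the linear layer
```

With this encoding, the linear layer learns a separate weight per interval, which is what lets the component express different effects at different feature values.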

Attention-based second-order component

Prior studies suggest explicitly modeling feature interactions improves predictive accuracy [58]. In parallel with the piecewise linear component, we include an attention-based second-order component to model the feature interactions. The input to this component is $\mathbf{x}_i$. For each pair of video features $x_{ij}$ and $x_{ik}$, the interaction term of $x_{ij}$ and $x_{ik}$ is denoted as $x_{ij} x_{ik}$. Each interaction term has a parameter. A set of $m$ features will generate $m(m-1)/2$ interaction terms. This will cause the learnable parameters to grow quadratically as the feature size increases. To prevent such quadratic growth and optimize computational complexity, we add an attention mechanism in the second-order component where the number of interaction parameters is fixed. The attention-based component can scale to a large number of interactions while salient interaction terms still stand out. The attention mechanism assigns a score $\alpha_{jk}$ to each interaction term $x_{ij} x_{ik}$:
$$\alpha_{jk} = \frac{\exp\big(\mathbf{h}^\top \mathrm{ReLU}(\mathbf{W}\, x_{ij} x_{ik} + \mathbf{b})\big)}{\sum_{(j', k')} \exp\big(\mathbf{h}^\top \mathrm{ReLU}(\mathbf{W}\, x_{ij'} x_{ik'} + \mathbf{b})\big)},$$
where $\mathbf{W}$, $\mathbf{b}$, and $\mathbf{h}$ are learnable parameters that are shared for all $(j, k)$. The attention score $\alpha_{jk}$ is used to weigh the interaction terms. The output is as follows, where $w^{(2)}$ is a learnable weight and the summation is the weighted sum of salient interaction terms:
$$y_i^{(2)} = w^{(2)} \sum_{j < k} \alpha_{jk}\, x_{ij} x_{ik}.$$
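A PyTorch sketch of this attention-weighted second-order idea is given below; the exact parameterization of PrecWD's attention may differ, and the dimensions are illustrative.

```python
# Sketch of an attention-weighted second-order component: score every pairwise
# interaction x_j * x_k with shared parameters, softmax the scores, and sum.
# The exact parameterization in PrecWD may differ; this is an illustrative variant.
import torch
import torch.nn as nn

class AttentiveSecondOrder(nn.Module):
    def __init__(self, attn_dim=8):
        super().__init__()
        self.W = nn.Linear(1, attn_dim)            # shared across all (j, k) pairs
        self.h = nn.Linear(attn_dim, 1, bias=False)
        self.w_out = nn.Parameter(torch.ones(1))

    def forward(self, x):                          # x: (batch, m)
        m = x.shape[1]
        j, k = torch.triu_indices(m, m, offset=1)  # all pairs with j < k
        inter = (x[:, j] * x[:, k]).unsqueeze(-1)  # (batch, P, 1) interaction terms
        scores = self.h(torch.relu(self.W(inter))) # (batch, P, 1) shared scoring
        alpha = torch.softmax(scores, dim=1)       # attention over interactions
        return self.w_out * (alpha * inter).sum(dim=(1, 2))  # scalar y^(2) per row

y2 = AttentiveSecondOrder()(torch.randn(32, 6))
```

Because W, b, and h are shared across pairs, the parameter count stays fixed no matter how many of the m(m-1)/2 interactions exist.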

Nonlinear higher-order component

The third component is a deep neural network that captures higher-order effects. The number of hidden layers is determined using a grid search in the empirical analyses. The purpose of the higher-order component is to leverage deep learning to improve predictive accuracy. Without loss of generality, for the $i$-th video, each hidden layer computes:
$$\mathbf{a}^{(l+1)} = \sigma\big(\mathbf{W}^{(l)} \mathbf{a}^{(l)} + \mathbf{b}^{(l)}\big),$$
where $l$ is the layer number, $\sigma$ is the ReLU, and $\mathbf{a}^{(l)}$, $\mathbf{b}^{(l)}$, and $\mathbf{W}^{(l)}$ are the input, bias, and weight at the $l$-th layer. Therefore, the input of the first layer is the feature vector (i.e., $\mathbf{a}^{(1)} = \mathbf{x}_i$). The output of this component is given by
$$y_i^{(3)} = \mathbf{w}^{(3)\top} \mathbf{a}^{(L)},$$
where $\mathbf{w}^{(3)}$ is the learnable weight and $L$ is the number of layers.

Unstructured component

The W&D enforces the wide component and deep component to share inputs so that the wide component can interpret the deep component. Since the wide component can only analyze structured data, the W&D is also restricted to structured data. However, videos are unstructured by nature. Obtaining high predictive accuracy in video analytics demands the capacity to process unstructured data. We relax the restraint of the subcomponents sharing inputs, because our proposed interpretation component can offer precise total effects without soliciting dependencies on the subcomponents. We extend the W&D with an unstructured component. The input to our model is $\{\mathbf{x}_i, v_i\}$. The structured and practically meaningful features are grouped in $\mathbf{x}_i$, which is fed to the previous three components. Unstructured video data are grouped in $v_i$, which is fed into the unstructured component. Two approaches can incorporate the unstructured data into our model: either use a representation learning model to learn hidden features for $v_i$ and then mix those hidden features with $\mathbf{x}_i$, or design an unstructured component that can directly process raw videos and separate the unstructured effect from the structured effect. Both approaches are viable with our model, but we opt for the latter approach because the learned hidden features of $v_i$ are not human-understandable. Therefore, $v_i$ is more helpful in improving prediction accuracy rather than interpretability. Separating it from $\mathbf{x}_i$ ensures that $\mathbf{x}_i$ contains only the carefully designed, understandable, and practically meaningful video features. When using the wide component to process $\mathbf{x}_i$, it will be very clear to see the main effect, which can be compared with our total effect. We devise a hybrid VGG-LSTM architecture for processing videos, shown in Equations 8-11. A VGG-16 architecture is designed to process the video frames, and an LSTM layer is added on top for frame-by-frame sequence processing. The last LSTM cell summarizes the video information for the $i$-th video. Let $\mathbf{z}_t = \mathrm{VGG}(v_{i,t})$ denote the VGG-16 features of the $t$-th frame; then
$$\mathbf{f}_t, \mathbf{i}_t, \mathbf{o}_t = \sigma\big(\mathbf{W}_{\{f,i,o\}}[\mathbf{h}_{t-1}, \mathbf{z}_t] + \mathbf{b}_{\{f,i,o\}}\big), \qquad (8)$$
$$\mathbf{c}_t = \mathbf{f}_t \odot \mathbf{c}_{t-1} + \mathbf{i}_t \odot \tanh\big(\mathbf{W}_c[\mathbf{h}_{t-1}, \mathbf{z}_t] + \mathbf{b}_c\big), \qquad (9)$$
$$\mathbf{h}_t = \mathbf{o}_t \odot \tanh(\mathbf{c}_t), \qquad (10)$$
$$y_i^{(4)} = \mathbf{w}^{(4)\top} \mathbf{h}_T, \qquad (11)$$
where $\mathbf{f}_t$, $\mathbf{i}_t$, and $\mathbf{o}_t$ are the gates, $\mathbf{h}_t$ is the hidden state, and $y_i^{(4)}$ is the output.
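The following is a compact PyTorch sketch of the hybrid VGG-LSTM idea: per-frame VGG-16 features are fed to an LSTM, and the last hidden state summarizes the video; the layer sizes are illustrative assumptions rather than PrecWD's exact configuration.

```python
# Sketch of the hybrid VGG-LSTM unstructured component: VGG-16 features per
# frame, an LSTM over the frame sequence, and the last hidden state as the
# video summary. Layer sizes are illustrative, not PrecWD's exact configuration.
import torch
import torch.nn as nn
from torchvision.models import vgg16

class VideoEncoder(nn.Module):
    def __init__(self, hidden=128):
        super().__init__()
        self.cnn = vgg16(weights=None).features    # frame-level convolutional features
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.lstm = nn.LSTM(input_size=512, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)           # unstructured effect y^(4)

    def forward(self, frames):                     # frames: (batch, T, 3, H, W)
        b, t = frames.shape[:2]
        z = self.pool(self.cnn(frames.flatten(0, 1))).flatten(1)  # (b*T, 512)
        _, (h_T, _) = self.lstm(z.view(b, t, -1))  # last LSTM cell summarizes video
        return self.head(h_T[-1]).squeeze(-1)

y4 = VideoEncoder()(torch.randn(2, 8, 3, 224, 224))
```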

Precise interpretation component

Our core contribution lies in the precise interpretation component. Our primary focus is to offer the precise total effect for viewership prediction. PrecWD predicts the outcome variable using
$$\hat{y}_i = \mathrm{ReLU}\big(y_i^{(1)} + y_i^{(2)} + y_i^{(3)} + y_i^{(4)}\big), \qquad (12)$$
where $y_i^{(1)}$ denotes the main effect, $y_i^{(2)}$ denotes the second-order effect, $y_i^{(3)}$ denotes the higher-order effect, and $y_i^{(4)}$ denotes the unstructured effect. We use ReLU because viewership is non-negative. Taking feature $x_j$ as the illustration example, the existing W&Ds would use $w_j$ to approximate the effect of $x_j$ on the prediction, which is different from the actual total effect of $x_j$ that equals the change of $\hat{y}$ when $x_j$ increases by one unit. In order to model the precise total effect and its dynamic changes, we predict the total effect of each feature at every value. The precise total effect of $x_j$ when $x_j = t$, holding the remaining features $\mathbf{x}_{-j}$ fixed, is
$$\Delta_j(t, \mathbf{x}_{-j}) = f(x_j = t + 1, \mathbf{x}_{-j}) - f(x_j = t, \mathbf{x}_{-j}). \qquad (13)$$
$\hat{y}$ is non-negative. Therefore, the precise total effect of $x_j$ over the distribution of the remaining features is
$$\mathrm{TE}_j(t) = \int \Delta_j(t, \mathbf{x}_{-j})\, p(\mathbf{x}_{-j})\, d\mathbf{x}_{-j}. \qquad (14)$$
Equation 14 is intractable because of the integral computation. In order to facilitate such computation, we utilize the Monte Carlo method. Equation 14 can be transformed to:
$$\mathrm{TE}_j(t) \approx \frac{1}{S} \sum_{s=1}^{S} \Delta_j\big(t, \mathbf{x}_{-j}^{(s)}\big), \qquad (15)$$
where $\mathbf{x}_{-j}^{(s)}$ denotes the $s$-th sample drawn from the distribution $p(\mathbf{x}_{-j})$. In order to compute the precise total effect of $x_j$, it is necessary to learn the distribution $p(\mathbf{x}_{-j})$ so that samples can be drawn from it. Observed data points are very sparse in the Euclidean space when using the Monte Carlo method. In order to learn a smooth and accurate distribution, we embody a generative adversarial network (GAN) to learn $p(\mathbf{x}_{-j})$. To overcome the instability issues of GANs, we leverage the Wasserstein GAN with gradient penalty (WGAN-GP). We cohesively embed WGAN-GP in our model. The learning loss of the discriminator in our proposed method is given by
$$L_D = \mathbb{E}_{\tilde{\mathbf{x}} \sim \mathbb{P}_g}\big[D(\tilde{\mathbf{x}})\big] - \mathbb{E}_{\mathbf{x} \sim \mathbb{P}_r}\big[D(\mathbf{x})\big] + \lambda\, \mathbb{E}_{\hat{\mathbf{x}} \sim \mathbb{P}_{\hat{\mathbf{x}}}}\Big[\big(\|\nabla_{\hat{\mathbf{x}}} D(\hat{\mathbf{x}})\|_2 - 1\big)^2\Big], \qquad (16)$$
where $D(\cdot)$ is a score that measures the quality of the input sample, $\mathbb{P}_r$ is the real distribution, and $\mathbb{P}_g$ is the learned distribution by the generator. $\hat{\mathbf{x}}$ is sampled uniformly along the straight lines between pairs of points sampled from $\mathbb{P}_r$ and $\mathbb{P}_g$. The distribution of $\hat{\mathbf{x}}$ is denoted as $\mathbb{P}_{\hat{\mathbf{x}}}$. $\big(\|\nabla_{\hat{\mathbf{x}}} D(\hat{\mathbf{x}})\|_2 - 1\big)^2$ is the gradient penalty. $\lambda$ is a positive scalar to control the degree of the penalty. The loss of the generator is:
$$L_G = -\mathbb{E}_{\tilde{\mathbf{x}} \sim \mathbb{P}_g}\big[D(\tilde{\mathbf{x}})\big]. \qquad (17)$$
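As a minimal sketch, Equations 16-17 can be implemented as follows; the network architectures and the penalty weight lam = 10 are illustrative defaults, not PrecWD's settings.

```python
# Sketch of the WGAN-GP losses (Equations 16-17). Architectures and the
# penalty weight lam = 10 are illustrative defaults, not PrecWD's settings.
import torch
import torch.nn as nn

m, noise_dim, lam = 20, 32, 10.0
G = nn.Sequential(nn.Linear(noise_dim, 64), nn.ReLU(), nn.Linear(64, m))
D = nn.Sequential(nn.Linear(m, 64), nn.ReLU(), nn.Linear(64, 1))  # no sigmoid

x_real = torch.randn(128, m)                      # stand-in for real feature vectors
x_fake = G(torch.randn(128, noise_dim))

# Gradient penalty on interpolates between real and generated points
eps = torch.rand(128, 1)
x_hat = (eps * x_real + (1 - eps) * x_fake.detach()).requires_grad_(True)
grad = torch.autograd.grad(D(x_hat).sum(), x_hat, create_graph=True)[0]
gp = ((grad.norm(2, dim=1) - 1) ** 2).mean()

d_loss = D(x_fake.detach()).mean() - D(x_real).mean() + lam * gp   # Equation 16
g_loss = -D(x_fake).mean()                                          # Equation 17
```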
The trained generator can closely approximate the real distribution $\mathbb{P}_r$, which is fed to Equations 12-17 to yield the precise total effects. Improving upon W&Ds, our approach corrects the feature effect from the main effect $w_j$ to the total effect $\mathrm{TE}_j(t)$. Such a difference is reflected in different feature rankings and weights, whose material impact on the interpretation is shown in the empirical analyses. Online supplementary appendix 2 shows the PrecWD algorithm. Figure 1 shows its architecture.

Figure 1. PrecWD architecture.
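Tying Equations 13-15 together, the following sketch estimates the total effect TE_j(t) by Monte Carlo once a generator is trained; predict and sample_rest are hypothetical stand-ins for PrecWD's prediction function and its trained WGAN-GP generator.

```python
# Sketch of the Monte Carlo total-effect estimate (Equation 15): average the
# change in prediction when feature j moves from t to t + 1, over samples of
# the remaining features. `predict` and `sample_rest` are hypothetical
# stand-ins for PrecWD's prediction function and its trained WGAN-GP generator.
import numpy as np

def total_effect(predict, sample_rest, j, t, n_samples=1000):
    rest = sample_rest(n_samples)            # (S, m) samples ~ learned distribution
    x_hi, x_lo = rest.copy(), rest.copy()
    x_hi[:, j] = t + 1.0                     # feature j set to t + 1
    x_lo[:, j] = t                           # feature j set to t
    return np.mean(predict(x_hi) - predict(x_lo))

# Example with toy stand-ins for the prediction model and the generator:
rng = np.random.default_rng(0)
te = total_effect(lambda X: np.maximum(X.sum(axis=1), 0),   # toy non-negative model
                  lambda s: rng.normal(size=(s, 5)), j=2, t=0.5)
```

Evaluating total_effect over a grid of t values traces out the dynamic total effect of feature j.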

Novelty of PrecWD

PrecWD has two original and novel elements: 1) The previous subsection proposes a novel interpretation component that differentiates it from W&D. W&D approximates the total effect using the main effect, while our interpretation component is able to offer a precise total effect for the prediction using Equations 12-17. In order to capture the dynamic total effect of each feature, our model predicts the total effect at every feature value. 2) We also design an unstructured component that extends the applicability of W&D to unstructured data analytics.

Empirical analyses

Case study 1: Health video viewership prediction

Data preparation

Due to the societal impact of healthcare and the timeliness of COVID-19, we first examine the utility of PrecWD in health video viewership prediction. We collected videos from well-known health organizations' YouTube channels, including NIH, CDC, WHO, FDA, Mayo Clinic, Harvard Medicine, Johns Hopkins Medicine, MD Anderson, and JAMA. We generated a dataset of 6,528 videos (298 GB). From the data perspective, this study falls into the category of unstructured analytics of viewership prediction, as we directly feed unstructured raw videos as well as video-based features into our model, in addition to webpage metadata. These video-based features and raw videos are directly relevant to video shooting and editing, articulated in online supplementary appendix 3. This is in contrast with most existing viewership prediction studies that only use structured webpage metadata as features, such as duration, resolution, dimension, title, and channel ID, among others [42], which lack actionable instructions for video production. Our data size is large among unstructured predictive analytics studies and significantly larger than common video-based deep-learning analytics benchmarking datasets, which range from 66 to 4,000 videos [17].
The raw videos are directly fed into PrecWD via the unstructured component. We also generate the commonly adopted video features using BRISQUE. We utilize Librosa to compute the acoustic features. In order to generate transcripts, we develop a speech recognition model based on DeepSpeech that achieves a 7.06 percent word error rate on the LibriSpeech corpus. The description, webpage, and channel features are extracted from the webpage. A description of all the features and practical actions on feature interpretation are available in online supplementary appendix 3. This case study presents the most prominent features, but our model is not confined to these features. It is a generalized precise interpretable model that can take other features as needed by the end users.
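As an illustration of the acoustic feature step, a minimal Librosa sketch follows; the file path and the chosen features are illustrative, not the full feature set in appendix 3.

```python
# Minimal sketch of acoustic feature extraction with Librosa. The file path
# and the chosen features are illustrative, not the paper's full feature set.
import librosa
import numpy as np

y, sr = librosa.load("video_audio.wav", sr=None)        # hypothetical audio track

features = {
    "tempo": float(librosa.beat.tempo(y=y, sr=sr)[0]),  # estimated tempo (BPM)
    "rms_mean": float(np.mean(librosa.feature.rms(y=y))),
    "mfcc_mean": librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).mean(axis=1),
}
```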

Evaluation of predictive accuracy

Based on the viewership prediction and interpretable ML literature, we design two groups of baselines: black-box methods (ML and deep learning) [20, 64, 66, 67, 76-78] and interpretable methods. The configurations of these baselines are reported in online supplementary appendix 4. For all the following analyses, we adopt 10-fold cross-validation, where the dataset is divided into 10 folds. Each time we use one fold for test, one fold for validation, and eight folds for training. All the performances in the empirical analysis are the average performance of 10-fold cross-validation. Our model converged, as evidenced in Table 2. Table 3 shows the prediction comparison with black-box methods.
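A sketch of one possible rotation implementing this 10-fold protocol (one fold for test, one for validation, eight for training) is shown below; the specific rotation scheme is an assumption for illustration.

```python
# Sketch of a 10-fold rotation: in each round, one fold is the test set,
# the next fold is the validation set, and the remaining eight are training.
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(6528).reshape(-1, 1)   # placeholder indices for the 6,528 videos
folds = [test for _, test in KFold(n_splits=10, shuffle=True, random_state=0).split(X)]

for r in range(10):
    test_idx = folds[r]
    val_idx = folds[(r + 1) % 10]
    train_idx = np.concatenate(
        [folds[k] for k in range(10) if k not in (r, (r + 1) % 10)]
    )
    # train on train_idx, tune on val_idx, report test_idx metrics; average over r
```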
Following recent interpretable ML studies (reviewed in online supplementary appendix 5), since this study focuses on providing a new interpretation mechanism, our goal for the predictive accuracy comparison is to be at least on par with, if not better than, the best-performing black-box benchmarks, so that our prediction is reliable and trustworthy. PrecWD achieves a lower MSE than the best ML method, k-nearest neighbors (KNN), and than the best deep learning method, LSTM-2. PrecWD remains the best when we fine-tune the benchmarks. These results suggest our prediction is reliable, even though the prediction improvement is not our primary contribution. The main downside of these black-box methods is that they cannot offer a feature-based interpretation, which is critical from the perspectives of trust, model adoption, regulatory enforcement, algorithm transparency, and practical implications and interventions for stakeholders. Extending the line of interpretability, we compare PrecWD with the state-of-the-art interpretable methods in Table 4. PrecWD also achieves a lower MSE than the best interpretable method, W&D.
Such enhanced predictive accuracy has significant practical value. Content creators can use our model to help with promotional budget allocation. According to fiverr.com and veefly.com, each additional view carries an average promotion cost. We perform a detailed benefit analysis in online supplementary appendix 6, which suggests that the average annual benefit of our prediction enhancement over the baseline runs into the billions of dollars, and the unreached viewership is reduced by up to 73 billion views. Sponsors can use our model to understand the viewership of the sponsored videos. If content creators opt to pay for promotion to meet sponsors' expectations, such cost will ultimately be transferred to the sponsorship cost, and the economic analysis is similar to that for the content creators.
We fine-tune the hyperparameters of PrecWD in Table 5 to search for the best predictive performance. The hyperparameters include the number of hidden layers and the number of neurons in each layer. We also replace the higher-order component in PrecWD with other deep neural networks, including CNN, LSTM, and BLSTM, to evaluate our design choice. The final model has 3 dense layers in the higher-order component and 16 neurons in each layer. To ensure fair comparisons, all the baseline methods in Tables 3-4 underwent the same parameter-tuning process, and we reported the final aforementioned fine-tuned results.
We further perform ablation studies to test the efficacy of the individual components of PrecWD. The upper part of Table 6 shows that removing any component of PrecWD negatively impacts the performance, suggesting optimal design choices. In order to test the effectiveness of each feature group, we remove each feature group stepwise and test its contribution to the performance. The lower part of Table 6 shows that removing any feature group will hamper the performance.
Table 3. Prediction comparison of PrecWD with black-box methods.
Method | Outcome: ADV (MSE, MSLE) | Outcome: log(total view), published_days added (MSE, MSLE) | Outcome: ADV, published_days added (MSE, MSLE)
PrecWD (Ours) | 165.442, 0.992 | 2.411, 0.031 | 165.296, 0.991
Linear regression
KNN-1
KNN-3
KNN-5
DT-MSE
DT-MAE
DT-Friedman-MSE
SVR-Linear
SVR-RBF
SVR-Poly
SVR-Sigmoid
Gaussian Process-1
Gaussian Process-3
Gaussian Process-5
MLP-1
MLP-2
MLP-3
MLP-4
CNN-1
CNN-2
CNN-3
CNN-4
LSTM-1
LSTM-2
BLSTM-1
BLSTM-2
Abbreviations: MSE, mean squared error; MSLE, mean squared log error; PrecWD, Precise Wide-and-Deep Learning; KNN, k-nearest neighbors; DT-MSE, decision tree (MSE criterion); DT-MAE, decision tree (MAE criterion); DT-Friedman-MSE, decision tree (Friedman MSE criterion); SVR, support vector regression; SVR-RBF, support vector regression with radial basis function kernel; MLP, multilayer perceptron; CNN, convolutional neural network; LSTM, long short-term memory network; BLSTM, bidirectional long short-term memory network.
Note: The details of the baseline models are reported in online supplementary appendix 4.
Table 6. Ablation studies.
Method | Outcome: ADV (MSE, MSLE) | Outcome: log(total view), published_days added (MSE, MSLE) | Outcome: ADV, published_days added (MSE, MSLE)
PrecWD | 165.442, 0.992 | 2.411, 0.031 | 165.296, 0.991
PrecWD without Unstructured Component
PrecWD without Piecewise Linear Component
PrecWD without Second-Order Component
PrecWD without High-order Component
PrecWD with Simple Linear Encoding
PrecWD with 10 Ordinal One-hot Encoding
PrecWD with 20 Ordinal One-hot Encoding | MSLE: 0.994 | | MSLE: 0.992
PrecWD with 10 Ordinal Encoding
PrecWD with 20 Ordinal Encoding
PrecWD without Attention
All Features | 165.442, 0.992 | 2.411, 0.031 | 165.296, 0.991
Without Webpage
Without Unstructured
Without Acoustic | MSLE: 1.030 | |
Without Description | MSLE: 1.018 | MSLE: 0.034 | MSLE: 1.027
Without Transcript | MSLE: 1.004 | | MSLE: 0.998
Without Channel | MSLE: 1.006 | | MSLE: 1.000
缩写:MSE,均方误差;MSLE,均方对数误差;PrecWD,精确的宽深度学习。 . .

Interpretation of PrecWD

As our core contribution, PrecWD can offer the precise total effect using the proposed interpretation component. The WGAN-GP layer in the interpretation component generates samples to learn the data distribution and thereby facilitate the computation of Equations 12-17. While the standard GAN can suffer from convergence problems, WGAN-GP resolves this issue by penalizing the norm of the gradient of the discriminator with respect to its input. Figure 2a shows that the generator loss and discriminator loss both converge in this study. Figure 2b shows that the discriminator validation loss shrinks together with the training loss and both converge, suggesting the discriminator does not overfit in this training process. We then evaluate the quality of the generated samples. Figure 3a shows that the real samples and the generated samples are inseparable, suggesting the generated samples follow a distribution similar to the real ones. In addition, we use Principal Component Analysis to reduce the features to 10 dimensions. Table 7 suggests that WGAN-GP can accurately generate samples whose distribution has no statistical difference from the real samples. We also examined alternative generative models, including VAE and Bayesian Network, whose generated samples differ from the real samples. This suggests that our generative process is accurate and the best design choice. We further show the quality of the generated samples by presenting the two most important features in Figure 3b-c. WGAN-GP is able to generate very accurate distributions of these features. We plot the feature-based interpretations in Figure 4.
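For readers who want the mechanics of the gradient penalty, the snippet below is a minimal PyTorch sketch of a WGAN-GP critic loss as commonly implemented. It is an illustration rather than the paper's code; critic, real, fake, and the penalty coefficient lambda_gp = 10 (the value suggested in the original WGAN-GP paper) are assumptions.

```python
import torch

def critic_loss_wgan_gp(critic, real, fake, lambda_gp=10.0):
    """Wasserstein critic loss with gradient penalty (generic sketch)."""
    # Random interpolation points between real and generated samples
    alpha = torch.rand(real.size(0), 1, device=real.device)
    interp = (alpha * real + (1.0 - alpha) * fake).requires_grad_(True)
    scores = critic(interp)
    # Gradient of the critic's score with respect to the interpolates
    grads = torch.autograd.grad(
        outputs=scores, inputs=interp,
        grad_outputs=torch.ones_like(scores),
        create_graph=True)[0]
    # Penalize deviations of the gradient norm from 1 (the "GP" term)
    penalty = ((grads.norm(2, dim=1) - 1.0) ** 2).mean()
    # Critic objective: score fakes low and reals high, plus the penalty
    return critic(fake).mean() - critic(real).mean() + lambda_gp * penalty
```

The generator is then trained against the same critic by minimizing -critic(fake).mean(); because the penalty replaces weight clipping, the critic keeps informative gradients and both losses can converge as in Figure 2a.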
The transcript and description features have a salient influence on the prediction. These features include medical knowledge, informativeness, readability, and vocabulary richness. The results show that a one-unit increase in transcript readability results in an increase of 757.402 predicted average daily views. Medical knowledge, operationalized as the number of medical terms [42], has a sizable influence on the prediction as well: a one-unit increase in transcript medical terms raises the predicted average daily views by 440.649. This is because an easy-to-read and medically informative transcript or description is associated with higher viewership, as viewers attempt to seek medical information from these videos.

Figure 2. Wasserstein generative adversarial network with gradient penalty (WGAN-GP) convergence. a) Discriminator/generator loss; b) train/valid loss. Note: The Wasserstein loss requires labels of 1 and -1, rather than 1 and 0. Therefore, WGAN-GP removes the sigmoid activation from the final layer of the discriminator, so that predictions are no longer constrained to [0,1] but can instead take any real value (https://www.oreilly.com/library/view/generative-deeplearning/9781492041931/ch04.html).
The transcript and description sentiments also significantly affect the prediction. These sentiments in the video bring in personal opinions and experiences, which are relatable to viewers, thus enticing higher viewership. The channel features have a critical influence on the prediction as well. YouTube collects information from verified channels, such as phone numbers. Verified channels signal authenticity and credibility to viewers. Therefore, the viewers are more likely to watch the videos posted by these channels.
PrecWD is also capable of estimating the dynamic total effect. Figure 5 shows three randomly selected examples: description vocabulary richness, description readability, and transcript negative sentiment. Figure 5a shows that the total effect of description vocabulary richness is positive when its value is low and turns negative when description vocabulary richness is high. This is because when description vocabulary richness is low, enriching the vocabulary makes the language more appealing; as the vocabulary richness continues to increase, the description becomes too hard to comprehend and viewers lose interest in the video. Figure 5b shows that the total effect of description readability increases as the readability value increases. This could be because when the description is readable, it is also easier for the viewers to understand the medical knowledge and other content in the video. Figure 5c indicates that the total effect of transcript negative sentiment increases as the value of transcript negative sentiment increases. When a video is enriched with negative sentiment, it usually contains opinions and commentaries, which may be relatable to the viewers' experience or beliefs and even entice them to write comments. Those interactions in the comment section further enhance viewership.
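The shape of such value-dependent effect curves can be approximated by sweeping one feature over a grid while averaging predictions over background samples, for example samples drawn from the WGAN-GP generator. The sketch below is a partial-dependence-style approximation for intuition only, not the paper's Equations 12-17; predict, samples, and feature_idx are placeholder names.

```python
import numpy as np

def dynamic_effect_curve(predict, samples, feature_idx, grid):
    """Trace how the average prediction moves as one feature is varied.

    predict:     callable mapping an (n, d) array to n predictions
    samples:     (n, d) background samples (e.g., WGAN-GP generated)
    feature_idx: column index of the feature of interest
    grid:        values at which to evaluate that feature
    """
    curve = []
    for value in grid:
        perturbed = samples.copy()
        perturbed[:, feature_idx] = value  # hold one feature fixed
        curve.append(predict(perturbed).mean())
    return np.asarray(curve)
```

Plotting the returned curve against the grid yields a picture analogous to Figure 5: a slope that changes sign as the feature value grows indicates a non-monotonic total effect.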

Figure 3. Generated samples evaluation. a) Sample distribution; b) description readability; and c) transcript medical knowledge.

Precise total effect versus main effect

PrecWD offers the precise interpretation (the total effect), while existing approaches such as W&D can only approximate the interpretation using the main effect. The interpretation error correction by our model significantly improves the feature effects. For instance, our model interprets description readability to have a positive influence on viewership: readable descriptions are easy to comprehend and thus attract viewers. The existing approach (main effect), by contrast, interprets description readability to have a negative influence, contradicting common perception. We also quantify the influence of the interpretation error correction in Table 8, where the total effect and the existing approaches (main effects and regressions) have many opposite signs and a very different feature importance order. Such differences further validate the contribution of our precise interpretation component. The interpretation errors are the direct reason for mistrust of the baseline models, which we will show in the user study later. We also perform significance tests of the feature effects in online supplementary appendix 7. The vast majority of the feature effects are statistically significant; only one low-ranked feature is not significant, and its total effect is close to zero as well.
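To make the distinction concrete, consider a stylized predictor with a single interaction term; this is an illustration only, not the paper's Equations 12-17:

\[
\hat{y} = w_1 x_1 + w_2 x_2 + w_{12}\, x_1 x_2,
\qquad
\frac{\partial \hat{y}}{\partial x_1} = w_1 + w_{12}\, x_2 .
\]

Reading the main-effect weight \(w_1\) alone misstates the effect of \(x_1\) whenever the interaction weight \(w_{12}\) is non-negligible: the total effect additionally depends on where \(x_2\) sits in the data distribution, which is why the interpretation component averages over (generated) samples instead of reporting \(w_1\) directly.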
Our precise interpretation has significant practical value. To be paid by YouTube, a content creator needs to obtain 20,000 views, 4,000 watch hours, and 1,000 subscribers.

Figure 4. Feature-based interpretation (normalized). Note: To compare the features on the same scale, we normalized the effect values. The x-axis is the weight of a variable; the higher the absolute value of the weight, the more important the variable.
Figure 5. Examples of the dynamic total effect. a) Description vocabulary richness; b) description readability; and c) transcript negative sentiment.
The precise interpretation of PrecWD can help content creators achieve these requirements by deriving three actions, inspired by related interpretable ML studies reviewed in online supplementary appendix 8. First, we can sort the features by importance and understand which features matter more in predicting viewership, thus increasing trust from end users. Because monetary incentives are involved in practical usage, content creators, sponsors, and platforms thrive on adopting the most trustworthy model. Second, based on the feature importance order, our model can recommend the most effective features to prioritize for higher projected viewership; it also recommends a prioritization order for the work process when the content creator's time is limited. Third, content creators can allocate a tiered budget to different features to reach the optimal predicted viewership given a fixed budget. Table 8 suggests that when the feature effects are imprecise (baseline), the feature importance order can be significantly off, and the sign of an effect can even be the opposite of common perception. Our user study will later validate that the feature importance order and the signs of feature effects from our precise interpretation are more trustworthy and helpful than the imprecise alternatives. Once qualified to be paid, content creators can earn money based on the number of views a video receives, at YouTube's per-view pay rate, and they can also receive external sponsorships. PrecWD's interpretation can guide content creators to make adjustments in production and improve viewership for higher returns. Online supplementary appendix 9 provides a more detailed explanation of the actions that can be derived from the feature interpretation.

Case study 2: Misinformation viewership prediction

Among all the videos, violative videos, such as misinformation videos, are the most concerning, as they lead viewers to institute ineffective, unsafe, costly, or inappropriate protective measures; undermine public trust in evidence-based messages and interventions; and lead to a range of collateral negative consequences [12]. Early identification and prioritized screening based on potential video popularity are the keys to minimizing undesired broad consequences of misinformation videos. This goal necessitates misinformation viewership prediction as well as the understanding of the factors. Case study 2 evaluates PrecWD by predicting misinformation viewership. A number of trusted sources have identified a set of misinformation videos on YouTube (Online supplementary appendix 10). We crawled these videos, resulting in 4,445 videos (208 GB of data).
PrecWD achieved consistent leading performance. Table 9 shows that, compared to the best ML model (KNN-3), PrecWD drops mean squared error (MSE) by 11.398. Compared with the best deep learning method (CNN-3), PrecWD reduces MSE by 3.988. Compared with the best interpretable model (W&D), PrecWD reduces the MSE by 29.206. Ablation studies (Table 10) show that excluding any component negatively impacts the performance, suggesting good design choices. Table 11 shows the generated data distribution has no difference from the real data distribution. We also performed hyperparameter tuning (Online supplementary appendix 11). The conclusions are consistent with case study 1 and in favor of our method.
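For reference, the two error metrics reported throughout the tables can be computed as follows. This is the standard formulation (with MSLE computed on log1p-transformed values, as in scikit-learn), not code taken from the paper.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.mean((y_true - y_pred) ** 2))

def msle(y_true, y_pred):
    """Mean squared log error on log1p-transformed values."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.mean((np.log1p(y_true) - np.log1p(y_pred)) ** 2))
```

MSE weighs large absolute errors on heavily viewed videos, while MSLE measures relative error and is less dominated by viral outliers, which is why both are reported.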
The enhanced prediction has profound practical value. According to Statista (2020), the worldwide economic loss resulting from misinformation is about 78 billion dollars. With social media being a large part of people's daily lives and an important source of information, misinformation shared on these platforms accounts for a significant portion of that damage. Each view of a video with health misinformation can place a significant burden on the healthcare system and result in negative outcomes for patients. PrecWD offers a more accurate popularity estimation tool than other prediction models. Using this tool, YouTube is able to more precisely identify potentially popular videos. This helps address the misinformation problem by prioritizing the screening of potentially popular videos, thus minimizing the influence of misinformation in a more accurate manner.
We also interpreted the prediction. The textual features and video sentiments are critical features associated with misinformation viewership. This interpretation sheds light on the management of content quality for video-sharing sites, which could monitor the transcript and description features; for instance, when videos show overwhelmingly negative content, they can be marked for credibility review to prevent the potential wide spread of misinformation. The feature interpretation table and explanations are included in online supplementary appendix 11.
Table 9. Prediction comparison of PrecWD with baseline models (case study 2).

Method | ADV (MSE / MSLE) | log(total view)* (MSE / MSLE) | ADV* (MSE / MSLE)
PrecWD (Ours) | 140.202 / 0.728 | 2.156 / 0.022 | 139.583 / 0.713
Linear regression | n/a / n/a | n/a / n/a | n/a / n/a
KNN-1 | n/a / n/a | n/a / n/a | n/a / n/a
KNN-3 | n/a / n/a | n/a / n/a | n/a / n/a
KNN-5 | n/a / n/a | n/a / n/a | n/a / n/a
DT-MSE | n/a / n/a | n/a / n/a | n/a / n/a
DT-MAE | n/a / n/a | n/a / n/a | n/a / n/a
DT-FriedmanMSE | n/a / n/a | n/a / n/a | n/a / n/a
SVR-Linear | n/a / n/a | n/a / n/a | n/a / n/a
SVR-RBF | n/a / n/a | n/a / n/a | n/a / n/a
SVR-Poly | n/a / n/a | n/a / n/a | n/a / n/a
SVR-Sigmoid | n/a / n/a | n/a / n/a | n/a / n/a
Gaussian Process-1 | n/a / n/a | n/a / n/a | n/a / n/a
Gaussian Process-3 | n/a / n/a | n/a / n/a | n/a / n/a
Gaussian Process-5 | n/a / n/a | n/a / n/a | n/a / n/a
MLP-1 | n/a / n/a | n/a / n/a | n/a / n/a
MLP-2 | n/a / n/a | n/a / n/a | n/a / n/a
MLP-3 | n/a / n/a | n/a / n/a | n/a / n/a
MLP-4 | n/a / n/a | n/a / 0.031 * | n/a / n/a
CNN-1 | n/a / n/a | n/a / n/a | n/a / n/a
CNN-2 | n/a / n/a | n/a / n/a | n/a / n/a
CNN-3 | n/a / n/a | n/a / n/a | n/a / n/a
CNN-4 | n/a / n/a | n/a / n/a | n/a / n/a
LSTM-1 | n/a / n/a | n/a / n/a | n/a / n/a
LSTM-2 | n/a / n/a | n/a / n/a | n/a / n/a
BLSTM-1 | n/a / n/a | n/a / n/a | n/a / n/a
BLSTM-2 | n/a / n/a | n/a / n/a | n/a / n/a
W&D | n/a / n/a | n/a / n/a | n/a / n/a
W&D-CNN | n/a / n/a | n/a / n/a | n/a / n/a
W&D-LSTM | n/a / n/a | n/a / n/a | n/a / n/a
W&D-BLSTM | n/a / n/a | n/a / n/a | n/a / n/a
Piecewise W&D-10 | n/a / n/a | n/a / n/a | n/a / n/a
Piecewise W&D-20 | n/a / n/a | n/a / n/a | n/a / n/a
* published_days added as a feature. n/a = value not recoverable from this snapshot; the surviving MLP-4 value is placed in the column whose scale it matches.
Abbreviations: MSE, mean squared error; MSLE, mean squared log error; PrecWD, Precise Wide-and-Deep Learning; KNN, k-nearest neighbors; DT, decision tree (with MSE, MAE, or Friedman-MSE splitting criterion); SVR, support vector regression; RBF, radial basis function; MLP, multilayer perceptron; CNN, convolutional neural network; LSTM, long short-term memory networks; BLSTM, bidirectional long short-term memory networks; W&D, Wide and Deep Learning framework.
Table 10. Ablation studies (case study 2).

Method | ADV (MSE / MSLE) | log(total view)* (MSE / MSLE) | ADV* (MSE / MSLE)
PrecWD | 140.202 / 0.728 | 2.156 / 0.022 | 139.583 / 0.713
PrecWD without Unstructured Component | n/a / n/a | n/a / n/a | n/a / n/a
PrecWD without Piecewise Linear Component | n/a / n/a | n/a / n/a | n/a / n/a
PrecWD without Second-Order Component | n/a / n/a | n/a / n/a | n/a / n/a
PrecWD without High-order Component | n/a / 0.848 | n/a / n/a | n/a / n/a
PrecWD with Simple Linear Encoding | n/a / n/a | n/a / n/a | n/a / n/a
PrecWD with 10 Ordinal One-hot Encoding | n/a / n/a | n/a / n/a | n/a / n/a
PrecWD with 20 Ordinal One-hot Encoding | n/a / n/a | n/a / n/a | n/a / n/a
PrecWD with 10 Ordinal Encoding | n/a / n/a | n/a / n/a | n/a / n/a
PrecWD with 20 Ordinal Encoding | n/a / n/a | n/a / n/a | n/a / n/a
PrecWD without Attention | n/a / n/a | n/a / n/a | n/a / n/a

Data Sources | ADV (MSE / MSLE) | log(total view)* (MSE / MSLE) | ADV* (MSE / MSLE)
All (Ours) | 140.202 / 0.728 | 2.156 / 0.022 | 139.583 / 0.713
Without Webpage | n/a / n/a | n/a / n/a | n/a / n/a
Without Unstructured | n/a / 0.779 | n/a / n/a | n/a / 0.776
Without Acoustic | n/a / n/a | n/a / n/a | n/a / n/a
Without Description | n/a / n/a | n/a / n/a | n/a / n/a
Without Transcript | n/a / 0.770 | n/a / n/a | n/a / 0.765
Without Channel | n/a / 0.737 | n/a / n/a | n/a / 0.720
* published_days added as a feature. n/a = value not recoverable from this snapshot; surviving isolated values are placed in the MSLE columns whose scale they match.
Abbreviations: MSE, mean squared error; MSLE, mean squared log error; PrecWD, Precise Wide-and-Deep Learning.

Evaluation of the precise interpretation component (descriptive accuracy)

Since the precise interpretation component is our core contribution, this section compares PrecWD's interpretability with state-of-the-art interpretable frameworks. There is currently no standard quantitative method to evaluate interpretability; consequently, most computer science studies only show interpretability without an evaluation [75]. In the business analytics discipline, a few studies have reached a consensus that conducting user studies via lab or field experiments is the most appropriate approach to evaluate ML interpretability. They design surveys asking participants to rate the interpretability of a model. Following this practice, we design a user study with the five groups in Table 12. We recruited 174 students from two national universities in Asia, randomly assigned to one of the five groups. We selected nine control variables, and the study passed randomization checks; the control variables, summary statistics, and randomization p-values are reported in online supplementary appendix 12. The full survey can be found in online supplementary appendix 13.
The participants were assigned an ML model to predict the daily viewership of a YouTube video. We showed them the variables the model uses and the weights of those variables. We disclosed that the more reasonable these variables and weights are, the more accurate the prediction would be, and that their compensation is positively related to the prediction accuracy. To ensure the participants understood how to read the variables and weights, we designed a common training session for all participants. To avoid imposing any bias, the training session is context-free, and the variables are pseudo-coded as variables 1-7. We displayed a pseudo model in the training (Online supplementary appendix 14) and informed participants that the weight of a variable indicates its importance. We showed an example: "If the weight of a variable is 0.3, then increasing this variable by 1 unit increases the predicted viewership by 0.3 units." After that, we designed the following two test questions to teach them how to read a model. If a participant chooses an incorrect answer, an error message and a hint appear on the screen (Online supplementary appendix 15), and they must find the correct answer before proceeding to the next page. This learning process ensures they understand how to read the interpretation of a model. The pseudo model, test questions, error message, and hint wording are the same for all groups.

Table 12. User study groups.

Group | Model | Rationale
A | PrecWD | Our model
B | W&D | Best-performing interpretable baseline
C | Piecewise W&D | State-of-the-art model-based interpretable model
D | SHAP (using our prediction model) | State-of-the-art post hoc explanation model
E | VAE-based model | Best-performing generative baseline
Abbreviations: PrecWD, Precise Wide-and-Deep Learning; W&D, Wide and Deep Learning framework; SHAP, Shapley Additive Explanations; VAE, variational autoencoder.
Question 1: According to the above figure, when using the aforementioned model to predict the daily viewership of videos, what are the top two essential variables that have positive effects? (Options: Variable 1, Variable 2, Variable 3, Variable 4, Variable 5, Variable 6, Variable 7)
Question 2: According to the weights in the figure, if variable 6 increases by 1 unit, how will the aforementioned model prediction of video viewership change? (Options: Increase by 0.3 unit, Increase by 0.6 unit, Decrease by 0.3 unit, Decrease by 0.6 unit)
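The unit-effect reading that these questions test amounts to evaluating an additive score. The toy snippet below illustrates it with hypothetical weights; the actual pseudo-model weights appear only in online supplementary appendix 14 and are not reproduced here.

```python
# Hypothetical weights for a pseudo-model like the one used in training.
weights = {"variable_1": 0.5, "variable_2": 0.3, "variable_6": -0.6}

def predicted_change(variable, delta, weights):
    """Change in predicted viewership when `variable` moves by `delta` units."""
    return weights[variable] * delta

# A +1 change in variable_2 raises the prediction by its weight (0.3);
# a +1 change in variable_6 lowers it by 0.6 under these assumed weights.
print(predicted_change("variable_2", 1, weights))  # 0.3
print(predicted_change("variable_6", 1, weights))  # -0.6
```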
We first ask the participants to watch the same YouTube video to familiarize them with the prediction target and context. This is a health video on tobacco education from the WHO. We then show them a screenshot of the video webpage (Online supplementary appendix 16), depicting the potential variables we can use for prediction. Next, we show them the variables and weights of their assigned model (Figure 6). We choose the seven most important variables by weight, because seven is considered the "Magical Number" in psychology and is the limit of our capacity to process short-term information [45]. To help the participants fully understand Figure 6, we design the following four test questions. If a participant chooses an incorrect answer, an error message and a hint appear on the screen, as shown in online supplementary appendix 17, and they must find the correct answer before proceeding to the next page. This learning process aims to teach them to understand the variables and weights of their assigned model. The wording of the error message, hint, and test questions is the same across all groups.

Figure 6. Our model in user study 1.
Question 3: According to the aforementioned figure, when using the above model to predict video viewership, please rank the following variables from the most important to the least important. Please put the most important variable on the top and the least important on the bottom. Hint: The importance of a variable can be measured by the weight of the variable. (Options: The seven variables in a randomized order)
Question 4: What are the top 2 most essential variables in the aforementioned model? (Options: Four variables)
Question 5: If the creator of the aforementioned video would like to increase video viewership, increasing/decreasing which variable is more effective? (Options: Four variables)
Question 6: According to the aforementioned figure, if the variable in the bottom row increases by 1 unit, how will the model prediction of video viewership change? (Options: Four changes)
After answering these questions correctly, the participants have a good understanding of their assigned model. We then ask them to rate the interpretability of the model. Following the literature, we use two metrics of interpretability: trust in automated systems and model usefulness, adopted from Chai et al. [11] and Adams et al. [4]. Cronbach's alpha is 0.963 for the trust-in-automated-systems scale and 0.975 for the usefulness scale, suggesting excellent reliability. The factor loadings, shown in online supplementary appendix 18, indicate great validity. We designed an attention check question in the scales ("Please just select neither agree nor disagree"). After removing those who failed the attention check, 140 participants remained. We perform t-tests in Table 13 on PrecWD and the baseline groups to compare interpretability.
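The group comparisons in Table 13 are plain two-sample tests on the participants' ratings. A sketch using SciPy is shown below; the rating arrays are placeholders, and Welch's variant (equal_var=False) is one reasonable choice since group variances need not be equal.

```python
from scipy import stats

# Placeholder Likert-scale ratings from two study groups (not real data)
precwd_trust = [2, 2, 3, 1, 2, 2, 3]
baseline_trust = [4, 5, 4, 3, 5, 4, 4]

# Welch's t-test: does mean trust differ between the two groups?
t_stat, p_value = stats.ttest_ind(precwd_trust, baseline_trust,
                                  equal_var=False)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
```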
Table 13. Interpretability comparison of PrecWD and interpretable methods.

Group | Mean of Interpretability: Trust | Mean of Interpretability: Usefulness
PrecWD | 2.183 | 2.101
W&D | n/a | n/a
Piecewise W&D | n/a | n/a
SHAP | n/a | n/a
VAE | n/a | n/a
n/a = value not recoverable from this snapshot.
Abbreviations: PrecWD, Precise Wide-and-Deep Learning; W&D, Wide and Deep Learning framework; SHAP, Shapley Additive Explanations; VAE, variational autoencoders.
Table 14. Number of participants in original and final model.

 | Original: PrecWD | Original: Baseline
Switch to PrecWD | 21 | 84
Switch to Baseline | 7 | 28
Abbreviations: PrecWD, Precise Wide-and-Deep Learning.
PrecWD has significantly better interpretability than the baseline models. This improvement is attributed to PrecWD's ability to capture the precise feature effects, which yields the most reasonable and trustworthy variables and ranking. Our model suggests that description readability, medical knowledge, and video sentiment are the most influential variables in predicting the viewership of the given WHO video. These variables are in line with the literature, which documents that readability [69], medical knowledge [42], and sentiment [69] are driving factors of social media readership. Such alignment with prior knowledge and perception gained users' trust. The feature effects in the baseline models, however, are imprecise due to the mismatch between the main effect and the total effect. This feature effect error produces many counter-intuitive variables and importance orders in the baseline groups, and these counter-intuitive variables are the direct reasons for mistrust in those groups. For instance, W&D shows that the appearance of numbers in the video description is the most important variable, which has little to do with the video content; studies have shown that content characteristics are the leading factors of social media readership [32], so users may find it hard to believe that the appearance of numbers could predict viewership. SHAP shows that more audio tracks reduce viewership, which contradicts common sense, because more audio track options should attract more foreign-language viewers. Piecewise W&D and VAE suggest that the frequency of two-word phrases is the top variable, while content characteristics, such as medical knowledge and readability, are the least important; this ranking is the inverse of common understanding and contradicts the literature. These counter-intuitive examples significantly reduce users' trust in these models.
After the participants rated interpretability, we conduct a supplementary study to investigate which model they would finally adopt. We inform the participants that, if they think the variables and weights of the previous model are not reasonable, they have a chance to change to a different model. Since we incentivized the participants to choose the most reasonable model, their final adoption indicates the model they trust the most. We then show the variables and weights of all five models, similar to Figure 6, with the order of the five models randomized. We ask which model they would like to finally adopt for the prediction, and we measure the interpretability of the adopted model, reported in Tables 14 and 15.
Table 15. Comparison of first- and second-time interpretability.

Adoption Path | Mean of 1st-Time Trust | Mean of 2nd-Time Trust | Mean of 2nd-Time Usefulness
PrecWD → PrecWD | 2.418 | 2.286 | 2.500
Baseline → PrecWD | n/a | 1.054 | 0.996
n/a = value not recoverable from this snapshot; the column headers are reassembled from the garbled original, so value placement is approximate.
Abbreviations: PrecWD, Precise Wide-and-Deep Learning.
Figure 7. Models in user study 2 (Left: PrecWD, Right: SHAP).
Table 16. User study 2.

Group | Mean of Interpretability: Trust | Mean of Interpretability: Usefulness
PrecWD | 2.080 | 2.007
SHAP | n/a | n/a
n/a = value not recoverable from this snapshot.
Abbreviations: PrecWD, Precise Wide-and-Deep Learning; SHAP, Shapley Additive Explanations.
A total of 105 participants (75 percent) finally adopted our model. Table 15 shows that, for those who finally adopted our model, the second-time interpretability rating is higher than the first-time rating. This is because after the participants see all five models, the relative advantage of our model becomes even more obvious, causing them to rate the interpretability of our model higher the second time.
SHAP's interpretation only shows the feature importance; it cannot reveal the unit-increase effect as our model does. To compare our model and SHAP in the same format, we conduct a second user study with two groups from a university in Asia. For each group, we only show the feature importance without the unit-increase effect (the weight), depicted in Figure 7. This display shows the variables, their relative importance, and the effect direction, on which our model and SHAP already differ significantly; any difference in rated interpretability can therefore be attributed to the interpretations themselves. The second user study uses the same control variables and deploys a similar training session, randomization checks, and attention checks, reported in online supplementary appendices 19-20. After removing those who failed the attention checks, 55 participants remained. Table 16 suggests our model's interpretability still outperforms SHAP.

Discussion

Implications to IS knowledge base

In line with the design science research guidelines [31], this study identifies an impactful problem in social media analytics: video viewership prediction and interpretation. We developed a novel information system that predicts video viewership while interpreting the predictors. We conducted comprehensive evaluations and interpretations of the information system and designed two user studies to assess its utility. This study also fits the computational genre of design science research [51]. Our study develops an interdisciplinary approach that involves a novel computational algorithm and an analytical solution to a major societal problem, thus holding great potential for generating IS research with significant societal impact.

Implications to methodology and IS design theory

PrecWD uniquely models unstructured data and proposes a novel interpretation component to offer the precise total effect and its dynamic changes. Our model demonstrates two generalizable design principles for model development: 1) generative models can assist the interpretation of predictive models (based on the model interpretation and the user study results); 2) raw unstructured data can complement crafted features in prediction (based on Tables 6 and 10). These design principles, along with PrecWD, provide a "nascent design theory" for the design science paradigm of IS studies. Using these design principles, PrecWD can be generalized to understand the underlying factors of predictive analytics in other problem domains.

Implications to practice

This study offers many practical implications. For video-sharing sites, PrecWD is a deployable analytics system that can predict video viewership and offer the interpretation of the prediction. Our model provides insights to monitor video popularity where credible videos can be approved and violative videos can be banned before they are published, thus minimizing widespread infiltration of violative videos. Content creators can leverage our model to predict video viewership in order to determine where to allocate more promotional funds. Our interpretation ensures the trust of the model and gives content creators actionable directions to understand the importance of features and optimize the prioritization of video features in their workflow.

Limitations and future directions

First, we focus on long-form videos. Future work can design new features related to the song templates of short-form videos; new models can be devised to consider the implicit relationship between short-form video content and background song templates. Second, content creators could pay for promotion to increase video exposure, which cannot be observed publicly, so our features may miss the influence of paid promotions. Nevertheless, this issue is mitigated because most of our videos are collected from non-profit health organizations that publish educational videos and have little monetary incentive to pay for promotions. Third, recommendations, shares on social networks, long-tail effects, and Matthew effects could influence viewership; omitting them is another limitation. It is challenging to estimate these effects on a video because they vary significantly among different viewers, and we did not attempt to capture them in our features. However, the video features reflect the video type, quality, visual and audio effects, and more, which affect whether users would actually watch a recommended video. Channel-level features (e.g., verification) also implicitly characterize the social network, long-tail, and Matthew effects, because verified channels are usually more influential. Therefore, our features partially mitigate this limitation. Fourth, SHAP and model-based methods have different underlying interpretation mechanisms, so quantitatively comparing them under exactly the same presentation is not achievable; user study 2 offers the closest comparison design. Moreover, post hoc methods like SHAP have repeatedly been recommended against because they can be unfaithful, vulnerable, and unstable. Fifth, as a limitation shared by most interpretable ML models, the interpretation implies correlation, not causation. While such correlations can be used to derive many actionable recommendations, these models cannot directly use the feature weights to derive causal actions. To derive causal actions, studies can incorporate econometric models into ML models; such a causal ML discipline is a largely different yet interesting paradigm for future work.

Conclusion

This study proposes PrecWD for viewership prediction and interpretation. To address the pitfalls of prior interpretable frameworks, our study incorporates an unstructured component and innovatively captures the precise total effect as well as its dynamic changes. Empirical results indicate PrecWD outperforms strong baselines. Two user studies confirm that the interpretability of PrecWD is significantly better than other interpretable methods, particularly in improving trust and model usefulness. These findings offer implementable actions for content creators and video-sharing sites to optimize the video production process and manage content quality.

Notes

1. In May 2020, a video called "Plandemic" featured a prominent anti-vaxxer falsely claiming that billionaires were helping to spread the virus to increase the use of vaccines. By the time YouTube removed the video, it had already hit 7.1 million views [63]. Other examples are in online supplementary appendix 1.
2. This unit effect is consistent with the interpretation format of linear regression. Although the prediction capability of linear regression is weak, it offers an easily understandable and widely accepted interpretation mechanism: the weight w of a variable x is usually interpreted as meaning that when x increases by one unit, the prediction increases by w. This unit-effect format has been commonly adopted in many interpretable machine learning studies for various applications [24]. Readability is the Flesch Reading Ease, formulated as 206.835 - 1.015 × (total words / total sentences) - 84.6 × (total syllables / total words), which is the most popular and the most widely tested and used readability measurement among marketers, research communicators, and policy writers, among many others. Increasing readability means using fewer words in a sentence and using words with fewer syllables.
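For concreteness, the standard Flesch Reading Ease score defined above can be computed directly from word, sentence, and syllable counts:

```python
def flesch_reading_ease(words, sentences, syllables):
    """Standard Flesch Reading Ease; higher scores mean easier text."""
    return (206.835
            - 1.015 * (words / sentences)    # fewer words per sentence -> easier
            - 84.6 * (syllables / words))    # fewer syllables per word -> easier

# Example: 120 words, 10 sentences, 150 syllables
print(round(flesch_reading_ease(120, 10, 150), 1))  # 88.9
```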
3. After the survey, we disclosed how their model performed relative to the other four models. We compensated them in the end with office supplies of different values, according to the model performance ranking.

Disclosure statement

No potential conflict of interest was reported by the authors.

Funding

This research was carried out with the support of the "University of Delaware General University Research" fund. Yidong Chai is supported by National Natural Science Foundation of China (72293581, 91846201, 72293580, 72188101).

Notes on contributors

Jiaheng Xie (jxie@udel.edu) is an Assistant Professor in the Department of Accounting & MIS at the University of Delaware's Alfred Lerner College of Business and Economics. His research interests are interpretable deep learning, health risk analytics, and business analytics. His prior works have been published in premier journals, including MIS Quarterly and Journal of Management Information Systems.
Yidong Chai (chaiyd@hfut.edu.cn; corresponding author) received his PhD at Tsinghua University, China. He is a researcher in the School of Management of Hefei University of Technology, Philosophy and Social Sciences Laboratory of Data Science and Smart Society Governance of Ministry of Education, and Key Laboratory of Philosophy and Social Sciences for Cyberspace Behaviour and Management, in China. Dr. Chai's research interests include machine learning, cybersecurity, business intelligence, and health informatics.
Xiao Liu (xiao.liu.10@asu.edu) is an Assistant Professor in the Department of Information Systems at Arizona State University. She received her PhD in Management Information Systems from the Eller College of Management at the University of Arizona. Dr. Liu's research interests include data science and predictive analytics in healthcare, education, and fintech. Her work has appeared in several academic journals and peer-reviewed conferences, such as MIS Quarterly, Journal of Management Information Systems, Journal of Medical Internet Research, Journal of the American Medical Informatics Association, and the Proceedings of International Conference in Information Systems, among others.

References

  1. Abbasi, A.; Albrecht, C.; Vance, A.; and Hansen, J. Metafraud: Ameta-learning framework for detecting financial fraud. MIS Quarterly, 36, 4 (2012), 1293-1327.
    Abbasi, A.; Albrecht, C.; Vance, A.; 和 Hansen, J. Metafraud: 用于检测金融欺诈的元学习框架。《管理信息系统季刊》,36 卷,4 期(2012 年),1293-1327。
  2. Abbasi, A.; Zhang, Z.; Zimbra, D.; Chen, H.; and Nunamaker, J.F. Detecting fake websites: The contribution of statistical learning theory. MIS Quarterly, 34, 3 (2010), 435-461.
    Abbasi, A.; 张, Z.; 金布拉, D.; 陈, H.; and Nunamaker, J.F. 检测假网站: 统计学习理论的贡献. 管理信息系统季刊, 34, 3 (2010), 435-461.
  3. Abbasi, A.; Zhou, Y.; Deng, S.; and Zhang, P. Text analytics to support sense-making in social media: A language-action perspective. MIS Quarterly, 42, 2 (2018), 427-464.
    Abbasi, A.; 周, Y.; 邓, S.; and 张, P. 文本分析支持社交媒体中的意义构建: 语言行动视角. 管理信息系统季刊, 42, 2 (2018), 427-464.
  4. Adams, D.A.; Nelson, R.R.; and Todd, P.A. Perceived usefulness, ease of use, and usage of information technology: A replication. MIS Quarterly, 16, 2 (1992), 227-247.
    Adams, D.A.; Nelson, R.R.; and Todd, P.A. 感知有用性, 使用便捷性, 和信息技术的使用: 一项复制研究. 管理信息系统季刊, 16, 2 (1992), 227-247.
  5. Agarwal, R., Melnick, L., Frosst, N., Zhang, X., Lengerich, B., Caruana, R., and Hinton, G.E. Neural additive models: Interpretable machine learning with neural nets. In Advances in Neural Information Processing Systems, 2021, 34, pp. 4699-4711.
    Agarwal, R., Melnick, L., Frosst, N., Zhang, X., Lengerich, B., Caruana, R., and Hinton, G.E. 神经添加模型:具有神经网络的可解释机器学习。在神经信息处理系统的进展中,2021 年,34 卷,第 4699-4711 页。
  6. Akpinar, E.; and Berger, J. Valuable virality. Journal of Marketing Research, 54, 2 (2017), 318-330.
    Akpinar, E.; 和 Berger, J. 有价值的病毒性。市场营销研究杂志,54 卷,2 期(2017 年),318-330 页。
  7. Alqahtani, H.; Kavakli-Thorne, M.; and Kumar, G. Applications of Generative Adversarial Networks (GANs): An updated review. Archives of Computational Methods in Engineering, 28, (2021), 525-552.
    Alqahtani, H.; Kavakli-Thorne, M.; 和 Kumar, G. 生成对抗网络(GANs)的应用:最新综述。工程计算方法档案,28 卷(2021 年),525-552 页。
  8. Bastani, O.; Kim, C.; and Bastani, H. Interpreting blackbox models via model extraction. arXiv, arXiv preprint arXiv:1705.08504. (2017).
    Bastani, O.; Kim, C.; 和 Bastani, H. 通过模型提取解释黑盒模型. arXiv, arXiv 预印本 arXiv:1705.08504. (2017).
  9. Burel, G.; Saif, H.; and Alani, H. Semantic wide and deep learning for detecting crisis-information categories on social media. In International Semantic Web Conference. 2017, pp. 138-155.
    Burel, G.; Saif, H.; 和 Alani, H. 用于在社交媒体上检测危机信息类别的语义宽深学习. 在国际语义网会议上. 2017, pp. 138-155.
  10. Caruana, R.; Lou, Y.; Gehrke, J.; Koch, P.; Sturm, M.; and Elhadad, N. Intelligible models for healthcare: Predicting pneumonia risk and hospital 30-day readmission. Proceedings of KDD, (2015), 1721-1730.
    Caruana, R.; Lou, Y.; Gehrke, J.; Koch, P.; Sturm, M.; 和 Elhadad, N. 用于医疗保健的可理解模型: 预测肺炎风险和医院 30 天再入院. KDD 会议论文集, (2015), 1721-1730.
  11. Chai, S.; Das, S.; and Rao, H.R. Factors affecting bloggers' knowledge sharing: An investigation across gender. Journal of Management Information Systems, 28, 3 (2014), 309-342.
    Chai, S.; Das, S.; 和 Rao, H.R. 影响博客知识分享的因素:跨性别调查。《管理信息系统杂志》,28 卷,3 期(2014 年),309-342 页。
  12. Chai, Y.; Li, W.; Zhu, B.; Liu, H.; and Jiang, Y. An interpretable wide and deep model for online disinformation detection. SSRN Electronic Journal, SSRN 3879632 (2022).
    Chai, Y.; Li, W.; Zhu, B.; Liu, H.; 和 Jiang, Y. 一种可解释的在线虚假信息检测宽深模型。《SSRN 电子期刊》,SSRN 3879632(2022 年)。
  13. Cheng, H.T.; Koc, L.; Harmsen, J.; Shaked, T.; Chandra, T.; Aradhye, H.; Anderson, G.; Corrado, G.; Chai, W.; Ispir, M.; Anil, R. Wide & deep learning for recommender systems. In Proceedings of the 1st workshop on deep learning for recommender systems. 2016, pp. 7-10.
    Cheng, H.T.; Koc, L.; Harmsen, J.; Shaked, T.; Chandra, T.; Aradhye, H.; Anderson, G.; Corrado, G.; Chai, W.; Ispir, M.; Anil, R. 推荐系统的宽深学习。在第 1 届深度学习推荐系统研讨会论文集中。2016 年,第 7-10 页。
  14. Cichy, P.; Salge, T.O.; and Kohli, R. Privacy concerns and data sharing in the internet of things: Mixed methods evidence from connected cars. MIS Quarterly, 45, 4 (2021), 1863-1891.
    Cichy, P.; Salge, T.O.; and Kohli, R. 隐私关注与物联网数据共享:来自连接汽车的混合方法证据。 MIS Quarterly,45,4(2021),1863-1891。
  15. Croitoru, F.-A.; Hondru, V.; Ionescu, R.T.; and Shah, M. Diffusion models in vision: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 14, 8 (2022), 1-22.
    Croitoru, F.-A.; Hondru, V.; Ionescu, R.T.; and Shah, M. 视觉中的扩散模型:一项调查。IEEE 模式分析与机器智能交易,14,8(2022),1-22。
  16. Dhurandhar, A.; Iyengar, V.; Luss, R.; and Shanmugam, K. TIP: Typifying the interpretability of procedures. arXiv, preprint arXiv:1706.02952. (June 2017).
    Dhurandhar, A.; Iyengar, V.; Luss, R.; and Shanmugam, K. TIP:程序可解释性的典型化。arXiv,预印本 arXiv:1706.02952。 (2017 年 6 月)。
  17. Dong, L.I.U.; Yue, L.I.; Jianping, L.I.N.; Houqiang, L.I.; and Feng, W.U. Deep learning-based video coding: A review and a case study. ACM Computing Surveys, 53, 1 (2019), 1-35.
    董,刘;岳,李;建平,林;厚强,李;和风,吴。基于深度学习的视频编码:综述和案例研究。ACM 计算调查,53,1(2019),1-35。
  18. Duan, W.; Gu, B.; and Whinston, A.B. The dynamics of online word-of-mouth and product sales-An empirical investigation of the movie industry. Journal of Retailing, 84, 2 (2008), 233-242.
    段,W.;顾,B.;和温斯顿,A.B.在线口碑和产品销售的动态-电影行业的实证研究。零售杂志,84,2(2008),233-242。
  19. Ebrahimi, M.; Chai, Y.; Samtani, S., and Chen, H. Cross-lingual cybersecurity analytics in the international dark web with adversarial deep representation learning. MIS Quarterly, 46, 2 (2022) 1209-1226.
    埃布拉希米,M.;柴,Y.;萨姆塔尼,S.;和陈,H.跨语言网络安全分析在国际暗网中的应用与对抗性深度表示学习。MIS 季刊,46,2(2022)1209-1226。
  20. Ebrahimi, M.; Nunamaker, J.F.; and Chen, H. Semi-supervised cyber threat identification in dark net markets: A Transductive and deep learning approach. Journal of Management Information Systems, 37, 3 (2020), 694-722.
    Ebrahimi, M.; Nunamaker, J.F.; 和 Chen, H. 在暗网市场中的半监督网络威胁识别: 一种传导和深度学习方法。《管理信息系统杂志》,37,3(2020),694-722。
  21. Fang, X.; Hu, P.J.-H.H.; Li, Z. (Lionel) L.; and Tsai, W. Predicting adoption probabilities in social networks. Information Systems Research, 24, 1 (2013), 128-145.
    Fang, X.; 胡, P.J.-H.H.; 李, Z.(Lionel)L.; 和 蔡, W. 在社交网络中预测采纳概率。《信息系统研究》,24,1(2013),128-145。
  22. Ferguson, R. Word of mouth and viral marketing: Taking the temperature of the hottest trends in marketing. Journal of Consumer Marketing, 25, 3 (2008), 179-182.
    Ferguson, R. 口碑营销和病毒式营销: 观察营销中最热门的趋势。《消费者营销杂志》,25,3(2008),179-182。
  23. Fortuna, G.; Schiavo, J.H.; Aria, M.; Mignogna, M.D.; and Klasser, G.D. The usefulness of YouTube videos as a source of information on burning mouth syndrome. Journal of oral rehabilitation, 46, 7 (2019), 657-665.
    Fortuna, G.; Schiavo, J.H.; Aria, M.; Mignogna, M.D.; 和 Klasser, G.D. YouTube 视频作为烧嘴综合症信息来源的实用性。口腔康复杂志, 46, 7 (2019), 657-665.
  24. Garnica-Caparrós, M.; and Memmert, D. Understanding gender differences in professional European football through machine learning interpretability and match actions data. Scientific Reports, 11, 1 (2021), 1-14.
    Garnica-Caparrós, M.; 和 Memmert, D. 通过机器学习可解释性和比赛行为数据了解职业欧洲足球中的性别差异。科学报告, 11, 1 (2021), 1-14.
  25. Goldstein, J.M.; Hofman, D.G.; Wortman Vaughan, J.; Poursabzi-Sangdeh, F.; Goldstein, D.G.; Hofman, J.M.; and Wallach, H. Manipulating and measuring model interpretability. In Conference on Human Factors in Computing Systems. 2021, pp. 1-52.
    Goldstein, J.M.; Hofman, D.G.; Wortman Vaughan, J.; Poursabzi-Sangdeh, F.; Goldstein, D.G.; Hofman, J.M.; 和 Wallach, H. 操纵和衡量模型可解释性。在人机交互会议上。2021, pp. 1-52.
  26. Goyal, Y.; Feder, A.; Shalit, U.; and Kim, B. Explaining Classifiers with Causal Concept Effect (CaCE), arXiv, preprint arXiv:1907.07165 (July 2019).
    Goyal, Y.; Feder, A.; Shalit, U.; 和 Kim, B. 用因果概念效应(CaCE)解释分类器,arXiv,预印本 arXiv:1907.07165(2019 年 7 月)。
  27. Gregor, S.; and Hevner, A. Positioning and presenting design science research for maximum impact. MIS Quarterly, 37, 2 (2013), 337-355.
    Gregor, S.; 和 Hevner, A. 将设计科学研究定位和呈现以实现最大影响。MIS Quarterly,37,2(2013 年),337-355。
  28. Guo, M.; Zhang, Q.; Liao, X.; and Zeng, D.D. An interpretable neural network model through piecewise linear approximation. arXiv, preprint arXiv:2001.07119 (2020).
    Guo, M.; Zhang, Q.; Liao, X.; 和 Zeng, D.D. 通过分段线性逼近实现可解释的神经网络模型。arXiv,预印本 arXiv:2001.07119(2020 年)。
  29. Han, Y.; Chen, W.; Xiong, X.; Li, Q.; Qiu, Z.; and Wang, T. Wide & deep learning for improving named entity recognition via text-aware named entity normalization. In ThirtyThird AAAI Conference on Artificial Intelligence. 2019.
    韩,Y.;陈,W.;熊,X.;李,Q.;邱,Z.;王,T. 通过文本感知命名实体规范化改进命名实体识别的广泛和深入学习。在第 33 届 AAAI 人工智能大会上。2019 年。
  30. Heider, F.; and Simmel, M. An experimental study of apparent behavior. The American Journal of Psychology, 57, 2 (1944), 243-259.
    海德尔,F.;西梅尔,M. 表观行为的实验研究。《美国心理学杂志》,57 卷,2 期(1944 年),243-259 页。
  31. Hevner, A.R.; March, S.T.; Park, J.; and Ram, S. Design science in information systems research. MIS Quarterly, 28, 1 (2004), 75-105.
    Hevner, A.R.;March, S.T.;Park, J.;Ram, S. 信息系统研究中的设计科学。《管理信息系统季刊》,28 卷,1 期(2004 年),75-105 页。
  32. Jaakonmäki, R.; Müller, O.; and vom Brocke, J. The impact of content, context, and creator on user engagement in social media marketing. In HICSS. 2017, pp. 1152-1160.
    Jaakonmäki, R.; Müller, O.; 和 vom Brocke, J. 内容、上下文和创作者对社交媒体营销中用户参与度的影响。在 HICSS。2017 年,页 1152-1160。
  33. Karahanna, E.; Xu, S.X.; Xu, Y.; and Zhang, N. The needs-affordances-features perspective for the use of social media. MIS Quarterly, 42, 3 (2018), 737-756.
    Karahanna, E.; Xu, S.X.; Xu, Y.; 和 Zhang, N. 社交媒体使用的需求-功能-特征视角。MIS Quarterly,42,3(2018),737-756。
  34. Koch, C.; Werner, S.; Rizk, A.; and Steinmetz, R. MIRA: Proactive music video caching using convnet-based classification and multivariate popularity prediction. In 26th IEEE International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems. 2018, pp. 109-115.
    Koch, C.; Werner, S.; Rizk, A.; 和 Steinmetz, R. MIRA:使用基于 convnet 的分类和多变量流行度预测的主动音乐视频缓存。在第 26 届 IEEE 国际建模、分析和计算机与通信系统模拟研讨会上。2018 年,页 109-115。
  35. Krijestorac, H.; Garg, R.; and Mahajan, V. Cross-platform spillover effects in consumption of viral content: A quasi-experimental analysis using synthetic controls. Information Systems Research, 31, 2 (2020), 449-472.
    Krijestorac, H.; Garg, R.; 和 Mahajan, V. 跨平台溢出效应对病毒内容消费的影响: 使用合成对照的准实验分析。 信息系统研究, 31, 2 (2020), 449-472。
  36. Laugel, T.; Lesot, M.-J.; Marsala, C.; Renard, X.; and Detyniecki, M. The dangers of post-hoc interpretability: Unjustified counterfactual explanations. In Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence (IJCAI-19). 2019, pp. 2801-2807.
    Laugel, T.; Lesot, M.-J.; Marsala, C.; Renard, X.; 和 Detyniecki, M. 后事解释可能带来的危险: 不合理的反事实解释。 在第二十八届国际人工智能联合会议论文集 (IJCAI-19) 上的论文。 2019, pp. 2801-2807。
  37. Lee, D.; Manzoor, E.; and Cheng, Z. Focused Concept Miner (FCM): Interpretable deep learning for text exploration. SSRN Electronic Journal, SSRN 3304756 (May 2018).
    Lee, D.; Manzoor, E.; 和 Cheng, Z. 焦点概念挖掘机 (FCM): 用于文本探索的可解释深度学习。 SSRN 电子期刊, SSRN 3304756 (2018 年 5 月)。
  38. Lee, J.W.; and Chan, Y.Y. Fine-grained plant identification using wide and deep learning model. In 2019 International Conference on Platform Technology and Service. 2019, pp. 1-5.
    李, J.W.; 和陈, Y.Y. 使用宽深学习模型进行细粒度植物识别. 在 2019 年国际平台技术与服务会议上. 2019, 页 1-5.
  39. Li, J.; Larsen, K.; and Abbasi, A. Theoryon: A design framework and system for unlocking behavioral knowledge through ontology learning. MIS Quarterly, 44, 4 (2020), 1733-772.
    李, J.; 拉森, K.; 和阿巴西, A. Theoryon: 通过本体学习解锁行为知识的设计框架和系统. MIS Quarterly, 44, 4 (2020), 1733-772.
  40. Lin, Y.K.; Chen, H.; Brown, R.A.; Li, S.H.; and Yang, H.J. Healthcare predictive analytics for risk profiling in chronic care. MIS Quarterly, 41, 2 (2017), 473-495.
    林, Y.K.; 陈, H.; 布朗, R.A.; 李, S.H.; 和杨, H.J. 慢性护理风险分析的医疗保健预测分析. MIS Quarterly, 41, 2 (2017), 473-495.
  41. Lin, Y.-K.; and Fang, X. First, do no harm: Predictive analytics to reduce in-hospital adverse events. Journal of Management Information Systems, 38, 4 (2021), 1122-1149.
    林, Y.-K.; 和方, X. 首先, 不要伤害: 预测分析减少住院不良事件. 管理信息系统杂志, 38, 4 (2021), 1122-1149.
  42. Liu, X.; Zhang, B.; Susarla, A.; and Padman, R. Go to YouTube and Call Me in the Morning: Use of Social Media for Chronic Conditions. MIS Quarterly, 44, 1 (2020), 257-283.
    刘, X.; 张, B.; Susarla, A.; 和 Padman, R. 去 YouTube, 早上给我打电话: 社交媒体在慢性病中的应用. MIS Quarterly, 44, 1 (2020), 257-283.
  43. Lundberg, S.M.; and Lee, S.-I. A unified approach to interpreting model predictions. In Advances in Neural Information Processing Systems 2017. 2017.
    Lundberg, S.M.; 和李, S.-I. 解释模型预测的统一方法. 在 2017 年神经信息处理系统进展中. 2017.
  44. Mai, F.; Shan, Z.; Bai, Q.; Wang, X. (Shane); and Chiang, R.H.L. How does social media impact bitcoin value? A Test of the silent majority hypothesis. Journal of Management Information Systems, 35, 1 (2018), 19-52.
    麦,F.;单,Z.;白,Q.;王,X.(Shane);和蒋,R.H.L. 社交媒体如何影响比特币价值?对沉默大多数假设的检验。管理信息系统杂志,35,1(2018),19-52。
  45. Miller, G.A. The magical number seven, plus or minus two: Some limits on our capacity for processing information. Psychological Review, 63, 2 (1956), 81-97.
    米勒,G.A. 七加或减二的神奇数字:我们处理信息的能力的一些限制。心理评论,63,2(1956),81-97。
  46. Miller, T. Explanation in artificial intelligence: Insights from the social sciences. Artificial Intelligence, 267, (2019), 1-38.
    米勒,T. 人工智能中的解释:从社会科学中获得的见解。人工智能,267,(2019),1-38。
  47. Molnar, C. Interpretable Machine Learning. Lulu. com, 2019.
    Molnar, C. 可解释机器学习. Lulu. com, 2019.
  48. Montavon, G.; Binder, A.; Lapuschkin, S.; Samek, W.; and Müller, K.-R. Layer-wise relevance propagation: An overview. Lecture Notes in Computer Science, (2019), 193-209.
    Montavon, G.; Binder, A.; Lapuschkin, S.; Samek, W.; 和 Müller, K.-R. 分层相关性传播: 概述. 计算机科学讲义, (2019), 193-209.
  49. Murdoch, W.J.; Singh, C.; Kumbier, K.; Abbasi-Asl, R.; and Yu, B. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences of the United States of America, 116, 44 (2019), 22071-22080.
    Murdoch, W.J.; Singh, C.; Kumbier, K.; Abbasi-Asl, R.; 和 Yu, B. 可解释机器学习中的定义, 方法和应用. 美国国家科学院院刊, 116, 44 (2019), 22071-22080.
50. NYTimes. YouTube discloses percentage of views that go to videos that break its rules - The New York Times. 2021. Accessed April 16, 2023. https://www.nytimes.com/2021/04/06/technology/youtube-views.html.
51. Rai, A. Editor's comments: Diversity of design science research. MIS Quarterly, 41, 1 (2017), iii-xviii.
  52. Ribeiro, M.T.; Singh, S.; and Guestrin, C. "Why should I trust you?" Explaining the predictions of any classifier. Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, (2016), 1135-1144.
  53. Ruby, D. YouTube Statistics (2022) - Updated data, facts & figures shared! DEMANDSAGE, 2022. Accessed April 16, 2023. https://www.demandsage.com/youtube-stats/.
54. Saboo, A.R. Using big data to model time-varying effects for marketing resource (re)allocation. MIS Quarterly, 40, 4 (2016), 911-939.
55. Samtani, S.; Chai, Y.; and Chen, H. Linking exploits from the dark web to known vulnerabilities for proactive cyber threat intelligence: An attention-based deep structured semantic model. MIS Quarterly, 46, 2 (2021), 911-946.
56. Selvaraju, R.R.; Cogswell, M.; Das, A.; Vedantam, R.; Parikh, D.; and Batra, D. Grad-CAM: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE International Conference on Computer Vision. 2017, pp. 618-626.
57. Shin, D.; He, S.; Lee, G.M.; Whinston, A.B.; Cetintas, S.; and Lee, K.-C. Enhancing social media analysis with visual data analytics: A deep learning approach. MIS Quarterly, 44, 4 (2020), 1459-1492.
58. Siegmund, N.; Kolesnikov, S.S.; Kästner, C.; Apel, S.; Batory, D.; Rosenmüller, M.; and Saake, G. Predicting performance via automated feature-interaction detection. In International Conference on Software Engineering (ICSE). IEEE, 2012, pp. 167-177.
  59. Slack, D.; Hilgard, S.; Jia, E.; Singh, S.; and Lakkaraju, H. Fooling LIME and SHAP: Adversarial attacks on post hoc explanation methods. In Proceedings of the AAAI Conference on AI, Ethics, and Society. 2020, pp. 180-186.
60. Stieglitz, S.; and Dang-Xuan, L. Emotions and information diffusion in social media: Sentiment of microblogs and sharing behavior. Journal of Management Information Systems, 29, 4 (2013), 217-248.
61. Tosun, N.; Sert, E.; Ayaz, E.; Yilmaz, E.; and Gol, M. Solar power generation analysis and forecasting real-world data using LSTM and autoregressive CNN. In 2020 International Conference on Smart Energy Systems and Technologies. 2020, pp. 1-6.
  62. Tsang, M.; Cheng, D.; and Liu, Y. Detecting statistical interactions from neural network weights. In ICLR 2018. 2018.
63. Vynck, G. de; and Lerman, R. Facebook and YouTube are still full of covid misinformation. The Washington Post, 2021. Accessed April 16, 2023. https://www.washingtonpost.com/technology/2021/07/22/facebook-youtube-vaccine-misinformation/.
  64. Xie, J.; Liu, X.; Zeng, D.; and Fang, X. Understanding reasons for medication nonadherence: An exploration in social media using sentiment-enriched deep learning approach. In ICIS 2017 Proceedings. 2017.
  65. Xie, J.; Liu, X.; Zeng, D.D.; and Fang, X. Understanding medication nonadherence from social media: A sentiment-enriched deep learning approach. MIS Quarterly, 46, 1 (2022), 341-372.
  66. Xie, J.; and Zhang, B. Readmission risk prediction for patients with heterogeneous hazard: A trajectory-aware deep learning approach. In ICIS 2018 Proceedings. 2018.
  67. Xie, J.; Zhang, Z.; Liu, X.; and Zeng, D. Unveiling the hidden truth of drug addiction: A social media approach using similarity network-based deep learning. Journal of Management Information Systems, 38, 1 (2021), 166-195.
68. Xie, L.; Hu, Z.; Cai, X.; Zhang, W.; and Chen, J. Explainable recommendation based on knowledge graph and multi-objective optimization. Complex & Intelligent Systems, 7, 3 (2021), 1241-1252.
  69. Yang, M.; Ren, Y.; and Adomavicius, G. Understanding user-generated content and customer engagement on Facebook business pages. Information Systems Research, 30, 3 (2019), 839-855.
  70. Ye, H.; Cao, B.; Peng, Z.; Chen, T.; Wen, Y.; and Liu, J. Web services classification based on wide & Bi-LSTM model. IEEE Access, 7, (2019), 43697-43706.
71. Yu, H.; Xie, L.; and Sanner, S. The lifecycle of a YouTube video: Phases, content and popularity. In International AAAI Conference on Web and Social Media. 2015, pp. 533-542.
  72. Yu, S.; Chai, Y.; Chen, H.; Brown, R.A.; Sherman, S.J.; and Nunamaker, J.F. Fall detection with wearable sensors: A hierarchical attention-based convolutional neural network approach. Journal of Management Information Systems, 38, 4 (2021), 1095-1121.
73. Yu, S.; Chai, Y.; Chen, H.; Sherman, S.J.; and Brown, R.A. Wearable sensor-based chronic condition severity assessment: An adversarial attention-based deep multisource multitask learning approach. MIS Quarterly, Forthcoming (2022).
  74. Zafar, M.R.; and Khan, N. Deterministic local interpretable model-agnostic explanations for stable explainability. Machine Learning and Knowledge Extraction, 3, 3 (2021), 525-541.
  75. Zeiler, M.D.; and Fergus, R. Visualizing and understanding convolutional networks. Lecture Notes in Computer Science, (2014), 818-833.
  76. Zhang, D.; Zhou, L.; Kehoe, J.L.; and Kilic, I.Y. What online reviewer behaviors really matter? Effects of verbal and nonverbal behaviors on detection of fake online reviews. Journal of Management Information Systems, 33, 2 (April 2016), 456-481.
77. Zhu, H.; Samtani, S.; Brown, R.A.; and Chen, H. A deep learning approach for recognizing activity of daily living (ADL) for senior care: Exploiting interaction dependency and temporal patterns. MIS Quarterly, 45, 2 (2021), 859-896.
  78. Zhu, H.; Samtani, S.; Chen, H.; and Nunamaker, J.F. Human identification for activities of daily living: A deep transfer learning approach. Journal of Management Information Systems, 37, 2 (2020), 457-483.
Copyright of Journal of Management Information Systems is the property of Taylor & Francis Ltd and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use.