Abstract
Big data analytics has become a significant trend for many businesses as a result of the daily acquisition of enormous volumes of data. This information is gathered because it may be important for dealing with complicated problems such as fraud, marketing, healthcare, and cyber security. Large businesses such as Facebook and Google use analytics to analyse and make decisions about their massive volumes of collected data, and such analyses and decisions impact both the present and the future of technology. Deep learning (DL) algorithms can capture the inherent non-linear properties of huge data using automated feature extraction techniques. A complete and in-depth investigation of generative, hybrid, and discriminative DL models is being conducted in order to estimate renewable energy, energy consumption, demand, and supply, among other things, over the short, medium, and long term. This article examines the benefits and drawbacks of DL approaches that rely on a variety of deep neural networks, including recurrent neural networks, multilayer neural networks, auto encoders, and long short-term memory networks.
1 Introduction
Big Data and the new tools and techniques it has enabled, such as Big Data Analytics (BDA), have completely changed how organisations and businesses work, opening up enormous new opportunities for corporations, professionals, and academics (Sun et al. 2018). Governmental and non-governmental organisations, in addition to businesses and academic institutions, now regularly generate vast volumes of data with a unique breadth and complexity (Debortoli et al. 2014). As a result, it is now vital for organisations all over the world to extract relevant data and valuable advantages from these vast data resources (Sarkar 2017). However, as the research reveals, it can be challenging to quickly and expertly extract valuable insights from massive data (Zakir et al. 2015). BDA is now undeniably required to realise the full value of big data, improve business performance, and increase market share for the majority of organisations. Recent studies concentrate on deep learning methods to draw reliable conclusions from huge volumes of data.
Deep learning (DL) is a branch of machine learning (ML). Big data, which receives a great deal of attention when such decision-making systems are created, may be utilised in its entirety using DL algorithms. The concept is to automatically extract characteristics using massive artificial neural networks and then use these features to inform choices (Nguyen et al. 2019). The term “deep” in this context refers to the number of a neural network’s hidden layers; as the network grows deeper, the model’s performance increases. Deep learning algorithms’ ability to train on unlabelled data is one of their distinguishing characteristics (Gheisari and ZakirulAlam 2017). Both structured and unstructured forms of data may be processed using deep learning algorithms, which can learn without human supervision. Many different businesses, including healthcare, finance, banking, and e-commerce, can benefit from deep learning. Any data whose elements are arranged in sequences is referred to as sequential data. In order to model problems involving time-dependent and sequential data, such as text generation, artificial intelligence, and stock market prediction, recurrent neural networks (RNN) are employed to analyse sequential data (Sarker 2021; Akbal and Ünlü 2022b). RNN and its variants, such as gated recurrent units (GRU) and long short-term memory (LSTM), have become the key building blocks for understanding sequential internet data in a number of academic fields, such as natural language processing and voice data analysis (Diao et al. 2019).
Hochreiter and Schmidhuber originally proposed LSTM, an implementation of the recurrent neural network, in 1997 (Hochreiter and Schmidhuber 1997). Unlike the feed-forward network designs described previously, LSTM may learn to perform tasks that require memory or state awareness. By allowing gradients to pass unmodified, LSTM partially overcomes the issue of vanishing gradients, which is a significant RNN restriction. The gated recurrent unit (GRU) is a more compact variant of the LSTM (Körner and Marc 2021). Due to the absence of the output gate, GRUs are smaller than LSTMs and tend to outperform LSTMs only on particular, simpler datasets (Cho et al. 2014b). Recurrent neural networks using LSTMs are capable of tracking long-term dependencies. Naul et al. developed GRU- and LSTM-based auto encoders for automatically extracting features (Naul et al. 2018). An auto encoder that learns a representation from the input data set is utilised to reduce dimensionality and reconstruct the original data set in an unsupervised manner; back propagation is used as the foundation of the learning algorithm (Ng and Auto encoders 2018).
This article is structured in the following manner. Section 2 briefly explains the related work based on sequence data, RNN, LSTM, GRU attention modules and also ensemble classifiers. Section 3 discusses the above techniques in detail. The manuscript is concluded in Sect. 4.
2 Related works
2.1 Sequence data
Modern industrial applications place a high value on effective anomaly identification and diagnosis in multivariate time-series data. It is nevertheless difficult to create a system that can rapidly and precisely identify unusual observations, owing to the dearth of anomaly labels, significant data volatility, and the need for extremely fast inference times in contemporary applications. Despite recent advances in deep learning algorithms for anomaly identification, only a select fraction of them can fully address these issues. Tuli et al. (2022) proposed TranAD, a deep transformer-network-based anomaly detection and diagnosis model that performs fast inference with the aid of attention-based sequence encoders and knowledge of broader temporal trends in the data. TranAD employs adversarial training to achieve stability and focus score-based self-conditioning to enable robust multi-modal feature extraction. Moreover, the model can be trained with little data thanks to model-agnostic meta learning (MAML). Comprehensive empirical experiments on six publicly accessible datasets show that TranAD can outperform state-of-the-art baseline approaches in detection and diagnosis with data- and time-efficient training. Both the training times and the anomaly detection scores are impacted by the window size. As smaller inputs need less inference time, TranAD can discover abnormalities more quickly when the window size is smaller; however, if the window is too small, the local contextual information is not accurately captured. The area under the receiver operating characteristic curve (ROC/AUC) and the F1 score are both impacted by a window that is too big, since brief abnormalities may be concealed among a large amount of data. On the other hand, predicting accuracy may decrease as a result of the conflict, duplication, and inconsistency of massive time-series data. In order to increase performance, Jin et al. (2022) suggest a deep network built by carefully choosing and comprehending the data. A data self-screening layer (DSSL) with a maximum information distance coefficient (MIDC) is first designed to filter input data with high correlation and low redundancy, after which a variational Bayesian gated recurrent unit (VBGRU) is utilised to increase the anti-noise capacity and resilience of the model. This also effectively raises forecast resilience and accuracy. Compared to GRU and LSTM, however, this model needs greater training time.
A deep learning approach is provided by Ünlü (2022) to model and estimate multistep daily Turkish power demand using data from 5 January 2015 to 26 December 2021. The development of novel and inventive deep neural network topologies, as well as considerable processing breakthroughs, are two key factors in deep learning’s rising appeal. Convolutional neural networks, gated recurrent networks, and long short-term memory (LSTM) networks are trained and compared to anticipate daily power usage one to seven days in advance. The effectiveness of the suggested methods was assessed using three separate performance criteria: the coefficient of determination (R2), root mean squared error, and mean absolute error. According to the test set’s forecasting results, LSTM exhibits the best performance. The model makes no assumptions about stationarity or about the residual terms following a normal distribution with unidirectional correlation; this characteristic demonstrates the strength of the suggested model. Modelling long-term electrical loads may be more difficult with univariate time series approaches; in this task, it is important to take the other affecting factors into account.
In order to estimate and predict the power output of the Manisa wind farms, Akbal and Ünlü (2022a) employed a univariate model based on sequence-to-sequence learning. The study covered the short- to medium-term forecasting window. The advantage of the proposed model is that predictions may be obtained with just the model’s own lagged values. According to empirical results, the model has a high coefficient of determination (R2) for the short-term prediction and a moderate R2 for the mid-term forecast. The model is still considered generally trustworthy even though the mean squared error and mean absolute error of the mid-term estimates show a slight reduction in R2. With a small adjustment, the recommended model may also be used to predict the lowest, highest, and average electricity production over a certain period, in addition to hourly power output. The study concludes with two unique and intriguing suggestions for further investigation.
Nowadays, there is a significant increase in the volume of air traffic, which has resulted in a rising need for air traffic monitoring systems. Future air traffic density demands cannot be met by secondary surveillance radar and primary surveillance radar, two traditional surveillance technologies. As in wireless communications scenarios, air traffic likewise operates in a dynamic environment and is subject to a variety of dynamic influences. As a result, it is worthwhile to deploy machine learning models to anticipate flight delays by fully using the aviation data lake. In order to integrate the benefits of all the available data types, Gui et al. (2020) and Wang et al. (2020) fed the complete dataset into a particular DL model, allowing the best outcome to be discovered in a broader and more precise solution space and thus increasing flight delay forecast performance. RNNs have been widely utilised in the field of natural language processing (NLP) because they are suitable for sequential input. The LSTM network, one of the most powerful RNNs with a more complex cell organisation, specifically resolves the gradient vanishing issue in RNNs. Given the input vector \(x = [x_{1} , x_{2} , \ldots , x_{T} ]\) and the state vector \(h = [h_{1} , h_{2} , \ldots , h_{T} ]\), the LSTM approach computes at every time step to produce the output sequence \(y = [y_{1} , y_{2} , \ldots , y_{T} ]\).
The generic convolutional and recurrent architectures for sequence modelling, including the temporal convolutional network (TCN), are systematically evaluated by Bai et al. (2018). The models are assessed using a wide variety of conventional benchmarking tasks for recurrent networks. The results show that a simple convolutional architecture performs better than traditional RNNs such as LSTMs, while displaying a longer effective memory over a wide range of tasks and datasets. Any function \(f:{\mathcal{X}}^{T + 1} \to {\mathcal{Y}}^{T + 1}\) that produces such a mapping is a sequence modelling network.
The aim of sequence modelling is to discover a network \(f\) that minimises the expected loss between the actual outcomes and the predictions, \(L(y_{0} , \ldots , y_{T} , f(x_{0} , \ldots , x_{T} ))\), where the sequences and outputs are drawn from a specific distribution. The elements of the sequence’s dilated convolution process \(F\) are computed as follows.
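The equation is not reproduced in this extract; in the usual TCN formulation (our reconstruction, following the standard definition of dilated causal convolution), for a filter \(f\) of size \(k\) and dilation factor \(d\) applied to element \(s\) of the sequence \(x\),

$$F\left( s \right) = \left( {x*_{d} f} \right)\left( s \right) = \mathop \sum \limits_{i = 0}^{k - 1} f\left( i \right) \cdot x_{s - d \cdot i}$$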
A residual block has a branch that leads to a set of transformations F (He et al. 2016), whose results are combined with the block’s input x:
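(The equation is not reproduced in this extract; in its usual form, our reconstruction of the residual block output is:)

$$o = Activation\left( {x + {\mathcal{F}}\left( x \right)} \right)$$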
According to the experimental findings, TCN models perform significantly better than general recurrent architectures like LSTMs and GRUs, and they exhibit a longer effective memory than recurrent networks of the same capacity; a noted trade-off is that TCNs may need to store the raw input sequence up to their effective history length during evaluation, which increases memory use.
2.2 Vanilla recurrent neural networks
Another well-known neural network is the recurrent neural network (RNN), which uses time-series or sequential data and feeds the results of the previous step as input to the current step (Dupond 2019; Mandic and Chambers 2001). Recurrent networks, like feed-forward networks and CNNs, learn from training input, but they set themselves apart by having a “memory” that lets them use data from earlier inputs to influence the current input and output. The output of an RNN depends on earlier parts of the sequence, unlike a normal DNN, which presumes that inputs and outputs are independent of one another. Learning lengthy data sequences can be difficult with ordinary recurrent networks due to the problem of vanishing gradients. Kim et al. (2016) presented a two-stage approach. First, a deep RNN model is used to forecast the daily delay status of a specific airport, where the status is defined as the mean delay of all arriving aircraft. Next, a layered neural network is used to anticipate the delay of every individual aircraft from the first stage’s daily delay status and other data. The model’s two stages had accuracies of 85% and 87.42%, respectively. According to this study, the DL approach needs a lot of data; otherwise, the model will either perform badly or become overfit.
A novel approach is described by Diao et al. (2019) to considerably reduce the number of parameters in RNNs while retaining performance that is equivalent to or better than that of standard RNNs. The suggested idea restricts the parameter sharing between the weight matrices at each time step that correspond to the input data and hidden states. The unique design may be viewed as a compression of its conventional version, however in contrast to the majority of previous compression techniques, pre-training or complex parameter tweaking is not necessary. Figure 1 depicts the structure for imposing parameter restrictions in RNN.
The findings demonstrate that neither of the extreme scenarios of sharing all or none of the parameters is the best way to describe various dependent input data. Sharing partial parameters can drastically reduce the number of RNN parameters by exploiting the relationships between inputs.
Academics have historically used statistical techniques, such as the Auto Regressive Integrated Moving Average (ARIMA), to forecast traffic. However, ARIMA-based models are unable to make accurate predictions in a cellular environment that is very dynamic. Researchers are looking at LSTM and RNN-based deep learning techniques to build autonomous cellular traffic forecast models. Jaffry and Hasan (2020) offer a real-world call data record-based LSTM-based cellular traffic forecast model. The ARIMA model and a straightforward feed-forward neural network (FFNN) were contrasted with the LSTM-based prediction.
In a standard FFNN, information can only travel in one direction: forward. The input layer provides the data for the calculations and operations in the hidden layers, and the output layer takes the information from the hidden layers and performs regression or classification predictions. In contrast, an RNN uses the outcome of one layer as input for the following layers, so input from the very first layer is present in each layer. An RNN takes both the input from the most recent time step and the input from earlier time steps into account, and may therefore learn from all past time steps, which enhances the accuracy of sequence predictions. Second, the learning process is halted when the gradient either totally vanishes or explodes to a very high value; this is an issue with pure RNNs. To resolve this problem, LSTM, an RNN variant, was suggested in Hochreiter and Schmidhuber (1997). LSTMs were designed to overcome the long-term dependence issue, which is the cause of the vanishing-gradient problem in vanilla RNNs (Goodfellow et al. 2016). The gates and cell states that feed, but are not themselves, the hidden layer’s final prediction are expressed as follows.
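In the standard LSTM formulation (the notation here follows common convention and is an assumption, not necessarily the exact symbols of the cited work):

$$i_{t} = \sigma \left( {W_{i} x_{t} + U_{i} h_{t - 1} + b_{i} } \right),\quad f_{t} = \sigma \left( {W_{f} x_{t} + U_{f} h_{t - 1} + b_{f} } \right),\quad o_{t} = \sigma \left( {W_{o} x_{t} + U_{o} h_{t - 1} + b_{o} } \right)$$

$$\tilde{c}_{t} = \varphi \left( {W_{c} x_{t} + U_{c} h_{t - 1} + b_{c} } \right),\quad c_{t} = f_{t} \circ c_{t - 1} + i_{t} \circ \tilde{c}_{t} ,\quad h_{t} = o_{t} \circ \varphi \left( {c_{t} } \right)$$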
Finally, the output \(y^{t}\) is computed from the hidden state.
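One common choice for this readout (an assumption on our part; the cited work may use a different output layer) is a linear projection of the hidden state:

$$y^{t} = W_{y} h_{t} + b_{y}$$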
The symbol \(\sigma\) in these equations stands for the sigmoid operation, frequently called the squashing operation since it limits the output to values between 0 and 1. The sigmoid operation is formally defined as \(\frac{1}{{1 + e^{ - x} }}\). Another squashing function is \(\varphi\), which is frequently implemented using the tanh or rectified linear unit (ReLU) functions. The findings demonstrate the accuracy of LSTM and FFNN in predicting cellular traffic; when training a model for prediction, it was found that LSTM models converged more quickly.
Long et al. (2019) formalised the concept of “memory length” for recurrent networks and identified a general family of recurrent networks with maximal memory lengths. It is shown that these networks may be stacked into several layers to create efficient models, including gated convolutional networks. Due to the architecture of such networks, this model shows that there is no gradient vanishing or explosion during back-propagation, which could allow for a more principled design approach in practice. Additionally, it provides a novel example of this family, the attentive activation recurrent unit (AARU). The framework of the AARU network is depicted in Fig. 2.
Because they include internal memory, RNNs are among the most effective and reliable ANN types currently in use (Park et al. 2022). An RNN can identify solutions for a wide range of problems thanks to its internal memory, which can recall its inputs (Ma and Principe 2018). By using back propagation, which depends on the RNN’s gradient, the RNN’s weights are optimised to fit the training data. The gradient of the RNN, however, may vanish or explode during the optimisation routine, which impairs the RNN’s capacity to learn from lengthy data sequences (Allen-Zhu et al. 2019). The LSTM architecture (Hochreiter and Schmidhuber 1997), a particular kind of RNN, is frequently utilised as a solution to these two issues (Le and Zuidema 2016). By retaining information for extended periods of time, LSTMs are specifically designed to learn long-term dependencies in time-dependent data. In applications like speech recognition (Tian et al. 2017; Kim et al. 2017) and text processing (Shih et al. 2017; Simistira et al. 2015), LSTM carries out faithful learning. Moreover, because it has internal memory, can be customised flexibly, and is unaffected by gradient-related problems, LSTM is especially appropriate for complicated data sequences such as stock time series retrieved from financial markets.
2.3 Long short-term memory
The vanishing gradient problem was first presented by Hochreiter and Schmidhuber (1997) and is now addressed by the widely used LSTM type of RNN design. An LSTM unit’s memory cell has the capacity to retain data for extended periods of time, and three gates control the movement of data into and out of the cell. The “forget gate” decides which information from the previous cell state will be retained and which information, no longer helpful, will be erased, while the “input gate” decides which information should enter the cell state and the “output gate” decides and controls the outputs. The LSTM network is one of the most effective RNNs because it addresses the problems associated with recurrent network training. Bala and Preeti (2019) utilised LSTM to forecast non-stationary financial time series. Their study demonstrates that a time-series prediction model based on LSTM is more accurate than a conventional regression model when applied to the associated feature prediction of time series. The model training time is lengthy, though, since there are many LSTM parameters. In order to maximise the sensitivity of model learning and boost the precision of prediction, Hu and Zheng (2020) suggested a multi-stage attention network for a multivariate time-series prediction model; however, the issue of this model’s prolonged training period was not fully resolved.
Mujeeb et al. (2019) proposed a DL methodology to estimate price and demand for huge data using a Deep LSTM (DLSTM). Processing vast volumes of data with LSTM is simpler than with solely data-driven approaches because of the adaptive and automatic feature learning mechanism of DNNs. Information from well-known real power markets was used to assess the suggested model, and all months were used for the day-ahead and week-ahead forecasting tests. Mean Absolute Error (MAE) and Normalized Root Mean Square Error (NRMSE) were used to evaluate forecast performance. The suggested DLSTM approach was compared to two traditional ANN time series forecasting techniques, the Nonlinear Autoregressive Network with Exogenous Variables and the Extreme Learning Machine (ELM).
The network overfits on the training set as a result of the gradient disappearing; overfitting means memorising inputs rather than learning. A model that has been overfitted cannot be relied upon to perform well on test or new data. The hidden layer of the LSTM has one neuron per cell, each representing a memory cell with a self-connected recurrent edge. This edge allows the gradient to flow without exploding or fading over many steps when its weight is set to 1. Figure 3 depicts the construction of one LSTM unit.
In terms of accuracy, DLSTM fared better than the benchmark forecasting techniques. The effectiveness of the suggested approach for predicting power prices and load is demonstrated by experimental findings.
Yang et al. (2019a) tested five word embeddings and two custom methods while de-identifying 10 PHI categories using a Bi-LSTM-CRF approach. Word embedding can discover properties in a low-dimensional matrix by converting words into vectors of real values, removing or minimising the need for feature engineering. Word embeddings were retrieved from Google News, Common Crawl, MIMIC-word2vec, MIMIC-fastText, and MADE. The highest recall and accuracy scores for Common Crawl (94.98 and 97.97, respectively) were linked with the highest F-measure score (96.46). There is no denying that this is an exciting piece of research; however, the strategy used to adjust training techniques may be constrained by data from the test source.
The smart workshop makes extensive use of intelligent sensors and the IoT, which results in the collection of a sizable quantity of real-time production data. The data gathered may be thoroughly assessed in order to aid producers in making informed decisions. In contrast to traditional data processing methods, artificial intelligence (AI), the main big data analysis strategy, is being used more and more in the industrial sector. But there are differences in how well different AI models can decipher real-time data from smart job shop operations. Traditional ML and DL techniques cannot effectively take advantage of the temporal correlation of data, whereas LSTM is widely used in machine translation, dialogue generation, and coding and decoding technology precisely because it is particularly suited to dealing with strongly connected time series issues. Wang et al. (2020) take the predicted production plan as an example and create LSTM and GRU models to handle the production process data. This assists the production manager in meeting the deadline for the initial production plan job and in determining whether to postpone the production plan. It uses the two-layer LSTM system model from Fig. 4. The output vector of one LSTM hidden layer becomes the input vector for the following layer when using DLSTM, giving the entire model extra processing capacity.
The GRU is a close relative of the LSTM. Despite the GRU’s structure being less complex than the LSTM’s, its effectiveness is largely unaffected. The update gate and the reset gate are the only two gate operations available in the GRU model. The update gate is utilised to regulate how much of the prior time step’s state information is incorporated into the present state; the larger the update gate’s value, the more prior information is carried over. The amount of information written from the prior state depends on the size of the reset gate. The GRU computation is represented in the following equations,
here, \(\sigma\) represents the sigmoid operation, \(a_{t}\) represents the activation sequence of the input gate, the hidden layers are represented as \(hl\), and the weight matrix is denoted as \(Wi\). While LSTM and GRU perform similarly on various data sets, GRU has a simpler internal storage unit structure than LSTM. Therefore, while training the model, GRU runs significantly more quickly than LSTM. When the accuracy of the predictions is almost equal, it can be concluded from the experimental comparison that the GRU model takes a great deal less time than the LSTM model. As a result, the GRU model outperforms the LSTM model when analysing industrial data. Due to local optima, both GRU and LSTM experience loss when training the dataset; this problem may be resolved by improving the models.
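As a concrete, simplified illustration of the GRU computation referred to above, the following NumPy sketch implements a single GRU cell; the weight names are our own, and the convention used for combining the old state and the candidate state is one of the two equivalent forms found in the literature.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_cell(x_t, h_prev, params):
    """One GRU step: update gate z, reset gate r, candidate state h_tilde."""
    z = sigmoid(params["W_z"] @ x_t + params["U_z"] @ h_prev + params["b_z"])  # update gate
    r = sigmoid(params["W_r"] @ x_t + params["U_r"] @ h_prev + params["b_r"])  # reset gate
    h_tilde = np.tanh(params["W_h"] @ x_t + params["U_h"] @ (r * h_prev) + params["b_h"])
    return (1.0 - z) * h_prev + z * h_tilde  # interpolate between old state and candidate

# toy usage: 4-dimensional inputs, 8-dimensional hidden state (illustrative sizes)
rng = np.random.default_rng(0)
d_in, d_h = 4, 8
shapes = {"W_z": (d_h, d_in), "U_z": (d_h, d_h), "b_z": (d_h,),
          "W_r": (d_h, d_in), "U_r": (d_h, d_h), "b_r": (d_h,),
          "W_h": (d_h, d_in), "U_h": (d_h, d_h), "b_h": (d_h,)}
params = {name: rng.normal(scale=0.1, size=shape) for name, shape in shapes.items()}

h = np.zeros(d_h)
for x_t in rng.normal(size=(5, d_in)):  # a length-5 input sequence
    h = gru_cell(x_t, h, params)
```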
Worldwide, there are more and more different types of financial fraud, which causes enormous financial losses and is a significant issue. This issue has been thoroughly addressed over the past few years utilising methods from machine learning and data mining. These techniques still need to be developed in order to cope with massive data, compute quickly, and spot new attack patterns. Yara et al. (2020) suggested a DL based solution for identifying financial fraud based on the LSTM methodology. With regard to massive data, this approach seeks to increase the effectiveness and precision of existing detection techniques. The suggested approach is assessed on real dataset of credit card thefts, and the results are compared with a deep learning model currently available called the Auto-encoder model and many other ML strategies. The test outcomes showed that the LSTM performed flawlessly, achieving 99.95% accuracy in less than a minute.
Making decisions based on many criteria is one of the most important problems to address when dealing with concerns relating to alternative impacts in big data research. Establishing exact characteristics for the remoteness measures that determine similarity across tests is challenging, and noisy, needless data complicate the task further. To deal with this problem, Papineni et al. (2021) proposed a deep learning-based hierarchical clustering approach for huge sequence data, built on a staggered, progressive decision-tree-based structure and fused with multi-criteria decision making. However, unnecessary data severely weaken the viability of the interruption-finding methodology. The following steps illustrate how this approach works,
Pseudocode steps to design MC-DL for BD:
1. Input: the model parameters a, b, c, and d, established using datasets gathered from running records.

2. Output: the decisions observed according to the instruction list for the issue being resolved.

3. Set the data stored in the batch.

4. For every dataset, compute \(B\left( {a_{i} ,k} \right) = b\left( {a_{i} } \right) + b\left( {N_{i} } \right)\) and repeat for the whole data. If there are any influencing relationships between the agents of one initial value and those of other initial values in the global set of issues, they are used to compute the overall efficacy of the coefficient \(a_{i}\), where \(B\left( {N_{i} } \right)\) denotes the neighbouring set of values.

5. The system’s starting value is stated as \(\mu_{ij} = \left( {b\left( {a_{i} } \right) + b\left( {a_{j} } \right)} \right)/K\), and the positive agent factors are the constant normalisation parameters, \(b\left( N \right) = \mathop \sum \nolimits_{j \in N} \mu_{ij} b\left( {a_{j} } \right)\).

6. The mapping procedure established for all of the DL’s hidden layers is \(a_{i + 1} = f_{k} \left( {W_{i} a_{i} + b_{i} } \right)\) with its activation; that is, the weights and hidden layers linking the DL network may be cumulatively combined with the sigmoid function. The decision outcome for the normalised variable \(x\) is \(DecisionValue\left( x \right) = \frac{{f_{k} \left( x \right) - mean}}{{\delta_{k} }}\).

7. End for.

8. Return.
Mutual coefficients under the common agents can be derived based on the weighting index when computing the benefit factors of the nearby agents.
As a result, the benefit plan’s entire evaluation will be looked into. The suggested study improves our understanding of this stream of research. It enables more effective, robust, and precise forecasts of the values for various criteria-based assisted decision modelling systems and their framework.
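A heavily hedged, literal reading of steps 4–6 above can be sketched as follows; all function and variable names are our own, and the sketch only illustrates how the benefit values, normalised coupling coefficients, and sigmoid hidden-layer mapping could fit together.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def benefit(b, neighbours, mu):
    """Step 4: B(a_i) = b(a_i) + b(N_i), with b(N_i) = sum_j mu_ij * b(a_j)."""
    b_neigh = np.array([sum(mu[i, j] * b[j] for j in neighbours[i]) for i in range(len(b))])
    return b + b_neigh

def coupling(b, K):
    """Step 5: mu_ij = (b(a_i) + b(a_j)) / K, used as normalisation coefficients."""
    return (b[:, None] + b[None, :]) / K

def decision_values(a, weights, biases):
    """Step 6: propagate a through sigmoid hidden layers, then normalise the output."""
    for W, bias in zip(weights, biases):
        a = sigmoid(W @ a + bias)
    return (a - a.mean()) / a.std()
```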
Bidirectional recurrent neural networks are used in the innovative method of Chadha et al. (2020) for time series-based condition monitoring and defect diagnostics. The use of bidirectional recurrent neural networks fundamentally changes how fault detection is approached and enables handling fault relationships across longer time horizons, preventing crucial process failures and boosting system productivity overall. The capacity is further improved by a unique data preprocessing and restructuring method that enforces generalisation, maximises data use, and produces more effective network training, particularly for sequential fault classification tasks. Standard recurrent designs like LSTM, GRU, and vanilla recurrent neural networks are outperformed by the proposed Bi-LSTM.
The testing findings for both binary and multi-class classification demonstrate the bidirectional LSTM Networks’ higher average fault discovering capabilities when compared to alternative designs.
Bian et al. (2021) propose an approach based on particle swarm optimisation and an attention-based LSTM (PSO-Attention-LSTM) in response to consumers’ aberrant power consumption behaviour. By setting up comparative experiments against LSTM, GRU, SVR, RF, LR, CNN-LSTM, and Attention-LSTM, it is verified that the PSO-Attention-LSTM model has advantages in positive rate and false positive rate and has stronger anomaly detection ability. In order to fully analyse the model’s detection efficacy for diverse electricity theft behaviours, four composite modes are produced by merging the six frequent electricity theft modes, which are first specified in accordance with real electricity theft behaviour. Second, a detection model based on PSO-Attention-LSTM is built using the TensorFlow framework. The model may minimise the loss of past data while boosting critical information and suppressing irrelevant information by employing the attention mechanism to apply changing weights to the hidden state of the LSTM. PSO is utilised to choose the ideal model parameters, and the hyperparameters are then optimised to increase the output of the model. The LSTM model has strong time-series regression-fitting capability and can take into account the time-series features of user power usage. To improve the model’s detection and prediction accuracy, the attention mechanism is also included; it gives LSTM hidden states different probability weights and amplifies the impact of important information. PSO-optimised hyperparameters are used to find the optimal parameters and raise the model’s detection performance.
In this study, window sliding processing is done using one-day data, which implies that the data from the previous 24 time steps are used to forecast the data from the subsequent time step. The timing and pattern of consumers’ power consumption behaviour are taken into account in this process (Wang et al. 2018a; Deng et al. 2020). Figure 5 displays the unique prediction principle.
The LSTM layer’s prediction outcomes are successfully highlighted using the attention mechanism, which also enhances the model’s effectiveness as a predictor and its ability to identify changes in user power use. The LSTM layer, input layer, attention layer, and output layer make up the Attention-LSTM model. The LSTM feature vector acts as the input to the attention layer, which is positioned underneath it. According to the weight distribution concept, improved weight parameters may be created by continuing to update. The final estimate of user power consumption is output by the fully connected layer. Figure 6 shows the framework of the Attention-LSTM approach.
This study’s detection model has poor detection performance on “burr” spots brought on by high levels of noise and unpredictable behaviour because it fails to take into account the relevant practical application considerations.
Kong et al. (2019) developed short-term residential load forecasting, which exhibits high volatility and uncertainty, employing an LSTM. In the interim, a method of mixing different projections was applied, and the proposed LSTM framework generated the highest prediction performance on the data set. Jiao et al. (2018) proposed an LSTM-based method to forecast the load of non-residential users using a variety of relevant sequence information. K-means was utilised to analyse the daily load curve of non-residential customers, and testing results revealed that it was more accurate than earlier load forecasting methods. However, load forecasting is impossible without taking into consideration a sizable number of relevant factors and historical data. Single RNN algorithms like LSTM or GRU (Chung et al. 2014) can take the history of temporal data into account, but they need the feature connections to be created manually (Sak et al. 2014).
2.4 Gated recurrent unit
A Gated Recurrent Unit (GRU), developed by Cho et al. (2014a), is a well-liked variation of the recurrent network that uses gating techniques to regulate and manage information flow between neural network cells. As seen in Fig. 7, the GRU is similar to an LSTM but has fewer parameters because it contains just a reset gate and an update gate. The main distinction between a GRU and an LSTM is therefore that a GRU has two gates (reset and update gates), whereas an LSTM has three gates (namely input, output, and forget gates). The structure of the GRU makes it possible to capture dependencies from lengthy data sequences in an adaptive fashion without losing data from earlier portions of the sequence. Because of this, the GRU is a slightly more simplified variation that frequently provides equivalent performance and is significantly faster to compute (Chung et al. 2014). A technique for forecasting short-term wind power based on wavelet packet decomposition and an enhanced GRU (WPD-GRU-SELU) was proposed by Zu and Song (2018). The wind power time series is first divided into many sub-sequences with different frequencies using WPD. Then, a modified GRU neural network is used to forecast the sequences of the various frequency components. This network employs scaled exponential linear units (SELU) as the activation function to compress the hidden states and determine the output. Reconstructing the GRU neural network output data yields the final wind power prediction. By including SELU as the output activation function, this approach revised the activation function of the GRU. The activation functions are illustrated in the following equations,
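(The equations are not reproduced in this extract; in the usual GRU formulation, consistent with the notation explained below, they read:)

$$z_{t} = \sigma_{sig} \left( {W_{z} x_{t} + U_{z} h_{t - 1} } \right),\quad r_{t} = \sigma_{sig} \left( {W_{r} x_{t} + U_{r} h_{t - 1} } \right)$$

$$\tilde{h}_{t} = \varphi_{tanh} \left( {W_{h} x_{t} + U_{h} \left( {r_{t} \circ h_{t - 1} } \right)} \right),\quad h_{t} = \left( {1 - z_{t} } \right) \circ h_{t - 1} + z_{t} \circ \tilde{h}_{t}$$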
The potential value of the active hidden node is represented by \(\tilde{h}_{t}\) in the equations above, and the activation value of the current hidden node output is represented by \(h_{t}\). The reset and update gates are denoted by \(r_{t}\) and \(z_{t}\), and element-wise multiplication by \(\circ\). The control gates and candidate state are activated by the activation functions \(\sigma_{sig}\) and \(\varphi_{tanh}\). The sigmoid and tanh expressions are,
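(in their standard form:)

$$\sigma_{sig} \left( x \right) = \frac{1}{{1 + e^{ - x} }},\quad \varphi_{tanh} \left( x \right) = \frac{{e^{x} - e^{ - x} }}{{e^{x} + e^{ - x} }}$$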
Thus the researchers apply an updated GRU model to estimate wind power and increase prediction correctness. The modification is to use SELU as the activation function of the output \(h_{t}\). The SELU equation is:
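(in its standard form:)

$$SELU\left( x \right) = \left\{ {\begin{array}{*{20}l} {\lambda x,} & {x > 0} \\ {\lambda \alpha \left( {e^{x} - 1} \right),} & {x \le 0} \\ \end{array} } \right.$$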
here, \(\lambda = 1.0507009873554804934193349852946\) and \(\alpha = 1.6732632423543772848170429916717\). The equation for the GRU after integrating SELU is depicted below:
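(Our reading of this modification, with SELU replacing the output activation of \(h_{t}\); this is an assumption rather than the authors’ exact formulation:)

$$h_{t} = SELU\left( {\left( {1 - z_{t} } \right) \circ h_{t - 1} + z_{t} \circ \tilde{h}_{t} } \right)$$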
The hybrid DL predictor of Jin et al. (2020) separates the climatic information into fixed component groups with unique frequency characteristics using the empirical mode decomposition (EMD) method, trains a GRU as a sub-predictor for each group, and then sums the GRU results to form the prediction result. The GRU in this model is trained using the stochastic gradient descent method on the data designated as input and output, enabling the determination of the ideal weights. Each group’s combined IMF sequences were utilised to train the GRU network. The GRU is made up of several GRU cells, and in this instance there are two hidden layers. As shown in Fig. 7, \(IP_{t}\), \(t = 1, 2, \ldots, n\), are the inputs to the GRU model and \(OP_{t}\), \(t = 1, 2, \ldots, n\), are the outputs.
The forward propagation of the GRU is computed as (Yang et al. 2019b):
here, \(i_{t} \in R^{d}\) is the input vector of every GRU cell; \(u_{t}\), \(cs_{t}\), \(rs_{t}\), and \(ch_{t}\) reflect the update gate, active state, reset gate, and candidate state of the present hidden node at time \(t\). Bias vectors are symbolised by \(b\), whereas the weight matrices \(U\) and \(W\) are determined during model training. The gradient descent approach is used to train the GRU, and the parameters are adjusted until convergence. Experiments employing meteorological data from an agricultural IoT system confirm the proposed model. The prediction results show that the proposed predictor can offer more accurate forecasts of temperature, wind speed, and humidity data to meet the needs of precision agriculture.
Recurrent neural networks were proposed as a technique for predicting time series by a number of researchers (Ahmad et al. 2017). The RNN’s shortcoming is that it can only remember recent correlations and dependencies because of the declining gradient (Song et al. 2019). The output gate, input gate, and forget gate of the LSTM approach, an enhanced RNN, may be used to regulate how much information is retained or discarded by a sequence; this also prevents the RNN model from overlooking long-term states (Lin et al. 2020). In the GRU model, a modified version of the LSTM model, the forget gate and input gate are integrated into one update gate, but the cell state is dropped (Chen and Chou 2022); therefore, there are fewer parameters. The prediction performance of the GRU model greatly surpassed that of the other two dynamic models, RNN and LSTM.
To effectively exploit the link between temporal components in load data, increase the precision and effectiveness of short-term load forecasting, and overcome the challenges brought on by load volatility and nonlinearity in accurate load forecasting, Shi et al. (2021) suggested a hybrid neural network model for estimating short-term demand based on the temporal convolutional network (TCN) and GRU. The distance correlation coefficient is employed to determine the link between the load and the climatic variables after reconstructing the characteristics using the fixed-length sliding time window approach. The last phase is to carry out prediction by using a temporal convolutional network to unearth hidden time correlations and historical information, such as weather data and electricity pricing. To improve prediction efficiency and accuracy, the cutting-edge AdaBelief optimiser and an attention mechanism are used. Data on Spanish load, weather, and the PJM power system are used to demonstrate the usefulness and superiority of the suggested model. According to data from several short-term load forecasting periods and detailed analyses of the performance of various models, the recommended model can produce accurate load forecasting results extremely quickly.
2.5 Auto encoder and decoder
To overcome the aforementioned issues and improve the accuracy of predictions for specific times, Hou et al. (2021) and Chenyu et al. (2021) advise utilising a Self-attention based Time-Varying (STV) prediction model. Using an encoder-decoder module with a multi-head self-attention mechanism, the researchers first examine how successive series are interconnected and hunt for common patterns. With the help of the multi-head self-attention mechanism, the model is able to project the historical series into several subspaces and extract detailed, high-level properties for prediction. Then, using the primary outcome of the encoding vector as input, a decoder is used to forecast upcoming multi-step outcomes. Using this encoder-decoder module, the model may link input and prediction in order to seek patterns that are consistent throughout a range of time periods in a series. However, the decoder’s outputs cannot be utilised as the final conclusions for prediction since they are slanted toward the distribution of normal periods.
Suppose that \(Z \in {\varvec{R}}\) is the encoder’s sub-input; then it is computed as:
Given the encoding \(c \in R^{{l \times d_{e} }}\) formulated by the encoder, the decoder is computed as:
here, the weight matrices \(W_{1} \in {\varvec{R}}^{{\left( {l \times d_{e} } \right) \times \left( {2l \times d_{e} } \right)}} ,W_{2} \in {\varvec{R}}^{{\left( {2l \times d_{e} } \right) \times \left( {l \times d_{e} } \right)}} ,\) and \(W_{3} \in {\varvec{R}}^{{\left( {l \times d_{e} } \right) \times H}}\), and the bias vectors \(b_{1 } \in {\varvec{R}}^{{2l \times d_{e} }} , b_{2 } \in {\varvec{R}}^{{l \times d_{e} }}\), and \(b_{3} \in {\varvec{R}}^{H}\), parameterise the fully-connected layers.
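The decoder equations themselves are not reproduced in this extract; reading the dimensions above, one plausible (assumed) form of the fully connected decoder applied to the flattened encoding \(vec\left( c \right)\) is:

$$\hat{y} = \varphi \left( {\varphi \left( {vec\left( c \right)W_{1} + b_{1} } \right)W_{2} + b_{2} } \right)W_{3} + b_{3}$$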
where \(W_{i}^{Q} \in {\varvec{R}}^{{d_{e} \times n}} ,W_{i}^{K} \in {\varvec{R}}^{{d_{e} \times n}} , W_{i}^{V} \in {\varvec{R}}^{{d_{e} \times n}}\), and \(W^{L} \in {\varvec{R}}^{{n \times d_{e} }}\) are learnable parameters.
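These correspond to the projections of a standard multi-head self-attention layer. In the usual scaled dot-product form (our notation; the paper’s exact expression may differ), each head computes

$$head_{i} = softmax\left( {\frac{{\left( {ZW_{i}^{Q} } \right)\left( {ZW_{i}^{K} } \right)^{ \top } }}{{\sqrt n }}} \right)ZW_{i}^{V}$$

and the heads are combined through the output projection \(W^{L}\).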
When the weight equals 1, a softmax operation is conducted over the length \(l\) of the time series to obtain the final attention scores. This score symbolises, in a semantic way, the significance of each past value. The final outcome, namely the forecast \(\hat{y}\) and the forecast \(\hat{y}_{t}\) at a specific time \(t\), is then computed from these attention scores.
With this method, the regression problem’s unbalanced data are resolved. On the Call Traffic dataset, the approach improves over the baselines by a median of 18.64% and 20.87%, while on the Elec CONS dataset it improves over the baselines by 5.74% and 20%.
Traditional health recommendation is risky because it lacks security safeguards against intrusions and is unsuitable for offering useful advice; as a result, people are reluctant to provide critical medical information. It is essential to create a privacy-preserving health recommendation system that offers the user the top-N options based on their preferences and previous input while also protecting their privacy. These issues are addressed by Selvi and Kavitha (2022) with a layered discriminative de-noising convolutional auto-encoder-decoder and a two-way recommendation system, which provides end users with efficient and secure access to health data. The modified Blowfish algorithm used in this scheme guarantees users’ privacy. Hadoop is used to structure the large amounts of patient-generated data. Here, the two-way system examines and picks out the more useful elements from each patient’s explicit and implicit data, and then it fuses all of the learnt features to provide an effective suggestion.
This approach is used to train the developed methodology to provide recommendations based on feedback and learn characteristics from both implicit and explicit patient data. After the encoder has mapped the input data from a high-dimensional space, the decoder is utilised to recreate the data. High dimensional vectors are produced via the bag of search method and then encoded in the auto-encoder for improved input processing. The equation below illustrates this nonlinear mapping operation at the encoder side,
On the decoding side, the nonlinear mapping is provided by,
here, \(Wi_{1}\) and \(Wi_{2}\) represent the encoding and decoding layer weights, \(bi_{1}\) and \(bi_{2}\) are the bias vectors, the side information is denoted as \(S\), and \(E_{i}\) and \(\widehat{{D_{i} }}\) are the encoding and decoding activation operations.
The activation operation of the encoding and decoding layers is assumed to be a sigmoidal operation,
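(The mapping equations are not reproduced in this extract; assuming the side information \(S\) is concatenated with the input \(X_{i}\), a plausible form, which is our assumption rather than the authors’ exact formulation, is:)

$$E_{i} = \sigma \left( {Wi_{1} \left[ {X_{i} ,S} \right] + bi_{1} } \right),\quad \widehat{{D_{i} }} = \sigma \left( {Wi_{2} E_{i} + bi_{2} } \right)$$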
Before transmitting the features to the stacked de-noising auto-encoder-decoder network, the CNN first executes convolution and pooling operations on the input data. To produce helpful health advice, the suggested convolutional network mixes two convolutional networks using a two-way model. Significant traits from implicit data are optimised differently than significant traits from explicit data, and the fully connected layer incorporates all of these optimised characteristics to provide top-N suggestions. Here, word embedding, word pooling, and convolution are carried out by the convolutional network. The effectiveness of the proposed method is assessed and its performance is compared with that of more advanced techniques using various statistical indicators. It is clear from the result analysis that the suggested system outperforms the preceding approaches. Future multimodal feature fusion-based approaches will be used to provide recommendations with more accurate characteristics.
In order to improve the model’s capacity for simulating long-term dependence, Chen et al. (2018) created an encoder-decoder framework employing RNNs, which required a recurrent unit for rebuilding the previous state from the present state. Using bottleneck deep autoencoders, Fong and Jun (2020) developed a novel NL-DA approach in 2020. This research effort introduced two new contributions: a monotonicity constraint was added to the bottleneck deep auto encoders to help determine the single nonlinear component, and the suggested FS deep learning architecture was used to estimate the numerous nonlinear components. The suggested technique was evaluated using two real data sets, and the findings showed improved reconstruction error outcomes.
A unique optimised auto-encoder based dimensionality reduction model for huge datasets is described in the study of Shikalgar and Sonavane (2021) and Arifa and Shefali (2021). An auto encoder, in general, is a distinct NN with three layers. In order to narrow the output created and bring it closer to the input values provided, the weights of the layers are often modified with various training methods. In this research, the weights of the auto-encoder are adjusted using a self-adaptive Bumble Bees Mating Optimization (SA-BBMO) method, which is a conceptual enhancement of the regular BBMO. This model’s loss of relevant data is negligible, and as a result, the quality loss is also negligible. For this reason, the presented work is recommended as a suitable approach for dimensionality reduction in large datasets.
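As a minimal sketch of the basic idea (a plain three-layer auto-encoder trained with backpropagation for dimensionality reduction, not the SA-BBMO-tuned variant described above; layer sizes are illustrative assumptions):

```python
import numpy as np
from tensorflow.keras import layers, Model

input_dim, code_dim = 64, 8          # illustrative sizes, not taken from the cited study

inputs = layers.Input(shape=(input_dim,))
code = layers.Dense(code_dim, activation="sigmoid", name="bottleneck")(inputs)   # encoder
outputs = layers.Dense(input_dim, activation="sigmoid")(code)                    # decoder

autoencoder = Model(inputs, outputs)
autoencoder.compile(optimizer="adam", loss="mse")

X = np.random.rand(1000, input_dim).astype("float32")      # stand-in for a large dataset
autoencoder.fit(X, X, epochs=5, batch_size=32, verbose=0)   # learn to reconstruct the input

# the low-dimensional representation is read from the bottleneck layer
encoder = Model(inputs, code)
X_reduced = encoder.predict(X, verbose=0)                    # shape: (1000, 8)
```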
2.6 Attention modules
2.6.1 One-to-many RNN
Gangi et al. (2019) use the MuSTC corpus to examine whether subsets of target languages assist transfer learning in spoken language translation (SLT), with a focus on the one-to-many scenario. To the best of the authors’ knowledge, this is the first study on the application of a multilingual technique for SLT. This was accomplished by outlining the shortcomings of target-forcing strategies analogous to those used in machine translation, and then describing and evaluating SLT-focused enhancements. According to the preliminary testing, the target-forcing strategy as described in Melvin et al. (2017) performs poorly when compared to the unidirectional baselines. In practice, the target-forcing token is repeated across the temporal dimension so that it spreads throughout the whole input sequence.
A series of character embeddings, summed with trigonometric positional encodings, is fed to the decoder. Three parallel 2D CNN layers are the foundation of the 2D self-attention (2DSAN) algorithm, which computes three distinct representations of its inputs Q, K, and V. Calculating attention involves:
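(The equation is not reproduced here; in the usual scaled dot-product form, which we assume since the exact scaling is not shown in this extract, each of the parallel representations is attended as:)

$$Attention\left( {Q_{i} ,K_{i} ,V_{i} } \right) = softmax\left( {\frac{{Q_{i} K_{i}^{ \top } }}{{\sqrt d }}} \right)V_{i}$$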
Here, \(i = \{ 1,2, \ldots ,c\}\), where \(c\) represents the total number of filters in the CNN. In order to boost the performance of their encoder self-attention layers, Linhao et al. (2018) suggested the implementation of a distance penalty mechanism, although they do not go into more detail. Along similar lines, a distance penalty is deducted from the softmax input of the equation above. Using the distance \(d = \left| {i - j} \right|\) between positions \((i,j)\) as the starting point, a logarithmic penalty can be created as follows.
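A hedged reconstruction of this penalty, based on the common logarithmic form (the paper’s exact expression may differ), is \(penalty_{ij} = \log \left( {1 + \left| {i - j} \right|} \right)\), which is subtracted from the corresponding softmax logit.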
This model contains around 30 million parameters, but large multilingual MT models can have orders of magnitude more. The authors were unable to run trials with bigger models since SLT models, particularly the Transformer, use a significant amount of GPU memory.
2.6.2 Many-to-one RNN
The rapid growth in traffic in recent decades has resulted in significant environmental problems, such as traffic noise pollution and greenhouse gas emissions, which negatively impact health and quality of life. A deep learning-based traffic noise model was created by Zhang et al. (2021a). The goal is to choose the best machine-learning model utilising multivariate traffic features in order to predict traffic noise from actual traffic data. In this research, RNNs are thoroughly assessed for modelling time-series traffic data, which was collected during an experiment on an inner-city circle and comprises both video traffic data and audio data. The capacity to make short-term predictions makes it possible to follow the changes in traffic noise levels over brief periods of time. This would be highly beneficial for studies of traffic discomfort brought on by noise peaks, which empirical traffic noise models are unable to support because their projections are usually averaged over a lengthy period of time (Rey et al. 2020).
The model’s output is traffic noise, while its inputs are various traffic feature variables. Based on the input traffic characteristics, the output traffic noise level is forecast at each time step. Figure 8 illustrates three different architectures that may be used to formulate this problem. Here, n_steps represents the length of the sequence, m the number of training samples, and n the number of feature variables; the grey spherical nodes in the recurrent layers, which might be LSTM, GRU, or plain RNN units, represent the recurrent cells.
The trained GRU model performed quite well, demonstrating the immense potential of GRUs for traffic noise modelling. Decision-makers and urban authorities can use the traffic noise model to assist with a number of activities, such as managing traffic volume, enforcing speed limits, and evaluating new traffic infrastructure. However, due to a lack of pertinent data, several additional aspects, such as the road surface, the environment, and the weather, were not included in this work.
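A minimal many-to-one sketch of this setup in PyTorch is shown below; it is not the model of Zhang et al. (2021a), and the number of traffic features, hidden size, and sequence length are illustrative assumptions.

```python
import torch
import torch.nn as nn

class NoiseGRU(nn.Module):
    """Many-to-one GRU: a sequence of traffic-feature vectors -> one noise level."""
    def __init__(self, n_features: int, hidden: int = 64):
        super().__init__()
        self.gru = nn.GRU(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)   # regress the noise level

    def forward(self, x):                  # x: (batch, n_steps, n_features)
        _, h_last = self.gru(x)            # h_last: (1, batch, hidden)
        return self.head(h_last.squeeze(0))

# Toy usage: 30 time steps, 5 traffic feature variables (counts, speed, ...).
model = NoiseGRU(n_features=5)
x = torch.randn(8, 30, 5)
pred_noise = model(x)                      # (8, 1) predicted noise levels
```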
2.6.3 Many-to-many RNN
The Booking.com-hosted ACM WSDM Web Tour 2021 Challenge focuses on session-aware recommender systems in the tourism industry: the system suggests a user's next destination from the sequence of travel reservations made during a journey. To manage the large-dimensional output space of RNN-based session-aware recommender systems, Alonso (2021) proposes extending the commonly used many-to-one configuration to a many-to-many configuration, which is one of the primary contributions on this subject. Instead of having the model anticipate only the final booking given a succession of bookings, the following booking is forecast at each time step. Additionally, it was demonstrated that this is a computationally effective substitute for data augmentation in a many-to-one RNN, in which every subsequence of a session beginning with the first element is taken into account.
Three parts may be identified in the neural network architecture. For each booking in a user trip, the model first concatenates all of the feature embeddings. These concatenated vectors are then fed to a many-to-many RNN encoder, which produces, at each time step, the probability mass function for the next city given the prior cities. The top four cities in the probability mass function of the last step are then suggested. The overall design of the many-to-many RNN is shown in Fig. 9.
The empirical findings demonstrate that this model can significantly outperform many-to-one RNNs. The work also shows how this biases the learning problem in favour of shorter trips and suggests a remedy by weighting the model's loss function. With this method, it is not possible to fully address the bias generated by taking all subsequences into account in the many-to-many design, let alone the consequence of having an unequal distribution of sequence lengths in each batch.
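A compact sketch of the many-to-many idea, including an illustrative per-step loss weighting, is given below. It is not Alonso's (2021) implementation: only a city-id embedding is used (the full model concatenates several feature embeddings), and the vocabulary size, embedding size, and weighting scheme are assumptions.

```python
import torch
import torch.nn as nn

class SessionRNN(nn.Module):
    """Many-to-many session model: predict the next city at every step."""
    def __init__(self, n_cities: int, emb: int = 64, hidden: int = 128):
        super().__init__()
        self.city_emb = nn.Embedding(n_cities, emb)
        self.gru = nn.GRU(emb, hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_cities)   # scores over all cities

    def forward(self, city_ids):                 # city_ids: (batch, seq_len)
        h, _ = self.gru(self.city_emb(city_ids))
        return self.out(h)                        # (batch, seq_len, n_cities)

# One training step with a per-step weight to counter the short-sequence bias.
model = SessionRNN(n_cities=10_000)
ids = torch.randint(0, 10_000, (4, 6))           # a batch of booking sequences
logits = model(ids[:, :-1])                      # predict step t+1 from steps <= t
targets = ids[:, 1:]
step_weights = torch.linspace(0.5, 1.0, targets.size(1))  # illustrative weighting
loss = nn.functional.cross_entropy(
    logits.reshape(-1, 10_000), targets.reshape(-1), reduction="none")
loss = (loss.view_as(targets) * step_weights).mean()
loss.backward()
```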
Chamorro et al. (2019) contend that many-to-many RNN topologies may be used for multidate crop identification, in order to accurately identify crop classes in tropical regions at each date recorded in a multitemporal sequence. Their approach for multidate crop recognition from multitemporal remote sensing data uses a special many-to-many configuration of a bidirectional ConvLSTM. In the proposed architecture, a bidirectional LSTM takes inputs from a Fully Convolutional Network (FCN) encoder at reduced spatial resolution. After the LSTM has processed the encoder's output, the result is passed to a decoder that produces a pixel-wise label map at the original spatial resolution.
As seen in Fig. 10, this system integrates components of the architecture described in Rußwurm and Korner (2018) with the encoder-decoder structure of an FCN. For this encoder-decoder design, many FCN variants might be considered; the dense FCN developed in Jegou et al. (2017) is employed in the current work. In this construction, the three main types of blocks are: Dense blocks (DB), made up of sequences of convolutional layers with several bypass connections; Transition Down (TD) blocks, which combine a convolution with a downsampling operation; and Transition Up (TU) blocks, which perform upsampling, typically with a transposed convolution. Skip connections are employed between the downsampling and upsampling paths.
On both datasets, BDenseConvLSTM scored highest, with a more noticeable performance gap on the LEM dataset. Using this network for many-to-many, multitemporal crop detection applications is therefore advised.
2.7 Reinforcement learning and ensemble learning for fine-tuning RNN models
The sequential decision-making problem is approached differently by reinforcement learning than by other methodologies. In reinforcement learning, the ideas of an environment and an agent are usually introduced first. The agent can perform a number of actions in the environment, each of which affects the state of the environment and may produce rewards (feedback): "positive" for action sequences that lead to "good" states and "negative" for action sequences that lead to "bad" states. Reinforcement learning's goal is to develop desirable behaviour patterns, generally referred to as a policy, through interaction with the environment. Deep Reinforcement Learning (DRL) seeks to develop intelligent agents that can effectively learn to tackle challenging real-world problems (Yadav et al. 2022). To solve the algorithmic trading problem of selecting the ideal trading position at any point during trading activity in the stock market, Théate and Ernst (2021) propose a distinctive DRL trading technique that aims to maximise the performance indicator known as the Sharpe ratio in a range of stock markets. Their Trading Deep Q-Network algorithm (TDQN) is a novel DRL approach based on the well-known DQN algorithm, but significantly modified to address the specific algorithmic trading difficulty at hand. The resulting RL agent is trained solely on new trajectories generated from a small sample of past stock market data. RL is illustrated in Fig. 11.
RL approaches are concerned with developing strategies (policies) π that maximise an optimality criterion which depends directly on the instantaneous rewards r_t over a certain time horizon. The most common optimality criterion is the expected discounted sum of rewards over an infinite time horizon. The optimal policy π* is mathematically expressed as follows.
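Written in the standard form assumed here (discount factor γ, instantaneous rewards r_t), the criterion reads

\[ \pi^{*} = \arg\max_{\pi} \; \mathbb{E}\!\left[ \sum_{t=0}^{\infty} \gamma^{t} r_{t} \,\middle|\, \pi \right], \qquad 0 \le \gamma < 1. \]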
The crucial element R needs to be tuned to produce the desired behaviour. Due to their large variance, DRL algorithms such as DQN can be difficult to apply correctly in some situations, especially when the training and test sets are quite dissimilar.
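To make the DQN-style update concrete, a minimal sketch of one training step is given below; it shows only the generic bootstrapped target and Huber TD loss, not the TDQN-specific modifications of Théate and Ernst (2021), and the state dimension, number of trading positions, and hyperparameters are illustrative assumptions (the episode-termination flag is omitted for brevity).

```python
import torch
import torch.nn as nn

gamma = 0.99
q_net = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 3))       # 3 actions
target_net = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 3))
target_net.load_state_dict(q_net.state_dict())
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-4)

# One DQN update on a sampled mini-batch of (state, action, reward, next_state).
s  = torch.randn(32, 8)
a  = torch.randint(0, 3, (32, 1))
r  = torch.randn(32, 1)
s2 = torch.randn(32, 8)

with torch.no_grad():
    # Bootstrapped target: r + gamma * max_a' Q_target(s', a')
    y = r + gamma * target_net(s2).max(dim=1, keepdim=True).values

q_sa = q_net(s).gather(1, a)                  # Q(s, a) for the actions taken
loss = nn.functional.smooth_l1_loss(q_sa, y)  # Huber TD loss
optimizer.zero_grad()
loss.backward()
optimizer.step()
```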
3 Scenarios of DNN models
Deep Neural Network (DNN) models behave differently in various operating scenarios, and each architecture is designed to handle specific types of data and tasks, as illustrated in Table 1. CNNs have been extensively applied in a range of fields, including computer vision, speech processing, and face recognition.
Below are some of the common datasets which are utilized in various deep neural network applications.
-
Question Answering The most frequently used question-answering datasets include the Natural Questions (NQ) dataset (Kwiatkowski et al. 2019), released by Google; it contains about 300 k questions, with context provided by the long answers to each question. The TyDi QA dataset was released in 2020 (Clark et al. 2020) and contains about 200 k questions in 11 typologically distinct languages. The QuAC dataset, released in 2018 (Choi et al. 2018), comprises 14,000 questions in a conversational style together with their accompanying contexts. The SQuAD 2.0 dataset is a newer iteration of the SQuAD dataset, first released in 2018 (Rajpurkar et al. 2018); it contains more than 100 k questions and the context for each answer. These datasets have been widely used to train and evaluate deep neural network models for question-answering tasks.
-
Machine Translation The WMT (Workshop on Machine Translation) (WMT Dataset: https://www.statmt.org/wmt21/translation-task.html) dataset is one of the most used for machine translation. The collection is composed of parallel corpora in a variety of languages and fields, including news, the web, and subtitles. English-German, English-French, and English-Chinese are the most frequently used language pairings in the dataset (Dai et al. 2019).
-
Image Classification Many publicly accessible image classification datasets are available and can be used to train and test deep neural networks (a short loading sketch follows this list). ImageNet is a large dataset with over a million images divided into a thousand distinct classes; it has been widely used to assess how well deep neural networks classify images (Deng et al. 2009). The smaller CIFAR-10 and CIFAR-100 datasets have 10 and 100 classes, respectively; they are frequently used as benchmarks for image classification algorithms and contain 32 × 32 colour images (Krizhevsky and Hinton 2009). The MNIST dataset comprises 60,000 training images and 10,000 test images of handwritten digits; the images are 28 × 28 pixels and grayscale (LeCun et al. 2010). The Fashion-MNIST dataset includes 60,000 training images and 10,000 test images of various articles of apparel, also 28 × 28 pixels and grayscale (Xiao et al. 2017). Although COCO is a sizeable dataset for object detection, segmentation, and captioning, it also contains image classification labels for each of its 80 object categories (Lin et al. 2014). The OpenImages dataset, with over 9 million images, carries labels for object detection, segmentation, and classification (Kuznetsova et al. 2020).
-
Object Detection Deep neural networks are trained using a variety of well-known object detection datasets. The COCO (Common Objects in Context) dataset (COCO Dataset: https://cocodataset.org/), which has more than 330,000 photos and over 2.5 million object instances labelled in 80 categories, is a large object detection, segmentation, and captioning collection. Pascal VOC (Visual Object Classes) (Pascal VOC Dataset: http://host.robots.ox.ac.uk/pascal/VOC/) is a dataset of annotated images for object recognition, segmentation, and classification applications; over 11,500 images and over 27,000 labelled object instances make up the dataset's 20 object categories. The Open Images dataset (Open Images Dataset: https://storage.googleapis.com/openimages/web/index.html) consists of over 9 million tagged image URLs with annotations for identifying objects, segmenting data, and identifying visual relationships; with more than 600 object classes, it is one of the largest datasets available for object detection research. KITTI (Karlsruhe Institute of Technology and Toyota Technological Institute) (KITTI Dataset: http://www.cvlibs.net/datasets/kitti/) is a dataset for autonomous driving research that consists of pictures taken while a car is moving, annotated to aid object recognition, object tracking, and scene comprehension; more than 80,000 instances of its eight main object categories are labelled over more than 7500 images. With more than 1.2 million images and more than 1000 object categories, ImageNet (ImageNet Dataset: http://www.image-net.org/) is a sizeable dataset for object recognition and image classification; it serves as the foundation for the yearly ImageNet Large Scale Visual Recognition Challenge (ILSVRC) and has been extensively utilised for object recognition and other computer vision applications.
-
Speech Recognition The Common Voice dataset (Mozilla Common Voice dataset: https://commonvoice.mozilla.org/en/datasets), which Mozilla maintains, is a multilingual dataset including recordings of speakers in more than 50 different languages; the open-source dataset is constantly expanding thanks to volunteer contributions from all across the world. The LibriSpeech dataset (LibriSpeech dataset: http://www.openslr.org/12/), which is based on audiobooks and has more than 1000 h of voice recordings from various speakers, is another extensively used dataset, and it is common practice to train voice recognition algorithms on it. The Google Speech Commands dataset (Google Speech Commands dataset: https://ai.googleblog.com/2017/08/launching-speech-commands-dataset.html), which includes brief audio recordings of straightforward spoken commands like "stop" and "go" that are often used in voice-controlled devices, serves as a third illustration; many keyword-spotting tasks use this dataset.
-
Anomaly Detection Depending on the relevant area or application, there are several datasets accessible for anomaly detection. The Numenta Anomaly Benchmark (NAB) (Numenta Anomaly Benchmark dataset: https://github.com/numenta/NAB) dataset was created primarily for testing anomaly detection techniques. There are more than 50 real-world datasets in it, including statistics on traffic, energy use, and machine measurements. KDD Cup 99 dataset (KDD Cup 99 dataset: http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html) is frequently employed in the assessment of network intrusion detection systems. It has a significant collection of information about both typical and unusual network activity. This dataset from NASA (NASA Turbofan Engine dataset: https://ti.arc.nasa.gov/tech/dash/groups/pcoe/prognostic-data-repository/#turbofan), which comprises sensor readings from turbofan engines, is frequently used to forecast when a malfunction or abnormality would happen. The Yahoo S5 dataset (Yahoo S5 dataset: https://webscope.sandbox.yahoo.com/catalog.php?datatype=s&did=70) is a sizable time-series data set derived from Yahoo’s advertising platform. The purpose of it is to find abnormalities in advertising performance numbers.
-
Generative Models Neural networks that can learn to create new data samples that are comparable to a given training dataset are known as generative models. MNIST dataset (MNIST dataset: http://yann.lecun.com/exdb/mnist/) for training and assessing generative models, this dataset of 60,000 handwritten digits (0–9) is frequently utilised. The CIFAR-10 and CIFAR-100 datasets (CIFAR-10 and CIFAR-100 datasets: https://www.cs.toronto.edu/~kriz/cifar.html) are collections of 32 × 32 colour photographs that, respectively, include 10 and 100 classes. They are frequently employed for developing and testing generative models. The ImageNet dataset (ImageNet dataset: http://www.image-net.org/) is a sizable collection of over a million high-resolution pictures organised into a thousand different classifications. It is frequently employed to train generative models that can produce excellent images. CelebA dataset (CelebA dataset: http://mmlab.ie.cuhk.edu.hk/projects/CelebA.html) is a large number of celebrity faces with numerous annotations and characteristics. It is frequently employed for testing and refining generative models that may produce realistic faces.
-
Natural Language Processing For NLP tasks, a variety of datasets are accessible, including: A collection of movie reviews with sentiment annotations is the Stanford Sentiment Treebank (Socher et al. 2013). The Large Movie Review Dataset is a collection of movie reviews used to classify binary sentiment (Maas et al. 2011). A collection of product reviews from Amazon is available for sentiment analysis and product suggestion (McAuley and Leskovec 2013). The News Aggregator Dataset is a collection of news items used to identify topics (Nakov et al. 2013). A collection of questions and answers for machine comprehension is called the Question-Answering collection (Choi and Cardie 2008). A dataset for named entity recognition in news stories is the CoNLL 2003 NER Dataset (Sahu and Anand 2017). A dataset for language modelling based on Wikipedia articles is called the WikiText Language Modelling Dataset (Merity et al. 2016). The GLUE Benchmark is a collection of datasets for tasks related to natural language comprehension, such as textual entailment, sentiment analysis, and paraphrase identification (Wang et al. 2018b).
-
Recommendation Systems The MovieLens dataset, which includes user-rated movie reviews, is a well-liked dataset for recommendation engines. This dataset, which is frequently used for collaborative filtering-based recommendation systems, has around 27,000 films and 138,000 people (Harper and Konstan 2015). The Amazon product reviews dataset, which includes millions of evaluations and ratings for various goods, is another often used dataset for recommendation systems (Amazon product reviews dataset: https://s3.amazonaws.com/amazon-reviews-pds/readme.html).
-
Time-Series Analysis The M4 competition dataset (M4 competition dataset: https://www.m4.unic.ac.cy/the-dataset/), which includes over 100,000 time series from a variety of industries, including banking, energy, and transportation, is a well-liked dataset for time-series analysis. The dataset is helpful for comparing the precision of various forecasting algorithms since it contains historical values as well as prediction horizons for each time series. The NOAA Global Historical Climatology Network dataset (Menne et al. 2012), which includes daily weather measurements from hundreds of meteorological stations worldwide, is another extensively used dataset for time-series analysis. For climate analysis and weather forecasting, this dataset is often utilised.
-
Generative Models The MNIST dataset (MNIST homepage: http://yann.lecun.com/exdb/mnist/) for generating handwritten digits and the ImageNet dataset for generating images are just two of the publicly accessible datasets for generative modelling applications (Deng et al. 2009). The OpenAI GPT series models, which are extensively utilised for natural language processing applications, are also built via generative modelling (Radford et al. 2018).
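As referenced in the image classification item above, the sketch below shows one common way of loading such benchmark datasets through the torchvision wrappers; the storage path and batch size are illustrative assumptions (MNIST and Fashion-MNIST are loaded the same way via torchvision.datasets.MNIST / FashionMNIST).

```python
import torchvision
from torchvision import transforms
from torch.utils.data import DataLoader

# Download CIFAR-10 and wrap it in a DataLoader for mini-batch training.
to_tensor = transforms.ToTensor()
train_set = torchvision.datasets.CIFAR10(root="./data", train=True,
                                          download=True, transform=to_tensor)
train_loader = DataLoader(train_set, batch_size=128, shuffle=True)

images, labels = next(iter(train_loader))
print(images.shape)   # torch.Size([128, 3, 32, 32])
print(labels.shape)   # torch.Size([128])
```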
4 Discussions
In several areas, including computer vision, natural language processing, and speech recognition, deep learning algorithms have made considerable strides. Researchers and practitioners should nevertheless be mindful of the restrictions and weaknesses that remain (Rusch and Siddhartha 2021). These include the following: (i) deep learning models need a lot of high-quality data to train well, which can be difficult in fields where data is hard to obtain, expensive to gather or interpret, or biased; (ii) deep learning models are often seen as "black boxes", making it difficult to comprehend how they make judgements, which is an issue in circumstances where accountability and openness are crucial; (iii) deep learning models occasionally overfit the training set, causing them to perform well on the training set but badly on fresh, unseen data; (iv) deep learning methods may have difficulty generalising from their training distributions to novel, unexplored data, which is troublesome when the data distribution varies over time or between settings; (v) deep learning models demand a lot of processing horsepower to run and train, which may be difficult for organisations with constrained computing or financial resources; (vi) deep learning algorithms are susceptible to adversarial attacks, in which rogue individuals purposefully alter the input data to deceive the model's predictions; and (vii) although deep learning models have made great strides, they still lag behind human performance in key domains such as common-sense reasoning, creativity, and emotional intelligence (Liu et al. 2021). Deep learning models may have distinct weaknesses and advantages in various operational circumstances. For instance, deep learning models must be extremely efficient and precise in real-time applications like self-driving vehicles or robots, and able to function in dynamic and unpredictable contexts. In contrast, interpretability and explainability may be more important in areas like healthcare or finance.
Although vanilla RNNs may be used for classification, these techniques have a number of drawbacks that need to be taken into account, such as vanishing and exploding gradients, limited memory, difficulty with irregular time intervals, the need for fixed-length inputs and outputs, poor suitability for inputs that come from different modalities, the need for large amounts of data, and limited interpretability (Kag and Venkatesh 2021). Deep learning RNNs have been effectively used for a variety of applications, including natural language processing, speech recognition, picture and video captioning, time series prediction, and music generation. However, the effectiveness of these models can vary depending on the specific task and the quality and quantity of the training data. These limitations have led to the development of alternative architectures and techniques such as LSTMs, GRUs, attention mechanisms, and transformer models, which aim to address some of these issues. Long Short-Term Memory (LSTM) techniques provide several benefits over vanilla RNNs, but they also have shortcomings that need to be taken into account: limited memory capacity, being computationally expensive to train, requiring large amounts of labelled data, requiring fixed-length inputs and outputs, and being difficult to interpret (Elmaz et al. 2021). Several deep learning tasks, including speech recognition, video and image captioning, machine translation, and sentiment analysis, have been successfully completed with LSTM. In fields like medical diagnostics, where long-term relationships and temporal patterns are crucial, LSTM models have also demonstrated encouraging outcomes. However, the effectiveness of LSTM models can vary depending on the specific task and the quality and quantity of the training data. GRU techniques provide several benefits over traditional RNNs and LSTMs, but they also have drawbacks: the computational requirements of deep learning models grow as the data size increases, and data quality and data labelling remain concerns. Overall, analysing all the deep learning approaches shows that GRU approaches can be effective for big data, but several challenges need to be addressed to ensure the models perform well and are practical to use. GRU on large data nevertheless has a promising future, with continuing research aimed at enhancing its scalability, performance, and interaction with other technologies to overcome the difficulties presented by big data analysis. Table 2 compares various deep neural network models along with their advantages and limitations.
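The parameter-count difference between LSTM and GRU cells mentioned above can be checked directly with a few lines of PyTorch; the input and hidden sizes below are arbitrary.

```python
import torch.nn as nn

def n_params(module: nn.Module) -> int:
    return sum(p.numel() for p in module.parameters())

lstm = nn.LSTM(input_size=128, hidden_size=256, num_layers=1)
gru  = nn.GRU(input_size=128, hidden_size=256, num_layers=1)

print("LSTM parameters:", n_params(lstm))   # four gate blocks per cell
print("GRU parameters: ", n_params(gru))    # three gate blocks, roughly 25% fewer
```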
Depending on the specific challenge at present and the available resources, numerous configurations for developing Deep Neural Network (DNN) models based on Big Data may be appropriate. The various designs and approaches utilised in DNNs for Big Data are summarised by Zhang et al. (2021b). The authors show how various topologies, such as deep belief networks (DBNs), recurrent neural networks (RNNs), and convolutional neural networks (CNNs), may be used in various applications. Wang et al. (2021) investigate multiple frameworks and setups to construct DNNs for Big Data analytics. The authors describe how various techniques, such as CNNs, RNNs, and autoencoders, may be used to analyse various kinds of Big Data, including picture, text, and graph data. The difficulties and opportunities at this intersection of big data and DNNs are discussed by Chen et al. (2021a). Future research is suggested by the authors in a number of areas, including the creation of more effective and efficient DNN architectures, the incorporation of DNNs with other AI technologies, and the improvement of Big Data processing algorithms.
The use of knowledge transformation methods to enhance the efficiency of deep learning models for big data has been explored in a number of studies. One form of knowledge transformation is data normalisation, which entails altering the input data to have a standard distribution in order to improve the convergence and accuracy of deep learning models. According to Zhang et al. (2021b), the performance of deep learning models can be enhanced by using knowledge transformation techniques; they describe a number of methods, including dimensionality reduction, feature scaling, feature selection, and data normalisation, which may be used to alter the input data. Using pre-trained models or information acquired from one task to tackle another related task is known as transfer learning, which may be thought of as a type of knowledge transformation where the information from one activity is applied to a related task. Fine-tuning, feature extraction, and domain adaptation are a few transfer learning approaches covered by Acharya et al. (2021) that can help deep neural networks perform better across a range of domains by transferring information. According to Zeng et al. (2021) and Gilmer et al. (2017), shifting information from a large teacher network to a smaller student network may be accomplished with an effective and precise knowledge distillation approach. The process of distilling and transforming a large teacher network's body of information into a smaller student network is a form of knowledge transformation. In Zeng et al. (2021), the student network is initially trained on the labelled data before being fine-tuned on the unlabelled data using the knowledge distillation technique. They also suggest a new attention method that may help sharpen the student network's focus on crucial details and enhance knowledge transmission. Overall, this method can help decrease the size and computing expense of deep neural networks while maintaining high accuracy, making it useful for efficient deep learning applications.
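A minimal sketch of the teacher-to-student idea is given below, using the standard temperature-scaled distillation loss rather than the specific attention mechanism of Zeng et al. (2021); the temperature, mixing weight, and toy logits are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature: float = 4.0, alpha: float = 0.7):
    """Blend a soft teacher-matching term with the usual hard-label loss."""
    soft_targets = F.softmax(teacher_logits / temperature, dim=1)
    soft_student = F.log_softmax(student_logits / temperature, dim=1)
    soft_loss = F.kl_div(soft_student, soft_targets, reduction="batchmean")
    hard_loss = F.cross_entropy(student_logits, labels)
    return alpha * (temperature ** 2) * soft_loss + (1 - alpha) * hard_loss

# Toy usage with random logits for a 10-class problem.
student = torch.randn(32, 10, requires_grad=True)
teacher = torch.randn(32, 10)
labels = torch.randint(0, 10, (32,))
loss = distillation_loss(student, teacher, labels)
loss.backward()
```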
It may be difficult to choose the best coefficients for deep neural network models, and this process frequently calls for rigorous testing with various configurations (Acharya et al. 2021). The parameters that are changed during training, such as learning rate, regularisation strength, and batch size, are referred to as the coefficients in a deep neural network model. Researchers frequently use intuition, experience, and trial-and-error experimentation to get the ideal coefficients for a deep neural network model (Gilmer et al. 2017). They might begin by leaving the coefficients at their default settings and then gradually adjust them in accordance with how the model performs on a validation set. Until the ideal configuration is found, the coefficients are typically adjusted and the performance is evaluated several times. Researchers may experiment with different configurations of the deep neural network model, such as the architecture, the number of layers, and the activation functions, in addition to changing the coefficients. The model’s performance can be significantly impacted by these setups, therefore they might need to be modified along with the coefficients. Deep neural network models are often chosen with specific topologies and coefficients based on a mix of theoretical knowledge and empirical analysis. In order to comprehend how the model behaves and how it performs across various datasets, researchers may employ statistical analysis and visualisation approaches (Acharya et al. 2021). Overall, choosing the best coefficients and configurations for a deep neural network model is a difficult and iterative process that needs careful testing and evaluation.
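A minimal sketch of such coefficient tuning is shown below as a simple search over learning rate, batch size, and regularisation strength; `train_and_validate` is a hypothetical stand-in that the reader would replace with an actual training-and-validation run.

```python
import itertools
import random

# Candidate coefficients mentioned in the text: learning rate, batch size,
# regularisation strength (weight decay).
grid = {
    "lr": [1e-2, 1e-3, 1e-4],
    "batch_size": [32, 64, 128],
    "weight_decay": [0.0, 1e-4, 1e-3],
}

def train_and_validate(cfg):          # hypothetical stand-in
    return random.random()            # replace with real validation accuracy

best_cfg, best_score = None, float("-inf")
for values in itertools.product(*grid.values()):
    cfg = dict(zip(grid.keys(), values))
    score = train_and_validate(cfg)   # evaluate this configuration
    if score > best_score:
        best_cfg, best_score = cfg, score

print("best configuration:", best_cfg, "score:", best_score)
```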
5 Conclusion
In this survey, a comprehensive study of the predictions, applications, and challenges of big data is performed. The types of neural networks that may be used to manage big data include recurrent neural networks (RNNs), long short-term memory (LSTM), and gated recurrent units (GRU). When processing sequential data, such as time series, natural language, and audio, these models are very helpful. RNNs can manage variable-length sequences, making them appropriate for big data processing; both time-series analysis and natural language processing frequently make use of them. It can be difficult to train them on lengthy sequences because of the vanishing gradient problem, which is one restriction of RNNs. The vanishing gradient issue can be addressed by LSTMs, a type of RNN, by including memory cells that can retain information for extended periods of time. Speech recognition, sentiment analysis, and language modelling are common applications of LSTMs, and they are especially helpful for managing huge data because they can effectively analyse lengthy sequences. GRUs are comparable to LSTMs in that they handle lengthy sequences effectively; they are quicker to train and less prone to overfitting than LSTMs because they have fewer parameters. Speech recognition, video analysis, and natural language processing are common applications of GRUs. For managing big data, useful models therefore include RNNs, LSTMs, and GRUs: variable-length sequences can be handled by RNNs, long-term information can be stored by LSTMs, and extended sequences can be handled effectively by GRUs with fewer parameters. The appropriate model is determined by the particular application and the kind of data being studied. There are several potential future research directions for using RNNs, LSTMs, and GRUs in big data analysis, such as model optimisation, incorporating attention mechanisms, handling missing data, and multi-modal learning. Researchers are exploring how RNNs, LSTMs, and GRUs can be adapted for multi-modal learning to handle more complex big data analysis tasks.
References
Acharya S, Rai A, Venkatesh S, Ravindranath Chowdary C (2021) A review of transfer learning in deep neural networks. J Big Data 8(1):1–21
Ahmad J, Larijani H, Emmanuel R, Mannion M, Javed A, Phillipson M (2017) Energy demand prediction through novel random neural network predictor for large non-domestic buildings. In: Proc. Annu. IEEE Int. Syst. Conf., pp 1–6. https://doi.org/10.1109/SYSCON.2017.7934803
Akbal Y, Ünlü KD (2022a) A univariate time series methodology based on sequence-to-sequence learning for short to midterm wind power production. Renew Energy 200:832–844. https://doi.org/10.1016/j.renene.2022.10.055
Akbal Y, Ünlü KD (2022b) A deep learning approach to model daily particular matter of Ankara: key features and forecasting. Int J Environ Sci Technol 19(7):5911–5927. https://doi.org/10.1007/s13762-021-03730-3
Alaluf I, Polyak A, Goldberg Y (2021) SparseGAN: sparsity-promoting generative adversarial networks for compressed sensing MRI. Med Image Anal 71:102036
Alhussein M, Al-Waisi Y, Khasawneh MT (2021) Deep conditional generative adversarial networks for multivariate time series anomaly detection. IEEE Access 9:33762–33771
Allen-Zhu Z, Li Y, Song Z (2019) On the convergence rate of training recurrent neural networks. In: Advances in neural information processing systems, pp 1310–1318. arXiv:1810.12065
Alonso MB (2021) Data augmentation using many-to-many RNNs for session-aware recommender systems. https://doi.org/10.48550/arXiv.2108.09858
Amazon product reviews dataset. https://s3.amazonaws.com/amazon-reviews-pds/readme.html
Arifa S, Shefali S (2021) Optimized auto encoder on high dimensional big data reduction: an analytical approach. Turk J Comput Math Educ 12(14):526–537
Bai S, Zico Kolter J, Koltun V (2018) An empirical evaluation of generic convolutional and recurrent networks for sequence modeling. https://doi.org/10.48550/arXiv.1803.01271
Bala R, Singh RP (2019) Financial and non-stationary time series forecasting using LSTM recurrent neural network for short and long horizon. In: 10th ICCCNT
Bharathi Mohan G, Prasanna Kumar R (2022a) Survey of text document summarization based on ensemble topic vector clustering model. IOT based control networks and intelligent systems. https://doi.org/10.1007/978-981-19-5845-8-60
Bharathi Mohan G, Prasanna Kumar R (2022b) A comprehensive survey on topic modelling in text summarization. In: International conference on micro-electronics and telecommunication engineering. https://doi.org/10.1107/978-981-16-8721-1_22
Bian Y, Huang J, Cai X, Yuan J, Church K (2021) On attention redundancy: a comprehensive study. In: Proceedings of the 2021 conference of the North American chapter of the association for computational linguistics: human language technologies, pp 930–945
Cai L-Q, Wei M, Zhou S-T, Yan X (2020) Intelligent question answering in restricted domains using deep learning and question pair matching. IEEE Access 8:32922–32934. https://doi.org/10.1109/ACCESS.2020.2973728
CelebA dataset: http://mmlab.ie.cuhk.edu.hk/projects/CelebA.html
Chadha GS, Panambilly A, Schwung A, Ding SX (2020) Bidirectional deep recurrent neural networks for process fault classification. ISA Trans 106:330–342. https://doi.org/10.1016/j.isatra.2020.07.011
Chamorro JA, Bermudez JD, Happ PN, Feitosa RQ (2019) A many-to-many fully convolutional recurrent network for multitemporal crop recognition. ISPRS Ann Photogramm Remote Sens Spatial Inf Sci 4:25–32. https://doi.org/10.5194/isprs-annals-IV-2-W7-25-2019
Chatterjee S, Zhang Y, Chang L, Huang TS (2021) XBM: Learning cross-modal binary representations with adversarial feature factorization. IEEE Trans Pattern Anal Mach Intell 43(4):1268–1282
Chen X, Ma L, Jiang W, Yao J, Liu W (2018) Regularizing RNNs for caption generation by reconstructing the past with the present. In: Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), pp 7995–8003. https://doi.org/10.48550/arXiv.1803.11439
Chen L, Zhou Y, Wang X, Huang Z (2021a) Big data and deep learning: challenges and opportunities. J Big Data 8(1):1–31
Chen T, Luo Z, Liu Y, Han Y (2021b) AS-transformer: an attentive and separable transformer for structured prediction. IEEE Trans Pattern Anal Mach Intell
Chen C-J, Chou F-I, Chou J-H (2022) Temperature prediction for reheating furnace by gated recurrent unit approach. IEEE Access 10:33362–33369. https://doi.org/10.1109/ACCESS.2022.3162424
Chenyu H, Jiawei W, Bin C, Jing F (2021) A deep-learning prediction model for imbalanced time series data forecasting. Big Data Mining Anal 4(4):266–278. https://doi.org/10.26599/BDMA.2021.9020011
Cho K, Van MB, Gulcehre C, Bahdanau D, Bougares F, Schwenk H, Bengio Y (2014a) Learning phrase representations using rnn encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078
Cho K, Merrienboer BV, Gulcehre C, Bahdanau D, Bougares F, Schwenk F, Bengio Y (2014b) Learning phrase representations using rnn encoder-decoder for statistical machine translation. https://doi.org/10.48550/arXiv.1406.1078
Choi Y, Cardie C (2008) Learning with compositional semantics as structural inference for subsentential sentiment analysis. In: Proceedings of the 46th annual meeting of the association for computational linguistics: human language technologies (ACL-HLT), pp 793–801
Choi E, He H, Iyyer M, Yatskar M, Yih W, Choi Y (2018) QuAC: question answering in context. arXiv preprint arXiv:1808.07036
Chung JS, Lee K (2021) Large-scale continuous speech recognition with chunk-based streaming decoder. IEEE/ACM Trans Audio Speech Lang Process 29:1765–1777
Chung J, Gulcehre C, Cho KH, Bengio Y (2014) Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv preprint arXiv:1412.3555
CIFAR-10 and CIFAR-100 datasets: https://www.cs.toronto.edu/~kriz/cifar.html
Clark E, Khandelwal U, Levy O, Manning CD (2020) TyDi QA: a Benchmark for Information-Seeking Question Answering in Typologically Diverse Languages. arXiv preprint arXiv:2010.11934
COCO Dataset: https://cocodataset.org/
Dai Z, Yang Z, Yang Y, Carbonell JG, Le QV, Salakhutdinov R (2019) Transformer-XL: attentive language models beyond a fixed-length context. In: Proceedings of the 57th annual meeting of the association for computational linguistics, pp 2978–2988
Debortoli S, Muller O, Vom BJ (2014) Comparing business intelligence and big data skills. Bus Inf Syst Eng 6(5):289–300. https://doi.org/10.1007/s12599-014-0344-2
Deng J, Dong W, Socher R, Li LJ, Li K, Fei-Fei L (2009) ImageNet: a large-scale hierarchical image database. In: 2009 IEEE conference on computer vision and pattern recognition, pp 248–255
Deng DY, Li J, Zhang ZY, Teng YF, Hhuang Q (2020) Short-term electric load forecasting based on EEMD-GRU-MLR. Power Syst Technol 44(2):593–602
Diao E, Ding J, Tarokh V (2019) Restricted recurrent neural networks. In: 2019 IEEE international conference on big data (big data), pp 56–63. https://doi.org/10.1109/BigData47090.2019.9006257
Dupond S (2019) A thorough review on the current advance of neural network structures. Annu Rev Control 14:200–230
Elmaz F, Eyckerman R, Casteels W, Latré S, Hellinckx P (2021) CNN-LSTM architecture for predictive indoor temperature modeling. Build Environ 206:108327. https://doi.org/10.1016/j.buildenv.2021.108327
Fong Y, Xu J (2020) Forward stepwise deep auto encoder-based monotone nonlinear dimensionality reduction methods. J Comput Graphical Stat. https://doi.org/10.1080/10618600.2020.1856119
Gangi D, Mattia A, Matteo N, Marco T (2019) One-to-many multilingual end-to-end speech translation. In: 2019 IEEE automatic speech recognition and understanding workshop (ASRU)
Gheisari M, Wang G, Bhuiyan MZ (2017). A survey on deep learning in big data. In: IEEE International Conference on Computational Science and Engineering (CSE) and IEEE International Conference on Embedded and Ubiquitous Computing (EUC), July 2017, pp 173–180. https://doi.org/10.1109/CSE-EUC.2017.215
Gilmer J et al (2017) Neural message passing for quantum chemistry. In: Proceedings of the 34th international conference on machine learning, pp 1263–1272
Goodfellow I, Bengio Y, Courville A (2016) Deep learning. MIT Press, New York
Google Speech Commands dataset: https://ai.googleblog.com/2017/08/launching-speech-commands-dataset.html
Gu Y, Chen T, Mei Q (2021) Hierarchical image generation with convolutional neural networks. IEEE Trans Multimed 23:21–31
Gui G, Liu F, Sun J, Yang J, Zhou Z, Zhao D (2020) Flight delay prediction based on aviation big data and machine learning. IEEE Trans Veh Technol 69(1):140–150
Guo J, Fan Y, Liu Y, Huang J, Shi S (2021) Dual transfer learning for low-resource natural language understanding. In: Proceedings of the AAAI conference on artificial intelligence, vol 35, no 10, pp 9008–9015
Harper FM, Konstan JA (2015) The MovieLens datasets: history and context. ACM Trans Interact Intell Syst 5(4):19. https://doi.org/10.1145/2827872
He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: IEEE conference on computer vision and pattern recognition, pp 770–778
Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural Comput 9(8):1735–1780. https://doi.org/10.1162/neco.1997.9.8.1735
Hou Q, Stringer B, Waury K, Capel H, Haydarlou R, Xue F, Abeln S, Heringa J, Feenstra KA (2021) SeRenDIP-CE: sequence-based interface prediction for conformational epitopes. Bioinformatics 37(20):3421–3427. https://doi.org/10.1093/bioinformatics/btab321
Hu J, Zheng W (2020) Multistage attention network for multivariate time series prediction. Neurocomputing 383:122–137
ImageNet Dataset: http://www.image-net.org/
Jaffry S, Hasan SF (2020) Cellular traffic prediction using recurrent neural networks. In: 2020 IEEE 5th international symposium on telecommunication technologies (ISTT), pp 94–98
Jegou S, Drozdzal M, Vazquez D, Romero A, Bengio Y (2017) The one hundred layers tiramisu: fully convolutional densenets for semantic segmentation. Comput Vis Pattern Recognit Workshops 2017:1175–1183
Jiao R, Zhang T, Jiang Y, He H (2018) Short-term non-residential load forecasting based on multiple sequences LSTM recurrent neural network. IEEE Access 6:59438–59448. https://doi.org/10.1109/ACCESS.2018.2873712
Jin XB, Yang NX, Wang XY, Bai YT, Su TL, Kong JL (2020) Hybrid deep learning predictor for smart agriculture sensing based on empirical mode decomposition and gated recurrent unit group model. Sensors 20(5):1334. https://doi.org/10.3390/s20051334
Jin X-B, Gong W-T, Kong J-L, Bai Y-T, Su T-L (2022) A variational bayesian deep network with data self-screening layer for massive time-series data forecasting. Entropy 24:335. https://doi.org/10.3390/e24030335
Kag A, Venkatesh S (2021) Training recurrent neural networks via forward propagation through time. Int Conf Mach Learn PMLR 139:5189–5200
Karras T, Laine S, Aila T, Hellsten J (2021) Alias-free generative adversarial networks. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 12296–12305
KDD Cup 99 dataset: http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html
Kim YJ, Choi S, Briceno S, Mavris D (2016) A deep learning approach to flight delay prediction. In: Proc. IEEE 35th digital avionics systems conference, pp 1–6. https://doi.org/10.1109/DASC.2016.7778092.
Kim J, El Khamy M, Lee J (2017) Residual LSTM: design of a deep recurrent architecture for distant speech recognition. In: Proceedings of the annual conference of the international speech communication association, pp 1591–1595. https://doi.org/10.21437/Interspeech.2017-477
KITTI Dataset: http://www.cvlibs.net/datasets/kitti/
Kong W, Dong ZY, Jia Y, Hill DJ, Xu Y, Zhang Y (2019) Short-term residential load forecasting based on LSTM recurrent neural network. IEEE Trans Smart Grid 10(1):841–851. https://doi.org/10.1109/TSG.2017.2753802
Körner M, Marc R (2021) Recurrent neural networks and the temporal component. Deep learning for the earth sciences: a comprehensive approach to remote sensing, climate science, and geosciences, pp 105–119. https://doi.org/10.1002/9781119646181.ch8
Krizhevsky A, Hinton G (2009) Learning multiple layers of features from tiny images. Technical report, University of Toronto
Kuznetsova A, Hanocka R, Shlens J, Ferrari V, Gupta A (2020) The open images dataset v4: Unified image classification, object detection, and visual relationship detection at scale. arXiv preprint arXiv:1811.00982
Kwiatkowski T, Palomaki J, Redfield O, Collins M, Petrov S, Das D (2019) Natural questions: a benchmark for question answering research. Trans Assoc Comput Linguist 7:491–505
Le P, Zuidema W (2016) Quantifying the vanishing gradient and long distance dependency problem in recursive neural networks and recursive LSTMs. https://doi.org/10.48550/arXiv.1603.00423
LeCun Y, Cortes C, Burges C (2010) MNIST handwritten digit database. AT&T Labs. http://yann.lecun.com/exdb/mnist
LibriSpeech dataset: http://www.openslr.org/12/
Lin TY, Maire M, Belongie S, Hays J, Perona P, Ramanan D, Zitnick CL (2014) Microsoft COCO: common objects in context. In: European conference on computer vision, pp 740–755
Lin L, Chen C-Y, Yang H-Y, Xu Z, Fang S-H (2020) Dynamic system approach for improved PM 2.5 prediction in Taiwan. IEEE Access 8:210910–210921. https://doi.org/10.1109/ACCESS.2020.3038853
Linhao D, Shuang X, Bo X (2018) Speech transformer: a no-recurrence sequence-to-sequence model for speech recognition. In: Proc. of international conference on acoustics, speech and signal processing, pp 5884–5888. https://doi.org/10.1109/ICASSP.2018.8462506
Liu M, Chen L, Du X, Jin L, Shang M (2021) Activated gradients for deep neural networks. IEEE Trans Neural Netw Learn Syst. https://doi.org/10.1109/TNNLS.2021.3106044
Long D, Zhang R, Mao Y (2019) Recurrent neural networks with finite memory length. IEEE Access 7:12511–12520. https://doi.org/10.1109/ACCESS.2018.2890297
M4 competition dataset. https://www.m4.unic.ac.cy/the-dataset/
Ma Y, Principe J (2018) Comparison of static neural network with external memory and RNNs for deterministic context free language learning. In: Proceedings of the international joint conference on neural networks, pp 1–7. https://doi.org/10.1109/IJCNN.2018.8489240
Ma Y, Tang J, Zhao T, Liu L, Wang S, Zhang Z, Mei Q (2021) Dual graph attention networks for deep recommendation. IEEE Trans Knowl Data Eng
Maas AL, Daly RE, Pham PT, Huang D, Ng AY, Potts C (2011) Learning word vectors for sentiment analysis. In: Proceedings of the 49th annual meeting of the association for computational linguistics: human language technologies (ACL-HLT), pp 142–150
Mandic D, Chambers J (2001) Recurrent neural networks for prediction: learning algorithms, architectures and stability. Wiley, Hoboken
McAuley J, Leskovec J (2013) Hidden factors and hidden topics: understanding rating dimensions with review text. In: Proceedings of the 7th ACM conference on recommender systems, pp 165–172
Melvin J, Mike S, Quoc VL, Maxim K, Yonghui W, Zhifeng C, Nikhil T, Fernanda V, Martin W, Greg C, Macduff H, Jeffrey D (2017) Google’s multilingual neural machine translation system: enabling zero-shot translation. Trans Assoc Comput Linguist 5:339–351. https://doi.org/10.1162/tacl_a_00065
Menne MJ, Durre I, Vose RS, Gleason BE, Houston TG (2012) An overview of the global historical climatology network-daily database. J Atmos Oceanic Tech 29:897–910. https://doi.org/10.1175/JTECH-D-11-00103.1
Merity S, Xiong C, Bradbury J, Socher R (2016) Pointer sentinel mixture models. In: Proceedings of the 2016 conference on empirical methods in natural language processing (EMNLP), pp 620–629
MNIST dataset: http://yann.lecun.com/exdb/mnist/
MNIST homepage: http://yann.lecun.com/exdb/mnist/
Mohamed SA, Abdou MA, Elsayed AA (2022) Residual information flow for neural machine translation. IEEE Access 10:118313–118320. https://doi.org/10.1109/ACCESS.2022.3220691
Mozilla Common Voice dataset: https://commonvoice.mozilla.org/en/datasets
Mujeeb S, Javaid N, Ilahi M, Wadud Z, Ishmanov F, Afzal MK (2019) Deep long short-term memory: a new price and load forecasting scheme for big data in smart cities. Sustainability 11(4):987. https://doi.org/10.3390/su11040987
Nakov P, Kirilov A, Derczynski L, Esteves D, Maynard H, Ritter A, Saggion S, Tsatsaronis G (2013) SemEval-2013 Task 2: sentiment analysis in Twitter. In: Proceedings of the 7th international workshop on semantic evaluation (SemEval-2013), pp 312–320
NASA Turbofan Engine dataset: https://ti.arc.nasa.gov/tech/dash/groups/pcoe/prognostic-data-repository/#turbofan
Naul B, Bhoom JS, Pérez F, Walt SVD (2018) A recurrent neural network for classification of unevenly sampled variable stars. Nat Astron 2(2):151–155. https://doi.org/10.1038/s41550-017-0321-z
Ng A (2018) Auto encoders. Unsupervised Feature Learning and Deep Learning (UFLDL) Tutorial 2018. http://ufldl.stanford.edu/tutorial/unsupervised/Autoencoders. Accessed 21 July 2018
Nguyen G, Dlugolinsky S, Bobak M, Tran V, Garcia AL, Heredia I, Malik P, Hluchy L (2019) Machine Learning and Deep Learning frameworks and libraries for large-scale data mining: a survey. Artif Intell Rev 52:77–124. https://doi.org/10.1007/s10462-018-09679-z
Numenta Anomaly Benchmark dataset: https://github.com/numenta/NAB
Open Images Dataset: https://storage.googleapis.com/openimages/web/index.html
Papineni SLV, Yarlagadda S, Akkineni H, Reddy AM (2021) Big data analytics applying the fusion approach of multicriteria decision making with deep learning algorithms. https://doi.org/10.48550/arXiv.2102.02637
Park D, Yoon S, Lee K (2021) SpecAugment 2.0: Improved data augmentation for automatic speech recognition. IEEE Signal Process Lett 28:151–155
Park Y, Gajamannage K, Jayathilake DI, Bollt EM (2022) Recurrent neural networks for dynamical systems: applications to ordinary differential equations, collective motion, and hydrological modeling, pp 1–15. https://doi.org/10.48550/arxiv.2202.07022
Pascal VOC Dataset: http://host.robots.ox.ac.uk/pascal/VOC/
Radford A, Narasimhan K, Salimans T, Sutskever I (2018) Improving language understanding with unsupervised learning. Technical report, OpenAI. https://d4mucfpksywv.cloudfront.net/better-language-models/language-models.pdf
Rajpurkar P, Jia R, Liang P (2018) Know what you don’t know: unanswerable questions for SQuAD. arXiv preprint arXiv:1806.03822
Rey GG, Aumond P, Can A (2020) Variability in sound power levels: implications for static and dynamic traffic models. Transp Res Part D 84:102339. https://doi.org/10.1016/j.trd.2020.102339
Rusch TK, Mishra S (2021) UnICORNN: a recurrent model for learning very long time dependencies. Int Conf Mach Learn PMLR 139:9168–9178
Rußwurm M, Korner M (2018) Multi-temporal land cover classification with sequential recurrent encoders. ISPRS Int J Geo Inf 7:129. https://doi.org/10.3390/ijgi7040129
Sahu S, Anand S (2017) Named entity recognition on Hindi news articles using conditional random fields. In: Proceedings of the 2017 international conference on data management, analytics and innovation (ICDMAI), pp 129–136
Sak H, Senior A, Beaufays F (2014) Long short-term memory recurrent neural network architectures for large scale acoustic modeling. In: Proceedings of Interspeech 2014, pp 338–342
Sarkar BK (2017) Big data for secure healthcare system: a conceptual design. Complex Intell Syst 3(2):133–151. https://doi.org/10.1007/s40747-017-0040-1
Sarker IH (2021) Machine learning: algorithms, real-world applications and research directions. SN Comput Sci 2:160. https://doi.org/10.1007/s42979-021-00592-x
Selvi T, Kavitha V (2022) A privacy-aware deep learning framework for health recommendation system on analysis of big data. Vis Comput 38:385–403. https://doi.org/10.1007/s00371-020-02021-1
Shi H, Wang L, Scherer R, Woźniak M, Zhang P, Wei W (2021) Short-term load forecasting based on adabelief optimized temporal convolutional network and gated recurrent unit hybrid neural network. IEEE Access 9:66965–66981. https://doi.org/10.1109/ACCESS.2021.3076313
Shikalgar A, Sonavane S (2021) Optimized auto encoder on high dimensional big data reduction: an analytical approach. Turk J Comput Math Educ 12(14)
Shih CH, Yan BC, Liu SH, Chen B (2017) Investigating Siamese LSTM networks for text categorization. In: Proceedings of the 9th Asia-Pacific signal and information processing association annual summit and conference (APSIPA ASC), pp 641–646. https://doi.org/10.1109/APSIPA.2017.8282104
Simistira F, Ul-Hassan A, Papavassiliou V, Gatos B, Katsouros V, Liwicki M (2015) Recognition of historical Greek polytonic scripts using LSTM networks. In: Proceedings of the international conference on document analysis and recognition, ICDAR, pp 766–770. https://doi.org/10.1109/ICDAR.2015.7333865
Socher R, Perelygin A, Wu J, Chuang J, Manning C, Ng A, Potts C (2013) Recursive deep models for semantic compositionality over a sentiment treebank. In: Proceedings of the 2013 conference on empirical methods in natural language processing (EMNLP), pp 1631–1642
Song J, Xue G, Ma Y, Li H, Pan Y, Hao Z (2019) An indoor temperature prediction framework based on hierarchical attention gated recurrent unit model for energy efficient buildings. IEEE Access 7:157268–157283. https://doi.org/10.1109/ACCESS.2019.2950341
Sun ZH, Sun LZ, Strang K (2018) Big data analytics services for enhancing business intelligence. J Comput Inf Syst 58(2):162–169. https://doi.org/10.1080/08874417.2016.1220239
Szegedy C, Ioffe S, Vanhoucke V, Alemi AA (2017) Inception-v4, inception-resnet and the impact of residual connections on learning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol 31, no 1
Tan M, Pang R, Le QV (2020) EfficientDet: scalable and efficient object detection. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 10781–10790
Théate T, Ernst D (2021) An application of deep reinforcement learning to algorithmic trading. Expert Syst Appl 173:114632
Tian X, Zhang J, Ma Z, He Y, Wei J, Wu P, Situ W, Li S, Zhang Y (2017) Deep LSTM for large vocabulary continuous speech recognition. https://doi.org/10.48550/arXiv.1703.07090
Tuli S, Casale G, Jennings NR (2022) TranAD: deep transformer networks for anomaly detection in multivariate time series data. arXiv preprint arXiv:2201.07284
Ünlü KD (2022) A data-driven model to forecast multi-step ahead time series of Turkish daily electricity load. Electronics 11(10):1524. https://doi.org/10.3390/electronics11101524
Wang XQ, Chen YL, Yang Q, Liu HC (2018a) Analysis and prediction of user electricity consumption based on time series decomposition. Comput Eng Appl 38(9):230–236
Wang A, Singh A, Michael J, Hill F, Levy O, Bowman S (2018b) GLUE: a multi-task benchmark and analysis platform for natural language understanding. In: Proceedings of the 2018 EMNLP workshop BlackboxNLP: analyzing and interpreting neural networks for NLP, pp 353–355
Wang C, Du W, Zhu Z, Yue Z (2020) The real-time big data processing method based on LSTM or GRU for the smart job shop production process. J Algorithms Comput Technol 14:1–9. https://doi.org/10.1177/1748302620962390
Wang S, Ma Y, Jin D, Jiang H, Yu H (2021) Deep learning for big data analytics: a survey. J Big Data 8(1):1–37
WMT Dataset: https://www.statmt.org/wmt21/translation-task.html
Wu Z, Wan J (2021) Cascade anchor-based object detection with adaptive feature fusion and background filter. IEEE Trans Pattern Anal Mach Intell 43(4):1283–1299
Wu Q, Zhang J, Zhu X, Yu H, Chen J (2021) Spatio-temporal graph attention networks for air quality prediction. IEEE Trans Neural Netw Learn Syst 32(6):2336–2347
Xiao H, Rasul K, Vollgraf R (2017) Fashion-mnist: a novel image dataset for benchmarking machine learning algorithms. arXiv preprint arXiv:1708.07747
Xu J, Liu Z, Yin X, Tian Z (2021) LADeepSAD: A deep self-attention network for online streaming anomaly detection. Neurocomputing 460:171–182
Yadav P, Mishra A, Lee J, Kim S (2022) A survey on deep reinforcement learning-based approaches for adaptation and generalization. arXiv preprint arXiv:2202.08444
Yahoo S5 dataset: https://webscope.sandbox.yahoo.com/catalog.php?datatype=s&did=70
Yang X, Lyu T, Li Q, Lee CY, Bian J, Hogan WR, Wu Y (2019a) A study of deep learning methods for de-identification of clinical notes in cross-institute settings. BMC Med Inform Decis Mak 19(5):232. https://doi.org/10.1186/s12911-019-0935-4
Yang W, Zuo W, Cui B (2019b) Detecting malicious urls via a keyword-based convolutional gated-recurrent-unit neural network. IEEE Access 7:29891–29900. https://doi.org/10.1109/ACCESS.2019.2895751
Yara A, Albatul A, Murad AR (2020) A financial fraud detection model based on LSTM deep learning technique. J Appl Secur Res. https://doi.org/10.1080/19361610.2020.1815491
Zakir J, Seymour T, Berg K (2015) Big data analytics. Issues Inf Syst 16(2):81–90
Zeng Z, Wang X, Guo Y (2021) Efficient and accurate knowledge distillation for deep neural networks. Neural Netw 140:176–185
Zhang X, Helmut K, Wim DR (2021a) Traffic noise prediction applying multivariate bi-directional recurrent neural network. Appl Sci 11(6):2714. https://doi.org/10.3390/app11062714
Zhang Y, Chen J, Tang J, Zhang X, Chen H (2021b) Trends and challenges in deep learning for big data: a survey. Appl Sci 11(3):1033
Zhang R, Yao Y, Sun A, Tay Y (2021c) Deep learning based recommendation: a survey. arXiv preprint arXiv:2105.09688
Zheng B, Chen L, Wang Y, Chen W, Zhang W, Chen Y (2021) Spatiotemporal forecasting of crowd flow with graph neural networks. IEEE Trans Neural Netw Learn Syst 32(5):1955–1966
Zu XR, Song RX (2018) Short-term wind power prediction method based on wavelet packet decomposition and improved GRU. J Phys 1087(2):022034. https://doi.org/10.1088/1742-6596/1087/2/022034
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.