Determination of Flowing Grain Moisture Contents by Machine Learning Algorithms Using Free Space Measurement Data
Enes Yigit and Hüseyin Duysak
Abstract
The measurement of the moisture content of grain stored in silos makes it possible to take the precautions needed to store the grain without spoilage. Since current methods cannot capture the moisture information of all the stored grain, this study proposes a new method to determine the moisture content of the grain in real time during loading. For this purpose, popular machine learning (ML) algorithms, i.e., KNN, SVR, and ANN, are used to predict the moisture content of the flowing grain. To measure the moisture content of the grain, a free-space electromagnetic measurement setup is constructed. Reflection and transmission coefficients are measured at 103 frequency points between 1 and 2.48 GHz using a vector network analyzer (VNA) for three grain types (bulgur wheat, durum wheat, and corn silage kernel) with moisture content varying between 8% and 25%. In this way, three datasets, datasets 1-3, are constructed, one for each grain type. The $k$-fold cross-validation ($k$-CV) technique is used to train and test the ML algorithms, and their performance is evaluated with five metrics. In addition, for each grain type, the error rates corresponding to each moisture content are evaluated separately, and the relationship between moisture content and algorithm performance is revealed. While the best results are obtained with KNN for durum wheat and corn silage kernel, SVR gives the best results for bulgur wheat. This study shows that the moisture content of flowing grain can be determined through proper modeling of the ML algorithms and the measurement setup.
One of the most important parameters affecting the storage, market value, chemical, and quality properties of grains is moisture content. The moisture content of grains must be suitable for safe storage. Grains with low moisture content store better because insects and molds need moisture to grow [1], [2]. That is why the desired moisture content of grains such as corn and wheat must be less than 14% for safe storage [3]. The moisture content of the grains can also affect economic gains. During filling into storage structures such as silos, measuring the moisture of the flowing grain can prevent both economic and grain losses. In recent years, a wide range of technologies has been developed for measuring grain moisture content. These technologies are categorized as direct and indirect methods. Direct methods, such as the gravimetric method, determine the absolute amount of moisture [4]. In the gravimetric method, a weighed quantity of grain is dried at a specific temperature, and the moisture content is calculated from the loss in weight of the grain [5], [6]. Direct methods are precise, and they are preferred for laboratory research or as reference methods [7]. However, they are tedious, time-consuming, and destructive; thus, they are not suitable for real-time measurement. Indirect methods, on the other hand, determine the moisture content by measuring the physical properties of the grain [8], [9]. These properties are the dielectric permittivity, magnetic permeability, and conductivity of the grain. These electrical properties change with the moisture content of the grain, the temperature of the environment, and the measurement frequency [10]. In recent years, many indirect measurement methods, such as microwave, resistance, and capacitance methods, have been used to determine the moisture content of grain. These methods are nondestructive and are suitable for real-time moisture measurement and grain trade.
Many studies have shown that the electrical properties of grains change with the moisture content [7]-[14]. Therefore, the moisture of the grain can be calculated with microwave material characterization techniques that yield the electrical properties of materials. These techniques include the coaxial probe, waveguide, resonant cavity, and free-space methods [15]. The choice of technique depends on the operating frequency, sample size, material state (liquid, solid, or semisolid), and testing type (destructive or nondestructive). With the development of the precision agriculture industry, real-time measurement of moisture content for bulk grains and seeds has been demanded [11], [16]. In this context, free-space techniques can be chosen since they allow wideband, nondestructive, real-time measurement and easy installation, especially in terms of sample placement. Coaxial probe, waveguide, and resonant cavity methods provide good accuracy, but they are not suitable for real-time moisture measurement of bulk material because of tedious application procedures such as preparing the sample holder and adjusting the sample size.
However, studies on measuring the moisture content of flowing grain are very limited. In [11], [17], the moisture content is determined by obtaining the electrical properties of the flowing grain with a free-space measurement system. However, many parameters, such as bulk density and temperature [11], must be constrained in the setup to maintain the sensitivity of the electrical characterization method, which makes this approach impractical. On the other hand, the use of machine learning (ML) algorithms in grain moisture measurement has increased in recent years. The moisture content of red wheat is estimated with an artificial neural network (ANN) in [18]. In [19], transmission measurements for sweet corn inside a fixed sample holder are collected with a free-space measurement system, and the moisture content of the corn is determined by a deep neural network after feature extraction. In [3], radio wave signals of rice placed in a fixed holder are measured with an RFID sensor-based system, and the moisture content of the rice is estimated by ML algorithms.
In this study, a method based on ML and free-space measurements is proposed for the first time to determine the moisture content of flowing grain. The datasets are constructed by measuring the reflection and transmission coefficients of grains with different moisture contents. The datasets are used as input data for three well-known algorithms, namely K-nearest neighbor (KNN), support vector regression (SVR), and ANN, and their results are evaluated.
Consequently, obtaining free-space measurement data for three grain types and determining their moisture content from these data are among the main contributions of this study.
This study is organized into four sections. Section II describes the free-space measurement, the experimental setup, and the ML algorithms. Section III presents the results of the algorithms and their discussion. Section IV summarizes the study with concluding remarks.
II. Material and Methods
A. Free-Space Measurement
Free space is a technique based on transmission line theory for material characterization measurements. The reflection ($S_{11}$) and transmission ($S_{21}$) coefficients are measured by placing the material between two antennas. With these coefficients, the dielectric permittivity of the material can be determined by numerical methods [11], [20], [21]. The technique is preferred for its advantages, such as noncontact measurement, easy material preparation, and the ability to measure larger materials. In this study, reflection and transmission coefficients are measured with a free-space-based experimental setup to detect the moisture of flowing grain. The general structure of the free-space technique is shown in Fig. 1.
B. Sample Preparation
In order to predict grain moisture accurately, samples are prepared to cover the full range of moisture levels that can occur during the storage of grains. The initial moisture content of the samples is taken as the lowest level because of the dry air of Karaman, the region in which this study is carried out. The initial moisture contents of bulgur, wheat, and corn are measured as 8.3%, 9.5%, and 14.5%, respectively. About 20 kg of grain is used for each grain type, and different amounts of water (approximately 500 ml) are added by spraying in each sample preparation, as shown in Fig. 2. While the water is added, the grain is mixed continuously so that the moisture is distributed homogeneously to all grains. Then, the lid of the grain box is closed and the grain is rested for 24 h. Thus, the moisture migration process of the grain is completed and samples with homogeneous moisture distribution are prepared. The moisture contents of the grains are given in Table I.
Fig. 1. Representation of the general structure of the free-space technique.
Fig. 2. Scenes of sample preparation.
C. Experimental Setup
The experimental setup shown in Fig. 3 consists of a Keysight vector network analyzer (VNA), a Keysight 85056D precision calibration kit, two double-ridged A-Info LB-880NF [22] horn antennas with an operating frequency range of 0.8-8 GHz, a sample holder made of mica, and a metal structure. The antennas and the sample holder are assembled on the metal structure. The distance from the antennas to the sample holder and the dimensions of the sample holder are adjusted so that the sample lies inside the antennas' beam patterns at all frequencies, as shown in Fig. 1. The antennas are located 27 cm away from the sample holder. The beamwidth of the antennas at 1 GHz is $81^{\circ}$, as shown in Fig. 1.
TABLE I
Data Numbers of Grains by Moisture Content
| Dataset Label | Product | Moisture Content (%) | Number of Instances |
| :---: | :---: | :---: | :---: |
The dimensions of the sample holder are selected as $50 \mathrm{~cm} \times 70 \mathrm{~cm} \times 5 \mathrm{~cm}$. The sample holder has an unloading hole on its bottom side, through which about $1000 \mathrm{~cm}^{3}$ of grain can be unloaded per second. While the grain is unloaded from the bottom of the sample holder, grain is loaded continuously from the top, and the sweep time of the VNA is configured as 1.45 ms. In this way, measurements are taken continuously during the grain flow. The measurement parameters of the experimental system are determined according to the radiating near-field (Fresnel) region. The Fresnel condition for the radiating near-field region is defined as follows:
$$R>0.62 \sqrt{\frac{D^{3}}{\lambda}} \tag{1}$$
where $D$ is the maximum dimension of the antennas, $R$ is the distance to the material under test (MUT), as shown in Fig. 1, and $\lambda$ is the wavelength of the electromagnetic wave. If $D=28.4 \mathrm{~cm}$ [22] and $R=27 \mathrm{~cm}$ from Fig. 1 are substituted in (1), the measurement frequency must be lower than 2.48 GHz. Thus, the measurement frequency band is determined as 1-2.48 GHz. In Fig. 4, the magnitude and phase of the $S_{11}$ and $S_{21}$ measurements of bulgur wheat are given for different moisture contents. In addition, the measurement of the setup without material (free space) is shown in Fig. 4. Comparing the bulgur wheat and free-space measurements, phase shifts and magnitude changes in $S_{11}$ and $S_{21}$ are observed. Similarly, the $S_{11}$ and $S_{21}$ values change with the moisture content of the bulgur.
TABLE II
Some Examples of Feature Vectors Extracted From Measurements
| # of Feature | Name of Feature | Sample 1 | Sample 2 | Sample 3 |
| :---: | :--- | :---: | :---: | :---: |
| 1 | Frequency $(\mathrm{GHz})$ | 1.2 | 1.8 | 2.4 |
| 2 | Real of $S_{11}$ | -0.0552 | -0.0259 | 0.0561 |
| 3 | Imaginary of $S_{11}$ | 0.1227 | 0.0226 | -0.0648 |
| 4 | Real of $S_{21}$ | 0.1805 | -0.0034 | -0.1193 |
| 5 | Imaginary of $S_{21}$ | 0.0352 | 0.1181 | 0.0759 |
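The 2.48 GHz limit can be checked by solving condition (1) for frequency. The short Python sketch below is only an illustration of that arithmetic (the study itself reports using MATLAB); the function name and the use of the vacuum speed of light are assumptions made here, while $D$ and $R$ are the values quoted in the text.

```python
# Upper frequency limit implied by the radiating near-field condition
# R > 0.62 * sqrt(D**3 / wavelength), solved for frequency.
C = 299_792_458.0  # speed of light in vacuum (m/s), assumed for free space


def max_frequency_hz(d_m: float, r_m: float) -> float:
    """Highest frequency at which distance r_m still satisfies condition (1)."""
    # R = 0.62 * sqrt(D^3 / lambda)  ->  lambda_min = D^3 * (0.62 / R)^2
    min_wavelength = d_m ** 3 * (0.62 / r_m) ** 2
    return C / min_wavelength


# D = 28.4 cm and R = 27 cm, as stated in the text.
f_max = max_frequency_hz(d_m=0.284, r_m=0.27)
print(f"f_max = {f_max / 1e9:.2f} GHz")  # prints "f_max = 2.48 GHz"
```

With these inputs the shortest admissible wavelength is about 12.1 cm, which corresponds to the 2.48 GHz upper band edge used in the measurements.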
D. Construction of Dataset
Using the experimental setup shown in Fig. 3, measurements are performed for three grain types (bulgur wheat, durum wheat, and corn silage kernel) with various moisture contents. Measurements are performed on flowing grains with moisture in the range of 8%-25% at about $25^{\circ}\mathrm{C}$. The $S_{11}$ and $S_{21}$ coefficients are collected with the VNA. Moreover, prior to each measurement, the exact moisture content of the grain is measured using the gravimetric device, as shown in Fig. 5. The measurements are collected at 103 frequency points in the 1-2.48 GHz band during the grain flow, as shown in Fig. 6. The measurement data comprise the scattering parameters $S_{11}$ and $S_{21}$. The feature vector is built from five feature variables, namely the frequency and the real and imaginary parts of $S_{11}$ and $S_{21}$, as given in Table II.
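As a minimal sketch of how such five-element feature vectors could be assembled from complex scattering parameters, the Python snippet below rebuilds the three example samples of Table II. The helper name `build_feature_rows` is hypothetical, and Python is used purely for illustration; the authors' own MATLAB processing is not shown in the paper.

```python
def build_feature_rows(freqs_ghz, s11, s21):
    """Return one row [f, Re(S11), Im(S11), Re(S21), Im(S21)] per sample."""
    if not (len(freqs_ghz) == len(s11) == len(s21)):
        raise ValueError("frequency, S11, and S21 lists must have equal length")
    return [
        [f, a.real, a.imag, b.real, b.imag]
        for f, a, b in zip(freqs_ghz, s11, s21)
    ]


# The three example samples listed in Table II.
freqs = [1.2, 1.8, 2.4]
s11 = [complex(-0.0552, 0.1227), complex(-0.0259, 0.0226), complex(0.0561, -0.0648)]
s21 = [complex(0.1805, 0.0352), complex(-0.0034, 0.1181), complex(-0.1193, 0.0759)]

rows = build_feature_rows(freqs, s11, s21)
for row in rows:
    print(row)
```

Each row is then paired with the gravimetrically measured moisture content of the corresponding grain batch, which serves as the regression target.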
Three datasets, datasets 1-3, are constructed, one for each grain type. Information about the datasets is given in Table I. When Table I is examined, it is seen that the number of data points corresponding to some moisture contents is more than 117. This is because the same moisture content was obtained in measurements performed on different days. The datasets include 721, 2781, and 2060 free-space measurement data points, each with five feature variables (frequency and the real and imaginary parts of $S_{11}$ and $S_{21}$), for bulgur wheat, durum wheat, and corn silage kernel, respectively.
E. Evaluation Metrics and k-Fold Cross-Validation
The $k$-CV technique is one of the most widely used methods for model selection and for training and testing modeled algorithms on a dataset. It is preferred in various studies [23]-[26] because it is easy to apply. In the traditional method, one portion of the dataset is used to test the algorithm and the remaining data are used to train it. In contrast, $k$-CV tests on the entire dataset, which increases the reliability of the modeled ML algorithm. As seen from Fig. 7, the dataset is divided into $k$ subsets.
Fig. 4. Magnitude and phase of $S_{11}$ and $S_{21}$ corresponding to each moisture content of bulgur wheat and free space.
Fig. 5. Moisture measurement with a gravimetric-based device.
Fig. 6. Scene of the measurement of the transmission and reflection coefficients of the flowing grain.
Each subset is used once for testing, while the remaining $k-1$ subsets are used for training. Thus, testing and training are completed in $k$ iterations. At each iteration, the evaluation metrics of the test data are calculated. The performance of the algorithm is determined by averaging each evaluation metric over all iterations. In this study, $k$ is selected as 10. Five evaluation metrics are used to assess the performance of the designed models: mean absolute error (MAE), mean absolute percentage error (MAPE), mean squared error (MSE), root mean squared error (RMSE), and the adjusted coefficient of determination (adjusted $R^{2}$).
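The five metrics can be written compactly as below. This Python sketch uses the common textbook definitions, which is an assumption, since the paper's exact formulas are not reproduced in this section; in particular, MAPE is expressed in percent and adjusted $R^{2}$ is taken as $1-(1-R^{2})(n-1)/(n-p-1)$ for $n$ samples and $p$ features. The moisture values in the example are invented for illustration only.

```python
import math


def regression_metrics(y_true, y_pred, n_features):
    """MAE, MAPE (%), MSE, RMSE, and adjusted R^2 for one test fold."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mape = 100.0 * sum(abs(e) / abs(t) for e, t in zip(errors, y_true)) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - (mse * n) / ss_tot
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - n_features - 1)
    return {"MAE": mae, "MAPE": mape, "MSE": mse, "RMSE": rmse, "adj_R2": adj_r2}


# Illustrative moisture contents (%), not measured data.
y_true = [10.0, 12.0, 14.0, 16.0, 18.0, 20.0, 22.0, 24.0]
y_pred = [11.0, 12.0, 13.0, 16.0, 19.0, 20.0, 21.0, 24.0]
metrics = regression_metrics(y_true, y_pred, n_features=5)
print(metrics)
```

In a $k$-CV run, this function would be called once per test fold and the $k$ results averaged metric by metric.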
Fig. 7. Representation of the $k$-CV technique.
F. Design of ML Algorithms
Modeling and evaluation of the ML algorithms are performed in MATLAB. The moisture contents of the grains are predicted by KNN, SVR, and ANN.
KNN is one of the most widely used ML algorithms [23], [27]. KNN reaches its result by calculating the distances between the features of the training dataset and the features of the new data. The distance is calculated with a distance function such as the Euclidean or Manhattan (city block) distance. The distances are sorted, and the outputs of the test data are determined from the $k$ nearest neighbors. Since KNN is used for regression in this study, the output for a test sample is determined by averaging the values of its $k$ nearest neighbors, as in [27]. In this study, the distance metric and the number of neighbors are tuned by trying different metrics (Euclidean, city block (Manhattan), Hamming, Chebychev, and correlation) and numbers of neighbors (1-10).
TABLE III
Parameters of the Modeled Algorithms
Consequently, the distance metric and the number of neighbors that give the best results are selected as Euclidean and 1, respectively.
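The neighbor-averaging rule described above can be sketched in a few lines. The Python function below is a plain brute-force illustration with Euclidean distance, not the MATLAB implementation used in the study; the function name and the two-dimensional toy data are assumptions chosen to keep the example short (the real feature vectors have five elements).

```python
import math


def knn_predict(train_x, train_y, query, k=1):
    """Average the targets of the k training points closest to the query."""
    # Pair every training target with its Euclidean distance to the query.
    by_distance = sorted(
        (math.dist(x, query), y) for x, y in zip(train_x, train_y)
    )
    nearest = by_distance[:k]
    return sum(y for _, y in nearest) / len(nearest)


# Toy example: scaled features and moisture contents (%), for illustration only.
train_x = [[0.0, 0.0], [1.0, 1.0], [0.2, 0.1]]
train_y = [8.3, 25.0, 9.5]
query = [0.1, 0.1]

pred_k1 = knn_predict(train_x, train_y, query, k=1)  # third sample is closest: 9.5
pred_k2 = knn_predict(train_x, train_y, query, k=2)  # mean of 9.5 and 8.3
print(pred_k1, pred_k2)
```

With $k=1$, the setting selected in the study, the prediction is simply the moisture label of the single closest training sample.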
SVR is the regression version of the support vector machine [28]. The main goal of SVR is to find the optimal regression line between the input and output variables. The existing features of the dataset are transformed into new high-dimensional features by means of kernel functions. The most well-known kernel functions are linear, polynomial, and Gaussian. Choosing the right kernel function, kernel coefficient, margin tolerance, and penalty coefficient of the loss function improves the performance of the algorithm. In this study, to determine the optimum kernel and penalty coefficients for the Gaussian kernel function, the kernel coefficient, margin tolerance, and penalty coefficient are varied in the ranges of 0.1-10, 0.1-10, and 0.1-10000, respectively. As a result, the kernel and penalty coefficients are tuned to 0.45 and 100, respectively.
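For readers unfamiliar with the ingredients named above, the following Python sketch shows a Gaussian kernel and the epsilon-insensitive loss that SVR minimizes. It assumes the common parametrization $K(x, z)=\exp(-\gamma\|x-z\|^{2})$; how the paper's kernel coefficient of 0.45 maps onto $\gamma$ (or onto a kernel scale) is not stated in the text, so the value is used here only as an illustrative number, and the margin tolerance of 0.5 is likewise hypothetical.

```python
import math


def gaussian_kernel(x, z, gamma):
    """Gaussian (RBF) similarity between two feature vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """Zero inside the tolerance band, linear in the error outside it."""
    return max(0.0, abs(y_true - y_pred) - epsilon)


# Identical points have similarity 1; similarity decays with distance.
k_same = gaussian_kernel([0.2, 0.4], [0.2, 0.4], gamma=0.45)
k_far = gaussian_kernel([0.0, 0.0], [1.0, 1.0], gamma=0.45)

# Errors smaller than epsilon are not penalized at all.
loss_inside = epsilon_insensitive_loss(10.0, 10.3, epsilon=0.5)
loss_outside = epsilon_insensitive_loss(10.0, 11.0, epsilon=0.5)
print(k_same, k_far, loss_inside, loss_outside)
```

The penalty coefficient then sets how strongly errors outside the band are weighted against the smoothness of the fitted function.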
ANN is an effective ML method widely used in prediction and classification problems [18], [28]. It is inspired by the structure of the brain and basically consists of input, hidden, and output layers. To minimize the error between the target and the output, the network is trained by weakening or strengthening the synaptic weights. A network trained with appropriate parameters thus captures the relationship between input and output and can successfully perform regression or classification. In this study, the number of hidden layers of the ANN is varied between 1 and 3, and the number of nodes per layer is varied between 4 and 64. The number of hidden layers and nodes of the ANN is determined by evaluating the results of the different configurations.
Consequently, the number of hidden layers is set to 3, with 16, 32, and 64 nodes, respectively. All parameters of the algorithms used in this study are given in Table III.
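To give a sense of the size of this network, the Python sketch below builds a fully connected 5-16-32-64-1 architecture and runs one forward pass. Only the hidden layer sizes come from the text; the five inputs follow from the five-element feature vector, the single output is the moisture content, and the tanh activation and random untrained weights are assumptions made purely for illustration.

```python
import math
import random

# 5 input features -> hidden layers of 16, 32, 64 nodes -> 1 output.
LAYER_SIZES = [5, 16, 32, 64, 1]


def count_parameters(sizes):
    """Weights plus biases of a fully connected network."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))


def init_network(sizes, seed=0):
    """Random (untrained) weights and zero biases for each layer."""
    rng = random.Random(seed)
    return [
        (
            [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)],
            [0.0] * n_out,
        )
        for n_in, n_out in zip(sizes, sizes[1:])
    ]


def forward(network, x):
    """Propagate one feature vector; tanh on hidden layers, linear output."""
    for i, (weights, biases) in enumerate(network):
        z = [sum(w * a for w, a in zip(row, x)) + b for row, b in zip(weights, biases)]
        is_output = i == len(network) - 1
        x = z if is_output else [math.tanh(v) for v in z]
    return x[0]


n_params = count_parameters(LAYER_SIZES)  # 96 + 544 + 2112 + 65 = 2817
output = forward(init_network(LAYER_SIZES), [1.2, -0.0552, 0.1227, 0.1805, 0.0352])
print(n_params, output)
```

Training would adjust these 2817 parameters to minimize the error between the predicted and gravimetric moisture values.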
G. Proposed Method
Fig. 8 shows the flowchart of the proposed method for determining the moisture of grains with ML algorithms.
Fig. 8. Proposed method.
Prior to training and testing, the dataset is normalized using the minimum-maximum scaling method. The training and testing process is carried out according to the $k$-CV technique. The dataset is divided into ten subsets; each subset is used in turn for testing while the remaining nine subsets are used for training. The performance of each algorithm is calculated by averaging each metric over the ten subsets.
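The normalization and splitting steps can be sketched as follows. This Python outline is an illustration under stated assumptions rather than the authors' MATLAB code: the function names are hypothetical, the samples are shuffled with a fixed seed before splitting (the paper does not say how samples are assigned to subsets), and the scaling is applied column by column to the whole dataset, as the text describes.

```python
import random


def min_max_scale(rows):
    """Scale every feature column to the range [0, 1]."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, test_idx in enumerate(folds):
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train_idx, test_idx


# Small scaling example with two feature columns.
scaled = min_max_scale([[1.0, -0.5], [2.48, 0.5], [1.74, 0.0]])

# Ten folds over a dataset the size of dataset 1 (721 instances).
splits = list(k_fold_indices(721, k=10))
print(scaled, [len(test) for _, test in splits])
```

Each pass trains a model on `train_idx`, evaluates it on `test_idx`, and the ten sets of metrics are averaged to give the reported performance.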
III. Results and Discussion
In this study, KNN, SVR, and ANN are used to estimate the moisture content of flowing grain. The algorithms are trained and tested with the $k$-CV technique. Their performance is assessed with five commonly used evaluation metrics. The estimation performance corresponding to each moisture content of the grains is also evaluated. Moreover, to evaluate the effect of each feature on the performance of the algorithms, each feature is removed from the datasets in turn, and its effect is quantified with the metrics.
A. Evaluation of Performance of Algorithms for Bulgur Wheat
Dataset 1, which includes 721 bulgur wheat measurements, is used for training and testing the algorithms. The effect of the features on the performance of the algorithms is investigated. Tables IV-VI show the metric results of the algorithms for different feature combinations of the dataset. The results show considerable performance differences between the datasets containing $S_{21}$ and the Comb #1 dataset (which does not contain $S_{21}$). All algorithms perform best on the datasets that include $S_{11}$, $S_{21}$, and $f$. The results also show that the SVR model with the Comb #8 dataset provides the best metric values: 0.80 MAE, 1.35 MSE, 1.14 RMSE, 6.59 MAPE, and 0.89 adjusted $R^{2}$. Moreover, the MAPEs of the algorithms corresponding to each moisture content are shown in Fig. 9.
Fig. 9. MAPE values corresponding to each moisture content for bulgur wheat.
TABLE IV
Evaluation of Features Effect of KNN for Bulgur Wheat
Fig. 10. MAPE values corresponding to each moisture content for durum wheat.
Fig. 11. MAPE values corresponding to each moisture content for corn silage kernel.
TABLE VII
Evaluation of Features Effect of KNN for Durum Wheat
The variation of KNN's MAPE values between 1 and 46 indicates that its performance is unstable. Since KNN is a simple neighbor-based method that does not fit a regression model, this result is expected. On the other hand, it is seen (from Figs. 9-11) that the performances of SVR and ANN increase as the moisture content of the grain increases. Since the increase in