Table 3-4 Daily measured AQI and primary pollutant at monitoring site A, August 25 to 28, 2020 (calculated from daily measured data)
The IAQI of each pollutant at monitoring site A for each day from August 25 to 28, 2020, calculated from the daily averages of the hourly measured data, is shown in Table 3-5.
Table 3-5 Daily IAQI and air quality level at monitoring site A, August 25 to 28, 2020 (calculated from hourly measured data)
The AQI and primary pollutant at monitoring site A from August 25 to 28, 2020, computed from the averages of the hourly measured data, are shown in Table 3-6.
Table 3-6 Daily measured AQI and primary pollutant at monitoring site A, August 25 to 28, 2020 (calculated from hourly measured data)
The results show that the AQI calculated from the daily measured data is very close to the AQI calculated from the averages of the hourly measured data, and that the primary pollutants are the same.
4. Analysis and Solution of Problem 2
4.1 Analysis of Problem 2
Building an air quality forecasting model requires relatively accurate measurements of the relevant variables, consistent with actual conditions and with errors kept within a certain range. In practice, however, problems inevitably arise when monitoring station equipment measures, records, and exports data, so the data finally obtained (the raw data) fall short of the quality desired. For example, the equipment encounters complex operating conditions during measurement, which leaves the raw data with more or less bad data, including continuous or intermittent missing values and value drift (too large or too small). Scientific and effective preprocessing of the bad data is therefore decisive for building the air quality forecasting model.
For the data collected at different sites by the real-time database, some monitoring stations have problems: some contain data for only part of the time period, some have all or part of their data null, and some have values beyond the limits. The raw data can therefore be used only after processing. The meteorological data collected by the monitoring station equipment, namely near-surface 2 m temperature ($^{\circ}\mathrm{C}$), surface temperature (K), humidity (%), near-surface 10 m wind speed ($\mathrm{m/s}$), and atmospheric pressure (kPa), usually fluctuate but can be regarded as unchanged over a short time span.
4.2 Solution of Problem 2
4.2.1 Data Preprocessing
The raw concentration data for carbon monoxide (CO), sulfur dioxide ($\mathrm{SO}_2$), nitrogen dioxide ($\mathrm{NO}_2$), ozone ($\mathrm{O}_3$), particulate matter smaller than $10\,\mu\mathrm{m}$ ($\mathrm{PM}_{10}$), and particulate matter smaller than $2.5\,\mu\mathrm{m}$ ($\mathrm{PM}_{2.5}$) are visualized in Figure 4-1.
Figure 4-1 Visualization of the raw pollutant concentration data
Because of equipment commissioning and maintenance at the monitoring stations, the measured data are partly or wholly missing over continuous periods. Because of incidental factors at or near a station, the measured value in a given hour (or day) may deviate from the normal distribution of the data. The problem provides five monitored meteorological indicators (temperature, humidity, air pressure, wind direction, and wind speed); because equipment differs between stations, some indicators are unavailable at some stations.
Based on the above analysis, each type of bad data in the raw data of Annex 1 is analyzed and handled accordingly. Outliers in the raw data of Annex 1 should be removed, and the Pauta criterion ($3\sigma$ criterion) can serve as the removal standard. The following preprocessing is applied to the anomalies that may occur in the raw data:
(1) Missing variable values (nulls)
During data collection, some sites show continuous or intermittent missing values recorded as nulls. These are mainly handled by replacing each missing value with the average of the values within one hour before and after it.
(2) Variable values outside the limits (below the minimum or above the maximum)
In practice, the recorded value of a variable may fall outside the minimum-maximum range required by the appendix. Such values are not used as recorded: when the data from different sites are averaged to obtain the final value, any value outside the range is replaced by the minimum or maximum limit.
(3) Variable values outside the $3\sigma$ interval
It is reasonable to assume that the variables measured by the station equipment vary continuously, without jumps. Such jumps can nevertheless be found in the raw data, possibly because of problems in the data collection process. These jump values are therefore removed: when a station's variable data are averaged to obtain the final value, data outside the $3\sigma$ range are not considered.
(4) Inconsistent units
Units are inconsistent across the annexes. For example, air pressure is given in kPa in the primary forecast data but in mbar in the measured data; the conversion is $1\,\mathrm{kPa} = 10\,\mathrm{mbar}$.
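The four preprocessing rules above can be sketched in code. The following is a minimal illustration in Python, not the authors' implementation (which the paper does not show). The function names, the use of `None` for nulls, and the population standard deviation in the $3\sigma$ test are all assumptions made here. A missing value with only one valid neighbor takes that neighbor's value, and a missing value with no valid neighbor stays missing.

```python
from statistics import mean, pstdev


def fill_missing(values):
    """Rule (1): replace each None by the mean of the valid samples
    one step (one hour) before and after it."""
    filled = list(values)
    for i, v in enumerate(values):
        if v is None:
            neighbours = [values[j] for j in (i - 1, i + 1)
                          if 0 <= j < len(values) and values[j] is not None]
            if neighbours:
                filled[i] = mean(neighbours)
    return filled


def clip_to_limits(values, lower, upper):
    """Rule (2): replace out-of-range values by the nearest limit."""
    return [None if v is None else min(max(v, lower), upper) for v in values]


def remove_three_sigma_outliers(values):
    """Rule (3): mark values farther than 3 sigma from the mean as missing."""
    present = [v for v in values if v is not None]
    mu, sigma = mean(present), pstdev(present)
    return [None if v is not None and abs(v - mu) > 3 * sigma else v
            for v in values]


def kpa_to_mbar(pressure_kpa):
    """Rule (4): 1 kPa = 10 mbar."""
    return pressure_kpa * 10.0
```

The $3\sigma$ rule only detects an outlier when the series is long enough: with $n$ samples, no value can lie more than $\sqrt{n-1}$ population standard deviations from the mean, so at least 11 samples are needed before any value can be rejected.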
The processed data and daily AQI values are given in Annexes 1 and 2, and the visualization of the processed data is shown in Figure 4-2.
Figure 4-2 Visualization of the processed pollutant concentration data
4.2.2 Model Principle and Framework
k-means clustering algorithm model
Given a dataset $X=\{X_1, X_2, \ldots, X_n\}$ of $n$ data points, where $X_i \in \mathbb{R}^d$, and the number $K$ of subsets to generate, the K-Means clustering algorithm organizes the data objects into $K$ partitions $C=\{c_k,\ k=1,2,\ldots,K\}$. Each partition represents a class $c_k$, and each class $c_k$ has a center $\mu_k$. Taking the Euclidean distance as the similarity and distance criterion, the sum of squared distances from the points in a class to its center $\mu_k$ is
$$J(c_k)=\sum_{X_i \in c_k}\lVert X_i-\mu_k\rVert^2 .$$
The goal of clustering is to minimize the total sum of squared distances over all classes:
$$J(C)=\sum_{k=1}^{K} J(c_k)=\sum_{k=1}^{K}\sum_{i=1}^{n} d_{ki}\lVert X_i-\mu_k\rVert^2 ,$$
where
$$d_{ki}=\begin{cases}1, & X_i \in c_k \\ 0, & X_i \notin c_k\end{cases}$$
By the least squares method and the Lagrange principle, the cluster center $\mu_k$ should be taken as the mean of the data points in class $c_k$.
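The step from the objective to the mean can be written out explicitly. The short derivation below is a sketch under the assumption that the within-class objective is the sum of squared Euclidean distances and that class membership is held fixed while $\mu_k$ varies. The symbol $n_k$, the number of points in $c_k$, is notation introduced here.

```latex
\begin{aligned}
J(c_k) &= \sum_{X_i \in c_k} \lVert X_i - \mu_k \rVert^2 , \\
\frac{\partial J(c_k)}{\partial \mu_k}
  &= -2 \sum_{X_i \in c_k} \left( X_i - \mu_k \right) = 0 \\
\Longrightarrow \quad
\mu_k &= \frac{1}{n_k} \sum_{X_i \in c_k} X_i .
\end{aligned}
```

Because $J(c_k)$ is a convex quadratic in $\mu_k$, this stationary point is the unique minimizer.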
The K-Means clustering algorithm starts from an initial partition into K classes and then assigns each data point to a class so as to reduce the total sum of squared distances. Because this total tends to decrease as the number of classes K increases, it can be minimized only for a fixed number of classes K. Definitions:
(1) Distance between two data objects:
We use the Euclidean distance, calculated as
$$d(X_i, X_j)=\sqrt{\sum_{l=1}^{d}\left(x_{il}-x_{jl}\right)^2},$$
where $x_{il}$ denotes the $l$-th component of $X_i$.
(2) Criterion function E
For the K-means algorithm, the criterion function E, that is, the sum of squared errors (SSE), is usually used as the objective function that measures clustering quality:
$$E=\sum_{k=1}^{K}\sum_{X \in c_k} d(X, \mu_k)^2 ,$$
where $d(\cdot,\cdot)$ denotes the distance between two objects.
For the same value of k, a smaller SSE means that the objects in each cluster are more concentrated. Across different values of k, a larger k should give a smaller SSE.
Implementation steps of the k-means clustering algorithm
First, k objects are selected at random, each representing the initial mean or center of a cluster. Each remaining object is assigned to the nearest (most similar) cluster according to its distance from each cluster center, and the mean of each cluster is then recomputed to give the updated cluster centers. This is repeated until the criterion function converges. The squared-error criterion is usually used: for every object in every cluster, the squared distance from the object to its cluster center is summed. This criterion tries to make the k resulting clusters as compact and as well separated as possible.
Steps:
Input: the number of clusters k and a database X containing n data objects;
Output: k clusters that satisfy the minimum-variance criterion.
Procedure:
Step 1 Arbitrarily select k of the n data objects as the initial cluster centers;
Step 2 Assign each object to the most similar cluster, according to the mean of the objects in each cluster;
Step 3 Update the cluster means, that is, compute the mean of the objects in each cluster;
Step 4 Repeat Steps 2 and 3 until no cluster changes $^{[4]}$.
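Steps 1 to 4 can be sketched as follows. This is a minimal pure-Python illustration, not the Matlab code used for the results in this paper. The function name `kmeans`, the `seed` and `max_iter` parameters, and the choice to keep the old center when a cluster becomes empty are assumptions of the sketch.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Cluster a list of equal-length tuples into k groups.

    Returns (centers, labels, sse), where sse is the criterion function E.
    """
    rng = random.Random(seed)
    # Step 1: pick k distinct objects as the initial cluster centers.
    centers = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Step 2: assign every object to its nearest center.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                      for p in points]
        # Step 4: stop once no assignment changes.
        if new_labels == labels:
            break
        labels = new_labels
        # Step 3: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = tuple(sum(col) / len(members)
                                   for col in zip(*members))
    sse = sum(math.dist(p, centers[lab]) ** 2
              for p, lab in zip(points, labels))
    return centers, labels, sse
```

A call such as `kmeans([(0.0, 0.0), (0.0, 1.0), (9.0, 9.0), (9.0, 8.0)], k=2)` returns the two centers, one label per point, and the SSE. The result can depend on the random initialization, so in practice the algorithm is often run several times and the run with the smallest SSE is kept.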
4.2.3 Clustering Results
The results of k-means clustering in Matlab are as follows:
When pollutant emissions are unchanged, the AQI of a region falls when meteorological conditions favor the diffusion or deposition of pollutants, and rises otherwise. Among the meteorological factors, high temperature, low pressure, low humidity, and high wind speed all favor the removal and diffusion of pollutants. A change in any one factor changes ambient air quality, and different pollutants are affected by meteorological conditions to different degrees. Therefore, according to the degree to which each meteorological condition influences pollutant concentrations, each meteorological condition was clustered together with the corresponding computed AQI values in order to partition the meteorological conditions. Cluster centers are reported to three decimal places:
1. Temperature falls into two classes: high (cluster center 27.691) and low (cluster center 17.921).
2. Humidity falls into two classes: high (cluster center 72.988) and low (cluster center 45.970).
3. Air pressure falls into two classes: high (cluster center 1016.784) and low (cluster center 1006.346).
4. Wind speed falls into three classes: gentle breeze (cluster center 2.226), light breeze (cluster center 1.530), and light air (cluster center 0.994).
Analysis of the clustering results gives the following classification of meteorological conditions:
1. By temperature, two classes. The first is high temperature, with a mean of $27.691\,^{\circ}\mathrm{C}$, which favors pollutant diffusion: warm lower-level air drives vigorous air movement and strong low-level turbulence, which carries pollutants upward, lowers near-surface concentrations, and lowers the air quality index, giving an AQI in the range (0, 60). The second is low temperature, with a mean of $17.921\,^{\circ}\mathrm{C}$, which hinders diffusion: pollutant concentration rises as ambient temperature falls, giving an AQI in the range (60, 150);
2. By humidity, two classes. The first is high humidity, with a mean of 72.988%. Water vapor adsorbs pollutants; especially during precipitation, the high water vapor content of the air increases the mass of $\mathrm{PM}_{10}$ and $\mathrm{PM}_{2.5}$ particles so that suspended particulates settle to the ground, which lowers $\mathrm{PM}_{10}$ and $\mathrm{PM}_{2.5}$ concentrations. A high-humidity environment therefore promotes the diffusion of pollutants and lowers their concentration, giving an AQI in the range (0, 65). The second is low humidity, with a mean of 45.970%, which hinders diffusion; pollutant concentrations are higher, giving an AQI in the range (65, 150);
3. By air pressure, two classes. The first is high pressure, with a mean of 1016.784 mbar. Atmospheric pressure is positively correlated with pollutant concentration: under high pressure the atmosphere is stably stratified and the air sinks, which hinders the vertical diffusion of pollutants, so concentrations accumulate, giving an AQI in the range (55, 150). The second is low pressure, with a mean of 1006.346 mbar; rising air favors pollutant diffusion, giving an AQI in the range (0, 55);
4. By wind speed, three classes. The first is gentle breeze, with a mean wind speed of $2.226\,\mathrm{m/s}$. Wind favors the horizontal diffusion of pollutants, and the higher the wind speed, the stronger the horizontal diffusion and the lower the pollutant concentration, giving an AQI in the range (0, 50). The second is light breeze, with a mean wind speed of $1.530\,\mathrm{m/s}$; pollutant concentrations are higher, giving an AQI in the range (50, 100). The third is light air, with a mean wind speed of $0.994\,\mathrm{m/s}$; at so low a wind speed, mixing outweighs diffusion, which hinders pollutant dispersal, so concentrations are high and the AQI is correspondingly high, in the range (100, 150).
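Once the cluster centers are known, a new observation can be placed in a class by nearest-center assignment, the same rule k-means uses in its assignment step. The sketch below illustrates this with the centers reported above. It assumes that only the meteorological coordinate of each center is used for classification; the dictionary `CENTERS`, its keys, and the function `classify` are hypothetical names introduced for illustration.

```python
# Cluster centers reported in Section 4.2.3 (meteorological coordinate only).
CENTERS = {
    "temperature": {"high": 27.691, "low": 17.921},
    "humidity": {"high": 72.988, "low": 45.970},
    "pressure": {"high": 1016.784, "low": 1006.346},
    "wind_speed": {"gentle breeze": 2.226, "light breeze": 1.530,
                   "light air": 0.994},
}


def classify(variable, value):
    """Return the name of the class whose center is nearest to value."""
    centers = CENTERS[variable]
    return min(centers, key=lambda name: abs(value - centers[name]))
```

For example, `classify("temperature", 25.0)` selects the high-temperature class, because 25.0 is closer to 27.691 than to 17.921.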
The clustering results for temperature, humidity, air pressure, and wind speed are shown in Figure 4-3.
Figure 4-3 Clustering results for temperature, humidity, air pressure, and wind speed
After clustering each meteorological variable separately against the AQI values, we also clustered the data as a whole. K-means requires the number of clusters to be specified in advance, and the appropriate number cannot be known beforehand. We therefore set a maximum of 13 clusters, ran the clustering for multiple cluster counts, and evaluated the results with the clustering validity indices DB and SSE. The number of clusters finally chosen was 3, dividing the data into three classes; the cluster centers are shown in Table 4-1 (to three decimal places):
Analysis of the clustering results divides the meteorological conditions into the following three classes:
Class 1: mean temperature $26.60\,^{\circ}\mathrm{C}$, mean humidity 56.876%, mean air pressure 1011.303 mbar, mean wind speed $1.575\,\mathrm{m/s}$, mean wind direction $79.178^{\circ}$. Under these conditions the mean concentrations of $\mathrm{SO}_2$, $\mathrm{NO}_2$, and $\mathrm{PM}_{10}$ are low, but the concentrations of $\mathrm{PM}_{2.5}$, $\mathrm{O}_3$, and CO are the highest of the three classes;
Class 2: mean temperature $25.100\,^{\circ}\mathrm{C}$, mean humidity 67.665%, air pressure 1010.3 mbar, mean wind speed $1.343\,\mathrm{m/s}$, mean wind direction $288.373^{\circ}$. Under these conditions the mean concentration of $\mathrm{PM}_{2.5}$ is the highest of the three classes, and the concentrations of all other pollutants are intermediate;
Class 3: mean temperature $23.081\,^{\circ}\mathrm{C}$, mean humidity 72.649%, air pressure 1012.385 mbar, mean wind speed $1.360\,\mathrm{m/s}$, mean wind direction $50.188^{\circ}$. Under these conditions the concentration of $\mathrm{PM}_{10}$ is the highest of the three classes, and the concentrations of all other pollutants are the lowest.
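The choice of the number of clusters by scanning k and scoring each result can be sketched as follows. This is an illustrative sketch, not the Matlab procedure used for the results above. It assumes that DB denotes the Davies-Bouldin index (lower is better). To make the example reproducible it replaces random initialization with a deterministic farthest-point initialization, which is an assumption of the sketch and not part of the paper's method. The data in `points` are synthetic and hypothetical.

```python
import math


def farthest_point_init(points, k):
    """Deterministic initial centers: start at points[0], then repeatedly
    add the point farthest from the centers chosen so far."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c)
                                                     for c in centers)))
    return centers


def kmeans(points, k, max_iter=100):
    centers = farthest_point_init(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                      for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(col) / len(members)
                                   for col in zip(*members))
    return centers, labels


def sse(points, centers, labels):
    """Sum of squared distances from each point to its cluster center."""
    return sum(math.dist(p, centers[lab]) ** 2
               for p, lab in zip(points, labels))


def davies_bouldin(points, centers, labels):
    """Mean over clusters of the worst (scatter_i + scatter_j) / separation.
    Assumes all centers are distinct."""
    k = len(centers)
    scatter = []
    for c in range(k):
        members = [p for p, lab in zip(points, labels) if lab == c]
        scatter.append(sum(math.dist(p, centers[c]) for p in members)
                       / len(members) if members else 0.0)
    worst = [max((scatter[i] + scatter[j]) / math.dist(centers[i], centers[j])
                 for j in range(k) if j != i) for i in range(k)]
    return sum(worst) / k


def evaluate_k(points, k_max):
    """Return {k: (SSE, DB)} for k = 2 .. k_max."""
    scores = {}
    for k in range(2, k_max + 1):
        centers, labels = kmeans(points, k)
        scores[k] = (sse(points, centers, labels),
                     davies_bouldin(points, centers, labels))
    return scores


# Synthetic example: three well-separated groups of four points each.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
          (10.0, 0.0), (10.0, 1.0), (11.0, 0.0), (11.0, 1.0),
          (5.0, 10.0), (5.0, 11.0), (6.0, 10.0), (6.0, 11.0)]
scores = evaluate_k(points, k_max=5)
best_k = min(scores, key=lambda k: scores[k][1])  # smallest DB index
```

SSE alone cannot select k, because it keeps decreasing as k grows; it is usually read for an elbow, while the DB index has a minimum that can be picked directly.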
5. Analysis and Solution of Problem 3
5.1 Analysis of Problem 3
An effective air quality forecasting system helps people obtain information on future pollutant concentrations, formulate corresponding prevention and control strategies, and forecast pollutant concentration levels accurately in advance, so that the [...] caused by excessive pollutant concentrations [...] nonlinear intelligent statistical models. At present, the WRF-CMAQ modeling system is commonly used to forecast air quality. The WRF-CMAQ model consists mainly of two parts, WRF and CMAQ. WRF is a mesoscale numerical weather prediction system that provides CMAQ with the meteorological field data it needs. CMAQ is a three-dimensional Eulerian atmospheric chemistry and transport modeling system: using the meteorological information from WRF and the pollutant emission inventory within the domain, it simulates the evolution of pollutants on the basis of physical and chemical reaction principles, and then produces forecasts for specific time points or periods $^{[6]}$.