Chapter 1 Introduction
1.1 Topic selection background
1.1.1 Passive investing and ETFs
The idea of the Exchange-Traded Fund (ETF) grew out of the 1987 U.S. stock market crash. As a passive investment vehicle, an ETF pursues diversification and passive management by replicating or tracking a market index, so that investors obtain the index return at minimal transaction cost. By investing in a basket of stocks, it effectively reduces the non-systematic risk of a portfolio. Thanks to a more flexible trading mechanism and lower costs than ordinary index funds, it has become one of the most popular financial products among investors in recent decades (Yang Mozhu, 2013). As of the end of February 2024, there were 12,063 ETFs worldwide, with a total scale of US$12.3 trillion. The U.S. ETF market is the largest, accounting for about 70% of the global scale.
Compared with the global ETF market, China's ETF market started late. The issuance and listing of the ChinaAMC SSE 50 ETF marked its birth. The development of ETFs in China can be divided into three stages: an initial stage from 2005 to 2009, with few ETF products and a small market size; a rapid development stage from 2010 to 2017, when the China Securities Regulatory Commission (CSRC) relaxed the conditions for ETF issuance and the number and scale of ETF products surged, exceeding 100 billion in 2013 and making China the second largest ETF market in Asia; and a stage of vigorous development from 2018 to the present, in which, with the emergence of the bull market in 2018, the number and scale of ETFs have grown explosively. As of the end of February 2024, 933 ETFs were listed in China, with a total scale of 2.41 trillion yuan.
With the expansion of the ETF market, its regulatory requirements have become increasingly complete, with particular emphasis on monitoring the liquidity quality of ETFs. ETF market makers, as liquidity providers, supply liquidity to the ETF market through cross-market arbitrage, and the management of liquidity service providers is therefore being steadily standardized. The "Shenzhen Stock Exchange Securities Investment Fund Business Guidelines No. 2 - Liquidity Services", revised in 2023, classifies the liquidity service providers for fund products and manages the different types differently; clarifies the requirements, business rules and management specifications for primary liquidity service providers; and clarifies the related information disclosure requirements to ensure the fairness and transparency of ETF market information.
1.1.2 Characteristics of China’s stock market
China's stock market, with its huge investor base, is growing in scale and social influence and has formed an investor structure with its own characteristics. Research shows that irrational trading behavior and related factors disturb the normal price formation mechanism and market order, which in turn impairs investors' ability to earn reasonable long-term returns from the market. Taken together, the structure and behavior of China's investors have the following characteristics:
The number of investors is large, and individual investors account for a high proportion.
China's stock market has a huge investor base that includes many small and medium-sized retail investors. In recent years the absolute number of investors has kept rising, with individual investors accounting for the vast majority of accounts. Individual investors have also been entering the market actively: the cumulative number of new natural-person investors over the past five years has exceeded 70 million, indicating that household wealth in China is gradually shifting from savings to investment. The shareholding ratio of individual investors remains at around 40% and rose markedly in 2023.
Individual investors trade frequently and are the main source of irrational trading behavior.
Judging by the capital turnover rate (the ratio of an investor's transaction amount to the capital invested), individual investors turn over their capital at a relatively high rate, and they trade far more frequently than institutions. Irrational trading behavior is also concentrated among individual investors. Empirical evidence shows that the holders and traders of low-priced stocks, poorly performing stocks, high P/E stocks and ST stocks are mainly small and medium-sized individual investors. In addition, individual investors generally show a strong speculative tendency, hoping to earn short-term gains from price fluctuations, and are strongly affected by market sentiment and hot news.
The collective irrational behavior of investors has contributed to stock market fluctuations.
The irrational behavior of investors has amplified stock market fluctuations, and this phenomenon is even more pronounced in the Chinese stock market, which is dominated by retail investors. For example, the participation of investors with newly opened accounts, small asset sizes, little trading experience and high risk appetite has directly or indirectly contributed to the surge of new stocks on their first day of listing. Their buying first pushes up the stock price and then fuels speculative sentiment; an analysis of some skyrocketing new stocks shows that the proportions of high-risk-preference investors, new-account investors and small investors are higher than in new stocks with normal price growth.
The A-share market therefore has distinct retail characteristics, which is one of the reasons for its active trading. Compared with mature overseas capital markets, the shareholding ratio of institutional investors in the A-share market is relatively low. This difference gives the A-share market its own market characteristics and performance.
Given the importance of liquidity in the study of investment products, the retail-dominated investor structure and the current state of ETF development, this article studies the relationship between ETF tracking error, liquidity and investor structure. This not only helps to better understand the operating mechanism of the ETF market and the sources of tracking error, but also offers investors better investment advice and risk management strategies, while providing a theoretical and empirical basis for the regulation of, and investment practice in, the ETF market.
1.2 Research content and methods
This article studies the relationship between ETF tracking error, liquidity and investor structure. Taking ETFs listed between 2013 and 2023 as the research sample, it calculates ETF tracking error, liquidity indicators and related factors, and performs panel regression analysis. First, ETF tracking error is regressed on liquidity and the retail investor holding ratio. Second, an interaction term, liquidity × retail investor holding ratio, is added to the regression to further study how ETF tracking error in the Chinese market relates to investor structure and liquidity. Finally, the sample is grouped by fund shares, trading volume, number of holders and underlying market to study whether the relationship between ETF tracking error, liquidity and investor structure is heterogeneous.
The research methods of this article include:
1. Theoretical analysis method
Based on the data and conclusions of previous research, the impact of liquidity and investor structure on tracking error is analyzed systematically.
2. Empirical analysis method
Using econometric methods, regression analysis is performed on the selected sample and the related influencing factors to identify the relationships between them.
The article is organized as follows. Chapter 1 is the introduction, which discusses the background of the topic, outlines the research content, and explains the contribution and innovation of the article. Chapter 2 reviews the relevant literature, covering theoretical and empirical research on liquidity, retail investor behavior and ETF tracking error. Chapter 3 presents the theoretical analysis and research hypothesis, derived from an analysis of the arbitrage behavior of authorized participants (APs) and the impact of retail investors on market fluctuations. Chapter 4 describes the empirical research design, including data sources, variable calculation and model design. Chapter 5 reports the empirical results, including descriptive statistics, the main regression results and robustness tests. Chapter 6 sets out the causal identification strategy: it addresses possible endogeneity in the model, discusses the causal relationship between the independent and dependent variables, and introduces an instrumental variable for regression. Chapter 7 presents heterogeneity tests, which extend the main regression by examining the impact of the independent variables on the dependent variable under different classifications.
1.3 Research contribution and innovation
1. Previous research on liquidity focused mostly on the effect of market liquidity on stock volatility, whereas this article takes ETFs, which trade in the same way as stocks, as its research subject.
2. Compared with past ETF-related research, this article focuses on the risk factors that affect the tracking error of the ETF product itself, rather than on the impact of ETFs on existing investment targets (such as stock pricing efficiency or stock price fluctuations).
3. The main reference literature studies the relationship between liquidity risk and ETF returns, variances and tracking errors, whereas this article focuses on the causes of ETF tracking error. Based on the characteristics of the market studied, it introduces the retail investor structure as a variable and studies how the interaction between retail holdings and liquidity relates to tracking error.
This article mainly follows Bae and Kim (2020) and uses Chinese ETF data to study the relationship between ETF liquidity, investor structure and tracking error.
Chapter 2 Review of related literature
2.1 Liquidity
Research on liquidity has a long history, and scholars at home and abroad have studied it extensively. Generally speaking, liquidity can be divided into three levels: 1. macro liquidity, which mainly refers to the money supply; 2. meso liquidity, which mainly refers to the liquidity of financial markets or financial institutions; 3. micro liquidity, one of the research objects of this article, which refers to the ability of a financial asset to be converted into cash under market conditions.
Research on liquidity mainly focuses on the relationship between liquidity and asset returns and on how systemic liquidity risk is priced in asset returns. For example, Acharya and Pedersen (2005) and Amihud (2002) found that the persistence of liquidity allows it to be used to predict market returns: low liquidity today implies low liquidity tomorrow, so investors demand higher returns as compensation. Amihud and Mendelson (1986) used the bid-ask spread as a liquidity indicator to study the relationship between liquidity and stock returns and found that stocks with larger bid-ask spreads have higher expected returns. Jones (2002) collected 100 years of bid-ask spread data for the sample stocks of the Dow Jones Index and found that the average proportional spread and the turnover rate can predict the market's excess returns.
In terms of liquidity risk pricing, Acharya and Pedersen (2005) established a liquidity-adjusted Capital Asset Pricing Model (LCAPM) and found that individual asset returns are strongly correlated with liquidity risk. Pastor and Stambaugh (2003) found that individual stock returns are affected by overall market liquidity. Eckbo and Norli (2002) found that the proportional bid-ask spreads of individual stocks show clear co-movement, and confirmed that the liquidity risk caused by this co-movement is indeed priced in cross-sectional portfolio returns. Gibson and Mougeot (2004) also found, using monthly data, evidence that liquidity risk is priced in the US market.
In terms of ETF liquidity research, Borkovec et al. (2010) and Madhavan (2012) studied the pricing of ETFs in the context of the flash crash in US stocks. Borkovec et al. found that a sharp increase in the bid-ask spread leads to the failure of ETF price discovery. Cespa and Foucault (2014) established a theoretical model showing that low ETF liquidity increases uncertainty about the underlying assets, and that this uncertainty in turn weakens the liquidity of the related ETFs. Clifford et al. (2014) found that ETF flows increase with large trading volume, small spreads and high price/NAV ratios. Roncalli and Zheng (2014) measured the liquidity of ETFs and of their benchmark indexes and found that the two are correlated at the daily frequency but show no obvious intraday relationship.
2.2 ETF tracking error
In terms of measuring ETF tracking error, Haugen and Baker (1990) proposed three indicators. The first is the coefficient of determination, which measures the goodness of fit of the model to the sample observations and reflects the extent to which the independent variables jointly explain the dependent variable; the closer it is to 1, the better the independent variables explain the dependent variable. The second is the beta of the index portfolio, which measures the volatility of the index portfolio relative to the overall market. The third is the variance of the difference between the return series of the indexed portfolio and that of the underlying index. Gastineau (2004) measures the performance of an ETF relative to its benchmark index by analyzing the operating efficiency of the ETF.
On the causes of ETF tracking error, Chiang (1998) pointed out that transaction costs, cash flows, dividends and changes in the constituents of the underlying index all affect the size of the tracking error. Ammann and Zimmermann (2001) pointed out that different asset allocation strategies lead to tracking errors of different sizes in investment portfolios. Kostovetsky (2003) compared ETFs with passive index funds in terms of factors such as investor trading preferences and taxes, and pointed out that the difference between ETFs and their benchmark indexes mainly comes from management fees and changes in index constituents. Vardharaj et al. (2004) pointed out that different asset allocation strategies of the underlying index affect the tracking error of the indexed portfolio differently. Bae and Kim (2020) studied the US ETF market and found that less liquid ETFs have larger tracking errors, and established a causal link between liquidity and tracking error using the instrumental variable method.
On arbitrage and ETF tracking error, Lee and Shleifer (1991) and Pontiff (1996) found that because ETFs can be traded in the primary and secondary markets at the same time, the resulting arbitrage opportunities make the tracking error of ETFs smaller than that of ordinary passive index funds. Moussawi and Stahel (2016) studied the 2015 flash crash and found that arbitrage reduces tracking error. Peterffy (2010) reported that a deterioration in arbitrage liquidity widens ETF tracking error; Pan and Zeng (2016) also found through event analysis that the loss of arbitrageurs hinders the reduction of ETF tracking error.
In terms of domestic research, Chen Jiawei and Tian Yinghua (2005) discussed the causes of ETF tracking error in light of ETF discounts and premiums. Liu Wei et al. (2009) analyzed the causes of intraday errors in ETFs using intraday data on the ChinaAMC SSE 50 ETF and the Huaan SSE 180 ETF. Huo Mingyun (2010) analyzed the factors affecting ETF arbitrage costs and the reasons for differences in cash balances. Ma Bin (2010) used ETF portfolios to conduct term arbitrage and found a large number of term arbitrage opportunities in China. Li Fengyu (2014) explained ETF discounts and premiums from the perspective of investor sentiment and found that the relationship between investor sentiment and the ETF premium rate in the A-share market differs across market environments: the two are negatively correlated in a pessimistic market and positively correlated in a neutral or optimistic market.
2.3 Retail investor behavior
In research on retail investors, the classic theory is the "herding effect". Christie and Huang (1995) described the herding effect as a phenomenon in which traders ignore their actual asset conditions and blindly follow market trends rather than relying on their own judgment. Hwang and Salmon (2004) defined herding as having emerged when investors base their investment decisions no longer on their own thinking and judgment but on the choices of others or the direction of the market.
Research on the herding effect mainly focuses on investor sentiment. Rubesam et al. (2021) found that investors tend to herd when sentiment is overconfident or overly optimistic. Tang Yong et al. (2020) confirmed empirically that the herd effect is stronger when the stock market falls; further, Ma Li (2016) found that the price limits in China's securities market may be one of the reasons why the herding effect is weaker when the market rises than when it falls.
Regarding the investment behavior of retail investors and stock price fluctuations, Friedman (1953) and Fama (1965) believed that the investment behavior of retail investors has no impact on the pricing of financial assets. In subsequent research, however, Cutler et al. (1990) proposed that because individual and institutional investors obtain different information, individual investors are often at an information disadvantage, and most of them cannot process the information they receive effectively. This leads to herding behavior, which causes stock prices to deviate from their intrinsic value. In domestic research, Yan Jiayuan (2015) argues that individual investors often trade irrationally because of their limited ability to process information, and that this behavior causes large fluctuations in stock prices through the herding effect.
Chapter 3 Theoretical Analysis and Research Hypotheses
In the absence of arbitrage opportunities, the ETF's secondary market price and the ETF's net value should remain consistent; otherwise there is room for arbitrage for APs. In practice, however, the ETF's secondary market price and net value are affected by different factors, such as transaction rebalancing, product structure, market changes and transaction costs, so the two returns can hardly stay consistent, which gives rise to the ETF's tracking error. Because APs can trade in both the primary and the secondary market, the emergence of ETF tracking error leads them to conduct arbitrage operations (Figure 3.1), and these operations gradually shrink the ETF's tracking error toward 0.
Figure 3.1 ETF arbitrage
However, if an ETF is illiquid, APs face higher transaction costs when conducting arbitrage operations, and these higher costs discourage them from arbitraging, so that the ETF's tracking error no longer shrinks. This may be why liquidity affects ETF tracking error.
On the other hand, retail investors are often regarded as noise traders, whose ability to obtain, process and understand information is limited. Because of these limitations they often trade irrationally, and this irrational behavior spreads among them, so that stock prices often experience large random jumps (Black, 1986). Cutler et al. (1990) further proposed that retail investors are subject to a herd effect (Bikhchandani et al., 1992): within a group of investors with asymmetric information, when earlier investors' decisions are visible and the action space is discrete, the behavior of early investors affects the decisions of those who follow, which often drives the stock price away from its intrinsic value. Therefore, in funds where retail holdings are concentrated, herding by retail investors may push the price of the ETF away from its intrinsic value (i.e., net value), further widening the tracking error.
For ETFs, an increase in retail holdings makes prices fluctuate further from value. For APs, when retail investors hold more of an ETF, its price is more likely to rise and fall sharply, which makes APs pay more attention to liquidity. Therefore, as retail holdings increase, the impact of liquidity on tracking error becomes more pronounced.
Based on the above analysis, this article proposes the following hypothesis:
H1: The larger the share of an ETF held by retail investors, the greater the impact of its liquidity on its tracking error.
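The cost argument can be made concrete with a minimal sketch. It assumes a stylized AP who acts only when the gap between the secondary market price and the net value exceeds a per-share round-trip cost; the function name `ap_arbitrage_action`, the cost parameter and all numbers are hypothetical illustrations, not values taken from the sample.

```python
def ap_arbitrage_action(price: float, nav: float, cost: float) -> str:
    """Stylized decision of an authorized participant (AP).

    price: ETF secondary-market price per share
    nav:   ETF net value per share
    cost:  assumed round-trip transaction cost per share (hypothetical)
    """
    gap = price - nav
    if gap > cost:
        # Premium: create shares at net value in the primary market,
        # then sell them at the higher secondary-market price.
        return "create_and_sell"
    if -gap > cost:
        # Discount: buy shares in the secondary market,
        # then redeem them at net value in the primary market.
        return "buy_and_redeem"
    # Inside the no-arbitrage band the gap is not traded away.
    return "no_trade"


# The same 0.5% premium under two assumed cost levels.
liquid = ap_arbitrage_action(price=1.005, nav=1.000, cost=0.002)
illiquid = ap_arbitrage_action(price=1.005, nav=1.000, cost=0.008)
print(liquid, illiquid)
```

With the low assumed cost the AP arbitrages the premium away, while with the high assumed cost the same premium stays inside the no-trade band and persists.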
Chapter 4 Empirical Research Design
4.1 Sample screening and data
This article extracts the basic information of listed funds from the CSMAR database and counts the ETFs established and listed in the domestic market from 2004 to 2023. To ensure a sufficient sample, the period from 2013 to 2023 is selected as the research window. The selection criteria are:
The fund type is bond or stock;
The investment method is passive;
The fund has been listed on the Shanghai Stock Exchange or Shenzhen Stock Exchange;
QDII and Hong Kong Stock Connect funds are excluded;
Alternative investment funds are excluded.
After screening, 842 funds remain in the valid sample.
See Table 4.1 for specific ETF statistics.
Table 4.1 Number of listed ETFs by year
Year  Newly listed  Cumulative
2005  1  1 
2006  3  4 
2007  1  5 
2009  2  7 
2010  8  15 
2011  21  36 
2012  8  44 
2013  28  72 
2014  14  86 
2015  18  104 
2016  9  113 
2017  15  128 
2018  27  155 
2019  67  222 
2020  97  319 
2021  247  566 
2022  110  676 
2023  166  842 
When constructing the data, variables are rescaled to comparable magnitudes so that differences in scale do not produce excessively small regression coefficients. In addition, to limit the influence of extreme values in the independent variable (the Amihud liquidity indicator) and the dependent variable (tracking error), both are standardized, normalized and smoothed with a rolling average in line with the actual data frequency.
4.2 Variable selection and calculation
4.2.1 Tracking error
To calculate ETF tracking error, this article uses the difference between the daily return on the ETF's net value and its daily return in the secondary market as the tracking error indicator.
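A minimal sketch of this calculation is given below, using only the Python standard library. It assumes simple daily returns and takes the absolute value of the return difference before averaging; the absolute value, the function names and the toy numbers are assumptions of the sketch, while the 120-trading-day window follows the variable definition in Table 4.2.

```python
def simple_returns(levels):
    """Daily simple returns from a list of price or net value levels."""
    return [levels[t] / levels[t - 1] - 1.0 for t in range(1, len(levels))]


def rolling_tracking_error(prices, navs, window=120):
    """Rolling mean of |price return - net value return|, in percent.

    Taking the absolute value is an assumption of this sketch.
    Entries are None until `window` daily differences are available.
    """
    diffs = [abs(rp - rn) * 100.0
             for rp, rn in zip(simple_returns(prices), simple_returns(navs))]
    out = []
    for t in range(len(diffs)):
        if t + 1 < window:
            out.append(None)  # not enough history yet
        else:
            chunk = diffs[t + 1 - window:t + 1]
            out.append(sum(chunk) / window)
    return out


# Toy series (hypothetical numbers); window shortened to 2 for illustration.
prices = [1.00, 1.02, 1.01, 1.03]
navs = [1.00, 1.01, 1.01, 1.02]
te = rolling_tracking_error(prices, navs, window=2)
print(te)
```

The same function applies to the other two tracking error variants by swapping in the benchmark index level for one of the two input series.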
4.2.2 Liquidity
Since ETFs trade in a way similar to stocks, this article reviews the stock literature when selecting ETF liquidity indicators. It offers the following measures of liquidity:
Bid-ask spread:
$S = P_{A} - P_{B}$
Among them, $P_{A}$ is the best ask (lowest selling) price, and $P_{B}$ is the best bid (highest buying) price.
The larger the bid-ask spread, the poorer the liquidity of the underlying asset. Its limitation is that it cannot reflect the price impact of large transactions, nor can it capture transactions completed inside or outside the spread. In addition, high-priced stocks generally have larger bid-ask spreads, which makes it difficult to compare liquidity across stocks.
Amihud Illiquidity Ratio:
$Amihud\; Illiquidity = \frac{\left| r_{i,t} \right|}{V_{i,t}}$
Among them, $r_{i,t}$ represents the return rate of stock i on day t, and $V_{i,t}$ represents the trading volume of stock i on day t.
This indicator comes from the article "Illiquidity and stock returns: cross-section and time-series effects" published by Amihud in 2002. It shows the size of the price change associated with a given trading volume. If a stock is liquid, its price should change little for a given trading volume. Therefore, the smaller the Amihud indicator, the better the stock's liquidity.
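The ratio can be illustrated with a short standard-library sketch. The function names and the toy numbers are hypothetical; setting zero-volume days to 0 and averaging over a trailing window follow the variable definition given in Table 4.2, with the window shortened here for illustration.

```python
def amihud_daily(returns, volumes):
    """Daily Amihud ratio |r_t| / V_t.

    Days with zero trading volume are set to 0 instead of dividing by zero.
    """
    return [abs(r) / v if v > 0 else 0.0 for r, v in zip(returns, volumes)]


def trailing_mean(values, window):
    """Mean of the last `window` values; None until enough data exist."""
    return [None if t + 1 < window
            else sum(values[t + 1 - window:t + 1]) / window
            for t in range(len(values))]


# Toy inputs (hypothetical): daily returns and trading volumes.
returns = [0.01, -0.02, 0.0, 0.005]
volumes = [1e6, 2e6, 0, 5e5]

daily = amihud_daily(returns, volumes)
smooth = trailing_mean(daily, window=2)
print(daily)
print(smooth)
```

A larger value means that a given trading volume moves the price more, that is, the asset is less liquid.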
Turnover rate:
$Turnover = \frac{V}{S \times P}$
Among them, V is the trading volume in value terms, S is the total number of outstanding shares, and P is the average trading price of the stock.
The turnover rate is used to measure the average holding time of a stock. The larger the turnover rate, the shorter the average holding time of the stock, the more active the stock trading, and the better the liquidity of the stock.
Taking into account the advantages and disadvantages of each liquidity index, this article chooses the Amihud liquidity index to measure the liquidity of ETFs.
In order to study whether retail investor holdings affect the tracking error of ETFs, this article selects the retail investor holding ratio as a data indicator to measure ETF retail investor holdings.
4.2.3 Control variables
Based on existing research, the following indicators are selected as control variables for the regression: circulating shares, total daily transaction volume, benchmark index volatility, number of holders, eligibility for securities lending, cash ratio, current subscription ratio, and fund replication method.
4.2.4 Instrumental variables
Based on portfolio theory, Friedman (1961) proposed that if the money supply decreases, the share of money in investors' total wealth falls, which indirectly raises the marginal utility of cash. In this case, investors tend to sell non-monetary assets (such as stocks and funds) to increase their actual wealth. Conversely, when the money supply increases, investors are more likely to reduce their cash holdings and put their funds into investment and financial products that carry some risk. Michael (1974) argued that an increase in the money supply leads investors to keep rebalancing between money and other valuable assets in order to reach the utility-maximizing combination of the two. The money supply is therefore related to the funds that investors put into the market, while it has no direct impact on the tracking error of ETFs. For this reason, this article chooses the money supply M1 as the instrumental variable to address possible endogeneity problems in the study.
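The logic of an instrument can be illustrated with a minimal sketch of the simple IV (Wald) estimator for one endogenous regressor and one instrument, slope = Cov(z, y) / Cov(z, x). This is a generic textbook estimator, not the specification estimated in this article; the toy data, the confounder `u` and the function names are hypothetical, with `z` standing in for M1, `x` for ETF illiquidity and `y` for tracking error.

```python
def cov(a, b):
    """Sum of cross-products of deviations from the mean.

    The usual 1/(n-1) factor is omitted because it cancels in the ratios below.
    """
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b))


def ols_slope(x, y):
    """OLS slope of y on x (with intercept)."""
    return cov(x, y) / cov(x, x)


def iv_slope(z, x, y):
    """Simple IV slope using z as the instrument for x."""
    return cov(z, y) / cov(z, x)


z = [1.0, 2.0, 3.0, 4.0]        # instrument, unrelated to the confounder
u = [1.0, -1.0, -1.0, 1.0]      # unobserved confounder (hypothetical)
x = [zi + ui for zi, ui in zip(z, u)]               # endogenous regressor
y = [2.0 * xi + 3.0 * ui for xi, ui in zip(x, u)]   # true slope is 2

print(ols_slope(x, y), iv_slope(z, x, y))
```

In this constructed example OLS overstates the slope because `x` is correlated with the confounder, whereas the IV estimate recovers the true value because `z` moves `x` but is unrelated to `u`.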
The specific indicator expression and calculation method are shown in Table 4.2.
Table 4.2 Variable explanation and calculation
Variable name  Definition
Tracking Error (ETF-NAV %)  Average, over a rolling window of 120 trading days, of the difference between the ETF's secondary market price return and its net value return.
Tracking Error (ETF-IND %)  Average, over a rolling window of 120 trading days, of the difference between the ETF's secondary market price return and its benchmark index return.
Tracking Error (NAV-IND %)  Average, over a rolling window of 120 trading days, of the difference between the ETF's net value return and its benchmark index return.
ETF Illiquidity  Rolling 120-trading-day average of the Amihud liquidity indicator, where the daily indicator = absolute value of the ETF's daily return / the ETF's trading volume on that day (Amihud, 2002). If the trading volume on a day is 0, the indicator for that day is set to 0.
Individual Percentage  Proportion of the ETF held by retail investors in the current period, as disclosed in the ETF's regular reports (annual and semi-annual reports).
Interaction  Interaction term = ETF Illiquidity × Individual Percentage.
Shares Outstanding (in ¥billions)  Circulating shares of the ETF.
Trading Volume (in ¥billions)  Trading volume × closing price of the day.
Expense Ratio (%)  Annual management fee of the ETF.
Index Volatility  Standard deviation of index returns over a rolling window of 120 trading days.
Holder Number (in thousands)  Total number of holders at the end of the ETF's reporting period.
Short Selling  1 if the ETF is eligible for securities lending, 0 otherwise.
Cash & Deposit Ratio  Cash (including bank deposits and settlement reserves) as a percentage of total fund assets, as disclosed in regular reports.
Requisition Ratio  Current-period subscription amount as a proportion of the fund's total size.
Replicated  Replication method for the benchmark index: 1 for full replication, 0 otherwise.
Turnover (%)  Daily turnover rate of the fund = (daily trading volume in lots × 100) / circulating shares of the listed fund as of that day × 100.
M1 (in ¥trillions)  Money supply M1 published monthly by the People's Bank of China: cash in circulation (M0) + demand deposits.
M1×ETF Illiquidity  Product of M1 and ETF Illiquidity, used as a control variable.
4.3 Model design
4.3.1 Panel regression model
Since the data have a panel structure, with multiple funds observed over multiple periods, this article uses a panel regression model to analyze ETF liquidity, retail investor structure and ETF tracking error. The regression model is established as follows:
$Tracking\; Error = \alpha + \beta_{1} \times ETF\; Illiquidity + \beta_{2} \times Individual\; Percentage + Controls + \epsilon$ (4.1)
In order to further study the impact of the interaction between ETF liquidity and retail investor structure on ETF tracking error, the interaction term is added to the regression model, namely:
$Tracking\; Error = \alpha + \beta_{1} \times ETF\; Illiquidity + \beta_{2} \times Individual\; Percentage + \beta_{3} \times Interaction + Controls + \epsilon$ (4.2)
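Because the interaction term is the product of ETF Illiquidity and Individual Percentage, differentiating Equation (4.2) with respect to ETF Illiquidity, holding the other variables fixed, gives the marginal effect of illiquidity on tracking error:
$\frac{\partial\, Tracking\; Error}{\partial\, ETF\; Illiquidity} = \beta_{1} + \beta_{3} \times Individual\; Percentage$
Read this way, hypothesis H1 corresponds to $\beta_{3} > 0$: the effect of illiquidity on tracking error grows with the retail holding ratio. For example, with purely illustrative values $\beta_{1} = 0.9$ and $\beta_{3} = 1.5$ (not estimates from this article), the marginal effect would be $0.9 + 1.5 \times 0.2 = 1.2$ at a retail holding ratio of 0.2 and $0.9 + 1.5 \times 0.8 = 2.1$ at a ratio of 0.8.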
Because the variables are observed at different frequencies, all data are converted to semi-annual frequency before the regression (fund holder structure data are published only in the annual and semi-annual fund reports), which avoids missing values affecting the regression results. After deleting observations with missing benchmark index data, missing fund net value data and so on, 3,249 observations remain as the regression sample.
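The estimation of Equation (4.2) with fund and time effects can be sketched as a two-way within (demeaning) estimator. The sketch below uses only the Python standard library and assumes a small balanced panel without control variables, which is a simplification; the real sample is larger and need not be balanced. All numbers, including the coefficients 0.9, 0.2 and 1.5, are made up for illustration, and the function names are hypothetical.

```python
def demean_two_way(panel):
    """Two-way within transformation for a balanced panel.

    panel[i][t] is the value for fund i in period t. Returns
    x_it - mean_i - mean_t + grand_mean, flattened fund by fund.
    """
    n, T = len(panel), len(panel[0])
    row = [sum(r) / T for r in panel]
    col = [sum(panel[i][t] for i in range(n)) / n for t in range(T)]
    grand = sum(row) / n
    return [panel[i][t] - row[i] - col[t] + grand
            for i in range(n) for t in range(T)]


def ols(X, y):
    """Solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination."""
    k = len(X[0])
    A = [[sum(r[a] * r[b] for r in X) for b in range(k)]
         + [sum(r[a] * yi for r, yi in zip(X, y))] for a in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))  # partial pivoting
        A[c], A[p] = A[p], A[c]
        A[c] = [v / A[c][c] for v in A[c]]
        for r in range(k):
            if r != c:
                A[r] = [vr - A[r][c] * vc for vr, vc in zip(A[r], A[c])]
    return [A[r][k] for r in range(k)]


# Hypothetical balanced toy panel: 3 funds x 4 half-year periods.
illiq = [[0.1, 0.4, 0.2, 0.9], [0.5, 0.3, 0.8, 0.2], [0.7, 0.6, 0.1, 0.4]]
pct = [[0.2, 0.3, 0.5, 0.4], [0.9, 0.6, 0.7, 0.8], [0.1, 0.5, 0.3, 0.6]]
inter = [[a * b for a, b in zip(ra, rb)] for ra, rb in zip(illiq, pct)]

# Made-up fund effects, period effects and coefficients (no noise term).
fund_fe, time_fe = [0.3, -0.1, 0.2], [0.0, 0.05, -0.02, 0.1]
true_beta = [0.9, 0.2, 1.5]
te = [[fund_fe[i] + time_fe[t] + true_beta[0] * illiq[i][t]
       + true_beta[1] * pct[i][t] + true_beta[2] * inter[i][t]
       for t in range(4)] for i in range(3)]

X = list(zip(demean_two_way(illiq), demean_two_way(pct), demean_two_way(inter)))
beta_hat = ols(X, demean_two_way(te))
print([round(b, 6) for b in beta_hat])
```

Because the toy data contain no noise, the demeaning step removes the fund and period effects exactly and the estimator recovers the made-up coefficients up to floating-point error. With real, unbalanced data a dedicated panel-data routine would be the more appropriate tool.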
Chapter 5 Empirical Research Results
5.1 Descriptive statistics of variables
Table 5.1 shows the descriptive statistics of the main variables used in this study. The statistics reported are the number of observations, the mean, the standard deviation, and the 1st, 50th and 99th percentiles.
Table 5.1 Descriptive statistics of variables
Variables  N  Mean  SD  P1  P50  P99 
      
Tracking Error (ETFNAV %)  3,249  0.31  0.29  0.07  0.19  1.41 
Tracking Error (ETFIND %)  3,249  0.31  0.27  0.07  0.20  1.29 
Tracking Error (NAVIND %)  3,249  0.05  0.09  0.01  0.03  0.46 
ETF Illiquidity  3,249  2.88e-3  0.03  4.68e-8  6.24e-6  0.04 
Individual Percentage  3,249  0.46  0.32  2.80e-3  0.43  1.00 
Interaction  3,249  4.86e-4  4.84e-3  1.03e-8  2.42e-6  0.01 
Shares Outstanding (in ¥billions)  3,249  1.38  3.55  3.00e-3  0.21  19.51 
Trading Volume (in ¥billions)  3,249  0.07  0.27  0.00  6.63e-3  1.43 
Expense Ratio (%)  3,249  0.44  0.13  0.15  0.50  0.60 
Index Volatility  3,249  1.18  15.57  1.01e-3  0.01  0.03 
Holder Number (in thousands)  3,249  2.07e-3  5.98e-3  1.61e-05  3.28e-4  0.03 
Short Selling  3,249  0.12  0.33  0.00  0.00  1.00 
Cash &Deposit Ratio  3,249  0.02  0.04  1.50e-3  0.02  0.09 
Requisition Ratio  3,249  0.48  1.24  0.00  0.19  5.25 
Replicated  3,249  0.97  0.16  0.00  1.00  1.00 
Turnover (%)  3,249  0.06  0.48  0.00  0.02  0.40 
M1(in ¥trillions)  3,249  64.54  4.434  54.39  64.74  69.56 
M1×ETF Illiquidity  3,249  0.02  0.17  2.88e-07  4.03e-05  0.25 
5.2 Main regression results
Formulas 4.1 and 4.2 are estimated separately, controlling for time and individual (fund) fixed effects, and the results are shown in Table 5.2. Column (1) reports the regression without the interaction term. The regression coefficient of the ETF liquidity indicator on ETF tracking error is significant, indicating that ETF liquidity affects tracking error. The coefficient is 0.945, indicating that, holding other conditions constant, a 1% change in the ETF liquidity indicator is associated with a change of about 0.95% in ETF tracking error. In addition, the ETF liquidity indicator is calculated as the daily return divided by that day's trading volume, so for a highly liquid ETF a given trading volume has little impact on price and the indicator is small. Combined with the positive regression coefficient, the results indicate that the worse the liquidity of an ETF, the greater its tracking error.
Column (2) reports the regression after adding the interaction term. The regression coefficient of the interaction term between ETF liquidity and the retail holding proportion on tracking error is significant. The coefficient is 7.199, indicating that the interaction term has a positive impact on ETF tracking error.
The above results show that in the Chinese market, on the one hand, ETFs with weak liquidity have higher tracking errors; on the other hand, because domestic investors are mainly retail investors, the relationship between ETF tracking error and liquidity is more pronounced for ETFs with a larger share of retail holdings.
To sum up, in the Chinese ETF market, a higher proportion of retail holdings amplifies the impact of ETF illiquidity on ETF tracking error.
In addition, across the two regressions, benchmark index volatility, ETF trading volume, the ETF cash holding ratio and the ETF subscription ratio all have a significant impact on ETF tracking error.
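The direction of the liquidity indicator discussed above can be illustrated with a short sketch. It assumes the standard Amihud form, the average of absolute daily return divided by daily trading volume; the use of the absolute value, the averaging window and the units of volume are assumptions here, and the function name `amihud_illiquidity` is hypothetical.

```python
def amihud_illiquidity(returns, volumes):
    """Average of |daily return| / daily trading volume over the sample days.

    Days with zero volume are skipped because the ratio is undefined.
    """
    ratios = [abs(r) / v for r, v in zip(returns, volumes) if v > 0]
    if not ratios:
        raise ValueError("no days with positive trading volume")
    return sum(ratios) / len(ratios)


# Two hypothetical ETFs with identical returns but different trading volume.
daily_returns = [0.01, -0.02]
liquid_etf = amihud_illiquidity(daily_returns, [100.0, 200.0])
illiquid_etf = amihud_illiquidity(daily_returns, [10.0, 20.0])
print(liquid_etf, illiquid_etf)
```

With the same price moves, the thinly traded ETF receives the larger indicator value, so a positive coefficient on the indicator corresponds to larger tracking error for less liquid ETFs.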
Table 5.2 Panel regression results
Dependent Variable  Tracking Error (ETF_NAV %) 
 (1)  (2) 
  
ETF Illiquidity  0.945***  0.154 
 (0.25)  (0.29) 
Individual Percentage  0.033  0.027 
 (0.06)  (0.06) 
Interaction   7.199*** 
  (2.65) 
Shares Outstanding (in ¥billions)  0.001  0.001 
 (0.00)  (0.00) 
Trading Volume (in ¥billions)  0.120*  0.119* 
 (0.06)  (0.06) 
Expense Ratio (%)  0.031  0.026 
 (0.42)  (0.41) 
Index Volatility  0.001***  0.001*** 
 (0.00)  (0.00) 
Holder Number (in thousands)  0.350  0.286 
 (1.81)  (1.81) 
Short Selling  0.007  0.006 
 (0.02)  (0.02) 
Cash &Deposit Ratio  0.264**  0.506*** 
 (0.11)  (0.19) 
Requisition Ratio  0.039**  0.040** 
 (0.02)  (0.02) 
Replicated  0.027  0.029 
 (0.02)  (0.02) 
  
Yearmonth Fixed Effects  YES  YES 
Fund Fixed Effects  YES  YES 
  
Nobs.  3213  3213 
Adjusted R^{2}  0.588  0.591 
1. ***, **, and * indicate significance at the 1%, 5%, and 10% levels respectively. The robust standard errors after clustering are in parentheses. The explained variables in each column are ETF tracking errors. 2. Column (1) represents the regression results without adding interaction terms, and column (2) represents the regression results after adding interaction terms. 3. Columns (1) and (2) both control date and fund fixed effects. 4. The definition of relevant variables is as shown in Table 4.2 of this article.
5.3 Robustness check
In order to prevent the research results from being affected by other factors, this article conducts the following three robustness tests to ensure the reliability and stability of the research results.
5.3.1 Dependent variable substitution
Panel regressions are performed using two alternative dependent variables: the rolling 120-day average of the difference between the ETF's net value return and its benchmark return, and the rolling 120-day average of the difference between the ETF's secondary market price return and its benchmark return. The regression results are shown in Table 5.3.
From Table 5.3, we can see that the regression coefficients of the interaction term on the replaced dependent variable are significant at the 10% and 5% levels respectively, indicating that the research results are robust.
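The construction of the alternative dependent variables can be sketched as follows. This is a minimal illustration, assuming the signed daily return difference is averaged over a trailing window; whether the article uses the signed or absolute difference is not stated in this section, and the function name is hypothetical.

```python
def rolling_mean_return_difference(fund_returns, benchmark_returns, window=120):
    """Trailing-window average of (fund return - benchmark return).

    Returns one value per day from the first day with a full window onward.
    """
    if len(fund_returns) != len(benchmark_returns):
        raise ValueError("return series must have the same length")
    diffs = [f - b for f, b in zip(fund_returns, benchmark_returns)]
    averages = []
    running = 0.0
    for i, d in enumerate(diffs):
        running += d
        if i >= window:
            running -= diffs[i - window]  # drop the day that left the window
        if i >= window - 1:
            averages.append(running / window)
    return averages


# Small example with a 3-day window instead of 120 days.
fund = [0.03, 0.01, 0.02, 0.04]
benchmark = [0.01, 0.01, 0.01, 0.01]
rolling = rolling_mean_return_difference(fund, benchmark, window=3)
print(rolling)
```

Passing the net value return series gives the NAV-based measure, and passing the secondary market price return series gives the price-based measure.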
Table 5.3 Robustness test regression results (dependent variable replacement)
Dependent Variable  Tracking Error (NAVIND %)  Tracking Error (ETFIND %) 
 (1)  (2) 
  
ETF Illiquidity  0.049**  0.221 
 (0.03)  (0.27) 
Individual Percentage  0.002  0.025 
 (0.01)  (0.05) 
Interaction  0.561*  5.886** 
 (0.32)  (2.87) 
Shares Outstanding (in ¥billions)  0.000  0.001 
 (0.00)  (0.00) 
Trading Volume (in ¥billions)  0.007  0.071** 
 (0.01)  (0.03) 
Expense Ratio (%)  0.025  0.004 
 (0.02)  (0.38) 
Index Volatility  0.000  0.001*** 
 (0.00)  (0.00) 
Holder Number (in thousands)  0.253  1.268 
 (0.43)  (1.22) 
Short Selling  0.005  0.022 
 (0.00)  (0.01) 
Cash &Deposit Ratio  0.128***  0.193 
 (0.03)  (0.24) 
Requisition Ratio  0.009***  0.021** 
 (0.00)  (0.01) 
Replicated  0.005**  0.018 
 (0.00)  (0.02) 
  
Yearmonth Fixed Effects  YES  YES 
Fund Fixed Effects  YES  YES 
  
Nobs.  3213  3213 
Adjusted R^{2}  0.610  0.693 
1. ***, **, and * indicate significance at the 1%, 5%, and 10% levels respectively, and the robust standard errors after clustering are in parentheses. 2. The result of column (1) is that the dependent variable is the difference between the ETF's net return and its benchmark return, and the result of column (2) is the difference between the ETF's price return and its benchmark return. 3. Columns (1) and (2) both control date and fund fixed effects.
5.3.2 Independent variable substitution
Since ETFs have secondary market trading characteristics similar to stocks, stock-type liquidity indicators can also be used to measure ETF liquidity. Referring to the discussion of liquidity measures in Chapter 4, this article replaces the Amihud liquidity indicator with the ETF turnover rate and conducts a robustness test on the regression results. The results are shown in Table 5.4.
It can be seen from Table 5.4 that, after replacing the liquidity indicator, the regression coefficient of the new interaction term on tracking error is still significant at the 5% level. The coefficient is negative: because turnover rises with liquidity whereas the Amihud indicator rises with illiquidity, this sign is consistent with the main regression results. This shows that the regression results are robust.
Table 5.4 Robustness test regression results (independent variable replacement)
Dependent Variable  Tracking Error (ETFNAV %) 
 (1) 
 
Turnover (%)  0.761* 
 (0.44) 
Individual Percentage  0.084 
 (0.07) 
Turnover (%)×Individual Percentage  1.453** 
 (0.72) 
Shares Outstanding (in ¥billions)  0.001 
 (0.00) 
Trading Volume (in ¥billions)  0.116* 
 (0.06) 
Expense Ratio (%)  0.021 
 (0.42) 
Index Volatility  0.001*** 
 (0.00) 
Holder Number (in thousands)  0.136 
 (1.73) 
Short Selling  0.007 
 (0.02) 
Cash &Deposit Ratio  0.196 
 (0.17) 
Requisition Ratio  0.039** 
 (0.02) 
Replicated  0.066 
 (0.07) 
 
Yearmonth Fixed Effects  YES 
Fund Fixed Effects  YES 
 
Nobs.  3213 
Adjusted R^{2}  0.586 
***, **, and * indicate significance at the 1%, 5%, and 10% levels respectively, and the robust standard errors after clustering are in parentheses.
5.3.3 Variable winsorization
The independent and dependent variables are winsorized at the 1% level in each tail; that is, values below the 1st percentile or above the 99th percentile are replaced with the 1st and 99th percentile values respectively. The regression results are shown in Table 5.5.
In Table 5.5, column (1) winsorizes only the dependent variable, column (2) winsorizes only the independent variables (ETF Illiquidity, Individual Percentage), and column (3) winsorizes both the independent and dependent variables. In all three cases the coefficient of the interaction term is significant and its sign is consistent with the main regression result, indicating that the research results are robust.
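The winsorization step can be sketched as below. This is an illustrative implementation using `statistics.quantiles` from the Python standard library; the interpolation method ("inclusive") is an assumption, since the exact percentile convention used in this article is not stated, and the function name `winsorize` is hypothetical.

```python
import statistics


def winsorize(values, lower_pct=1, upper_pct=99):
    """Clamp values to the [lower_pct, upper_pct] percentile range."""
    # quantiles(n=100) returns the 99 cut points for percentiles 1..99.
    cuts = statistics.quantiles(values, n=100, method="inclusive")
    low, high = cuts[lower_pct - 1], cuts[upper_pct - 1]
    return [min(max(v, low), high) for v in values]


# Example: the integers 1..101, so the percentile cut points are easy to read.
sample = list(range(1, 102))
clipped = winsorize(sample)
print(min(clipped), max(clipped))
```

Only the extreme observations are changed; values between the two cut points are left as they are, so the sample size is the same before and after winsorization.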
Table 5.5 Robustness test regression results (variables winsorized)
Dependent Variable  Tracking Error (ETFNAV_w %)  Tracking Error (ETFNAV %)  Tracking Error (ETFNAV_w %) 
 (1)  (2)  (3) 
   
ETF Illiquidity  0.152  13.315**  11.770*** 
 (0.26)  (2.41)  (2.06) 
Individual Percentage  0.038  0.052  0.059 
 (0.05)  (0.05)  (0.04) 
Interaction  6.774***  12.562**  13.092*** 
 (2.44)  (6.28)  (4.69) 
Shares Outstanding (in ¥billions)  0.000  0.002  0.001 
 (0.00)  (0.00)  (0.00) 
Trading Volume (in ¥billions)  0.066**  0.104*  0.052* 
 (0.03)  (0.06)  (0.03) 
Expense Ratio (%)  0.000  0.092  0.057 
 (0.35)  (0.35)  (0.30) 
Index Volatility  0.001***  0.001***  0.001*** 
 (0.00)  (0.00)  (0.00) 
Holder Number (in thousands)  0.474  1.414  0.543 
 (1.31)  (1.85)  (1.32) 
Short Selling  0.019  0.004  0.010 
 (0.01)  (0.02)  (0.01) 
Cash &Deposit Ratio  0.452**  0.232*  0.198 
 (0.18)  (0.13)  (0.13) 
Requisition Ratio  0.028**  0.040**  0.029** 
 (0.01)  (0.02)  (0.01) 
Replicated  0.044*  0.020  0.036** 
 (0.03)  (0.02)  (0.01) 
   
Yearmonth Fixed Effects  YES  YES  YES 
Fund Fixed Effects  YES  YES  YES 
   
Nobs.  3213  3213  3213 
Adjusted R^{2}  0.639  0.625  0.671 
1. ***, **, and * indicate significance at the 1%, 5%, and 10% levels respectively, and the robust standard errors after clustering are in parentheses. 2. Column (1) is the regression result after winsorizing the dependent variable at 1% in each tail, keeping the other variables unchanged; column (2) is the regression result after winsorizing the independent variables ETF Illiquidity and Individual Percentage at 1% in each tail, keeping the other variables unchanged; column (3) is the regression result after winsorizing both the independent and dependent variables at 1% in each tail, keeping the remaining variables unchanged.
Chapter 6 Causal Identification Strategy
6.1 Instrumental variable regression
The empirical results in Chapter 5 show that ETFs with poor liquidity have larger tracking errors, and that this effect is stronger for ETFs with a higher proportion of retail investors. To further establish the causal relationship between the independent and dependent variables and to address potential endogeneity problems, including omitted variables, sample selection, two-way causality and measurement error, this chapter uses the instrumental variable method for further verification.
Friedman (1961) argued that a reduction in the money supply lowers the share of money in investors' total wealth, which indirectly raises the marginal utility of cash, so that investors sell non-cash, non-monetary assets such as stocks to increase their real wealth; conversely, when the money supply increases, investors are more likely to reduce their cash holdings and purchase investment and financial products that carry some risk. Therefore, this article chooses M1, an indicator of the money supply, as its instrumental variable.
The instrumental variable regression results are shown in Table 6.1. With the instrumental variable method, the regression coefficient of the interaction term on tracking error remains significant, which suggests that the main finding is not driven by potential endogeneity problems.
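The logic of the two-stage procedure can be illustrated with a deliberately simplified sketch: one endogenous regressor, one instrument, no controls and no fixed effects. The specification in Table 6.1 has two endogenous variables and two instruments, so this is not the article's estimation; the data are synthetic and all names are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)


def ols(x, y):
    """Return (intercept, slope) of a one-regressor OLS fit of y on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope


def two_stage_least_squares(z, x, y):
    """First stage: regress x on z. Second stage: regress y on fitted x."""
    intercept, slope = ols(z, x)
    x_hat = [intercept + slope * zi for zi in z]
    return ols(x_hat, y)[1]


# Synthetic data: u is an unobserved shock that moves both x and y,
# while the instrument z is uncorrelated with u by construction.
z = [1.0, -1.0, 1.0, -1.0]
u = [1.0, 1.0, -1.0, -1.0]
x = [zi + ui for zi, ui in zip(z, u)]
y = [2.0 * xi + ui for xi, ui in zip(x, u)]  # true effect of x on y is 2

beta_ols = ols(x, y)[1]
beta_iv = two_stage_least_squares(z, x, y)
print(beta_ols, beta_iv)
```

In this toy example plain OLS gives 2.5 because x is correlated with the shock, while the two-stage estimate recovers the true value of 2.0, since only the variation in x that comes from the instrument is used in the second stage.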
Table 6.1 Instrumental variable regression
Dependent Variable  Individual Percentage  Interaction  Tracking Error (ETFNAV %) 
 (1)  (2)  (3) 
   
M1(in ¥trillions)  0.011***  0.000***  
 (0.00)  (0.00)  
M1×ETF Illiquidity  2.263***  0.056***  
 (0.61)  (0.01)  
ETF Illiquidity  12.110***  0.205***  5.766** 
 (3.45)  (0.04)  (2.76) 
Individual Percentage    1.564*** 
   (0.24) 
Interaction    44.704* 
   (25.12) 
Shares Outstanding (in ¥billions)  0.0276  0.000  0.048*** 
 (0.00)  (0.00)  (0.01) 
Trading Volume (in ¥billions)  0.068*  0.000*  0.044 
 (0.03)  (0.00)  (0.05) 
Expense Ratio (%)  0.290  0.001  0.660 
 (0.04)  (0.00)  (0.11) 
Index Volatility  0.001**  0.000  0.004*** 
 (0.00)  (0.00)  (0.00) 
Holder Number (in thousands)  0.202***  0.001  0.287 
 (0.14)  (0.02)  (0.06) 
Short Selling  0.128  0.000  0.294 
 (0.02)  (0.00)  (0.04) 
Cash &Deposit Ratio  0.958  0.019***  2.815*** 
 (0.16)  (0.00)  (0.81) 
Requisition Ratio  0.005***  0.000  0.009*** 
 (0.00)  (0.00)  (0.01) 
Replicated  0.181*  0.001***  0.505*** 
 (0.03)  (0.00)  (0.09) 
   
Yearmonth Fixed Effects  YES  YES  YES 
Fund Fixed Effects  YES  YES  YES 
   
F statistics  35.72  32.99  
Nobs.  3236  3236  3236 
Adjusted R^{2}  0.138  0.577  0.646 
1. ***, **, and * indicate significance at the 1%, 5%, and 10% levels respectively, and the robust standard errors after clustering are in parentheses. 2. Columns (1) and (2) report the first stage of the instrumental variable regression, and column (3) reports the second-stage regression of the dependent variable on the fitted values of the independent variables.
Chapter 7 Heterogeneity Test
To further study the relationship between the independent and dependent variables, this chapter conducts heterogeneity tests, splitting the data according to different classification standards and examining the relationship within each subsample. Combining the main regression results and the factors affecting ETF tracking error, the following inference is made: fund size, the subscription and redemption ratio, the cash ratio in the asset portfolio, and the trading market of the underlying assets will affect the tracking error of an ETF.
7.1 Fund size
For small-scale funds held mainly by retail investors, two findings are relevant: retail investors hold the underlying assets for shorter periods than institutional investors (Chang Jiang, 2008), and Hsieh et al. (2020) found that the herding effect of retail investors is stronger in small-cap stocks. By analogy, among smaller ETFs, retail holdings should strengthen the impact of ETF liquidity on tracking error. Based on this analysis, this article puts forward the following hypothesis:
H2: For funds with small share sizes, high retail investor holdings cause ETF liquidity to have a greater impact on ETF tracking error.
To test this hypothesis, this article classifies the sample ETFs by share size: funds with more than 200 million shares are large-share funds, and the rest are small-share funds. Regressions are run on the two groups separately, and the results are shown in Table 7.1.
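The subsample split can be sketched as follows. The record layout (a dictionary with `fund` and `shares` fields) and the function name are hypothetical, and it is an assumption that the threshold is applied to each fund-period observation rather than once per fund.

```python
LARGE_SHARE_THRESHOLD = 200_000_000  # 200 million shares, as in the text


def split_by_share_size(observations, threshold=LARGE_SHARE_THRESHOLD):
    """Split observations into (large-share, small-share) groups."""
    large = [o for o in observations if o["shares"] > threshold]
    small = [o for o in observations if o["shares"] <= threshold]
    return large, small


# Hypothetical fund-period observations.
sample_obs = [
    {"fund": "A", "shares": 1_500_000_000},
    {"fund": "B", "shares": 200_000_000},
    {"fund": "C", "shares": 30_000_000},
]
large_funds, small_funds = split_by_share_size(sample_obs)
print([o["fund"] for o in large_funds], [o["fund"] for o in small_funds])
```

The same regression is then run on each group, and the coefficients of the interaction term are compared across the two columns of the results table.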
Table 7.1 Heterogeneity test results 1
Dependent Variable  Tracking Error (ETFNAV %) 
 (1)  (2) 
  
ETF Illiquidity  23.684***  0.102 
 (7.71)  (0.27) 
Individual Percentage  0.160**  0.055 
 (0.07)  (0.08) 
Interaction  8.743  5.824*** 
 (8.12)  (2.04) 
Shares Outstanding (in ¥billions)  0.002  0.122 
 (0.00)  (0.36) 
Trading Volume (in ¥billions)  0.062  0.438 
 (0.04)  (0.59) 
Expense Ratio (%)  0.110  1.127** 
 (0.16)  (0.48) 
Index Volatility  0.000***  0.002*** 
 (0.00)  (0.00) 
Holder Number (in thousands)  1.152  36.851*** 
 (1.16)  (9.88) 
Short Selling  0.011  0.048 
 (0.02)  (0.06) 
Cash &Deposit Ratio  0.455  0.408*** 
 (0.37)  (0.13) 
Requisition Ratio  0.079***  0.015 
 (0.02)  (0.01) 
Replicated  0.000  0.040 
 (.)  (0.06) 
  
Yearmonth Fixed Effects  YES  YES 
Fund Fixed Effects  YES  YES 
  
Nobs.  1617  1552 
Adjusted R^{2}  0.426  0.692 
1. ***, **, and * indicate significance at the 1%, 5%, and 10% levels respectively, and the robust standard errors after clustering are in parentheses. 2. Column (1) reports results for large-share funds, and column (2) for small-share funds.
7.2 Fund trading volume
The main regression results show that the worse the liquidity of an ETF, the greater its tracking error. At the same time, a fund with larger trading volume is more actively traded, that is, its liquidity is better. Based on this analysis, the following hypothesis is put forward:
H3: The smaller a fund's trading volume, the more its tracking error is affected by the proportion of retail investors.
To test this hypothesis, this article classifies the sample ETFs by daily trading volume: funds with a daily trading volume greater than 10 million RMB are funds with good liquidity, and the rest are funds with poor liquidity. Regressions are run on the two groups separately, and the results are shown in Table 7.2.
Table 7.2 Heterogeneity test results 2
Dependent Variable  Tracking Error (ETFNAV %) 
 (1)  (2) 
  
ETF Illiquidity  433.679  0.183 
 (786.43)  (0.26) 
Individual Percentage  0.065  0.042 
 (0.07)  (0.07) 
Interaction  6202.473  5.961*** 
 (6714.62)  (2.21) 
Shares Outstanding (in ¥billions)  0.003  0.006 
 (0.00)  (0.01) 
Trading Volume (in ¥billions)  0.052  6.078** 
 (0.04)  (3.08) 
Expense Ratio (%)  0.226***  0.649*** 
 (0.07)  (0.25) 
Index Volatility  0.001***  0.001*** 
 (0.00)  (0.00) 
Holder Number (in thousands)  2.179*  30.846*** 
 (1.27)  (9.99) 
Short Selling  0.017  0.093*** 
 (0.02)  (0.02) 
Cash &Deposit Ratio  0.300  0.401*** 
 (0.38)  (0.14) 
Requisition Ratio  0.092***  0.018 
 (0.01)  (0.01) 
Replicated  0.000  0.000 
 (.)  (.) 
  
Yearmonth Fixed Effects  YES  YES 
Fund Fixed Effects  YES  YES 
  
Nobs.  1306  1774 
Adjusted R^{2}  0.301  0.669 
1. ***, **, and * indicate significance at the 1%, 5%, and 10% levels respectively, and the robust standard errors after clustering are in parentheses. 2. The result of column (1) is a fund with good liquidity, and the result of column (2) is a fund with poor liquidity.
7.3 Concentration of holdings
The results above show that retail investor holdings amplify the impact of ETF liquidity on ETF tracking error. Does the total number of fund holders affect this amplification? In this regard, this article puts forward the following hypothesis:
H4: The more dispersed the fund holding structure, the smaller the amplifying effect of retail holdings on the impact of ETF liquidity on ETF tracking error.
This article classifies funds with more than 10,000 holders as funds with dispersed holdings, and the rest as funds with concentrated holdings. The regression results by group are shown in Table 7.3.
Table 7.3 Heterogeneity test results 3
Dependent Variable  Tracking Error (ETFNAV %) 
 (1)  (2) 
  
ETF Illiquidity  1070.789***  0.170 
 (293.86)  (0.26) 
Individual Percentage  0.131*  0.080 
 (0.07)  (0.06) 
Interaction  1835.086  6.231*** 
 (1955.36)  (2.25) 
Shares Outstanding (in ¥billions)  0.001  0.002 
 (0.00)  (0.01) 
Trading Volume (in ¥billions)  0.019  0.299** 
 (0.03)  (0.14) 
Expense Ratio (%)  0.145*  0.077 
 (0.08)  (0.44) 
Index Volatility  12.077***  0.001*** 
 (3.44)  (0.00) 
Holder Number (in thousands)  1.082  166.315*** 
 (0.97)  (46.82) 
Short Selling  0.017  0.027 
 (0.02)  (0.03) 
Cash &Deposit Ratio  1.717**  0.426*** 
 (0.82)  (0.14) 
Requisition Ratio  0.126***  0.015 
 (0.02)  (0.01) 
Replicated  0.000  0.057** 
 (.)  (0.03) 
  
Yearmonth Fixed Effects  YES  YES 
Fund Fixed Effects  YES  YES 
  
Nobs.  886  2281 
Adjusted R^{2}  0.356  0.674 
1. ***, **, and * indicate significance at the 1%, 5%, and 10% levels respectively, and the robust standard errors after clustering are in parentheses. 2. Column (1) reports results for funds with dispersed holdings, and column (2) for funds with concentrated holdings.
7.4 Underlying asset trading market
At present, the domestic trading markets include the Shanghai Stock Exchange, the Shenzhen Stock Exchange and the Beijing Stock Exchange. Because trading activity differs across exchanges, the liquidity of cross-market funds differs from that of single-market funds. Combined with the main regression results, the following hypothesis is put forward:
H5: For cross-market funds, the interaction term between the proportion of retail investors and ETF liquidity has a greater impact on ETF tracking error.
Based on this hypothesis, a fund whose underlying assets trade on more than one market is classified as a cross-market fund; otherwise it is a single-market fund. The regression results are shown in Table 7.4.
Table 7.4 Heterogeneity test results 4
Dependent Variable  Tracking Error (ETFNAV %) 
 (1)  (2) 
  
ETF Illiquidity  3.521*  0.117 
 (1.85)  (0.32) 
Individual Percentage  0.050  0.067 
 (0.04)  (0.13) 
Interaction  26.469***  5.402 
 (8.84)  (3.52) 
Shares Outstanding (in ¥billions)  0.004  0.001 
 (0.00)  (0.00) 
Trading Volume (in ¥billions)  0.117*  0.081* 
 (0.07)  (0.04) 
Expense Ratio (%)  0.487  0.332*** 
 (0.66)  (0.11) 
Index Volatility  0.001***  0.001*** 
 (0.00)  (0.00) 
Holder Number (in thousands)  2.940  0.771 
 (2.96)  (1.28) 
Short Selling  0.001  0.027 
 (0.02)  (0.04) 
Cash &Deposit Ratio  0.389***  0.558 
 (0.13)  (0.44) 
Requisition Ratio  0.053***  0.029 
 (0.02)  (0.02) 
Replicated  0.000  0.000 
 (.)  (.) 
  
Yearmonth Fixed Effects  YES  YES 
Fund Fixed Effects  YES  YES 
  
Nobs.  2381  831 
Adjusted R^{2}  0.578  0.624 
1. ***, **, and * indicate significance at the 1%, 5%, and 10% levels respectively, and the robust standard errors after clustering are in parentheses. 2. Column (1) reports results for cross-market funds, and column (2) for single-market funds.
Chapter 8 Summary and Reflection
This article examines the relationship between retail holdings, ETF (exchange-traded fund) liquidity, and ETF tracking error. The study reaches two main conclusions. First, ETF illiquidity is positively related to tracking error; that is, less liquid ETFs tend to exhibit larger tracking errors. Second, the larger the share of an ETF held by retail investors, the stronger the relationship between its liquidity and its tracking error.
Overall, this research is of great significance for understanding the operating mechanism of the ETF market and the impact of retail investment behavior. Compared with the U.S. ETF market, China's ETF market started late. Tracking error is one of the important angles to measure ETF risks. Carrying out relevant research will be helpful and meaningful to the supervision and product innovation of domestic ETFs.
For retail investors, this research sheds light on the risks they may face when holding ETFs, especially less liquid ETFs. Understanding this relationship can help retail investors choose their investment portfolios more carefully, thereby reducing investment risks and increasing investment returns.
For ETF issuers and market regulators, the findings highlight the importance of liquidity management. Especially for ETFs with poor liquidity, measures should be taken to improve market liquidity in order to reduce ETF tracking errors and maintain the stability and healthy development of the market. In addition, for regulatory agencies, understanding that liquidity has a more significant impact on tracking error when retail investors hold a larger share of an ETF will help to formulate relevant policies and regulations more accurately, protecting the rights and interests of retail investors and the stability of the market.
Future research directions can further explore the following aspects: First, consider expanding the research sample to cover more different types of ETFs and retail investors to verify the universality of the research results. Secondly, other factors that may affect ETF liquidity and tracking errors, such as market volatility, transaction costs, etc., can be further explored to build a more comprehensive model. In addition, the future development trends of the ETF market can also be discussed, especially the possible changes in the operating model of ETFs and investor behavior under the influence of emerging fields such as digital finance and intelligent investment.
The findings of this paper provide investors, regulatory agencies and academia with clues for an in-depth understanding of the ETF market, and enrich the directions and conclusions of ETF-related research. It is hoped that they can provide some reference and insight for future related research and market supervision.