
The study on systemic risk of rural finance based on macro–micro big data and machine learning

Pages 261-275 | Received 11 Nov 2022, Accepted 14 Jul 2023, Published online: 06 Aug 2023

Abstract

Accurately measuring the systemic risk of rural finance is a basic premise for promoting the healthy development of rural finance and strengthening macro-prudential supervision. We establish a dynamic factor CAPM and conduct a comprehensive, multi-angle quantitative study of the systemic risk of rural finance in China by constructing a macro–micro indicator system and using machine learning to reduce the dimension of high-dimensional data. Our results show that the dynamic factor CAPM built on macro–micro big data evaluates the systemic risk of rural finance more comprehensively and systematically, and that machine learning performs well in processing high-dimensional data. In addition, China's rural financial systemic risk is stable compared with the main Shanghai and Shenzhen markets, but it is also susceptible to macro and micro factors. Finally, we point out that early warning systems for rural financial systemic risk could be constructed at the macro and micro levels, respectively.

1. Introduction

In recent years, the economic development of rural areas has changed dramatically. Rural financial participants are no longer limited to banks; they now include securities firms, insurers, government, industry, small and micro financial institutions and others. This diversification of rural finance helps create a fair competitive environment and disperses risk, but systemic risk cannot be ignored because of the domino effect. Early warning of rural financial systemic risk is therefore an urgent problem in the current development of the rural economy and finance. An improved early warning system for financial systemic risk will not only contribute to the healthy and orderly development of China's rural finance, but also have important theoretical value and practical significance for systemic financial risk prevention and financial supervision.

Systemic financial risk is global, process-oriented, endogenous and time-varying. We therefore measure it with both micro characteristics and macro environmental factors in view, so that early identification of risk makes financial regulation and risk prevention precise and rapid. However, the existing literature on systemic risk in rural finance is mainly qualitative (Turvey, 2011), while quantitative studies mainly analyse the impact of a single risk (Miranda & Gonzalez-Vega, 2011). As the number of participants in the rural economy and rural finance grows, some studies overlook the need to use new technologies and tools to account for the full range of factors driving systemic risk. In this research, we examine the regulation of systemic risk in rural finance under the influence of multiple participants, focusing on using new technology and new tools to quantify systemic risk in rural finance comprehensively.

The definition of systemic risk is also inconsistent across studies in the literature. The concept of systemic risk is rich, and its measurement methods are varied. The existing methods mainly include the integrated indicator method based on micro financial data and, using market data, the marginal expected shortfall (MES), the conditional value at risk (CoVaR) and the distress insurance premium (DIP). To analyse the systemic risk of rural finance better, this article draws on both macro and micro big data and uses principal component analysis (PCA) for factor dimensionality reduction. In addition, we analyse the systemic risk of rural finance in China comprehensively and dynamically with the dynamic factor CAPM, estimated by least squares and ridge regression as well as other machine learning methods such as decision trees and random forests. Because they reduce dimensionality, machine learning methods have advantages in dealing with complex data, which makes them more flexible than traditional regression estimation methods on complex, high-dimensional data sets.

We enrich the theory of systemic financial risk and provide a new analytical framework for measuring and analysing systemic risk in rural finance. The main conclusions of this article are as follows. First, compared with the traditional capital asset pricing model (CAPM), the dynamic CAPM (DCAPM) combined with macro–micro big data sets assesses systemic risk in rural finance more comprehensively and systematically. Second, machine learning helps handle high-dimensional, complex data sets when constructing the different models, and the random forest outperforms the decision tree in predicting systemic risk. Third, the systemic risk of rural finance in China is holistic, process-oriented and dynamic. In addition, rural financial systemic risk is less volatile than the main Shanghai and Shenzhen markets, but it is also vulnerable to the macroeconomic environment and micro-financial influences.

The rest of this article is organized as follows. Section 2 reviews the literature. Section 3 introduces the regression methods and estimation models for studying the systemic risk of rural finance. Section 4, the core of the empirical research, analyses rural financial systemic risk through macro–micro factor indicator selection, data processing, machine learning factor dimensionality reduction, and the construction and estimation of the dynamic factor CAPM. Finally, Section 5 concludes with a general discussion.

2. Literature review

2.1. Rural financial systemic risk

Rural economic development cannot be achieved without the support of rural finance, which can raise agricultural productivity and improve the welfare of the poor by increasing access to credit. However, rural financial institutions may be more vulnerable to systemic risk than other economic sectors and institutions, which makes it particularly important to analyse and understand the nature of rural financial systemic risk and to estimate it. In the absence of formal risk management techniques, farmers may be forced to reduce their investments, which can significantly hamper productive rural economic development.

Systemic financial risk is a phenomenon that pervades the financial system. It is a widespread risk of financial instability that cannot be completely eliminated but must be controlled and prevented; otherwise it can have a huge impact on the real economy (Bisias et al., 2012). One view defines systemic risk as a default on debt by a participant in the financial system caused by external shocks (Gandy & Veraart, 2017). Another view holds that undercapitalization of the financial sector as a whole harms the real economy, creating a systemic risk externality (Acharya et al., 2017). Although the definitions of systemic risk differ, they all reflect its typical characteristics: it is complex, systematic, contagious, process-oriented and emergent.

As the complexity of the rural financial system increases, financial risks also take on diverse characteristics. Constructing indicators to establish a risk early warning system, and modelling and estimating rural financial systemic risk, are currently both prominent and difficult issues in financial risk management research.

In rural areas, the insurance industry plays a pivotal role in protecting against risk, and poorer farmers may be more risk averse, which potentially creates a greater need for insurance (Saqib et al., 2016). However, with the development of the rural economy, insurance is no longer the only way to diversify risk; indeed, the idea of diversifying risk through portfolios is nothing new in rural financial risk management research (Hill et al., 2013). In the past, rural finance could be rescued promptly even in crisis events. Nowadays, however, with the growing number of rural financial institutions and participants, fluctuations in rural financial systemic risk are driven by many factors, and a shock to one part can affect the whole system. It has therefore become an urgent task to study the systemic risk of rural finance from multiple directions and angles so as to prevent it before it happens.

2.2. Systemic risk estimation methods

Scholars have proposed numerous methods for measuring systemic financial risk from their own perspectives. The more commonly used ones include indicator assessment, the conditional value at risk (CoVaR), the marginal expected shortfall (MES), and the beta coefficient of the capital asset pricing model (CAPM). However, MES and CoVaR typically carry large estimation errors, as these measures show very strong correlation with systemic risk (Danielsson et al., 2016). Because the financial sector is highly interconnected, estimates can be inaccurate, leading to biased systemic risk measures.

The methodological idea of this research is to measure rural financial systemic risk with the beta coefficient from capital asset pricing theory, but not with the classical CAPM. Beta is often used to measure systemic risk. For example, Benoit et al. (2013) used the conditional firm beta to measure the value at risk of a market risk proxy. Billio et al. (2012) used a regime-switching beta model to measure the dynamic risk exposures of hedge funds to various risk factors under different market volatility conditions, and found that these risks are potentially common factors for the hedge fund industry when the market is in a down state. Straetmans and Chaudhry (2015) focused on tail risk with a tail beta model. Gong et al. (2021) found that liquidity risk may play an important role in explaining the inverse relationship between beta and stock returns.

Although the traditional CAPM has strong theoretical significance, its assumption of a fixed beta is unrealistic in practice: a fixed beta cannot explain some anomalies, and the model does not perform well in empirical research. A multi-factor dynamic CAPM has therefore been developed from the traditional CAPM. An early study of the dynamic CAPM was conducted by Hansen and Richard (1987), who studied linear dynamic CAPM theory and constructed a time-varying beta model. Lewellen and Nagel (2006) show that the standard CAPM performs poorly under many market anomalies and that the dynamic CAPM fits better than the traditional one in measuring the systemic risk of assets. Cederburg and O’Doherty (2016) constructed a dynamic CAPM with conditioning variables such as lagged beta, market dividend rates and credit spreads, and found that it estimates systemic risk more effectively.

Risk from basic single exposures can still be assessed easily (Castellani et al., 2014; Elabed et al., 2013). Once the risk becomes systemic, however, more advanced approaches and systematic analysis techniques are required. Squartini et al. (2017) also found that the beta of the traditional CAPM is no longer adequate for measuring complex systemic risk.

Moreover, these methods mainly consider the impact of microeconomic agents and omit macro data sets when measuring systemic financial risk. Yet with the development of rural areas, the growing number of agriculture-related financial institutions and the development of agriculture-related enterprises, rural systemic financial risk has long ceased to be confined to the microeconomic agents of the market portfolio itself. Cosemans et al. (2016) found that introducing cross-product terms of macro and micro variables yields better estimates of the dynamic beta as a measure of systemic risk. In addition, dynamic, time-varying extensions of the CAPM are increasingly becoming a hot topic in capital asset pricing research (Ang et al., 2020).

2.3. Machine learning and systemic risk

With the development of the rural economy and the advent of the big data era, rural financial systemic risk is becoming increasingly complex, and risk management for rural households involves insurance, government, banks, industry and more. It is therefore necessary to consider both macro-environmental factors and micro firm characteristics when assessing rural financial systemic risk. Basic research approaches are limited in studying systemic financial risk that is dynamic, complex and multidimensional, which calls for extensions of the methodology. Cosemans et al. (2016) measured the time-varying systemic risk beta of the stock market by introducing macro variables (credit spreads, etc.), micro variables (book-to-market ratio, etc.) and cross-product terms of the two. Boguth et al. (2016) pointed out that a time-varying beta that includes macroeconomic information portrays the systematic risk of assets across time more accurately. Deng et al. (2018) also reported that, once macro and micro economic variables are introduced, an integrated hybrid beta explains returns better than traditional methods. Some studies have shown that links between financial institutions become closer during crises. Gong et al. (2019) analysed the dynamics of a systemic risk measure based on causal complex networks among financial institutions in the event and spatial dimensions. He et al. (2022) also showed that the network structure of the market has an indicator effect on systemic risk contribution.

Machine learning handles high-dimensional macro and micro big data through variable selection and dimensionality reduction techniques that compress redundant variation among predictor variables. Because machine learning models predict well when fitted to big data, a related literature applies machine learning methods to financial risk management, and nonlinear modelling approaches of this kind can yield more accurate risk predictions. Rapach et al. (2013) used the LASSO to study the global stock market, and Garcia-Jorcano and Sanchis-Marco (2021) indicated that principal component analysis is very important for selecting risk measurement features, especially in dimensionality reduction and selection among multiple indicators.

However, most of the current dynamic CAPM literature considers only a few macro variables and uses simple linear regression models to analyse time-varying betas, ignoring the rapid development of big data and machine learning techniques. Compared with the traditional empirical approach in capital asset pricing, machine learning accommodates a more extensive list of potential predictor variables and a richer specification of functional forms (Gu et al., 2020). In particular, traditional forecasting methods fail when predictor variables are highly correlated, whereas machine learning emphasizes variable selection and dimensionality reduction, which are well suited to high-dimensional multi-factor variable selection and macro and micro big data processing; its interactions and nonlinear effects can also improve the predictive performance for systemic risk (Drobetz & Otto, 2021). Jiang et al. (2021) analysed the time-varying characteristics of systemic risk in China's stock market with macro–micro big data and showed that an intelligent dynamic CAPM built with machine learning significantly improves risk pricing ability and reduces pricing bias compared with the traditional static CAPM.

2.4. Discussion

The literature review shows that systemic risk is holistic, interlinked, multidimensional and process-oriented, and that multi-level, multi-dimensional risk measurement indicators are the basis for scientifically identifying systemic financial risk. However, few studies have combined the information content of big data with factor dimensionality reduction to analyse systemic risk in the rural market, and variable selection has been relatively coarse. Driven by macroeconomic policy, and with the development of the rural economy and the growing participation of agricultural and financial entities, the process of economic development and the global nature of the participants should be considered when measuring and assessing the systemic risk of rural finance. It is necessary to estimate the systemic risk of rural finance scientifically with macro and micro big data combined with machine learning methods. In particular, macroeconomic risk indicators reflect risk fluctuations arising from changes in macroeconomic factors, which makes macro and micro big data especially important for studying systemic financial risk.

In summary, this research combines the comprehensive indicator method, big data analysis and machine learning to measure and analyse the systemic risk of rural finance, and our primary contributions are threefold. First, we collect several thousand macro–micro data points (2754 in total, see Section 4.1.1) to provide an empirical basis for quantifying rural financial systemic risk, which few existing studies have analysed quantitatively. Unlike most previous studies, which focus only on the financial micro-market system, our indicator system encompasses both the macroeconomic environment and the micro characteristics of agriculture-related businesses. Second, we use machine learning methods, mainly principal component analysis (PCA), ridge regression (RR), decision tree (DT) and random forest (RF), to perform factor dimensionality reduction analysis on the data. This enables accurate quantitative analysis while avoiding errors caused by subjective judgment. Finally, and more importantly, building on the machine learning factor dimensionality reduction, we use the beta of the dynamic capital asset pricing model (DCAPM) to analyse the time-varying systemic risk of rural finance, which can provide short-term early warning of systemic financial risk and offers new ideas for the quantitative study of rural financial systemic risk.

3. Methodology

3.1. The traditional CAPM

The traditional capital asset pricing model (CAPM) assumes that the systemic risk measure β is fixed. For a given asset i, the relationship between its expected return and the expected return of the market portfolio can be expressed as equation (1):

$$E(R_i) - R_f = \beta_{i,m}\,[E(R_m) - R_f] \tag{1}$$

According to CAPM theory, the return on an asset can be written as

$$R_i = R_f + \beta_i (R_M - R_f) + \epsilon_i \tag{2}$$

Under the basic theoretical assumptions $E(\epsilon_i) = 0$ and $\operatorname{cov}(R_M, \epsilon_i) = 0$, and since $R_f$ is a constant, taking the variance of both sides of equation (2) yields $\sigma_i^2 = \beta_i^2 \sigma_M^2 + \sigma_{\epsilon_i}^2$, where $\beta_i^2 \sigma_M^2$ is the risk associated with the whole market, i.e. systemic risk, and $\sigma_{\epsilon_i}^2$ is non-systemic risk. Systemic risk can therefore be measured by β, which for asset i is given by equation (3):

$$\beta_i = \frac{\operatorname{cov}(R_i, R_m)}{\sigma^2(R_m)} \tag{3}$$
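As a minimal sketch of equation (3) and the variance decomposition above, the Python snippet below estimates β from sample moments and splits the asset's variance into systemic and non-systemic parts. It assumes NumPy and uses synthetic returns; the function names and data are hypothetical, not the paper's code or sample.

```python
import numpy as np


def capm_beta(r_i, r_m):
    """Equation (3): beta = cov(R_i, R_m) / var(R_m), using sample moments."""
    r_i = np.asarray(r_i, dtype=float)
    r_m = np.asarray(r_m, dtype=float)
    cov_im = np.cov(r_i, r_m, ddof=1)[0, 1]
    return cov_im / np.var(r_m, ddof=1)


def variance_decomposition(r_i, r_m):
    """Split var(R_i) into systemic (beta^2 * var(R_m)) and non-systemic parts."""
    r_i = np.asarray(r_i, dtype=float)
    r_m = np.asarray(r_m, dtype=float)
    beta = capm_beta(r_i, r_m)
    systemic = beta ** 2 * np.var(r_m, ddof=1)
    non_systemic = np.var(r_i, ddof=1) - systemic
    return beta, systemic, non_systemic


# Hypothetical quarterly returns: 81 periods, true beta 0.8 plus idiosyncratic noise.
rng = np.random.default_rng(0)
r_m = rng.normal(0.01, 0.05, size=81)
r_i = 0.002 + 0.8 * r_m + rng.normal(0.0, 0.02, size=81)
beta, systemic, non_systemic = variance_decomposition(r_i, r_m)
```

With sample moments the split is exact: the non-systemic part equals the sample variance of $R_i - \beta R_m$, because that series is uncorrelated with $R_m$ by construction.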

$\beta_i$ reflects the level of systemic risk of an individual asset. If it equals 1, the price volatility of asset i is the same as that of the market portfolio; if it is less than 1, asset i is less volatile than the market portfolio; and if it is greater than 1, asset i is more volatile. A positive β indicates that the price of asset i moves in the same direction as the market portfolio, and a negative β that it moves in the opposite direction.

Therefore, the systemic risk in the market of agriculture-related entities, derived from the static CAPM, can be expressed as

$$R_n - R_f = \alpha + \beta_0 (R_m - R_f) + \epsilon \tag{4}$$

Here $R_n$ denotes the return of farm-related entities, $R_m$ the market return and $R_f$ the risk-free return, for which the one-year treasury yield is usually used. α is the intercept and ε the residual, which must satisfy $E(\epsilon) = 0$ and $\operatorname{cov}(R_m, \epsilon) = 0$; the slope $\beta_0$ captures the systemic risk of rural finance. However, the traditional CAPM above is an equilibrium model derived from mean-variance utility theory that assumes rational agents, and it carries no time subscript t. It is a static decomposition model, and such a static model has no direct predictive power in a time-causal sense.
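Equation (4) can be estimated by ordinary least squares on excess returns. The sketch below is a hypothetical helper, assuming NumPy and aligned return series, with `r_f` allowed to be a scalar or a series; it is not the authors' code.

```python
import numpy as np


def fit_static_capm(r_n, r_m, r_f):
    """OLS estimate of equation (4): R_n - R_f = alpha + beta_0 (R_m - R_f) + eps."""
    y = np.asarray(r_n, dtype=float) - np.asarray(r_f, dtype=float)  # farm-related excess return
    x = np.asarray(r_m, dtype=float) - np.asarray(r_f, dtype=float)  # market excess return
    X = np.column_stack([np.ones_like(x), x])  # first column estimates the intercept alpha
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    alpha, beta0 = coef
    residuals = y - X @ coef
    return alpha, beta0, residuals


# Hypothetical example: noiseless data generated with alpha = 0.001 and beta_0 = 0.7.
r_m = np.array([0.02, -0.01, 0.03, 0.00, -0.02, 0.04])
r_f = 0.005
r_n = r_f + 0.001 + 0.7 * (r_m - r_f)
alpha, beta0, residuals = fit_static_capm(r_n, r_m, r_f)
```

Because the regression includes an intercept, the fitted residuals have mean zero and are uncorrelated with the market excess return in sample, which is the sample analogue of $E(\epsilon) = 0$ and $\operatorname{cov}(R_m, \epsilon) = 0$.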

3.2. The dynamic CAPM

The traditional CAPM has strong theoretical implications, but its usefulness for investment is limited. The dynamic CAPM (DCAPM) has therefore been promoted for its scalability and better relevance:

$$R_i = \alpha_i + \sum_{k=1}^{m} \beta_k F_k + \epsilon_i \tag{5}$$

In equation (5), $F_k$ denotes a factor affecting the asset return and $\beta_k$ is the factor coefficient, representing the exposure to that factor's risk. Unlike the traditional CAPM, the dynamic factor model is not restricted to a fixed factor: risk does not originate only from the market factor but may also come from other risk factors. Combined with arbitrage pricing theory (APT), equation (5) can be rewritten as equation (6):

$$R_i - R_f = \sum_{k=1}^{m} \beta_k (F_k - R_f) + \epsilon_i \tag{6}$$

In the literature on the linear dynamic CAPM and linear time-varying betas, the time-varying systemic risk $\beta_{i,t}$ of asset i is generally expressed as a linear function of the information set $Z_{i,t-1}$ (Ferson & Siegel, 2009); in the classical dynamic CAPM, $Z_{i,t-1}$ usually represents a few macroeconomic variables. The time-varying $\beta_{i,t}$ is given by

$$\beta_{i,t} = \beta_{i,0} + \beta_{i,1} Z_{i,t-1} \tag{7}$$

In equation (7), $\beta_{i,t}$ is the systemic risk of asset i in period t, $\beta_{i,0}$ is the constant term, and $\beta_{i,1}$ is the coefficient through which $Z_{i,t-1}$ influences the systemic risk of asset i. However, because $Z_{i,t-1}$ represents only a few macroeconomic variables, it cannot capture all the factors that influence systemic risk. It is therefore necessary to construct a dynamic factor CAPM that introduces macro–micro big data sets.
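Substituting equation (7) into the excess-return form of the CAPM gives a regression with an interaction term, $R_{i,t} - R_f = \alpha_i + \beta_{i,0}(R_{m,t} - R_f) + \beta_{i,1} Z_{i,t-1}(R_{m,t} - R_f) + \epsilon_{i,t}$, which OLS can estimate. The sketch below assumes a single conditioning variable and an intercept $\alpha_i$ (equation (7) itself does not specify the estimation procedure), uses NumPy, and gives its helper a hypothetical name.

```python
import numpy as np


def fit_conditional_beta(excess_i, excess_m, z):
    """Estimate beta_{i,t} = b0 + b1 * Z_{t-1} (equation (7)) via an interaction regression.

    All three inputs are aligned series of length T. The first observation is dropped
    so that Z_{t-1} is paired with the returns of period t.
    """
    y = np.asarray(excess_i, dtype=float)[1:]
    x = np.asarray(excess_m, dtype=float)[1:]
    z_lag = np.asarray(z, dtype=float)[:-1]
    X = np.column_stack([np.ones_like(x), x, z_lag * x])
    (alpha, b0, b1), *_ = np.linalg.lstsq(X, y, rcond=None)
    beta_path = b0 + b1 * z_lag  # fitted beta_{i,t} for t = 2, ..., T
    return alpha, b0, b1, beta_path
```

The returned `beta_path` is the time series of conditional betas, which is what replaces the single fixed β of the static model.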

3.3. Factor models of macro-micro big data

As shown in the previous section, a fixed beta is of limited use in practice. In addition, a beta calculated from the market index alone does not fully represent systematic risk and tends to be too low. The β used to measure systemic risk has been found to be time-varying rather than fixed (Adrian et al., 2015), and a dynamic CAPM that introduces macro and micro big data can estimate systemic risk better.

Building on existing research (e.g., Gu et al., 2020; Jiang et al., 2021), this article constructs a dynamic factor model by introducing macro indicators and micro factors. Unlike previous models built on macro–micro data sets, which combine the two into a single comprehensive function (see Jiang et al., 2021), this paper constructs separate functions for the macro and micro data sets to extract as much useful information as possible and avoid distorting it. We then construct a dynamic CAPM (DCAPM) with the macro–micro information to measure the time-varying β of rural financial systemic risk, as in equation (8):

$$R_{n,t} = \alpha_n + \left[\beta_{n,0} + \beta_{n,1} f_t(H_{t-1}) + \beta_{n,2} f_t(W_{t-1})\right] R_{m,t} + \epsilon_t \tag{8}$$

where $R_{n,t}$ denotes the portfolio return of farm-related entities and $R_{m,t}$ the market return. $H_{t-1}$ and $W_{t-1}$ denote the macro and micro information sets, respectively, and $f_t(H_{t-1})$ and $f_t(W_{t-1})$ are the values obtained by extracting information from $H_{t-1}$ and $W_{t-1}$ with a machine learning algorithm. In addition, $\beta_{n,0}$ denotes the baseline systemic risk of farm-related entities, and $\beta_{n,1}$ and $\beta_{n,2}$ capture how their systemic risk responds to the macroeconomic and microeconomic indicators, respectively.

Combined with arbitrage pricing theory (APT), equation (8) can be rewritten as equation (9):

$$R_{n,t} - R_f = \alpha_n + \left[\beta_{n,0} + \beta_{n,1} f_t(H_{t-1}) + \beta_{n,2} f_t(W_{t-1})\right](R_{m,t} - R_f) + \epsilon_t \tag{9}$$

Expanding equation (9) gives equation (10):

$$R_{n,t} - R_f = \alpha_n + \beta_{n,0}(R_{m,t} - R_f) + \beta_{n,1} f_t(H_{t-1})(R_{m,t} - R_f) + \beta_{n,2} f_t(W_{t-1})(R_{m,t} - R_f) + \epsilon_t \tag{10}$$
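Equation (10) is linear in the unknown coefficients once the factor values $f_t(H_{t-1})$ and $f_t(W_{t-1})$ are available, so it can be estimated by OLS with two interaction regressors. The sketch below assumes NumPy, takes the factor series as already computed and lagged by the caller (for example, principal component scores), and uses a hypothetical function name; it is one possible estimation route, not necessarily the paper's.

```python
import numpy as np


def fit_macro_micro_dcapm(r_n, r_m, r_f, f_h, f_w):
    """OLS estimate of equation (10).

    f_h[t] and f_w[t] must already hold f_t(H_{t-1}) and f_t(W_{t-1}), i.e. values built
    only from information available at t - 1; the caller handles that alignment.
    Returns (alpha_n, beta_n0, beta_n1, beta_n2).
    """
    y = np.asarray(r_n, dtype=float) - np.asarray(r_f, dtype=float)  # R_{n,t} - R_f
    x = np.asarray(r_m, dtype=float) - np.asarray(r_f, dtype=float)  # R_{m,t} - R_f
    f_h = np.asarray(f_h, dtype=float)
    f_w = np.asarray(f_w, dtype=float)
    # Regressors: intercept, market excess return, and its two interaction terms.
    X = np.column_stack([np.ones_like(x), x, f_h * x, f_w * x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return tuple(float(c) for c in coef)
```

Keeping the macro and micro factors as separate interaction columns mirrors the paper's choice of separate functions for the two data sets.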

3.4. Modelling with machine learning

3.4.1. Dimensionality reduction with PCA

To extract information from the macro and micro big data sets, the research mainly uses principal component analysis (PCA), which converts multiple indicators into a few composite indicators. Each principal component is a linear combination of the original variables; the components are mutually uncorrelated and together retain most of the information in the original variables.

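First, the data are standardized and their covariance matrix (which, for standardized data, equals the correlation matrix) is computed to determine the correlation between variables. If the correlation is strong, principal component analysis can be used. Denote the covariance matrix by W:

$$W = (\omega_{ij})_{n \times n} = \begin{bmatrix} \omega_{11} & \omega_{12} & \cdots & \omega_{1n} \\ \omega_{21} & \omega_{22} & \cdots & \omega_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ \omega_{n1} & \omega_{n2} & \cdots & \omega_{nn} \end{bmatrix} \tag{11}$$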

Let $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n$ denote the eigenvalues of W in descending order. The appropriate number of factors is chosen by calculating the contribution rate $b_j$ of each principal component and the cumulative contribution rate $\alpha_p$ of the first p components, given by equations (12) and (13):

$$b_j = \frac{\lambda_j}{\sum_{k=1}^{n} \lambda_k} \quad (j = 1, 2, \ldots, n) \tag{12}$$

$$\alpha_p = \frac{\sum_{k=1}^{p} \lambda_k}{\sum_{k=1}^{n} \lambda_k} \quad (p \le n) \tag{13}$$
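The PCA steps in equations (11) to (13) can be sketched in Python with NumPy as below: standardize the indicators, form W, take its eigenvalues in descending order, and compute $b_j$ and $\alpha_p$. The 85% cumulative-contribution cut-off and the function name are illustrative assumptions; the section does not state which threshold the paper uses.

```python
import numpy as np


def pca_contributions(data, threshold=0.85):
    """PCA on standardized data following equations (11)-(13).

    data: array of shape (T, n), one column per indicator.
    threshold: hypothetical cumulative contribution cut-off used to pick p.
    Returns (b, alpha, p, scores).
    """
    X = np.asarray(data, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # standardize each indicator
    W = np.cov(Z, rowvar=False)                       # equation (11); equals the correlation matrix here
    eigvals, eigvecs = np.linalg.eigh(W)              # eigh suits the symmetric matrix W
    order = np.argsort(eigvals)[::-1]                 # sort lambda_1 >= ... >= lambda_n
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    b = eigvals / eigvals.sum()                       # contribution rates b_j, equation (12)
    alpha = np.cumsum(b)                              # cumulative rates alpha_p, equation (13)
    p = min(int(np.searchsorted(alpha, threshold)) + 1, len(b))  # smallest p with alpha_p >= threshold
    scores = Z @ eigvecs[:, :p]                       # component scores usable as factor series
    return b, alpha, p, scores


# Hypothetical data: three indicators driven by one common factor plus small noise.
rng = np.random.default_rng(3)
common = rng.normal(size=81)
data = np.column_stack([common + rng.normal(scale=0.05, size=81) for _ in range(3)])
b, alpha, p, scores = pca_contributions(data)
```

Because the three indicators are nearly identical here, the first component carries almost all the variance and a single factor suffices.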

3.4.2. Regression methods

The research analyses the systemic risk of rural finance with both linear and nonlinear regression techniques in order to compare the goodness of fit of the different methods. The linear techniques are mainly least squares and ridge regression; the nonlinear techniques are mainly decision trees and random forests.

Least squares and ridge regression are both statistical linear regression methods. When the explanatory variables are multicollinear, least squares estimates become imprecise, and ridge regression, also known as weight decay in machine learning, can be used instead. The linear regression model is

$$y = \sum_{j=1}^{p} \gamma_j x_j + \gamma_0 \tag{14}$$

Least squares estimates the coefficients by minimizing the sum of squared residuals in equation (15):

$$\hat{\gamma} = \arg\min_{\gamma} \sum_{i=1}^{N} \Big(y_i - \gamma_0 - \sum_{j=1}^{p} \gamma_j x_{ij}\Big)^2 \tag{15}$$

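Ridge regression adds a penalty term to the minimization objective in equation (15); that is, ridge regression is least squares regression with an $L_2$ (squared Euclidean norm) penalty on the coefficients:

$$\hat{\gamma}^{\text{ridge}} = \arg\min_{\gamma} \Big\{ \sum_{i=1}^{N} \Big(y_i - \gamma_0 - \sum_{j=1}^{p} \gamma_j x_{ij}\Big)^2 + \lambda \sum_{j=1}^{p} \gamma_j^2 \Big\} \tag{16}$$

Here λ ≥ 0 is the penalty parameter (not to be confused with the eigenvalues in Section 3.4.1), and λ = 0 recovers least squares.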
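Both estimators have closed forms. The sketch below, assuming NumPy and a hypothetical helper name, centers the data so that the intercept $\gamma_0$ is left unpenalized (as in equation (16), whose penalty sums only over j = 1, ..., p), then solves $(X_c^\top X_c + \lambda I)\gamma = X_c^\top y_c$; λ = 0 gives least squares.

```python
import numpy as np


def fit_linear(X, y, lam=0.0):
    """Least squares (lam = 0, equation (15)) or ridge regression (lam > 0, equation (16)).

    Centering X and y removes the intercept from the penalized problem; gamma_0 is then
    recovered from the sample means.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    p = X.shape[1]
    gamma = np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ yc)
    gamma0 = y_mean - x_mean @ gamma
    return gamma0, gamma


# Hypothetical near-collinear regressors, where ridge is expected to help.
rng = np.random.default_rng(4)
x1 = rng.normal(size=50)
x2 = x1 + rng.normal(scale=0.01, size=50)
X = np.column_stack([x1, x2])
y = 1.0 + 2.0 * x1 - 1.0 * x2 + rng.normal(scale=0.1, size=50)
g0_ols, g_ols = fit_linear(X, y)
g0_ridge, g_ridge = fit_linear(X, y, lam=1.0)
```

With x1 and x2 almost identical, the least squares slopes are unstable, while the ridge slopes have a smaller norm; in practice λ would typically be chosen by cross-validation.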

3.4.3. Estimation of systemic risk

Decision trees and random forests, the nonlinear machine learning regressors used here, are applied to analyse systemic risk and to determine the influence of the underlying market together with the macro and micro economic environment. A decision tree model has a tree-like structure that starts at the root node; at each node, it splits the data samples into subsets according to the outcome of a test on a feature.

Fitting the training samples as closely as possible can make a decision tree grow too many branches and overfit. The random forest model, which averages many decision trees built on bootstrap samples and random subsets of features, is therefore used to reduce overfitting. Both models are also used to select the important factors and rank the importance of the features in the model. Their performance on the training and test sets is evaluated with the mean square error (MSE), root mean square error (RMSE), mean absolute error (MAE), mean absolute percentage error (MAPE) and coefficient of determination (R²).
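The five evaluation criteria can be computed directly from predictions. The NumPy sketch below is a hypothetical helper, not the paper's code; it reports MAPE in percent and leaves the fitting of the decision tree and random forest out of scope. MAPE is undefined when a true value is zero, which matters for returns that can be exactly zero.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """MSE, RMSE, MAE, MAPE (in percent) and R^2 for one set of predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    mse = np.mean(err ** 2)
    ss_res = np.sum(err ** 2)                       # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    return {
        "MSE": mse,
        "RMSE": np.sqrt(mse),
        "MAE": np.mean(np.abs(err)),
        "MAPE": 100.0 * np.mean(np.abs(err / y_true)),  # requires y_true != 0
        "R2": 1.0 - ss_res / ss_tot,
    }


metrics = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0])
# MSE 0.25, RMSE 0.5, MAE 0.25, MAPE 6.25, R2 0.8
```

The same function would be applied to the training and test sets of each model, so that overfitting shows up as a gap between the two.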

Finally, the research calculates the rural financial systemic risk by combining the dynamic factor model and machine learning, using equation (17):

$$\beta_t = \beta_{n,0} + \beta_{n,1} f_t(H_{t-1}) + \beta_{n,2} f_t(W_{t-1}) \tag{17}$$

Equation (17) indicates that rural financial systemic risk is influenced not only by the baseline $\beta_{n,0}$, but also by the macro factor $f_t(H_{t-1})$ and the micro factor $f_t(W_{t-1})$.
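Given estimated coefficients and the factor series, equation (17) yields a β path that can be read against the benchmark of 1 discussed in Section 3.1. The short pure-Python sketch below uses hypothetical coefficient values and function names for illustration only.

```python
def dynamic_beta(b0, b1, b2, f_h, f_w):
    """Equation (17): beta_t = b0 + b1 * f_t(H_{t-1}) + b2 * f_t(W_{t-1}) for each period t."""
    return [b0 + b1 * h + b2 * w for h, w in zip(f_h, f_w)]


def relative_volatility(beta):
    """Compare beta with 1, following the interpretation of beta in Section 3.1."""
    if beta > 1:
        return "more volatile than the market"
    if beta < 1:
        return "less volatile than the market"
    return "as volatile as the market"


# Hypothetical coefficients and factor values for three quarters.
betas = dynamic_beta(0.8, 0.1, -0.05, f_h=[0.0, 1.0, 4.0], f_w=[0.0, 2.0, -2.0])
labels = [relative_volatility(b) for b in betas]
```

In this toy path, offsetting macro and micro movements leave β unchanged in the second quarter, while both pushing upwards in the third quarter lift it above 1.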

4. Empirical analysis

4.1. Data and descriptive statistics

4.1.1. Description of sample data

Our research uses quarterly data from Q2 2002 to Q2 2022, 81 quarters in total, chosen in light of the actual situation of China's rural financial market and the availability and validity of data (e.g., Gu et al., 2020; Jiang et al., 2018). We use 13 macroeconomic indicators and 21 micro characteristics, 2754 macro–micro observations in total, to estimate rural financial systemic risk. The selection of indicators follows Tan and Xia (2020) and Jiang et al. (2021), with the difference that we study the systemic risk of rural finance and consider factors specific to the rural financial market, namely rural areas, farmers and agriculture. The macro data and micro financial data are unified at quarterly frequency. The risk-free rate is the one-year treasury yield converted to a quarterly rate, the market portfolio return is the CSI 300 index return, and the return of agriculture-related entities is the return of the agriculture, forestry, animal husbandry and fishery index compiled by Shenwan Hongyuan. The data are obtained from the website of the National Bureau of Statistics and the Wind database.
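The sample sizes quoted above can be checked with simple arithmetic: counting Q2 2002 through Q2 2022 inclusive gives 81 quarters, and 34 indicators over 81 quarters give 2754 observations. The sketch below also shows two common ways to turn an annual yield into a quarterly rate; the paper does not say which convention it uses, so both are offered as assumptions, and the function names are hypothetical.

```python
def quarters_inclusive(start_year, start_q, end_year, end_q):
    """Number of quarters from (start_year, start_q) to (end_year, end_q), both ends included."""
    return (end_year - start_year) * 4 + (end_q - start_q) + 1


def quarterly_rate(annual_rate, compound=True):
    """Convert an annual yield (in decimals) to a quarterly rate.

    compound=True uses (1 + r)^(1/4) - 1; compound=False uses the simple r / 4.
    """
    if compound:
        return (1.0 + annual_rate) ** 0.25 - 1.0
    return annual_rate / 4.0


n_quarters = quarters_inclusive(2002, 2, 2022, 2)  # 81
n_indicators = 13 + 21                             # macro + micro indicators
n_observations = n_quarters * n_indicators         # 2754
```

For low yields the two conversions differ only slightly, but the choice should be applied consistently to every period.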

4.1.2. Construction of macro–micro indicators

Systemic risk in rural finance is driven not only by the market itself but also by the national macroeconomic environment and other variables, such as the low return, high risk and long cycle of agricultural investment; low rural incomes, which lead to a large share of 'low quality' loans to farmers and a rising non-performing loan ratio; government investment in agriculture; and an imperfect rural insurance system. Drawing on the relevant literature, the macroeconomic indicators selected in this research therefore cover five aspects that affect rural development: the macro environment, banking, insurance, government and enterprises.

Specifically, there are four macro environment indicators: the growth index of net income per rural household YoY (Year over Year) (RHITB), consumption expenditure per rural household YoY (RHETB), rural investment YoY (RITB) and the macroeconomic sentiment index (MCI); four banking sector indicators: loans to rural financial institutions YoY (RLTB), liabilities to the central bank MoM (Month over Month) (RICHB), the deposit to loan ratio (RDLR) and the non-performing loan ratio (RNLR); one insurance indicator: the agricultural insurance income and expenditure ratio (RIER); three government-level indicators: the public budget income and expenditure ratio (PBIER), fiscal support to agriculture YoY (RFSTB) and the fiscal support to agriculture ratio (RFSR); and one enterprise-level indicator: the leverage ratio of agriculture-related enterprises (RELR). Descriptive statistics of the macro indicators are shown in Table 1.

Table 1. Descriptive statistics of macro data indicators.

Note that the rural financial institutions in the banking-sector indicators mainly refer to rural cooperative banks, rural commercial banks and rural credit cooperatives. The growth-rate indicators are recorded either YoY or MoM, and the two measures differ. YoY compares the current value with the value for the same period of the previous year (the period can be a year, a quarter or a month), whereas MoM compares adjacent periods. Recording both gives two complementary views of how an indicator grows.
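With quarterly data, the two growth measures differ only in the lag used. YoY compares quarter t with quarter t − 4, the same quarter a year earlier. The adjacent-period ("MoM") measure compares quarter t with quarter t − 1. The sketch below illustrates this on a hypothetical series; the function names and values are illustrative assumptions, not the paper's data.

```python
# Sketch: YoY and adjacent-period ("MoM") growth for a quarterly series.
# YoY uses lag 4 (same quarter of the previous year); MoM uses lag 1.

def growth(series, lag):
    """Growth x_t / x_{t-lag} - 1; None where the base value is missing or zero."""
    out = []
    for t, x in enumerate(series):
        if t < lag or series[t - lag] == 0:
            out.append(None)
        else:
            out.append(x / series[t - lag] - 1)
    return out

def yoy(series):
    return growth(series, lag=4)   # same quarter, previous year

def period_on_period(series):
    return growth(series, lag=1)   # adjacent (previous) quarter

# Hypothetical quarterly values, e.g. a rural household income series
income = [100.0, 104.0, 98.0, 110.0, 105.0, 109.2]
print(yoy(income))
print(period_on_period(income))
```

Because YoY compares the same quarter across years, it removes regular seasonal patterns. The adjacent-period measure instead picks up short-run, quarter-to-quarter movements.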

In addition, systemic risk is influenced not only by macroeconomic conditions but also by micro financial indicators. The agriculture, forestry, animal husbandry and fishery index compiled by Shenwan Hongyuan reflects the micro characteristics of agriculture-related enterprises. Based on this index, we collect 21 micro financial indicators that reflect the valuation, investment profitability and asset liquidity of these enterprises. The indicators and their abbreviations are: operating income YoY (OITB), operating income MoM (OIHB), operating cost MoM (OCHB), operating cost ratio (OCR), net profit attributable to shareholders of the parent company YoY (NPTB), net profit attributable to shareholders of the parent company MoM (NPHB), total profit YoY (PTTB), total profit MoM (PTHB), return on net assets (ER), return on total assets (TAR), gearing ratio (ALR), gearing ratio MoM (ALRHB), current ratio (CR), current ratio MoM (CRHB), quick ratio (QR), quick ratio MoM (QRHB), total asset turnover ratio (TAT), assets MoM (AHB), assets YoY (ATB), debt MoM (DHB) and debt YoY (DTB). Descriptive statistics of the micro indicators are shown in Table 2.

Table 2. Descriptive statistics of micro data indicators.

4.2. The method of PCA

4.2.1. Feasibility test

We aim to build a dynamic factor model of the systemic risk of the agriculture-related market that incorporates macro and micro data. Before doing so, we need to reduce the dimensionality of the macro and micro indicators and construct composite factors. This first requires a feasibility test to determine whether principal component analysis (PCA) is appropriate. For the Kaiser–Meyer–Olkin (KMO) test, some studies take 0.5 as the threshold (e.g., Han & Ma, 2021; Yu & Liu, 2018). Others regard data with a KMO value above 0.6 as suitable for PCA (e.g., Chen & Wang, 2010; Wang & Ren, 2021). For Bartlett's test of sphericity, a p-value below the chosen significance level indicates that the variables are correlated and are therefore suitable for PCA. The results of the KMO and Bartlett tests for the macro and micro indicators are shown in Table 3.
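Both feasibility statistics can be computed directly from the correlation matrix R of the indicators. The pure-Python sketch below uses the standard formulas. Overall KMO compares squared correlations with squared partial correlations, which are taken from the inverse correlation matrix. Bartlett's statistic is −(n − 1 − (2p + 5)/6)·ln|R|, with p(p − 1)/2 degrees of freedom. The function names and example matrix are illustrative assumptions. Turning the statistic into a p-value needs the chi-square upper tail, for example `scipy.stats.chi2.sf(stat, df)`.

```python
# Sketch: overall KMO measure and Bartlett's sphericity statistic (pure Python).
import math

def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def determinant(matrix):
    """Determinant via Gaussian elimination (product of pivots, sign-flipped per row swap)."""
    n = len(matrix)
    a = [list(map(float, row)) for row in matrix]
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det
        det *= a[col][col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            a[r] = [x - f * y for x, y in zip(a[r], a[col])]
    return det

def kmo(corr):
    """Overall KMO: sum r_ij^2 / (sum r_ij^2 + sum partial_ij^2), over i != j."""
    inv = invert(corr)
    n = len(corr)
    r2 = p2 = 0.0
    for i in range(n):
        for j in range(n):
            if i != j:
                partial = -inv[i][j] / math.sqrt(inv[i][i] * inv[j][j])
                r2 += corr[i][j] ** 2
                p2 += partial ** 2
    return r2 / (r2 + p2)

def bartlett_statistic(corr, n_obs):
    """Bartlett's chi-square statistic and its degrees of freedom."""
    p = len(corr)
    stat = -(n_obs - 1 - (2 * p + 5) / 6) * math.log(determinant(corr))
    return stat, p * (p - 1) // 2

# Illustrative 3x3 correlation matrix (not the paper's data), 81 quarterly observations
R = [[1.0, 0.5, 0.4], [0.5, 1.0, 0.3], [0.4, 0.3, 1.0]]
print(kmo(R), bartlett_statistic(R, 81))
```

For two variables the overall KMO is always exactly 0.5, which makes a convenient sanity check.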

Table 3. KMO test and Bartlett test results of macro-micro indicators.

Table 3 shows that the KMO value is 0.657 for the macro factor (FH) and 0.623 for the micro factor (FW), both above the 0.5 and 0.6 thresholds. The Bartlett tests of sphericity are also significant, indicating that the variables are correlated. PCA is therefore valid for both the macro and the micro indicators.

4.2.2. Construct factor model

We apply factor extraction to the 13 macroeconomic indicators and to the 21 microeconomic indicators separately. This gives the eigenvalue, variance contribution and cumulative variance contribution of each principal component (see Table 4). In deciding how many components to retain, we mainly consider each component's contribution to explaining the variance of the variables. As a rule of thumb, a component with an eigenvalue greater than one and a larger variance contribution is more important. Owing to space limits, Table 4 reports only the first 10 principal components of the macro and micro indicators.

Table 4. Eigenvalues and variance contribution of each principal component of macro-micro indicators.

As Table 4 shows, with seven principal components the eigenvalue of the seventh macro component is 0.612 and the cumulative variance contribution of the macro indicators is 85.923%. For the micro indicators, the eigenvalue of the seventh component is 1.000 and the cumulative variance contribution is 86.260%. With seven components, the cumulative variance contribution exceeds 85% for both sets of indicators, so these components explain most of the variation. We therefore retain the first seven principal components for both the macro and the micro indicators.
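The retention rule can be expressed as "take the smallest k whose cumulative variance contribution reaches 85%." The sketch below applies it to the rounded variance shares of the first seven components that appear in the FH and FW formulas. It also assumes that PCA was run on standardized data (the correlation matrix). In that case the eigenvalues sum to the number of variables, so eigenvalue = share × p. This is consistent with Table 4: 0.047 × 13 ≈ 0.61 and 0.048 × 21 ≈ 1.0. The function names are hypothetical.

```python
# Sketch: choosing the number of principal components by cumulative variance.
# Shares are the rounded per-component variance contributions from the FH/FW formulas.

def cumulative_shares(shares):
    """Running total of variance contributions."""
    total, out = 0.0, []
    for s in shares:
        total += s
        out.append(total)
    return out

def n_components_for(shares, target=0.85):
    """Smallest k whose cumulative variance contribution reaches the target."""
    for k, cum in enumerate(cumulative_shares(shares), start=1):
        if cum >= target:
            return k
    return len(shares)

def eigenvalue_from_share(share, n_variables):
    """With standardized variables the eigenvalues sum to n_variables."""
    return share * n_variables

macro_shares = [0.283, 0.197, 0.106, 0.085, 0.075, 0.067, 0.047]   # 13 macro indicators
micro_shares = [0.259, 0.180, 0.144, 0.097, 0.071, 0.065, 0.048]   # 21 micro indicators

print(n_components_for(macro_shares), n_components_for(micro_shares))
print(eigenvalue_from_share(macro_shares[-1], 13), eigenvalue_from_share(micro_shares[-1], 21))
```

Under the eigenvalue-greater-than-one rule alone, the seventh macro component (eigenvalue about 0.61) would be dropped. Keeping seven components rests on the 85% cumulative-variance rule.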

Figure 1 shows the heat map of the factor loading matrix of the macro and micro indicators. It indicates how strongly each underlying indicator loads on each principal component and how much each component contributes to the common factor. The scores of principal components 1 to 7 (F1–F7) are computed from the component matrices of the macro and micro indicators. The macro factor FH and the micro factor FW are then obtained by weighting each component by its variance contribution divided by the cumulative contribution of the seven retained components. We use FH and FW to estimate the systemic risk beta of the agriculture-related market. The composite factors are:

$$
FH = \frac{0.283}{0.859}F_1 + \frac{0.197}{0.859}F_2 + \frac{0.106}{0.859}F_3 + \frac{0.085}{0.859}F_4 + \frac{0.075}{0.859}F_5 + \frac{0.067}{0.859}F_6 + \frac{0.047}{0.859}F_7
$$

$$
FW = \frac{0.259}{0.863}F_1 + \frac{0.18}{0.863}F_2 + \frac{0.144}{0.863}F_3 + \frac{0.097}{0.863}F_4 + \frac{0.071}{0.863}F_5 + \frac{0.065}{0.863}F_6 + \frac{0.048}{0.863}F_7
$$

In each formula, F_1, …, F_7 are the scores of the first seven principal components of the macro and micro indicators, respectively.
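The composite factor is a variance-weighted average of the component scores. The sketch below computes it for each quarter, with weights equal to share_k divided by the cumulative contribution. Because the published shares are rounded, they sum to 0.860 (macro) and 0.864 (micro) rather than exactly 0.859 and 0.863. Weights built with the published denominators therefore sum to slightly more than one. The component scores used here are hypothetical, not the paper's data.

```python
# Sketch: composite macro factor FH (or micro factor FW) from principal component scores.

def composite_weights(shares, cumulative=None):
    """Weights share_k / cumulative; cumulative defaults to sum(shares)."""
    if cumulative is None:
        cumulative = sum(shares)
    return [s / cumulative for s in shares]

def composite_factor(scores_by_quarter, shares, cumulative=None):
    """Composite factor for each quarter from that quarter's scores F1..Fk."""
    w = composite_weights(shares, cumulative)
    return [sum(wk * fk for wk, fk in zip(w, scores)) for scores in scores_by_quarter]

MACRO_SHARES = [0.283, 0.197, 0.106, 0.085, 0.075, 0.067, 0.047]
MICRO_SHARES = [0.259, 0.180, 0.144, 0.097, 0.071, 0.065, 0.048]

# Hypothetical component scores F1..F7 for two quarters (not data from the paper)
macro_scores = [[0.5, -0.2, 0.1, 0.0, 0.3, -0.1, 0.2],
                [-0.4, 0.6, 0.0, 0.2, -0.3, 0.1, 0.0]]
FH = composite_factor(macro_scores, MACRO_SHARES, cumulative=0.859)
print(FH)
```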

Figure 1. Heat map of factor loading matrix.

4.3. Construct models to analyze systemic risk

4.3.1. The difference between CAPM and DCAPM

To compare the empirical findings under different models, Table 5 reports the parameter estimates of different models under the same regression method, and of the same model under different regression methods. In Table 5, OLS denotes ordinary least squares estimation and RR denotes ridge regression estimation. CAPM denotes the traditional capital asset pricing model, which assumes a fixed β. DCAPM denotes the dynamic CAPM that incorporates macro–micro big data and assumes a time-varying β. The CAPM parameter estimates are obtained from equation (4), and the DCAPM parameter estimates from equation (10).
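Equations (4) and (10) are not reproduced in this section. As a hedged sketch only, one standard way to write the two specifications is shown below. It is consistent with the regressors ΔRhs, FHΔRhs and FWΔRhs named in Section 4.3.2. The symbols $\Delta R_{a,t}$ (excess return of the agriculture-related index), $\Delta R_{hs,t}$ (taken here to be the excess return of the CSI 300 market portfolio), $\alpha$, $b_0$, $b_1$, $b_2$ and $\varepsilon_t$ are assumptions, not the paper's notation. Contemporaneous factors are assumed; a lagged specification would be equally plausible.

$$
\text{CAPM:}\quad \Delta R_{a,t} = \alpha + \beta\,\Delta R_{hs,t} + \varepsilon_t
$$

$$
\text{DCAPM:}\quad \beta_t = b_0 + b_1\,FH_t + b_2\,FW_t
$$

$$
\Delta R_{a,t} = \alpha + b_0\,\Delta R_{hs,t} + b_1\,(FH_t\,\Delta R_{hs,t}) + b_2\,(FW_t\,\Delta R_{hs,t}) + \varepsilon_t
$$

Substituting the linear beta into the CAPM yields the interaction regression in the last line. The time-varying $\beta_t$ can then be recovered from the estimated $b_0$, $b_1$ and $b_2$.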

Table 5. Static and dynamic CAPM with the macro-micro big data sets.

The F-values in Table 5 show that all three models are significant at the 1% level, indicating a relatively strong regression relationship between the model variables. The R2 values in Table 5 further show that the models incorporating macro–micro big data fit better. However, the p-values of the parameter estimates tell a different story for OLS. Under OLS, the coefficients of the factor model with macro–micro big data are not significant, whereas under ridge regression, which adds a penalty term, the model performs better.
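Ridge regression adds a penalty λ to the diagonal of X′X. This stabilizes the estimates when regressors are strongly correlated, as interaction terms such as FH·ΔRhs and FW·ΔRhs can be with ΔRhs. That is one plausible reason for the OLS/ridge contrast in Table 5, although the paper does not state it. The sketch below compares the two estimators on synthetic data. It uses the assumed interaction form of the dynamic CAPM, and the coefficients, penalty λ and data are hypothetical. In practice, scikit-learn's `Ridge` estimator does the same job: its `alpha` parameter plays the role of λ, and with `fit_intercept=True` the intercept is not penalized.

```python
# Sketch: OLS versus ridge for an assumed dynamic-CAPM regression
#   dR_a = alpha + b0*dR_hs + b1*(FH*dR_hs) + b2*(FW*dR_hs) + e
# on synthetic data. Coefficients, penalty and data are hypothetical.
import random

def solve(a, b):
    """Solve a x = b (square a) by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-15:
            raise ValueError("singular system")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            m[r] = [u - f * v for u, v in zip(m[r], m[col])]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def ridge(X, y, lam):
    """Solve (X'X + lam*P) b = X'y, where P is the identity except P[0][0] = 0,
    so the intercept is not penalized; lam = 0 gives ordinary least squares."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    for i in range(1, k):
        xtx[i][i] += lam
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)

def dcapm_design(mkt_excess, fh, fw):
    """Design rows [1, dR_hs, FH*dR_hs, FW*dR_hs]."""
    return [[1.0, r, h * r, w * r] for r, h, w in zip(mkt_excess, fh, fw)]

rng = random.Random(0)
n = 80
mkt = [rng.gauss(0.01, 0.08) for _ in range(n)]   # market excess returns
fh = [rng.gauss(0.0, 1.0) for _ in range(n)]      # composite macro factor
fw = [rng.gauss(0.0, 1.0) for _ in range(n)]      # composite micro factor
TRUE_COEFS = [0.002, 0.75, 0.10, -0.05]           # alpha, b0, b1, b2
X = dcapm_design(mkt, fh, fw)
y = [sum(c * x for c, x in zip(TRUE_COEFS, row)) + rng.gauss(0.0, 0.01) for row in X]

b_ols = ridge(X, y, lam=0.0)
b_ridge = ridge(X, y, lam=0.05)
print(b_ols, b_ridge)
```

Increasing λ shrinks the penalized slope coefficients toward zero, trading a little bias for lower variance. The value of λ would normally be chosen by cross-validation.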

Figure 2 shows the fit of the dynamic CAPM estimated by ridge regression. Overall, the gap between the actual and predicted values is small, and in some sample intervals the two almost coincide. This indicates that the dynamic factor CAPM with macro–micro big data, estimated by ridge regression, performs well and can be used to measure rural financial systemic risk.

Figure 2. Fitting diagram of the ridge regression model.

4.3.2. Comparison of machine learning models

We use two nonlinear regression models, decision tree and random forest, to analyse the dynamic factor model with macro and micro data and to compare the goodness of fit of different machine learning models. Because machine learning requires separate training and test sets to build the regression model, we split the sample so that the training set contains 80% of the observations. We then compute feature importances from the fitted decision tree and random forest models. Figure 3 shows the feature importances of the dynamic factor model with macro and micro big data. The x-axis gives the degree of feature importance. The y-axis lists the factors ΔRhs, FHΔRhs and FWΔRhs, which capture the influence of the market alone, of the macroeconomic environment and of micro financial characteristics, respectively.
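Impurity-based feature importance for a regression tree credits each split's reduction in squared error to the feature used for that split, and then normalizes the totals to sum to one. This mirrors the mean-decrease-in-impurity importance reported by libraries such as scikit-learn. The sketch below grows a small CART regression tree in pure Python. Its settings are assumptions, since the paper's settings are not stated in this section: squared-error splits, a fixed `max_depth`, and an ordered (chronological) 80/20 split. The feature labels stand in for ΔRhs, FHΔRhs and FWΔRhs.

```python
# Sketch: impurity-based feature importance from a small CART regression tree.
# Assumptions: squared-error splits, fixed max_depth, ordered 80/20 split.

FEATURES = ["dR_hs", "FH*dR_hs", "FW*dR_hs"]   # stand-ins for ΔRhs, FHΔRhs, FWΔRhs

def ordered_split(rows, y, train_share=0.8):
    """Keep the first train_share of observations (in time order) for training."""
    cut = int(len(y) * train_share)
    return rows[:cut], y[:cut], rows[cut:], y[cut:]

def sse(values):
    """Sum of squared deviations from the mean (node impurity times node size)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_split(rows, y):
    """(gain, feature, threshold) of the split with the largest SSE reduction, or None."""
    parent, best = sse(y), None
    for f in range(len(rows[0])):
        values = sorted(set(r[f] for r in rows))
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [yi for r, yi in zip(rows, y) if r[f] <= thr]
            right = [yi for r, yi in zip(rows, y) if r[f] > thr]
            gain = parent - sse(left) - sse(right)
            if best is None or gain > best[0]:
                best = (gain, f, thr)
    return best

def grow(rows, y, depth, max_depth, importance):
    """Split recursively, crediting each split's SSE reduction to the feature used."""
    if depth >= max_depth or len(y) < 2:
        return
    split = best_split(rows, y)
    if split is None or split[0] <= 1e-12:
        return
    gain, f, thr = split
    importance[f] += gain
    for keep in (lambda v: v <= thr, lambda v: v > thr):
        part = [(r, yi) for r, yi in zip(rows, y) if keep(r[f])]
        grow([r for r, _ in part], [yi for _, yi in part], depth + 1, max_depth, importance)

def tree_importance(rows, y, max_depth=3):
    """Normalized importances summing to 1 (all zeros if the tree never splits)."""
    importance = [0.0] * len(rows[0])
    grow(rows, y, 0, max_depth, importance)
    total = sum(importance)
    return [v / total for v in importance] if total > 0 else importance
```

A random forest would average such importances over many trees. Each tree is grown on a bootstrap sample, with a random subset of features considered at each split. Averaging in this way tends to spread importance more evenly across correlated features than a single tree does.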

Figure 3. Feature importance histogram of decision tree and random forest.

Figure 3 shows that the factors ΔRhs, FHΔRhs and FWΔRhs all have an important influence on the measurement of rural financial systemic risk. Both the decision tree and the random forest indicate that the systemic risk of the agriculture-related market is most influenced by the macroeconomic environment (FHΔRhs), although the two models assign different degrees of importance. In particular, the importance of the macroeconomic environment (FHΔRhs) is 68.7% in the decision tree model and 53.3% in the random forest model.

To compare the regression fit of the decision tree and random forest quantitatively, Table 6 reports prediction evaluation metrics for the cross-validation set, the training set and the test set. The cross-validation metrics are used to tune the hyperparameters iteratively in order to obtain a reliable and stable model. Table 6 uses five criteria:

- Mean Squared Error (MSE) is the expected value of the squared difference between predicted and actual values.
- Root Mean Squared Error (RMSE) is the square root of MSE.
- Mean Absolute Error (MAE) is the average absolute difference between predicted and actual values.
- Mean Absolute Percentage Error (MAPE) expresses that absolute difference as a percentage of the actual value.
- The coefficient of determination (R2) measures the share of the variance in the actual values explained by the model.

The smaller the first four criteria, and the larger R2, the more accurate the model.
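The five criteria can be written out directly with their standard definitions, as in the minimal pure-Python sketch below. Skipping points whose actual value is zero when computing MAPE is an assumption of this sketch, because MAPE is undefined there. The function name is hypothetical.

```python
# Sketch: MSE, RMSE, MAE, MAPE (%) and R2 for actual vs. predicted values.
import math

def regression_metrics(actual, predicted):
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    # MAPE is undefined when an actual value is zero; such points are skipped here
    pct = [abs(e / a) for e, a in zip(errors, actual) if a != 0]
    mape = 100 * sum(pct) / len(pct) if pct else float("nan")
    mean_a = sum(actual) / n
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    r2 = 1 - sum(e * e for e in errors) / ss_tot if ss_tot > 0 else float("nan")
    return {"MSE": mse, "RMSE": math.sqrt(mse), "MAE": mae, "MAPE": mape, "R2": r2}

print(regression_metrics([1, 2, 3, 4], [1, 2, 3, 5]))
```

Quarterly returns can be close to zero. When they are, MAPE can become very large even though MSE and MAE stay small, and R2 on a test set can even be negative. This is one possible reason the criteria can disagree across the training and test sets.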

Table 6. Comparison of regression fit superiority between decision tree and random forest.

By MSE, RMSE and MAE, Table 6 shows that both the decision tree and the random forest perform very well on the training and test sets. By MAPE and R2, however, both perform well on the training set but poorly on the test set, although the random forest performs better than the decision tree on the test set. Overall, the machine learning analysis confirms that the dynamic CAPM with macro–micro big data can measure rural financial systemic risk. It also shows that the random forest model predicts better than the decision tree model when dealing with complex data.

4.3.3. Estimation of rural systemic risk

The traditional CAPM is theoretically appealing but of limited practical use, because it assumes a fixed β. In this research, we construct a dynamic factor model by introducing macro and micro big data. We estimate its parameters by least squares and ridge regression, and we analyse the importance of the factors driving systemic risk with nonlinear machine learning regressions. We then measure the time-varying β of rural financial systemic risk using equation (17). Figure 4 plots the time series of the rural systemic risk β, and Table 7 reports its descriptive statistics.
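Once the dynamic-CAPM coefficients are estimated, a time-varying beta can be computed for every quarter and summarized by its mean and by the quarters in which its extremes occur. The sketch below assumes the linear form β_t = b0 + b1·FH_t + b2·FW_t. The paper's equation (17) is not reproduced in this section, so this form is an assumption. The coefficients and factor values are hypothetical.

```python
# Sketch: time-varying beta and the summary statistics of the kind reported in Table 7.
# Assumption: beta_t = b0 + b1*FH_t + b2*FW_t; all inputs below are hypothetical.

def time_varying_beta(fh, fw, b0, b1, b2):
    """Conditional beta for each quarter from the composite macro and micro factors."""
    return [b0 + b1 * h + b2 * w for h, w in zip(fh, fw)]

def quarter_labels(start_year, start_q, n):
    """Labels such as '2002Q2' for n consecutive quarters."""
    labels, year, q = [], start_year, start_q
    for _ in range(n):
        labels.append(f"{year}Q{q}")
        year, q = (year + 1, 1) if q == 4 else (year, q + 1)
    return labels

def describe(series, labels):
    """Mean, plus the maximum and minimum together with the quarter of each."""
    i_max = max(range(len(series)), key=series.__getitem__)
    i_min = min(range(len(series)), key=series.__getitem__)
    return {
        "mean": sum(series) / len(series),
        "max": (series[i_max], labels[i_max]),
        "min": (series[i_min], labels[i_min]),
    }

# Hypothetical inputs for four quarters (not the paper's estimates)
betas = time_varying_beta(fh=[0.2, -0.1, 0.5, 0.0], fw=[0.1, 0.3, -0.2, 0.0],
                          b0=0.78, b1=0.10, b2=-0.05)
stats = describe(betas, quarter_labels(2002, 2, len(betas)))
print(stats)
```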

Figure 4. Time-series diagram of rural systemic risk beta value.

Table 7. Descriptive statistics of rural financial systemic risk β value.

As shown in Figure 4, the time-varying β of rural systemic risk fluctuates around 0.8 over the sample period. It varies widely during the COVID-19 pandemic in recent years and changes relatively smoothly in other periods. Even though rural financial systemic risk has fluctuated more in recent years, it does not exceed the critical value of 1. Systemic risk is judged by comparing β with this critical value. A β equal to 1 means that the agriculture-related market has the same risk volatility as the Shanghai and Shenzhen markets. A β less than 1 means that the rural financial market is less volatile than the Shanghai and Shenzhen markets, and a β greater than 1 means that it is more volatile. A positive β indicates that prices in the agriculture-related market and in the Shanghai and Shenzhen markets move in the same direction, and a negative β indicates that they move in opposite directions.
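The reading rule above can be encoded as a small function. The sign of β gives the direction of co-movement, and its size relative to the critical value 1 gives relative volatility. For negative betas the sketch compares the magnitude |β| with 1. That is an assumption, since the text does not discuss negative values explicitly. The function name and tolerance are hypothetical.

```python
# Sketch: interpreting a beta value against the critical value 1.

def interpret_beta(beta, tol=1e-9):
    """Return (direction of co-movement, relative risk volatility)."""
    if beta > 0:
        direction = "same direction as the market"
    elif beta < 0:
        direction = "opposite direction to the market"
    else:
        direction = "no co-movement with the market"
    size = abs(beta)   # assumption: magnitude is compared with 1
    if abs(size - 1) <= tol:
        volatility = "same risk volatility as the market"
    elif size < 1:
        volatility = "lower risk volatility than the market"
    else:
        volatility = "higher risk volatility than the market"
    return direction, volatility

print(interpret_beta(0.781))
```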

As Table 7 shows, the mean of β is 0.781, the maximum is 1.026 (in Q4 2011) and the minimum is 0.424 (in Q4 2021). Because the mean is less than 1, rural financial systemic risk fluctuates less than that of the Shanghai and Shenzhen main markets. China's rural financial systemic risk has been relatively stable over the last five years, although its fluctuations widened after the outbreak of the COVID-19 pandemic. Overall, the analysis of systemic risk shows that the price fluctuations of agriculture-related enterprises are not very large. This is consistent with the usual risk–return trade-off, in which high risk goes with high return and low risk with low return.

5. Conclusion

This research integrates macro–micro big data sets with machine learning methods to study rural financial systemic risk in depth. First, drawing on the existing literature, we use quarterly data covering the last 20 years to build macro and micro economic indicators. Second, we construct composite indices by reducing the dimensionality of the indicators with PCA. We then analyse rural financial systemic risk in China with CAPM (the traditional capital asset pricing model) and DCAPM (the dynamic capital asset pricing model with macro–micro data), and compare the goodness of fit of the models. Third, we model the systemic risk of rural finance in China with nonlinear machine learning models, namely decision tree and random forest. Finally, we estimate rural financial systemic risk, analyse its time-varying characteristics and explore its volatility.

The empirical research yields three main conclusions. First, compared with the conventional Capital Asset Pricing Model (CAPM), the dynamic CAPM with macro–micro data sets measures rural financial systemic risk more comprehensively and systematically, and it performs better when estimated by ridge regression than by OLS. Second, machine learning methods are well suited to processing high-dimensional data, and the random forest regression model outperforms the decision tree model in constructing the dynamic CAPM. Finally, the systemic risk of rural finance in China is time-varying and evolving rather than fixed. It changes more smoothly and is less volatile than that of the Shanghai and Shenzhen main markets. However, the dynamics of β show that it is also influenced by the macroeconomic environment and by micro financial characteristics.

Based on these findings and on China's actual situation, we consider how to build and improve China's mechanism for preventing rural financial systemic risk at two levels: micro and macro.

At the micro level, first, the internal control systems of rural financial institutions need to be improved, and loan authorization and approval should be strictly enforced to guard against defaults. Second, monitoring of the risk resilience of agriculture-related enterprises should be strengthened. This can be done mainly through the design of their internal departmental management, ensuring relative independence and a clear division of duties between business departments. Finally, supervision of agriculture-related enterprises and rural financial institutions should be well targeted but should also follow the principle of moderation. Excessive regulatory requirements may be counterproductive and raise the level of rural financial systemic risk.

At the macro level, the macro-control mechanism for systemic risk in rural finance needs to be improved. Systemic risk is holistic, evolving and endogenous, so it needs to be anticipated well in advance. Systemic risk management for rural finance can start from the following aspects. On the one hand, the agricultural insurance system and the rural futures market should be improved. Raising farmers' incomes and building a guarantee mechanism for agricultural investment would also benefit rural economic development and risk management. On the other hand, a multi-department collaborative service mechanism should be established. At present, Local Government Units (LGUs) and financial institutions largely operate in isolation. LGUs give financial institutions little guidance, and information sharing between departments is insufficient. As a result, the rural financial system is affected by multiple entities while risk remains insufficiently diversified. A ‘government–bank–insurance’ model could therefore be built. By bringing the main participants in rural finance together, such a model could help diversify and prevent rural financial systemic risk on market-oriented principles.

References

  • Acharya, V. V., Pedersen, L. H., Philippon, T., & Richardson, M. (2017). Measuring systemic risk. The Review of Financial Studies, 30(1), 2–47. https://doi.org/10.1093/rfs/hhw088
  • Adrian, T., Crump, R. K., & Moench, E. (2015). Regression-based estimation of dynamic asset pricing models. Journal of Financial Economics, 118(2), 211–244. https://doi.org/10.1016/j.jfineco.2015.07.004
  • Ang, A., Liu, J., & Schwarz, K. (2020). Using stocks or portfolios in tests of factor models. Journal of Financial and Quantitative Analysis, 55(3), 709–750. https://doi.org/10.1017/S0022109019000255
  • Benoit, S., Colletaz, G., Hurlin, C., & Perignon, C. (2013). A theoretical and empirical comparison of systemic risk measures. SSRN Electronic Journal. https://doi.org/10.2139/ssrn.1973950
  • Billio, M., Getmansky, M., & Pelizzon, L. (2012). Dynamic risk exposures in hedge funds. Computational Statistics & Data Analysis, 56(11), 3517–3532. https://doi.org/10.1016/j.csda.2010.08.015
  • Bisias, D., Flood, M. D., Lo, A. W., & Valavanis, S. (2012). A survey of systemic risk analytics. Annual Review of Financial Economics, 76(4), 119–131. https://doi.org/10.2139/ssrn.1983602
  • Boguth, O., Carlson, M., Fisher, A., & Simutin, M. (2016). Horizon effects in average returns: The role of slow information diffusion. The Review of Financial Studies, 29(8), 2241–2281. https://doi.org/10.1093/rfs/hhw024
  • Castellani, D., Viganò, L., & Tamre, B. (2014). A discrete choice analysis of smallholder farmers’ preferences and willingness to pay for weather derivatives: Evidence from Ethiopia. Journal of Applied Business Research (JABR), 30(6), 1671–1692. https://doi.org/10.19030/jabr.v30i6.8882
  • Cederburg, S., & O’Doherty, M. S. (2016). Does it pay to bet against beta? On the conditional performance of the beta anomaly. The Journal of Finance, 71(2), 737–774. https://doi.org/10.1111/jofi.12383
  • Chen, W., & Wang, X. J. (2010). Appraisal on intellectual capital and innovation capability: An empirical study based on the 20 years’ panel data of China. Science of Science and Management of S.& T. in China, 31(5), 193–199.
  • Cosemans, M., Frehen, R., Schotman, P., & Bauer, R. (2016). Estimating security betas using prior information based on firm fundamentals. Review of Financial Studies, 29(4), 1072–1112. https://doi.org/10.1093/rfs/hhv131
  • Danielsson, J., James, K. R., Valenzuela, M., & Zer, I. (2016). Can we prove a bank guilty of creating systemic risk? A minority report. Journal of Money, Credit and Banking, 48(4), 795–812. https://doi.org/10.1111/jmcb.12318
  • Deng, K. B., Guan, Z. H., & Chen, B. (2018). Macroeconomic policies and systemic risk in China’s stock market: An approach based on integrated hybrid betas. Economic Research Journal in China, 53(8), 68–83.
  • Drobetz, W., & Otto, T. (2021). Empirical asset pricing via machine learning: Evidence from the European stock market. Journal of Asset Management, 22(7), 507–538. https://doi.org/10.1057/s41260-021-00237-x
  • Elabed, G., Bellemare, M. F., Carter, M. R., & Guirkinger, C. (2013). Managing basis risk with multiscale index insurance. Agricultural Economics, 44(4–5), 419–431. https://doi.org/10.1111/agec.12025
  • Ferson, W. E., & Siegel, A. F. (2009). Testing portfolio efficiency with conditioning information. The Review of Financial Studies, 22(7), 2735–2758. https://doi.org/10.1093/rfs/hhn112
  • Gandy, A., & Veraart, L. A. M. (2017). A Bayesian methodology for systemic risk assessment in financial networks. Management Science, 63(12), 4428–4446. https://doi.org/10.1287/mnsc.2016.2546
  • Garcia-Jorcano, L., & Sanchis-Marco, L. (2021). Systemic-systematic risk in financial system: A dynamic ranking based on expectiles. International Review of Economics & Finance, 75, 330–365. https://doi.org/10.1016/j.iref.2021.04.001
  • Gong, C. M., Luo, D., & Zhao, H. N. (2021). Liquidity risk and the beta premium. Journal of Financial Research, 44(4), 789–814. https://doi.org/10.1111/jfir.12263
  • Gong, X. L., Liu, X. H., Xiong, X., & Zhang, W. (2019). Financial systemic risk measurement based on causal network connectedness analysis. International Review of Economics & Finance, 64, 290–307. https://doi.org/10.1016/j.iref.2019.07.004
  • Gu, S. H., Kelly, B., & Xiu, D. C. (2020). Empirical asset pricing via machine learning. The Review of Financial Studies, 33(5), 2223–2273. https://doi.org/10.1093/rfs/hhaa009
  • Han, X. K., & Ma, D. G. (2021). China’s systematic financial risk assessment and early warning based on AM-BPNN model. Statistics & Decision in China, 37(4), 138–141. https://doi.org/10.13546/j.cnki.tjyjc.2021.04.030
  • Hansen, L. P., & Richard, S. F. (1987). The role of conditioning information in deducing testable restrictions implied by dynamic asset pricing models. Econometrica, 55(3), 587–613. https://doi.org/10.2307/1913601
  • He, C. Y., Wen, Z., Huang, K., & Ji, X. Q. (2022). Sudden shock and stock market network structure characteristics: A comparison of past crisis events. Technological Forecasting and Social Change, 180, 121732. https://doi.org/10.1016/j.techfore.2022.121732
  • Hill, R. V., Hoddinott, J., & Kumar, N. (2013). Adoption of weather-index insurance: Learning from willingness to pay among a panel of households in rural Ethiopia. Agricultural Economics, 44(4–5), 385–398. https://doi.org/10.1111/agec.12023
  • Jiang, F. W., Ma, T., & Zhang, H. W. (2021). High risk low return? Explanation from machine learning based conditional CAPM model. Journal of Management Sciences in China, 24(1), 109–126. https://doi.org/10.19920/j.cnki.jmsc.2021.01.007
  • Jiang, F. W., Tang, G. H., & Zhou, G. F. (2018). Firm characteristics and Chinese stocks. Journal of Management Science and Engineering, 3(4), 259–283. https://doi.org/10.3724/SP.J.1383.304014
  • Lewellen, J., & Nagel, S. (2006). The conditional CAPM does not explain asset-pricing anomalies. Journal of Financial Economics, 82(2), 289–314. https://doi.org/10.1016/j.jfineco.2005.05.012
  • Miranda, M. J., & Gonzalez-Vega, C. (2011). Systemic risk, index insurance, and optimal management of agricultural loan portfolios in developing countries. American Journal of Agricultural Economics, 93(2), 399–406. https://doi.org/10.1093/ajae/aaq109
  • Rapach, D. E., Strauss, J. K., & Zhou, G. (2013). International stock return predictability: What is the role of the United States? The Journal of Finance, 68(4), 1633–1662. https://doi.org/10.1111/jofi.12041
  • Saqib, S. E., Ahmad, M. M., Panezai, S., & Ali, U. (2016). Factors influencing farmers’ adoption of agricultural credit as a risk management strategy: The case of Pakistan. International Journal of Disaster Risk Reduction, 17, 67–76. https://doi.org/10.1016/j.ijdrr.2016.03.008
  • Squartini, T., Almog, A., Caldarelli, G., van Lelyveld, I., Garlaschelli, D., & Cimini, G. (2017). Enhanced capital-asset pricing model for the reconstruction of bipartite financial networks. Physical Review E, 96(3), 032315. https://doi.org/10.1103/PhysRevE.96.032315
  • Straetmans, S., & Chaudhry, S. M. (2015). Tail risk and systemic risk of US and Eurozone financial institutions in the wake of the global financial crisis. Journal of International Money and Finance, 58, 191–223. https://doi.org/10.1016/j.jimonfin.2015.07.003
  • Tan, Z. M., & Xia, Q. (2020). The relationship between systemic financial risk and macroeconomic fluctuation in China: A study of index measurement and dynamic impact. Financial Theory & Practice in China, (3), 8–16.
  • Turvey, C. G. (2011). Microfinance, rural finance, and development: Multiple products for multiple challenges: Discussion. American Journal of Agricultural Economics, 93(2), 415–417. https://doi.org/10.1093/ajae/aaq107
  • Wang, J. S., & Ren, Y. H. (2021). Construction, analysis and prejudgment of the financial stability index of China. Journal of Quantitative & Technological Economics in China, 38(2), 24–42. https://doi.org/10.13653/j.cnki.jqte.20210128.001
  • Yu, L. P., & Liu, J. (2018). Are principal component analysis and factor analysis suitable for scientific and technological evaluation? Taking academic journals as an example. Journal of Modern Information in China, 38(6), 73–79+137.