Article

Identification of the Debris Flow Process Types within Catchments of Beijing Mountainous Area

1 State Key Laboratory of Resources and Environmental Information Systems, Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences, Beijing 100101, China
2 University of Chinese Academy of Sciences, Beijing 100049, China
3 Jiangsu Center for Collaborative Innovation in Geographic Information Resource Development and Application, Nanjing 210023, China
4 Collaborative Innovation Center of South China Sea Studies, Nanjing 210093, China
5 School of Geographic and Oceanographic Sciences, Nanjing University, Nanjing 210023, China
6 Research Institute of Exploration and Development Dagang Oil Field, Tianjin 300280, China
* Author to whom correspondence should be addressed.
Water 2019, 11(4), 638; https://doi.org/10.3390/w11040638
Submission received: 27 February 2019 / Revised: 20 March 2019 / Accepted: 22 March 2019 / Published: 27 March 2019
(This article belongs to the Section Water Resources Management, Policy and Governance)

Abstract

Differences in sediment concentration, density, and transport mechanism determine the magnitude of destruction caused by a debris flow process (DFP). Identifying the dominating DFP type within a catchment is of paramount importance in determining efficient delineation and mitigation strategies. However, few studies have focused on the identification of the DFP types (water-flood, debris-flood, and debris-flow) based on machine learning methods. Therefore, taking Beijing as the study area, this paper establishes an integrated framework for the identification of the DFP types, which consists of an indicator calculation system, imbalanced dataset learning (borderline-Synthetic Minority Oversampling Technique (borderline-SMOTE)), and classification model selection (Random Forest (RF), AdaBoost, Gradient Boosting (GBDT)). The classification accuracies of the models were compared and the significance of the parameters was then assessed. The results indicate that Random Forest has the highest accuracy (0.752), together with the highest area under the receiver operating characteristic curve (AUROC = 0.73) and the lowest root-mean-square error (RMSE = 0.544). This study confirms that the catchment shape and the relief gradient features benefit the identification of the DFP types. In particular, the roughness index (RI) and the relief ratio (Rr) can be used to effectively describe the DFP types. The spatial distribution of the DFP types is analyzed to provide a reference for practical measures suited to the particularities of highly destructive catchments.


1. Introduction

Debris flow is one of the most influential natural disasters in mountainous areas [1,2] and it periodically causes a large number of losses of lives and properties as well as the destruction of ecosystems and infrastructures [3]. Debris flow, including water-flood, debris-flood, and debris-flow, is a constant threat to mankind and human achievements. The destruction that is based on different magnitudes of debris flow is characterized by the distinguishable sediment concentration, density, and transport mechanisms [4,5,6]. Researchers have paid great attention to susceptibility assessments of the debris flow disasters [7,8,9,10]. However, the studies failed to emphasize the practical problem that different disasters require different strategies to maintain the targeted solutions at the policy level. Therefore, the identification of the dominating debris flow process (DFP) type within catchments is of paramount importance in determining accurate and efficient tools that are necessary for the delineation and mitigation strategies in the early planning period [11].
Researchers showed that geomorphic parameters can be used to identify the catchment types. Terrain analysis explores the catchment formation mechanisms of different disaster types by revealing the relationship between the river basin size and its contribution to the basin [12,13,14,15]. Melton’s ruggedness number has been used to obtain a rapid first approximation of the potential debris flow disaster [16,17,18,19]. Additionally, it has been demonstrated that the Melton ratio, when combined with the catchment length, can be effectively used to differentiate between catchments that are prone to debris flow and debris flood [20]. Discriminant analysis using morphometric variables indicated that the basin area and fan gradient can be used to differentiate the debris flow and fluvial fan types that are based on the process [21]. Other studies have indicated that the standard deviations of the slope gradient and slope aspect are strong predictors for the identification of debris flow [22]. The assessment indicator system of debris flow with a variety of parameters has been established in previous studies. However, the contribution of the parameters to the DFP identification has not been previously studied. Therefore, this study aims to calculate the catchment parameters to determine the most significant parameter in the identification of DFP types.
Debris flow events occur sporadically; therefore, records of hazardous events are imbalanced in the number of events of each type. Traditional classification models usually improve performance by minimizing the overall classification error, so that the majority class is correctly predicted, whereas samples from the minority class tend to be incorrectly predicted [23]. Yet examples of the minority class are usually of primary interest, and their correct recognition is more important than the recognition of examples from the other classes. Such a situation often occurs during hazard assessment, where the number of destructive events that require more attention is much smaller than the number of events that are not as devastating. So far, the strategies for dealing with imbalanced datasets can be divided into three categories: under-sampling (BalanceCascade, EasyEnsemble) [24], over-sampling (Synthetic Minority Oversampling Technique (SMOTE), k-nearest neighbor (KNN)) [23], and data cleaning (Tomek links, neighborhood cleaning rule (NCL)) [25]. These methods have shown a great deal of success in domains such as the detection of fraudulent telephone calls [26], telecommunications management [27], text classification, and the detection of oil spills from satellite images [28]. For smaller datasets, over-sampling usually performs better because the samples are limited. Among the over-sampling methods, borderline-SMOTE is an extension that generates synthetic samples while considering the data distribution [29].
Traditionally, the DFP types were identified based on the geomorphologic expertise [30]. For quantitative studies, empirical models were used to establish the empirical relationship between the geometric parameters and the DFP types [31,32]. In the early stage, representative models for quantitative prediction, including the logistic regression, Bayes discriminant, and neural network, were widely used [33,34,35]. Recently, machine learning ensembles and hybrid methods have received substantial attention in many fields due to their improved performance when compared with conventional methods [36,37]. Scholars have applied Random Forest (RF) and Support Vector Machines to flood risk assessment [7,38]. Nevertheless, ensemble frameworks for the identification of the DFP types have rarely been explored.
Recently, the growth of suburban tourism has drawn increasing attention to the safety and stability of the mountainous area. Beijing, as the political, economic, and cultural center of China, is located between the Yan Mountains and Taihang Mountains. With the changing climate, the Beijing mountainous area has repeatedly experienced serious debris flows during the summer in recent years [39,40,41,42]. Under these conditions, devising targeted strategies for debris flow disasters of varying destructive power becomes a challenge. Much work has focused on debris flow hazard assessment on regional or catchment scales. However, few studies have set out to identify the specific DFP type and deduce the dominating DFP type within catchments in Beijing, which is of vital importance for the prevention and mitigation of disasters. Therefore, using the documented debris flow disaster events and remote sensing images, we present a method based on morphometric criteria for obtaining a first approximation of the DFP type within catchments in the Beijing mountainous area. We aim to determine the dominant DFP type by analyzing the morphometric parameters that are connected to the flow process.
In the rest of this paper, an integrated framework is established, which consists of indicator system establishment, imbalanced dataset learning, and classification model selection. The indicator system used in this study is divided into parameters related to the catchment shape and parameters related to the relief gradient. The imbalanced sample dataset was resampled using borderline-SMOTE. The ensemble learning models RF, AdaBoost, and GBDT were used to identify the DFP types. Finally, we analyzed the spatial distribution of the DFP types to provide a reference for the environmental management of the Beijing mountainous area and for well-directed measures suited to highly vulnerable regions.

2. Study Area

The mountainous regions that surround Beijing cover an estimated area of 10,417.5 km², accounting for 62% of the surface terrain, and extend over 160 km from east to west and 176 km from south to north (Figure 1). Five rivers are distributed in the study area, namely the Daqing, Yongding, Beiyun, Chaobai, and Jiyun Rivers, with more than 100 tributaries. Beijing is located in the semi-arid and semi-humid continental monsoon climate zone with four distinct seasons. The annual mean temperature ranges from 10 °C to 12 °C and the annual mean precipitation ranges from 238 mm to 514 mm. Approximately 75% of the precipitation occurs in the wet season, from June to September. Peak storms occasionally occur in the summer season and usually trigger debris flows. The mountainous area of Beijing has a complex geological structure, with complex folds, fractures, and shale joints. The composition of the bedrock and the damage to rock masses induced by tectonics and weathering favor the production of loose eluvial deposits, which are the main sources of the solid material involved in debris flow [43]. Due to the special geographical and climatic conditions, the catchments in Beijing are sensitive to floods, landslides, debris flows, and other natural disasters. Most debris flows in the Beijing mountainous area are distributed in the Western Mountains, Jundu Mountains, and Yan Mountains, which are separated by the Guan Gully and Chao Rivers [44]. Among these debris flow events, the best known was triggered by the thunderstorm that occurred on 21 July 2012 in the Beijing mountainous area. It caused up to 79 fatalities and losses of over RMB 100 billion.

3. Data and Methodology

3.1. Data Sources

3.1.1. ASTER-GDEM (Version 2)

The Ministry of Economy, Trade, and Industry (METI) of Japan and the United States National Aeronautics and Space Administration (NASA) jointly announced the release of the Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER) Global Digital Elevation Model Version 2 (GDEM V2) on 17 October 2011. ASTER-GDEM provides the only high-resolution elevation image dataset that covers the global land surface. The data covering the Beijing mountainous area were downloaded from the geospatial data cloud website (http://www.gscloud.cn/) of the Computer Network Information Center (CNIC). Based on the 30 m spatial resolution DEM data, the small catchments in the study area were extracted while using an ArcGIS 10.2 hydrological module [45].

3.1.2. Debris Flow Inventory

The precision of the debris flow inventory greatly affects the reliability of the analysis results [46]. In this study, the debris flow inventory of the Beijing mountainous area was obtained from the list in the document, named Debris Flow in Beijing Mountain Area [42]. The latitudes and longitudes were obtained based on Google Earth (http://www.earth.google.com). The spatial locations were then checked using optical images and aerial photographs. As mentioned in the document, Zhong et al. classified the historical events into three types based on the analysis of deposits and the simulation experiment. Finally, 705 debris flow events were classified into three types that were based on the criteria shown in Table 1.
In this study, we determined the dominating DFP type of a catchment according to the debris flow events that occurred in it. We sampled the catchments based on the following two criteria [11]. Firstly, only catchments with at least two historical events were selected. Secondly, a catchment was selected only if 80% of all its debris flow events belong to the same type. As a result, we obtained a total of 90 catchment samples, including 13 water-flood (Figure 2a), 44 debris-flood (Figure 2b), and 33 debris-flow (Figure 2c) catchments.
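The two sampling criteria can be sketched in a few lines of Python. This is only an illustration: the function name `dominant_type`, the catchment identifiers, and the event lists are hypothetical, and the second criterion is read here as "at least 80% of the events share one type".

```python
from collections import Counter

def dominant_type(event_types, min_events=2, min_share=0.8):
    """Return the dominating DFP type of a catchment, or None if it is not sampled.

    Assumption: the 80% rule is interpreted as "at least 80% of the
    recorded events belong to the same type".
    """
    if len(event_types) < min_events:
        return None  # criterion 1: at least two historical events
    label, count = Counter(event_types).most_common(1)[0]
    # criterion 2: the most frequent type must reach the required share
    return label if count / len(event_types) >= min_share else None

# Hypothetical catchments with made-up event records
catchments = {
    "A": ["debris-flow"] * 4 + ["debris-flood"],   # 4 of 5 events agree
    "B": ["water-flood"],                           # only one event
    "C": ["debris-flood", "debris-flow", "debris-flood"],  # 2 of 3 agree
}
samples = {cid: dominant_type(events) for cid, events in catchments.items()}
print(samples)
```

Only catchment A passes both criteria in this toy input; B fails the event count and C fails the share threshold.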

3.2. Parameters

Several studies have indicated that local flood-producing processes may be more easily analyzed in typical small-scale catchments than in large-scale ones, in which the regional combination and interplay of controls is more important [48,49]. Therefore, the area of the catchments analyzed in this study mainly varies from 3 to 50 km². Parameters related to the catchment shape and relief gradient can be used to model different processes [17,18,20,22]. We selected the circularity ratio (Cr), elongation ratio (Er), drainage density (Dd), and form factor (Ff) to characterize the shape features of the catchments. The roughness index (RI), Melton ratio (Mr), elevation relief ratio (Err), and relief ratio (Rr) were used to characterize the topographic features of the catchments. Table 2 defines these parameters.

3.2.1. Parameters Related to the Catchment Shape

Cr is affected by the lithological character of a catchment. The closer the Cr is to 1, the closer the catchment shape is to a circle. The ratio is more influenced by the length, frequency, and gradient of various orders than by the slope conditions and the drainage pattern of the catchment [59]. The areal properties express the planform and dimensions of the catchment. The Cr has been proven to be very promising for the characterization of the sediment dynamics [11]. To facilitate the understanding of this parameter, values of 0.79 and 0.61 are usually used as the thresholds for measuring the approximation to a rectangle or triangle [60].
The Er indicates whether the catchment may be affected by faults and other tectonic activities; a high value of Er also illustrates that the catchment is prone to erosion or accumulation [59]. An Er value that is close to 1 indicates that the catchment shape is more like a circle. An Er between 0.6 and 0.8 indicates that the catchment has strong relief and steep slopes. The higher the Er is, the higher the chances that the catchment has a higher infiltration capacity and lower runoff. In contrast to more circular catchments, the runoff in highly elongated catchments must travel greater distances to reach the catchment outlet. Therefore, a strong fluctuation and high Er are favorable morphometric conditions for the debris-flow process [22].
The Dd difference is widely applied in the characterization of the physiographic age, as proposed by Davis [61,62]. The Dd varies with the rainfall, relief, infiltration capacity of the soil, and initial anti-erosion ability of the terrain. Therefore, catchments with a higher Dd usually have a more fragmented surface and worse water impermeability [51,63].
Horton proposed the Ff to predict the flow intensity of a catchment in a defined area. The Ff has an inverse relationship with the square of the axial length and a direct relationship with the peak discharge [54].
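As a rough illustration of the four shape parameters, the sketch below uses the definitions commonly found in the morphometric literature (Cr = 4πA/P², Er = 2√(A/π)/L, Ff = A/L², Dd = total stream length/A). These formulas are an assumption here; the definitions in Table 2 may differ in detail, and the function and argument names are hypothetical.

```python
import math

def shape_parameters(area_km2, perimeter_km, length_km, stream_length_km):
    """Catchment shape parameters under commonly used (assumed) definitions."""
    cr = 4 * math.pi * area_km2 / perimeter_km ** 2      # circularity ratio
    er = 2 * math.sqrt(area_km2 / math.pi) / length_km   # elongation ratio
    ff = area_km2 / length_km ** 2                       # form factor
    dd = stream_length_km / area_km2                     # drainage density, km/km^2
    return {"Cr": cr, "Er": er, "Ff": ff, "Dd": dd}

# A perfect circle of radius 1 km: area = pi, perimeter = 2*pi, axial length = 2
circle = shape_parameters(math.pi, 2 * math.pi, 2.0, 0.0)

# A 1 km x 1 km square: area = 1, perimeter = 4, axial length taken as 1
square = shape_parameters(1.0, 4.0, 1.0, 0.0)
print(circle, square)
```

Under these definitions a circle gives Cr = Er = 1 and Ff = π/4 ≈ 0.79, while a square gives Cr = π/4 ≈ 0.79.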

3.2.2. Parameters Related to Relief Gradient

The RI reflects the dispersion and collection ability of rainfall runoff. It indicates the local diversities of the elevation and slope. Moreover, the surface roughness affects the hydraulics of overland flow and sediment transport mechanics by increasing the flow resistance that is associated with microtopographic features [64]. The roughness of the slope is not conducive to the runoff, while it is conducive to flood generation.
The Mr, which is a dimensionless parameter, is used for measuring the roughness and average slope of the catchments [13,56]. It effectively characterizes the geological disaster process type of the river basin [18]. It also reflects the tectonic activity and the sediment transportation ability of the catchments. The Mr of the debris-flow dominated catchment is usually higher than 0.5, with the slope of the catchment being greater than 4° [19]. As mentioned in the study of Welsh et al., 0.3 and 0.6 can be used as the thresholds for the identification of the DFP types [18]. Based on the results from earlier studies, the Mr can be used to distinguish water-flood, debris-flood, and debris-flow processes [31,32].
The Err is one of the indicators of geomorphological dissection [57], which reveals the evolution of the catchment geomorphology [65]. The ratio can be used to characterize the formation and processes of the catchments. The Err is a simplified index of the hypsometric integral. The Err value ranges from 0 to 1, where 1 indicates the most intense erosion of the catchment.
The Rr is equal to the tangent of the angle that is formed by two planes intersecting at the outlet of the catchment, where one represents the horizontal and the other passes through the highest point of the catchment [58]. High Rr values indicate that the catchment tends to be located in hilly regions, while low values imply plains and valleys. As for the stream slope, the inclinations of the ground surface are closely tied to the channel gradient and relief. Field studies showed a high degree of correlation between high relief and fast drainage frequency. During heavy rain, the fast drainage frequency and the steep stream channel slope lead to high discharge over a short duration [66].
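Three of the relief parameters can be sketched as follows, assuming the usual definitions: Mr = relief/√area, Rr = relief/catchment length, and Err = (mean elevation − minimum)/(maximum − minimum). These are assumptions rather than the article's Table 2 formulas, the RI is left out because its exact definition is not given in the text, and all names and input values are hypothetical.

```python
import math

def relief_parameters(h_min_m, h_max_m, h_mean_m, area_km2, length_km):
    """Relief-gradient parameters under commonly used (assumed) definitions."""
    relief_km = (h_max_m - h_min_m) / 1000.0   # catchment relief in km
    mr = relief_km / math.sqrt(area_km2)       # Melton ratio (dimensionless)
    rr = relief_km / length_km                 # relief ratio (dimensionless)
    err = (h_mean_m - h_min_m) / (h_max_m - h_min_m)  # elevation relief ratio
    return {"Mr": mr, "Rr": rr, "Err": err}

# Hypothetical catchment: 200-1200 m elevation, mean 800 m, 25 km^2, 8 km long
params = relief_parameters(200.0, 1200.0, 800.0, 25.0, 8.0)
print(params)
```

Keeping relief, length, and the square root of the area in the same unit (km here) is what makes Mr and Rr dimensionless.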

3.3. Model and Method

In this study, the framework includes data acquisition and preprocessing, parameter calculation, sample over-sampling, and classification modelling (RF, AdaBoost, GBDT). The root-mean-square error (RMSE), mean absolute error (MAE), accuracy, recall, F1-score, kappa coefficient, and area under the receiver operating characteristic curve (AUROC) were used to measure model performance. By comparing the results, the model with the best performance was selected.
The procedure mainly consists of three parts. Firstly, based on the dataset that was collected from the documentation, the imbalanced dataset was resampled using the borderline-SMOTE model; secondly, classification models were constructed while using the training dataset and the parameters were calculated to improve the classification accuracy of the testing dataset; finally, the optimal classification model was obtained to finalize the type of the unknown catchments. Figure 3 shows a detailed overview of the modelling procedure.

3.3.1. Imbalanced Learning

In imbalanced datasets, the number of samples of a given class is much higher than that of the other classes. To obtain a higher overall accuracy, most traditional classifiers tend to favor the majority class, which has a large number of samples [67]. Imbalanced datasets therefore require special attention. Class imbalance learning aims to deal with datasets with extremely skewed class distributions. Traditional methods used to alleviate the effect of imbalanced data fall into three categories: cost-sensitive learning, over-sampling, and under-sampling. Cost-sensitive learning adds a cost matrix of misclassification penalty coefficients to raise the misclassification cost of the minority samples [68]. The BalanceCascade approach is an informed under-sampling technique that reduces the information loss incurred when random under-sampling removes majority samples [24]. Over-sampling increases the number of minority class samples until it equals that of the majority class, in its simplest form by randomly duplicating minority class samples. SMOTE, which instead creates artificial samples for the minority class, has been widely used to cope with class imbalance and has achieved good performance [23]. The process works as follows.
Let $x_i$ be an instance from the minority class. To create an artificial instance from $x_i$, SMOTE first identifies the k-nearest neighbors of $x_i$ within the minority class. Subsequently, it randomly selects one neighbor $x_n$ and then generates a synthetic example along the imaginary line connecting $x_i$ and the selected neighbor:
$$x_{new} = x_i + \mathrm{rand}(0,1) \times (x_n - x_i)$$
However, an inherent weakness exists. The selected neighbor and the current sample may be in different classes. To address this weakness, researchers presented a modified minority over-sampling method, borderline-SMOTE, in which only the minority examples that are near the borderline are over-sampled using SMOTE [29]. This study includes fewer water-flood samples than debris-flood and debris-flow samples, which could affect the accuracy of the classifiers. To deal with this, we selected the borderline-SMOTE method to preprocess the sample dataset and obtained a final sample dataset with 44 samples for each class.
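The interpolation step of SMOTE can be sketched in plain Python as below. This is a minimal illustration and not the borderline-SMOTE implementation used in the study: it omits the step that restricts over-sampling to minority samples near the class border, and the function name, the toy points, and the seed are hypothetical.

```python
import math
import random

def smote_sample(x_i, minority, k=5, rng=random):
    """Create one synthetic point on the segment between x_i and a random
    one of its k nearest minority-class neighbours (plain SMOTE step)."""
    neighbours = sorted(
        (p for p in minority if p is not x_i),
        key=lambda p: math.dist(x_i, p),
    )[:k]
    x_n = rng.choice(neighbours)   # randomly selected neighbour
    gap = rng.random()             # rand(0, 1)
    return [a + gap * (b - a) for a, b in zip(x_i, x_n)]

rng = random.Random(0)  # fixed seed, only for repeatability of the sketch
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
synthetic = smote_sample(minority[0], minority, k=2, rng=rng)
print(synthetic)
```

With k = 2 the two nearest neighbours of the origin are (1, 0) and (0, 1), so the synthetic point always lies on one of the two unit segments leaving the origin.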

3.3.2. Model Training

Ensemble methods use multiple learning algorithms to improve the predictive performance of the constituent learning algorithms [69]. Unlike a statistical ensemble in statistical mechanics, which is usually infinite, a machine learning ensemble only consists of a concrete finite set of alternative models, which typically allows for much more flexible structures. Based on a combination of the strategies, alternative methods can be divided into two categories, averaging and boosting methods. The driving principle of the averaging methods is to independently build several estimators and subsequently average their predictions. On average, the combined estimator is usually better than any of the single-based estimators, because its variance is reduced (e.g., bagging methods, forests of randomized trees). In contrast, the base estimators of the boosting methods are sequentially built and then one tries to reduce the bias of the combined estimator (the former estimator). The motivation is to combine several weak models to produce a powerful ensemble, for example, AdaBoost and Gradient Boosting.

Random Forest

Random Forest (RF) is a combination of tree predictors, such that each tree depends on the values of an independently sampled random vector, with the same distribution for all trees in the forest. The bootstrap resampling method is used to extract multiple samples from the original data. A classification tree is constructed for each bootstrap sample, the predictions of all trees are combined, and the final result is obtained by voting [70]. The basic idea of RF is to combine multiple weak classifiers to form a strong classifier. These weak classifiers, which play a complementary role, reduce the impact of a single classifier error to improve the classification accuracy and stability. Randomness in the RF is the result of two randomization processes: firstly, a bootstrap sample is taken from the learning set for each tree; and secondly, a subset of the explanatory variables is randomly selected at each node. RF, as a natural nonlinear modeling tool, effectively solves multivariate predictions and therefore it is applied in many fields [71,72,73]. Furthermore, the RF model has achieved a good performance in flooding disaster assessment and risk analysis [7,74,75].
In this study, the RF model for the DFP type identification was implemented in the Python programming language. We used bootstrap sampling to draw k samples from the original training set, each of the same size as the original training set; a decision tree was built for each of the k samples, yielding k classification results. The final class of each record is determined by a vote over the k results:
$$H(x) = \arg\max_{Y} \sum_{i=1}^{k} I\left(h_i(x) = Y\right),$$
where $H(x)$ represents the combined classification model, $h_i$ is a single decision tree classification model, $Y$ is the output variable, and $I(\cdot)$ represents the indicator function.
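The voting rule H(x) amounts to counting how many trees predict each class and returning the most frequent one. A minimal sketch, with a hypothetical list of per-tree predictions for a single catchment:

```python
from collections import Counter

def forest_vote(tree_predictions):
    """Majority vote: the class Y maximising the count of trees with h_i(x) = Y.

    Ties are resolved in favour of the class that appears first in the list.
    """
    return Counter(tree_predictions).most_common(1)[0][0]

# Hypothetical outputs h_1(x) ... h_5(x) of five trees for one catchment
votes = ["debris-flood", "debris-flow", "debris-flood", "water-flood", "debris-flood"]
winner = forest_vote(votes)
print(winner)
```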
The number of features that are randomly chosen at each node is a key parameter of the RF, which may affect the stability of the model. The sensitivity of other parameters, such as the number of trees in the forest and the size of each tree (i.e., the minimum number of samples for splits), has also been studied [76,77,78]. These RF parameters can be tuned by means of resampling techniques, such as bootstrap or cross-validation. In this study, the number of features for the best split was set to the square root of the total feature number (sqrt), the number of trees was 60, and the minimum number of samples to split was set to 2. Additionally, we adopted the balanced mode to automatically adjust the weights for each class.

AdaBoost

AdaBoost is an ensemble machine learning technique that was introduced by Freund and Schapire [79]. As a boosting method, AdaBoost sequentially builds a series of classifiers on reweighted samples, where the weights are adjusted according to the errors of the previous predictions [80]. At each training stage, the weights of the samples with higher prediction errors in previous models are increased, while the weights of the samples with lower prediction errors are decreased. As the iterations proceed, samples that are difficult to predict receive more attention, which lowers the global prediction error. The final model is a linear combination of these base estimators, in which better classifiers receive higher coefficients, and vice versa. A classification and regression tree (CART) is used as the base estimator, which also allows the feature importance to be estimated after model fitting.
AdaBoost is sensitive to noisy data and outliers. In some cases, it can be less susceptible to the overfitting problem than other learning algorithms. Individual learners can be weak so long as the performance of each one is slightly better than random guessing; the final model can converge to a strong learner.
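One boosting round of the reweighting scheme described above can be sketched as follows. The article does not state which multi-class AdaBoost variant was used; the sketch assumes a SAMME-style update for three classes, and the function name and toy labels are hypothetical.

```python
import math

def adaboost_round(weights, y_true, y_pred, n_classes=3, learning_rate=1.0):
    """One SAMME-style reweighting step (assumed variant).

    Returns the coefficient of the current base classifier and the
    normalised sample weights for the next round. Requires 0 < err < 1.
    """
    miss = [t != p for t, p in zip(y_true, y_pred)]
    err = sum(w for w, m in zip(weights, miss) if m) / sum(weights)
    # Better classifiers (lower weighted error) receive a larger coefficient
    alpha = learning_rate * (math.log((1 - err) / err) + math.log(n_classes - 1))
    # Misclassified samples are up-weighted; correct ones keep their weight
    new_w = [w * math.exp(alpha) if m else w for w, m in zip(weights, miss)]
    total = sum(new_w)
    return alpha, [w / total for w in new_w]

w0 = [0.25, 0.25, 0.25, 0.25]
alpha, w1 = adaboost_round(w0, y_true=[0, 1, 2, 1], y_pred=[0, 1, 2, 0])
print(alpha, w1)
```

After normalisation, the single misclassified sample carries two thirds of the total weight, so the next base classifier concentrates on it.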
In this study, two user-configurable parameters were used for the AdaBoost training procedure, that is, the learning rate for every tree, which was set to 0.05, and the number of boosting stages, which was set to 100.

Gradient Boosting

Gradient Boosting (GBDT) is an integrated learning algorithm that consists of gradient boosting and decision trees and it automatically searches nonlinear interplay by decision-tree learning with minimal error [81]. The GBDT is a supervised machine learning algorithm and it comprises a family of powerful machine-learning techniques that have yielded promising results in a wide range of practical applications [82]. The GBDT is a type of additive model that performs classifications by combining decisions from a sequence of base classification tree models [83]. The GBDT uses a model ensemble technique, called gradient boosting, which iteratively builds a model, while improving the performance of the previous iteration model.
The name “Gradient Boosting” originates from the association of this method with gradient descent optimization [83], which is commonly used to solve classification problems by finding a local minimum of the loss function.
$$g_t(x) = \sum_{i=0}^{t-1} \eta f_i(x) = g_{t-1}(x) + \eta f_{t-1}(x)$$
Here, $g_t(x)$ is the additive classification model at iteration $t$, $\eta$ is the step-size, $L[y_i, g(x_i)]$ is the loss function, and $N$ is the number of observations. At each gradient boosting iteration, the algorithm determines a classification tree $f_t$, which moves $g_t$ in the negative gradient direction $-\partial L / \partial g$ by a step-size of $\eta$. Hence, $f_t$ is chosen to be
$$f_t = \arg\min_{f} \sum_{i=1}^{N} \left\{ -\frac{\partial L[y_i, g_t(x_i)]}{\partial g_t(x_i)} - f(x_i) \right\}^2,$$
and the algorithm sets
$$g_{t+1} = g_t + \eta f_t.$$
For classification problems with the sum-squared loss function $L = \tfrac{1}{2}\left[y_i - g_t(x_i)\right]^2$,
$$-\frac{\partial L}{\partial g_t} = y_i - g_t(x_i).$$
Therefore, $f_t$ can be written as follows,
$$f_t = \arg\min_{f} \sum_{i=1}^{N} \left[ y_i - g_t(x_i) - f(x_i) \right]^2.$$
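With the squared loss, each boosting iteration simply fits the next base learner to the current residuals. The sketch below shows this on a one-dimensional toy problem, using a regression stump as a stand-in base learner; the helper names, the data, and the choice of stump are assumptions for illustration, not the GBDT configuration of the study.

```python
def fit_stump(xs, residuals):
    """Least-squares regression stump on 1-D inputs (needs >= 2 distinct xs)."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, lm, rm)
    _, threshold, lm, rm = best
    return lambda x: lm if x <= threshold else rm

def boost(xs, ys, n_rounds=60, eta=0.1):
    """Gradient boosting with squared loss: g <- g + eta * f_t."""
    g = [0.0] * len(xs)
    history = []
    for _ in range(n_rounds):
        residuals = [y - gi for y, gi in zip(ys, g)]  # negative gradient
        f_t = fit_stump(xs, residuals)                # fit f_t to the residuals
        g = [gi + eta * f_t(x) for gi, x in zip(g, xs)]
        history.append(sum((y - gi) ** 2 for y, gi in zip(ys, g)))
    return g, history

xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
g, history = boost(xs, ys)
print(history[0], history[-1])
```

The training loss recorded in `history` never increases from one round to the next, which is the behaviour the negative-gradient step is designed to produce.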
In this study, some of the parameters of GBDT were set in advance. The learning rate for every tree was set to 0.1, the number of boosting stages to perform was set to 60, the depth for every tree was set to 6, the loss function was set as deviance, and 80% of the samples were used for fitting the individual base learners.

3.3.3. Model Validation

The goodness of fit for the classification model was evaluated while using a set of quantitative criteria, including the RMSE, MAE, recall (sensitivity), accuracy, F1-score, kappa coefficient, and AUROC.
The RMSE and MAE are often used for the validation of models, and are defined as follows,
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2},$$
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} \left|y_i - \hat{y}_i\right|,$$
where $y$ is the vector of the $n$ observed values and $\hat{y}$ is the vector of the $n$ predictions.
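Both error measures are straightforward to compute. The sketch below assumes that the three DFP types are encoded as integers (0, 1, 2), which the text does not state explicitly; the labels are made up.

```python
import math

def rmse(y_true, y_pred):
    """Root-mean-square error between observed and predicted values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mae(y_true, y_pred):
    """Mean absolute error between observed and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical integer-coded class labels
y_true = [0, 1, 2, 2]
y_pred = [0, 2, 2, 0]
print(rmse(y_true, y_pred), mae(y_true, y_pred))
```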
The confusion matrix is an efficient tool for describing the relationship between prediction and observation. It consists of the true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN). For a given DFP type, TP is the number of catchments of that type that are correctly classified, FP is the number of catchments of the other two types that are incorrectly assigned to it, TN is the number of catchments of the other two types that are correctly not assigned to it, and FN is the number of catchments of that type that are incorrectly assigned to one of the other two types. The higher the TP is and the lower the FP, the better the results [84]. Based on the four possible outcomes, the recall, precision, accuracy, F1-score, and Cohen’s kappa criteria are formulated as:
$$\mathrm{Recall} = \frac{TP}{TP + FN},$$
$$\mathrm{Precision} = \frac{TP}{TP + FP},$$
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN},$$
$$F_1\text{-score} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}.$$
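For a three-class problem, the four counts can be obtained per type by treating that type as "positive" and the other two as "negative". The sketch below follows this one-vs-rest reading, which is an assumption about how the counts were derived; the labels are made up and no guard against empty denominators is included.

```python
def one_vs_rest_counts(y_true, y_pred, positive):
    """TP, FP, TN, FN for one DFP type against the other two."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return tp, fp, tn, fn

def scores(tp, fp, tn, fn):
    """Recall, precision, accuracy, and F1-score from the four counts."""
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"recall": recall, "precision": precision, "accuracy": accuracy, "f1": f1}

y_true = ["water-flood", "debris-flood", "debris-flood",
          "debris-flow", "debris-flow", "debris-flow"]
y_pred = ["water-flood", "debris-flood", "debris-flow",
          "debris-flow", "debris-flow", "debris-flood"]
counts = one_vs_rest_counts(y_true, y_pred, positive="debris-flow")
result = scores(*counts)
print(counts, result)
```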
Kappa values of 0.6–0.8 and 0.8–1 represent substantial and almost perfect agreement between the estimation and observation, respectively [85]. The coefficient is computed as
$$\mathrm{Kappa} = \frac{p_o - p_e}{1 - p_e},$$
$$p_o = \frac{TP + TN}{TP + TN + FP + FN},$$
$$p_e = \frac{(TP + FN)(TP + FP) + (FP + TN)(FN + TN)}{(TP + TN + FP + FN)^2}.$$
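A short numerical check of the kappa coefficient, using the usual definition of chance agreement in which the denominator of p_e is the squared total number of samples. The counts are made up for illustration.

```python
def cohen_kappa(tp, fp, tn, fn):
    """Cohen's kappa from the four confusion-matrix counts."""
    n = tp + fp + tn + fn
    p_o = (tp + tn) / n                      # observed agreement
    p_e = ((tp + fn) * (tp + fp) + (fp + tn) * (fn + tn)) / n ** 2  # chance agreement
    return (p_o - p_e) / (1 - p_e)

# Hypothetical counts: p_o = 0.7 and p_e = 0.5
kappa = cohen_kappa(tp=20, fp=5, tn=15, fn=10)
print(kappa)
```

Here the classifier agrees with the observations in 70% of the cases, but half of that agreement would be expected by chance, which brings kappa down to about 0.4.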
The receiver operating characteristic (ROC) curve is another useful and standard way of assessing the predictive power and the quality of probabilistic models [36]. Graphically, the sensitivity is plotted on the y-axis against 100-specificity on the x-axis [86]. The AUROC is a quantitative index for identifying the general performance of the models [36]. The higher the AUROC, the better the model performance. The AUROC ranges from 0.5 (for an inaccurate model) to 1 (a perfect model) [35], which can be computed as,
$$\mathrm{AUROC} = \frac{TP + TN}{(TP + FP) + (TN + FN)}$$
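As a complement to the count-based expression, the area under the ROC curve can also be read as the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one. The sketch below uses this rank-based definition, which is a different technique from the formula above and is shown only for illustration; for three DFP types it would be applied one type against the rest. The scores are made up.

```python
def auroc(scores_pos, scores_neg):
    """Rank-based AUROC: share of (positive, negative) pairs in which the
    positive sample has the higher score; ties count as one half."""
    wins = 0.0
    for sp in scores_pos:
        for sn in scores_neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

# Hypothetical predicted probabilities for one DFP type
auc = auroc(scores_pos=[0.9, 0.8, 0.4], scores_neg=[0.7, 0.3])
print(auc)
```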
In this study, the modeling and validating were implemented with the “scikit-learn”, which is a package for machine learning in Python. Scikit-learn (http://scikit-learn.org) offers packages for ensemble learning, including packages for bagging and averaging methods.
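The hyperparameters reported for the three models map onto scikit-learn estimators roughly as follows. This is a sketch of a possible configuration, not the authors' script: `random_state` is an added assumption for repeatability, every unlisted option is left at the library default, and the loss is not set explicitly because the log-loss default of `GradientBoostingClassifier` was named `deviance` in older scikit-learn releases.

```python
from sklearn.ensemble import (
    AdaBoostClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
)

models = {
    # 60 trees, sqrt(features) per split, min 2 samples to split, balanced weights
    "RF": RandomForestClassifier(
        n_estimators=60,
        max_features="sqrt",
        min_samples_split=2,
        class_weight="balanced",
        random_state=0,  # assumption, not reported in the text
    ),
    # 100 boosting stages with a learning rate of 0.05
    "AdaBoost": AdaBoostClassifier(
        n_estimators=100,
        learning_rate=0.05,
        random_state=0,  # assumption
    ),
    # 60 stages, learning rate 0.1, tree depth 6, 80% subsample per base learner
    "GBDT": GradientBoostingClassifier(
        n_estimators=60,
        learning_rate=0.1,
        max_depth=6,
        subsample=0.8,
        random_state=0,  # assumption
    ),
}

for name, model in models.items():
    print(name, model.get_params()["n_estimators"])
```

Each estimator can then be trained with `fit(X, y)` on the resampled catchment parameters and compared using the validation criteria listed above.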

4. Results

4.1. Parameter Distribution Analysis

4.1.1. Distribution of the Catchment Shape

The value of Cr ranges from 0.22 to 0.79, with a mean of 0.52. Thresholds of 0.61 and 0.79 were proposed to indicate the approximation to a triangle and a rectangle, respectively. The catchments in the study area are more similar to triangles, which indicates that the permeability of the catchments is weak (Figure 4a). The value of Er is in the range of 0.42–0.9, with a mean of 0.67. A total of 74% of the catchments are in the range of 0.6–0.8, indicating that the catchments are in an active process of erosion and accumulation (Figure 4b). The value of Dd ranges from 0 to 1.96 km/km², with a mean of 0.34 km/km². The higher values are mainly obtained in catchments with an area below 10 km², in which the dense surface runoff causes sharp incision of the ground (Figure 4c). The value of Ff varies from 0.14 to 0.63, with an average of 0.36. Based on the equation, the value of 0.79 is a threshold differentiating the catchment from a circle. In contrast, lower values indicate a shorter axial length and a more intense flow discharge (Figure 4d).

4.1.2. Distribution of the Relief Gradient

The four parameters of the relief gradient have a similar spatial pattern, that is, they are high in the southwest and low in the northeast in the Beijing mountainous area. The RI value is in the range of 1.01–1.31, with a mean of 1.11. Figure 4e indicates the concentration of high values in the south of the study area. A low RI is often observed along streams or around lakes. The Mr value ranges from 0.04 to 0.91 and the mean value is 0.23. The Rr value varies from 0.02 to 0.45 and the mean value is 0.14. The distribution of the Rr is consistent with that of the Mr; high values are detected along the valley axes from the southwest to northeast of the study area. However, low values are mainly scattered in the northern part, which is known as the Yan Mountains (Figure 4f,h). The Err value varies from 0.05 to 0.89, with a mean value of 0.63. The value is concentrated in the range of 0.5–0.7. Apart from the catchments on the borderline between the mountains and plain and the catchments that are close to the lakes, the Err values are rather high (Figure 4g).
The boxplot in Figure 5 gives an overview of the basic parameter samples, grouped by the three defined DFP types. To make the results comparable, all the parameters were normalized using the Min-Max method. Most of the morphometric variables selected in this study were sensitive to the DFP types. The mean RI of the water-flood catchments was higher than that of the debris-flood and debris-flow catchments. The Cr, Mr, and Rr were, on average, higher for the debris-flow catchments and differed significantly from those of the debris-flood and water-flood catchments. An exception is Dd, which displayed lower values for the debris-flow catchments and can therefore also effectively distinguish debris-flow catchments from the other two types. However, the Er, Ff, and Err were not sensitive to the DFP types.
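The Min-Max method rescales each parameter linearly to the interval [0, 1]. A minimal sketch is given below, applied to the minimum, mean, and maximum RI values reported in Section 4.1.2; the function name is illustrative.

```python
def min_max_normalize(values):
    """Rescale a sequence linearly so that its minimum maps to 0 and its maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant parameter carries no spread; map it to 0 by convention.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Minimum, mean, and maximum RI reported for the study area.
ri_values = [1.01, 1.11, 1.31]
ri_scaled = min_max_normalize(ri_values)
print([round(v, 3) for v in ri_scaled])
```

After rescaling, the mean RI of 1.11 sits at about one third of the normalized range, which makes parameters with different units directly comparable in a single boxplot.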

4.2. Models Validation and Comparison

The debris flow inventory dataset was partitioned into subsets of 80% and 20% (Pareto principle) for training and testing, respectively. All three models (RF, AdaBoost, and GBDT) discussed in the previous sections were fitted to the training dataset and evaluated on the testing dataset in the Python environment. The model parameters were tuned by five-fold cross-validation on the training dataset, and the optimum ones were used in the final models. The performance of each model is given by the statistical parameters, the kappa coefficient, and the AUROC, which were evaluated using bootstrap resampling [87].
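The partitioning scheme can be sketched as follows: the sample indices are shuffled and split 80/20, and the training part is then divided into five folds, each of which serves once as the validation set. The sample size of 100, the seed, and the function names are assumptions made for illustration. In scikit-learn, `train_test_split` and `GridSearchCV` with `cv=5` provide comparable functionality, although the exact calls used in the study are not stated.

```python
import random

def train_test_split_indices(n_samples, test_fraction=0.2, seed=0):
    """Shuffle the sample indices and split them into training and testing parts."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = round(n_samples * test_fraction)
    return indices[n_test:], indices[:n_test]

def k_fold_indices(train_indices, k=5):
    """Yield (fit, validate) index lists; each fold is the validation set once."""
    folds = [train_indices[i::k] for i in range(k)]
    for i in range(k):
        validate = folds[i]
        fit = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield fit, validate

# Hypothetical inventory of 100 catchments.
train, test = train_test_split_indices(100)
splits = list(k_fold_indices(train, k=5))
print(len(train), len(test), [len(v) for _, v in splits])
```

Candidate parameter settings would be scored on the five validation folds, and only the selected setting would then be evaluated once on the held-out 20%.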
Table 3 lists the training and testing results of the three models. The comparison of the training and testing metrics indicates a clear decrease in the accuracy and sensitivity of all the models from training to testing. This indicates that the models overfit the training data and that further model validation is necessary. The results show that the RF model has the highest accuracy and recall (0.752 and 0.75, respectively), followed by the GBDT and AdaBoost models. Additionally, the RF model has the highest kappa coefficient of 0.625, signifying a substantial consistency between prediction and observation. The RF model also has the lowest RMSE and MAE values of 0.544 and 0.265, respectively.
The ROC curves for the three models were constructed using the training and testing datasets (Figure 6). The AUROC of the training dataset is high, indicating an almost perfect agreement between prediction and observation. For the testing dataset, in contrast, the RF model yields the highest AUROC (0.73), followed by the GBDT (0.7) and AdaBoost (0.68) models. All of the models have an acceptable classification capability. With respect to the AUROC of each class using Random Forest, debris-flow has the highest AUROC (0.78), while water-flood and debris-flood yield values of 0.74 and 0.7, respectively.

4.3. Parameter Sensitivity Analysis Based on the RF Model

The importance of each parameter can be evaluated based on the worsening of the prediction when the values of that parameter are randomly permuted. The parameter importance of each model was calculated during the training procedure with five-fold cross-validation. At the end of the training procedure, the importance of each parameter was obtained by averaging the differences, which were then normalized using the standard deviation of all importance values of each parameter.
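The permutation idea can be sketched in a few lines: the values of one parameter are shuffled across the catchments, the accuracy is re-evaluated, and the mean drop relative to the unpermuted baseline is taken as the importance. The toy classifier, the data, and the function names below are hypothetical stand-ins for a fitted model, and the normalization by the standard deviation is omitted for brevity.

```python
import random

def accuracy(model, rows, labels):
    """Fraction of rows for which the model reproduces the label."""
    return sum(model(r) == y for r, y in zip(rows, labels)) / len(rows)

def permutation_importance(model, rows, labels, n_repeats=10, seed=0):
    """Mean drop in accuracy after shuffling each column in turn."""
    rng = random.Random(seed)
    baseline = accuracy(model, rows, labels)
    importances = []
    for col in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):
            column = [r[col] for r in rows]
            rng.shuffle(column)
            permuted = [r[:col] + [v] + r[col + 1:] for r, v in zip(rows, column)]
            drops.append(baseline - accuracy(model, permuted, labels))
        importances.append(sum(drops) / n_repeats)
    return importances

# Toy stand-in for a fitted classifier: it only looks at the first parameter.
def toy_model(row):
    return 1 if row[0] > 0.5 else 0

# Hypothetical data: the first column is informative, the second is ignored.
rows = [[0.9, i / 20] for i in range(10)] + [[0.1, i / 20] for i in range(10)]
labels = [1] * 10 + [0] * 10

importances = permutation_importance(toy_model, rows, labels)
print(importances)
```

A parameter that the model never uses receives an importance of exactly zero, whereas shuffling an informative parameter degrades the accuracy and yields a positive value.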
Figure 7 shows that the parameters can be broadly divided into three groups according to the evaluation results. The two parameters that are related to the relief gradient of the catchment, RI and Rr, occupy the top two ranks. The RI has the largest effect on the identification of the DFP types, contributing 17.7% to the classification. RI is an indicator of the microtopographic features and it affects the overland flow and sediment transport mechanics. As mentioned in Section 4.1 (Figure 5), the water-flood type showed a higher RI value than the other two types. A rougher topographic surface increases the flow resistance during the sediment transport process, making it more difficult for the solid material to move with the flow. Therefore, a catchment with a higher RI value is more likely to produce water-flood. The Rr also influences the type identification, contributing 15.4%. Rr is closely related to the channel gradient and relief, and a high Rr produces a high discharge with more power. The Rr value of the debris-flow type displayed in Section 4.1 (Figure 5) was much higher than that of the other types, because the debris-flow process, which carries more solid material, requires a stronger carrying capacity. The second most important group of parameters includes the Mr, Err, Dd, and Cr, which contribute 14.2%, 13.7%, 12.5%, and 10.4% to the total classification, respectively. The last group of parameters comprises the Ff and Er, which rank seventh and eighth, indicating that these two parameters provide less information during the training procedure.

4.4. Mapping of the Debris Flow Process Type

The map of the DFP types was generated via the above-mentioned data processing framework. The proportions of the catchments that are dominated by different disaster processes vary. The results show that 179, 306, and 245 catchments are dominated by water-flood, debris-flood, and debris-flow, accounting for 20.04%, 57.32%, and 22.64% of the Beijing mountainous area, respectively (Table 4).
Figure 8 shows that the water-flood process dominates 24.52% of the catchments. The concentration of water-flood prone catchments, which are significantly influenced by dissected terrain, is higher in the Taihang Mountains. In addition, almost half of the catchments (41.92%) in the Beijing mountainous area are dominated by the debris-flood process. The debris-flood process occurs frequently and predominates in the study area. The catchments in the Yan Mountains are dominated by the debris-flood process because of the relatively gentle terrain and the slightly elongated shape. Furthermore, approximately one-third of the catchments (33.56%) belong to the debris-flow process. Catchments that are dominated by the debris-flow process are scattered across the study area. In the Taihang Mountains, catchments that are prone to the debris-flow process are concentrated around coalmines, where the abandoned coal gangue and waste supply the material source for the disaster. In the Yan Mountains, catchments that are dominated by the debris-flow process are found along faults, where the tectonics are more active.
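As a quick arithmetic check, the shares by number of catchments can be recomputed from the reported counts. The short sketch below shows that 179, 306, and 245 out of 730 catchments reproduce the percentages 24.52%, 41.92%, and 33.56% quoted in the preceding paragraph, which suggests that those figures are shares by catchment count, whereas the 20.04%, 57.32%, and 22.64% are shares by area.

```python
# Number of catchments dominated by each process type, as reported in the text.
counts = {"water-flood": 179, "debris-flood": 306, "debris-flow": 245}

total = sum(counts.values())
# Share of each type by number of catchments, in percent.
shares = {name: round(100 * n / total, 2) for name, n in counts.items()}

print(total, shares)
```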

5. Discussion

5.1. Validation against the Documentary Dataset

To validate the final classification model against the documentary data, several recorded events were considered (mainly based on the field investigation) [88,89,90,91,92,93,94,95,96,97,98]. Table 5 shows the confusion matrix of the predicted types, estimated for each catchment in the documentary dataset. Based on the final classification model, 10 of 14 catchments were correctly identified.
The validation results indicate that the model accurately classifies the water-flood process. No clear validation results were obtained for the debris-flood process owing to the lack of documentary data. However, only three out of six catchments were correctly predicted as the debris-flow process.
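The two reported ratios translate into the following rates. This is a worked arithmetic sketch; interpreting the six catchments as documented debris-flow catchments (so that the second ratio is a recall) is an assumption.

```python
# Counts reported for the documentary validation.
correct_total, n_total = 10, 14            # correctly identified / documented catchments
correct_debris_flow, n_debris_flow = 3, 6  # debris-flow catchments, as reported

overall_accuracy = correct_total / n_total
debris_flow_rate = correct_debris_flow / n_debris_flow

print(f"{overall_accuracy:.3f} {debris_flow_rate:.2f}")
```

The overall rate of about 71% is close to the testing accuracy of the RF model, while the rate for the debris-flow process is only 50%.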

5.2. Parameters Sensitivity Analysis

The RI reflects the local variability of the elevation and slope, which indicates the differences among the three process types (Figure 9). The correlation (R² = 0.36) between the RI and elevation can be used as an indicator for the identification of catchments that are dominated by the debris-flood process. However, no such relationships were observed for the other two process types. Consistent with the result proposed by Heiser et al. [11], the debris-flood process tends to form a distinctive channel-bed morphology, which is different from that of the other processes.
The ratio of the Rr and slope reveals the transport mechanisms along the flow path (Figure 10). There seems to be a relationship (R² = 0.23) between the Rr and slope of the catchments that are dominated by the water-flood process. A high ratio indicates a long flow path within a catchment, which makes the catchment more prone to the water-flood process. In contrast, the debris-flow process tends to develop in a steep channel with strong entrainment of material and water along the flow path. Additionally, the low Rr-slope ratio of the debris-flow process is in accordance with the results of previous studies [47].
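Assuming that the reported R² values denote the coefficient of determination of a simple linear fit between two catchment variables, they can be computed as the squared Pearson correlation. The sketch below illustrates the calculation with hypothetical data; it does not reproduce the values in Figures 9 and 10.

```python
def r_squared(x, y):
    """Squared Pearson correlation, equal to R^2 of a simple linear fit."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)

# Hypothetical values standing in for two catchment variables.
x = [1, 2, 3, 4]
y = [1, 3, 2, 4]
print(r_squared(x, y))
```

An R² of 0.36 or 0.23 therefore means that roughly a third or a quarter of the variance of one variable is explained by a linear dependence on the other, so these relationships are indicative rather than strong.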

5.3. Spatial Differentiation of the DFP Types

There is evidence that several factors could intensify the future debris flow risk, such as global warming and ongoing socioeconomic development in debris flow prone areas [99,100,101,102]. Climate change has caused extreme precipitation to occur more frequently in summer, so that the debris flow warning threshold calls for more attention. In addition, the expansion of construction in the mountainous area has, on the one hand, disturbed the balance of the surface water cycle; on the other hand, the prosperous economic development has intensified the hazard vulnerability. Therefore, to obtain more details regarding the spatial distribution of the catchments, the maximum continuous precipitation (MCP), moisture index (IM), distance to road, population density, and per capita Gross Domestic Product (GDP) related to debris flow were analyzed (Figure 11).
The MCP is one of the prerequisites that may lead to the outburst of debris flow during a heavy storm. Studies have shown that special attention should be paid to the changing climate, which may affect the occurrence and magnitude of hydrometeorological hazards [103]. In the study area, most of the water-flood and debris-flow catchments have an MCP of 200–300 mm, whereas most of the debris-flood catchments have an MCP above 300 mm. The moisture index (IM) influences the absorption of surface water and the soil water saturation that causes the occurrence of debris flow. According to the IM map that was obtained from the Data Center for Resources and Environmental Sciences, Chinese Academy of Sciences (RESDC; http://www.resdc.cn), water-flood catchments are mostly distributed in the arid region, with a lower IM, while debris-flood is mainly distributed in the region with a higher IM. Debris-flow is distributed in both the arid and humid regions. Soil offers the growth environment for vegetation, which, to a great extent, determines the stability of the surface material. In the study area, cinnamon soil and brown soil are the dominant types, accounting for more than 90% of the total area. Cinnamon soil is distributed in the region with intense sunshine, high soil temperature, and strong evaporation, where it is hard to efficiently conserve soil moisture, resulting in low vegetation coverage. Brown soil is mostly distributed in the region with high elevation, especially the watersheds between rivers. High-altitude areas with a suitable climate and little human interference are fit for vegetation growth. The spatial distribution of water-flood catchments is consistent with that of brown soil. Human activities, such as land use, road construction, and river bank invasion, have changed the mountainous environment and disturbed the stability of the catchments in the long term.
The artificial impervious surface hinders the discharge of debris flow, resulting in the accumulation of runoff water downstream. Most catchments in the study area are less than 1000 m away from roads. In most of the study area, the population density is 100–300 people per km² and the per capita GDP is approximately 3000 RMB/km². The rapid economic development in the mountainous area has caused an annual increase in the loss of human lives and it has increased the exposure of properties to all kinds of disasters. The lives and properties of both residents and tourists are threatened by debris flow disasters. This issue should be prioritized in disaster planning and prevention.

5.4. Model Deficiencies

Although the results of this study somewhat satisfy the classification demand, more work should be performed to make improvements. Here, we list several factors that need to be considered in future studies.
  • When referring to the recorded disaster events, the event location interpretation was difficult. It was also hard to discern the disaster types after human transformation [42]. Usually, only debris flow that caused huge losses is reported or listed in the documents. Thus, debris flow that occurred in a remote area or that did not cause damage to people was ignored. These issues lead to an incomplete disaster inventory.
  • The hydrology model was used to divide the catchments based on the DEM, supplemented by visual interpretation and manual modification. However, the catchment sizes differ greatly. The larger the catchments are, the more hazardous the events and the more complex the types. Therefore, catchments with different types of events are typically insufficient for model training and more information is needed to classify the dominant disaster type.
  • The RF model is regarded as one of the most effective and popular classification models. However, studies showed that the RF model has several drawbacks [71], for example, the algorithm tends to base the classification on the group with a larger number of samples. Therefore, the application of the RF model is limited [104].
  • When dealing with the disaster type of the particular catchment, the final choice of the prediction results should not only depend on the classification accuracy, but also on consideration of the actual field research. It is of vital importance to complete continuous simulation experiments to obtain a more suitable method.
Despite the drawbacks, the contributions of this study represent an approach that can be applied to the identification of the DFP types and to additional decision-making processes in hazard prevention.

6. Summary and Conclusions

With the consistently warming climate on global and national scales in recent years, severe extreme precipitation occurs frequently, which makes it more challenging to prepare accurately for disaster prevention. Identifying the specific debris flow process type may considerably aid decision making. The objective of this study was to develop a model framework that can be used to identify the debris flow process (DFP) types in the Beijing mountainous area. This objective was achieved by applying ensemble learning to a dataset that integrated data from multiple sources, from which the parameters related to the catchment shape and relief gradient were extracted. Based on this comprehensive dataset, three ensemble learning models (RF, AdaBoost, and GBDT) were developed. The results show that Random Forest identifies the DFP types more accurately than the other two models, with an overall accuracy of 75%. The key points of this study can be summarized as follows:
  • This work generates insights into the suitability of different ensemble learning methods for the identification of the DFP types, demonstrating that Random Forest achieves a better result when compared with AdaBoost and Gradient Boosting.
  • By developing models with different subsets of parameters, it is possible to derive insights into the different parameters and their contribution to the classification model. In particular, the RI and Rr are optimal parameters in the identification of the DFP types, while Ff and Er are not sensitive to the DFP types in the study area.
The study provides knowledge that guides the abstract decision-making process of the concerned authorities. The proposed diverse strategies that are associated with the spatial distribution of various DFP types will be beneficial for the decision-makers.

Author Contributions

N.W. and W.C. conceived and designed the experiments; N.W. and M.Z. performed the experiments; N.W. and Q.L. analyzed the data; N.W., W.C. and J.W. wrote the paper.

Funding

This research was funded by the China Institute of Water Resources and Hydropower Research (IWHR), grant number SHZH-IWHR-57, and the National Natural Science Foundation of China, grant number 41571388.

Acknowledgments

The authors are grateful for financial support from the China Institute of Water Resources and Hydropower Research (IWHR). This work was supported by the China National Flash Flood Disaster Prevention and Control Project.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Badoux, A.; Andres, N.; Techel, F.; Hegg, C. Natural hazard fatalities in Switzerland from 1946 to 2015. Nat. Hazards Earth Syst. Sci. 2016, 16, 2747–2768.
  2. Dowling, C.A.; Santi, P.M. Debris flows and their toll on human life: A global analysis of debris-flow fatalities from 1950 to 2011. Nat. Hazards 2014, 71, 203–227.
  3. Xu, Z. Flash Flood Prevention and Control; China Water & Power Press: Beijing, China, 1981.
  4. Costa, J.E. Physical Geomorphology of Debris Flows; Springer: Heidelberg/Berlin, Germany, 1984; pp. 268–317.
  5. Phillips, C.J.; Davies, T.R.H. Determining rheological parameters of debris flow material. Geomorphology 1991, 4, 101–110.
  6. Anderson, R.S.; Anderson, S.P. Geomorphology: The Mechanics and Chemistry of Landscapes; Cambridge University Press: Cambridge, UK, 2010.
  7. Zhao, G.; Pang, B.; Xu, Z.; Yue, J.; Tu, T. Mapping flood susceptibility in mountainous areas on a national scale in China. Sci. Total Environ. 2018, 615, 1133–1142.
  8. Chapi, K.; Singh, V.P.; Shirzadi, A.; Shahabi, H.; Bui, D.T.; Pham, B.T.; Khosravi, K. A novel hybrid artificial intelligence approach for flood susceptibility assessment. Environ. Model. Softw. 2017, 95, 229–245.
  9. Razavi, S.T.; Kornejady, A.; Pourghasemi, H.R.; Keesstra, S. Flood susceptibility mapping using novel ensembles of adaptive neuro fuzzy inference system and metaheuristic algorithms. Sci. Total Environ. 2017, 615, 438–451.
  10. Khosravi, K.; Pham, B.T.; Chapi, K.; Shirzadi, A.; Shahabi, H.; Revhaug, I.; Prakash, I.; Tien, D.B. A comparative assessment of decision trees algorithms for flash flood susceptibility modeling at Haraz watershed, northern Iran. Sci. Total Environ. 2018, 627, 744–755.
  11. Heiser, M.; Scheidl, C.; Eisl, J.; Spangl, B.; Hübl, J. Process type identification in torrential catchments in the eastern Alps. Geomorphology 2015, 232, 239–247.
  12. Bull, W.B. Geomorphology of Segmented Alluvial Fans in Western Fresno County; U.S. Govt. Print. Off.: California, WA, USA, 1964; pp. 89–129.
  13. Church, M.; Mark, D.M. On size and scale in geomorphology. Environ. Model. Softw. 1980, 4, 342–390.
  14. Guzzetti, F.; Marchetti, M.; Reichenbach, P. Large alluvial fans in the north-central Po Plain (northern Italy). Geomorphology 1997, 18, 119–136.
  15. Saito, K.; Oguchi, T. Slope of alluvial fans in humid regions of Japan, Taiwan and the Philippines. Geomorphology 2005, 70, 147–162.
  16. Jackson, L.E.; Kostaschuk, R.A.; Macdonald, G.M. Identification of debris flow hazard on alluvial fans in the Canadian Rocky Mountains. In Debris Flow/Avalanches: Process, Recognition, and Mitigation; Geological Society of America: Boulder, CO, USA, 1987; pp. 115–124.
  17. Pasuto, A.; Marchi, L.; Tecca, P.R. Flow processes on alluvial fans in the Eastern Italian Alps. Z. Geomorphol. 1993, 37, 447–458.
  18. Welsh, A.; Davies, T. Identification of alluvial fans susceptible to debris-flow hazards. Landslides 2011, 8, 183–194.
  19. Chou, H.T.; Lee, C.F.; Lo, C.M. The formation and evolution of a coastal alluvial fan in eastern Taiwan caused by rainfall-induced landslides. Landslides 2016, 14, 109–122.
  20. Wilford, D.J.; Sakals, M.E.; Innes, J.L.; Sidle, R.C.; Bergerud, W.A. Recognition of debris flow, debris flood and flood hazard through watershed morphometrics. Landslides 2004, 1, 61–66.
  21. De, S.F.A.; Owens, I.F. Morphometric controls and geomorphic responses on fans in the Southern Alps, New Zealand. Earth Surf. Process. Landf. 2010, 29, 311–322.
  22. Rowbotham, D.; Scally, F.D.; Louis, J. The identification of debris torrent basins using morphometric measures derived within a GIS. Geogr. Ann. Ser. A Phys. Geogr. 2005, 87, 527–537.
  23. Chawla, N.V.; Bowyer, K.W.; Hall, L.O.; Kegelmeyer, W.P. SMOTE: Synthetic minority over-sampling technique. J. Artif. Intell. Res. 2002, 16, 321–357.
  24. Liu, X.Y.; Wu, J.; Zhou, Z.H. Exploratory under-sampling for class-imbalance learning. IEEE Trans. Syst. Man Cybern. Part B 2009, 39, 539–550.
  25. Tomek, I. Two modifications of CNN. IEEE Trans. Syst. Man Cyberns. 1976, 6, 769–772.
  26. Fawcett, T. Combining data mining and machine learning for effective user profiling. In Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, Portland, Oregon, 2–4 August 1996; pp. 8–13.
  27. Ezawa, K.J.; Singh, M.; Norton, S.W. Learning goal oriented Bayesian networks for telecommunications risk management. In Proceedings of the Thirteenth International Conference on International Conference on Machine Learning, Bari, Italy, 3–6 July 1996; pp. 139–147.
  28. Lewis, D.D.; Catlett, J. Heterogenous uncertainty sampling for supervised learning. Mach. Learn. Proc. 1994, 148–156.
  29. Han, H.; Wang, W.Y.; Mao, B.H. Borderline-SMOTE: A new over-sampling method in imbalanced data sets learning. In Proceedings of the International Conference on Intelligent Computing, Hefei, China, 23–26 August 2005; pp. 878–887.
  30. Costa, J.E. Rheologic, geomorphic, and sedimentologic differentiation of water floods, hyperconcentrated flows, and debris flows. In Flood Geomorphology; Baker, V.R., Kochel, R.C., Patton, P.C., Eds.; Wiley: Hoboken, NJ, USA, 1988; pp. 113–122.
  31. Berti, M.; Simoni, A. Prediction of debris flow inundation areas using empirical mobility relationships. Geomorphology 2007, 90, 144–161.
  32. Scheidl, C.; Rickenmann, D. Empirical prediction of debris-flow mobility and deposition on fans. Earth Surf. Process. Landf. 2010, 35, 157–173.
  33. Ohlmacher, G.C.; Davis, J.C. Using multiple logistic regression and GIS technology to predict landslide hazard in northeast Kansas, USA. Eng. Geol. 2003, 69, 331–343.
  34. Rupert, M.G.; Cannon, S.H.; Gartner, J.E.; Michael, J.A.; Helsel, D.R. Using Logistic Regression to Predict the Probability of Debris Flows in Areas Burned by Wildfires, Southern California, 2003–2006; Open-File Report; U.S. Geological Survey: Reston, VA, USA, 2008.
  35. Bui, D.T.; Tuan, T.A.; Klempe, H.; Pradhan, B.; Revhaug, I. Spatial prediction models for shallow landslide hazards: A comparative assessment of the efficacy of Support Vector Machines, Artificial Neural Networks, Kernel Logistic Regression, and Logistic Model Tree. Landslides 2016, 13, 361–378.
  36. Pham, B.T.; Bui, D.T.; Dholakiad, M.B.; Prakashe, I.; Phamf, H.V.; Mehmoode, K.; Lef, H.Q. A novel ensemble classifier of rotation forest and Naive Bayer for landslide susceptibility assessment at the Luc Yen district, Yen Bai Province (Viet Nam) using GIS. Geomat. Nat. Hazards Risk 2016, 8, 649–671.
  37. Pham, B.T.; Bui, D.T.; Prakash, I.; Dholakia, M.B. Hybrid integration of multilayer perceptron Neural Networks and Machine Learning Ensembles for landslide susceptibility assessment at Himalayan area (India) using GIS. Catena 2017, 149, 52–63.
  38. Hong, H.; Tsangaratos, P.; Ilia, I.; Liu, J.; Zhu, A.X.; Chen, W. Application of fuzzy weight of evidence and data mining techniques in construction of flood susceptibility map of Poyang county, China. Sci. Total Environ. 2018, 625, 575–588.
  39. Li, Q.; Xu, Z. The distribution of debris flow in the mountainous region of Beijing. Mt. Res. 1983, 42–48. (In Chinese)
  40. Zhang, S.; Bi, X. Discussion on the controlling-measures of debris flows in the mountain areas of Beijing. Bull. Soil Water Convers. 1992, 46–51. (In Chinese)
  41. Xie, H.; Zhong, D.; Jin, H. Debris flow and landslide disasters control in mountain area of Beijing City. Bull. Soil Water Conserv. 2001, 21, 37–45. (In Chinese)
  42. Zhong, D.; Xie, H.; Wang, S.; Wei, F.; Jin, H.; Liu, S.; Tang, J.; Yang, H. Debris Flow in Beijing Mountain Area; Commercial Press: Beijing, China, 2004. (In Chinese)
  43. Zhou, J. Technique of space prediction on flush flood and debris flow disaster. J. Soil Water Conserv. 2001, 15, 112–116.
  44. Xie, Y.; Cui, J. Prevention and prediction of debris flow in Beijing. Guizhou Sci. 1992, 3, 132. (In Chinese)
  45. Cheng, W.; Wang, N.; Zhao, M.; Zhao, S. Relative tectonics and debris flow hazards in the Beijing mountain area from DEM-derived geomorphic indices and drainage analysis. Geomorphology 2016, 257, 134–142.
  46. Merz, B.; Thieken, A.H.; Gocht, M. Flood risk mapping at the local scale: Concepts and challenges. In Flood Risk Management in Europe: Innovation in Policy and Practice; Advances in Natural and Technological Hazards Research; Begum, S., Stive, M.J.F., Hall, J.W., Eds.; Springer Netherlands: Dordrecht, The Netherlands, 2007; pp. 231–251.
  47. Hungr, O.; Leroueil, S.; Picarelli, L. The Varnes classification of landslide types, an update. Landslides 2014, 11, 167–194.
  48. Merz, R.; Blöschl, G. Flood frequency hydrology: 1. Temporal, spatial, and causal expansion of information. Water Resour. Res. 2008, 44, 8432.
  49. Merz, R.; Blöschl, G. Flood frequency hydrology: 2. Combining data evidence. Water Resour. Res. 2008, 44, 147.
  50. Potter, P.E. A quantitative geomorphic study of drainage basin characteristics in the Clinch mountain area Virginia and Tennessee. J. Geol. 1953, 65, 112–113.
  51. Strahler, A.N. Quantitative Geomorphology of Drainage Basins and Channel Networks. In Strahler Handbook of Applied Hydrology; McGraw-Hill: New York, NY, USA, 1964.
  52. Schumm, S.A. Evolution of drainage systems and slopes in Badlands at Perth Amboy, New Jersey. Geol. Soc. Am. Bull. 1956, 67, 597–646.
  53. Horton, R.E. Drainage-basin characteristics. Eos Trans. Am. Geophys. Union 1932, 13, 350–361.
  54. Horton, R.E. Erosional development of streams and their drainage basins; hydrophysical approach to quantitative morphology. J. Jpn. For. Soc. 1945, 56, 275–370.
  55. Kamphorst, E.C.; Jetten, V.; Guérif, J.; Pitkänen, J.; Iversen, B.V.; Douglas, J.T.; Paz, A. Predicting depressional storage from soil surface roughness. Soil Sci. Soc. Am. J. 2000, 64, 1749–1758.
  56. Melton, M.A. The geomorphic and paleoclimatic significance of alluvial deposits in southern Arizona. J. Geol. 1965, 73, 1–38.
  57. Evans, I.S. General Geomorphometry. Derivatives of Altitude and Descriptive Statistic; Harper & Row: Lower Manhattan, NY, USA, 1972.
  58. Schumm, S.A. Sinuosity of alluvial rivers on the Great Plains. Bull. Geol. Soc. Am. 1963, 74, 1089–1099.
  59. Sreedevi, P.D.; Owais, S.; Khan, H.H.; Ahmed, S. Morphometric analysis of a watershed of south India using SRTM data and GIS. J. Geol. Soc. India 2009, 73, 543–552.
  60. Kojima, T.; Saito, K.; Kakai, T.; Obata, Y.; Saigusa, T. Circularity ratio. A certain quantitative expression for the circularity of a round figure. Okajimas Folia Anatomica Japonica 1971, 48, 153–161.
  61. Davis, W.M. Geographical Essays; Forgotten Books Ginn&Co.: Boston, MA, USA, 1909. [Google Scholar]
  62. Wooldridge, S.W.; Morgan, R.S. The Physical Basis of Geography; Longmans, Green& Co.: London, UK, 1937. [Google Scholar]
  63. Melton, M.A. Analysis of the Relations among Elements of Climate, Surface Properties, and Geomorphology; Technical report no. 11; Columbia University, Department of Geology: New York, NY, USA, 1957; Volume 2, pp. 14–33. [Google Scholar]
  64. Singh, V.P.; Liu, Q.Q. Effect of microtopography, slope length and gradient, and vegetative cover on overland flow through simulation. J. Hydrol. Eng. 2004, 9, 375–382. [Google Scholar]
  65. Wood, W.F. A Quantitative System for Classifying Landforms; Technical Report EP-124; U.S. Army Quartermaster Research and Engineering Center: Natick, MA, USA, 1960.
  66. Gopalakrishna, G.S.; Kantharaj, T.; Balasubramanian, A. Morphometric analysis of Yagachi and Hemavathi River basins around Alur Taluk, Hassan District, Karnataka, India. J. Appl. Hydrol. 2004, 17, 9–17. [Google Scholar]
  67. Chawla, N.V.; Japkowicz, N.; Kotcz, A. Editorial: Special issue on learning from imbalanced data sets. ACM SIGKDD Explor. Newsl. 2004, 6, 1–6. [Google Scholar] [CrossRef]
  68. Hand, D.J.; Vinciotti, V. Choosing k for two-class nearest neighbour classifiers with unbalanced classes. Pattern Recognit. Lett. 2003, 24, 1555–1562. [Google Scholar] [CrossRef]
  69. Dietterich, T.G. Ensemble methods in machine learning. In Proceedings of the International Workshop on Multiple Classifier Systems, Cagliari, Italy, 21–23 June 2000; Volume 1857, pp. 1–15. [Google Scholar]
  70. Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32. [Google Scholar] [CrossRef] [Green Version]
  71. Chen, X.; Ishwaran, H. Random forests for genomic data analysis. Genomics 2012, 99, 323–329. [Google Scholar] [CrossRef]
  72. Tesfamariam, S.; Liu, Z. Earthquake induced damage classification for reinforced concrete buildings. Struct. Saf. 2010, 32, 154–164.
  73. Dong, L.J.; Li, X.-B.; Peng, K. Prediction of rock burst classification using Random Forest. Trans. Nonferrous Met. Soc. China 2013, 23, 472–477.
  74. Malekipirbazari, M.; Aksakalli, V. Risk assessment in social lending via random forests. Expert Syst. Appl. 2015, 42, 4621–4631.
  75. Wang, Z.; Lai, C.; Chen, X.; Yang, B.; Zhao, S.; Bai, X. Flood hazard risk assessment model based on random forest. J. Hydrol. 2015, 527, 1130–1141.
  76. Grömping, U. Variable importance assessment in regression: Linear Regression versus Random Forest. Am. Stat. 2009, 63, 308–319.
  77. Strobl, C.; Malley, J.; Tutz, G. An introduction to recursive partitioning: Rationale, application, and characteristics of classification and regression trees, bagging, and random forests. Psychol. Methods 2009, 14, 323–348.
  78. Genuer, R.; Poggi, J.-M.; Tuleau-Malot, C. Variable selection using Random Forests. Pattern Recognit. Lett. 2010, 31, 2225–2236.
  79. Freund, Y.; Schapire, R.E. Experiments with a new boosting algorithm. In Proceedings of the Thirteenth International Conference on Machine Learning, Bari, Italy, 3–6 July 1996; Morgan Kaufmann Publishers Inc.: San Francisco, CA, USA, 1996; pp. 148–156.
  80. Drucker, H. Improving regressors using boosting techniques. In Proceedings of the Fourteenth International Conference on Machine Learning, Nashville, TN, USA, 8–12 July 1997; pp. 107–115.
  81. Friedman, J.H. Greedy function approximation: A gradient boosting machine. Ann. Stat. 2001, 29, 1189–1232.
  82. Natekin, A.; Knoll, A. Gradient boosting machines, a tutorial. Front. Neurorobot. 2013, 7, 21.
  83. Elith, J.; Leathwick, J.R.; Hastie, T. A working guide to boosted regression trees. J. Anim. Ecol. 2008, 77, 802–813.
  84. Althuwaynee, O.F.; Pradhan, B.; Park, H.J.; Lee, J.H. A novel ensemble decision tree-based CHi-squared Automatic Interaction Detection (CHAID) and multivariate logistic regression models in landslide susceptibility mapping. Landslides 2014, 11, 1063–1078.
  85. Landis, J.R.; Koch, G.G. The measurement of observer agreement for categorical data. Biometrics 1977, 33, 159–174.
  86. Gorsevski, P.V.; Jankowski, P.; Gessler, P.E. Heuristic approach for mapping landslide hazard integrating fuzzy logic with analytic hierarchy process. Control Cybern. 2006, 35, 121–146.
  87. Efron, B.; Tibshirani, R. Cross-Validation and the Bootstrap: Estimating the Error Rate of a Prediction Rule; Stanford University: Stanford, CA, USA, 1995.
  88. Lv, J.; Gao, J.; Hu, F. Characteristics and development of the debris flow fan in Shuangjinshao Gully, Beijing. Res. Soil Water Conserv. 2010, 17, 140–143.
  89. Li, B.; Gao, J.; Hu, F.; Cui, Q.; Yang, Q.; Wang, Y. Granularity parameter of debris flow deposit in Wanghugou Gully, Beijing City. Sci. Soil Water Conserv. 2011, 9, 7–10. (In Chinese)
  90. Zhao, D.; Liu, H. Debris flow movement features of Sanhezhuang Village, Fangshan District. Urban Geol. 2016, 11, 60–63. (In Chinese)
  91. Huang, L.; Han, J.; Ji, W.; Zhang, L.; Qi, G. The characteristics of Chechangbeigou debris flow of Zhoukoudian Fangshan district and its stability evaluation. Urban Geol. 2016, 11, 48–55. (In Chinese)
  92. Cao, L.; Liu, Y. Research on prevention and treatment of mudslides in Mentougou district of Beijing city. Value Eng. 2016, 35, 210–212. (In Chinese)
  93. Yang, Q.; Gao, J.; Hu, F.; Liu, Y.; Zhang, J. Characteristics of the debris flow deposits in Daxigoubeigou gully, Beijing. Chin. J. Geol. Hazard Control 2010, 21, 39–41. (In Chinese)
  94. Zhou, R.; Wei, M.; Li, D.; Zhang, B.; Liu, Z.; Liu, C.; He, Y. Selected frequency luminescence characteristics for modern turbulent debris flow materials in Qingshui river basin, Beijing. Geogr. Res. 2012, 31, 619–626. (In Chinese)
  95. Yuan, F. Risk Assessment of the Debris Flow Gully at Qiulinpu Village of Fangshan District in Beijing. Master’s Thesis, China University of Geosciences, Beijing, China, 2014. (In Chinese)
  96. Han, S. Design of Debris Flow Prevention in Huanglianggenqiao Gully, Fengjiayu Town. Master’s Thesis, Beijing Forestry University, Beijing, China, 2016. (In Chinese)
  97. Shi, M. Study on Debris Flow Prediction and Earlier Warning System for Nanjiao Catchment, Beijing. Ph.D. Thesis, Jilin University, Changchun, China, 2016. (In Chinese)
  98. Ding, G.; Wang, Y.; Wang, C.; Yao, K.; Liu, H. Analysis of flow force characteristics of Damo gully and study on disaster mechanism. China Water Transp. 2017, 17, 218–220, 223. (In Chinese)
  99. Hirabayashi, Y.; Mahendran, R.; Koirala, S.; Konoshima, L.; Dai, Y.; Watanabe, S.; Kim, H.; Kanae, S. Global flood risk under climate change. Nat. Clim. Chang. 2013, 3, 816–821.
  100. Milly, P.C.; Wetherald, R.T.; Dunne, K.A.; Delworth, T.L. Increasing risk of great floods in a changing climate. Nature 2002, 415, 514–517.
  101. Arnell, N.W.; Gosling, S.N. The impacts of climate change on river flood risk at the global scale. Clim. Chang. 2016, 134, 387–401.
  102. Ceola, S.; Laio, F.; Montanari, A. Satellite nighttime lights reveal increasing human exposure to floods worldwide. Geophys. Res. Lett. 2014, 41, 7184–7190.
  103. Reder, A.; Rianna, G.; Vezzoli, R.; Mercogliano, P. Assessment of possible impacts of climate change on the hydrological regimes of different regions in China. Adv. Clim. Chang. Res. 2016, 7, 169–184.
  104. Krušić, J.; Marjanović, M.; Samardžić-Petrović, M.; Abolmasov, B.; Andrejev, K.; Miladinović, A. Comparison of expert, deterministic and Machine Learning approach for landslide susceptibility assessment in Ljubovija Municipality, Serbia. Geofizika 2017, 34, 251–273.
Figure 1. Geographical setting of the study area.
Figure 2. Photos of water-flood, debris-flood, and debris-flow processes [47]: (a) water-flood (Evans, S.G.); (b) debris-flood (Hungr, O.); and (c) debris-flow (Suwa, H.).
Figure 3. Flow chart of the computational process.
Figure 4. Numerical classes of the parameters related to the catchment shape: (a) Cr; (b) Er; (c) Dd; (d) Ff; and of the parameters related to the relief gradient: (e) RI; (f) Mr; (g) Err; (h) Rr.
Figure 5. Boxplot of the Min-Max normalized parameters grouped by the DFP types.
Figure 6. Receiver operating characteristic (ROC) curve and area under the receiver operating characteristic curve (AUROC) for (a) the testing dataset and (b) multi-class with RF.
Figure 7. Relative importance values of the parameters for the DFP types.
Figure 8. Distribution of the DFP types based on the RF model. The black points are coal mines and the brown lines are the main faults in the Beijing mountainous area.
Figure 9. Scatter plot of the RI versus elevation values for the DFP types: (a) water-flood; (b) debris-flood; and (c) debris-flow.
Figure 10. Scatter plot of the Rr versus slope values for the DFP types: (a) water-flood; (b) debris-flood; and (c) debris-flow.
Figure 11. Relationship between the influencing factors and the catchment area for the different DFP types: (a) water-flood; (b) debris-flood; and (c) debris-flow.
Table 1. Classification criteria for the three types of debris flow events.

| Type | Fluid density (t/m³) | Volume ratio |
|---|---|---|
| Water-flood | <1.6 | 0.1–0.5 |
| Debris-flood | 1.6–1.8 | 0.4–0.6 |
| Debris-flow | ≥1.8 | 0.5–0.7 |
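
As a minimal sketch of how the fluid-density thresholds in Table 1 could be applied, the Python function below maps a density value to a DFP type. The function name and the decision to use density alone are assumptions made for illustration: the volume-ratio ranges in Table 1 overlap, so they are not used as a decision rule here, and this is not presented as the classification procedure of the paper.

```python
def classify_dfp_by_density(density_t_m3: float) -> str:
    """Return the DFP type implied by the fluid-density thresholds of Table 1.

    Hypothetical helper: only the density column is used, because the
    volume-ratio ranges of the three types overlap.
    """
    if density_t_m3 <= 0:
        raise ValueError("fluid density must be positive")
    if density_t_m3 < 1.6:       # Water-flood: < 1.6 t/m^3
        return "water-flood"
    if density_t_m3 < 1.8:       # Debris-flood: 1.6 to 1.8 t/m^3
        return "debris-flood"
    return "debris-flow"         # Debris-flow: >= 1.8 t/m^3


if __name__ == "__main__":
    for rho in (1.2, 1.6, 1.7, 1.8, 2.1):
        print(f"{rho:.1f} t/m3 -> {classify_dfp_by_density(rho)}")
```

The boundary value 1.8 t/m³ is assigned to debris-flow because Table 1 lists that class as ≥1.8; the lower boundary 1.6 t/m³ then falls in debris-flood.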
Table 2. Morphometric parameters related to the catchment shape and relief gradient.

| Definition | Function | Parameter | Reference |
|---|---|---|---|
| Parameters related to the catchment shape | | | |
| The circularity ratio (Cr) reflects the roundness of a catchment based on the relationship between the area and the circumference of the catchment [50,51]. | $\mathrm{Cr} = \frac{4\pi A}{C^{2}}$ | $A$ is the catchment area, km²; $C$ is the catchment circumference, km | [50,51] |
| The elongation ratio (Er) is defined as the ratio of the diameter of a circle with the same area as the catchment to the maximum catchment length [52]. | $\mathrm{Er} = \frac{D}{B_l}$ | $D$ is the diameter of the circle that has the same area as the catchment, km; $B_l$ is the maximum length of the catchment, km | [52] |
| The drainage density (Dd), the total stream length per unit of catchment area, is the simplest and most convenient tool for characterizing the degree of drainage development [53,54]. | $\mathrm{Dd} = \frac{L}{A}$ | $L$ is the total length of the streams, km; $A$ is the catchment area, km²; both are given in units of the same system | [53,54] |
| The form factor (Ff) is defined as the ratio of the catchment area to the square of the catchment length [54]. | $\mathrm{Ff} = \frac{A}{B_l^{2}}$ | $A$ is the catchment area, km²; $B_l$ is the maximum length of the catchment, km | [54] |
| Parameters related to the relief gradient | | | |
| The roughness index (RI) is the ratio of the surface area to its projected area [55]. | $\mathrm{RI} = \frac{S_1}{S_2}$ | $S_1$ is the surface area, km²; $S_2$ is the projected area, km² | [55] |
| The Melton ratio (Mr) is an index of the catchment ruggedness equal to the basin relief divided by the square root of the catchment area [56]. | $\mathrm{Mr} = \frac{R_A \div 1000}{\sqrt{A}}$ | $R_A$ is the catchment relief, m; $A$ is the catchment area, km² | [56] |
| The elevation relief ratio (Err) is the ratio of the difference between the average and minimum elevations of the catchment to the catchment relief [57]. | $\mathrm{Err} = \frac{h_{mean} - h_{min}}{h_{max} - h_{min}}$ | $h_{mean}$, $h_{min}$, and $h_{max}$ are the mean, minimum, and maximum elevation of the catchment, km, respectively | [57] |
| The relief ratio (Rr) is the dimensionless height-to-length ratio [58]. | $\mathrm{Rr} = \frac{R}{B_l}$ | $R$ is the relief of the catchment, km; $B_l$ is the maximum length of the catchment, km | [58] |
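
The definitions in Table 2 can be sketched as a single Python function that takes basic catchment measurements and returns the eight parameters. This is an illustrative sketch only: the function and argument names are hypothetical, the relief is passed once in metres and converted to kilometres where Table 2 calls for it, and the surface and projected areas are kept as separate inputs because Table 2 does not state how they are obtained.

```python
import math


def morphometric_parameters(area_km2, perimeter_km, max_length_km,
                            stream_length_km, surface_area_km2,
                            projected_area_km2, relief_m,
                            h_mean, h_min, h_max):
    """Compute the eight Table 2 parameters for one catchment.

    Hypothetical helper. Elevations may be in any single unit, since
    Err is a ratio of elevation differences.
    """
    if min(area_km2, perimeter_km, max_length_km, projected_area_km2) <= 0:
        raise ValueError("area, perimeter, length and projected area must be positive")
    if h_max <= h_min:
        raise ValueError("h_max must exceed h_min")

    relief_km = relief_m / 1000.0
    # Diameter of the circle with the same area as the catchment.
    equivalent_diameter_km = 2.0 * math.sqrt(area_km2 / math.pi)

    return {
        "Cr": 4.0 * math.pi * area_km2 / perimeter_km ** 2,
        "Er": equivalent_diameter_km / max_length_km,
        "Dd": stream_length_km / area_km2,
        "Ff": area_km2 / max_length_km ** 2,
        "RI": surface_area_km2 / projected_area_km2,
        "Mr": relief_km / math.sqrt(area_km2),
        "Err": (h_mean - h_min) / (h_max - h_min),
        "Rr": relief_km / max_length_km,
    }


if __name__ == "__main__":
    # Made-up circular catchment of radius 1 km, used only as a sanity check.
    params = morphometric_parameters(
        area_km2=math.pi, perimeter_km=2.0 * math.pi, max_length_km=2.0,
        stream_length_km=math.pi, surface_area_km2=1.1 * math.pi,
        projected_area_km2=math.pi, relief_m=500.0,
        h_mean=750.0, h_min=500.0, h_max=1000.0)
    for name, value in params.items():
        print(f"{name}: {value:.3f}")
```

A perfectly circular catchment is a convenient check because both Cr and Er equal 1 by construction, and Ff reduces to π/4.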
Table 3. Model performance for the training and testing datasets.

| Metric | Training: RF | Training: AdaBoost | Training: GBDT | Testing: RF | Testing: AdaBoost | Testing: GBDT |
|---|---|---|---|---|---|---|
| RMSE | 0 | 0.534 | 0 | 0.544 | 1.026 | 0.577 |
| MAE | 0 | 0.225 | 0 | 0.265 | 0.477 | 0.303 |
| Recall | 1 | 0.806 | 1 | 0.75 | 0.606 | 0.712 |
| Accuracy | 1 | 0.814 | 1 | 0.752 | 0.607 | 0.709 |
| F1-score | 1 | 0.804 | 1 | 0.738 | 0.592 | 0.702 |
| Kappa | 1 | 0.708 | 1 | 0.625 | 0.532 | 0.568 |
| AUROC | 1 | 1 | 1 | 0.73 | 0.68 | 0.7 |
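
Table 3 reports RMSE and MAE for a three-class problem, which implies that the class labels were treated as numbers. The sketch below shows one way such errors could be computed, assuming an ordinal integer coding of the three DFP types (water-flood = 0, debris-flood = 1, debris-flow = 2). That coding, the function name, and the toy labels are assumptions for illustration; Table 3 does not state the encoding that was used.

```python
import math

# Assumed ordinal coding of the DFP types (not stated in Table 3).
CLASS_CODES = {"water-flood": 0, "debris-flood": 1, "debris-flow": 2}


def label_errors(observed, predicted):
    """Return (RMSE, MAE, accuracy) for two equal-length label sequences."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    # Signed distance between predicted and observed class codes.
    diffs = [CLASS_CODES[p] - CLASS_CODES[o] for o, p in zip(observed, predicted)]
    n = len(diffs)
    rmse = math.sqrt(sum(d * d for d in diffs) / n)
    mae = sum(abs(d) for d in diffs) / n
    accuracy = sum(1 for d in diffs if d == 0) / n
    return rmse, mae, accuracy


if __name__ == "__main__":
    # Toy labels, not data from the study.
    observed = ["water-flood", "debris-flood", "debris-flow", "debris-flow"]
    predicted = ["water-flood", "debris-flow", "debris-flow", "water-flood"]
    rmse, mae, accuracy = label_errors(observed, predicted)
    print(f"RMSE={rmse:.3f}  MAE={mae:.3f}  accuracy={accuracy:.3f}")
```

Under this coding, confusing water-flood with debris-flow is penalized more heavily than confusing adjacent types, which is why RMSE and MAE can rank models differently from accuracy.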
Table 4. Catchment type classification.

| Type | Count percentage (%) | Area percentage (%) |
|---|---|---|
| Water-flood | 20.04 | 24.52 |
| Debris-flood | 57.32 | 41.92 |
| Debris-flow | 22.64 | 33.56 |
Table 5. Confusion matrix of the validation against the documentary dataset.

| Observed \ Predicted | Water-flood | Debris-flood | Debris-flow |
|---|---|---|---|
| Water-flood | 7 | 0 | 1 |
| Debris-flood | 0 | 0 | 0 |
| Debris-flow | 0 | 3 | 3 |
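
As a hedged illustration of how summary statistics could be derived from Table 5, the sketch below computes the overall accuracy and Cohen's kappa from a confusion matrix whose rows are observed types and whose columns are predicted types. The function name is hypothetical, and the matrix is transcribed from Table 5 as shown here; the resulting values are an arithmetic consequence of that transcription rather than figures reported in the table.

```python
def accuracy_and_kappa(matrix):
    """Return (overall accuracy, Cohen's kappa) for a square confusion matrix.

    Rows are observed classes and columns are predicted classes.
    """
    k = len(matrix)
    if any(len(row) != k for row in matrix):
        raise ValueError("confusion matrix must be square")
    n = sum(sum(row) for row in matrix)
    if n == 0:
        raise ValueError("confusion matrix is empty")

    # Observed agreement: share of cases on the diagonal.
    p_observed = sum(matrix[i][i] for i in range(k)) / n
    # Chance agreement from the row (observed) and column (predicted) totals.
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(matrix[i][j] for i in range(k)) for j in range(k)]
    p_expected = sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2
    if p_expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")

    kappa = (p_observed - p_expected) / (1.0 - p_expected)
    return p_observed, kappa


# Table 5, order: water-flood, debris-flood, debris-flow.
TABLE_5 = [
    [7, 0, 1],
    [0, 0, 0],
    [0, 3, 3],
]

if __name__ == "__main__":
    accuracy, kappa = accuracy_and_kappa(TABLE_5)
    print(f"accuracy={accuracy:.3f}  kappa={kappa:.3f}")
```

For the matrix as tabulated, 10 of the 14 documented catchments lie on the diagonal, giving an accuracy of about 0.714 and a kappa of about 0.517. No debris-flood catchment appears among the observed cases, so the validation says nothing about recall for that type.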

Share and Cite

MDPI and ACS Style

Wang, N.; Cheng, W.; Zhao, M.; Liu, Q.; Wang, J. Identification of the Debris Flow Process Types within Catchments of Beijing Mountainous Area. Water 2019, 11, 638. https://doi.org/10.3390/w11040638

