Landslides Susceptibility Modelling using Multivariate Logistic Regression Model in the Sahla Watershed in Northern Morocco

This study aimed to assess landslide susceptibility in the Sahla watershed in northern Morocco. Landslides hazard is the most frequent phenomenon in this part of the state due to its mountainous precarious environment. The abundance of rainfall makes this area suffer mass movements led to a notable adverse impact on the nearby settlements and infrastructures. There were 93 identified landslide scars. Landslide inventories were collected from Google Earth image interpretations. They were prepared out of landslide events in the past, and future landslide occurrence was predicted by correlating landslide predisposing factors. In this paper, landslide inventories are divided into two groups, one for landslide training and the other for validation. The Landslide Susceptibility Map (LSM) is prepared by Logistic Regression (LR) Statistical Method. Lithology, stream density, land use, slope curvature, elevation, topographic wetness index, slope aspect, and slope angle were used as conditioning factors. The Area Under the Curve (AUC) of the Receiver Operating Characteristic (ROC) was employed to examine the performance of the model. In the analysis, the LR model results in 96% accuracy in the AUC. The LSM consists of the predicted landslide area. Hence it can be used to reduce the potential hazard linked with the landslides in the Sahla watershed area in Rif Mountains in northern Morocco.


INTRODUCTION
Mass movements are the most frequent natural hazards that affect large areas of the Rif mountains region in Northern Morocco, mostly triggered by heavy rainfall. It is one of the most reoccurring phenomena along with the Mountains chain threatening infrastructure and human properties.
Within the context of mass wasting, landslides can affect communities and influence their activities. Thus, mapping and delineating susceptible zones to landslides is important for land use activities and management decision making.
The method implemented in this paper has the overall objective of developing an understanding of slope instability processes and patterns at a regional scale.
The main objective of this study is to assess landslide hazards in the Sahla watershed which is a subject that has not to gain much interest in scientific publications in the Rif area. It is expected that during the process, many conditioning factors affecting slope instability in the Rif mountains will be known, thus giving land-use planners working on landslides the ability to make appropriate decisions based on the quantify analyses of the the spatial probability (susceptibility) of landslide hazards in the Sahla watershed with the use of LR. Multivariate statistical model in order To build a consistent landslide inventory for the study area using aerial photographs, satellite images, literature review, and field survey cartography.

STUDY AREA
Sahla sub-catchment is located in the Central Rif mountains, is a part of Wadi (river) catchment named Ouerrha (Figure 1) limited from Northeast by Sra subcatchment Wadi, Southeast by Ouerrha Wadi, from the West by Aoulai Wadi, on the south part is the confluence with Ouerrha Wadi. Its boundaries were defined by a ridgeline in the total area of 175 Km 2 . This area was chosen for its geological and geomorphological characteristics.
The study area belongs administratively to the region Fes Meknes, province of Taounate, municipality of Ghafsai, characterized by a high density of population (82.36 inhabitants per km 2 ). (HCP, 2014)

Environmental Data
Landslide inventories can be developed from field surveys by interpretation of remotely sensed images based on either the spectral characteristics, shape, contrast, and the morphological expression (Kanungo et al., 2006), or aerial photographs  and Google images interpretation (Xu et al., 2013). The largest number of Landslides were mapped from Google Earth images interpretation of Central Rif. A total of 93 landslide scars were mapped ( Figure 2). To use the landslide Data from Google Earth in the GIS environment, it is required to digitize the Data from Google Earth images interpretation. Then, these items were saved to the computer as GIS compatible format, and the Data was again subsequently converted into shapefile format, then into a raster format.
In susceptibility assessment, it is crucial to assume that future landslides will occur in the same condition that caused the past landslides (Varnes, 1984). There are no strict guidelines for causal factors selection to be used in landslides modeling, and as such, the selected predisposing factors vary widely between studies . Also, the determination of landslide predisposing factors was associated with the availability of Data. The entire landslide causal factors that this paper has used also fall in this category. Source: By the authors Landslide Data was used as a dependent variable of eight causal factors including slope, curvature, aspect, stream density, lithological facies, and land use pattern which were selected as independent variables for the landslide hazard mapping. All of these data are commonly employed in landslide susceptibility analysis. Budimir et al. (2015) mention that in a total of 37 variables commonlly used slope, aspect, and lithology, are significantly used especially on studies regarding rainfall-induced landslides. The relevance of the spatial Data combination used in the prediction became an important issue in mass movements susceptibility analysis (Dewitte et al., 2010). A high quality DEM provides a high quality of its derivatives.
In order to to carry out detailed geomorphological analysis, a DEM with 5m pixel size of the study area was built, it is generated from two types of data: countours with 5m interval and quoted points, the altimetric data is derived from the Moroccan National Agency for Land Conservation, Cadastre and Cartography (ANCFCC) at 1:50000 scale.

Slope Angles and Aspects
The slope angle is known as the inclination between the horizontal plane and the slope topographic surface. For classification objectives, it was considered the parameters already adopted in different works of literature and authors all around the world (Guillard-Gonçalves, 2016). The relationship between slope angle and landslide occurrence is very strong (Guzzetti et al., 2005). Thus, slope angles that have higher values, at least, up to a certain value range, tend to be related to an increase in landslide occurrence. Almost 70% of the watershed area is dominated by slope angles below or equal to 15° and that only 1.5% of the study area has slope angles above 30° (Figure 3).
Aspect is known as a plane tangent to a topographic surface. It identifies the downslope direction of the maximum rate of change in value from each cell to its neighbors. Thus, the aspect can be identified as the slope orientation in azimuth. Aspect is measured clockwise in degrees from 0 (North azimuth) to 360, coming full circle. The value of each cell in an aspect dataset indicates the direction of the cell's slope faces.
Flat areas having no downslope direction are given a value of -1 (Burrough, 1986).
The slope aspect is recognized as a crucial topographic factor. It affects the quantity and daily cycle of solar radiation received at different times of the year and has a big influence on the microclimate, especially air temperature, humidity, and soil moisture (Rosenberg et al., 1983). All these influences must be taken into consideration. Thus, incorporating the aspect as a predisposing factor for landslide susceptibility assessment through the statistically based model makes too much sense. The slopes, within the study area, are mostly exposed to Southwest and West ( Figure 4).

Inverse of the Wetness Index
The Topographic Wetness Index (IWI) is generally used to simulate the soil moisture conditions quantitatively in a watershed, and it is commonly used as an indicator for static soil moisture content ( Figure 5). Thus, it is considered an important factor in the research of soil erosion and distributed hydrological models in watersheds (Sørensen et al., 2006). While concave areas can retain water (high IWI values), steep and convex areas are more prone to shed water (low IWI values). The IWI uses Flow Direction and Flow Accumulation raster's as inputs.
Flow direction is derived from the digital elevation model, and, from it, we can obtain the contributing area (Flow Accumulation). Typically, the IWI values range from less than 1 (dry cells) to greater than 20 (wet cells). Threshold values are applied to the output raster, via classification, based on the researcher's knowledge of the field, field characteristics, and observations of the local terrain's response to heavy precipitation and runoff. Specifically, the IWI relates drainage areas with slope variations within a watershed and it can be expressed by the Equation 1, defined by Beven and Kirkby (1993): Where a is contributing upstream area (m 2 ) from flow accumulation raster, and β is the local slope angle (degrees). It is important to mention that for its calculations it is important to convert degrees to radians.
The Inverse Wetness Index application (Equation. 2) avoids the errors arising where cell division matches with β = 0, since a, corresponding to the denominator value (Oliveira, 2012).
There are a couple of algorithms to calculate flow direction: D8 and D∞. For this paper, the algorithm D∞ was selected. Such an algorithm enables the determination of multiple flow directions, providing thus, better results when compared to algorithms that only assume 8 possible directions of flow (Sørensen et al., 2006). The procedure was done under an application called TauDEM (Terrain Analysis Using Digital Elevation Models) for ArcGIS software, it requires the existence of a DEM free of sinks. Then, the flow direction model was derived from it. The TWI of the Sahla watershed was categorized into 7 classes to reveal better discrimination. For this reason, we applied a range of classes based on a logarithmic progression of base 10. The TWI of Sahla watershed demonstrates that the beginning class is the areas where β = 0. such areas are mostly located in the valley bottoms. The spatial distribution of potential water accumulation, it can be observed that generally increases due to the proximity to the streams (TWI classes ]0-0.00001] and ]0.00001-0.0001]) being, the permanent or ephemeral streams, the locations where water accumulates. The steepest slope areas are associated with the TWI classes ]0.0001 -0.001] and ]0.001 0.01], and the interfluves areas are dominated by the TWI classes ]0.01 -0.1] and > 0.1.

Stream density
Stream density or wetted index is a commonly used method to simulate the amount of water in the soil quantitatively (Beven and Kirkby, 1993). It was used to approximate the distribution of groundwater circulating in the study area. It is carried out by defining the number of line elements of fixed length in a fixed area (Süzen and Doyuran, 2004), it is calculated by dividing the total length of streams by the watershed area (Equation 3) Stream density creates a relationship between drainage areas and slope variations within a catchment area.
The numbers of line elements were calculated per km. As expected, the concentration of streams and the wetted index diminish with distance length linear magnitude per unit area. In the classification of the stream density of underground water circulation, no preference was given to any zone (Figure 6), and the area was classified into seven classes of equal density. Around 43% of the study area has a stream density between 4.2-6.2 while the highest density class (12.6-14.6) of the stream density map occupies just 1.56%.
Concave slopes with low gradient, usually drain water into it, and it leads to high giving a high value of Wetted Index, while convex slopes allow water to flow away from it giving these areas a low wetted index value. Generally, the stream density index, range from less than one in very dry areas to more than twenty in very humid areas. This index increases with increasing proximity to the hydrographical network with permanent streams having a higher wetted index than seasonal watercourses. The map was classified permitting the area of each class to be calculated.

Hypsometry
The hypsometry of the study area has altitudes ranging from 250m to 1200m and the general relief can be divided into three main units (Figure 7).This includes the southern part, which is a relatively flat area, the middle part where the dam is located, and the northern area that constitutes of highest altitudes in the area. These unities were classified following their altitudes, shape, and depth which are important components in relief defining. The study area has altitudes ranging from 250 m to 1200 m.

Lithological Facies
Lithology is among the most important conditioning factor affecting the mechanisms of mass movements (Terzaghi, 1953) and plays a fundamental role in the formation of shallow materials. It has a key impact in monitoring the nature and rate of geomorphological processes happening on the slopes. Landslides being a geomorphological process partially depend on the lithology and weathering specifications of the underlying materials (Selby, 1993). The lithology factor of the Sahla watershed is developed from the geology map of the Taounate-Ain Aïcha region with a scale of 1:50000 (Suter, 1964), (Figure 8). Detail lithological formations could not be determined at this scale. Therefore, small lithological facies areas could not be identified.

Slope curvature
Slopes curvature is the inverse of the radius of a circle tangent to the soil surface and it can be measured in three ways; longitudinal profile, transversal profile, or a tangential profile (Clerici et al., 2010). It is difficult to compare the relationship between curvature and slope instability due to the unspecified curvature types employed. Generally, the concave slopes are most susceptible, because it is associated with the focus of surface and subsurface runoff (Zêzere et al., 2004). In this paper, the profile curvatures option was chosen because it gives the rate of change of gradient or it measures the downslope trend and identifies different breaks on the slope.
The profile curvature map was classified into three classes and it expresses the variation between positive (concavities) and negative values (convexities) ( Figure  9). Thus, the class that corresponds to the rectilinear slopes and flat areas is defined by positive and negative values near zero. The other classes (representing concavities and convexities are defined by the limits -0.05 and 0.05.

Land Use
The land use data was developed by direct cartography from existing map 1:50000 then updated from satellite images, and fieldwork ( Figure 10). Land use types as small as 10 m2 were mapped as much as they were visible of the satellite images. The land use considers some characteristics that can have an impact on slope movements (Zêzere et al., 1999). The land use was classified into several classes as follows; bare rocky soil, croplands, croplands and shrublands, croplands and trees, dense croplands, dense reforestation, dense trees, high-density afforestation, mosaic forest/croplands, natural forests, open grassland with sparse shrubs, urban area, and low-density reforestation over shrubland. Since the area is Mountainous, it has vast empty land full of vegetation where cattle can feed on. Due to overgrazing and other anthropic activities, the grassland area is highly degraded and large areas are bear with very little vegetation and soils. Most of the area is rural with very few houses.

Logistic Regression Model
LR is a multivariate model (Chau and Chan, 2005), also called the logistic model or logit model, which has been widely used to estimate the probability of landslide occurrence usually by relating the dependent variable (landslides in our case) with a variety of geoenvironmental or independent variables (Guzzetti et al., 2005). LR can be discrete, continuous, or both, and factors for multi-regression must be numerical while those for discriminant analysis must have a normal distribution. This model uses the forward method (Lee and Pradhan, 2007) to analyze a binary response from several predisposing factors and regresses a dichotomous dependent variable on a set of the independent variables that can be continuous, interval, or categorized. In this study, LR was used to analyze the relationship between multiple independent variables (X1, X2, . . . Xn) (predisposing factors) and the dependent variable (y) (landslides). The LR method is based on three main assumptions.
-The dependent variable is dichotomous with landslides indicated as 1 when there is presence or 0 when they are absent. -The independent variables are continuous and should only be included for significant importance. -The probability Y is equal to 1 given distinct values of X. That is if X and Y has a positive linear correlation, the probability that landslide will have a score of Y = 1. This indicates that as X (factors that caused previous slides) increases, the likelihood that Y (landslides) will be equal to 1 will tend to increase. As X increases, the probability that Y = 1 increases. This is based on the presumption that landslides will always occur under the same conditions that caused past landslides. The LR model used a dichotomous dependent variable (Y), and this requires areas without landslides to be represented (Figure 11). Since the independent variable (Y) is dichotomous and the first value (Y = 1) representing areas with landslides has been acquired through the inventory, the other value (Y = 0) representing areas with no slides had to be obtained. This was accomplished by randomly generating points called non-points (pixels) within the study area by developing random points in relatively safe areas which are the gentle slopes with a low gradient. The binary variable employed in this study is limited to two outcomes, representing the occurrence or nonoccurrence of cases (coded as 1 or 0, respectively). The model predicts the probability of the event as a function of the independent variables (Youssef et al., 2015). Many authors have used it (Cox, 1958) to ascertain the probability of landslides occurring by associating slope movements motion to landslide conditioning factors and represent landslides as (1) when they are present or (0) when absent. The quantitative relationship between landslides occurrence and their dependency on precondition variables was examined through the LR model which is expressed in its simplest form in Equation 4.
Where: P = probability of landslide occurrence ranging between 0 and 1 Z = a linear combination of conditioning variables e z = exponent of conditioning factors Z assumes a function as in Equation (4).
B0 here is the "intercept" or "constant term". B1...n here are the coefficients of the LR curve and X is the independent variable.
After elimination of highly correlated dependent variables, the sample datasets were then used to input to the LR algorithm within R language to compute the correlation of landslide to each predisposing factor. The ahead stepwise LR was carried out to include only the predictor variables with an essential contribution to the presence of landslides.
The susceptibility index map was built by incorporating the coefficient (Table 1) of each factor and summing the list of factors. Among the seven predisposing factors employed in constructing the model, four of them had positive computed weights, which means that they are significant in causing landslides occurrence. Among these factors, slope angle stands out as the most important factor of landslides. (Selby, 1993). Source: By the authors Shallow landslides were frequent on concave and rectilinear slopes while rockfalls were recorded on convex slopes, thus making curvature an important factor. The slope orientation and elevation had mild significance while elevation and lithology had minimal effect on slope instability. The spatial probability of the area to landslides was assessed using the success rate curve (Bai et al., 2008) (Figure 12).
Landslide predisposing factors settled to be necessary by the correlation and association test were joined using LR (Equation. 6) to build the susceptibility map of the study area. The weighted thematic layers for shallow landslides were developed by multiplying the rasters for conditioning factors by their coefficient and the susceptibility index map was built by combining the weighted conditioning factor maps. This involved the incorporation of the weighted maps in the raster calculator and summing them (Equation. 6). The entire process can be mathematically expressed as: Y = (Twi × (-0.1608782431) + Streamdens × 0.1495128397 +Geolsett × (-0.0000006806) + Aspect × 0.0014400887 +Slope × 0.2364887615 +Elev×(-0.0057429231) +Curvature × (-0.1938426176) -4.4304492308 (6) Where TWI is topographic wetness index, Streamdens is stream density, Geolsett is lithology, Aspect is slope aspect, Slope is slope angle, and Curvature is slope curvature. whereas the numbers on the equation are conditioning factors coefficients excepting the last number which is the intercept. The combination of the spatial probability layers of conditioning variables (Equation. 5) gave the susceptibility map ( Figure 10).
The susceptibility map was built using prediction values calculated from probabilities of binary values and thematic maps. The colour ramp displays a maximum susceptibility index of 0.999999 and 4.0027e-07 as the lowest. Negative spatial probabilities did not exist in any area of the map. However, great differences in susceptibility exist. The south around the outlet of the dam is more probable to be affected by landslides than the extreme north and extreme south regions. Figure 10-Landslide Susceptibility applied the LR using equal interval classification.

RESULTS VALIDATION
The model validation was carried out using 50% of recorded landslides randomly selected and were validated employing a complete set of landslides. Multivariate regression analyses were used in model validation and consisted of creating a relationship between the total affected terrain and the non-affected part using success rate curve. Here, the validation group of landslides, 50% (Figure 13), logistic regression susceptibility map obtained from the initial modeling and the ROC curve was developed by computing the background values (areas without landslides) with the susceptibility map as input. The crossing points determine the goodness of fit of the curve. The ROC curve is a plot that establishes the relationship between sensitivity (proportion of true positives) against specificity (proportion of false positives) of the model at a series of thresholds for a positive outcome.  The predictive capability or the competence of the susceptibility maps was judged using the ROC curves (Zizioli et al., 2013). It is a plot that sets the relationship between sensitivity (proportion of true positives) against specificity (proportion of false positives) of the model at a series of thresholds for a positive outcome. The sensitivity which is plotted on the y-axis is the likelihood that the area with a landslide is correctly classified while specificity (false negative rate) is the probability that the area with no-landslide is correctly classified. The x-axis expressed as 1specificity represents the false positive rate (Jaiswal et al., 2010).
The determination of the AUC enables the quantitative evaluation of the overall predictive capability of LR susceptibility model (Beguería, 2006), ranging between 0 and 1. A value closer to 1 indicates the good predictive ability of the model. A casual predictive power will be manifested for an AUC value of about 0.5, describing a diagonal straight line ( Figure  14). AUC value below 0.5 means models with a terrible predictive capacity and should not be taken into consideration (Bi and Bennett, 2003); . The mathematical expression of the AUC is given by Equation 7 (Garcia et al., 2007;(Pereira et al., 2012).
Where x is the portion of the study area predicted as susceptible by descending order and y is the percentage of correctly classified landslide area belonging to the validation group. Guzzetti et al. (2005) indicates thur fineat AUC values between 0.75 and 0.8 correspond to an acceptable model, while AUC values ranging between 0.8 and 0.9 indicates a good susceptibility model, and finally, AUC values > 0.9 typify excellent models. The success curve has an AUC of 0.96. The curve has a good prediction power with 96% of the landslides righty captured by the model (Figure 14). The performance of a model with such values is good and capable to predict future landslide events in the study area (Guzzetti et al., 2005).

Source: By the authors
The LR ROC is the measurement of the correlation between unstable and stable areas. A greater number of landslides were captured by the prediction curve ( Figure 12) with 95% of the area found under the Curve. The diagonal line indicates a 50% probability of occurrence. Prediction curves with landslide values below the diagonal line are considered to have a low predictive capability and should not be admissible (Guzzetti et al., 2005). With the curve largely above the diagonal line, the model is perfect and accepted following the proposal of Guzzetti et al. (2005).

DISCUSSION
The LR model makes a relationship between all the variables and slope movements at once. This looks more suitable since one factor alone may not be enough to explain the slope failure. The interaction between combinations of factors might give quite a different result than when examined independently. For example, a moderate slope with a big quantity of weathered material might fail due to a serious undercutting during road construction although the moderate slope or road factors by themselves are insignificant. In the study area, human action most often acts as a trigger rather than a prominent conditioning factor. This makes the LR model more fit to assess landslides in the area.
The LR model gives the contribution of each factor (e.g., slope curvature and elevation) to landslides employing coefficient, where the other models provide them by sub-classes. This makes it easier for nongeographers or non-earth specialists to easily join the factors with high coefficients to delimit the anticipated hazard in an area. Land-use planners may even decide to take measures that scale back the effects of the variable and determine which level of risk they are ready to accept or to take action against.
The witness of the LR model is that it assumes the independent variables are continuous and should only be included for practical relevance. In this case, we run the risk of creating a model unstable if two or more independent variables measure has the same effect. The major limitation is in the fact that the LR model considers landslides as points with equal values rather than polygons thereby neglecting the variations in landslide size which is an essential component in a landslide as viewed by Aleotti and Chowdhury (1999) Guzzetti et al. (1999) who recognize landslide magnitude to be incorporated in Varnes (1984) definition of a landslide.

FINAL CONSIDERATIONS
LSM is a fundament of disaster risk evaluation. There are a large variety GIS-based qualitative and quantitative methods beneficial to examine the relationship between landslides and landslide predisposing factors. This study broadens the utilization of LR to make a susceptibility map index based on GIS and R.
This study presents the performance of LR. The model displays satisfactory results although using an equal number of landslide and non-landslide pixels shows lightly accurate results in total. It can be concluded that the landslide causal factors (i.e., Slope, curvature, aspect) have a notable impact in causing landslides. This study also shows that predicting likely occur landslides by using LR can be the most suitable choice although the result can be more accurate on a larger scale. Susceptibility assessment is an indispensable means to outline areas prone to landslide, and it has become crucial information for decisionmakers and government.