Comparative analysis of methods applied in vegetation cover delimitation using Landsat 8 images

There is a wide range of methods and techniques for classifying remote sensing image data. However, one of the biggest challenges is determining whether the applied method is actually effective for the thematic mapping of terrain features. Thus, the aim of this work was to provide a comparative analysis of data classification methods for mapping forest cover using orbital images from the Landsat 8 satellite. The method consisted of pre-processing the images, calculating the NDVI image, building an infrared image composition and performing a principal component analysis (PCA). The maximum likelihood classification method (MAXVER) was used to delimit the vegetation cover in each of the three resulting databases. To validate the classification results, field data, Kappa analysis and pixel-by-pixel analysis were applied. The results showed that the NDVI method had the lowest overall similarity to the reference data from the MAPBIOMAS project used for validation, while the infrared composition and PCA produced similar delimitations of forest cover. The results indicate that the several methodologies available for vegetation classification are of great value for the thematic mapping of forest resources. In addition, we conclude that PCA showed the best capacity for delimiting the vegetation cover in the study region, closely followed by the infrared composition, and that NDVI was the least accurate.


INTRODUCTION
Studies of changes in land use and land cover are essential, especially those associated with the loss of vegetation cover. These changes can be associated with several anthropic and natural factors, such as deforestation, floods and climatic phenomena. Studies on the delimitation of vegetation cover can also provide important information for several socioeconomic applications, such as supporting decision-making or creating databases for environmental monitoring. These studies rely mainly on image data and remote sensing techniques (BAN; GONG; GIRI, 2015; XUE; SU, 2017).
Because objects on the Earth's surface reflect and emit radiation differently across spectral bands, remote sensors record distinct responses to the radiation reflected or emitted in a given scene. The collected signal is the basis for image classification, i.e. a pattern recognition process (HAYES; SADER, 2001; MISRA et al, 2012).
According to Hayes and Sader (2001) and Misra et al (2012), image classification aims at recognizing homogeneous patterns and objects, identifying similarities in the data to map a specific region of the Earth's surface. In addition, Novo (2010) explains that this process uses technologies to numerically analyze the information in digital images. According to the author, image classification can be defined as the assignment of a value (class) to each pixel of the image according to the numerical properties used by the technique. Hayes and Sader (2001) and Misra et al (2012) demonstrate that, through the spectral signatures of the different targets on the Earth's surface, it is possible to identify vegetation patterns using different techniques, such as vegetation indexes (e.g. NDVI) and RGB band composite analysis.
According to Foody (2004), Lyons et al (2018) and Wei et al (2019), advances in image data acquisition by sensors on board satellite platforms have led to the development of several classification methodologies for mapping features on the Earth's surface. The classification procedure can generate useful confidence intervals, providing an easy and straightforward interpretation of the different objects on the Earth's surface. However, every map produced from remote sensing data must be statistically evaluated in order to assess the accuracy and fidelity of the classification.
Reducing the volume of data derived from remote sensing without compromising the quality of the resulting map is one of the current challenges in delimiting forest areas. This challenge is associated with the high correlation among the different spectral bands of satellite sensors (FORMAGGIO; SANCHES, 2017).
An effective way to minimize this redundancy between spectral bands, reducing the dimensionality of the data without losing essential information, is to apply statistical techniques such as principal component analysis (PCA). PCA is a linear transformation that converts a set of correlated variables into a new set of uncorrelated variables (the principal components), ordered so that the first components retain as much of the variance of the original data set as possible. In vegetation studies, this technique can be applied to extract specific spectral responses from remote sensing data. The technique is limited by the spatial resolution of the data, which can interfere with the effective identification of areas covered by forest (CHANDER; COAN; SCARAMUZZA, 2008; CHIANG, 2014; CRÓSTA et al, 2003; MISRA et al, 2012; TSAI; LIN; YOSHINO, 2007). In the literature, PCA has been applied to Landsat time series products in works on land use and geological mapping (VURAL; CORUMLUOGLU; ASRI, 2016), environmental changes in watersheds (BRAGA et al, 2019), delimitation of fire scars in vegetated lands (SILVA; COSTA; MATRICARDI, 2017) and landscape analysis with emphasis on vegetation (CRUZ et al, 2018).
Thus, the objective of this work was to provide a comparative analysis of classification methods applied to land surface data for mapping forest cover using images from the Landsat 8 satellite. The approach compares the use of a vegetation index, band composition and principal component analysis.

Case study: Analysis of the Serra Azul stream sub-basin vegetation area
The study area is the Serra Azul stream sub-basin, Minas Gerais, Brazil, located between the parallels 20° 00' and 20° 15' south latitude and the meridians 44° 15' and 44° 35' west longitude (Figure 1). The region contains the Serra Azul reservoir, which supplies water to part of the metropolitan region of Belo Horizonte, and is also subject to extensive agricultural activity throughout its territory. For the classification process, a Landsat 8 image acquired on 19/09/2019 and samples of forest cover areas collected during field visits were used. The image was obtained from the EarthExplorer platform of the United States Geological Survey (USGS), and its coordinate system was converted to the UTM projection, Zone 23 South, SIRGAS2000 datum (Geocentric Reference System for the Americas, realization 2000).

Remote sensing data pre-processing
The pre-processing of the satellite image, namely the atmospheric correction of the spectral bands, was carried out in QGIS 3.4.6 (QGIS, 2019), using the resources of the Semi-Automatic Classification Plugin (SCP). According to Congedo (2016) and Leroux et al (2018), this plugin can automatically perform the atmospheric correction of satellite images through a method called Dark Object Subtraction (DOS). Because of its simplicity, this method is widely used with orbital images such as those from Landsat 8, Sentinel-2 and MODIS. The method assumes that every image band contains pixels located in radiometric shadow (dark objects), for which the radiation received by the sensor on board the satellite does not result from reflection by the target but from atmospheric scattering. Thus, it is necessary to convert the digital numbers of the image to radiance, calculate the reflectance at the top of the atmosphere and finally obtain the reflectance at the Earth's surface by subtracting the dark-object contribution.
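The correction chain described above can be sketched in Python with NumPy. This is a minimal illustration of a DOS1-type correction under simplifying assumptions. It is not the SCP implementation.

- It goes straight from digital numbers to top-of-atmosphere reflectance using the Landsat 8 reflectance rescaling coefficients, skipping the intermediate radiance step.
- It takes the band minimum over valid pixels as the dark object.
- Like DOS1, it assumes unit atmospheric transmittance, no diffuse sky irradiance and a 1% reflectance for the dark object.

The coefficient values are typical of Landsat 8 Level-1 metadata, but they should be read from each scene's MTL file. The function names are illustrative.

```python
import numpy as np

# Typical Landsat 8 Level-1 reflectance rescaling coefficients (assumption:
# always read REFLECTANCE_MULT_BAND_x / REFLECTANCE_ADD_BAND_x from the MTL file).
REFLECTANCE_MULT = 2.0e-5
REFLECTANCE_ADD = -0.1


def toa_reflectance(dn, sun_elevation_deg, mult=REFLECTANCE_MULT, add=REFLECTANCE_ADD):
    """Convert Level-1 digital numbers to sun-angle-corrected TOA reflectance."""
    rho = mult * np.asarray(dn, dtype=np.float64) + add
    return rho / np.sin(np.deg2rad(sun_elevation_deg))


def dos1_surface_reflectance(rho_toa, valid=None):
    """Simplified DOS1 correction of one band of TOA reflectance.

    With transmittances of 1 and no diffuse irradiance, the path reflectance
    equals the dark-object reflectance minus the 1% it is assumed to have.
    """
    rho_toa = np.asarray(rho_toa, dtype=np.float64)
    pixels = rho_toa if valid is None else rho_toa[valid]
    rho_dark = pixels.min()                 # dark object: band minimum (assumption)
    path = max(rho_dark - 0.01, 0.0)        # signal attributed to atmospheric scattering
    return np.clip(rho_toa - path, 0.0, 1.0)
```

For a band read as an integer array `dn`, calling `dos1_surface_reflectance(toa_reflectance(dn, sun_elevation), valid=dn > 0)` excludes the zero-valued fill pixels from the dark-object search. Clamping a negative path reflectance to zero is a design choice made here so that the correction never brightens the band.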

Normalized Difference Vegetation Index (NDVI)
The Normalized Difference Vegetation Index (NDVI) was generated in the Spring 5.2.7 software (CAMARA et al, 1996), using the "Arithmetic Operations" tool available in the "Image" menu. NDVI is widely used in research on the delimitation of vegetation cover. It enhances vegetation through a mathematical operation between the red and near-infrared bands of remote sensors on board satellites. ROUSE et al (1973, 1974) proposed Equation 1 for the calculation of NDVI, which ranges from -1 to 1:
$$\mathrm{NDVI} = \frac{NIR - R}{NIR + R} \tag{1}$$
where NIR is the reflectance in the near-infrared spectrum and R is the reflectance in the red spectrum.
In the NDVI results, pixel values closer to 1 correspond to greater vegetation development and density. This is explained by the intense response of vegetation in the near-infrared spectrum and its very low response in the red spectrum. Values near zero occur mainly for clouds, whose reflectance is similar in the red and near-infrared bands. Water is characterized by values close to -1, as its reflectance is greater in the red band than in the near-infrared band (MELO; SALES; OLIVEIRA, 2011).
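Equation 1 translates directly into array code. The Python sketch below is an illustration independent of the Spring "Arithmetic Operations" tool used in this work. It assumes the inputs are surface reflectance arrays of Landsat 8 band 5 (NIR) and band 4 (red). Where NIR + R = 0, it returns NaN instead of dividing by zero. The function name is illustrative.

```python
import numpy as np


def ndvi(nir, red):
    """NDVI = (NIR - R) / (NIR + R), as in Equation 1.

    Pixels where NIR + R == 0 are returned as NaN instead of raising a
    division-by-zero warning.
    """
    nir = np.asarray(nir, dtype=np.float64)
    red = np.asarray(red, dtype=np.float64)
    denom = nir + red
    out = np.full(denom.shape, np.nan)
    np.divide(nir - red, denom, out=out, where=denom != 0)
    return out
```

Consider three hypothetical pixels: a vegetation-like pixel (NIR 0.40, R 0.04), a cloud-like pixel (equal NIR and R) and a water-like pixel (NIR 0.01, R 0.05). For these, `ndvi([0.40, 0.5, 0.01], [0.04, 0.5, 0.05])` gives approximately 0.82, 0.0 and -0.67, which reproduces the ordering described above.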

Infrared image composition
The infrared image composition used to enhance the vegetation cover of the region was built with bands 5, 4 and 3 of the Landsat 8 satellite. This technique has been used in the analysis of forest fragments (BARBOSA et al, 2018) and in the analysis of vegetation composition and diversity (SHARMA; CHAUDHRY, 2018).
According to Ponzoni, Shimabukuro and Kuplich (2015), vegetation appears dark in the red region of the visible spectrum because chlorophyll absorbs red radiation. In the near-infrared region, vegetation cover has a lighter gray level because this radiation is scattered by the cellular structure of the vegetation and returned to the environment, which increases the response of vegetation in this spectral range.
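A 5-4-3 composite can be assembled from three reflectance arrays, as in the Python sketch below. It assumes that bands 5, 4 and 3 go to the red, green and blue channels, respectively. It applies a 2% to 98% linear percentile stretch to each band for display. The channel order, the stretch limits and the function names are illustrative choices, not parameters reported in this work.

```python
import numpy as np


def stretch(band, low=2.0, high=98.0):
    """Linear percentile stretch of one band to the [0, 1] display range."""
    band = np.asarray(band, dtype=np.float64)
    lo, hi = np.nanpercentile(band, [low, high])
    if hi <= lo:                                # constant band: nothing to stretch
        return np.zeros_like(band)
    return np.clip((band - lo) / (hi - lo), 0.0, 1.0)


def false_color_543(b5, b4, b3):
    """Stack Landsat 8 bands 5, 4 and 3 into the R, G and B channels (H x W x 3)."""
    return np.dstack([stretch(b5), stretch(b4), stretch(b3)])
```

Vegetation reflects strongly in band 5, which is assigned to the red channel, so forested pixels appear in red tones in this composite.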

Principal Component Analysis (PCA)
In the remote sensing literature, PCA is described as a method to improve data interpretation, being widely used for the detection of vegetation as well as of land use changes. Compared with the original spectral bands, the principal components allow a better reading of the image, so that objects on the land surface are more easily identified. In Landsat images, about 97% of the information conveyed by the spectral bands is usually contained in the first three principal components (PC), called PC1, PC2 and PC3 (ABDU, 2019; SABINS JR, 1987).
This multivariate tool was used in the present work for the classification of Landsat 8 satellite images. The transformation was performed using data from six Landsat 8 bands, namely: 2 (blue, 0.45 to 0.51 μm), 3 (green, 0.53 to 0.59 μm), 4 (red, 0.64 to 0.67 μm), 5 (near infrared, 0.85 to 0.88 μm), 6 (shortwave infrared 1, 1.57 to 1.65 μm) and 7 (shortwave infrared 2, 2.11 to 2.29 μm). According to Noble and Daniel (1986) and Jolliffe (2011), the PCA technique consists of centering the multivariate data in matrix X on their mean and finding a vector $\vec{v}$, called an eigenvector, and a scalar $\tau$, called an eigenvalue, that satisfy Equation 2:
$$\operatorname{cov}(X)\,\vec{v} = \tau\,\vec{v} \tag{2}$$
where cov(X) is the covariance matrix of the data, $\vec{v}$ is the eigenvector and $\tau$ is the eigenvalue. Projecting the data onto the eigenvectors with the largest eigenvalues transforms the original space into a smaller subspace.
After evaluating the variances and eigenvalues, the principal components with the highest variance were selected for the classifications. From this analysis, only the first three components (PC1, PC2 and PC3), which accounted for 99.9% of the data variability, were used in the classifications through band composition in the Spring 5.2.7 software (CAMARA et al, 1996) (Figure 2).
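The eigen-decomposition in Equation 2 and the component selection described above can be sketched with NumPy as follows. This is an illustrative implementation, not the Spring procedure.

- It assumes the six bands are stacked in an array of shape (bands, rows, columns) with no missing pixels.
- It uses the unstandardized covariance matrix.
- It returns the scores of the leading components together with the fraction of variance explained by each component.

The sign of each eigenvector is arbitrary, so PC images may come out inverted relative to other software. The function name is illustrative.

```python
import numpy as np


def principal_components(bands, n_components=3):
    """PCA of a band stack via eigen-decomposition of the covariance matrix.

    bands: array of shape (n_bands, H, W), e.g. Landsat 8 bands 2 to 7.
    Returns (scores of shape (n_components, H, W), explained variance ratio).
    """
    n_bands, h, w = bands.shape
    X = bands.reshape(n_bands, -1).T.astype(np.float64)   # pixels x bands
    Xc = X - X.mean(axis=0)                               # centre each band on its mean
    cov = np.cov(Xc, rowvar=False)                        # cov(X) in Equation 2
    eigvals, eigvecs = np.linalg.eigh(cov)                # solves cov v = tau v
    order = np.argsort(eigvals)[::-1]                     # largest variance first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = Xc @ eigvecs[:, :n_components]               # project onto leading eigenvectors
    ratio = eigvals / eigvals.sum()                       # share of variance per component
    return scores.T.reshape(n_components, h, w), ratio
```

Summing the first three entries of the returned ratio gives the fraction of variability retained by PC1, PC2 and PC3, the quantity used above to justify keeping only three components.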

Image data classification
After the processing specific to each data set, the classification was performed using the MAXVER algorithm available in the Spring 5.2.7 system. According to Camara et al (1996), MAXVER classification weights the distances between the gray levels of each pixel and the class means, estimated from training samples, to compute the probability that the pixel belongs to each class. This classification assumes that the user has good knowledge of the study area.
The classification was carried out for the three types of data: the NDVI image, the infrared image composition and the principal components PC1, PC2 and PC3. The images were classified into two thematic classes, "forest cover" and "non-forest". A histogram acceptance threshold of 99.9% was used. For map generation, the classification error matrix required an overall performance of 99% and producer's and user's accuracies of 95%.
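The principle behind a maximum likelihood classifier such as MAXVER can be illustrated with the Python sketch below. It is a generic Gaussian maximum likelihood classifier, not Spring's implementation.

- Each class is modeled by the mean vector and covariance matrix of its training samples.
- Class priors are assumed equal.
- Each pixel is assigned to the class with the highest log-likelihood.

The acceptance threshold used in Spring, which leaves unclassified any pixel that falls outside it, is not reproduced here. The class name, the labels and the single-feature (NDVI-like) training values in the test are hypothetical.

```python
import numpy as np


class GaussianMaxLikelihood:
    """Per-pixel Gaussian maximum likelihood classifier with equal class priors."""

    def fit(self, samples):
        # samples: dict {class_label: array (n_samples, n_features)} from training areas
        self.labels = list(samples)
        self.params = []
        for label in self.labels:
            x = np.asarray(samples[label], dtype=np.float64)
            mean = x.mean(axis=0)
            cov = np.atleast_2d(np.cov(x, rowvar=False))   # (n_features, n_features)
            inv = np.linalg.inv(cov)
            _, logdet = np.linalg.slogdet(cov)
            self.params.append((mean, inv, logdet))
        return self

    def log_likelihoods(self, pixels):
        # pixels: (n_pixels, n_features); the constant term common to all classes is dropped
        x = np.asarray(pixels, dtype=np.float64)
        scores = []
        for mean, inv, logdet in self.params:
            d = x - mean
            mahal = np.einsum("ij,jk,ik->i", d, inv, d)     # squared Mahalanobis distance
            scores.append(-0.5 * (logdet + mahal))
        return np.stack(scores, axis=1)

    def predict(self, pixels):
        # assign each pixel to the class with the highest log-likelihood
        idx = self.log_likelihoods(pixels).argmax(axis=1)
        return np.array(self.labels)[idx]
```

For the PCA database, each training sample would be a row with three values (PC1, PC2, PC3). For the NDVI database, each row would have one value.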

Data validation
Field data (Figure 3), pixel-by-pixel comparison through multiple windows and Kappa analysis by means of the KHAT statistic (a Kappa estimator) were used to validate the classifications and to compare the study databases. According to White and Engelen (2000) and Hagen (2003), pixel-by-pixel comparison is widely used to validate the quality of maps produced with remote sensing techniques. This process is intended to recognize local similarities by sliding a window of predefined size over the image. Thus, the method compares the pairs of categories of two maps, checking whether they are identical or differ from each other. In this case study, window sizes from 1 to 15 pixels were used.
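The idea of comparing two maps through windows of increasing size can be sketched in Python as follows. The agreement criterion used here is a simplified assumption and not necessarily the one implemented by White and Engelen (2000) or Hagen (2003). A pixel of the first map counts as matching when its class occurs anywhere in the window centered on the same position of the second map. With a 1-pixel window, this reduces to exact pixel-by-pixel agreement.

Only odd window sizes are accepted so that the window stays centered. The maps are assumed to be co-registered integer class rasters of equal shape. The function name is illustrative.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def window_agreement(map_a, map_b, window):
    """Share of pixels of map_a whose class occurs in the window x window
    neighbourhood centred on the same position in map_b."""
    if window % 2 == 0:
        raise ValueError("window must be an odd number of pixels")
    r = window // 2
    padded = np.pad(map_b, r, mode="edge")                  # edge padding adds no new classes
    neigh = sliding_window_view(padded, (window, window))   # shape (H, W, window, window)
    match = (neigh == map_a[..., None, None]).any(axis=(-2, -1))
    return float(match.mean())
```

Calling `[window_agreement(classified, reference, w) for w in range(1, 16, 2)]` yields an agreement curve that, by construction, never decreases as the window grows. The criterion is not symmetric, so swapping the two maps can change the values. Memory use also grows with the square of the window size.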
According to Congalton and Green (1999), Kappa analysis is a discrete multivariate technique used in accuracy assessment to determine statistically whether one error matrix is significantly different from another. The result of a Kappa analysis is the $\hat{K}$ (KHAT) statistic, a Kappa estimate that measures how well the classification agrees with the reference data (Equation 3):
$$\hat{K} = \frac{n\sum_{i=1}^{k} x_{ii} - \sum_{i=1}^{k} x_{i+}\,x_{+i}}{n^{2} - \sum_{i=1}^{k} x_{i+}\,x_{+i}} \tag{3}$$
where k is the number of rows in the error matrix, n is the total number of observations (samples), $x_{ii}$ is the number of observations in row i and column i, $x_{i+}$ is the total of row i and $x_{+i}$ is the total of column i. For this analysis, the MAPBIOMAS map was used as the reference image for comparing the proposed classification methods. MAPBIOMAS is a collaborative multi-institutional initiative that maps land use and land cover in all Brazilian biomes using data from the Landsat series, with a spatial resolution of 30 m. Since the focus of this work is on changes in forest cover, the MAPBIOMAS classes were reclassified into two classes, forest and non-forest. The forest class comprised all native vegetation classes of the project. The non-forest class comprised water and the classes resulting from anthropogenic influence, such as agriculture, mining and urban areas.
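Equation 3 and the overall accuracy reported in the results can be computed from an error matrix with a few lines of NumPy. The sketch below assumes the error-matrix convention of Congalton and Green (1999), with rows for the classified map and columns for the reference map. It also assumes that classes are coded as consecutive integers starting at 0, for example 0 for non-forest and 1 for forest. This coding and the function names are hypothetical.

```python
import numpy as np


def error_matrix(classified, reference, n_classes):
    """Error matrix with rows = classified map and columns = reference map."""
    m = np.zeros((n_classes, n_classes), dtype=np.int64)
    np.add.at(m, (np.ravel(classified), np.ravel(reference)), 1)
    return m


def khat(m):
    """KHAT estimate of Kappa (Equation 3) and overall accuracy from an error matrix."""
    m = np.asarray(m, dtype=np.float64)
    n = m.sum()                                         # total number of observations
    diag = np.trace(m)                                  # sum of x_ii
    chance = (m.sum(axis=1) * m.sum(axis=0)).sum()      # sum of x_i+ * x_+i
    kappa = (n * diag - chance) / (n**2 - chance)
    return kappa, diag / n
```

For the hypothetical error matrix [[45, 5], [10, 40]], `khat` returns a KHAT of 0.70 and an overall accuracy of 0.85.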

RESULTS AND DISCUSSION
As shown in Table 1 and Figure 4, the methods delimited different extents of the forest and non-forest classes in the study region. Compared with the MAPBIOMAS reference map, all image classifications overestimated the forest area: by 39.20% (NDVI), 29.64% (infrared image) and 29.19% (PCA).
The three methods produced slightly different results, mainly related to the overestimation of forest cover, and the comparison revealed differences among the classification techniques. Figure 5 shows the variations in the classification results within the special protection area of the study region, which has forest throughout its extent. This region contains a reservoir that supplies water to several municipalities and has dense forest and abundant water. Four areas were chosen to exemplify these minor differences in results.
In regions that have suffered anthropogenic influences associated with urbanization, as shown in Figure 6, the identification of forest areas varies among the methods. In area 2, where forest predominates, NDVI was the method that identified the least forest, while the infrared composition and the PCA bands produced similar results. In the anthropized areas (areas 1, 3 and 4), PCA best identified the non-forest areas. The four regions analyzed show how the results vary among the classification methods. Among them, NDVI delimited the most non-forest areas amid the forest cover, showing the greatest noise in the delimitation of vegetation. The PCA and the infrared composition delimited a greater amount of vegetation cover, while still classifying non-forest in different regions.
In the validation of the classifications with the field data, all methods assigned the field samples to the same classes, so the collection points did not reveal differences among the classifications.
The validation by Kappa analysis (Figure 7) showed that the infrared image and PCA produced similar results, with KHAT values between 0.60 and 0.62, which are considered strong according to the Spring software platform. NDVI had the lowest KHAT value, 0.47, corresponding to a moderate classification. Regarding overall accuracy (the proportion of correctly classified pixels), the infrared image composition and PCA also presented similar results, between 0.81 and 0.82, whereas NDVI had the lowest value, 0.74.
The proximity between the infrared composition and PCA results may indicate that the original bands selected for the composition have a greater influence on the principal components. It can also be explained by the fact that bands 4 and 5 interact strongly with the vegetation layer. According to Ponzoni and Shimabukuro (2010), vegetation interacts most strongly with the visible spectrum, mainly red radiation, and with the near-infrared spectrum. This behavior is due to biological characteristics of vegetation associated with chlorophyll production and the growth of trunks.
The validation by multiple windows made it possible to verify the minimum and maximum similarity between the MAPBIOMAS reference data and the classifications performed (Figure 8). Similarity values increased as the window size increased. The classifications of the PCA components and of the infrared image composition showed similar results, ranging from 0.50 to 0.54 for minimum similarity and from 0.82 to 0.90 for maximum similarity. For the NDVI data, the minimum similarity varied from 0.46 to 0.52, while the maximum similarity varied from 0.74 to 0.83.
The classification using NDVI showed the lowest overall similarity (Figure 8). This low value suggests that the technique has limitations, possibly related to noise in the image that causes confusion in the classification process. Areas of confusion were identified, especially in regions of sparse vegetation such as areas of the Cerrado biome. Ferreira Júnior (2015), however, indicates that for smaller regions NDVI presents a better response than the other techniques analyzed here, such as infrared image composition and PCA, for identifying forest cover areas.
According to Herculano (2016), in vegetation mapping applications, the noise present in the images can compromise the classification process. Thus, field visits are important for checking the classification results. The same study indicates that in smaller areas without major anthropic interference, NDVI and PCA may present similar results for mapping vegetation.
The literature shows that image classification can be affected by several sources of noise that lead to poor classification of objects. Ponzoni, Shimabukuro and Kuplich (2015) explain that shadows next to objects in the different spectral bands, especially in non-forest regions, induce mapping errors. Thus, it is always advisable to visit the study area to adjust the identification of objects. Digital image processing technology has advanced considerably over the past decades, which has allowed the improvement of classification techniques as well as the comparison of methods. Chutia et al (2016) recommend developing algorithms capable of extracting the most informative features from the land surface, especially when training data are not readily available. For example, Matasci et al (2015) suggest the use of non-linear methods such as Kernel Principal Component Analysis (KPCA) and Transfer Component Analysis (TCA). These techniques, however, demand substantial processing capacity and therefore investment in computational resources.
Image classification algorithms provide effective identification of patterns in images, demonstrating that remote sensing is an effective tool for mapping land use and occupation, mainly in forest areas. However, it is important to validate the data to guarantee the quality of the final product (FERREIRA JÚNIOR, 2015).

FINAL CONSIDERATIONS
The principal component analysis produced results similar to those of the infrared image composition. This suggests that the three bands selected for the infrared composition may have more influence than the other bands on the principal components with the greatest variance. When validated against the land use data of the MAPBIOMAS project, PCA best characterized the vegetation cover in the region, closely followed by the infrared image composition.
In anthropized areas, the NDVI classification showed the lowest degree of similarity. The several methods for classifying land use and land cover based on remote sensing resources and image data have proven to be effective tools for mapping different areas, especially forested regions. In projects for mapping vegetation cover, special attention should be paid to the location of vegetation areas, as shadows and cloud interference can introduce noise into the mapping results. Thus, field reference data are essential for validating the classification results.

Figure 1 -Location of the study area, Serra Azul stream sub-basin.

Figure 3 -Field points data for classification validation.

Figure 4 -Result of image classification carried out by different techniques, namely NDVI, infrared bands (543) and PCA.

Figure 5 -Examples of variation in results of the classification of land use by different methods in some regions of the special protection area in the Serra Azul stream sub-basin, MG.

Figure 6 -Details of variation in results of the classification of land use by different methods in some regions of the anthropogenic area in the Serra Azul stream sub-basin, MG.

Figure 7 -Result of validation by Kappa analysis for each database, namely NDVI, infrared composition and PCA.

Figure 8 -Result of validation by multiple windows for each database. (a) minimum validation window and (b) maximum validation window.

Table 1 - Coverage, in hectares, of the analyzed land use classes for each database.