Canopy Cover Estimation in Lowland Forest in South Sumatera, Using LiDAR and Landsat 8 OLI imagery

Departement of Forest Management, Faculty of Forestry and Environment, IPB University, Academic Ring Road, Campus IPB Dramaga, Bogor, Indonesia 16680 Canopy Cover Estimation in Lowland Forest in South Sumatera, Using LiDAR and Landsat 8 OLI imagery Department of Forest Resource Conservation and Ecotourism, Faculty of Forestry and Environment, IPB University, Academic Ring Road, Campus IPB Dramaga, Bogor, Indonesia 16680 Received September 2, 2020/Accepted March 12, 2021 Canopy cover is one of the most important variables in ecology, hydrology, and forest management, and useful as a basis for defining forests. LiDAR is an active remote sensing method that provides the height information of an object in three dimensional space. The method allows for the mapping of terrain, canopy height and cover. Its only setback is that it has to be integrated with Landsat to cover a large area. The main objective of this study is to generate the canopy cover estimation model using Landsat 8 OLI and LiDAR. Landsat 8 OLI vegetation indices and LiDARderived canopy cover estimation, through First Return Canopy Index (FRCI) method, were used to obtain a regression model. The performance of this model was then assessed using correlation, aggregate deviation, and raster display. Lastly, the best canopy cover estimation was obtained using equation, FRCI = 2.22 + 5.63Ln(NDVI), 2 with R at 0.663, standard deviation at 0.161, correlation between actual and predicted value at 0.663, aggregate deviation at -0.182 and error at 56.10%.


Introduction
Canopy hydrology cover is one of the most important parameters used in ecology and forest management (Nakamura et al., 2017). It is defined as an area of land covered by vertical projections of canopies, which describes the structural conditions of a forest (Jennings et al., 1999). Furthermore, it is used as a critical variable for the definition and evaluation of forest gain (recovery) and loss (degradation and deforestation) (FAO, 2000). The measurement is useful for forest management applications such as for classification of forest structure, characterization of carbon sinks, forest fire behavior and fuel models, as well as the estimation of canopy light transmission (Ahmed et al., 2014).
The measurement of a canopy cover can be obtained either directly on the field or based on remote sensing technology. Field measurement provides more accurate data but relatively laborious and expensive. Meanwhile, remote sensing technology such as LiDAR offers large spatial coverage of an area, more efficient in time, less financial cost, and less human resources, and also more powerful for the detection and segmentation of trees - (Zhen et al., 2016). The technology is expected to replace the field measurement method (Hyyppa et al., 2001;Kim et al., 2016;Jeronimo et al., 2018;Irlan et al., 2020). LiDAR is an active remote sensing technology that provides height information of an object (Jakubowski et al., 2013). It calculates a distance by emitting a laser pulse and calculating the return time from the laser to its sensor after being reflected. LiDAR is well-suited for canopy cover estimation since it can penetrate through the canopy (Korhonen & Morsdorf, 2014). Other benefits of using this system are: can be used during day and night, more effective and efficient in operational cost compared to terrestrial surveys, can provide high precision and accurate elevation data, and can process a large amount of data in a short period of time (Jakubowski et al., 2013). Although this technology has promising benefits, it is often limited in spatial coverage, and is relatively expensive due to its acquisition cost. To overcome this limitation, it can be integrated with another remote sensing technology.
The integration LiDAR with Landsat 8 OLI is used to determine the potential of Landsat imagery, to accurately estimate forest canopy cover, in order to expand the measurement area (Ahmed et al., 2014).This integration could make LiDAR more cost-effective over larger areas (Hudak et al., 2002;Chen et al., 2012). The estimation of forest canopy cover can be obtained by using a regression between forest canopy cover, and vegetation indices that have been generated from LiDAR data and Landsat 8 OLI Imagery respectively. Vegetation indices are a mathematical combination or a transformation of spectral bands that Data LiDAR data was acquired using an airborne method on October, 2014, and then formatted in LASer (LAS). LAS is a file format for the interchange of 3-dimensional point cloud data. Afterwards, LiDAR pulses were converted into point clouds which were used to calculate the canopy cover.
The vegetation indices that have been proposed for determining the vigor and health of vegetation is intended to create better indices, which means that they take into account many factors, such as soil reflectance, atmosphere and vegetation density. All these improvements and modifications aim to get more reliable information about vegetation, based on their reflectances (Jakubowski et al., 2013). Canopy cover measurement is difficult. LiDAR makes measurement is easier and faster, as well as the coverage can be wider. So that building models and validation through Landsat/SPOT will be more effective. Therefore, the objective of this research is to examine the capability of vegetation indices to estimate forest canopy cover in dry lowland forests, in Sumatera Selatan, Indonesia.
accentuate the spectral properties of green plants: therefore, they can appear distinct from one another in features. Another research showed a result if vegetation indices gave a good result for detecting growth using remote sensing with standards error 9.16% (Jaya et al., 2019).

Location
This research was carried out in Harapan Forest of PT REKI, South Sumatera (Figure 1), which astronomically i s l o c a t e d a t E 1 0 3°7 ' 5 5 " -S 1 0 3°2 7 ' 3 9 " a n d S2°2'16"-S2°21'14" with elevation range of 30-120 m above the sea level. Administratively, it is located between Soralangun District and Batanghari District. This forest is a dry lowland type and is characterized by secondary forest, shrubs, opened area, and mixed plantations. The area has a humid tropical climate with even precipitation all year round and is divided into three categories: high, medium, and low secondary forest. In high secondary forest, the stratum consists of seedling (height < 1.5 m), sapling (heights > 1.5 m and diameter > 20 cm), pole (10 cm < diameter < 20 cm), and mature tree (diameter > 20 cm). The canopy cover in this category has a range between 71%-100%, and is dominated by Shorea sp., Litsea sp., and Palaquium sp. The medium is a transition between the high and low secondary forests. It has a canopy cover range of less than 40%, is dominated by pole (diameter 10-20cm), and is mostly covered by Shorea sp., Litsea sp., and Koompasia excelsa. The low secondary forest has a canopy cover of less than 40%, and is categorized as a very degraded forest (REKI, 2020). Shrubs and poles are commonly found in this region, with the low secondary forest particularly on the burnt areas. The LiDAR footprint located in the Harapan rainforest strips from west to east. LiDAR aquisition mode using full waveform dan discrete return with 800 m flying height and 500 Khz laser pulse frequency. waveform LiDAR and 6-8 points m for the discrete return LiDAR. For scanning angle we used field of View (FOV) 56, swath width 851 m, and forward overlap 6% for full waveform and 80% forward overlap for discrete return LiDAR.

Methods
Preceding the calculation, the quality of the point clouds was investigated to make sure that they had a complete classification, and their heights represented the entire population in the area. This quality check was conducted because classification and heights were the most important variables. The points cloud which had a usual height was filtered out, and used to generate the digital terrain model (DTM). Normalization was then performed to get the actual height of the canopy cover. Canopy cover estimation was calculated using first return cover index (FRCI) method as shown in Equation [1] , which utilized only the first and the single returns, because it was assumed that the last and intermediate were only providing little additional information (Korhonen et al., 2011).

[1]
Landsat 8 OLI is a satellite imagery that offers large spatial coverage, and wide range of spectral resolution. The Landsat imagery of path 125, row 6 was obtained in July 2014 from the United States Geological Surveys (USGS) (https://usgs.earthexplorer.com). This acquisition occurred three months before the LiDAR was obtained. Pre-processing data such as geometric and topographic correction were applied based on illumination correction (Tan et al., 2013). Geographic correction was conducted to correct distortions and assign the properties (and practical value) of the map to the images (Green et al., 2000). It was performed by reprojecting the initial projection from WGS 1984 Zone 48 N to WGS 1984 Zone 48 S. Then, topographic correction was performed using terrain illumination correction model by combining the mathematical model and empirical rotation model (Hudjimartsu et al., 2017). The model was the improvement of topographic influence correction (Tan et al., 2013). In mountainous areas, topography strongly influences the signal recorded by spaceborne optical sensors. In particular, the same surface cover slopes oriented away from and towards the sun which would appear darker and brighter, respectively, when compared to a horizontal geometry (Richter et al., 2009).
After the pre-processing had been carried out, the vegetation indices were calculated. Vegetation indices were calculated using raster calculator in Arcmap. Vegetation indices are defined as the mathematical transformation involving several bands from the optical sensors' imagery, which are used in creating new imagery that are more representative in presenting aspects related to vegetation.
The laser pulse transmitted by the LiDAR sensor can return as one (single return) or multiples returns (first, second, third, fourth and last return). Either the first canopy return or single canopy return is the most significance return, which is associated with the highest feature (tree canopy). To estimate the canopy cover, threshold for tree classification was set at 7 m, based on the field observation. So, object below this height was not included in the calculation. The result of the estimation was a point feature generated from 30 m × 30 m grid area, following Landsat 8 OLI image resolution. This point feature was then overlaid with Landsat vegetation indices. The canopy cover estimation value ranged between 0-100%. Vegetation indices were also derived from satellite data and have been widely used to assess variations in the physiological state and biophysical properties of vegetation (Huete et al., 2002). This study used normalized difference vegetation index (NDVI), green normalized vegetation index (GNDVI), and simple ratio vegetation index (SRVI) as predictors to predict the canopy cover estimation as shown in Equation [4] Refers to Equation [5], F(X, μ, σ) is the theoretical cumulative distribution function of the normal distribution function and (X) is the empirical distribution function of the note: Green = band with a wavelength of 500-600 nm; Red = band with a wavelength of 600-700 nm; NIR = band with a wavelength of 700-1000 nm.

Statistical methods: Building the regression model
Regression model was used in canopy cover equation, with FRCI as dependent variable and vegetation indices as independent variable. Before developing the regression model, several tests such as classic assumption and correlation test were carried out. The classic assumption test is a prerequisite in developing a regression model. For constructing this model, data should: 1) have a normal distribution, 2) have a homoscedasticity of errors, 3) have no autocorrelation occurrences, and 4) have no multicollinearity for the model estimation by using more than one independent variable. This research was only using one variable and a single date imagery, therefore, only the normality and heteroscedasticity of data were tested. The normality was tested using Kolmogorov-Smirnov test, which was first introduced by A. Kolmogorov (Kolmogorov, 1933) and later modified and proposed as a test by N. Smirnov (Smirnov, 1948). [5] [3] bx Exponential : y = ae [8] data. When the above equation results in large values of D, it indicates the data are not normal (Das & Imon, 2016). Heteroscedasticity test to find out whether there are differences in residual variance, from one observation to another. It is performed by the Glejser Test, using the SPSS program. The Glejser test is a well-known test for heteroscedasticity (Glejser, 1969), which is based on weak assumptions, and is very easy to implement. It checks for the presence of a systematic pattern in the variances of the errors, by estimating the auxiliary regression, where the absolute value of the residuals of the main equation is the dependent variable (Furno, 2005). The test decision is conducted when the significance value between the dependent variable and the residual value (Sig) is higher than 0.05. It means that there is no heteroscedasticity found in the data. Correlation test is used to determine the direction and strength of the relationship between two variables. While correlation measures the monotonic association between two variables. This monotonic relationship refers to a situation when both variables increases simultaneously, or when as one variable increases, the other decreases (Schober et al., 2018). The correlation test uses the equation as shown in Equation [6].
After building the regression model, an equation was 2 selected using the coefficient of determination (R ), the 2 standard error value (SE), and the analysis of variance. R shows the ability of the independent variable to explain the dependent variable. Furthermore, it describes and expresses the level of accuracy and closeness of the independent to the 2 dependent variable as a percentage. R is calculated using the formula as shown in Equation [12] (Draper & Smith, 1998).
The canopy cover estimation model was prepared using the equation models as shown in Equation [7][8][9][10][11] [6] 2 Quadratic : y = a + bx + cx [11] Note: r is the correlation coefficient, x is the independent i variable (vegetation index), and y is the dependent variable i (FRCI). Correlation coefficient values range between -1 ≤ r ≤ 1. A correlation coefficient closes to -1 indicates that there is a close relationship between the independent variable, and it has a negative slope for data distribution. Meanwhile, when it is close to +1, it shows that there is a close relationship between the free variable and the independent variable, and it has a positive data distribution slope. A regression model is prepared after the classic assumption and correlation test were done.

Linear : y = a + bx [7]
Logarithmic : y = a + bln(x)  The Standard Error value indicates the average distance of observational data from the regression line. The smaller the SE value, the more precise the model is in predicting. The Standard Error value is calculated using the formula as shown in Equation [13]. [13] note: S is the standard error value, Y is the actual value of Y.X canopy cover, and N-2 is the degree of freedom.

Model validation of regression
All data in the LiDAR strip were utilized during the validation process using the census methods. The plots for validation must not be covered by clouds. Validation testing was performed using the correlation criteria between LiDAR FRCI and estimated FRCI value, aggregate deviation, and visual comparison of raster model estimation with LiDAR raster data display.
The correlation test between LiDAR FRCI value and the value of estimated FRCI from regression model was conducted by testing the correlation between the percent cover values of the estimator canopy model with the percent cover value of the LiDAR. The greater the correlation value, the more it shows that the value of the percent cover estimation model created has a close relationship with the percent cover value of the LiDAR header data (Drapper and Smith, 1992).
prediction minus the mean for that variable. TSS is the total sum of squares associated with the outcome variable obtained from the sum of the squares of the measurements minus their mean. RSS is the sum of the squares of the measurements minus the prediction from the linear regression.
Visual comparison of the raster model It is comparative assessment of raster model estimation display and LiDAR raster display performed by visualization. The LiDAR FRCI raster display was better than the raster model estimation when visually they have a similar appearance Selection of the best regression model It is obtained by scoring the correlation criteria and aggregate deviation values. Meanwhile, scoring is made by weighing the correlation value and the aggregate deviation. The best model has the highest scoring value. Meanwhile, the criteria for this highest score is a large correlation value and a small aggregate deviation value. Equation [15] and Equation [16] are the scoring formula (Jaya, 2010).

[14]
Aggregate deviation is the difference between the sum of the canopy density of estimator model and the percentage of canopy cover from LiDAR data. Good aggregate deviation values range from -1 to 1. Aggregate deviation values according to Drapper and Smith (1992)  LiDAR data canopy cover calculations were performed by calculating the canopy cover in one Landsat pixel size of 30 m 30 m. The percentage calculation of canopy cover × was performed using the FRCI method which is based on the reflection of the first and single return that hit the trees. These returns were defined as the point cloud having a minimum height of 7 m. The number of sample observations used for generating the model were 338 sample plots with the high density sample class contributing the highest at 125 plots or 85.76% of the canopy cover per pixel. While the lowest contributor was the low density class which contributed only 99 and had an average of 23.67% canopy cover per plots pixel.
[16] [15] Plot sample identification The sample point was selected using purposive sampling by considering the presence of clouds, stand density based on canopy cover value, and canopy height (canopy height model). These points are scattered on the LiDAR line, in the work area of PT REKI. The distribution of sample points on the LiDAR footprintpath at PT REKI is presented in 1. Table   The FRCI value represents the percentage of canopy cover in one pixel with a size of 30 m 30 m. It is strongly × influenced by the specified height of the trees and the scan angle . The accuracy of FRCI values is influenced by the scan angle, because a large scan angle can cause inequality in tree canopy sampling. The error in its estimation can be caused by the density of the LiDAR also point. However, increasing this density does not necessarily increase the accuracy of FRCI estimation due to excessive data and the effects of classification on LiDAR points .

Results and Discussion
The correlation value for all vegetation indices on the overall index is positive, meaning that there is a positive linear relationship between independent and dependent variables. Therefore, an increase in the value of the independent variable will be followed by a simultaneous increase in the value of the dependent variable. Based on Table 2, there is a strong correlation between the independent variables, therefore only one was used in the preparation of the model to avoid multicollinearity.
Correlation analysis Correlation analysis was performed to determine the relationship between the independent and the dependent variable. Furthermore, the independent variable was the vegetation index consisting of NDVI, SRVI, and GNDVI, while the dependent was the percent cover value of the LiDAR data from FRCI. Correlation value was obtained through Pearson correlation value. Pearson correlation test results between FRCI with vegetation index are presented in will cause a simultaneous increase in the FRCI value. However, this only applies to certain intervals of vegetation value. Based on Figure 2, it can be seen that the NDVI value is above 0.75, whatever the FRCI value might be, the index value will always be in the range of 0.75. When the SRVI value is above 7, regardless the FRCI value might be, the index value will always be in the range of 7. When the GNDVI value is in the range of 0.65, whatever FRCI value, the index will always be the same, in the range of 0.65. This can be due to the effect of the spectral sensor response factor on the vegetation. An increase in the value of the vegetation index is not always followed by an increase in the value of FRCI, especially in a vegetation that has reached stagnant growth. The research of Yengoh et al. (2015) said, when photosynthesis has reached its maximum, there will be no The percentage of cover estimator model was made through a regression model with the dependent variable is the percentage of the cover canopy from LiDAR (FRCI) and the independent variable is the vegetation indexes. The regression analysis results of the percentage canopy cover estimator model are presented in Table 3.
Correlation value reflects the relationship between percentage of canopy cover from models and LiDAR. When The result of the validation test in the correlation criteria between canopy cover value from LiDAR and canopy cover estimation from models shows that they have an average correlation of 0.554. The highest value was recorded in the R7 model with a correlation value of 0.663, while the lowest was in the R12 model with a value of 0.389. change in the NDVI value because all the visible light has been absorbed.    The best canopy density estimation model according to Table 4 is the R7 model because, in addition with the NDVI, it has the highest score compared to other models. Furthermore, R7 has the equation, = 2.22 + 5.63Ln FRCI ( ). Prasetyo et al., (2019) found that NDVI has a good NDVI correlation with FRCI. Regression between LiDAR FRCI and FRCI models is intended to see how far FRCI values can explain the LiDAR FRCI and the errors generated from the regression model. The regression obtained the equation LiDAR FRCI = 0.332 + 0.712 FRCI R7, with 0.0150 as standard error and R at 43.9%. This shows that the LiDAR 2 FRCI can be explained by the best FRCI model as 43.9%, and the resulting error is at 56.3%. These results indicate that there are still sizable errors because the FRCI variable can only be explained with LiDAR FRCI variable at 43.9%.

Selection of the best regression model
The best model was selected by scoring using correlation criteria and aggregate deviation values. The best model has the highest correlation value criteria and the smallest aggregate deviation value. The results of the scoring are presented in Table 4.
Aggregate deviation is the difference between the percentage of canopy cover from models and LiDAR data. Deviation values have an average of -0,162. The largest aggregate deviation value was recorded in the R10 model with a value of -0.08, while the lowest was in the R12 model with a value of -0.221. Negative values have a tendency causing a deviation which tends to be lower than the actual value (underestimation). The raster display shows that of the 15 models built, only 4 (R1, R3, R7 and R9) have a raster look that resembles a LiDAR raster.
Based on the results of model testing and validation, the FRCI NDVI = 2.22 + 5.63Ln ( ) model was obtained as the best to explain the relationship between LiDAR FRCI and vegetation index. The equation has an R of 43.9%. Based on 2 this, it can be indicated that there are still some errors in the equation model. The error is allegedly caused by several factors such as the difference in acquisition time and spatial differences between LiDAR and satellite imagery. LiDAR was acquired on October 2014, while the Landsat imagery, which was used to create a vegetation index, was acquired on July on the same year.

Discussion
Comparison between LiDAR raster display and the best chosen model This comparison was made to see the accuracy of the model in estimating the percentage of canopy cover. Figure 3 shows a comparison of the best LiDAR raster and landscape model displays in some regions in the LiDAR pathway.
Knowingly that PT REKI is a secondary forest, it is very possible that changes in land cover will occur, especially in the bushes. LiDAR and Landsat have spatial resolutions of 0.5 and 30 m respectively. In addition, very detailed spatial resolution in LiDAR data enables a more accurate calculation of percentage of cover canopy and better identification between tree canopy and shrubs. However, calculating vegetation index values at 30 m will result resolution in more general values and cannot separate tree and non-tree class of vegetation.   Figure 4 shows some differences between the pixels found in LiDAR, and in Landsat, especially in the model. The appearance of raster in the model image tends to have a more yellowish color when compared to the LiDAR raster. This indicates that the estimated value of the model was actually underestimated when compared to LiDAR. Besides the different raster displays, the effects of different spatial resolutions are also seen through the FRCI values on LiDAR, which are calculated using different resolutions, a resolution of 30 m and 0.5 m. A comparison of FRCI values is presented in Table 5.
The effect on different spatial resolution between LiDAR and landsat imagery To see the effects, it was performed by comparing several pixels on Landsat used manual interpretation with those on LiDAR, with the same land cover. For example, several pixels were taken in the segment 1 of the LiDAR line. A comparison of the effects of spatial resolution between LiDAR and Landsat is presented in Figure 4. best model was underestimated and therefore had low accuracy. Furthermore, its accuracy and consistency are influenced by several factors, including the difference in spatial resolution and in acquisition time between LiDAR and Landsat.
The effect on differeny acquisition date between LiDAR and landsat imagery To see the effect of the different acquisition date on the model prediction, several pixels in Landsat were compared with those in LiDAR raster on different land covers. For example, several pixels were taken in the segment 1 of the LiDAR footprint-path. This comparison is shown in Figure 5. have higher values when compared to the same values at a resolution of 0.5 m. In addition, the value of FRCI on LiDAR with a resolution of 0.5 m tends to have values that are closer to the FRCI model. This proves that the LiDAR spatial resolution used can affect the accuracy value of the model. The higher the spatial resolution of the image, the lower the effect of the problem of mixing pixels and has the potential to extract much more detailed information about land use or land cover structures.

Conclusion
The followings are the differences in FRCI values due to differences in acquisition time in Table . 6 Table 6 shows that the value of LiDAR FRCI, with a spatial resolution of 30 and 0.5 m does not have a different FRCI value, but when it is compared to the FRCI values from the model, the difference is quite significant.The difference FRCI value between Lidar and model is due to the difference in acquisation time, as within that period, significant changes occurred in the level of vegetation in the forest.
pixels. In the case of the Landsat pixels, Figure 5 (b) shows a green color that indicates land cover with vegetation, while in Figure 5 (a), the red colored indicates a bare land cover. This results in differences in the value of FRCI on LiDAR and Landsat.
Canopy cover measurement in the field is difficult. LiDAR makes the measurement easier and faster, as well as the coverage can be wider. So that building models and validation through Landsat/SPOT could be extended to larger areas. The research is set to examine the capability of vegetation indices in estimating forest canopy cover in dry lowland forests in Sumatera Selatan, Indonesia. This research concludes that the best regression model is the R7 model. The equation of the regression is FRCI = 2.22 + 2 5.63Ln (NDVI). It has R of 63.80%, standard error of adj 0.161, correlation between actual FRCI and the alleged FRCI of 0.663, aggregate deviation of -0.182, and an error of 56.10%. This model was underestimated and, therefore, has low accuracy. Furthermore, its accuracy and consistency are influenced by the differences in spatial resolution and acquisition time between LiDAR and Landsat.