Regression Models for Estimating Aboveground Biomass and Stand Volume Using Landsat-Based Indices in Post-Mining Area

,


Introduction
Mining is an extraction activity to obtain natural resources from the earth layer.Some mining products are coal, metals, oil, etc.Despite mining has high economic value, in the other hand, it also one of the sectors that can lead to decreasing of environmental quality (Dontala et al., 2015;Aguirre-villegas & Benson 2017;Arshi, 2017).However, the environmental damage that occurs as a result of mining activity can be restored through reclamation.Reclamation assessment in Indonesia stated in Ministry of Forestry Regulation Number P.60 of 2009 about Guidelines for Assessment of Forest Reclamation Success where this regulation applied in forest area.There are three criteria for forest reclamation success in this regulation such as land arrangement, erosion and sedimentation control, and revegetation.Based on this regulation, revegetation criteria has the highest weight (50%) than the others criteria.This statement indicate that re-vegetation become the highest contributor that determine the value of reclamation success in Indonesia.
Monitoring of reclaimed area provides information related to post mining rehabilitation.Conventional methods such as field measurement and laboratory studies to monitor changes of vegetation cover and vegetation health mapping requires high labor force and time.The method may be suitable for local scale analysis, but are inefficient for regional scale (Karan et al., 2016;Lu et al., 2016).The availability of data and remote sensing technology is an opportunity to develop a method that is practical especially for natural resources monitoring.Field measurement become the primary data for regression development and model validation.
Result of vegetation inventory is basis data for management decision whether the reclaimed area was proposed for conservation area or another integrated uses such as plantation, agroforestry or silvopasture for local community and also ecotourism by considering environment conditions (Kodir et al., 2017).Vegetation condition such as basal area, biomass and stand volume become vegetation variables that feasible to measure and also reflects the environment condition in reclamation.Basal area become an applicable vegetation variable that measured in field measurement while biomass acts as ecological reflection and stand volume acts as economical reflection in reclamation area.Biomass represents the carbon stock, where 40-50% of biomass in tropical forest stored carbon (Pan et al., 2011;Min et al., 2013).There was several previous research that used remote sensing approach both from active or passive energy source, very high or moderate resolution for monitor or inventories vegetation variable depend on the availability of the data (Cartus et al., 2012;Mora et al., 2013;Sinha et al., 2015;Tanaka et al., 2015;Lu et al., 2016;Peng et al., 2019;Dos Reis et al., 2019;Chen et al., 2020;Hawryło et al., 2020;Salazar & Garcia 2020;Zhu et al., 2020).
This research integrates field measurement data with remote sensing approach that aims to develop regression model for estimating basal area, aboveground biomass and stand volume as a tool for reclamation monitoring and inventory.Remote sensing data was correlated with field measurement data to obtain reflection in terrestrial condition (basal area, aboveground biomass and stand volume) based on remote sensing data.

Methods
Time and location Field measurement conducted from January-April 2020 located at mining coal company namely PT Berau Coal, site Sambarata, Berau District, East Kalimantan Province, Indonesia while data processing conducted from May-October 2020 in IPB University, Bogor, West Java, Indonesia.
Research focused only in the revegetation area where dominated with sengon (Paraserianthes sp.), johar (Cassia siamea), akasia (Acacia mangium), and almost 80% of revegetation area consist of those vegetation as a result of primary revegetation.Research location showed in Figure 1.

Data and tools
Data that used in this research were 1).high resolution imagery (Worldview 1: Red Green Blue band composite in 2019) for selecting plot location; 2).Landsat 8 vegetation indices derived from surface reflectance data for vegetation indices value with Landsat 8 scene ID: LC81170582019123LGN01, Path 117 Row 58 acquired May 3th 2019; 3).Field measurement data that consist of 1-9 years old vegetation (planted year from 2011-2019).Vegetation type and diameter were recorded in tally sheet for terrestrial sample at selected plot that appropriate with Landsat 8-pixel imagery (30×30 m).
Tools that used in this research were 1) GPS Garmin64s for navigates to the selected plot; 2) laptop that installed ArcMap 10.5 for designing field measurement plot and spatial data processing, Microsoft Excel for field measurement data recapitulation, R studio for statistical analysis; 3) vegetation analysis tools for field measurement such as tape (centimetres unit) for diameter measurement and tally sheet for recording field measurement data; 4) compass and tape (meters unit) for building plot on selected location.
Research procedure Designing field measurement plot Plot designing started by creating grid plot that adjusted with Landsat 8 spatial resolution in the entire revegetated area using ArcMap 10.5 application.Plot for field measurement was 30×30 m where that size adjusted with Landsat 8 spatial resolution (Knight & Kvaran, 2014).Plot location selected using purposive sampling method by considering plantation age (planted year) and revegetation performance in every plantation age observed from high resolution imagery (Worldview 1).
Total plot for this research were 50 plot that distributed in every plantation age.Those 50 plots divided into 35 plots that Figure 1 Research location.
Jurnal Manajemen Hutan Tropika, 28(1), 1-14, April 2022EISSN: 2089-2063DOI: 10.7226/jtfm.28.1.1 ISSN: 2087-0469 used to build the model and the 15 remaining plots were used to verify the model (validation).Vegetation age consideration hopefully reflect the variance between the youngest until the oldest vegetation condition whether in terrestrial fact or in remote sensing value.Based on this assumption, it's important to locate sample plot in every plantation age and selected plot in every plantation age must be homogen within 30 × 30 m plot size, therefore the spectral reflectance or vegetation indices represented homogenous object.Illustration for selected plot in 1, 6, and 9 years old vegetation showed in Figure 2.
Landsat 8 vegetation indices A common remote sensing method for monitoring vegetation involves the use of data transformations.The transformations known as vegetation indices (VI) are defined as dimensionless, radiometric measures of vegetation, and have been used as indicators of the relative abundance, density, and vigor of green vegetation.The most typical wavelengths used in vegetation studies are Red (R) and Near Infra-Red (NIR) bands, although other bands have also been used.The reason for focusing on red and near-infrared is that these two wavelength regions tend to have particularly strong associations with vegetation (Kazar & Warner 2013).
There are several vegetation indices that found, but only vegetation indices provided by USGS were used in this research such as Normalized Difference Vegetation Index (NDVI), Enhanced Vegetation Index (EVI), Soil Adjusted Vegetation Index (SAVI), Modified Soil Adjusted Vegetation Index (MSAVI), Normalized Difference Moisture Index (NDMI), Normalized Burn Ratio (NBR), and Normalized Burn Ratio 2 (NBR2).Those vegetation indices obtained by ordering through https://espa.cr.usgs.gov/.Formula of vegetation indices that used exactly same with formula that used by Gizachew et al (2016) Illustration for natural color and vegetation indices at research location showed in Figure 3.
Field measurement and derivation Field measurement started by building plot for vegetation analysis sized 30 × 30 m that already designed and selected in previous step.Model development flow for estimating basal area, aboveground biomass and stand volume showed in Figure 4.

Results and Discussion
Field measurement & remote sensing approach Individual and diameter distribution that indicates the horizontal distribution of the vegetation in each age showed in .Table 3 and Table 4 Based on Table 3 and Table 4, vegetation type divided into 4 class where there were three class vegetation that became main vegetation for revegetation (Paraserianthes sp., C. siamea, A. mangium) and the remaining class was the other vegetation except 3 main vegetation that mentioned previous.Paraserianthes sp.became the most abundant vegetation followed by C. siamea, this phenomenon indicates that in revegetation area dominated consistently with planted vegetation.Reclamation method such as revegetation or natural regeneration (succession) determines the ecosystem that formed where in this case revegetation method was selected and implied to the dominant vegetation in the reclamation area (Šebelíková et al., 2016).
Actually, in planted year 2011 has high vegetation density whether in individual density or use of space (diameter).This statement supported by a plot that has representative location in planted year 2011 with density of      195,982.83cm ha (Figure 5a) while the three remaining plot located at the side of the road and implied to the availability of the tree on those three plot.Based on this value, this one representative plot in planted year 2011 has 300 individual ha 1 with diameter 28.84 cm for each individual tree.Furthermore, this case affects to the data distribution of basal area, aboveground biomass and stand volume in the sampled plot graph especially in planted year 2011 that showed in Figure 5 and also the vegetation indices in these three plot that showed in Figure 6.
Generally, basal area, aboveground biomass, and stand volume at sampled plot increases in every increment in the age although based on individual density there was fluctuated.Those phenomenon indicates that there was increasing in vegetation growth especially in use of space.Reclamation increase the performance of environmental improvisation at post mining area (Skaloš et al., 2015).Growth dynamic of basal area, aboveground biomass and stand volume in each plantation age at sampled plot showed in Figure 5.
Based on remote sensing approach, the value of vegetation indices also reflects the same pattern as vegetation variables where the youngest vegetation has low value and increasing in every increment of age then decreasing occurs in several plots at 8-9 years old.There are variety in reflecting the plantation age, this variety explained by dividing value into 4 classes ranged from 0-0.25 (q1); 0.25-0.50(q2); 0.25-0.75(q3); and 0.75-1 (q4).NDVI was the only vegetation indices that distributed value ranged from q3 and q4 also NDMI distributed value from q1 and q2 where the others ranged from q2 and q3.Especially for NBR2, the value only ranged in class q2.This phenomenon indicates the capabilities of vegetation indices to reflect the vegetation growth based on remote sensing approach.Vegetation indices unit in this research are multiplied by 10000 as given by the USGS where the general unit that used in vegetation indices usually from -0.1 up to 0.1.Vegetation indices dynamics in each plantation age showed in Figure 6.(Ahmed et al, 2011;Eckert, 2012;Wahyuni et al., 2016).Correlation value ranged from -1 to 1, negative value indicates the relation that formed was inverse and positive indicates the relation that formed was parallel.Relationship between a variable with another variable can be said strong if the correlation value closer to 1 or -1 and weak if the correlation value closer to 0 (Schober et al., 2018).Correlation between Landsat 8 surface reflectance and vegetation indices with basal area, aboveground biomass and stand volume listed in Table 5 while Kolmogorov Smirnoff test for vegetation indices normality distribution showed in Table 6.

Correlation and variable normality test
Based on Table 5, correlation between Landsat 8 surface reflectance and vegetation indices with basal area, aboveground biomass and stand volume ranged from -0.67 to 0.82.All vegetation indices have higher correlation value than 0.7 except correlation between NDVI with stand volume.In surface reflectance side, only Band5 that has correlation value higher than 0.65 while Band4 (Red Band), and Band7 (Short Wave Infra-Red 2 Band) has correlation value higher than 0.65 in relation with basal area and aboveground biomass only.Based on that, Band5 (Near Infra-Red Band), Band4 (Red Band), and Band7 (Short Wave Infra-Red 2 Band) became single band that related to the vegetation condition.Near Infra-Red Band (NIR Band) sensitive to vegetation while Band4 (Red Band) sensitive to soil properties (Ding et al., 2014).Due to sensitivity to vegetation and soil properties, Band5 and Band4 were the most used component in equation to obtain vegetation indices value.Band5 (NIR Band) used to calculate NDVI, SAVI, MSAVI, NBR, NDMI, EVI, and Band4 used to calculate NDVI, SAVI, MSAVI, and EVI.
Table 6 listed distribution of the data that used to estimates vegetation variables based on remote sensing approach (vegetation indices).As statistics requirement, variable that used to estimate another variable has to normally distributed so the pattern of the regression line run properly.Kolmogorov Smirnoff test was used to test the normality of the variable, data that has normal distribution has p-value higher than 0.05 with confidence level 95%.Data that has normal distribution identified by the intensity chart formed bell shaped which means the most frequent value that appear located in the middle of x axes or mean equal or significantly close with median and mode (Ainiyah et al., 2016).Based on Table 6, all of the variables has normal distribution except NDVI variable that means NDVI unable to use as independent variable to estimate vegetation variables.

Model development Model development means develop regression equation to estimate dependent variable in this
case three vegetation variables such as basal area, aboveground biomass and stand volume based on independent variables in this case through remote sensing approach.Time, effort and cost become more measurable by using this regression method.Sample plot that representative used to develop regression equation adjusted with available resources without ignoring statistics rule.Furthermore, extrapolation can be applied to the entire research area using regression equation that developed.R-squared of regression equation model to estimates basal area, aboveground biomass and stand volume based on vegetation indices listed inTable 7. R-squared of regression equation model to estimates basal area ranged from 0.49 to 0.72; aboveground biomass ranged from 0.50 to 0.73 and stand volume ranged from 0.41 to 0.69.R-squared was a statistical parameter that used to measure proportion of the variance for a dependent variable that explained by an independent variable (Lavista et al., 2016).Based on description mentioned before, R-squared become important statistical parameter that considered in model selection.Furthermore, classical assumption was tested such as residual normality using Kolmogorov Smirnoff test and residual homogeneity using Glejser test on the model that has equal or higher R-squared value than the average.Rsquared average for basal area, aboveground biomass and stand volume regression equation model were 0.65; 0.65; and 0.60 respectively.These values become threshold for continuing to classical assumption test.Residual normality test and homogeneity test result on the sort listed model showed in Table 8, Table 9, and Table 10.
Based on Table 8, Table 9, and

Model selection
Selecting the best model to estimates basal area, aboveground biomass and stand volume using remote sensing were performed using the model verification measures.The verification of each regression model was tested by calculating their deviations from the estimate.The     12, and Table 13.
Based on Table 11, the highest score for estimates basal area was model M4 (Form 2 MSAVI) with final score 4.8.The equation of selected model was log(y) = 7.1894(x) + 5.0409 where x was MSAVI ranged from -0.5620 to 0.7510 and y was basal area.Aggregate deviation, mean deviation, RMSE, 2 Bias and R in this model 0.04, 27.02%, 32.66%, 1.88%, and 0.70 respectively.The aggregate deviation 0.04 met the criteria range from -1 to 1 while the mean deviation 27.02% where this value higher than 10% that suggested.RMSE was 32.66% and bias was 1.88% where the lower the RMSE and bias value is, the more accurate model will be.Based on deviation calculation, the averaged deviation for estimating basal area was 16.39%.This value higher than averaged deviation from Negara et al. (2021) was 11.3% that used drone imagery which has 5 cm spatial resolution for estimating basal area in post mining oil exploration area.MSAVI was found to be most sensitive to canopy fraction cover (Tsai et al., 2016).MSAVI increase the dynamic range of the vegetation signal while further minimizing the soil background influences, resulting in greater vegetation sensitivity as defined by a "vegetation signal" to "soil noise" ratio, the MSAVI can be said to be a more sensitive indicator for amount of vegetation (Qi et al., 1994)., the highest score for estimates Table 12 aboveground biomass was model M7 (Form 2 EVI) with final score 6.6.The equation of this selected model log (y) = 6.4503(x) + 4.6215 where x was EVI ranged from -0.5614 to 0.7511 and y was vegetation variable that estimated in this case aboveground biomass.Aggregate deviation, mean 2 deviation, RMSE, Bias and R in this model 0.09, 24.32%, 28.01%, 11.05%, and 0.72 respectively.The aggregate deviation for estimating aboveground biomass also met the criteria range from -1 to 1 while the mean deviation higher than 10% that suggested.Averaged deviation for estimating aboveground biomass was 18.09%.That value higher than averaged of deviation from Negara et al. (2021) was 15.2%.The EVI developed to optimize the vegetation signal with improved sensitivity in high biomass regions and improved vegetation monitoring through a de-coupling of the canopy background signal and a reduction in atmosphere influences (Huete et al., 2002).In several research, EVI was considered vegetation indices that can improved sensitivity to high biomass regions and improved vegetation monitoring capability (Kazar & Warner 2013;Yuan et al., 2016;Bao et al., 2019).
Based on Table 13, the highest score for estimates stand volume was model M3 (Form 2 NDMI) with final score 6.The equation of this selected model log(y)= 10.025(x) -1.9251where x was NDMI ranged from 0.2917 to 0.4845 and y was vegetation variable that estimated in this case stand volume.Aggregate deviation, mean deviation, 2 RMSE, Bias and R in this model respectively -0.15, 32.01%, 38.11%, 13.37%, and 0.69.The aggregate deviation and mean deviation for estimating stand volume also reflect the same with basal area and aboveground biomass model estimation where the aggregate deviation value from -1 to 1 and mean deviation higher than 10%.All selected model for estimate basal area, aboveground biomass and stand volume has averaged deviation between 16.39-24.37%especially for mean deviation and RMSE that has higher than 20%.Based on that explanation, estimating basal area, aboveground biomass and stand volume using medium resolution imagery (Landsat 8) has averaged deviation higher than 10%.If compared with previous studies that used more detail data such as drone imagery with 5 cm spatial resolution and SPOT 6 imagery with 1.5 m spatial resolution, this research has lower accuracy.Despite this research has lower accuracy, the data that used in this research was applied free-access data using Landsat-8 data.Model graph for estimating basal area, aboveground biomass and stand volume illustrates by plotting the vegetation indices on x axes and vegetation variable on y axes.Figure 7 showed model scatter plot of a).model for estimating basal area consist of MSAVI as x axes and basal area as y axes, b).model for estimating aboveground biomass consist of EVI as x axes and aboveground biomass as y axes, also c).model for estimating stand volume consist of NDMI as x axes and stand volume as y axes.Points depict the data that used to build the model while triangle depict the data that used to calculate the fitness (validation) of the model.

Conclusion
All of the developed models to estimate basal area, aboveground biomass, and stem volume used form 2 equation (exponential) with the expression which requires transformation of the dependent variable before developing the regression model equation.MSAVI, EVI, and NDMI were used as independent variables to estimate basal area, aboveground biomass, and stand volume, with equation 2 log(y) = 7.1894(x) + 5.0409 where x was MSAVI with R 2 0.70, log (y) = 6.4503(x) + 4.6215 where x was EVI with R 0.72, and log (y) = 10.025(x)-1.9251where x was NDMI 2 with R 0.69.The averaged deviation for estimating basal area, aboveground biomass, and stand volume was 16.39%, 18.09%, and 24.37%, respectively.Despite the fact that the average deviation was greater than 15%, the data used in this study was free-access data from Landsat-8.

Recommendation
The developed model can be used to estimates basal area, aboveground biomass and stand volume.However, larger number of samples should be collected to capture broader variance of the data and to represent more realistic output of the model as well.

Figure 2
Figure 2 Selected plot position and size appropriated with Landsat 8 pixel and homogeny field condition.

Figure 3
Figure 3 Display of natural colour and vegetation indices at research location (1:100,000).

Table 1
Allometric equation to obtain aboveground biomass and stand volume from diameter Table 2 Regression model development form 4 Jurnal Manajemen Hutan Tropika, 28(1), 1-14, April 2022 EISSN: 2089-2063 DOI: 10.7226/jtfm.28.1.1 test).Autocorrelation and multicollinearity test not conducted caused by the independent variable was not time series (continue) and the regression that developed only applied single independent variable for estimating basal area, aboveground biomass and stand volume.Regression model that developed must be fulfilled the residual requirement such as normal distribution, homoscedastic and independent.The residual has normal distribution if the Kolmogorov Smirnoff test has p-value > 0.05 and has homoscedastic if the Glejser test has p-value > 0.05.Model selection done by considering statistical accuracy such as aggregate deviation (AD), mean deviation (MD), Root Mean square (RMSE), Bias (E), and also R-squared.All of those statistical accuracy considered has equal weight to determine the selected model.Validation equation as shown in Equation [9] to Equation [12].

Figure 4
Figure 4 Model development flowchart for estimating basal area, aboveground biomass and stand volume.

Figure 5
Figure 5 Basal area (a), aboveground biomass (b), and stand volume (c) at sampled plot in every plantation age.

Figure 6
Figure 6 Vegetation indices dynamics in each plantation age.

Figure 7
Figure 7 Model graph for estimating a) basal area, b) aboveground biomass, and c) stand volume.

vegetation indices were tested using Kolmogorov Smirnoff test for normality distribution. Vegetation indices that has normal distribution indicated by p-value > α(0.05) at confidence level 95%. Regression model that developed in this research showed in Table 2. R-squared considered to the further step for classical assumption test (residual normality and heteroscedasticity
Field measurement activity consist of recording vegetation type and diameter.Furthermore, diameter data from field measurement was converted into basal area, aboveground biomass and stand volume.Basal area obtained based on the Equation [8].
2Meanwhile aboveground biomass and stand volume obtained using allometric equation.The allometric equation that used in this research selected by considering similarity or closeness location.Allometric equation that used in this research mentioned in Table1.

Table 3
Distribution of individual vegetation in each planted year (plantation age)

Table 4
Distribution of vegetation diameter in each planted year (plantation age) Correlation was the first step followed by variable normality test for regression model development.Correlation indicates relationship between a variable with another variable and normality test indicates distribution of values in a variable.Correlation procedure became an initiator statistical measurement that considered to estimate a dependent variable based on remote sensing approach in several research Table 10, there were several model that has normal distribution and homogeneity on the residuals such as Form 1 SAVI, Form 1 MSAVI, Form 1 EVI, Form 2 MSAVI, Form 2 NBR and Form 2 NDMI to estimates basal area; Form 1 NDMI, Form 1 EVI, Form 2 SAVI, Form 2 MSAVI, Form 2 NBR, Form 2 NDMI and Form 2 EVI to estimates aboveground biomass; Form 2 SAVI, Form 2 MSAVI, Form 2 NDMI, Form 2 EVI, Form 3 MSAVI and Form 3 NBR to estimates stand volume.Residuals that has normal distribution indicates by p-value higher than 0.05 resulted from Kolmogorov Smirnoff test and has homogeneity indicates by p-value higher than 0.05 resulted from Glejser test.Model that has normal distribution and homogeneity on the residual furthermore selected as proper model and continued to model selection step by calculating model accuracy or fitness using validation sample data.

Table 5
Correlation between Landsat 8 surface reflectance and vegetation indices with basal area, aboveground biomass and stand volume

Table 7 R
-squared of regression equation model to estimates basal area, aboveground biomass and stand volume based on vegetation indices

Table 6
Kolmogorov Smirnoff test for vegetation indices normality distribution

Table 8
Residual normality test and homogeneity test for basal area model to the estimate with the smaller deviation is expressing the better fitness.Model verification measures applied were aggregative deviation (AD), mean deviation (MD), root mean square error (RMSE), and bias (E).All of those verification values were then transformed into standardized scores.The highest score is expressing the lowest deviation.Generally, model fitness means comparing the value based on equation model with the actual condition in field.Scoring table for selecting the best model regression to estimates basal area, aboveground biomass and stand volume respectively showed in Table 11, Table closer

Table 9
Residual normality test and homogeneity test for aboveground biomass model 10 Table 10 Residual normality test and homogeneity test for stand volume model

Table 11
Scoring table for selecting the best model to estimates basal area Table 12 Scoring table for selecting the best model to estimates aboveground biomass

Table 13
Scoring table for selecting the best model to estimates stand volume