Portfolio Problem II

October 16, 2020

Do custom-created classifications outperform global tree cover maps such as Hansen’s?

In this problem, we explore some of the challenges of classifying forest pixels using tree cover thresholds, and compare a custom supervised classification with a threshold-based tree cover classification. We began by creating training points for several landcover classes in our study area (the Rumphi district of Malawi), chosen to pick out tree cover as precisely as possible. We then built decision trees from those training points to assign a landcover class to each pixel.
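The post does not show the decision trees themselves. As a minimal sketch of the idea, the core step of a CART-style decision tree (choosing the single band-value split that best separates labeled training points by Gini impurity) can be written in plain Python. The features (NDVI, red reflectance), the training values, and the helper names `fit_stump` and `predict` are hypothetical, and this is not the classifier actually used in this analysis; a full decision tree repeats this split recursively on each side.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0 means the group is pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def fit_stump(samples, labels):
    """Find the single (feature, threshold) split with the lowest weighted Gini impurity.

    samples: list of equal-length tuples of band values (hypothetical features)
    labels:  list of landcover class names, one per sample
    Returns (feature_index, threshold, class_if_below, class_if_above).
    """
    n = len(labels)
    best = None
    for f in range(len(samples[0])):
        values = sorted({s[f] for s in samples})
        # Candidate thresholds: midpoints between consecutive distinct values
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            below = [y for s, y in zip(samples, labels) if s[f] <= t]
            above = [y for s, y in zip(samples, labels) if s[f] > t]
            score = (len(below) * gini(below) + len(above) * gini(above)) / n
            if best is None or score < best[0]:
                majority_below = Counter(below).most_common(1)[0][0]
                majority_above = Counter(above).most_common(1)[0][0]
                best = (score, f, t, majority_below, majority_above)
    if best is None:
        raise ValueError("no feature has two distinct values to split on")
    return best[1:]


def predict(stump, sample):
    """Assign a landcover class to one pixel's band values using the fitted split."""
    f, t, class_below, class_above = stump
    return class_below if sample[f] <= t else class_above


# Hypothetical training points: (NDVI, red reflectance) -> landcover class
training = [
    ((0.82, 0.03), "forest"),
    ((0.78, 0.04), "forest"),
    ((0.35, 0.12), "grassland"),
    ((0.30, 0.14), "grassland"),
]
stump = fit_stump([s for s, _ in training], [y for _, y in training])
print(predict(stump, (0.70, 0.05)))
```

With more landcover classes (scrub, water, deforested), the same impurity measure applies unchanged; only the number of distinct labels grows.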

Fig. 1. An image of the study area (outlined in black). Link to Code

Evaluation of Custom Classification

Based on visual inspection, the custom classification captures densely forested areas fairly well. Each of the two figures below shows the same region as true color imagery and as classified output. There appears to be some confusion between new tree growth in deforested areas and grassland (Fig. 1a), but intact forest patches are generally well identified (Fig. 1b).

Fig. 1a. At right, true color composite of Landsat imagery of the Chelinda Lodge area. Forested patches are darkest green, scrub is a lighter green, and grassland/deforested areas are more brown. Compare with the classified image at left, where forested patches are darkest green, scrub is a lighter green, deforestation and water are shown in blue, and grassland is shown in yellow. Link to Code
Fig. 1b. At left, true color composite of Landsat imagery in a densely forested area split by non-forested landcover. Compare to classified image at right. Forested patches are darkest green, while scrub is a lighter green and grassland is shown in yellow. Link to Code

Comparing Custom Classification to Hansen’s Classification

Next, we compared our custom tree cover classification to a threshold-based tree cover map made from Hansen’s Global Forest Change dataset at a threshold of 30%. Compared with Fig. 1a, Hansen’s map appears more likely than our custom classification to label deforested areas that are beginning to regrow as “tree cover” (Fig. 2a). At the 30% threshold, Hansen’s dataset also seems more likely than ours to classify non-treed areas as forested (Fig. 2b and Fig. 3). Even so, Fig. 3 shows that, despite Hansen’s more frequent false positives, the large majority of pixels agree on tree cover across the two datasets, with more agreement on true negatives than on true positives.

Fig. 2a. An image of the same area as shown in Fig. 1a. Colors indicate the following: green = agreement (forested), blue = agreement (not forested), yellow = disagreement (custom classification says tree, Hansen says no), and red = disagreement (Hansen says tree, custom classification says no). Link to Code
Fig. 2b. Left: satellite imagery; Right: Classified image. Colors indicate the following: green = agreement (forested), blue = agreement (not forested), yellow = disagreement (custom classification says tree, Hansen says no), and red = disagreement (Hansen says tree, custom classification says no). Link to Code
Fig. 3. Chart showing total area of pixels in the four agreement/disagreement categories when Hansen threshold is set to 30%. Link to Code
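The four agreement categories behind Figs. 2a, 2b, and 3 can be sketched per pixel as follows. This is a minimal illustration, not the code used for the figures. It assumes the custom map is a boolean tree/no-tree mask, that Hansen’s tree cover is a per-pixel percentage, that a pixel counts as Hansen tree cover when its value is at or above the threshold, and that each pixel covers roughly 30 m by 30 m. The function names and pixel values are hypothetical.

```python
from collections import Counter

# Assumption: ~30 m x 30 m pixels, so each pixel is about 0.09 ha
PIXEL_AREA_HA = 30 * 30 / 10_000


def agreement_category(custom_is_tree, hansen_cover_pct, threshold=30):
    """Place one pixel in one of the four agreement/disagreement categories."""
    hansen_is_tree = hansen_cover_pct >= threshold  # assumed: at or above = tree
    if custom_is_tree and hansen_is_tree:
        return "Agree (Tree)"
    if not custom_is_tree and not hansen_is_tree:
        return "Agree (No Tree)"
    if custom_is_tree:
        return "Custom tree, Hansen no"
    return "Hansen tree, custom no"


def area_by_category(custom_mask, hansen_cover, threshold=30):
    """Total area (ha) of pixels in each category, as summarized in a Fig. 3 style chart."""
    counts = Counter(
        agreement_category(c, h, threshold) for c, h in zip(custom_mask, hansen_cover)
    )
    return {category: n * PIXEL_AREA_HA for category, n in counts.items()}


# Hypothetical pixels: custom tree mask and Hansen percent tree cover
custom_mask = [True, True, False, False, True]
hansen_cover = [80, 10, 45, 5, 30]
print(area_by_category(custom_mask, hansen_cover))
```

Changing `threshold` only moves pixels between the red and blue categories or between the green and yellow ones; the custom mask never changes.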

“Truthing” the Hansen Dataset

If we treat our custom classification as “truth,” we can compare it with Hansen’s data to find the threshold at which the Hansen dataset classifies tree cover most accurately. In their supervised classification, Adjognon et al. (2019) report that the true positive rate (pixels correctly classified as trees) is highest when the probability threshold for classifying a pixel as tree cover is low. As the threshold increases, the true positive rate falls while the true negative rate (pixels correctly classified as not trees) rises, which can raise the balanced accuracy, the average of the two rates. In short, lower thresholds correctly identify more of the actual tree cover, while higher thresholds correctly identify more of the actual non-tree cover (Adjognon et al., 2019).
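Treating the custom classification as truth, the four categories in Fig. 3 form a confusion matrix: Agree (Tree) is TP, Agree (No Tree) is TN, “Hansen says tree, custom classification says no” is FP, and “custom classification says tree, Hansen says no” is FN. The standard rate definitions are then:

```latex
\mathrm{TPR} = \frac{TP}{TP + FN}, \qquad
\mathrm{TNR} = \frac{TN}{TN + FP}, \qquad
\mathrm{BA} = \frac{\mathrm{TPR} + \mathrm{TNR}}{2}
```

Because each rate is normalized within its own reference class, balanced accuracy weights tree and non-tree pixels equally, regardless of how much of the study area each class covers.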

To find the threshold that yields the highest balanced accuracy, we compared the custom classification against the Hansen dataset at several different thresholds.

Code For Multithreshold Analysis
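A minimal sketch of a multithreshold sweep, not the linked analysis code. It assumes the custom map is a boolean tree mask, that Hansen tree cover is a per-pixel percentage, and that a pixel counts as Hansen tree cover when it is at or above the threshold. Each candidate threshold is scored by balanced accuracy (the mean of the true positive and true negative rates, with the custom map as truth), and the highest-scoring one is kept. The pixel values, candidate thresholds, and function names are hypothetical.

```python
def balanced_accuracy(custom_mask, hansen_cover, threshold):
    """Mean of TPR and TNR for a thresholded Hansen map, with the custom map as truth."""
    tp = tn = fp = fn = 0
    for is_tree, cover_pct in zip(custom_mask, hansen_cover):
        hansen_tree = cover_pct >= threshold  # assumed: at or above = tree
        if is_tree and hansen_tree:
            tp += 1
        elif is_tree:
            fn += 1
        elif hansen_tree:
            fp += 1
        else:
            tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("truth mask needs both tree and non-tree pixels")
    tpr = tp / (tp + fn)  # share of true tree pixels Hansen also calls tree
    tnr = tn / (tn + fp)  # share of true non-tree pixels Hansen also calls non-tree
    return (tpr + tnr) / 2


def best_threshold(custom_mask, hansen_cover, thresholds):
    """Score every candidate threshold and return the best one plus all scores."""
    scores = {t: balanced_accuracy(custom_mask, hansen_cover, t) for t in thresholds}
    return max(scores, key=scores.get), scores


# Hypothetical pixels: custom tree mask and Hansen percent tree cover
custom_mask = [True, True, True, False, False, False, False, False]
hansen_cover = [85, 40, 25, 35, 15, 10, 5, 0]

best, scores = best_threshold(custom_mask, hansen_cover, [10, 20, 30, 40, 50])
for t, ba in scores.items():
    print(f"threshold {t}%: balanced accuracy {ba:.3f}")
print("best threshold:", best)
```

Selecting the maximum balanced accuracy directly does not require the raw Agree (Tree) and Agree (No Tree) areas to be similar, since those areas also depend on how much of the study area is actually forested.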

A Hansen threshold of 21% appears to produce the most accurate classification of tree cover pixels. At this low threshold, Hansen is more likely than our dataset to classify a pixel as “forested,” but the difference between the Agree (Tree) and Agree (No Tree) areas is smallest and the majority of classified pixels agree (Fig. 4).

Fig. 4. Agreement/disagreement chart for the two datasets with the Hansen threshold set to 21%. The Agree (Tree) and Agree (No Tree) values are close to one another, yielding a high balanced accuracy rate. “Second” in the chart title refers to the 21% threshold being the second-lowest of the four thresholds tested. Link to Code

References

Adjognon, G.S., Rivera-Ballesteros, A., and van Soest, D. (2019). Satellite-based tree cover mapping for forest conservation in the drylands of Sub Saharan Africa (SSA): Application to Burkina Faso gazetted forests. Development Engineering, 4, 100039. https://doi.org/10.1016/j.deveng.2018.100039