December 11, 2020
Tree detection in imagery
In this final entry, we draw on our experiences to offer insights about how to improve the results of tree detection from imagery in Malawi.
Landsat-8 Improvement and Errors of Omission/Commission: Diversifying Training Pixel Examples
In Portfolio Problem III, we compared tree cover classification between Landsat-8 and Sentinel-2 imagery. A core difference between the two is the resolution of the image collections, which seemed to be the root cause of some of the differences in tree cover detection between the datasets. Landsat-8 seemed well-equipped to pick out large areas of contiguous tree cover but missed many pixels with smaller percentages of tree cover. The strength of Sentinel-2 lay in its ability to pick out smaller areas of tree cover, such as those in urban areas (Fig. 1), and areas on the “fringe” of large swaths of tree cover that were excluded by the Landsat-8 classification.
![](https://sites.middlebury.edu/rsportfolioeclinton/files/2020/12/Screen-Shot-2020-12-09-at-10.31.57-AM-1024x341.png)
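As a point of reference, these nominal pixel sizes can be confirmed directly in Earth Engine. Below is a minimal sketch using the public surface-reflectance collections; the asset IDs and band choice are illustrative and not tied to any particular portfolio script:

```python
import ee

ee.Initialize()

# First scene from each public surface-reflectance collection.
landsat = ee.Image(ee.ImageCollection('LANDSAT/LC08/C01/T1_SR').first())
sentinel = ee.Image(ee.ImageCollection('COPERNICUS/S2_SR').first())

# Nominal scale (meters/pixel) of the red band in each sensor:
# Landsat-8 optical bands are 30 m; Sentinel-2 visible bands are 10 m.
print(landsat.select('B4').projection().nominalScale().getInfo())   # 30
print(sentinel.select('B4').projection().nominalScale().getInfo())  # 10
```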
The confusion of scrubland with trees presented another issue, resulting in errors of commission: scrubland pixels being assigned to the tree cover class (Fig. 2).
![](https://sites.middlebury.edu/rsportfolioeclinton/files/2020/12/Screen-Shot-2020-12-07-at-8.38.07-PM-Emma-Clinton.png)
One student ascribed this to a failure to provide a clear distinction between the two classes in the training points. However, it could also relate to the “neighborhood majority” method used to ascribe land cover classes to validation pixels (see “Collect Earth Recommendations” below). With regard to errors of omission, where pixels that were actually tree cover were misidentified as other land cover, tree cover pixels were most commonly misclassified as herbaceous grassland and bare ground (Fig. 5). This makes sense when we consider where the frequently missed pixels were located: at the edges of forested patches, or in small patches in the middle of grassland areas.
Our data on errors of omission and commission came from one student who used a different methodology to assign values to the validation points: she used data on the number of trees and percent tree cover instead of simply the dominant land cover class assigned during the Collect Earth survey. Her map was among the most accurate, but several maps that did not use this extra data were more accurate still, so we cannot conclude that this method yields better results.
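For readers who want to reproduce this kind of omission/commission breakdown, Earth Engine exposes it through an error matrix. The sketch below is a hedged example: `validation_points`, `trained_classifier`, and the property names (`landcover`, `classification`) stand in for whatever a given script actually uses.

```python
# Assumes validation_points (a FeatureCollection with a 'landcover'
# property) and trained_classifier exist from an earlier step.
validated = validation_points.classify(trained_classifier)

# Rows = reference (actual) classes, columns = mapped (predicted) classes.
matrix = validated.errorMatrix('landcover', 'classification')
print('Confusion matrix:', matrix.getInfo())
print('Overall accuracy:', matrix.accuracy().getInfo())

# Producer's accuracy = 1 - omission error;
# consumer's (user's) accuracy = 1 - commission error.
print("Producer's accuracy:", matrix.producersAccuracy().getInfo())
print("Consumer's accuracy:", matrix.consumersAccuracy().getInfo())
```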
One of our recommendations for the above issues is to provide the classifier with more diverse training examples of tree cover pixels. Including pixels with sparser tree cover, such as those at the edges of large forests and pixels partially covered by trees in developed areas, would likely lead to higher inclusion of trees in the Landsat-8 classification. Indeed, with RandomForest classification, the more diverse the representation of a land cover class in the training data, the better the classification tends to be.
Supporting this recommendation, we found no clear correlation between the number of training points and the accuracy of the classifier; the issue may lie more in how robust and diverse the examples provided by the training points are.
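As a rough illustration of what this might look like in Earth Engine, the sketch below merges tree-cover examples drawn from several settings into a single class before training. Every variable here (`dense_forest`, `forest_edge`, `urban_trees`, `not_tree`, `composite`) is a hypothetical placeholder for hand-drawn FeatureCollections and a gap-filled composite:

```python
import ee

ee.Initialize()

bands = ['B2', 'B3', 'B4', 'B5', 'B6', 'B7']  # Landsat-8 optical bands

# Merge diverse tree-cover examples (dense forest interiors, forest
# edges, urban trees) into one class; non-tree examples into another.
trees = dense_forest.merge(forest_edge).merge(urban_trees)
training_fc = trees.merge(not_tree)  # each feature carries a 'class' property

# Sample the composite at the training points (30 m for Landsat-8).
training = composite.select(bands).sampleRegions(
    collection=training_fc, properties=['class'], scale=30)

# Train a random forest on the diversified sample and classify.
classifier = ee.Classifier.smileRandomForest(100).train(
    features=training, classProperty='class', inputProperties=bands)
classified = composite.select(bands).classify(classifier)
```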
Sentinel-2 Recommendations: Cloud Masking Issues
Overall, students were split 50-50 on which satellite’s imagery was better for classifying tree cover (average overall accuracy: Sentinel-2 = 73%, Landsat-8 = 70%). As previously mentioned, the larger pixel size of Landsat-8 may have been a contributing factor in the cases where it was outperformed by Sentinel-2. Finding issues with the Sentinel-2 classifications requires a little more investigation, as we expected its smaller pixel size to yield greater accuracy in tree cover identification.
One issue mentioned with regard to Sentinel-2 was that many gap-filled images seemed subject to some interference from cloud remnants, which hindered the detection of tree cover by altering the spectral signatures of affected pixels (Fig. 3).
The cloud cover mask we used can be found here (courtesy of Rodrigo Principe). Adjusting the mask’s inclusion/exclusion thresholds for percent cloud cover and widening the temporal range of images used may mitigate this issue, allowing the user to take fuller advantage of the smaller pixel size of Sentinel-2 data.
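For context, a common alternative (or complement) to that mask is the standard QA60 bitmask approach for Sentinel-2. The sketch below is not Principe’s code, and the date range and 20% scene-level threshold are illustrative values only:

```python
import ee

ee.Initialize()

def mask_s2_clouds(image):
    """Mask opaque clouds (bit 10) and cirrus (bit 11) via the QA60 band."""
    qa = image.select('QA60')
    mask = (qa.bitwiseAnd(1 << 10).eq(0)
              .And(qa.bitwiseAnd(1 << 11).eq(0)))
    return image.updateMask(mask)

# Widening the temporal range and tightening the scene-level cloud
# filter both help reduce cloud remnants in the gap-filled composite.
s2 = (ee.ImageCollection('COPERNICUS/S2_SR')
        .filterDate('2019-01-01', '2019-12-31')
        .filter(ee.Filter.lt('CLOUDY_PIXEL_PERCENTAGE', 20))
        .map(mask_s2_clouds))

composite = s2.median()
```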
Collect Earth Recommendations
Additionally, the grid used to define the dominant land cover class of each point was 70 m x 70 m, so it is difficult to determine whether the dominant land cover class ascribed to the grid also applies to the Landsat-8 or Sentinel-2 pixel with which it overlaps. This would likely be less of an issue if our validation points were all located in areas of homogeneous land cover. However, our validation points often contained multiple land cover types within their definition grid, meaning that Landsat-8 or Sentinel-2 pixels that overlapped the central point might not match the land cover classification that had been assigned to the point based on its 4900 m² neighborhood (Fig. 2, Fig. 4, Fig. 5).
![](https://sites.middlebury.edu/rsportfolioeclinton/files/2020/12/Screen-Shot-2020-12-07-at-8.26.36-PM-1024x272.png)
![](https://sites.middlebury.edu/rsportfolioeclinton/files/2020/12/Screen-Shot-2020-12-07-at-8.21.50-PM.png)
One way to mitigate this might be to use the Collect Earth grid footprint in Google Earth Engine, assigning a tree cover/not tree cover value to each validation point based on the majority of neighborhood pixels from our Landsat-8 or Sentinel-2 classifications, making the validation more specific to our imagery. A rough sketch of this approach follows.
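This sketch assumes a classified image (`classified`) and the Collect Earth points (`validation_points`) already exist from earlier steps, and approximates the 70 m grid with a squared-off 35 m buffer around each point:

```python
import ee

ee.Initialize()

def to_footprint(point):
    """Replace each point with its approximate 70 m x 70 m footprint."""
    point = ee.Feature(point)
    return point.setGeometry(point.geometry().buffer(35).bounds())

footprints = validation_points.map(to_footprint)

# Majority (mode) of our classified pixels within each footprint; the
# result can then be compared against the Collect Earth label.
majority = classified.reduceRegions(
    collection=footprints,
    reducer=ee.Reducer.mode(),
    scale=30)  # use 10 m for a Sentinel-2 classification
```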
We also wonder whether our use of so few validation points, all located in an area that offers no examples of tree cover in developed areas or near cropland, limited the validation (Fig. 6). A more diverse array of validation points to compare against the tree cover classification might make the validation process more accurate. One concern that was also raised was that the validation points were classified based on 2017 data, while the tree cover analyses were based on 2019 imagery. This likely did not cause many issues with our classification, as our validation pixels were mostly confined to a small region in a protected forest area, but if the scope of these pixels were larger and covered developed lands, this would be something to keep in mind.
![](https://sites.middlebury.edu/rsportfolioeclinton/files/2020/12/Screen-Shot-2020-12-10-at-7.38.48-PM.png)