For this exercise, I created gap-filled composite images of Rwanda from Landsat-8 and Sentinel-2 imagery collected during the summer months of 2017-2020. These months were chosen to correspond with Rwanda's dry season, when there is less cloud cover and therefore less opportunity for gaps in the composite image. The range of years was extended in an attempt to fill gaps in the 2019 Landsat composite image. I extended the years rather than the months so that landscape features that change seasonally would remain consistent across all images. Unfortunately, even after extending the range, some gaps remained. Because these gaps are small and isolated, they should not pose a serious threat to the analysis.
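For concreteness, here is a minimal sketch of the kind of dry-season median composite described above, written against the Earth Engine Python API. The collection ID, the June-August month window, the FAO GAUL boundary, and the QA_PIXEL cloud mask are all my assumptions for illustration; the original composites may have been built differently (and a parallel Sentinel-2 collection would be handled the same way).

```python
import ee

ee.Initialize()

# Rwanda boundary from the FAO GAUL dataset (assumed; any country boundary works).
rwanda = ee.FeatureCollection('FAO/GAUL/2015/level0') \
    .filter(ee.Filter.eq('ADM0_NAME', 'Rwanda'))

def mask_landsat_clouds(img):
    # Bit 3 of QA_PIXEL flags clouds in Landsat Collection 2 Level-2 products.
    qa = img.select('QA_PIXEL')
    return img.updateMask(qa.bitwiseAnd(1 << 3).eq(0))

landsat = (ee.ImageCollection('LANDSAT/LC08/C02/T1_L2')
           .filterBounds(rwanda)
           .filterDate('2017-01-01', '2020-12-31')
           .filter(ee.Filter.calendarRange(6, 8, 'month'))  # dry-season months only
           .map(mask_landsat_clouds))

# Taking the per-pixel median across four years of dry-season scenes fills
# most of the gaps that clouds leave in any single year.
composite = landsat.median().clip(rwanda.geometry())
```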

The largest gap in the Landsat composite image was just west of Mount Mikeno, DRC.

My schema consisted of four classes – forestland, wetland, built areas, and cropland. These classes don't capture the full diversity of land uses in Rwanda, and this caused some issues in the classification. For example, large areas of bare soil and grassland were classified as either built areas or cropland because the classifier had no other option. However, visual inspection showed that the two classes with distinct spectral characteristics, wetland and forestland, identified their respective land uses without large systematic errors. The four-class schema was therefore deemed adequate for mapping forest cover, even if distinctions between the other classes were unreliable.
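As an illustration of how a four-class schema like this feeds into a supervised classification, here is a hedged sketch in the same Earth Engine style. The random forest classifier, the band list, the asset paths, and the 'landcover' property name are all assumptions for illustration, not the workflow actually used.

```python
import ee

ee.Initialize()

BANDS = ['SR_B2', 'SR_B3', 'SR_B4', 'SR_B5', 'SR_B6', 'SR_B7']  # Landsat-8 surface reflectance bands

# Stand-in for the dry-season composite built earlier; this asset path is hypothetical.
composite = ee.Image('users/example/rwanda_landsat_composite')

# Hypothetical training points, each carrying an integer class code in a 'landcover'
# property: 0 = forestland, 1 = wetland, 2 = built areas, 3 = cropland.
training_points = ee.FeatureCollection('users/example/rwanda_training_points')

samples = composite.select(BANDS).sampleRegions(
    collection=training_points, properties=['landcover'], scale=30)

classifier = ee.Classifier.smileRandomForest(numberOfTrees=100).train(
    features=samples, classProperty='landcover', inputProperties=BANDS)

classified = composite.select(BANDS).classify(classifier)

# Forest cover is simply the set of pixels assigned to the forestland class.
forest = classified.eq(0)
```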

The Mapathon dataset was very useful for validating the classification outputs. The most obvious advantage was being able to collect a large number of validation points quickly. I also found it useful to have the validation points collected by multiple people: if I collect both the training and validation points myself, I could end up with a classification that is precise but inaccurate, since the validation would simply reflect my own biases. With multiple people collecting validation points, this scenario is less likely. However, the resulting Mapathon spreadsheet is confusing, and paring it down to just the essential information was difficult and time-consuming.
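The kind of paring-down I mean looks roughly like the following; the file name and column names here are hypothetical, since the real Mapathon export uses its own (much messier) headers.

```python
import pandas as pd

# Hypothetical export file; the real spreadsheet has different headers.
raw = pd.read_csv('mapathon_export.csv')

# Keep only the fields needed to build validation points: coordinates, the
# land-cover label, and who collected the point.
keep = ['longitude', 'latitude', 'landcover_label', 'collector_id']
points = raw[keep].dropna(subset=['longitude', 'latitude', 'landcover_label'])

# Collapse the labels to the binary tree / non-tree scheme used in the
# accuracy assessment below.
points['is_tree'] = (points['landcover_label'] == 'trees').astype(int)

points.to_csv('mapathon_validation_points.csv', index=False)
```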

Visualizing the data shows that, while both classifications identified roughly the same regions as forested, the Sentinel classification found much more forest cover than the Landsat classification. This additional tree cover was mostly on the periphery of forests. Some forests, for example, had large interior areas of less dense tree cover, and these pixels were often identified as non-forest by Landsat but as forest by Sentinel. This also explains why the user's accuracy for pixels with trees was noticeably lower with Sentinel imagery than with Landsat imagery: when more pixels are identified as trees overall, it is less likely that a pixel classified as trees actually contains trees.

                      Landsat Non-Trees   Landsat Trees   Sentinel Non-Trees   Sentinel Trees
Producer's Accuracy         89%               56%               84%                 60%
User's Accuracy             77%               78%               78%                 69%
Results of the classifications. Producer's accuracy is the likelihood that a pixel of a given class was identified correctly, whereas user's accuracy is the likelihood that a pixel assigned to a class actually belongs to that class.
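To make those two definitions concrete, here is a small sketch of how producer's and user's accuracy fall out of a confusion matrix. The counts below are purely illustrative, not my actual validation data.

```python
import numpy as np

# Rows = reference (actual) class, columns = mapped (classified) class.
# Class order: [non-trees, trees]. These counts are made up for illustration.
cm = np.array([[900, 100],
               [ 80, 120]])

# Producer's accuracy: of the pixels that really are a class, the share mapped
# as that class (diagonal count / row total).
producers = np.diag(cm) / cm.sum(axis=1)

# User's accuracy: of the pixels mapped as a class, the share that really are
# that class (diagonal count / column total).
users = np.diag(cm) / cm.sum(axis=0)

for i, name in enumerate(['non-trees', 'trees']):
    print(f"{name}: producer's = {producers[i]:.0%}, user's = {users[i]:.0%}")
```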

Overall, my classifications did an okay job of identifying tree cover for both sources, and aside from the difference in user's accuracy, the results were similarly robust for each set of images. Both classifications had an overall accuracy of roughly 75% and a kappa coefficient of 0.45. Producer's accuracy was also much higher for pixels without trees (~85%) than for pixels with trees (~60%), which makes sense given that most of the landscape was not forested.
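For reference, overall accuracy and the kappa coefficient come from the same confusion matrix. The sketch below reuses the illustrative counts from above, so its outputs are not my actual 75% / 0.45 figures.

```python
import numpy as np

# Same illustrative confusion matrix as above (rows = reference, columns = mapped).
cm = np.array([[900, 100],
               [ 80, 120]])

n = cm.sum()
overall = np.trace(cm) / n  # share of all validation pixels classified correctly

# Chance agreement expected from the row (reference) and column (mapped) totals.
expected = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / n**2

# Cohen's kappa: agreement beyond what chance alone would produce.
kappa = (overall - expected) / (1 - expected)

print(f"overall accuracy = {overall:.0%}, kappa = {kappa:.2f}")
```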
