Thoughts on Collect Earth and Mapathon Data

Overall, we found the Collect Earth tool and Mapathon data to be a creative and useful way to collect a lot of sample data for land use classifications. However, a few aspects of the survey were unclear or buggy. One of the most common sources of confusion was how to count trees in a square. It wasn’t clear whether tree count was supposed to be the number of pixels in a square that intersect with trees or the total number of trees in the square. It also wasn’t always clear how to classify a square that had multiple land uses, or what to do when the available imagery was grainy or indecipherable. For these cases, some additional instruction in Collect Earth or a Read Me file could help clear up confusion.

Figure 1. Example squares that have the potential for confusion. The one on the left is split between uses, and the one on the right has a mix of uses.

There were also a couple of smaller bugs. While it was generally easy to tell which year’s imagery we should use, there was one common edge case where it wasn’t clear. For many squares, the best imagery available for target year 2010 was from either 2006 or 2014, and the way the survey was structured didn’t indicate which option was preferable. There were also no questions about land-use confidence on the survey for target year 2010, so this column in the spreadsheet was empty.

Figure 2. It’s not clear how to prioritize data from 2006, 2007, 2013, and 2014.

Confidence was also something many of us had a difficult time reporting. Deciding whether we were confident sometimes felt a little arbitrary, particularly because confidence was presented as a ‘yes’ or a ‘no.’ For example, when we weren’t certain about a square but weren’t completely lost either, neither option seemed appropriate. It might be better to ask for confidence on a sliding scale to account for this range of certainty.

The final exported spreadsheet was generally well formatted but had one major complication. When someone filling out the survey indicated that land use or tree cover didn’t change in a square between years, the data from the previous classification didn’t transfer to the next year. This left many gaps in the data, which the analyst could only fill by looking in other columns. For example, if one wanted to find the squares that were forested in 2019, it wasn’t enough to look at the column for land use in 2019. A square that was forested could have that indicated in the 2019 column, or in the 2010 column with a note that land use didn’t change, or in the 2000 column with two notes that land use didn’t change. If there were a way to copy data to other years in the spreadsheet when land use doesn’t change, the final results would be much easier to interpret.

Figure 3. Since most pixels don’t change from year to year, there are a lot of gaps in the land use column. The data would be much easier to work with if values transferred to these columns from previous years.
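
As a rough illustration of the carry-forward idea, here is a minimal Python sketch. It assumes a hypothetical layout in which each square is a row with one land use cell per target year and a blank cell means "no change"; the column names and the `carry_forward` helper are our own inventions, not part of the Collect Earth export. A real export would also need to distinguish blanks that mean "no imagery" from blanks that mean "no change," which this sketch does not attempt.

```python
# Hypothetical column names: one land use value per target year.
# Assumption: an empty string means "no change since the previous year".
YEARS = ["2000", "2010", "2019"]


def carry_forward(row, years=YEARS):
    """Return a copy of row with blank land use cells filled from the most recent earlier year."""
    filled = dict(row)
    last_seen = None
    for year in years:
        value = filled.get(year) or None
        if value is None:
            # Blank cell: reuse the last recorded class (stays None if there is none yet).
            filled[year] = last_seen
        else:
            last_seen = value
    return filled


# Made-up example rows, not our class's data.
rows = [
    {"id": 1, "2000": "Natural Forest", "2010": "", "2019": ""},
    {"id": 2, "2000": "", "2010": "Cropland", "2019": "Natural Forest"},
]
filled_rows = [carry_forward(r) for r in rows]

# After filling, the 2019 column alone is enough to find forested squares.
forested_2019 = [r["id"] for r in filled_rows if r["2019"] == "Natural Forest"]
```

With the gaps filled this way, a question like "which squares were forested in 2019?" only needs the 2019 column, instead of a search back through 2010 and 2000.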

Despite these issues, our class was still able to collect some useful data in Rwanda using Collect Earth, both about land use change and about how the survey itself was functioning. Unsurprisingly, there was more coverage for recent years than for the early 2000s, with the most commonly used imagery years being 2019, 2017, and 2006. The earliest satellite imagery in the region was from 2002, and because that imagery did not cover the whole study area, the majority of squares had no imagery for base year 2000. Out of the 142 squares we analyzed, we were able to find 31 squares of natural, non-mangrove forest in 2010. Of these, 24 were still forests in 2019, 6 became croplands, and one became a wood plantation. Interestingly, there were 5 squares that transitioned the other way, beginning as cropland but having natural tree cover in 2019.
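
A tally like the one above can be sketched in a few lines of Python. This is only an illustration under assumptions: the dictionary keys, the class labels, and the `count_transitions` helper are hypothetical, the rows are made up, and land use values are assumed to be already filled in for each year.

```python
from collections import Counter


def count_transitions(squares, start_year, end_year):
    """Count (start class, end class) pairs, skipping squares missing either class."""
    pairs = Counter()
    for square in squares:
        start = square.get(start_year)
        end = square.get(end_year)
        if start and end:
            pairs[(start, end)] += 1
    return pairs


# Made-up example rows, not our class's data.
squares = [
    {"2010": "Natural Forest", "2019": "Natural Forest"},
    {"2010": "Natural Forest", "2019": "Cropland"},
    {"2010": "Cropland", "2019": "Natural Forest"},
    {"2010": None, "2019": "Cropland"},  # no 2010 class, so it is skipped
]
transitions = count_transitions(squares, "2010", "2019")
forest_to_cropland = transitions[("Natural Forest", "Cropland")]
```

Keeping the pairs as (start, end) tuples makes it easy to read off both directions of change, such as forest to cropland and cropland to forest, from the same table.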

Land Use	Percent Confident
Natural Forest	87%
Plantation Forest	20%
Cropland	85%
Settlement	60%
Grassland	50%
Shrubland	40%
Barren	67%

Table 1. Percent of land use squares our class assigned with confidence.
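
For anyone repeating this analysis, percentages of this kind can be computed from the yes/no confidence answers roughly as follows. This is a sketch under assumptions: the `(land_use, is_confident)` pair format and the `percent_confident` helper are hypothetical, and the sample responses are invented rather than taken from our spreadsheet.

```python
from collections import defaultdict


def percent_confident(responses):
    """Return {land use: percent of responses marked confident}, rounded to whole percents."""
    totals = defaultdict(int)
    confident = defaultdict(int)
    for land_use, is_confident in responses:
        totals[land_use] += 1
        if is_confident:
            confident[land_use] += 1
    return {lu: round(100 * confident[lu] / totals[lu]) for lu in totals}


# Invented responses for illustration only.
responses = [
    ("Cropland", True),
    ("Cropland", True),
    ("Cropland", False),
    ("Cropland", True),
    ("Plantation Forest", False),
    ("Plantation Forest", False),
    ("Plantation Forest", True),
    ("Plantation Forest", False),
    ("Plantation Forest", False),
]
summary = percent_confident(responses)
```

Reporting the number of responses behind each percentage alongside it would also help, since a class with only a handful of squares can swing widely.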

Of course, the results of Mapathon data are only as reliable as the people filling out the form. To make sure the data actually match what’s on the ground, it’s important that people are trained in viewing satellite imagery in the first place. The two most common land use categories, natural forests and cropland, were the two that our group most commonly reported with confidence, both having an overall confidence of 85% or higher. However, we were by far the least confident in our designations of plantation forests. This may be because we couldn’t tell the difference between plantations and natural forests, but it would be easier to know why confidence was so low if there were an ‘additional comments’ question on the survey. This would allow someone who is gathering Collect Earth data to indicate any nuance that would be helpful to analysts in the future.
