With a threshold score of 0.92, our model predicted that 73,717 tiles across Colombia and the Eastern Caribbean islands contained schools. Although this set certainly included false positives and unverifiable tiles, it reduced the search space for schools in these countries to less than 0.15% of the 52 million tiles. This turned what was previously a nearly impossible task into one that 5 expert mappers could complete in eight days.
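The filtering step and the search-space arithmetic can be sketched as follows. This is a minimal illustration, not the authors' pipeline. It assumes each tile has a single predicted school probability and that tiles scoring at or above 0.92 are kept; the text does not say whether the cut-off is inclusive. The function names are hypothetical.

```python
from typing import Dict, List

# Threshold reported in the text; treating it as inclusive (>=) is an assumption.
SCHOOL_THRESHOLD = 0.92


def select_candidate_tiles(
    scores: Dict[str, float], threshold: float = SCHOOL_THRESHOLD
) -> List[str]:
    """Keep tile IDs whose predicted school probability is at or above the threshold."""
    return [tile_id for tile_id, score in scores.items() if score >= threshold]


def search_space_fraction(num_selected: int, num_total: int) -> float:
    """Fraction of all tiles that mappers still need to review by hand."""
    if num_total <= 0:
        raise ValueError("num_total must be positive")
    return num_selected / num_total
```

With the figures from the text, `search_space_fraction(73_717, 52_000_000)` is about 0.00142, or roughly 0.142% of all tiles. That is consistent with the "less than 0.15%" claim, taking 52 million as the total tile count.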
Validating about 10,000 tiles per day, the mappers identified 10,998 school geolocations, 6,954 of which were previously unmapped schools (schools not included in the initial dataset of 44,665 schools). Our expert mappers tagged 60,568 predicted school tiles as “unrecognized” because they lack clear school features, and we turned these tiles into a heatmap (see the following figure). While validating the model's predictions, we found that schools in rural areas are hard to confirm because residential houses are sometimes used as schools.
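The text separates newly found schools from those already in the initial 44,665-school dataset but does not say how the match was made. One plausible approach, sketched here purely as an assumption, treats a validated school as unmapped when no known school lies within a fixed radius. The 100 m radius and the function names are hypothetical.

```python
import math
from typing import Iterable, List, Tuple

LatLon = Tuple[float, float]  # (latitude, longitude) in degrees

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres


def haversine_m(a: LatLon, b: LatLon) -> float:
    """Great-circle distance in metres between two (lat, lon) points."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (
        math.sin((lat2 - lat1) / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    )
    # min() guards against tiny floating-point overshoot above 1.0.
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(h)))


def find_unmapped(
    validated: Iterable[LatLon], known: List[LatLon], radius_m: float = 100.0
) -> List[LatLon]:
    """Return validated schools with no known school within radius_m metres.

    radius_m is a hypothetical matching tolerance. This brute-force scan is
    O(len(validated) * len(known)); at national scale a spatial index
    (for example a k-d tree on projected coordinates) would be faster.
    """
    return [p for p in validated if all(haversine_m(p, k) > radius_m for k in known)]
```

For example, `find_unmapped([(4.7111, -74.0721), (6.2442, -75.5812)], [(4.7110, -74.0721)])` returns only the second point. The first lies about 11 m from a known school and is treated as already mapped.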
Machine learning model generalizability is an active research area (1, 2). In our study, the school classifier we trained in Colombia generalized well to the Eastern Caribbean islands, where we added 262 previously unmapped schools. If users are working in countries or regions whose terrain differs from Colombia's, we recommend fine-tuning our trained school classifier with a small but representative set of schools from that area. Our school classifier, packaged as a TensorFlow Serving image (GPU version), is now available on Docker Hub. It is open source and free to run as an endpoint for users who want to send DG Vivid zoom-18 image tiles and classify schools in their area of interest.
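The following sketch shows what calling such an endpoint might look like. It is not the published client. It makes several assumptions. The container exposes TensorFlow Serving's standard REST API on its default port 8501. The model is served under the hypothetical name `school_classifier`. Its signature accepts encoded image bytes passed in TF Serving's `{"b64": ...}` form. DG Vivid tiles also follow the common XYZ (slippy map) numbering. The helper names are illustrative, and the model's actual signature should be checked before relying on this payload shape.

```python
import base64
import json
import math
import urllib.request
from typing import Any, Dict, List, Tuple

ZOOM = 18


def latlon_to_tile(lat: float, lon: float, zoom: int = ZOOM) -> Tuple[int, int]:
    """Convert WGS84 lat/lon (degrees) to XYZ (slippy map) tile x, y at the given zoom.

    Valid for Web Mercator latitudes (about -85.0511 to 85.0511 degrees).
    """
    n = 2 ** zoom
    lat_rad = math.radians(lat)
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so that lon = 180 or extreme latitudes still map to a valid tile index.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def build_predict_request(
    tiles: List[bytes],
    host: str = "localhost",
    port: int = 8501,
    model_name: str = "school_classifier",  # hypothetical model name
) -> Tuple[str, bytes]:
    """Build a TensorFlow Serving REST predict URL and JSON body for raw image tiles."""
    url = f"http://{host}:{port}/v1/models/{model_name}:predict"
    # TF Serving's REST API accepts binary inputs as {"b64": "<base64 string>"}.
    payload = {"instances": [{"b64": base64.b64encode(t).decode("ascii")} for t in tiles]}
    return url, json.dumps(payload).encode("utf-8")


def send_predict_request(url: str, body: bytes, timeout: float = 30.0) -> Dict[str, Any]:
    """POST the request to a running TF Serving container and return the parsed JSON."""
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the container running, `send_predict_request(*build_predict_request([tile_bytes]))` should return a JSON object whose `predictions` field holds one output per tile, provided the served signature matches the assumption above. Splitting request construction from sending keeps the payload logic testable without a live server.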