Developed for UNICEF

AI models

Results and conclusions

Country models: tile-based school classifier

All country models performed well on the validation dataset: every model's validation F1 score was above 0.9 except for the Niger country model (0.87). The detailed model evaluation metrics, including precision, recall, and F-beta scores, can be viewed in the model evaluation column of Table 4. You can also click the ‘eval’ link to see our TensorBoard model training monitoring graphs.

Table 4: All the country and regional models under the tile-based school classifier performed well over the validation datasets; you can click the ‘eval’ TensorBoard Dev link or scrutinize the F1, recall, and precision scores under the model evaluation column.
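For reference, the precision, recall, and F-beta scores reported in Table 4 follow the standard definitions. A minimal sketch, assuming per-tile binary predictions and ground-truth labels are available as arrays (the variable names are illustrative):

```python
import numpy as np

def classification_scores(y_true, y_pred, beta=1.0):
    """Precision, recall, and F-beta for binary tile labels (1 = school, 0 = not-school)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # beta = 1.0 gives the F1 scores quoted above (e.g. 0.9 for Kenya, 0.87 for Niger).
    f_beta = ((1 + beta**2) * precision * recall /
              (beta**2 * precision + recall)) if precision + recall else 0.0
    return precision, recall, f_beta
```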

Despite their varied structure, many schools have identifiable overhead signatures that make them possible to detect in high-resolution imagery with deep learning techniques. Approximately 18,000 previously unmapped schools across five African countries (Kenya, Rwanda, Sierra Leone, Ghana, and Niger) were found in satellite imagery with a deep learning classification model. These 18,000 schools were validated by expert mappers and added to the map. We also added nearly 4,000 unmapped schools in Kazakhstan and Uzbekistan in Central Asia, and an additional 1,097 schools in Honduras. In addition to finding previously unmapped schools, the models were able to identify already mapped schools at rates of up to ~80%, depending on the country. To facilitate running model inference across over 71 million zoom-18 tiles of imagery, our team relied on our open source tool ML-Enabler.
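The zoom-18 tiles referred to here are standard Web Mercator (slippy-map) tiles. A minimal sketch of how a longitude/latitude point maps to its zoom-18 tile index; this is the generic tiling formula, not part of ML-Enabler's own API:

```python
import math

def lonlat_to_tile(lon, lat, zoom=18):
    """Convert a WGS84 point to its (x, y) slippy-map tile index at the given zoom level."""
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    return x, y

# Example: the zoom-18 tile containing central Nairobi.
print(lonlat_to_tile(36.8219, -1.2921))
```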

Table 5: The main findings from the tile-based school classifier country models. 1) The column “Known school” contains validated school geolocations that have a clear school feature (see Table 1, “TOTAL POINTS PER TAG AFTER VALIDATION”, “YES” category); 2) the column “ML threshold” is the model confidence score threshold for each country model; 3) the column “Total detected” is the total number of schools detected at that ML threshold; 4) “ML output validation” gives, after expert mapper validation of the ML outputs, the number of confirmed schools (“Yes”), unrecognized schools (“Un-reg”), and non-schools (“No”); 5) “True capture” is the percentage of real schools correctly predicted by the ML model; the higher the percentage, the better the country ML model performs; 6) “Difference” is the number of schools that the ML models did not find; 7) “Double confirmed” is the number of schools detected by the ML models that are also known schools; 8) “Unmapped schools” are schools that are not currently on the map but were detected by the ML models and confirmed by the expert mappers.
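The derived columns in Table 5 are simple arithmetic over the validation counts. A minimal sketch, using hypothetical counts rather than figures from the report:

```python
def summarize_country(known, confirmed_yes, double_confirmed):
    """Derive the Table 5 columns from raw counts (illustrative values only).

    known            -- validated "Known school" points
    confirmed_yes    -- ML detections tagged "Yes" by expert mappers
    double_confirmed -- "Yes" detections that match a known school
    """
    true_capture = 100.0 * double_confirmed / known   # % of known schools the model found
    difference = known - double_confirmed             # known schools the model missed
    unmapped = confirmed_yes - double_confirmed       # confirmed schools not yet on the map
    return true_capture, difference, unmapped

# Hypothetical example, not actual report numbers:
print(summarize_country(known=20_000, confirmed_yes=18_000, double_confirmed=12_000))
```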

Kenya

The tile-based school classifier model for Kenya was trained on a randomly selected dataset of 6,000 “school” and 6,200 “not-school” tiles. The model's F1 score over the validation dataset is 0.9, which indicates good performance.

There were more than 20,000 known schools that were validated and confirmed by expert mappers in Kenya. When we used an ML confidence score threshold (“ML threshold”) of 0.6, ~37,000 schools were detected, and ~18,000 of them had very distinguishable school features and were confirmed as “YES” schools by the human expert mappers. The ML model was able to capture ~60% of known schools in Kenya and found another 8,000 unmapped schools (Table 5).

Figure 9: We plotted the true positives and negatives of the model performance (left); the model was able to separate the two categories very well. The ROC curve (right) for the Kenya country model shows that when we use an ML threshold of 1.0, the false-positive rate is only 6%.
Table 5: There were more than 20,000 known schools that were validated and confirmed by expert mappers in Kenya. With an ML confidence score threshold (“ML threshold”) of 0.6, ~37,000 schools were detected, and ~18,000 of them had very distinguishable school features. The ML model was able to detect ~60% of known schools in Kenya and found another 8,000 unmapped schools.
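The “ML threshold” in each country section is a cut-off applied to the per-tile confidence scores, and the ROC curve in Figure 9 relates that cut-off to the false-positive rate. A minimal sketch with scikit-learn, using illustrative scores and labels rather than the project's validation data:

```python
import numpy as np
from sklearn.metrics import roc_curve

# scores: per-tile model confidence in [0, 1]; labels: 1 = school, 0 = not-school
scores = np.array([0.05, 0.41, 0.63, 0.72, 0.88, 0.97])   # illustrative values
labels = np.array([0,    0,    1,    0,    1,    1])

# Tiles flagged as schools at the Kenya threshold of 0.6
detected = scores >= 0.6
print("detected tiles:", int(detected.sum()))

# ROC curve: false-positive rate vs true-positive rate across all thresholds
fpr, tpr, thresholds = roc_curve(labels, scores)
for f, t, th in zip(fpr, tpr, thresholds):
    print(f"threshold={th:.2f}  FPR={f:.2f}  TPR={t:.2f}")
```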

Rwanda

For the tile-based school classifier model for Rwanda, we selected all 3,200 positive tiles and a similar number of negative tiles for the training dataset. The model's F1 score over the validation dataset is 0.94, which indicates good performance.

There were more than 3,200 known schools that were validated and confirmed by expert mappers in Rwanda. With an ML confidence score threshold (“ML threshold”) of 0.87, ~6,500 schools were detected, and ~3,700 of them had very distinguishable school features. The ML model was able to detect ~60% of known schools in Rwanda and found more than 1,500 unmapped schools (Table 6).

Figure 10: We plotted the true positives and negatives of the model performance (left); the model was able to separate the two categories very well. The ROC curve (right) for the Rwanda country model shows that when we use an ML threshold of 1.0, the false-positive rate is only 9%.
Table 6: There were more than 3,200 known schools that were validated and confirmed by expert mappers in Rwanda. With an ML confidence score threshold (“ML threshold”) of 0.87, ~6,500 schools were detected, and ~3,700 of them had very distinguishable school features. The ML model was able to detect ~60% of known schools in Rwanda and found more than 1,500 unmapped schools.

Sierra Leone

For the tile-based school classifier model for Sierra Leone, we selected all 9,800 “school” tiles and a similar number of negative “not-school” tiles for the training dataset. The model's F1 score over the validation dataset is 0.91, which indicates good performance.

There were more than 9,800 known schools that were validated and confirmed by expert mappers in Sierra Leone. With an ML confidence score threshold (“ML threshold”) of 0.5, ~17,000 schools were detected, and ~5,000 of them had very distinguishable school features. The ML model was able to detect ~75% of known schools and found another 1,300 unmapped schools (Table 7).

Figure 11: We plotted the true positives and negatives of the model performance (left); the model was able to separate the two categories very well. The ROC curve for the Sierra Leone country model is shown on the right.
Table 7: There were more than 9,800 known schools that were validated and confirmed by expert mappers in Sierra Leone. With an ML confidence score threshold (“ML threshold”) of 0.5, ~17,000 schools were detected, and ~5,000 of them had very distinguishable school features. The ML model was able to detect ~75% of known schools and found another 1,300 unmapped schools.

Niger

For the tile-based school classifier model for Niger, we selected all 1,600 positive tiles and a similar number of negative tiles for the training dataset. The model's F1 score over the validation dataset is 0.87; Niger is the only country whose model achieved an F1 score under 0.9, but this still indicates a well-performing model.

There were only 1,100 known schools that were validated and confirmed by expert mappers in Niger. With an ML confidence score threshold (“ML threshold”) of 0.92, ~4,900 schools were detected, and ~1,800 of them had very distinguishable school features. The ML model was able to detect ~79% of known schools and found another 1,600 unmapped schools (Table 8). Note: the Niger country model was trained with school data from Niger, Mali, Sudan, and Chad.

Figure 12: We plotted the true positives and negatives of the model performance (left); the model was able to separate the two categories very well. The ROC curve for the Niger country model is shown on the right.
Table 8: There were only 1,100 known schools that were validated and confirmed by expert mappers in Niger. With an ML confidence score threshold (“ML threshold”) of 0.92, ~4,900 schools were detected, and ~1,800 of them had very distinguishable school features. The ML model was able to detect ~79% of known schools and found another 1,600 unmapped schools. Note: the Niger country model was trained with school data from Niger, Mali, Sudan, and Chad.

Kazakhstan

For the tile-based school classifier model for Kazakhstan, we selected all 6,000 positive tiles and a similar number of negative tiles for the training dataset. The model's F1 score over the validation dataset is 0.92, which indicates good performance.

There were only 6,000 known schools that were validated and confirmed by expert mappers in Kazakhstan. With an ML confidence score threshold (“ML threshold”) of 0.87, ~8,300 schools were detected, and ~4,000 of them had very distinguishable school features. The ML model was able to detect ~61% of known schools and found another 1,560 unmapped schools (Table 9).

Figure 12: We plotted the true positives and negatives of the model performance (left); the model was able to separate the two categories very well. The ROC curve for the Kazakhstan country model is shown on the right.
Table 9: There were only 6,000 known schools that were validated and confirmed by expert mappers in Kazakhstan. With an ML confidence score threshold (“ML threshold”) of 0.87, ~8,300 schools were detected, and ~4,000 of them had very distinguishable school features. The ML model was able to detect ~61% of known schools and found another 1,560 unmapped schools.

Honduras

For the tile-based school classifier model for Honduras, we selected all 4,300 positive tiles and a similar number of negative tiles for the training dataset. The model's F1 score over the validation dataset is 0.90, which indicates good performance.

There were only 4,300 known schools that were validated and confirmed by expert mappers in Honduras. With an ML confidence score threshold (“ML threshold”) of 0.98, ~14,400 schools were detected, and ~1,900 of them had very distinguishable school features. The ML model was able to detect only ~42% of known schools and found another 1,100 unmapped schools (Table 10).

Figure 12: We plotted the true positives and negatives of the model performance (left); the model was able to separate the two categories very well. The ROC curve for the Honduras country model is shown on the right.
Table 10: There were only 4,300 known schools that were validated and confirmed by expert mappers in Honduras. With an ML confidence score threshold (“ML threshold”) of 0.98, ~14,400 schools were detected, and ~1,900 of them had very distinguishable school features. The ML model was able to detect only ~42% of known schools and found another 1,100 unmapped schools.

Ghana (Test country)

Ghana is one of the two test countries, along with Uzbekistan, meaning we did not build a tile-based school classifier country model for Ghana. For a test country, we applied a model trained elsewhere to run inference over that country's dataset. Therefore, the known schools in this context are schools extracted from OSM without any validation by human expert mappers. We used the West African model (trained with school and not-school data from Sierra Leone and Niger) to run model inference over the supertiles of Ghana. At an ML confidence score of 0.92, we were able to detect 15,500 schools; after expert mapper validation, more than 6,400 schools were selected. The true capture is only 17% (the intersection of double-confirmed schools, i.e. ML-detected schools validated by human expert mappers that matched known schools), which indicates that schools in Ghana are heavily under-mapped, and is why we were able to discover nearly 6,000 unmapped schools.
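The true capture figure depends on matching ML-detected school locations against the known school points. A minimal sketch of that matching step with GeoPandas, assuming both layers are point geometries in WGS84 and using an illustrative 100 m match radius (the exact tolerance used in the project is not specified here):

```python
import geopandas as gpd

def count_double_confirmed(detected: gpd.GeoDataFrame,
                           known: gpd.GeoDataFrame,
                           radius_m: float = 100.0) -> int:
    """Count ML-detected schools that fall within radius_m of a known school point."""
    # Re-project to a metric CRS so the buffer radius is in metres.
    detected_m = detected.to_crs(epsg=3857)
    known_m = known.to_crs(epsg=3857)

    # Buffer the known schools and spatially join the detections against them.
    known_buffered = known_m.copy()
    known_buffered["geometry"] = known_m.buffer(radius_m)
    matched = gpd.sjoin(detected_m, known_buffered, predicate="within")

    # Each detection counts once even if it falls inside several buffers.
    return matched.index.nunique()
```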

Table 11: As a test country, Ghana did not get its own country model; we only applied the West African model, trained with the Sierra Leone and Niger datasets, over the populated supertiles in Ghana. We were only able to extract about 3,000 schools from OSM. After the expert mappers validated the machine learning model output, we found nearly 6,000 unmapped schools in Ghana.

Uzbekistan (Test country)

Uzbekistan is one of the two test countries, along with Ghana, meaning we did not build a tile-based school classifier country model for Uzbekistan. For a test country, we applied a model trained elsewhere to run inference over that country's dataset. Therefore, the known schools in this context are schools extracted from OSM without any validation by human expert mappers. We used the Kazakhstan country model (trained with school and not-school data from Kazakhstan) to run model inference over the supertiles of Uzbekistan. At an ML confidence score of 0.95, we were able to detect more than 10,010 schools; after expert mapper validation, more than 3,100 schools were mapped. The true capture is only 29% (the intersection of double-confirmed schools, i.e. ML-detected schools validated by human expert mappers that matched known schools), which indicates that schools in Uzbekistan are heavily under-mapped, and is why we were able to discover more than 2,200 unmapped schools.

Table 12: As a test country, Uzbekistan did not get its own country model; we only applied the Kazakhstan country model over the populated supertiles in Uzbekistan. We were only able to extract about 3,650 schools from OSM. After the expert mappers validated the machine learning model output, we found more than 2,200 unmapped schools in Uzbekistan.

Regional and global models performance in Kenya

In this section, we compare the performance of the country, regional, and global models. In this context, a country model is trained with country-specific school and not-school data, while a regional model is trained with datasets from countries that are geographically close to each other. For instance, the East African regional model is trained with the Kenya and Rwanda datasets, and the West African regional model was trained with the Niger, Sudan, Mali, Chad, and Sierra Leone datasets. The global model, as the name suggests, was trained with all countries' datasets.

Kenya is the only country with over 20,000 known schools that have been validated by expert mappers. Only 6,000 schools in Kenya were randomly selected to train the Kenya country model, which means more than 14,000 known schools were left over as “test data” that were never exposed to the model. Therefore, Kenya is the perfect country to answer questions like:

  • How do regional and country models perform differently?
  • Is it necessary to build country-specific models, or can we rely only on a regional model that generalizes well across countries?

By plotting the results of model inference over Kenya with the Kenya country, East African regional, and global models, we found:

  • The regional model outperformed the country model: it produced fewer ML-detected schools (Figure 13, blue bar), which means fewer false positives, and it was able to detect more schools (the red bar) that fell under known schools.
  • Both the country and regional models performed very well despite being exposed to only a quarter of the available known schools in Kenya.
  • In the future, we could use transfer learning or fine-tuning to train a regional model instead of developing country-specific models (see the sketch after this list).
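A minimal sketch of what such fine-tuning could look like with tf.keras; the checkpoint path, input size, and placeholder dataset below are illustrative assumptions rather than the exact training setup used for these models:

```python
import numpy as np
import tensorflow as tf

# Load the trained regional model (path is illustrative).
regional = tf.keras.models.load_model("east_africa_regional.h5")

# Freeze the convolutional backbone and fine-tune only the classification head
# on the new country's school / not-school tiles.
for layer in regional.layers[:-2]:
    layer.trainable = False

regional.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
                 loss="binary_crossentropy",
                 metrics=[tf.keras.metrics.Precision(), tf.keras.metrics.Recall()])

# Placeholder tile dataset: replace with real (256x256 RGB tile, 0/1 label) pairs.
tiles = np.zeros((8, 256, 256, 3), dtype="float32")
labels = np.zeros((8, 1), dtype="float32")
country_ds = tf.data.Dataset.from_tensor_slices((tiles, labels)).batch(4)

regional.fit(country_ds, validation_data=country_ds, epochs=1)
```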
Figure 13: Model inference with the Kenya country model, the East African regional model (trained with the Kenya and Rwanda datasets), and the global model over 6 million supertiles in Kenya. With the model confidence score set to 0.6, nearly 37,000 schools (the blue bar) were detected by the country model, > 35,000 schools by the regional model, and > 34,000 schools by the global model. The global model was trained with all countries' data. 10,896 schools (the red bar) fell under known schools with the regional model, and 10,516 schools with the country model. The conclusion is that the regional model performs better in Kenya: it predicted fewer false positives (the blue bar) but was able to predict more schools (the red bar) for Kenya.

Direct school detection in Kenya

Direct school detection model

We started training a direct school detection model over Kenya, and the initial results look promising. Model training will take about another 30 hours over the weekend until it reaches 50,000 steps. Currently, the screenshot shows the result at 19,581 steps (Figure 14). The image on the left is the model prediction over the validation dataset, and on the right is the ground truth. We then ran model inference with the trained direct school detection model in Kenya. The speed of the ML inference was 23.4 tiles per second, or 2,906,997 supertiles over 34.5 hours. We still used ML-Enabler to perform our model inference. Compared to the tile-based school classifier, the direct school detection model in Kenya performs less well, because object detection is in general less accurate, and slower to train and run inference with, than image classification.
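As a quick sanity check, the reported runtime follows directly from the tile count and throughput:

```python
tiles = 2_906_997            # supertiles processed in Kenya
tiles_per_second = 23.4      # reported inference throughput
hours = tiles / tiles_per_second / 3600
print(f"{hours:.1f} hours")  # ≈ 34.5 hours, matching the reported runtime
```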

ML-Enabler35 is a machine learning integration tool built in partnership between Development Seed and the Humanitarian OpenStreetMap Team36. ML-Enabler is a registry for machine learning models for OpenStreetMap and aims to provide an API for tools like the Tasking Manager37 to directly query predictions.

Figure 14: The GIF shows how the model's performance improves as the number of training steps increases.
Figure 15: The confusion matrix for the binary direct school detection model in Kenya. The model was able to separate background tiles from school tiles reasonably well.

Direct school detection model result validation

DevSeed Data Team expert mappers tagged the detected schools from the model inference as “Yes”, “No”, or “Unrecognized”, as we did for the tile-based school classifier models. The expert mappers also adjusted the bounding box (bbox) to capture the school building complex when the detection was “Yes”. The model inference result validation proceeded at a speed of 431 schools per minute.

  1. Bbox with a school detected. We tagged this case as “school pattern = yes” only if part of the school buildings was detected; we also adjusted the bbox to cover the whole school boundary.
  2. Bbox with an unrecognized school. We tagged this case as “school pattern = unrecognized” if there are buildings in the bbox but we cannot recognize school patterns.
  3. Bbox without any building detected. We tagged this as “school pattern = no” for detections where there are no buildings.

__________________________________

35 "On-demand machine learning predictions for mapping tools ...." 5 Aug. 2020, https://developmentseed.org/blog/2020-08-05-on-demand-machine-learning-predictions-for-mapping-tools/. Accessed 17 Feb. 2021.
36 "Humanitarian OpenStreetMap Team." https://www.hotosm.org/. Accessed 17 Feb. 2021.
37 "HOT Tasking Manager." https://tasks.hotosm.org/. Accessed 17 Feb. 2021.