Developed for UNICEF



Development Seed is delighted to deliver Project Connect Phase III to the UNICEF Office of Innovation. The project applied scalable machine learning over high-resolution satellite imagery to map every school in Kenya, Rwanda, Sierra Leone, Niger, Honduras, Ghana, Kazakhstan, and Uzbekistan. This large AI-assisted, human-in-the-loop workflow aims to put every school on the map to accelerate connectivity, online learning, and other initiatives for children and their communities, and to drive economic stimulus under the UNICEF GIGA mission, particularly in lower-income countries.

Accurate data about school locations is critical to providing quality education and promoting lifelong learning, as set out in UN Sustainable Development Goal 4 (SDG4), to ensuring equal access to opportunity (SDG10), and eventually to reducing poverty (SDG1). However, in many countries, records of educational facilities are often inaccurate, incomplete, or non-existent.

An accurate, comprehensive map of schools - where no school is left behind - is necessary to measure and improve the quality of learning. Such a map, in combination with connectivity data collected by UNICEF’s Project Connect initiative, can be used to reduce the digital divide in education and improve access to information, digital goods, and opportunities for entire communities. In addition, understanding the location of schools can help governments and international organizations gain critical insights into the needs of vulnerable populations, and better prepare for and respond to exogenous shocks such as disease outbreaks or natural disasters.

Under the direction of the UNICEF GIGA mission, we developed AI-assisted rapid school mapping in eight countries in Asia, Africa, and Central America. The AI models we built include a tile-based school classifier, a high-performing binary classification convolutional neural network, used to search for schools across 71 million zoom-18 tiles (256 x 256 pixels per tile, in 60 cm high-resolution Maxar Vivid imagery). The tile-based school classifier models we developed and trained include six country-specific models tuned to perform well within each nation’s territorial boundaries, two regional models, and a global model. The two regional models were an East African model trained with school data from Kenya and Rwanda, and a West African model trained with datasets from Sierra Leone and Niger. The global model was trained with the school datasets from all the countries. Both the regional and global models were trained to generalize across geo-diverse landscapes. By testing the East African regional model and the Kenya country model in Kenya, we found that the regional model outperformed the country-specific one, indicating that a model exposed to diverse landscapes and school features can outperform a model trained on a more limited feature set.
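To give a sense of the scale involved, the number of zoom-18 slippy-map tiles covering a region follows directly from the standard Web Mercator tiling scheme. The helper below is a minimal illustrative sketch (not part of our pipeline) of that tile arithmetic:

```python
import math

def deg2tile(lat: float, lon: float, zoom: int) -> tuple:
    """Convert a lat/lon pair to slippy-map (Web Mercator) tile indices."""
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    return x, y

def tile_count(min_lat, min_lon, max_lat, max_lon, zoom=18):
    """Number of zoom-level tiles needed to cover a bounding box."""
    x0, y1 = deg2tile(min_lat, min_lon, zoom)  # south-west corner (larger y)
    x1, y0 = deg2tile(max_lat, max_lon, zoom)  # north-east corner (smaller y)
    return (x1 - x0 + 1) * (y1 - y0 + 1)
```

At zoom 18 a single 256 x 256 pixel tile covers roughly 150 m on a side near the equator, which is why national-scale coverage quickly adds up to tens of millions of tiles.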

Following our past successful research and school mapping in Colombia (see the report and blog post), we identified a set of school features that are identifiable in overhead high-resolution satellite imagery, e.g. building size, shape, and facilities. Compared to the surrounding residential buildings, schools are bigger, and their shapes often resemble the letters U, O, H, E, or L.

Despite their varied structure, many schools have identifiable overhead signatures that make them possible to detect in high-resolution imagery with modern deep learning techniques.

These identifiable school features were feature-engineered as supertiles. A supertile is 512 x 512 x 3 rather than the regular 256 x 256 x 3 slippy-map tile; it contains more spatial information at a higher spatial resolution (see the following figure). We found that supertiles, with their larger image dimensions and richer spatial information, boosted model performance for all the country-specific models.

A zoom-17 supertile is made of four zoom-18 tiles, so a regular zoom-17 tile (on the left) carries only a quarter of a supertile’s pixels over the same area. The supertile offers higher spatial resolution and a larger image dimension (512 x 512 pixels) than the original 256 x 256 pixel tile at zoom 17.
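Stitching a supertile is straightforward in the slippy-map scheme: the four zoom-18 children of zoom-17 tile (x, y) are (2x, 2y), (2x+1, 2y), (2x, 2y+1), and (2x+1, 2y+1). A minimal sketch, assuming the child tiles are already fetched as NumPy arrays keyed by their (x, y) indices:

```python
import numpy as np

def child_tiles(x: int, y: int):
    """The four zoom-(z+1) children of a zoom-z slippy-map tile (x, y)."""
    return [(2 * x, 2 * y), (2 * x + 1, 2 * y),
            (2 * x, 2 * y + 1), (2 * x + 1, 2 * y + 1)]

def make_supertile(tiles: dict, x: int, y: int) -> np.ndarray:
    """Stitch the four 256x256 zoom-18 children of zoom-17 tile (x, y)
    into one 512x512x3 supertile. `tiles` maps (x, y) -> 256x256x3 array."""
    out = np.zeros((512, 512, 3), dtype=np.uint8)
    for cx, cy in child_tiles(x, y):
        col = (cx - 2 * x) * 256  # 0 for the west pair, 256 for the east
        row = (cy - 2 * y) * 256  # 0 for the north pair, 256 for the south
        out[row:row + 256, col:col + 256] = tiles[(cx, cy)]
    return out
```

In slippy-map coordinates y grows southward, so the (2x, 2y) child lands in the top-left quadrant of the stitched image.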

All six country models performed well on the validation dataset: every model’s validation F1 score was above 0.9 except the Niger country model’s (0.87). The detailed model evaluation metrics, including precision, recall, and F-beta scores, are shown in the “Model evaluation” column of Table 4.
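For reference, the F-beta family of scores reported in Table 4 combines precision and recall into a single number; F1 is the special case beta = 1. A small self-contained sketch of the standard formula:

```python
def f_beta(precision: float, recall: float, beta: float = 1.0) -> float:
    """F-beta score: beta=1 gives F1; beta>1 weights recall more heavily,
    beta<1 weights precision more heavily."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# e.g. a model with precision 0.90 and recall 0.85 on held-out tiles
# has F1 = 2 * 0.90 * 0.85 / (0.90 + 0.85), roughly 0.874
```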

Approximately 18,000 previously unmapped schools across five African countries - Kenya, Rwanda, Sierra Leone, Ghana, and Niger - were found in satellite imagery with a deep learning classification model. These 18,000 schools were validated by expert mappers and added to the map. We also added nearly 4,000 unmapped schools in Kazakhstan and Uzbekistan in Asia, and an additional 1,097 schools in Honduras. In addition to finding previously unmapped schools, the models were able to identify up to ~80% of already mapped schools, depending on the country. To run model inference across over 71 million zoom-18 tiles of imagery, our team relied on our open-source tool ML-Enabler.

The detailed and main findings from the tile-based school classifier country models are in the following table:
1) “Known schools”: validated school geolocations with clear school features (see the “YES” category of Table 1, “TOTAL POINTS PER TAG AFTER VALIDATION”);
2) “ML threshold”: the model confidence score threshold for each country model;
3) “Total detected”: the total number of schools detected at that ML threshold;
4) “ML output validation”: after the expert mappers validated the ML outputs, the number of confirmed schools (“Yes”), unrecognizable schools (“Un-reg”), and non-schools (“No”);
5) “True capture”: the percentage of real schools correctly predicted by the ML model - the higher the percentage, the better the country model performs;
6) “Difference”: the number of schools the ML models did not find;
7) “Double confirmed”: the number of schools detected by the ML models that are also known schools;
8) “Unmapped schools”: schools that are not currently on the map but were detected by the ML models and confirmed by the expert mappers.
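The table’s “True capture”, “Double confirmed”, and “Difference” quantities reduce to simple set operations over school identifiers once the model outputs have been filtered by the country threshold. A minimal illustrative sketch (the identifiers and scores are made up for the example):

```python
def filter_by_threshold(scores: dict, threshold: float) -> set:
    """Keep only tiles whose model confidence meets the country's ML threshold."""
    return {tile for tile, score in scores.items() if score >= threshold}

def true_capture(known: set, detected: set) -> float:
    """Percentage of known schools that the model re-detected."""
    return 100.0 * len(known & detected) / len(known)
```

With these definitions, “Double confirmed” is `known & detected` and “Difference” is `known - detected`; “Unmapped schools” are the detections outside `known` that expert mappers then confirm.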

The six country models we trained obtained high F1 scores on the test datasets. The models were able to identify up to ~80% of already mapped schools, depending on the country. By the end of the project, we had added 18,000 unmapped/missing schools in Africa, nearly 4,000 in Asia, and more than 1,000 in Honduras.
The two test countries, Ghana and Uzbekistan, did not have country models of their own. We ran model inference in Ghana with the West African regional model (trained on Niger and Sierra Leone) and applied the Kazakhstan country model to Uzbekistan. Even with the high ML confidence thresholds used to filter the ‘predicted’ school tiles in both countries (see the ML threshold column), we were able to identify a fair number of double-confirmed and unmapped schools in each. The high ML thresholds keep the false-positive rate low.

The unmapped schools in each country can be explored via the GIFs or our online map viewers.

Kenya map viewer
Rwanda map viewer
Sierra Leone map viewer
Niger map viewer
Ghana map viewer
Kazakhstan map viewer
Uzbekistan map viewer
Honduras map viewer

Another AI model we explored in this phase of the project is a direct school detection model, based on an object detection CNN. The initial results of the direct school detection model over Kenya looked promising. Model training took about another 30 hours to reach 50,000 steps, after which we ran inference with the trained model in Kenya. The ML inference speed was 23.4 supertiles per second (1,404 supertiles per minute), or 2,906,997 supertiles over 34.5 hours. We again used ML-Enabler to perform model inference, as we did for the tile-based school classifier models in the eight countries. Compared to the tile-based school classifier, the direct school detection model in Kenya performed worse (explore our online map), because object detection models are generally less accurate, and slower to train and run inference with, than image classification models.

The GIF shows how model performance improves as the training steps increase, with the ground truth on the right and the prediction on the left.

ML-Enabler1 is a machine learning integration tool built in partnership between Development Seed and the Humanitarian OpenStreetMap Team2. ML-Enabler is a registry for machine learning models in OpenStreetMap and aims to provide an API for tools like Tasking Manager3 to directly query predictions. Inference with the tile-based school classifier is far faster than with the direct school detection model: with ML-Enabler it reached up to 200 supertiles per second, or 12,000 supertiles per minute. Running over 18 million supertiles for the eight countries took us about 25 hours to scan and search for schools.
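The reported throughput figures can be sanity-checked with simple arithmetic; the helper below is purely illustrative:

```python
def hours_to_scan(n_supertiles: int, supertiles_per_sec: float) -> float:
    """Wall-clock hours to run inference over a set of supertiles at a given rate."""
    return n_supertiles / supertiles_per_sec / 3600.0

# Classifier with ML-Enabler: 12,000 supertiles/min = 200/sec,
# so 18 million supertiles take about 25 hours.
# Object detector: 23.4 supertiles/sec, so ~2.9 million supertiles
# take roughly 34.5 hours.
```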

AI models bring their own pros and cons. Model scalability can be the main bottleneck when an application reaches global scale. In the challenges and discussion section, we discuss in detail the challenges we faced around model scalability, the pros and cons of the AI models, and a roadmap for future work that may involve human-in-the-loop and active learning methods. We were able to solve the model scalability issues by mindfully designing the internal data validation, running model training on Google Kubernetes Engine with Kubeflow, and running model inference with our open-sourced tool, ML-Enabler.

However, relying on distinctive school features that can be feature-engineered into the model also means we may introduce human bias. The model may end up recognizing schools in distinctive building complexes - with similar rooftops, swimming pools, or basketball courts - but perform poorly on schools with smaller buildings, in poorer neighborhoods, or in densely populated urban areas. Bringing in humans-in-the-loop is therefore critical, especially people with local knowledge of local school features; such knowledge is hard to transfer to expert mappers who may have grown up in a different cultural and architectural context. Human-in-the-loop alone, though, is not enough if our end goal is a machine learning model that speeds up the search for missing schools and puts them on the map. An active learning approach, in which humans and AI work together to improve the model's predictive power, is the foreseeable way forward, and we have developed the necessary tooling and technology under this phase of work to achieve this goal.

For the tile-based school classifier, we trained six country models (Kenya, Sierra Leone, Kazakhstan, Rwanda, Niger, and Honduras), two regional models (the West and East Africa regional models), and a global model on the combined training dataset from all six countries. The model training and experiments were deployed to GKE with Kubeflow and TFJob.


1 "On-demand machine learning predictions for mapping tools ...." 5 Aug. 2020, Accessed 17 Feb. 2021.
2 "Humanitarian OpenStreetMap Team." Accessed 17 Feb. 2021.
3 "HOT Tasking Manager." Accessed 17 Feb. 2021.