AI models
Methodology
Tile-based school classifier
To train the tile-based school classifier quickly, we created a deep learning Python package called “School-Classifier”. The package builds on models pre-trained on ImageNet, with Xception as the main backbone. It is designed to make it quick to install, transfer-learn, and fine-tune image classifiers from the built-in pre-trained models, and to deploy training to Google Cloud Kubernetes Engine (GKE)26 with Kubeflow. Model training and experiments were deployed with a TFJob YAML file. The pipeline is not specific to schools and can be used to train other classifiers. The models are written in Keras, a high-level Python library for quickly building neural networks, with Google's TensorFlow as the backend. Here we use the TFJob YAML file for the Kazakhstan country model as a showcase (Figure 6).
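The general shape of a TFJob manifest can be sketched in Python as follows. This is a minimal sketch and not the project's actual file: the job name, container image, and command are hypothetical placeholders, and the manifest is printed as JSON (which kubectl also accepts) only to keep the example within the Python standard library.

```python
import json


def build_tfjob_manifest(job_name, image, command, gpus=1):
    """Return a Kubeflow TFJob manifest as a plain dict.

    All values passed in the example below are hypothetical placeholders.
    """
    container = {
        # The TFJob operator expects the training container to be named "tensorflow".
        "name": "tensorflow",
        "image": image,
        "command": list(command),
    }
    if gpus > 0:
        # Request GPUs only when asked to.
        container["resources"] = {"limits": {"nvidia.com/gpu": gpus}}
    return {
        "apiVersion": "kubeflow.org/v1",
        "kind": "TFJob",
        "metadata": {"name": job_name},
        "spec": {
            "tfReplicaSpecs": {
                "Worker": {
                    "replicas": 1,
                    "restartPolicy": "OnFailure",
                    "template": {"spec": {"containers": [container]}},
                }
            }
        },
    }


manifest = build_tfjob_manifest(
    job_name="school-classifier-kazakhstan",  # hypothetical job name
    image="gcr.io/example-project/school-classifier:latest",  # hypothetical image
    command=["python", "-m", "school_classifier.train"],  # hypothetical entry point
)
print(json.dumps(manifest, indent=2))
```

A single worker replica is assumed here; the source does not state how many replicas or GPUs the actual jobs used.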
The training dataset for each country model was split into train, validation, and test sets at a 70:20:10 ratio. When a training job was submitted to the GKE cluster with TFJob, the model was trained on the train set and evaluated on the validation set using F1, precision, and recall scores. Training was tracked and monitored with TensorBoard27, TensorFlow's visualization toolkit, which tracks and visualizes metrics such as loss and accuracy during model training.
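The 70:20:10 split and the three validation scores can be illustrated with a short sketch. It is not the School-Classifier package's code: the function names, the fixed random seed, and the toy labels are assumptions made for illustration.

```python
import random


def split_dataset(items, ratios=(70, 20, 10), seed=42):
    """Shuffle items and split them into train, validation, and test lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    total = sum(ratios)
    n_train = len(shuffled) * ratios[0] // total
    n_val = len(shuffled) * ratios[1] // total
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # whatever remains goes to the test set
    return train, val, test


def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1 for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


train, val, test = split_dataset(range(1000))
print(len(train), len(val), len(test))

# Toy labels: 1 = school tile, 0 = not-school tile.
print(precision_recall_f1([1, 1, 1, 0, 0], [1, 1, 0, 1, 0]))
```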
Direct school detection
We used TensorFlow’s Object Detection API to train object detection models for this task. Object detection models take an image as input and generate bounding boxes, predicted classes, and confidence scores for each prediction. Using training data in TFRecord format, we trained a direct school detection model on Google Cloud Platform (GCP) with Kubeflow. The pipeline is very similar to that of the tile-based school classifier shown in Figure 7, except that the school classifier script was replaced with a script for direct school detection. Kubeflow is a tool that makes deploying ML workflows on Kubernetes simple, portable, and scalable.
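To make the model outputs concrete, the sketch below turns one image's detections into pixel-space boxes. It assumes the usual Object Detection API output keys (detection_boxes, detection_classes, detection_scores), with boxes given as normalized [ymin, xmin, ymax, xmax], the batch dimension already removed, and values converted to plain Python lists. The function name and the 0.5 score cut-off are hypothetical.

```python
def to_pixel_detections(outputs, image_width, image_height, min_score=0.5):
    """Convert normalized detections for one image into pixel-space records."""
    detections = []
    rows = zip(
        outputs["detection_boxes"],
        outputs["detection_classes"],
        outputs["detection_scores"],
    )
    for box, class_id, score in rows:
        if score < min_score:
            continue  # drop low-confidence predictions
        ymin, xmin, ymax, xmax = box
        detections.append(
            {
                "class_id": int(class_id),
                "score": float(score),
                # (left, top, right, bottom) in pixels
                "bbox_px": (
                    round(xmin * image_width),
                    round(ymin * image_height),
                    round(xmax * image_width),
                    round(ymax * image_height),
                ),
            }
        )
    return detections


# Toy outputs for a single 512 x 512 image.
example_outputs = {
    "detection_boxes": [[0.25, 0.5, 0.75, 1.0], [0.0, 0.0, 0.1, 0.1]],
    "detection_classes": [1.0, 1.0],
    "detection_scores": [0.9, 0.2],
}
print(to_pixel_detections(example_outputs, 512, 512))
```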
Supertile query for model inferences
We rely heavily on population data, both to improve the diversity of the training data and to identify the area of interest (AOI) to run inference over. This makes our inference and validation processes more efficient. The baseline assumption for certain ML problems is that there is no point in running the model over areas where we know nobody lives. When searching for schools, for example, we want to detect schools in areas where we think people live and where the population density reaches a certain level.
In this phase, we narrowed the search space by generating supertiles only for the populated parts of each country. We used a combination of WorldPop28 and OpenStreetMap (OSM). WorldPop provides contemporary data on human population distribution at 100 m spatial resolution. Part of this work relies on osm-coverage-tiles29, Development Seed’s open source script for getting the tiles that cover any feature in OpenStreetMap data. In this case, we converted the WorldPop raster to points; extracted highway, buildings, sports, amenity, leisure, and landuse=residential features from the OSM country PBF files; merged the layers and converted them to MBTiles; and then used osm-coverage-tiles to get the zoom 16 populated tiles (see the following Table).
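The step from populated points to covering tiles can be sketched with the standard Web Mercator (XYZ) tile formula. This is only an illustration of the idea and does not reproduce osm-coverage-tiles itself; the function names and the sample coordinates are assumptions.

```python
import math


def lonlat_to_tile(lon, lat, zoom):
    """Return the (x, y, zoom) XYZ tile that contains a WGS84 point."""
    n = 2 ** zoom
    x = math.floor((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = math.floor((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so that points on the far edges still map to a valid tile.
    x = min(max(x, 0), n - 1)
    y = min(max(y, 0), n - 1)
    return x, y, zoom


def populated_tiles(points, zoom=16):
    """Return the set of tiles covering a collection of (lon, lat) points."""
    return {lonlat_to_tile(lon, lat, zoom) for lon, lat in points}


# Hypothetical populated points, e.g. centroids of populated WorldPop cells.
sample_points = [(71.43, 51.13), (71.43, 51.13), (76.89, 43.24)]
print(sorted(populated_tiles(sample_points)))
```

Because the result is a set, many points falling in the same tile collapse to a single tile to run inference over.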
We ended up with 71 million zoom 18 tiles, grouped into 18 million zoom 17 supertiles, covering all the countries over which we needed to run model inference.
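The ratio of the two counts (71 / 18 ≈ 3.9) is consistent with each zoom 17 supertile covering up to four zoom 18 tiles. The sketch below shows that parent and child relationship in the XYZ tiling scheme. It assumes a supertile corresponds to the zoom 17 parent of its zoom 18 tiles, which the text does not spell out, and the function names are hypothetical.

```python
from collections import defaultdict


def parent_tile(x, y, z):
    """Return the tile one zoom level up that contains tile (x, y, z)."""
    return x // 2, y // 2, z - 1


def child_tiles(x, y, z):
    """Return the four tiles one zoom level down that tile (x, y, z) covers."""
    return [(2 * x + dx, 2 * y + dy, z + 1) for dy in (0, 1) for dx in (0, 1)]


def group_into_supertiles(tiles):
    """Group tiles by their parent tile, one zoom level up."""
    groups = defaultdict(list)
    for tile in tiles:
        groups[parent_tile(*tile)].append(tile)
    return dict(groups)


zoom18_tiles = [(4, 6, 18), (5, 6, 18), (4, 7, 18), (5, 7, 18), (6, 6, 18)]
for supertile, members in group_into_supertiles(zoom18_tiles).items():
    print(supertile, len(members))
```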
ML-Enabler for scalable model inference
ML-Enabler30 is a machine learning integration tool developed in partnership between Development Seed and the Humanitarian OpenStreetMap Team31. ML-Enabler is a registry for machine learning models in OpenStreetMap and aims to provide an API that tools like Tasking Manager32 can use to query predictions directly.
ML-Enabler makes it easy to spin up the infrastructure needed to run a model, along with all necessary resources. Through the new user interface, you can upload new models, spin up Amazon Web Services (AWS)33 resources, and generate and preview predictions. Behind the scenes, ML-Enabler uses AWS CloudFormation34 and works with any AWS account. A few key infrastructure choices, such as instance count and concurrency, can be made directly from the ML-Enabler interface. ML-Enabler uses Lambda functions to download images from the specified Tiled Map Service (TMS) endpoint as base64 for inference and to write the inference outputs to the database.
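The tile download and base64 step can be sketched as below. This is not ML-Enabler's Lambda code. It assumes a TensorFlow Serving-style JSON request in which binary inputs are wrapped as {"b64": ...}; the URL template, the "image_bytes" input name, and the function names are hypothetical, and the network download is left out so the example runs offline.

```python
import base64
import json


def tile_url(template, x, y, z):
    """Fill a tile endpoint template with tile coordinates."""
    return template.format(x=x, y=y, z=z)


def build_inference_payload(images):
    """Wrap raw image bytes as base64 strings in a JSON request body."""
    instances = [
        {"image_bytes": {"b64": base64.b64encode(data).decode("ascii")}}
        for data in images
    ]
    return json.dumps({"instances": instances})


# Hypothetical endpoint template; a real one would come from the imagery provider.
template = "https://tiles.example.com/{z}/{x}/{y}.png"
print(tile_url(template, x=5, y=7, z=18))

# Stand-in bytes; in practice these would be the downloaded tile images.
payload = build_inference_payload([b"fake-image-bytes"])
print(payload)
```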
Users can monitor the tile prediction queues right from the UI (Figure 8). When processing is complete, predictions are automatically displayed in the map tab. It is easy to toggle between the classes in your model and to filter predictions by confidence threshold. The model’s raw output and confidence score are displayed over each tile, which makes it convenient to explore spatial patterns in the inferences. Here is a GIF that showcases how we used ML-Enabler to run inference for the country, regional, and global models. Inference for both the tile-based school classifiers and direct school detection was run with ML-Enabler. Model outputs were then exported as CSVs, which were later validated by our expert mapper before we put the schools on the map (Figure 7).
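Filtering by confidence and exporting to CSV can be illustrated in a few lines. The column names, the 0.9 threshold, and the sample predictions are hypothetical; the actual export schema of ML-Enabler is not described here.

```python
import csv
import io

FIELDS = ["x", "y", "z", "label", "confidence"]


def filter_predictions(predictions, min_confidence):
    """Keep only predictions at or above the confidence threshold."""
    return [p for p in predictions if p["confidence"] >= min_confidence]


def predictions_to_csv(predictions):
    """Serialize predictions to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(predictions)
    return buffer.getvalue()


sample_predictions = [
    {"x": 5, "y": 7, "z": 18, "label": "school", "confidence": 0.97},
    {"x": 6, "y": 7, "z": 18, "label": "school", "confidence": 0.41},
]
confident = filter_predictions(sample_predictions, min_confidence=0.9)
print(predictions_to_csv(confident))
```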
__________________________________
26 "Google Cloud Kubernetes Engine (GKE)." https://cloud.google.com/kubernetes-engine. Accessed 17 Feb. 2021.
27 "TensorBoard | TensorFlow." https://www.tensorflow.org/tensorboard. Accessed 17 Feb. 2021.
28 "WorldPop." https://www.worldpop.org/. Accessed 25 Feb. 2021.
29 "developmentseed/osm-coverage-tiles: Gather infrastructure ... - GitHub." https://github.com/developmentseed/osm-coverage-tiles. Accessed 25 Feb. 2021.
30 "On-demand machine learning predictions for mapping tools ...." 5 Aug. 2020, https://developmentseed.org/blog/2020-08-05-on-demand-machine-learning-predictions-for-mapping-tools/. Accessed 17 Feb. 2021.
31 "Humanitarian OpenStreetMap Team." https://www.hotosm.org/. Accessed 17 Feb. 2021.
32 "HOT Tasking Manager." https://tasks.hotosm.org/. Accessed 17 Feb. 2021.
33 "Amazon AWS." https://aws.amazon.com/. Accessed 17 Feb. 2021.
34 "AWS CloudFormation - Infrastructure as ...." https://aws.amazon.com/cloudformation/. Accessed 17 Feb. 2021.