Developed for UNICEF

Challenges and discussion

Model scalability

To develop school classifier or direct detection models at a global scale, scalability cannot be overlooked. The scalability challenges range from:

  • massive data validation exercises over geo-diverse landscapes, not to mention that schools look very different from rural to urban areas, from culture to culture, and from nation to nation;
  • model development and training with diverse selected school features, which is a challenging task in itself; creating a generalized model that can actually search for school-like building complexes across millions of satellite image tiles is not easy;
  • model inference speed, the biggest bottleneck for such massive school searching and mapping through a high-resolution satellite imagery archive, which can be a deal-breaker.

In the following sections we discuss how we overcame the scalability issues in data validation, model training and model inference. Scalability remains an important, ongoing challenge that we continue to address from one machine learning project to another.
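To give a sense of the tile volumes involved, here is a back-of-the-envelope sketch of how many zoom-18 slippy-map tiles it takes to cover a single country, using the standard OSM tile-index formulas. The bounding box is a rough, illustrative approximation of Kenya, not an exact border.

```python
import math

def deg2tile(lat, lon, zoom):
    """Convert lat/lon to slippy-map tile indices (OSM convention)."""
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    return x, y

def tiles_in_bbox(south, west, north, east, zoom):
    """Number of tiles at `zoom` needed to cover a bounding box."""
    x_min, y_min = deg2tile(north, west, zoom)  # NW corner -> smallest x, y
    x_max, y_max = deg2tile(south, east, zoom)  # SE corner -> largest x, y
    return (x_max - x_min + 1) * (y_max - y_min + 1)

# Rough, illustrative bounding box around Kenya at zoom 18.
print(tiles_in_bbox(-4.7, 33.9, 5.5, 41.9, 18))
```

Even one mid-sized country comes out in the tens of millions of zoom-18 tiles, which is why inference speed dominates the cost of a national or global search.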

Data tooling

The ultimate goal of this project is to develop six country-level, tile-based school classifier models that are generalized and perform well at their tasks. The tasks include not only identifying known schools but also finding “school-like” building complexes in Asia, Africa and South America. To be able to do so, high-quality training datasets with clearly distinguishable school features in the overhead imagery are key. To build them, our expert mappers had to go through about 200,000 given schools from both UNICEF and OSM in the selected countries (Table 1). This data validation task is not trivial. Our expert mappers have to: 1) overlay the school geolocations on top of the overhead imagery, in this case the Maxar Vivid basemap; and 2) use the school-feature rules to quickly sort “schools” into “Yes”, “No” and “Unrecognized” categories for each country. In the last phase of the project, our speed was 250 schools per hour, which means that without speeding up the process, the data validation alone would take up to 800 hours.

Data tooling that speeds up the school data validation process is therefore key. The validation includes: 1) training data validation, where expert mappers find schools with distinguishable school features in the overhead imagery, which helps the model learn what makes a school a school; and 2) machine learning model inference result validation. We had more than 113,000 detected schools across 8 countries in Asia, Africa and South America, which could have cost us another 450 hours at a speed of 250 schools per hour.

We used our internal validation tool, Chip-Ahoy, to sort ML-detected schools into three categories: yes, no, and unrecognized. Our past speed for ML school validation was 250 schools/hour; with Chip-Ahoy we could validate 1,083 tiles/hour (a school may cover one or more tiles).
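The time savings implied by these throughput figures can be sketched with simple arithmetic. The school and speed numbers come from this section; the tiles-per-school ratio is an illustrative assumption, since the text only says a school may cover one or more tiles.

```python
# Back-of-the-envelope validation-time estimates using figures
# from this section of the report.
SCHOOLS_TO_VALIDATE = 113_000   # ML-detected schools across 8 countries
SCHOOLS_PER_HOUR = 250          # manual validation speed
TILES_PER_HOUR = 1083           # Chip-Ahoy validation speed
TILES_PER_SCHOOL = 1.5          # assumption: a school may cover >= 1 tile

manual_hours = SCHOOLS_TO_VALIDATE / SCHOOLS_PER_HOUR
chip_ahoy_hours = SCHOOLS_TO_VALIDATE * TILES_PER_SCHOOL / TILES_PER_HOUR

print(f"manual: ~{manual_hours:.0f} h, Chip-Ahoy: ~{chip_ahoy_hours:.0f} h")
```

Under this assumption, tile-based validation with Chip-Ahoy cuts the estimated workload to roughly a third of the manual figure.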

ML model training and model generalization

An ML model can be trained to generalize well if several conditions are met:

  • The training dataset is high quality;
  • The data are representative of, and reflect, real-world conditions;
  • The classes are not too imbalanced;
  • Hyper-parameter search is an option.

The above issues pose certain challenges when it comes to training large country-level, regional or global school classifiers or direct school detection models. To achieve high-performing models, each challenge needs to be addressed specifically.

For instance, to obtain a high-quality training dataset, you need either well-trained data annotators or a data labelling platform that can “rate” annotators’ labelling skills. A geo-diverse and representative training dataset is critical; t-SNE can be applied to project high-dimensional data into clusters that make such an evaluation possible. When it comes to class imbalance, building comparably representative datasets for the “school” and “not-school” classes of a binary classification model is actually manageable; we just need to make sure the “not-school” class is diverse and inclusive enough that all the other “classes” are covered. For instance, water, desert, forest, residential buildings, dense urban areas, rural areas, critical infrastructure and clouds are all examples of the “not-school” class.
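A minimal sketch of the t-SNE evaluation mentioned above, assuming tiles have already been reduced to per-tile feature vectors (for example, embeddings from a classifier’s penultimate layer); the random array here is just an illustrative stand-in for those features:

```python
import numpy as np
from sklearn.manifold import TSNE

# Illustrative stand-in for per-tile feature vectors:
# 500 tiles, each with a 128-dimensional embedding.
rng = np.random.default_rng(0)
features = rng.normal(size=(500, 128))

# Project to 2-D. Plotting this embedding colored by region or class
# shows whether the training tiles span distinct landscapes (good
# geo-diversity) or collapse into one homogeneous cluster.
embedding = TSNE(n_components=2, perplexity=30,
                 random_state=0).fit_transform(features)
print(embedding.shape)
```

In practice the 2-D points would be plotted and colored by country, land-cover type, or label to inspect cluster structure.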

Last but not least, model hyper-parameter search and tuning is very computationally intensive. When we don’t have a perfect, high-quality training dataset, we end up searching for optimized parameters that are specific to certain classes or regions.
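To illustrate why this step is expensive, here is a minimal random-search sketch. The search space and the scoring function are hypothetical; in a real run, `evaluate` would train the model with each parameter combination and return a validation metric, which is exactly the costly part.

```python
import random

# Hypothetical hyper-parameter search space for a tile classifier.
SPACE = {
    "learning_rate": [1e-4, 3e-4, 1e-3, 3e-3],
    "batch_size": [16, 32, 64],
    "dropout": [0.2, 0.3, 0.5],
}

def evaluate(params):
    # Placeholder: a real evaluation trains and validates the model
    # with `params` -- one full training run per trial.
    random.seed(str(sorted(params.items())))  # deterministic stand-in score
    return random.random()

random.seed(42)
trials = [{k: random.choice(v) for k, v in SPACE.items()} for _ in range(20)]
best = max(trials, key=evaluate)
print(best)
```

Even this small grid implies dozens of full training runs, which is why a cleaner dataset that narrows the search space saves real compute.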

During this phase of the project, we implemented exactly these solutions to the challenges mentioned above. You can refer back to the technical report section for the specifics.

ML model inference speed

Applying trained machine learning models to make nationwide predictions or inference can take weeks or even months. Therefore, the inference pipeline must:

  • Serve data to the model for prediction or inference very efficiently;
  • Utilize cloud computing, especially GPUs, for model inference instead of CPUs only;
  • Be containerized and run behind a RESTful API; in this case we use TFServing38;
  • Run predictions in parallel instead of one at a time; in this case we use our open-source tool ML-Enabler39. Please refer back to the ML-Enabler technical section.
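The batching-and-parallelism idea behind the last two points can be sketched as follows. The endpoint URL and model name are assumptions for illustration; the request shape follows TFServing’s REST predict API (`POST /v1/models/<name>:predict` with an `instances` payload).

```python
import concurrent.futures
import json
import urllib.request

# Hypothetical TFServing endpoint; host, port and model name are assumptions.
URL = "http://localhost:8501/v1/models/school_classifier:predict"

def chunks(items, size):
    """Split a list of image tensors into fixed-size batches."""
    return [items[i:i + size] for i in range(0, len(items), size)]

def predict_batch(batch):
    """POST one batch to TFServing's REST predict API."""
    body = json.dumps({"instances": batch}).encode()
    req = urllib.request.Request(
        URL, data=body, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["predictions"]

def predict_all(tiles, batch_size=32, workers=8):
    """Run many batches concurrently instead of one prediction at a time."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(predict_batch, chunks(tiles, batch_size))
    return [pred for batch in results for pred in batch]
```

In production, ML-Enabler manages this kind of batching and fan-out across the tile archive rather than a hand-rolled thread pool.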

__________________________________

38 "Serving Models | TFX | TensorFlow." https://www.tensorflow.org/tfx/guide/serving. Accessed 24 Feb. 2021.
39 "developmentseed/ml-enabler: ML Enabler - machine ... - GitHub." https://github.com/developmentseed/ml-enabler. Accessed 24 Feb. 2021.