Developed for UNICEF

Training data

A high-quality training dataset is essential for machine learning models to learn object features accurately. Our first step was to prepare a set of verified school locations, as well as a set of locations verified not to contain schools. The CNN learns from both sets to build a model for determining whether any given image contains a school.

Figure 2: Training dataset generation for tile-based school classification went through four steps. Step 1: data were sourced from UNICEF and OSM (Table 1). Step 2: an expert mapper at Development Seed validated the school geolocations and classified the schools as ‘YES’, ‘NO’, or ‘Unrecognized’ based on their visual features. Step 3: supertiles were generated for the ‘school’ and ‘not-school’ classes of the binary tile-based school classifier, and the geodiversity of the tiles was assessed with t-SNE. Step 4: TFRecords of the train, validation, and test sets were created to train the TensorFlow tile-based school classifier. For more information, please see the following sections.
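
The split into train, validation, and test sets in Step 4 can be illustrated with a minimal sketch. It is not the project's actual pipeline: the function name `stratified_split`, the 70/20/10 percentages, the fixed seed, and the tile identifiers are all assumptions made for illustration, and the sketch stops short of writing TFRecords. It only shows how labeled tiles might be divided so that each set keeps the same balance of ‘school’ and ‘not-school’ examples.

```python
import random
from collections import defaultdict


def stratified_split(samples, percents=(70, 20, 10), seed=42):
    """Split (tile_id, label) pairs into train/validation/test sets.

    The split is done separately for each label so that every set keeps
    the class balance of the input. The percentages are hypothetical.
    """
    if sum(percents) != 100:
        raise ValueError("percents must sum to 100")

    # Group the samples by label ('school' / 'not-school').
    by_label = defaultdict(list)
    for tile_id, label in samples:
        by_label[label].append((tile_id, label))

    rng = random.Random(seed)  # fixed seed for a reproducible split
    splits = {"train": [], "validation": [], "test": []}
    for label in sorted(by_label):
        items = by_label[label]
        rng.shuffle(items)
        # Integer arithmetic avoids floating-point rounding surprises.
        n_train = len(items) * percents[0] // 100
        n_val = len(items) * percents[1] // 100
        splits["train"].extend(items[:n_train])
        splits["validation"].extend(items[n_train:n_train + n_val])
        # Whatever remains after train and validation goes to test.
        splits["test"].extend(items[n_train + n_val:])
    return splits


# Hypothetical tile identifiers, 100 per class.
samples = [(f"school_{i}", "school") for i in range(100)]
samples += [(f"neg_{i}", "not-school") for i in range(100)]

splits = stratified_split(samples)
for name, items in splits.items():
    print(name, len(items))
```

Splitting per class rather than over the pooled data keeps the validation and test sets from ending up dominated by one class when the two classes differ in size.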