Developed for World Bank Group

Methodology

1. Download imagery

To find all satellite imagery for each country, we first downloaded country boundaries from an online database of Global Administrative Areas. As Pakistan has an ongoing border dispute, we confirmed that these borders matched the internal records of the World Bank. We then calculated the index of every satellite imagery tile that overlapped each country's borders at a specific zoom (or spatial resolution). A tile index consists of three numbers: X, Y, and Zoom coordinates, which together give the spatial location and pixel resolution of that tile.

We obtained all relevant tiles for each country using a depth-first search algorithm that kept the pending tile indices in a last-in-first-out stack. At each iteration, the algorithm removed the top tile from the stack, computed its spatial boundaries, and checked for spatial overlap between the tile boundaries and the country's boundaries. If the tile did not overlap, it was discarded. If it overlapped and was coarser than the specified zoom (here, 18), the algorithm computed its four sub-tiles (i.e., zoomed in by a single increment) and added them to the stack. If it overlapped and was at the specified zoom, the algorithm appended its index to a text file. Once the stack was empty, the algorithm terminated, leaving a text file with all tile indices at the specified zoom that cover the country boundary. The advantage of this approach was that it could process millions of tiles without ever storing more than 50 tile indices in RAM. This algorithm will be published as open source code after this project's conclusion. In the interim, it is available in gen_tile_inds.py within the project's code repository.
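The stack-based traversal described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the contents of gen_tile_inds.py. It assumes the common XYZ (Web Mercator) tile scheme, which the text does not specify. It also stands in for the country boundary with a simple (west, south, east, north) bounding box in degrees, where the project used the actual boundary shapes. The names tile_bounds, overlaps, tiles_covering, and emit are hypothetical.

```python
import math


def tile_bounds(x, y, z):
    """Return (west, south, east, north) in degrees for tile (x, y, z).

    Assumes the standard XYZ / Web Mercator tiling; this is an assumption.
    """
    n = 2 ** z

    def lon(xi):
        return xi / n * 360.0 - 180.0

    def lat(yi):
        return math.degrees(math.atan(math.sinh(math.pi * (1 - 2 * yi / n))))

    # y grows southward, so the tile's north edge is at y and south at y + 1.
    return lon(x), lat(y + 1), lon(x + 1), lat(y)


def overlaps(a, b):
    """True if two (west, south, east, north) boxes share nonzero area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def tiles_covering(region, target_zoom, emit):
    """Depth-first search for tiles at target_zoom that overlap region.

    region is a bounding box standing in for the country boundary.
    emit is called once per matching (x, y, z) index, e.g. to write a line
    to a text file. Returns the largest stack size seen during the search.
    """
    stack = [(0, 0, 0)]  # start from the single zoom-0 tile
    peak = len(stack)
    while stack:
        x, y, z = stack.pop()  # last in, first out
        if not overlaps(tile_bounds(x, y, z), region):
            continue  # no overlap: discard the tile and its whole subtree
        if z == target_zoom:
            emit((x, y, z))  # at the target zoom: record the index
            continue
        # Overlapping but too coarse: push the four sub-tiles one zoom deeper.
        for dx in (0, 1):
            for dy in (0, 1):
                stack.append((2 * x + dx, 2 * y + dy, z + 1))
        peak = max(peak, len(stack))
    return peak


found = []
peak = tiles_covering((0.1, 0.1, 0.2, 0.2), 3, found.append)
```

Because each step removes one tile and adds at most four tiles one level deeper, the stack size in this sketch grows with the zoom depth rather than with the number of tiles visited, which is why memory use stays small even when millions of tiles are emitted.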