Developed for World Bank Group

Methods

Land classification

For the purposes of the SEZ project, the land classifications of interest are Bare Earth, Vegetation, and Impervious Surfaces (i.e., human-built surfaces). Spectrally, these classes are fairly distinct (even in just the visible spectrum) and can be grouped without any training data using a clustering algorithm such as k-means. With k-means, a fixed number of classes, k, is chosen in advance and the algorithm assigns every pixel to one of those k classes. Only 3 classes are of interest here, but more are used in order to account for other materials and effects (shadows, sun glint, BRDF effects from varying off-nadir angles, etc.). For these zones, most of the area is covered by the three main classes, with outliers falling into the extra incidental classes.
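The clustering step described above can be illustrated with a minimal k-means sketch. This is not the project's actual implementation: the function name `kmeans_classmap`, its parameters, and the default of 5 classes are assumptions made for illustration, and the input is assumed to be a NumPy array of shape (rows, cols, bands). It builds a full pixel-by-center distance array in memory, so it is only suitable for small scenes.

```python
import numpy as np

def kmeans_classmap(image, k=5, n_iter=20, seed=0):
    """Cluster a (rows, cols, bands) image into k spectral classes.

    Hypothetical helper for illustration only. Returns the class map
    (rows, cols) and the k cluster centers (k, bands).
    """
    rows, cols, bands = image.shape
    pixels = image.reshape(-1, bands).astype(float)
    rng = np.random.default_rng(seed)
    # Initialise the centers from k randomly chosen pixels (distinct indices).
    centers = pixels[rng.choice(len(pixels), size=k, replace=False)].copy()

    def assign(centers):
        # Squared Euclidean distance from every pixel to every center.
        dists = ((pixels[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        return dists.argmin(axis=1)

    for _ in range(n_iter):
        labels = assign(centers)
        for j in range(k):
            members = pixels[labels == j]
            if len(members):  # an empty cluster keeps its previous center
                centers[j] = members.mean(axis=0)

    # Final assignment against the updated centers.
    return assign(centers).reshape(rows, cols), centers
```

The returned class indices carry no meaning on their own; each one still has to be mapped to Bare Earth, Vegetation, Impervious Surfaces, or an incidental class.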

The same clustering methods can be used on any other imagery and can make use of any number of spectral bands. With open Sentinel-2 data, the 10 m resolution means that small buildings and roads may not be detected, leading to potentially significant errors in the land cover map. For higher accuracy, or for historical data from before mid-2015, higher-resolution DigitalGlobe imagery can be used. One of the biggest advantages of this indicator is that it doesn't require any specific resolution or number of bands -- the output resolution and the quality of the segmentation are determined by the input data.

One drawback with unsupervised models like k-means is that the classes are not labeled automatically. A human is required to look at the image and the class map to determine the labels. This could require looking at several scenes per zone and labeling the classes in each, which would not scale well as the number of zones and the number of scenes per zone grow. While outside the scope of Phase I, automatic labeling could be performed after clustering by using roughly derived spectral signatures for the 3 classes.

Conclusions

Across all the zones, the land classification process worked well, yielding results that could be confirmed as accurate by visual inspection of the images used. Clustering algorithms work best when there is limited spectral variability within the classes of interest, such as when buildings tend to be made of the same materials and colors, or roads are all constructed the same way. Additionally, the more 'Other' types of materials there are (e.g., water, sand), the more difficult it is to label those images.

Hawassa IDZ, Ethiopia, is a good example of a zone with little spectral variability between buildings, since they were all built at the same time using similar materials. Coega IDZ, South Africa, is conversely a good example of a zone with a lot of spectral variability. Not only do the buildings differ across the zone, but the large size of the zone means that even the spectral signatures of earth and vegetation vary naturally across it.

In terms of processing, the k-means clustering component of this indicator can be scaled up very easily. Even for 8-band high-resolution imagery, clustering takes less than a few minutes for most zones. The Coega zone, 50 times larger than the others, takes less than 30 minutes to run on an 8-band scene. Running k-means on Sentinel data takes seconds. The difficulty in scaling up the land classification algorithm isn't in the processing; it's in the labeling of the resulting classification maps. Currently this must be done manually: a user must specify the new class (vegetation, earth, built, other) for each output class (typically a total of 5 classes initially). Because the number of final desired classes is low, however, this could be automated in a few different ways. In the future, classes could be auto-labeled by creating a simple spectral library for veg/earth/built and assigning each output class to the library class at the least Mahalanobis distance, provided that distance is within some threshold, and otherwise to 'Other'.
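The auto-labeling idea proposed above could look roughly like the following sketch. It describes possible future work, not something implemented in this project. The function name `auto_label`, the structure of `library` (a mapping from class name to a mean spectrum and a band covariance matrix), and the default threshold of 3.0 are all assumptions; in practice the library entries and the threshold would have to be derived from real labeled scenes.

```python
import numpy as np

def auto_label(cluster_centers, library, threshold=3.0):
    """Label each cluster center with the nearest library class, or 'Other'.

    Hypothetical helper for illustration only.
    library: dict mapping class name -> (mean spectrum, covariance matrix).
    threshold: maximum Mahalanobis distance accepted as a match (assumed value).
    """
    labels = []
    for center in cluster_centers:
        best_name, best_dist = "Other", np.inf
        for name, (mean, cov) in library.items():
            diff = np.asarray(center, dtype=float) - np.asarray(mean, dtype=float)
            cov = np.asarray(cov, dtype=float)
            # Mahalanobis distance: sqrt(diff^T * cov^-1 * diff)
            dist = float(np.sqrt(diff @ np.linalg.solve(cov, diff)))
            if dist < best_dist:
                best_name, best_dist = name, dist
        # Centers too far from every library class fall back to 'Other'.
        labels.append(best_name if best_dist <= threshold else "Other")
    return labels
```

Called with the cluster centers from a k-means run and a three-entry library for vegetation, earth, and built surfaces, this would return one label per output class, replacing the manual step for any class that falls within the threshold.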