Developed for the World Bank Group


2. Improving ML output

Detecting substations

Substations are part of the transmission and distribution system. They step voltage up before electricity is transmitted and step it back down to supply consumers. Knowing where substations are helps when mapping the HV network, since most HV lines begin or end at these points -- our Data Team estimated that they would be 15-20% faster with substation locations. The information is also valuable to developers of renewable energy projects, for whom substation locations are vital when planning potential connection points for the next generation of electricity-producing infrastructure.

For these reasons, automatic detection of substations is an important goal for future HV mapping work. We did map substations as part of this project, but our ML model was not built to detect them explicitly. Instead, we relied on the Data Team to trace HV lines to their end points, which generally terminate at a substation. Building an ML model explicitly capable of detecting substations would accomplish two major goals. First, it would speed up the mapping process, since substations are important hubs within the grid's network. Knowing their location is especially useful in crowded urban environments or where HV lines run underground near residential areas. Second, substation detection would provide a much-needed tool for integrating renewable energy projects into the larger electricity network. Now that our team has mapped about a thousand substations across three full countries, we expect that we can use this data to train an ML model capable of automatically detecting substations.
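Because traced HV lines generally terminate at a substation, the end points of the mapped lines could serve as a first list of candidate substation sites. The following is a minimal sketch of that idea, not the method used in this project. It assumes each line is a list of (x, y) vertices in projected coordinates (for example, meters); the function name `candidate_sites` and the `radius` parameter are hypothetical.

```python
import math

def candidate_sites(lines, radius):
    """Greedily group line end points that fall within `radius` of a cluster centroid.

    `lines` is a list of polylines, each a list of (x, y) vertices in a
    projected coordinate system (an assumption made for this sketch).
    Returns (centroid_x, centroid_y, end_point_count) for each cluster.
    """
    clusters = []  # each cluster is a list of (x, y) end points
    for line in lines:
        for point in (line[0], line[-1]):  # first and last vertex only
            for cluster in clusters:
                cx = sum(p[0] for p in cluster) / len(cluster)
                cy = sum(p[1] for p in cluster) / len(cluster)
                if math.hypot(point[0] - cx, point[1] - cy) <= radius:
                    cluster.append(point)
                    break
            else:
                # No existing cluster is close enough, so start a new one.
                clusters.append([point])
    return [
        (sum(p[0] for p in c) / len(c), sum(p[1] for p in c) / len(c), len(c))
        for c in clusters
    ]

# Toy example: three lines, two of which meet near (0, 0) and two near (100, 0).
lines = [
    [(0, 0), (50, 0), (100, 0)],
    [(100, 5), (100, 60), (100, 120)],
    [(3, 2), (-40, 40)],
]
sites = candidate_sites(lines, radius=10)
# Sites where at least two line end points meet are the stronger candidates.
hubs = [s for s in sites if s[2] >= 2]
```

Locations where several end points coincide are more likely to be real hubs; isolated end points may simply mark the edge of the mapped area and would still need manual review.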

Better detection of HV towers

When we began this project, we did not have any validated training data. We tasked our Data Team with manually reviewing three relatively small areas (one region in each country) and ensuring that every meter of ground was checked for the presence of an HV tower. Of course, this was a slow task (and exactly the problem we were trying to solve), but it was necessary to generate a training set of imagery for building our machine learning model. These initial training datasets covered about 1.05% of the total area that would eventually be processed when moving to the country-wide scale.

Now that we have the complete high-voltage infrastructure mapped in three full countries, we have a vastly larger set of data to train the next iteration of the model. Future iterations should start by adding training data from the areas where the model performed most poorly. For example, the agricultural region of Pakistan is a good place to start, as the gridded farmland regularly confused our ML model. In this iteration of the project, the only training data in Pakistan came from the mountainous desert region in the western half of the country. This is likely the reason the model generalized poorly to the more lush farmland along the eastern border. The model was also confused in certain regions where DG's imagery had quality issues, including shading or blurriness. Again, adding training examples that exhibit these issues would make the training data better represent the true data distribution and should reduce the problem. In the future, a simple strategy might be to build training imagery from numerous small regions representing most or all of each country's different terrain (as opposed to the few large regions used initially). This should result in training datasets that better capture the full distribution of possible images the model might encounter when processing an entire country.
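The "numerous small regions" strategy amounts to stratified sampling: draw a fixed number of small tiles from every terrain type rather than a few large blocks from one. A minimal sketch follows, assuming each candidate tile already carries a terrain label. The tile identifiers, terrain labels, and the function `sample_training_tiles` are hypothetical and are not part of the project's pipeline.

```python
import random
from collections import defaultdict

def sample_training_tiles(tiles, per_terrain, seed=0):
    """Pick up to `per_terrain` tiles from each terrain class.

    `tiles` is a list of (tile_id, terrain_label) pairs. Both the ids and
    the labels are illustrative placeholders.
    """
    rng = random.Random(seed)  # fixed seed so the selection is repeatable
    by_terrain = defaultdict(list)
    for tile_id, terrain in tiles:
        by_terrain[terrain].append(tile_id)

    selected = {}
    for terrain, ids in sorted(by_terrain.items()):
        # Rare terrain types contribute every tile they have.
        k = min(per_terrain, len(ids))
        selected[terrain] = sorted(rng.sample(ids, k))
    return selected

# Toy inventory: many desert tiles, fewer farmland tiles, very few urban tiles.
tiles = (
    [(f"desert_{i}", "desert") for i in range(50)]
    + [(f"farm_{i}", "farmland") for i in range(30)]
    + [(f"urban_{i}", "urban") for i in range(3)]
)
picked = sample_training_tiles(tiles, per_terrain=5)
```

Sampling an equal number of tiles per terrain class keeps a dominant landscape (here, desert) from crowding out the farmland and urban examples. A per-class quota weighted by observed error rate would be a natural variation if the goal is to focus on regions where the model performed worst.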

SAR imagery

Here, we used optical (i.e., RGB) imagery to make all of our predictions. The initial pilot phase of this project, however, involved synthetic aperture radar (SAR) imagery -- an active sensing technique that emits microwave energy and records the signal that returns. Man-made structures like HV towers strongly reflect this microwave energy, making them visible even in low-resolution imagery. This imaging technique is also advantageous in that it is not affected by clouds or shadows. However, SAR suffers in mountainous and forested regions because the technique measures gradients in surface height. While we don’t expect SAR imagery to replace our current optical approach, it may act as a powerful tool to validate portions of the network with high confidence.
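One way SAR could play this validating role is as a cross-check on optical detections: a predicted tower is treated as confirmed only if the SAR return near the same location stands out from the background. The sketch below illustrates the idea on a toy grid. It assumes the SAR scene is already co-registered with the optical imagery and given as a 2D list of non-negative intensities; the function `sar_confirms`, the window size, and the ratio threshold are hypothetical choices, not tuned values from this project.

```python
def sar_confirms(backscatter, row, col, window=1, ratio=3.0):
    """Return True if the SAR return near (row, col) stands out from the scene.

    `backscatter` is a 2D list of non-negative intensities. `window` and
    `ratio` are illustrative assumptions.
    """
    rows, cols = len(backscatter), len(backscatter[0])
    # Brightest pixel in a small window around the optical detection,
    # clipped at the scene edges.
    r0, r1 = max(0, row - window), min(rows, row + window + 1)
    c0, c1 = max(0, col - window), min(cols, col + window + 1)
    local_peak = max(
        backscatter[r][c] for r in range(r0, r1) for c in range(c0, c1)
    )
    # Scene median as a crude estimate of the background level.
    flat = sorted(value for line in backscatter for value in line)
    background = flat[len(flat) // 2]
    return local_peak >= ratio * max(background, 1e-9)

# Toy scene: uniform background with one bright reflector at (2, 2).
scene = [[1.0] * 5 for _ in range(5)]
scene[2][2] = 10.0
detections = [(2, 3), (0, 0)]  # (row, col) of optical tower predictions
validated = [d for d in detections if sar_confirms(scene, *d)]
```

A scene-wide median is a rough background estimate and would likely be unreliable in the mountainous and forested regions where SAR is known to struggle, so a check like this is better suited to flat, open terrain.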