
Background

Countries create Special Economic Zones (SEZs) to encourage outside investment and economic growth in an area or industry. The creation of a SEZ involves a range of incentives, including direct investment, trade and tax incentives, infrastructure improvements, free or cheap real estate, and others. The World Bank provides resources to some of these zones and, more broadly, would like to understand the economic impact of these SEZ arrangements. In particular, the World Bank is interested in regular, reliable, cost-effective indicators of SEZ performance that are comparable between countries. This project investigates opportunities to generate economic indicators for SEZ regions using remote sensing data. Factors such as overall transport infrastructure (total length of roads) or the number of buildings within the zone can be used as indicators of the growth, development, and impact of a SEZ. These indicators can be observed from space. Further, these observations can be regularly collected from public and commercial satellites using a standardized methodology. This collection can be automated to provide regular updates for all SEZ regions across the world.

Phase I

The goal of Phase I is to evaluate the overall viability and cost of this approach. To assess this, we selected nine SEZ regions and examined nine indicators for each. We used several data sources, including open imagery data and high-resolution commercial data. This exercise provided insight into the availability, cost, and quality of indicators derived through remote sensing methods. To assess the ability to automate this process, we developed repeatable tools to calculate a set of quantitative indicators that should provide insight into economic development. In total, nine different methods are used to calculate economic indicators over the nine SEZs. A variety of data is used for calculating the indicators, but our strategy was to use open data wherever possible. The rest of this document describes the results and gives a summary for each region. A technical description and a discussion of the challenges associated with each indicator are also given.

The goals of Phase I were to:

Develop code to calculate 8 different quantitative indicators for SEZs
Document methods developed and recommend future directions
Provide quantitative assessments of all SEZs using these indicators at two (or more) points in time
Assess quality, feasibility, and cost associated with calculating each indicator on a regular basis
Provide recommendations for evaluating economic indicators in reference regions (e.g., country-level, nearest large city).

Indicators

The indicators examined in Phase I are listed below and are ordered by implementation difficulty (from easiest to most difficult). See the methodology section on each for more detail on the algorithm and the data used.

Indicator	Description
1. Land classification	Classification of land usage (e.g., bare earth, vegetation, buildings, etc.)
2. Electricity consumption	Amount of nighttime lights usage as a proxy for electricity consumption
3. Transport infrastructure	Identification of road network to calculate total kilometers of roadways
4. Building footprints	Identification of buildings to calculate total square meters of buildings
5. Pollution	Estimation of air quality using aerosol optical thickness data from VIIRS
6. Building counts	Count of number of individual buildings
7. Traffic flow	Estimation of number of cars
8. Electricity reliability	Identification of nighttime power outages as a proxy for grid reliability
9. Call data records	Tracking of broad patterns in mobility, mobile payments, etc.

Data Sources

The data sources used in Phase I are listed below. Digital Globe (DG) imagery includes high-resolution imagery from the GeoEye, Quickbird, Worldview-02 and Worldview-03 sensors and requires access to DG's GBDX platform.

Data Source	Description
Sentinel-2	Public, medium spatial resolution, high temporal resolution
Digital Globe Imagery	Commercial, high spatial resolution, low temporal resolution
VIIRS Nighttime	Public, low spatial resolution, clean monthly data averages produced by NASA
VIIRS Aerosol Optical Thickness	Public, very low spatial resolution

Key Findings

Our preliminary findings for these indicators are generally positive, but vary somewhat by indicator. Given the time available, we sometimes relied on existing detection algorithms that had been developed for use in the US and Europe. These methods often performed poorly, but they still gave important insight into the lower bound of accuracy we can expect and the cost involved in running each algorithm. The table below summarizes our preliminary findings.

Indicator	Data Source	Observed Quality¹	Expected Quality²	Cost³
Land Classification	Sentinel/DG	Excellent	Excellent	Low/High
Electricity Consumption	VIIRS Night	Good	Good	Low
Transport Infrastructure	DG	Poor	Good-Excellent	Medium
Building Footprints	DG	Poor	Good-Excellent	Medium
Pollution	VIIRS AOT	Inferior	Poor	Low
Building Counts	DG	Fair	Good-Excellent	Medium
Traffic Flow	DG	Inferior	Good	High
Electricity Reliability	VIIRS Night	Inferior	Fair	Low/Medium
Call Data Records	CDRs	-	-	High

¹ Observed Quality (Inferior, Poor, Fair, Good, Excellent) is our assessment of the usefulness of the data collected using existing algorithms.
² Expected Quality reflects the accuracy that we believe we could achieve with improved algorithms developed from existing data and machine learning technology.
³ Cost is one of Low, Medium, or High. This is our assessment of the financial costs (primarily for compute resources and satellite imagery) for each indicator at scale (i.e., hundreds of zones). See the Costs section below for details.

Costs

Low: The data is publicly available at no cost and only computing resources are required. With existing trained models, all of these algorithms can be run on an EC2 instance. Individual zones tend to be small, so even scaling up to hundreds of zones would not require more than a modest EC2 instance. Computing costs are less than $100 per month.
Medium: The computing costs are similar to the Low tier (< $100/month), but commercial data tiles are also required. Disregarding model training costs (see below), each zone requires on average 100 tile requests (zoom level 17) per date. Thus 400 zones require 40,000 tile requests per date. The exception is Electricity Reliability, which is Medium cost not because of commercial data but because it requires a much greater amount of publicly available data.
High: The computing costs are higher than in the other tiers because higher-resolution data is used (zoom level 19), but are still less than $200/month. The higher costs come from the number of tile requests needed, which is roughly 1,500 per zone per date. Thus 400 zones require 600,000 tile requests per date.
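
The tile-request arithmetic in the Medium and High tiers can be reproduced with a short sketch. The per-zone figures (100 and 1,500) and the 400-zone example come from the text above; the function and variable names are hypothetical, and the zoom scaling assumes a standard web-map tiling scheme in which each additional zoom level splits every tile into four, which this document does not state explicitly.

```python
# Per-zone tile requests per date, as quoted in the cost tiers above.
TILES_PER_ZONE = {"medium": 100, "high": 1500}  # zoom 17 and zoom 19


def requests_per_date(tier: str, n_zones: int) -> int:
    """Total commercial tile requests needed to cover n_zones on one date."""
    return TILES_PER_ZONE[tier] * n_zones


def zoom_scale_factor(from_zoom: int, to_zoom: int) -> int:
    """Tile-count multiplier for the same ground area.

    Assumption: a standard web-map tile pyramid, where each zoom level
    splits every tile into four.
    """
    return 4 ** (to_zoom - from_zoom)


medium_total = requests_per_date("medium", 400)  # 100 tiles x 400 zones
high_total = requests_per_date("high", 400)      # 1,500 tiles x 400 zones
scale_17_to_19 = zoom_scale_factor(17, 19)       # two zoom levels deeper
```

Under that tiling assumption, moving from zoom 17 to zoom 19 multiplies the tile count by 16, so a zone that needs 100 tiles at zoom 17 would need about 1,600 at zoom 19, which is in line with the "roughly 1,500" figure quoted for the High tier.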

Training costs

Compared with the costs above, the costs of training models (for Transport Infrastructure, Building Footprints, and Traffic Flow) are far lower. Training a model takes about 10,000 tile requests and a few days of processing time (< $50), and the trained model can then be used across all zones.
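
To put the training figure in context, the sketch below compares the one-off 10,000 training requests with the recurring per-date requests for 400 zones. The request counts are taken from this section; the function name and the choice of two observation dates (the minimum set out in the Phase I goals) are illustrative assumptions.

```python
TRAINING_REQUESTS = 10_000  # one-off tile requests to train a model (from the text)


def training_share(requests_per_date: int, n_dates: int) -> float:
    """Fraction of all tile requests spent on training after n_dates of monitoring."""
    operational = requests_per_date * n_dates
    return TRAINING_REQUESTS / (TRAINING_REQUESTS + operational)


# 400 zones observed on two dates (illustrative assumption).
share_medium = training_share(40_000, 2)   # Medium tier: 40,000 requests per date
share_high = training_share(600_000, 2)    # High tier: 600,000 requests per date

print(f"Medium tier: {share_medium:.1%} of requests go to training")
print(f"High tier:   {share_high:.1%} of requests go to training")
```

Because the training requests are made once while the operational requests recur on every date, the training share shrinks further with each additional observation date.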
