Countries create Special Economic Zones (SEZs) to encourage outside investment and economic growth in an area or industry. Creating an SEZ involves a range of incentives, including direct investment, trade and tax incentives, infrastructure improvements, free or cheap real estate, and others. The World Bank provides resources to some of these zones and, more broadly, wants to understand the economic impact of SEZ arrangements. In particular, the World Bank is interested in regular, reliable, and cost-effective indicators of SEZ performance that are comparable across countries. This project investigates opportunities to generate economic indicators for SEZ regions using remote sensing data. Factors such as overall transport infrastructure (total length of roads) or the number of buildings within a zone can serve as indicators of the growth, development, and impact of an SEZ. These indicators can be observed from space, and the observations can be collected regularly from public and commercial satellites using a standardized methodology. This collection can be automated to provide regular updates for all SEZ regions across the world.
The goal of Phase I is to evaluate the overall viability and cost of this approach. To assess this, we selected nine SEZ regions and applied nine different indicator methods to them. We used a variety of data sources, including open imagery and high-resolution commercial data, although our strategy was to use open data wherever possible. This exercise provided insight into the availability, cost, and quality of data for indicators derived through remote sensing. To assess whether the process can be automated, we developed repeatable tools that calculate a set of quantitative indicators intended to capture economic development. The rest of this document summarizes the results for each region and gives a technical description of each indicator, along with a discussion of its challenges.
The goals of Phase I were to:
- Develop code to calculate 8 different quantitative indicators for SEZs
- Document methods developed and recommend future directions
- Provide quantitative assessments of all SEZs using these indicators at two (or more) points in time
- Assess quality, feasibility, and cost associated with calculating each indicator on a regular basis
- Provide recommendations for evaluating economic indicators in reference regions (e.g., country-level, nearest large city).
The indicators examined in Phase I are listed below, ordered by implementation difficulty (from easiest to most difficult). See the methodology section for each indicator for more detail on the algorithm and the data used.
| Indicator | Description |
|---|---|
| 1. Land classification | Classification of land usage (e.g., bare earth, vegetation, buildings) |
| 2. Electricity consumption | Nighttime light intensity as a proxy for electricity consumption |
| 3. Transport infrastructure | Identification of the road network to calculate total kilometers of roadways |
| 4. Building footprints | Identification of buildings to calculate total building footprint area in square meters |
| 5. Pollution | Estimation of air quality using aerosol optical thickness data from VIIRS |
| 6. Building counts | Count of individual buildings |
| 7. Traffic flow | Estimation of the number of cars |
| 8. Electricity reliability | Identification of nighttime power outages as a proxy for grid reliability |
| 9. Call data records | Tracking of broad patterns in mobility, mobile payments, etc. |
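Once roads and buildings have been detected, indicators 3, 4, and 6 reduce to simple geometry. The sketch below assumes that detections arrive as lists of (longitude, latitude) vertices in WGS84 degrees; the function names are hypothetical and are not part of the project's tooling. Road length uses the haversine great-circle distance, and building area uses a local equirectangular projection followed by the shoelace formula, an approximation that is adequate for building-sized polygons but not for large areas.

```python
import math
from typing import List, Tuple

LonLat = Tuple[float, float]  # (longitude, latitude) in WGS84 degrees (assumed input format)
EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius in meters


def haversine_m(a: LonLat, b: LonLat) -> float:
    """Great-circle distance in meters between two (lon, lat) points."""
    lon1, lat1 = map(math.radians, a)
    lon2, lat2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def road_length_km(roads: List[List[LonLat]]) -> float:
    """Indicator 3 (hypothetical helper): total length of detected road polylines, in km."""
    total_m = sum(haversine_m(p, q) for line in roads for p, q in zip(line, line[1:]))
    return total_m / 1000.0


def footprint_area_m2(ring: List[LonLat]) -> float:
    """Approximate area of a small polygon: local equirectangular projection + shoelace formula."""
    lat0 = math.radians(sum(lat for _, lat in ring) / len(ring))
    xy = [(math.radians(lon) * EARTH_RADIUS_M * math.cos(lat0),
           math.radians(lat) * EARTH_RADIUS_M) for lon, lat in ring]
    # Pair each vertex with the next, wrapping around; works whether or not the ring is closed.
    twice_area = sum(x1 * y2 - x2 * y1 for (x1, y1), (x2, y2) in zip(xy, xy[1:] + xy[:1]))
    return abs(twice_area) / 2.0


def building_indicators(buildings: List[List[LonLat]]) -> Tuple[int, float]:
    """Indicators 6 and 4 (hypothetical helper): building count and total footprint area in m^2."""
    return len(buildings), sum(footprint_area_m2(b) for b in buildings)


if __name__ == "__main__":
    road = [(0.0, 0.0), (0.0, 0.01)]  # hypothetical detected road, about 1.1 km north-south
    square = [(0.0, 0.0), (0.001, 0.0), (0.001, 0.001), (0.0, 0.001)]  # about 111 m x 111 m
    print(road_length_km([road]))
    print(building_indicators([square]))
```

Because each function only consumes geometry, the same code applies regardless of which detection model produced the roads or buildings.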
The data sources used in Phase I are listed below. Digital Globe imagery includes high-resolution imagery from the GeoEye, QuickBird, WorldView-2, and WorldView-3 sensors and requires access to DG's GBDX platform.
| Data source | Description |
|---|---|
| Sentinel-2 | Public, medium spatial resolution, high temporal resolution |
| Digital Globe Imagery | Commercial, high spatial resolution, low temporal resolution |
| VIIRS Nighttime | Public, low spatial resolution, clean monthly data averages produced by NASA |
| VIIRS Aerosol Optical Thickness | Public, very low spatial resolution |
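For the electricity consumption indicator, a common approach with monthly nighttime composites is a "sum of lights" over the zone. The following is a minimal sketch, assuming a monthly VIIRS radiance composite has already been clipped and loaded as a NumPy array and that the zone boundary has been rasterized to a boolean mask on the same grid. The function names and the `noise_floor` parameter (a threshold for suppressing low background radiance) are hypothetical illustrations, not values or code taken from this project.

```python
import numpy as np


def sum_of_lights(radiance: np.ndarray, zone_mask: np.ndarray, noise_floor: float = 0.0) -> float:
    """Total radiance of pixels inside the zone, ignoring missing (NaN) pixels.

    Pixels at or below `noise_floor` (a hypothetical background threshold) count as zero,
    so with the default of 0.0 any negative background values are also dropped.
    """
    if radiance.shape != zone_mask.shape:
        raise ValueError("radiance and zone_mask must be on the same grid")
    inside = zone_mask.astype(bool) & np.isfinite(radiance)
    values = radiance[inside]
    return float(np.where(values > noise_floor, values, 0.0).sum())


def percent_change(before: float, after: float) -> float:
    """Relative change between two observation dates, in percent."""
    return 100.0 * (after - before) / before
```

Computing `sum_of_lights` for the same zone at two dates and passing the results to `percent_change` gives a simple two-point comparison of the kind listed among the Phase I goals.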
Our preliminary findings for these indicators are generally positive but vary by indicator. Given the time available, we sometimes relied on existing detection algorithms that were developed for use in the US and Europe. These methods often performed poorly, but they still gave important insight into the lower bound of accuracy we can expect and the cost of running them. The table below summarizes our preliminary findings.
| Indicator | Data Source | Observed Quality¹ | Expected Quality² | Cost³ |
|---|---|---|---|---|
| Electricity Consumption | VIIRS Night | Good | Good | Low |
| Electricity Reliability | VIIRS Night | Inferior | Fair | Low/Medium |
| Call Data Records | CDRs | - | - | High |
- Observed Quality (Inferior, Poor, Fair, Good, Excellent) is our assessment of the usefulness of the data collected using existing algorithms.
- Expected Quality reflects the accuracy that we believe we could achieve with improved algorithms developed from existing data and machine learning technology.
- Cost metric is one of: Low, Medium, High. This is our assessment of the financial costs (primarily for compute resources and satellite imagery) for each indicator at scale (i.e., hundreds of zones). See below for more info.
Low: The data is publicly available at no cost, so only computation resources are required. With existing trained models, all of these algorithms can be run on an EC2 instance. Individual zones tend to be small, so even scaling up to hundreds of zones would not require more than a modest EC2 instance, at less than $100 per month in computing costs.
Medium: Computing costs are similar to the above (<$100/month), but commercial data tiles are also required. Disregarding model training costs (see below), each zone requires on average 100 tile requests (zoom level 17) per date, so 400 zones require 40,000 tile requests per date. The exception is Electricity Reliability, which is Medium cost not because of commercial data but because it requires a much larger volume of publicly available data.
High: Computing costs are higher than above because higher-resolution data (zoom level 19) is used, but they remain below $200/month. The higher costs come from the number of tile requests needed, roughly 1,500 per zone per date. Thus 400 zones require about 600,000 requests per date.
By comparison, the costs of training models (for Transport infrastructure, Building footprints and area, and Traffic flow) are far lower. Training a model takes about 10,000 tile requests and a few days of processing time (< $50), and the resulting model can then be used across all zones.
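The request counts above follow from simple multiplication, and the per-zone figures follow from Web Mercator tile geometry: each zoom level halves the tile width, so moving from zoom 17 to zoom 19 multiplies the tile count for a given area by 4² = 16, roughly consistent with the figures of about 100 versus 1,500 requests per zone quoted above. The sketch below assumes standard Web Mercator (slippy-map) tiles; the helper names are hypothetical, and the area-based estimate ignores tile alignment and irregular zone shapes.

```python
import math

# Equatorial circumference (m) of the WGS84 ellipsoid, as used by Web Mercator tiling.
EARTH_CIRCUMFERENCE_M = 40_075_016.686


def tile_width_m(zoom: int, lat_deg: float = 0.0) -> float:
    """Approximate ground width (m) of one Web Mercator tile at a zoom level and latitude."""
    return EARTH_CIRCUMFERENCE_M * math.cos(math.radians(lat_deg)) / (2 ** zoom)


def tiles_for_area(area_km2: float, zoom: int, lat_deg: float = 0.0) -> int:
    """Rough tile count to cover an area; ignores tile alignment and zone shape."""
    tile_area_km2 = (tile_width_m(zoom, lat_deg) / 1000.0) ** 2
    return math.ceil(area_km2 / tile_area_km2)


def requests_per_date(n_zones: int, tiles_per_zone: int) -> int:
    """Total tile requests for one observation date across all zones."""
    return n_zones * tiles_per_zone


if __name__ == "__main__":
    print(requests_per_date(400, 100))   # Medium tier, zoom 17
    print(requests_per_date(400, 1500))  # High tier, zoom 19
    print(round(tile_width_m(17)), round(tile_width_m(19)))
```

For example, `requests_per_date(400, 1500)` returns 600,000 requests for a single date, compared with the one-off 10,000 requests needed to train a model.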