In partnership with Global Wildlife Conservation

Challenges

Core Challenges and Setbacks

This project faced a number of significant challenges, arising both from the ongoing worldwide pandemic and from technical issues in implementation. They span pandemic-related logistics, data creation and sharing, data quality, class imbalance in the training data, model training and experiments, and model inference speed. Some were expected (logistical problems and the difficulty of small targets within large images), but the pandemic also led to poor communication as people adapted to working from home, which made it difficult to deliver training data labels from the TZCRC Annotation Lab.

Logistical Problems from the COVID-19 Global Pandemic

  1. The lab setup was delayed because the lab opened around the time that pandemic measures came into play.
  2. Supervision of the annotators was extremely difficult for the same reason, and the local project manager was unable to spend significant time in the lab during the main part of the annotation work.

Dataset Creation and Sharing

  1. Setting up a new labeling tool, CVAT, and adding labeling tasks for first-time volunteers involved a learning curve, compounded by the logistical problems caused by the pandemic.

  2. The objects that appear in aerial images are small. Complex image backgrounds and variable lighting, shading, and imaging angles made the labeling tasks harder, even though the annotators are wildlife domain experts.

  3. Aerial surveys were cancelled or delayed by partner agencies. An expected pipeline of regular high-resolution images was not available for use during the project.

  4. As identified in the proposal stage, aerial imagery presents several challenges for ML development.

    a. Aerial survey photography must cover wide strips (around 150 m), and even with relatively high-resolution cameras (25 MP), target animals are often 20-40 pixels across, or less.

    b. Backgrounds vary dramatically depending on the habitat, time of day and even seasonal changes.

    c. The oblique imagery captured in PAS allows animals to be observed under canopy and improves species identification; however, animal postures in oblique images vary considerably more than in top-down images.

  5. Bandwidth limitations delayed image delivery from Tanzania to Development Seed. Though the lab space at the Centre for Research Cooperation was supported by a local ISP, the daily and monthly data caps were rapidly exceeded. The available speed (10 Mbit/s at best, typically much less) meant that images were not uploaded for weeks.
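The small-target and bandwidth constraints in items 4a and 5 can be illustrated with back-of-the-envelope arithmetic. The sketch below is not taken from the project's code. It uses the figures quoted above (a strip of about 150 m, a 25 MP camera, targets of 20-40 pixels, a 10 Mbit/s link) and adds several assumptions: that a 25 MP frame is roughly 6000 pixels wide, that the long edge spans the strip as in a top-down view (oblique imagery varies in scale across the frame), and a purely hypothetical batch of 10,000 images at 10 MB each.

```python
def ground_sample_distance_m(strip_width_m: float, pixels_across: int) -> float:
    """Approximate ground width covered by one pixel, in metres."""
    return strip_width_m / pixels_across


def upload_hours(n_images: int, image_mb: float, link_mbit_s: float) -> float:
    """Hours needed to upload n_images at a sustained link speed."""
    total_megabits = n_images * image_mb * 8  # 1 byte = 8 bits
    return total_megabits / link_mbit_s / 3600  # seconds -> hours


# ASSUMPTION: a 25 MP frame is about 6000 x 4000 px, long edge across the strip.
ASSUMED_PIXELS_ACROSS = 6000

# Strip width (~150 m) and target size (20-40 px) come from the text.
gsd = ground_sample_distance_m(150, ASSUMED_PIXELS_ACROSS)
target_fraction = (20 / ASSUMED_PIXELS_ACROSS, 40 / ASSUMED_PIXELS_ACROSS)

# HYPOTHETICAL batch: 10,000 images of 10 MB each over the best-case 10 Mbit/s.
hours = upload_hours(n_images=10_000, image_mb=10, link_mbit_s=10)

print(f"Ground sample distance: {gsd * 100:.1f} cm per pixel")
print(f"Target width as share of frame: "
      f"{target_fraction[0]:.2%} to {target_fraction[1]:.2%}")
print(f"Upload time at full speed: {hours:.1f} hours")
```

Under these assumptions a target occupies well under 1% of the frame width, and even the best-case link needs most of a day of uninterrupted transfer for the hypothetical batch, before data caps or slower typical speeds are taken into account.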

Model Output Validation

  1. The quality of the training data, and the errors in it, determine model performance and output quality. Currently, model outputs from both the AIAIA Classifier and Detectors still need human validation.
  2. At least two to three iterations of human-in-the-loop feedback, covering correction of training data errors and validation of classifier and detector outputs, are expected in the following workflow. Each iteration should be followed by model retraining and evaluation until model performance stabilizes.
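The iterate-until-stable workflow described above can be sketched as a simple loop. This is a minimal illustration, not the project's implementation: `run_feedback_loop`, the `retrain_and_evaluate` callback, the tolerance of 0.01, and the iteration cap are all hypothetical, and "stabilized" is assumed here to mean that the evaluation score changes by less than the tolerance between two consecutive iterations.

```python
from typing import Callable, List


def run_feedback_loop(
    retrain_and_evaluate: Callable[[int], float],
    max_iterations: int = 5,
    tolerance: float = 0.01,
) -> List[float]:
    """Repeat correction, retraining and evaluation until the score stabilizes.

    retrain_and_evaluate is a hypothetical callback: given the iteration
    index, it applies the human corrections, retrains the model, and returns
    one evaluation score (for example an F1 score on a held-out set).
    """
    scores: List[float] = []
    for iteration in range(max_iterations):
        scores.append(retrain_and_evaluate(iteration))
        # Stop once two consecutive scores differ by less than the tolerance.
        if len(scores) >= 2 and abs(scores[-1] - scores[-2]) < tolerance:
            break
    return scores


if __name__ == "__main__":
    # Placeholder scores for demonstration only, not project results.
    placeholder_scores = [0.70, 0.78, 0.785, 0.79]
    history = run_feedback_loop(lambda i: placeholder_scores[i], max_iterations=4)
    print(f"Stopped after {len(history)} iterations: {history}")
```

In practice the stopping rule would likely be applied per class as well as overall, since rare classes can keep changing after the aggregate score has settled.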