Computer Vision Sustainability Data Science Challenge

Satellite Image Recognition (Top 3)

Project Dates: October 2021 - December 2021

Challenge Site
This project was a collaboration with one teammate for a sustainability-focused data science competition sponsored by Microsoft and Capgemini.

Project Overview

This project involved developing an end-to-end computer vision pipeline to predict 18 land use labels related to carbon emissions, biodiversity, and solar potential. To this end, my partner and I customized a popular high-level Git repository, adding methods tailored to our problem setting. We developed what appears to be, based on our review of the relevant resources, a novel method for balancing multi-label image data. The method uses the earth mover's distance to iteratively augment only those images in the dataset that bring the overall label distribution closer to uniform. We also tested several popular deep learning models for semantic segmentation, including U-Net, PSPNet, and SegNet, with backbones such as ResNet50 and VGG16. Our final model, which achieved the highest out-of-sample F1-score in the competition, was a Resnet50_Unet. Finally, we implemented a custom method for mapping the labels to a reduced label space, for model users who want to focus on one particular outcome, such as emissions.
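To make the balancing idea and the label reduction more concrete, here is a minimal sketch in Python. It is not our competition code. It assumes each image is summarized by a multi-hot vector of the labels it contains, it treats the label indices as ordered bins so the earth mover's distance reduces to the L1 distance between cumulative distributions, and it greedily picks one image per step. The function names (`emd_to_uniform`, `select_images_to_augment`, `reduce_labels`) and the lookup table in the usage note are hypothetical and chosen only for illustration.

```python
import numpy as np


def emd_to_uniform(counts):
    """1-D earth mover's distance between normalized label counts and uniform.

    Assumption: label indices are treated as unit-spaced ordered bins, so the
    distance is the sum of absolute differences between the two CDFs.
    """
    p = counts / counts.sum()
    q = np.full_like(p, 1.0 / len(p))
    return np.abs(np.cumsum(p - q)).sum()


def select_images_to_augment(labels, max_new=100):
    """Greedily pick images whose augmented copy moves the labels toward uniform.

    labels: (n_images, n_labels) multi-hot array.
    Returns the chosen image indices (repeats allowed) and the final distance.
    """
    labels = np.asarray(labels, dtype=float)
    counts = labels.sum(axis=0)
    current = emd_to_uniform(counts)
    chosen = []
    for _ in range(max_new):
        # Distance after hypothetically adding one augmented copy of each image.
        candidate = np.array([emd_to_uniform(counts + row) for row in labels])
        best = int(candidate.argmin())
        if candidate[best] >= current - 1e-12:
            break  # no remaining image improves the balance
        counts += labels[best]
        current = candidate[best]
        chosen.append(best)
    return chosen, current


def reduce_labels(mask, lookup):
    """Map fine-grained label ids in a pixel mask to a reduced label space.

    lookup[i] is the reduced id for original label i (a hypothetical grouping).
    """
    return np.asarray(lookup)[mask]
```

For example, `select_images_to_augment(labels, max_new=50)` returns the indices of the images that would be passed to an augmentation library, and `reduce_labels(mask, [0, 0, 1, 1])` would collapse four original labels into two groups. The greedy one-image-per-step choice is a simplification; other selection rules are possible.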

Data Sources

The satellite images and ground-truth pixel-level labels were provided by the competition sponsors. The dataset was small, roughly 400 images in total, and significantly imbalanced.
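As a rough illustration of how such imbalance can be measured from pixel-level labels, the sketch below computes, for each label, the fraction of images in which it appears at least once. It assumes each mask is an integer NumPy array whose values are label ids from 0 to `n_labels - 1`; the actual competition data format may have differed, and the function names are hypothetical.

```python
import numpy as np


def multi_hot_from_mask(mask, n_labels):
    """Return a 0/1 vector marking which label ids occur anywhere in the mask."""
    present = np.zeros(n_labels, dtype=int)
    present[np.unique(mask)] = 1  # assumes integer ids in [0, n_labels)
    return present


def label_frequencies(masks, n_labels):
    """Fraction of images in which each label appears at least once."""
    rows = np.stack([multi_hot_from_mask(m, n_labels) for m in masks])
    return rows.mean(axis=0)
```

A label distribution far from flat here, for instance a few labels present in nearly every image and others in only a handful, is the kind of imbalance that motivates targeted augmentation.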

Tools Used

As part of the challenge, we were given a budget to train our image models on the Azure Databricks cloud computing platform. We wrote the code in Python within the Databricks notebook environment. Libraries used included pandas, TensorFlow, Keras, and imgaug, among others.