U-Net based building footprint pre-annotation
In this article, I would like to present the SuperAnnotate platform's new building pre-annotation for aerial images, share its code and algorithm, and explain our motivation for integrating it into the platform.
Outline
- Motivation: why we created a building pre-annotation algorithm
- Description of our algorithm and code
- Future Roadmap
- Concluding remarks
Motivation: why we created a building pre-annotation algorithm
Annotating aerial images is tedious work, and labeling hundreds of thousands of buildings takes considerable effort and funding. Here at SuperAnnotate we strive to use state-of-the-art computer vision technology to automate and accelerate the creation of pixel-perfect annotations. As part of that effort, we have integrated several smart pre-annotation algorithms into the SuperAnnotate platform, allowing our users to fix auto-generated annotations instead of starting from scratch. This lets them produce annotations of the same quality with less effort.
See Fig. 1 for how to get the auto-generated annotations in SuperAnnotate vector projects.
Description of our algorithm and code
Our algorithm is based on the winning solution of the SpaceNet Building Detection challenge. SpaceNet is a corpus of commercial satellite imagery and labeled training data for machine learning research. Its organizers host building and road detection challenges and open-source the best solutions.
The winning solution of the second building detection challenge uses a segmentation network called U-Net and then cuts the segmentation mask into building footprints. The U-Net architecture is shown in Figure 2. It can be summarized as an encoder-decoder network with skip connections between the corresponding layers of the encoder and decoder. U-Net is fast to train and performs well even on relatively small datasets. The winner of the SpaceNet challenge used only 4 layers instead of 5, removing the last layer with 1024 channels; we verified that adding it back does not yield any improvement.
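To make the architecture concrete, here is a minimal 4-level U-Net sketch in PyTorch. It illustrates the encoder-decoder structure with skip connections described above; the exact channel counts, normalization, and other details are assumptions for clarity and may differ from the winning SpaceNet solution and from our released code.

```python
# Minimal 4-level U-Net sketch in PyTorch (illustrative, not the exact
# network from the winning SpaceNet entry or our repository).
import torch
import torch.nn as nn


def double_conv(in_ch, out_ch):
    """Two 3x3 convolutions with batch norm and ReLU: the basic U-Net block."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )


class UNet(nn.Module):
    def __init__(self, in_channels=3, num_classes=1):
        super().__init__()
        # Encoder: 4 levels, stopping at 512 channels (the 1024-channel level
        # of the original U-Net is dropped, as in the winning entry).
        self.enc1 = double_conv(in_channels, 64)
        self.enc2 = double_conv(64, 128)
        self.enc3 = double_conv(128, 256)
        self.enc4 = double_conv(256, 512)
        self.pool = nn.MaxPool2d(2)

        # Decoder: upsample and concatenate the matching encoder feature map
        # (these concatenations are the skip connections).
        self.up3 = nn.ConvTranspose2d(512, 256, kernel_size=2, stride=2)
        self.dec3 = double_conv(512, 256)
        self.up2 = nn.ConvTranspose2d(256, 128, kernel_size=2, stride=2)
        self.dec2 = double_conv(256, 128)
        self.up1 = nn.ConvTranspose2d(128, 64, kernel_size=2, stride=2)
        self.dec1 = double_conv(128, 64)

        self.head = nn.Conv2d(64, num_classes, kernel_size=1)

    def forward(self, x):
        e1 = self.enc1(x)
        e2 = self.enc2(self.pool(e1))
        e3 = self.enc3(self.pool(e2))
        e4 = self.enc4(self.pool(e3))

        d3 = self.dec3(torch.cat([self.up3(e4), e3], dim=1))
        d2 = self.dec2(torch.cat([self.up2(d3), e2], dim=1))
        d1 = self.dec1(torch.cat([self.up1(d2), e1], dim=1))
        return self.head(d1)  # per-pixel building logits


# Example: a 3-channel 256x256 tile produces a 1-channel mask of the same size.
if __name__ == "__main__":
    model = UNet()
    out = model(torch.randn(1, 3, 256, 256))
    print(out.shape)  # torch.Size([1, 1, 256, 256])
```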
We merged the Vegas, Paris, and Shanghai datasets and trained a single network on the combined data. We did not use the Khartoum annotations because of their lower quality. We also added data augmentation, which helped the network adapt to images from cities not present in the training data. Our PyTorch code is open-sourced here with all the necessary instructions. We achieved an IoU of 0.545 on the test set.
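For completeness, the sketch below shows one way a predicted probability mask can be cut into individual footprints and scored with a pixel-level IoU. The threshold, minimum-area filter, and OpenCV contour extraction are illustrative assumptions and do not necessarily match the post-processing or the exact metric used in the SpaceNet challenge or in our repository.

```python
# Sketch of mask post-processing and a simple pixel-level IoU
# (illustrative assumptions; assumes OpenCV 4 and NumPy).
import cv2
import numpy as np


def mask_to_footprints(prob_mask, threshold=0.5, min_area=80):
    """Convert an HxW probability mask into a list of building polygons."""
    binary = (prob_mask > threshold).astype(np.uint8)

    # Each connected blob of "building" pixels becomes one candidate footprint.
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

    footprints = []
    for contour in contours:
        if cv2.contourArea(contour) < min_area:
            continue  # drop tiny blobs that are almost certainly noise
        # Simplify the outline slightly so the polygon is easy to edit by hand.
        epsilon = 0.01 * cv2.arcLength(contour, True)
        polygon = cv2.approxPolyDP(contour, epsilon, True)
        footprints.append(polygon.reshape(-1, 2))  # list of (x, y) vertices
    return footprints


def pixel_iou(pred_mask, gt_mask):
    """Pixel-level intersection-over-union between two binary masks."""
    intersection = np.logical_and(pred_mask, gt_mask).sum()
    union = np.logical_or(pred_mask, gt_mask).sum()
    return intersection / union if union else 0.0
```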
Future Roadmap
Currently, our model works fairly well on most city images we have tested. We believe it will benefit from adding more data from different cities. We also plan to implement a road detection algorithm to assist our users with road annotation.
Concluding remarks
We will keep posting updates on our progress with building and road pre-annotations in this Medium channel. Please follow it to be the first to get those updates!
Author: Martun Karapetyan, CV engineer at SuperAnnotate