Accuracy and runtime tradeoff in modern edge detection algorithms

How knowledge distillation improved both the inference time (by 2x) and the F1 score (by 0.06) of a pruned neural network.

In this article, I will share my experience with shrinking contour/edge detection models. I will describe in detail the model architectures and training experiments that led to a 6x faster network that underperforms SOTA models by only ~2%.

Table 1: Benchmark results of all models discussed in this article. Memory usage and CPU time are benchmarked on a 1000x1000 input image.

Outline

  • Evaluation
  • Building Blocks
  • Smaller network architectures
  • Experiments
  • Conclusion

Evaluation

Precision and recall are calculated on the basis of pixel coincidence between the ground-truth and predicted maps. Before evaluation, non-maximum suppression is applied to the output prediction maps. For this purpose, I use this repo, which is a Python implementation of the original MATLAB evaluation code posted on the BSDS500 home page; it also includes the non-maximum suppression step.
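For intuition, here is a minimal sketch of pixel-coincidence precision/recall/F1 at a single threshold. It is a simplification and an assumption on my part: the actual BSDS500 benchmark matches pixels within a small distance tolerance and sweeps thresholds to report the ODS/OIS F-scores, so numbers from this toy function are not directly comparable to the reported results.

```python
import numpy as np

def f1_at_threshold(pred, gt, threshold=0.5):
    """Simplified pixel-coincidence precision/recall/F1.

    NOTE: illustration only. The real BSDS500 benchmark matches pixels
    within a distance tolerance and sweeps thresholds to report ODS/OIS.
    """
    pred_bin = pred >= threshold   # thresholded prediction map
    gt_bin = gt > 0                # binary ground-truth edge map

    tp = np.logical_and(pred_bin, gt_bin).sum()
    fp = np.logical_and(pred_bin, ~gt_bin).sum()
    fn = np.logical_and(~pred_bin, gt_bin).sum()

    precision = tp / max(tp + fp, 1)
    recall = tp / max(tp + fn, 1)
    f1 = 2 * precision * recall / max(precision + recall, 1e-12)
    return precision, recall, f1
```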

Figure 1: RCF network architecture. Red arrows correspond to upsampling to the input size with bilinear interpolation.

Building blocks

Figure 3: Left: ConvBnRelu. Right: InvertedResidualBlock structure, where t is the expansion ratio, cin is the number of input channels, and cout is the number of output channels. "Depthwise" in a ConvBnRelu block means that the convolution in that block is depthwise.
Figure 4: Scale Enhancement Module (SEM) structure. The convolutional layers in all ConvBnRelu blocks are depthwise; the numbers written after the kernel size are the number of output channels and the dilation of the convolutional layer, respectively. In the convolutional layers of all ConvBnRelu blocks, padding is equal to dilation.
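For readers who prefer code, here is a minimal PyTorch sketch of these building blocks. It illustrates the structures in Figures 3 and 4 rather than reproducing the exact implementation; in particular, the SEM branch count and dilation rates below are assumptions.

```python
import torch
import torch.nn as nn

class ConvBnRelu(nn.Module):
    """Conv -> BatchNorm -> ReLU; set depthwise=True to use groups=cin."""
    def __init__(self, cin, cout, k=3, stride=1, padding=1, dilation=1, depthwise=False):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(cin, cout, k, stride, padding, dilation,
                      groups=cin if depthwise else 1, bias=False),
            nn.BatchNorm2d(cout),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.block(x)


class InvertedResidualBlock(nn.Module):
    """MobileNetV2-style block: 1x1 expansion (ratio t) -> 3x3 depthwise -> 1x1 linear projection."""
    def __init__(self, cin, cout, t=6, stride=1):
        super().__init__()
        hidden = cin * t
        self.use_residual = stride == 1 and cin == cout
        self.block = nn.Sequential(
            ConvBnRelu(cin, hidden, k=1, padding=0),                         # expansion
            ConvBnRelu(hidden, hidden, k=3, stride=stride, depthwise=True),  # depthwise
            nn.Conv2d(hidden, cout, 1, bias=False),                          # linear projection
            nn.BatchNorm2d(cout),
        )

    def forward(self, x):
        out = self.block(x)
        return x + out if self.use_residual else out


class SEM(nn.Module):
    """Scale Enhancement Module sketch: parallel depthwise 3x3 convolutions with
    different dilation rates (padding equal to dilation), summed together.
    The dilation rates here are illustrative, not the exact configuration."""
    def __init__(self, channels, dilations=(1, 2, 4, 8)):
        super().__init__()
        self.branches = nn.ModuleList([
            ConvBnRelu(channels, channels, k=3, padding=d, dilation=d, depthwise=True)
            for d in dilations
        ])

    def forward(self, x):
        return sum(branch(x) for branch in self.branches)
```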

Smaller network architectures

Figure 5: Model architecture. IRBlock, t, c, n, s stands for an InvertedResidualBlock with expansion factor t, number of output channels c, number of blocks joined sequentially n, and stride of the first convolution in the first block s. Red arrows represent upsampling to the original input size through bilinear interpolation.
Figure 6: SO Model architecture. The notation is the same as in Figure 5.
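The side-output pattern shared by both architectures can be sketched as follows: each backbone stage feeds a 1x1 head that produces a one-channel edge map, which is upsampled to the input size with bilinear interpolation (the red arrows) and fused by a final 1x1 convolution. The backbone stages are passed in as placeholders, and the head/fusion layers below are illustrative assumptions, not the exact layer configuration of the SO Model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SideOutputEdgeNet(nn.Module):
    """Sketch of the side-output + fusion pattern from Figures 5/6."""
    def __init__(self, backbone_stages, stage_channels):
        super().__init__()
        self.stages = nn.ModuleList(backbone_stages)          # e.g. IRBlock groups
        self.side_heads = nn.ModuleList(
            [nn.Conv2d(c, 1, kernel_size=1) for c in stage_channels]
        )
        self.fuse = nn.Conv2d(len(stage_channels), 1, kernel_size=1)

    def forward(self, x):
        h, w = x.shape[2:]
        side_outputs = []
        for stage, head in zip(self.stages, self.side_heads):
            x = stage(x)
            # one-channel edge map, upsampled to the input resolution
            side = F.interpolate(head(x), size=(h, w),
                                 mode="bilinear", align_corners=False)
            side_outputs.append(side)
        fused = self.fuse(torch.cat(side_outputs, dim=1))
        # a sigmoid is applied to each map in the loss / at inference time
        return side_outputs, fused
```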

Experiments

Experiment 0: Channel Pruning

Figure 2: Performance results of pruned RCF model on images from BSDS500 test set

While we achieved a significant performance improvement of around a 2.8x speedup, the ODS-F1 score dropped from 0.811532 to 0.726367. So we decided to try the knowledge distillation approach, with the goal of preserving the speedup gained from channel pruning while incurring a smaller quality drop. As you will see later, we also managed to further increase the speedup.
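As a rough illustration of what channel pruning does, here is a sketch based on the common L1-norm filter-ranking heuristic. This is not necessarily the exact criterion used in our experiment, and the keep ratio is an assumption; it only shows how output channels of a convolution can be ranked and removed.

```python
import torch
import torch.nn as nn

def prune_conv_channels(conv: nn.Conv2d, keep_ratio: float = 0.5) -> nn.Conv2d:
    """Illustrative channel pruning: rank output filters by their L1 norm
    and keep only the strongest `keep_ratio` fraction of them."""
    n_keep = max(1, int(conv.out_channels * keep_ratio))
    importance = conv.weight.detach().abs().sum(dim=(1, 2, 3))  # one score per filter
    keep_idx = torch.argsort(importance, descending=True)[:n_keep]

    pruned = nn.Conv2d(conv.in_channels, n_keep, conv.kernel_size,
                       conv.stride, conv.padding, conv.dilation,
                       bias=conv.bias is not None)
    pruned.weight.data = conv.weight.data[keep_idx].clone()
    if conv.bias is not None:
        pruned.bias.data = conv.bias.data[keep_idx].clone()
    # downstream layers must be re-indexed to match keep_idx
    return pruned
```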

Experiment 1: SO Model

This baseline model achieves a 0.735527 ODS-F1 score. Besides the ODS-F1 score, I will also present the results of each experiment on 3 images from the BSDS500 test set.

Figure 7: Results of training SO Model on BSDS500 without knowledge distillation.

Experiment 2: KD with Hinton’s approach

Figure 8: Results of training SO Model on BSDS500 using Hinton’s approach of knowledge distillation.
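A minimal sketch of Hinton-style distillation adapted to per-pixel edge maps is shown below. The temperature and weighting values are illustrative assumptions, not the exact settings of this experiment: the student is trained to match both the teacher's temperature-softened edge probabilities and the binary ground truth.

```python
import torch
import torch.nn.functional as F

def hinton_kd_loss(student_logits, teacher_logits, gt, T=4.0, alpha=0.5):
    """Sketch of Hinton-style knowledge distillation for per-pixel edge maps.
    T (temperature) and alpha (loss weighting) are illustrative values."""
    # soft targets from the teacher, softened with temperature T
    soft_targets = torch.sigmoid(teacher_logits / T)
    soft_preds = torch.sigmoid(student_logits / T)
    distill = F.binary_cross_entropy(soft_preds, soft_targets)

    # standard supervised loss against the (float) ground-truth edge map
    hard = F.binary_cross_entropy_with_logits(student_logits, gt)

    return alpha * distill + (1.0 - alpha) * hard
```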

Experiment 3: KD with knowledge adaptor

Figure 9: conv 3x3, 512, relu is the last convolutional layer of VGG16, and conv 1x1, 1280 is the last convolutional layer of MobileNetV2. Green arrows correspond to the global average pooling operation.

Once they are brought to the same dimension, a mean squared error loss is calculated between them. We train the model with cross entropy + 1e4 * adaptation loss. This model achieves a 0.787301 ODS-F1 score.
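A minimal sketch of this adaptor could look like the following. Assumptions on my part: both feature maps are globally average-pooled (the green arrows in Figure 9) and the student vector is linearly projected to the teacher's dimensionality before the MSE loss; the projection direction is not specified in the figure.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class KnowledgeAdaptor(nn.Module):
    """Sketch of the adaptor from Figure 9: GAP both feature maps, map the
    student vector to the teacher's dimension, and align them with MSE."""
    def __init__(self, student_dim=1280, teacher_dim=512):
        super().__init__()
        self.project = nn.Linear(student_dim, teacher_dim)

    def forward(self, student_feat, teacher_feat):
        # (N, C, H, W) -> (N, C) via global average pooling
        s = F.adaptive_avg_pool2d(student_feat, 1).flatten(1)
        t = F.adaptive_avg_pool2d(teacher_feat, 1).flatten(1)
        return F.mse_loss(self.project(s), t)

# total loss as described above:
# loss = cross_entropy_loss + 1e4 * adaptation_loss
```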

Figure 10: Results of training the Model on BSDS500 using the knowledge adaptor described in Experiment 3.

Conclusion

I will be happy to share my code and experience with anyone doing non-commercial research. You can freely contact me at Erik@superannotate.com.

Author: Erik Harutyunyan, Machine Learning Researcher at @SuperAnnotate

The fastest annotation platform and services for training AI. Learn more — https://superannotate.com/
