CrackFormerSV2: Advanced Pavement Crack Segmentation with Swin Transformer V2 and Dual Attention Mechanisms

Abstract

Pavement crack detection and segmentation are critical tasks for effective infrastructure maintenance. Despite promising advances driven by deep learning, significant challenges persist because of the inherent complexity of pavement crack characteristics. This paper introduces CrackFormerSV2, a novel encoder-decoder architecture designed for robust pavement crack segmentation. A key feature of CrackFormerSV2 is its use of Swin Transformer V2 for hierarchical feature extraction. The architecture also incorporates two attention mechanisms: the Convolutional Block Attention Module (CBAM) within the decoder blocks to refine feature maps, and a novel Skip Attention module that enhances traditional skip connections by applying cross-attention between corresponding encoder and decoder features. An Atrous Spatial Pyramid Pooling (ASPP) module at the bottleneck aggregates the multi-scale contextual information needed to capture diverse crack patterns. The model is trained with distinct learning rates for the pre-trained encoder and the decoder. Evaluations on established public benchmarks and a proprietary dataset show that CrackFormerSV2 achieves significant improvements over a baseline UNet-ResNet model across key metrics, including Intersection over Union (IoU), recall, precision, and F1 score. Ongoing optimization, including gradual unfreezing and the exploration of advanced loss functions, indicates strong potential for CrackFormerSV2 to match or surpass current state-of-the-art results in pavement crack segmentation.
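The abstract reports IoU, precision, recall, and F1. As a minimal sketch of how these four metrics relate on a binary crack mask, the function below derives all of them from true-positive, false-positive, and false-negative pixel counts. It assumes plain pixel-wise counting with no tolerance margin around crack boundaries; the evaluation protocol actually used for CrackFormerSV2 may differ, and the name `segmentation_metrics` is hypothetical.

```python
def segmentation_metrics(pred, target, eps=1e-7):
    """Pixel-level IoU, precision, recall and F1 for binary crack masks.

    Hypothetical helper: pred and target are equal-sized 2D lists of 0/1
    values, where 1 marks a crack pixel. eps avoids division by zero on
    masks that contain no crack pixels at all.
    """
    tp = fp = fn = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                tp += 1  # crack predicted where a crack exists
            elif p and not t:
                fp += 1  # crack predicted on background
            elif t and not p:
                fn += 1  # crack pixel missed by the model
    precision = tp / (tp + fp + eps)
    recall = tp / (tp + fn + eps)
    iou = tp / (tp + fp + fn + eps)
    f1 = 2 * precision * recall / (precision + recall + eps)
    return {"iou": iou, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    pred = [[1, 1, 0],
            [0, 1, 0]]
    target = [[1, 0, 0],
              [0, 1, 1]]
    print(segmentation_metrics(pred, target))
```

In the example there are 2 true positives, 1 false positive, and 1 false negative, so IoU is 2/4 = 0.5 while precision, recall, and F1 are all about 2/3. Because thin cracks occupy few pixels, a handful of boundary errors moves IoU much more than it would for large objects, which is one reason crack papers usually report all four metrics together.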
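The abstract describes the Skip Attention module only as cross-attention between corresponding encoder and decoder features. The sketch below shows one plausible reading under stated assumptions: decoder tokens act as queries, encoder tokens act as keys and values, the learned projections are omitted (identity), and the attended result is added back to the decoder token as a residual. The direction of attention, the residual, and the name `cross_attention_skip` are all assumptions, not details confirmed by the paper.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def cross_attention_skip(decoder_tokens, encoder_tokens):
    """Single-head scaled dot-product cross-attention (hypothetical sketch).

    decoder_tokens: N query vectors of dimension d (assumed decoder side).
    encoder_tokens: M key/value vectors of dimension d (assumed encoder side).
    Learned Q/K/V projections are omitted for brevity. Each output token is
    the decoder token plus a softmax-weighted average of encoder tokens.
    """
    d = len(decoder_tokens[0])
    scale = 1.0 / math.sqrt(d)
    fused = []
    for q in decoder_tokens:
        # Similarity of this query to every encoder token.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in encoder_tokens]
        weights = softmax(scores)
        # Weighted average of encoder tokens, one output channel at a time.
        attended = [sum(w * v[j] for w, v in zip(weights, encoder_tokens)) for j in range(d)]
        # Residual connection back onto the decoder token (assumption).
        fused.append([qj + aj for qj, aj in zip(q, attended)])
    return fused


if __name__ == "__main__":
    decoder = [[1.0, 0.0]]
    encoder = [[1.0, 0.0], [0.0, 1.0]]
    print(cross_attention_skip(decoder, encoder))
```

Compared with a plain concatenation skip, this lets each decoder position choose which encoder positions to borrow detail from. In a real network the same operation, with learned projections and multiple heads, corresponds to calling `torch.nn.MultiheadAttention` with the decoder features as `query` and the encoder features as `key` and `value`.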
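ASPP gathers multi-scale context by running parallel convolutions with different dilation (atrous) rates on the same bottleneck features. The arithmetic below illustrates why: a k x k kernel with dilation d spans k + (k - 1)(d - 1) pixels, and padding of d(k - 1)/2 keeps every branch at the input's spatial size so the branches can be concatenated. The rates (1, 6, 12, 18) are the DeepLabv3 defaults and are only an assumption here, as are the 16 x 16 bottleneck size (for example, a 512 x 512 input at the stride-32 output of a Swin-style encoder) and the helper names; the image-level pooling branch of ASPP is left out.

```python
def effective_kernel(k, dilation):
    """Span in pixels covered by a k x k kernel with the given dilation."""
    return k + (k - 1) * (dilation - 1)


def conv_output_size(size, k, dilation=1, padding=0, stride=1):
    """Standard convolution output-size formula (the one PyTorch Conv2d uses)."""
    return (size + 2 * padding - dilation * (k - 1) - 1) // stride + 1


# Hypothetical DeepLabv3-style rates; not confirmed for CrackFormerSV2.
ASPP_RATES = (1, 6, 12, 18)


def aspp_branches(feature_size, rates=ASPP_RATES, k=3):
    """Describe each convolutional ASPP branch for a square feature map."""
    branches = []
    for r in rates:
        kernel = 1 if r == 1 else k   # rate-1 branch is a 1x1 conv in DeepLabv3
        pad = r * (kernel - 1) // 2   # "same" padding keeps the spatial size
        branches.append({
            "rate": r,
            "kernel": kernel,
            "span": effective_kernel(kernel, r),
            "out_size": conv_output_size(feature_size, kernel, r, pad),
        })
    return branches


if __name__ == "__main__":
    for branch in aspp_branches(16):
        print(branch)
```

All four branches keep the 16 x 16 size, while their spans grow from 1 to 13, 25, and 37 pixels. Once a span exceeds the feature map, most taps of that branch fall into zero padding, so the useful choice of rates depends on the encoder's output stride and the input resolution. In PyTorch each 3 x 3 branch would be a `torch.nn.Conv2d` with `kernel_size=3`, `padding=r`, and `dilation=r`.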

Institutions
  • 1 Universidad Nacional de Caaguazú UNCA
  • 2 Facultad Politécnica, Universidad Nacional de Asunción
  • 3 Facultad Politécnica, Universidad Nacional de Asunción
  • 4 Facultad Politécnica, Universidad Nacional de Asunción
Track
  • ST03 - Scientific Computing
Keywords
Crack Segmentation
Deep Learning
Vision Transformer
Swin Transformer V2
Attention Mechanism