CrackFormerSV2: Advanced Pavement Crack Segmentation with Swin Transformer V2 and Dual Attention Mechanisms

- 322795
Abstract

Pavement crack detection and segmentation are critical for effective infrastructure maintenance. Despite promising advances driven by deep learning, significant challenges persist owing to the inherent complexity of pavement cracks. This paper introduces CrackFormerSV2, a novel encoder-decoder architecture designed for robust pavement crack segmentation. Its encoder builds on the hierarchical feature extraction of Swin Transformer V2. The architecture also incorporates two attention mechanisms: the Convolutional Block Attention Module (CBAM) within the decoder blocks to refine feature maps, and a novel Skip Attention module that enhances traditional skip connections through cross-attention between corresponding encoder and decoder features. An Atrous Spatial Pyramid Pooling (ASPP) module at the bottleneck aggregates the multi-scale contextual information needed to capture diverse crack patterns. The model is trained with a learning rate schedule that assigns distinct rates to the pre-trained encoder and the decoder. Evaluations on established public benchmarks and a proprietary dataset show that CrackFormerSV2 outperforms a baseline UNet-ResNet model across key metrics, including Intersection over Union (IoU), recall, precision, and F1 score. Ongoing optimization, including gradual unfreezing and the exploration of advanced loss functions, suggests that CrackFormerSV2 has strong potential to match or surpass current state-of-the-art results in pavement crack segmentation.
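The abstract reports IoU, recall, precision, and F1 score but does not say how they were computed. As a minimal sketch, assuming the standard pixel-level definitions on binary masks (1 = crack, 0 = background), the four metrics can be derived from true-positive, false-positive, and false-negative counts. The function names `confusion_counts` and `segmentation_metrics` and the toy masks are hypothetical and are not taken from the authors' code.

```python
def confusion_counts(pred, target):
    """Count TP, FP, FN over two equally sized binary masks (1 = crack)."""
    tp = fp = fn = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                tp += 1      # crack pixel correctly predicted
            elif p and not t:
                fp += 1      # background predicted as crack
            elif t and not p:
                fn += 1      # crack pixel missed
    return tp, fp, fn


def segmentation_metrics(pred, target, eps=1e-9):
    """Pixel-level precision, recall, F1, and IoU for one pair of masks.

    eps avoids division by zero when both masks are empty (assumption:
    such a case scores 0 rather than raising an error).
    """
    tp, fp, fn = confusion_counts(pred, target)
    return {
        "precision": tp / (tp + fp + eps),
        "recall": tp / (tp + fn + eps),
        "f1": 2 * tp / (2 * tp + fp + fn + eps),
        "iou": tp / (tp + fp + fn + eps),
    }


# Toy 3x4 masks, purely illustrative.
pred = [[0, 1, 1, 0],
        [0, 1, 0, 0],
        [0, 0, 0, 0]]
target = [[0, 1, 0, 0],
          [0, 1, 1, 0],
          [0, 0, 0, 0]]

for name, value in segmentation_metrics(pred, target).items():
    print(f"{name}: {value:.3f}")
```

In this toy case two crack pixels are matched, one is spurious, and one is missed, so precision, recall, and F1 all come to about 0.667 while IoU is 0.5. IoU is never higher than F1 for the same prediction, which is worth keeping in mind when comparing the two figures.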


Institutions
  • 1 Universidad Nacional de Caaguazú (UNCA)
  • 2 Facultad Politécnica, Universidad Nacional de Asunción
  • 3 Facultad Politécnica, Universidad Nacional de Asunción
  • 4 Facultad Politécnica, Universidad Nacional de Asunción
Thematic Track
  • ST03 - Scientific Computing
Keywords
Crack Segmentation
Deep Learning
Vision Transformer
Swin Transformer V2
Attention Mechanism