VISWIN TRANSFORMER: COULD IT BE THE WAY OUT IN BREAST CANCER DIAGNOSIS? A CONCEPT PAPER

ABSTRACT

For decades, breast cancer has been a major cause of death globally, and early detection of the disease is known to improve the chances of survival. Deep learning models, particularly transformer-based architectures such as Vision Transformers (ViTs) and Swin Transformers, have shown potential in breast cancer diagnosis. However, each model has limitations: ViTs are good at capturing global representations but weak at extracting local features, while Swin Transformers capture local representations effectively but struggle with long-range dependencies. This study proposes a hybrid model, the ViSwin Transformer, that merges the strengths of ViTs and Swin Transformers, thereby addressing the individual weaknesses of the respective models. The model begins with six layers of global multi-head attention, followed by eight layers of window-based self-attention arranged in four Swin Transformer stages. The final feature representation is passed through a multi-layer perceptron (MLP) for classification into benign or malignant categories.

Keywords: Vision Transformer, Swin Transformer, Breast Cancer, Diagnosis
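
The layer arrangement described in the abstract can be illustrated with a small, framework-free sketch. It is not the authors' implementation. The names `LayerSpec`, `build_layer_plan`, and `attention_pairs` are hypothetical, and several details are assumptions that the abstract does not state: the eight window layers are split evenly across the four stages (two per stage), alternate layers within a stage use shifted windows as in the standard Swin design, and the 14 x 14 patch grid and 7 x 7 window are example values only. Patch merging between stages is not modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LayerSpec:
    index: int     # position of the layer in the full stack
    kind: str      # "global" (ViT-style) or "window" (Swin-style)
    stage: int     # 0 for the global block, 1..4 for the Swin stages
    shifted: bool  # True if the window grid is shifted (assumed alternation)


def build_layer_plan(num_global=6, num_window=8, num_stages=4):
    """Return the ordered layer list: global layers first, then window layers.

    Assumption: window layers are divided evenly among the stages.
    """
    if num_window % num_stages != 0:
        raise ValueError("window layers must divide evenly across stages")
    per_stage = num_window // num_stages
    plan = []
    for _ in range(num_global):
        plan.append(LayerSpec(len(plan), "global", 0, False))
    for stage in range(1, num_stages + 1):
        for j in range(per_stage):
            # Odd-numbered layers in a stage use shifted windows (assumption).
            plan.append(LayerSpec(len(plan), "window", stage, j % 2 == 1))
    return plan


def attention_pairs(grid, window=None):
    """Count (query, key) token pairs in one attention layer.

    grid   : side length of the square patch grid
    window : side length of each local window, or None for global attention
    """
    tokens = grid * grid
    if window is None:
        return tokens * tokens          # every token attends to every token
    if grid % window != 0:
        raise ValueError("grid must be divisible by window")
    num_windows = (grid // window) ** 2
    return num_windows * (window * window) ** 2  # attention only inside windows


if __name__ == "__main__":
    for spec in build_layer_plan():
        print(spec)
    print("global pairs:", attention_pairs(14))
    print("window pairs:", attention_pairs(14, 7))
```

Under these assumed sizes, a global layer compares 38416 token pairs while a windowed layer compares 9604, which illustrates why the window-based stages are cheaper but cannot relate distant patches on their own, and why the global layers are placed in front of them.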