2024-06-01

Masked Image Modelling

A self-supervised learning pipeline that pre-trains a Vision Transformer (ViT) to reconstruct partially masked images, then fine-tunes it on the Oxford-IIIT Pet dataset for segmentation.

Focus: generative pre-training, segmentation, ViT fine-tuning
Stack: PyTorch, Hugging Face
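
The masking and reconstruction objective described above can be sketched in plain PyTorch. This is a minimal illustration, not the project's actual code: the helper names (`patchify`, `random_patch_mask`, `masked_reconstruction_loss`), the 16-pixel patch size, the 75% mask ratio, and the choice to score only the masked patches with mean squared error are all assumptions made for the example.

```python
import torch


def patchify(images, patch_size):
    """Split images (B, C, H, W) into flat patches (B, N, patch_size * patch_size * C)."""
    b, c, h, w = images.shape
    assert h % patch_size == 0 and w % patch_size == 0, "image size must divide evenly"
    gh, gw = h // patch_size, w // patch_size
    x = images.reshape(b, c, gh, patch_size, gw, patch_size)
    x = x.permute(0, 2, 4, 3, 5, 1)  # (B, gh, gw, p, p, C)
    return x.reshape(b, gh * gw, patch_size * patch_size * c)


def random_patch_mask(batch_size, num_patches, mask_ratio, generator=None):
    """Return a boolean mask (B, N); True marks a patch hidden from the encoder."""
    num_masked = int(num_patches * mask_ratio)
    noise = torch.rand(batch_size, num_patches, generator=generator)
    # Rank each patch by its noise value; the lowest-ranked patches get masked.
    ranks = noise.argsort(dim=1).argsort(dim=1)
    return ranks < num_masked


def masked_reconstruction_loss(pred, target, mask):
    """Mean squared error averaged over masked patches only (an assumed design choice)."""
    per_patch = ((pred - target) ** 2).mean(dim=-1)  # (B, N)
    weights = mask.to(per_patch.dtype)
    return (per_patch * weights).sum() / weights.sum().clamp(min=1)


# Hypothetical usage with random tensors standing in for real images and model output.
images = torch.randn(2, 3, 224, 224)
patches = patchify(images, patch_size=16)             # (2, 196, 768)
mask = random_patch_mask(2, patches.shape[1], 0.75)   # 147 of 196 patches masked per image
pred = torch.zeros_like(patches)                      # placeholder for a decoder's prediction
loss = masked_reconstruction_loss(pred, patches, mask)
```

In a real pipeline `pred` would come from a ViT encoder plus a lightweight decoder that sees only the visible patches; after pre-training, the encoder weights would be reused under a segmentation head for fine-tuning.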

Figures: Reconstructions; Segmentation Results

View code on GitHub