Model August, 2025
Model Architecture
The ResNetViTLateFusion model is a hybrid architecture that combines ResNet50 and ViT-B/16 to classify images by art style. ResNet extracts local, texture-based features, while ViT captures global structure through self-attention. Their outputs are concatenated and passed through a custom classifier. Training proceeds in two stages: first with ResNet frozen and ViT partially unfrozen, then with both backbones fully trainable under layer-wise learning rates. The loss combines supervised contrastive loss with label-smoothing cross-entropy to improve generalization and feature separation, which suits nuanced visual tasks such as art style classification.
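To make the loss combination concrete, here is a minimal pure-Python sketch of label-smoothing cross-entropy plus a supervised contrastive term. It is an illustration under stated assumptions, not the project's actual training code: the function names, the smoothing value `eps`, the `temperature`, and the `supcon_weight` used to mix the two terms are all hypothetical, since the description above does not state them.

```python
import math

def log_softmax(logits):
    # Numerically stable log-softmax over one list of class scores.
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]

def label_smoothing_ce(logits, target, eps=0.1):
    # Target distribution: (1 - eps) on the true class plus eps / K spread
    # uniformly over all K classes. eps=0.1 is an assumed value.
    k = len(logits)
    logp = log_softmax(logits)
    q = [eps / k + (1.0 - eps if j == target else 0.0) for j in range(k)]
    return -sum(qj * lp for qj, lp in zip(q, logp))

def l2_normalize(v):
    n = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / n for x in v]

def supcon_loss(embeddings, labels, temperature=0.1):
    # Supervised contrastive loss: each anchor is pulled toward the other
    # samples sharing its label and pushed away from the rest of the batch.
    z = [l2_normalize(v) for v in embeddings]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [p for p in range(n) if p != i and labels[p] == labels[i]]
        if not positives:
            continue  # an anchor with no positive in the batch contributes nothing
        sims = {
            a: sum(x * y for x, y in zip(z[i], z[a])) / temperature
            for a in range(n) if a != i
        }
        m = max(sims.values())
        log_denom = m + math.log(sum(math.exp(s - m) for s in sims.values()))
        total += -sum(sims[p] - log_denom for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0

def combined_loss(batch_logits, embeddings, labels,
                  eps=0.1, temperature=0.1, supcon_weight=0.5):
    # supcon_weight is a hypothetical mixing coefficient, not taken from the source.
    ce = sum(label_smoothing_ce(l, y, eps)
             for l, y in zip(batch_logits, labels)) / len(labels)
    return ce + supcon_weight * supcon_loss(embeddings, labels, temperature)
```

In this sketch the cross-entropy term acts on the classifier logits, while the contrastive term acts on an embedding (for example, the concatenated ResNet and ViT features). That pairing is one plausible reading of the description rather than a confirmed detail.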