ViTs-Mixup-CIFAR100-Weights

A collection of weights and logs for image classification experiments with modern Transformer architectures on CIFAR-100. These benchmarks are provided to make research on Mixup augmentations with Transformers more convenient, since most published benchmarks of Mixup variants with ViTs are based on ImageNet-1K. Please refer to our tech report for more details.

Since the original resolution of CIFAR-100 images ($32\times 32$) is too small for ViTs, we resize the input images to $224\times 224$ for both training and testing and leave the ViT architectures unmodified. This benchmark follows the DeiT training setup and trains each model for 200 or 600 epochs with a batch size of 100 on CIFAR-100. The base learning rates for DeiT and Swin are $1\times 10^{-3}$ and $5\times 10^{-4}$, respectively, which were the best settings in our experiments. We search over $\alpha$ in $\mathrm{Beta}(\alpha, \alpha)$ for every compared method and report the chosen value. View config files in mixups/vits.
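As a rough illustration of what $\alpha$ controls, the sketch below draws a mixing ratio $\lambda$ from $\mathrm{Beta}(\alpha, \alpha)$ and applies vanilla MixUp to two samples. It is a minimal standard-library example, not the training code behind this benchmark: the helper names (`sample_lambda`, `one_hot`, `mixup`) and the toy two-pixel "images" are assumptions made for illustration, and the other methods in the table use $\lambda$ in their own ways.

```python
import random


def sample_lambda(alpha, rng=random):
    """Draw the mixing ratio lambda from Beta(alpha, alpha)."""
    return rng.betavariate(alpha, alpha)


def one_hot(label, num_classes):
    """Encode an integer class label as a one-hot list."""
    return [1.0 if i == label else 0.0 for i in range(num_classes)]


def mixup(x_a, y_a, x_b, y_b, num_classes, alpha, rng=random):
    """Vanilla MixUp on two flattened images (hypothetical helper).

    Both the pixels and the one-hot labels are blended with the same lambda.
    """
    lam = sample_lambda(alpha, rng)
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    t_a, t_b = one_hot(y_a, num_classes), one_hot(y_b, num_classes)
    y = [lam * a + (1.0 - lam) * b for a, b in zip(t_a, t_b)]
    return x, y, lam


# Toy inputs: two 2-pixel "images" with class labels 3 and 7 (illustrative only).
rng = random.Random(0)
x, y, lam = mixup([0.0, 1.0], 3, [1.0, 0.0], 7, num_classes=100, alpha=0.8, rng=rng)
```

With small $\alpha$ (for example 0.2) the Beta distribution concentrates near 0 and 1, so most mixed samples stay close to one of the two originals; larger $\alpha$ (for example 2) pushes $\lambda$ toward 0.5 and produces stronger blends.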
For ViT architectures, we report the best top-1 accuracy over the last 10 training epochs. The trained models and logs are released in vits-mix-cifar100-weights.
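The reporting rule can be sketched as follows, assuming the per-epoch top-1 accuracies have already been read from a training log into a list. The function name `best_of_last_epochs` and the accuracy values are hypothetical and are not taken from the released logs.

```python
def best_of_last_epochs(top1_per_epoch, window=10):
    """Return the best top-1 accuracy among the final `window` epochs."""
    if not top1_per_epoch:
        raise ValueError("top1_per_epoch must not be empty")
    return max(top1_per_epoch[-window:])


# Made-up accuracies: an early spike (90.0) followed by ten later epochs.
hypothetical_log = [10.0, 20.0, 90.0] + [50.0 + i for i in range(10)]
reported = best_of_last_epochs(hypothetical_log)
```

In this toy log the early spike falls outside the 10-epoch window, so it does not affect the reported number; only the final epochs count.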

ViTs' Mixup Benchmark on CIFAR-100

Backbones	$Beta$	DeiT-S(/16)	DeiT-S(/16)	Swin-T	Swin-T
Epoch	$\alpha$	200 epochs	600 epochs	200 epochs	600 epochs
Vanilla	-	65.81	68.50	78.41	81.29
MixUp	0.8	69.98	76.35	76.78	83.67
CutMix	2	74.12	79.54	80.64	83.38
DeiT	0.8,1	75.92	79.38	81.25	84.41
SmoothMix	0.2	67.54	80.25	66.69	81.18
SaliencyMix	0.2	69.78	76.60	80.40	82.58
AttentiveMix+	2	75.98	80.33	81.13	83.69
FMix*	1	70.41	74.31	80.72	82.82
GridMix	1	68.86	74.96	78.54	80.79
PuzzleMix	2	73.60	81.01	80.44	84.74
ResizeMix*	1	68.45	71.95	80.16	82.36
AlignMix	1	-	-	78.91	83.34
TransMix	0.8,1	76.17	79.33	81.33	84.45
AutoMix	2	76.24	80.91	82.67	84.70
SAMix*	2	77.94	82.49	82.62	84.85
