RandAugment:

Practical automated data augmentation with a reduced search space




Ekin D. Cubuk, Barret Zoph, Jonathon Shlens, Quoc V. Le


Yuehchou Lee
National Taiwan University, Mathematics

Introduction

Why I chose this paper:


  • Our datasets are limited in size (brain tumor, hypopharyngeal cancer, and hepatocellular carcinoma)

  • Widely used method for generating additional data

  • Simplifying the process of adjusting parameters

  • Recently published (14 Nov. 2019)

Previous Publication (AutoAugment)


Disadvantages:



  • Rapidly increasing training complexity and computing time

  • Parameters are NOT flexibly adjustable

RandAugment Matches or Exceeds the Predictive Performance of Other Augmentation Methods


AA: AutoAugment, Fast AA: Fast AutoAugment, PBA: Population Based Augmentation, RA: RandAugment

Contributions of Their Team


  • Demonstrate that the optimal strength of a data augmentation depends on the model size and training set size

  • Introduce a vastly simplified search space for data augmentation containing 2 interpretable hyperparameters

  • Demonstrate state-of-the-art results on CIFAR, SVHN, and ImageNet

Methods

Augmentation Policies (K = 14)


1. identity         2. autoContrast     3. equalize
4. rotate           5. solarize         6. color
7. posterize        8. contrast         9. brightness
10. sharpness       11. shear-x         12. shear-y
13. translate-x     14. translate-y

Python Code for RandAugment Based on NumPy
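The slide's code block did not survive extraction; below is a minimal sketch consistent with the paper's description. The `transforms` names mirror the K = 14 operations listed above; the function only samples operation names and a shared magnitude, it does not apply the transforms themselves:

```python
import numpy as np

# The K = 14 operation names; applying them to pixels is out of scope here.
transforms = [
    'Identity', 'AutoContrast', 'Equalize', 'Rotate', 'Solarize',
    'Color', 'Posterize', 'Contrast', 'Brightness', 'Sharpness',
    'ShearX', 'ShearY', 'TranslateX', 'TranslateY']

def randaugment(N, M):
    """Generate a set of distortions.

    Args:
        N: number of augmentation transformations to apply sequentially.
        M: shared magnitude for all the transformations.
    """
    sampled_ops = np.random.choice(transforms, N)
    return [(op, M) for op in sampled_ops]
```

Each call draws N operations uniformly with replacement, which is why there are $K^N$ possible policies in total.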


So RandAugment can express $K^N$ potential policies (with $K = 14$ and $N = 2$, that is $14^2 = 196$)!

Fixed $N = 2$ with $M = 9, 17, 28$ (Magnitude)

Operation Magnitudes Increase Rapidly in the Initial Phase of Training


The sum of all magnitude values lies between $0$ and $10$!

Normalized Plot of Operation Probability Parameters over Time


Policies Take the Form (Operation, Probability, Magnitude)



For example,


                                policies = [('ShearX', 0.6, 2), ...]
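Applying such a triple could be sketched as follows. This is a hypothetical illustration: `OPS` maps operation names to stand-in functions, not the real image transforms.

```python
import numpy as np

# Hypothetical stand-ins: each op maps (image, magnitude) -> image.
# Real implementations would warp / recolor pixels according to the magnitude.
OPS = {
    'Identity': lambda img, m: img,
    'ShearX':   lambda img, m: img,  # placeholder for an actual shear
}

def apply_policy(policies, image, rng=np.random):
    """Apply each (operation, probability, magnitude) triple in turn."""
    for name, prob, magnitude in policies:
        if rng.random() < prob:  # apply the op only with its given probability
            image = OPS[name](image, magnitude)
    return image
```

For example, `apply_policy([('ShearX', 0.6, 2)], image)` shears the image with probability 0.6 at magnitude 2.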
                            

Results

CIFAR and SVHN

Test Accuracy (%) on CIFAR-10, CIFAR-100, SVHN and SVHN Core Set


AA: AutoAugment, Fast AA: Fast AutoAugment, PBA: Population Based Augmentation, RA: RandAugment

Optimal Magnitude of Augmentation

(a) Accuracy of Wide-ResNet-28-2, Wide-ResNet-28-7, and Wide-ResNet-28-10 across varying distortion magnitudes
(b) Optimal distortion magnitude across 7 Wide-ResNet-28 architectures with varying widening parameters (k)
(c) Accuracy of Wide-ResNet-28-10 for three training set sizes (1K, 4K, and 10K) across varying distortion magnitudes
(d) Optimal distortion magnitude across 8 training set sizes

ImageNet


ImageNet Results



Note: Population Based Augmentation (PBA) has not been implemented on ImageNet.

RandAugment achieves the best performance on ImageNet!

COCO


Results on Object Detection (COCO)



mAP: Mean average precision

Discussion

Open Question


How can one tailor the set of transformations to a given task in order to further improve the predictive performance of a given model?

My Implementation

What I have changed:


  • Convert RGB to Grayscale

  • Add support for augmenting 3D images

  • Upgrade TensorFlow from v1 to v2.0
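The first change above can be sketched with plain NumPy. These are my own illustrative helpers, not the project's actual code; the standard ITU-R BT.601 luminance weights are assumed:

```python
import numpy as np

def rgb_to_gray(img):
    """Collapse an (H, W, 3) RGB image to (H, W) grayscale
    using the ITU-R BT.601 luminance weights."""
    return img @ np.array([0.299, 0.587, 0.114])

def gray_to_rgb(gray):
    """Replicate a grayscale image into 3 identical channels,
    so grayscale data can feed ops that expect RGB input."""
    return np.repeat(gray[..., None], 3, axis=-1)
```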

Preliminary Results

Contrast


Solarize


Brightness


Sharpness


Posterize


Equalize


Translate 3D
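As a minimal illustration of what translating a volume means, here is my own sketch using periodic shifting via `np.roll`; a production version would pad with a fill value instead of wrapping voxels around:

```python
import numpy as np

def translate3d(volume, shift):
    """Shift a (D, H, W) volume by integer offsets along each axis.
    np.roll wraps voxels around the edges; a real augmentation
    would typically pad with a constant fill value instead."""
    return np.roll(volume, shift, axis=(0, 1, 2))
```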


Shear 3D


Rotate 3D


Color


Function:
    Converts RGB images to grayscale, then converts the grayscale images back to RGB

We will NOT use this function,
since medical images are already grayscale!

Future Works

What I will continue working on:


  • Modify the function of randomly adjusting parameters to fit our task

  • Determine the best parameter range for our task

  • Augment our dataset to improve tumor segmentation accuracy

Related Link