Why do the hyperparameters for the training pipeline (train_search.py) and the final evaluation pipeline (train.py) differ so much? #107

Open
NdaAzr opened this issue Aug 1, 2019 · 3 comments

Comments

NdaAzr commented Aug 1, 2019

I am wondering why the hyperparameters in the training (search) pipeline are different from those in the final evaluation pipeline.

For example, here are the hyperparameters for CIFAR, in this format: training pipeline value -> final evaluation pipeline value:
cells: 8 -> 20
batch size: 64 -> 96
initial channels: 16 -> 36
epochs: 50 -> 600
drop path probability: 0.3 -> 0.2
auxiliary head: no -> yes (weight 0.4)
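
To make the comparison concrete, here is a minimal sketch of the two configurations side by side (the key names are my assumption of the corresponding argparse flags in train_search.py and train.py, so check them against your checkout):

# Sketch only: search vs. final-evaluation settings from the list above.
# "layers" is the number of cells; key names assumed to mirror the repo's flags.
search_cfg = dict(layers=8, batch_size=64, init_channels=16,
                  epochs=50, drop_path_prob=0.3, auxiliary=False)
eval_cfg = dict(layers=20, batch_size=96, init_channels=36,
                epochs=600, drop_path_prob=0.2, auxiliary=True,
                auxiliary_weight=0.4)

for key in sorted(set(search_cfg) | set(eval_cfg)):
    print(f"{key:>17}: search={search_cfg.get(key)}  eval={eval_cfg.get(key)}")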

NdaAzr commented Aug 1, 2019

I found the answer to this question here:

https://openreview.net/forum?id=S1eYHoC5FX

For convolutional cells:

Our setup of #cells (8->20), #epochs (600) and weight for the auxiliary head (0.4) in the final evaluation exactly follows Zoph et al., 2018. The #init_channels is enlarged from 16 to 36 to ensure a comparable model size (~3M) with other baselines. Given those settings, we then use the largest possible batch size (96) for a single GPU. The drop path probability was tuned wrt the validation set among the choices of (0.1, 0.2, 0.3) given the best cell learned by DARTS.

NdaAzr closed this as completed Aug 1, 2019
NdaAzr reopened this Aug 1, 2019
YANGWAGN commented Sep 2, 2019

Hi, NdaAzr! In the code, I understand how to use train_search.py; however, I don't see the code that obtains the best architecture and saves the final architecture parameters. Also, how do I construct the architecture from those parameters in train.py?
Thank you!

NdaAzr commented Sep 3, 2019

Hi @YANGWAGN,

When you run train_search.py, model.genotype() gives you the best learned cell. So you need to train it for a number of epochs and keep the genotype with the highest validation accuracy.
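
A minimal sketch of how that loop might look (model.genotype() follows this repo's search network; infer() is a stand-in for the repo's per-epoch validation step, so treat this as an illustration rather than the exact code):

import logging

def search_and_keep_best(model, epochs, infer):
    """Run the search for `epochs` epochs and keep the best-scoring genotype.

    `model` is assumed to expose .genotype() like the search network in this
    repo; `infer` is assumed to return the validation accuracy for one epoch.
    """
    best_acc, best_genotype = 0.0, None
    for epoch in range(epochs):
        valid_acc = infer(epoch)             # one epoch of search + validation
        genotype = model.genotype()          # cell discovered so far
        logging.info('epoch %d acc %f genotype %s', epoch, valid_acc, genotype)
        if valid_acc > best_acc:
            best_acc, best_genotype = valid_acc, genotype
    return best_genotype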

Then, you need to add this genotype to genotypes.py and run train.py. See this example:

DARTS_v2 = Genotype(normal=[('dil_conv_3x3', 0), ('skip_connect', 1), ('skip_connect', 1), ('sep_conv_3x3', 2), ('sep_conv_3x3', 2), ('sep_conv_3x3', 0), ('skip_connect', 1), ('dil_conv_3x3', 0)], normal_concat=range(2, 6), reduce=[('dil_conv_3x3', 1), ('sep_conv_3x3', 0), ('max_pool_3x3', 0), ('dil_conv_5x5', 2), ('dil_conv_5x5', 3), ('max_pool_3x3', 1), ('max_pool_3x3', 0), ('max_pool_3x3', 1)], reduce_concat=range(2, 6))

DARTS = DARTS_v2
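
Once that entry exists in genotypes.py, train.py builds the evaluation network from it by name. A rough sketch of that lookup (module and constructor names are assumed to match this repo's cnn/ code, so verify against train.py):

# Sketch only: resolve a genotype by name and build the evaluation network.
# `genotypes` and `model` are modules from this repo; NetworkCIFAR's argument
# order (C, num_classes, layers, auxiliary, genotype) is assumed from model.py.
import genotypes
from model import NetworkCIFAR

genotype = getattr(genotypes, 'DARTS')          # same idea as train.py's --arch flag
net = NetworkCIFAR(36, 10, 20, True, genotype)  # init_channels, classes, cells, auxiliary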
