In the paper you claim:

> During training, we employ a batch size of 32 and utilize the AdamW optimizer with a constant learning rate of 5e-5. The model is trained for two epochs.

But in the specialization.py script the gradient_accumulation_steps parameter is set to 8 and you are training for three epochs. Which is the true training recipe for the deberta-10k-rank_net model?
The training recipe for deberta-10k-rank_net is as described in the paper. We train on 4 GPUs, so with gradient_accumulation_steps set to 8 the global batch size is 4 * 8 = 32. Although the script runs for three epochs, we evaluate the checkpoint saved after 2 epochs.
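To make the arithmetic explicit, here is a minimal sketch of the global batch size calculation. Only the 4 GPUs and gradient_accumulation_steps = 8 are stated in this thread; the per-device batch size of 1 is an assumption made so that the numbers match the paper's reported batch size of 32.

```python
# Minimal sketch of the effective (global) batch size calculation.
# num_gpus and gradient_accumulation_steps come from this thread;
# per_device_train_batch_size = 1 is an assumption, not confirmed here.
num_gpus = 4
per_device_train_batch_size = 1  # assumed
gradient_accumulation_steps = 8  # as set in specialization.py

global_batch_size = (
    num_gpus * per_device_train_batch_size * gradient_accumulation_steps
)
print(global_batch_size)  # 32, matching the batch size reported in the paper
```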