evaluation takes a lot of time #632

marziye-A · 2023-06-26T10:03:39Z

Hi,
Training a Whisper PEFT model takes about 5 hours, but evaluating the model and calculating WER takes about 10 hours.
Does anyone know why? Any help is really appreciated!
My evaluation code looks like this:

from peft import PeftModel, PeftConfig
from transformers import WhisperForConditionalGeneration, Seq2SeqTrainer

peft_model_id = "****/openai-whisper-tiny-peft-fa-transcribe-colab_test"
peft_config = PeftConfig.from_pretrained(peft_model_id)
model = WhisperForConditionalGeneration.from_pretrained(
    peft_config.base_model_name_or_path, load_in_8bit=True, device_map="auto"
)
model = PeftModel.from_pretrained(model, peft_model_id)

import torch  # needed for torch.cuda.amp.autocast() and torch.no_grad() below
from torch.utils.data import DataLoader
from tqdm import tqdm
import numpy as np
import gc

eval_dataloader = DataLoader(vectorized_datasets["test"], batch_size=8, collate_fn=data_collator)
forced_decoder_ids = processor.get_decoder_prompt_ids(language=language, task=task)

model.eval()
for step, batch in enumerate(tqdm(eval_dataloader)):
    with torch.cuda.amp.autocast():
        with torch.no_grad():
            generated_tokens = (
                model.generate(
                    input_features=batch["input_features"].to("cuda"),
                    forced_decoder_ids=forced_decoder_ids,
                    max_new_tokens=255,
                )
                .cpu()
                .numpy()
            )
            labels = batch["labels"].cpu().numpy()
            labels = np.where(labels != -100, labels, processor.tokenizer.pad_token_id)
            decoded_preds = processor.tokenizer.batch_decode(generated_tokens, skip_special_tokens=True)
            decoded_labels = processor.tokenizer.batch_decode(labels, skip_special_tokens=True)
            metric.add_batch(
                predictions=decoded_preds,
                references=decoded_labels,
            )
    del generated_tokens, labels, batch
    gc.collect()
wer = 100 * metric.compute()
print(f"{wer=}")

MightyStud · 2023-07-10T11:32:08Z

I can think of two things:

1. You may be using a larger batch size in training than in evaluation (your code uses 8 for evaluation).
2. LoRA (PEFT) models tend to take longer at inference than the equivalent non-PEFT model (though in theory inference shouldn't take longer than training). There is a merge_and_unload() function that merges the adapter weights into the base model, which in theory speeds up inference.
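
To illustrate why merging can help, here is a minimal, pure-Python sketch of the arithmetic behind a LoRA merge. It is a toy under stated assumptions, not PEFT's actual implementation: the helper names (`matmul`, `lora_forward`, `merge_lora`), the matrix shapes, and the values are all made up for illustration. An unmerged LoRA layer computes the base projection plus a separate low-rank update on every forward pass; merging folds that update into the base weight once, so each forward pass is a single matrix multiply.

```python
# Toy illustration of LoRA merging. All names and values here are hypothetical;
# this is not PEFT's code, only the arithmetic it is based on.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def add(a, b):
    """Element-wise sum of two matrices of the same shape."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def scale(a, s):
    """Multiply every entry of a matrix by the scalar s."""
    return [[x * s for x in row] for row in a]

def lora_forward(x, W, A, B, scaling):
    """Unmerged path: base projection plus a separate low-rank update (3 matmuls)."""
    base = matmul(x, W)
    update = scale(matmul(matmul(x, A), B), scaling)
    return add(base, update)

def merge_lora(W, A, B, scaling):
    """Fold the low-rank update into the base weight: W' = W + scaling * (A @ B)."""
    return add(W, scale(matmul(A, B), scaling))

# Toy shapes: input dim 2, output dim 3, LoRA rank 1, scaling = alpha / r = 2.0
x = [[1.0, 2.0]]
W = [[1, 0, 2], [0, 1, 3]]
A = [[1], [1]]
B = [[1, 2, 3]]
scaling = 2.0

unmerged_out = lora_forward(x, W, A, B, scaling)
W_merged = merge_lora(W, A, B, scaling)   # done once, ahead of inference
merged_out = matmul(x, W_merged)          # a single matmul per forward pass

print(unmerged_out)  # [[7.0, 14.0, 26.0]]
print(merged_out)    # [[7.0, 14.0, 26.0]]
```

In PEFT itself, the corresponding step would be calling `model = model.merge_and_unload()` after `PeftModel.from_pretrained(...)`. Whether that works on a base model loaded with `load_in_8bit=True` may depend on the PEFT version, so loading the base model in half precision before merging is one thing to try (an assumption, not something confirmed in this thread).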

github-actions · 2023-08-03T15:03:27Z

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

github-actions bot closed this as completed Aug 11, 2023
