Cannot generate w/ do_sample=True on a LoRA model #81

Closed
minimaxir opened this issue Feb 13, 2023 · 2 comments

Comments

@minimaxir

From the last cell of the OPT Notebook, adding do_sample=True to generate() raises:

/opt/conda/lib/python3.7/site-packages/peft/peft_model.py:550 in generate

   547
   548     def generate(self, **kwargs):
   549         if not isinstance(self.peft_config, PromptLearningConfig):
❱  550             return self.base_model.generate(**kwargs)
   551         else:
   552             if "input_ids" not in kwargs:
   553                 raise ValueError("input_ids must be provided for Peft model generation")

/opt/conda/lib/python3.7/site-packages/torch/autograd/grad_mode.py:27 in decorate_context

    24         @functools.wraps(func)
    25         def decorate_context(*args, **kwargs):
    26             with self.clone():
❱   27                 return func(*args, **kwargs)
    28         return cast(F, decorate_context)
    29
    30     def _wrap_generator(self, func):

/opt/conda/lib/python3.7/site-packages/transformers/generation/utils.py:1442 in generate

   1439                 output_scores=generation_config.output_scores,
   1440                 return_dict_in_generate=generation_config.return_dict_in_generate,
   1441                 synced_gpus=synced_gpus,
❱  1442                 **model_kwargs,
   1443             )
   1444
   1445         elif is_beam_gen_mode:

/opt/conda/lib/python3.7/site-packages/transformers/generation/utils.py:2462 in sample

   2459
   2460             # pre-process distribution
   2461             next_token_scores = logits_processor(input_ids, next_token_logits)
❱  2462             next_token_scores = logits_warper(input_ids, next_token_scores)
   2463
   2464             # Store scores, attentions and hidden_states when required
   2465             if return_dict_in_generate:

/opt/conda/lib/python3.7/site-packages/transformers/generation/logits_process.py:92 in __call__

    89                     )
    90                 scores = processor(input_ids, scores, **kwargs)
    91             else:
❱   92                 scores = processor(input_ids, scores)
    93         return scores
    94
    95

/opt/conda/lib/python3.7/site-packages/transformers/generation/logits_process.py:297 in __call__

   294     def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor) -> torch.
   295         top_k = min(self.top_k, scores.size(-1))  # Safety check
   296         # Remove all tokens with a probability less than the last token of the top-k
❱  297         indices_to_remove = scores < torch.topk(scores, top_k)[0][..., -1, None]
   298         scores = scores.masked_fill(indices_to_remove, self.filter_value)
   299         return scores
   300
RuntimeError: "topk_cpu" not implemented for 'Half'
@younesbelkada
Contributor

Hi @minimaxir
Thanks for the issue!
In fact, you need to put your input_ids on the correct device; calling generate with input_ids.to(0) should fix your issue.
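
A hedged sketch of that fix, continuing the illustrative setup above (device index 0 is assumed, per the comment; sampling parameters are illustrative):

```python
# Continuing the sketch above: move the tokenized batch onto the GPU before
# calling generate, so every tensor in the sampling path lives on CUDA.
batch = tokenizer("Two things are infinite: ", return_tensors="pt").to(0)

with torch.no_grad():
    output_tokens = model.generate(
        **batch,
        max_new_tokens=50,
        do_sample=True,  # sampling now works: top-k runs on CUDA fp16 tensors
        top_k=50,
        top_p=0.95,
    )

print(tokenizer.decode(output_tokens[0], skip_special_tokens=True))
```

Equivalently, moving the whole BatchEncoding with batch.to("cuda") (as in the next comment) has the same effect.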

@minimaxir
Author

Yep, a batch.to('cuda') did the trick. Thanks!
