Phi-3 Medium GGUF not working when running local on windows #413

Closed
siriux opened this issue Jun 9, 2024 · 4 comments
Labels
bug (Something isn't working), triaged (This error has been reproduced or otherwise triaged.)

Comments

@siriux
siriux commented Jun 9, 2024

Describe the bug

Phi-3 Medium GGUF is not working when running locally (without using Hugging Face) on Windows.

It loads, but it crashes as soon as I provide any input. Here is an example using $env:RUST_BACKTRACE=1:

> cargo run --release --features cuda -- -i --token-source none -c .\chat_templates\phi3.json -n 32 gguf -m . -f ..\Phi-3-medium-4k-instruct-Q5_K_S.gguf
    Finished release [optimized] target(s) in 0.51s
     Running `target\release\mistralrs-server.exe -i --token-source none -c .\chat_templates\phi3.json -n 32 gguf -m . -f ..\Phi-3-medium-4k-instruct-Q5_K_S.gguf`
2024-06-09T11:05:18.106430Z  INFO mistralrs_server: avx: true, neon: false, simd128: false, f16c: true
2024-06-09T11:05:18.106567Z  INFO mistralrs_server: Sampling method: penalties -> temperature -> topk -> topp -> multinomial
2024-06-09T11:05:18.106683Z  INFO mistralrs_server: Model kind is: quantized from gguf (no adapters)
2024-06-09T11:05:18.108328Z  INFO hf_hub: Token file not found "C:\\Users\\Siriux\\.cache\\huggingface\\token"
2024-06-09T11:05:18.108471Z  INFO mistralrs_core::pipeline::gguf: Using chat template file at `.\chat_templates\phi3.json`
2024-06-09T11:05:18.108554Z  INFO hf_hub: Token file not found "C:\\Users\\Siriux\\.cache\\huggingface\\token"
2024-06-09T11:05:18.517203Z  INFO mistralrs_core::pipeline::paths: Loading `"..\\Phi-3-medium-4k-instruct-Q5_K_S.gguf"` locally at `".\\..\\Phi-3-medium-4k-instruct-Q5_K_S.gguf"`
2024-06-09T11:05:19.286315Z  INFO mistralrs_core::pipeline::gguf: Model config:
general.architecture: phi3
general.file_type: 16
general.name: Phi3
general.quantization_version: 2
phi3.attention.head_count: 40
phi3.attention.head_count_kv: 10
phi3.attention.layer_norm_rms_epsilon: 0.00001
phi3.block_count: 40
phi3.context_length: 4096
phi3.embedding_length: 5120
phi3.feed_forward_length: 17920
phi3.rope.dimension_count: 128
phi3.rope.freq_base: 10000
phi3.rope.scaling.original_context_length: 4096
quantize.imatrix.chunks_count: 234
quantize.imatrix.dataset: /training_data/calibration_data.txt
quantize.imatrix.entries_count: 160
quantize.imatrix.file: /models/Phi-3-medium-4k-instruct-GGUF/Phi-3-medium-4k-instruct.imatrix
2024-06-09T11:05:19.343517Z  INFO mistralrs_core::gguf::gguf_tokenizer: GGUF tokenizer model is `llama`, kind: `Unigram`, num tokens: 32064, num added tokens: 0, num merges: 0, num scores: 32064
2024-06-09T11:05:19.528332Z  INFO mistralrs_core::device_map: Model has 40 repeating layers.
2024-06-09T11:05:19.528532Z  INFO mistralrs_core::device_map: Using 32 repeating layers on GPU and 8 repeating layers on host.
2024-06-09T11:05:25.066355Z  INFO mistralrs_core::pipeline::chat_template: bos_toks = "<s>", eos_toks = "<|endoftext|>", unk_tok = <unk>
2024-06-09T11:05:25.069729Z  INFO mistralrs_server: Model loaded.
2024-06-09T11:05:25.072892Z  INFO mistralrs_core: GEMM reduced precision in BF16 not supported.
2024-06-09T11:05:25.107375Z  INFO mistralrs_core: Enabling GEMM reduced precision in F16.
2024-06-09T11:05:25.110102Z  INFO mistralrs_core::cublaslt: Initialized cuBLASlt handle
2024-06-09T11:05:25.110394Z  INFO mistralrs_server::interactive_mode: Starting interactive loop with sampling params: SamplingParams { temperature: Some(0.1), top_k: Some(32), top_p: Some(0.1), top_n_logprobs: 0, frequency_penalty: Some(0.1), presence_penalty: Some(0.1), stop_toks: None, max_len: Some(4096), logits_bias: None, n_choices: 1 }
> Hi
2024-06-09T11:05:29.537294Z ERROR mistralrs_core::engine: prompt step - Model failed with error: WithBacktrace { inner: ShapeMismatchBinaryOp { lhs: [1, 15, 1280], rhs: [1, 15, 40, 128], op: "reshape" }, backtrace: Backtrace [{ fn: "std::backtrace_rs::backtrace::dbghelp::trace", file: "/rustc/25ef9e3d85d934b27d9dada2f9dd52b1dc63bb04/library\std\src\..\..\backtrace\src\backtrace\dbghelp.rs", line: 131 }, { fn: "std::backtrace_rs::backtrace::trace_unsynchronized", file: "/rustc/25ef9e3d85d934b27d9dada2f9dd52b1dc63bb04/library\std\src\..\..\backtrace\src\backtrace\mod.rs", line: 66 }, { fn: "std::backtrace::Backtrace::create", file: "/rustc/25ef9e3d85d934b27d9dada2f9dd52b1dc63bb04/library\std\src\backtrace.rs", line: 331 }, { fn: "std::backtrace::Backtrace::capture", file: "/rustc/25ef9e3d85d934b27d9dada2f9dd52b1dc63bb04/library\std\src\backtrace.rs", line: 296 }, { fn: "candle_core::error::Error::bt" }, { fn: "candle_core::tensor::Tensor::reshape" }, { fn: "mistralrs_core::models::quantized_phi3::ModelWeights::forward" }, { fn: "<mistralrs_core::pipeline::gguf::GGUFPipeline as mistralrs_core::pipeline::Pipeline>::forward_inputs" }, { fn: "core::ops::function::FnOnce::call_once" }, { fn: "sparsevec::SparseVec<T>::from" }, { fn: "tokio::runtime::park::CachedParkThread::block_on" }, { fn: "tokio::runtime::context::runtime::enter_runtime" }, { fn: "std::sys_common::backtrace::__rust_begin_short_backtrace" }, { fn: "core::ops::function::impls::<impl core::ops::function::FnOnce<A> for &mut F>::call_once" }, { fn: "std::sys::pal::windows::thread::impl$0::new::thread_start", file: "/rustc/25ef9e3d85d934b27d9dada2f9dd52b1dc63bb04/library\std\src\sys\pal\windows\thread.rs", line: 58 }, { fn: "BaseThreadInitThunk" }, { fn: "RtlUserThreadStart" }] }
2024-06-09T11:05:29.540404Z ERROR mistralrs_server::interactive_mode: Got a model error: "shape mismatch in reshape, lhs: [1, 15, 1280], rhs: [1, 15, 40, 128]\n   0: std::backtrace_rs::backtrace::dbghelp::trace\n             at /rustc/25ef9e3d85d934b27d9dada2f9dd52b1dc63bb04/library\\std\\src\\..\\..\\backtrace\\src\\backtrace\\dbghelp.rs:131\n   1: std::backtrace_rs::backtrace::trace_unsynchronized\n             at /rustc/25ef9e3d85d934b27d9dada2f9dd52b1dc63bb04/library\\std\\src\\..\\..\\backtrace\\src\\backtrace\\mod.rs:66\n   2: std::backtrace::Backtrace::create\n             at /rustc/25ef9e3d85d934b27d9dada2f9dd52b1dc63bb04/library\\std\\src\\backtrace.rs:331\n   3: std::backtrace::Backtrace::capture\n             at /rustc/25ef9e3d85d934b27d9dada2f9dd52b1dc63bb04/library\\std\\src\\backtrace.rs:296\n   4: candle_core::error::Error::bt\n   5: candle_core::tensor::Tensor::reshape\n   6: mistralrs_core::models::quantized_phi3::ModelWeights::forward\n   7: <mistralrs_core::pipeline::gguf::GGUFPipeline as mistralrs_core::pipeline::Pipeline>::forward_inputs\n   8: core::ops::function::FnOnce::call_once\n   9: sparsevec::SparseVec<T>::from\n  10: tokio::runtime::park::CachedParkThread::block_on\n  11: tokio::runtime::context::runtime::enter_runtime\n  12: std::sys_common::backtrace::__rust_begin_short_backtrace\n  13: core::ops::function::impls::<impl core::ops::function::FnOnce<A> for &mut F>::call_once\n  14: std::sys::pal::windows::thread::impl$0::new::thread_start\n             at /rustc/25ef9e3d85d934b27d9dada2f9dd52b1dc63bb04/library\\std\\src\\sys\\pal\\windows\\thread.rs:58\n  15: BaseThreadInitThunk\n  16: RtlUserThreadStart\n", response: ChatCompletionResponse { id: "0", choices: [Choice { finish_reason: "error", index: 0, message: ResponseMessage { content: "", role: "assistant" }, logprobs: None }], created: 1717931128, model: ".", system_fingerprint: "local", object: "chat.completion", usage: Usage { completion_tokens: 0, prompt_tokens: 15, total_tokens: 15, avg_tok_per_sec: 19.582245, avg_prompt_tok_per_sec: inf, avg_compl_tok_per_sec: NaN, total_time_sec: 0.766, total_prompt_time_sec: 0.0, total_completion_time_sec: 0.0 } }
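For reference, the shapes in the error are consistent with the GGUF metadata above: the failing tensor has 1280 = 10 × 128 features per token (phi3.attention.head_count_kv × phi3.rope.dimension_count), while the reshape target [1, 15, 40, 128] assumes the full head_count of 40. A minimal sketch of that arithmetic (illustrative only, not mistral.rs code):

```rust
// Head counts taken from the model config printed above; this sketch
// only reproduces the failing arithmetic.
fn main() {
    let (seq_len, head_dim) = (15usize, 128usize); // 15 prompt tokens, phi3.rope.dimension_count
    let (n_head, n_head_kv) = (40usize, 10usize);  // phi3.attention.head_count{,_kv}

    // With grouped-query attention, the K/V projection produces
    // head_count_kv * head_dim features per token:
    let kv_features = n_head_kv * head_dim;
    assert_eq!(kv_features, 1280); // matches the lhs shape [1, 15, 1280]

    // Reshaping that tensor to [1, seq, 40, 128] would require
    // 40 * 128 = 5120 features per token, hence the shape mismatch.
    assert_ne!(kv_features, n_head * head_dim);
    println!("{seq_len} tokens x {kv_features} K/V features each");
}
```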

Latest commit
I'm using this commit: f257423

@siriux siriux added the bug Something isn't working label Jun 9, 2024
@siriux siriux mentioned this issue Jun 9, 2024
@EricLBuehler EricLBuehler added the triaged This error has been reproduced or otherwise triaged. label Jun 9, 2024
@EricLBuehler
Owner

Thank you for reporting this. I can reproduce the issue with:

RUST_BACKTRACE=1 cargo run --features cuda -- -i --token-source none -n 32 gguf -m bartowski/Phi-3-medium-4k-instruct-GGUF -f Phi-3-medium-4k-instruct-Q4_K_M.gguf -t microsoft/Phi-3-medium-4k-instruct

But not:

RUST_BACKTRACE=1 cargo run --features cuda -- -i --token-source none -n 32 gguf -m bartowski/Phi-3-mini-4k-instruct-GGUF -f Phi-3-mini-4k-instruct-Q4_K_M.gguf -t microsoft/Phi-3-mini-4k-instruct

This likely indicates an issue with the hyperparameters.
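Phi-3 Medium uses grouped-query attention (head_count_kv = 10 vs. head_count = 40 in the metadata above), whereas Phi-3 Mini uses the same number of KV heads as query heads, so a reshape that hard-codes the full head count would only break on Medium. A minimal candle sketch of splitting the K/V tensor by the KV head count (an assumption about the shape of the fix, not the actual #414 change):

```rust
use candle_core::{DType, Device, Result, Tensor};

fn main() -> Result<()> {
    let dev = Device::Cpu;
    // Values from the Phi-3 Medium GGUF metadata; 15 is the prompt length.
    let (b, seq, n_head_kv, head_dim) = (1usize, 15usize, 10usize, 128usize);

    // Stand-in for the K (or V) projection output: [1, 15, 1280].
    let kv = Tensor::zeros((b, seq, n_head_kv * head_dim), DType::F32, &dev)?;

    // Splitting heads with the KV head count succeeds; using the full
    // head count of 40 here is the reshape that fails in the log above.
    let kv = kv.reshape((b, seq, n_head_kv, head_dim))?;
    assert_eq!(kv.dims(), &[1, 15, 10, 128]);
    Ok(())
}
```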

@EricLBuehler
Owner

@siriux I just merged #414, which fixes the issue on my machine. Can you please confirm that it works for you?

@siriux
Author
siriux commented Jun 9, 2024

It works now, thank you!

@siriux siriux closed this as completed Jun 9, 2024
@EricLBuehler
Copy link
Owner

Great!
