Describe the bug
Phi-3 Medium GGUF does not work when running locally (without using Hugging Face) on Windows.
It loads, but crashes as soon as I provide any input. Here is an example run with `$env:RUST_BACKTRACE=1`:
```
> cargo run --release --features cuda -- -i --token-source none -c .\chat_templates\phi3.json -n 32 gguf -m . -f ..\Phi-3-medium-4k-instruct-Q5_K_S.gguf
    Finished release [optimized] target(s) in 0.51s
     Running `target\release\mistralrs-server.exe -i --token-source none -c .\chat_templates\phi3.json -n 32 gguf -m . -f ..\Phi-3-medium-4k-instruct-Q5_K_S.gguf`
2024-06-09T11:05:18.106430Z INFO mistralrs_server: avx: true, neon: false, simd128: false, f16c: true
2024-06-09T11:05:18.106567Z INFO mistralrs_server: Sampling method: penalties -> temperature -> topk -> topp -> multinomial
2024-06-09T11:05:18.106683Z INFO mistralrs_server: Model kind is: quantized from gguf (no adapters)
2024-06-09T11:05:18.108328Z INFO hf_hub: Token file not found "C:\\Users\\Siriux\\.cache\\huggingface\\token"
2024-06-09T11:05:18.108471Z INFO mistralrs_core::pipeline::gguf: Using chat template file at `.\chat_templates\phi3.json`
2024-06-09T11:05:18.108554Z INFO hf_hub: Token file not found "C:\\Users\\Siriux\\.cache\\huggingface\\token"
2024-06-09T11:05:18.517203Z INFO mistralrs_core::pipeline::paths: Loading `"..\\Phi-3-medium-4k-instruct-Q5_K_S.gguf"` locally at `".\\..\\Phi-3-medium-4k-instruct-Q5_K_S.gguf"`
2024-06-09T11:05:19.286315Z INFO mistralrs_core::pipeline::gguf: Model config:
    general.architecture: phi3
    general.file_type: 16
    general.name: Phi3
    general.quantization_version: 2
    phi3.attention.head_count: 40
    phi3.attention.head_count_kv: 10
    phi3.attention.layer_norm_rms_epsilon: 0.00001
    phi3.block_count: 40
    phi3.context_length: 4096
    phi3.embedding_length: 5120
    phi3.feed_forward_length: 17920
    phi3.rope.dimension_count: 128
    phi3.rope.freq_base: 10000
    phi3.rope.scaling.original_context_length: 4096
    quantize.imatrix.chunks_count: 234
    quantize.imatrix.dataset: /training_data/calibration_data.txt
    quantize.imatrix.entries_count: 160
    quantize.imatrix.file: /models/Phi-3-medium-4k-instruct-GGUF/Phi-3-medium-4k-instruct.imatrix
2024-06-09T11:05:19.343517Z INFO mistralrs_core::gguf::gguf_tokenizer: GGUF tokenizer model is `llama`, kind: `Unigram`, num tokens: 32064, num added tokens: 0, num merges: 0, num scores: 32064
2024-06-09T11:05:19.528332Z INFO mistralrs_core::device_map: Model has 40 repeating layers.
2024-06-09T11:05:19.528532Z INFO mistralrs_core::device_map: Using 32 repeating layers on GPU and 8 repeating layers on host.
2024-06-09T11:05:25.066355Z INFO mistralrs_core::pipeline::chat_template: bos_toks = "<s>", eos_toks = "<|endoftext|>", unk_tok = <unk>
2024-06-09T11:05:25.069729Z INFO mistralrs_server: Model loaded.
2024-06-09T11:05:25.072892Z INFO mistralrs_core: GEMM reduced precision in BF16 not supported.
2024-06-09T11:05:25.107375Z INFO mistralrs_core: Enabling GEMM reduced precision in F16.
2024-06-09T11:05:25.110102Z INFO mistralrs_core::cublaslt: Initialized cuBLASlt handle
2024-06-09T11:05:25.110394Z INFO mistralrs_server::interactive_mode: Starting interactive loop with sampling params: SamplingParams { temperature: Some(0.1), top_k: Some(32), top_p: Some(0.1), top_n_logprobs: 0, frequency_penalty: Some(0.1), presence_penalty: Some(0.1), stop_toks: None, max_len: Some(4096), logits_bias: None, n_choices: 1 }
> Hi
2024-06-09T11:05:29.537294Z ERROR mistralrs_core::engine: prompt step - Model failed with error: WithBacktrace { inner: ShapeMismatchBinaryOp { lhs: [1, 15, 1280], rhs: [1, 15, 40, 128], op: "reshape" }, backtrace: Backtrace [
    { fn: "std::backtrace_rs::backtrace::dbghelp::trace", file: "/rustc/25ef9e3d85d934b27d9dada2f9dd52b1dc63bb04/library\std\src\..\..\backtrace\src\backtrace\dbghelp.rs", line: 131 },
    { fn: "std::backtrace_rs::backtrace::trace_unsynchronized", file: "/rustc/25ef9e3d85d934b27d9dada2f9dd52b1dc63bb04/library\std\src\..\..\backtrace\src\backtrace\mod.rs", line: 66 },
    { fn: "std::backtrace::Backtrace::create", file: "/rustc/25ef9e3d85d934b27d9dada2f9dd52b1dc63bb04/library\std\src\backtrace.rs", line: 331 },
    { fn: "std::backtrace::Backtrace::capture", file: "/rustc/25ef9e3d85d934b27d9dada2f9dd52b1dc63bb04/library\std\src\backtrace.rs", line: 296 },
    { fn: "candle_core::error::Error::bt" },
    { fn: "candle_core::tensor::Tensor::reshape" },
    { fn: "mistralrs_core::models::quantized_phi3::ModelWeights::forward" },
    { fn: "<mistralrs_core::pipeline::gguf::GGUFPipeline as mistralrs_core::pipeline::Pipeline>::forward_inputs" },
    { fn: "core::ops::function::FnOnce::call_once" },
    { fn: "sparsevec::SparseVec<T>::from" },
    { fn: "tokio::runtime::park::CachedParkThread::block_on" },
    { fn: "tokio::runtime::context::runtime::enter_runtime" },
    { fn: "std::sys_common::backtrace::__rust_begin_short_backtrace" },
    { fn: "core::ops::function::impls::<impl core::ops::function::FnOnce<A> for &mut F>::call_once" },
    { fn: "std::sys::pal::windows::thread::impl$0::new::thread_start", file: "/rustc/25ef9e3d85d934b27d9dada2f9dd52b1dc63bb04/library\std\src\sys\pal\windows\thread.rs", line: 58 },
    { fn: "BaseThreadInitThunk" },
    { fn: "RtlUserThreadStart" }
] }
2024-06-09T11:05:29.540404Z ERROR mistralrs_server::interactive_mode: Got a model error: "shape mismatch in reshape, lhs: [1, 15, 1280], rhs: [1, 15, 40, 128] [backtrace identical to the one above]", response: ChatCompletionResponse { id: "0", choices: [Choice { finish_reason: "error", index: 0, message: ResponseMessage { content: "", role: "assistant" }, logprobs: None }], created: 1717931128, model: ".", system_fingerprint: "local", object: "chat.completion", usage: Usage { completion_tokens: 0, prompt_tokens: 15, total_tokens: 15, avg_tok_per_sec: 19.582245, avg_prompt_tok_per_sec: inf, avg_compl_tok_per_sec: NaN, total_time_sec: 0.766, total_prompt_time_sec: 0.0, total_completion_time_sec: 0.0 } }
```
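The shapes in the error are consistent with the GGUF metadata above: 1280 = head_count_kv (10) × rope.dimension_count (128), while the reshape target [1, 15, 40, 128] uses the full head_count (40) and would need 40 × 128 = 5120 values per token. A minimal repro of candle's element-count check, assuming candle-core as a dependency:

```rust
use candle_core::{DType, Device, Tensor};

fn main() -> candle_core::Result<()> {
    // k/v projection output: [batch, seq_len, head_count_kv * head_dim] = [1, 15, 1280]
    let kv = Tensor::zeros((1usize, 15, 10 * 128), DType::F32, &Device::Cpu)?;
    // Reshaping for 40 heads asks for 1 * 15 * 40 * 128 = 76,800 elements,
    // but the tensor holds only 1 * 15 * 1280 = 19,200:
    let err = kv.reshape((1, 15, 40, 128)).unwrap_err();
    println!("{err}"); // shape mismatch in reshape, lhs: [1, 15, 1280], rhs: [1, 15, 40, 128]
    Ok(())
}
```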
Latest commit
I'm using this commit: f257423
Thank you for reporting this. I can reproduce the issue with:
```
RUST_BACKTRACE=1 cargo run --features cuda -- -i --token-source none -n 32 gguf -m bartowski/Phi-3-medium-4k-instruct-GGUF -f Phi-3-medium-4k-instruct-Q4_K_M.gguf -t microsoft/Phi-3-medium-4k-instruct
```
But not:
```
RUST_BACKTRACE=1 cargo run --features cuda -- -i --token-source none -n 32 gguf -m bartowski/Phi-3-mini-4k-instruct-GGUF -f Phi-3-mini-4k-instruct-Q4_K_M.gguf -t microsoft/Phi-3-mini-4k-instruct
```
This likely indicates an issue with how the hyperparameters are handled: Medium uses grouped-query attention (head_count: 40, head_count_kv: 10), while Mini uses the same value (32) for both, so a head-count mix-up would only surface on Medium.
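To make that suspected failure mode concrete, here is a hedged sketch in candle (illustrative only, not the actual `quantized_phi3` code; the variable names are assumptions): with grouped-query attention, the k/v projections must be reshaped with the k/v head count rather than the full head count.

```rust
use candle_core::{DType, Device, Tensor};

fn main() -> candle_core::Result<()> {
    let (b, seq_len) = (1usize, 15usize);
    // Phi-3 Medium hyperparameters from the GGUF metadata in the report:
    let (n_head, n_kv_head, head_dim) = (40usize, 10usize, 128usize);

    // With GQA, the k projection is only n_kv_head * head_dim = 1280 wide.
    let k = Tensor::zeros((b, seq_len, n_kv_head * head_dim), DType::F32, &Device::Cpu)?;

    // Buggy: using the full head count fails exactly as in the log above.
    assert!(k.reshape((b, seq_len, n_head, head_dim)).is_err());

    // Correct: reshape k/v tensors with the k/v head count.
    let k = k.reshape((b, seq_len, n_kv_head, head_dim))?;
    println!("{:?}", k.shape()); // [1, 15, 10, 128]
    Ok(())
}
```

On Mini, n_head == n_kv_head == 32, so both reshapes agree and such a bug stays hidden.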
@siriux I just merged #414, which fixes the issue on my machine. Can you please confirm that it works for you?
It works now, thank you!
Great!