Phi-3 Medium GGUF not working when running local on windows #413

Closed
siriux opened this issue Jun 9, 2024 · 4 comments
Labels
bug (Something isn't working), triaged (This error has been reproduced or otherwise triaged.)

Comments

@siriux
siriux commented Jun 9, 2024

Describe the bug

Phi-3 Medium GGUF is not working when running locally (without using Hugging Face) on Windows.

It loads, but it crashes as soon as I provide any input. Here is an example using $env:RUST_BACKTRACE=1:

> cargo run --release --features cuda -- -i --token-source none -c .\chat_templates\phi3.json -n 32 gguf -m . -f ..\Phi-3-medium-4k-instruct-Q5_K_S.gguf
    Finished release [optimized] target(s) in 0.51s
     Running `target\release\mistralrs-server.exe -i --token-source none -c .\chat_templates\phi3.json -n 32 gguf -m . -f ..\Phi-3-medium-4k-instruct-Q5_K_S.gguf`
2024-06-09T11:05:18.106430Z  INFO mistralrs_server: avx: true, neon: false, simd128: false, f16c: true
2024-06-09T11:05:18.106567Z  INFO mistralrs_server: Sampling method: penalties -> temperature -> topk -> topp -> multinomial
2024-06-09T11:05:18.106683Z  INFO mistralrs_server: Model kind is: quantized from gguf (no adapters)
2024-06-09T11:05:18.108328Z  INFO hf_hub: Token file not found "C:\\Users\\Siriux\\.cache\\huggingface\\token"
2024-06-09T11:05:18.108471Z  INFO mistralrs_core::pipeline::gguf: Using chat template file at `.\chat_templates\phi3.json`
2024-06-09T11:05:18.108554Z  INFO hf_hub: Token file not found "C:\\Users\\Siriux\\.cache\\huggingface\\token"
2024-06-09T11:05:18.517203Z  INFO mistralrs_core::pipeline::paths: Loading `"..\\Phi-3-medium-4k-instruct-Q5_K_S.gguf"` locally at `".\\..\\Phi-3-medium-4k-instruct-Q5_K_S.gguf"`
2024-06-09T11:05:19.286315Z  INFO mistralrs_core::pipeline::gguf: Model config:
general.architecture: phi3
general.file_type: 16
general.name: Phi3
general.quantization_version: 2
phi3.attention.head_count: 40
phi3.attention.head_count_kv: 10
phi3.attention.layer_norm_rms_epsilon: 0.00001
phi3.block_count: 40
phi3.context_length: 4096
phi3.embedding_length: 5120
phi3.feed_forward_length: 17920
phi3.rope.dimension_count: 128
phi3.rope.freq_base: 10000
phi3.rope.scaling.original_context_length: 4096
quantize.imatrix.chunks_count: 234
quantize.imatrix.dataset: /training_data/calibration_data.txt
quantize.imatrix.entries_count: 160
quantize.imatrix.file: /models/Phi-3-medium-4k-instruct-GGUF/Phi-3-medium-4k-instruct.imatrix
2024-06-09T11:05:19.343517Z  INFO mistralrs_core::gguf::gguf_tokenizer: GGUF tokenizer model is `llama`, kind: `Unigram`, num tokens: 32064, num added tokens: 0, num merges: 0, num scores: 32064
2024-06-09T11:05:19.528332Z  INFO mistralrs_core::device_map: Model has 40 repeating layers.
2024-06-09T11:05:19.528532Z  INFO mistralrs_core::device_map: Using 32 repeating layers on GPU and 8 repeating layers on host.
2024-06-09T11:05:25.066355Z  INFO mistralrs_core::pipeline::chat_template: bos_toks = "<s>", eos_toks = "<|endoftext|>", unk_tok = <unk>
2024-06-09T11:05:25.069729Z  INFO mistralrs_server: Model loaded.
2024-06-09T11:05:25.072892Z  INFO mistralrs_core: GEMM reduced precision in BF16 not supported.
2024-06-09T11:05:25.107375Z  INFO mistralrs_core: Enabling GEMM reduced precision in F16.
2024-06-09T11:05:25.110102Z  INFO mistralrs_core::cublaslt: Initialized cuBLASlt handle
2024-06-09T11:05:25.110394Z  INFO mistralrs_server::interactive_mode: Starting interactive loop with sampling params: SamplingParams { temperature: Some(0.1), top_k: Some(32), top_p: Some(0.1), top_n_logprobs: 0, frequency_penalty: Some(0.1), presence_penalty: Some(0.1), stop_toks: None, max_len: Some(4096), logits_bias: None, n_choices: 1 }
> Hi
2024-06-09T11:05:29.537294Z ERROR mistralrs_core::engine: prompt step - Model failed with error: WithBacktrace { inner: ShapeMismatchBinaryOp { lhs: [1, 15, 1280], rhs: [1, 15, 40, 128], op: "reshape" }, backtrace: Backtrace [{ fn: "std::backtrace_rs::backtrace::dbghelp::trace", file: "/rustc/25ef9e3d85d934b27d9dada2f9dd52b1dc63bb04/library\std\src\..\..\backtrace\src\backtrace\dbghelp.rs", line: 131 }, { fn: "std::backtrace_rs::backtrace::trace_unsynchronized", file: "/rustc/25ef9e3d85d934b27d9dada2f9dd52b1dc63bb04/library\std\src\..\..\backtrace\src\backtrace\mod.rs", line: 66 }, { fn: "std::backtrace::Backtrace::create", file: "/rustc/25ef9e3d85d934b27d9dada2f9dd52b1dc63bb04/library\std\src\backtrace.rs", line: 331 }, { fn: "std::backtrace::Backtrace::capture", file: "/rustc/25ef9e3d85d934b27d9dada2f9dd52b1dc63bb04/library\std\src\backtrace.rs", line: 296 }, { fn: "candle_core::error::Error::bt" }, { fn: "candle_core::tensor::Tensor::reshape" }, { fn: "mistralrs_core::models::quantized_phi3::ModelWeights::forward" }, { fn: "<mistralrs_core::pipeline::gguf::GGUFPipeline as mistralrs_core::pipeline::Pipeline>::forward_inputs" }, { fn: "core::ops::function::FnOnce::call_once" }, { fn: "sparsevec::SparseVec<T>::from" }, { fn: "tokio::runtime::park::CachedParkThread::block_on" }, { fn: "tokio::runtime::context::runtime::enter_runtime" }, { fn: "std::sys_common::backtrace::__rust_begin_short_backtrace" }, { fn: "core::ops::function::impls::<impl core::ops::function::FnOnce<A> for &mut F>::call_once" }, { fn: "std::sys::pal::windows::thread::impl$0::new::thread_start", file: "/rustc/25ef9e3d85d934b27d9dada2f9dd52b1dc63bb04/library\std\src\sys\pal\windows\thread.rs", line: 58 }, { fn: "BaseThreadInitThunk" }, { fn: "RtlUserThreadStart" }] }
2024-06-09T11:05:29.540404Z ERROR mistralrs_server::interactive_mode: Got a model error: "shape mismatch in reshape, lhs: [1, 15, 1280], rhs: [1, 15, 40, 128]\n   0: std::backtrace_rs::backtrace::dbghelp::trace\n             at /rustc/25ef9e3d85d934b27d9dada2f9dd52b1dc63bb04/library\\std\\src\\..\\..\\backtrace\\src\\backtrace\\dbghelp.rs:131\n   1: std::backtrace_rs::backtrace::trace_unsynchronized\n             at /rustc/25ef9e3d85d934b27d9dada2f9dd52b1dc63bb04/library\\std\\src\\..\\..\\backtrace\\src\\backtrace\\mod.rs:66\n   2: std::backtrace::Backtrace::create\n             at /rustc/25ef9e3d85d934b27d9dada2f9dd52b1dc63bb04/library\\std\\src\\backtrace.rs:331\n   3: std::backtrace::Backtrace::capture\n             at /rustc/25ef9e3d85d934b27d9dada2f9dd52b1dc63bb04/library\\std\\src\\backtrace.rs:296\n   4: candle_core::error::Error::bt\n   5: candle_core::tensor::Tensor::reshape\n   6: mistralrs_core::models::quantized_phi3::ModelWeights::forward\n   7: <mistralrs_core::pipeline::gguf::GGUFPipeline as mistralrs_core::pipeline::Pipeline>::forward_inputs\n   8: core::ops::function::FnOnce::call_once\n   9: sparsevec::SparseVec<T>::from\n  10: tokio::runtime::park::CachedParkThread::block_on\n  11: tokio::runtime::context::runtime::enter_runtime\n  12: std::sys_common::backtrace::__rust_begin_short_backtrace\n  13: core::ops::function::impls::<impl core::ops::function::FnOnce<A> for &mut F>::call_once\n  14: std::sys::pal::windows::thread::impl$0::new::thread_start\n             at /rustc/25ef9e3d85d934b27d9dada2f9dd52b1dc63bb04/library\\std\\src\\sys\\pal\\windows\\thread.rs:58\n  15: BaseThreadInitThunk\n  16: RtlUserThreadStart\n", response: ChatCompletionResponse { id: "0", choices: [Choice { finish_reason: "error", index: 0, message: ResponseMessage { content: "", role: "assistant" }, logprobs: None }], created: 1717931128, model: ".", system_fingerprint: "local", object: "chat.completion", usage: Usage { completion_tokens: 0, prompt_tokens: 15, total_tokens: 15, avg_tok_per_sec: 19.582245, avg_prompt_tok_per_sec: inf, avg_compl_tok_per_sec: NaN, total_time_sec: 0.766, total_prompt_time_sec: 0.0, total_completion_time_sec: 0.0 } }
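For reference, the shapes in the error are consistent with the GGUF metadata above: the failing tensor has 1280 = 10 × 128 features per token (phi3.attention.head_count_kv × phi3.rope.dimension_count), while the reshape target [1, 15, 40, 128] assumes the full head_count of 40. A minimal sketch of that arithmetic (illustrative only, not mistral.rs code):

```rust
// Head counts taken from the model config printed above; this sketch
// only reproduces the failing arithmetic.
fn main() {
    let (seq_len, head_dim) = (15usize, 128usize); // 15 prompt tokens, phi3.rope.dimension_count
    let (n_head, n_head_kv) = (40usize, 10usize);  // phi3.attention.head_count{,_kv}

    // With grouped-query attention, the K/V projection produces
    // head_count_kv * head_dim features per token:
    let kv_features = n_head_kv * head_dim;
    assert_eq!(kv_features, 1280); // matches the lhs shape [1, 15, 1280]

    // Reshaping that tensor to [1, seq, 40, 128] would require
    // 40 * 128 = 5120 features per token, hence the shape mismatch.
    assert_ne!(kv_features, n_head * head_dim);
    println!("{seq_len} tokens x {kv_features} K/V features each");
}
```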

Latest commit
I'm using this commit: f257423

@siriux siriux added the bug Something isn't working label Jun 9, 2024
@siriux siriux mentioned this issue Jun 9, 2024
@EricLBuehler EricLBuehler added the triaged This error has been reproduced or otherwise triaged. label Jun 9, 2024
@EricLBuehler
Owner

Thank you for reporting this. I can reproduce the issue with:

RUST_BACKTRACE=1 cargo run --features cuda -- -i --token-source none -n 32 gguf -m bartowski/Phi-3-medium-4k-instruct-GGUF -f Phi-3-medium-4k-instruct-Q4_K_M.gguf -t microsoft/Phi-3-medium-4k-instruct

But not:

RUST_BACKTRACE=1 cargo run --features cuda -- -i --token-source none -n 32 gguf -m bartowski/Phi-3-mini-4k-instruct-GGUF -f Phi-3-mini-4k-instruct-Q4_K_M.gguf -t microsoft/Phi-3-mini-4k-instruct

This likely indicates an issue with the hyperparameters.
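Phi-3 Medium uses grouped-query attention (head_count_kv = 10 vs. head_count = 40 in the metadata above), whereas Phi-3 Mini uses the same number of KV heads as query heads, so a reshape that hard-codes the full head count would only break on Medium. A minimal candle sketch of splitting the K/V tensor by the KV head count (an assumption about the shape of the fix, not the actual #414 change):

```rust
use candle_core::{DType, Device, Result, Tensor};

fn main() -> Result<()> {
    let dev = Device::Cpu;
    // Values from the Phi-3 Medium GGUF metadata; 15 is the prompt length.
    let (b, seq, n_head_kv, head_dim) = (1usize, 15usize, 10usize, 128usize);

    // Stand-in for the K (or V) projection output: [1, 15, 1280].
    let kv = Tensor::zeros((b, seq, n_head_kv * head_dim), DType::F32, &dev)?;

    // Splitting heads with the KV head count succeeds; using the full
    // head count of 40 here is the reshape that fails in the log above.
    let kv = kv.reshape((b, seq, n_head_kv, head_dim))?;
    assert_eq!(kv.dims(), &[1, 15, 10, 128]);
    Ok(())
}
```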

@EricLBuehler
Owner

@siriux I just merged #414, which fixes the issue on my machine. Can you please confirm that it works for you?

@siriux
Author
siriux commented Jun 9, 2024

It works now, thank you!

@siriux siriux closed this as completed Jun 9, 2024
@EricLBuehler
Copy link
Owner

Great!
