Introduce pyroscope integration for continuous profiling on demand #4254
I personally don't like the idea of attaching Also, it looks like the pyroscope server sometimes fails to shut down if I call any API (say
It would help if you could also include instructions on how to enable the profiler and view the actual flamegraph.
When I try to run with the config in settings, I get this:
This doesn't look too good. It is also not related to grafana/pyroscope-rs#165, which also doesn't look good.
2a1627d to d194f0f
31c0d02 to 112e4d1
TLS tests seem to be failing because we already use See this. Is this debug actually doing something? It isn't part of the current Settings struct.
IIRC we used it before, but deprecated it some time ago. That would mean you can safely remove it. One problem, though, is that users might still have
Sadly, this still crashes on my machine when enabling pyroscope through
2024-05-27T08:14:57.765853Z INFO actix_web::middleware::logger: 127.0.0.1 "PATCH /debug HTTP/1.1" 200 53 "http://localhost:6333/dashboard" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:126.0) Gecko/20100101 Firefox/126.0" 2.632543
2024-05-27T08:15:00.390173Z ERROR qdrant::startup: Panic backtrace:
0: qdrant::startup::setup_panic_hook::{{closure}}
at ./src/startup.rs:19:25
1: <alloc::boxed::Box<F,A> as core::ops::function::Fn<Args>>::call
at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/alloc/src/boxed.rs:2034:9
2: std::panicking::rust_panic_with_hook
at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panicking.rs:783:13
3: std::panicking::begin_panic_handler::{{closure}}
at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panicking.rs:649:13
4: std::sys_common::backtrace::__rust_end_short_backtrace
at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/sys_common/backtrace.rs:171:18
5: rust_begin_unwind
at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panicking.rs:645:5
6: core::panicking::panic_nounwind_fmt::runtime
at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/panicking.rs:110:18
7: core::panicking::panic_nounwind_fmt
at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/panicking.rs:123:9
8: core::panicking::panic_nounwind
at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/panicking.rs:156:5
9: core::slice::raw::from_raw_parts::precondition_check
at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/intrinsics.rs:2799:21
10: core::slice::raw::from_raw_parts
at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/slice/raw.rs:98:9
11: <pprof::collector::TempFdArrayIterator<T> as core::iter::traits::iterator::Iterator>::next
at /home/timvisee/.cargo/registry/src/index.crates.io-6f17d22bba15001f/pprof-0.12.1/src/collector.rs:225:26
12: core::iter::traits::iterator::Iterator::fold
at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/iter/traits/iterator.rs:2586:29
13: <core::iter::adapters::chain::Chain<A,B> as core::iter::traits::iterator::Iterator>::fold
at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/iter/adapters/chain.rs:93:19
14: core::iter::traits::iterator::Iterator::for_each
at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/iter/traits/iterator.rs:817:9
15: pprof::report::ReportBuilder::build
at /home/timvisee/.cargo/registry/src/index.crates.io-6f17d22bba15001f/pprof-0.12.1/src/report.rs:110:17
16: pyroscope_pprofrs::Pprof::dump_report
at /home/timvisee/.cargo/registry/src/index.crates.io-6f17d22bba15001f/pyroscope_pprofrs-0.2.7/src/lib.rs:202:22
17: <pyroscope_pprofrs::Pprof as pyroscope::backend::backend::Backend>::report
at /home/timvisee/.cargo/registry/src/index.crates.io-6f17d22bba15001f/pyroscope_pprofrs-0.2.7/src/lib.rs:159:9
18: pyroscope::pyroscope::PyroscopeAgent<pyroscope::pyroscope::PyroscopeAgentReady>::start::{{closure}}
at /home/timvisee/.cargo/registry/src/index.crates.io-6f17d22bba15001f/pyroscope-0.5.7/src/pyroscope.rs:676:38
19: std::sys_common::backtrace::__rust_begin_short_backtrace
at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/sys_common/backtrace.rs:155:18
20: std::thread::Builder::spawn_unchecked_::{{closure}}::{{closure}}
at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/thread/mod.rs:528:17
21: <core::panic::unwind_safe::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once
at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/panic/unwind_safe.rs:272:9
22: std::panicking::try::do_call
at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panicking.rs:552:40
23: __rust_try
24: std::panicking::try
at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panicking.rs:516:19
25: std::panic::catch_unwind
at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panic.rs:146:14
26: std::thread::Builder::spawn_unchecked_::{{closure}}
at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/thread/mod.rs:527:30
27: core::ops::function::FnOnce::call_once{{vtable.shim}}
at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/ops/function.rs:250:5
28: <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once
at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/alloc/src/boxed.rs:2020:9
29: <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once
at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/alloc/src/boxed.rs:2020:9
30: std::sys::pal::unix::thread::Thread::new::thread_start
at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/sys/pal/unix/thread.rs:108:17
31: start_thread
at ./nptl/pthread_create.c:447:8
32: __GI___clone3
at ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78
2024-05-27T08:15:00.390249Z ERROR qdrant::startup: Panic occurred in file library/core/src/panicking.rs at line 156: unsafe precondition(s) violated: slice::from_raw_parts requires the pointer to be aligned and non-null, and the total size of the slice not to exceed `isize::MAX`
2024-05-27T08:15:00.398996Z DEBUG reqwest::connect: starting new connection: https://staging-telemetry.qdrant.io/
thread caused non-unwinding panic. aborting.
fish: Job 1, 'cargo run $argv' terminated by signal SIGABRT (Abort)
Update: it only crashes when running the binary directly; it is fine when running through our default Docker image. I'm assuming some system library mismatch.
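For readers unfamiliar with the precondition named in the panic message: `slice::from_raw_parts` requires a non-null, properly aligned pointer, and the standard library checks this when debug assertions are enabled. The sketch below is only an illustration of that precondition. It is not the pprof code and not a proposed fix for it; the helper name `slice_or_empty` is hypothetical.

```rust
use std::mem::align_of;
use std::slice;

/// Hypothetical helper: build a slice from a raw pointer and a length,
/// returning an empty slice instead of calling `from_raw_parts` when the
/// pointer is null or misaligned, or when the length is zero.
///
/// # Safety
/// If `ptr` is non-null and aligned, it must point to `len` initialized
/// values of `T` that remain valid for the lifetime `'a`.
unsafe fn slice_or_empty<'a, T>(ptr: *const T, len: usize) -> &'a [T] {
    // Same conditions the panic message lists: non-null and aligned.
    let misaligned = (ptr as usize) % align_of::<T>() != 0;
    if ptr.is_null() || misaligned || len == 0 {
        return &[];
    }
    // SAFETY: non-null and aligned were checked above; the caller
    // guarantees the pointed-to memory is valid for `len` elements.
    unsafe { slice::from_raw_parts(ptr, len) }
}
```

Called with a null pointer, the helper yields an empty slice rather than aborting; called with a pointer into a live array, it returns that array's elements.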
* feat: Working pyroscope profiler integration
* fix: Remove UpdateDebugConfigResponse
* fix: Format
* fix: Simplify debug settings type
* fix: Remove /debug from API spec and clean pyrostate state
* fix: Update openapi.json
* refactor: Move models to suitable paths
* refactor: Improve API and pyroscope state structure
* fix: Format
* feat: Use null debug for pyroscope config
* fix: Pyroscope shouldnt be used and enabled in windows
* fix: Format
* fix: Remove pyroscope for macos
* fix: Add more compile conditions
* fix: Try to fix OS target issue
* feat: Move DebugConfig and Pyroscope to common::debug
* feat: Introduce and use DebugState instead of PyroscopeState
* feat: Simplify debug_api.rs and move logic to debug.rs
* fix: format
* fix: Use debug patch enum
* feat: Use parking lot Mutex
* fix: Propagate errors instead of panic
* fix: Format
* feat: Forward stop agent errors
* feat: Use enums
* fix: For non linux OS
* fix: Missing import for non linux os
* fix: Remove redundant logs
* feat: Take lock throughout the patch
* Don't unwrap pyroscope state, it may be None
* fix: Remove debug: true from tls config tests
* refactor: Rename debug setting to debugger
* refactor: Rename debug to debugger everywhere
* Improve logs

---------

Co-authored-by: timvisee <tim@visee.me>
This will allow us to understand the bottlenecks (through chaos testing) and debug customer nodes when required.
Steps to run:
process_cpu:cpu:nanoseconds:cpu:nanoseconds{service_name="qdrant"}
as the query to see only Qdrant profiling results.
Alternatively, you may call the API:
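The request itself did not survive on this page, so the following is only a rough sketch of what such a call could look like. The method, path, and port (`PATCH /debug` on `localhost:6333`) are taken from the access log quoted in the conversation; later commits rename `debug` to `debugger`, so the path in the merged version may differ. The JSON body shape (`pyroscope` with a `url` field, or `null` to turn it off) and both function names are assumptions for illustration, not the confirmed API.

```rust
use std::io::{Read, Write};
use std::net::TcpStream;

/// Render a raw HTTP/1.1 `PATCH /debug` request with a JSON body.
/// The path comes from the access log; the body is supplied by the caller.
fn build_debug_patch(host: &str, json_body: &str) -> String {
    format!(
        "PATCH /debug HTTP/1.1\r\n\
         Host: {host}\r\n\
         Content-Type: application/json\r\n\
         Content-Length: {len}\r\n\
         Connection: close\r\n\r\n{json_body}",
        len = json_body.len(),
    )
}

/// Send the rendered request over a plain TCP connection and return the
/// raw response text (status line, headers, and body).
fn send_request(addr: &str, request: &str) -> std::io::Result<String> {
    let mut stream = TcpStream::connect(addr)?;
    stream.write_all(request.as_bytes())?;
    let mut response = String::new();
    // `Connection: close` lets us read until the server closes the socket.
    stream.read_to_string(&mut response)?;
    Ok(response)
}

/// ASSUMED body shape: enable profiling against a placeholder Pyroscope URL.
const ENABLE_BODY: &str = r#"{"pyroscope":{"url":"http://localhost:4040"}}"#;
/// ASSUMED body shape: a null value to switch profiling off again.
const DISABLE_BODY: &str = r#"{"pyroscope":null}"#;
```

A caller would combine the two, for example `send_request("localhost:6333", &build_debug_patch("localhost:6333", ENABLE_BODY))`, after checking the actual request schema in the project's OpenAPI spec.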
All Submissions:
dev branch. Did you create your branch from dev?
Changes to Core Features: