
Only select o11y-owned datasources on the Grafana Datasource utilization dashboard
Open, Medium, Public

Description

In preparation for T228380: Tech debt: sunsetting of Graphite, the SRE Observability team wants to improve reporting on the sources that still need to migrate, so that Graphite can be turned off without affecting any production system.

We currently track overall ingest on both systems in the Graphite vs Prometheus metrics dashboard. We have identified the need to track in-use metrics separately, as a better indicator of when we can turn Graphite read-only as an initial step towards deprecation.

For this effort, we define "in-use" metrics as data points emitted to Graphite and referenced by some dashboard or alert (making them useful). Metrics that are stored but never accessed will be flagged for sunsetting and deprecation.

With this definition in mind, the number of Grafana dashboards/panels backed by Graphite metrics will be our key indicator of migration progress in the later stages.

The Grafana Datasource Utilization dashboard should therefore count only queries to the graphite data source. Other data sources (e.g., graphite-synthetic-testing) are not within the scope of the Graphite deprecation effort.
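As a rough illustration of the counting described above, here is a minimal sketch that tallies panels per data source from exported Grafana dashboard JSON and restricts the tally to the graphite data source. The function names and the sample dashboards are hypothetical, and this is not how the utilization dashboard itself is built. It assumes a panel's "datasource" field is either a plain name or an object with "uid"/"type" keys, and that collapsed rows nest their panels under "panels".

```python
from collections import Counter


def iter_panels(dashboard):
    """Yield every panel, including panels nested inside collapsed rows."""
    for panel in dashboard.get("panels", []):
        yield panel
        # Collapsed row panels carry their children in a nested "panels" list.
        yield from panel.get("panels", [])


def datasource_name(panel):
    """Return the panel's data source as a string, or None if unset."""
    ds = panel.get("datasource")
    if isinstance(ds, dict):
        # Newer dashboard JSON stores an object; prefer its uid.
        return ds.get("uid") or ds.get("type")
    return ds


def count_panels_by_datasource(dashboards, only=None):
    """Count panels per data source; `only` optionally restricts the names."""
    counts = Counter()
    for dashboard in dashboards:
        for panel in iter_panels(dashboard):
            name = datasource_name(panel)
            if name is None:
                continue
            if only is not None and name not in only:
                continue
            counts[name] += 1
    return counts


# Hypothetical sample data, for illustration only.
sample = [
    {"uid": "a", "panels": [
        {"datasource": "graphite"},
        {"type": "row", "panels": [
            {"datasource": {"type": "graphite", "uid": "graphite"}},
        ]},
        {"datasource": "graphite-synthetic-testing"},
    ]},
    {"uid": "b", "panels": [{"datasource": "prometheus"}]},
]

print(dict(count_panels_by_datasource(sample, only={"graphite"})))
# {'graphite': 2}
```

Passing an explicit allow-list keeps graphite-synthetic-testing and any other out-of-scope data source out of the migration count without having to enumerate them.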

Event Timeline

herron subscribed.

Overall this dashboard is meant to show graphite utilization for the whole installation, so I think the thing to do is add filters to drill down as needed.

For now I've manually created a variable that contains dashboard uids from graphite-synthetic-testing, and the dashboard should now load with these excluded by default.
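For illustration, a list like that could be derived rather than maintained by hand. The sketch below is hypothetical (function names, sample uids, and the idea of joining the uids into a regex alternation for an exclusion filter are all assumptions, not what was actually configured). It collects the uids of dashboards that have at least one top-level panel on a given data source.

```python
import re


def dashboards_using(dashboards, datasource):
    """Return sorted uids of dashboards with a panel on `datasource`."""
    uids = set()
    for dashboard in dashboards:
        uid = dashboard.get("uid")
        if uid is None:
            continue
        for panel in dashboard.get("panels", []):
            ds = panel.get("datasource")
            # "datasource" may be a plain name or an object with a "uid" key.
            name = ds.get("uid") if isinstance(ds, dict) else ds
            if name == datasource:
                uids.add(uid)
                break
    return sorted(uids)


def as_exclusion_regex(uids):
    """Join uids into a regex alternation, escaping special characters."""
    return "|".join(re.escape(uid) for uid in uids)


# Hypothetical sample data, for illustration only.
sample_dashboards = [
    {"uid": "synth-01", "panels": [{"datasource": "graphite-synthetic-testing"}]},
    {"uid": "prod-01", "panels": [{"datasource": "graphite"}]},
]

excluded = dashboards_using(sample_dashboards, "graphite-synthetic-testing")
pattern = as_exclusion_regex(excluded)
```

The resulting pattern could then back a negative match on dashboard uid, assuming the underlying query language supports regex exclusion.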

Longer-term I'm thinking of a metric that essentially exposes the output of grafana-wtf explore datasources, and using that to back the filter variable query. But I'm not sure I'll have time to work on it soon.

With the simple approach in place I think it's safe to set this to Low for now.

lmata raised the priority of this task from Low to Medium. May 21 2024, 3:25 PM
lmata updated the task description.