
Closed Bug 1657947 Opened 4 years ago Closed 1 month ago

New Metric Type: "Surface" aka 2D Distributions aka ...

Categories

(Data Platform and Tools :: Glean Metric Types, enhancement, P1)

enhancement

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: chutten, Assigned: chutten)

References

(Blocks 1 open bug)

Details

(Whiteboard: [telemetry-parity])

Attachments

(4 files)

Proposal for changing an existing or adding a new Glean metric type

Who is the individual/team requesting this change?

:chutten, for Project FOG and on behalf of generic use cases

Is this about changing an existing metric type or creating a new one?

Creating a new (perhaps compound) metric type

Can you describe the data that needs to be recorded?

To answer a question like "How many tabs does Firefox usually have open?" we need buckets that are tab counts and whose values are timing distributions. (For this specific question we could get away with the bucket values being timespans instead, but as soon as you know how many tabs Firefox usually has open, you'll want to know whether that time was spent all at once or in several small pieces over the measurement window.) This is a use case for continuous-continuous surfaces: the number of tabs is a continuous distribution, and the timing samples for each tab count are continuous distributions. Firefox Telemetry currently approximates this with some really odd uses of keyed exponential histograms. (And there's MEMORY_DISTRIBUTION_AMONG_CONTENT, which probably wishes it were a surface.)

(And then there's VIDEO_SUSPEND_RECOVERY_TIME_MS and VIDEO_HIDDEN_PLAY_TIME_PERCENTAGE and friends, which are discrete-continuous-continuous volumes.)

Other data that could be handled by this sort of construct are described in bug 1657470 and bug 1657473. They describe concrete use cases for discrete-discrete surfaces, meaning that both axes are discrete/discontinuous/categorical rather than continuous. (Yes, our timing/memory distributions are actually discrete, but they approximate a continuous distribution.)

Keyed Histograms in Firefox Telemetry are an example of data that could be described by a discrete-continuous surface. (Whether this is a good idea or not is not an evaluation I am prepared to make at this time.)

Can you provide a raw sample of the data that needs to be recorded (in the abstract, not any particular implementation details about its representation in the payload or the database)?

For a session where Firefox has some distribution of times with 0 tabs, 1 tab, etc., it might look like:

time_spent_with_tab_count: {0: {1: 23, 2: 42, ...}, 1: {1: 56, 2: 67, ...}, ...}
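To make the nested shape above concrete, here is a minimal Python sketch of a continuous-continuous surface. It assumes tab counts and time samples are non-negative integers and uses a simple power-of-two bucketing on both axes. The `Surface` class and the `accumulate`, `snapshot`, and `bucket_lower_bound` names are hypothetical illustrations, not the Glean SDK API, and Glean's real distributions use their own bucketing parameters.

```python
from collections import defaultdict


def bucket_lower_bound(value: int) -> int:
    """Map a non-negative integer to the lower bound of its power-of-two bucket.

    0 -> 0, 1 -> 1, 2..3 -> 2, 4..7 -> 4, and so on.
    (Hypothetical bucketing; Glean uses its own functional bucketing.)
    """
    if value < 0:
        raise ValueError("samples must be non-negative")
    if value == 0:
        return 0
    return 1 << (value.bit_length() - 1)


class Surface:
    """Hypothetical continuous-continuous surface.

    Outer axis: e.g. number of open tabs (bucketed).
    Inner axis: e.g. seconds spent at that tab count (bucketed).
    Values: how many samples landed in each (outer, inner) cell.
    """

    def __init__(self) -> None:
        self._cells = defaultdict(lambda: defaultdict(int))

    def accumulate(self, outer: int, sample: int) -> None:
        """Record one inner-axis sample under the bucket for `outer`."""
        inner = self._cells[bucket_lower_bound(outer)]
        inner[bucket_lower_bound(sample)] += 1

    def snapshot(self) -> dict[int, dict[int, int]]:
        """Return plain nested dicts sorted by bucket, e.g. for serialization."""
        return {
            outer: dict(sorted(inner.items()))
            for outer, inner in sorted(self._cells.items())
        }


# 1 tab for 5 s, then 7 s; 3 tabs for 1 s (3 falls in the 2..3 bucket).
surface = Surface()
surface.accumulate(1, 5)
surface.accumulate(1, 7)
surface.accumulate(3, 1)
print(surface.snapshot())  # {1: {4: 2}, 2: {1: 1}}
```

Bucketing the outer axis as well as the inner one is what makes this "continuous-continuous": the number of outer keys stays small even if a session reaches thousands of tabs.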

What is the business question/use-case that requires the data to be recorded?

Various, but browser engagement is the original driver.

How would the data be consumed?

GLAM could do some wicked-neato datavis stuff with surfaces, I'm sure, but I assume initial consumption will be of summary statistics of the values in each bucket displayed on something like re:dash. UDFs will almost certainly need to be written to help manage this.

Why are existing metric types not enough?

There does not exist anything that handles this well in either the Glean SDK or Firefox Telemetry.

What is the timeline by which the data needs to be collected?

Unknown.

Whiteboard: [telemetry-parity]

We have a use case that I believe would be well suited to a discrete-continuous surface, i.e. a keyed timing distribution.

We are measuring performance timings within the network component.
But because each network request is prioritized (for example, trackers are given very low priority) it would be ideal to have the timings keyed on their classOfService rather than having all values merged.
It's not a performance concern if low-priority tracking resources are slow (in fact it's intentional), so we'd prefer to see the distributions grouped independently by priority. (There are currently eleven classOfService flags.)

Another use-case in Necko is breaking down performance metrics by protocol version.
i.e. timing the same sequence, but under http1/http2/http3.
While we could make three probes for each measurement, a keyed histogram would make development and analysis easier.
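Both the classOfService and the protocol-version cases want a single metric that keeps a separate timing distribution per label. A minimal Python sketch, assuming a fixed (static) list of labels, durations as integer nanoseconds, and a simple power-of-two bucketing. The class and method names are hypothetical, and folding unknown labels into `__other__` is loosely modeled on how Glean's labeled metrics treat labels missing from a static list.

```python
from collections import Counter

# Fallback label for anything outside the static label list (modeled on
# Glean's `__other__` behaviour; this class is not the Glean API).
OTHER_LABEL = "__other__"


def bucket_lower_bound(value: int) -> int:
    """Lower bound of the power-of-two bucket containing a non-negative value."""
    if value < 0:
        raise ValueError("durations must be non-negative")
    return 0 if value == 0 else 1 << (value.bit_length() - 1)


class LabeledTimingDistribution:
    """Hypothetical labeled (keyed) timing distribution.

    One metric holds a separate distribution per label, so e.g. http1, http2
    and http3 timings of the same sequence are recorded side by side without
    being merged.
    """

    def __init__(self, labels: list[str]) -> None:
        self._labels = frozenset(labels)
        self._dists = {}

    def accumulate(self, label: str, duration_ns: int) -> None:
        """Record one duration under `label`, or under OTHER_LABEL if unknown."""
        key = label if label in self._labels else OTHER_LABEL
        self._dists.setdefault(key, Counter())[bucket_lower_bound(duration_ns)] += 1

    def count(self, label: str) -> int:
        """Number of samples recorded under `label` (0 if none)."""
        return sum(self._dists.get(label, Counter()).values())

    def snapshot(self) -> dict[str, dict[int, int]]:
        """Per-label bucket counts as plain dicts, sorted by label and bucket."""
        return {
            label: dict(sorted(dist.items()))
            for label, dist in sorted(self._dists.items())
        }


protocol_timing = LabeledTimingDistribution(["http1", "http2", "http3"])
protocol_timing.accumulate("http2", 1500)
protocol_timing.accumulate("http2", 1800)
protocol_timing.accumulate("http3", 900)
protocol_timing.accumulate("spdy", 100)  # not a static label -> "__other__"
print(protocol_timing.snapshot())
# {'__other__': {64: 1}, 'http2': {1024: 2}, 'http3': {512: 1}}
```

Compared with three separate metrics, one labeled metric guarantees identical bucketing across labels, so the per-protocol (or per-classOfService) distributions can be compared bucket by bucket.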

Assignee: nobody → chutten
Severity: -- → N/A
Status: NEW → ASSIGNED
Priority: -- → P1
See Also: → 1695795
Blocks: 1907945

There are two remaining steps to add this new metric type to the data platform, as documented here: https://mozilla.github.io/glean/dev/core/new-metric-type/platform.html

ni?myself to get to those today.

Flags: needinfo?(chutten)

Actually, I'm going to skip adding labeled_{custom|memory|timing}_distribution support to lookml-generator at this time as I don't think there's a way for me to add support in a way that'd allow people to use data from those metric types correctly.

Flags: needinfo?(chutten)

Hi Chris, looking at the patches it seems like labeled_timing_distributions may be ready for use quite soon.
Is that right?
This would be very useful to us, e.g. bug 1907418 and bug 1908234.

Flags: needinfo?(chutten)

(In reply to Andrew Creskey [:acreskey] from comment #10)

> Hi Chris, looking at the patches it seems like labeled_timing_distributions may be ready for use quite soon.
> Is that right?
> This would be very useful to us, e.g. bug 1907418 and bug 1908234.

Depends on your definition of "quite soon", but it will certainly be sooner rather than later. The steps between here and being useful in mozilla-central for bugs like 1907418 and 1908234 are:

  1. Metric implementation review and landing: https://github.com/mozilla/glean/pull/2896
    • It's a somewhat chunky review because it required changing how all labeled_* metrics are constructed. Jan-Erik has already started looking at it.
    • After this is done, this bug will be resolved.
  2. New Glean SDK release
    • This is usually routine
    • We're expecting to craft this release this week
  3. Vendor the new Glean SDK release into mozilla-central
  4. Expose the new metric types in Firefox on Glean so they're usable in Firefox
    • This is necessary because Firefox, e.g., is IPC-aware, has Gecko-specific datatypes, and requires C++ and JS APIs in addition to Rust.
    • This is usually routine. This one may be a little more complex because it's labeled.
    • I've already begun work on this using the under-review implementation from step 1, and will continue working on it in parallel.
    • You can follow this work in, and optionally block your instrumentation bugs on, bug 1907945. This bug will be closed after step 1.

So we're definitely a lot closer to them being useful than we were just a little while ago. But there's still non-trivial amounts of work to be done, even if it's mostly routine.

Flags: needinfo?(chutten)

Thank you; we're happy to wait as this will greatly simplify our collection.

Blocks: 1908760

chutten merged PR [mozilla/glean]: Bug1657947 New metric types: labeled_{custom|memory|timing}_distribution (#2896) in fd68d93.

This concludes the implementation work of these new metric types in the Glean SDK (Rust bindings only). Work will continue with cutting a release and vendoring it into m-c (no bug yet), then adding Firefox Desktop-specific APIs, features, docs, and tests to make it usable (bug 1907945) (I've already got labeled_custom_distribution working against a local release of the Glean SDK, so progress is chugging along nicely).

Status: ASSIGNED → RESOLVED
Closed: 1 month ago
Resolution: --- → FIXED
Blocks: 1909244