This document gives the best practices on how to use histograms in code and how to document the histograms for the dashboards. There are three general types of histograms: enumerated histograms, count histograms (for arbitrary numbers), and sparse histograms (for anything when the precision is important over a wide range and/or the range is not possible to specify a priori).
[TOC]
Measure exactly what you want, whether that's the time used for a function call, the number of bytes transmitted to fetch a page, the number of items in a list, etc. Do not assume you can calculate what you want from other histograms, as most ways of doing this are incorrect.
For example, suppose you want to measure the runtime of a function that just calls two subfunctions, each of which is instrumented with histogram logging. You might assume that you can simply sum the histograms for those two functions to get the total time, but that results in misleading data. If we knew which emissions came from which calls, we could pair them up and derive the total time for the function. However, histograms are pre-aggregated client-side, which means that there's no way to recover which emissions should be paired up. If you simply add up the two histograms to get a total duration histogram, you're implicitly assuming the two histograms' values are independent, which may not be the case.
Directly measure what you care about; don't try to derive it from other data.
When defining a new metric, think ahead about how you will analyze the data. Often, this will require providing context in order for the data to be interpretable.
For enumerated histograms in particular, that often means including a bucket that can be used as a baseline for understanding the data recorded to other buckets: see the enumerated histogram section.
Histograms are taxonomized into categories, using dot (.
) characters as
separators. Thus, histogram names should be in the form Category.Name or
Category.Subcategory.Name, etc., where each category organizes related
histograms.
It should be quite rare to introduce new top-level categories into the existing taxonomy. If you're tempted to do so, please look through the existing categories to see whether any matches the metric(s) that you are adding. To create a new category, the CL must be reviewed by [email protected].
Prefer the helper functions defined in histogram_functions.h. These functions take a lock and perform a map lookup, but the overhead is generally insignificant. However, when recording metrics on the critical path (e.g. called in a loop or logged multiple times per second), use the macros in histogram_macros.h instead. These macros cache a pointer to the histogram object for efficiency, though this comes at the cost of increased binary size: 130 bytes/macro usage sounds small but quickly adds up.
These logging macros and functions have long names and sometimes include extra parameters (defining the number of buckets for example). Use a helper function if possible. This leads to shorter, more readable code that's also more resilient to problems that could be introduced when making changes. (One could, for example, erroneously change the bucketing of the histogram in one call but not the other.)
When using histogram macros (calls such as UMA_HISTOGRAM_ENUMERATION
), you're
not allowed to construct your string dynamically so that it can vary at a
callsite. At a given callsite (preferably you have only one), the string
should be the same every time the macro is called. If you need to use dynamic
names, use the functions in histogram_functions.h instead of the macros.
If you must use the histogram name in multiple places, use a compile-time constant of appropriate scope that can be referenced everywhere. Using inline strings in multiple places can lead to errors if you ever need to revise the name and you update one location and forget another.
Generally, don't be concerned about the processing cost of emitting to a histogram (unless you're using sparse histograms). The normal histogram code is highly optimized. If you are recording to a histogram in particularly performance-sensitive or "hot" code, make sure you're using the histogram macros; see reasons above.
Enumerated histogram are most appropriate when you have a list of connected / related states that should be analyzed jointly. For example, the set of actions that can be done on the New Tab Page (use the omnibox, click a most visited tile, click a bookmark, etc.) would make a good enumerated histogram. If the total count of your histogram (i.e. the sum across all buckets) is something meaningful—as it is in this example—that is generally a good sign. However, the total count does not have to be meaningful for an enum histogram to still be the right choice.
Enumerated histograms are also appropriate for counting events. Use a simple boolean histogram. It's usually best if you have a comparison point in the same histogram. For example, if you want to count pages opened from the history page, it might be a useful comparison to have the same histogram record the number of times the history page was opened.
In rarer cases, it's okay if you only log to one bucket (say, true
). However,
think about whether this will provide enough context. For
example, suppose we want to understand how often users interact with a button.
Just knowing that users clicked this particular button 1 million times in a day
is not very informative on its own: The size of Chrome's user base is constantly
changing, only a subset of users have consented to metrics reporting, different
platforms have different sampling rates for metrics reporting, and so on. The
data would be much easier to make sense of if it included a baseline: how often
is the button shown?
There is another problem with using another histogram as a comparison point. Google systems for processing UMA data attempt to exclude data that is deemed unreliable or somehow anomalous. It's possible that it may exclude data from a client for one histogram and not exclude data from that client for the other.
If only a few buckets are emitted to, consider using a sparse histogram.
Enums logged in histograms must:
- be prefixed with the comment:
// These values are persisted to logs. Entries should not be renumbered and // numeric values should never be reused.
- be numbered starting from
0
. Note this bullet point does not apply for enums logged with sparse histograms. - have enumerators with explicit values (
= 0
,= 1
,= 2
) to make it clear that the actual values are important. This also makes it easy to match the values between the C++/Java definition and histograms.xml. - not renumber or reuse enumerator values. When adding a new enumerator, append the new enumerator to the end. When removing an unused enumerator, comment it out, making it clear the value was previously used.
If your enum histogram has a catch-all / miscellaneous bucket, put that bucket
first (= 0
). This makes the bucket easy to find on the dashboard if additional
buckets are added later.
In C++, define an enum class
with a kMaxValue
enumerator:
// These values are persisted to logs. Entries should not be renumbered and
// numeric values should never be reused.
//
// LINT.IfChange(NewTabPageAction)
enum class NewTabPageAction {
kUseOmnibox = 0,
kClickTitle = 1,
// kUseSearchbox = 2, // no longer used, combined into omnibox
kOpenBookmark = 3,
kMaxValue = kOpenBookmark,
};
// LINT.ThenChange(//path/to/enums.xml:NewTabPageActionEnum)
The LINT.IfChange / LINT.ThenChange
comments point between the code and XML
definitions of the enum, to encourage them to be kept in sync. See
guide
and more details.
kMaxValue
is a special enumerator that must share the highest enumerator
value, typically done by aliasing it with the enumerator with the highest
value: clang automatically checks that kMaxValue
is correctly set for enum class
.
In Mojo, define an enum
without a kMaxValue
enumerator as kMaxValue
is
autogenerated for Mojo C++ bindings:
// These values are persisted to logs. Entries should not be renumbered and
// numeric values should never be reused.
//
// LINT.IfChange(PreloadType)
enum PrerenderType {
kPrefetch = 0,
// kPrerender = 1, // deprecated, revamped as kPrerender2
kNoStatePrefetch = 2,
kPrerender2 = 3,
};
// LINT.ThenChange(//path/to/enums.xml:PreloadType)
In C++, the histogram helpers use the kMaxValue
convention, and the enum may
be logged with:
UMA_HISTOGRAM_ENUMERATION("NewTabPageAction", action);
or:
UmaHistogramEnumeration("NewTabPageAction", action);
where action
is an enumerator of the enumeration type NewTabPageAction
.
Logging histograms from Java should look similar:
// These values are persisted to logs. Entries should not be renumbered and
// numeric values should never be reused.
@IntDef({NewTabPageAction.USE_OMNIBOX, NewTabPageAction.CLICK_TITLE,
NewTabPageAction.OPEN_BOOKMARK})
private @interface NewTabPageAction {
int USE_OMNIBOX = 0;
int CLICK_TITLE = 1;
// int USE_SEARCHBOX = 2; // no longer used, combined into omnibox
int OPEN_BOOKMARK = 3;
int COUNT = 4;
}
// Using a helper function is optional, but avoids some boilerplate.
private static void logNewTabPageAction(@NewTabPageAction int action) {
RecordHistogram.recordEnumeratedHistogram(
"NewTabPageAction", action, NewTabPageAction.COUNT);
}
Finally, regardless of the programming language you are using, add the definition of the enumerator to enums.xml, and add linter checks to keep the C++/Java and XML values in sync:
<!-- LINT.IfChange(NewTabPageActionEnum) -->
<enum name="NewTabPageActionEnum">
...
</enum>
<!-- LINT.ThenChange(//path/to/cpp_definition.h:NewTabPageAction) -->
Note: this method of defining histogram enums is deprecated. Do not use this for new enums in C++.
Many legacy enums define a kCount
sentinel, relying on the compiler to
automatically update it when new entries are added:
enum class NewTabPageAction {
kUseOmnibox = 0,
kClickTitle = 1,
// kUseSearchbox = 2, // no longer used, combined into omnibox
kOpenBookmark = 3,
kCount,
};
These enums must be recorded using the legacy helpers:
UMA_HISTOGRAM_ENUMERATION("NewTabPageAction", action, NewTabPageAction::kCount);
or:
UmaHistogramEnumeration("NewTabPageAction", action, NewTabPageAction::kCount);
When adding a new flag in
about_flags.cc, you need to add a
corresponding entry to enums.xml. This is automatically verified
by the AboutFlagsHistogramTest
unit test.
To add a new entry:
- After adding flags
to about_flags.cc,
run
generate_flag_enums.py --feature <your awesome feature>
or simplygenerate_flag_enums.py
(slower).
You can alternatively follow these steps:
- Edit enums.xml, adding the feature to the
LoginCustomFlags
enum section, with any unique value (just make one up, although whatever it is needs to appear in sorted order;pretty_print.py
can do this for you). - Build
unit_tests
, then rununit_tests --gtest_filter=AboutFlagsHistogramTest.*
to compute the correct value. - Update the entry in enums.xml with the correct value, and move
it so the list is sorted by value (
pretty_print.py
can do this for you). - Re-run the test to ensure the value and ordering are correct.
You can also use tools/metrics/histograms/validate_format.py
to check the
ordering (but not that the value is correct).
Don't remove entries when removing a flag; they are still used to decode data from previous Chrome versions.
histogram_macros.h provides macros for some common count types, such as memory or elapsed time, in addition to general count macros. These have reasonable default values; you seldom need to choose the number of buckets or histogram min. However, you still need to choose the histogram max (use the advice below).
If none of the default macros work well for you, please thoughtfully choose a min, max, and bucket count for your histogram using the advice below.
For the max, choose a value such that very few histogram samples exceed the max. If a sample is greater than or equal to the max value, it is put in an "overflow" bucket. If this bucket is too large, it can be difficult to compute statistics. One rule of thumb is that at most 1% of samples should be in the overflow bucket (and ideally, less). This allows analysis of the 99th percentile. Err on the side of too large a range versus too short a range. Remember that if you choose poorly, you'll have to wait for another release cycle to fix it.
For the min, use 1 if you care about all possible values (zero and above). All histograms have an underflow bucket for emitted zeros, so a min of 1 is appropriate. Otherwise, choose the min appropriate for your particular situation.
Choose the smallest number of buckets that give you the granularity you need. By default, count histogram bucket sizes increase exponentially with respect to the value (i.e., exponential binning), so you can get fine granularity when the values are small yet still reasonable resolution when the values are larger. The macros default to 50 buckets (or 100 buckets for histograms with wide ranges), which is appropriate for most purposes. Because histograms pre-allocate all the buckets, the number of buckets selected directly dictates how much memory is used. Do not exceed 100 buckets without good reason (and consider whether sparse histograms might work better for you in that case—they do not pre-allocate their buckets).
You can easily emit a time duration (time delta) using base::UmaHistogramTimes, base::UmaHistogramMediumTimes, base::UmaHistogramLongTimes, and their friends. For the critical path, UMA_HISTOGRAM_TIMES, UMA_HISTOGRAM_MEDIUM_TIMES, UMA_HISTOGRAM_LONG_TIMES macros, and their friends, as well as helpers like SCOPED_UMA_HISTOGRAM_TIMER are also available. Many timing histograms are used for performance monitoring; if this is the case for you, please read this document about how to structure timing histograms to make them more useful and actionable.
You can easily emit a percentage histogram using the UMA_HISTOGRAM_PERCENTAGE macro provided in histogram_macros.h. You can also easily emit any ratio as a linear histogram (for equally sized buckets).
For such histograms, you want each value recorded to cover approximately the same span of time. This typically means emitting values periodically at a set time interval, such as every 5 minutes. We do not recommend recording a ratio at the end of a video playback, as video lengths vary greatly.
It is okay to emit at the end of an animation sequence when what's being animated is fixed / known. In this case, each value represents roughly the same span of time.
Why? You typically cannot make decisions based on histograms whose values are recorded in response to an event that varies in length because such metrics can conflate heavy usage with light usage. It's easier to reason about metrics that avoid this source of bias.
Many developers have been bitten by this. For example, it was previously common to emit an actions-per-minute ratio whenever Chrome was backgrounded. Precisely, these metrics computed the number of uses of a particular action during a Chrome session, divided by length of time Chrome had been open. Sometimes, the recorded rate was based on a short interaction with Chrome–a few seconds or a minute. Other times, the recorded rate was based on a long interaction, tens of minutes or hours. These two situations are indistinguishable in the UMA logs–the recorded values can be identical.
The inability to distinguish these two qualitatively different settings make such histograms effectively uninterpretable and not actionable. Emitting at a regular interval avoids the issue. Each value represents the same amount of time (e.g., one minute of video playback).
Histograms can be added via Local macros. These still record locally, but are not uploaded to UMA and are therefore not available for analysis. This can be useful for metrics only needed for local debugging. We don't recommend using local histograms outside of that scenario.
It is common to be interested in logging multidimensional data–where multiple pieces of information need to be logged together. For example, a developer may be interested in the counts of features X and Y based on whether a user is in state A or B. In this case, they want to know the count of X under state A, as well as the other three permutations.
There is no general purpose solution for this type of analysis. We suggest using the workaround of using an enum of length MxN, where you log each unique pair {state, feature} as a separate entry in the same enum. If this causes a large explosion in data (i.e. >100 enum entries), a sparse histogram may be appropriate. If you are unsure of the best way to proceed, please contact someone from the OWNERS file.
Histogram expiry is specified by the expires_after
attribute in histogram
descriptions in histograms.xml. It is a required attribute. The attribute can
be specified as date in YYYY-MM-DD format or as Chrome milestone in
M*(e.g. M105) format. In the latter case, the actual expiry date is about
12 weeks after that branch is cut, or basically when it is replaced on the
"stable" channel by the following release.
After a histogram expires, it ceases to be displayed on the dashboard. Follow these directions to extend it.
Once a histogram has expired, the code that records it becomes dead code and should be removed from the codebase. You should also clean up the corresponding entry in histograms.xml. In rare cases, a histogram may be expired intentionally while keeping the code around; such cases must be annotated appropriately in histograms.xml.
In rare cases, the expiry can be set to "never". This is used to denote metrics of critical importance that are, typically, used for other reports. For example, all metrics of the "heartbeat" are set to never expire. All metrics that never expire must have an XML comment describing why so that it can be audited in the future. Setting an expiry to "never" must be reviewed by [email protected].
<!-- expires-never: "heartbeat" metric (internal: go/uma-heartbeats) -->
It is never appropriate to set the expiry to "never" on a new histogram. Most new histograms don't turn out to have the properties the implementer wants, whether due to bugs in the implementation or simply an evolving understanding of what should be measured.
Here are some guidelines for common scenarios:
- If the listed owner moved to a different project, find a new owner.
- If neither the owner nor the team uses the histogram, remove it.
- If the histogram is not in use now, but might be useful in the far future, remove it.
- If the histogram is not in use now, but might be useful in the near future, pick ~3 months (also ~3 milestones) ahead.
- Otherwise, pick an expiry that is reasonable for how long the metric should be used, up to a year.
We also have a tool that automatically extends expiry dates. The most frequently accessed histograms, currently 99%, have their expirations automatically extended every Tuesday to 6 months from the date of the run. Googlers can view the design doc of the program that does this. The bottom line is: If the histogram is being checked, it should be extended without developer interaction.
In general, set an expiry that is reasonable for how long the metric should be used, up to a year.
Some common cases:
- When adding a histogram to evaluate a feature launch, set an expiry date consistent with the expected feature launch date.
- If you expect the histogram to be useful for an indefinite time, set an expiry date up to 1 year out. This gives a chance to re-evaluate whether the histogram indeed proved to be useful.
- Otherwise, 3-6 months (3-6 milestones) is typically a good choice.
You can revive an expired histogram by setting the expiration date to a date in the future.
There's some leeway here. A client may continue to send data for that histogram for some time after the official expiry date so simply bumping the 'expires_after' date at HEAD may be sufficient to resurrect it without any data discontinuity.
If a histogram expired more than a month ago (for histograms with an expiration date) or more than one milestone ago (for histograms with expiration milestones; this means top-of-tree is two or more milestones away from expired milestone), then you may be outside the safety window. In this case, when extending the histogram add to the histogram description a message: "Warning: this histogram was expired from DATE to DATE; data may be missing." (For milestones, write something similar.)
When reviving a histogram outside the safety window, realize the change to histograms.xml to revive it rolls out with the binary release. It takes some time to get to the stable channel.
It you need to revive it faster, the histogram can be re-enabled via adding to the expired histogram allowlist.
The expired histogram notifier notifies histogram owners before their histograms expire by creating crbugs, which are assigned to owners. This allows owners to extend the lifetime of their histograms, if needed, or deprecate them. The notifier regularly checks all histograms across the histograms.xml files and identifies expired or soon-to-be expired histograms. It then creates or updates crbugs accordingly.
If a histogram expires but turns out to be useful, you can add the histogram's name to the allowlist to re-enable logging for it, until the updated expiration date reaches the Stable channel. When doing so, update the histogram's summary to document the period during which the histogram's data is incomplete. To add a histogram to the allowlist, see the internal documentation: Histogram Expiry.
In rare cases, a histogram may be expired intentionally while keeping the code around. For example, this can be useful for diagnostic metrics that are occasionally needed to investigate specific bugs, but do not need to be reported otherwise.
To avoid such histograms to be flagged for code clean up, they must be annotated
in the histograms.xml with the expired_intentionally
tag as follows:
<histogram name="Tab.Open" enum="TabType" expires_after="M100">
<expired_intentionally>Kept as a diagnostic metric.</expired_intentionally>
<owner>[email protected]</owner>
<summary>Histogram summary.</summary>
</histogram>
Test your histograms using chrome://histograms
. Make sure they're being
emitted to when you expect and not emitted to at other times. Also check that
the values emitted to are correct. Finally, for count histograms, make sure
that buckets capture enough precision for your needs over the range.
Pro tip: You can filter the set of histograms shown on chrome://histograms
by
appending to the URL. For example, chrome://histograms/UserActions
shows
only histograms whose names contain "UserActions", such as
"UMA.UserActionsCount".
In addition to testing interactively, you can have unit tests examine the values emitted to histograms. See histogram_tester.h for details.
See also chrome://metrics-internals
(docs)
for more thorough manual testing if needed.
By default, histograms in unit or browser tests will not be actually uploaded. In general, you can rely on the UMA infrastructure to upload the metrics correctly.
Do not rely upon using histograms in tests as a way to prove correctness of
your main program logic. If a unit or browser test uses a histogram count as a
way to validate logic then that test coverage would be lost if the histogram is
deleted after it has expired. That situation would prevent cleanup of the
histogram. Construct your tests using other means to validate your general
logic, and only use
HistogramTester
to verify that the histogram values are being generated as you would expect.
If you have or entries that need to be updated to match code, you can use HistogramEnumReader or HistogramVariantsReader to read and verify the expected values in a unit test. This prevents a mismatch between code and histogram data from slipping through CQ.
For an example, see BrowserUserEducationServiceTest.CheckFeaturePromoHistograms.
The top of go/uma-guide has good advice on how to go about analyzing and interpreting the results of UMA data uploaded by users. If you're reading this page, you've probably just finished adding a histogram to the Chromium source code and you're waiting for users to update their version of Chrome to a version that includes your code. In this case, the best advice is to remind you that users who update frequently / quickly are biased. Best take the initial statistics with a grain of salt; they're probably mostly right but not entirely so.
When changing the semantics of a histogram (when it's emitted, what the buckets represent, the bucket range or number of buckets, etc.), create a new histogram with a new name. Otherwise analysis that mixes the data pre- and post- change may be misleading. If the histogram name is still the best name choice, the recommendation is to simply append a '2' to the name. See Cleaning Up Histogram Entries for details on how to handle the XML changes.
Please delete code that emits to histograms that are no longer needed. Histograms take up memory. Cleaning up histograms that you no longer care about is good! But see the note below on Cleaning Up Histogram Entries.
Document histograms in an appropriate metadata/foo/histograms.xml file.
There is also a google-internal version of the file for two cases:
- The histogram is confidential (an accurate description about how to interpret the histogram would reveal information about Google's plans). In this case, you must only document the histogram in the internal version.
- The corresponding code that emits the histogram is internal (added only to Chrome code, not to Chromium code). In this case, you may document the histogram in either the internal or external version.
If possible, please add the histograms.xml description in the same changelist in which you add the histogram-emitting code. This has several benefits. One, it sometimes happens that the histograms.xml reviewer has questions or concerns about the histogram description that reveal problems with interpretation of the data and call for a different recording strategy. Two, it allows the histogram reviewer to easily review the emission code to see if it comports with these best practices and to look for other errors.
Histogram descriptions should be roughly understandable to someone not familiar with your feature. Please add a sentence or two of background if necessary.
Note any caveats associated with your histogram in the summary. For example, if the set of supported platforms is surprising, such as if a desktop feature is not available on Mac, the summary should explain where it is recorded. It is also common to have caveats along the lines of "this histogram is only recorded if X" (e.g., upon a successful connection to a service, a feature is enabled by the user).
Histogram descriptions should clearly state when the histogram is emitted (profile open? network request received? etc.).
Some histograms record error conditions. These should be clear about whether all errors are recorded or only the first. If only the first, the histogram description should have text like:
In the case of multiple errors, only the first reason encountered is recorded. Refer
to Class::FunctionImplementingLogic() for details.
For enumerated histograms, including boolean and sparse histograms, provide an
enum=
attribute mapping enum values to semantically contentful labels. Define
the <enum>
in enums.xml if none of the existing enums are a good fit. Use
labels whenever they would be clearer than raw numeric values.
For non-enumerated histograms, include a units=
attribute. Be specific:
e.g. distinguish "MB" vs. "MiB", refine generic labels like "counts" to more
precise labels like "pages", etc.
Histograms need owners, who are the experts on the metric and the points of contact for any questions or maintenance tasks, such as extending a histogram's expiry or deprecating the metric.
Histograms must have a primary owner and may have secondary owners. A primary owner is a Googler with an @google.com or @chromium.org email address, e.g. [email protected], who is ultimately responsible for maintaining the metric. Secondary owners may be other individuals, team mailing lists, e.g. [email protected], or paths to OWNERS files, e.g. src/directory/OWNERS.
It's a best practice to list multiple owners, so that there's no single point of failure for histogram-related questions and maintenance tasks. If you are using a metric heavily and understand it intimately, feel free to add yourself as an owner.
Notably, owners are asked to determine whether histograms have outlived their usefulness. When a histogram is nearing expiry, a robot files a reminder bug in Monorail. It's important that somebody familiar with the histogram notices and triages such bugs!
Tip: When removing someone from the owner list for a histogram, it's a nice courtesy to ask them for approval.
Histograms may be associated with a component, which can help make sure that histogram expiry bugs don't fall through the cracks.
A histogram is associated with the buganizer_public
component listed in the
DIR_METADATA file adjacent to the histograms.xml file if present.
There are two other ways in which components may be associated with a
histogram. The first way is to add a tag containing the component ID to a
histogram or histogram suffix, e.g. 1456399. The second
way is to specify an OWNERS file as a secondary owner for a histogram. If the
OWNERS file has an adjacent DIR_METADATA file that contains a
buganizer_public
component, then that component is associated with the
histogram. If there isn't a parallel DIR_METADATA file with such a component,
but an ancestor directory has one, then the ancestor directory's component is
used.
If more than one component is associated with a histogram, tag is favored over adjacent DIR_METADATA file and over OWNERS file.
Note: For non-Chromium Issue Tracker (ChromeOS Public Tracker or internal) components, make sure [email protected] has access to create and update issues.
For some histograms, an increase or a decrease in the reported values can be
associated with either an improvement or a deterioration. For example, if you
are tracking page load speed, then seeing your metrics tracking page load time
in milliseconds getting gradually larger values, perhaps as the result of a
Finch study, may signify worse performance; on the contrary, seeing a reduction
in the page load speed may indicate an improvement. You can provide this
information on the movement direction by adding a tag
<improvement direction="LOWER_IS_BETTER"/>
within your <histogram>
. The
opposite is <improvement direction="HIGHER_IS_BETTER"/>
.
For other histograms where there may not be a movement direction that's clearly
better, you can set <improvement direction="NEITHER_IS_BETTER"/>
.
This <improvement>
tag is optional. You can also add/delete this tag or make a
correction to its direction
attribute any time.
When the code to log a histogram is deleted, its corresponding histograms.xml entry should also be removed. Past histogram data will still be available for viewing on Google's internal UMA dashboard.
The CL to remove one or more histograms can also specify an obsoletion message through special syntax in the CL description. This also applies to variants of a patterned histogram and to suffix entries for a suffixed histogram.
The changelist that obsoletes a histogram entry should be reviewed by all current owners.
Delete the entry in the histograms.xml file.
- In some cases there may be artifacts that remain, with some examples being:
- Empty
<token>
blocks, or individual<variant>
s. <enum>
blocks from enums.xml that are no longer used.- Suffix entries in
histogram_suffixes_list.xml
.
- Empty
- Please remove these artifacts if you find them.
- Exception: please update the label of
<int value=... label=... />
with the(Obsolete)
prefix, e.g.<int value="1" label="(Obsolete) Navigation failed. Removed in 2023/01."/>
rather than deleting them, if the surrounding<enum>
block is not being deleted.
- Exception: please update the label of
An obsoletion message is displayed on the dashboard and provides developers context for why the histogram was removed and, if applicable, which histogram it was replaced by.
Note: You can skip this step if the histogram is expired. This is because tooling automatically records the date and milestone of a histogram's removal.
You can provide a custom obsoletion message for a removed histogram via tags on the CL description:
- Add the obsoletion message in the CL description in the format
OBSOLETE_HISTOGRAM[histogram name]=message
, e.g.:OBSOLETE_HISTOGRAM[Tab.Count]=Replaced by Tab.Count2
- To add the same obsoletion message to all the histograms removed in the CL,
you can use
OBSOLETE_HISTOGRAMS=message
, e.g.:OBSOLETE_HISTOGRAMS=Patterned histogram Hist.{Token} is replaced by Hist.{Token}.2
- Notes:
- The full tag should be put on a single line, even if it is longer than the maximum CL description width.
- You can add multiple obsoletion message tags in one CL.
OBSOLETE_HISTOGRAMS
messages will be overwritten by histogram-specific ones, if present.
- You could also include information about why the histogram was removed. For example, you might indicate how the histogram's summary did not accurately describe the collected data.
- If the histogram is being replaced, include the name of the replacement and make sure that the new description is different from the original to reflect the change between versions.
It is sometimes useful to record several closely related metrics, which measure the same type of data, with some minor variations. You can declare the metadata for these concisely using patterned histograms. For example:
<histogram name="Pokemon.{Character}.EfficacyAgainst{OpponentType}"
units="multiplier" expires_after="M95">
<owner>[email protected]</owner>
<owner>[email protected]</owner>
<summary>
The efficacy multiplier for {Character} against an opponent of
{OpponentType} type.
</summary>
<token key="Character">
<variant name="Bulbasaur"/>
<variant name="Charizard"/>
<variant name="Mewtwo"/>
</token>
<token key="OpponentType">
<variant name="Dragon" summary="dragon"/>
<variant name="Flying" summary="flappity-flap"/>
<variant name="Psychic" summary="psychic"/>
<variant name="Water" summary="water"/>
</token>
</histogram>
This example defines metadata for 12 (= 3 x 4) concrete histograms, such as
<histogram name="Pokemon.Charizard.EfficacyAgainstWater"
units="multiplier" expires_after="M95">
<owner>[email protected]</owner>
<owner>[email protected]</owner>
<summary>
The efficacy multiplier for Charizard against an opponent of water type.
</summary>
</histogram>
Each token <variant>
defines what text should be substituted for it,
both in the histogram name and in the summary text. The name part gets
substituted into the histogram name; the summary part gets substituted in
the summary field (the histogram description). As shorthand, a
<variant>
that omits the summary
attribute substitutes the value of
the name
attribute in the histogram's <summary>
text as well.
*** promo
Tip: You can declare an optional token by listing an empty name: <variant name="" summary="aggregated across all breakdowns"/>
. This can be useful when
recording a "parent" histogram that aggregates across a set of breakdowns.
You can use the <variants>
tag to define a set of <variant>
s out-of-line.
This is useful for token substitutions that are shared among multiple families
of histograms within the same file. See
histograms.xml
for examples.
*** promo
Warning: The name
attribute of the <variants>
tag is globally scoped, so
use detailed names to avoid collisions. The <variants>
defined should only
be used within the file.
By default, a <variant>
inherits the owners declared for the patterned
histogram. Each variant can optionally override the inherited list with custom
owners:
<variant name="SubteamBreakdown" ...>
<owner>[email protected]</owner>
<owner>[email protected]</owner>
</variant>
*** promo
Tip: You can run print_expanded_histograms.py --pattern=
to show all generated
histograms by patterned histograms or histogram suffixes including their
summaries and owners. For example, this can be run (from the repo root) as:
./tools/metrics/histograms/print_expanded_histograms.py --pattern=^UMA.A.B
*** promo
Tip: You can run print_histogram_names.py --diff
to enumerate all the
histogram names that are generated by a particular CL. For example, this can be
run (from the repo root) as:
./tools/metrics/histograms/print_histogram_names.py --diff origin/main
For documentation about the <histogram_suffixes>
syntax, which is deprecated,
see
https://chromium.googlesource.com/chromium/src/+/refs/tags/87.0.4270.1/tools/metrics/histograms/one-pager.md#histogram-suffixes-deprecated-in-favor-of-pattern-histograms
Sparse histograms are well-suited for recording counts of exact sample values that are sparsely distributed over a large range. They can be used with enums as well as regular integer values. It is often valuable to provide labels in enums.xml.
The implementation uses a lock and a map, whereas other histogram types use a vector and no lock. It is thus more costly to add values to, and each value stored has more overhead, compared to the other histogram types. However it may be more efficient in memory if the total number of sample values is small compared to the range of their values.
Please talk with the metrics team if there are more than a thousand possible different values that you could emit.
For more information, see sparse_histograms.h.
Any Chromium committer who is also a Google employee is eligible to become a metrics reviewer. Please follow the instructions at go/reviewing-metrics. This consists of reviewing our training materials and passing an informational quiz. Since metrics have a direct impact on internal systems and have privacy considerations, we're currently only adding Googlers into this program.
If you are a metric OWNER, you have the serious responsibility of ensuring Chrome's data collection is following best practices. If there's any concern about an incoming metrics changelist, please escalate by assigning to [email protected].
When reviewing metrics CLs, look at the following, listed in approximate order of importance:
Does anything tickle your privacy senses? (Googlers, see go/uma-privacy for guidelines.)
Please escalate if there's any doubt!
Is the metadata clear enough for all Chromies to understand what the metric is recording? Consider the histogram name, description, units, enum labels, etc.
It's really common for developers to forget to list when the metric is recorded. This is particularly important context, so please remind developers to clearly document it.
Note: Clarity is a bit less important for very niche metrics used only by a couple of engineers. However, it's hard to assess the metric design and correctness if the metadata is especially unclear.
- Does the metric definition make sense?
- Will the resulting data be interpretable at analysis time?
Is the histogram being recorded correctly?
-
Does the bucket layout look reasonable?
-
The metrics APIs like base::UmaHistogram* have some sharp edges, especially for the APIs that require specifying the number of buckets. Check for off-by-one errors and unused buckets.
-
Is the bucket layout efficient? Typically, push back if there are >50 buckets -- this can be ok in some cases, but make sure that the CL author has consciously considered the tradeoffs here and is making a reasonable choice.
-
For timing metrics, do the min and max bounds make sense for the duration that is being measured?
-
-
The base::UmaHistogram* functions are generally preferred over the UMA_HISTOGRAM_* macros. If using the macros, remember that names must be runtime constants!
Also, related to clarity: Does the client logic correctly implement the metric described in the XML metadata? Some common errors to watch out for:
-
The metric is only emitted within an if-stmt (e.g., only if some data is available) and this restriction isn't mentioned in the metadata description.
-
The metric description states that it's recorded when X happens, but it's actually recorded when X is scheduled to occur, or only emitted when X succeeds (but omitted on failure), etc.
When the metadata and the client logic do not match, the appropriate solution might be to update the metadata, or it might be to update the client logic. Guide this decision by considering what data will be more easily interpretable and what data will have hidden surprises/gotchas.
-
Is the CL adding a reasonable number of metrics/buckets?
- When reviewing a CL that is trying to add many metrics at once, guide the CL author toward an appropriate solution for their needs. For example, multidimensional metrics can be recorded via UKM, and we are currently building support for structured metrics in UMA.
- There's no hard rule, but anything above 20 separate histograms should be escalated by being assigned to [email protected].
- Similarly, any histogram with more than 100 possible buckets should be escalated by being assigned to [email protected].
-
Are expiry dates being set appropriately?
This document describes many other nuances that are important for defining and recording useful metrics. Check CLs for these other types of issues as well.
And, as you would with a language style guide, periodically re-review the doc to stay up to date on the details.
When working with histograms.xml, verify whether you require fully expanded OWNERS files. Many scripts in this directory process histograms.xml, and sometimes OWNERS file paths are expanded and other times they are not. OWNERS paths are expanded when scripts make use of merge_xml's function MergeFiles; otherwise, they are not.