Run Null A/B test for DYM suggestions
Analysis of the A/B test for Glent Method 0 (M0) has proven difficult because user behavior varies widely between the buckets, and that variability is dominated by the performance of the phrase suggester's suggestions.

To get a better idea of the baseline variability of the phrase suggester (which appears to be all over the place, and which dominates the composite suggestion performance), use the control bucket data from the current A/B test to generate bootstrapped estimates of the variance of the key DYM metrics and of the range of values they commonly take.
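
As a rough illustration of the bootstrapping idea, here is a minimal sketch. It assumes the control bucket can be reduced to one 0/1 click indicator per event; the data, the `clickthrough_rate` metric, and the `bootstrap_metric` helper are all hypothetical stand-ins, not the actual DYM metrics or analysis code.

```python
import random
import statistics


def bootstrap_metric(samples, metric, n_resamples=1000, seed=0):
    """Resample `samples` with replacement and summarize `metric` across resamples."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(samples)
    # Each resample has the same size as the original data set.
    estimates = sorted(
        metric(rng.choices(samples, k=n)) for _ in range(n_resamples)
    )
    # Percentile interval: the middle 95% of the bootstrapped estimates.
    lo = estimates[int(0.025 * (n_resamples - 1))]
    hi = estimates[int(0.975 * (n_resamples - 1))]
    return {
        "point": metric(samples),
        "variance": statistics.variance(estimates),
        "ci95": (lo, hi),
    }


def clickthrough_rate(clicks):
    """Hypothetical metric: fraction of events where the suggestion was clicked."""
    return sum(clicks) / len(clicks)


# Made-up control-bucket data: 1 = suggestion clicked, 0 = not clicked.
control_clicks = [1] * 120 + [0] * 880

result = bootstrap_metric(control_clicks, clickthrough_rate)
print(result)
```

The `variance` entry estimates how much the metric moves from sampling noise alone, and `ci95` gives a range of commonly seen values. If user behavior is as bursty as suspected, resampling whole users or sessions rather than individual events would probably be the safer choice, since events from the same user are not independent.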

(If the variability is high and we are feeling ambitious, we could also discuss a follow-up task to look for outlying sources of high variability, if any, such as web page-scraping bots or shared IP addresses, and filter them out.)

Closing this because we have already done some reporting and dealt with some of the difficulties there. We don't really need the null A/B report anymore.