
docs: add a guide to running Lighthouse at scale #10511

Merged
merged 5 commits into main from atscale Sep 6, 2023

Conversation

paulirish (Member)

A partner wanted some high level guidance on this topic and I thought it'd make decent sense to have here in the repo.

Any high level thoughts before y'all tear me apart in the details?

@paulirish paulirish requested a review from a team as a code owner March 26, 2020 01:44
@paulirish paulirish requested review from patrickhulce and removed request for a team March 26, 2020 01:44
@patrickhulce (Collaborator) left a comment

I think this is great! Covers the gist of each really well, I'd say.


# Running Lighthouse at Scale

Many Lighthouse users want to collect Lighthouse data for hundreds or thousands of URLs daily. First, anyone interested should understand [how variability plays into web performance measurement](./variability.md) in the lab.
Collaborator:

love that this is a prereq :)


* PRO: You don't need to maintain testing hardware.
* PRO: A simple network request returns complete Lighthouse results
* CON: The URLs must be web-accessible.
Collaborator:

I think the other big con here is limited control over configuration: even if the URL is web-accessible, testing behind auth, different form factors, or throttling isn't really possible.
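
The "simple network request" in the quoted list is a single GET to the PageSpeed Insights v5 API, which returns the complete Lighthouse result as JSON. A minimal sketch, assuming Node 18+ (global `fetch`) run as an ES module; the URL is a placeholder:

```js
// Fetch a full Lighthouse result for one URL from the PSI v5 API.
// An API key is optional for occasional use but recommended at scale.
const endpoint = new URL('https://www.googleapis.com/pagespeedonline/v5/runPagespeed');
endpoint.searchParams.set('url', 'https://example.com/');
endpoint.searchParams.set('strategy', 'mobile');

const response = await fetch(endpoint);
const {lighthouseResult} = await response.json();
console.log(lighthouseResult.categories.performance.score);
```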


* PRO: Easy multiple-run configuration, selection of median run
* PRO: Server UI offers timeseries graphs [(example)](https://lhci-canary.herokuapp.com/app/projects/d1e4b15c-e644-4552-b136-e975f486a2ce/dashboard) supported by [straightforward APIs](https://github.com/GoogleChrome/lighthouse-ci/blob/master/packages/server/src/api/routes/projects.js).
* CON: Must create and maintain testing environment
Collaborator:

in this context, PSI + LHCI suddenly makes a lot more sense. I was pretty dismissive of PSI as a core CI use case, but for production monitoring it'd be kind of a game changer. could even run a simple cron on the LHCI server to request PSI results, so it's a single docker deploy to get set up.

with that lens we might then even say "here are your choices for collection: PSI, self-maintained CLI" and "here are your choices for storage/querying: LHCI, bespoke storage thing™"

* PRO: A simple network request returns complete Lighthouse results
* CON: The URLs must be web-accessible.

Approx eng effort: ~5 minutes for the first result. ~30 minutes for a script that evaluates and saves the results for hundreds of URLs.
Collaborator:

As someone who has now written multiple systems that save Lighthouse results to some sort of database, I think ~30 minutes is a massive underestimation of trying to save and query LH results in any way that isn't "dump these reports to a local filesystem".

I think a big con of 1&2 and pro of 3 is that you don't need to worry about doing a bunch of work to consume the data. There's a great emphasis here already on creating and maintaining a test environment which is definitely a big stumbling block, but the storage of historical data is also pretty complex and annoying to build. Being upfront about that might help folks choose the right solution for them.

e.g. if you've got some big bespoke storage plan for how you consume the data on your platform then LHCI isn't really bringing much to the table for you, but if it's something you haven't thought about at all, then you're gonna be in for a lot of frustration pretty quick with option 1 or 2 when you realize you can't do anything useful with this hunk of reports on your filesystem

WDYT about breaking the eng effort into 3 components instead of 2? "first result", "setup of larger system to collect", "setup of larger system to query/consume"
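
To make the scale of that effort concrete, here is a minimal sketch of the "hundreds of URLs" script in its simplest form: query PSI serially and dump each result to the local filesystem. Everything beyond this (a real database, querying, retention) is the extra work described above; the URL list and file naming are hypothetical.

```js
// Serially fetch PSI results for a list of URLs and save each as JSON.
// Serial requests help stay under the PSI API's rate limits.
import fs from 'node:fs/promises';

const urls = [
  'https://example.com/',
  'https://example.com/pricing',
];

for (const url of urls) {
  const endpoint = new URL('https://www.googleapis.com/pagespeedonline/v5/runPagespeed');
  endpoint.searchParams.set('url', url);
  endpoint.searchParams.set('strategy', 'mobile');

  const {lighthouseResult} = await (await fetch(endpoint)).json();

  // "Dump these reports to a local filesystem", as the comment above puts it.
  const name = `${new URL(url).hostname}${new URL(url).pathname.replaceAll('/', '-')}-${Date.now()}.json`;
  await fs.writeFile(name, JSON.stringify(lighthouseResult));
}
```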


## Option 2: Using the Lighthouse CLI on cloud hardware

The [Lighthouse CLI](https://github.com/GoogleChrome/lighthouse#using-the-node-cli) is the foundation of most advanced uses of Lighthouse and provides considerable configuration possibilities. For example, you could launch a fresh Chrome in a debuggable state (`chrome-debug --port=9222`) and then have Lighthouse repeatedly reuse that same Chrome (`lighthouse <url> --port=9222`). That said, we wouldn't recommend this for more than a hundred or so loads, as state can accrue in a Chrome profile. Using a fresh profile for each Lighthouse run is the best approach for reproducible results.
Collaborator:

should we use an example that we would endorse? :)

maybe custom headers/throttling options/puppeteer/etc?
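
For instance, a pattern closer to the recommended one: launch a fresh Chrome (and therefore a fresh profile) for every run via the Node module, and use the extra configuration for things PSI can't do, such as custom headers for pages behind auth. A sketch using the `lighthouse` and `chrome-launcher` npm packages; the cookie value is a hypothetical stand-in:

```js
// One run with a fresh Chrome instance and profile, then clean up.
import lighthouse from 'lighthouse';
import * as chromeLauncher from 'chrome-launcher';

const chrome = await chromeLauncher.launch({chromeFlags: ['--headless']});
const result = await lighthouse('https://example.com/', {
  port: chrome.port,
  output: 'json',
  extraHeaders: {Cookie: 'session=abc123'},  // hypothetical auth cookie
});
console.log(result.lhr.categories.performance.score);
await chrome.kill();
```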

[Lighthouse CI](https://github.com/GoogleChrome/lighthouse-ci#readme) has the CLI at its core and provides a complete experience for those who want to understand how each commit in development affects their Lighthouse results. While the product is designed to run Lighthouse on every pushed git commit, it's possible to use it for some production monitoring use cases. See [this recipe](https://github.com/GoogleChrome/lighthouse-ci/issues/5#issuecomment-591578507) for faking git commit data while testing production URLs.

* PRO: Easy multiple-run configuration, selection of median run
* PRO: Server UI offers timeseries graphs [(example)](https://lhci-canary.herokuapp.com/app/projects/d1e4b15c-e644-4552-b136-e975f486a2ce/dashboard) supported by [straightforward APIs](https://github.com/GoogleChrome/lighthouse-ci/blob/master/packages/server/src/api/routes/projects.js).
Collaborator:

discussed previously, but I think this is like *the* PRO
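
For the production-monitoring recipe above, the LHCI side is mostly a small `lighthouserc.js`: collect several runs per URL and upload to a self-hosted LHCI server, which handles median selection and timeseries storage. A sketch; the server URL and token are placeholders:

```js
// lighthouserc.js: run each production URL 5 times and upload to an LHCI server.
module.exports = {
  ci: {
    collect: {
      url: ['https://example.com/'],
      numberOfRuns: 5,
    },
    upload: {
      target: 'lhci',
      serverBaseUrl: 'https://lhci.example.com',
      token: 'YOUR_BUILD_TOKEN',  // placeholder project build token
    },
  },
};
```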


You'll be running Lighthouse CLI on your own machines, and we have guidance on the [specs of machines suitable](./variability.md#run-on-adequate-hardware) for running Lighthouse without skewing performance results. The environment must also be able to run either headful Chrome or headless Chrome.

* PRO: Ultimate configurability
Collaborator:

PRO: supports on-premise and private URLs

Contributor:

It would be good to have some recommendations for running Lighthouse inside a Docker container. We have been running Lighthouse + Puppeteer this way and have seen drastic changes from v5 to v6.

Collaborator:

Our general recommendation on Docker is: don't, if you can help it.

Probably worth adding to this doc @paulirish the various issues we've seen with Docker and why it should be avoided (the shared memory issue and the flags to work around it, etc.) :)
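
For anyone who can't avoid Docker: the shared memory issue is that Chrome uses `/dev/shm`, which Docker caps at 64MB by default. The usual workarounds are to enlarge it (`docker run --shm-size=1g ...`) or to tell Chrome not to use it. A sketch of the flag-based workaround with `chrome-launcher`; whether you also need `--no-sandbox` depends on your container image:

```js
// Launch Chrome with the common Docker workarounds applied.
import * as chromeLauncher from 'chrome-launcher';

const chrome = await chromeLauncher.launch({
  chromeFlags: [
    '--headless',
    '--disable-dev-shm-usage',  // avoid Docker's small /dev/shm
    '--no-sandbox',             // often required when the image lacks sandbox support
  ],
});
```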

Contributor:

I found this on an old issue: #6162 (comment)

I've been looking for documentation about "Device Class" and whether it's still being used. Lately I've started to get very different results with v6, but if I contrast them with the benchmarkIndex property, they make a lot of sense. I don't know if it's worth adding to the documentation in case anyone wants to use it.

In my case, I will create a bucket for devices (similar to the device class) and group the results by ranges of benchmarkIndex.
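
A minimal sketch of that bucketing idea, reading `benchmarkIndex` from the saved Lighthouse result (`lhr.environment.benchmarkIndex`); the range boundaries here are made up and would need tuning per fleet:

```js
// Group runs into device buckets by the machine's benchmarkIndex so that
// only like-for-like results are compared over time.
function benchmarkBucket(lhr) {
  const index = lhr.environment.benchmarkIndex;
  if (index < 500) return 'slow-device';     // hypothetical boundary
  if (index < 1500) return 'average-device'; // hypothetical boundary
  return 'fast-device';
}
```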

Collaborator:

We shelved the device class targeting because the absolute value of BenchmarkIndex has different characteristics across OS and CPU architecture, so we didn't feel confident enough to automatically adjust thresholds with it. The new plan is to simply put warnings in the report if we think it was running on an underpowered device (see #9085).


@connorjclark connorjclark merged commit 3225e8b into main Sep 6, 2023
29 of 30 checks passed
@connorjclark connorjclark deleted the atscale branch September 6, 2023 19:21