Open Bug 1898340 Opened 3 months ago Updated 2 months ago

Vulkan validation layer not available in WebGPU CI

Categories

(Core :: Graphics: WebGPU, defect, P1)

Tracking

ASSIGNED

People

(Reporter: jimb, Assigned: ErichDonGubler)

References

(Blocks 2 open bugs)

Details

We're seeing this message in a lot of logs:

[WARN  wgpu_hal::vulkan::instance] InstanceFlags::VALIDATION requested, but unable to find layer: VK_LAYER_KHRONOS_validation

This means that wgpu_hal::vulkan::Instance::init tried to enable the Vulkan validation layer, but it was not installed on the system.

It's pretty important that the Vulkan validation layers be available and installed when we run WebGPU tests in CI, because that will help us catch errors much sooner.

Marking as P1: "CTS impact widespread enough that it obscures other issues", which isn't quite right, but this problem definitely obscures other issues.

Severity: -- → S3
Priority: -- → P1

Is the vulkan-validationlayers package installed in our image?
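
One rough way to answer that from inside the image is to look for the layer's JSON manifest, since the Vulkan loader discovers layers through manifest files. The sketch below is only illustrative: `find_layer_manifests` and `DEFAULT_LAYER_DIRS` are hypothetical names, and the directory list is an assumption about common Linux locations rather than the loader's full search logic (which also honors variables such as `VK_LAYER_PATH`).

```python
import json
from pathlib import Path

# Assumed common locations for explicit layer manifests on Linux.
# This is not the loader's complete search path.
DEFAULT_LAYER_DIRS = [
    "/usr/share/vulkan/explicit_layer.d",
    "/usr/local/share/vulkan/explicit_layer.d",
    "/etc/vulkan/explicit_layer.d",
]


def find_layer_manifests(layer_name, search_dirs=DEFAULT_LAYER_DIRS):
    """Return the paths of JSON manifests that declare `layer_name`."""
    matches = []
    for directory in search_dirs:
        directory = Path(directory)
        if not directory.is_dir():
            continue  # directory absent on this system
        for manifest in sorted(directory.glob("*.json")):
            try:
                data = json.loads(manifest.read_text())
            except (OSError, ValueError):
                continue  # unreadable or malformed manifest: skip it
            if not isinstance(data, dict):
                continue
            # A manifest holds either one "layer" object or a "layers" list.
            layers = data.get("layers") or [data.get("layer")]
            if any(
                isinstance(layer, dict) and layer.get("name") == layer_name
                for layer in layers
            ):
                matches.append(manifest)
    return matches
```

Calling `find_layer_manifests("VK_LAYER_KHRONOS_validation")` on the worker and getting an empty list would be consistent with the warning above, though it would not prove the package is absent if the layer lives in a directory this sketch does not search.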

Tagging in :ahal for NI, though I do so from a place of ignorance. There may be a better resource for this, though.

Flags: needinfo?(ahal)

A link to an example task / log would be useful.

But if this requires an update to host VM images (and not just a Docker image), then you'll need to file a ticket in:
https://mozilla-hub.atlassian.net/jira/software/c/projects/RELOPS/boards/528

You can ping #relops in Slack for assistance, or @jgibbs directly if this requires escalation. You'll need to give them the worker-pool these tasks are running on.

Flags: needinfo?(ahal)

:ahal: From https://treeherder.mozilla.org/logviewer?job_id=459219670&repo=try&lineNumber=2506-2509:

[task 2024-05-22T16:39:52.743Z] 16:39:52     INFO - PID 5823 | [WARN  wgpu_hal::vulkan::instance] InstanceFlags::VALIDATION requested, but unable to find layer: VK_LAYER_KHRONOS_validation
[task 2024-05-22T16:39:52.762Z] 16:39:52     INFO - PID 5823 | [WARN  wgpu_hal::vulkan::instance] GENERAL [Loader Message (0x0)]
[task 2024-05-22T16:39:52.762Z] 16:39:52     INFO - PID 5823 |     	terminator_CreateInstance: Failed to CreateInstance in ICD 1.  Skipping ICD.
[task 2024-05-22T16:39:52.763Z] 16:39:52     INFO - PID 5823 | [WARN  wgpu_hal::vulkan::instance] 	objects: (type: INSTANCE, hndl: 0x7fdabeacf000, name: ?)

Summary: Vulkan validation layer not available in Firefox CI → Vulkan validation layer not available in WebGPU CI
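
To get a sense of how widespread the warning is across logs, a small scanner along these lines could help. This is a sketch, not an existing tool: `MISSING_LAYER_RE` and `count_missing_layers` are hypothetical names, and the pattern is simply derived from the warning text quoted in this bug.

```python
import re

# Matches the wgpu_hal warning quoted in this bug and captures the layer name.
MISSING_LAYER_RE = re.compile(
    r"InstanceFlags::VALIDATION requested, but unable to find layer: (\S+)"
)


def count_missing_layers(log_lines):
    """Count occurrences of the warning, keyed by the missing layer's name."""
    counts = {}
    for line in log_lines:
        match = MISSING_LAYER_RE.search(line)
        if match:
            name = match.group(1)
            counts[name] = counts.get(name, 0) + 1
    return counts
```

It accepts any iterable of lines, so an open file handle for a locally downloaded task log can be passed directly.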

Ah ok, these tasks are running in a virtualbox VM inside the GCP VM (it was the only path forward at the time). It's still a relops ticket though.

We want to migrate these tasks over to a regular GCP image at some point, so that could be investigated as an alternative to fixing the virtualbox VM. Note that this platform has pretty terrible test coverage, so unless it's particularly important to run these tests against Wayland, from my perspective this wouldn't be a P1.

:ahal: Wayland is more an incidental detail of how we got our Linux coverage, honestly (see also bug 1836805). I suppose it'd also be good to get coverage over X11, though I'm not aware of 22.04 runners for X11 yet.

This bug is about having the Vulkan validation layers available whenever we test on Linux in CI.

If it's easier to add vulkan-validationlayers to the virtualbox VM's image than it is to migrate the virtualbox-using tasks over to a regular GCP image, then we should pursue that first. But if they're comparable amounts of work, and the regular GCP image includes vulkan-validationlayers, then doing that migration would make sense as a way to address this issue.

> :ahal: Wayland is more an incidental detail of how we got our Linux coverage

Ah ok, if these are only running on Wayland, then that obviously changes things :)

There is an experimental non-virtualbox pool available, but it's not currently running any tests and I'm not sure whether it's ready for use yet. It might be worth a quick try run, though. You can use:

./mach try fuzzy --worker-override t-linux-wayland=gecko-t/t-linux-2204-wayland-experimental

Then select the affected tasks in the fzf window as normal.

Assignee: nobody → egubler
Status: NEW → ASSIGNED

Submitted a Try push here, should get results in <1h: try:8bf6fd8a9d78

Looks like ☝🏻 this Try push encountered adapters that were/are blocklisted. Guess I better follow :jimb's advice at bug 1844627, comment 10 and get some better blocklist diagnostics into place, so we have the information we need. 😅