[go: nahoru, domu]

Page MenuHomePhabricator

Migrate tools.wmflabs.org to https only (and set HSTS)
Closed, ResolvedPublic

Assigned To
Authored By
yuvipanda
Jun 14 2015, 12:54 AM
Referenced Files
F12139559: Screen Shot 2017-12-30 at 03.50.59.png
Dec 30 2017, 3:51 AM
Tokens
"Love" token, awarded by aborrero."Like" token, awarded by Steinsplitter."Like" token, awarded by Bawolff."Baby Tequila" token, awarded by Liuxinyu970226."Love" token, awarded by Framawiki."Baby Tequila" token, awarded by tom29739.

Description

In line with switching over everything else.

Should be fairly simple to do.

Related Objects

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

I really think we could just flip the switch at the ingress proxy and then deal with the fallout. Mixed content warnings/errors are really all that I can think of that may come up. As @yuvipanda pointed out in T102367#2214936, TLS termination happens in front of the tools themselves.

I really think we could just flip the switch at the ingress proxy and then deal with the fallout. Mixed content warnings/errors are really all that I can think of that may come up. As @yuvipanda pointed out in T102367#2214936, TLS termination happens in front of the tools themselves.

The place where this breaks is request cycles that start with an HTTP POST. There is no mechanism in the HTTP protocol to redirect a POST.

We could do the same thing as what we did in prod. Allow HTTP POST for a while, then make a percentage of requests fail, and then finally make them all fail.

I really think we could just flip the switch at the ingress proxy and then deal with the fallout. Mixed content warnings/errors are really all that I can think of that may come up. As @yuvipanda pointed out in T102367#2214936, TLS termination happens in front of the tools themselves.

We could do the same thing as what we did in prod. Allow HTTP POST for a while, then make a percentage of requests fail, and then finally make them all fail.

Is there any resistance to redirecting GET requests from http to https at the proxy?

Is there any resistance to redirecting GET requests from http to https at the proxy?

Long overdue.

Mentioned in SAL (#wikimedia-cloud) [2017-10-15T20:15:18Z] <bd808> Updated tool-admin-web from 83d3174 to 9994980 (Redirect http -> https & set STS header with 1day duration) T102367

What you can do to help modern browsers, though, without taking the redirect and/or POST-breakage risks, is universally add a long-duration HSTS header so that they lock onto HTTPS if they've ever visited once via HTTPS.

I have added a PHP based redirect from http -> https into https://tools.wmflabs.org/admin/tool/admin. I have also added a logic to that app that will set Strict-Transport-Security: max-age:86400; includeSubDomains; preload on all responses. This will be a small experiment towards the idea of adding an HSTS to all dynamicproxy responses.

Be careful with preload. It's only purpose is to signal to the Chromium list maintainers that it's ok to you preload you (making HSTS more-or-less permanent), and once you're emitting anyone can submit your domain for preload inclusion.

Be careful with preload. It's only purpose is to signal to the Chromium list maintainers that it's ok to you preload you (making HSTS more-or-less permanent), and once you're emitting anyone can submit your domain for preload inclusion.

Yikes. I removed that flag from the header.

If I have my tool set a strict-transport-security: max-age=86400 header, will that impact other tools as well since they're on the same subdomain? Or will it just affect my tool?

If I have my tool set a strict-transport-security: max-age=86400 header, will that impact other tools as well since they're on the same subdomain? Or will it just affect my tool?

I would think it will break other tools that don't support HTTPS. We should probably add this header to proxy_hide_header in modules/dynamicproxy/templates/urlproxy.conf until we're ready to start using HSTS across tools.wmflabs.org.
Long-term solution to prevent that class of problems: T125589

Edit: Turns out this was done above anyway... I'd still be cautious and to have a go at T128409 first

If I have my tool set a strict-transport-security: max-age=86400 header, will that impact other tools as well since they're on the same subdomain? Or will it just affect my tool?

It will effect all of tools.wmflabs.org, but ... I turned this header on for the admin tool in October (T102367#3686094) and we have not had any widespread reports of problems caused by it. I think we really can start setting this header globally at the proxy and also redirecting GET requests from http to https. We should not redirect non-HTTPS POST requests as the behavior of that is undefined for clients (some detail in T128409#2233337).

Edit: Turns out this was done above anyway... I'd still be cautious and to have a go at T128409 first

I think I tend to agree with @scfc's comment in T128409#2073647 that we really can't reliably certify that everything will work over HTTPS before trying to switch. We can however be ready to respond to reports of breakage by helping tool maintainers update their code. The most common causes I can think of would be (a) hardcoded http protocols on URLs and (b) frameworks generating absolute links which are not respecting the X-Forwarded-Proto header sent by the proxy. We should try to document how to fix (b) for common frameworks before flipping this switch globally.

Change 432935 had a related patch set uploaded (by BryanDavis; owner: Bryan Davis):
[operations/puppet@production] toolforge: Redirect GET & HEAD to https

https://gerrit.wikimedia.org/r/432935

Change 481979 had a related patch set uploaded (by BryanDavis; owner: Bryan Davis):
[operations/puppet@production] toolforge: Redirect tools-static to https

https://gerrit.wikimedia.org/r/481979

Change 481979 merged by Andrew Bogott:
[operations/puppet@production] toolforge: Redirect tools-static to https

https://gerrit.wikimedia.org/r/481979

Change 432935 merged by Andrew Bogott:
[operations/puppet@production] toolforge: Redirect GET & HEAD to https

https://gerrit.wikimedia.org/r/432935

Change 482142 had a related patch set uploaded (by BryanDavis; owner: Bryan Davis):
[operations/puppet@production] toolforge: Redirect GET & HEAD to https (take 2)

https://gerrit.wikimedia.org/r/482142

Change 482142 merged by Andrew Bogott:
[operations/puppet@production] toolforge: Redirect GET & HEAD to https (take 2)

https://gerrit.wikimedia.org/r/482142

Currently tools.wmflabs.org is violating RFC 6797 section 7.2 by sending the HSTS header over HTTP:

An HSTS Host MUST NOT include the STS header field in HTTP responses conveyed over non-secure transport.

$ curl -v http://tools.wmflabs.org
* Rebuilt URL to: http://tools.wmflabs.org/
*   Trying 185.15.56.5...
* TCP_NODELAY set
* Connected to tools.wmflabs.org (185.15.56.5) port 80 (#0)
> GET / HTTP/1.1
> Host: tools.wmflabs.org
> User-Agent: curl/7.54.0
> Accept: */*
>
< HTTP/1.1 301 Moved Permanently
< Server: nginx/1.13.6
< Date: Tue, 26 Mar 2019 06:13:57 GMT
< Content-Type: text/html
< Content-Length: 185
< Connection: keep-alive
< Location: https://tools.wmflabs.org/
< Strict-Transport-Security: max-age=86400

It's also violating RFC 6797 section 7.1 by sending the HSTS header twice over HTTPS:

If an STS header field is included, the HSTS Host MUST include only one such header field.

curl --http1.1 -v https://tools.wmflabs.org -o /dev/null
* Rebuilt URL to: https://tools.wmflabs.org/
Trying 185.15.56.5...
* TCP_NODELAY set
* Connected to tools.wmflabs.org (185.15.56.5) port 443 (#0)
[output omitted]
> GET / HTTP/1.1
> Host: tools.wmflabs.org
> User-Agent: curl/7.54.0
> Accept: */*
>
< HTTP/1.1 200 OK
< Server: nginx/1.13.6
< Date: Tue, 26 Mar 2019 06:22:52 GMT
< Content-Type: text/html; charset=UTF-8
< Content-Length: 8719
< Connection: keep-alive
< Set-Cookie: _s=eclunr0i35p2nebqbnq70venbf; path=/; HttpOnly
< Vary: Cookie
< X-Frame-Options: DENY
< Content-Security-Policy: default-src 'self' https://tools-static.wmflabs.org; child-src 'none'; object-src 'none'; img-src 'self' data: https://tools-static.wmflabs.org
< Strict-Transport-Security: max-age=86400; includeSubDomains
< Strict-Transport-Security: max-age=86400

I assume this is something in our nginx proxy, right? @Vgutierrez Could you please help us review the config?

I assume this is something in our nginx proxy, right? @Vgutierrez Could you please help us review the config?

so from a quick review of https://gerrit.wikimedia.org/r/plugins/gitiles/operations/puppet/+/production/modules/dynamicproxy/templates/domainproxy.conf, line 56 is responsible of sending the HSTS header twice over HTTPS as the HSTS header is also set as part of line 39.
I assume that line 56 again is responsible for sending the header over HTTP as the same virtualhost is being used for serving http and https traffic

Live config from tools-proxy-03.tools.eqiad.wmflabs shows only the add_header Strict-Transport-Security "max-age=86400"; in the nginx config. Looking at the logic of how I added that, I think I assumed (incorrectly apparently) that the return 301 https://$host$request_uri; redirect would stop all further processing of the request. It should be possible to adjust the logic so that this header is only sent when the connection is using TLS.

The Strict-Transport-Security: max-age=86400; includeSubDomains header is actually being sent by the "admin" application (R1922:7be2a095ce41: Strict-Transport-Security: remove "preload" flag). This is a leftover from the experiments prior to adding this header at the proxy level. We should remove this from the app. We should probably also add a proxy_hide_header Strict-Transport-Security; config line to the proxy to keep any other tool from sending arbitrary STS headers to the client.

Mentioned in SAL (#wikimedia-cloud) [2019-03-27T21:49:20Z] <bd808> Updated to a03551b (Remove STS header and http->https redirect) T102367

Change 499669 had a related patch set uploaded (by BryanDavis; owner: Bryan Davis):
[operations/puppet@production] dynamicproxy: Prevent STS header from non-TLS connections

https://gerrit.wikimedia.org/r/499669

Change 499669 merged by Bstorm:
[operations/puppet@production] dynamicproxy: Prevent STS header from non-TLS connections

https://gerrit.wikimedia.org/r/499669

Currently tools.wmflabs.org is violating RFC 6797 section 7.2 by sending the HSTS header over HTTP:

My attempt at fixing this in https://gerrit.wikimedia.org/r/499669 was reverted in https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/502308/.

Reasons for the revert:

  1. @bd808 did not actually test the nginx config changes and they were not syntactically correct.
  2. @bd808 did not remember that this exact same syntax problem is why we have the header sent over http in the first place. (See https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/432935/ and its replacement in https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/482142/).

We can't use the if construct in nginx config to guard the add_header setting. This in turn means that the entire nginx config used by domain-proxy and url-proxy will need to be rewritten if we want to ensure that the STS header is only presented to TLS secured connections. The current config for both mixes http and https handling in the same server config block leaving us no way to decide whether or not to emit the STS header. Splitting http and https into separate server blocks and only placing the STS header in the https server will fix that problem. It will require some mechanism to duplicate almost all of the other logic in the current config as long as we continue to support the POST loophole however.

We have left the POST loophole open for more than a year. Now that we have introduced the *.toolforge.org naming scheme (T234617), I think we should close the POST loophole, turn up the HSTS duration, and as a side effect fix the RFC violation reported in T102367#5056919.

Change 593747 had a related patch set uploaded (by QEDK; owner: QEDK):
[operations/puppet@production] toolforge: Increase HSTS max-age directive to one month

https://gerrit.wikimedia.org/r/593747

We have left the POST loophole open for more than a year. Now that we have introduced the *.toolforge.org naming scheme (T234617), I think we should close the POST loophole, turn up the HSTS duration, and as a side effect fix the RFC violation reported in T102367#5056919.

We should gradually work towards the one-year max-age imo. Also, is there any way for the nginx proxy to hide the header if a tool chooses to send their own?

Also, is there any way for the nginx proxy to hide the header if a tool chooses to send their own?

Technically this would be possible, but in practice it would not be useful for the legacy tools.wmflabs.org proxy (the subject of this particular ticket). The reason it would not be useful is that this is a single hostname and thus subject to a single HSTS policy. If N different tools sent out N different headers that would be confusing to receiving user-agents and more so to crawlers that hit many different tools.

Even for the new $TOOL.toolforge.org scheme, controlling the HSTS header at the shared proxy layer rather than at the tool layer makes more sense. User facing TLS is managed at the shared proxy, which means the certificates and cipher suites are managed at the shared proxy. The HSTS header should continue to be manged at that level to ensure that these things all make sense when taken as a collective system.

Also, is there any way for the nginx proxy to hide the header if a tool chooses to send their own?

Technically this would be possible, but in practice it would not be useful for the legacy tools.wmflabs.org proxy (the subject of this particular ticket). The reason it would not be useful is that this is a single hostname and thus subject to a single HSTS policy. If N different tools sent out N different headers that would be confusing to receiving user-agents and more so to crawlers that hit many different tools.

Even for the new $TOOL.toolforge.org scheme, controlling the HSTS header at the shared proxy layer rather than at the tool layer makes more sense. User facing TLS is managed at the shared proxy, which means the certificates and cipher suites are managed at the shared proxy. The HSTS header should continue to be manged at that level to ensure that these things all make sense when taken as a collective system.

I don't disagree that we should manage it at the proxy layer, I meant to say that if a tool was supplying it's own header, then ideally the proxy should not supply one, now that we are moving to a host-based URL naming.

I don't disagree that we should manage it at the proxy layer, I meant to ask that if a tool was supplying it's own header, then ideally the proxy should suppress supplying one.

Ah! The inverse of what my brain interpreted the original question as. Sorry about that confusion. I thought we were already doing that hiding, but upon checking the config we are not. I would agree that yes we should be for the reasons I outlined in T102367#6100515. I think the "right" way to do this with the current urlproxy nginx config would be to add a proxy_hide_header Strict-Transport-Security; directive right after the current add_header Strict-Transport-Security "max-age=..."; directive.

QEDK moved this task from Inbox to Code Review on the User-QEDK board.

Change 593747 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] toolforge: Increase HSTS max-age directive to one month

https://gerrit.wikimedia.org/r/593747

bd808 moved this task from Inbox to Soon! on the cloud-services-team (Kanban) board.

Last step is to set HSTS to 1 year and then close this.

Change 612947 had a related patch set uploaded (by BryanDavis; owner: Bryan Davis):
[operations/puppet@production] toolforge: Set Strict-Transport-Security to 366 days

https://gerrit.wikimedia.org/r/612947

Change 612948 had a related patch set uploaded (by BryanDavis; owner: Bryan Davis):
[operations/puppet@production] toolforge: Perform HTTPS redirects unconditionally

https://gerrit.wikimedia.org/r/612948

bd808 moved this task from To Do to In Dev/Progress on the User-bd808 board.
bd808 moved this task from In Dev/Progress to Needs Review/Feedback on the User-bd808 board.

Change 612947 merged by Andrew Bogott:
[operations/puppet@production] toolforge: Set Strict-Transport-Security to 366 days

https://gerrit.wikimedia.org/r/612947

Mentioned in SAL (#wikimedia-cloud) [2020-07-17T16:29:10Z] <bd808> Disabled Puppet on tools-proxy-06 to test nginx config changes manually (T102367)

Mentioned in SAL (#wikimedia-cloud) [2020-07-17T16:47:43Z] <bd808> Enabled Puppet on tools-proxy-06 following successful test (T102367)

Change 612948 merged by Andrew Bogott:
[operations/puppet@production] toolforge: Perform HTTPS redirects unconditionally

https://gerrit.wikimedia.org/r/612948

mean-time-to-implement: 5 years. Ouch. But it is done now!