
Connect testing flows for push notification development
Closed, ResolvedPublic

Description

There will be a few phases of development for iOS push notifications.

For the first phase, when the iOS engineers are developing and debugging on their test devices, we would like to have a flow for triggering push notifications that goes through the batching logic in the push notifications service and the APNS sandbox environment to our test devices. Ideally this would be an end-to-end sandbox flow that mimics what a live flow would be like. For example, would we be able to add to a talk page via desktop and have the notification pushed and appear on a test device via the sandbox environment without any extra steps? Also, would we be able to trigger these notifications from multiple different wikis? Note that we would also need this to work along with the notifications API (either prod or a beta flavor), since we will need to fetch from it to determine the push notification content.

Related to this, we have a few app identifiers that we need to test with this flow. Currently I believe the push notifications service is configured with a debug_topic of org.wikimedia.Notifications-Utility. To be able to test in the main app and not this side prototype app, we would need to add one (or more) of these:

  1. org.wikimedia.wikipedia - This is the identifier for the main Wikipedia app, whose endpoints point to production servers. My hope is that this would be safe: if a beta version of the push notifications service can only send through the sandbox APNS environment, there would be no chance of pushes from it showing up in TestFlight or App Store builds.
  2. org.wikimedia.wikipedia.tfbeta - This is our staging version of the app. It's the same as the main app, but has a different icon and currently points to the mobileapps staging endpoints for PCS. We use it rarely for testing staged PCS changes, but we could lean on it for this testing flow.
  3. org.wikimedia.wikipedia.tfalpha - This is our experimental version of the app. We use it to push one-off experimental builds when we are early in development on a feature. It can be configured to point to any endpoint environment.
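
As a rough illustration of the request above, the sketch below models the service's debug topics as an allowlist of app identifiers. The names `DEBUG_TOPICS` and `is_debug_topic` are hypothetical and are not taken from the push notifications service; only the identifiers themselves come from this task.

```python
# Hypothetical allowlist; NOT the push notifications service's real config format.
DEBUG_TOPICS = frozenset({
    "org.wikimedia.Notifications-Utility",  # the debug_topic mentioned in this task
    "org.wikimedia.wikipedia",              # main app
    "org.wikimedia.wikipedia.tfbeta",       # staging app
    "org.wikimedia.wikipedia.tfalpha",      # experimental app
})


def is_debug_topic(topic: str) -> bool:
    """Return True if a sandbox push may be sent to this app identifier."""
    return topic in DEBUG_TOPICS
```

With a list like this, adding another test app would only mean adding its identifier to the set.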

For the second phase, we will want QA to test against a TestFlight deployment of the org.wikimedia.wikipedia app identifier, which is all production. As a last phase we will also want external beta testers to test against these TestFlight builds as well. Is it possible to spin up the production flow for push notifications in these phases before we release to the App Store?

Event Timeline

Tsevener updated the task description.
Tsevener renamed this task from Set up testing flows for iOS push notification development to Connect testing flows for push notification development. Jun 24 2021, 9:15 PM

Hi @MSantos, I noticed this was moved to Tracking. I might've worded this task title poorly but it will need PI's involvement and we'll be blocked until we can land on how we are going to trigger these notifications during development and testing. It may be helpful to read https://phabricator.wikimedia.org/T284238#7172056 for some context on why this was spun up. Happy to answer any questions y'all have. Thanks!

Hi folks, just checking in on this.

Would it be possible to have an end-to-end flow working at least for the sandbox environment in the next couple of weeks? We have a way of triggering them manually so it's not totally blocking us on development, but it would be good to have an end-to-end flow spun up just in case it surfaces any unforeseen issues on our end. I would rather know about any potential issues early than after significant development time. Thanks!

@Jgiannelos should be able to devote some time to this next week -- the team has been a bit heads down on the maps rollout preparations and trying to cram things in between availability of others that are needed for this. Thanks for the heads up about your requirements around this.

Update: made https://phabricator.wikimedia.org/T287897 as side-work to be done after testing against beta labs and discussing with @Jgiannelos this morning. Please let us know if there are any questions around that, thanks!

After our discussion it seems like the current setup on beta cluster is enough for testing the whole flow for push notifications on sandbox. I added org.wikimedia.wikipedia.tfalpha to the debug topics on the beta push notifications service.

Is there anything else missing to allow testing using org.wikimedia.wikipedia.tfalpha on beta cluster in the scope of this ticket?

@Jgiannelos sorry for the delay, I just tried testing on a device hooked up to Xcode against the beta cluster and it worked! Exciting stuff, thanks so much. This will really speed up our development. I have a couple of next stages for this:

  1. Is it possible to have multiple apps that the beta cluster sends to through sandbox? I would love it if debug topics could have all 3 - org.wikimedia.wikipedia.tfalpha, org.wikimedia.wikipedia.tfbeta, and org.wikimedia.wikipedia, but the first two are most important.
  2. QA will test this through TestFlight, which means we'll also need the production flow working when they test. I'm not sure if this is set up yet. I haven't tested a TestFlight build in a while, but last time I checked nothing seemed to come through (Edit: confirmed nothing comes through on a TestFlight org.wikimedia.wikipedia.tfalpha build through the beta cluster). All 3 app ID options against the production flow would be great, but for early testing they should be able to get by with just org.wikimedia.wikipedia.tfalpha & org.wikimedia.wikipedia.tfbeta. The org.wikimedia.wikipedia app ID will definitely need to be in place in the production flow for releasing to external beta testers and the final App Store release, of course.

@Jgiannelos heads up, I'm no longer able to see a sandbox notification on device when triggering from the Desktop beta cluster. Is there something particular about which topic I do this on, since we removed the debug mode functionality here - https://phabricator.wikimedia.org/T287897#7294759?

Hi @Jgiannelos and/or @ssastry - can someone take a look re: Toni's comment above? Also let us know if it would be helpful to set up a time to meet together. Thanks!

Hi @Tsevener! Beta cluster was using an old image. I just deployed the latest one after the patch we merged. Let me know if that did the trick.

Just posting to keep this ticket up to date - the deploy didn't fix things, and the issue may have been introduced with https://phabricator.wikimedia.org/T287909. @Jgiannelos is doing more investigation.

After testing on MW beta, I can see requests to the push service after a Thanks echo event to a test user with bogus APNS tokens:

{"name":"push-notifications","hostname":"ab93cc22c07a","pid":1,"level":10,"msg":"Incoming request","request_id":"YTB4AYRSbzdvx1cuVz7VegAAAAE","request":{"url":"/v1/message/apns","headers":{"x-request-id":"YTB4AYRSbzdvx1cuVz7VegAAAAE","content-type":"application/json; charset=utf-8","user-agent":"MediaWiki/1.37.0-alpha","content-length":"97"},"method":"POST","params":{"0":"/v1/message/apns"},"query":{},"remoteAddress":"172.16.3.153","remotePort":52726},"levelPath":"trace/req","time":"2021-09-02T07:06:42.215Z","v":0}
2021-09-02T07:06:52.534Z apn Request ended with status 400 and responseData: {"reason":"BadDeviceToken"}

The BadDeviceToken error is expected. At least we know that the service tries to send a notification triggered by an echo event.
I think it's related to this change having landed: https://phabricator.wikimedia.org/T287909#7293958
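
For anyone repeating this check, here is a minimal sketch of pulling the relevant fields out of the two log lines above. It assumes the service writes one JSON object per line, as in the sample. `RAW_LOG_LINE` is an abridged copy of that sample, and the function names are hypothetical helpers, not part of the push notifications service.

```python
import json
from typing import Optional

# Abridged copy of the request log line quoted above (some fields omitted).
RAW_LOG_LINE = (
    '{"name":"push-notifications","level":10,"msg":"Incoming request",'
    '"request_id":"YTB4AYRSbzdvx1cuVz7VegAAAAE",'
    '"request":{"url":"/v1/message/apns","method":"POST",'
    '"headers":{"user-agent":"MediaWiki/1.37.0-alpha"}},'
    '"time":"2021-09-02T07:06:42.215Z"}'
)

# The APNS response line quoted above, verbatim.
APNS_LINE = (
    "2021-09-02T07:06:52.534Z apn Request ended with status 400 "
    'and responseData: {"reason":"BadDeviceToken"}'
)


def summarize_request(line: str) -> dict:
    """Extract the fields useful for debugging from one JSON request log line."""
    entry = json.loads(line)
    request = entry["request"]
    return {
        "time": entry["time"],
        "request_id": entry["request_id"],
        "method": request["method"],
        "url": request["url"],
        "user_agent": request["headers"]["user-agent"],
    }


def apns_failure_reason(line: str) -> Optional[str]:
    """Return the APNS rejection reason from an 'apn Request ended' line, if any."""
    marker = "responseData: "
    if marker not in line:
        return None
    # Everything after the marker is a small JSON object such as {"reason": "..."}.
    return json.loads(line.split(marker, 1)[1]).get("reason")
```

A POST to `/v1/message/apns` with a MediaWiki user agent, followed by a `BadDeviceToken` reason, matches the expected result for a test user with bogus tokens.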

Now that https://phabricator.wikimedia.org/T287897 is merged I think we:

  • Have a way to test on all sandbox environments from beta cluster
  • Have a way to test on all prod environments from the actual prod deployment (now that notifications are not silent)

@Tsevener do you think there is anything left for the scope of this ticket?

@Jgiannelos thanks, I did a little bit of testing on the sandbox tfalpha build and I'm able to see the pushes again through the beta cluster. I'll continue testing our other configurations tomorrow morning and will let you know if it's all working well.

@Jgiannelos so the beta cluster --> push to sandbox apps seems to work well. I tested with multiple different topic IDs and we receive them on our development devices. Thanks for fixing that!

But I wasn't able to trigger push notifications from the production desktop Wikipedia (I was trying from the Test language) through to our production (TestFlight) builds. Is there anything missing from this pipeline? Does something still need to be deployed for this to work? Is there any particular limit on which topic IDs we are able to send production pushes to? This will block QA entirely from testing, so I don't think we can consider this ticket done till this flow is set up.

Let me know if I can help debug in any way. If you have an iOS device I can also invite you to test out our build on TestFlight.

I haven't deployed https://phabricator.wikimedia.org/T287897 in production because I wasn't sure how this would affect already registered tokens (if any). I can do it on the next deployment window if we are OK with this in production.

@Jgiannelos ah gotcha, yeah any registered tokens out there should only be from developers at this point since we've never gone live with push, so we should be good to deploy it. Thanks so much!

That will cover the musts of this ticket. I do have a couple of additional nice-to-haves, but I can spin them up as a separate ticket later only if they turn into a blocker for us:

  1. Can we also trigger sandbox APNS notifications from production desktop Wikipedia? This will enable engineers to see production Wiki pushes on their development devices, basically in case there's some bug very specific to production Wikipedia. We could limit this to only the Test and Test2 languages.
  2. Can we also trigger production APNS notifications from the beta cluster? This would be for QA to have a beta cluster to run scripts against, so they can see those push notifications.

Basically it would be nice to have criss-crossed flow options in addition to the flows you've already given us. These are definitely not a must at this point, just mentioning them here for completeness and early thoughts.

  1. Can we also trigger sandbox APNS notifications from production desktop Wikipedia? This will enable engineers to see production Wiki pushes on their development devices, basically in case there's some bug very specific to production Wikipedia. We could limit this to only the Test and Test2 languages.

Currently both prod and staging deployments are using production APNS. I am not sure if staging allows traffic in the first place but we could potentially re-use it for sandbox APNS and then point test/test2 wikis to staging, but that needs input and collaboration with SRE folks.

  2. Can we also trigger production APNS notifications from the beta cluster? This would be for QA to have a beta cluster to run scripts against, so they can see those push notifications.

Beta push notifications is a public endpoint and I don't think it's a good idea to allow production APNS requests. I think there was a similar discussion here: https://phabricator.wikimedia.org/T274456

I think it's better to file tickets for those 2 issues if/when they become blockers, because I don't think there is a very straightforward solution (especially for the 2nd point).

@Jgiannelos sounds good, I'll file tickets for those later on if we need them. Please let us know when https://phabricator.wikimedia.org/T287897 is deployed so I can retest the issues I had in https://phabricator.wikimedia.org/T285417#7329905.

Currently both prod and staging deployments are using production APNS.

This sentence confused me - I think we do need to keep our sandbox APNS flow as well. Just confirming - when this task is done & deployed these flows will be in place:

  • Beta cluster Wikipedia notification trigger → pushes to sandbox APNS → engineers receive push on their development devices (this flow is working as of Thursday Sept 2).
  • Production Wikipedia notification trigger → pushes to production APNS → QA, external beta testers and eventually all of our users will receive these via TestFlight & AppStore builds. (I will test for this in TestFlight once https://phabricator.wikimedia.org/T287897 is deployed).

I'm not totally sure how the push notification service and its different environments fit in here but just wanted to make sure we're on the same page.
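
To make the two flows above concrete, here is a small sketch that maps each trigger environment to the APNS environment it should push to. The hostnames are Apple's documented APNs endpoints. `PushFlow`, `FLOWS`, and `apns_host_for` are hypothetical names for illustration and do not describe how the push notifications service is actually configured.

```python
from dataclasses import dataclass

# Apple's documented APNs hosts for the two environments.
APNS_HOSTS = {
    "sandbox": "api.sandbox.push.apple.com",
    "production": "api.push.apple.com",
}


@dataclass(frozen=True)
class PushFlow:
    """One end-to-end flow: where the notification is triggered and who gets it."""
    trigger: str
    apns_environment: str
    recipients: str


# The two flows described in this comment (hypothetical representation).
FLOWS = {
    "beta-cluster": PushFlow(
        "beta-cluster", "sandbox", "engineers' development devices"
    ),
    "production": PushFlow(
        "production", "production", "TestFlight and App Store builds"
    ),
}


def apns_host_for(trigger: str) -> str:
    """Look up the APNs host a given trigger environment should send to."""
    return APNS_HOSTS[FLOWS[trigger].apns_environment]
```

In this model the criss-crossed options discussed earlier (production wiki to sandbox, beta cluster to production) would be extra entries in `FLOWS`, which is why they need separate decisions.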

Sorry for the confusion, what you describe is how things should work.
I think the confusion came from my mentioning our staging env, which is the Kubernetes deployment closest to production config and which we don't actively use other than for manually testing new releases.

Beta cluster is a different environment from staging.

@Jgiannelos Aah got it, yeah I sometimes equate beta cluster = staging in my mind. Thanks for the clarification!

@Tsevener I pushed the latest version to production. Let me know if you manage to reproduce the same flow.

@Jgiannelos sorry for the delay - I tested this on our experimental build in TestFlight (pointing to MediaWiki prod) and it's working! Thanks for your help, it's great to see these in place. We'll test against our production build once we're further along, but as long as there isn't any specific targeting of topic ID on your side (org.wikimedia.wikipedia vs org.wikimedia.wikipedia.tfalpha) I expect it'll work fine.

Hi @Jgiannelos - I noticed this past week (yesterday and today, though it might have happened earlier) that I no longer seem to receive push notifications from the beta cluster on my testing device, so it seems the sandbox flow is messed up again. I did check the production flow and that's all still working. We are able to manually trigger pushes to our development devices so it's not blocking us, but it would be nice to get this up and running to be able to develop end-to-end again.

When you take a look, would it also be possible to decrease the delay time on this flow to 1 minute or so? That would speed up our testing while developing. The production delay can stay as it is, of course. Thanks!

Tsevener lowered the priority of this task from High to Medium.

This seems to be working again now! Maybe just a temporary hiccup.

Tsevener removed Jgiannelos as the assignee of this task.