nshahquinn-wmf (Neil Shah-Quinn)
senior data scientist, Movement Insights, Wikimedia Foundation

User Details

User Since
Apr 16 2015, 4:17 PM (485 w, 2 d)
Availability
Available
LDAP User
Neil Shah-Quinn (WMF)
MediaWiki User
Neil Shah-Quinn (WMF) [ Global Accounts ]

Recent Activity

Yesterday

nshahquinn-wmf moved T362594: Update the editor_month table with an Airflow job from Waiting on others to Doing on the Movement-Insights board.
Sat, Aug 3, 11:05 PM · Movement-Insights
nshahquinn-wmf moved T362595: Update the new_editors table with an Airflow job from Waiting on others to Doing on the Movement-Insights board.
Sat, Aug 3, 11:05 PM · Movement-Metrics, Movement-Insights
nshahquinn-wmf triaged T371766: Document fields of canonical wiki dataset as Medium priority.
Sat, Aug 3, 10:04 PM · Analytics-Canonical-Data, Movement-Insights
nshahquinn-wmf created T371766: Document fields of canonical wiki dataset.
Sat, Aug 3, 10:03 PM · Analytics-Canonical-Data, Movement-Insights
nshahquinn-wmf updated the task description for T314541: Update the active_editors table with an Airflow job.
Sat, Aug 3, 9:58 PM · Patch-For-Review, Movement-Metrics, Movement-Insights

Fri, Aug 2

nshahquinn-wmf changed Source Repo from https://github.com/wikimedia-research/canonical-data to https://gitlab.wikimedia.org/repos/movement-insights/canonical-data on Analytics-Canonical-Data.
Fri, Aug 2, 9:18 PM
nshahquinn-wmf renamed T273197: Update all Wmfdata-Python run functions to have consistent API from Update run functions to accept a filepath as well as a string for the SQL command to Update all Wmfdata-Python run functions to have consistent API.
Fri, Aug 2, 8:44 PM · Data-Engineering, Product-Analytics, Wmfdata-Python
nshahquinn-wmf added a comment to T369325: [MI 3] Investigate trends in movement metrics.

This week, I:

  • Had exploratory conversations with Jan Eissfeldt, Becky Maung, Isaac Johnson, Nino Hemmer, Kinneret Gordon, Marshall Miller, and Sonja Perry
Fri, Aug 2, 7:27 PM · Epic, Movement-Insights
nshahquinn-wmf added a parent task for T356701: Temporary Accounts Initiative (IP Masking) - Add user_is_temp to data tables: T371651: Update Movement Insights tables and movement metrics code to accommodate temporary users.
Fri, Aug 2, 12:20 AM · Product-Analytics, Movement-Insights, Temporary accounts, Data Products, Data-Engineering, Data-Platform
nshahquinn-wmf added a subtask for T371651: Update Movement Insights tables and movement metrics code to accommodate temporary users: T356701: Temporary Accounts Initiative (IP Masking) - Add user_is_temp to data tables.
Fri, Aug 2, 12:20 AM · Movement-Insights
nshahquinn-wmf updated the task description for T371651: Update Movement Insights tables and movement metrics code to accommodate temporary users.
Fri, Aug 2, 12:20 AM · Movement-Insights
nshahquinn-wmf created T371651: Update Movement Insights tables and movement metrics code to accommodate temporary users.
Fri, Aug 2, 12:10 AM · Movement-Insights

Thu, Aug 1

nshahquinn-wmf added a project to T371560: REQUEST: A useful namespace_canonical_name column in wmf_raw.mediawiki_project_namespace_map: Analytics-Canonical-Data.

This would definitely be very useful.

Thu, Aug 1, 12:02 AM · Analytics-Canonical-Data, Data-Platform

Wed, Jul 31

nshahquinn-wmf closed T371157: Some new users do not have account creation log events as Declined.

@matej_suchanek thank you very much! I didn't realize that and that does seem to be a very big part of the puzzle.

Wed, Jul 31, 2:05 AM · MediaWiki-Engineering, Data-Persistence, MediaWiki-Logevents

Tue, Jul 30

nshahquinn-wmf added a comment to T363125: sustainability of wikitech.wikimedia.org.

Plan has been drafted in the "Wikitech Migration Plan" document.

Tue, Jul 30, 5:46 PM · wikitech.wikimedia.org, Security, Epic, cloud-services-team

Sat, Jul 27

nshahquinn-wmf added a comment to T371157: Some new users do not have account creation log events.

It's possible this is expected behavior, but I couldn't find any documentation saying so.

Sat, Jul 27, 1:54 AM · MediaWiki-Engineering, Data-Persistence, MediaWiki-Logevents
nshahquinn-wmf created T371157: Some new users do not have account creation log events.
Sat, Jul 27, 1:52 AM · MediaWiki-Engineering, Data-Persistence, MediaWiki-Logevents

Fri, Jul 26

nshahquinn-wmf added a comment to T369325: [MI 3] Investigate trends in movement metrics.

Weekly updates:

  • Had exploratory conversations with Irene Florez, Leila Zia, Kate Zimmerman, Maryana Pinchuk, Sam Patton, Jaime Anstee, and Zack McCune
Fri, Jul 26, 5:55 PM · Epic, Movement-Insights

Thu, Jul 25

nshahquinn-wmf moved T314541: Update the active_editors table with an Airflow job from Planned Next 2 weeks to Doing on the Movement-Insights board.
Thu, Jul 25, 11:44 PM · Patch-For-Review, Movement-Metrics, Movement-Insights
nshahquinn-wmf moved T369327: Choose topics for first two trend investigations from Planned Next 2 weeks to Doing on the Movement-Insights board.
Thu, Jul 25, 11:44 PM · Movement-Insights
nshahquinn-wmf updated the task description for T362595: Update the new_editors table with an Airflow job.
Thu, Jul 25, 11:29 PM · Movement-Metrics, Movement-Insights
nshahquinn-wmf closed T221828: Mediawiki-history release - Backlog as Declined.

I suspect that this tracking task is no longer useful.

Thu, Jul 25, 11:27 PM · Data-Engineering-Icebox, Analytics

Wed, Jul 24

nshahquinn-wmf updated the task description for T365387: Issues in the dumps → mediawiki wikitext history → content gap metrics pipeline can significantly delay the movement metrics report.
Wed, Jul 24, 7:11 PM · Epic, Movement-Metrics, Movement-Insights

Tue, Jul 23

nshahquinn-wmf updated the task description for T362595: Update the new_editors table with an Airflow job.
Tue, Jul 23, 6:57 PM · Movement-Metrics, Movement-Insights
nshahquinn-wmf updated the task description for T362595: Update the new_editors table with an Airflow job.
Tue, Jul 23, 5:48 PM · Movement-Metrics, Movement-Insights

Mon, Jul 22

nshahquinn-wmf moved T356230: Conda-Analytics packages incompatible with latest versions of Pandas and Numpy from Done to Needs Review on the Data-Platform-SRE (2024.07.08 - 2024.07.28) board.
Mon, Jul 22, 8:55 PM · Data-Platform-SRE (2024.07.08 - 2024.07.28), Movement-Insights
nshahquinn-wmf triaged T370718: Specify Conda-Pack as a dependency as Low priority.
Mon, Jul 22, 8:19 PM · Wmfdata-Python, Data-Engineering
nshahquinn-wmf added a comment to T356230: Conda-Analytics packages incompatible with latest versions of Pandas and Numpy.

@BTullis that makes sense!

Mon, Jul 22, 7:58 PM · Data-Platform-SRE (2024.07.08 - 2024.07.28), Movement-Insights
nshahquinn-wmf added a parent task for T370713: Upgrade to Pyspark ≥ 3.5: T370705: Upgrade to Pandas ≥ 2 in Conda-Analytics.
Mon, Jul 22, 7:56 PM · Data-Platform-SRE
nshahquinn-wmf added a subtask for T370705: Upgrade to Pandas ≥ 2 in Conda-Analytics: T370713: Upgrade to Pyspark ≥ 3.5.
Mon, Jul 22, 7:56 PM · Data-Platform-SRE
nshahquinn-wmf created T370713: Upgrade to Pyspark ≥ 3.5.
Mon, Jul 22, 7:56 PM · Data-Platform-SRE
nshahquinn-wmf added a subtask for T370710: Upgrade to Numpy ≥ 1.24 in Conda-Analytics: T370712: Upgrade to Pyspark ≥ 3.4.
Mon, Jul 22, 7:55 PM · Data-Platform-SRE
nshahquinn-wmf added a parent task for T370712: Upgrade to Pyspark ≥ 3.4: T370710: Upgrade to Numpy ≥ 1.24 in Conda-Analytics.
Mon, Jul 22, 7:55 PM · Data-Platform-SRE
nshahquinn-wmf created T370712: Upgrade to Pyspark ≥ 3.4.
Mon, Jul 22, 7:54 PM · Data-Platform-SRE
nshahquinn-wmf added a parent task for T370711: Upgrade to Pyarrow ≥ 10.0.1 in Conda-Analytics: T370707: Upgrade to Pandas ≥ 2.2 in Conda-Analytics.
Mon, Jul 22, 7:52 PM · Data-Platform-SRE
nshahquinn-wmf added a subtask for T370707: Upgrade to Pandas ≥ 2.2 in Conda-Analytics: T370711: Upgrade to Pyarrow ≥ 10.0.1 in Conda-Analytics.
Mon, Jul 22, 7:52 PM · Data-Platform-SRE
nshahquinn-wmf created T370711: Upgrade to Pyarrow ≥ 10.0.1 in Conda-Analytics.
Mon, Jul 22, 7:52 PM · Data-Platform-SRE
nshahquinn-wmf created T370710: Upgrade to Numpy ≥ 1.24 in Conda-Analytics.
Mon, Jul 22, 7:48 PM · Data-Platform-SRE
nshahquinn-wmf created T370707: Upgrade to Pandas ≥ 2.2 in Conda-Analytics.
Mon, Jul 22, 7:43 PM · Data-Platform-SRE
nshahquinn-wmf created T370705: Upgrade to Pandas ≥ 2 in Conda-Analytics.
Mon, Jul 22, 7:41 PM · Data-Platform-SRE
nshahquinn-wmf moved T362595: Update the new_editors table with an Airflow job from Doing to Waiting on others on the Movement-Insights board.
Mon, Jul 22, 7:22 PM · Movement-Metrics, Movement-Insights

Sat, Jul 20

nshahquinn-wmf updated the task description for T362595: Update the new_editors table with an Airflow job.
Sat, Jul 20, 11:24 PM · Movement-Metrics, Movement-Insights

Fri, Jul 19

nshahquinn-wmf added a comment to T369325: [MI 3] Investigate trends in movement metrics.

I've been doing some initial exploratory work:

  • conversations with members of Movement Insights and Morten Warncke-Wang
  • looking through recent movement metrics output and brainstorming ideas
  • scheduling stakeholder conversations (I have six scheduled for the coming week)
Fri, Jul 19, 2:55 AM · Epic, Movement-Insights
nshahquinn-wmf updated the task description for T362595: Update the new_editors table with an Airflow job.
Fri, Jul 19, 2:51 AM · Movement-Metrics, Movement-Insights

Thu, Jul 18

nshahquinn-wmf renamed T365211: Standardize time range of Wikicharts charts from Wikicharts charts have a fixed start date to Standardize time range of Wikicharts charts.
Thu, Jul 18, 2:06 AM · Movement-Metrics, Movement-Insights
nshahquinn-wmf moved T359688: Ensure that all the data dependencies of movement metrics have documented timelines and owners from Backlog to Upstream on the Movement-Metrics board.
Thu, Jul 18, 12:21 AM · Movement-Metrics, Movement-Insights
nshahquinn-wmf moved T370196: Retire the old movement metrics intermediate tables from Backlog to Code improvements on the Movement-Metrics board.
Thu, Jul 18, 12:21 AM · Movement-Metrics, Movement-Insights

Wed, Jul 17

nshahquinn-wmf triaged T370225: Provide recommendations to Data Engineering on Airflow documentation improvements as Medium priority.
Wed, Jul 17, 6:46 PM · Movement-Insights
nshahquinn-wmf moved T370225: Provide recommendations to Data Engineering on Airflow documentation improvements from Incoming to Planned Next 2 weeks on the Movement-Insights board.
Wed, Jul 17, 6:46 PM · Movement-Insights
nshahquinn-wmf updated the task description for T365387: Issues in the dumps → mediawiki wikitext history → content gap metrics pipeline can significantly delay the movement metrics report.
Wed, Jul 17, 5:51 PM · Epic, Movement-Metrics, Movement-Insights
nshahquinn-wmf updated the task description for T362595: Update the new_editors table with an Airflow job.
Wed, Jul 17, 1:41 AM · Movement-Metrics, Movement-Insights

Tue, Jul 16

nshahquinn-wmf created T370225: Provide recommendations to Data Engineering on Airflow documentation improvements.
Tue, Jul 16, 10:07 PM · Movement-Insights
nshahquinn-wmf triaged T370196: Retire the old movement metrics intermediate tables as Medium priority.
Tue, Jul 16, 6:57 PM · Movement-Metrics, Movement-Insights
nshahquinn-wmf created T370196: Retire the old movement metrics intermediate tables.
Tue, Jul 16, 6:57 PM · Movement-Metrics, Movement-Insights
nshahquinn-wmf lowered the priority of T362595: Update the new_editors table with an Airflow job from High to Medium.
Tue, Jul 16, 6:45 PM · Movement-Metrics, Movement-Insights
nshahquinn-wmf lowered the priority of T362594: Update the editor_month table with an Airflow job from High to Medium.
Tue, Jul 16, 6:45 PM · Movement-Insights
nshahquinn-wmf lowered the priority of T362593: Update the content_interactions table with an Airflow job from High to Medium.
Tue, Jul 16, 6:45 PM · Movement-Metrics, Movement-Insights
nshahquinn-wmf lowered the priority of T333225: Migrate the movement_metrics ETL jobs to Airflow from High to Medium.
Tue, Jul 16, 6:44 PM · Epic, Movement-Insights
nshahquinn-wmf lowered the priority of T314541: Update the active_editors table with an Airflow job from High to Medium.
Tue, Jul 16, 6:44 PM · Patch-For-Review, Movement-Metrics, Movement-Insights
nshahquinn-wmf updated the task description for T362593: Update the content_interactions table with an Airflow job.
Tue, Jul 16, 6:42 PM · Movement-Metrics, Movement-Insights
nshahquinn-wmf updated the task description for T362595: Update the new_editors table with an Airflow job.
Tue, Jul 16, 6:38 PM · Movement-Metrics, Movement-Insights
nshahquinn-wmf placed T361143: Update the main unique devices documentation up for grabs.
Tue, Jul 16, 6:33 PM · Movement-Insights
nshahquinn-wmf moved T333225: Migrate the movement_metrics ETL jobs to Airflow from Backlog to [FY24/25 Q1-Q2] - Movement Insights on the Movement-Insights board.
Tue, Jul 16, 6:32 PM · Epic, Movement-Insights
nshahquinn-wmf added a project to T333225: Migrate the movement_metrics ETL jobs to Airflow: Epic.
Tue, Jul 16, 6:32 PM · Epic, Movement-Insights
nshahquinn-wmf removed a project from T362594: Update the editor_month table with an Airflow job: Patch-For-Review.
Tue, Jul 16, 6:31 PM · Movement-Insights
nshahquinn-wmf updated the task description for T362594: Update the editor_month table with an Airflow job.
Tue, Jul 16, 6:31 PM · Movement-Insights

Mon, Jul 15

nshahquinn-wmf updated the task description for T362594: Update the editor_month table with an Airflow job.
Mon, Jul 15, 11:12 PM · Movement-Insights
nshahquinn-wmf moved T362594: Update the editor_month table with an Airflow job from Doing to Waiting on others on the Movement-Insights board.

I figured out the anomaly, and I'm waiting until it's fixed to do another backfill of the table: T369851.

Mon, Jul 15, 10:06 PM · Movement-Insights
nshahquinn-wmf moved T370090: Make the movement-metrics repo a Python package from Backlog to Code improvements on the Movement-Metrics board.
Mon, Jul 15, 9:59 PM · Movement-Metrics, Movement-Insights
nshahquinn-wmf triaged T370090: Make the movement-metrics repo a Python package as Low priority.
Mon, Jul 15, 9:59 PM · Movement-Metrics, Movement-Insights
nshahquinn-wmf created T370090: Make the movement-metrics repo a Python package.
Mon, Jul 15, 6:05 PM · Movement-Metrics, Movement-Insights

Fri, Jul 12

nshahquinn-wmf closed T362130: Clean up the movement-metrics codebase as Resolved.

Cleanup work is never done, but we've finished this batch!

Fri, Jul 12, 5:58 PM · Epic, Movement-Metrics, Movement-Insights
nshahquinn-wmf closed T362130: Clean up the movement-metrics codebase, a subtask of T359207: Improve the delivery of the movement movements (SDS 2.6.2), as Resolved.
Fri, Jul 12, 5:58 PM · Epic, Movement-Insights
nshahquinn-wmf closed T361329: Convert Wikicharts to use regularly-calculated metrics and canonical data instead of static files, a subtask of T359695: Convert Wikicharts code to modules and clean it up, as Resolved.
Fri, Jul 12, 5:57 PM · Movement-Metrics, Movement-Insights
nshahquinn-wmf closed T361329: Convert Wikicharts to use regularly-calculated metrics and canonical data instead of static files as Resolved.
Fri, Jul 12, 5:57 PM · Movement-Metrics, Movement-Insights
nshahquinn-wmf added a comment to T369325: [MI 3] Investigate trends in movement metrics.

No update.

Fri, Jul 12, 12:31 AM · Epic, Movement-Insights

Thu, Jul 11

nshahquinn-wmf added a comment to T369851: NEW BUG REPORT Mediawiki_history contains duplicate rows for some revisions.

This may help in diagnosing the problem: looking at the snapshot, the number of duplicates is not uniform across event_timestamp. There are almost none until 2014, and then the number generally increases until the most recent month.
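
The kind of check described in the comment above can be sketched roughly as follows. This is only an illustration, not the query that was actually run: it assumes each row can be reduced to a (revision_id, event_timestamp) pair with the timestamp as a 'YYYY-MM-DD HH:MM:SS' string, the function name is hypothetical, and the sample rows are invented purely to exercise the code.

```python
from collections import Counter


def duplicate_counts_by_month(rows):
    """Count surplus copies of revisions, grouped by month of event_timestamp.

    rows: iterable of (revision_id, event_timestamp) pairs, where the
    timestamp is a 'YYYY-MM-DD HH:MM:SS' string (an assumed format).
    """
    # How many times each exact (revision_id, event_timestamp) pair appears.
    copies = Counter(rows)
    by_month = Counter()
    for (_revision_id, event_timestamp), n in copies.items():
        if n > 1:
            # Every copy beyond the first counts as one duplicate row.
            by_month[event_timestamp[:7]] += n - 1
    return dict(sorted(by_month.items()))


# Invented sample rows, for illustration only.
sample = [
    (1, "2013-05-01 00:00:00"),
    (2, "2014-02-03 10:00:00"),
    (2, "2014-02-03 10:00:00"),
    (3, "2024-06-10 08:30:00"),
    (3, "2024-06-10 08:30:00"),
    (3, "2024-06-10 08:30:00"),
]
print(duplicate_counts_by_month(sample))
```

Grouping by month makes it easy to see whether duplicates cluster in particular periods rather than being spread evenly over the table's history.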

Thu, Jul 11, 9:50 PM · Data-Engineering (Q1 2024 July 1st - September 30th), Movement-Insights, Analytics-Data-Problem, Data-Platform
nshahquinn-wmf renamed T369851: NEW BUG REPORT Mediawiki_history contains duplicate rows for some revisions from NEW BUG REPORT Mediawiki_history contains duplicate rows for many revisions to NEW BUG REPORT Mediawiki_history contains duplicate rows for some revisions.
Thu, Jul 11, 5:18 PM · Data-Engineering (Q1 2024 July 1st - September 30th), Movement-Insights, Analytics-Data-Problem, Data-Platform
nshahquinn-wmf triaged T369851: NEW BUG REPORT Mediawiki_history contains duplicate rows for some revisions as High priority.
Thu, Jul 11, 5:17 PM · Data-Engineering (Q1 2024 July 1st - September 30th), Movement-Insights, Analytics-Data-Problem, Data-Platform

Wed, Jul 10

nshahquinn-wmf added a comment to T361329: Convert Wikicharts to use regularly-calculated metrics and canonical data instead of static files.

MR 21, which is in review, removes the unused files. After it's merged, we can resolve this.

Wed, Jul 10, 10:35 PM · Movement-Metrics, Movement-Insights
nshahquinn-wmf closed T368218: Move the code from the calculate notebook into a Python module, a subtask of T362130: Clean up the movement-metrics codebase, as Resolved.
Wed, Jul 10, 10:28 PM · Epic, Movement-Metrics, Movement-Insights
nshahquinn-wmf closed T368218: Move the code from the calculate notebook into a Python module as Resolved.

@Hghani's MR has been merged!

Wed, Jul 10, 10:28 PM · Movement-Metrics, Movement-Insights
nshahquinn-wmf moved T365211: Standardize time range of Wikicharts charts from Backlog to Visualization on the Movement-Metrics board.
Wed, Jul 10, 10:27 PM · Movement-Metrics, Movement-Insights
nshahquinn-wmf moved T365086: Fetch content counts from AQS using internal API endpoint from Backlog to Code improvements on the Movement-Metrics board.
Wed, Jul 10, 10:26 PM · Movement-Metrics, Movement-Insights
nshahquinn-wmf moved T365511: Wikicharts content gap charts should use same style as other charts from Backlog to Visualization on the Movement-Metrics board.
Wed, Jul 10, 10:26 PM · Movement-Metrics, Movement-Insights
nshahquinn-wmf moved T367228: Update Content gap metrics data table and charts in movement-metrics repo from Backlog to Metrics and definitions on the Movement-Metrics board.
Wed, Jul 10, 10:26 PM · Movement-Metrics, Movement-Insights
nshahquinn-wmf moved T368217: Ensure consistent formatting of the movement-metrics codebase from Backlog to Code improvements on the Movement-Metrics board.
Wed, Jul 10, 10:26 PM · Movement-Metrics, Movement-Insights
nshahquinn-wmf moved T368218: Move the code from the calculate notebook into a Python module from Backlog to Code improvements on the Movement-Metrics board.
Wed, Jul 10, 10:26 PM · Movement-Metrics, Movement-Insights
nshahquinn-wmf moved T368995: Move metrics in wikicharts/resources/data to metrics folder from Backlog to Code improvements on the Movement-Metrics board.
Wed, Jul 10, 10:26 PM · Movement-Metrics, Movement-Insights
nshahquinn-wmf moved T369396: Remove chart data from monthly report notebook from Backlog to Code improvements on the Movement-Metrics board.
Wed, Jul 10, 10:26 PM · Movement-Metrics, Movement-Insights

Tue, Jul 9

nshahquinn-wmf added a comment to T362594: Update the editor_month table with an Airflow job.

I ran the second backfill, but ran into another weird anomaly 😩

Tue, Jul 9, 11:00 PM · Movement-Insights

Sat, Jul 6

nshahquinn-wmf updated the task description for T369396: Remove chart data from monthly report notebook.
Sat, Jul 6, 12:37 AM · Movement-Metrics, Movement-Insights
nshahquinn-wmf updated the task description for T369396: Remove chart data from monthly report notebook.
Sat, Jul 6, 12:36 AM · Movement-Metrics, Movement-Insights
nshahquinn-wmf updated the task description for T369396: Remove chart data from monthly report notebook.
Sat, Jul 6, 12:33 AM · Movement-Metrics, Movement-Insights
nshahquinn-wmf created T369396: Remove chart data from monthly report notebook.
Sat, Jul 6, 12:24 AM · Movement-Metrics, Movement-Insights

Jul 5 2024

nshahquinn-wmf updated the task description for T362594: Update the editor_month table with an Airflow job.
Jul 5 2024, 2:09 AM · Movement-Insights
nshahquinn-wmf updated subscribers of T315024: Creating a Spark session causes a torrent of log spam.

By chance, I discovered that there is a quieter_spark_log4j.properties file in analytics-refinery. It's not quiet enough for our purposes, but we could use it as the base for a configuration we like. Refinery is also deployed automatically to HDFS, so that's a good way to make it widely available.

Jul 5 2024, 1:46 AM · Data-Engineering, Product-Analytics
nshahquinn-wmf renamed T369327: Choose topics for first two trend investigations from Choose topic for first trend investigation to Choose topics for first two trend investigations.
Jul 5 2024, 12:44 AM · Movement-Insights
nshahquinn-wmf moved T369325: [MI 3] Investigate trends in movement metrics from Incoming to [FY24/25 Q1-Q2] - Movement Insights on the Movement-Insights board.
Jul 5 2024, 12:08 AM · Epic, Movement-Insights
nshahquinn-wmf triaged T369327: Choose topics for first two trend investigations as High priority.
Jul 5 2024, 12:05 AM · Movement-Insights