Talk:Wikimedia Foundation Annual Plan/2023-2024/Product & Technology/OKRs
WE1: Contributor experience
Comments relating to Bucket 1: Wiki Experiences, Objective 1
Result 1 Unreverted contributions
This is definitely the most off-putting aspect of the projects, particularly for young contributors excited about current events, sports, music, &c. We should consider making spaces that are designed for and friendly to unreverted contributions: draft spaces or interfaces with fewer of the constraints that make contributors feel that feedback from the editing community is reverting or undoing their work. For all contributions this could include ways to leave non-revdeleted contributions visible to the original contributor/uploader, with better custom explanations (better than unfriendly warnings, disappearances, and the language of deletion and reversion) of what is going on when these are removed from their original namespace or made less visible. That contributes to this result as drafted, but would also yield long-term improvements in accessibility and welcome, especially for newcomers with limited context or on mobile with limited (visual and network) bandwidth. –SJ talk 23:43, 12 April 2023 (UTC)
- Hi @Sj!
- Nice to E-Meet you, my name is Jaz and I am the Lead Product Manager for the Wikipedia Apps.
- Thanks for taking the time to share this reflection. I agree with you on the importance of newcomers feeling welcomed. The past few annual planning cycles had an explicit focus on newcomers (young contributors), which was very energizing. This year is intended to focus on ensuring that, as we grow young contributors, moderators have the capacity to keep up with those contributions. Supporting our more tenured volunteers with their moderation load is also really exciting and important to balance. When crafting this key result we aimed to represent impact (growing content that is unreverted and consumed by others) rather than output (the number of contributions overall). The thinking here is that if young contributors leverage things like Growth Experiments, Edit Check, or their sandbox/draft space as a safe or guided place to learn how to edit, they will ultimately get to a place of contributing to the growth of Wikipedia content that is not reverted.
- I hope this additional context about our thinking when crafting this KR is helpful. JTanner (WMF) (talk) 22:28, 14 April 2023 (UTC)
I assume this section intentionally doesn't include contributions to Commons, i.e. photo uploads from mobile devices. From all I know this is probably a good idea. Just wanted to clarify. --Thiemo Kreuz (WMDE) (talk) 13:37, 18 April 2023 (UTC)
- Hi @Thiemo Kreuz (WMDE)! Hope all is well.
- Your assumption is a correct one. We aren't including Commons contributions for this specific KR at this time. There is a Commons specific KR in the Future Audiences bucket. JTanner (WMF) (talk) 00:54, 19 April 2023 (UTC)
Result 2 Workflow improvements
One workflow to look at would be abuse filters (and, somewhat relatedly, the spam blacklist). Basic functionality needed to test abuse filters, so that they can be edited without breaking things, currently exists only as user scripts (and a lot more could be done). I think a lot more anti-abuse work and improvements to anti-vandalism filters could be done if abuse filters were improved. This would improve things across all wikis, and would also improve the new user experience, since new users would be less likely to be accidentally stopped from making edits by false positives. Galobtter (talk) 04:09, 13 April 2023 (UTC)
- @Galobtter Thanks for sharing your thoughts! I'm the Product Manager for the Moderator Tools team so I'm particularly interested in requests for improvements to tools like AbuseFilter and SpamBlacklist. Could you elaborate a little on the testing improvements you'd like to see for AbuseFilter? Special:AbuseFilter/test exists already so I'm curious where you see that functionality not being sufficient and/or what other testing mechanisms would be helpful.
- I also wanted to highlight that the AbuseFilter has directly inspired the Editing team to start work on Edit check - a tool that aims to support new editors and reduce moderator burdens by guiding newer editors towards positive contributions before they submit their edit. This could reduce the need for edit filters aimed at good faith contributions - they'd love to hear what you think if you haven't shared your thoughts over on the project page already. Samwalton9 (WMF) (talk) 09:41, 13 April 2023 (UTC)
- My number one request would be to implement something like en:User:Suffusion of Yellow/batchtest-plus (see also phab:T36180). The current test implementation is extremely barebones - before the script existed I would manually have to examine a bunch of past hits to make sure I didn't accidentally break the filter. I think a proper implementation could also include some automated checks on save to prevent people from accidentally blocking a lot of edits or making the filter not work. Something like phab:T208842 would also be nice but less essential.
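To make the batch-testing idea concrete, here is a minimal sketch (not Suffusion of Yellow's implementation) of re-running a draft filter against a list of recent changes via the AbuseFilter extension's `abusefiltercheckmatch` API module. The endpoint URL, the simplified parameter handling, and the stubbed matcher in the demo are illustrative assumptions; the real module also requires appropriate rights on most wikis.

```python
import json
import urllib.parse
import urllib.request

API = "https://en.wikipedia.org/w/api.php"  # assumed target wiki for illustration

def api_check_match(filter_text, rcid):
    """Ask the AbuseFilter API whether a draft filter matches one recent change.
    Sketch only: a real client would authenticate and handle errors."""
    params = urllib.parse.urlencode({
        "action": "abusefiltercheckmatch",
        "filter": filter_text,
        "rcid": rcid,
        "format": "json",
    })
    with urllib.request.urlopen(f"{API}?{params}") as resp:
        data = json.load(resp)
    return bool(data.get("abusefiltercheckmatch", {}).get("result", False))

def batch_test(filter_text, rcids, match_fn=api_check_match):
    """Return the recent-change IDs the draft filter would have matched, so a
    filter editor can eyeball false positives/negatives before saving."""
    return [rcid for rcid in rcids if match_fn(filter_text, rcid)]

# Offline demo with a stubbed matcher (no network or rights needed):
print(batch_test("dummy", [1, 2, 3, 4], match_fn=lambda f, r: r % 2 == 0))  # [2, 4]
```

A save-time safety check could then be as simple as refusing to save when `batch_test` matches more than some share of a recent sample.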
- Showing changes between abuse filter versions is also table-stakes functionality that doesn't exist (also see en:User:Suffusion_of_Yellow/filterDiff). The current filter notes are also pretty bad; en:User:Suffusion_of_Yellow/filterNotes helps. I think these two and lots of other improvements could be solved with phab:T227595, but that's a pretty hard task.
- @Suffusion of Yellow: probably has more thoughts also lol
- Edit check looks useful and I'll leave some thoughts there. Galobtter (talk) 21:31, 13 April 2023 (UTC)
- To add to that, en:User:Suffusion of Yellow/batchtest-plus is also useful because it allows for easy checking of false negatives. Galobtter (talk) 21:47, 13 April 2023 (UTC)
- Also in terms of functionality improvements, I'd loooove en:WP:Deferred changes, there are so many edits where it 100% should be reviewed in a centralized fashion but can't be disallowed. en:Special:AbuseFilter/39 is one important one - half the edits that go through are literally libel and should absolutely be reviewed but there's no way to do that. Most of the tagging vandalism filters could use that too. Galobtter (talk) 23:02, 13 April 2023 (UTC)
- @Galobtter: Thanks for linking to the user script. Perhaps phab:T329359 would be the next step towards implementing the user script? It would make tools way more efficient if someone can just supply a list of diffs to be evaluated. 0xDeadbeef (talk) 03:01, 14 April 2023 (UTC)
- I agree; if there's to be a focus on one improvement, it should be DC. A simpler possibility might be a "reviewed" button at Special:AbuseLog, available to users with some minimal right. That could reduce duplication of effort, at least, but unlike DC it wouldn't prevent bad edits from going live to every reader. Suffusion of Yellow (talk) 20:53, 17 April 2023 (UTC)
- "A simpler possibility might be a 'reviewed' button at Special:AbuseLog, available to users with some minimal right." Yes, this would be great and relatively easy to implement. And doing only this would allow the change not to be tied to the functionally dead mw:Extension:FlaggedRevs. Galobtter (talk) 22:55, 17 April 2023 (UTC)
It might be tough to find a metric here; I think users who are dealing with a lot more moderation should answer that one rather than me. I know English Wikipedia does have things like en:Template:Vandalism information, but I doubt the WMF will use that. CAPTCHA is an outstanding issue here. Looking at past Phabricator discussions (mainly phab:T241921), it seems that if we want a better CAPTCHA, it is a matter of improving open-source proof-of-concepts to a usable state. It will also be interesting to see how effective StopForumSpam (phab:T273220) will be against spam. IP Editing: Privacy Enhancement and Abuse Mitigation/Improving tools#3. A database for documenting long-term abusers is also going to be really useful. --Snævar (talk) 19:09, 13 April 2023 (UTC)
- All the stuff at IP_Editing:_Privacy_Enhancement_and_Abuse_Mitigation/Improving_tools#Existing_tools_used_by_editors looks pretty useful. Galobtter (talk) 21:48, 13 April 2023 (UTC)
I don't believe much thought has gone into the recent changes patrol workflow, which is a significant time drain for our most experienced editors (think of 1000-2000 patrolled changes a month). On the RC page, unpatrolled changes are marked with a red exclamation mark, but not on the user contribution and page history pages. Each change (diff) must be opened separately and marked as patrolled, which requires a lot of switching between browser tabs: RC page ! → page history to check for recent edits by the same or other users (which ones are unpatrolled?) → individual diffs to check their patrol status and to patrol → back to RC page where we left off. External tools are not always available, and some do not work on non-English wikis at all; most patrollers are even wary of installing user scripts! Upon patrolling thousands of changes, I came to realize that a single page to patrol recent changes in the style of inline diffs with a patrol button is the most efficient workflow: one can open and check several diffs at the same time, mark them as patrolled with a single click, and do it from any of the lists. I proposed the tool in our 2023 Community Wishlist Survey (here) and received some support; I suspect, though, that users with no advanced rights will not feel the pain of having to patrol hundreds of changes and are unlikely to vote in favor of such ideas in sufficient numbers to make them happen. I'm not saying this idea should be the final outcome, but a thorough examination of the workflow may yield a solution that saves many (wo)man-hours. ponor (talk) 13:46, 14 April 2023 (UTC)
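To illustrate the single-page patrol idea, here is a small sketch of the grouping step: collect unpatrolled revisions per page so one cumulative diff can be reviewed and all of its revisions marked patrolled together. The data shapes loosely mimic `list=recentchanges` entries and are illustrative; the actual marking would go through MediaWiki's `action=patrol` API with a patrol token.

```python
from collections import defaultdict

def group_unpatrolled(changes):
    """Group unpatrolled revisions by page title, oldest first, so a patroller
    can review one cumulative diff per page and patrol it in one click.
    `changes` mimics list=recentchanges entries (shape is illustrative)."""
    groups = defaultdict(list)
    for ch in sorted(changes, key=lambda c: c["revid"]):
        if not ch.get("patrolled", False):
            groups[ch["title"]].append(ch["revid"])
    return dict(groups)

sample = [
    {"title": "Foo", "revid": 102, "patrolled": False},
    {"title": "Foo", "revid": 101, "patrolled": False},
    {"title": "Bar", "revid": 103, "patrolled": True},
]
print(group_unpatrolled(sample))  # {'Foo': [101, 102]}
```

Marking each grouped revision patrolled would still be one API call per revision, but that round trip can be hidden behind a single button, which is the workflow saving described above.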
- Hello @Ponor -- I'm Marshall Miller, one of the directors of product at WMF. Thank you for doing so much important patrolling work! I know that it can be a grind, and helping patrollers spend their time more sustainably would be a great outcome for this moderator-focused key result. @Samwalton9 (WMF) and I are curious to learn more from you. Here are some questions, if you have time to answer:
- On which wiki do you do patrolling work?
- Do you prefer to use tools like Huggle to do patrolling? If not, why not? I know those tools are not available on all wikis.
- Are you talking about patrolling in the context of FlaggedRevisions? Or also on wikis without FlaggedRevisions?
- Would you want to patrol from a mobile device, if there were an easy way to do so?
- Do you prefer to patrol changes on all sorts of topics, or would it be helpful to be able to filter to topic areas (e.g. Biographies, Music, Geography, etc.)?
- Thank you! MMiller (WMF) (talk) 00:56, 15 April 2023 (UTC)
- Thank you for your response, MMiller. I'd love to help with this as much as I can.
- (which wiki) Patrolling and fighting vandalism through AbuseFilter has been my main (wiki)preoccupation for over a year, in addition to extensive bot editing. On hrwiki we strive to have 100% of changes patrolled. Last year we had some 6000 patroller actions a month (3300 IP edits, 1900 reverted), which fell to 3500 this year (1300 IP, 500 rev'd). That's a major relief for our patrollers; we are no longer chasing vandals all day every day, and we actually get to do some of our own work.
- (external tools) I installed Huggle on my Linux HiDPI laptop once or twice; it never worked. I don't think any other patroller on hrwiki uses Huggle either. We patrol whenever we can from wherever we can, which is why I'm thinking this should be core functionality. We have the RTRC gadget installed, which comes close to what I want when it comes to patrolling, except that it shows the diff on top of the page, and I like picking changes from the list of changes instead of going from one to another in sequence; as I check the diff on top, I lose track of where I left off in the list. I also like to check page history and user contribution pages for unpatrolled edits, so I made this little script that adds a patrol button to the three lists, and I paired it with inlineDiff (which I like better, but it works only on English-speaking wikis) or ExpandDiffs (for other wikis), and I'm very happy with the result. Feel free to try the combination yourself; I myself am running it on test.wiki.
- (flagged revisions) I have no experience with Flagged Revisions.
- (mobile patrol) I already patrol from my phone when needed; the two-script combination works on m.wiki (on all pages when Advanced mode is on, because the scripts use some specific html hooks). Quite happy with that!
- (topics filter) No strong preference. I patrol vandalisms on all pages. Same with fixing references, formatting, templates. When it comes to fact checking, I prefer STEM fields, "common knowledge" pages, tend to avoid history and pop culture. The filter would probably be more helpful on a few wikis with extremely high edit rates, and I can't speak of their workflows.
- Let me know if I can be of any further help. We can continue the discussion elsewhere, even by email or your internal lists. ponor (talk) 14:23, 15 April 2023 (UTC)
- @Ponor Thanks for elaborating, this is really helpful. I have some further questions if you don't mind! You noted that patroller actions decreased from 6000 to 3500 - is that specifically because you introduced abuse filters and bots? It sounds like a real win to be saving that much patroller time and effort. Did you have any filters before you started working on them, or is this the first time for hr.wiki? Thanks for noting the difficulties of recent changes patrol - I agree this is something we haven't thought about as much as we should have. Samwalton9 (WMF) (talk) 13:12, 17 April 2023 (UTC)
- @Samwalton9 (WMF) Happy to help! I can't say whether the decrease is entirely due to the filters, or whether there's a slight global decrease in the number of edits as well. Some filters existed on hr.wiki before, but were more permissive (set to warn or tag only). The situation was unsustainable. After becoming a sysop I spent a good two months studying our IP (!) vandals, and started filtering their edits more aggressively sometime in May 2022. This includes blocking on certain actions to prevent them from having any opportunity to circumvent the filters (easily). What's not included in the patrol numbers that you mentioned is me watching (patrolling) the abuse filter logs, closely in the beginning, but even now almost daily (still much easier than having to clean up the mess in the articles). If there are any significant false positives, we help with those edits. Things are calmer now in the filters as well: it seemed to me that we had kids bored at school vandalizing in packs, and I'm guessing they've found other venues to amuse themselves. The filters definitely helped with the number of manual blocks, which are also labor-intensive actions: 30 in March 2023 vs. 140 in 2022 and 200 in 2021 (total blocks are around 280 for all three years).
- Regarding the RC patrol: I've checked some random pages and screenshots of what your teams have been working on. I feel there's a tendency to treat changes as self-contained, pointlike edits that can be accepted or reverted, while in reality inexperienced editors make many, usually overlapping edits that need to be patrolled as one. The scripts I'm using help me with this: I check the cumulative diff (and use DiffEdit if something needs to be fixed), and use the [patrol] buttons to quickly mark all edits as patrolled (on any of the three history pages), instead of opening and patrolling every diff separately. Speaking of patroller workflows...
- LMK if I can be of any further help! ponor (talk) 15:00, 17 April 2023 (UTC)
- I want to note that very few projects without FlaggedRevs actually seek to manually mark every edit as patrolled. If a project really wants that... it's a bit late to get FlaggedRevs. The WMF spending time on what's essentially remaking FlaggedRevs (Ponor's idea) would not be useful, imo, as this is not a common problem, and it is already addressed by existing, community-made tools. In terms of real-time recent changes, there are tools made for non-English projects that are heavily modifiable, such as SWViewer.
- What's much more common is new page patrolling, especially on projects (such as my home wiki) which allow IP page creations. The English Wikipedia has a shiny interface for new page patrolling, and no one else does. It has long been something other projects have wanted... and hopefully something that Product can put time into setting up for other wikis as well, or making more easily installable/modifiable. Best, Vermont (🐿️—🏳️🌈) 00:40, 18 April 2023 (UTC)
- Hm... this only means that we have another set of problems, globally, because tools that are not part of MediaWiki (and I can't stress this enough) are not easily discoverable and transferable. I know nothing about FlaggedRevs and how different it is from *whatever* we have for patrollers on hrwiki. I do know of a few projects that are decently staffed and do try to have all changes marked as patrolled (communicated, fixed, reverted, etc.); I've also seen wikis that look like vandals' havens, where I'm guessing everyone has already given up. As for other wikis' community-made tools, they're... well, other wikis' community-made tools (happy for them!), but we should be looking for a unified user/patroller experience. SWViewer? I just gave it another try; I'm not the stupidest person and I clicked on various options, but during this 1 hour it showed me 5 IP edits, no RC context (already reverted? fixed? patrolled?), some oversized diffs, and no option to mark as patrolled; once I saw the diff, the change disappeared. Of 30 other patrollers, I know of no one who uses the tool. I suppose it's good for reverting vandalism, but patrolling (for us) means a lot more. ponor (talk) 14:58, 1 June 2023 (UTC)
I'm adding here some thoughts/open research questions from a short brainstorming discussion between Product Analytics/GDI/Research team members on WE1.KR2, in hopes that they are helpful as you develop this goal. Please feel welcome to contact me (TAndic (WMF)) or any of the people below for any questions or feedback.
- We know that rejection of editors' work can be a strong demotivating factor. Will easing moderation worsen demotivation in editors? Are there ways to soften the detrimental affective impact delivered by the systems we develop? -- MWang_(WMF)
- Can the Community Insights questions for administrators help in measuring overall moderator satisfaction/community health as these workflows are implemented, or do we need a different approach? -- TAndic_(WMF)
- What global and local patterns of moderation can inform how to build workflows that work across wikis? What differences exist between wikis, and can we look to block logs to understand this better? Can community ambassadors help facilitate understanding of these similarities and differences? -- Neil_Shah-Quinn_(WMF)
- What ground-truth data can we use to create language-agnostic ML models to assist patrollers? -- Pablo_(WMF)
- What's the boundary of "moderation" or "editors with extended rights"? It's good to be precise on what actions and users you're working with. -- Any of us on this list
- TAndic (WMF) (talk) 11:06, 3 May 2023 (UTC)
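On the ground-truth question above: one commonly available, language-agnostic signal is the `mw-reverted` change tag that MediaWiki applies to edits that were later undone. Here is a minimal sketch (the sample data and function name are illustrative) of turning such tags into weak labels for a patroller-assist model:

```python
def label_edits(changes):
    """Produce weak training labels for a patroller-assist model: an edit
    carrying the 'mw-reverted' change tag is treated as a (noisy) positive.
    `changes` mimics list=recentchanges entries with rcprop=tags."""
    return [(ch["revid"], "mw-reverted" in ch.get("tags", [])) for ch in changes]

sample = [
    {"revid": 1, "tags": ["mobile edit", "mw-reverted"]},
    {"revid": 2, "tags": []},
]
print(label_edits(sample))  # [(1, True), (2, False)]
```

These labels are noisy (edits get reverted for many reasons, including edit wars and self-reverts), so they are a starting point for experimentation rather than ground truth in the strict sense.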
It would be nice to have an easy way to add topics on this page ;-)... But that's just an aside.
As for my experience, I have to say that reviewing changes, and more generally handling the watchlist, is something the UX team should look at. I recently tweaked the CSS for the watchlist on mobile a bit. Thanks to that I can more easily remove items from the watchlist with the X button (the buttons were too small, the spacing too tight, the layout not very well thought out). There is also an issue with reviewing changes that is specific to a limited number of wikis: when I open a diff view, there is no button at hand to accept or revert the change. Accepting and (less often) reverting is what I most often do after checking a diff in the mobile version. Nux (talk) 23:35, 16 May 2023 (UTC)
- Context: I sometimes happen to review changes in... a closed room, during breaks ;). Nux (talk) 23:42, 16 May 2023 (UTC)
- @Nux Thanks for your comment! Apologies for responding in English, and please let me know if I've misinterpreted anything. Our team has been doing some work on content moderation on mobile web, and one of the projects we've just done some design work on is improving the diff page - if you have thoughts on the designs we've posted there I'd love to hear them. We also noticed that the Watchlist could use improvement. I'm not sure if our team will work on that, our priority is going to be the diff page in the short term, but thank you for bringing it up. Samwalton9 (WMF) (talk) 11:58, 17 May 2023 (UTC)
- @Samwalton9 (WMF) that looks good :). Currently the bottom bar on the diff is too big, so it's good that this will change.
- What is missing now and also in this new design is PendingChanges support. Would be great if you could add a review action (as a 3rd button or at least in the bottom menu). Nux (talk) 00:12, 18 May 2023 (UTC)
- @Galobtter, Suffusion of Yellow, Snævar, Ponor, and Nux: Hi all - I just wanted to follow up in this thread to say that our team has published information and open questions about the project we're exploring for next year at Moderator Tools/Automoderator. Our hope is that we can reduce the overall burden of patrolling new edits, freeing up moderator time. I'd really appreciate your thoughts and input on that talk page. Samwalton9 (WMF) (talk) 12:04, 1 June 2023 (UTC)
Result 3 Core articles
We should be careful not to conflate the percentage of content with a real increase of content in a specific area. If the number of geography articles doubled, but random YouTube-celebrity articles increased 100x, it would mean a decrease in geography articles percentage-wise, but still a net improvement in geography coverage. Given that Wikipedia has virtually unlimited space, we shouldn't care about "non-core" content as much as we should care about the findability/quality of existing core content. Maybe that's obvious, but let's avoid using percentages here, otherwise deleting non-relevant articles would also be another way to achieve the same goal ;p Shushugah (talk) 14:41, 12 April 2023 (UTC)
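The share-versus-absolute caveat is easy to see with toy numbers (all figures hypothetical):

```python
# Hypothetical counts: geography articles double, but other content grows faster.
geo_before, other_before = 1_000, 9_000
geo_after, other_after = 2_000, 98_000

share_before = geo_before / (geo_before + other_before)
share_after = geo_after / (geo_after + other_after)
print(share_before, share_after)  # 0.1 0.02 -- share fell even though coverage doubled
```

So a percentage-based KR would register this wiki as regressing on geography, despite real gains in coverage.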
- +1 - an increase in a (coverage + quality) score for those categories? In projects that don't have content evaluation (like WP 1.0!), setting up something like that, and writing bots and scripts that can make this easier for communities to manage, could be a useful intermediate step.
- Scaling issue of the metric past the top-20 wikis. TL;DR: there are typically three classes of articles in sub-top-20 wikis: stubs, good articles, and featured articles, and "stub" may have a different definition. Further explanation: Start-class is a top-20-Wikipedia metric; it rarely exists outside of those. Stub is common outside the top 20, but it is probably defined differently. On my home wiki, a mid-sized wiki, a stub is defined as an incomplete article in terms of the amount of content, not its presentation (e.g. styling). The size of an article in bytes is also not usable, because an article can be well below tens of kilobytes and still be complete.
- Implementation-wise, I think Edit check and a possible collaboration between the Language team and the Growth team on article translation would prove successful. I am aware that Growth has attempted to work with the Language team. --Snævar (talk) 19:09, 13 April 2023 (UTC)
- Hello @Shushugah, @Snævar. Thanks for sharing your comments. In case we have not met before - I am Runa, the Senior Director of Product for Language and Content Growth, and I oversee the operations of several teams that are engaged in this area of work. I hope my inputs below will be helpful for this discussion, and I am happy to follow up further.
- During our preparatory work we have indeed pondered the interpretation of the percentage value, and potential risks of the kind that you highlighted. Coming from a perspective of the knowledge gap as manifested through content gaps across languages, there will inevitably be some layering to what this will look like in real terms in the coming days.
- @Snævar's 2nd point about definition is spot on. We expect quality qualifiers to vary across mid-sized wikis and the solutions implemented under this key result to take those into account, while at the same time working with the communities around those wikis to find common ground.
- A narrow example: if mid-sized wiki A indicates that an article with 'x' key sections (of their determination) is not a stub, and mid-sized wiki B indicates that an article of 'y' lines (of their determination) is not a stub, then we would like to support their approach of growing content of that structure/size in a way that allows them to accelerate that process meaningfully, and also help provide visibility across wikis for the opportunity to learn from each other. We hope that the content contributors/reviewers and the Wikipedia admins will have key insights here as things start rolling.
- While I do hope deleting 'non-relevant' articles will not happen (unless needed), we recognise that it is a choice that the community associated with a Wikipedia can make as part of their regular content review processes (including outside of any initiatives associated with this key result) and may influence the metrics.
- As you are most likely already aware, geography and gender were identified as important topics within the Movement Strategy Recommendations, and these topics are often the focus of initiatives to boost content growth. This is a huge alignment opportunity under this key result as well, and we can develop ideas which can help existing processes within those initiatives.
- Runa Bhattacharjee (WMF) (talk) 15:10, 14 April 2023 (UTC)
Adding a section here to address an off-wiki question that we had received on this topic:
- Q: It looks like the WMF is saying it is going to start writing Wikipedia articles itself directly (especially about gender and geography topics).
- As the Wikimedia Foundation has a clearly defined separation of roles in this matter, we do not intend to write any content directly as part of this or any other objective of the annual plan. The intention is to facilitate the content growth opportunities that currently exist, are known, or could be initiated within the movement (with or without direct support from WMF teams that already support these activities, e.g. Campaigns), and to offer meaningful product and tech solutions aligned with the recommendations under the movement strategy and other parts of the WMF annual plan. Thank you. --Runa Bhattacharjee (WMF) (talk) 15:25, 14 April 2023 (UTC)
I'm afraid I don't understand this because of the way it is phrased. What is a "share of articles"? What's "start-class"? What's meant to increase here, in relation to what? Can we rephrase this in simpler English? --Thiemo Kreuz (WMDE) (talk) 13:43, 18 April 2023 (UTC)
- Hello @Thiemo Kreuz (WMDE) -- thank you for reading and reacting to these OKRs. I think I can help answer your questions!
- "Share of articles" refers to the portion of the articles in a wiki. It's like saying "percent of articles", but sometimes I feel like the word "share" is less ambiguous. Here's an example why. If it said, "X% increase in the percent of articles in high-impact topics in mid-size Wikipedias", that might be mistakenly read as referring to the percent of all Wikipedia articles across all wikis that are "articles in high-impact topics in mid-size Wikipedias". But we want to be referring to the percent of articles inside a given mid-size Wikipedia that are in high-impact topics. I don't know if there is a perfect way to phrase it to remove all ambiguity. Does that make sense?
- "Start-class" is a quality designation that exists in English Wikipedia. Thank you for reminding me that most wikis don't have that designation. In practice, we will need to figure out how we do want to measure quality in ways that work for all Wikipedias. So for now, I changed the text you saw to say, "that meet YY quality criteria". I hope you think that's better.
- MMiller (WMF) (talk) 01:30, 20 April 2023 (UTC)
- I still don't understand. "Increase in the share of articles in high-impact topics" sounds like something should be increased in certain topic-areas – but what? What "aligns with Foundation-wide metric"? Which "metric"? Do you want to increase the number of articles about (for example) gender and geography topics in relation to the total number of articles? Do you want to increase the quality of articles in certain topic-areas? Maybe the sentence is simply missing a word and should be "increase in the share of high-quality articles in high-impact topics"? Thiemo Kreuz (WMDE) (talk) 07:14, 20 April 2023 (UTC)
WE1 KR3, says: "X% increase in the share of articles in high-impact topics in mid-size Wikipedias (of articles meeting shared quality standards) -- high-impact topics to be chosen as a collaboration between departments, potentially starting with gender and geography."
What is meant by "departments"? Departments of the WMF? Amir E. Aharoni (talk) 07:13, 19 April 2023 (UTC)
I'm posting here some thoughts from a short discussion session about WE1 KR3. Our group included CMyrick-WMF, IFlorez (WMF), Isaac (WMF), and MPopov (WMF). After reading WE1.3 on Core Articles and thinking about important background research to be aware as well as open questions, we came up with the following short list:
- Relevant background research: Knowledge Gaps Taxonomy (paper)
- Open question (and current research): the Global Data and Insight Team is running experiments to understand the effect of campaign-related training interventions. How might those results inform this work?
- Open question: How would we track the source of this change in article topic share – e.g., is it coming from organic editing or from specific campaigns?
- Open question: Are gender and geography capturing the important gaps? We also don't want to throw out other important equity work (e.g. religion). Digging deeper, if there is an increase in the share of articles in geography, whose geography?
Don't hesitate to let me know if you have feedback or questions and I'll do my best to route to the appropriate individual. --Isaac (WMF) (talk) 20:50, 1 May 2023 (UTC)
Result 4 IP Blocks
This objective is absolutely on point, but I am confused about how to measure this key result without conflating false positives. There are two mutually countering measures to be implemented: one is increasing the visibility of finding an IP exemption (which can only be validated in isolation), and the second is measuring that a decrease in exemption requests occurs, due to improved blocking. — The preceding unsigned comment was added by Shushugah (talk)
- In the first years, we might want to see an increase in exemption requests granted. Needs testing. –SJ talk 23:43, 12 April 2023 (UTC)
One thing that seems relevant to this is making sure that IP users who are inadvertently blocked actually understand what to do about it, which includes showing the block message. I tried editing from my mobile phone connected to T-Mobile's network, which is blocked globally and on enwiki (en:Special:Contributions/2607:FB90:0:0:0:0:0:0/32), and neither the mobile nor the desktop website showed me the customized block message en:Template:TMOblock, which explains what's going on clearly. It instead said "your editing is blocked because of multiple blocks". The mobile message did tell me to log into an account if I have one, but the desktop one didn't even say that. (I guess this is some manifestation of phab:T233996.)
Relatedly, I'm not sure if appeals are the best metric. If a block is causing collateral, it is still unlikely to be lifted, unless the collateral is super high. Seems to be better to focus on making sure IP users know how to get to Wikipedia:Request an account and are able to edit in other ways. Galobtter (talk) 04:58, 14 April 2023 (UTC)
- Thank you for thinking about this, @Shushugah, @Sj, and @Galobtter. This was a difficult key result to construct, for the very reasons you're pointing out -- but I'm glad you think that work to improve the IP blocks situation is warranted. We think there are two big problems with blocking: (1) too many people get blocked erroneously, and (2) it's hard for those people to get unblocked. I think if we had an absolutely ideal blocking system, what we would see is: (a) not too many blocks happening, because we are able to block just the right people; (b) of those blocks, nearly everyone would appeal, because it would be obvious how; (c) of those appeals, very few would be granted, because the blocks would have been the right ones in the first place.
- So if we work towards that ideal, we want to see more of the blocked people appealing (i.e. we've made it easier to request an exemption), and fewer of those appeals being granted (i.e. we've gotten better at blocking the right people in the first place). That's what this metric expresses. I think a key aspect of how the KR is written is that we are talking about the "share of IP blocks that get appealed" -- so we're not trying to increase the overall number of appeals, but rather the percentage of blocked users who appeal. If we only improve the request flow, then yes, the number of appeals would increase at first. But if we also figure out how to get more surgical with blocking, then the number would go down, even if the share of blocks that get appealed goes up. Another key aspect is our thinking that almost anyone who finds themselves blocked would want to appeal if they could -- whether they are blocked erroneously, or whether they are an actual abuser. What do you think of that assumption? Do you think that's true? Please help us clarify and sharpen our thinking here in this complicated domain!
- Galobtter, regarding your point about whether a block would be lifted if causing super high collateral -- do you think that improving the exemption flow and then clearly reporting on the number of exemption requests would help functionaries judge whether the collateral is high enough to change the block? And regarding the idea of directing users to "Request an account", maybe that sort of "counts" as an appeal flow (along with exemption requests), which we could report on for functionaries?
- Another question we're trying to be mindful of is whether functionaries would be worried about an increasing workload of exemption requests. Is that a substantial burden right now? MMiller (WMF) (talk) 01:15, 15 April 2023 (UTC)
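The two ratios the KR reasoning above relies on can be sketched in a few lines. This is a minimal illustration with invented counts; real figures would come from block and appeal logs.

```python
# Hypothetical counts; none of these numbers come from real Wikimedia data.
blocks_total = 10_000      # IP blocks placed during the period
appeals_filed = 800        # blocked users who filed an appeal
appeals_granted = 120      # appeals that led to an unblock or exemption

# As described above, the KR wants the first ratio to rise
# (appealing is easy and obvious) and the second to fall
# (the blocks were correct in the first place).
appeal_share = appeals_filed / blocks_total
grant_rate = appeals_granted / appeals_filed

print(f"Share of blocks appealed: {appeal_share:.1%}")  # 8.0%
print(f"Share of appeals granted: {grant_rate:.1%}")    # 15.0%
```

Note that both ratios move even when the absolute number of appeals stays flat, which is the distinction MMiller draws between "share" and "overall number".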
- I don't deal with IP range blocks all that much, but in general I think the thinking is: sure, the collateral is a lot, but there's nothing that can be done. E.g. nobody wants to block all of T-Mobile USA, and we already know there's going to be a lot of collateral, but if long-term abusers are using it for vandalizing Wikipedia and the abusefilter isn't being effective, there's nothing that can be done but keep it blocked - if it's unblocked it's going to be used for a lot of vandalism.
- And since en:WP:HARDBLOCKs are relatively rare, I think the idea is that most people should have access to an unblocked home wifi or somewhere where they can create an account and then be able to edit.
- "And regarding the idea of directing users to 'Request an account', maybe that sort of 'counts' as an appeal flow (along with exemption requests), which we could report on for functionaries?" That seems reasonable.
- I think overall this idea needs workshopping with checkusers and other admins familiar with range blocks. Galobtter (talk) 03:34, 15 April 2023 (UTC)
┌─────────┘
Hi @MMiller (WMF): Now that you mention it, improving efficiency and reducing administrative workloads should perhaps be the primary measure of new work like this. Workloads are high across the board, and affect who does the work, burnout, and marginal efficacy. Some measures that could help lighten loads:
- (Estimate exemption workload.) Lower total work expended per exemption / account-creation (yes, I'd include successfully requesting an account)
- (Estimate collateral blocking of current rangeblocks.) Lower total collateral blocking --> measure of increased precision of blocks.
- (Estimate likeliness of a block to be successfully appealed.) Lower % of blocks that are unnecessarily appealed (obviously not going to be exempted). Increase the % of borderline blocks that are appealed / get accounts created.
- You don't want to build an "automatic appeal" that makes the act of blocking take 20 times longer. Give guidance + flow that help those with a plausible appeal find a way to edit.
- (Simplify the explanation for blocked users.) Increase the % of users experiencing blocks who understand what is going on, and ways they can edit / propose edits / request accounts if they were not blocked due to their own actions.
Any of these that don't have an obvious automatic metric could be measured through spot-checks or surveys of affected users.
A weakness of using OKRs for complex systems is the incentive to find a number to optimize even when externalities aren't yet known / measured / included. Explicitly making objectives for "developing metrics for expected externalities" and "double-checking that functionaries are becoming more efficient alongside realizing other objectives" can help insure against process-creep burnout. –SJ talk 14:34, 17 April 2023 (UTC)
The current Key Result isn't in line with community plans for those metrics; SJ's comment is leaning in the right direction. The Key Result states that we want the number of appealed blocks to increase, and the number of appeals that result in an unblock to decrease. I don't understand how these are useful metrics, even after reading the explanation section. A few things:
- The number of appeals is not a reflection of whether people know about appeals processes. Especially with proxy blocks, 90% of the time our response is going to be "turn off your VPN." We're working on a wizard to help ensure that only people who have a chance at a successful appeal, i.e., people with a valid reason to get an IP block exemption, end up sending an appeal.
- We want to improve the information that blocked users are offered, not the number of appeals. "Mak[ing] the appeal process clear" would optimally result in people knowing whether they have a chance at an appeal prior to going through the effort of writing one, which also takes functionary time to respond to and close. Instead, we only want to receive appeals from people who have valid reasons to appeal.
- In other words, the new wizard, if successful, would result in fewer appeals and a larger share that are accepted, which is the direct opposite of the Key Result indicator.
- Appeals of IP blocks very rarely result in an unblock of the IP itself, and are basically limited to situations where the appellant is the target of the block, or where the appeal reminds local admins (or stewards) that the block is no longer necessary.
- Let's take an example of a long-term abuse rangeblock on a local project. If someone collaterally affected by the block (not the LTA) appeals, this does not change the reasoning for or validity of the initial block. The collaterally affected user would likely be given IPBE, rather than unblocking the LTA's IP. If, however, the appellant is the target of the block, then it's treated as any usual LTA appeal would be, and isn't really relevant for this objective.
- In other words, the actual unblock rate is irrelevant in the vast majority of cases, i.e., every case where the appellant is collateral or it's a proxy block. What's pertinent is the number of appeals which result in either account creation, (g)IPBE granting, or both.
- I'm also confused by "if we are able to block only exactly the right users, then we'll see very few of them getting unblocked." Blocks with high collateral don't necessarily result in unblocks; it depends on maaaany factors, including the level of current abuse, whether it's a p2p connection, abuse history/frequency, range size and allocation, ISP, etc. As long as we use IP blocks, there will be collateral. IP unblock rates are not a measure of block accuracy.
I hope this helps! Best, Vermont (🐿️—🏳️🌈) 00:28, 18 April 2023 (UTC)
- @Vermont @Sj I want to drop a quick comment here to say that this feedback is very valuable. Thanks so much for taking the time to read through the annual plan and respond to it. We are looking into revising the KR. I will respond back when we have a proposed change in mind. This might take a while. Thanks for your patience. -- NKohli (WMF) (talk) 09:32, 21 May 2023 (UTC)
In 2020 the Portuguese Wikipedia decided to make registration mandatory to edit, which is equivalent to blocking all IPs without blocking account creation. That change was positive, as can be seen in this study, and as a ptwiki volunteer I can say the community is very satisfied with the change and cannot imagine returning to the situation we had before it. When all IPs were blocked, we realized that the positive effect on registered users was significantly higher than the negative effect on the contributions of users who don't want to register to edit, and the high number of new registered users showed us that many users who were editing as IPs didn't see registration as a big obstacle to continuing to edit. So, in my opinion, IP blocks are not so bad that we need to create an objective to increase appeals against them. Danilo.mac talk 15:57, 3 May 2023 (UTC)
--note: with regards to WE1.4, this has now been cut from the document. See note below: #Removal of some KRs from the plan. LWyatt (WMF) (talk) 18:45, 24 June 2023 (UTC)
Result 5 Wikifunctions Abstract Wikipedia
We are adding a KR for Wikifunctions to the annual plan because it supports the growth of quality and relevant content and is not currently captured in the plan. It wasn't included earlier this year for logistical reasons, as the team was undergoing personnel changes. Since then, Wikifunctions has launched and the team is stable and actively developing the product.
WE2: Reading and media experience
comments relating to Bucket 1: Wiki Experiences, Objective 2
Firefox and Edge already have an implementation like this called "reading mode", and Chrome is working on one. If it is needed, given that it does exist, then we should not be re-inventing the wheel but look into what can be re-used. MediaWiki's own lazy image loading also comes to mind. --Snævar (talk) 19:09, 13 April 2023 (UTC)
Reading "media experience" got me excited for a moment, only to learn that "we have to deprioritize engagement with images". The thing is: The Commons community is begging for help for quite a while now, reaching a point where they describe Commons as being unusable for anything that involves more than a handful of pictures. What are we going to tell them? Is this potentially covered by another OKR? Edit: I found this is covered by FA1. --Thiemo Kreuz (WMDE) (talk) 14:22, 18 April 2023 (UTC)
Result 2 Holding "interested" readers
I'm afraid there is a fundamental misconception here. We might be the only website in the world that doesn't need to trick users into not leaving the site. Quite the contrary: the most successful user experience is when a user finds exactly what they need and immediately leaves the site. That's the opposite of this OKR. I can see an attempt to work around this conflict by talking about "interested" readers. But how is "interested" defined? Is someone "interested" when they keep clicking for a certain amount of time? How do we get a meaningful baseline for this, and how do we tell if an increase is actually a positive effect? Do we simply assume people are somehow "more interested" when they keep clicking longer? What if they just got lost and don't know any more what they have been looking for? Do we still count this as a success? --Thiemo Kreuz (WMDE) (talk) 14:22, 18 April 2023 (UTC)
WE2.1 Reader Experiences : Recommendations for Quality reading experience by adapting the default experience
- Build on previous research done by the research team (e.g., on readability) and combine this with reader surveys
- Use these reader surveys to help extend and validate readability research
- Generate recommendations for potential interventions to individual contributor communities (in collaboration with product teams)—particularly because contributor communities may differ quite a lot from readers
- (Important note that none of this would require explicit customization based on reader characteristics but could simply inform defaults)
cc: --MKampurath (WMF) (talk) 19:14, 28 April 2023 (UTC)
--note: with regards to WE2.3, this has now been cut from the document. See note below: #Removal of some KRs from the plan. LWyatt (WMF) (talk) 18:45, 24 June 2023 (UTC)
WE3: Knowledge Platform
comments relating to Bucket 1: Wiki Experiences, Objective 3
Result 1 Developer tools
WE3 KR1 says: "Reduce fragmentation in developer workflows, measured by X% increase in adoption of officially supported developer tools".
What does it actually mean?
What are "officially supported developer tools"?
Does this include tools for developing templates and modules? The creation and maintenance of templates and modules is development, and it is arguably more relevant to the experienced editors' experience than the PHP and JavaScript things in Wikimedia Gerrit and GitLab. If it only includes PHP and JavaScript things in Wikimedia Gerrit and GitLab, please say so explicitly.
Does this include translatewiki? By itself, submitting translations on translatewiki is not exactly development, but it is definitely a part of the development cycle of MediaWiki core, extensions, apps, and many other related tools. Is it "officially supported"? If it is, does "increase in adoption of officially supported developer tools" include encouraging developers who don't yet use translatewiki for the localization of their tools to move to translatewiki? Amir E. Aharoni (talk) 07:23, 19 April 2023 (UTC)
- We will need to define what "officially supported developer tools" means. To me it means having clearly defined workflows that promise some level of support. For example, we do not have an official local development environment right now; there are three main ones. I would like there to be a clear workflow where people can expect a certain level of support. We do have other workflows, such as the Train, which run on a certain schedule and where there are clear expectations. KChapman (WMF) (talk) 16:45, 20 April 2023 (UTC)
- I do not expect us to have official workflows for everything in the first year, and I doubt we would touch tools for developing templates and modules through this KR in this timeframe. I also don't think it is reasonable to draw the line at Gerrit and GitLab either. Do the people creating templates and modules view themselves as developers? I had the feeling they don't. Or would it be clearer to say "software development" to separate the kinds of development? KChapman (WMF) (talk) 16:45, 20 April 2023 (UTC)
- While I can understand why it might be reasonable to include template folks with developers (they are writing programs after all), I feel like expanding [mediawiki] developer to include everything technical causes us to lose sight of what we are doing, since all these groups are really different, and when combining them the resulting group is too broad to do or say anything useful about. Bawolff (talk) 14:08, 21 April 2023 (UTC)
- To build on what @KChapman (WMF) said, the explanation for the key result tries to define the intended scope at this time: The goal of this key result is to provide standard development tools that meet the needs of most Wikimedia developers. I can see how there may still be some room for confusion in this depending on what parts of the technical contribution spectrum you are currently most concerned with.
- As Kate implies with her statements about local developer environments the subset of Wikimedia technical development being considered here is code contributions to MediaWiki core, MediaWiki extensions, and ideally also a subset of services (like Thumbor for example) that are used in the Wikimedia production environment to power the project wikis.
- There are certainly many additional areas of technical contribution which also deserve attention, but we also need to be careful to focus our goals narrowly enough that we are able to actually make some measurable advances rather than just taking time to catalog our grievances and wishes. -- BDavis (WMF) (talk) 14:43, 21 April 2023 (UTC)
Result 2 Committed patches
Although I don't have data to back this up, my impression is that the real bottleneck is going from a few commits to being an active contributor. My impression is that it is relatively easy to get people to do the first few commits; they then generally have a bad experience and leave. The real problem is going from 5 to 50, not 0 to 5. Bawolff (talk) 15:59, 12 April 2023 (UTC)
- I share in the impression Bawolff describes above — plenty* of (new and experienced) developers submit "drive-by patches" for issues they've personally experienced or during outreach events (hackathons, good first tasks etc.), but then stop committing. I personally attribute a great deal of this "new developer attrition" to our perhaps lacking approach to volunteer code review. This objective should attempt to resolve these points of contention, and as alluded to above, using a low metric of 5+ commits may not meaningfully increase the number of people willing and able to contribute to the MediaWiki code base — TheresNoTime (talk • they/them) 16:15, 12 April 2023 (UTC)
- Recently I feel the struggle has been convincing people that in fact, they can contribute to the software and it's not just limited to WMF staff. I would like to see WMF leadership take an active stance in countering that narrative rather than (unintentionally?) reinforcing it. Legoktm (talk) 06:38, 13 April 2023 (UTC)
- We felt that 5 commits was the tipping point beyond a couple of drive-by commits. I think there are different ways to analyze the patch data, and I personally think 50 is too high, but I'm not 100% sold on 5 being the right amount either. Is there a different persona for a contributor between 5 and 50? Say, what is the average situation for someone who has made 10 patches? KChapman (WMF) (talk) 19:18, 13 April 2023 (UTC)
- One thing that occurs to me is that this goal can be read 2 (or 3) ways: get more committers external to WMF, or get more inter-team commits (or both). In the past, (I feel) WMF has often done outreach initiatives revolving around getting people to make their first few commits. I feel (i.e. have no data to back this up) that these have often been a waste. People are coached on how to make some trivial commit, fix a spelling error in a code comment or something, and get that trivial commit merged. They then try to do something slightly non-trivial and are immediately thrown to the wolves. They inevitably get frustrated and leave. Often this is over code review, which can often require having social connections to the right people (and of course knowledge/mentorship on how to write good, easy-to-review code). Both WMF staff and non-staff experience this, but WMF staff have more resources to deal with it. Anyway, I think there are many talented people interested in contributing to software used in WMF production; however, I believe our culture turns them away. Unless we fix that, everything else is a band-aid solution at best. Bawolff (talk) 01:24, 14 April 2023 (UTC)
- Yes, this KR (outcome) is intentionally not making a distinction between staff developers and volunteers - we think an increase either way is a good thing, and we'll be happy with progress on both sides. The work required to get to this outcome (which we haven't specified yet) should include taking away any existing barriers, including (as you indicate) addressing issues in our culture, or bottlenecks/difficulties within code review. -- Mark Bergsma (WMF) (talk) 11:28, 19 April 2023 (UTC)
I'm a bit confused by the wording of this, is it intended to be about contributing to any MediaWiki related repository that's used in production or core specifically? Taavi (talk!) 16:15, 12 April 2023 (UTC)
- A 20% increase in the number of people who have ever committed code to MW core (510 * 20% = 102 new people) would be quite an ambitious goal indeed! On the other hand, it could be quite a meaningless goal if the repo chosen is something that currently has only 1-5 committers. Bawolff (talk) 16:29, 12 April 2023 (UTC)
- The idea is we would pick a few repositories to focus on. Those repositories would need to be one used on Wikimedia Production systems, rather than repos that are used elsewhere. KChapman (WMF) (talk) 19:16, 13 April 2023 (UTC)
- That makes a lot of sense to focus on something specific. However, the goal is very different if we're talking about something like mw:Extension:ApiFeatureUsage (Currently 6 non-bot authors w/ 5+ commits) vs mw:Extension:Wikibase (139 not counting submodules). Bawolff (talk) 01:24, 14 April 2023 (UTC)
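The repo-dependence Bawolff points out is easy to see in a toy version of the metric. This sketch uses invented (author, repo) commit records; a real analysis would read Gerrit or git history instead.

```python
from collections import Counter

# Invented commit records, one (author, repo) pair per merged patch.
commits = (
    [("alice", "mediawiki/core")] * 6
    + [("bob", "mediawiki/core")] * 3
    + [("carol", "mediawiki/extensions/Wikibase")] * 7
)

# The KR counts authors with more than 5 patches in a chosen repo set.
TRACKED_REPOS = {"mediawiki/core", "mediawiki/extensions/Wikibase"}
THRESHOLD = 5

patches_per_author = Counter(
    author for author, repo in commits if repo in TRACKED_REPOS
)
qualifying = sorted(a for a, n in patches_per_author.items() if n > THRESHOLD)
print(qualifying)  # ['alice', 'carol']
```

Whether a 20% increase in this count is ambitious or trivial then depends entirely on which repositories go into the tracked set, which is exactly the ApiFeatureUsage-versus-Wikibase contrast above.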
- I want to commit code to MW core, and at the moment I do not have the experience to do it. I think I am able to learn it, and so I think it is a great goal to increase the number of people who contribute to MediaWiki. I think it is not possible to measure it in that way, and I think such exact metrics are not necessary, at least for me. --Hogü-456 (talk) 20:01, 17 April 2023 (UTC)
Being one of the few staff members who currently do a lot of drive-by reviews, I must say I'm really not happy with how this is phrased. It puts the blame on developers like myself when this really shouldn't be the problem. What do we win when we make more people submit even more patches? We can hardly keep up anyway. How does an increased amount of work help "getting teams unblocked"? That's a cross-team communication problem, not something that can be solved with "more patches", and especially not with patches from volunteers. We need a decrease in patches that never get reviewed, not an increase. We need to decrease the amount of technical debt we keep around. We need to undeploy rarely used features that are effectively unmaintained and become serious security risks over time. To enable us to do this we don't need developers. We have them, and they are happy to delete code. We need product owners who are willing to make these decisions. Furthermore, writing code is an act of knowledge transfer. You can't use commits – or lines of code, or really anything – to measure "knowledge" when the commits are nothing but – let's say – a library upgrade. Please bring this OKR back to the drawing board. --Thiemo Kreuz (WMDE) (talk) 15:24, 18 April 2023 (UTC)
- First off, I totally agree with you on undeploying unused features and decision making.
- The idea here is not to get more patches submitted, but to have more people successfully submit patches. My (personal) thinking is that the people who are currently doing a lot of work on MW core and friends should spend more time training others to be able to do this work. In other words, we want to flatten the distribution: instead of having a few people submit a lot of patches, better to share the work among more developers.
- The best way to achieve this, in my mind, is to clarify component boundaries, improve code quality, and grow knowledge sharing between teams. DKinzler (WMF) (talk) 09:40, 19 April 2023 (UTC)
- It should really not talk about "increasing numbers" then but about distribution, ownership, knowledge sharing, and – as said – reducing the number of unmerged patches. --Thiemo Kreuz (WMDE) (talk) 07:20, 20 April 2023 (UTC)
- The OKR framework forces us to specify a single number, for better or worse...
- The number of unmerged patches isn't really the concern. The idea is to increase the number of people who effectively take ownership of the code. -- Duesentrieb (talk) 16:07, 20 April 2023 (UTC)
- Please make it a useful number then and not something that reads like "more workload for everybody". Specifically:
- Why do we need to "increase the number of authors"? Where do these people come from? Is this about staff? Are we going to hire more people? If not, doesn't this imply the number of people working on something else must decrease? If it's not specifically about staff, how does it make sense to put full-time devs and volunteers in the same bucket?
- What does "committed" mean? Does it matter if these commits are merged or not?
- --Thiemo Kreuz (WMDE) (talk) 14:50, 21 April 2023 (UTC)
- This KR (outcome) is about getting more people working on, and gaining knowledge about, MediaWiki, with their code actually ending up (getting merged) in production. Staff and volunteers alike. That could mean some staff not already working in MediaWiki, spending some of their time getting familiar with it, or indeed attracting more volunteers to contribute. If the bottleneck and obstacle is code reviews, then we should prioritize that as well as part of the work to achieve this KR, just like the other two KRs under this objective focus on reducing/removing some of the other barriers that currently exist in people effectively contributing and collaborating (like developer tooling, and disagreement on technical direction/policies). -- Mark Bergsma (WMF) (talk) 14:48, 24 April 2023 (UTC)
- I'm terribly sorry, but I'm not sure we talk about the same sentence. Increase by 20% the number of authors that have committed more than 5 patches in a specific set of MediaWiki repositories that are deployed to production. Neither of the relevant words appears in this quote:
- "Committed" means neither reviewed nor merged. At least change it to something like "got successfully merged".
- How is the number of patches related to transferring knowledge? If there is any causal relationship between the two, it's more the opposite: people with a lot of knowledge tend to submit and review fewer patches – but more complicated ones. And no, "complicated" doesn't mean "big".
- Where are these volunteers supposed to come from? What would be their motivation to touch MediaWiki code if they have never done that before? Are we willing to let them actually own products the technical community cares about?
- What would be the motivation for staff members to touch code that's not part of their job?
- How does it make sense to put the two groups in one bucket? They are not "alike", not even remotely. The unclear relationship between the two roles is part of the problem. Can we please not do this and talk about them separately?
- Thiemo Kreuz (WMDE) (talk) 16:29, 2 May 2023 (UTC)
Result 3 Technical strategic direction
Product and Technology leadership has identified key areas where strategic direction is needed to increase the impact of technical work. So... what are those areas? Out of the examples given, I don't think the issue is lack of strategic direction. I think the issue lies with the WMF actually following through with the things that have been discussed for years, worked on for a few months, and then entirely ignored.
We have had plenty of discussions around "support for MediaWiki outside Wikimedia", including hiring people to fill those roles, and then nothing. The same could be said about "creating a policy for open-source software". Legoktm (talk) 06:35, 13 April 2023 (UTC)
- There have been discussions, but is there a clear policy on how we support MediaWiki usage outside of Wikimedia? I'm not aware of one. KChapman (WMF) (talk) 19:18, 13 April 2023 (UTC)
- It's not really clear to me what a policy on "support MediaWiki usage outside of Wikimedia" means in this context or what its scope would be. (I appreciate you can't tell me the details of a policy that doesn't exist yet, but this is such a broad description that it could include anything from "We decided to buy out Miraheze to actively support external usage" to "We'll try not to screw over external users as long as it isn't too inconvenient". Grounding this in some context of what sort of things we're talking about might be helpful.) Bawolff (talk) 01:37, 14 April 2023 (UTC)
- I agree with @Legoktm that historically the hard thing has not been "having a discussion" or "creating a policy" but *resourcing the work required*. Absent a concrete commitment to either properly resource the policy (regardless of cost) or upfront expectation-setting (ie, "you can create any policy you want but you will only have 3 FTEs over an initial 5-year period to implement it") it seems this OKR is likely to end up as previous efforts have: with a shared agreement that "something should be done" and even strong ideas "what that something is" but then no resources to actually do anything. For context: phab:T113210, part of the 2016 Developer Summit (and a regular annual feature of summits since then). Cscott (talk) 16:34, 19 April 2023 (UTC)
SDS1: Defining essential metrics
[edit]comments relating to Bucket 2: Signals & Data Services, Objective 1
It is great to see the WMF caring about metrics; metrics are what show us whether we are doing a good job or not, what makes a positive impact and what does not. Do you plan to ask for volunteers' opinions on the metric definitions? I would like to give some ideas for metrics and also help collect the data. For example, I created a graph of the evolution of the proportion of referenced articles in ptwiki, which shows that the percentage of articles with sources increases each year and at some point in the future will reach 100%; I think that is a good metric for content. Another example is the user retention tool: when some event increases user retention, we can note that the colors of the horizontal line in the month of that event are slightly stronger, which can help identify the impact of some activities on collaboration. I think it is important to choose metrics that can be used to identify events that have a significant impact; when we identify such an event, we can encourage that kind of activity when the impact is positive and discourage it when it is negative or took a lot of work for no impact. For example, in that references graph I discovered that the jumps in 2013 and 2014 were bots adding sources to thousands of articles about cities, something that should be encouraged. Danilo.mac talk 01:18, 16 May 2023 (UTC)
- Hi @Danilo.mac! Yes, we plan to include community perspectives as we refine definitions or identify new potential metrics. One of our key results under Signals & Data Services is about establishing and implementing a process to ensure that our metrics continually evolve to support data-informed decision making (see SDS2: Making empirical decisions, key result 4), and we'll need to incorporate community input into that process.
- In addition, under SDS1: Defining essential metrics, our goals include more clearly defining existing metrics and providing public documentation about them. By making the documentation publicly available, we hope that community members will be able to engage with the metrics and provide feedback about ways we can improve them.
- To your point about asking for opinions from volunteers, I think we need to explicitly identify how we plan to gather feedback from community members as part of our project roadmaps.
- Regarding the work you'd done to analyze references in Portuguese Wikipedia - yes! I agree that looking at sources is helpful for evaluating content. The Research team incorporated article reference counts into a language-agnostic model to determine article quality - you can see details about the model here: Machine learning models/Proposed/Language-agnostic Wikipedia article quality. We're using this version of the model in our annual plan metric for content (Wikimedia Foundation Annual Plan/2023-2024/Goals#Content: Increase quality and reliability of encyclopedic content). KZimmerman (WMF) (talk) 23:37, 9 June 2023 (UTC)
- Thanks! I didn't know that language-agnostic quality evaluation system, in ptwiki we have a system that uses very similar parameters, I commented now about it in the talk page. Danilo.mac talk 18:00, 12 June 2023 (UTC)
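As a side note, the referenced-articles proportion discussed in this thread is straightforward to sketch. The yearly counts below are invented for illustration, not real ptwiki data:

```python
# Minimal sketch of the "referenced articles" metric: the share of articles
# with at least one source, tracked per year. Counts are hypothetical.

def referenced_share(referenced: int, total: int) -> float:
    """Proportion of articles that cite at least one source."""
    return referenced / total

# (articles with references, total articles) per year -- made-up numbers
yearly = {
    2012: (310_000, 740_000),
    2013: (420_000, 780_000),  # a jump like this is what flagged the bot runs
    2014: (560_000, 820_000),
}

for year, (refs, total) in sorted(yearly.items()):
    print(f"{year}: {referenced_share(refs, total):.1%} of articles referenced")
```

A real implementation would pull these counts from database dumps or a query tool such as Quarry; the bot-driven jumps Danilo mentions show up as unusually large year-over-year deltas in this series.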
--note: with regards to SDS1.2, this has now been cut from the document. See note below: #Removal of some KRs from the plan. LWyatt (WMF) (talk) 18:45, 24 June 2023 (UTC)
SDS2: Making empirical decisions
[edit]comments relating to Bucket 2: Signals & Data Services, Objective 2
SDS3: Using and distributing data
[edit]comments relating to Bucket 2: Signals & Data Services, Objective 3
- How problems with blazegraph will be handled is left ambiguous; migrating away from bg (or clarifying that this isn't happening?) could be more explicit. Wikimedia's choice of graph database has significant ripple effects across the free knowledge ecosystem, and opens opportunities for collaboration. Leaving it in limbo while talking about mitigations like sharding the graph adds uncertainty and doubt for some contributors (who could otherwise help with this / be better WD contributors). Implementing a migration may be a binary / unsatisfying OKR but there are also aspects of performance of the result... –SJ talk
- ElasticSearch, the search software used by the WMF, is lacking in language support. ElasticSearch uses things like the Snowball stemmer (https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-snowball-tokenfilter.html) for language support. It would be interesting to start a collaboration here: get information from communities on available language resources, like tokenisers, stopwords, etc., and see how hard it would be to integrate them into ElasticSearch. I think the communities themselves may not have the technical expertise to do this entirely themselves.--Snævar (talk) 19:09, 13 April 2023 (UTC)
- Improving language support is something that we already do regularly, but that is probably not getting the visibility it deserves. We are working on a blog post to explain better the work that has been going on over the last 2+ years in improving language support (I'm not sure yet when this will be ready, but hopefully in the next few weeks). Note that this work could also be aligned with WE2.1 and WE2.2 (Reading and media experience). If you want to dig a bit more into what we've been doing on that front, you could have a look at task T219550 and its subtasks. Or in the "Language stuff" column of the Discovery-Search Phab board. GLederrey (WMF) (talk) 13:47, 17 April 2023 (UTC)
- The blog post in question has been published! Trey Jones (WMF) (talk) 14:26, 1 May 2023 (UTC)
- In the past, we've generally focused on adding support for languages that don't have any language-specific components, fixing identified bugs, and upgrading new components when they come out. I'm always on the lookout for open-source stemmers or other components that can be ported or wrapped for use with Elasticsearch. We've ported or written components for Khmer, Slovak, Korean, Esperanto, Polish, Bosnian, Croatian, Serbian, & Serbo-Croatian, Chinese, Hebrew, and Ukrainian. We've developed custom config using existing Elastic components for Mirandese and Nias. We looked into components for Vietnamese and Japanese that didn't get deployed because they had problems, and I tried to develop custom config for other languages, but the speakers I was working with lost interest over time (it can be tedious). All of those projects were driven by finding new open source components, bug reports, or working with motivated speakers (my favorite!). If you know of specific components that are either new or significantly better than our existing components, I'd love to hear about them. Same for descriptions of specific non-stemming problems for any language, to see if there's a straightforward way to address it. Phab is a good place to discuss in detail; tag me! And as Guillaume said, you can see our current backlog of language-related tasks in the "Language Stuff" column on our workboard. Finally, note that Elasticsearch supports the Snowball stemmer, but it isn't the default stemmer for every language. I believe the default stemmer is the "stemmer" token filter, which uses different algorithms for different languages (most of which are just wrappers around algorithms provided by Lucene), and has multiple options for many languages. We also use third-party language plugins. The Snowball stemmer is used for some of them, but it isn't the only one or the most common one. Trey Jones (WMF) (talk) 14:33, 17 April 2023 (UTC)
- While the above two things are certainly interesting, they seem like bad OKRs as they are much too focused on a specific solution. Bawolff (talk) 00:52, 14 April 2023 (UTC)
- Our goal in the crafting of OKRs is to describe in the "objectives" the outcome we want to see, and then what "key results" would indicate we are progressing to that outcome. By intention, they are not meant to be prescriptive about how we intend to achieve those outcomes. That will be documented in hypotheses, which the teams will be working on developing next week. Part of the reason to do this is to be clear about who is accountable for what. In my role (VP of Data Science and Engineering), I will have accountability for certain objectives, and some of the directors reporting to me will have accountability for certain key results. That gives those teams the latitude to try out different hypotheses to see which are most effective at achieving the outcomes and key results we seek. -- TTaylor (WMF) (talk) 21:11, 14 April 2023 (UTC)
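To make the analyzer discussion above concrete, here is a minimal sketch of the kind of per-language analysis chain being described: a custom Elasticsearch analyzer combining a tokenizer, a stopword filter, and a Snowball stemmer token filter. The settings body is illustrative (English components, made-up names like "my_text"), not the configuration the WMF actually deploys:

```python
# Illustrative Elasticsearch index settings wiring a tokenizer, a stopword
# filter, and a Snowball stemmer into one custom analyzer. This is the
# general shape that a community-supplied stopword list or stemmer would
# plug into; all names here are hypothetical.
import json

settings = {
    "settings": {
        "analysis": {
            "filter": {
                "my_stop": {"type": "stop", "stopwords": "_english_"},
                "my_stemmer": {"type": "snowball", "language": "English"},
            },
            "analyzer": {
                "my_text": {
                    "type": "custom",
                    "tokenizer": "standard",
                    # Filters run in order: lowercase, drop stopwords, stem.
                    "filter": ["lowercase", "my_stop", "my_stemmer"],
                }
            },
        }
    }
}

# This JSON body would be sent when creating the index (PUT /my-index).
print(json.dumps(settings, indent=2))
```

Supporting a new language in this scheme is then mostly a matter of supplying the stopword list and a stemmer implementation, which, as noted above, is exactly the scarce ingredient for many languages.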
--note: with regards to SDS3.2, this has now been cut from the document. See note below: #Removal of some KRs from the plan. LWyatt (WMF) (talk) 18:45, 24 June 2023 (UTC)
FA1: Describe multiple potential strategies
[edit]comments relating to Bucket 3 (Future Audiences), Objective 1
FA2: Test hypotheses
[edit]comments relating to Bucket 3 (Future Audiences), Objective 2
These are fantastic. Exactly the sorts of things I'm excited to see. It feels like burying the lede for this to come at the very end; and, from the first round of this draft, it may have only 5% of the total focus? If we know we want to reach audiences on other platforms, and to use generative AI and language models to empower users in constructive ways, the question is how we can quickly test, validate, and make progress on promising approaches. But "test hypotheses for potential strategies" sounds more tentative than that. Can we make the goals around the two specific examples here less tentative?
There's also a ripple effect here: deciding we're going to do curious, fun experiments in these areas feels like an invite to creative community members already doing their own things along those lines; whereas private/internal research for future reports that may or may not be acted on feels exclusive [or even negative to people who are already seeing success in these arenas and would feel distanced by the idea that someone else has to prove that to their own satisfaction]. –SJ talk 23:51, 12 April 2023 (UTC)
- Thanks for the feedback, SJ! Breaking out your questions & providing some replies:
- RE: "feels like burying the lede/only 5% of total focus" Couple of points on why 5% and why it's not a primary focus in the annual plan:
- First, our intention with this year's plan is to focus very strongly and intentionally on supporting use-cases and personas that historically haven't gotten as much attention from WMF Product & Tech but are critical to the success of our projects (e.g., experienced editors, moderators). It's a tradeoff we're making and does mean we have fewer resources to explore future scenarios and strategies, but I think it's the right call to ensure that we're serving our current users first and foremost, while still carving out some space to explore what future users could need.
- Second, we've never intentionally focused on exploring future audiences in this way at WMF, and there's a lot of groundwork to do to ensure that we're doing so strategically. We don't want to be devoting a big chunk of Product & Tech resources to doing stuff before we even know what the stuff is, and we definitely aren't ready to make a big bet on the One Big Killer Feature That Will Solve The Future. The current thinking is that as we learn more through focused attention and lightweight experiments over the course of this year, that thinking will filter into and influence next year's annual plan buckets. This goes into your other point...
- RE: "'test hypotheses for potential strategies' sounds more tentative than that. Can we make goals around the two specific examples here less tentative?" Yes! We'll be doing so over the coming month and will have more concrete goals to share. But as stated above, the goals for this track are a little different from other AP buckets – we want to be a) monitoring where future trends are pointing and b) testing assumptions that help us gain confidence in knowing what those big future bets could be. For example (completely hypothetical example), if we:
- see/learn over the course of the year that the world is becoming inundated by AI-generated misinformation
- assume that people would be really excited to take refuge in a knowledge platform that isn't awash in low-quality knowledge
- assume they also increasingly want to interact with knowledge via an AI-assisted natural language interface instead of old-school search...
- ... one goal could be to test the above assumptions with an AI-assisted Wikipedia browsing experience. The goal wouldn't be to treat that as a finished product to build out and maintain forever, but to create some kind of test/MVP/prototype to help prove those assumptions right (or wrong), and if we proved them right, you could imagine a big chunk of next year's AP focused on "Wiki(AI)Experiences." (Again, completely hypothetical – this is just one set of assumptions.) But the "Future Audiences" bucket would continue to exist and continue to stay small and lean to continue to poke into the even farther future.
- RE: creating an invite to creative community members already doing their own things along those lines vs. private/internal research for future reports that may or may not be acted on, which feels exclusive or even negative... Totally with you. One of the goals we already know we have is to make this knowledge open & available to community members and invite anyone who is curious about/excited by future possibilities to join us on this journey. Stay tuned for more soon!
- Hope that helps, and please let me know if you have more questions! MPinchuk (WMF) (talk) 15:29, 13 April 2023 (UTC)
Result 1: Global Youth Audiences
[edit]The first hypothesis (still) says: "One of the strategic directions we're sure we want to investigate is around the spreading of free knowledge on other platforms, like YouTube, Instagram, etc. A tremendous amount of knowledge is consumed in these places for free" (emphasis added). YouTube is not spreading *free knowledge* because the knowledge is not *free* (as in speech) -- it is copyrighted, it cannot be redistributed or remixed, etc. The "standard YouTube license" grants rights to YouTube for distribution; it does not create a contribution to the universe of free knowledge. Similarly, knowledge on these platforms is most frequently not consumed "for free"; it is explicitly ad-supported. The viewer's attention is being sold to pay for the content. (If that is "for free", then why don't we have ads on Wikipedia?) I think it would be best if we could fix this language; cf en:Gratis versus libre -- if nothing else, we will need to provide guidance for translation into those languages that actually make these distinctions, when we say that YouTube videos are "libre" and that consuming YouTube content is "gratis". One possible reworking: "One of the strategic directions we're sure we want to investigate is around the use of free (libre) knowledge from sources like Wikipedia to create content on other platforms, like YouTube, Instagram, etc. A tremendous amount of knowledge is consumed in these places, apparently "for free" (gratis), although most of it is advertising supported." Cscott (talk) 15:32, 19 April 2023 (UTC)
- Just adding a note here that Cscott & I chatted about this a bit offwiki – we'll keep this clarification in mind when we move to the next phase of documenting this work (likely in a new standalone page onwiki)! MPinchuk (WMF) (talk) 13:56, 3 May 2023 (UTC)
Result 2: Conversational AI
[edit]"A technology that looks like it will be transformative in the free knowledge ecosystem". It's just another search interface. Sure, let's explore it. But please, please don't quote OpenAI advertising as if it would prove anything. The company's goals are quite the opposite of what the free knowledge movement needs to aim for. Please stay away from that. It's just not necessary and certainly doesn't make us sound more trustworthy. --Thiemo Kreuz (WMDE) (talk) 14:34, 18 April 2023 (UTC)
- Hi @Thiemo Kreuz (WMDE): If I'm understanding correctly, you sound very skeptical of AI/chatbot technology, and OpenAI as a company in particular. I want to note that the example I listed above came from one of the 100+ Wikimedians who attended the community call on AI last month. Their idea was to build AI-assisted search (not OpenAI specifically – they are far from the only AI model provider) on Wikipedia to attract readers to our platform, rather than allow companies like OpenAI to use our content in their products and establish a monopoly on knowledge search.
- To your other point, it's entirely possible that AI will be "just another search interface," but search interfaces are currently very critical in bringing readers to our projects (75% of reading sessions on Wikipedia start from a search engine), so understanding how we can continue to get readership and contributors on popular AI assistants if they do become the new Google is pretty important.
- What the future will look like and which strategy to pursue to sustain our projects and community (e.g., building our own AI assistant on our projects, finding ways to attract readers on external AI assistants, or something completely different/unrelated to AI) is a difficult question with no easy answers, but the aim of Future Audiences work is to provide more data to help our movement understand the potential benefits and tradeoffs of multiple potential futures. As I said to SJ above, we're not placing any bets on any one strategy, technology, or company – we're testing assumptions and providing recommendations for future investment.
- Does this help clarify? Please do let me know if you have more questions! MPinchuk (WMF) (talk) 23:23, 18 April 2023 (UTC)
- Sorry, I should have been more clear. I was referring to that quote on the content page: "just 2 months after launching, ChatGPT had 100 million active users, making it the fastest-growing new consumer web application of all time." That's corporate speech: why does it need to say "just"? What's their definition of an "active user"? What's the definition of "fast growing"? What's a "consumer web app"? If all of this means anything, it is that there is something extremely fishy going on. Let's please stay away from that and not use this quote as if it would prove anything. I'm sure we are able to find better arguments for why it's a good idea (and I agree it is!) to invest donors' money in machine learning. --Thiemo Kreuz (WMDE) (talk) 07:31, 19 April 2023 (UTC)
- Ahhhh, I see! Completely heard and understood on that. Thank you so much for clarifying and pointing that out.
- Did you attend the AI community call in March? If this is something you're interested in, I'd be happy to add you to the list of community members to ping for future calls/updates/input sessions! MPinchuk (WMF) (talk) 15:10, 19 April 2023 (UTC)
- I agree with Thiemo regarding the overall tone of this explanation, which seems rather hype-y and still lacks any mention of misinformation and knowledge integrity, which the research team had explicitly flagged on the previous draft. I'd like to see us putting our WMF perspective and values into the explanation to describe why *we* should be working on this. Cscott (talk) 15:32, 19 April 2023 (UTC)
- @MPinchuk (WMF): Sure, I was there. I would be happy to be on that list. Thiemo Kreuz (WMDE) (talk) 06:57, 20 April 2023 (UTC)
Conversations and communication
[edit]It has been approximately one week since this "draft OKRs" document was published, and there have been a number of different people providing their feedback so far. By name, thank you to each of you: @Thiemo Kreuz (WMDE), @Vermont, @Galobtter, @Suffusion of Yellow, @Hogü-456, @Ponor, @Sj, @0xDeadbeef, @Bawolff, @Snævar, @Legoktm, @TheresNoTime, @Taavi, @Shushugah.
As a kind of meta-comment on the feedback you've given, I would like to know where you found out about this page. And where do you think it would be most useful to tell other people about it? There is a lot more information and documentation about the WMF Annual Plan coming on Meta very soon, and that will be shared widely (with a wide range of "community conversation" calls, translations, etc.). But beyond that, I am especially interested to know whether you, the people who have already commented here, are aware of groups and individuals who you know would be interested to read/comment here but probably don't know about this page. I would like to make sure they are not missed. I hope you agree with me that there have been fast, direct responses from the relevant WMF Product/Technical staff to your comments here, so hopefully other people will benefit from that too. You are welcome to spread awareness of this document - especially into parts of our movement that don't normally visit Meta - and I look forward to reading any suggestions you have too.
Sincerely, LWyatt (WMF) (talk) 21:25, 18 April 2023 (UTC)
- I found out about this page from a post at en:WP:AN specifically asking for comments regarding WE1.2, since it is about improvements for "admins, patrollers, functionaries, and moderators of all kinds". Both WE1.2 and WE1.4 could benefit from more admin/functionary comments, so I think it'd be useful to circulate this to administrators' noticeboards (or the equivalent) on various wikis. Galobtter (talk) 21:43, 18 April 2023 (UTC)
- I found out about this page from Galobtter's post at enwiki's edit filter noticeboard. 0xDeadbeef (talk) 01:04, 19 April 2023 (UTC)
- I found out from wikitech-l. Bawolff (talk) 22:14, 18 April 2023 (UTC)
- I was pointed to this page by a colleague. I think parts of it are relevant for the same or a very similar selection of people as in the technical decision forum. --Thiemo Kreuz (WMDE) (talk) 09:04, 19 April 2023 (UTC)
- It was highlighted in the Product Department meeting on April 19. Cscott (talk) 15:33, 19 April 2023 (UTC)
- I read the announcement on Wikitech-l.--Hogü-456 (talk) 20:53, 19 April 2023 (UTC)
- I only became aware when Galobtter pinged me. Suffusion of Yellow (talk) 18:57, 24 April 2023 (UTC)
reader experience / accessibility and AI
[edit]On the subject of reader experience / accessibility / blind readers, my question is whether the WMF could enter into a partnership with producers of image-description AI. Hardly any image in Wikipedia is yet provided with alt text (let alone an image description for the blind). We can't expect images to be described voluntarily by people. Very promising image-description AIs have appeared in the last few months, e.g. this very good one, a by-product of an image-generator AI: https://minigpt-4.github.io/ (alas, it's from Saudi Arabia I must add). Would it be possible, for example, to program a bot that searches for images in Wikipedia without alt text, has the image-description AI write alt texts for them, and then stores those texts with the images, for example on Wikimedia Commons? KH32 (talk) 06:15, 18 May 2023 (UTC)
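As a sketch of the detection half of such a bot, one could scan article wikitext for image links that lack an |alt= parameter. Fetching pages (e.g. via Pywikibot or the MediaWiki API) and calling an image-description model are left out here, and the regex is simplified: it will miss images embedded via templates or with nested links inside captions.

```python
# Hypothetical sketch: flag [[File:...]] / [[Image:...]] links in wikitext
# that have no |alt= parameter. A real bot would fetch page text through the
# MediaWiki API and route flagged files to an image-description model.
import re

# Non-greedy match up to the first "]]"; does not handle nested links.
IMAGE_LINK = re.compile(r"\[\[(?:File|Image):.*?\]\]", re.IGNORECASE | re.DOTALL)

def images_missing_alt(wikitext: str) -> list[str]:
    """Return image links that carry no alt= parameter."""
    return [
        link for link in IMAGE_LINK.findall(wikitext)
        if "alt=" not in link.lower()
    ]

sample = (
    "[[File:Example.jpg|thumb|alt=A red square|Caption]]\n"
    "[[File:NoAlt.png|thumb|Another caption]]"
)
print(images_missing_alt(sample))  # -> ['[[File:NoAlt.png|thumb|Another caption]]']
```

Whether machine-written alt text should then be saved automatically or queued for human review is, of course, the policy question a proposal like this would need to answer.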
Removal of some KRs from the plan
[edit]In June, each team in the Product and Tech department developed hypotheses that they think will contribute to achieving the key results (KRs). One of the big things we learned as part of developing and reviewing hypotheses is that we have too many key results for the resources we have. Rather than try to spread resources more thinly, or leave KRs to be picked up later, we are going to remove the following KRs from the annual plan. If we decide to start work on a KR later in the year, we’ll add it back to the plan.
- SDS1.2: For three out of the four core metric areas [content, contributors, relevance, sustainability], at least 1 dataset is fully and publicly documented with clear guidance on how to use it to guide strategic decisions.
- SDS3.2: Identify and implement a way to measure editor and reader satisfaction with search, evaluate satisfaction, and use the evaluation to inform at least 1 product decision.
- WE1.4: X% increase in the share of IP blocks that get appealed, with static or decreasing share of appeals that get unblocked.
- WE2.3: Deepen reader engagement with Wikipedia via 0.05% of unique devices engaging in non-editing participation.
Why take it out, only to possibly add it back in? We want to be clear to ourselves and our communities what we’re focusing on, where we are devoting our efforts, and ultimately what goals we are setting for ourselves and what we are accountable for. These changes are reflected on the OKR page.
Posted, on behalf of the Annual Planning coordinators of the Product&Tech department - LWyatt (WMF) (talk) 18:28, 24 June 2023 (UTC)
Wiki Experiences 1.2 project page
[edit]Wiki Experiences 1.2 (Complete improvements to four workflows that improve the experience of editors with extended rights...) is one of the larger Key Results in this plan, and has five teams contributing to it. I've put together an overview page detailing which Product teams are working towards this goal and including links to their project pages, where input and feedback is being sought. Please check it out and let me know if anything is unclear! Samwalton9 (WMF) (talk) 09:29, 6 October 2023 (UTC)
- Quick ping to folks who commented in the WE1.2 section above in case you're interested - @Galobtter, Suffusion of Yellow, 0xDeadbeef, Snævar, Ponor, Vermont, and Nux: Samwalton9 (WMF) (talk) 09:32, 6 October 2023 (UTC)
SDS KR adjustments following Q1 review
[edit]During the review of Q1 work against the annual plan, I made some observations about the delivery of Signals and Data Services work related to the structure of the key results. Our program management counterparts had noticed during regular report gathering that the way our work was organized introduced risks to the health and deliverability of projects: there were multiple instances where one person owned multiple KRs, and individual teams were responsible for multiple hypotheses at once. On reflection, during the initial planning process we took an expansive approach to OKRs, often writing KRs that we wanted to have as goals but without a reasonable expectation that all the KRs could be completed within the fiscal year. Put simply, there was too much work specified in the annual plan in the Signals and Data Services bucket for the number of people dedicated to the work. I requested that objective owners reduce the number of KRs overall, and with some specific guidance about what to prioritize, we (bucket owner and objective owners, with consultation from KR owners and affected teams) have reduced the total number of KRs to five.
Along the way, objective owners worked to absorb and apply the prioritization guidance, and considered whether work that was specified in KRs contributed directly to other KRs and might more readily be described as hypotheses. We also stopped listing KRs as “on hold”, something that was done in the previous quarter for a KR we could not resource. We believe that the updated set of KRs presents a sharper focus on the most important outcomes for the fiscal year, and that we have improved the probability of successful delivery of this part of the annual plan. We have prepared explanatory text for each of the changed KRs that are now in the plan.
The changes to KR text are published on this page. These changes will take effect for Q3 plan execution, starting in January.
Key Result Changes
SDS Objective 1 Objective text: "Each metric and dimension in our essential metric data set is scientifically or empirically supported, standardized, productionized, and shared across the Foundation."
Objective Overview:
- Focus: Ensuring that our metrics are clear and reliable. Leaders and staff understand the metrics and how they connect to their work.
- Supports: measuring and showing data in a way that leaders can make timely decisions.
Key Result Changes:
- SDS 1.1: "For three out of the four core metric areas, provide at least 1 metric with documented adherence to essential metric criteria."
Status: Retained & reduced scope. We reduced the scope by focusing on documenting the extent to which the core metrics meet the essential metrics criteria instead of committing to ensuring that metrics achieve the criteria this fiscal year. This KR is aimed at ensuring our metrics are clear and reliable.
- SDS 1.2 "For three out of the four core metric areas, at least 1 dataset is fully and publicly documented with clear guidance on how to use it to guide strategic decisions"
Status: Removed. Documentation may support other key results, but it is not something that we prioritize for its own sake.
- SDS 1.3 "Establish and implement a process to ensure that our essential metrics continually evolve to support data-informed decision making."
Status: Removed. The criteria we established for essential metrics will be applied to our core metrics in the revised SDS 1.1, but we will not implement work related to other essential metrics.
- SDS 1.4 "Five annual plan initiatives engage with a core metric as a point of inquiry, to measure and communicate progress, or to inform the direction of resources."
Status: New. Our goal is ensuring leaders and staff understand the metrics and how they connect to their work. The focus is on clarifying core annual plan metrics and relating them to annual plan goals, with Foundation leadership as the intended audience. This KR incorporates some of the intentions and insights from the work previously done under the eliminated SDS 2.2 KR (Four cross-department Wikimedia initiatives adopt a core metric as a measure of progress or impact) and the completed SDS 2.3 KR (For three out of the four core metric areas, publish data reports that display measurements and trends based on core metrics, made available to the public).
SDS Objective 2 Objective text: "Wikimedia staff and leadership make data-driven decisions by using essential metrics to evaluate program progress and assess impact."
Objective Overview:
- Focus: supporting data-informed decisions across key audiences and for our products by codifying essential metrics and metric review processes into tools and outputs.
- Supports:
- Measuring and showing data in a way that leaders can make timely decisions.
- Building/adjusting our metrics platform so that we can collect/measure data to help product development teams learn and iterate faster.
Key Result Changes:
- SDS 2.1: "100% of our defined and produced essential metrics data is consistently described in a data catalog to include provenance and means of production."
Status: Removed. A data catalog being produced as described may instead be considered as a hypothesis to advance other remaining KRs in this objective.
- SDS 2.2: "Four cross-department Wikimedia initiatives adopt a core metric as a measure of progress or impact."
Status: Removed. We found this KR was impossible to achieve without the work necessitated by SDS Objective 1. As KR interdependencies make them difficult to manage independently, we chose to remove this KR.
- SDS 2.3: "For three out of the four core metric areas, publish data reports that display measurements and trends based on core metrics, made available to the public."
Status: Closed as completed, followed by SDS 2.6 and SDS 1.4. The emphasis in the new key results is now on decision making, not on producing reports. If reports help facilitate decision making, they will appear as hypotheses.
- SDS 2.4: "Establish and implement a process to ensure that our essential metrics continually evolve to support data-informed decision making."
Status: Previously moved (had become SDS 1.3)
- SDS 2.5: "Four feature teams use shared tools to evaluate and improve user experiences based on empirical data from user interactions."
Status: Retained as-is. This KR is driving our work to develop a metrics platform for product development impact assessment.
- SDS 2.6: "Senior Leadership can periodically use a shared tool to evaluate the Foundation’s progress against core metrics."
Status: New. Drawn from SDS 2.3, we have instead tried to describe the result we expected to see from having reports produced – it enables senior leadership to evaluate organizational progress against core metrics. Given that creating a metrics platform for use by the movement is a multi-year effort, this is our first step toward creating the tooling, with a clear audience and use case that we will build on.
SDS Objective 3 Objective text: Users can reliably access and query Wikimedia content at scale
Objective Overview:
- Focus: Supporting data-driven workflows across the movement with data products that are trustworthy and fit for purpose.
- Supports: fixing WDQS, a key community-facing data product
Key Result Changes:
- SDS 3.1: "Wikidata knowledge graph can be reloaded within 10 days for a graph of up to 20 billion tuples."
Status: Retained and updated. While we remain focused on improving the experience of using Wikidata through querying, we aim to better describe the specific measure of impact that addresses one of the largest current risks with WDQS: in the event of catastrophic failure, our ability to reload the data and get it back online in a timely manner is critical.
- SDS 3.2: "Identify and implement a way to measure editor and reader satisfaction with search, evaluate satisfaction, and use the evaluation to inform at least 1 product decision."
Status: Previously removed. This KR was removed before the fiscal year began, as we reviewed available resourcing.
- SDS 3.3: "For each of the four core metric areas, at least one dataset is systematically logged and monitored, and staff receive alerts for data quality incidents as defined in data steward-informed SLOs."
Status: Removed. The engineering team who would work on this is also working towards SDS 2.6, and so we prioritized that work as more impactful over this work.
- SDS 3.4: "Three productionized non-privacy-sensitive essential metric datasets are publicly available."
Status: Removed. While public availability of our data remains important to us, it is a more impactful use of our resources – and therefore more highly prioritized – that we improve our ability to assess our impact in a timely fashion.