
Web2Cit-Research (Component)
Active, Public

Details

Description

Evaluate Wikipedia's automatic citation generation before and after Web2Cit.

The Web2Cit Research subproject's goal is to (a) develop tools to evaluate coverage of Wikipedia's automatic citation generation before (Citoid) and after Web2Cit implementation (Web2Cit + Citoid), and (b) publish the corresponding reports.

Recent Activity

May 16 2024

Aklapper placed T299727: Add missing parameters to list of citation templates up for grabs.

@Gimenadelrioriande: Removing task assignee as this open task has been assigned for more than two years - see the email sent to all task assignees on 2024-04-15.
Please assign this task to yourself again if you still realistically [plan to] work on this task - it would be welcome! :)
If this task has been resolved in the meantime, or should not be worked on by anybody ("declined"), please update its task status via "Add Action… → Change Status".
Also see https://www.mediawiki.org/wiki/Bug_management/Assignee_cleanup for tips how to best manage your individual work in Phabricator. Thanks!

May 16 2024, 5:03 PM · Web2Cit-Research

Apr 23 2024

Nidiah closed T301516: Consider alternative "url" parameters, a subtask of T299727: Add missing parameters to list of citation templates, as Declined.
Apr 23 2024, 1:35 PM · Web2Cit-Research
Nidiah closed T301516: Consider alternative "url" parameters as Declined.
Apr 23 2024, 1:35 PM · Web2Cit-Research

Jul 12 2023

Aklapper changed the edit policy for Web2Cit-Research.
Jul 12 2023, 10:56 AM

Feb 15 2023

diegodlh added a subtask for T329745: Share a table of URLs for which automatic tests have been uploaded: T317448: Consider returning a result even if the fallback template does not apply.
Feb 15 2023, 2:41 PM · Web2Cit-Research
diegodlh added a comment to T329745: Share a table of URLs for which automatic tests have been uploaded.

Some of the target webpages are returning a "non applicable target webpage" error, which makes it harder for collaborators to evaluate the tests and write templates.

Feb 15 2023, 2:41 PM · Web2Cit-Research
diegodlh created T329745: Share a table of URLs for which automatic tests have been uploaded.
Feb 15 2023, 2:39 PM · Web2Cit-Research

Feb 10 2023

Nidiah added a comment to T322057: Ignore citations having wikilinks.

Thanks for the reminder @Aklapper!

Feb 10 2023, 7:50 PM · Web2Cit-Research
Nidiah closed T322057: Ignore citations having wikilinks as Resolved.
Feb 10 2023, 7:50 PM · Web2Cit-Research
Aklapper added a comment to T322057: Ignore citations having wikilinks.

Nidiah moved this task from To do to Done on the Web2Cit-Research board.

@Nidiah: Hi, if there is nothing else to do in this task, could you please set the task status to resolved? Thanks a lot!

Feb 10 2023, 11:38 AM · Web2Cit-Research

Nov 4 2022

Nidiah moved T322057: Ignore citations having wikilinks from To do to Done on the Web2Cit-Research board.
Nov 4 2022, 5:05 PM · Web2Cit-Research
Nidiah added a comment to T322057: Ignore citations having wikilinks.

Some examples of manual data with wikicode:

Nov 4 2022, 5:05 PM · Web2Cit-Research
Nidiah updated the task description for T322057: Ignore citations having wikilinks.
Nov 4 2022, 4:39 PM · Web2Cit-Research

Oct 31 2022

Nidiah created T322057: Ignore citations having wikilinks.
Oct 31 2022, 6:14 PM · Web2Cit-Research

Oct 15 2022

Nidiah created T320877: Save requested URLs for Citoid's response.
Oct 15 2022, 9:57 PM · Web2Cit-Research

Aug 29 2022

diegodlh closed T305877: Add Research tab to Web2Cit's landing page as Resolved.
Aug 29 2022, 2:22 PM · Web2Cit-Research

Jun 14 2022

Gimenadelrioriande moved T305877: Add Research tab to Web2Cit's landing page from To do to Done on the Web2Cit-Research board.
Jun 14 2022, 9:38 PM · Web2Cit-Research

Apr 25 2022

Nidiah closed T305905: Ignore case and comments in citation template names as Resolved.

Added normalization for citation template names in section 2.3, "Citation template extraction summary", of the notebook.

Apr 25 2022, 1:27 AM · Web2Cit-Research
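
For illustration only, the kind of normalization described in T305905 (ignoring case and comments in citation template names) could look like the sketch below. This is not the notebook's code: the helper name `normalize_template_name` is hypothetical, and collapsing whitespace is an extra assumption beyond what the task states.

```python
import re

# Matches HTML comments such as <!-- note -->, including multi-line ones.
_COMMENT_RE = re.compile(r"<!--.*?-->", re.DOTALL)


def normalize_template_name(raw_name: str) -> str:
    """Return a comparable form of a template name (hypothetical helper)."""
    # Drop any HTML comments embedded in the template name.
    without_comments = _COMMENT_RE.sub("", raw_name)
    # Assumption: runs of whitespace (including newlines) are not significant.
    collapsed = " ".join(without_comments.split())
    # Ignore case so that "Cite web" and "cite Web" compare equal.
    return collapsed.casefold()
```

With this sketch, `Cite web<!-- comment -->` and `cite Web` would both be counted under the same template name.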

Apr 12 2022

diegodlh updated subscribers of T301516: Consider alternative "url" parameters.

Our script is using this column from the list of citation templates already.

Apr 12 2022, 1:52 PM · Web2Cit-Research
diegodlh added a parent task for T301516: Consider alternative "url" parameters: T299727: Add missing parameters to list of citation templates.
Apr 12 2022, 1:50 PM · Web2Cit-Research
diegodlh added a subtask for T299727: Add missing parameters to list of citation templates: T301516: Consider alternative "url" parameters.
Apr 12 2022, 1:50 PM · Web2Cit-Research
diegodlh moved T301519: Validate URLs before calling Citoid from Doing to Done on the Web2Cit-Research board.
Apr 12 2022, 12:58 PM · Web2Cit-Research
diegodlh moved T302826: Use custom user agent for API requests from Doing to Done on the Web2Cit-Research board.
Apr 12 2022, 12:57 PM · Web2Cit-Research

Apr 11 2022

Nidiah created T305905: Ignore case and comments in citation template names.
Apr 11 2022, 11:33 PM · Web2Cit-Research
Nidiah closed T301519: Validate URLs before calling Citoid as Resolved.

URLs with a scheme other than http or https, or with a private host, were excluded.

Apr 11 2022, 11:10 PM · Web2Cit-Research
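
A rough sketch of this kind of pre-filter, using only the Python standard library, is shown below. It is an assumption-laden illustration, not the project's script (which, per T301519, relied on `validators.url(url)`): the function name `is_candidate_url` is hypothetical, and "private host" is interpreted here as `localhost` or a non-global IP literal.

```python
import ipaddress
from urllib.parse import urlsplit

ALLOWED_SCHEMES = {"http", "https"}


def is_candidate_url(url: str) -> bool:
    """Return True if the URL looks worth sending on (hypothetical helper)."""
    try:
        parts = urlsplit(url)
        host = parts.hostname
    except ValueError:
        # urlsplit rejects some malformed inputs, e.g. a broken IPv6 literal.
        return False
    # Keep only http/https URLs that actually name a host.
    if parts.scheme not in ALLOWED_SCHEMES or not host:
        return False
    # Assumption: localhost names count as private hosts.
    if host == "localhost" or host.endswith(".localhost"):
        return False
    try:
        address = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal, so it is a domain name; accept it.
        return True
    # Reject private, loopback, link-local and other non-global addresses.
    return address.is_global
```

This check never resolves domain names, so a public-looking name that points to a private address would still pass.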
Nidiah closed T302826: Use custom user agent for API requests as Resolved.

Added https://phabricator.wikimedia.org/tag/web2cit-research/ to the User-Agent header.

Apr 11 2022, 10:08 PM · Web2Cit-Research
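
As a minimal sketch of what identifying requests this way might look like with the standard library: the exact header string used by the script is not given in the task, so the `USER_AGENT` format below is an assumption, and `build_request` is a hypothetical helper. Nothing is sent over the network here.

```python
from urllib.request import Request

# Assumed format: a tool name followed by a contact/info URL in parentheses.
USER_AGENT = (
    "Web2Cit-Research "
    "(https://phabricator.wikimedia.org/tag/web2cit-research/)"
)


def build_request(url: str) -> Request:
    """Build a request object carrying the custom User-Agent header."""
    return Request(url, headers={"User-Agent": USER_AGENT})
```

The returned object can then be passed to `urllib.request.urlopen`, or the same header dictionary can be reused with another HTTP client.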
diegodlh created T305877: Add Research tab to Web2Cit's landing page.
Apr 11 2022, 6:09 PM · Web2Cit-Research

Mar 31 2022

diegodlh added a comment to T302826: Use custom user agent for API requests.

Thanks, Nidia. I would add Web2Cit-Research somewhere, considering that the script may be used in the future, after the project grant has ended. I think it's OK to keep the other information too.

Mar 31 2022, 4:08 PM · Web2Cit-Research

Mar 30 2022

diegodlh updated subscribers of T301519: Validate URLs before calling Citoid.

Note that @Kerry_Raymond has found other URLs that are failing with Citoid. She has listed them here; some of them also appear in our list of problematic URLs.

Mar 30 2022, 9:35 PM · Web2Cit-Research
diegodlh added a comment to T301519: Validate URLs before calling Citoid.

As discussed with @Nidiah, ignoring URLs that don't start with http or https is pending.

Mar 30 2022, 7:29 PM · Web2Cit-Research

Mar 29 2022

Nidiah added a comment to T302826: Use custom user agent for API requests.

I have been using http://caicyt-conicet.gov.ar/; mailto:nidiahernandez@conicet.gov.ar. Should I change it to the project's page on MetaWiki, or is "Web2Cit-Research" better?

Mar 29 2022, 3:51 PM · Web2Cit-Research
Nidiah moved T302826: Use custom user agent for API requests from To do to Doing on the Web2Cit-Research board.
Mar 29 2022, 3:44 PM · Web2Cit-Research

Mar 23 2022

diegodlh moved T299727: Add missing parameters to list of citation templates from To do to Doing on the Web2Cit-Research board.
Mar 23 2022, 1:23 PM · Web2Cit-Research

Mar 17 2022

Nidiah added a comment to T301516: Consider alternative "url" parameters.

Hi @Aklapper, sorry for the delayed response, I was out of town for a couple of weeks. Thank you for your reminder and for the link!

Mar 17 2022, 1:05 PM · Web2Cit-Research

Mar 14 2022

Aklapper added a comment to T301516: Consider alternative "url" parameters.

@Nidiah: Hi! This task has been assigned to you a while ago. Could you maybe share an update? Do you still plan to work on this task, or do you need any help? Thanks!

Mar 14 2022, 8:44 AM · Web2Cit-Research

Mar 2 2022

diegodlh moved T302826: Use custom user agent for API requests from Backlog to To do on the Web2Cit-Research board.
Mar 2 2022, 4:13 PM · Web2Cit-Research
diegodlh moved T301510: Optimize large number of Citoid requests for coverage estimation research project from Backlog to Done on the Web2Cit-Research board.
Mar 2 2022, 4:12 PM · Web2Cit-Research, Citoid
diegodlh closed T301510: Optimize large number of Citoid requests for coverage estimation research project as Resolved.
Mar 2 2022, 2:34 PM · Web2Cit-Research, Citoid
diegodlh added a comment to T301510: Optimize large number of Citoid requests for coverage estimation research project.

Thanks, @Mvolz! We will probably be doing this around the end of March. As mentioned, we will properly identify our requests with a custom user agent. Hopefully we won't cause any disruptions, but please let us know in case we do!

Mar 2 2022, 2:33 PM · Web2Cit-Research, Citoid
Mvolz added a comment to T301510: Optimize large number of Citoid requests for coverage estimation research project.

I've been looking into Citoid API request rate limits.

We access the Citoid API through Wikimedia's RESTBase proxy. I found two 429 HyperSwitch errors for exceeded request rates: https://www.mediawiki.org/wiki/HyperSwitch/errors/rate_exceeded and https://www.mediawiki.org/wiki/HyperSwitch/errors/request_rate_exceeded

Here it says that there is a global limit of up to 200 requests per second, but that individual endpoints may have specific limits. However, the Citoid API documentation doesn't seem to say anything about it.

On the other hand, I found this thread where @Mvolz mentions a "1000/10s (100/s long term, with 1000 burst)" limit.

She also refers to how long requests take and to timeouts, but I'm not sure what she means. How does response time affect the request rate limit? Say we make 1000 requests at t=0s, of which only 500 have returned a response at t=10s: can we make another 1000-request batch now? Or do pending requests count against our request rate limit?

Mar 2 2022, 11:04 AM · Web2Cit-Research, Citoid
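
One possible reading of the "1000/10s (100/s long term, with 1000 burst)" figure is a token bucket with a capacity of 1000 and a refill rate of 100 tokens per second. The sketch below is a client-side limiter under that assumption only; it is not confirmed to match how the server enforces its limit, and it does not settle whether pending requests count against it. The names `TokenBucket` and `try_acquire` are hypothetical.

```python
import time


class TokenBucket:
    """Client-side limiter: `rate` tokens/second, bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self._clock = clock          # injectable so the logic can be tested
        self._tokens = capacity      # start full: a whole burst is available
        self._last = clock()

    def try_acquire(self, n: int = 1) -> bool:
        """Take n tokens if available; return False instead of waiting."""
        now = self._clock()
        elapsed = now - self._last
        self._last = now
        # Refill for the elapsed time, never exceeding the burst capacity.
        self._tokens = min(self.capacity, self._tokens + elapsed * self.rate)
        if self._tokens >= n:
            self._tokens -= n
            return True
        return False
```

A script could create `TokenBucket(rate=100, capacity=1000)` and call `try_acquire()` before dispatching each request, sleeping briefly whenever it returns `False`. Under this model, dispatch is limited by when requests are sent, not by when responses arrive.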

Mar 1 2022

diegodlh created T302826: Use custom user agent for API requests.
Mar 1 2022, 7:06 PM · Web2Cit-Research
diegodlh added a comment to T301510: Optimize large number of Citoid requests for coverage estimation research project.

I've been looking into Citoid API request rate limits.

Mar 1 2022, 7:01 PM · Web2Cit-Research, Citoid

Feb 25 2022

diegodlh triaged T302592: Consider making a fast HEAD request to the actual resource before making a request to Citoid as Low priority.
Feb 25 2022, 3:03 PM · Web2Cit-Research
diegodlh created T302592: Consider making a fast HEAD request to the actual resource before making a request to Citoid.
Feb 25 2022, 3:01 PM · Web2Cit-Research
diegodlh added a comment to T301519: Validate URLs before calling Citoid.

Maybe also check for and ignore some of the cases discussed by email, which are valid URLs but would return a 400 error from Citoid:

Feb 25 2022, 2:55 PM · Web2Cit-Research
diegodlh moved T301519: Validate URLs before calling Citoid from Backlog to Doing on the Web2Cit-Research board.
Feb 25 2022, 2:45 PM · Web2Cit-Research
Nidiah added a comment to T301519: Validate URLs before calling Citoid.

Sure, at the current rate, that would save us 22h for T301510.

Feb 25 2022, 2:11 PM · Web2Cit-Research

Feb 24 2022

diegodlh added a comment to T301519: Validate URLs before calling Citoid.

Indeed, URLs ending in .pdf return a 404 error. But they represent only 6.35% of our corpus.

Feb 24 2022, 2:47 PM · Web2Cit-Research

Feb 23 2022

Nidiah changed the status of T301519: Validate URLs before calling Citoid from Open to In Progress.

URLs are validated before each request using validators.url(url) (see the documentation: https://validators.readthedocs.io).

Feb 23 2022, 7:43 PM · Web2Cit-Research

Feb 22 2022

rominicky added a member for Web2Cit-Research: rominicky.
Feb 22 2022, 2:56 PM