Add option for selecting a language to reconcile with #7

Open
hay opened this issue Dec 7, 2020 · 4 comments
Comments

@hay
hay commented Dec 7, 2020

I tried using this tool to reconcile a list of about 100 church denominations (a gist can be found here). Unfortunately, the results were pretty mediocre (only around 5 got matched) because the list is in Dutch while the matching is done using only English labels.

I think it would be a very useful addition to make it possible to set the language code. This is very easy for both the OpenRefine reconciliation endpoint and the WD query service. Also see my wdreconcile tool for some inspiration on how something like that could be done.
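
To illustrate how a language code can be passed to the WD query service, here is a minimal sketch that builds a SPARQL request matching a label in a chosen language. The helper names `build_label_query` and `build_query_url` are hypothetical, the query does exact label matching only, and nothing here is taken from bbw or wdreconcile. It only constructs the URL and does not send a request.

```python
from urllib.parse import urlencode

# Public SPARQL endpoint of the Wikidata Query Service.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def build_label_query(label, language="nl", limit=5):
    """Return a SPARQL query for items whose label equals `label` in `language`."""
    # Escape backslashes and double quotes so the label is a valid SPARQL literal.
    escaped = label.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "SELECT ?item WHERE { "
        f'?item rdfs:label "{escaped}"@{language} . '
        f"}} LIMIT {limit}"
    )


def build_query_url(label, language="nl", limit=5):
    """Return the full request URL (hypothetical helper; no request is sent)."""
    params = {"query": build_label_query(label, language, limit), "format": "json"}
    return f"{WDQS_ENDPOINT}?{urlencode(params)}"


print(build_query_url("Remonstranten", language="nl"))
```

Changing `language="nl"` to another code is the only thing needed to match against labels in a different language.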

@thadguidry

Hi @hay, did you happen to see and try using the lowercase language codes for the labels, descriptions, aliases, and sitelinks?
For instance, you can use the syntax Lnl to get the label in Dutch (language code nl).
We (OpenRefine Wikidata recon service maintainers) have this documented for Wikidata reconciling with OpenRefine at the following doc location: https://wikidata.reconci.link/

@shigapov
Collaborator
shigapov commented Dec 8, 2020

@hay, thank you for testing and for your good example.

I have added an automatic language detector using the langid library. It should easily detect Dutch. Alternatively, it's possible to specify 'language="lang-code"' in the annotate and contextual_matching functions.
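
As a rough idea of what detection with langid can look like, here is a minimal sketch that guesses one language code for a list of cells by majority vote. This is not the bbw implementation: the function name `detect_table_language`, the `classify` and `default` parameters, and the voting strategy are all assumptions made for illustration. The only langid call used is `langid.classify`, which returns a `(language, score)` pair.

```python
from collections import Counter


def detect_table_language(cells, classify=None, default="en"):
    """Guess a single language code for a list of cell strings.

    `classify` is any callable returning a (language, score) pair;
    if it is omitted, langid.classify is used.
    """
    if classify is None:
        import langid  # third-party package: pip install langid

        classify = langid.classify
    # One vote per non-empty cell; the most frequent language wins.
    votes = Counter(classify(text)[0] for text in cells if text.strip())
    return votes.most_common(1)[0][0] if votes else default
```

Calling `detect_table_language(cells)` on the column of labels gives a code that could then be passed on as the `language` argument. Short labels can be misclassified individually, which is why the sketch votes across all cells instead of trusting a single one.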

Note that your example is a one-column table. The important feature of bbw is contextual matching with at least two cells in a row. We augment a one-column table by duplicating its column.
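
The augmentation step can be pictured with a small sketch like the following. It assumes the table is a list of rows, and the function name `augment_one_column` is hypothetical, not a bbw function.

```python
def augment_one_column(rows):
    """Duplicate the only cell of single-cell rows; leave other rows unchanged."""
    return [[row[0], row[0]] if len(row) == 1 else list(row) for row in rows]


table = [["Remonstranten"], ["Doopsgezinden"]]
augmented = augment_one_column(table)
```

After this step every row has at least two cells, so matching that relies on a second cell as context has something to work with.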

@thadguidry

@shigapov I wonder if some of that info could be put into the docs or README.md? That seems useful to know. Hmm, maybe start a new /doc folder and begin putting some .md files in there, just as a starting point for users who could also contribute back with PRs!

@hay
Author
hay commented Dec 10, 2020

Thanks, I tried running my list again on the same codebase and the language detection works pretty well. Thanks!

3 participants