Wikidata talk:Lexicographical data

Oxford dictionaries

Hi y'all,

Soufiyouns made 6 property proposals for Oxford dictionaries. Right now, the proposals are going in various directions, and I thought it might be useful to centralize the discussion here.

First, the proposals were strangely on Wikidata:Property_proposal/Authority_control instead of Wikidata:Property proposal/Lexemes (I fixed that, but some people may have missed some of the proposals, since these pages already get little participation in normal times).

Mahir256 had a very interesting question on the 3 English dictionaries: « That's enough with the English dictionaries, don't you think? » I'm wondering, how many is "enough"? I don't think it's just a question of number, but also of content. Sadly, the examples given are about very common words and the motivation is short, so it's indeed hard to see what the value of these identifiers is. For example, are there words that can only be found in these dictionaries, which would make them unique? Also, are they really a "Highly authoritative source"? (Just because it's published by the prestigious Oxford doesn't make them magically great; I had a quick look at the freely accessible part of https://www.oxfordreference.com/display/10.1093/acref/9780191739545.001.0001/acref-9780191739545 and I may have missed something, but it's not really impressive.)

Then, for me, the main problem is that these websites are not fully freely accessible. If the value was clear, maybe we could overlook this, but the two points together make me wonder... The main issue is with homographs: without seeing the content, how can I know which lexemes https://www.oxfordreference.com/display/10.1093/acref/9780191739545.001.0001/b-fr-en-00003-0000001 and https://www.oxfordreference.com/display/10.1093/acref/9780191739545.001.0001/b-fr-en-00003-0000002 are (both "a")?

What do you all think?

Cheers, VIGNERON (talk) 15:55, 8 July 2024 (UTC)

@VIGNERON Note that while ordinarily I would not encourage the addition of entirely paywalled reference properties, Oxford Reference is freely accessible to Wikimedia users through the Wikipedia Library. I am not opposed to the addition of any of these properties, given that bilingual dictionary properties ultimately reduce the amount of time required of non-English-speaking contributors to add sourced information to lexemes. However, given the number of existing properties for English, German, and French lexemes in particular, those proposals are not a priority from my point of view. عُثمان (talk) 16:33, 8 July 2024 (UTC)


Hi @VIGNERON, thanks for opening the discussion. I agree with the various points made.
I supported the creation of the English-Italian dictionary property. The Italian language still needs references and identifiers to support it, especially with regard to current Italian.
Moreover, when searching for a word in Google, the 'dictionary box' at the top of the page presents results from Oxford Languages.
Regarding all the other properties of Oxford dictionaries, I too am not convinced by the quantity. It is easy for them to remain dormant properties.
Therefore, I will not vote for other properties that do not interest me.
Thanks, Luca.favorido (talk) 05:20, 9 July 2024 (UTC)

bilingual/multilingual usage examples

Hi, for usage example (P5831), what should I do if I want to enter the exact same sentence (translations) in different languages? Do I simply add another value with the other language id? For example if I want to translate 5 example sentences from language A into 2 other languages (B and C), does that mean the P5831 will have 5 x 3 = 15 values? Bennylin (talk) 11:41, 23 August 2024 (UTC)

Which languages B and C (out of 6000 others) would you choose and why? --Infovarius (talk) 19:15, 24 August 2024 (UTC)
A is the local language, B is the national language, C is an international language. Bennylin (talk) 06:14, 26 August 2024 (UTC)
@Bennylin: it sounds strange. usage example (P5831) is for an example of this exact lexeme; if you do a translation, then it's not the same lexeme anymore... I would put the translated examples on the corresponding lexemes. What lexeme and examples do you have in mind? Cheers, VIGNERON (talk) 07:55, 8 September 2024 (UTC)
no specific lexeme in particular. well, it seems that it's impossible for now then. thanks for the responses. Bennylin (talk) 09:11, 8 September 2024 (UTC)
Oh, maybe you mean to have an example in the original language and translations into 2 other languages? What about using qualifiers with literal translation (P2441) then? --Infovarius (talk) 19:44, 9 September 2024 (UTC)
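
To make the arithmetic in this thread concrete, here is a minimal Python sketch of the qualifier approach suggested above: one usage example (P5831) statement per source sentence, with each translation attached as a literal translation (P2441) qualifier. The class names, the `build_examples` helper and the language codes `xx-a`, `xx-b`, `xx-c` are hypothetical placeholders for illustration only; this is not the Wikibase data model or API, just a way to count statements under that assumption.

```python
from dataclasses import dataclass, field

P_USAGE_EXAMPLE = "P5831"        # usage example
P_LITERAL_TRANSLATION = "P2441"  # literal translation (used here as a qualifier)


@dataclass
class MonolingualText:
    text: str
    language: str  # placeholder language code, not a real one


@dataclass
class UsageExample:
    """One hypothetical P5831 statement with its qualifiers."""
    value: MonolingualText
    qualifiers: dict = field(default_factory=dict)


def build_examples(sentences, source_lang, translations):
    """Build one statement per source sentence.

    `translations` maps a language code to a list of translated
    sentences, aligned by index with `sentences`.
    """
    examples = []
    for i, sentence in enumerate(sentences):
        example = UsageExample(MonolingualText(sentence, source_lang))
        example.qualifiers[P_LITERAL_TRANSLATION] = [
            MonolingualText(texts[i], lang)
            for lang, texts in translations.items()
        ]
        examples.append(example)
    return examples


# Five sentences in language A, each translated into B and C.
source = [f"sentence {n} in A" for n in range(1, 6)]
translated = {
    "xx-b": [f"sentence {n} in B" for n in range(1, 6)],
    "xx-c": [f"sentence {n} in C" for n in range(1, 6)],
}
examples = build_examples(source, "xx-a", translated)
print(len(examples))  # 5 statements instead of 15
```

Under this modelling the lexeme keeps 5 P5831 values, and the 10 translations live on those statements as qualifiers rather than as separate values.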

What part to integrate in a lemma?

Hi y'all,

On Telegram, Mahir256 asked if helping hand (L310405) and a helping hand (L1368364) are two separate lexemes or not. The answer is probably "not", but I figured I should ask here for more points of view.

Also, behind this case, the question is more general: which parts should be included in the lexeme and which should not?

In most cases, it's obvious and sources are clear (like rain cats and dogs (L1138151), where the verb "rain" is an integral part of the lexeme and is always present).

But in other cases (including this one), some sources include the article and some don't (the same goes for other lexical categories: should we include the verb, the preposition, etc.?).

I had a look at Google Ngram for this case and it seems that the article is indeed almost always used before it. Can you think of other ways to help decide?

@Jsamwrites, Simplificationalizer:

Cheers, VIGNERON (talk) 16:36, 2 October 2024 (UTC)

I would consider them to be the same Lexeme fwiw.
To decide what to include I think the deciding factor should be what is most useful for reusers. @Denny maybe has some input on this from the Abstract Wikipedia side? LydiaPintscher (talk) 17:11, 2 October 2024 (UTC)
Thank you for pinging, @LydiaPintscher. I do not have an opinion as a reuser. I am pretty confident that the Abstract Wikipedia use case, as a reuser, will be able to deal with both ways.
As a volunteer I would say to only have one of the two, I guess, because the additional semantics and lexical knowledge seem entirely compositional from one to the other, so having two lexical entries seems superfluous. --Denny (talk) 13:16, 7 October 2024 (UTC)
@VIGNERON Thanks for asking this relevant question. Interestingly, dictionaries are doing it differently. Where the New Oxford American Dictionary has helping hand without "a" (Reference: helping hand), the Merriam-Webster online dictionary has entries for both "helping hand" and "a helping hand". However, the examples given in Merriam-Webster for "helping hand" have "a". Looking forward to the community decision. John Samuel (talk) 18:06, 2 October 2024 (UTC)

Hyphenation character

After making the Flying Dehyphenator game at https://ordia.toolforge.org/flying-dehyphenator/ I see that a lot of languages do not use the hyphenation character to indicate hyphenation points. Instead, an interpunct, a dash or "." is used. We have had a bit of discussion on the property talk page, Property talk:P5279. A question is whether the use of interpunct, dash and "." should be regarded as an error or whether people want to have it that way. Can I "correct" the character, changing it to the hyphenation character, in languages I do not know, or should I refrain from that? Note that there are hyphenation statistics in Synia: https://synia.toolforge.org/#hyphenation Finn Årup Nielsen (fnielsen) (talk) 11:27, 7 October 2024 (UTC)

Sorry for the off-topic: what can I do in your game? I didn't get it. --Infovarius (talk) 10:09, 9 October 2024 (UTC)
You can append hyphenation parts ("syllables"). A right one gives you two points, a wrong one minus one point. Danish (Q9035) and Portuguese (Q5146) are probably the best languages to play. You can cheat and see the current hyphenation start parts for Portuguese (Q5146) in Synia (Q121294613) here: https://synia.toolforge.org/#language/Q5146/hyphenation (there are not that many). Finn Årup Nielsen (fnielsen) (talk) 12:48, 15 October 2024 (UTC)
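
As a rough illustration of the separator question raised at the top of this thread, the Python sketch below classifies hyphenation values by the separator they use and shows what a "correction" would do. It assumes that "the hyphenation character" means U+2027 HYPHENATION POINT, that the interpunct is U+00B7 and that the dash is a plain hyphen-minus; the sample values are made up, not taken from Wikidata. Whether running `normalize` on languages one does not know is appropriate is exactly the open question, so tallying first may be the safer step.

```python
from collections import Counter

# Assumption: the "hyphenation character" is U+2027 HYPHENATION POINT.
HYPHENATION_POINT = "\u2027"

# Assumption: these are the alternative separators mentioned in the thread.
ALTERNATIVES = {
    "\u00b7": "interpunct",
    "-": "hyphen-minus",
    ".": "full stop",
}


def classify(value):
    """Return the name of the separator found in a hyphenation value."""
    if HYPHENATION_POINT in value:
        return "hyphenation point"
    for char, name in ALTERNATIVES.items():
        if char in value:
            return name
    return "no separator"


def tally(values):
    """Count how many values use each kind of separator."""
    return Counter(classify(value) for value in values)


def normalize(value):
    """Replace the alternative separators with the hyphenation point.

    Caution: a hyphen or full stop may be a genuine part of the word,
    so this should not be applied blindly.
    """
    for char in ALTERNATIVES:
        value = value.replace(char, HYPHENATION_POINT)
    return value


# Made-up sample values for illustration.
samples = ["hy\u2027phen", "hy\u00b7phen", "hy-phen", "hy.phen", "word"]
counts = tally(samples)
print(counts["interpunct"])  # 1
```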

Unwanted or deprecated lexeme sense?

Pedestrian crossing in Swedish has two lexemes, of which one (Övergångsställe, L54968) is the preferred, "correct" word to use, while the other (Skyddsväg, L706581) is a Finlandism and not to be used (but "accidentally" used by many in Finland).

Is there a P-property to mark Övergångsställe as the preferred of the two? Or to mark Skyddsväg as "unwished" or "deprecated"? Thank you! Robert (talk) 12:18, 7 October 2024 (UTC)

@Robertsilen: Is there a source you can provide for the dispreference for skyddsväg? If there is, then for now you could add language style (P6191) desuetude (Q109986704) to the sense on that lexeme and add that source as a reference to that statement. Mahir256 (talk) 17:56, 7 October 2024 (UTC)
The template at the bottom of the Wikidata:Lexicographical_data page lists a number of values, e.g., obsolete form (Q54943392) and depreciative form (Q54948374). Rather than language style (P6191), I think they would be used with instance of (P31) or has characteristic (P1552). Finn Årup Nielsen (fnielsen) (talk) 13:44, 15 October 2024 (UTC)
I think I might correct myself. archaism (Q181970) I would use with language style (P6191). I am not really sure what the differences are between these items, e.g., obsolete word (Q12237354). Finn Årup Nielsen (fnielsen) (talk) 13:51, 15 October 2024 (UTC)

What is mo?

What is the mo language code supposed to represent, and in what capacity is it supposed to be used?

  1. the official language (in Latin script) of Moldova (Q217), the way it was named until 2023 in the Constitution? (nowadays the Constitution simply says Romanian)
  2. an official language (in Cyrillic script) of Moldavian Soviet Socialist Republic (Q170895) (dissolved 1991)?
  3. an official language (in Cyrillic script) of Transnistria (Q907112)?
  4. some form of the language spoken in the past in the historic region of Bessarabia (Q174994) / Principality of Moldavia (Q10957559)?
  5. something else?

Knowing whether this language code is in Latin script or Cyrillic would be a good starting point, because I have seen it used in both, which is unacceptable. Thank you in advance. Gikü (talk) 20:01, 4 November 2024 (UTC)

Same question about ro-md. I've seen it in lexemes. Gikü (talk) 20:31, 4 November 2024 (UTC)