Wikidata talk:Living people

SpBot archives all sections tagged with {{Section resolved|1=~~~~}} after 7 days. For the archive overview, see Archive/. The latest archive is located at Archive/2024.

Proposal

I think something like this is necessary but it needs to have a little bit of teeth and at the very least live up to the resolution of the board of the Wikimedia Foundation. Asking that the statements be "supported by information in at least one corresponding Wikipedia article, that (in the article) has a citation to a reliable[note 1] source" is not enough because there's no reason to think that the source (even if we identify it as reliable) has anything to do with the statement being added to Wikidata. If a statement is possibly controversial or privacy-invading, then it should not be added without a direct reliable source. In other words, it can only be imported from Wikipedia if we can also import the reference for that claim. Pichpich (talk) 02:37, 15 June 2013 (UTC)

When writing this, I was hoping for Wikidata to be able to use its own sourcing system. However, I'm not sure that the system is ready yet, and thus put in the Wikipedia sourcing as a temporary measure.--Jasper Deng (talk) 03:18, 15 June 2013 (UTC)
I guess I have a stricter view on this. We don't have sourcing, ok, let's work on that. But until we have sourcing, we don't add potentially controversial statements about living people, period. It's not a huge problem: there are plenty of other statements that we can add and source (even under the current limitations) and, as far as I understand, tools for sourcing will be available in the near future. Pichpich (talk) 03:09, 16 June 2013 (UTC)
Currently this policy makes no distinction between potentially controversial statements and statements that aren't. ChristianKl (talk) 08:15, 9 July 2016 (UTC)

Must and should

Given the definitions of must and should as laid out in RFC 2119, is the word must really the right one for this policy? Especially when it comes to information on talk pages. ChristianKl (talk) 08:21, 9 July 2016 (UTC)

The "must" is indeed intended in the IETF sense, because of the importance of verifiability for living persons' information, as stated in the Foundation resolution.--Jasper Deng (talk) 08:24, 9 July 2016 (UTC)
Does this mean that if a lobbyist edits a Wikidata entry, you generally intend to forbid any discussion on Wikidata of the fact that the user account is used by a lobbyist?
Furthermore, the Foundation resolution doesn't state that information must be verifiable, or even that it should be, in the RFC 2119 sense. ChristianKl (talk) 09:04, 9 July 2016 (UTC)

Following up a sentence that contains `must` with one that contains `especially` also makes no sense under the RFC 2119 meaning. ChristianKl (talk) 09:20, 9 July 2016 (UTC)

Additions

I added a definition section and sections on controversial statements and privacy concerns, based on the above discussion and discussion at Project Chat. Further edits welcome. I am wondering whether "blood type" really belongs in the controversial category. ArthurPSmith (talk) 15:53, 3 April 2017 (UTC)

That's medical information and in most countries very strictly protected by privacy laws. Why in the world is there a blood type field for people in any case? So weird. Jytdog (talk) 09:54, 9 April 2017 (UTC)
In Japan, the blood type of people has cultural significance. There are folk beliefs about how personality correlates with blood type. As a result, the blood type of celebrities is often listed in Japanese pop biographies, and the Japanese Wikipedia frequently includes information about blood type. It would be worth checking how well referenced the Japanese Wikipedia is in this regard to decide whether we should require sources. ChristianKl (talk) 08:29, 11 April 2017 (UTC)
wow. crazy. the stuff we learn while doing this work! but oy, mixing up celebrity gossip with medical information is a mess. Jytdog (talk) 19:40, 11 April 2017 (UTC)

Special bot policy?

On Project Chat the issue of special scrutiny for bots editing items about living people was raised, so should there be a special review procedure documented here for this case? ArthurPSmith (talk) 15:55, 5 April 2017 (UTC)

in my view, heck yes! Jytdog (talk) 09:54, 9 April 2017 (UTC)
This reminds me of the incident of adding the CHEMBL data about illness-drug associations to drug or therapy used for treatment (P2176). It might be worthwhile to think more generally about how we review bot activity. ChristianKl (talk) 08:32, 11 April 2017 (UTC)
@Jytdog, ChristianKl: OK, I added another section specifically on bot approval; is this perhaps suitable? ArthurPSmith (talk) 13:16, 11 April 2017 (UTC)
Could you give an example of a closed database from which you want to forbid data import with this policy? Otherwise I don't see anything objectionable. ChristianKl (talk) 16:13, 11 April 2017 (UTC)
That third bullet is likely to be a deal killer. If you plan to run an RfC to make this policy, it might be wise to take it out of the proposal and instead propose it as a separate item for people to !vote on. Jytdog (talk) 19:44, 11 April 2017 (UTC)
Given that the third point uses should and not must, I don't think the point is likely to be a deal killer. ChristianKl (talk) 11:04, 12 April 2017 (UTC)
ChristianKl: http://www.whitepages.com, for example. There are many such services, some of which provide some limited data for free and then charge for more detail on individuals. I don't think any of them should be used, whether from their free data or premium data, as sources for living people info in Wikidata. ArthurPSmith (talk) 16:08, 12 April 2017 (UTC)
Okay, I agree that it's sensible to not directly import the information from databases like http://www.whitepages.com . ChristianKl (talk) 17:42, 12 April 2017 (UTC)
"adequate BLP policy"? who decides, Asaf? "community is happy with as a reliable source"? what happens when we have a consensus in a community that does not agree with English? it seems rather top-down, and adversarial. why are you not creating an LP upload group to coach data uploaders? they can develop their standards of practice. why not survey uploaders and ask for their standards? why dictate? Slowking4 (talk) 21:13, 13 April 2017 (UTC)
We already have a process for approving bots. This proposal is not about having a totally new way to approve bots but just suggests that certain issues should be considered when approving. ChristianKl (talk) 15:04, 14 April 2017 (UTC)
Do you have some examples of problematic edits by bots, that would be trapped by this? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 15:07, 14 April 2017 (UTC)
@Pigsonthewing: I remember one bot that automatically scraped social network pages to link them to individual people. We already rejected that bot with our present way of forming consensus, but I think it's valuable to have an explicit policy about what violates privacy instead of making case-by-case decisions. I don't expect the bot policy as written to change the current status quo of how we deal with bots in a significant way. ChristianKl (talk) 07:54, 30 May 2017 (UTC)
By "bot", do we also mean users who use tools like PetScan or QuickStatements? --ValterVB (talk) 15:28, 14 April 2017 (UTC)
I don't think we need to define the term bot for the purposes of this policy page. The decision of what counts as a bot should be up to our bot policy. To the extent that the bot policy is currently unclear about when usage of QuickStatements counts as bot editing, making it clear is a worthwhile project, but I don't think it's the project of this policy. ChristianKl (talk) 08:19, 30 May 2017 (UTC)

openly supplied by the individual themselves

you have a conflict between "unless they can be considered widespread public knowledge or openly supplied by the individual themselves" and "reliable open information sources such as newspapers or other media outlets", e.g. a Twitter disclosure of sexual orientation or gender identity. do you wait for secondary sources to report it, which may take a while for less notable people, or link to a blog or Twitter primary source, which is less reliable? Slowking4 (talk) 14:08, 16 April 2017 (UTC)

I think the idea is that we don't want to have a bot that simply scrapes all information from social media profiles and copies that information into Wikidata. As a result, the standards for bot work are a bit stricter. ChristianKl (talk) 16:28, 16 April 2017 (UTC)

Information on non-item pages

We still need a section about non-item pages. Does anybody have good ideas of how to word it? ChristianKl (talk) 15:57, 18 April 2017 (UTC)

For non-item pages I would guess Wikidata is more similar to the language Wikipedias; enwiki's non-article space policy section could be copied with minor changes, for instance? ArthurPSmith (talk) 19:28, 18 April 2017 (UTC)
That sounds like a good idea. I copied the section from enwiki and made a few adaptations. Feel free to refine the text further. ChristianKl (talk) 21:35, 18 April 2017 (UTC)
looks good to me, thanks! ArthurPSmith (talk) 19:34, 19 April 2017 (UTC)

Scope

It seems a bit odd that this page outlines what we can say about living people, but not whether we should include them. Broadly speaking, Wikidata:Notability means that no-one who isn't a public figure to some degree should have a Wikidata item, but some of the people we include are very much on the edge of public, and this does feel a little uncomfortable at times. I wonder if we should have a section here that effectively says "if this person is not a public figure, please consider whether it is appropriate to include them". Andrew Gray (talk) 22:28, 29 May 2017 (UTC)

Currently, Wikidata:Notability says nothing about a person having to be a public figure to be included in Wikidata. It just requires serious and public sources that can be used to describe the person. This policy page is not about changing our current definition of notability. ChristianKl (talk) 10:44, 30 May 2017 (UTC)
Wikidata will naturally have far more items for living people than a wikipedia for structural reasons - many of our properties want items as values, so we often create the items for people who are related to a notable person or other entity, which certainly doesn't mean those people are public figures in themselves. ArthurPSmith (talk) 14:33, 30 May 2017 (UTC)

Information retrieved from Wikipedias

The information that is included in any of the WMF projects has the same underlying premise: the policy of the Foundation. Once a Wikipedia has entered data it must be understood that the data is ok. This is the only way that allows the current practice of populating Wikidata from the Wikipedias. When this is not accepted, current practices are no longer possible and Wikidata will die slowly. Its quality will go down.

There is another side to this: when a Wikidata statement is challenged, it follows that the upstream data is challenged. This means that we should seek a closer link with the projects we gain our data from. It means that we accept as good what we share as being the same. As I have argued all too often already, this is where our time spent on details pays off. Plenty of examples are available, but given the sheer amount of data in Wikidata, this policy will destroy Wikidata when it is applied in a Wikipedia manner. Thanks, GerardM (talk) 06:08, 11 September 2017 (UTC)

"Once a Wikipedia has entered data it must be understood that the data is ok". Ideally this would be true. It isn't. And promulgating practices based on that decreases data quality. Nikkimaria (talk) 00:40, 12 September 2017 (UTC)
"promulgating practices based on that decreases data quality." = false statement; rather, data quality remains unchanged. but better to have a data quality improvement process here, to then push out to the wikipedias, if some people would allow incorporating data from wikidata. you will not increase quality by slogans or by gatekeeping behavior. Slowking4 (talk) 17:52, 12 September 2017 (UTC)
A basic level of gatekeeping is key to maintaining data quality - not a slogan, just a fact. Nikkimaria (talk) 19:34, 12 September 2017 (UTC)
no, a basic level of quality control is key to maintaining data quality. the perpetual attempt to do so by the failed methods of gatekeeping, or increasing the scrap rate is not based on factual evidence, it is based on an ideology. Slowking4 (talk) 16:07, 14 September 2017 (UTC)
"Once a Wikipedia has entered data it must be understood that the data is ok" is based on an ideology, not on factual evidence. What do you see as the difference between "a basic level of quality control" and "gatekeeping"? Nikkimaria (talk) 02:56, 15 September 2017 (UTC)
yes, your arguments are based on an ideology only, and you have presented no factual evidence. "it must be understood that the data is ok" is a starting point; and "practices based on that decreases data quality" is false; rather all data's quality can be improved. there is no go/no go about data, and you have no rational standard to determine your go/no go. the notion that you only get one bite at the quality apple upon upload or link is false, and not a rational basis to improve quality. clearly you need to go to school on quality. start here: W. Edwards Deming. when you can discuss the 14 principles, then we can collaborate, but not before. Slowking4 (talk) 16:45, 15 September 2017 (UTC)
"Eliminate the need for massive inspection by building quality into the product in the first place". Nikkimaria (talk) 18:21, 15 September 2017 (UTC)
"Eliminate slogans, exhortations, and targets for the work force asking for zero defects and new levels of productivity. Such exhortations only create adversarial relationships, as the bulk of the causes of low quality and low productivity belong to the system and thus lie beyond the power of the work force." Slowking4 (talk) 02:13, 16 September 2017 (UTC)
In this particular case, the "system" is indeed within the "power of the work force" - we as a community develop policies, guidelines, and practices that can address causes of low quality and other problems. That's the whole point of this draft, for example - to develop a standard of quality control for a particular type of data. Nikkimaria (talk) 02:54, 16 September 2017 (UTC)
what have you ever done to improve quality in a non-adversarial way? i see no standard, or work to improve a system, but rather a veto with constantly shifting rationale, "no the quality is still not good enough not to delete"; and "if you persist, i will block you." Slowking4 (talk) 09:11, 16 September 2017 (UTC)
Again, the point of this draft and others is to develop a quality standard and improve the system. If you choose to take an adversarial approach there's really nothing I can do about that, other than to say I think this conversation has gone well past the point of usefulness. Nikkimaria (talk) 13:13, 16 September 2017 (UTC)
  • Ok, I blogged about an error in English Wikipedia. It came to light on Wikidata thanks to a quality assessment query. I have not made the correction on en.wp because of its adversarial stance. I made several proposals that enable better quality assurance practices for all Wikipedias and Wikidata. They are based on the premise that once projects disagree on a statement, a fact, it needs attention. By concentrating on our differences and not on where our data agrees, we focus on quality issues. My point is very much that I regularly find issues with data retrieved from English Wikipedia, which come to light by comparing it with the data from other Wikipedias.
My question to anyone who wants a BLP practice for Wikidata: how can you insist on the one that is proposed when collaboration improves quality for all our projects? Thanks, GerardM (talk) 14:05, 16 September 2017 (UTC)
I have no strong opinion about the "draft" or not here, since we already have a sort of Global policy. But I see one large flaw in your reasoning above, GerardM. Yes, we discover false statements with the help of cooperation, but that does not stop them from being re-added here. I have removed the same "coat of arms", "sister cities" and "official website" claims hundreds of times from the same items, but they are constantly re-added here. Wikidata has not changed the behaviour on Wikipedia, only documented its flaws. We need a new, better strategy, not one that only repeats the same mistakes we have made for more than a decade on Wikipedia. -- Innocent bystander (talk) 16:04, 16 September 2017 (UTC)
That is exactly the problem. We do not cooperate on BLP. Thanks, GerardM (talk) 21:22, 16 September 2017 (UTC)
Or anywhere else! My solution to that problem is that every claim imported from an infobox on Wikipedia should at the same time be removed from that infobox. Otherwise there is no cooperation. -- Innocent bystander (talk) 07:53, 17 September 2017 (UTC)
I agree with you, Innocent bystander, but this only works where infoboxes are Lua-generated from Wikidata items… That is the case for most ruwiki person templates, but some projects simply refuse it, considering that infoboxes are the sole responsibility of the article editors..., and Wikidata-driven infoboxes are a very controversial issue :( -- see frwiki, for instance. --Hsarrazin (talk) 09:29, 17 September 2017 (UTC)
This is more or less exactly what I do on svwiki, not with human (Q5) items, since that is controversial, but with geographic places. But I prefer to add data from good sources, not from Wikipedia. -- Innocent bystander (talk) 09:49, 17 September 2017 (UTC)

ACM on transparency and accountability

the ACM has issued a policy on transparency and accountability

the following principles should be included in a BLP policy.

  1. access and redress: we should have a simple interface and reaction team to respond to subjects of the data
  2. explanation: we should have a simple explanation of where BLP data comes from and possible uses of the data
  3. provenance: we should provide a permanent trail of where the data came from
  4. validation and testing: we should provide a quality control chart, and test proposed policies about BLP and only implement those that improve data quality.

Slowking4 (talk) 16:39, 14 September 2017 (UTC)

According to the ACM, the document you reference is about algorithms, where "An algorithm is a self-contained step-by-step set of operations that computers and other 'smart' devices carry out to perform calculation, data processing, and automated reasoning tasks."
Wikidata doesn't do automated reasoning and data processing. Wikidata is an open database where data can come from a variety of sources and can be used for a variety of purposes. What kind of explanations would you like to see in the text?
When you call for a permanent trail of where data comes from, are you demanding that we have a bot that automatically removes unreferenced data from BLP items? ChristianKl (talk) 20:43, 14 October 2017 (UTC)
It is wonderful that the ACM has something to say. However, "should" implies that we have no choice. This is exactly where you are wrong. As it is, there is a plethora of sources for our information, so it will not be simple. In many of our practices we do not register where data comes from, and just because an outside organisation has some wise words to say, it does not follow that we should. When it comes to validation and testing, there are plenty of opportunities where we could make a qualitative difference. We have known about these for a long time and we don't act on them. Now first do what we easily could do before telling us what we should do. Thanks, GerardM (talk) 08:08, 7 December 2017 (UTC)

alternate draft language

replace the draft with the following:

Data quality is an essential value of the Wikidata community. This is especially true for data about living people.

principles

  1. Wikidata will provide for special attention to the principles of neutrality and verifiability in data about people.
  2. Personal privacy will be respected, especially for people who are not public figures;
  3. Wikidata will investigate new technical mechanisms to assess edits, and provide reports to interested projects and quality circles, to assess and improve data quality; Wikidata will implement quality data management.
  4. Wikidata will institute a landing page, safe space, and response team to respond to complaints about data about people in Wikidata with patience, kindness, and respect, and will encourage others to do the same.

implementation

  1. Wikidata:Quality improvement
  2. Wikidata:Privacy
  3. Wikidata:LP technical
  4. Wikidata:Lounge

further reading

discussion

Proposing to resolve this with another layer of indirection? But this means we'll have at least 4 pages to argue over, instead of 1... On privacy, the current page has some guidelines in this regard; do you have anything further in mind beyond what's already stated here? When you say "Wikidata will", what does that mean in practice? Administrators will enforce ...? The community will be expected to ...? WMDE/WMF will ...? ArthurPSmith (talk) 13:05, 18 September 2017 (UTC)
i thought it was quite direct. agree on general principles, and then work out implementation as we go. it is action oriented, so no argument, merely work. the practice will be in the implementation. i did not mention enforcement. there is no requirement for micromanagement of implementation or for listing rules enforcement: that is an artifact of another community. the long citation list shows that data quality principles are in every database software package. nothing to debate there. there is m:Privacy policy, but if you want to reinvent that wheel, go for it. Slowking4 (talk) 03:20, 19 September 2017 (UTC)
We don't have documents that define work that should be done, because there's nobody to whom an RFC can assign the task of developing an entirely new software feature. Wikidata isn't developing software. WMDE is developing software with support from the WMF, and as it's an open source program, everybody can pitch in and submit their own code.
If you want to write a landing page, there's no need to have a document that says we should have a landing page before you write a draft of one. ChristianKl (talk) 21:25, 14 October 2017 (UTC)
we do not have documents or a plan, because there are no managers, merely nobodies who prefer drama. there are plenty of people, but they respond to leadership, not drama. WMDE could support LP technical with WMF, but it would require volunteer support. the landing page exists, and is linked to, but again it would require volunteer support, and a grant as was done with Teahouse. documenting the process is necessary in order to make clear to the obtuse just how the project will meet the global policy to the letter, and will not be derailed by other agendas. Slowking4 (talk) 03:20, 15 October 2017 (UTC)
If you want a grant to tackle such a problem with a grant-funded project, write a grant proposal. The decision of who gets grants to do what isn't made via RFCs or other internal policy documents on Wikidata. ChristianKl (talk) 13:20, 15 October 2017 (UTC)

Example of a problematic bot edit

I thought this is an interesting case of a problematic biographical bot edit: Byron De La Beckwith, a KKK member whose notability/notoriety is the result of murdering a civil rights leader and harassing activists, was described as a "recipient of the Purple Heart medal" by User:PLbot. I'm sure that bot does a lot of good work, but it's a case in point of why we need to be careful with descriptions of living people in particular: determining exactly what belongs in a description and what doesn't requires some human judgment, or a smarter bot.--Eloquence (talk) 23:14, 18 October 2017 (UTC)

I don't see a substantial problem with that description. The living people policy exists to protect the people for whom we have entries, and when a bot writes a description that errs by being more favorable than we would otherwise write, that's okay. ChristianKl (talk) 23:28, 18 October 2017 (UTC)
It's a fair point that this edit was not harmful to the person (who, in any event, is no longer "living"), but I think it was harmful to readers and re-users nonetheless. These descriptions have powerful real-world consequences: I personally came across it in the Wikipedia app and had an instant "WTF" reaction, suspecting vandalism (for comparison, Britannica's equivalent description is "American assassin"). But Sjoerd makes the good observation that the bot only bears part of the blame. In the scope of this policy, I do think it's worth treating edits to descriptions with particular care, given that they're often used to encapsulate a subject as a whole.---Eloquence (talk) 23:53, 18 October 2017 (UTC)
Imported from Persondata, not sure if you can blame the bot. Sjoerd de Bruin (talk) 23:36, 18 October 2017 (UTC)
Yes, that's true; PLbot appears to have merely copied it over. Thanks for pointing that out. I don't know what the overall quality of these "short descriptions" from persondata was; this one appears to have been originally added semi-automatically.--Eloquence (talk) 23:53, 18 October 2017 (UTC)
One of the problems with persondata was that it was hidden from view, so that errors, vandalism and statements which became outdated were not corrected. As you've shown, exposing them to more eyeballs is the best way to get such problems fixed. The reason for the semi-automated edit on Wikipedia was that, at the time, the article was in the now-defunct Category:Recipients of the Purple Heart medal; that was added on 4 February 2012, also semi-automatically; presumably because of this prose edit adding the same claim, possibly cited to [1], but that's paywalled. This 2001 Washington Post article, assuming it wasn't sourced from Wikipedia, supports the claim. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 13:37, 19 October 2017 (UTC)
That makes sense, Andy! Thanks for unpacking how this made it into the description. I have no reason to believe that the claim is false; it just doesn't belong in the short description of a convicted white supremacist murderer.--Eloquence (talk) 19:41, 19 October 2017 (UTC)
"undue weight in item descriptions" is not a living person problem. making factual statements that may be supported by a reference adds to data quality. maybe you should add purple heart back given the reference cited above. Slowking4 (talk) 18:08, 23 October 2017 (UTC)

How do we deal with requests for removal of personal information

The Wikimedia board resolution on BLP says that one principle is supposed to be: "Taking human dignity and respect for personal privacy into account when adding or removing information, especially in articles of ephemeral or marginal interest".

I would translate this into: "Request for removal of information - If the subject of an item requests removal of specific information on the item and that information isn't of public interest, an administrator can delete and oversight it."

@ValterVB undid this edit with the suggestion that any information that can be publicly sourced is fair game. I think there's plenty of information published on blogs and social networks that can be sourced, but where gathering all information about a given person can make the person feel that their privacy is violated. I can get a validly sourced birth date by having two archive.org archived pages that show the age of a person on a forum, which narrow down their birth date. That doesn't mean the person wants their birthday to be public knowledge, and they may have a reasonable interest in getting it removed. ChristianKl 12:50, 24 November 2017 (UTC)
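The arithmetic behind the archived-pages example can be sketched in a few lines of Python. This is a minimal illustration, assuming each archived snapshot records the date it was taken and the age shown on the profile that day; the function names and the sample dates are hypothetical and invented for illustration, not taken from any real case. Each (date, age) pair confines the birth date to a one-year window, and intersecting two windows can shrink it considerably.

```python
from datetime import date, timedelta


def years_before(d: date, n: int) -> date:
    """Same calendar day n years earlier.

    Simplifying assumption: 29 February maps to 28 February when the
    target year is not a leap year.
    """
    try:
        return d.replace(year=d.year - n)
    except ValueError:
        return d.replace(year=d.year - n, day=28)


def birth_interval(observed_on: date, age: int) -> tuple[date, date]:
    """Inclusive (earliest, latest) birth dates for someone shown as
    `age` years old on `observed_on`."""
    # Latest: they turned `age` on the observation day itself.
    latest = years_before(observed_on, age)
    # Earliest: they turn `age + 1` on the day after the observation.
    earliest = years_before(observed_on, age + 1) + timedelta(days=1)
    return earliest, latest


def narrow_birth_date(observations: list[tuple[date, int]]) -> tuple[date, date]:
    """Intersect the windows implied by several (snapshot date, age) pairs."""
    intervals = [birth_interval(d, a) for d, a in observations]
    earliest = max(lo for lo, _ in intervals)
    latest = min(hi for _, hi in intervals)
    if earliest > latest:
        raise ValueError("snapshots are inconsistent with each other")
    return earliest, latest


# Two hypothetical archived snapshots of the same forum profile.
snapshots = [(date(2016, 3, 1), 29), (date(2016, 9, 1), 30)]
print(narrow_birth_date(snapshots))
```

With the two invented snapshots above, the birth date is narrowed to between 1986-03-02 and 1986-09-01, a window of about six months, even though each page on its own only reveals an age.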

Not every item has NoValue statements

@Hsarrazin: You changed "date of birth (P569) is missing or less than 115 years ago or date of death (P570) has NoValue" to "date of birth (P569) is missing or less than 115 years ago and date of death (P570) has NoValue" on the grounds that it would declare people living in the 12th century as living. It doesn't. NoValue shouldn't be set for date of death (P570) for people in the 12th century. The clause is in the document to make it possible to give a person who's 117 years old NoValue for date of death (P570) and thus mark them as a living person for the sake of this policy. ChristianKl 15:19, 4 December 2017 (UTC)

@ChristianKl:
the criterion as you stated it is "P569" missing OR "P570" is "novalue"... -> which means that "P570" is "novalue" (the 3rd criterion) is enough... no need to add it to the 2nd criterion… why do you add it to the 2nd criterion with OR?
and I agree with @Jura1: "NoValue" should not be used on P570 for human beings...

--Hsarrazin (talk) 20:18, 4 December 2017 (UTC)

As written, an item is considered to be a living person if all three criteria hold. If just criterion (1) and criterion (3) hold, the person is not considered to be living for the sake of the policy. Given that this usage seems to be confusing and there are arguments against using NoValue here, I rewrote criterion (2) to use floruit (P1317) to allow us to handle the edge cases of people who might live past 115. ChristianKl 21:08, 4 December 2017 (UTC)
I don't quite get why NoValue is mentioned at all. Does that mean that if there is a death date, but it's incorrect (the person is alive), the policy doesn't apply?
--- Jura 11:25, 7 December 2017 (UTC)
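The disagreement above hinges on whether the NoValue clause in criterion (2) is joined with OR or AND. The following minimal Python sketch compares the two wordings quoted in this thread. It models only criterion (2); criteria (1) and (3) are not quoted here and are not modelled, nor is the later rewrite using floruit (P1317). The `PersonItem` class, its field names and the sample dates are hypothetical simplifications, not the Wikibase data model or any Wikidata API.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

MAX_AGE_YEARS = 115  # threshold quoted in the policy wording discussed above


@dataclass
class PersonItem:
    """Hypothetical, simplified item; not the Wikibase data model."""
    date_of_birth: Optional[date]  # date of birth (P569); None if missing
    death_is_novalue: bool         # True if date of death (P570) is NoValue


def age_in_years(born: date, today: date) -> int:
    # Subtract one if this year's birthday has not happened yet.
    return today.year - born.year - ((today.month, today.day) < (born.month, born.day))


def birth_clause(item: PersonItem, today: date) -> bool:
    """'P569 is missing or less than 115 years ago'."""
    if item.date_of_birth is None:
        return True
    return age_in_years(item.date_of_birth, today) < MAX_AGE_YEARS


def criterion2_or(item: PersonItem, today: date) -> bool:
    """Wording before the change: ... or P570 has NoValue."""
    return birth_clause(item, today) or item.death_is_novalue


def criterion2_and(item: PersonItem, today: date) -> bool:
    """Wording after the change: ... and P570 has NoValue."""
    return birth_clause(item, today) and item.death_is_novalue


today = date(2017, 12, 4)
supercentenarian = PersonItem(date(1900, 6, 1), death_is_novalue=True)  # 117 years old
typical = PersonItem(date(1980, 5, 5), death_is_novalue=False)          # no NoValue statement
medieval = PersonItem(date(1150, 1, 1), death_is_novalue=False)

for name, item in [("supercentenarian", supercentenarian), ("typical", typical), ("medieval", medieval)]:
    print(name, criterion2_or(item, today), criterion2_and(item, today))
```

Under the OR wording the NoValue clause only matters for people aged 115 or over, which is the 117-year-old case described above; a 12th-century person passes only if someone incorrectly sets NoValue on P570, which is Hsarrazin's concern. Under the AND wording, criterion (2) fails for every item without a NoValue date of death, which is most items, as the section title points out.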

Channel for users to report request for removals

I'm currently thinking about whether there's a better way than asking people to come to the admin board or contact individual admins. Does anyone have other preferences? Ideally something where the person can report privately and then all admins can read the request. ChristianKl 21:55, 4 December 2017 (UTC)

There is an OTRS queue for Wikidata. Emails sent to info at wikidata.org go there. --Lydia Pintscher (WMDE) (talk) 09:38, 5 December 2017 (UTC)
@Lydia Pintscher (WMDE): Thanks, that looks good. Who has access to the OTRS queue? Can we add a new email address like "privacy@wikidata.org"? ChristianKl 10:47, 5 December 2017 (UTC)
Some editors have access, but I am not sure who. Sorry. I believe new addresses can be added. --Lydia Pintscher (WMDE) (talk) 10:59, 5 December 2017 (UTC)
@Lydia Pintscher (WMDE): From reading the documentation on https://meta.wikimedia.org/wiki/OTRS/Access_policy, it seems that there are "Role accounts". Can you look into (or delegate) the question of whether we could configure the system so that all Wikidata admins automatically have access to an OTRS queue that receives the information that gets mailed to "privacy@wikidata.org"?
To me that seems like a good technical solution. Telling people to contact individual admins is problematic, given that activity levels among admins differ and some admins might not be interested in handling these requests. If you have any other ideas about how to create the interface for voicing privacy concerns, I would also appreciate hearing them.
The OTRS page mentions Wikidata Staff as a role. Maybe only Wikidata Staff get to read info@wikidata.org at the moment? If that's the case and we do go the route of using OTRS for "privacy@wikidata.org", it might also make sense to make info@wikidata.org accessible to Wikidata admins. I also wouldn't mind giving Wikidata Staff access to "privacy@wikidata.org" as well. ChristianKl 11:27, 5 December 2017 (UTC)
No, only some editors have access to it. I can't read it, and neither can anyone else on my team. It is probably best if you ask one of them about the process and access. Sjoerd maybe? --Lydia Pintscher (WMDE) (talk) 15:56, 5 December 2017 (UTC)
I received access to the info-wikidata queue after a request on the OTRS wiki. I think users without current OTRS access can do the same at meta:OTRS/Volunteering. I think if we get more new oversighters in various time zones, we can just handle it at oversight@wikidata.org. Sjoerd de Bruin (talk) 16:05, 5 December 2017 (UTC)
@Sjoerddebruin: Simply getting oversighters in various time zones doesn't seem like an easy task to me, given that it's important to keep oversight rights to a small number of trusted people. I'm, however, okay with the information simply going into the current OTRS system. I think it's worthwhile to have a specialized email address so that we can better identify the reason for incoming requests. Even if we currently don't get enough requests for it to be a problem, deciding now on having two email addresses will allow us later to send the requests to a different queue if we need to because of an increased volume of requests.
What's the volume of Wikidata related requests at the moment? ChristianKl 23:56, 5 December 2017 (UTC)

Can we find a way to mark in the references whether something is "widespread public knowledge" or "supplied by the individual themselves"?

I think it would be great if we could find a way for people to mark statements that are "widespread public knowledge" or "supplied by the individual themselves".

Maybe type of reference (P3865) "supplied by the individual themselves"? ChristianKl () 23:09, 4 December 2017 (UTC)

Thinking a bit more about this, I think specifying a way to do this adds too much complexity for now, so I don't think we need to specify a notation yet. Maybe we can experiment with a notation and revisit later whether this policy should relate to it. ChristianKl () 12:48, 6 December 2017 (UTC)

Labels/Description/Aliases

This section is currently very short. It might make sense to expand here on what it means to be neutral. Does anybody have a good idea? ChristianKl () 10:39, 7 December 2017 (UTC)

Hmm, Wikidata does not have a Wikidata:NPOV page. We could reference en:WP:NPOV which has stuff like "representing fairly, proportionately, and, as far as possible, without editorial bias, all of the significant views that have been published by reliable sources on a topic" but that's not necessarily applicable to labels I think. ArthurPSmith (talk) 15:54, 7 December 2017 (UTC)
Representing all the significant views isn't what a description is supposed to do, so it would make more sense to say that "neutral" doesn't mean what the enWiki policy says in that paragraph instead of saying that it's supposed to be read that way. ChristianKl () 14:25, 10 December 2017 (UTC)

Deletion

It makes sense to mention deletion of items within the context of administrative actions that could be taken, rather than as a standalone section. Nikkimaria (talk) 18:37, 7 December 2017 (UTC)

If you want to argue that it's an action that could be taken, then it might make sense to put it into the section on administrative actions that could be taken. However, you argue that they should be taken. Specific things that should be done each have their own sections in this policy.
One of the advantages is that it makes it easier to vote in the RfC on which actions should be taken. ChristianKl () 14:22, 10 December 2017 (UTC)
I've changed the phrasing to match the other noted administrative actions. Nikkimaria (talk) 14:35, 10 December 2017 (UTC)
Okay, I'm fine with that wording. For explanation: we currently have the case of the Black Lunch Table. There are a few hundred items about living people that just have a name, instance of (P31), sex or gender (P21) and catalog (P972) Black Lunch Table (Q28781198). This means that the notability of the items in question is questionable. I do think it's reasonable to decide whether we want those items by having a discussion about them, but I see no reason for this policy to say "We should delete those BLT items". If we delete them, it can make sense to give the BLT folks a few months to see whether they can add information to make the items notable, and there's no reason to create a rush via a living persons policy. ChristianKl () 19:54, 10 December 2017 (UTC)
except some editors currently delete items without wikipedia links as "not notable" after 7 days. doubtless they will cite LP to justify that deletion, even if the items are structurally useful, as "presumably notable, per wp:before, but not yet proven". Slowking4 (talk) 22:59, 13 December 2017 (UTC)
WP:BEFORE? Are you proposing we import that principle from English Wikipedia? Nikkimaria (talk) 23:52, 13 December 2017 (UTC)
in effect the english notability is already imported into wikidata notability (except for the deletionists). if you could write a keepable article then it is notable on both projects. Slowking4 (talk) 02:17, 14 January 2018 (UTC)

Burden

Not sure what is meant by this - the phrasing has nothing to do with bot vs non-bot editing, and so need not depend on any RfC outcomes about bot removal. I think it is vital to include for clarity. Nikkimaria (talk) 19:02, 7 December 2017 (UTC)

The phrase that there's a burden of proof suggests that you can remove all items where a person hasn't proven a statement. If that's actually what's intended, the easiest way is to have a bot that takes over the task of deleting everything where the burden isn't met.
I want to design policy in a way that, when there are conflicts between editors, the default is a mutual search for common ground. A mutual search for common ground means a friendlier environment on Wikidata, and I have no desire to copy Wikipedia's culture of hostility and deletionism.
I wrote most of the draft language of this article and want to have it in the form I believe in when I start the RfC. To the extent that you have different opinions about what the policy should look like, you are free to add options to the RfC once it's ready that outline your desired wording. ChristianKl () 14:01, 10 December 2017 (UTC)
As I said, it's got nothing to do with bots - no one AFAIK (other than perhaps your RfC) has suggested having a bot remove all unsourced statements. It simply means if there is a dispute about including a particular piece of information as a living-person issue then the burden is on the person wanting to include it. Feel free to suggest other ways of phrasing that. And no, just because you're writing an RfC doesn't mean the draft should only reflect what you want it to say. Why not have a "mutual search for a common ground"? (And why have all of those different issues in a single RfC in the first place, and not just "should we accept this as policy yes/no/with amendments XYZ?") Nikkimaria (talk) 14:20, 10 December 2017 (UTC)
In this case, the RfC is the venue where consensus around the policy is supposed to be found. That means that, to the extent that there are different views, it makes sense to have both views in the RfC so people can vote on which one is preferred.
To the extent that you claim that nobody suggested we should delete unsourced claims, I guess that means you haven't read previous discussions about what to do with the usage of a property like ethnic group (P172).
Let's take as an example Karl Marx (Q9061) ethnic group (P172) Jewish people (Q7325). Currently, the claim is unsourced even though we use him as an example for ethnic group (P172) and we consider it important to have this kind of claim sourced. I would want to encourage people to actually source the claim, but I don't want to encourage the simple removal of claims like that.
As for having this as one RfC, I think that's valuable because you actually need the policies in context to get a decent understanding of the effects they are likely to have.
As a side note, at the moment the section on dealing with complaints is probably the most unstable, because I want to see the OTRS system from the inside before finishing it. There's also an open request for a developer opinion on https://www.wikidata.org/wiki/Wikidata:Contact_the_development_team#Performance_effects_of_a_special_flag_for_bots_adding_certain_statements_to_living_people that I want to have answered before putting the actual RfC online. ChristianKl () 20:12, 10 December 2017 (UTC)
I haven't claimed no one has said we should remove unsourced claims - just that doing it en masse by bot is a different matter. But I must disagree with your point about having a single RfC. You could bundle acceptance of this as policy with associated changes necessary to other policies such as blocking. But the alt accounts provision is tangential enough that it should be dealt with separately (whether this policy is accepted or not). As for the potential bots, they're moot if the policy is not accepted, and not required for it to be; they are implementation questions best settled after the policy one is put to bed. Per your point here, best to discuss such options later. Nikkimaria (talk) 20:40, 10 December 2017 (UTC)
talk of burden of proof will be divisive. shifting the burden is a tactic in a battleground. you would do better to talk of seeking consensus of a standard of practice to follow. Slowking4 (talk) 03:17, 11 December 2017 (UTC)
Yep! The whole point of this draft is to develop a community standard of practice to follow. Nikkimaria (talk) 03:42, 11 December 2017 (UTC)
good - then strike that sentence and replace it with a consensus process. "Wikidata properties likely to be challenged" was created with no consensus whatever. Slowking4 (talk) 19:27, 11 December 2017 (UTC)
You misunderstand me: the policy proposal is a consensus-building process. The RfC is to assess the consensus for the proposed use of that statement, among other details; if you think "Wikidata properties likely to be challenged" should not exist, you will be able to express that feeling, with appropriate rationale, in that discussion. Nikkimaria (talk) 19:53, 11 December 2017 (UTC)
i guess you misunderstand what i mean by consensus. it is not a feeling, it is a standard of practice. it is not a rationale, it is a process to follow, or not if you choose not to follow it. Slowking4 (talk) 23:03, 13 December 2017 (UTC)
This policy does define a process for adding and removing property likely to be challenged (Q44597997) and property that may violate privacy (Q44601380), because I expect that as we create new properties and add lots of new data, we will regularly need to discuss whether to add or remove those classes. I don't expect the actual text of the policy to change as frequently. ChristianKl () 12:15, 12 December 2017 (UTC)
If you think that we shouldn't use a bot to ask en masse for people to fulfill the burden of providing evidence, what kind of criteria would you expect a human editor who removes statements and asks people to fulfill the burden to use that would make their activities qualitatively (and not only quantitatively) different? ChristianKl () 00:00, 13 December 2017 (UTC)
Humans use judgement rather than criteria, whereas bots can only use the latter. The primary purpose of the phrase at issue is to address what happens when a statement is actually "challenged". Nikkimaria (talk) 00:16, 13 December 2017 (UTC)
Humans judge by criteria. Humans can use criteria that are more complex than bots', but in that case it would still be possible to describe the criteria in words. Someone might say "I'm challenging all claims that currently don't have sources". I don't think that the policy should indicate that this is a valid move. To use your phrasing, I think that case-by-case judgement is required to analyse what the burden of proof looks like. ChristianKl () 13:20, 13 December 2017 (UTC)
Or just common sense. I'd expect someone who mass-challenges obviously-true-yet-unsourced claims of "instance of: human" to be shot down pretty quick, but if someone has a good-faith concern about potentially controversial or privacy-violating claims being added then requiring decent sourcing for their inclusion is fair. Nikkimaria (talk) 14:25, 13 December 2017 (UTC)
Mass-removing instance of (P31) human (Q5) is a strawman; as I said above, mass-removing all unsourced ethnic group (P172) statements for living people would be a more central issue. I think the current policy is powerful enough to be used with common sense without a sentence about burdens. ChristianKl () 14:39, 13 December 2017 (UTC)
Disagree. As above, the purpose of the phrase is not to address large-scale bot-like changes, but rather disputes; if you'd like to suggest a different wording please do so. Nikkimaria (talk) 14:54, 13 December 2017 (UTC)
Policy shouldn't be judged by its intent but by its effect. I think the existing wording is fine and your approach problematic. You seem to be the only person who thinks that my draft is problematic in this regard. It's also worth noting that previous RfCs (see https://www.wikidata.org/wiki/Wikidata:Requests_for_comment/Findagrave_removed_as_a_source_for_information) indicate that you hold a minority position when it comes to how we should go about deleting content. If you believed that your position here would find general support, it would be easy to follow the suggestion of adding it as an option to the RfC. ChristianKl () 13:06, 14 December 2017 (UTC)
No, it wouldn't, since per my previous comments I think we should avoid having a million options for people to vote on in a single RfC. I've tweaked the wording to try to address your concern. Nikkimaria (talk) 13:46, 14 December 2017 (UTC)

Is it possible to avoid linking to the English page like this? Normally people don't follow all the pages of all the wikis, and if someone changes that page, they may not be notified. I prefer to keep the page here, where interested people can add it to their watchlist. --ValterVB (talk) 08:29, 9 December 2017 (UTC)

Is it better to link to the Wikidata item? It'd be cool to link to Special:GoToLinkedPage but I don't think it's possible to put the user's own preferred language into that (on the fly). Sam Wilson 13:17, 9 December 2017 (UTC)
I prefer a page here, where users can see when the page is changed via Recent Changes or their Watchlist on Wikidata. --ValterVB (talk) 14:18, 9 December 2017 (UTC)
I'm also against linking out to enwiki policy like that, but linking to our Wikidata item about it is fine until we have our own policy. Additionally, I think a sentence about what it means for a Wikidata item description to be neutral would be well placed in that paragraph. ChristianKl () 14:22, 10 December 2017 (UTC)
I do not agree. I prefer to import the page and change it little by little here. I want to avoid something like "Why use the policy on en.wiki? de.wiki is better", etc. If we have a page here, it is "our page", not "a page of another wiki"; if the page is here, people can't say "I'm only on Wikidata, I don't follow other wikis, so I can't know if the rule has changed", and so on. --ValterVB (talk) 15:32, 10 December 2017 (UTC)
I just went and reread the EnWiki policy and I don't think referencing it even indirectly helps, given that EnWiki just doesn't have a concept of a "description" and we don't want people to put all significant viewpoints in a description. ChristianKl () 19:58, 10 December 2017 (UTC)
I meant to say that even the link to the item is not a good choice, not only in the example that I linked but in general. --ValterVB (talk) 20:12, 10 December 2017 (UTC)
given that they are wholesale cutting / pasting english policy here, it is good to be honest about that. but you are right. better to build consensus in this community rather than trying to cram down english policy yet again. Slowking4 (talk) 03:12, 11 December 2017 (UTC)
It's one thing to copy-paste policy, it's another to link out and make the authoritative version of the policy be hosted with EnWiki. Copy-pasting at least means that we are free to change the copy-pasted version of the policy that we host. ChristianKl () 11:33, 12 December 2017 (UTC)
@Slowking4: As far as this article goes, the paragraph "Non-item space" is an adapted version from EnWiki. If you have any suggestions for improving it, I'm happy to hear them. ChristianKl () 11:34, 12 December 2017 (UTC)
i would strike the entire section. i do not see evidence of LP problems in user space. you are importing the English prescriptive practice about user pages. existing policy about disruption can cover. Slowking4 (talk) 23:09, 13 December 2017 (UTC)

First impressions of OTRS

I just got my OTRS access. It listed three open tickets for Wikidata. The oldest open ticket was 112 days old without having been addressed. I'm not exactly sure whether the problem is poor UI or whether there are simply not enough reviewers for Wikidata. It seems like I need another day to get access to the OTRS Wiki to get a better idea. ChristianKl () 12:12, 12 December 2017 (UTC)

I settled into the OTRS structure a bit and successfully requested that privacy@wikidata.org be added as an additional email address. If we get more requests via this channel, we will need more Wikidata admins to have OTRS accounts, but that seems like a solvable problem. Currently, the Wikidata queue has received a total of 101 items over the 5 years of Wikidata's existence, and some of that is spam. ChristianKl () 14:27, 13 December 2017 (UTC)
@ChristianKl: What is the OTRS backlog looking like now? Beorhtwulf (talk) 19:14, 8 July 2019 (UTC)
@Beorhtwulf: I don't have OTRS access anymore. ChristianKl 07:29, 15 July 2019 (UTC)

Should we add a way to tag removed claims to prevent them from being readded?

I'm thinking about the question of what to do when we remove information because of a request from the subject. One way would be to add "Unknown value" with a qualifier that indicates that the information was removed because of a request. What do you think? ChristianKl () 13:16, 13 December 2017 (UTC)

seems like a good idea: something like criterion used (P1013) - "removed by request of the subject of the item" (provided it is not "public", of course)... and maybe a link to the request database (like OTRS tickets on Commons)? --Hsarrazin (talk) 13:27, 13 December 2017 (UTC)
I'm not aware of how Commons does it. Can you link me to a good example and/or the policy describing it? ChristianKl () 14:40, 13 December 2017 (UTC)
I'm not an OTRS user or admin, but [2] and, more specifically, for an image that has been checked, with a link to the ticket. This would probably need an OTRS ID property. --Hsarrazin (talk) 15:00, 13 December 2017 (UTC)
Yes, we would need some OTRS property. I see two possibilities: (1) "removed because of OTRS ticket", (2) "OTRS ticket ID". The first would have the advantage that it's less effort to just add one property. ChristianKl () 15:10, 13 December 2017 (UTC)
That would be an excellent tool to allow trolls to quickly find removed data in items' histories. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 14:17, 14 December 2017 (UTC)
tags are an english solution, perhaps you meant query? or superprotect? Slowking4 (talk) 19:57, 15 December 2017 (UTC)
I'm not sure what your point is. I used "tag" as a verb. Could you explain your issue? ChristianKl () 00:08, 21 December 2017 (UTC)
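To make the "Unknown value" plus qualifier idea more concrete, here is a minimal sketch of what such a statement might look like in Wikibase entity JSON, built in Python. It assumes criterion used (P1013) as the qualifier, as Hsarrazin suggested; the item for "removed by request of the subject" does not exist in this discussion, so `Q_HYPOTHETICAL` is a placeholder, not a real QID. In Wikibase JSON, "unknown value" is the `somevalue` snak type. Note Andy's concern above: any such marker also makes removed data easier to locate.

```python
import json

# Hypothetical placeholder: no item for "removed by request of the subject"
# exists in this discussion, so this is NOT a real QID.
REMOVED_BY_REQUEST = "Q_HYPOTHETICAL"
CRITERION_USED = "P1013"  # criterion used, as suggested in the thread


def removed_by_request_statement(prop: str) -> dict:
    """Return an 'unknown value' statement for `prop`, qualified with a reason."""
    qualifier = {
        "snaktype": "value",
        "property": CRITERION_USED,
        "datavalue": {
            "value": {"entity-type": "item", "id": REMOVED_BY_REQUEST},
            "type": "wikibase-entityid",
        },
    }
    return {
        "type": "statement",
        "rank": "normal",
        # In Wikibase JSON, "somevalue" is the snak type behind "unknown value".
        "mainsnak": {"snaktype": "somevalue", "property": prop},
        "qualifiers": {CRITERION_USED: [qualifier]},
        "qualifiers-order": [CRITERION_USED],
    }


if __name__ == "__main__":
    # Example: mark ethnic group (P172) as removed on the subject's request.
    print(json.dumps(removed_by_request_statement("P172"), indent=2))
```

If an "OTRS ticket ID" property were created instead, it would simply take the place of the P1013 qualifier in the same structure.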

This proposal is a bad thing

Obviously there is a need to deal with living people. However, this proposal will prove to be highly detrimental to diversity. It will stop the growth of information, and it is a return to the bad old days of Wikipedia thinking instead of thinking in terms of data, sources and quality through comparison.

  • In a paper about Wikidata, it was mentioned that the implementation of constraints had a detrimental effect on the diversity of the Wikidata data.
  • We are getting to a stage where we can compare data. When we compare what DBpedia says in its statements with what we know that is relevant in a BLP way, and the data is the same, we know that we are in sync with (a) Wikipedia. It is particularly where there are differences that we want to consider the requirement for sources. This is exactly where our time is well spent.
  • When we add data from Wikipedia, we know that the quality of Wikipedia is often so-so. Improving the shared data in both Wikipedia and Wikidata should be done in a smart way.
  • Sources known in Wikipedia are acceptable as sources in Wikidata
  • We do not have methods to invalidate sources.

Thanks, GerardM (talk) 20:17, 6 January 2018 (UTC)

What do you mean by "Sources known in Wikipedia are acceptable as sources in Wikidata"? Nikkimaria (talk) 22:25, 6 January 2018 (UTC)
When a fact comes from a Wikipedia, we can expect that there is a source. Thanks, GerardM (talk) 19:10, 7 January 2018 (UTC)
Ideally that would be true, in practice it isn't. Nikkimaria (talk) 01:04, 8 January 2018 (UTC)
"In a paper about Wikidata, it was mentioned that the implementation of constraints had a detrimental effect on the diversity of the Wikidata data." Which paper do you mean? ChristianKl 23:17, 6 January 2018 (UTC)
yes, i agree. this proposal is the same old adversarial, directive bag of tricks imported from other wikipedias, unhinged from database quality assurance. there are policies appropriate to databases and data sets, too bad they are not incorporated here. Slowking4 (talk) 02:10, 14 January 2018 (UTC)
@Slowking4: You speak a lot about that topic but you haven't made any concrete suggestions besides advocating for principles (and thus that somebody else should make concrete policy based on those principles). If you have concrete policy suggestions, why don't you write them up? ChristianKl 02:54, 14 January 2018 (UTC)
ok Wikidata:Living people (draft 2) see also Wikidata_talk:Living_persons_(draft)#ACM_on_transparency_and_accountability above. as you might note, i have a fundamentally different way of managing projects or policy. doubt we will agree on any means and methods here. Slowking4 (talk) 03:17, 14 January 2018 (UTC)

GDPR - General data protection regulation

On May 25, 2018 a new European regulation comes into force. We should, like any enterprise or organisation in the world dealing with EU citizens, apply those rules. Any data item that could possibly identify a living person is privacy sensitive. This goes as far as e.g. name, photo, telephone number, address, bank account number, e-mail address, IP address, etc. - Geertivp (talk) 15:10, 10 January 2018 (UTC)

What practical steps do you think those rules imply for Wikidata that aren't currently taken? ChristianKl 02:52, 14 January 2018 (UTC)

Addendum

@Jura1, Lazypub, MisterSynergy, Jarekt, Ghouston:@Oravrattas, RexxS, Jc3s5h, Koavf: Based on the collective insights that were generated in this conversation and the recent changes made to Wikidata:Autobiography based on it, I would like to:

  1. Remove the "draft" template from Wikidata:Autobiography
  2. Link it from the lede of this page: "For information about how you can add data about yourself into Wikidata, please refer to Wikidata:Autobiography".

If there are no objections to this, I will proceed in the next few days.--Micru (talk) 12:16, 22 December 2018 (UTC)

BLP?

Dumb question – what does BLP stand for? The acronym is used twice in the Applicability to legal persons and groups section, but not explained. --Lucas Werkmeister (talk) 10:24, 31 January 2019 (UTC)

It stands for Biography of Living People and means this policy. Given that eliminating the term doesn't change the meaning and makes it clearer, I changed the wording directly. ChristianKl 10:37, 31 January 2019 (UTC)
Thanks, I think the “biography” part doesn’t make as much sense on Wikidata as it does on Wikipedia :) --Lucas Werkmeister (talk) 10:42, 31 January 2019 (UTC)
you might want to take on board yet another objection to eliding wikipedia policy and acronyms onto wikidata. Slowking4 (talk) 17:08, 1 February 2019 (UTC)

How to register consent?

We add information about Dutch professors to Wikidata. Some universities are starting to send consent forms to professors for including personal information in Wikidata (date of birth, place of birth, career etc). Currently the text is "Inclusion in the Wikidata portal as part of the Wiki-wetenschappers (Wiki-scientists) project. I declare to be sufficiently informed about the Wiki-wetenschappers project, more specifically about being included in the Wikidata portal, and wish to participate. Participation is entirely voluntary. I have the right to withdraw my consent at any time without stating the reasons. I give my consent to have the above personal information included in the public and searchable Wikidata portal.". I'm wondering if there are Wikidata rules on this subject. For example, how could we use those filled-in (offline) consent forms as a reference? --Hannolans (talk) 09:02, 21 May 2019 (UTC)

Assuming the completed forms are not themselves online somewhere, maybe create an item for the project (if it doesn't already exist) and use that as the reference? ArthurPSmith (talk) 15:21, 21 May 2019 (UTC)
yes, that could work. I am curious which type of reference and qualifiers I should use. Could I somehow model that the data is OK, privacy-wise, to include in Wikidata? I created an item for the model form. Should I use something like copyright license (P275) -> Collection of consent form Wiki-scientists (Q63967144)? --Hannolans (talk) 21:51, 21 May 2019 (UTC)
well I was thinking more stated in (P248) pointing to an item for "Wiki-wetenschappers", but then you could add that form item with the license property as a second part of the reference also. I think a reference should provide some mechanism for confirming the information, so if it can be confirmed in principle by contacting some person associated with the project that would be sufficient. ArthurPSmith (talk) 11:49, 22 May 2019 (UTC)
Given that those university documents are the sources of the items, it would be best if they were on Wikidata as a reference for the claims that are made. Do you think it would be a problem for the university to host the relevant documents with a stable link?
In general, registering consent on Wikidata is hard because we don't have a good way to authenticate users against real-world identities. On the other hand, a university is a trustworthy source for the identities of its scientists. ChristianKl 11:59, 22 May 2019 (UTC)
Thanks for all the feedback. Yes, the university is trustworthy in this case; we can relate this to the archives department that hosts the contracts. I would like to have a data model in which we can describe an individual contract by relating it to a model contract, without having to put the individual contract online.--Hannolans (talk) 13:50, 22 May 2019 (UTC)
I have added information to Peter Anderson (Q53548176). As the data model I used a reference with inferred from (P3452) -> Collection of consent form Wiki-scientists (Q63967144). The item represents the collection of consent forms at the university. I was not sure whether to use inferred from (P3452) or stated in (P248); as the collection is not the work itself, inferred from (P3452) seems more correct? --Hannolans (talk) 17:58, 28 May 2019 (UTC)
That sounds fine to me, thanks. ArthurPSmith (talk) 18:17, 28 May 2019 (UTC)
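For anyone wanting to reuse the model Hannolans describes, the sketch below shows roughly how a reference with inferred from (P3452) pointing at Collection of consent form Wiki-scientists (Q63967144) is laid out in Wikibase entity JSON. It is an illustration under the assumption that references are edited as raw JSON; the helper names are hypothetical, and it does not call any API or reproduce the actual edit made to Peter Anderson (Q53548176).

```python
import json

INFERRED_FROM = "P3452"      # inferred from
CONSENT_FORMS = "Q63967144"  # Collection of consent form Wiki-scientists


def item_snak(prop: str, qid: str) -> dict:
    """A value snak whose value is a Wikidata item (hypothetical helper)."""
    return {
        "snaktype": "value",
        "property": prop,
        "datavalue": {
            "value": {"entity-type": "item", "numeric-id": int(qid[1:]), "id": qid},
            "type": "wikibase-entityid",
        },
    }


def consent_reference() -> dict:
    """A reference block citing the (offline) collection of consent forms."""
    return {
        "snaks": {INFERRED_FROM: [item_snak(INFERRED_FROM, CONSENT_FORMS)]},
        "snaks-order": [INFERRED_FROM],
    }


def with_reference(statement: dict, reference: dict) -> dict:
    """Return a copy of `statement` with `reference` appended; the input is untouched."""
    updated = dict(statement)
    updated["references"] = list(statement.get("references", [])) + [reference]
    return updated


if __name__ == "__main__":
    print(json.dumps(consent_reference(), indent=2))
```

Swapping `INFERRED_FROM` for stated in (P248) is a one-line change, which keeps the P3452-versus-P248 question a modelling choice rather than a structural one.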

The policy was written so that property likely to be challenged (Q44597997) was intended to be a weaker protection status than property that may violate privacy (Q44601380). The sentence "In particular properties that are instance of (P31) property likely to be challenged (Q44597997) should be supported by suitable references when applied to living people." speaks of properties in the plural and then of references in the plural, so it can be read as requiring at least two references for every usage, which is counter to its intention. Properties that need stronger protection can still get property that may violate privacy (Q44601380). To make this clearer, I want to change that sentence to "In particular each property that is instance of (P31) property likely to be challenged (Q44597997) should be supported by at least one suitable reference when applied to living people." ChristianKl 10:56, 1 March 2020 (UTC)
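The difference between the two readings is easiest to see as a check. A minimal sketch, assuming statements in the shape of the "claims" map of Wikidata entity JSON and a hand-supplied set of property IDs that are instance of (P31) property likely to be challenged (Q44597997); the function name and the sample data are hypothetical. It enforces the proposed "at least one reference per statement" reading, and deciding whether the item is about a living person is left to the caller.

```python
def unreferenced_challenged(claims: dict, challenged: set) -> list:
    """List statements on challenged properties that have no reference at all.

    `claims` mimics the "claims" map of Wikidata entity JSON:
    property ID -> list of statement dicts, each optionally with "references".
    The rule checked is "at least one reference per statement" (the proposed
    wording), not a reading that demands two or more.
    """
    missing = []
    for prop, statements in claims.items():
        if prop not in challenged:
            continue  # properties outside the class are not covered by this rule
        for statement in statements:
            if not statement.get("references"):
                missing.append(statement.get("id", prop))
    return missing


if __name__ == "__main__":
    # Hypothetical data; assumes P172 is in the "likely to be challenged" set.
    claims = {
        "P172": [
            {"id": "stmt-1", "references": []},
            {"id": "stmt-2", "references": [{"snaks": {}}]},
        ],
        "P31": [{"id": "stmt-3"}],
    }
    print(unreferenced_challenged(claims, {"P172"}))  # ['stmt-1']
```

Under the plural-plural reading, `stmt-2` would also be flagged because it carries only one reference; under the proposed wording it passes.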

Phone numbers and email addresses for living people are highly problematic IMO

Today I noticed that we store(d) many email address (P968) and phone number (P1329) statements for individual, living people. I was shocked to discover that. While I saw that many of these email addresses and telephone numbers have been sourced from university websites and personal websites, I believe that we should be very cautious with that. If an individual's email address and phone number are listed on an employer's or a personal website, that definitely does not mean that they want that data to be replicated or aggregated elsewhere, especially not in a broadly scoped project like Wikidata whose data will be harvested for many other purposes. I believe that structurally adding these to Wikidata is risky and potentially a privacy violation, which could get the community into deep trouble with the European GDPR legislation.

I probably went a bit overboard while feeling shocked, and have deleted many email address (P968) and phone number (P1329) statements. I have not deleted the statements attached to items of politicians, as (in my opinion) these are public figures who actually need to be reachable as part of their public duty. I can restore other statements if needed - I didn't spend a lot of time on it. However, I'd like to advocate for a clear and strict policy on contact information for living people. What do others think? Spinster 💬 21:14, 22 May 2020 (UTC)

FWIW, both properties are listed on Wikidata:WikiProject Properties/Wikidata properties that may violate privacy - but I'm not sure whether we already have clear guidelines on the situations in which the 'contact information' related properties are OK to include. Wikidata:Living people states:

Values for living individuals should generally not be supplied unless they can be considered widespread public knowledge or are openly supplied by the individual themselves (otherwise hidden supporting references are not sufficient). As an example, the fact that someone's address is accessible by looking at a domain name registration doesn't imply that it's considered widespread public knowledge for the sake of this policy.

but that still sounds very vague to me. I would be in favor of defining the guidelines for contact information more clearly, on a per-property level. Spinster 💬 21:24, 22 May 2020 (UTC)
Given that under the GDPR storing someone's contact details might create a requirement to contact them about storing their contact details (as having someone's email address makes contacting them easy), I think there's a case to be made for not storing such data at all for living people, but only for organizations. ChristianKl 22:32, 22 May 2020 (UTC)
However, e-mail provides a way to disambiguate authors, and addresses are present on many websites (random example). These are openly provided by the people concerned. Therefore I strongly oppose removing them.--GZWDer (talk) 12:52, 23 May 2020 (UTC)
I think we have a constraint that excludes email addresses for people. As mentioned in a discussion there, PubMed doesn't provide them easily either. --- Jura 13:25, 23 May 2020 (UTC)
There are many sources providing e-mail addresses, e.g. Wiley Online Library, ORCID, Web of Science, ScienceDirect, etc.--GZWDer (talk) 21:52, 23 May 2020 (UTC)
That doesn't mean we have to, especially as we cannot take the steps PubMed takes to avoid problems. --- Jura 16:43, 24 May 2020 (UTC)
Note that most articles have a corresponding author (Q36988860), whose e-mail must be public.--GZWDer (talk) 19:00, 24 May 2020 (UTC)
It's one thing to go through a bunch of articles to find the corresponding author (Q36988860). It's another to run a query that gives you thousands of emails of physicists and then email them all about your new pet theory of everything. ChristianKl 07:50, 25 May 2020 (UTC)
This is already doable even if Wikidata does not exist.--GZWDer (talk) 19:43, 25 May 2020 (UTC)
PubMed also provides e-mails. example.--GZWDer (talk) 08:12, 28 May 2020 (UTC)

Original research on living people... and people's "deaths"

Most Wikipedias disallow original research, but original research is still allowed on Wikiversity. However, I won't discuss original research in general. Instead, I'm raising my concerns about allowing original research on living people, including those whose "death" is not verifiable. For example, the Wikidata item Kip Noll (Q390033) still states that he is deceased. On the other hand, the enwiki biography of him describes him as "possibly living" and doesn't connect the obituary with the Stallion magazine interview, as doing so would be considered original research by enwiki standards. (More at Wikidata:Project chat#Kip Noll (Q390033))

It seems that this policy doesn't mention "original" or "original research". I think the policy should disallow original research on living people. If the community allows original research on living people, what about people whose "death" has been contested? Must I create an RFC to discuss allowing or disallowing original research (and/or thoughts) on living people? If that's not necessary, what can be done about original research on living people, and on people whose death is not well verified by multiple sources? George Ho (talk) 21:27, 4 December 2020 (UTC)

  • The policy currently says "Anyone born within the past 115 years is covered by this policy unless a reliable source has confirmed their death." I think the crux here is whether or not the Library of Congress should be seen as a reliable source.
It has editorial oversight, and you could say that an analysis concluding we shouldn't trust it because of the sources cited by the Library of Congress editor is itself original research.
Given the current policy, finding consensus on the item's talk page or at Project chat would be the way to resolve the issue for that particular item.
If you believe that this principle is not enough and additional principles are needed, an RFC would be the way to adopt them. ChristianKl 23:26, 4 December 2020 (UTC)

In my postponed/failed RfC, you advised me to define what research means. Must I create a proposal or something? I'm at a loss here. What does "original research" mean to the Wikidata community? I asked about original research on living people, but you guys demanded a proposal. Geez! An RFC requires a proposal? Ridiculous!!! George Ho (talk) 21:58, 16 December 2020 (UTC)

  • It's not a term we presently use to make decisions, and as such it has no established meaning. If you do want a change in the living people policy, then you have to make a concrete proposal of how it should look. Then everybody can decide whether or not they agree that we should make that change. If your intention is to add a new dynamic with a ban on original research, it's most likely beneficial to offer a draft policy for review before actually creating an RFC on which people vote. I often keep drafts in my userspace and ask on Project chat for feedback before proposing a new policy in an RFC. ChristianKl 23:30, 16 December 2020 (UTC)

Privacy

I am trying to add a new WD item for a woman whose page was recently added to WQ. She is a living person, and since the instructions warn about adding private information, I came to this page to see where the pitfalls are.

When I went to Wikidata:Living_people#Statements_that_may_violate_privacy and followed the link to Wikidata:WikiProject Properties/Wikidata properties that may violate privacy, I was surprised to see that the list is empty and that the text says it is updated by a bot.

Where can I find out what sort of things to avoid? Thanks in advance, Ottawahitech (talk) 20:18, 19 January 2021 (UTC)

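# Properties whose living people protection class (P8274) is
# "property that may violate privacy" (Q44601380)
SELECT ?item ?itemLabel
WHERE
{
  ?item wdt:P8274 wd:Q44601380.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}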
Try it!
Dexxor (talk) 09:17, 20 January 2021 (UTC)
I am sorry, I don't understand your answer to my question. Could you elaborate? Ottawahitech (talk) 19:42, 24 January 2021 (UTC)
WD:WikiProject Properties/Wikidata properties that may violate privacy works now. But you can also follow the "Try it!" link in my answer and click the blue play button. —Dexxor (talk) 10:08, 25 January 2021 (UTC)
Thanks @Dexxor: I clicked the Try it link and saw a page described as Wikidata Query Service. On that page I clicked the "blue play button" and a table with two columns showed up. I still don't understand how not to infringe on a person's privacy. Thanks again for your explanation. Ottawahitech (talk) 23:15, 26 January 2021 (UTC)
Use common sense. Only add privacy-sensitive information if it’s common knowledge or openly supplied by the individuals themselves, e.g. on a personal website or Twitter profile. When in doubt, don’t add the information. —Dexxor (talk) 08:53, 27 January 2021 (UTC)

Add guideline about children?

Should we add some guidance about number of children (P1971) having been created specifically so that we can avoid creating items for non-notable children? Ainali (talk) 15:40, 5 February 2021 (UTC)

  • No, we generally see children as notable due to structural needs. The property exists because we frequently have sources that report how many children a person has but nothing that would identify the children.
In cases where we do have the names of children but don't want to create items for them, an unknown value statement qualified with subject named as (P1810) would do the job of storing that information. ChristianKl 15:47, 5 February 2021 (UTC)
The discussion of the property says: "Mainly in cases where the full list isn't or shouldn't be added". Are you saying that the latter part is never true? And what structural need does an infant fill? It will very likely have no other links to it than the parents (unless it actually is notable for other reasons), and they can generally (and easily) be connected with other properties. Ainali (talk) 16:30, 5 February 2021 (UTC)

Regarding BLP requests sent to privacy@wikidata.org

Hello Wikidata volunteers,

My name is Brian, and I am writing on behalf of the Wikimedia Foundation Legal department with a question about how to handle requests sent to privacy@wikidata.org. In my volunteer capacity, I edit under the username Airplaneman.

The issue

The Wikidata BLP policy directs people to contact privacy@wikidata.org if they want to remove private information about themselves. This address currently redirects to privacy@wikimedia.org. The redirect was created last year as per this Phabricator ticket.

The Foundation’s legal team handles queries that come into privacy@wikimedia.org. Since the privacy@wikidata.org address redirects to privacy@wikimedia.org, the Foundation has recently seen an increase in requests from BLP subjects to modify or remove their entries on Wikidata. The vast majority of these requests regard information that would probably not qualify for legally required oversight. The Wikimedia community—not the Wikimedia Foundation—is in charge of normal content moderation on Wikimedia projects. For the vast majority of requests that do not legally require the Foundation to act directly, the Foundation would prefer content moderation requests be handled by the community.

The following is what usually happens when Wikimedia Foundation Legal receives a request, either through privacy@wikimedia.org or legal@wikimedia.org:

  • We evaluate the request and decide how to respond. I am a part of the staff team that evaluates the requests.
  • Many of the requests we receive ask for content on Wikimedia projects to be modified. An increasing number of these requests regard Wikidata, specifically Wikidata items on living persons.
  • Recently, we have been receiving a handful of requests each month. In August 2021, we received 7 of these requests.
  • For some requests we receive, a legal remedy might not be the best course of action. Instead, we refer these requesters to volunteers. These requests often come from individuals who may not understand the community-run structure of Wikimedia projects and do not feel comfortable posting on-wiki, leading to them looking for an email to contact for help. We usually point requesters to talk pages, policy pages, noticeboards, or the Volunteer Response Team (VRT), with the last being particularly important as many requesters struggle with making an effective on-wiki post. The Foundation very rarely makes changes to Wikimedia project content directly, and would prefer that requesters work through community processes in normal cases. Please refer to this section in the Foundation’s Transparency Report for more information.

The Foundation regularly coordinates with the VRT for content moderation requests. We defer to volunteer judgment whenever it is possible and safe to do so.

However, for Wikidata content complaints coming to the Foundation through privacy@wikidata.org, we are unsure of where to refer requesters.
  • We are aware that the Wikidata VRT queue is not very active.
  • Many requesters to privacy@wikidata.org would like their items deleted, citing privacy concerns. Many such cases are ambiguous or do not require the Foundation to act. When we receive similar emails about other projects, we typically forward these to the VRT for that project, which is not something we can currently do for Wikidata.

Possible changes

  • Replace the contact email address on the BLP policy page with another one, such as info@wikidata.org. This would require coordination with Wikidata and VRT to ensure adequate volunteer attention for the queue. In particular, there would need to be a team of people interested in operating this queue in the long-term—without that commitment the solution may not be viable.
  • Ask requesters to post their requests publicly, such as on the Wikidata administrators' noticeboard. This defeats the purpose of a semi-private communication channel such as the VRT. Our experience is that many BLP subjects feel uncomfortable posting in on-wiki spaces and benefit from an email option, which works quite well for other projects.

In sum, the Wikimedia Foundation is receiving many content moderation requests involving Wikidata. We think the majority of these requests can and should be handled by volunteers. However, we are unsure of where to ask volunteers for this help.

I understand that this is quite a long post. Thank you for your time and attention, and I look forward to hearing your thoughts on the matter.

Best regards,

BChoo (WMF) (talk) 21:52, 16 September 2021 (UTC)

The VRT definitely does not have enough active users in the Wikidata queue, so shifting anything there does not make any sense. Please acquire more volunteers before any new VRT tasks are created. --Krd 05:43, 17 September 2021 (UTC)
A serious issue currently is that the info queue is only mentioned in Help:Contents; we need to add it to the main page and/or create Wikidata:Contact us.--GZWDer (talk) 14:22, 17 September 2021 (UTC)
I don't know much about these VRTs, but it sounds like the solution either way (unless we force people to post on-wiki) is to get more Wikidata people involved in that. Can we just post a call for volunteers on Project chat? 7 requests/month doesn't sound like we need a huge number of new people; 2 or 3 may be enough? ArthurPSmith (talk) 17:01, 20 September 2021 (UTC)
@BChoo (WMF) without commenting on the rest of the discussion points, can I please request that tickets you get regarding removal of information be sent to the Wikidata oversight queue instead of the general help queue? We can move them to the help queue if oversight is not needed, which would be better than having non-oversighters see requests where oversight is needed. Thanks, --DannyS712 (talk) 03:12, 24 June 2022 (UTC) (Wikidata Oversighter) (please ping when replying)
[edit]

As of this evening there were (at the very least) 8474 links to RedTube, Pornhub, YouPorn, and xHamster on Wikidata.

About two weeks ago I added the requirement to the first three so-called ID properties (those owned by Mindgeek) that such links should be referenced by a reliable source (using the procedure outlined on this project page).

I am not sure how to search for all of the entities that have one of these IDs without a reference, but I suggest that a bot could be developed to periodically delete all such unreferenced links as a short-term solution to this practice of linking to pornographic content on Wikidata.
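The check described above, finding statements for these ID properties that carry no reference, can be sketched in a few lines of Python. This is a minimal, hypothetical helper (not an existing Wikidata tool), assuming its input is one item in the Wikibase JSON layout, e.g. a single entry under "entities" in Special:EntityData/Qxxx.json, where each statement may hold a "references" list. Only P8720 (xHamster pornstar ID) is named in this thread, so the property list is a placeholder to extend. The sketch only lists candidates for human review; it does not edit anything.

```python
from typing import Any

# Hypothetical list of properties to check. Only P8720 (xHamster pornstar ID)
# is named in this discussion; add the other property IDs as needed.
PROPERTIES_TO_CHECK = ["P8720"]


def unreferenced_ids(entity: dict[str, Any], properties: list[str]) -> list[tuple[str, str]]:
    """Return (property, value) pairs for statements that have no reference.

    `entity` is one item in the Wikibase JSON layout: statements are grouped
    by property ID under "claims", and a statement without sources simply
    lacks a non-empty "references" list.
    """
    found = []
    claims = entity.get("claims", {})
    for prop in properties:
        for statement in claims.get(prop, []):
            if statement.get("references"):
                continue  # at least one reference is present
            snak = statement.get("mainsnak", {})
            if snak.get("snaktype") != "value":
                continue  # "unknown value" / "no value" snaks carry no ID
            found.append((prop, snak["datavalue"]["value"]))
    return found


# Usage with a hand-made, hypothetical item:
example_item = {
    "claims": {
        "P8720": [
            {"mainsnak": {"snaktype": "value", "datavalue": {"value": "example-id"}},
             "rank": "normal"}
        ]
    }
}
print(unreferenced_ids(example_item, PROPERTIES_TO_CHECK))  # [('P8720', 'example-id')]
```

For thousands of items, running an equivalent filter as a Wikidata Query Service query would be more practical than fetching each item's JSON one by one.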

A somewhat better solution would be to delete the properties entirely, as they are not contributing to "knowledge equity". Parenthetically, it can be noted that of the 8474 instances, 489 link to people who are not listed as having the occupation "pornographic actor". While some are directors and some are simply not labeled with an occupation, this gives a smaller sample that lets people see how these "properties" have been abused to suggest that actresses (because it is almost invariably women) appear on xHamster and other assorted porn sites.

Here are the original property creation discussions, which will allow the small group of men who are responsible for the creation of these properties to step out of the shadows and explain what they were thinking.

The history pages also contain information about other guys who have been tweaking these categories for maximum visibility.

In my opinion, a better, longer-term solution would be to add language to the living person policy at Wikidata saying that Wikidata's vocation is not to be a backend en:MindGeek catalog and that links to pornographic content are not part of Wikidata's mission.

I was alerted to this unfortunate situation via en:Wikipediocracy which recently published an article on the subject.

Does anyone have other suggestions as to how the Wikidata Living People policy should address linking directly to commercial pornographic content? SashiRolls (talk) 20:06, 14 January 2022 (UTC)

Some useful links:
Someone should review those 105 items and remove the porn IDs where necessary. Dexxor (talk) 21:36, 15 January 2022 (UTC)
I removed 8; perhaps the property creators could do some, so that they understand the game of whack-a-mole they've created?
  • Penelope Cruz
  • Scarlett Johansson
  • Mila Kunis
  • Anne Hathaway
  • Kate Winslet
  • Charlotte Gainsbourg
  • Milla Jovovich
  • Shirley Jones (The Partridge Family)
Most likely, I'll soon propose deleting the four properties mentioned above that exclusively point to (likely copyright-infringing) content, unless a convincing argument is made here that they have some value. PS: those who have more time on their hands should feel free to do it now. SashiRolls (talk) 20:35, 16 January 2022 (UTC)

External identifiers on Wikidata don't usually have references. They are generally self-citing in some sense. For example, what citations do we provide for Fandom article ID (P6262)? I can't imagine we would find many references for these identifiers. There is nothing inherently wrong with links to pornography, especially for people in the industry. I don't know why you place "property" in scare quotes. Obviously having bad data here is a problem, though. I would support enforcing a constraint requiring some listed occupation for items that have these properties. BrokenSegue (talk) 21:54, 16 January 2022 (UTC)

Well, these properties belong to living people protection class (P8274) (and thus must have a reference), whereas Fandom does not. This is the case because clicking on them leads directly to potentially violent pornographic content like the one I deleted earlier today ("Qxxx's leaked painal sex tape" was in the list above). These various porn IDs are potential privacy violations for multiple reasons (revenge porn, fakes/misidentification, underage models, (simulated) rape, (simulated) snuff films / hangings, sexual violence, etc.). You might find this article in Vice about xHamster's review policy (concerning, for example, "real tears and real hangings") helpful in considering the larger issues involved. In Dec. 2020 Pornhub had to delete more than three-quarters of its content. (§)
Also, parenthetically, I had a look at Betty White (Q373895) and noticed that Ms. White does not have a CBS ID, ABC ID, or NBC ID leading directly to the commercial catalogue of these content providers. She does have a Disney A to Z ID (P6181), but if you click on it I think you'll find it's a bit like your Fandom example and very different from an xHamster pornstar ID (P8720), which leads directly to potentially violent content (and which stores porn trackers on readers' computers). SashiRolls (talk) 01:35, 17 January 2022 (UTC)
...you're the one that added living people protection class (P8274)s [3], and without any discussion as far as I could tell. I don't know what you mean by "porn trackers". Are you saying it installs malware? Either way, I would be fine removing the hyperlinks if you feel that would help. BrokenSegue (talk) 01:48, 17 January 2022 (UTC)
You are correct that I added living people protection class (P8274). I am not saying these sites install malware, just trackers & cookies. Removing the links (which somehow bypass the fact that these sites are otherwise blacklisted) would be a good thing to do, yes. SashiRolls (talk) 01:52, 17 January 2022 (UTC)
I don't think the use of cookies is a disqualifying factor. Lots of places we link to use them. BrokenSegue (talk) 01:59, 17 January 2022 (UTC)
You may wish to read Wikidata:Project_chat#Porn_Ids_being_added, which is where I learned that these sites are blacklisted (except when the ID is used). SashiRolls (talk) 02:02, 17 January 2022 (UTC)
That doesn't seem relevant. Things are added to the spam filter when they are abused. That does not mean they have no legitimate use. I also think you are not discussing things in good faith, as you are citing the use of "living people protection class" (both here and on Project chat) as if you weren't the one who had just added the property. BrokenSegue (talk) 02:11, 17 January 2022 (UTC)
Reread the second line of this thread: "About two weeks ago I added the requirement to the first three so-called ID properties (those owned by Mindgeek) that such links should be referenced by a reliable source (using the procedure outlined on this project page)." Assuming bad faith when I followed the procedure on this project page, and said that I did, is really a bit twisted. Note that it explicitly says you can add it without discussion. SashiRolls (talk) 02:24, 17 January 2022 (UTC)
Ok, I did miss you mentioning it here. Sorry. I only brought it up because you mentioned it on Project chat too. I wasn't suggesting you didn't follow policy. I was suggesting you were using it as evidence in an argument when really it was just your opinion (but made to look like consensus or policy). BrokenSegue (talk) 02:37, 17 January 2022 (UTC)
Fair enough. Perhaps I am misunderstanding, but based on my reading of Wikidata:Living People, it is policy that these properties require references until consensus removes living people protection class (P8274) from these link-to-porn-content IDs, which, as the evidence above shows, does not seem like a great idea. SashiRolls (talk) 02:42, 17 January 2022 (UTC)
Back to the point. Wikidata is getting around the global blacklist by providing direct links to pornographic content. Redtube, YouPorn, Pornhub & xHamster are all on the Global spam blacklist. Has the work-around allowing direct linking been discussed anywhere, or was it just decided unilaterally without consensus? SashiRolls (talk) 20:45, 22 January 2022 (UTC)
For these identifiers, the same happens as for thousands of other identifier properties: the web UI builds links from the identifier (stored in the data item) and the URL formatter (stored in the property). The links are not stored in the main database itself (but cached somewhere). There was neither a discussion about these cases anywhere, nor were they added unilaterally without consensus explicitly by an individual editor. —MisterSynergy (talk) 21:00, 22 January 2022 (UTC)
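The link-building mechanism described above, where the item stores only the identifier and the property's formatter URL supplies the rest, amounts to substituting the identifier into the "$1" placeholder of the formatter URL. A minimal Python sketch, assuming a hypothetical formatter URL on example.org; the percent-encoding step is an assumption of this sketch, and the real web UI's escaping rules may differ:

```python
from urllib.parse import quote


def build_link(formatter_url: str, identifier: str) -> str:
    """Build a clickable URL by putting the identifier into the "$1" placeholder.

    The identifier is percent-encoded first (an assumption made for this sketch).
    """
    return formatter_url.replace("$1", quote(identifier, safe=""))


# Hypothetical formatter URL stored on a property, and an ID stored on an item.
formatter = "https://example.org/people/$1"
print(build_link(formatter, "jane-doe"))  # https://example.org/people/jane-doe
```

The stored value is only the bare identifier; the link is derived from it at display time, so removing the formatter URL would stop the links from being built without changing any item.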
Just for clarity: are there thousands of other identifier properties that link to pages listed on the global spam blacklist? I assume you just mean that thousands of links to reputable sites like loc, bnf, viaf, etc. are built on the fly when the page is displayed. (I saw that you explained these mechanics at Project chat. The question is not the mechanics, but the desirability of overriding global decisions locally.) SashiRolls (talk) 21:43, 22 January 2022 (UTC)
I don't think that this has been discussed. These properties have simply been treated as all others while the mechanism was already in place.
It is also important to consider that these links are a convenience feature of the web UI, which itself is predominantly a tool for editors, unlike the Wikipedia UI, which also serves readers. In the UI, you can simply click on these links in order to easily verify correctness of the identifier; it would be a pain if you needed to make a link manually.
Data users usually access information from Wikidata via machine-readable interfaces such as WDQS, and they do not get these links at all, only the identifier fragments. In fact, identifiers are not predominantly about clickable URLs and we usually do not treat them as links; they simply state that a data item describes the same concept as some external resource does. —MisterSynergy (talk) 22:25, 22 January 2022 (UTC)
So presumably removing the property "formatter URL" from xHamster pornstar ID (P8720) would suffice to deactivate the link in the UI? Amusingly, it is not possible to delete this line, as trying to do so triggers the spam filter. Catch-22. (This suggests that getting around the spam filter in the first place required administrative privileges, just as undoing the undiscussed local override will?) Citing the sheer number of misidentified living people, I have requested that an admin remove the formatter URL on the property discussion page. § SashiRolls (talk) 03:03, 23 January 2022 (UTC)
  • Yes, if the formatter URL was removed from the property page, the links would no longer appear in the web UI. It might already be sufficient to apply deprecated rank to all formatter URL claims of this property. (This could take some days to take effect on item pages since there is some slow caching going on.)
  • Technically, this is the wrong thing to do. The links are an editorial tool, and a removal of the formatter URL would simply be a measure to obstruct the editorial process.
  • It would be better to review the inappropriate claims and set them to deprecated rank in order to make them as invisible to data users as possible. They would still appear in the UI, though.
  • As far as I am aware, the property page was fully set up including all URL fragments (Oct/Nov 2020) before this URL was added to the global spam blacklist (Dec 2020). I do not think that any elevated rights were required to bring the property page to its current form.
  • That said, I am not even sure whether admin rights would be sufficient to remove the formatter URL. I think I tried this in the past on another one of these, and the abuse filter prevented me from doing so, at least without additional edits to MediaWiki:Spam-whitelist (I am not the greatest expert regarding spam blacklists and whitelists). Wikidata:Administrators'_noticeboard/Archive/2018/06#Pornhub_(Q936394) and Wikidata:Administrators'_noticeboard/Archive/2017/10#Carstuckgirls.com seem related.
MisterSynergy (talk) 07:46, 23 January 2022 (UTC)
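The suggestion to use deprecated rank rests on how WDQS exposes "truthy" values through the wdt: prefix: only best-rank statements are included, meaning preferred-rank statements if a property has any, otherwise normal-rank ones, and deprecated statements never. A minimal Python sketch of that selection for one property's statements, assuming the Wikibase JSON layout as input (it approximates the behaviour and is not WDQS code):

```python
def best_rank_values(statements: list[dict]) -> list:
    """Return the values a truthy (wdt:) query would see for one property.

    Preferred statements win if any exist; otherwise normal ones are used.
    Deprecated statements are never returned. Only concrete values are kept.
    """
    preferred = [s for s in statements if s.get("rank") == "preferred"]
    chosen = preferred or [s for s in statements if s.get("rank") == "normal"]
    return [
        s["mainsnak"]["datavalue"]["value"]
        for s in chosen
        if s["mainsnak"].get("snaktype") == "value"
    ]


# Hypothetical statements: one misidentified ID set to deprecated, one normal.
statements = [
    {"rank": "deprecated", "mainsnak": {"snaktype": "value", "datavalue": {"value": "wrong-id"}}},
    {"rank": "normal", "mainsnak": {"snaktype": "value", "datavalue": {"value": "right-id"}}},
]
print(best_rank_values(statements))  # ['right-id']
```

The deprecated statement still exists and remains reachable through full statement paths (p:/ps:) and in the web UI, which matches the caveat that such claims would still appear in the UI.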

────────────────────────────────────────────────────────────────────────────────────────────────────

I appreciate your detailed investigation and response. I do not share your opinion concerning the idea that the direct links to pornographic content are part of the "editorial process". In fact an editorial decision was made globally not to link to these commercial content repositories. Overriding that locally (as your links demonstrate was necessary) without site-wide discussion was disrupting the editorial process. Moreover, it appears that many of these entries are being added not via the WD user interface but via Mix-n-match.

We have not yet heard from @Thierry Caro:, who both requested the creation of these property IDs and requested overriding the global spamlist for Pornhub ID and YouPorn ID. It would be interesting to hear his point of view on the fact that these identifiers are being used to spread hoaxes about women like Penelope Cruz, Jennifer Aniston, Aïssa Maïga, Nicole Kidman, Deepika Padukone, and dozens of others. Do you follow Wikidata:Database_reports/Constraint_violations/P8720 for the property xHamster pornstar ID (P8720) you proposed, for example?

Is the normal way of gaining consensus for a decision on Wikidata a Request for Comment? Is there a centralized discussion template on Wikidata? It seems that almost everyone historically involved in this discussion was / has been male; it would be interesting to get a larger perspective on the compatibility of these IDs with the project mission ("knowledge equity").

-- SashiRolls (talk) 20:01, 23 January 2022 (UTC)

I believe these are usual properties that don't really need more constraints than any other one. In my humble opinion, it's not up to a single user to decide whether or not Anne Hathaway (Q36301) or whoever else should have the IDs removed, as most of the content the linked databases have about them consists of snippets from movies they willingly appeared nude in. It's a complicated endeavor to try to censor anything, and it's doomed to become arbitrary very fast. I suggest we just ignore the content that we don't care about, as I mostly do with biochemistry, for instance. It's probably just better that I don't venture into trying to guess whether or not this or that scientist likes the databases their Wikidata item links to, which after all they could very well hate unbeknownst to us. We should provide information to the user in a standard, nondiscriminating manner and let him or her decide what to do with it: love it, hate it, support it, ignore it, fight it, etc. Thierry Caro (talk) 09:38, 24 January 2022 (UTC)
Linking to user-uploaded copyright violations, such as those you describe above concerning Anne Hathaway (Q36301), is generally a no-no. Actually, just for the record, there were zero videos at the link given. It was just a gratuitous claim bulk-dumped via QuickStatements. SashiRolls (talk) 10:28, 24 January 2022 (UTC)
  • By "editorial process" I was referring to the activity of managing identifiers within the existing environment. So, addition, update, removal, ranking, sourcing of specific values in the context of a given data item; this is actually an everyday task that can be done via the web UI, where direct links (as opposed to unlinked identifier fragments) are helpful. I am not a native English speaker, so I apologize for any inaccuracy if this was not clear.
  • The Wikimedia-wide decision to blacklist these domains has mostly been made based on experiences in Wikipedias (where such links do not belong), or in remote small Wikimedia projects where these links sometimes happen to live longer since they are not particularly well monitored for vandalism. However, Wikidata is not an encyclopedia and data items are not "read" from top to bottom or similar. Usually data users query only some information from items, and the general notion is that data users are responsible for their selection and post-filtering of queried data. The information offered by Wikidata is usually much wider than what is in scope of Wikipedias/encyclopedias.
  • Yes, RfCs are the ultimate venue to gain consensus for policy changes or any other significant ways of doing things here. Unfortunately, it is a somewhat broken process that is difficult to use in a way that a) any consensus is gained at some point, let alone b) the initiator's position gains substantial support. It is usually expected that the problem is thoroughly discussed at WD:PC first, in order to have a clear idea of the problem, the constraints, and the (likely) community position on the problem.
MisterSynergy (talk) 22:29, 23 January 2022 (UTC)

Adding exceptions to porn content identifiers

@BrokenSegue: Would it not make sense to use the identifiers on Paris Hilton? 1 Night in Paris literally won several prizes. --Trade (talk) 02:13, 22 January 2022 (UTC)

That would make sense, but we could add a constraint exception for such cases. BrokenSegue (talk) 18:35, 22 January 2022 (UTC)
Maybe it would be better to add a reference (to fulfill the constraint it was violating) rather than hard-code it into the definition of the Pornhub ID property. (§). Not sure if you (@Trade:) were able to remove any of the examples of folks falsely identified with an xHamster ID above. I saw that you added occupation: pornographic actor to some of the people in the list, so the list is getting smaller. I've removed several dozen fakes now, including some megastars like Demi Moore, Jennifer Aniston, Deepika Padukone, etc. Not sure if the TrustPilot review score listed at xHamster (Q16617922), 2.7/5 (Poor), is an additional argument for deprecation? SashiRolls (talk) 20:31, 22 January 2022 (UTC)
Again, references for most external IDs are generally impossible or difficult to find. You're really stretching for reasons to remove these properties. Google has a 2.5 from TrustPilot and we source multiple properties from them. BrokenSegue (talk) 16:46, 25 January 2022 (UTC)
Thank you for your assessment of the value of the Trustpilot (Q7848226) rating. Looking at the Wikidata page for xHamster (Q16617922), the "poor" rating stood out. (I asked about it because I did not know its value, not because I had doubts about whether we should be linking to WMF globally blacklisted pornographic content.) SashiRolls (talk) 18:25, 5 February 2022 (UTC)

Removal of well referenced data

Hi all! In some cases I have seen, through my watchlist, users removing well-referenced data (typically birth dates and/or places) from items of living persons; in most cases these users declared, implicitly (e.g. by choosing their real name as username) and sometimes also explicitly, that they are the person whose item they edited. Our policy states that "Almost any piece of data about a living person might be controversial; anything that's individually challenged or might be challenged should be supported by a reliable public source or may be subject to removal". I have two questions:

  1. if a piece of data is supported by a reliable public source (e.g. a national library catalog and/or a pdf of a document issued by a university), is the removal of such a statement supported by this policy?
  2. if the removal of such a statement isn't supported by this policy, could we add it explicitly somewhere in this policy, in order to have the possibility to link that information to users trying to perform removals?

Thanks, --Epìdosis 20:00, 20 October 2022 (UTC)

I think the removal is NOT supported by our current policy, and I don't think removal is a good idea if that information is available from public reliable sources like that. For one thing most likely a bot will add it back in, so removal is somewhat pointless. On your second point, yes that might be a good idea, I'm not sure how to phrase it though. If there's a good reason then yes something like that could possibly be removed, but that reason ought to be something like the "fact" is disputed in this particular case or something like that. ArthurPSmith (talk) 20:53, 20 October 2022 (UTC)
An example of removal, granted as of now: Wikidata:Requests_for_deletions/Archive/2022/10/23#Elena_Gremigni_(Q106806996); personally I tend to agree with @ArthurPSmith: about not including such removals in our policy. --Epìdosis 20:39, 26 October 2022 (UTC)
@Epìdosis I think it depends a lot on the data in question. We have property that may violate privacy (Q44601380). For properties that are tagged that way a reliable public source is not enough but it's necessary that "Values for living individuals should generally not be supplied unless they can be considered widespread public knowledge or are openly supplied by the individual themselves".
A national library catalog is a reliable public source but it's no proof that there's widespread public knowledge about the piece of data. When I wrote up the policy, I did not intend date of birth (P569) to be in that category and was more thinking about information like email addresses that we don't want to list even if we have reliable public sources if there's no widespread knowledge of them, but currently date of birth (P569) is at that protection level.
Additionally, we have the case of people requesting that information about themselves get removed. For that our policy says "If the subject of an item is unhappy with specific information in the item about themselves, they can request the removal of specific information via the Administrators' noticeboard or by emailing privacy@wikidata.org. When the information isn't of public interest, an administrator may revision delete it."
This is due to GDPR. The existence of a reliable public source does not in itself indicate ChristianKl 20:05, 10 December 2022 (UTC)
@ChristianKl: My understanding of the present formulation of the page is different. First, I consider information supported by a national library catalog as widespread public knowledge, in the sense that it comes from a source widely available to anyone using the Internet. Second, "anything that's individually challenged or might be challenged should be supported by a reliable public source or may be subject to removal" means that only information which is not supported by a reliable public source is subject to the clause "If the subject of an item is unhappy with specific information in the item about themselves, they can request the removal of specific information via the Administrators' noticeboard or by emailing privacy@wikidata.org. When the information isn't of public interest, an administrator may revision delete it.". Anyway, if my understanding of the present formulation is not compliant with GDPR (which may be the case, of course; I would be interested in a quotation of one or more precise points regarding our case, which IMHO should be added as a reference in the policy), I suggest making this clearer in the policy, maybe through a formulation like "If the subject of an item is unhappy with specific information in the item about themselves (including information supported by reliable public sources but that cannot be considered widespread public knowledge)". --Epìdosis 20:25, 10 December 2022 (UTC) P.S. My point is mainly concerned with date of birth (P569) (and place of birth (P19)), while I fully agree about email address (P968) and other more private data like religion or worldview (P140), residence (P551) etc. (BTW, I think we could also add number of children (P1971) to this list; it currently has no living people protection class (P8274).)
This policy is a lesson for me in how people can misunderstand what I meant when I wrote it initially, and I agree that it would make sense to make a few aspects clearer. I thought it would be better to give the two classes of "Statements likely to be challenged" and "Statements that may violate privacy" names in natural language instead of naming them "protection class I" and "protection class II". Probably, the name "protection class II" would have made it less likely that people would tag date of birth (P569) with it.
The term widespread public knowledge for "Statements that may violate privacy"/"protection class II" was explicitly chosen to contrast with the reliable public source that's needed for "Statements likely to be challenged"/"protection class I". There's a lot of information on the internet that is not widespread public knowledge, and for statements like phone numbers, email addresses or medical illnesses it would not be good, for privacy reasons, to have it on Wikidata.
Public interest is a valid GDPR defense as far as I remember. To be more explicit "processing is necessary for archiving purposes in the public interest, scientific or historical research purposes or statistical purposes in accordance with Article 89(1) based on Union or Member State law which shall be proportionate to the aim pursued, respect the essence of the right to data protection and provide for suitable and specific measures to safeguard the fundamental rights and the interests of the data subject."
Second, "anything that's individually challenged or might be challenged should be supported by a reliable public source or may be subject to removal" means that only information which is not supported by a reliable public source is subject to the clause
That statement is about "Statements likely to be challenged"/"protection class I". It neither explicitly negates the earlier-mentioned right of people to request removal of information that's not in the public interest, nor was it intended to negate that right when it was written. ChristianKl 20:56, 10 December 2022 (UTC)

Ambiguity of this policy

Wikidata:Living people#Statements that may violate privacy is apparently only meant to apply to properties with living people protection class (P8274) set to property that may violate privacy (Q44601380)

... however it does not say that ... at all.

Values for living individuals should generally not be supplied unless they can be considered widespread public knowledge or are openly supplied by the individual themselves (otherwise hidden supporting references are not sufficient).

It very much sounds like this applies to any value of any property. Yes, I know that the section is named Statements that may violate privacy, but that does not imply that the scope of the text within the section is constrained to properties with living people protection class (P8274) set to property that may violate privacy (Q44601380).

--Push-f (talk) 07:35, 14 December 2022 (UTC)

Something that is verifiable is different from something that is verified. For statements that may violate privacy, what is needed is that you explicitly provide a source for them. GZWDer (talk) 22:39, 23 December 2022 (UTC)
Yes, I know the difference between verifiable and verified. My point is that the policy is unclear on which "statements may violate privacy". It should be obvious that properties can violate privacy even if they don't have the living people protection class (P8274). I think it's very obvious that almost all properties can be used to violate privacy, e.g. even part of (P361), significant event (P793) or URL (P2699) can be used to violate privacy. So I'd even suggest that the name of property that may violate privacy (Q44601380) (currently "property that may violate privacy") is misleading. It should be named something like "property that is particularly privacy-sensitive", but the absence of that "protection class" does not at all imply that a property cannot be used to violate privacy. I think the Statements that may violate privacy section should be clarified to say that the sentence I quoted above applies to all properties, not just to properties with the protection class, which is what @ChristianKl: previously suggested was the meaning of the section. --Push-f (talk) 05:52, 24 December 2022 (UTC)

Remove HTML table for accessibility and readability

{{edit protected}}

This is not an edit protected request per se, but I don’t really know how to correctly change the translation-based pages, so I’m tagging it with that. This page uses a table to display icons for its contents. That makes the page significantly harder to read in the mobile version and also causes accessibility problems, since this is a presentation (layout) table marked up as a data table. I would appreciate it if someone could remove the table markup from the page and put the icons under the headings of the specific sections. That would greatly improve the accessibility and responsiveness of this page. stjn[ru] 18:44, 27 July 2023 (UTC)

@Stjn: not requested edit. Please discuss it at Wikidata:Translators' noticeboard Estopedist1 (talk) 05:08, 10 August 2023 (UTC)[reply]

Deletion of authors of notable publications

JakobVoss (talk) ClaudiaMuellerBirn (talk) Criscod (talk) Daniel Mietchen (talk) Ettorerizza (talk) Ls1g (talk) Pasleim (talk) Hjfocs (talk) 17:24, 21 January 2019 (UTC) PKM (talk) 2le2im-bdc (talk) 20:30, 24 January 2019 (UTC) Vladimir Alexiev (talk) 16:37, 21 March 2019 (UTC) ElanHR (talk) User:Epìdosis (talk) Tris T7 TT me UJung (talk) 11:43, 24 August 2019 (UTC) Envlh (talk) SixTwoEight (talk) User:SCIdude (talk) Will (Wiki Ed) (talk) Mathieu Kappler (talk) So9q (talk) 19:33, 8 September 2021 (UTC) Zwolfz (talk) عُثمان (talk) 16:31, 5 April 2023 (UTC) M2k~dewiki (talk) 12:28, 24 September 2023 (UTC) —Ismael Olea (talk) 18:18, 2 December 2023 (UTC) Andrea Westerinen (talk) 23:33, 2 December 2023 (UTC) Peter Patel-Schneider

Notified participants of WikiProject Data Quality As of now, the only reference to the possible deletion of items due to this policy is "Items about living people which are not notable can be deleted." According to the linked notability policy, an item is notable if it meets at least one of the notability criteria. Criterion 3 is "It fulfills a structural need, for example: it is needed to make statements made in other items more useful." In my understanding, an item about a living person that is linked through author (P50) from at least one item about a notable publication (e.g. a book with ISBN-13 (P212) or ISBN-10 (P957), or a scientific article with a DOI (P356)) is notable per criterion 3, and thus cannot be deleted under the aforementioned phrase of this policy. However, @DannyS712: has recently deleted one such item per this policy; while I'm not particularly concerned about this specific deletion (the person described in the deleted item has authored only one publication having a Wikidata item), I think this could be a good occasion to clarify the aforementioned phrase of this policy, editing it in one of the three following ways:

  1. "Items about living people which are not notable, or which are notable only per criterion 2 and/or 3, can be deleted."
  2. "Items about living people which are not notable, or which are notable only per criterion 3, can be deleted."
  3. "Items about living people which are not notable can be deleted; items about living people which are notable cannot be deleted, but specific information can be removed from them."

This would avoid further ambiguities of interpretation. If option 1 or 2 is approved, I would also suggest keeping a simple list somewhere of the notable Q IDs deleted according to this policy; personally I would support option 3. Opinions are welcome. --Epìdosis 00:20, 1 October 2023 (UTC)

Thanks for bringing this to my attention. Can you give an example of what information would make sense to remove from such an item? I also support option 3. So9q (talk) 06:35, 1 October 2023 (UTC)
Good question, I'm not sure (between options 2 and 3). I would say that respect for living people trumps notability, but it probably depends on other factors. For this specific case, 1. one ISBN or one DOI is a very weak indicator of notability, 2. we have author name string (P2093). But where should we put the threshold? (Are ISBNs/DOIs even relevant here? Someone with a lot of publications with ISBNs probably also has other sources.)
In any case, a clarification would indeed be welcome (perhaps through an RFC for more publicity?).
Cheers, VIGNERON (talk) 12:05, 1 October 2023 (UTC)
In general I support option 3. In particular, items about authors of even a single text which is used as a reference (or "described by source") in one or more Wikidata items should not be deleted. PKM (talk) 23:19, 1 October 2023 (UTC)
Option 3 definitely. Given some authors are corporate, having at least instance of (P31) human (Q5) is useful information that should be on its own item, not a qualifier on the author name string. Creating an item doesn't require the item to have all possible properties added. In general our policy on living people should cover what statements are added with or without references, or not added at all without express permission. ArthurPSmith (talk) 18:05, 2 October 2023 (UTC)
  • Critical to track, no policy for now The situation is that we have general policies, but here we have what may be an unusual case where an individual has requested deletion of a Wikidata item about themselves. My guess is that currently we receive about 10 of these a year, which at Wikidata's scale would not be worth regulating. I do not think we should make rules to govern the exceptions because we do not have enough of them. I do think that we should have a central list where people can start logging case studies for personal requests for deletion so that eventually we can look at the nature of cases and develop a policy for this. Posting to the talk page here is a good option, and that is already done for this case. I suggest granting deletions by request, but clarifying to the requestor that anyone can re-create an item for them at any time. For truly fringe personalities their Wikidata item would go; for anyone really notable, they should expect that Wikidata editors will spontaneously log them again at which point they can request deletion again. Bluerasberry (talk) 17:18, 3 October 2023 (UTC)
    I like this approach a lot. Too many rules are no fun either and this seems very rare indeed. So9q (talk) 02:52, 6 October 2023 (UTC)
I'm also for option 3. ChristianKl 14:41, 8 December 2023 (UTC)