

Add schema.org sdLicense #545

Open
wants to merge 3 commits into
base: main

mkuchnik
Contributor

Add a field for the metadata's license

See #544

@mkuchnik added the enhancement (New feature or request) and WIP (work in process) labels Feb 21, 2024
@mkuchnik requested a review from a team as a code owner February 21, 2024 02:28
github-actions bot commented Feb 21, 2024

MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅

@mkuchnik marked this pull request as draft February 23, 2024 03:27
@mkuchnik force-pushed the feature_sdlicense branch 4 times, most recently from c85ea2c to cf0437a February 23, 2024 21:44
@mkuchnik removed the WIP (work in process) label Feb 23, 2024
@mkuchnik marked this pull request as ready for review February 23, 2024 21:57
@mkuchnik
Contributor Author
mkuchnik commented Feb 23, 2024

https://schema.org/sdLicense is in the spec now. I set each sdLicense to Apache 2.0, since the repository is Apache 2.0. For now, sdLicense is either None or a string.

Note: the migration of bigcode-the-stack/metadata.json seems to have lost some information. I'm not sure whether this is intended.
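The "None or string" rule described in this comment can be sketched as a small standalone check. This is only an illustration: the function name `validate_sd_license` and the `errors` list are hypothetical stand-ins, and the PR's real logic lives in a method that reports problems through `self.add_error`.

```python
from __future__ import annotations


def validate_sd_license(sd_license: object, errors: list[str]) -> str | None:
    """Return sd_license when it is None or a str; otherwise record an error.

    Hypothetical helper: `errors` stands in for the validator's own error
    collection, which is not shown in this conversation.
    """
    if sd_license is None:
        # The field is optional, so a missing value is accepted.
        return None
    if isinstance(sd_license, str):
        return sd_license
    errors.append(f"sdLicense should be a str. Got: {sd_license}")
    return None
```

A caller would pass the raw value read from the metadata and inspect `errors` afterwards; any non-string, non-None value leaves exactly one message in the list.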

Contributor
@marcenacp marcenacp left a comment


Thanks! Just wrote a few comments.

Also, do you want to use apache-2.0 (the Hugging Face way) or a URL (the Croissant way described in the specs)?

Croissant recommends using the URL of a known license, e.g., one of the licenses listed at https://spdx.org/licenses/.
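As a rough sketch of the URL style, a short identifier could be expanded into an SPDX license page URL before it is written to the metadata. The helper `to_spdx_url` and the constant `SPDX_BASE_URL` are hypothetical names, not part of Croissant. The sketch assumes the caller passes the identifier with the exact casing SPDX uses (for example `Apache-2.0`); it does not map Hugging Face's lowercase tags.

```python
SPDX_BASE_URL = "https://spdx.org/licenses/"


def to_spdx_url(license_id: str) -> str:
    """Expand an SPDX identifier into its license page URL.

    Values that already look like URLs are returned unchanged.
    Assumption: license_id uses SPDX's own casing, e.g. "Apache-2.0".
    """
    if license_id.startswith(("http://", "https://")):
        return license_id
    return f"{SPDX_BASE_URL}{license_id}.html"


# Illustrative metadata fragment using the URL form for both fields.
metadata = {
    "license": to_spdx_url("Apache-2.0"),
    "sdLicense": to_spdx_url("Apache-2.0"),
}
```

Passing values that are already URLs through unchanged keeps the helper safe to apply to metadata that follows the recommendation already.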

datasets/1.0/bigcode-the-stack/metadata.json (three review threads, outdated and resolved)
@mkuchnik
Contributor Author

@marcenacp The migration tool applies these changes even to v0.8, so I manually merged the diffs. Also, I think the migration tool may sort keys differently than a JSON formatter/linter does, but this seems minor.
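One way to tell real content changes apart from pure key reordering is to compare both files after re-serializing them with sorted keys. The sketch below uses only the standard `json` module; `canonical_json` is a hypothetical helper name, and it says nothing about how the migration tool itself orders keys.

```python
import json


def canonical_json(text: str) -> str:
    """Re-serialize JSON text with sorted keys so key order cannot cause diffs."""
    return json.dumps(json.loads(text), indent=2, sort_keys=True, ensure_ascii=False)


# Two documents with the same content but a different key order.
before = '{"sdLicense": "Apache-2.0", "license": "Apache-2.0"}'
after = '{"license": "Apache-2.0", "sdLicense": "Apache-2.0"}'

same_content = canonical_json(before) == canonical_json(after)
```

If the canonical forms still differ, the remaining diff reflects an actual change in the metadata rather than ordering noise.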

@mkuchnik
Contributor Author

@marcenacp Is there something that would block the merge if I rebased onto the current main?

@Zack-83 Zack-83 left a comment


I think documentation is needed to explain how the fields license and sdLicense differ. Do they accept labels/strings or URIs?

@marcenacp
Contributor

@Zack-83 They are schema.org fields, so you can find the documentation at https://schema.org/license and https://schema.org/sdLicense, respectively. Does this make sense to you? We could explicitly point to those URLs.

@mkuchnik Nothing blocks the merge on my side. Do you have the rights to merge?

@@ -265,6 +268,17 @@ def validate_license(self) -> list[str] | None:
self.add_error(f"License should be a list of str. Got: {license}")
@Zack-83 Zack-83 May 16, 2024


Suggested change
- self.add_error(f"License should be a list of str. Got: {license}")
+ self.add_error(f"The object license should be a list of strings. Got: {license}")

elif isinstance(sd_license, str):
    return sd_license
else:
    self.add_error(f"sdLicense should be a str. Got: {sd_license}")
@Zack-83 Zack-83 May 16, 2024


Suggested change
- self.add_error(f"sdLicense should be a str. Got: {sd_license}")
+ self.add_error(f"The license for the metadata (schema:sdLicense) should be a string. Got: {sd_license}")

@Zack-83
Zack-83 commented May 16, 2024

> @Zack-83 They are schema.org fields, so you can find the documentation at https://schema.org/license and https://schema.org/sdLicense, respectively. Does this make sense to you? We could explicitly point to those URLs.
>
> @mkuchnik Nothing blocks the merge on my side. Do you have the rights to merge?

@marcenacp Thanks for considering my suggestion. I think it would be useful to reference the URLs, or otherwise disambiguate the two terms, wherever the user enters the license manually or receives output, e.g. in the error messages of metadata.py.
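A minimal sketch of what such self-documenting error messages might look like follows. The function names and constants are hypothetical, and the wording only loosely follows the suggested changes in this review; the actual messages in metadata.py may differ.

```python
# Documentation URLs mentioned earlier in this conversation.
LICENSE_DOC_URL = "https://schema.org/license"
SD_LICENSE_DOC_URL = "https://schema.org/sdLicense"


def license_error(value: object) -> str:
    """Build an error message for an invalid license, pointing to its docs."""
    return (
        f"The license (schema:license, see {LICENSE_DOC_URL}) "
        f"should be a list of strings. Got: {value}"
    )


def sd_license_error(value: object) -> str:
    """Build an error message for an invalid sdLicense, pointing to its docs."""
    return (
        f"The license for the metadata (schema:sdLicense, see {SD_LICENSE_DOC_URL}) "
        f"should be a string. Got: {value}"
    )
```

Keeping the URLs in named constants means both the messages and any user-facing documentation can reference the same source of truth.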

Labels
enhancement New feature or request

3 participants