Update phenotype schema sem-sim tables #483

Open
julesjacobsen opened this issue Mar 9, 2023 · 2 comments

Comments

@julesjacobsen
Contributor

No description provided.

@matentzn
Collaborator
matentzn commented Mar 9, 2023

This is our Christmas wish list - feel free to decide which ones go too far, and which gifts you are willing to make.

  • Merge similarity tables into one. Right now, there are three separate similarity tables. This requires a (little) bit of churn when building experimental pipelines, subsetting semantic similarity profiles and pushing them into the database. It will also more easily cater to future experiments where we want to play with phenotypes from species beyond Zebrafish and Mouse.
  • Canonicalise column headers. It would be super great if we could:
      • Not hardcode similarity measures, perhaps. Right now you have columns like SIMJ, which means "Jaccard similarity". What if we want to try different similarity measures? Cosine similarity? One thing that @cmungall and I keep disagreeing on is the denormalised structure of the phenotypic profiles. This is a perfect example of why: you, as a software developer, want a fixed schema for your tables to drive your software (one similarity score). We, as experimentalists, want to see what happens if we drop in dozens of other semantic similarity measures and see how that affects the results. Now, we can always overload the named field (i.e. stick "cosine similarity" values into the "jaccard_similarity" column, which we know is used by Exomiser), but it feels less beautiful than having a simple similarity column (and perhaps a separate measure column for jaccard, cosine etc.).

Anyways, not sure about this last one. You know best!
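
To make the wish list concrete, here is a minimal sketch of what a single merged table with a generic score column and a separate measure column could look like. It uses an in-memory SQLite database purely for illustration. The table name, column names, term IDs and scores are all hypothetical placeholders, not the current Exomiser phenotype schema.

```python
import sqlite3

# Hypothetical long-format schema: one table for all species and all measures.
# None of these names come from the existing Exomiser database.
SCHEMA = """
CREATE TABLE phenotype_similarity (
    query_id      TEXT NOT NULL,  -- query phenotype term
    match_id      TEXT NOT NULL,  -- matched phenotype term
    match_species TEXT NOT NULL,  -- e.g. 'mouse', 'zebrafish', or a new species
    measure       TEXT NOT NULL,  -- e.g. 'jaccard', 'cosine'
    score         REAL NOT NULL,
    PRIMARY KEY (query_id, match_id, measure)
)
"""


def build_demo_db():
    """Create an in-memory database with a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    rows = [
        # Placeholder IDs and scores, for illustration only.
        ("HP:0000001", "MP:0000001", "mouse", "jaccard", 0.5),
        ("HP:0000001", "MP:0000001", "mouse", "cosine", 0.7),
        ("HP:0000001", "ZP:0000001", "zebrafish", "jaccard", 0.25),
    ]
    conn.executemany(
        "INSERT INTO phenotype_similarity VALUES (?, ?, ?, ?, ?)", rows
    )
    conn.commit()
    return conn


def scores_for(conn, measure, species=None):
    """Return (query_id, match_id, score) rows for one measure,
    optionally restricted to one species."""
    sql = (
        "SELECT query_id, match_id, score "
        "FROM phenotype_similarity WHERE measure = ?"
    )
    params = [measure]
    if species is not None:
        sql += " AND match_species = ?"
        params.append(species)
    sql += " ORDER BY query_id, match_id"
    return conn.execute(sql, params).fetchall()
```

With this layout, adding a new species or a new similarity measure means inserting rows rather than adding tables or columns, while software that needs a fixed schema can still filter on a single `measure` value.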

@matentzn
Collaborator
matentzn commented Mar 9, 2023

(Maybe the right thing to do is to make the "similarity measure" configurable at the config-file level for an Exomiser run, and keep the total number of columns in Exomiser open?)
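
A rough sketch of the config-level idea, assuming a simple `key=value` properties format: the run reads one setting naming the similarity measure and falls back to Jaccard when it is absent. The property key `phenotype.similarity-measure` and both function names are invented for illustration and are not existing Exomiser options.

```python
DEFAULT_MEASURE = "jaccard"
# Hypothetical property key, not an existing Exomiser setting.
CONFIG_KEY = "phenotype.similarity-measure"


def parse_properties(text):
    """Parse simple 'key=value' lines, skipping blanks and '#' comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def select_measure(props, available):
    """Pick the configured measure, defaulting to Jaccard, and reject
    names that have no scores loaded in the database."""
    measure = props.get(CONFIG_KEY, DEFAULT_MEASURE).lower()
    if measure not in available:
        raise ValueError(
            f"Unknown similarity measure {measure!r}; "
            f"available: {sorted(available)}"
        )
    return measure
```

The set of available measures could be taken from the distinct values stored in the database, so the number of measures stays open without any change to the code that consumes the score.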
