Update phenotype schema sem-sim tables #483

julesjacobsen · 2023-03-09T13:26:41Z

No description provided.

matentzn · 2023-03-09T16:18:28Z

This is our Christmas wish list - feel free to decide which ones go too far, and which gifts you are willing to make.

Merge similarity tables into one. Right now, there are three separate similarity tables. This requires a (little) bit of churn when building experimental pipelines, subsetting semantic similarity profiles and pushing them into the database. A single table will also cater more easily to future experiments where we want to play with phenotypes from species other than Zebrafish and Mouse.
Canonicalise column headers: It would be super great if we could:
- Make sure that all semantic similarity tables have the same columns, and not MP_, ZP_ etc in them
- Ideally adopt the exact column names from OAK, see https://github.com/INCATools/ontology-access-kit/blob/main/src/oaklib/datamodels/similarity.yaml#L45, which would allow us to drop all postprocessing code that renames columns
Perhaps do not hardcode similarity measures. Right now you have columns like SIMJ which mean "jaccard similarity" - what if we want to try different similarity measures? Cosine similarity? One thing that @cmungall and I keep disagreeing on is the denormalised structure of the phenotypic profiles. This is a perfect example of why: as a software developer, you want a fixed schema for your tables to drive your software (one similarity score). We as experimentalists want to see what happens if we drop in dozens of other semantic similarity measures, and see how it affects the results. Now we can always overload the named field (i.e. stick "cosine similarity" values into the "jaccard_similarity" column which we know is used by Exomiser) but it feels less beautiful than having a simple similarity column (and perhaps a separate measure column for jaccard, cosine etc).
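To make the three points above concrete, here is a rough sketch of what a merged, measure-agnostic table could look like. Everything in it is an assumption for illustration: the prefixed input columns (`HP_ID`, `MP_ID`, `ZP_ID`, `SIMJ`) stand in for the current per-species tables, the output names (`subject_id`, `object_id`, `measure`, `score`) are only in the style of the OAK model, the helper `to_long_format` is hypothetical, and the IDs and scores are placeholder values rather than real data.

```python
# Placeholder rows standing in for two of the per-species tables.
# Column names and values are illustrative assumptions, not the real schema.
hp_mp_rows = [{"HP_ID": "HP:0000001", "MP_ID": "MP:0000001", "SIMJ": 0.5}]
hp_zp_rows = [{"HP_ID": "HP:0000001", "ZP_ID": "ZP:0000001", "SIMJ": 0.25}]


def to_long_format(rows, subject_col, object_col, measure_cols):
    """Yield one species-neutral record per (term pair, similarity measure).

    measure_cols maps a source score column (e.g. "SIMJ") to the measure
    name stored in the output "measure" column.
    """
    for row in rows:
        for source_col, measure in measure_cols.items():
            yield {
                "subject_id": row[subject_col],
                "object_id": row[object_col],
                "measure": measure,
                "score": row[source_col],
            }


# One merged table: same columns for every species, no MP_/ZP_ prefixes.
merged = [
    *to_long_format(hp_mp_rows, "HP_ID", "MP_ID", {"SIMJ": "jaccard_similarity"}),
    *to_long_format(hp_zp_rows, "HP_ID", "ZP_ID", {"SIMJ": "jaccard_similarity"}),
]
```

With this shape, adding another species or another measure means adding rows, not columns, so the schema stays fixed for the software while experiments can still drop in extra measures.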

Anyways, not sure about this last one. You know best!

matentzn · 2023-03-09T16:27:46Z

(Maybe the right thing to do is make the "similarity measure" configurable at the config file level for an Exomiser run, and keep the total number of columns in Exomiser open?)
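A minimal sketch of that idea, purely as an assumption about how it might work: the config key `similarity_measure`, the function `select_scores`, and the row layout (one row per term pair and measure) are all hypothetical and not existing Exomiser options, and the rows are placeholder values.

```python
# Placeholder rows: one row per (term pair, measure). Values are made up.
rows = [
    {"subject_id": "HP:0000001", "object_id": "MP:0000001",
     "measure": "jaccard_similarity", "score": 0.5},
    {"subject_id": "HP:0000001", "object_id": "MP:0000001",
     "measure": "cosine_similarity", "score": 0.75},
]


def select_scores(rows, config):
    """Return {(subject_id, object_id): score} for the configured measure."""
    # "similarity_measure" is a hypothetical config key; it falls back to
    # jaccard so a run without the setting behaves as a single-score run.
    measure = config.get("similarity_measure", "jaccard_similarity")
    return {
        (r["subject_id"], r["object_id"]): r["score"]
        for r in rows
        if r["measure"] == measure
    }


scores = select_scores(rows, {"similarity_measure": "cosine_similarity"})
default_scores = select_scores(rows, {})
```

The software still sees exactly one score per term pair, and the experimenter picks which measure that is per run.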
