-
-
Notifications
You must be signed in to change notification settings - Fork 25.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH add new function sort_graph_by_row_values #23139
Conversation
I also renamed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the PR!
I think we need an entry in doc/modules/classes.rst
for the new function.
Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One last concern, otherwise this looks ready to go.
Thanks for the comments! |
I think the most common use case would be how we use it in graph = check_array(graph, accept_sparse="csr")
if not is_graph_sorted_by_row_values(graph):
graph = sort_graph_by_row_values(graph) In this case, we can not stop users from starting off with a What if we put the def sort_graph_by_row_values(graph, copy=False, warn_when_not_sorted=True):
# check for accepted graph formats
if is_graph_sorted_by_row_values(graph):
return graph
# Maybe always warn?
if warn_when_not_sorted:
warnings.warn(...)
if graph.format != "csr":
graph = graph.asformat("csr")
elif copy:
graph = graph.copy()
# sort graph by rows
return graph This is similar to what you had before, but with the extra |
I like it, see my latest commit. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be cool to add an entry in the dev utilities:
https://scikit-learn.org/dev/developers/utilities.html#developers-utils
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only nitpicks. Otherwise LGTM
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Thanks @TomDLT all good. |
Fixes #22880
This PR add a new public function called
sort_by_row_values
, to allow sorting precomputed sparse distance graphs before using them in estimators such asDBSCAN
,TSNE
,Isomap
, etc.Currently, if a precomputed sparse distance graph is not sorted by row values, the nearest neighbors code sorts the graph and raises an
EfficiencyWarning
. Making a public function allows users to sort the graph beforehand.Happy to change to another name if needed. (
sort_by_data
,sort_graph_by_row_values
,sort_csr_by_data
,sort_csr_by_row_values
, ... ?)