DBSCAN seems not to use multiple processors (n_jobs argument ignored) #8003
Comments
In #4009, we failed to find an implementation in which parallelism in
radius_neighbors for the spatial trees was effectively faster. Perhaps this
needs further experimentation.
On 8 December 2016 at 06:08, Andreas Mueller wrote:
you can set algorithm="brute" to use multiple cores but that will
probably make it slower. The neighbors module decides it wants to use a
tree, which we haven't parallelized yet.
How many cores do you have? And can you report times for the default
setting and for algorithm="brute"?
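Andreas's suggestion above can be sketched as follows. This is a minimal, hypothetical comparison (with a much smaller dataset than the report's 1,000,000 samples, so it runs quickly); it assumes only the public `DBSCAN` parameters already mentioned in this thread (`algorithm`, `n_jobs`).

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

# Small synthetic dataset in the same shape as the reproduction below.
centers = [[1, 1], [-1, -1], [1, -1]]
X, _ = make_blobs(n_samples=2000, centers=centers, cluster_std=0.4,
                  random_state=0)
X = StandardScaler().fit_transform(X)

# Default: the neighbors module picks a tree-based index, which (per the
# comments above) had not been parallelized at the time of this issue,
# so n_jobs is effectively ignored.
labels_tree = DBSCAN(eps=0.3, min_samples=10, n_jobs=-1).fit(X).labels_

# Forcing brute-force neighbor search lets n_jobs fan the pairwise
# distance computation out across cores, though it may still be slower
# overall than a single-threaded tree query.
labels_brute = DBSCAN(eps=0.3, min_samples=10, algorithm="brute",
                      n_jobs=-1).fit(X).labels_

# Both settings compute exact radius neighbors, so the clustering
# should be identical.
print(np.array_equal(labels_tree, labels_brute))
```

Wrapping each `fit` call in a timer (e.g. `time.perf_counter`) is the comparison Andreas asked for.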
@pfaucon, a clarification in the documentation is welcome. Please submit a pull request. Otherwise I'm closing this as something we can't do much about.
Actually, as you seem to be requesting documentation changes, I'll leave it open and you or someone else can contribute a fix.
Hi, I'm new to scikit-learn, but I'd like to contribute to this.
Go ahead.
Is this issue still open? I'm new to scikit-learn and would like to try.
@kushagraagrawal no, PR at #8039 |
Just wondering if there's any update on this issue. |
Description
DBSCAN seems not to use multiple processors (n_jobs argument ignored)
It looks like DBSCAN hands its arguments off to the nearest-neighbors machinery, but the neighbors module only uses the n_jobs argument for certain algorithms (presumably not the ones DBSCAN selects by default). It would be good to document how to set things up so that n_jobs takes effect, and possibly to change the default values to make it useful.
Steps/Code to Reproduce
Code taken from:
http://scikit-learn.org/stable/auto_examples/cluster/plot_dbscan.html#sphx-glr-auto-examples-cluster-plot-dbscan-py
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn import metrics
from sklearn.datasets.samples_generator import make_blobs
from sklearn.preprocessing import StandardScaler

centers = [[1, 1], [-1, -1], [1, -1]]
X, labels_true = make_blobs(n_samples=1000000, centers=centers,
                            cluster_std=0.4, random_state=0)
X = StandardScaler().fit_transform(X)

db = DBSCAN(eps=0.3, min_samples=10, n_jobs=-1).fit(X)
Expected Results
The answer is correct, but the work should be split between processors and the time consumed should be significantly less.
Actual Results
It seems to run on only one processor.
Versions