
DBSCAN seems not to use multiple processors (n_jobs argument ignored) #8003

Closed
pfaucon opened this issue Dec 7, 2016 · 9 comments · Fixed by #10887
Labels: Documentation, Easy

Comments

@pfaucon commented Dec 7, 2016

Description

DBSCAN seems not to use multiple processors (n_jobs argument ignored)
It looks like DBSCAN hands its arguments off to NearestNeighbors, but NearestNeighbors only uses the n_jobs argument for certain algorithm choices (presumably not the ones DBSCAN uses by default). It would be good to document how to make the n_jobs parameter effective, and possibly to change the default values so it is useful.

Steps/Code to Reproduce

code taken from:
http://scikit-learn.org/stable/auto_examples/cluster/plot_dbscan.html#sphx-glr-auto-examples-cluster-plot-dbscan-py

import numpy as np
from sklearn.cluster import DBSCAN
from sklearn import metrics
from sklearn.datasets.samples_generator import make_blobs
from sklearn.preprocessing import StandardScaler

centers = [[1, 1], [-1, -1], [1, -1]]
X, labels_true = make_blobs(n_samples=1000000, centers=centers, cluster_std=0.4,
                            random_state=0)

X = StandardScaler().fit_transform(X)

db = DBSCAN(eps=0.3, min_samples=10, n_jobs=-1).fit(X)

Expected Results

The answer is correct, but the work should be split between processors and the time consumed should be significantly less.

Actual Results

The job seems to run on only one processor.

Versions

import platform; print(platform.platform())
Linux-3.13.0-101-generic-x86_64-with-Ubuntu-14.04-trusty
import sys; print("Python", sys.version)
Python 3.4.3 (default, Nov 17 2016, 01:08:31)
[GCC 4.8.4]
import numpy; print("NumPy", numpy.__version__)
NumPy 1.11.2
import scipy; print("SciPy", scipy.__version__)
SciPy 0.18.1
import sklearn; print("Scikit-Learn", sklearn.__version__)
Scikit-Learn 0.18.1

@amueller (Member) commented Dec 7, 2016

You can set algorithm="brute" to use multiple cores, but that will probably make it slower overall. The neighbors module decides it wants to use a tree, which we haven't parallelized yet.

How many cores do you have? And can you report times for the default setting and for algorithm="brute"?
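A minimal sketch of the comparison asked for above (not the original poster's script; n_samples is reduced so it finishes quickly, while eps, min_samples, and the blob layout follow the reproduction code):

```python
# Compare DBSCAN wall-clock time for the default (tree-based) neighbor
# search against algorithm="brute", which is the setting that actually
# dispatches the neighbor queries across n_jobs workers.
import time

from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

centers = [[1, 1], [-1, -1], [1, -1]]
X, _ = make_blobs(n_samples=5000, centers=centers, cluster_std=0.4,
                  random_state=0)
X = StandardScaler().fit_transform(X)

for algorithm in ("auto", "brute"):
    start = time.perf_counter()
    db = DBSCAN(eps=0.3, min_samples=10, algorithm=algorithm, n_jobs=-1).fit(X)
    elapsed = time.perf_counter() - start
    # Noise points are labelled -1, so exclude them from the cluster count.
    n_clusters = len(set(db.labels_)) - (1 if -1 in db.labels_ else 0)
    print(f"algorithm={algorithm!r}: {n_clusters} clusters in {elapsed:.3f}s")
```

Both settings compute exact neighborhoods, so the cluster labels are identical; only the timing (and CPU utilization) differs.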

@jnothman (Member) commented Dec 8, 2016 via email

@jnothman (Member) commented Dec 8, 2016

@pfaucon, a clarification in the documentation is welcome. Please submit a pull request. Otherwise I'm closing this as something we can't do much about.

@jnothman closed this as completed Dec 8, 2016
@jnothman (Member) commented Dec 8, 2016

Actually, as you seem to be requesting documentation changes, I'll leave it open and you or someone else can contribute a fix.

@jnothman reopened this Dec 8, 2016
@jnothman added the Documentation, Easy, and Need Contributor labels Dec 8, 2016
@Don86 (Contributor) commented Dec 12, 2016

Hi, I'm new to scikit-learn, but I'd like to contribute to this.

@jnothman (Member) commented

Go ahead

@kushagraagrawal commented

Is this issue still open? I'm new to scikit-learn and would like to try.

@amueller (Member) commented

@kushagraagrawal no, PR at #8039

@sp7412 commented Jan 29, 2020

Just wondering if there's any update on this issue.
