In Gaussian mixtures, when n_init > 1, the lower_bound_ is not always the max #10869

ageron · 2018-03-25T13:50:23Z

Description

In Gaussian mixtures, when n_init is set to any value greater than 1, the lower_bound_ is not the max lower bound across all initializations, but just the lower bound of the last initialization.

The bug can be fixed by adding the following line just before return self in BaseMixture.fit():

self.lower_bound_ = max_lower_bound
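
To illustrate the intent of that one-line fix, here is a minimal sketch of best-of-n_init selection. It is not scikit-learn's actual code: `fit_best_of_n` and `run_one_init` are hypothetical names, and the toy bounds are made up for the example. Each run's bound is kept in a local variable, and the value reported at the end is the one that belongs to the parameters actually kept.

```python
import math


def fit_best_of_n(run_one_init, n_init):
    """Run `run_one_init` n_init times and keep the best result.

    `run_one_init(i)` is a hypothetical callable that returns
    (params, lower_bound) for the i-th initialization.
    """
    max_lower_bound = -math.inf
    best_params = None
    for init in range(n_init):
        # Keep the per-run bound local instead of storing it on the model.
        params, lower_bound = run_one_init(init)
        if lower_bound > max_lower_bound:
            max_lower_bound = lower_bound
            best_params = params
    # Report the bound that matches the parameters being returned,
    # not the bound of whichever initialization happened to run last.
    return best_params, max_lower_bound


# Toy data (assumed): the last initialization is not the best one.
toy_bounds = [-3.2, -1.5, -2.7]
best_params, best_bound = fit_best_of_n(lambda i: (i, toy_bounds[i]), n_init=3)
```

With the buggy behaviour described above, the reported bound would be that of the last run (-2.7 in this toy data) even though the retained parameters come from the second run.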

The test that should have caught this bug is test_init() in mixture/tests/test_gaussian_mixture.py, but it only checks a single random state, so it had a 50% chance of missing the issue. It should be updated to try many random states.

Steps/Code to Reproduce

import numpy as np
from sklearn.mixture import GaussianMixture

X = np.random.rand(1000, 10)
for random_state in range(100):
    gm1 = GaussianMixture(n_components=2, n_init=1, random_state=random_state).fit(X)
    gm2 = GaussianMixture(n_components=2, n_init=10, random_state=random_state).fit(X)
    assert gm2.lower_bound_ >= gm1.lower_bound_, random_state

Expected Results

No error.

Actual Results

Traceback (most recent call last):
  File "<stdin>", line 4, in <module>
AssertionError: 4

Versions

>>> import platform; print(platform.platform())
Darwin-17.4.0-x86_64-i386-64bit
>>> import sys; print("Python", sys.version)
Python 3.6.4 (default, Dec 21 2017, 20:33:21)
[GCC 4.2.1 Compatible Apple LLVM 9.0.0 (clang-900.0.38)]
>>> import numpy; print("NumPy", numpy.__version__)
NumPy 1.14.2
>>> import scipy; print("SciPy", scipy.__version__)
SciPy 1.0.0
>>> import sklearn; print("Scikit-Learn", sklearn.__version__)
Scikit-Learn 0.19.1

jnothman · 2018-03-26T06:46:01Z

Thanks for the report and the analysis!

…ls when n_init > 1 (#10870)

* Set lower_bound_ to max lower bound at the end of BaseMixture.fit(), fixes #10869
* Use a local lower_bound variable rather than self.lower_bound_ during training, in BaseMixture.fit()
* Remove extra empty line
* Update documentation and reduce test_init() iterations from 100 to 25
* Update whats_new/v0.20.rst to add mention of issue 10869, and reformat file to fit on 80 columns
* Remove extra line in whats_new/v0.20.rst
* Add tests for convergence detection in Gaussian mixtures when warm_start=True
* Remove unnecessary catch_exception blocks
* Fix tests since n_iter_ was recently fixed to be increased by 1
* Revert changes unrelated to PR 10870 in doc/whats_new/v0.20.rst
* Replace assert_* with plain asserts because of the move to pytest
* Remove comment in whats_new/v0.20.rst that will be added upon merging
* Replace single backticks with double backticks in doc string
* Limit is 79 characters per line, not 80.
* Remove the false convergence fix to treat it in a separate PR

ageron added a commit to ageron/scikit-learn that referenced this issue Mar 25, 2018

Set lower_bound_ to max lower bound at the end of BaseMixture.fit(), f…

1b9f54e

…ixes scikit-learn#10869

ageron mentioned this issue Mar 25, 2018

[MRG+1] Fix lower_bound_ not equal to max lower bound in mixture models when n_init > 1 #10870

Merged

jnothman added the Bug label Mar 26, 2018

GaelVaroquaux closed this as completed in #10870 Jul 16, 2018
