utils.validation.check_array throws bad TypeError pandas series is passed in #12699
Comments
This should have been fixed in #12625, and I can confirm that the corresponding commit is part of the 0.20.1 tag. Could you please double-check that you are using 0.20.1? Also, please post the full traceback, which will help us investigate.
I think @greghoop is using 0.20.1, seems that it's not appropriate to use @greghoop Please provide self-contained example code, including imports and data (if possible), so that other contributors can just run it and reproduce your issue. Ideally your example code should be minimal. And please provide the output of
I cannot reproduce locally. And I think we have a test in Travis CI.
I've already downgraded to 0.20.0 so that won't help.
If I get a second I'll create an example, but I can't upload the current code. The problem occurs for me on both Mac and Ubuntu, in Python 3.7 with pandas 0.23.4.
Given the initial error message, I think this is indeed 0.20.1. The
that was the characteristics we were using in scikit-learn/sklearn/utils/validation.py Lines 477 to 481 in ccf0d92
to deal with Series objects. But I am now thinking of a case where the error can come up: with the custom pandas dtypes ...
@greghoop can it be the case that you were also using categorical data?
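To make the suspected failure mode concrete, here is a minimal sketch using hypothetical stand-in classes (they are not pandas or NumPy classes). It assumes that a NumPy dtype supports `len()` while a custom pandas dtype such as a categorical one may not, and that `.dtypes` on a Series is a single dtype object rather than a collection.

```python
# Hypothetical stand-ins, NOT pandas or NumPy classes. They only mimic the
# one behaviour that matters here: whether len() works on the dtype object.

class NumpyLikeDtype:
    """Mimics a dtype that defines __len__ (assumed to report 0 fields)."""

    def __len__(self):
        return 0


class CategoricalLikeDtype:
    """Mimics a dtype object that defines no __len__ at all."""


class SeriesLike:
    """Mimics a Series: `.dtypes` is a single dtype object, not a collection."""

    def __init__(self, dtype):
        self.dtypes = dtype


def line_480_check(array):
    # Same expression as the condition quoted from validation.py line 480.
    return bool(hasattr(array, "dtypes") and len(array.dtypes))


def describe(array):
    """Report the result of the check, or the TypeError it raises."""
    try:
        return f"check evaluated to {line_480_check(array)}"
    except TypeError as exc:
        return f"TypeError: {exc}"


print(describe(SeriesLike(NumpyLikeDtype())))
print(describe(SeriesLike(CategoricalLikeDtype())))
```

Under these assumptions the first call passes through the check quietly, while the second fails with the same kind of "has no len()" message as in the report.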
Downgrading to 0.20.0 won't help (downgrading to 0.19.X will)
Thanks, I think you only need to provide the data type of y (the one you passed into check_array).
Apart from fixing it here in sklearn, we should consider on the pandas side whether we want to fix this (pandas-dev/pandas#24033).
I agree that we should avoid using
@qinhanmin2014 Yes, see my example above: #12699 (comment) (at least, I assume that is what happened; @greghoop needs to confirm that, as you will never actually get
Yes, it was specifically for a categorical dtype series. Apologies for the confusion. FYI, this isn’t a problem in 0.20.0 (I don’t need to downgrade to 0.19.x to get this working). I don’t know if it affects other dtypes.
@jorisvandenbossche Thanks, I've seen your example above. I'm wondering what the type 'DTYPE NAME OF THE SERIES' is, since we need to reproduce the reporter's error. Any insights?
That's just something that @greghoop put there; he apologized for the confusion :-)
Ahh, I see the comment from the contributor, let's replace
@greghoop And thanks for the report!
@qinhanmin2014 I would maybe replace the Shall I do a PR, or are you already on it?
@jorisvandenbossche I'm not available now so please go ahead :)
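For readers following the discussion, one way to avoid calling `len()` on a single dtype object is sketched below. This is only an illustration of the idea, not necessarily the change that was proposed or merged; `dtypes_look_like_collection` and the stand-in classes are hypothetical names introduced here.

```python
# Hypothetical helper and stand-ins, for illustration only.

class DtypeWithoutLen:
    """Mimics a dtype object that defines no __len__."""


class SeriesLike:
    """Mimics a Series: `.dtypes` is one dtype object."""

    def __init__(self):
        self.dtypes = DtypeWithoutLen()


class FrameLike:
    """Mimics a DataFrame: `.dtypes` is a collection with one entry per column."""

    def __init__(self, n_columns):
        self.dtypes = [DtypeWithoutLen() for _ in range(n_columns)]


def dtypes_look_like_collection(array):
    """Return True only if `array.dtypes` exists, supports len(), and is non-empty."""
    dtypes = getattr(array, "dtypes", None)
    if dtypes is None or not hasattr(dtypes, "__len__"):
        return False
    return len(dtypes) > 0


print(dtypes_look_like_collection(SeriesLike()))   # single dtype: no len() call
print(dtypes_look_like_collection(FrameLike(3)))   # per-column dtypes
print(dtypes_look_like_collection(object()))       # no `dtypes` attribute at all
```

The guard checks for `__len__` before calling `len()`, so an object with a single dtype is simply treated as "not a collection" instead of raising.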
Description
validation.check_array throws a bad TypeError when a pandas Series is passed in. It cropped up when using the RandomizedSearchCV class. It is caused when line 480 is executed:
480 - if hasattr(array, "dtypes") and len(array.dtypes):
Steps/Code to Reproduce
validation.check_array(y, ensure_2d=False, dtype=None) where y is a pandas series
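A self-contained version of this call might look like the sketch below. The original Series is not included in the report, so the data here is an assumption: a small Series with a categorical dtype, which is the case confirmed in the comments. The script reports what happens rather than assuming a particular outcome, since the result depends on the installed scikit-learn version.

```python
def reproduce():
    """Call check_array on a categorical Series and report what happens."""
    try:
        import pandas as pd
        from sklearn.utils.validation import check_array
    except ImportError:
        return "skipped: pandas or scikit-learn is not installed"

    # Assumed data: the report does not include the original Series.
    y = pd.Series([0, 1, 0, 1], dtype="category")
    try:
        check_array(y, ensure_2d=False, dtype=None)
    except TypeError as exc:
        return f"TypeError: {exc}"
    return "no error"


print(reproduce())
```

Running it together with the installed scikit-learn and pandas versions printed alongside would give maintainers what they asked for in the comments.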
Expected Results
No error (I'm not familiar with this code so not sure on the details)
Actual Results
TypeError: object of type 'DTYPE NAME OF THE SERIES' has no len()
Versions
0.20.1