Feature Request: tf.multi_one_hot that is one-hot encoding multiple columns of a Tensor #16044

Closed
mkurovski opened this issue Jan 11, 2018 · 13 comments
Labels
type:feature Feature requests

Comments

@mkurovski
mkurovski commented Jan 11, 2018

Hi there,

I just wrote a function that creates one-hot encodings for several columns of a tensor and concatenates them. I am curious whether it might be useful to others, in which case I would like to contribute it as a feature.

import tensorflow as tf

def multiple_one_hot(cat_tensor, depth_list):
    """Creates one-hot encodings for multiple categorical attributes and
    concatenates the resulting encodings.

    Args:
        cat_tensor (tf.Tensor): integer tensor of shape (batch, num_columns) with one categorical feature per column
        depth_list (list): number of possible values (depth) for each categorical feature

    Returns:
        one_hot_enc_tensor (tf.Tensor): concatenated one-hot encodings of cat_tensor
    """
    # Encode the first column, then append the encoding of each further column
    one_hot_enc_tensor = tf.one_hot(cat_tensor[:,0], depth_list[0], axis=1)
    for col in range(1, len(depth_list)):
        add = tf.one_hot(cat_tensor[:,col], depth_list[col], axis=1)
        one_hot_enc_tensor = tf.concat([one_hot_enc_tensor, add], axis=1)

    return one_hot_enc_tensor

I would be happy to get your feedback. Tell me if you think others might benefit, and I would gladly create a pull request ;)
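
To make the intended result concrete, here is a minimal plain-Python sketch of what the function above is meant to compute for a small input. It is only a reference for checking expected output, not the proposed TensorFlow implementation; the helper name `multiple_one_hot_reference` and the sample data (a color column with 3 values and a fuel-type column with 2 values) are assumptions made up for illustration.

```python
def multiple_one_hot_reference(rows, depth_list):
    """Plain-Python reference: one-hot encode each column, then concatenate."""
    encoded = []
    for row in rows:
        out = []
        for value, depth in zip(row, depth_list):
            one_hot = [0.0] * depth
            # Like tf.one_hot, an out-of-range index yields an all-zero block.
            if 0 <= value < depth:
                one_hot[value] = 1.0
            out.extend(one_hot)
        encoded.append(out)
    return encoded


# Hypothetical data: column 0 = color (3 values), column 1 = fuel type (2 values)
rows = [[0, 1], [2, 0]]
result = multiple_one_hot_reference(rows, depth_list=[3, 2])
print(result)
# [[1.0, 0.0, 0.0, 0.0, 1.0], [0.0, 0.0, 1.0, 1.0, 0.0]]
```

Each output row has length sum(depth_list), here 3 + 2 = 5, with exactly one 1.0 per encoded column.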

@cy89 cy89 added the type:feature Feature requests label Jan 11, 2018
@cy89
cy89 commented Jan 12, 2018

@squall-1002 thanks for volunteering! I will mark "contributions welcome".

@cy89 cy89 added the stat:contribution welcome Status - Contributions welcome label Jan 12, 2018
@divyanshj16

@squall-1002 If you are not working on it, can I make a PR?

@mkurovski
Author

@divyanshj16 Thanks, I just pushed it and created the PR, but feel free to comment ;)

@wileeam
wileeam commented Feb 16, 2018

@squall-1002 Could you give a practical example of this method?
I checked the tests, but I am not sure of a use case where I would benefit from this, so I am coming back to you for more details (maybe you also want to add the example to the method's documentation for others like me :)

@mkurovski
Author
mkurovski commented Feb 17, 2018

@wileeam Of course: Recommender Systems ;)
In RecSys I deal with many features of varying modalities, among them many categorical features. Assume we work on vehicle recommendations for an online market: I have colors, fuel types, maker IDs, model IDs, etc. We treat those features by one-hot encoding them. But whereas colors or fuel types may have tens of possible values, there may be thousands of model IDs, which is why I can't include this as a test: it would be far too large.
In this case, and I assume in others where people deal with multiple categorical features, one could benefit from a higher-level abstraction over tf.one_hot, which is why I propose tf.multi_one_hot.
I hope that helps you understand its practical use.

@lenjoy
lenjoy commented Apr 7, 2018

Hi @squall-1002, in a recommender system we need some data processing anyway to merge the ID features (such as maker IDs) into a list.

Do you think the code below works the way you expect?

import tensorflow as tf

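# Some sparse features, already mapped from raw maker_id to index
maker_id_list = [1, 3, 9, 14, 2]
one_hot_enc = tf.one_hot(indices=maker_id_list, depth=16)
# Sum over the first axis to merge the one-hot rows into one multi-hot vector
feature_list = tf.reduce_sum(one_hot_enc, axis=0)

# TF 1.x graph mode: evaluate the tensor in a session
with tf.Session() as sess:
    print(sess.run(feature_list))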

The output is

[0. 1. 1. 1. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 1. 0.]

@martinwicke martinwicke removed the stat:contribution welcome Status - Contributions welcome label May 1, 2018
@martinwicke
Member

(API review)

I just closed #16300, which contained an implementation. We believe this function is too specialized and too hard to understand and use effectively to be of general use. I'll close this feature request.

Sorry about the delay in making this decision.

@NicoCoallier
NicoCoallier commented Oct 9, 2018

Amazing function! Very useful for ML models! However, the original snippet seems to mix up cat_int_tensor and cat_tensor.

@KacperKubara

For people who would like to use tf.one_hot() function for a multi-label classification problem (e.g. multi-label text classification):

Modified version of the code made by @lenjoy :

import tensorflow as tf

indices = tf.ragged.constant([[1, 2], [1], [3, 2]])

one_hot = tf.one_hot(indices, depth=4)
one_hot_multi = tf.reduce_max(one_hot, axis=1)

The output in this case is:

<tf.Tensor: shape=(3, 4), dtype=float32, numpy=
array([[0., 1., 1., 0.],
       [0., 1., 0., 0.],
       [0., 0., 1., 1.]], dtype=float32)>

@martinwicke, I don't particularly agree that the function is too specialized. Multi-label classification problems are quite common nowadays and tf.one_hot() is currently not sufficient to make it work. Maybe instead of creating another function, tf.one_hot() could be modified to provide this functionality?
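
One difference between the two snippets in this thread is worth spelling out: reducing with a sum counts a repeated label twice, while reducing with a max keeps the vector binary. The following plain-Python sketch illustrates this without TensorFlow; the helper names `one_hot` and `multi_hot` and the sample labels are hypothetical and exist only for this illustration.

```python
def one_hot(index, depth):
    """One-hot vector of length `depth` with a 1.0 at position `index`."""
    return [1.0 if i == index else 0.0 for i in range(depth)]


def multi_hot(indices, depth, reduce="max"):
    """Combine the one-hot rows of `indices` column-wise with sum or max."""
    rows = [one_hot(i, depth) for i in indices]
    if not rows:
        # No labels at all: return an all-zero vector.
        return [0.0] * depth
    reducer = max if reduce == "max" else sum
    return [reducer(column) for column in zip(*rows)]


labels = [1, 2, 2]  # label 2 appears twice
summed = multi_hot(labels, depth=4, reduce="sum")
maxed = multi_hot(labels, depth=4, reduce="max")
print(summed)  # [0.0, 1.0, 2.0, 0.0]
print(maxed)   # [0.0, 1.0, 1.0, 0.0]
```

If the label lists are guaranteed to contain no duplicates, both reductions give the same result; otherwise the max variant is the one that yields a strict 0/1 target.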

@rsnk96
Contributor
rsnk96 commented Jul 21, 2020

> @martinwicke, I don't particularly agree that the function is too specialized. Multi-label classification problems are quite common nowadays and tf.one_hot() is currently not sufficient to make it work. Maybe instead of creating another function, tf.one_hot() could be modified to provide this functionality?

I agree with this. The usage of multi-hot labels isn't limited to just recommender systems, but also multi-output networks and multi-task learning.

@mhorlacher

Any updates on this issue? What's the current recommended way to achieve multi_hot encoding for a set of string labels in TF-2.4?

@Yannik1337
Yannik1337 commented Jun 29, 2021

@mhorlacher you can use the following TF2.x snippet (modified from above) to turn integers into a multi-hot vector:

import tensorflow as tf

maker_id_list = [1, 3, 9, 14, 2]
one_hot_enc = tf.one_hot(indices=maker_id_list, depth=16)

# Sum the one-hot rows to get a single multi-hot vector of length 16
feature_list = tf.reduce_sum(one_hot_enc, axis=0)

To use this, you have to cast your string labels to integers. I hope this can help you (or others) as a starting point.

Edit:
Use this to cast a list of numeric string labels (such as '1', '5', '9') to a list of integer labels:

list(map(int, ['1', '5', '9']))
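
The `int` cast above only works when the labels are numeric strings. For arbitrary string labels, one option is to build a vocabulary that maps each distinct label to an integer index first. Below is a minimal plain-Python sketch of that step; the helper names `build_vocab` and `to_indices` and the sample labels are assumptions for illustration, not part of TensorFlow.

```python
def build_vocab(label_lists):
    """Assign each distinct label an index in order of first appearance."""
    vocab = {}
    for labels in label_lists:
        for label in labels:
            vocab.setdefault(label, len(vocab))
    return vocab


def to_indices(labels, vocab):
    """Map string labels to their integer indices (KeyError on unknown labels)."""
    return [vocab[label] for label in labels]


# Hypothetical multi-label samples
samples = [["diesel", "red"], ["petrol"], ["red", "electric"]]
vocab = build_vocab(samples)
index_lists = [to_indices(sample, vocab) for sample in samples]
print(vocab)        # {'diesel': 0, 'red': 1, 'petrol': 2, 'electric': 3}
print(index_lists)  # [[0, 1], [2], [1, 3]]
```

The resulting integer lists can then be passed to tf.one_hot with depth=len(vocab) and reduced as in the snippets earlier in this thread. Newer TensorFlow releases also provide a tf.keras.layers.StringLookup preprocessing layer for this kind of mapping; check the documentation for your version.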

@mhorlacher

Hi @Yannik1337, sorry for the late response. Yes, this is the option I went with in the end. Thanks!
