Feature Request: tf.multi_one_hot that is one-hot encoding multiple columns of a Tensor #16044

mkurovski · 2018-01-11T16:12:08Z

Hi there,

I just wrote a function that creates multiple one-hot encodings for a tensor and concatenates them. I was wondering whether this might be useful to others, in which case I would like to contribute this feature.

import tensorflow as tf

def multiple_one_hot(cat_tensor, depth_list):
    """Creates one-hot encodings for multiple categorical attributes and
    concatenates the resulting encodings.

    Args:
        cat_tensor (tf.Tensor): integer tensor of shape [batch, n_cols] with
            multiple columns, one per categorical feature
        depth_list (list): number of values (depth) of each categorical
            feature, one entry per column of cat_tensor

    Returns:
        one_hot_enc_tensor (tf.Tensor): concatenated one-hot encodings of
            cat_tensor, of shape [batch, sum(depth_list)]
    """
    # One-hot encode the first column, then append each further column's encoding
    one_hot_enc_tensor = tf.one_hot(cat_tensor[:, 0], depth_list[0], axis=1)
    for col in range(1, len(depth_list)):
        add = tf.one_hot(cat_tensor[:, col], depth_list[col], axis=1)
        one_hot_enc_tensor = tf.concat([one_hot_enc_tensor, add], axis=1)

    return one_hot_enc_tensor
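A slightly more compact variant, sketched here as an assumption rather than the code that went into the PR, builds every per-column encoding first and concatenates once, which avoids creating a growing intermediate tensor on each loop iteration. The name `multiple_one_hot_concat_once` and the toy input are hypothetical; the check confirms both versions produce the same result.

```python
import tensorflow as tf

def multiple_one_hot(cat_tensor, depth_list):
    # Loop version as posted above (using cat_tensor throughout)
    one_hot_enc_tensor = tf.one_hot(cat_tensor[:, 0], depth_list[0], axis=1)
    for col in range(1, len(depth_list)):
        add = tf.one_hot(cat_tensor[:, col], depth_list[col], axis=1)
        one_hot_enc_tensor = tf.concat([one_hot_enc_tensor, add], axis=1)
    return one_hot_enc_tensor

def multiple_one_hot_concat_once(cat_tensor, depth_list):
    # Hypothetical variant: encode each column, then concatenate a single time
    encodings = [tf.one_hot(cat_tensor[:, col], depth, axis=1)
                 for col, depth in enumerate(depth_list)]
    return tf.concat(encodings, axis=1)

# Two rows, three categorical columns with depths 3, 2 and 4
cat_tensor = tf.constant([[0, 1, 2],
                          [2, 0, 3]])
depth_list = [3, 2, 4]

loop_result = multiple_one_hot(cat_tensor, depth_list)
once_result = multiple_one_hot_concat_once(cat_tensor, depth_list)
print(once_result.numpy())
```

Both return a tensor of shape [2, 9], i.e. [batch, sum(depth_list)].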

I'd be happy to get your feedback. Let me know if you think others might benefit, and I would enjoy creating a pull request ;)


cy89 · 2018-01-12T17:37:04Z

@squall-1002 thanks for volunteering! I will mark "contributions welcome".

divyanshj16 · 2018-01-22T16:21:51Z

@squall-1002 If you are not working on it, can I make a PR?

mkurovski · 2018-01-22T23:25:27Z

@divyanshj16 Thanks, I just pushed it and created the PR, but feel free to comment ;)

wileeam · 2018-02-16T14:47:36Z

@squall-1002 Could you give a practical example of this method?
I checked the tests, but I can't think of a use case where I would benefit from it, hence turning to you for more details (maybe you also want to add the example to the method's documentation for others like me :)

mkurovski · 2018-02-17T12:16:36Z

@wileeam Of course: Recommender Systems ;)
In RecSys I am dealing with lots of features of varying modalities, among them many categorical features. Assume we work on vehicle recommendations for an online marketplace: there are colors, fuel types, maker IDs, model IDs, etc., and we treat those features by one-hot encoding them. But while colors or fuel types may have tens of possible values, there may be thousands of model IDs, which is why I can't include such a case as a test; it would be far too large.
In this case, and I assume in others where people deal with multiple categorical features, one could benefit from a higher-level abstraction over tf.one_hot, which is why I propose tf.multi_one_hot.
Hope that helps you to understand the practical use of it.
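To make the vehicle use case concrete, here is a toy sketch with three hypothetical features (color, fuel type, maker ID) and made-up depths; a real model-ID vocabulary would be far larger. It assumes each feature is already mapped to integer indices, and it also shows how each feature's block of columns in the concatenated output can be located from a running sum of `depth_list`.

```python
import tensorflow as tf

def multiple_one_hot(cat_tensor, depth_list):
    # One-hot encode each column and concatenate along the feature axis
    return tf.concat([tf.one_hot(cat_tensor[:, col], depth, axis=1)
                      for col, depth in enumerate(depth_list)], axis=1)

# Hypothetical features and vocabulary sizes
feature_names = ["color", "fuel_type", "maker_id"]
depth_list = [4, 3, 5]  # e.g. 4 colors, 3 fuel types, 5 makers

# Each row is one vehicle: [color_idx, fuel_type_idx, maker_idx]
vehicles = tf.constant([[3, 0, 1],
                        [1, 2, 4]])

encoded = multiple_one_hot(vehicles, depth_list)  # shape [2, 12]

# Column offsets of each feature's block: [0, 4, 7, 12]
offsets = [0]
for depth in depth_list:
    offsets.append(offsets[-1] + depth)

for name, start, end in zip(feature_names, offsets[:-1], offsets[1:]):
    block = encoded[:, start:end]
    # argmax over a feature's block recovers that feature's original index
    print(name, tf.argmax(block, axis=1).numpy())
```

Each row of the encoding contains exactly one 1 per feature, so every row sums to the number of features (3 here).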

lenjoy · 2018-04-07T05:12:48Z

Hi @squall-1002, in a recommender system we need to do some data processing anyway to merge the ID features (such as maker IDs) into a list.

Does the code below work as you expected?

import tensorflow as tf

# some sparse features, already mapped from raw maker_id to index
maker_id_list = [1, 3, 9, 14, 2]
one_hot_enc = tf.one_hot(indices=maker_id_list, depth=16)

with tf.Session() as sess:
    # summing the one-hot rows gives one multi-hot vector over all 16 IDs
    feature_list = tf.reduce_sum(one_hot_enc, axis=0)
    print(sess.run(feature_list))

The output is

[0. 1. 1. 1. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 1. 0.]
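One detail worth checking with this approach: `tf.reduce_sum` counts occurrences, so an ID that appears twice yields 2 rather than 1. If a strict 0/1 multi-hot vector is wanted, `tf.reduce_max` or clipping the sum keeps the entries binary. A minimal TF2 (eager) sketch with a made-up duplicate ID:

```python
import tensorflow as tf

# Hypothetical list with a repeated maker ID (9 appears twice)
maker_id_list = [1, 3, 9, 9, 2]
one_hot_enc = tf.one_hot(indices=maker_id_list, depth=16)

counts = tf.reduce_sum(one_hot_enc, axis=0)     # index 9 -> 2.0 (occurrence count)
multi_hot = tf.reduce_max(one_hot_enc, axis=0)  # index 9 -> 1.0 (presence only)
clipped = tf.minimum(counts, 1.0)               # equivalent to multi_hot here

print(counts.numpy())
print(multi_hot.numpy())
```

Whether counts or presence flags are preferable depends on the model; for multi-label targets the binary version is usually what is meant.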

martinwicke · 2018-05-01T15:58:02Z

(API review)

I just closed #16300, which contained an implementation. We believe this function is too specialized and too hard to understand and use effectively to be of general use. I'll close this feature request.

Sorry about the delay in making this decision.

NicoCoallier · 2018-10-09T21:48:04Z

Amazing function! Very useful for ML models! However, it seems like a part is off: the body uses cat_int_tensor while the parameter is named cat_tensor.

KacperKubara · 2020-07-10T12:42:54Z

For people who would like to use the tf.one_hot() function for a multi-label classification problem (e.g. multi-label text classification):

Modified version of the code made by @lenjoy :

import tensorflow as tf

indices = tf.ragged.constant([[1, 2], [1], [3, 2]])

one_hot = tf.one_hot(indices, depth=4)
one_hot_multi = tf.reduce_max(one_hot, axis=1)

The output in this case is:

<tf.Tensor: shape=(3, 4), dtype=float32, numpy=
array([[0., 1., 1., 0.],
       [0., 1., 0., 0.],
       [0., 0., 1., 1.]], dtype=float32)>
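If ragged tensors are inconvenient, a dense alternative is to pad each label list to a common length with -1: `tf.one_hot` returns an all-zero vector for indices outside `[0, depth)`, so the padding drops out after `tf.reduce_max`. This sketch reuses the labels above; the padding value -1 is a choice, not a requirement.

```python
import tensorflow as tf

# Same labels as above, padded with -1 to a common length of 2
padded = tf.constant([[1, 2],
                      [1, -1],
                      [3, 2]])

# The same padding can be produced from the ragged version
padded_from_ragged = tf.ragged.constant([[1, 2], [1], [3, 2]]).to_tensor(default_value=-1)

one_hot = tf.one_hot(padded, depth=4)           # shape [3, 2, 4]; the -1 slot is all zeros
one_hot_multi = tf.reduce_max(one_hot, axis=1)  # shape [3, 4]
print(one_hot_multi.numpy())
```

This yields the same (3, 4) multi-hot matrix as the ragged version.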

@martinwicke, I don't particularly agree that the function is too specialized. Multi-label classification problems are quite common nowadays and tf.one_hot() is currently not sufficient to make it work. Maybe instead of creating another function, tf.one_hot() could be modified to provide this functionality?

rsnk96 · 2020-07-21T07:18:23Z

@martinwicke, I don't particularly agree that the function is too specialized. Multi-label classification problems are quite common nowadays and tf.one_hot() is currently not sufficient to make it work. Maybe instead of creating another function, tf.one_hot() could be modified to provide this functionality?

I agree with this. The use of multi-hot labels isn't limited to recommender systems; it also extends to multi-output networks and multi-task learning.

mhorlacher · 2021-02-15T13:37:10Z

Any updates on this issue? What's the currently recommended way to achieve multi-hot encoding for a set of string labels in TF-2.4?

Yannik1337 · 2021-06-29T13:39:34Z

@mhorlacher you can use the following TF2.x snippet (modified from above) to turn integers into a multi-hot vector:

import tensorflow as tf

maker_id_list = [1, 3, 9, 14, 2]
one_hot_enc = tf.one_hot(indices=maker_id_list, depth=16)

# Sum the one-hot rows into a single multi-hot vector of length 16
feature_list = tf.reduce_sum(one_hot_enc, axis=0)

To use this, you have to cast your string labels to integers. I hope this can help you (or others) as a starting point.

Edit:
Use this to cast a list of string labels to a list of integer labels (this assumes the labels are numeric strings):

list(map(int, ['1', '5', '9']))
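Casting with `int` only works when the labels are numeric strings. For arbitrary string labels, one option is a vocabulary lookup table that maps each string to an index before the one-hot step. A minimal sketch, assuming a fixed, known vocabulary (the labels below are made up); strings not in the vocabulary map to -1 and therefore contribute an all-zero row:

```python
import tensorflow as tf

# Hypothetical vocabulary of string labels; position in the list is the index
vocab = ["red", "green", "blue", "black"]
table = tf.lookup.StaticHashTable(
    tf.lookup.KeyValueTensorInitializer(
        keys=tf.constant(vocab),
        values=tf.range(len(vocab), dtype=tf.int64)),
    default_value=-1)  # unknown strings -> -1 -> all-zero one-hot row

# Each example carries a variable-length list of string labels
labels = tf.ragged.constant([["red", "blue"], ["black"], ["green", "purple"]])

# Map strings to indices, pad with -1, then one-hot and reduce to multi-hot
indices = tf.ragged.map_flat_values(table.lookup, labels).to_tensor(default_value=-1)
multi_hot = tf.reduce_max(tf.one_hot(indices, depth=len(vocab)), axis=1)
print(multi_hot.numpy())
```

Here "purple" is out of vocabulary, so the third row only marks "green".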

mhorlacher · 2022-02-28T20:23:30Z

Hi @Yannik1337, sorry for the late response. Yes, this is the option I went with in the end. Thanks!

cy89 added the type:feature Feature requests label Jan 11, 2018

cy89 added the stat:contribution welcome Status - Contributions welcome label Jan 12, 2018

mkurovski mentioned this issue Jan 22, 2018

Add tf.multi_one_hot that one-hot encodes multiple columns of Tensor #16300

Closed

martinwicke removed the stat:contribution welcome Status - Contributions welcome label May 1, 2018

martinwicke closed this as completed May 1, 2018
