How to wrap a CuPy function inside a function decorated by tf.function #51642
@llodds,
As requested:

```python
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'          # suppress TF log messages
os.environ['TF_GPU_THREAD_MODE'] = 'gpu_private'  # GPU gets dedicated CPU threads

import tensorflow as tf

# grow GPU memory on demand, so we know exactly the GPU memory usage
gpus = tf.config.experimental.list_physical_devices('GPU')
for gpu in gpus:
    tf.config.experimental.set_memory_growth(gpu, True)

import cupy as cp

# zero-copy conversion functions between TensorFlow and CuPy via DLPack
def tf2cp(x):
    dlcapsule = tf.experimental.dlpack.to_dlpack(x)
    return cp.fromDlpack(dlcapsule)

def cp2tf(x):
    dlcapsule = x.toDlpack()
    return tf.experimental.dlpack.from_dlpack(dlcapsule)

# now, test how to use CuPy with distributed TF
import numpy as np
X = np.random.random_sample((100, 128, 128, 128))
Y = np.random.random_sample((100, 128, 128, 128))

dataset = tf.data.Dataset.from_tensor_slices((X, Y))
dataset = dataset.shuffle(50, reshuffle_each_iteration=True)
dataset = dataset.batch(10, drop_remainder=True).prefetch(tf.data.AUTOTUNE)

strategy = tf.distribute.MirroredStrategy()
dataset_dist = strategy.experimental_distribute_dataset(dataset)

def simple_cupy_op(X, Y):
    X_cp = tf2cp(X)
    Y_cp = tf2cp(Y)
    Y_cp = X_cp + Y_cp
    X = cp2tf(X_cp)
    Y = cp2tf(Y_cp)
    return X, Y  # return the converted tensors

@tf.function
def simple_cupy_op_dist(X, Y):
    strategy.run(simple_cupy_op, args=(X, Y))

for X, Y in dataset_dist:
    simple_cupy_op_dist(X, Y)
```

Error message:
Thanks!
@Saduf2019,
CuPy, like NumPy, depends on a Python runtime to run. When a tf.function executes, there is no such Python runtime: all ops are compiled into a graph that is executed inside the TF executor, which is isolated from Python. To mix such Python-reliant ops into the TF graph, you can use py_function:
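A minimal sketch of that approach, assuming the tensors live on a GPU (the helper name `cupy_add` and the `tf.function` wrapper below are illustrative, not from the original comment):

```python
import tensorflow as tf
import cupy as cp

def cupy_add(x, y):
    # this body runs eagerly, so the DLPack round-trip is legal here
    x_cp = cp.fromDlpack(tf.experimental.dlpack.to_dlpack(x))
    y_cp = cp.fromDlpack(tf.experimental.dlpack.to_dlpack(y))
    out_cp = x_cp + y_cp
    return tf.experimental.dlpack.from_dlpack(out_cp.toDlpack())

@tf.function
def graph_fn(x, y):
    # tf.py_function embeds an eager Python callback as an op in the graph;
    # static shape information is lost, so restore it when known
    out = tf.py_function(cupy_add, inp=[x, y], Tout=x.dtype)
    out.set_shape(x.shape)
    return out
```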
cc @yuefengz. Note, however, that py_function does not always work well with tf.distribute.
@mdanatg Looks like I have to build a C++ wrapper for CuPy and then convert it to TF ops, is that right? Can I write a TF op directly from Python? tf.py_function indeed doesn't work well with tf.distribute.
@mdanatg Is there a way/function to extract the tensors from a distributed replica and then gather them back into a distributed replica? I am considering an alternative: using the multiprocessing module to apply the CuPy op on the tensors directly.
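For the extraction direction, one hedged sketch, reusing `strategy`, `dataset_dist`, `tf2cp`, and `cp2tf` from the snippet above: `strategy.experimental_local_results` unpacks a PerReplica value into one eager tensor per device, and each of those supports the DLPack round-trip directly.

```python
# sketch: unpack per-replica tensors and apply the CuPy op eagerly, per device
for X, Y in dataset_dist:
    x_locals = strategy.experimental_local_results(X)
    y_locals = strategy.experimental_local_results(Y)
    for x, y in zip(x_locals, y_locals):
        # each local tensor is an ordinary eager tensor pinned to one GPU
        z = cp2tf(tf2cp(x) + tf2cp(y))
```

Going the other way, repackaging the processed tensors back into a distributed value, is the part without an obvious public API, which is what the question above is asking.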
@yuefengz would know more about the last question. For building a wrapper over CuPy, that might be tricky, though it might work. Replacing all …
@mdanatg @yuefengz I am using CuPy to write a customized 3D augmentation layer for device tensors; TF currently doesn't provide 3D random-transformation APIs. This CuPy-based layer currently works in eager mode with strategy.run(), but nsys profiles show poor overlap of the augmentation operations across multiple GPUs. So I am thinking about either wrapping it as TF ops so it can work in graph mode, or doing the augmentation outside of TF (extracting tensors from replicas and wrapping the CuPy op inside multiprocessing.Process()). Right now I don't know the right TF API to extract tensors from, or gather them back into, a distributed replica.
I have a CuPy function that tweaks a TF tensor as follows:
TF tensor => dlpack => CuPy device array => apply some CuPy functions => CuPy device array => dlpack => TF tensor.
The code works in eager mode, but once I decorate the function with tf.function, it fails.
I believe the general question is: how do you wrap a Python function that takes TF tensors and returns TF tensors so that it works in graph mode? Thanks.
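As a minimal illustration of that round-trip in eager mode (the shape and the doubling op are arbitrary placeholders):

```python
import tensorflow as tf
import cupy as cp

with tf.device('/GPU:0'):
    t = tf.random.uniform((4, 4))

# eager mode: zero-copy TF -> CuPy -> TF round-trip via DLPack
t_cp = cp.fromDlpack(tf.experimental.dlpack.to_dlpack(t))
t_cp = t_cp * 2.0  # any CuPy function in between
t_back = tf.experimental.dlpack.from_dlpack(t_cp.toDlpack())

# wrapping these three lines in a @tf.function fails, per the report above:
# to_dlpack needs a concrete eager tensor, not a symbolic graph tensor
```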