
How to wrap a CuPy function inside a function decorated by tf.function #51642

Open
llodds opened this issue Aug 23, 2021 · 8 comments
Assignees
Labels
comp:tf.function tf.function related issues stat:awaiting tensorflower Status - Awaiting response from tensorflower type:support Support issues

Comments

llodds commented Aug 23, 2021

I have a CuPy function that modifies a TF tensor as follows:
TF tensor => dlpack => CuPy device array => apply some CuPy functions => CuPy device array => dlpack => TF tensor.

The code will work in eager mode, but once I decorate the function with tf.function, it won't work.

dlcapsule = tf.experimental.dlpack.to_dlpack(x)
InvalidArgumentError: The argument to `to_dlpack` must be a TF tensor, not Python object

I believe the general question is: how do you wrap a Python function that takes TF tensors and returns TF tensors in graph mode? Thanks.

@llodds llodds added the type:others issues not falling in bug, performance, support, build and install or feature label Aug 23, 2021
tilakrayal (Contributor) commented

@llodds,
To expedite troubleshooting, could you please provide the complete code and the TensorFlow version you are using? Thanks!

@tilakrayal tilakrayal added stat:awaiting response Status - Awaiting response from author comp:apis Highlevel API related issues labels Aug 24, 2021
llodds (Author) commented Aug 24, 2021

@tilakrayal

As requested:

import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3' # suppress tf messages
os.environ['TF_GPU_THREAD_MODE'] = 'gpu_private' # GPU has dedicated CPU threads

import tensorflow as tf
# so we know exactly the GPU memory usage
gpus = tf.config.experimental.list_physical_devices('GPU')
for gpu in gpus:
    tf.config.experimental.set_memory_growth(gpu, True)
    
import cupy as cp
# conversion functions between tensorflow and cupy
def tf2cp(x):
    dlcapsule = tf.experimental.dlpack.to_dlpack(x)
    return cp.fromDlpack(dlcapsule)
def cp2tf(x):
    dlcapsule = x.toDlpack()
    return tf.experimental.dlpack.from_dlpack(dlcapsule)

# now, test how to use cupy with distributed TF
import numpy as np
X = np.random.random_sample((100, 128, 128, 128))
Y = np.random.random_sample((100, 128, 128, 128))
dataset = tf.data.Dataset.from_tensor_slices((X, Y))
dataset = dataset.shuffle(50, reshuffle_each_iteration=True)
dataset = dataset.batch(10, drop_remainder=True).prefetch(tf.data.AUTOTUNE)
strategy = tf.distribute.MirroredStrategy()
dataset_dist = strategy.experimental_distribute_dataset(dataset)


def simple_cupy_op(X, Y):
    X_cp = tf2cp(X)
    Y_cp = tf2cp(Y)
    Y_cp = X_cp + Y_cp
    X = cp2tf(X_cp)
    Y = cp2tf(Y_cp)
    
@tf.function
def simple_cupy_op_dist(X, Y):
    strategy.run(simple_cupy_op, args = (X, Y))
    
for X, Y in dataset_dist:
    simple_cupy_op_dist(X, Y)

Error message:

    ./test_cupy_with_dist_TF.py:41 simple_cupy_op_dist  *
        strategy.run(simple_cupy_op, args = (X, Y))
    ./test_cupy_with_dist_TF.py:33 simple_cupy_op  *
        X_cp = tf2cp(X)
    ./test_cupy_with_dist_TF.py:15 tf2cp  *
        dlcapsule = tf.experimental.dlpack.to_dlpack(x)
    /hpc/apps/pyhpc/dist/conda/x86_64/envs/cuda-11.0/lib/python3.8/site-packages/tensorflow/python/dlpack/dlpack.py:45 to_dlpack  **
        return pywrap_tfe.TFE_ToDlpackCapsule(tf_tensor)

    InvalidArgumentError: The argument to `to_dlpack` must be a TF tensor, not Python object

Thanks!

@tensorflowbutler tensorflowbutler removed the stat:awaiting response Status - Awaiting response from author label Aug 26, 2021
@tilakrayal tilakrayal added the comp:tf.function tf.function related issues label Aug 27, 2021
tilakrayal (Contributor) commented

@Saduf2019,
I was able to reproduce the issue in TF v2.5, v2.6, and nightly. Please find the gist of it here.

@tilakrayal tilakrayal assigned Saduf2019 and unassigned Saduf2019 and tilakrayal Aug 27, 2021
@Saduf2019 Saduf2019 assigned ymodak and unassigned Saduf2019 Sep 6, 2021
@ymodak ymodak added type:support Support issues stat:awaiting tensorflower Status - Awaiting response from tensorflower and removed comp:apis Highlevel API related issues type:others issues not falling in bug, performance, support, build and install or feature labels Sep 18, 2021
mdanatg commented Sep 20, 2021

CuPy, like NumPy, depends on a Python runtime to run. When executing a tf.function, there is no such Python runtime - all ops are compiled into a graph that executes inside the TF executor, which is isolated from Python.

To mix such Python-reliant ops into the TF graph, you can use py_function:

@tf.function
def simple_cupy_op(X, Y):
    tf.py_function(simple_cupy_op, ...)

cc @yuefengz

Note however that py_function code is not portable, and I don't think it will work well with tf.distribute. For use with tf.function / tf.distribute, my advice would be to rewrite the CuPy code into TF ops. Fundamentally, they should be similar - in both cases you'd run GPU kernels.
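[Editor's note] A minimal runnable sketch of the py_function approach described above, using NumPy as a stand-in for CuPy so it runs without a GPU (the function names `numpy_add` and `graph_op` are illustrative, not from the thread). With CuPy, the body of `numpy_add` would instead do the DLPack round-trip and the CuPy kernel call:

```python
import numpy as np
import tensorflow as tf

def numpy_add(x, y):
    # Stand-in for the CuPy op; this body runs as ordinary eager Python,
    # so DLPack conversion (or any CuPy call) would be legal here.
    return np.add(x.numpy(), y.numpy())

@tf.function
def graph_op(x, y):
    # tf.py_function re-enters the Python runtime from inside the graph.
    out = tf.py_function(numpy_add, inp=[x, y], Tout=tf.float64)
    out.set_shape(x.shape)  # py_function drops static shape information
    return out

x = tf.constant([1.0, 2.0], dtype=tf.float64)
y = tf.constant([3.0, 4.0], dtype=tf.float64)
print(graph_op(x, y).numpy())  # [4. 6.]
```

Note the explicit `set_shape` call: tensors returned by `tf.py_function` have unknown static shape, which often breaks downstream graph code.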

llodds (Author) commented Oct 25, 2021

@mdanatg Looks like I have to build a C++ wrapper for CuPy and then convert it to TF ops, is that right? Can I write a TF op directly from Python? tf.py_function indeed doesn't work well with tf.distribute.

llodds (Author) commented Oct 25, 2021

@mdanatg Is there a way/function to extract tensors from a distributed replica and then gather them back into a distributed replica? I am thinking of an alternative: use the multiprocessing module to apply the CuPy op on the tensors directly.

mdanatg commented Oct 26, 2021

@yuefengz would know more about the last question.

For building a wrapper over CuPy, that might be tricky, though it might work. Replacing all cp.* / np.* calls with corresponding tf.* might be a lot more straightforward, unless you have very large programs.
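[Editor's note] For the simple op from the repro above, the cp.* → tf.* rewrite suggested here is a one-liner. A hedged sketch (the name `simple_tf_op` is illustrative), showing the same arithmetic as the issue's `simple_cupy_op` expressed as a pure TF op, which compiles into the graph with no DLPack round-trip:

```python
import tensorflow as tf

@tf.function
def simple_tf_op(x, y):
    # Same arithmetic as the CuPy version (X_cp + Y_cp), but as a TF op,
    # so it traces into the graph and composes with tf.distribute.
    return x + y

# Small shapes for illustration; the issue uses (10, 128, 128, 128) batches.
x = tf.random.uniform((2, 4, 4, 4))
y = tf.random.uniform((2, 4, 4, 4))
z = simple_tf_op(x, y)
```

Both versions ultimately launch GPU kernels; the TF version just lets the executor schedule them inside the graph, which is what tf.function and tf.distribute require.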

llodds (Author) commented Oct 26, 2021

@mdanatg @yuefengz I am using CuPy to write a customized 3D augmentation layer for device tensors; TF currently doesn't provide 3D random transformation APIs. This CuPy-based layer currently works in eager mode with strategy.run(), but nsys profiles show poor overlap of the augmentation operations across multiple GPUs. So I am thinking about either wrapping it as TF ops so it can work in graph mode, or doing it outside of TF (extracting tensors from the replicas and wrapping CuPy inside multiprocessing.Process()). I don't know the right TF API to extract tensors from, or gather them back into, a distributed replica.
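[Editor's note] As one illustration of expressing a 3D augmentation with stock TF ops (a hypothetical sketch of random axis flips, not the author's layer; `random_flip_3d` is an invented name), graph-compatible via tf.cond:

```python
import tensorflow as tf

@tf.function
def random_flip_3d(vol):
    # vol: a [D, H, W] volume. Randomly reverse each spatial axis.
    # tf.cond keeps the random branching inside the graph, so this
    # works under tf.function (and hence with strategy.run).
    flips = tf.random.uniform([3]) > 0.5
    for axis in range(3):
        vol = tf.cond(flips[axis],
                      lambda v=vol, a=axis: tf.reverse(v, [a]),
                      lambda v=vol: v)
    return vol

vol = tf.reshape(tf.range(27, dtype=tf.float32), (3, 3, 3))
out = random_flip_3d(vol)  # same shape and values, possibly mirrored
```

Rotations and more general 3D warps are harder with stock ops, but flips, crops (tf.slice), and axis permutations (tf.transpose) cover a useful subset without leaving the graph.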

6 participants