How to wrap a CuPy function inside a function decorated by tf.function #51642
@llodds,
As requested:

```python
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'          # suppress TF log messages
os.environ['TF_GPU_THREAD_MODE'] = 'gpu_private'  # GPU gets dedicated CPU threads

import tensorflow as tf

# grow GPU memory on demand, so we know exactly the GPU memory usage
gpus = tf.config.experimental.list_physical_devices('GPU')
for gpu in gpus:
    tf.config.experimental.set_memory_growth(gpu, True)

import cupy as cp

# zero-copy conversion functions between TensorFlow and CuPy via DLPack
def tf2cp(x):
    dlcapsule = tf.experimental.dlpack.to_dlpack(x)
    return cp.fromDlpack(dlcapsule)

def cp2tf(x):
    dlcapsule = x.toDlpack()
    return tf.experimental.dlpack.from_dlpack(dlcapsule)

# now, test how to use CuPy with distributed TF
import numpy as np
X = np.random.random_sample((100, 128, 128, 128))
Y = np.random.random_sample((100, 128, 128, 128))

dataset = tf.data.Dataset.from_tensor_slices((X, Y))
dataset = dataset.shuffle(50, reshuffle_each_iteration=True)
dataset = dataset.batch(10, drop_remainder=True).prefetch(tf.data.AUTOTUNE)

strategy = tf.distribute.MirroredStrategy()
dataset_dist = strategy.experimental_distribute_dataset(dataset)

def simple_cupy_op(X, Y):
    X_cp = tf2cp(X)
    Y_cp = tf2cp(Y)
    Y_cp = X_cp + Y_cp
    X = cp2tf(X_cp)
    Y = cp2tf(Y_cp)
    return X, Y  # return the converted tensors

@tf.function
def simple_cupy_op_dist(X, Y):
    strategy.run(simple_cupy_op, args=(X, Y))

for X, Y in dataset_dist:
    simple_cupy_op_dist(X, Y)
```

Error message:
Thanks!
@Saduf2019,
CuPy, like NumPy, depends on a Python runtime to run. When a tf.function executes, there is no such Python runtime: all ops are compiled into a graph that is executed inside the TF executor, which is isolated from Python. To mix such Python-reliant ops into the TF graph, you can use py_function:
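A minimal sketch of that approach, assuming the tensors live on a GPU (the helper name `cupy_add` and the `tf.function` wrapper below are illustrative, not from the original comment):

```python
import tensorflow as tf
import cupy as cp

def cupy_add(x, y):
    # this body runs eagerly, so the DLPack round-trip is legal here
    x_cp = cp.fromDlpack(tf.experimental.dlpack.to_dlpack(x))
    y_cp = cp.fromDlpack(tf.experimental.dlpack.to_dlpack(y))
    out_cp = x_cp + y_cp
    return tf.experimental.dlpack.from_dlpack(out_cp.toDlpack())

@tf.function
def graph_fn(x, y):
    # tf.py_function embeds an eager Python callback as an op in the graph;
    # static shape information is lost, so restore it when known
    out = tf.py_function(cupy_add, inp=[x, y], Tout=x.dtype)
    out.set_shape(x.shape)
    return out
```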
cc @yuefengz. Note, however, that py_function does not always work well with tf.distribute.
@mdanatg Looks like I have to build a C++ wrapper for CuPy and then convert it to TF ops, is that right? Can I write a TF op directly from Python? tf.py_function indeed doesn't work well with tf.distribute.
@mdanatg Is there a way/function to extract the tensors from a distributed replica and then gather them back into a distributed replica? I am considering an alternative: using the multiprocessing module to apply the CuPy op on the tensors directly.
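For the extraction direction, one hedged sketch, reusing `strategy`, `dataset_dist`, `tf2cp`, and `cp2tf` from the snippet above: `strategy.experimental_local_results` unpacks a PerReplica value into one eager tensor per device, and each of those supports the DLPack round-trip directly.

```python
# sketch: unpack per-replica tensors and apply the CuPy op eagerly, per device
for X, Y in dataset_dist:
    x_locals = strategy.experimental_local_results(X)
    y_locals = strategy.experimental_local_results(Y)
    for x, y in zip(x_locals, y_locals):
        # each local tensor is an ordinary eager tensor pinned to one GPU
        z = cp2tf(tf2cp(x) + tf2cp(y))
```

Going the other way, repackaging the processed tensors back into a distributed value, is the part without an obvious public API, which is what the question above is asking.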
@yuefengz would know more about the last question. For building a wrapper over CuPy, that might be tricky, though it might work. Replacing all …
@mdanatg @yuefengz I am using CuPy to write a customized 3D augmentation layer for device tensors; TF currently doesn't provide 3D random-transformation APIs. This CuPy-based layer currently works in eager mode with strategy.run(), but nsys profiles show poor overlap of the augmentation operations across multiple GPUs. So I am thinking about either wrapping it as TF ops so it can work in graph mode, or doing the augmentation outside of TF (extracting tensors from replicas and wrapping the CuPy op inside multiprocessing.Process()). Right now I don't know the right TF API to extract tensors from, or gather them back into, a distributed replica.
I have a CuPy function that tweaks a TF tensor as follows:
TF tensor => dlpack => CuPy device array => apply some CuPy functions => CuPy device array => dlpack => TF tensor.
The code works in eager mode, but once I decorate the function with tf.function, it fails.
I believe the general question is: how do you wrap a Python function that takes TF tensors and returns TF tensors so that it works in graph mode? Thanks.
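As a minimal illustration of that round-trip in eager mode (the shape and the doubling op are arbitrary placeholders):

```python
import tensorflow as tf
import cupy as cp

with tf.device('/GPU:0'):
    t = tf.random.uniform((4, 4))

# eager mode: zero-copy TF -> CuPy -> TF round-trip via DLPack
t_cp = cp.fromDlpack(tf.experimental.dlpack.to_dlpack(t))
t_cp = t_cp * 2.0  # any CuPy function in between
t_back = tf.experimental.dlpack.from_dlpack(t_cp.toDlpack())

# wrapping these three lines in a @tf.function fails, per the report above:
# to_dlpack needs a concrete eager tensor, not a symbolic graph tensor
```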