CopyTensor::ViaDMA function: allocator type sometimes does not match the actual underlying memory type of the input #60856

Open
chengchen666 opened this issue Jun 14, 2023 · 0 comments
Labels: comp:runtime, stat:awaiting tensorflower, TF 2.12, type:bug

Comments

chengchen666 commented Jun 14, 2023

Issue Type

Bug

Have you reproduced the bug with TF nightly?

No

Source

source

Tensorflow Version

2.12.0rc0

Custom Code

Yes

OS Platform and Distribution

CentOS Linux 7

Mobile device

No response

Python version

3.7.5

Bazel version

bazel 3.7.2

GCC/Compiler version

gcc-9

CUDA/cuDNN version

cuda 11, cudnn 8

GPU model and memory

Tesla V100S

Current Behaviour?

In CopyTensor::ViaDMA, alloc_attr decides the direction of the memory copy. However, alloc_attr does not always match the underlying memory type of the Tensor's data. In my case, src_alloc_attr.on_host() is true but input->GetMemoryType() equals kDevice, so the copy direction chosen by this function is CPU->GPU when it should actually be GPU->GPU.
I think the reason this bug has not surfaced is that the CUDA driver API, e.g. cuMemcpyHtoD(), does not strictly enforce the stated direction (H to D or otherwise); it only looks at the pointer attributes. If the src pointer is on the device and the dst pointer is also on the device, the driver still performs a D-to-D copy even when cuMemcpyHtoD() is called. This behavior can mask many bugs of this kind.
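
To illustrate that driver behavior, here is a standalone sketch against the CUDA driver API (not TensorFlow code; the allocation size and variable names are made up). On UVA-capable systems, calling cuMemcpyHtoD() with a source pointer that actually lives on the device still completes as a device-to-device copy, and cuPointerGetAttribute() can be used to confirm where the pointer really resides:

#include <cstdio>
#include <cuda.h>

int main() {
  // Minimal driver-API setup.
  cuInit(0);
  CUdevice dev;
  CUcontext ctx;
  cuDeviceGet(&dev, 0);
  cuCtxCreate(&ctx, 0, dev);

  const size_t kBytes = 1024;  // arbitrary size for this sketch
  CUdeviceptr src, dst;
  cuMemAlloc(&src, kBytes);
  cuMemAlloc(&dst, kBytes);

  // "Host-to-device" copy whose src is really a device pointer: with unified
  // virtual addressing the driver resolves the pointer's actual location and
  // performs a D2D copy, so no error surfaces.
  CUresult r = cuMemcpyHtoD(dst, reinterpret_cast<const void*>(src), kBytes);
  std::printf("cuMemcpyHtoD with device src returned %d\n", static_cast<int>(r));

  // Confirm where src really lives (CU_MEMORYTYPE_DEVICE == 2).
  unsigned int mem_type = 0;
  cuPointerGetAttribute(&mem_type, CU_POINTER_ATTRIBUTE_MEMORY_TYPE, src);
  std::printf("src memory type: %u\n", mem_type);

  cuMemFree(src);
  cuMemFree(dst);
  cuCtxDestroy(ctx);
  return 0;
}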

I haven't figured out where the on_host attribute is set. From my understanding so far, the same allocator object is reused across different tensors, but the on_host attribute is one-way: once it has been set, it cannot be unset later. Could this cause problems? Also, why don't we simply use input->GetMemoryType() to decide the memory copy direction instead of the on_host attribute of alloc_attr?
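
As a rough sketch of that suggestion (not the existing ViaDMA implementation; the CopyDirection enum and the helper names below are hypothetical), the copy direction could be derived from the tensors' actual memory types like this:

#include "tensorflow/core/framework/allocator.h"
#include "tensorflow/core/framework/tensor.h"

namespace tensorflow {

// Hypothetical helper: classify the copy purely from the tensors' real
// underlying memory types instead of alloc_attr.on_host().
// Note: AllocatorMemoryType::kUnknown is not handled in this sketch.
enum class CopyDirection { kHostToHost, kHostToDevice, kDeviceToHost, kDeviceToDevice };

inline bool IsHostMemory(AllocatorMemoryType t) {
  return t == AllocatorMemoryType::kHostPageable ||
         t == AllocatorMemoryType::kHostPinned;
}

inline CopyDirection DirectionFromMemoryType(const Tensor* input, const Tensor* output) {
  const bool src_on_host = IsHostMemory(input->GetMemoryType());
  const bool dst_on_host = IsHostMemory(output->GetMemoryType());
  if (src_on_host && dst_on_host) return CopyDirection::kHostToHost;
  if (src_on_host) return CopyDirection::kHostToDevice;
  if (dst_on_host) return CopyDirection::kDeviceToHost;
  return CopyDirection::kDeviceToDevice;
}

}  // namespace tensorflow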

I hit this issue while running a Horovod unit test. I added some log messages at
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/common_runtime/copy_tensor.cc#L219
such as:

if (!src_alloc_attr.on_host() &&
    (input->GetMemoryType() == AllocatorMemoryType::kHostPageable ||
     input->GetMemoryType() == AllocatorMemoryType::kHostPinned)) {
  std::cout << "!!!!!!! src alloc not on host, but input mem type is on host" << std::endl;
}
if (src_alloc_attr.on_host() &&
    input->GetMemoryType() == AllocatorMemoryType::kDevice) {
  std::cout << "!!!!!!! src alloc on host, but input mem is on device" << std::endl;
}

if (!dst_alloc_attr.on_host() &&
    (output->GetMemoryType() == AllocatorMemoryType::kHostPageable ||
     output->GetMemoryType() == AllocatorMemoryType::kHostPinned)) {
  std::cout << "!!!!!!! dst alloc not on host, but output mem type is on host" << std::endl;
}
if (dst_alloc_attr.on_host() &&
    output->GetMemoryType() == AllocatorMemoryType::kDevice) {
  std::cout << "!!!!!!! dst alloc on host, but output mem is on device" << std::endl;
}

In my case, I ran the Horovod alltoall op unit test to reproduce this issue, but it may show up in other cases as well.

Standalone code to reproduce the issue

Running the following Horovod unit test reproduces the issue:
https://github.com/horovod/horovod/blob/master/test/parallel/test_tensorflow.py

horovodrun --mpi -np 2 pytest -s -v test_tensorflow.py::TensorFlowTests::test_horovod_alltoall_gpu

Relevant log output

No response

@google-ml-butler google-ml-butler bot added the type:bug Bug label Jun 14, 2023
@SuryanarayanaY SuryanarayanaY added TF 2.12 For issues related to Tensorflow 2.12 comp:runtime c++ runtime, performance issues (cpu) labels Jun 14, 2023
@sachinprasadhs sachinprasadhs added the stat:awaiting tensorflower Status - Awaiting response from tensorflower label Jul 19, 2023