-
Notifications
You must be signed in to change notification settings - Fork 74k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Out of Memory in mnist? #138
Comments
Duplicate of #136 |
tensorflow-copybara
pushed a commit
that referenced
this issue
Sep 17, 2019
Fixes #138 PiperOrigin-RevId: 269509668
xinan-jiang
pushed a commit
to xinan-jiang/tensorflow
that referenced
this issue
Oct 4, 2019
Fixes tensorflow#138 PiperOrigin-RevId: 269509668
cjolivier01
pushed a commit
to Cerebras/tensorflow
that referenced
this issue
Dec 6, 2019
Fix build failure in depthwise_conv_op_gpu.cu.cc by re-enable some ha…
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
ubuntu@slave1:/media/slave1temp/tensorflow_full$ python tensorflow/tensorflow/models/image/mnist/convolutional.py
Extracting data/train-images-idx3-ubyte.gz
Extracting data/train-labels-idx1-ubyte.gz
Extracting data/t10k-images-idx3-ubyte.gz
Extracting data/t10k-labels-idx1-ubyte.gz
I tensorflow/core/common_runtime/local_device.cc:25] Local device intra op parallelism threads: 8
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:888] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_init.cc:88] Found device 0 with properties:
name: GeForce GTX 750 Ti
major: 5 minor: 0 memoryClockRate (GHz) 1.189
pciBusID 0000:01:00.0
Total memory: 2.00GiB
Free memory: 1.96GiB
I tensorflow/core/common_runtime/gpu/gpu_init.cc:112] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_init.cc:122] 0: Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:643] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 750 Ti, pci bus id: 0000:01:00.0)
I tensorflow/core/common_runtime/gpu/gpu_region_allocator.cc:47] Setting region size to 1896685568
I tensorflow/core/common_runtime/local_session.cc:45] Local session inter op parallelism threads: 8
Initialized!
Epoch 0.00
Minibatch loss: 12.054, learning rate: 0.010000
Minibatch error: 90.6%
I tensorflow/core/common_runtime/gpu/gpu_region_allocator.cc:339] Chunk size: 256 (256B) Pool: chunks: 64 free: 35 cumulative malloc: 93 cumulative freed: 64
Number of chunks: 64, in_use chunks: 29
I tensorflow/core/common_runtime/gpu/gpu_region_allocator.cc:339] Chunk size: 2048 (2.0KiB) Pool: chunks: 8 free: 4 cumulative malloc: 7 cumulative freed: 3
Number of chunks: 8, in_use chunks: 4
I tensorflow/core/common_runtime/gpu/gpu_region_allocator.cc:339] Chunk size: 4096 (4.0KiB) Pool: chunks: 16 free: 13 cumulative malloc: 18 cumulative freed: 15
Number of chunks: 16, in_use chunks: 3
I tensorflow/core/common_runtime/gpu/gpu_region_allocator.cc:339] Chunk size: 32768 (32.0KiB) Pool: chunks: 8 free: 5 cumulative malloc: 9 cumulative freed: 6
Number of chunks: 8, in_use chunks: 3
I tensorflow/core/common_runtime/gpu/gpu_region_allocator.cc:339] Chunk size: 139264 (136.0KiB) Pool: chunks: 15 free: 15 cumulative malloc: 16 cumulative freed: 16
Number of chunks: 15, in_use chunks: 0
I tensorflow/core/common_runtime/gpu/gpu_region_allocator.cc:339] Chunk size: 212992 (208.0KiB) Pool: chunks: 12 free: 9 cumulative malloc: 14 cumulative freed: 11
Number of chunks: 12, in_use chunks: 3
I tensorflow/core/common_runtime/gpu/gpu_region_allocator.cc:339] Chunk size: 278528 (272.0KiB) Pool: chunks: 1 free: 1 cumulative malloc: 1 cumulative freed: 1
Number of chunks: 1, in_use chunks: 0
I tensorflow/core/common_runtime/gpu/gpu_region_allocator.cc:339] Chunk size: 851968 (832.0KiB) Pool: chunks: 4 free: 4 cumulative malloc: 4 cumulative freed: 4
Number of chunks: 4, in_use chunks: 0
I tensorflow/core/common_runtime/gpu/gpu_region_allocator.cc:339] Chunk size: 1703936 (1.62MiB) Pool: chunks: 6 free: 6 cumulative malloc: 6 cumulative freed: 6
Number of chunks: 6, in_use chunks: 0
I tensorflow/core/common_runtime/gpu/gpu_region_allocator.cc:339] Chunk size: 2883584 (2.75MiB) Pool: chunks: 1 free: 1 cumulative malloc: 1 cumulative freed: 1
Number of chunks: 1, in_use chunks: 0
I tensorflow/core/common_runtime/gpu/gpu_region_allocator.cc:339] Chunk size: 3407872 (3.25MiB) Pool: chunks: 8 free: 8 cumulative malloc: 9 cumulative freed: 9
Number of chunks: 8, in_use chunks: 0
I tensorflow/core/common_runtime/gpu/gpu_region_allocator.cc:339] Chunk size: 6815744 (6.50MiB) Pool: chunks: 12 free: 9 cumulative malloc: 18 cumulative freed: 15
Number of chunks: 12, in_use chunks: 3
I tensorflow/core/common_runtime/gpu/gpu_region_allocator.cc:339] Chunk size: 15728640 (15.00MiB) Pool: chunks: 1 free: 0 cumulative malloc: 1 cumulative freed: 0
Number of chunks: 1, in_use chunks: 1
I tensorflow/core/common_runtime/gpu/gpu_region_allocator.cc:339] Chunk size: 20971520 (20.00MiB) Pool: chunks: 1 free: 1 cumulative malloc: 1 cumulative freed: 1
Number of chunks: 1, in_use chunks: 0
I tensorflow/core/common_runtime/gpu/gpu_region_allocator.cc:339] Chunk size: 125829120 (120.00MiB) Pool: chunks: 1 free: 0 cumulative malloc: 1 cumulative freed: 0
Number of chunks: 1, in_use chunks: 1
I tensorflow/core/common_runtime/gpu/gpu_region_allocator.cc:339] Chunk size: 251658240 (240.00MiB) Pool: chunks: 0 free: 0 cumulative malloc: 0 cumulative freed: 0
Number of chunks: 0, in_use chunks: 0
I tensorflow/core/common_runtime/gpu/gpu_region_allocator.cc:339] Chunk size: 503316480 (480.00MiB) Pool: chunks: 3 free: 3 cumulative malloc: 3 cumulative freed: 3
Number of chunks: 3, in_use chunks: 0
I tensorflow/core/common_runtime/gpu/gpu_region_allocator.cc:345] Aggregate Region Memory: 1896685568 (1.77GiB)
I tensorflow/core/common_runtime/gpu/gpu_region_allocator.cc:347] Aggregate Chunk Memory: 1803329536 (1.68GiB)
W tensorflow/core/common_runtime/gpu/gpu_region_allocator.cc:89] Out of GPU memory, see memory state dump above
W tensorflow/core/kernels/conv_ops.cc:162] Resource exhausted: OOM when allocating tensor with shapedim { size: 5000 } dim { size: 14 } dim { size: 14 } dim { size: 64 }
W tensorflow/core/common_runtime/executor.cc:1027] 0x57b6090 Compute status: Resource exhausted: OOM when allocating tensor with shapedim { size: 5000 } dim { size: 14 } dim { size: 14 } dim { size: 64 }
[[Node: Conv2D_3 = Conv2D[T=DT_FLOAT, padding="SAME", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/gpu:0"](MaxPool_2, Variable_2)]]
W tensorflow/core/common_runtime/executor.cc:1027] 0x50bc640 Compute status: Resource exhausted: OOM when allocating tensor with shapedim { size: 5000 } dim { size: 14 } dim { size: 14 } dim { size: 64 }
[[Node: Conv2D_3 = Conv2D[T=DT_FLOAT, padding="SAME", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/gpu:0"](MaxPool_2, Variable_2)]]
[[Node: Softmax_1/_45 = _Recvclient_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_735_Softmax_1", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]]
Traceback (most recent call last):
File "tensorflow/tensorflow/models/image/mnist/convolutional.py", line 270, in
tf.app.run()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/default/_app.py", line 11, in run
sys.exit(main(sys.argv))
File "tensorflow/tensorflow/models/image/mnist/convolutional.py", line 258, in main
validation_prediction.eval(), validation_labels)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 405, in eval
return _eval_using_default_session(self, feed_dict, self.graph, session)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2728, in _eval_using_default_session
return session.run(tensors, feed_dict)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 345, in run
results = self._do_run(target_list, unique_fetch_targets, feed_dict_string)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 419, in _do_run
e.code)
tensorflow.python.framework.errors.ResourceExhaustedError: OOM when allocating tensor with shapedim { size: 5000 } dim { size: 14 } dim { size: 14 } dim { size: 64 }
[[Node: Conv2D_3 = Conv2D[T=DT_FLOAT, padding="SAME", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/gpu:0"](MaxPool_2, Variable_2)]]
[[Node: Softmax_1/_45 = _Recvclient_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_735_Softmax_1", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]]
Caused by op u'Conv2D_3', defined at:
File "tensorflow/tensorflow/models/image/mnist/convolutional.py", line 270, in
tf.app.run()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/default/_app.py", line 11, in run
sys.exit(main(sys.argv))
File "tensorflow/tensorflow/models/image/mnist/convolutional.py", line 229, in main
validation_prediction = tf.nn.softmax(model(validation_data_node))
File "tensorflow/tensorflow/models/image/mnist/convolutional.py", line 179, in model
padding='SAME')
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_nn_ops.py", line 207, in conv2d
use_cudnn_on_gpu=use_cudnn_on_gpu, name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/op_def_library.py", line 633, in apply_op
op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1710, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 988, in init
self._traceback = _extract_stack()
ubuntu@slave1:/media/slave1temp/tensorflow_full$
The text was updated successfully, but these errors were encountered: