Out of Memory in mnist? #138

strongbanker · 2015-11-11T14:13:32Z

ubuntu@slave1:/media/slave1temp/tensorflow_full$ python tensorflow/tensorflow/models/image/mnist/convolutional.py
Extracting data/train-images-idx3-ubyte.gz
Extracting data/train-labels-idx1-ubyte.gz
Extracting data/t10k-images-idx3-ubyte.gz
Extracting data/t10k-labels-idx1-ubyte.gz
I tensorflow/core/common_runtime/local_device.cc:25] Local device intra op parallelism threads: 8
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:888] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_init.cc:88] Found device 0 with properties:
name: GeForce GTX 750 Ti
major: 5 minor: 0 memoryClockRate (GHz) 1.189
pciBusID 0000:01:00.0
Total memory: 2.00GiB
Free memory: 1.96GiB
I tensorflow/core/common_runtime/gpu/gpu_init.cc:112] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_init.cc:122] 0: Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:643] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 750 Ti, pci bus id: 0000:01:00.0)
I tensorflow/core/common_runtime/gpu/gpu_region_allocator.cc:47] Setting region size to 1896685568
I tensorflow/core/common_runtime/local_session.cc:45] Local session inter op parallelism threads: 8
Initialized!
Epoch 0.00
Minibatch loss: 12.054, learning rate: 0.010000
Minibatch error: 90.6%
I tensorflow/core/common_runtime/gpu/gpu_region_allocator.cc:339] Chunk size: 256 (256B) Pool: chunks: 64 free: 35 cumulative malloc: 93 cumulative freed: 64
Number of chunks: 64, in_use chunks: 29
I tensorflow/core/common_runtime/gpu/gpu_region_allocator.cc:339] Chunk size: 2048 (2.0KiB) Pool: chunks: 8 free: 4 cumulative malloc: 7 cumulative freed: 3
Number of chunks: 8, in_use chunks: 4
I tensorflow/core/common_runtime/gpu/gpu_region_allocator.cc:339] Chunk size: 4096 (4.0KiB) Pool: chunks: 16 free: 13 cumulative malloc: 18 cumulative freed: 15
Number of chunks: 16, in_use chunks: 3
I tensorflow/core/common_runtime/gpu/gpu_region_allocator.cc:339] Chunk size: 32768 (32.0KiB) Pool: chunks: 8 free: 5 cumulative malloc: 9 cumulative freed: 6
Number of chunks: 8, in_use chunks: 3
I tensorflow/core/common_runtime/gpu/gpu_region_allocator.cc:339] Chunk size: 139264 (136.0KiB) Pool: chunks: 15 free: 15 cumulative malloc: 16 cumulative freed: 16
Number of chunks: 15, in_use chunks: 0
I tensorflow/core/common_runtime/gpu/gpu_region_allocator.cc:339] Chunk size: 212992 (208.0KiB) Pool: chunks: 12 free: 9 cumulative malloc: 14 cumulative freed: 11
Number of chunks: 12, in_use chunks: 3
I tensorflow/core/common_runtime/gpu/gpu_region_allocator.cc:339] Chunk size: 278528 (272.0KiB) Pool: chunks: 1 free: 1 cumulative malloc: 1 cumulative freed: 1
Number of chunks: 1, in_use chunks: 0
I tensorflow/core/common_runtime/gpu/gpu_region_allocator.cc:339] Chunk size: 851968 (832.0KiB) Pool: chunks: 4 free: 4 cumulative malloc: 4 cumulative freed: 4
Number of chunks: 4, in_use chunks: 0
I tensorflow/core/common_runtime/gpu/gpu_region_allocator.cc:339] Chunk size: 1703936 (1.62MiB) Pool: chunks: 6 free: 6 cumulative malloc: 6 cumulative freed: 6
Number of chunks: 6, in_use chunks: 0
I tensorflow/core/common_runtime/gpu/gpu_region_allocator.cc:339] Chunk size: 2883584 (2.75MiB) Pool: chunks: 1 free: 1 cumulative malloc: 1 cumulative freed: 1
Number of chunks: 1, in_use chunks: 0
I tensorflow/core/common_runtime/gpu/gpu_region_allocator.cc:339] Chunk size: 3407872 (3.25MiB) Pool: chunks: 8 free: 8 cumulative malloc: 9 cumulative freed: 9
Number of chunks: 8, in_use chunks: 0
I tensorflow/core/common_runtime/gpu/gpu_region_allocator.cc:339] Chunk size: 6815744 (6.50MiB) Pool: chunks: 12 free: 9 cumulative malloc: 18 cumulative freed: 15
Number of chunks: 12, in_use chunks: 3
I tensorflow/core/common_runtime/gpu/gpu_region_allocator.cc:339] Chunk size: 15728640 (15.00MiB) Pool: chunks: 1 free: 0 cumulative malloc: 1 cumulative freed: 0
Number of chunks: 1, in_use chunks: 1
I tensorflow/core/common_runtime/gpu/gpu_region_allocator.cc:339] Chunk size: 20971520 (20.00MiB) Pool: chunks: 1 free: 1 cumulative malloc: 1 cumulative freed: 1
Number of chunks: 1, in_use chunks: 0
I tensorflow/core/common_runtime/gpu/gpu_region_allocator.cc:339] Chunk size: 125829120 (120.00MiB) Pool: chunks: 1 free: 0 cumulative malloc: 1 cumulative freed: 0
Number of chunks: 1, in_use chunks: 1
I tensorflow/core/common_runtime/gpu/gpu_region_allocator.cc:339] Chunk size: 251658240 (240.00MiB) Pool: chunks: 0 free: 0 cumulative malloc: 0 cumulative freed: 0
Number of chunks: 0, in_use chunks: 0
I tensorflow/core/common_runtime/gpu/gpu_region_allocator.cc:339] Chunk size: 503316480 (480.00MiB) Pool: chunks: 3 free: 3 cumulative malloc: 3 cumulative freed: 3
Number of chunks: 3, in_use chunks: 0
I tensorflow/core/common_runtime/gpu/gpu_region_allocator.cc:345] Aggregate Region Memory: 1896685568 (1.77GiB)
I tensorflow/core/common_runtime/gpu/gpu_region_allocator.cc:347] Aggregate Chunk Memory: 1803329536 (1.68GiB)
W tensorflow/core/common_runtime/gpu/gpu_region_allocator.cc:89] Out of GPU memory, see memory state dump above
W tensorflow/core/kernels/conv_ops.cc:162] Resource exhausted: OOM when allocating tensor with shapedim { size: 5000 } dim { size: 14 } dim { size: 14 } dim { size: 64 }
W tensorflow/core/common_runtime/executor.cc:1027] 0x57b6090 Compute status: Resource exhausted: OOM when allocating tensor with shapedim { size: 5000 } dim { size: 14 } dim { size: 14 } dim { size: 64 }
[[Node: Conv2D_3 = Conv2D[T=DT_FLOAT, padding="SAME", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/gpu:0"](MaxPool_2, Variable_2)]]
W tensorflow/core/common_runtime/executor.cc:1027] 0x50bc640 Compute status: Resource exhausted: OOM when allocating tensor with shapedim { size: 5000 } dim { size: 14 } dim { size: 14 } dim { size: 64 }
[[Node: Conv2D_3 = Conv2D[T=DT_FLOAT, padding="SAME", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/gpu:0"](MaxPool_2, Variable_2)]]
[[Node: Softmax_1/_45 = _Recvclient_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_735_Softmax_1", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]]
Traceback (most recent call last):
File "tensorflow/tensorflow/models/image/mnist/convolutional.py", line 270, in
tf.app.run()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/default/_app.py", line 11, in run
sys.exit(main(sys.argv))
File "tensorflow/tensorflow/models/image/mnist/convolutional.py", line 258, in main
validation_prediction.eval(), validation_labels)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 405, in eval
return _eval_using_default_session(self, feed_dict, self.graph, session)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2728, in _eval_using_default_session
return session.run(tensors, feed_dict)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 345, in run
results = self._do_run(target_list, unique_fetch_targets, feed_dict_string)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 419, in _do_run
e.code)
tensorflow.python.framework.errors.ResourceExhaustedError: OOM when allocating tensor with shapedim { size: 5000 } dim { size: 14 } dim { size: 14 } dim { size: 64 }
[[Node: Conv2D_3 = Conv2D[T=DT_FLOAT, padding="SAME", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/gpu:0"](MaxPool_2, Variable_2)]]
[[Node: Softmax_1/_45 = _Recvclient_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_735_Softmax_1", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]]
Caused by op u'Conv2D_3', defined at:
File "tensorflow/tensorflow/models/image/mnist/convolutional.py", line 270, in
tf.app.run()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/default/_app.py", line 11, in run
sys.exit(main(sys.argv))
File "tensorflow/tensorflow/models/image/mnist/convolutional.py", line 229, in main
validation_prediction = tf.nn.softmax(model(validation_data_node))
File "tensorflow/tensorflow/models/image/mnist/convolutional.py", line 179, in model
padding='SAME')
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_nn_ops.py", line 207, in conv2d
use_cudnn_on_gpu=use_cudnn_on_gpu, name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/op_def_library.py", line 633, in apply_op
op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1710, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 988, in init
self._traceback = _extract_stack()

ubuntu@slave1:/media/slave1temp/tensorflow_full$

The text was updated successfully, but these errors were encountered:

vincentvanhoucke · 2015-11-11T15:10:57Z

Duplicate of #136

Fixes #138 PiperOrigin-RevId: 269509668

Fixes tensorflow#138 PiperOrigin-RevId: 269509668

Fix build failure in depthwise_conv_op_gpu.cu.cc by re-enable some ha…

vincentvanhoucke closed this as completed Nov 11, 2015

hanskrupakar mentioned this issue Oct 18, 2016

Sudden OOM error with Tensorflow embedding_attention_seq2seq after successful runs #5043

Closed

tensorflow-copybara pushed a commit that referenced this issue Sep 17, 2019

Add missing CMake dependency from libAnalysis to the Vector dialect

d5754ed

Fixes #138 PiperOrigin-RevId: 269509668

xinan-jiang pushed a commit to xinan-jiang/tensorflow that referenced this issue Oct 4, 2019

Add missing CMake dependency from libAnalysis to the Vector dialect

4cc85f3

Fixes tensorflow#138 PiperOrigin-RevId: 269509668

chengdianxuezi mentioned this issue Nov 1, 2019

Bug: tensorflow-gpu takes long time before beginning to compute #18652

Closed

cjolivier01 pushed a commit to Cerebras/tensorflow that referenced this issue Dec 6, 2019

Merge pull request tensorflow#138 from yxsamliu/hip-clang-depthwise

0583147

Fix build failure in depthwise_conv_op_gpu.cu.cc by re-enable some ha…

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Out of Memory in mnist? #138

Out of Memory in mnist? #138

Out of Memory in mnist? #138

Out of Memory in mnist? #138

Comments