
Adding new metric spec causing OOM #115

Open
axelning opened this issue Mar 2, 2021 · 0 comments
axelning commented Mar 2, 2021

System information

  • Have I specified the code to reproduce the issue (Yes/No): Yes
  • Environment in which the code is executed (e.g., Local (Linux/MacOS/Windows), Interactive Notebook, Google Cloud, etc.): Ubuntu 18.04
  • TFX version: 0.26.1
  • Python version: 3.6.7
  • TensorFlow version: 2.3.2

Describe the current behavior
When following the tutorial, I added an extra tfma.EvalConfig to the Evaluator as follows:

eval_config = tfma.EvalConfig(
		model_specs=[
			# This assumes a serving model with signature 'serving_default'. If
			# using estimator based EvalSavedModel, add signature_name: 'eval' and
			# remove the label_key.
			tfma.ModelSpec(label_key='Label',
			               model_type=constants.TF_GENERIC
			               )
		],
		metrics_specs=[
			tfma.MetricsSpec(
				# The metrics added here are in addition to those saved with the
				# model (assuming either a keras model or EvalSavedModel is used).
				# Any metrics added into the saved model (for example using
				# model.compile(..., metrics=[...]), etc) will be computed
				# automatically.
				metrics=[
					tfma.MetricConfig(class_name='ExampleCount'),
					tfma.MetricConfig(
						class_name='BinaryAccuracy',
						threshold=tfma.MetricThreshold(
							value_threshold=tfma.GenericValueThreshold(
								lower_bound={'value': 0.5}),
							change_threshold=tfma.GenericChangeThreshold(
								direction=tfma.MetricDirection.HIGHER_IS_BETTER,
								absolute={'value': -1e-10})))
				]
			)
		],
		slicing_specs=[
			# An empty slice spec means the overall slice, i.e. the whole dataset.
			tfma.SlicingSpec(),
			# Data can be sliced along a feature column. In this case, data is
			# sliced along feature column trip_start_hour.
			# tfma.SlicingSpec(feature_keys=['trip_start_hour'])
		])

This calls _keys_and_metrics_from_specs(metrics_specs) in tfma/metrics/metrics_spec.py:477,

which in turn calls from_config() in tensorflow/python/keras/engine/base_layer.py:697.

It looks like this call constructs a new Keras layer, and that operation takes all of the GPU memory, causing OOM for everything that runs afterwards.
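A possible workaround (my assumption, not verified against this exact code path): by default TensorFlow reserves nearly all GPU memory for the process as soon as the first op or layer touches the device, so constructing that extra layer can look like an OOM for whatever runs next. Enabling on-demand memory growth before any GPU work starts avoids the up-front grab:

```python
import tensorflow as tf

# Must run before any op or layer touches the GPU. With memory growth
# enabled, TF allocates GPU memory incrementally as needed instead of
# reserving (almost) the whole device at startup.
gpus = tf.config.list_physical_devices('GPU')
for gpu in gpus:
    tf.config.experimental.set_memory_growth(gpu, True)

print('GPUs with memory growth enabled:', len(gpus))
```

The same behavior can be requested without code changes by setting the environment variable TF_FORCE_GPU_ALLOW_GROWTH=true before launching the pipeline.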

Describe the expected behavior
Well, another OOM issue; it looks like TF needs emergency surgery on its memory management.

Standalone code to reproduce the issue

Name of your Organization (Optional)

Other info / logs
Just a normal OOM error.
