[go: nahoru, domu]

Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TF_RegisterLogListener in C API does not seem to work #44995

Open
gehlhaarPfizer opened this issue Nov 18, 2020 · 5 comments
Open

TF_RegisterLogListener in C API does not seem to work #44995

gehlhaarPfizer opened this issue Nov 18, 2020 · 5 comments
Assignees
Labels
comp:runtime c++ runtime, performance issues (cpu) TF 2.3 Issues related to TF 2.3 type:bug Bug

Comments

@gehlhaarPfizer
Copy link

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): Yes
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Windows 10
  • Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device: No
  • TensorFlow installed from (source or binary): Binary
  • TensorFlow version (use command below): 2.3.0
  • Python version: N/A
  • Bazel version (if compiling from source): N/A
  • GCC/Compiler version (if compiling from source): N/A
  • CUDA/cuDNN version: N/A
  • GPU model and memory: N/A

Describe the current behavior

When I call TF_RegisterLogListener in the C API, it doesn't seem to work as described. My listener is never called, and TensorFlow log messages still go to the console. The API description is:

// Register a listener method that processes printed messages.
//
// If any listeners are registered, the print operator will call all listeners
// with the printed messages and immediately return without writing to the
// logs.

Describe the expected behavior

My listener should be called and the messages should not be written to the console.

Standalone code to reproduce the issue
Provide a reproducible test case that is the bare minimum necessary to generate
the problem. If possible, please share a link to Colab/Jupyter/any notebook.

extern "C"
{
void logListener(const char *message)
{
// This is never called and TensorFlow log messages go to the console instead.
std::cout << "TensorFlow: " << message << std::endl;
}
}

TF_RegisterLogListener(logListener);

Other info / logs Include any logs or source code that would be helpful to
diagnose the problem. If including tracebacks, please include the full
traceback. Large logs and files should be attached.

@Saduf2019 Saduf2019 added TF 2.3 Issues related to TF 2.3 comp:runtime c++ runtime, performance issues (cpu) labels Nov 19, 2020
@jvishnuvardhan jvishnuvardhan added the stat:awaiting tensorflower Status - Awaiting response from tensorflower label Nov 19, 2020
@tomerk
Copy link
tomerk commented Nov 23, 2020

Hi @gehlhaarPfizer , when you say "log messages" do you mean all TF logging or specifically messages/tensors printed via tf.print?

Originally this API was created to make sure tf.print specifically would be able to print to jupyter notebook / colab cell outputs. Although there were tentative plans to explore having all TF logging go through it, we didn't really have the bandwidth to do so.

@rohan100jain to see if tf-core should revisit this.

@gehlhaarPfizer
Copy link
Author

Ah. I misunderstood. I had expected all TensorFlow logging to go through the listener and not be output.

I have a small DNN model that I built in Python and then incorporated into an existing C++ program for production deployment. I took the shortest path to put this into C++, which was through the C API. The C++ API is horribly complex. I don't understand why there isn't a C++ API as clean as the Python API. But that is a topic for another conversation.

While it works correctly, I get superfluous output from TensorFlow to my console:

2020-11-23 14:06:43.917926: I tensorflow/cc/saved_model/reader.cc:31] Reading SavedModel from: /gpfs/apps/medsci/stacks/noOS/software/zing/latest/bin/../parameters/FitScoreModel.tf
2020-11-23 14:06:44.409069: I tensorflow/cc/saved_model/reader.cc:54] Reading meta graph with tags { serve }
2020-11-23 14:06:44.417661: I tensorflow/cc/saved_model/loader.cc:234] Reading SavedModel debug info (if present) from: /gpfs/apps/medsci/stacks/noOS/software/zing/latest/bin/../parameters/FitScoreModel.tf
2020-11-23 14:06:44.556361: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations: AVX AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2020-11-23 14:06:45.745551: I tensorflow/core/platform/profile_utils/cpu_utils.cc:104] CPU Frequency: 2200000000 Hz
2020-11-23 14:06:45.815726: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x2633b30 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-11-23 14:06:45.815758: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2020-11-23 14:06:46.971689: I tensorflow/cc/saved_model/loader.cc:199] Restoring SavedModel bundle.
2020-11-23 14:06:52.002029: I tensorflow/cc/saved_model/loader.cc:183] Running initialization op on SavedModel bundle at path: /gpfs/apps/medsci/stacks/noOS/software/zing/latest/bin/../parameters/FitScoreModel.tf
2020-11-23 14:06:52.034424: I tensorflow/cc/saved_model/loader.cc:303] SavedModel load for tags { serve }; Status: success: OK. Took 8134566 microseconds.

I would really like to redirect this to my program's internal logging mechanism, which goes to a log file. I can suppress this output entirely via an environment variable but I would prefer to redirect it.

@tensorflowbutler tensorflowbutler removed the stat:awaiting tensorflower Status - Awaiting response from tensorflower label Nov 25, 2020
@jvishnuvardhan jvishnuvardhan removed their assignment Dec 9, 2020
@rnett
Copy link
Contributor
rnett commented Apr 10, 2021

Seconding this on behalf of SIG JVM, we would like to be able to redirect the core logging to a Java logging API (almost certainly SLF4J). Additionally, it would be extremely nice if the callback had some structure, i.e. (severity: int, message: str) or similar, maybe with the time stamp, rather than just giving us the strings.

@kkimdev
Copy link
Contributor
kkimdev commented Jun 21, 2021

Related change: #41733 cc @avitebskiy

IIRC, this is not C-API(yet) but could you try TFAddLogSink(...) and see if it works for you?

@rnett
Copy link
Contributor
rnett commented Jun 27, 2021

It seems to work to some extent, but the added sink doesn't get all messages. For example, if I just echo the data back, I see stuff like:

[stderr] 2021-06-26 19:45:11.097075: I external/org_tensorflow/tensorflow/core/common_runtime/direct_session.cc:361] Device mapping: no known devices.
[stderr] 2021-06-26 19:45:11.102576: I external/org_tensorflow/tensorflow/core/common_runtime/placer.cc:114] testScope/absWithDevice: (Abs): /job:localhost/replica:0/task:0/device:CPU:0
[stderr] 2021-06-26 19:45:11.102898: I external/org_tensorflow/tensorflow/core/common_runtime/placer.cc:114] testScope/Const: (Const): /job:localhost/replica:0/task:0/device:CPU:0
Log message: Severity: 0, Fname: external/org_tensorflow/tensorflow/core/common_runtime/direct_session.cc, line: 361, string: Device mapping: no known devices.

[stderr] 2021-06-26 19:45:11.115882: I external/org_tensorflow/tensorflow/core/common_runtime/direct_session.cc:361] Device mapping: no known devices.
[stderr] 2021-06-26 19:45:11.119011: I external/org_tensorflow/tensorflow/core/common_runtime/placer.cc:114] testScope/absWithDevice: (Abs): /job:localhost/replica:0/task:0/device:CPU:0
[stderr] 2021-06-26 19:45:11.119308: I external/org_tensorflow/tensorflow/core/common_runtime/placer.cc:114] testScope/Const: (Const): /job:localhost/replica:0/task:0/device:CPU:0
Log message: Severity: 0, Fname: external/org_tensorflow/tensorflow/core/common_runtime/direct_session.cc, line: 361, string: Device mapping: no known devices.

[stderr] 2021-06-26 19:45:11.125814: I external/org_tensorflow/tensorflow/core/common_runtime/direct_session.cc:361] Device mapping: no known devices.
[stderr] 2021-06-26 19:45:11.128253: I external/org_tensorflow/tensorflow/core/common_runtime/placer.cc:114] testScope1/absWithDevice: (Abs): /job:localhost/replica:0/task:0/device:CPU:0
[stderr] 2021-06-26 19:45:11.128562: I external/org_tensorflow/tensorflow/core/common_runtime/placer.cc:114] testScope2/mulWithDevice: (Mul): /job:localhost/replica:0/task:0/device:CPU:0
[stderr] 2021-06-26 19:45:11.128816: I external/org_tensorflow/tensorflow/core/common_runtime/placer.cc:114] testScope1/Const: (Const): /job:localhost/replica:0/task:0/device:CPU:0
[stderr] 2021-06-26 19:45:11.129037: I external/org_tensorflow/tensorflow/core/common_runtime/placer.cc:114] testScope2/Const: (Const): /job:localhost/replica:0/task:0/device:CPU:0
Log message: Severity: 0, Fname: external/org_tensorflow/tensorflow/core/common_runtime/direct_session.cc, line: 361, string: Device mapping: no known devices.

[stderr] 2021-06-26 19:45:11.134049: I external/org_tensorflow/tensorflow/core/common_runtime/direct_session.cc:361] Device mapping: no known devices.
[stderr] 2021-06-26 19:45:11.136836: I external/org_tensorflow/tensorflow/core/common_runtime/placer.cc:114] testScope/absWithDevice: (Abs): /job:localhost/replica:0/task:0/device:CPU:0
[stderr] 2021-06-26 19:45:11.137125: I external/org_tensorflow/tensorflow/core/common_runtime/placer.cc:114] testScope/Const: (Const): /job:localhost/replica:0/task:0/device:CPU:0
Log message: Severity: 0, Fname: external/org_tensorflow/tensorflow/core/common_runtime/direct_session.cc, line: 361, string: Device mapping: no known devices.

[stderr] 2021-06-26 19:45:11.150138: I external/org_tensorflow/tensorflow/core/common_runtime/direct_session.cc:361] Device mapping: no known devices.
Log message: Severity: 0, Fname: external/org_tensorflow/tensorflow/core/common_runtime/direct_session.cc, line: 361, string: Device mapping: no known devices.

[stderr] 2021-06-26 19:45:11.941404: W external/org_tensorflow/tensorflow/core/framework/cpu_allocator_impl.cc:80] Allocation of 400000000 exceeds 10% of free system memory.
[stderr] 2021-06-26 19:45:11.941443: W external/org_tensorflow/tensorflow/core/framework/cpu_allocator_impl.cc:80] Allocation of 400000000 exceeds 10% of free system memory.

[stderr] 2021-06-26 19:45:12.270591: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:38] Reading SavedModel from: /tmp/tf-saved-model-export-test17892089746029133424
[stderr] 2021-06-26 19:45:12.270678: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:90] Reading meta graph with tags { serve }
[stderr] 2021-06-26 19:45:12.270688: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:132] Reading SavedModel debug info (if present) from: /tmp/tf-saved-model-export-test17892089746029133424
[stderr] 2021-06-26 19:45:12.271437: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:206] Restoring SavedModel bundle.
[stderr] 2021-06-26 19:45:12.273924: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:277] SavedModel load for tags { serve }; Status: success: OK. Took 3338 microseconds.
[stderr] 2021-06-26 19:45:12.282239: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:38] Reading SavedModel from: /windows/Users/jimne/Desktop/OtherStuff/tensorflow_java/tensorflow-core/tensorflow-core-api/target/test-classes/saved_model
[stderr] 2021-06-26 19:45:12.282918: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:90] Reading meta graph with tags { serve }
[stderr] 2021-06-26 19:45:12.282937: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:132] Reading SavedModel debug info (if present) from: /windows/Users/jimne/Desktop/OtherStuff/tensorflow_java/tensorflow-core/tensorflow-core-api/target/test-classes/saved_model
[stderr] 2021-06-26 19:45:12.283734: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:206] Restoring SavedModel bundle.
[stderr] 2021-06-26 19:45:12.287482: I external/org_tensorflow/tensorflow/core/profiler/lib/profiler_session.cc:126] Profiler session initializing.
[stderr] 2021-06-26 19:45:12.287504: I external/org_tensorflow/tensorflow/core/profiler/lib/profiler_session.cc:141] Profiler session started.

The [stderr] messages is what is printed to stderr, the Log message: is my echoing.

Looking at the code for direct_session.cc and placer.cc, both look correct, so I'm not sure what's going on. This is on Linux, 2.5.0, I can make a full issue report if necessary. The full code is here: tensorflow/java#345

Building with --define no_default_logger=true didn't change anything.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
comp:runtime c++ runtime, performance issues (cpu) TF 2.3 Issues related to TF 2.3 type:bug Bug
Projects
None yet
Development

No branches or pull requests

9 participants