
Tensorflow and onednn logs #61322

Open
akote123 opened this issue Jul 19, 2023 · 11 comments
Assignees
Labels
comp:mkl MKL related issues stat:awaiting tensorflower Status - Awaiting response from tensorflower type:support Support issues

Comments

@akote123

Hi,
I am trying to analyse the call flow between TensorFlow and oneDNN. I set the environment variables as follows:
export ONEDNN_VERBOSE=1
export TF_CPP_MAX_VLOG_LEVEL=1
export OMP_NUM_THREADS=1

I am collecting the logs and trying to map the _Mkl ops to the oneDNN primitives, but the oneDNN and MKL entries are interleaved out of order in the log: after 10 MKL calls I see 20 oneDNN calls.

Are there any flags that need to be set to get logs with the correct mapping, or do we need to map them manually?
filtered_intel_log.txt
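If the mapping does have to be done manually, a first step is to pull the oneDNN entries out of the mixed log. Each `onednn_verbose` line is a comma-separated record whose fields include the call type (`exec`/`create`), engine, primitive kind, and the execution time in ms as the last field; the exact field positions vary across oneDNN versions, so the indices below are an assumption (roughly the v2.x layout), and the sample lines are illustrative rather than taken from the attached log. A minimal sketch:

```python
from collections import Counter

def parse_onednn_verbose(lines):
    """Collect (primitive_kind, time_ms) pairs from oneDNN verbose output.

    Assumes the approximate oneDNN v2.x layout:
      onednn_verbose,exec,<engine>,<primitive>,<impl>,...,<time_ms>
    Field positions differ across oneDNN versions, so check your log first.
    """
    records = []
    for line in lines:
        if not line.startswith("onednn_verbose,"):
            continue  # skip interleaved TensorFlow log lines
        fields = line.strip().split(",")
        if len(fields) < 5 or fields[1] != "exec":
            continue  # keep only primitive executions, not creations
        kind = fields[3]            # e.g. convolution, matmul, reorder
        time_ms = float(fields[-1])  # execution time is the last field
        records.append((kind, time_ms))
    return records

# Illustrative lines in the assumed format (not from the real log):
sample = [
    "tensorflow/core/common_runtime/executor.cc:123 _MklConv2D",
    "onednn_verbose,exec,cpu,convolution,jit:avx2,forward_inference,"
    "src_f32,,alg:convolution_direct,mb1_ic3oc64,1.52",
    "onednn_verbose,exec,cpu,reorder,jit:uni,undef,src_f32 dst_f32,,,2x3,0.01",
]
recs = parse_onednn_verbose(sample)
print(Counter(kind for kind, _ in recs))  # executions per primitive kind
```

Counting executions per primitive kind this way gives a rough cross-check against the number of _Mkl op log lines, even when the two streams are interleaved out of order.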

@SuryanarayanaY SuryanarayanaY added the comp:mkl MKL related issues label Jul 19, 2023
@SuryanarayanaY SuryanarayanaY added the type:support Support issues label Jul 19, 2023
@sachinprasadhs sachinprasadhs added the stat:awaiting tensorflower Status - Awaiting response from tensorflower label Jul 19, 2023
@huiyan2021

Hi @akote123, could you share the original verbose log? Thanks!

@akote123

Hi @huiyan2021, I have uploaded here vit_intel_log.zip

@huiyan2021

I guess the reason is that the _Mkl ops are logged via `<<`, whose output is fully buffered when redirected to a file, while oneDNN verbose flushes stdout immediately after every line: https://github.com/search?q=repo%3Aoneapi-src%2FoneDNN+fflush&type=code
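This buffering effect can be reproduced with plain file I/O, independent of TensorFlow (the writer names and log strings below are only illustrative): two writers append to the same file in strictly alternating logical order, yet every line from the flushing writer lands in the file before any line from the buffered one.

```python
import os
import tempfile

# Two handles to the same log file, mimicking stdout redirected to a file:
fd, path = tempfile.mkstemp()
os.close(fd)
buffered = open(path, "a")  # like '<<' logging: block-buffered (default 8 KiB)
flushing = open(path, "a")  # like oneDNN verbose: flushed after every line

# The true execution order alternates between the two writers:
for i in range(3):
    buffered.write(f"_MklConv2D {i}\n")           # sits in the buffer
    flushing.write(f"onednn_verbose,exec {i}\n")  # reaches the file at once
    flushing.flush()

buffered.close()  # the buffered lines reach the file only now, at the end
flushing.close()

with open(path) as f:
    log = f.read().splitlines()
os.remove(path)

# All onednn_verbose lines appear before any _Mkl line, even though the
# writes alternated -- the same reordering seen in the mixed TF/oneDNN log.
print(log)
```

So the interleaving in the captured log reflects buffer flush times, not execution order, which is why the positions of the two kinds of entries cannot be trusted for mapping.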

@akote123

@huiyan2021, one more observation: some ops go through common_runtime/eager/execute.cc and some through common_runtime/executor.cc. I am not able to understand why there are two execution paths.

@huiyan2021

Hi @akote123,

From the log I can see you are using XLA, so ops that cannot be JIT-compiled go through common_runtime/eager/execute.cc, while the rest go through common_runtime/executor.cc.

Also, I suggest you use the trace viewer to trace executions.

@akote123

@huiyan2021, is there a location in the TensorFlow source code where the check happens for whether an op goes to common_runtime/eager/execute.cc or the other path?

@huiyan2021

@akote123, you can refer to this article: https://whatdhack.medium.com/tensorflow-graph-graphdef-grappler-xla-mlir-llvm-etc-615191e96ebc — see the XLA Flow part and the call stack.

@akote123
akote123 commented Oct 31, 2023

@huiyan2021, in TensorFlow can a single model go through both XLA and oneDNN, or will it use either XLA or oneDNN only?

@huiyan2021

Both. There may be different scenarios:

  1. Some parts of the model go in XLA path, some parts go in oneDNN path.
  2. Intel recently submitted a pilot PR to accelerate XLA’s Dot op with oneDNN.

You can refer to this RFC: https://docs.google.com/document/d/1ZzMcrjxITJeN2IjjgbzUjHh-4W1YgDUus3j25Dvn9ng/edit

@akote123
akote123 commented Oct 31, 2023

@huiyan2021, thank you for the pointers, I will go through them.
Actually, for pretrained models, how can we enable XLA, both for inference and for transfer learning?

@huiyan2021

Same as for training; you can refer to https://www.tensorflow.org/xla#enable_xla_for_tensorflow_models
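Concretely, the page above documents two ways to turn XLA on, and both apply unchanged to a pretrained model loaded for inference or fine-tuning; a sketch (the script and function names below are hypothetical):

```shell
# Enable XLA auto-clustering for a whole script via the documented TF flag:
TF_XLA_FLAGS=--tf_xla_auto_jit=2 python my_inference_script.py

# Or, inside Python, compile an individual function explicitly:
#   @tf.function(jit_compile=True)
#   def serve(images):
#       return model(images)
```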
