set_intra_op_parallelism_threads and set_inter_op_parallelism_threads have no impact on thread usage #48772
I just wanted to include some follow-up info from my efforts to debug the issue over the past week. I attempted to set OMP_NUM_THREADS prior to running the Python file; this also appears to have no impact. I have also set the environment variables TF_NUM_INTEROP_THREADS and TF_NUM_INTRAOP_THREADS prior to application runtime and checked them via os.environ from within Python, both before and after TensorFlow is imported, as well as after the tf.config.threading.set_intra/inter_op_parallelism_threads calls; the environment variables remain unchanged at 1. However, when observing the Python process in top, I see that it is using 13 threads (per the nTH column). Please let me know if it seems I am making an incorrect assumption anywhere here. Thank you.
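For reference, a minimal sketch of the ordering that, as I understand it, matters here (the thread count of 2 is illustrative): the environment variables have to be in place before TensorFlow is first imported, since the runtime only consults them at initialization.

```python
import os

# Thread-count env vars must be set BEFORE TensorFlow is imported;
# once the TF runtime initializes, later changes are ignored.
os.environ["OMP_NUM_THREADS"] = "2"
os.environ["TF_NUM_INTEROP_THREADS"] = "2"
os.environ["TF_NUM_INTRAOP_THREADS"] = "2"

try:
    import tensorflow as tf  # import only after the env vars are in place

    # The API calls must likewise happen before any op executes.
    tf.config.threading.set_inter_op_parallelism_threads(2)
    tf.config.threading.set_intra_op_parallelism_threads(2)
except ImportError:
    pass  # TensorFlow not installed in this environment
```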
If there are any other relevant issues that I have been unable to unearth during my searches, I would love it if somebody could link them! While this issue doesn't prevent me from doing work, it dramatically slows the rate at which I can do so, and none of my attempts to resolve it have been successful.
Can someone clarify for me whether the functions set_inter_op_parallelism_threads and set_intra_op_parallelism_threads are supposed to limit the threads of the Python process itself?
Is there a possibility of getting a response on this issue? Otherwise, it is probably best to close it. The issue is still unresolved, but there has been no activity outside of my own updates; any additional insight would be appreciated. Also, just to clarify my understanding from above: the functions set_inter_op_parallelism_threads and set_intra_op_parallelism_threads should affect all processes involved with TensorFlow, including both the nvidia-cuda-mps and Python processes. My issue is that Python is scaling to all available threads, and I want to make sure I understand correctly that these functions should affect the Python process, not some other process.
Apologies for the delayed response. Would it be possible to provide some sample D_cropped.npy and K_cropped.npy so that I can try to reproduce the issue? Generally, the inter-op and intra-op thread settings apply to the C++ TensorFlow executor runtime. To be precise, the inter-op threadpool controls the number of C++ threads we use to dispatch ops when executing functions/graphs, and the intra-op threadpool controls the size of the Eigen threadpool used within individual ops. The GPU runtime from NVIDIA may have its own threadpool, which is why that needs to be configured separately. We rarely do any multi-threading in Python itself, but I'd like to reproduce the issue to figure out what might be going on. One other suggestion to gather more data would be to run the TensorFlow profiler (https://www.tensorflow.org/guide/profiler), which would clearly enumerate the list of threads in action. That might provide some insight as well.
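A sketch combining both ideas: starting a profiler trace around a few training steps, plus a lightweight cross-check of the OS-level thread count from within Python. The log directory path is illustrative, and the /proc read is Linux-only.

```python
import os

def os_thread_count():
    """Kernel-level thread count for this process (what top's nTH
    column reports). Linux-specific; returns -1 where /proc is
    unavailable."""
    path = "/proc/self/status"
    if not os.path.exists(path):
        return -1
    with open(path) as f:
        for line in f:
            if line.startswith("Threads:"):
                return int(line.split()[1])
    return -1

try:
    import tensorflow as tf
    tf.profiler.experimental.start("/tmp/tf_profile")  # path is illustrative
    # ... run a few training steps here, e.g. model.fit(..., epochs=1) ...
    tf.profiler.experimental.stop()
except Exception:
    pass  # TensorFlow or the profiler unavailable in this environment

print("OS-level threads:", os_thread_count())
```

The /proc check is not a substitute for the profiler trace (it gives only a count, not thread names), but it makes it easy to log thread usage at different points in the script without an interactive top session.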
Here is a sample of the files. Sorry for the delayed response; I will also look into the profiler on my end!
Thanks... I was able to reproduce the issue, and yes, I do see a large number of threads. One hypothesis is that you're seeing some tf.data threads, but it will be a lot clearer with the profile.
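If the extra threads do turn out to come from tf.data, they can be capped per-pipeline with dataset options, which are independent of the inter-op/intra-op settings. A minimal sketch (the pipeline itself is a placeholder):

```python
try:
    import tensorflow as tf

    # tf.data pipelines run their own background threads, separate from
    # the inter-op / intra-op pools. They can be capped per-dataset:
    options = tf.data.Options()
    options.threading.private_threadpool_size = 2
    options.threading.max_intra_op_parallelism = 1

    # Placeholder pipeline, just to show where the options attach.
    dataset = tf.data.Dataset.range(8).map(lambda x: x * 2)
    dataset = dataset.with_options(options)
    result = [int(x) for x in dataset]
except ImportError:
    result = None  # TensorFlow not installed in this environment
```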
I just got the opportunity to try the profiler; sorry for the further delay, I was working to meet some deadlines. Any attempt I make to use profiling appears to immediately cause a segfault. I will update the GitHub repository (I've been behind on that anyway); the relevant files can be found under the Groups directory. I made some modifications to the code in an attempt to improve runtime, using a different method of implementing a DepthwiseConv3D layer that I found. Below is the stack trace from the GDB debugger; everything else should be about the same. The segmentation fault consistently occurs at the model.fit call on line 149 of the runMotor.py file.
This might require a separate issue post; if so, let me know and I will proceed. I am approaching the conclusion of the project I was working on, but am fully willing to continue debugging this issue as I am able. Let me know if you have any other questions!
System information
Describe the current behavior
I am running code on a compute cluster, hence the different GPUs. The compute cluster admin requires me to restrict thread usage when possible, and I was referred to the functions from tf.config.threading in the TensorFlow documentation. I set both intra- and inter-op thread parallelism to 2 and used an interactive session on the node to monitor thread usage with top; however, setting these thread parameters seems to have no impact. I still observe the Python process using all available threads.
My understanding from the documentation for these functions is that all that is required is to call them with the desired parameters; no errors related to threading have been raised.
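For completeness, this is the call pattern I'd expect to work, reading the values back with the corresponding getters to confirm the setting took effect:

```python
try:
    import tensorflow as tf

    # Must be called before the first op runs; afterwards TensorFlow
    # raises RuntimeError (or ignores the call, depending on version).
    tf.config.threading.set_inter_op_parallelism_threads(2)
    tf.config.threading.set_intra_op_parallelism_threads(2)

    # Read the values back to confirm they were accepted:
    inter = tf.config.threading.get_inter_op_parallelism_threads()
    intra = tf.config.threading.get_intra_op_parallelism_threads()
except ImportError:
    inter = intra = None  # TensorFlow not installed in this environment
```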
My GitHub repository is linked below; I use the mult.csh file under the EEGNet folder to execute the code, which runs the runMB3D.py file using the network model from MB3DEEGNet.py.
https://github.com/matt-houk/MB3DCNN
I have attached both the stderr and stdout output; I canceled the run when I noticed it using excessive threads.
err-mult.txt
out-mult.txt