-
Notifications
You must be signed in to change notification settings - Fork 2.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Some optimization for process stats collection #2645
base: master
Are you sure you want to change the base?
Conversation
Hi @dqminh. Thanks for your PR. I'm waiting for a google member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/ok-to-test |
@dashpole have you had time to look into this one ? |
container/libcontainer/handler.go
Outdated
@@ -43,6 +43,8 @@ var ( | |||
whitelistedUlimits = [...]string{"max_open_files"} | |||
referencedResetInterval = flag.Uint64("referenced_reset_interval", 0, | |||
"Reset interval for referenced bytes (container_referenced_bytes metric), number of measurement cycles after which referenced bytes are cleared, if set to 0 referenced bytes are never cleared (default: 0)") | |||
disableSocketCountStats = flag.Bool("disable_socket_count_stats", false, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is currently covered under the "process" stats category in --disable_metrics. Does that not work for your usecase?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the late reply !
No, because we want to have process stats, we just don't need the socket counts ( partly because it's too costly to collect )
Sorry for the slow review. How expensive are socket stats? If we wanted to do this, we would need to add a new category for it in --disable_metrics, rather than a new flag. But we would only do that if socket stats are very expensive. |
@dashpole in the commit, i put in some numbers from when i test it With args:
and using
We have been running this for a while on our machines, here's a graph The first significant bump is when we update cadvisor to https://github.com/google/cadvisor/commits/6f30891d89cc5b9cf93634048eb9f77c748a1437 (without these patches) from 0.30, |
@bobbypage this looks legit to me :) |
@dashpole ping again. Is there anything we can do here ? |
This should be a new category of disable metrics, instead of a new flag. |
container/libcontainer/handler.go
Outdated
} | ||
if strings.HasPrefix(linkName, "socket") { | ||
socketCount++ | ||
if !noSocketCount { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps add something like this instead?
if noSocketCount
return fdCount, socketCount, nil
}
GitHub action comments between the code state that there are undefined names being used.
I concur that this change is too insignificant to warrant its own option. It should be part of -disable_metrics. I think socket (count) item being dependent on process stats item is confusing to the users of that option though, and it should be made independent of process stats. (If it's not made independent, then this PR documentation updates are incomplete, all places in docs mentioning process or socket stats should be updated to note that there's a dependency between their CLI options.) Also, when there's a disable option for some item, outputting of the disabled data should also be skipped (e.g. in Prometheus output). Currently that too is missing from this PR. |
Because collecting that data is reported to have noticeable impact on cAdvisor load: google#2645 Note: this option depends on process stats being enabled, as it's a subset of them.
Here's (only build tested) code for adding "sock_count" to metrics one can use with -disable_metrics/-enable_metrics: https://github.com/eero-t/cadvisor/commits/sock-count-option I'm not sure how its dependency on process stats option should be documented. |
Because collecting that data is reported to have noticeable impact on cAdvisor load: google#2645 Added "sock_count" option depends on "process" stats being enabled, as it's a subset of them.
Because collecting that data is reported to have noticeable impact on cAdvisor load: google#2645 Added "sock_count" option depends on "process" stats being enabled, as it's a subset of them.
I've rebased above branch/commit to latest upstream (adding missing items to Prometheus doc), so that I could add "sock_count" option to relevant places in docs. If separating sock count is still useful, I could make a new PR of that to replace this...? |
Because collecting that data is reported to have noticeable impact on cAdvisor load: google#2645 Added "sock_count" option depends on "process" stats being enabled, as it's a subset of them.
Because collecting that data is reported to have noticeable impact on cAdvisor load: google#2645 Added "sock_count" option depends on "process" stats being enabled, as it's a subset of them.
Because collecting that data is reported to have noticeable impact on cAdvisor load: google#2645 Added "sock_count" option depends on "process" stats being enabled, as it's a subset of them.
Because collecting that data is reported to have noticeable impact on cAdvisor load: google#2645 Added "sock_count" option depends on "process" stats being enabled, as it's a subset of them.
`ioutil.Readdir` does extra Lstat on the files and also sorts the files, but we have no need for that, so switch to using only Readdirnames. With args: ``` /cadvisor \ -disable_metrics=sched,hugetlb,referenced_memory,cpu_topology,tcp,udp,advtcp \ -housekeeping_interval=1s -profiling=true ``` and using `profile?seconds=10`: - before: `processStatsFromProcs` takes 38.46% - after: `processStatsFromProcs` takes 25.93%
b70237f
to
ec94a2f
Compare
Socket count costs a lot of CPU cycles to collect as we need to perform readlink to determine if fd is a socket. In many applications/use cases such as load balancers etc, just fd count is enough for many monitoring purposes, so we add a option to disable socket count stats collection here with `process_socket_count` metric. Signed-off-by: Daniel Dao <dqminh89@gmail.com>
ec94a2f
to
eeb9b63
Compare
} | ||
if strings.HasPrefix(linkName, "socket") { | ||
socketCount++ | ||
if includeSocketCount { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For less indentation / more readability, this can be
if !includeSocketCount {
return
}
...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i kinda prefer the current way since it cleanly scopes the socket counting logic inside the block, but this is ok too if we prefer it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The other checks are early-exit ones, so it would be more consistent within that function, and the loop is already scoping the counting logic. I do not feel strongly about this though as the loop is short enough to be readable even with extra indent, and there are some other functions in this same file that are not using early-exit.
Because collecting that data is reported to have noticeable impact on cAdvisor load: google#2645 Added "sock_count" option depends on "process" stats being enabled, as it's a subset of them.
Ping again for review since it has been some time and i kinda forgot about this, sorry ! |
This may help for fixing kubernetes/kubernetes#99183. Is this PR still in progress? |
Readdirnames
instead ofioutil.Readdir
to collect fdsReadlink
is a lot of work for each fd