troubleshooting

Measure CPU time of a process

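We can measure CPU time using `time`. Be aware that there are two implementations of `time`, and you may be running the wrong one. Bash has its own `time` (a shell keyword), which reports only real, user, and sys times. The `time` utility at `/usr/bin/time` provides more extensive statistics. Run `type time` to see which one your shell will use. In bash, `which time` only searches your PATH, so it reports `/usr/bin/time` even though the shell keyword takes precedence. To run the utility, call it by its full path.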

ryan:// $ time httping -delay 2 www.google.com
Time				Count	Url				Result		Time		Headers
-----				-----	---				------		----		-------
[ 2023-01-21T09:29:49-05:00 ]	[ 0 ]	[ https://www.google.com ]	[ 200 OK ]	[ 136ms ]	[  :  ]
[ 2023-01-21T09:29:52-05:00 ]	[ 1 ]	[ https://www.google.com ]	[ 200 OK ]	[ 111ms ]	[  :  ]
[ 2023-01-21T09:29:54-05:00 ]	[ 2 ]	[ https://www.google.com ]	[ 200 OK ]	[ 105ms ]	[  :  ]
^C
Total Requests: 2

real	0m6.384s
user	0m0.000s
sys	0m0.021s
  • Real time - the total elapsed (wall-clock) time from start to finish. This is the user time + system (kernel) time + time spent waiting (the process could be waiting on various things: CPU time, network resources, etc.)
  • User time - the time the CPU spent running the program's own code
  • Sys time - the time the kernel spent doing work on the process’s behalf (for example, reading files and directories)
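The same three numbers can be collected from a script. The following is a minimal sketch, assuming a Unix-like system and Python 3: `measure` is a hypothetical helper (not part of any tool mentioned here) that uses `os.times()` to read the elapsed time and the CPU time charged to child processes before and after running a command.

```python
import os
import subprocess
import sys


def measure(cmd):
    """Run cmd to completion and return its real, user, and sys times in seconds."""
    before = os.times()
    subprocess.run(cmd, check=False)   # waits for the child to exit
    after = os.times()
    return {
        "real": after.elapsed - before.elapsed,
        "user": after.children_user - before.children_user,
        "sys": after.children_system - before.children_system,
    }


if __name__ == "__main__":
    # Hypothetical workload: a child that only sleeps, so nearly all of its
    # real time is spent waiting rather than on the CPU.
    times = measure([sys.executable, "-c", "import time; time.sleep(1)"])
    for name, seconds in times.items():
        print(f"{name}\t{seconds:.3f}s")
```

The children counters cover every child the script has waited for, so this sketch is only accurate when one command runs at a time.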

You can determine how much time the process spent waiting by subtracting the user and sys times from the real time: real - (user + sys) = time waiting. In the example above, 6.384 - (0.000 + 0.021) = 6.363, so we spent ~6 seconds waiting. In this case we were waiting on the delay between requests and on network resources.
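As a rough sketch of that subtraction, the snippet below parses the three summary lines in the `0m6.384s` format shown above and computes the waiting time. The function names are made up for illustration, and the regular expression assumes exactly that minutes-and-seconds format.

```python
import re

# Matches one summary line such as "real<TAB>0m6.384s".
_LINE = re.compile(r"^(real|user|sys)\s+(\d+)m([\d.]+)s$", re.MULTILINE)


def parse_time_summary(text):
    """Return {"real": s, "user": s, "sys": s} in seconds from the summary text."""
    return {
        name: int(minutes) * 60 + float(seconds)
        for name, minutes, seconds in _LINE.findall(text)
    }


def waiting_time(times):
    """real - (user + sys): time spent neither in user code nor in the kernel."""
    return times["real"] - (times["user"] + times["sys"])


# The summary from the httping example above.
summary = "real\t0m6.384s\nuser\t0m0.000s\nsys\t0m0.021s\n"
times = parse_time_summary(summary)
print(f"waiting: {waiting_time(times):.3f}s")  # prints "waiting: 6.363s"
```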

Measuring and troubleshooting load average

You can use uptime to get the overall load average of the system:

ryan:wc/ $ uptime
 09:43:44 up 5 days,  4:34,  1 user,  load average: 0.27, 0.36, 0.29
                                                       ^     ^     ^
                                                       |     |     |
                                                       |     |     -- 15 minutes
                                                       |     --------- 5 minutes
                                                       --------------- 1 minute

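uptime shows how long the system has been running since the last reboot. It also shows load averages over the last 1 minute, 5 minutes, and 15 minutes, respectively. A load average is the average number of processes that are ready to run.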

If a load average goes up to around 1, a single process is likely using the CPU nearly all of the time. On multi-core/processor systems, compare the load average to the number of cores: on a two-core system, a load average of 1 means only one core is likely busy at any given time, and a load average of 2 means that both cores have just enough to do all of the time. To troubleshoot processes, use `top` or (preferably) `htop`. Processes consuming more CPU than others will typically rise to the top of the list.
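Because the meaning of a load average depends on the core count, it can help to normalize it. This is a small sketch, assuming a Unix-like system: `load_per_core` is a hypothetical helper, while `os.getloadavg()` and `os.cpu_count()` come from the Python standard library. Note that `os.cpu_count()` counts logical CPUs, which may be more than the number of physical cores.

```python
import os


def load_per_core(load_averages, cores):
    """Divide each load average by the core count; about 1.0 means fully busy."""
    return tuple(load / cores for load in load_averages)


if __name__ == "__main__":
    cores = os.cpu_count() or 1   # cpu_count() may return None
    one, five, fifteen = load_per_core(os.getloadavg(), cores)
    print(f"{cores} CPUs, load per CPU: {one:.2f} {five:.2f} {fifteen:.2f}")
```

A value well below 1.0 suggests idle capacity. A value that stays above 1.0 means processes are queuing for the CPU, which may or may not be a problem.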

A high load average doesn't necessarily mean there is a problem. If you see a high load average, but your system is responding well, don't panic. The system just has a lot of processes sharing the CPU. On servers with high compute demands (such as web servers or servers running scientific computations), processes and threads start and stop so quickly that the load averages can be skewed and inaccurate.

However, if a load average is high and the system performance is suffering, you are likely running into memory problems. When a system is low on memory, it will start to thrash, or rapidly swap pages to and from disk. This is less of a problem on modern systems using solid state storage such as SSDs or NVMe. On traditional systems with spinning media, this can be an issue.

Measuring and troubleshooting memory

One of the simplest ways to view memory status on your system is to use the `free` command or view `/proc/meminfo`.
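If you want these numbers in a script, `/proc/meminfo` is plain text and easy to parse. The sketch below is one way to do it, assuming a Linux system whose kernel reports the `MemAvailable` field. All three function names are hypothetical.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: value}; most values are in kB."""
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep or not rest.strip():
            continue   # skip anything that is not "Name: value"
        info[name.strip()] = int(rest.split()[0])
    return info


def read_meminfo(path="/proc/meminfo"):
    """Read and parse the kernel's memory report (Linux only)."""
    with open(path) as f:
        return parse_meminfo(f.read())


def available_fraction(info):
    """Share of total memory the kernel estimates is available to new workloads."""
    return info["MemAvailable"] / info["MemTotal"]
```

On a Linux machine, `available_fraction(read_meminfo())` returns a number between 0 and 1. A value that keeps shrinking toward 0 is a hint that the system is running short on memory.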

You can also use `vmstat` to view memory performance on a system. vmstat is one of the oldest utilities for this purpose. It has minimal overhead and is a no-frills kind of program. The output can be difficult to read if you are unfamiliar with it. You can use it to see how often the kernel is swapping pages in and out (the si and so columns), how busy the CPU is (us, sy, id, wa, st), and how I/O resources are being utilized (bi and bo). To use it, run `vmstat 2` (with 2 being the number of seconds between updates). The first line of output shows averages since the last boot. Each following line covers the most recent interval.

ryan:todo$ vmstat 2
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 2  0      0 23791304 233996 4724548    0    0    30    30   10  237  5  1 94  0  0
 0  0      0 23807492 233996 4704048    0    0     0   164 2571 3426  2  2 96  0  0
 0  0      0 23806096 234004 4703776    0    0     0    34  586 1424  1  0 99  0  0
 0  0      0 23808876 234004 4703968    0    0     0    70  522 1152  1  0 99  0  0
 0  0      0 23816764 234004 4696736    0    0     0     0  591 1293  1  0 99  0  0
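To check output like this for swapping without reading every column, you could parse it. The following is a minimal sketch with hypothetical function names. It takes the column names from the header row and skips the first data row when looking for swap activity, on the assumption that the first row holds since-boot averages rather than a live interval.

```python
def parse_vmstat(text):
    """Turn vmstat output into a list of {column: int} dicts, one per sample row."""
    columns, rows = None, []
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0] == "procs":
            continue                  # blank line or the group header
        if fields[0] == "r":
            columns = fields          # the column-name header
        elif columns and len(fields) == len(columns):
            rows.append(dict(zip(columns, map(int, fields))))
    return rows


def is_swapping(rows):
    """True if any interval row (all but the first) shows pages swapped in or out."""
    return any(row["si"] > 0 or row["so"] > 0 for row in rows[1:])


# The first two sample rows from the vmstat output above.
sample = """\
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 2  0      0 23791304 233996 4724548    0    0    30    30   10  237  5  1 94  0  0
 0  0      0 23807492 233996 4704048    0    0     0   164 2571 3426  2  2 96  0  0
"""
rows = parse_vmstat(sample)
print(len(rows), rows[1]["free"], is_swapping(rows))  # prints "2 23807492 False"
```

Sustained nonzero si and so values are the thrashing signal described earlier. In the sample above both columns stay at 0, so this system is not swapping.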