High CPU utilization on RabbitMQ 3.8.5


Gopi V

Oct 6, 2020, 3:22:46 AM
to rabbitmq-users
I've upgraded from RabbitMQ 3.4.4 on Erlang R16B03 (erts-5.10.4) to RabbitMQ 3.8.5 on Erlang/OTP 23 (erts-11.0.3).
When I ran rabbitmq-perf-test (https://rabbitmq.github.io/rabbitmq-perf-test/stable/htmlsingle) against 3.8.5, it handled thousands of messages per second with 500 producers and 500 consumers. I saw CPU utilization around 500%-600% on an 8-CPU system, whereas utilization stays below one CPU for the 100-200 messages/sec that the 3.4.4 production cluster handles.
But in production, 3.8.5 is taking 50% more CPU than 3.4.4.

Below is the output of rabbitmq-diagnostics runtime_thread_stats, which shows most time being spent in "other" (I assume that is busy wait). The default scheduler busy-wait threshold (sbwt) is medium on both versions of Erlang. When I changed it to none, CPU utilization went down from 4-5 CPUs to 1 CPU; when I tried very_short, CPU utilization was the same as on 3.4.4, which has sbwt set to medium (the default per the Erlang docs). Is this a known issue?

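For reference, the stats below came from running the diagnostics command against the node (default node name assumed):

rabbitmq-diagnostics runtime_thread_stats
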
Average thread real-time    : 5000228 us
Accumulated system run-time : 12160527 us
Average scheduler run-time  : 1511950 us

        Thread      aux check_io emulator       gc    other     port    sleep

Stats per thread:
 scheduler( 1)    0.47%    0.52%    7.89%    0.54%   52.64%    1.27%   36.68%
 scheduler( 2)    0.53%    0.49%    7.15%    0.52%   51.70%    1.11%   38.51%
 scheduler( 3)    0.50%    0.50%    7.46%    0.51%   52.58%    1.17%   37.28%
 scheduler( 4)    0.40%    0.35%    5.33%    0.36%   46.75%    0.79%   46.01%
(schedulers 5 through 8 omitted)

Paolo Losi

Oct 6, 2020, 4:51:48 AM
to rabbitmq-users
I can confirm higher CPU usage after upgrading from RabbitMQ 3.6.2 to 3.8.9.

3.8.9 is installed from the Bintray repository on an up-to-date Ubuntu 20.04 (Focal) installation.

Karl Nilsson

Oct 6, 2020, 4:58:15 AM
to rabbitm...@googlegroups.com
Hi,

You may need to do some runtime tuning; in particular, try disabling scheduler busy wait.

Cheers
Karl


Gopi V

Oct 6, 2020, 2:38:41 PM
to rabbitmq-users

Setting the values below will definitely reduce CPU utilization.

RABBITMQ_SERVER_ADDITIONAL_ERL_ARGS="+sbwt very_short +sbwtdcpu very_short +sbwtdio very_short"

or 

RABBITMQ_SERVER_ADDITIONAL_ERL_ARGS="+sbwt none +sbwtdcpu none +sbwtdio none" 

But I want to understand why.
erts-5.10.4 has only the sbwt flag, with a default of medium (http://erlang.org/documentation/doc-5.10.4/erts-5.10.4/doc/html/erl.html). Erlang 11.0.3 still has sbwt with a default of medium (http://erlang.org/documentation/doc-11.0-rc3/erts-11.0/doc/html/erl.html#+sbt), but also introduces sbwtdcpu and sbwtdio with a default of short. So nothing has changed in the busy-wait settings, yet I see twice the CPU utilization with 11.0.3.
1. Does anyone know of significant changes to the scheduler that could cause this?
2. If so, then apart from consuming CPU when traffic is sparse, does this have any negative impact when traffic is continuous and the number of messages doubles? Technically, in that scenario the scheduler threads won't be in a busy-wait state.
3. I tried to replicate my production scenario with 500 messages/sec for 10 secs followed by 3 secs of idle, but RabbitMQ only utilizes 1 CPU with less than 10% spent on busy wait. I want to replicate a scenario where, for a given workload, 3.8.5 spends 50% on busy wait and consumes x CPUs while 3.4.4 consumes x/2. Then I would go back to 3.8.5, set sbwt to very_short, and run the test again to see if there is any significant impact on throughput.

./runjava com.rabbitmq.perf.PerfTest --queue-pattern 'perf-test-%d' --queue-pattern-from 1 --queue-pattern-to 6 --producers 500 --consumers 500 --json-body --size 16000 --body-content-type application/json --variable-rate 1:10 --variable-rate 0:3  
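
(As I read the PerfTest docs, repeating --variable-rate RATE:DURATION defines a cycling publish schedule per producer: 1 msg/s for 10 s, then 0 msg/s for 3 s, which across 500 producers yields the 500 msg/s burst described above.)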

David Cook

Oct 7, 2020, 2:23:10 AM
to rabbitmq-users
I'm using Ubuntu 18.04 and RabbitMQ 3.6.10-1ubuntu0.1, which looks like it's using Erts 9.2.

A running RabbitMQ with 1 worker connection and 0 messages being processed seems to use 2% CPU on my 8-CPU system.

After using "+sbwt none", I've seen "sched_yield" syscalls disappear when using strace on beam.smp.

However, I'm noticing that beam.smp is still invoking clock_gettime and futex very rapidly, which leaves a fairly constant 1% CPU usage even when processing 0 messages.
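
For reference, this is roughly the trace I'm running (the pgrep-based PID lookup is just one way to find the VM process; Ctrl-C prints the summary):

# count syscalls across all threads of the Erlang VM
sudo strace -f -c -p "$(pgrep -o beam.smp)"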

I've done some reading and looked through the OTP source code, and it looks like these syscalls are being generated by https://github.com/erlang/otp/blob/master/erts/lib_src/pthread/ethr_event.c#L65-L161, which appears to still be busy waiting/spinning, even with +sbwt none.

According to https://www.rabbitmq.com/runtime.html#busy-waiting, we shouldn't be using speculative busy waiting anymore and the schedulers should be put to sleep, yet there are no sleep syscalls and there is constant syscall activity.

Is this a RabbitMQ/Erlang bug, or am I misunderstanding RabbitMQ's documentation?

Luke Bakken

Oct 7, 2020, 9:41:33 AM
to rabbitmq-users
Hi David,

Note that both your RabbitMQ and Erlang versions are out of date.

If you upgrade both, you should be able to use the environment variables pointed out earlier in this thread to reduce CPU use to a minimum. Note that the Erlang VM will still use some CPU at all times, as you found in the VM code.

Thanks,
Luke

M K

Oct 7, 2020, 3:46:28 PM
to rabbitmq-users
There is a perception in this thread that message rates are the only thing that matters for a node's CPU utilization.
That's not correct, or at least a very incomplete view of things.

A RabbitMQ node that is "idle" because there is no message traffic still has stateful,
stats-emitting entities: queues, connections, channels. It also has a bunch of processes per virtual host, namely a pair of message stores.
Then it has periodic operations (often related to metric collection), and so does its schema data store.

So an "idle" node is not actually idle. In most cases the Erlang processes that perform those operations
outnumber the cores available to the node, so CPU context switching and timed events are expected.
They can consume one percent of the CPU or more. The doc guide mentioned in this thread explains how to reduce the most obvious
effects, but usage won't reach zero, either today or with any of the changes planned for future RabbitMQ versions. In part this
is how the runtime works, and while it may be somewhat annoying for very idle systems, it yields much better core utilization
for more realistic workloads.

Gopi V

Oct 7, 2020, 5:36:54 PM
to rabbitmq-users
Michael,

I understand that keeping scheduler threads in busy wait can improve performance. I know the scheduler is managed by the Erlang VM, but I see twice the CPU consumption for the same production workload with the busy-wait threshold at medium on both erts-5.10.4 and Erlang 11.0.3. I just want to understand the reason for this behavior. If this is how the Erlang 11.0.3 runtime works, then we may have to change the threshold from medium to very_short or none and check whether performance stays close to the previous version we're running.
I'm just curious whether, after erts-5.10.4, the actual wait time for medium was increased, or whether some major update to the scheduler code is causing this behavior.

Luke Bakken

Oct 7, 2020, 5:46:33 PM
to rabbitmq-users
Hello,

Erlang R16B03 was released almost 7 years ago. There have been far too many changes to the VM since then for us to keep track of them all.

In your original message you made this statement:

"when I tried very_short, CPU utilization was the same as on 3.4.4, which has sbwt set to medium (the default per the Erlang docs)"

If using very_short provides the CPU utilization you are looking for, then use that setting and run benchmarks to ensure it does not adversely affect your performance.

Thanks,
Luke

Michael Klishin

Oct 7, 2020, 5:54:47 PM
to rabbitmq-users
We are not aware of any specific changes, but there likely were relevant ones. It would be a great question for erlang-questions, where the Erlang/OTP core team members are.

Gopi V

Oct 7, 2020, 6:02:34 PM
to rabbitmq-users
Yeah, I've signed up for erlang-questions and emailed erlang-q...@erlang.org, but I haven't gotten a reply yet. The perf test I ran didn't show any high CPU utilization. Anyway, I'll try adding one 3.8.5 node to the production cluster with sbwt set to very_short and check performance.

dc...@prosentient.com.au

Oct 7, 2020, 7:22:48 PM
to rabbitm...@googlegroups.com

Hi Luke,

 

Thanks for your message.

 

I'll take on board your comment about upgrading. I have been using the packages available in the Ubuntu 18.04 LTS repositories, but as you (and https://www.rabbitmq.com/install-debian.html) indicate, there is a reasonable package-based alternative. Cheers.

 

If I understand the Erlang change logs correctly, “+sbwt none” in older versions of Erlang should do the same thing as “+sbwt none +sbwtdcpu none +sbwtdio none” in newer versions of Erlang. I think the change happened in Erts 10.0.
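
In sketch form, if my reading is right:

# Erts < 10.0: one flag covered every scheduler type
+sbwt none
# Erts >= 10.0: dirty CPU and dirty I/O schedulers got their own flags
+sbwt none +sbwtdcpu none +sbwtdio none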

 

I imagine that my confusion just arises from not understanding Erlang internals well enough. Both https://www.rabbitmq.com/runtime.html#busy-waiting and http://erlang.org/doc/man/erl.html#+sbwt say that the option disables scheduler busy wait and that the schedulers go to sleep, and there is evidence of fewer syscalls (particularly sched_yield) being generated after switching to "+sbwt none".

 

However, strace shows that beam.smp is doing spin locks, which is busy waiting. Upon reflection, the documentation for +sbwt does seem to indicate it only applies to schedulers and not to other aspects of the VM. It also seems like newer versions of Erlang use fewer spin locks, so maybe an upgrade will improve things too. I am OK if the answer is that the VM will always spin a bit and can't be configured otherwise. I just want to know that it's by design, heh.

 

That said, I suppose I am a bit surprised that the VM spins so much. In the case of RabbitMQ, I can see where that would be handy for performance under heavy workloads, but for light to moderate workloads it would be nice to just sleep and wake when there's something to read on a network socket. I admit I was digging through Golang internals yesterday to attempt a comparison, but I don't know Erlang well enough to make it like-for-like. Maybe that's an exercise for another time, just for my own curiosity.

 

Anyway, it might be worth adding something to https://www.rabbitmq.com/runtime.html#busy-waiting saying that the Erlang VM will continue to busy wait to some extent and that this is normal and expected.

 

David Cook
Software Engineer
Prosentient Systems
72/330 Wattle St
Ultimo, NSW 2007
Australia
Office: 02 9212 0899
Online: 02 8005 0595



dc...@prosentient.com.au

Oct 7, 2020, 7:35:56 PM
to rabbitm...@googlegroups.com

Thanks for the well-worded message, M K. Much appreciated.

 

I did notice about 230 kernel threads being used by the one beam.smp process, so I figured there must be something else going on too.

 

Your words inspired me. I've added "+sbwt none", and using strace it looks like the frequent futex and clock_gettime syscalls are all coming from scheduler threads that are supposed to be asleep. While I see a few syscalls coming from threads with other names, it's the scheduler threads (e.g. 1_scheduler through 8_scheduler) that seem to be spinning.
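
For anyone curious how I'm tying syscalls back to thread names: the TIDs strace reports can be matched against the names the VM gives its threads (a Linux-only sketch; the pgrep lookup is just one way to find the process):

# print each thread ID of beam.smp together with its name (e.g. 1_scheduler)
pid=$(pgrep -o beam.smp)
for t in /proc/"$pid"/task/*; do
  printf '%s\t%s\n' "${t##*/}" "$(cat "$t"/comm)"
done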

 

But, as Luke noted, this is an older RabbitMQ and Erlang. I am tempted to try a newer version and see how the Erlang VM behaves.

 


dc...@prosentient.com.au

Oct 7, 2020, 9:56:48 PM
to rabbitm...@googlegroups.com

OK, some updates…

 

I remembered that I have an openSUSE Leap 15.1 server running RabbitMQ 3.7.14 with Erlang 20.3.8.15 and Erts 9.3.3.6 on 12 CPUs, and RabbitMQ is doing great there. It stays steady at 0% CPU usage when idle. Great! Every 10 seconds or so it might spike to 0.333% or 2% or so. The majority of the work is done by the "1_scheduler" thread, with a little on the "aux" thread. The other 11 scheduler threads are all sleeping, as you'd hope/expect. There are lots of futex, clock_gettime, and epoll_pwait syscalls.

 

Using “docker run --rm -it -e RABBITMQ_SERVER_ADDITIONAL_ERL_ARGS="+sbwt none +sbwtdcpu none +sbwtdio none" rabbitmq:3.8.9”, the majority of the work is done by the 1_scheduler thread. It's only a 2-CPU system and spikes to 2% every 5 seconds or so. 1_scheduler is pretty much just doing a spin lock with the futex syscall. The "aux" thread seems to be polling a file descriptor pretty rapidly, but I'm not bothered. Occasionally 9_dirty_io_scheduler and 10_dirty_io_scheduler also do things, which I find a bit surprising.

 

On Ubuntu 18.04 with RabbitMQ 3.6.10, Erlang 20.2.2, and Erts 9.2, several scheduler threads are spinning. So I'm guessing there must be an Erlang optimization somewhere between 20.2.2 and 20.3.8 that spins only the first scheduler thread instead of all of them.

 

I reckon the Erlang (and consequently the RabbitMQ) documentation is a little misleading, as it's clear that at least one scheduler, even on a new RabbitMQ, is still busy waiting, even when using +sbwt none (and friends).

 

It looks like openSUSE Leap 15.1 uses an older kernel than Ubuntu 18.04, so that's probably not relevant. I'm curious why RabbitMQ seems to idle more efficiently on openSUSE Leap 15.1 than on Ubuntu 18.04.

 

Looking at the Dockerfiles, it looks like the Docker images use Ubuntu 18.04 as well (https://github.com/docker-library/rabbitmq/blob/35b41e318d9d9272126f681be74bcbfd9712d71b/3.8/ubuntu/Dockerfile).

 

I didn't see anything obvious in terms of CLI flags or environment variables that would explain the difference. The openSUSE Leap 15.1 and Ubuntu 18.04 machines have similar overall system load (and both RabbitMQ instances are idling). But comparing a fully virtualized server with a Docker container isn't really comparing apples to apples either.

 

Overall, it seems like upgrading RabbitMQ and Erlang would be a good idea, as newer versions do seem to be more efficient. Even comparing the Ubuntu 18.04 fully virtualized server with the Docker container shows that the container on the newer RabbitMQ/Erlang idles much more efficiently; a quick comparison suggests the newer version idles using about a third of the CPU time.

 

The older RabbitMQ on openSUSE used about half the CPU time of the newer RabbitMQ in Docker…

 

Anyway, I suppose that’s my questions answered. I might get in touch with the Erlang team about clarifying their documentation.

 


Gopi V

Oct 8, 2020, 11:48:16 PM
to rabbitmq-users
David, 

Have you tried RabbitMQ 3.8.5 with Erlang 23?

dc...@prosentient.com.au

Oct 11, 2020, 6:32:22 PM
to rabbitm...@googlegroups.com

Hi Gopi,

 

No, I just tried RabbitMQ 3.8.9. I’m not sure which Erlang it used although I could take a look.

 

I just tried “docker run --rm -it -e RABBITMQ_SERVER_ADDITIONAL_ERL_ARGS="+sbwt none +sbwtdcpu none +sbwtdio none" rabbitmq:3.8.5”, and it looks like it uses Erlang/OTP 23 and Erts 11.0.3. It seemed to run the same as 3.8.9.
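
(For reference, one way to double-check the runtime a given image ships, assuming the container is named rabbit:)

docker exec rabbit rabbitmq-diagnostics erlang_version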

 


Gerhard Lazu

Oct 12, 2020, 6:34:53 AM
to rabbitmq-users
FWIW, Gopi started a thread in erlang-questions that I've just added some details to, which may be of interest: http://erlang.org/pipermail/erlang-questions/2020-October/100035.html

dc...@prosentient.com.au

Oct 12, 2020, 10:29:17 PM
to rabbitm...@googlegroups.com

Cheers, Gerhard. The details you’ve added in erlang-questions are really interesting.

 

What function is used to collect those metrics every 5 seconds? Is the interval controlled by the “collect_statistics_interval” key?

 

I’m tempted to play with that configuration now to see if that is what is responsible for most of the CPU usage I was seeing…
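
If it is that key, something like this is what I'd try first (a sketch; assuming the new-style rabbitmq.conf format, value in milliseconds):

# /etc/rabbitmq/rabbitmq.conf - raise the stats emission interval from 5s to 30s
collect_statistics_interval = 30000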

 

