
Ceph for Windows - TCP Session count

asked 2023-06-08 19:09:30 +0300

mstidham

We've been testing the Ceph for Windows driver (WNBD driver) for a while now and have run into an interesting situation. It appears that every rbd-wnbd connected volume opens an abnormally large number of TCP sessions to the Ceph hosts. This could be a coincidence, but some quick math suggests that every connected RBD volume holds a TCP session for every OSD on every Ceph node.

If I have a Hyper-V host with, let's say, 60 connected RBD volumes and I run the following netstat command, the number of sessions is roughly the number of RBD volumes * the number of Ceph nodes * the number of OSDs per node. For example:

netstat -nao | find /i "ip of a single ceph node" /c will show around 1440 TCP sessions to that node. 1440 / 60 = 24, i.e. one session per volume for each of the 24 OSDs on that node.

So with 60 connected RBD volumes and 13 Ceph nodes with 24 OSDs each, the number of TCP sessions is around 60 * 13 * 24 = 18720.
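The arithmetic above can be cross-checked with a short script. What follows is a minimal Python sketch, not something from the original post: it counts TCP rows per remote address in saved netstat -nao output and computes the volumes * nodes * OSDs estimate. The function names and the sample rows (addresses, ports, PIDs) are made up for illustration, and it assumes the usual netstat row layout of protocol, local address, remote address, state, PID.

```python
from collections import Counter

# Hypothetical sample of `netstat -nao` output; all addresses and PIDs are invented.
SAMPLE = """
  Proto  Local Address          Foreign Address        State           PID
  TCP    10.0.0.5:49712         10.0.1.11:6801         ESTABLISHED     4321
  TCP    10.0.0.5:49713         10.0.1.11:6802         ESTABLISHED     4321
  TCP    10.0.0.5:49714         10.0.1.12:6801         ESTABLISHED     4388
  UDP    0.0.0.0:123            *:*                                    1044
"""


def sessions_per_remote_host(netstat_output: str) -> Counter:
    """Count TCP rows grouped by remote IP address."""
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip headers, blank lines and non-TCP rows.
        if len(fields) < 4 or fields[0].upper() != "TCP":
            continue
        # Strip the port: everything before the last ':' is the address.
        remote_ip = fields[2].rsplit(":", 1)[0]
        counts[remote_ip] += 1
    return counts


def expected_sessions(volumes: int, nodes: int, osds_per_node: int) -> int:
    """Estimate assuming one TCP session per volume per OSD."""
    return volumes * nodes * osds_per_node


print(sessions_per_remote_host(SAMPLE))
print(expected_sessions(volumes=60, nodes=13, osds_per_node=24))  # 18720
```

On a real host, the output of netstat -nao could be saved to a file and passed to sessions_per_remote_host instead of SAMPLE; a per-node count close to volumes * OSDs per node would match the pattern described here.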

By default, Windows caps the number of source ports at 16384. It was around this mark that we started having trouble with RBD volumes being randomly disconnected or otherwise misbehaving. That, combined with the TCP/IP port exhaustion errors we were seeing in the event log, is how we discovered the issue. To get around it, we expanded the port range in Windows to 32768. However, we still see this as a potential problem: we expect the number of RBD volumes connected to a single Hyper-V host to increase AND we expect to keep adding nodes to the Ceph cluster. It won't take long until we're butting up against this limit again.
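To put a number on how quickly that limit comes back, here is a rough Python sketch (my own illustration, not from the post). It assumes one TCP session per volume per OSD, as estimated earlier, and that every port in the range is available to rbd-wnbd, which is optimistic since other traffic on the host also uses source ports. The function name max_volumes is hypothetical.

```python
def max_volumes(port_budget: int, nodes: int, osds_per_node: int) -> int:
    """Largest number of volumes whose sessions fit in the port budget,
    assuming each volume opens one session to every OSD in the cluster."""
    sessions_per_volume = nodes * osds_per_node
    return port_budget // sessions_per_volume


# 13 nodes with 24 OSDs each, as in the cluster described in the question.
print(max_volumes(16384, nodes=13, osds_per_node=24))  # 52
print(max_volumes(32768, nodes=13, osds_per_node=24))  # 105

# Growing the cluster shrinks the ceiling: 20 nodes instead of 13.
print(max_volumes(32768, nodes=20, osds_per_node=24))  # 68
```

Under these assumptions the default range tops out at about 52 volumes for this cluster, which is consistent with trouble showing up at 60.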

As an added measure, I spun up a Proxmox cluster to see if it opens the same number of connections per LUN, and it does not appear to. The connection count does not seem to correlate with the number of nodes or the number of OSDs per node at all. It was surprisingly low.

I'm hoping with this post that maybe someone can share some insight into this. Is this something you're already aware of? Is this expected behavior and why? Have we misconfigured something? Just looking for some guidance. Thank you


1 answer


answered 2023-06-09 14:48:23 +0300

lpetrut

Hi,

Other Ceph users reported similar problems with rbd + qemu: https://ceph-users.ceph.narkive.com/S....

The issue is that some rbd clients (e.g. qemu + librbd, rbd-nbd or rbd-wnbd) do not reuse OSD connections across image mappings, unlike krbd, I think.

Decreasing the connection idle timeout (ms_connection_idle_timeout, defaults to 15min) or partitioning the cluster might help but it's less than ideal.

I'll run some rbd-wnbd experiments, tweaking it to use a single process per host instead of one per mounted image. If everything goes well, I'll open a pull request upstream.

Thanks, Lucian


Comments

Just wanted to check back on this to see what the outcome of your experiments is. I haven't seen any updates to the wnbd driver page. Thanks

mstidham ( 2023-06-30 15:48:59 +0300 )

The good news is that the OSD connections are shared when using a single process, so the number of connections is significantly reduced. The bad news is that some RBD operations (e.g. mapping images using the same RADOS context) don't seem to be thread safe and often crash.

lpetrut ( 2023-07-05 10:17:44 +0300 )

I'll need to discuss it with the RBD team and see if it's a known issue.

lpetrut ( 2023-07-05 10:18:46 +0300 )

Thanks for the update.

mstidham ( 2023-07-05 17:52:45 +0300 )

I've fixed the crashes, will have a PR soon. We'll most probably include this in the Reef msi.

lpetrut ( 2023-07-12 11:05:40 +0300 )

That's great news. Can't wait to check it out. Thanks for the update.

mstidham ( 2023-07-12 23:07:02 +0300 )

The PR is up for review: https://github.com/ceph/ceph/pull/52540. Let me know if you'd like to try out a custom MSI before the official Reef release is ready.

lpetrut ( 2023-07-19 16:16:19 +0300 )

Thanks for the update. Yes that would be great if I could test it out early. Please let me know how I can get that MSI. Thanks again.

mstidham ( 2023-07-28 16:09:54 +0300 )

Here's a preview MSI: https://cloudbase.it/downloads/ceph_dev_single_process.msi. Please let me know if you hit any issues.

lpetrut ( 2023-08-03 16:22:19 +0300 )

I sure will. We'll start testing this in our environment and will let you know what we find. Thanks

mstidham ( 2023-08-04 16:07:08 +0300 )

The PR hasn't merged upstream yet but we might apply it downstream. I did notice a few issues while running some tests: the ceph-rbd service can no longer start without a properly configured ceph.conf file (I'll need to update the msi and document this).

lpetrut ( 2023-08-25 15:43:25 +0300 )

Also, the PR caches a single cluster connection, meaning that we can no longer connect to multiple clusters. One workaround would be to run in foreground mode (-f) but we'll need a proper fix, perhaps using the "--cluster" cli parameter and caching one rados context per cluster.

lpetrut ( 2023-08-25 15:45:30 +0300 )
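The "one rados context per cluster" idea mentioned in the comment above can be illustrated with a small sketch. This is Python written purely for illustration and is not the code from the PR (rbd-wnbd itself is not written in Python). PerClusterCache and fake_connect are hypothetical names, and fake_connect stands in for whatever actually opens a cluster connection.

```python
import threading
from typing import Callable, Dict, Generic, List, TypeVar

T = TypeVar("T")


class PerClusterCache(Generic[T]):
    """Keep one shared connection object per cluster name.

    A lock guards the lookup so that concurrent mappings of images from
    the same cluster end up sharing a single connection.
    """

    def __init__(self, connect: Callable[[str], T]) -> None:
        self._connect = connect
        self._lock = threading.Lock()
        self._contexts: Dict[str, T] = {}

    def get(self, cluster: str) -> T:
        with self._lock:
            if cluster not in self._contexts:
                # First mapping for this cluster: open the connection once.
                self._contexts[cluster] = self._connect(cluster)
            return self._contexts[cluster]


# Stand-in for a real connection routine; records how often it is called.
calls: List[str] = []


def fake_connect(cluster: str) -> object:
    calls.append(cluster)
    return object()


cache = PerClusterCache(fake_connect)
first = cache.get("ceph")
second = cache.get("ceph")      # reuses the cached object
other = cache.get("backup")     # a different cluster gets its own object
print(first is second, first is other, calls)
```

Keying the cache by cluster name is what lets a single process serve more than one cluster while still sharing connections among images of the same cluster.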

The above issues have been addressed, here's a beta Reef MSI that includes this fix: https://cloudbase.it/downloads/ceph_reef_beta.msi

lpetrut ( 2023-09-04 12:51:24 +0300 )


Stats

Asked: 2023-06-08 19:09:30 +0300

Seen: 359 times

Last updated: Jun 09 '23