Open vSwitch GRE and mysterious RST

asked 2016-10-11 22:25:26 +0300

andybrucenet

Hello Friends - I have just gotten my second Hyper-V compute node connected to my HA Mitaka cluster (well, HA except for Neutron ;)

Using GRE tunnels with VLAN tags. Gnarly.

But I have a mysterious problem that comes'n'goes - TCP RST. Here's the scenario:

  1. For testing: singletons for Keystone / Nova Controller (scheduler / api) / Neutron, just to make watching logs easier.
  2. Dual HAproxy LBs in front of everything, with a VIP that defines the cluster entry point (e.g. 9696, which HAproxy then proxies to the backend Neutron; the same applies to all of the OpenStack services).
  3. Neutron with ML2 using GRE (not VXLAN).
  4. For this test, two Nova Compute nodes. One is standard KVM (CentOS 7); the other is Hyper-V (W2K12 R2 with all SPs / patches / latest drivers). Both Compute nodes are configured with Open vSwitch (2.5.0 from the Mitaka yum repo for KVM, 2.5.1 from the latest CloudBase Mitaka .MSI for Hyper-V). Both are configured with GRE to match Neutron.
  5. KVM Compute node runs Just Fine, thanks. It creates VMs, no connection problems.
  6. Hyper-V Compute node usually runs OK - but sometimes I get a mysterious RST.

So your next question is...versions and config!

  • Latest CentOS OpenStack Mitaka repo for the controllers, and CloudBase 2.5.1 Open vSwitch / CloudBase Mitaka download for Nova Compute.
  • On the Hyper-V Nova Compute, I dedicate a single 10GbE uplink for use by Hyper-V vSwitch.
  • Manually disabled TSO (and all of its friends) for the virtual NICs, and for the physical NICs as well. But I've run my tests both with and without TSO (and GSO / GRO).
  • Hyper-V Nova Compute is beefy enough: 2 Xeon 2690 sockets (8-way) running hyperthreaded, for a total of 32 logical cores. 256GB RAM. Separate dedicated NIC for management (as well as OOB iLO). 8x 1.2TB disks: two in RAID-1 for the OS, the remaining six in RAID-6. Tons of disk space.
  • My networking treats the 10GbE uplink on the Hyper-V Nova Compute host as a trunk device. I use a dedicated vif / VLAN for GRE traffic - and I keep that traffic fitting in a standard 1500-byte frame. FWIW - my OpenStack traffic is on a separate vif (and separate VLAN).
  • I use tenant-private networking with overlapping IPs (OpenFlow helps to keep all that sorted out pretty magically).
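The bullets above mention keeping GRE traffic inside a standard 1500-byte frame. As a rough sketch of the arithmetic involved, the snippet below estimates how much room is left for tenant traffic. The header sizes are assumptions, not values from the post: an outer IPv4 header without options, a GRE header carrying a 4-byte key (as OVS GRE tunnels typically do when the tunnel ID travels in the key), and an untagged inner Ethernet header. The underlay 802.1Q tag is assumed to sit outside the 1500-byte IP MTU. `tenant_mtu` and `tenant_tcp_mss` are hypothetical helper names.

```python
# Assumed header sizes in bytes; none of these come from the original post.
OUTER_IPV4 = 20      # outer IPv4 header, no options
GRE_WITH_KEY = 8     # 4-byte base GRE header + 4-byte key field
INNER_ETHERNET = 14  # untagged Ethernet header of the encapsulated tenant frame
INNER_IPV4 = 20      # tenant IPv4 header, no options
TCP_HEADER = 20      # TCP header, no options


def tenant_mtu(underlay_mtu: int = 1500) -> int:
    """IP MTU left for a tenant vNIC once its frame is wrapped in GRE over IPv4."""
    return underlay_mtu - OUTER_IPV4 - GRE_WITH_KEY - INNER_ETHERNET


def tenant_tcp_mss(underlay_mtu: int = 1500) -> int:
    """Largest TCP payload per segment that still fits in one tunnel packet."""
    return tenant_mtu(underlay_mtu) - INNER_IPV4 - TCP_HEADER


if __name__ == "__main__":
    print(tenant_mtu())      # 1458
    print(tenant_tcp_mss())  # 1418
```

Under these assumptions a 1500-byte underlay leaves 1458 bytes for the tenant vNIC. If a guest keeps a 1500-byte MTU instead, full-size packets no longer fit in a single underlay frame.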

Here's how things look to Open vSwitch (and Windows):

PS C:\Users\Administrator> Get-NetAdapter

Name                      InterfaceDescription                    ifIndex Status       MacAddress             LinkSpeed
----                      --------------------                    ------- ------       ----------             ---------
br-tun                    Hyper-V Virtual Ethernet Adapter #7          31 Up           00-15-5D-09-05-0A        10 Gbps
br-int                    Hyper-V Virtual Ethernet Adapter #6          30 Up           00-15-5D-09-05-09        10 Gbps
stg                       Hyper-V Virtual Ethernet Adapter #5          28 Up           00-15-5D-09-05-06        10 Gbps
gre                       Hyper-V Virtual Ethernet Adapter #4          27 Up           00-15-5D-09-05-05        10 Gbps
aos                       Hyper-V Virtual Ethernet Adapter #3          26 Up           00-15-5D-09-05-04        10 Gbps
br-eno1                   Hyper-V Virtual Ethernet Adapter #2          25 Up           00-15-5D-09-05-03        10 Gbps
ens1f3                    HP NC365T PCIe Quad Port Gigabit S...#4      16 Not Present  AC-16-2D-A1-3D-83          0 bps
ens1f2                    HP NC365T PCIe Quad Port Gigabit S...#3      12 Not Present  AC-16-2D-A1-3D-82          0 bps
ens1f1                    HP NC365T PCIe Quad Port Gigabit S...#2      17 ...

1 answer

answered 2016-10-25 21:13:17 +0300

aserdean

Hi Andy,

Thanks for the well-written report, and for using OVS on Windows!

I will try to answer the best I can:

  1. To put it simply: no. The only setting that should need changing is the MTU on the vNIC, but that is handled by the DHCP server (it should be automatic).

  2. No. The only thing that might cause a problem is a flow that blocks the TCP connection, but such a flow drops packets silently, so since you are getting an RST that shouldn't be the case here.

  3. I think this is the cause of the problem. You could try setting up a netcat server on both sides of HAproxy to see what happens when you send packets from the Windows side. I don't know which version you are running, but I recall talking with a friend not long ago who had to downgrade because a newly implemented feature against RST attacks wasn't fully polished yet.

  4. Besides what I said above (if it is still in question), try eliminating HAproxy to see whether your issues disappear, and then add it back.

Thanks, Alin.
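The netcat comparison suggested in point 3 can also be scripted. The sketch below is a hypothetical Python stand-in, not part of either poster's setup. `probe()` opens a TCP connection, sends a few bytes, and classifies the outcome as a reply, a reset, a refusal, or a timeout. Running it once against the HAproxy VIP (e.g. port 9696) and once directly against the backend should show on which side the RST appears. The payload is arbitrary; an HTTP service will answer it with an error, which still counts as `"ok"`. `serve_once()` is only a local test helper that plays either an echo server or one that aborts with RST via `SO_LINGER` set to zero.

```python
import socket
import struct
import threading


def probe(host: str, port: int, payload: bytes = b"ping\n", timeout: float = 5.0) -> str:
    """Connect, send payload, and classify what happened to the TCP connection."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(payload)
            data = sock.recv(4096)
            # Any reply counts as success; an orderly FIN with no data shows up as b"".
            return "ok" if data else "closed"
    except (ConnectionResetError, BrokenPipeError):
        return "reset"    # peer (or a middlebox) sent RST on an open connection
    except ConnectionRefusedError:
        return "refused"  # RST in reply to the SYN: nothing listening on that port
    except socket.timeout:
        return "timeout"  # packets silently dropped somewhere along the path


def serve_once(mode: str) -> int:
    """Hypothetical one-shot local server: 'echo' echoes one read, 'reset' aborts with RST."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 0))
    srv.listen(1)
    port = srv.getsockname()[1]

    def handle() -> None:
        conn, _ = srv.accept()
        if mode == "echo":
            conn.sendall(conn.recv(4096))
        else:
            # l_onoff=1, l_linger=0: close() drops the connection and sends RST instead of FIN
            conn.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.pack("ii", 1, 0))
        conn.close()
        srv.close()

    threading.Thread(target=handle, daemon=True).start()
    return port


if __name__ == "__main__":
    print(probe("127.0.0.1", serve_once("echo")))
    print(probe("127.0.0.1", serve_once("reset")))
```

Separating `"reset"` from `"timeout"` mirrors the reasoning in point 2: a blocking flow would surface as a timeout, whereas an RST means some host or proxy actively aborted the connection.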

Stats

Seen: 496 times

Last updated: Oct 25 '16