
WMI job failed with status 10. Error details: Failed to add device 'Ethernet Connection' [closed]

asked 2018-06-29 17:11:43 +0200

Nebukazar

Hi,

In addition to the issue described at https://ask.cloudbase.it/question/275... we are having an issue with Neutron and one of our instances. The instance appears to be running just fine, but we're seeing a bunch of errors in the Neutron logfile on the node where the instance is running.

The issue seems to occur on any Hyper-V node that runs the instance.

The error messages are as follows:

2018-06-29 10:04:01.970 3648 ERROR hyperv.neutron.hyperv_neutron_agent [-] Exception encountered while processing port c17da5c8-61ea-4da8-ae81-a02989ab3226.
2018-06-29 10:04:01.970 3648 ERROR hyperv.neutron.hyperv_neutron_agent Traceback (most recent call last):
2018-06-29 10:04:01.970 3648 ERROR hyperv.neutron.hyperv_neutron_agent   File "C:\PROGRA~1\CLOUDB~1\OPENST~1\Nova\Python27\lib\site-packages\hyperv\neutron\hyperv_neutron_agent.py", line 336, in _process_added_port
2018-06-29 10:04:01.970 3648 ERROR hyperv.neutron.hyperv_neutron_agent     device_details['admin_state_up'])
2018-06-29 10:04:01.970 3648 ERROR hyperv.neutron.hyperv_neutron_agent   File "C:\PROGRA~1\CLOUDB~1\OPENST~1\Nova\Python27\lib\site-packages\hyperv\neutron\_common_utils.py", line 35, in wrapper
2018-06-29 10:04:01.970 3648 ERROR hyperv.neutron.hyperv_neutron_agent     return inner()
2018-06-29 10:04:01.970 3648 ERROR hyperv.neutron.hyperv_neutron_agent   File "C:\PROGRA~1\CLOUDB~1\OPENST~1\Nova\Python27\lib\site-packages\oslo_concurrency\lockutils.py", line 271, in inner
2018-06-29 10:04:01.970 3648 ERROR hyperv.neutron.hyperv_neutron_agent     return f(*args, **kwargs)
2018-06-29 10:04:01.970 3648 ERROR hyperv.neutron.hyperv_neutron_agent   File "C:\PROGRA~1\CLOUDB~1\OPENST~1\Nova\Python27\lib\site-packages\hyperv\neutron\_common_utils.py", line 34, in inner
2018-06-29 10:04:01.970 3648 ERROR hyperv.neutron.hyperv_neutron_agent     return f(*args, **kwargs)
2018-06-29 10:04:01.970 3648 ERROR hyperv.neutron.hyperv_neutron_agent   File "C:\PROGRA~1\CLOUDB~1\OPENST~1\Nova\Python27\lib\site-packages\hyperv\neutron\hyperv_neutron_agent.py", line 315, in _treat_vif_port
2018-06-29 10:04:01.970 3648 ERROR hyperv.neutron.hyperv_neutron_agent     physical_network, segmentation_id)
2018-06-29 10:04:01.970 3648 ERROR hyperv.neutron.hyperv_neutron_agent   File "C:\PROGRA~1\CLOUDB~1\OPENST~1\Nova\Python27\lib\site-packages\hyperv\neutron\hyperv_neutron_agent.py", line 251, in _port_bound
2018-06-29 10:04:01.970 3648 ERROR hyperv.neutron.hyperv_neutron_agent     self._utils.connect_vnic_to_vswitch(map['vswitch_name'], port_id)
2018-06-29 10:04:01.970 3648 ERROR hyperv.neutron.hyperv_neutron_agent   File "C:\PROGRA~1\CLOUDB~1\OPENST~1\Nova\Python27\lib\site-packages\os_win\utils\network\networkutils.py", line 284, in connect_vnic_to_vswitch
2018-06-29 10:04:01.970 3648 ERROR hyperv.neutron.hyperv_neutron_agent     self._jobutils.add_virt_resource(port, vm)
2018-06-29 10:04:01.970 3648 ERROR hyperv.neutron.hyperv_neutron_agent   File "C:\PROGRA~1\CLOUDB~1\OPENST~1\Nova\Python27\lib\site-packages\os_win\_utils.py", line 181, in inner
2018-06-29 10:04:01.970 3648 ERROR hyperv.neutron.hyperv_neutron_agent     time.sleep(sleep_time)
2018-06-29 10:04:01.970 3648 ERROR hyperv.neutron.hyperv_neutron_agent   File "C:\PROGRA~1\CLOUDB~1\OPENST~1\Nova\Python27\lib\site-packages\oslo_utils\excutils.py", line 220, in __exit__
2018-06-29 10:04:01.970 3648 ERROR hyperv.neutron.hyperv_neutron_agent     self.force_reraise()
2018-06-29 10:04:01.970 3648 ...
(more)

Closed for the following reason: the question is answered; the right answer was accepted by Nebukazar.
Close date: 2018-07-16 15:50:01.703398

1 answer


answered 2018-06-29 17:30:09 +0200

lpetrut

So you're saying that the instance is running fine, while neutron fails to attach the port "c17da5c8-61ea-4da8-ae81-a02989ab3226" to a vswitch. Is that instance reachable? What network type are you using?

I'm wondering if that port does actually get attached to a vswitch. It could be that we're reattempting to attach it because of other port binding errors.


Comments

Yes, the instance has a vNIC configured, although it shows a degraded status? See: https://nextcloud.dataenligne.com/index.php/s/N848waxS4kpatby These VMs are running on an untagged VLAN, IPv4. Let me know if you need further information. Thanks!

Nebukazar ( 2018-06-29 17:40:06 +0200 )

Interesting, so it's already connected to a vswitch. What's the neutron port status? You can run: neutron port-show c17da5c8-61ea-4da8-ae81-a02989ab3226. I'd grep for log messages including the port id to see why the port may be handled as unbound, in which case we retry binding it.

lpetrut ( 2018-06-29 18:58:49 +0200 )

There you go: https://nextcloud.dataenligne.com/index.php/s/NX93bedjD7wXr6p The status is BUILD, which doesn't seem right to me?

Nebukazar ( 2018-06-29 22:59:48 +0200 )

Thanks, that confirms it, something else must've failed. We'll try to make the retry logic more robust, ensuring that the port bind operations are idempotent.

lpetrut ( 2018-07-02 09:32:37 +0200 )

It would be really helpful if we could see what failed the first time the port was processed. You should check the logs around the time the VM was spawned.

lpetrut ( 2018-07-02 09:34:18 +0200 )

At first look, this looks like an invalid cache issue. You can either try restarting the neutron agent, or disable the WMI cache by adding the following to your neutron config: [os_win] cache_temporary_wmi_objects = false

lpetrut ( 2018-07-02 09:48:43 +0200 )
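For reference, the suggested option goes into the agent's config file as its own section (the exact file path depends on the deployment; on this setup it is neutron_hyperv_agent.conf). A minimal sketch of the relevant fragment:

```ini
# neutron_hyperv_agent.conf (path is deployment-specific)
[os_win]
# Disable caching of temporary WMI objects, as a workaround
# for stale WMI object references
cache_temporary_wmi_objects = false
```

Note that oslo.config section names are conventionally lowercase ([os_win]), and the service must be restarted for the change to take effect.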

Thanks for getting back to us. We've tried adding the following to neutron_hyperv_agent.conf on the node where the instance is currently running, but so far we're still seeing the same error messages: [os_win] cache_temporary_wmi_objects = false Are we missing something? Thanks!

Nebukazar ( 2018-07-03 02:47:58 +0200 )

That's odd. Just to make sure, you've restarted the neutron agent service after making the config file changes, right? Can you please confirm that the neutron agent log contains this message: 'WMI caching is disabled.' ?

lpetrut ( 2018-07-03 09:17:12 +0200 )

Yep, the neutron agent was restarted; however, I'm not seeing any 'WMI caching is disabled' messages in the neutron logfile. We're currently using Ocata; would that make a difference?

Nebukazar ( 2018-07-03 17:07:18 +0200 )

Ocata should include os-win 2.2.0, which has this option. Could you please double check the os-win version? You should be able to just do a "pip freeze | findstr os-win".

lpetrut ( 2018-07-03 17:32:01 +0200 )

The Python scripts folder may not be added to %PATH%, so you may have to use the full path (probably "C:\Progra~1\cloudbase solutions\openstack\nova\python27\scripts\pip.exe").

lpetrut ( 2018-07-03 17:32:25 +0200 )

Looks like we might have our culprit! https://nextcloud.dataenligne.com/index.php/s/RPEKKq3A5DHr8qL It shows: os-win==1.4.2.dev11. We currently have the Cloudbase Hyper-V Compute Ocata version 15.0.0.0 installed.

Nebukazar ( 2018-07-03 19:56:02 +0200 )

Hi @lpetrut; just wondering if you had any thoughts on the previous comment? Thanks!

Nebukazar ( 2018-07-10 15:57:37 +0200 )

That's a bit strange; could you please try to update os-win to 2.2.0? I'll double check the Ocata MSI as well.

lpetrut ( 2018-07-10 16:54:46 +0200 )

I have tried to update os-win to 2.2.0 using the following command: ".\pip.exe install os-win==2.2.0", which appears to have worked; however, when doing so, the ceilometer and nova compute services (neutron doesn't seem to be affected) won't start.

Nebukazar ( 2018-07-10 23:13:50 +0200 )

When I run the command through the CLI, it says "pbr" needs to be <2.0.0,>=1.8. If I force an install of pbr 1.10.0 (which seems to be the version used by the MSI package), nova won't start because it doesn't meet other dependency requirements (Requirement.parse('pbr!=2.1.0,>=2.0.0')).

Nebukazar ( 2018-07-10 23:14:02 +0200 )
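The version conflict above can be reproduced directly with pkg_resources, the same mechanism that produces the Requirement.parse error in the nova logs. This is just an illustrative sketch of why pbr 1.10.0 is rejected by Nova's constraint:

```python
from pkg_resources import Requirement

# Nova's pbr constraint, as reported in the startup error
nova_req = Requirement.parse('pbr!=2.1.0,>=2.0.0')

# The Ocata MSI ships pbr 1.10.0, which fails the ">=2.0.0" part...
print('1.10.0' in nova_req)   # False
# ...while any pbr 2.x release other than 2.1.0 would satisfy it
print('2.0.0' in nova_req)    # True
print('2.1.0' in nova_req)    # False (explicitly excluded)
```

Since os-win 2.2.0 pulls in a pbr that conflicts with what other services on the node were built against, the two constraints can't both be met with a single pip install, which is why rebuilding the MSI (or repackaging os-win with a compatible pbr pin) is needed.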

Any help would be appreciated.

Nebukazar ( 2018-07-10 23:14:12 +0200 )

Looks like the Ocata MSI from cloudbase.it is still using os-win 1.4.2, which is missing a number of fixes (including that WMI caching option). We have an unpublished MSI that uses 2.2.0 instead and also takes care of the dependencies.

lpetrut ( 2018-07-11 13:35:47 +0200 )

You may either wait until we release a new Ocata MSI, or fetch os-win from https://github.com/openstack/os-win, check out the 2.2.0 tag, modify setup.py and requirements.txt to limit the pbr version so that it matches the one used by Nova, then install it with pip and see if this fixes your issue.

lpetrut ( 2018-07-11 13:44:03 +0200 )

I would prefer using the MSI - that way I'll be sure I don't mix things up. Or I could use your unpublished/unofficial MSI and give feedback if you'd like.

Nebukazar ( 2018-07-11 16:39:20 +0200 )

Here's an updated (currently unofficial) Ocata MSI: https://www.dropbox.com/s/eso15uj6519ixmb/HyperVNovaCompute_Ocata_15_0_4.msi?dl=0 Let me know how it works for you.

lpetrut ( 2018-07-12 10:06:54 +0200 )

Awesome, thanks! os-win is now @ 2.2.0. I have added the WMI caching lines back into neutron_hyperv_agent.conf ([OS_WIN] cache_temporary_wmi_objects = false). Unfortunately, the same error message occurs, and I am still not seeing the WMI caching disabled message in the logfile, though.

Nebukazar ( 2018-07-12 16:38:53 +0200 )

I have tried to restart neutron a few times to make sure it picks up the config.

Nebukazar ( 2018-07-12 16:39:32 +0200 )

We spent a little time on this and found one of the possible causes (we reproduced it and saw the same behavior). So the port is already bound to the vSwitch, and chances are it wasn't bound by Neutron (e.g. someone did it through PowerShell, Hyper-V Manager, etc.).

lpetrut ( 2018-07-13 11:04:43 +0200 )

To give a little context: whenever you attach a NIC to a vswitch, a new vswitch port gets created. Our neutron agent uses the neutron port id as the vswitch port name, which won't be the case when someone manually connects the port.

lpetrut ( 2018-07-13 11:07:10 +0200 )

This tricks neutron into thinking that the port is not attached, since it relies on the port name, so it tries to connect the NIC, which fails. Here's a small script that should fix the issue by renaming the switch port so that Neutron can pick it up: http://paste.openstack.org/raw/725812

lpetrut ( 2018-07-13 11:17:00 +0200 )
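The naming mismatch can be sketched in a few lines of Python. This is purely illustrative (hypothetical names, not the actual agent or fix-script code): the agent identifies its vswitch ports by name, so a manually attached NIC whose switch port isn't named after the neutron port id looks unbound, and the fix amounts to renaming it:

```python
# Hypothetical model of a vswitch: port name -> attached vNIC.
# A manually attached NIC keeps Hyper-V's default adapter name.
vswitch_ports = {'Network Adapter': 'instance-vnic'}

NEUTRON_PORT_ID = 'c17da5c8-61ea-4da8-ae81-a02989ab3226'

def agent_sees_port_bound(ports, port_id):
    # The agent looks the port up by name, expecting the neutron port id
    return port_id in ports

# Before the fix: the NIC is attached, but under the wrong name,
# so the agent thinks it's unbound and keeps retrying (and failing)
print(agent_sees_port_bound(vswitch_ports, NEUTRON_PORT_ID))  # False

# What the fix script does, conceptually: rename the existing
# switch port to the neutron port id
vswitch_ports[NEUTRON_PORT_ID] = vswitch_ports.pop('Network Adapter')

# After the fix: the agent can find and manage the port
print(agent_sees_port_bound(vswitch_ports, NEUTRON_PORT_ID))  # True
```

The real script operates on Hyper-V's WMI objects rather than a dict, but the principle is the same: make the existing switch port's name match the neutron port id instead of creating a duplicate attachment.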

Let me know if this script works for you. You should call it with the port name (i.e. the one that neutron fails to bind). Since this situation is a bit more common than we'd hoped, we'll send a patch so that the agent gracefully handles manually connected ports.

lpetrut ( 2018-07-13 14:28:03 +0200 )

This makes sense now; this instance was restored using Veeam a few weeks ago, which involves the instance being re-created. I've successfully run the script and it seems to have fixed the issue.

Nebukazar ( 2018-07-13 17:33:03 +0200 )

However, I'm now seeing different error messages, such as: https://nextcloud.dataenligne.com/index.php/s/6TyzZqRze38qggZ

Nebukazar ( 2018-07-13 17:33:06 +0200 )

How often does that error occur? It looks like it's trying to use a reference to a nonexistent VM (perhaps still pointing to the old VM); this may need a little investigation. In the meantime, can you try disabling port metrics collection, to see if there are any other blockers?

lpetrut ( 2018-07-13 18:33:12 +0200 )

Actually, the error doesn't seem to occur very often. It occurred quite a few times, but only for a few hours; since then, no more errors. It's been 3 days or so. I'll keep an eye on it, but so far it seems all good! I'll flag this post as answered and will re-open if needed.

Nebukazar ( 2018-07-16 15:49:11 +0200 )

Stats

Seen: 675 times

Last updated: Jun 29 '18