Cloudbase-init 0.9.9 timing and sysprep

asked 2016-11-17 16:47:09 +0200

jrack

updated 2016-11-17 16:51:01 +0200

Claudiu Belu

I'm trying out not sysprep'ing my image, which was generated on a KVM instance, so that it can theoretically be reused with a simple format conversion. I know there be dragons... ;-)

On the initial boot I can see the image go into a device detection phase, and during that phase (the UI shows 66%) I can see in the console that cloudbase-init tries to connect to the metadata service and gets an "unreachable network error", which is fatal. After about 5 minutes (which makes me suspect an internal timeout in the image) the instance shifts to Ready, the UI comes up, and the instance is then pingable. If I manually force a reboot, cb-init restarts and does what it should without an issue.

A couple of thoughts:

  • Assuming that if I sysprep'd, init would not launch until the devices are really ready, so it wouldn't hit the fatal error, and I would just carry the 3x prep annoyances etc.
  • Assuming that if I did the image prep (no sysprep) on Hyper-V only, then the device detection would be shorter or unnecessary and wouldn't hit the fatal error.

This just creates a bit of a fudge factor with Rally, which I don't love. Is there a networkReady test that should occur before attempting the metadata pull and that would prevent the fatal error?
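As a rough illustration of the kind of network-ready check asked about here, the sketch below polls a metadata endpoint until it answers or a deadline passes. It is not cloudbase-init's implementation: the names `http_probe` and `wait_for_metadata` and the timing values are hypothetical, and the URL is the conventional OpenStack metadata address, assumed rather than taken from this setup.

```python
import time
import urllib.error
import urllib.request

# Conventional OpenStack metadata address; an assumption, not taken from this thread.
METADATA_URL = "http://169.254.169.254/openstack/latest/meta_data.json"


def http_probe(url=METADATA_URL, timeout=5):
    """Return True if the URL answers at all, False on any network error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered with an error status, so the network path is up.
        return True
    except OSError:
        # URLError, timeouts and "network unreachable" all derive from OSError.
        return False


def wait_for_metadata(probe=http_probe, max_wait=600, interval=10,
                      sleep=time.sleep, clock=time.monotonic):
    """Poll `probe` until it succeeds or `max_wait` seconds have elapsed."""
    deadline = clock() + max_wait
    while True:
        if probe():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

The probe, sleep, and clock are passed in as parameters so the loop can be exercised without a real network; a guest-side script would simply call `wait_for_metadata()` before attempting the metadata pull.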

2 answers

answered 2016-11-17 17:15:58 +0200

Claudiu Belu

Hello,

This might not directly answer your question, but regarding the "unreachable network error", the symptoms you're describing sound like the VM's Neutron ports were bound after the VM started.

There is a feature in nova which makes nova-compute wait for Neutron vif plug events before starting the VMs. This ensures that the VM's Neutron ports and security groups are already processed and bound, so that the VM will have network connectivity. The config option is called vif_plugging_timeout [1]. There is also a config option called vif_plugging_is_fatal, which will cause the VM to fail to spawn on a host if the Neutron vif plug events did not occur during the vif_plugging_timeout period.
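The interaction of the two options can be sketched roughly as follows. This only illustrates the behaviour described above and is not nova's actual code: `wait_for_vif_plugged`, `VifPlugTimeout`, and the use of `threading.Event` to represent the plug event are hypothetical stand-ins.

```python
import threading


class VifPlugTimeout(Exception):
    """Hypothetical error raised when the plug event never arrives."""


def wait_for_vif_plugged(plugged, vif_plugging_timeout, vif_plugging_is_fatal):
    """Block until `plugged` is set or the timeout expires.

    Returns True if the event arrived in time. On timeout, raises
    VifPlugTimeout when vif_plugging_is_fatal is true; otherwise returns
    False so the caller can start the VM anyway.
    """
    if plugged.wait(timeout=vif_plugging_timeout):
        return True
    if vif_plugging_is_fatal:
        raise VifPlugTimeout(
            "no VIF plug event within %s seconds" % vif_plugging_timeout)
    return False
```

In this sketch, a non-fatal timeout lets the VM boot with its port possibly still unbound, which would look from inside the guest like a network that is unreachable for a while after first boot.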

We've introduced this feature into compute-hyperv in Mitaka.

What OpenStack version are you using? What hypervisor are you using?

[1] http://docs.openstack.org/kilo/config...

Best regards,

Claudiu Belu

Comments

This is on the Liberty driver today, but I'm pushing my team to upgrade to Mitaka, hopefully soon, and this is a WinSrv16 TP5. It has been inconsistent, so it makes sense that maybe neutron is just slow to respond. I don't think a backport is merited in this case, but it helps to understand the cause.

jrack (2016-11-17 17:23:29 +0200)

So because this is back in neutron, it's hard to manage this in the guest. The fact that Server is showing device detection tells me, though, that there may be some status that cb-init could halt and wait for?

jrack (2016-11-17 17:25:28 +0200)

Do you know of a way I could confirm the time when the plugging was complete in neutron, to try and paint a picture? The plugging completing (or timing out) may be what allows device detect to end and shift to ready.

jrack (2016-11-17 17:54:37 +0200)

You will have to set debug=True and verbose=True in the compute node's nova.conf file. Then, when you spawn a VM and it is waiting for a Neutron VIF plug event, the logs should look like this: http://paste.openstack.org/show/589616/

Claudiu Belu (2016-11-17 18:10:20 +0200)

Also, neutron-hyperv-agent / neutron-ovs-agent logs which ports it has processed. A VM's NIC name is the same as its neutron port name. Also, Windows / Hyper-V Server 2016 has been released; you can give it a try. :) https://www.microsoft.com/en-us/evalcenter/evaluate-hyper-v-server-2016

Claudiu Belu (2016-11-17 18:15:35 +0200)
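One way to line up the timestamps is to pull every agent log line that mentions the port in question. A minimal sketch, assuming the port ID appears verbatim in the log text; the helper names are hypothetical and the log path depends on how your agent is configured.

```python
def lines_mentioning(port_id, lines):
    """Return the log lines that contain the given Neutron port ID."""
    return [line.rstrip("\n") for line in lines if port_id in line]


def search_log(path, port_id):
    """Scan one log file for the port ID; `path` is wherever your agent logs."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return lines_mentioning(port_id, handle)
```

Comparing the timestamps on the returned lines with the cloudbase-init log should show whether the port was processed before or after the metadata request failed.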
answered 2016-11-18 13:16:40 +0200

avladu

Hello,

The issue you are seeing might be related to the network autoconfiguration timeout. Before shutting down the image, you need to run: "ipconfig.exe /release"

If this does not solve your issue, can you provide the cloudbase-init logs for debugging?

Thank you, Adrian Vladu

Comments

My lab is currently on paper tape storage, so Glance ops have caused a few issues, but I will give that a shot in a bit, after I see if I can get some hardware made in this millennium. I manually rebooted around that 08:29 time: http://paste.openstack.org/show/589708/

jrack (2016-11-18 15:32:01 +0200)

Stats

Asked: 2016-11-17 16:47:09 +0200

Seen: 1,100 times

Last updated: Nov 18 '16