
Notes on a partprobe and partx failure

Problem description:

While provisioning a bare-metal node, the following errors appeared in the ironic-python-agent log:

Aug 18 05:34:59 cmss ironic-python-agent[9488]: 2020-08-18 05:34:59.224 9488 DEBUG oslo_concurrency.processutils [-] Running cmd (subprocess): partprobe /dev/sda execute /usr/share/ironic-python-agent/venv/lib/python2.7/site-packages/oslo_concurrency/processutils.py:355
Aug 18 05:35:02 cmss ironic-python-agent[9488]: ::ffff:99.99.1.63 - - [18/Aug/2020 05:35:02] "GET /v1/commands HTTP/1.1" 200 95178
Aug 18 05:35:02 cmss ironic-python-agent[9488]: 2020-08-18 05:35:02.280 9488 DEBUG oslo_concurrency.processutils [-] CMD "partprobe /dev/sda" returned: 1 in 3.056s execute /usr/share/ironic-python-agent/venv/lib/python2.7/site-packages/oslo_concurrency/processutils.py:385
Aug 18 05:35:02 cmss ironic-python-agent[9488]: 2020-08-18 05:35:02.281 9488 DEBUG oslo_concurrency.processutils [-] u'partprobe /dev/sda' failed. Not Retrying. execute /usr/share/ironic-python-agent/venv/lib/python2.7/site-packages/oslo_concurrency/processutils.py:433
Aug 18 05:35:02 cmss ironic-python-agent[9488]: 2020-08-18 05:35:02.281 9488 DEBUG ironic_lib.disk_utils [-] Failed to notice kernel to sync partitions tables on disk /dev/sda for node 56a9526d-fdc0-4e41-b05d-9295771eb074. Error: Error: Partition(s) 3, 5, 6 on /dev/sda have been written, but we have been unable to inform the kernel of the change, probably because it/they are in use.  As a result, the old partition(s) will remain in use.  You should reboot now before making further changes.
Aug 18 05:35:02 cmss ironic-python-agent[9488]: . It is not fatal error, you can ignore it.
(The 1/3 times. Use partx to retry) _get_labelled_partition /usr/share/ironic-python-agent/venv/lib/python2.7/site-packages/ironic_lib/disk_utils.py:648
Aug 18 05:35:02 cmss ironic-python-agent[9488]: 2020-08-18 05:35:02.281 9488 DEBUG oslo_concurrency.processutils [-] Running cmd (subprocess): partx -u /dev/sda execute /usr/share/ironic-python-agent/venv/lib/python2.7/site-packages/oslo_concurrency/processutils.py:355
Aug 18 05:35:02 cmss ironic-python-agent[9488]: 2020-08-18 05:35:02.287 9488 DEBUG oslo_concurrency.processutils [-] u'partx -u /dev/sda' failed. Not Retrying. execute /usr/share/ironic-python-agent/venv/lib/python2.7/site-packages/oslo_concurrency/processutils.py:433
Aug 18 05:35:02 cmss ironic-python-agent[9488]: 2020-08-18 05:35:02.287 9488 ERROR ironic_lib.disk_utils [-] Get partition label occurs error on disk /dev/sda for node 56a9526d-fdc0-4e41-b05d-9295771eb074. Because partx is not exist or other unknown reasons Error: [Errno 2] No such file or directory

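The problem shows up mainly after the image has been written to the bare-metal node, at the point where a 64 MB configdrive partition is created to hold userdata and metadata. At that step partprobe is run to tell the kernel to re-read the partition table.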
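The log above shows partprobe being tried first and partx -u being used as a fallback. A minimal shell sketch of that pattern follows. It is not the ironic_lib code; the function name reread_partition_table and the PARTPROBE/PARTX override variables are hypothetical and exist only to make the sketch easy to try out.

```shell
#!/bin/sh
# Sketch: try partprobe first, fall back to "partx -u" for one disk.
# PARTPROBE and PARTX are hypothetical overrides (default: the real tools).
reread_partition_table() {
    disk=$1
    partprobe_cmd=${PARTPROBE:-partprobe}
    partx_cmd=${PARTX:-partx}

    # First attempt: ask the kernel to re-read the whole partition table.
    if "$partprobe_cmd" "$disk"; then
        return 0
    fi
    echo "partprobe failed on $disk, trying partx -u" >&2

    # The log ends with "[Errno 2] No such file or directory" for partx,
    # so check that the tool exists before calling it.
    if ! command -v "$partx_cmd" >/dev/null 2>&1; then
        echo "$partx_cmd is not installed" >&2
        return 127
    fi

    # Second attempt: update the kernel's view partition by partition.
    "$partx_cmd" -u "$disk"
}
```

Usage would be something like `reread_partition_table /dev/sda`, run as root. Note that the fallback does not remove the underlying cause: if partitions are still in use, the kernel keeps its old view of them.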

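In addition, when testing custom partitioning for bare metal, pvcreate sometimes failed as well. Testing showed that pvcreate failed because the PV already existed, even though all VGs and PVs on the system had been cleaned up before the image was written.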

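Preliminary conclusion:

The initial suspicion was that the disk is not cleaned thoroughly during deployment and that leftover data causes the problem.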

Test steps:

1. Prepare a clean disk, sda.

2. Create several partitions on sda: sda1/sda2/sda3/sda4/sda5, and so on.

3. Turn the partitions into PVs:

pvcreate --force /dev/sdaX

4. Create VGs on top of the PVs:

vgcreate vg01 /dev/sda1 /dev/sda2
vgcreate vg02 /dev/sda3 /dev/sda5

5. Create LVs on top of the VGs:

lvcreate -L <size> <vg>

Following the current ironic-python-agent flow, the disk is wiped first:

wipefs --force --all /dev/sda

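Then run partprobe and partx:
partprobe succeeds, but partx fails with: failed to read partition table

At this point the on-disk metadata on /dev/sda has in fact been erased, as fdisk /dev/sda -l shows. However, lsblk still lists the earlier LVs, partitions, and so on.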
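One way to see which partitions the kernel still treats as in use is to look at the holders directories in sysfs, where device-mapper nodes (dm-N) created by LVM show up. The sketch below is only an illustration: the function name list_held_partitions and the SYSFS_ROOT override are hypothetical, and the override exists only so the function can be tried against a fake directory tree.

```shell
#!/bin/sh
# Sketch: list partitions of a disk that still have kernel "holders",
# typically device-mapper nodes that LVM created on top of them.
# SYSFS_ROOT is a hypothetical override (default: /sys).
list_held_partitions() {
    disk=$1                          # disk name without /dev/, e.g. sda
    root=${SYSFS_ROOT:-/sys}
    for part in "$root/block/$disk/$disk"*; do
        [ -d "$part/holders" ] || continue
        holders=$(ls "$part/holders")
        # A non-empty holders directory means something is stacked on
        # top of this partition, so the kernel will not drop it.
        [ -n "$holders" ] && echo "$(basename "$part"): $(echo $holders)"
    done
    return 0
}
```

Running `list_held_partitions sda` prints one line per held partition, in the form `sda3: dm-0`. An empty result means no partition of the disk has anything stacked on it.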

6. Create a few partitions with fdisk /dev/sda, then test with partprobe and partx separately:

partx succeeds.

partprobe fails.

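This error is exactly the same as the one seen during bare-metal deployment. The conclusion is therefore as follows: ironic-python-agent first runs wipefs to erase the disk metadata and then writes the image to the disk with qemu-img convert -t directsync -O host_device, but LVM remnants are still present at that point and keep the old partitions in use, so partprobe cannot make the kernel re-read the partition table.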

Final conclusion:

The LVM objects on the disk (LVs, VGs, and PVs) must be cleaned up before the disk is wiped and the image is written.
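The cleanup could look roughly like the sketch below. It is an illustration under stated assumptions, not the fix that was actually applied to ironic-python-agent: the function name plan_lvm_cleanup is hypothetical, and the function only prints the LVM commands for review instead of running them. It assumes partition names of the form /dev/sdaN or /dev/nvme0n1pN. Note that a VG spanning several disks would be removed as a whole.

```shell
#!/bin/sh
# Sketch: print the LVM teardown commands for every PV on a given disk.
# Input (stdin): lines of "<pv_name> [<vg_name>]", for example from
#   pvs --noheadings -o pv_name,vg_name
# Output: commands to review and then run as root. Nothing is executed.
plan_lvm_cleanup() {
    awk -v disk="$1" '
        {
            # Keep only PVs on this disk: the disk itself or diskN / diskpN.
            rest = substr($1, length(disk) + 1)
            if (index($1, disk) != 1 || rest !~ /^p?[0-9]*$/) next
            pv[++n] = $1
            if ($2 != "" && !seen[$2]++) vg[++m] = $2
        }
        END {
            for (i = 1; i <= m; i++) {
                print "vgchange -an " vg[i]       # deactivate the LVs first
                print "vgremove -f -y " vg[i]     # remove the VG and its LVs
            }
            for (i = 1; i <= n; i++)
                print "pvremove -ff -y " pv[i]    # wipe the PV label
        }'
}
```

A possible invocation is `pvs --noheadings -o pv_name,vg_name | plan_lvm_cleanup /dev/sda`. After checking the printed commands, they can be run before wipefs --force --all /dev/sda.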

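Postscript:

As for the pvcreate failure, it has not recurred since the fix above, and the cause can be inferred to be PVs that were not cleaned up completely. It is noted here for now, to see whether it shows up again.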