Traversing XML in Python with the lxml library and querying it with XPath (tags, attribute names, attribute values, text between tag pairs)
阿新 • Published: 2017-09-09
XML example:
Version 1:
<?xml version="1.0" encoding="UTF-8"?><country name="chain"><provinces><heilongjiang name="citys"><haerbin/><daqing/></heilongjiang><guangdong name="citys"><guangzhou/><shenzhen/><huhai/></guangdong><taiwan name="citys"><taibei/><gaoxiong/></taiwan><xinjiang name="citys"><wulumuqi waith="tianqi">晴</wulumuqi></xinjiang></provinces></country>
This version contains no spaces or line breaks.
Python example:
# Python 2 syntax (print statements)
from lxml import etree


class r_xpath_xml(object):
    def __init__(self):
        self.xmetrpa = etree.parse('info.xml')  # read the XML data

    def xpxm(self):
        xpxlm = self.xmetrpa
        print etree.tostring(xpxlm)  # print the XML data
        root = xpxlm.getroot()  # get the root of the tree
        print root.tag, ' ',  # print the root tag name
        print root.items()  # get the root's attribute names and values
        for a in root:  # iterate over the tags one level below the root
            # print the tag name, the tag attributes, and the tag text
            print a.tag, a.items(), a.text, 'printed type is: ', type(a)
            for b in a:
                print b.tag, b.items(), b.text
                for c in b:
                    print c.tag, c.items(), c.text
                    for d in c:
                        print d.tag, d.items(), d.text, d
        print xpxlm.xpath('//node()')
        print '====================================================================================================='
        xa = xpxlm.xpath('//heilongjiang/*')
        print xa
        for xb in xa:
            print xb.tag, xb.items(), xb.text
        xc = xpxlm.xpath('//xinjiang/*')
        print xc
        for xd in xc:
            print xd.tag, xd.items(), xd.text


if __name__ == '__main__':
    xpx = r_xpath_xml()
    xpx.xpxm()
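The nested for loops walk the tag hierarchy one level at a time. tag returns the tag name, items() returns the attributes as a list of tuples in the form [('attribute name', 'attribute value')], much like a dictionary's items() method, and text returns the data between the opening and closing tags. tag, items(), and text all belong to objects of type <type 'lxml.etree._Element'>.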
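The fixed four-level nesting above only works when the depth of the document is known in advance. As a minimal sketch, the same tag, items(), and text access can be wrapped in a recursive generator. The helper name walk and the shortened inline XML string are assumptions made for illustration, and the sketch is written for Python 3 rather than the Python 2 used in the article.

```python
from lxml import etree

# Shortened inline copy of the version-one document, so no info.xml is needed.
XML = ('<country name="chain"><provinces>'
       '<heilongjiang name="citys"><haerbin/><daqing/></heilongjiang>'
       '<xinjiang name="citys"><wulumuqi waith="tianqi">晴</wulumuqi></xinjiang>'
       '</provinces></country>')


def walk(element, depth=0):
    """Yield (depth, tag, attributes, text) for an element and its descendants."""
    yield depth, element.tag, element.items(), element.text
    for child in element:  # iterating an element yields its direct children
        yield from walk(child, depth + 1)


root = etree.fromstring(XML)
rows = list(walk(root))
for depth, tag, attrs, text in rows:
    # Indent each line by its depth in the tree.
    print('  ' * depth + tag, attrs, text)
```

Because the recursion follows whatever children exist, the same function handles a document of any depth without adding more loops.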
Output:
<country name="chain"><provinces><heilongjiang name="citys"><haerbin/><daqing/></heilongjiang><guangdong name="citys"><guangzhou/><shenzhen/><huhai/></guangdong><taiwan name="citys"><taibei/><gaoxiong/></taiwan><xinjiang name="citys"><wulumuqi waith="tianqi">晴</wulumuqi></xinjiang></provinces></country>
country   [('name', 'chain')]
provinces [] None printed type is:  <type 'lxml.etree._Element'>
heilongjiang [('name', 'citys')] None
haerbin [] None
daqing [] None
guangdong [('name', 'citys')] None
guangzhou [] None
shenzhen [] None
huhai [] None
taiwan [('name', 'citys')] None
taibei [] None
gaoxiong [] None
xinjiang [('name', 'citys')] None
wulumuqi [('waith', 'tianqi')] 晴
[<Element country at 0x2d47b20>, <Element provinces at 0x2d47990>, <Element heilongjiang at 0x2d479b8>, <Element haerbin at 0x2d47558>, <Element daqing at 0x2d47328>, <Element guangdong at 0x2d47300>, <Element guangzhou at 0x2d476e8>, <Element shenzhen at 0x2d47530>, <Element huhai at 0x2d472d8>, <Element taiwan at 0x2d47260>, <Element taibei at 0x2d47238>, <Element gaoxiong at 0x2d47080>, <Element xinjiang at 0x2d47710>, <Element wulumuqi at 0x2d47968>, u'\u6674']
=====================================================================================================
[<Element haerbin at 0x2d479b8>, <Element daqing at 0x2d47148>]
haerbin [] None
daqing [] None
[<Element wulumuqi at 0x2d47968>] type is:  <type 'list'>
wulumuqi [('waith', 'tianqi')] 晴
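The queries above select child elements only. XPath can also address attributes and text directly, which covers the attribute names, attribute values, and tag text mentioned in the title. The following is a sketch under stated assumptions: it uses a shortened inline copy of the version-one XML instead of info.xml, the variable names are illustrative, and it is written for Python 3.

```python
from lxml import etree

# Shortened inline copy of the version-one document.
XML = ('<country name="chain"><provinces>'
       '<heilongjiang name="citys"><haerbin/><daqing/></heilongjiang>'
       '<xinjiang name="citys"><wulumuqi waith="tianqi">晴</wulumuqi></xinjiang>'
       '</provinces></country>')
root = etree.fromstring(XML)

# Child elements of a tag, as in the article's own query.
child_tags = [e.tag for e in root.xpath('//heilongjiang/*')]

# Attribute values: @name selects the name attribute of each matched element.
name_values = root.xpath('//provinces/*/@name')

# Elements that carry a given attribute, or a given attribute value.
with_waith = [e.tag for e in root.xpath('//*[@waith]')]
by_value = [e.tag for e in root.xpath('//*[@waith="tianqi"]')]

# Attribute names are read from the element itself with keys().
attr_names = root.xpath('//wulumuqi')[0].keys()

# Text between a tag pair.
text = root.xpath('//wulumuqi/text()')

print(child_tags, name_values, with_waith, by_value, attr_names, text)
```

Attribute and text queries return strings rather than elements, so tag and items() are not available on those results.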
XML example:
Version 2:
<?xml version="1.0" encoding="UTF-8"?>
<country name="chain">
  <provinces>
    <city:table xmlns:city="http://www.w3school.com.cn/furniture">
      <heilongjiang name="citys"><city:haerbin/><city:daqing/></heilongjiang>
      <guangdong name="citys"><city:guangzhou/><city:shenzhen/><city:zhuhai/></guangdong>
      <taiwan name="citys"><city:taibei/><city:gaoxiong/></taiwan>
      <xinjiang name="citys"><city:wulumuqi>晴</city:wulumuqi></xinjiang>
    </city:table>
  </provinces>
</country>
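Example:
print xpxlm.xpath('//node()')
Output:
The result now contains whitespace and newline text nodes, and the tags written with the city prefix appear with their namespace.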
[<Element country at 0x2e79b20>, '\n ', <Element provinces at 0x2e79990>, '\n ', <Element {http://www.w3school.com.cn/furniture}table at 0x2e79710>, '\n ', <Element heilongjiang at 0x2e799b8>, <Element {http://www.w3school.com.cn/furniture}haerbin at 0x2e79328>, <Element {http://www.w3school.com.cn/furniture}daqing at 0x2e79968>, '\n ', <Element guangdong at 0x2e79530>, <Element {http://www.w3school.com.cn/furniture}guangzhou at 0x2e79300>, <Element {http://www.w3school.com.cn/furniture}shenzhen at 0x2e792d8>, <Element {http://www.w3school.com.cn/furniture}zhuhai at 0x2e79260>, '\n ', <Element taiwan at 0x2e79238>, <Element {http://www.w3school.com.cn/furniture}taibei at 0x2e79080>, <Element {http://www.w3school.com.cn/furniture}gaoxiong at 0x2e79058>, '\n ', <Element xinjiang at 0x2e796e8>, <Element {http://www.w3school.com.cn/furniture}wulumuqi at 0x2e79558>, u'\u6674', '\n ', ' \n ', '\n']
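Namespaced tags such as {http://www.w3school.com.cn/furniture}haerbin cannot be matched by their bare name in an XPath expression. One way to query them, sketched here, is to pass a prefix-to-URI mapping through the namespaces argument of xpath(). The mapping name NS and the shortened inline XML are assumptions for illustration, and the code is written for Python 3.

```python
from lxml import etree

# Shortened inline copy of the version-two document.
XML = ('<country name="chain"><provinces>'
       '<city:table xmlns:city="http://www.w3school.com.cn/furniture">'
       '<heilongjiang name="citys"><city:haerbin/><city:daqing/></heilongjiang>'
       '</city:table></provinces></country>')
root = etree.fromstring(XML)

# The prefix used here only has to match the XPath expression below;
# it is the URI that identifies the namespace.
NS = {'city': 'http://www.w3school.com.cn/furniture'}

# city:* matches every child element in that namespace.
cities = root.xpath('//heilongjiang/city:*', namespaces=NS)

full_tags = [e.tag for e in cities]                       # '{uri}name' form
local_names = [etree.QName(e).localname for e in cities]  # name without the URI

print(full_tags)
print(local_names)
```

etree.QName splits a qualified tag into its namespace and local name, which is handy when only the short tag name is wanted in the output.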
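Removing the whitespace:
xp = xpxlm.xpath('//node()')
print xp,
for i in xp:
    # Text nodes come back as strings; skip them and keep only elements.
    if isinstance(i, basestring):
        continue
    else:
        print i.tag
The check skips the whitespace and newline text nodes, along with any other text nodes, so only element tags are printed.
Output: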
provinces
{city}table
heilongjiang
{city}haerbin
{city}daqing
guangdong
{city}guangzhou
{city}shenzhen
{city}zhuhai
taiwan
{city}taibei
{city}gaoxiong
xinjiang
{city}wulumuqi
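Filtering the //node() result in Python is one option. The whitespace can also be avoided in the query or at parse time, as in the following sketch. It assumes a small inline document in place of info.xml, uses illustrative variable names, and is written for Python 3. The remove_blank_text parser option is an lxml feature that drops whitespace-only text between tags; how it behaves on mixed content is worth checking against your own documents.

```python
from lxml import etree

# Small indented document, so the parser sees whitespace between the tags.
XML = '''<country name="chain">
  <provinces>
    <xinjiang name="citys"><wulumuqi>晴</wulumuqi></xinjiang>
  </provinces>
</country>'''
root = etree.fromstring(XML)

# Option 1: //* selects elements only, so no text nodes are returned.
elements_only = [n.tag for n in root.xpath('//*')]

# Option 2: keep text nodes, but only those that are not blank.
texts = [str(t) for t in root.xpath('//text()[normalize-space()]')]

# Option 3: ask the parser to drop whitespace-only text between tags.
parser = etree.XMLParser(remove_blank_text=True)
clean_root = etree.fromstring(XML, parser)
clean_nodes = clean_root.xpath('//node()')
clean_texts = [str(n) for n in clean_nodes if isinstance(n, str)]

print(elements_only)
print(texts)
print(clean_texts)
```

The first option is the closest match to the loop above, since it returns only elements and therefore only objects that have a tag.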