Crawler Libraries: Learning BeautifulSoup (Part 3)

Axin • Published: 2017-05-12


Traversing the document tree:

1. Finding child nodes

.contents

A tag's .contents attribute returns the tag's child nodes as a list.

print soup.body.contents

print type(soup.body.contents)

Output:

[u'\n', The Dormouse's story, u'\n', Once upon a time there were three little sisters; and their names were\n<a class="sister" href="http://example.com/elsie" id="link1"></a>,\n<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a> and\n<a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>;\nand they lived at the bottom of a well., u'\n', ..., u'\n']

<type 'list'>
[Finished in 0.2s]
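
As a small illustration of the point above, the following sketch checks that .contents really is an ordinary Python list. It assumes Python 3 with the bs4 package installed (the listings in this article use the Python 2 print statement), and the two-paragraph HTML string is a made-up stand-in, not the document used in this series.

```python
from bs4 import BeautifulSoup

# Hypothetical snippet for illustration only.
html = "<body><p>one</p><p>two</p></body>"
soup = BeautifulSoup(html, "html.parser")

children = soup.body.contents      # direct children of <body>, as a list
print(type(children))
print(len(children))
print(children[0].name)            # list indexing works as usual
```

Because it is a list, the usual operations such as len(), indexing and slicing are available.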

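.children

Unlike .contents, it does not return a list, but we can iterate over it to get all the child nodes.

Printing it shows that it is a list iterator object:

print soup.body.children

How do we get at its contents? Just loop over it:

for child in soup.body.children:
    print child

Output: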

The Dormouse's story

Once upon a time there were three little sisters; and their names were
<a class="sister" href="http://example.com/elsie" id="link1"></a>,
<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a> and
<a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>;

and they lived at the bottom of a well.

...

[Finished in 0.2s]
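
A minimal sketch of the difference from .contents, assuming Python 3 with bs4 installed and a made-up HTML string: .children has to be consumed by a loop or comprehension before its items can be used.

```python
from bs4 import BeautifulSoup

# Hypothetical snippet for illustration only.
html = "<body><p>one</p><p>two</p></body>"
soup = BeautifulSoup(html, "html.parser")

kids = soup.body.children          # an iterator, not a list
# An iterator has no len(); loop over it, or wrap it in list() if needed.
names = [child.name for child in kids]
print(names)
```

If random access is needed, soup.body.contents or list(soup.body.children) is the simpler choice.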

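2. All descendant nodes

.descendants

The .contents and .children attributes contain only a tag's direct children, whereas .descendants iterates recursively over all of a tag's descendants. As with .children, we have to loop over it to get the contents.

for child in soup.descendants:
    print child

Running this prints every node: first the outermost HTML tag, then the head tag and each tag nested inside it in turn, and so on.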
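
To make the contrast between direct children and all descendants concrete, here is a small sketch. It assumes Python 3 with bs4 installed; the nested HTML string and the variable names are invented for the example.

```python
from bs4 import BeautifulSoup, Tag

# Hypothetical snippet: <div> contains <p>, which contains <b>, which contains text.
html = "<div><p><b>bold</b></p></div>"
soup = BeautifulSoup(html, "html.parser")

direct = list(soup.div.children)          # only <p>
everything = list(soup.div.descendants)   # <p>, <b>, and the text node
tag_names = [node.name for node in everything if isinstance(node, Tag)]

print(len(direct), len(everything), tag_names)
```

Note that .descendants yields text nodes as well as tags, which is why the isinstance check is used before reading .name.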

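3. Node content

.string

If a tag contains no other tags, .string returns the text inside it. If a tag contains exactly one child tag, .string also returns the innermost text.

If a tag contains several child nodes, it is ambiguous which child's content .string should refer to, so .string is None.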

print soup.head.string
print soup.title.string
print soup.body.string

#The Dormouse's story
#The Dormouse's story
#None
[Finished in 0.2s]
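
The three cases can be reproduced on a tiny document. This is a sketch assuming Python 3 with bs4 installed; the HTML string is a simplified stand-in for the article's document.

```python
from bs4 import BeautifulSoup

# Hypothetical snippet: <head> has one child tag, <body> has two.
html = "<head><title>T</title></head><body><p>a</p><p>b</p></body>"
soup = BeautifulSoup(html, "html.parser")

title_text = soup.title.string   # tag with only text inside
head_text = soup.head.string     # tag with exactly one child tag
body_text = soup.body.string     # tag with several children

print(title_text, head_text, body_text)
```

Code that calls .string on an arbitrary tag should therefore be prepared to receive None.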

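4. Multiple strings

.strings

Returns all the strings in the document; it has to be iterated over:

for string in soup.strings:
    print repr(string)

.stripped_strings

The output may contain many spaces or blank lines; .stripped_strings removes the extra whitespace:

for string in soup.stripped_strings:
    print repr(string)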

Output:

u"The Dormouse's story"
u"The Dormouse's story"
u'Once upon a time there were three little sisters; and their names were'
u','
u'Lacie'
u'and'
u'Tillie'
u';\nand they lived at the bottom of a well.'
u'...'
[Finished in 0.2s]
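
A side-by-side comparison of the two attributes may help. The sketch below assumes Python 3 with bs4 installed, and the HTML string with deliberately messy whitespace is made up for the example.

```python
from bs4 import BeautifulSoup

# Hypothetical snippet with extra spaces and whitespace-only text nodes.
html = "<p>  Hello </p>\n<p>\n</p><p>World</p>"
soup = BeautifulSoup(html, "html.parser")

raw = list(soup.strings)             # every text node, whitespace included
clean = list(soup.stripped_strings)  # stripped; whitespace-only nodes skipped

for s in raw:
    print(repr(s))
print(clean)
```

Strings that consist only of whitespace are dropped entirely by .stripped_strings, and the remaining ones have leading and trailing whitespace removed.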

5. Parent node

.parent

print soup.p.parent.name

print soup.head.title.string.parent.name

#body

#title
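
The same idea on a smaller document, as a sketch assuming Python 3 with bs4 installed and an invented HTML string: a tag's parent is the enclosing tag, a string's parent is the tag that contains it, and the top-level BeautifulSoup object has no parent.

```python
from bs4 import BeautifulSoup

# Hypothetical snippet for illustration only.
html = "<body><p><b>x</b></p></body>"
soup = BeautifulSoup(html, "html.parser")

tag_parent = soup.b.parent.name            # parent of the <b> tag
string_parent = soup.b.string.parent.name  # parent of the text node "x"
top_parent = soup.parent                   # the soup object itself is the root

print(tag_parent, string_parent, top_parent)
```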

6. Sibling nodes, previous and next nodes, and so on are omitted here.
