Fixing the error that occurs when reading a web page with BeautifulSoup

I had just started learning BeautifulSoup and ran into an error when parsing the content of a web page after reading it. Here is the code I ran:

#!/usr/bin/python
# -*- coding: UTF-8 -*-
from bs4 import BeautifulSoup
from urllib2 import urlopen
WebSite='http://www.weather.com.cn/weather/101010100.shtml'
soup = BeautifulSoup(WebSite,"html.parser")#"html.parser",,from_encoding="utf-8"
print soup.prettify()

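I wanted to display the content of the given page, but running the program printed the following warning, followed by nothing but the URL itself: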

/usr/lib/python2.7/dist-packages/bs4/__init__.py:282: UserWarning: "http://www.weather.com.cn/weather/101010100.shtml" looks like a URL. Beautiful Soup is not an HTTP client. You should probably use an HTTP client like requests to get the document behind the URL, and feed that document to Beautiful Soup.
  ' that document to Beautiful Soup.' % decoded_markup
http://www.weather.com.cn/weather/101010100.shtml

I eventually found the answer on Stack Overflow: https://stackoverflow.com/questions/24768858/beautifulsoup-responses-with-error

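The problem is caused by this statement in the program: soup = BeautifulSoup(WebSite,"html.parser"). BeautifulSoup parses whatever it is given as markup and never downloads anything, so the URL string itself was treated as the document. The statement should be: soup = BeautifulSoup(urlopen(WebSite),"html.parser"), which passes in the response returned by urlopen so that BeautifulSoup receives the page's actual HTML.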
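
To illustrate why the URL came back unchanged, here is a small Python 3 sketch. It does not use Beautiful Soup itself. It uses the standard library's html.parser module, which is the parser that the "html.parser" argument selects. The names TagAndTextCollector and summarize are hypothetical helpers made up for this illustration.

```python
from html.parser import HTMLParser


class TagAndTextCollector(HTMLParser):
    """Hypothetical helper: records the start tags and text the parser sees."""

    def __init__(self):
        super().__init__()
        self.tags = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

    def handle_data(self, data):
        self.text.append(data)


def summarize(markup):
    """Parse `markup` and return (list of start tags, concatenated text)."""
    collector = TagAndTextCollector()
    collector.feed(markup)
    collector.close()  # flush any text still buffered by the parser
    return collector.tags, "".join(collector.text)


url = "http://www.weather.com.cn/weather/101010100.shtml"
url_tags, url_text = summarize(url)            # the URL string parsed as markup
html_tags, html_text = summarize("<p>hi</p>")  # real markup, for comparison
```

Parsed as markup, the URL string contains no tags at all, only a single run of text, which matches the bare URL that prettify() printed in the output above.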

The complete corrected code is:

#!/usr/bin/python
# -*- coding: UTF-8 -*-
from bs4 import BeautifulSoup
from urllib2 import urlopen
WebSite='http://www.weather.com.cn/weather/101010100.shtml'
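# Pass the response from urlopen(), not the URL string, so BeautifulSoup gets the page's HTML.
# Optional third argument: from_encoding="utf-8"
soup = BeautifulSoup(urlopen(WebSite), "html.parser")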
print soup.prettify()
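
The script above targets Python 2. In Python 3 the urllib2 module no longer exists (urlopen lives in urllib.request) and print is a function. The following is a minimal sketch of the same fix for Python 3, assuming bs4 is installed. The function names fetch_html and prettify_page and the 10-second timeout are assumptions chosen for illustration and are not part of the original script.

```python
from urllib.request import urlopen


def fetch_html(url, timeout=10):
    """Download the document behind `url` and return its raw bytes."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


def prettify_page(url):
    """Fetch `url` and return Beautiful Soup's indented rendering of it."""
    # Imported here so that fetch_html() stays usable without bs4 installed.
    from bs4 import BeautifulSoup

    # Pass the downloaded markup, not the URL string, to BeautifulSoup.
    soup = BeautifulSoup(fetch_html(url), "html.parser")
    return soup.prettify()
```

Calling print(prettify_page('http://www.weather.com.cn/weather/101010100.shtml')) should then print the page's indented markup much as the Python 2 script does, provided the site is reachable.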