[Python] Web Scraping with urllib in Python 3
阿新 • Published: 2017-12-08
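The following are three ways to fetch a page with urllib:
① First Method
The simplest approach: call urlopen directly
② Adding data and an HTTP header
Use a Request object
③ CookieJar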
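The second method mentions data as well as an HTTP header, so a small sketch of a Request carrying both may help. The URL and form fields here are hypothetical placeholders that do not come from the original post, and the request is only built and inspected, never sent.

```python
import urllib.parse
import urllib.request

# Hypothetical endpoint and form fields, used only for illustration.
url = "http://www.example.com/search"
form = {"q": "urllib", "page": "1"}

# urlencode builds the form string; Request expects the body as bytes.
data = urllib.parse.urlencode(form).encode("utf-8")

# Supplying data makes urllib use POST instead of GET.
request = urllib.request.Request(url, data=data)
request.add_header("User-Agent", "Mozilla/5.0")

# Inspect the request without sending it.
print(request.get_method())              # POST
print(request.data)                      # b'q=urllib&page=1'
print(request.get_header("User-agent"))  # Mozilla/5.0
```

add_header stores the header name in capitalized form (User-agent), which is why the lookup uses that spelling. Passing this object to urllib.request.urlopen(request) would actually send it.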
import urllib.request
from http import cookiejar

url = "http://www.baidu.com"

print("First Method")
response1 = urllib.request.urlopen(url)
# Print the HTTP status code
print(response1.getcode())
print(len(response1.read()))

print("Second Method")
request = urllib.request.Request(url)
request.add_header("User-Agent", "Mozilla/5.0")
# Pass the Request object (not the bare URL) so the header is actually sent
response2 = urllib.request.urlopen(request)
# Print the HTTP status code
print(response2.getcode())
print(len(response2.read()))

print("Third Method")
# Create a CookieJar instance to store cookies
cj = cookiejar.CookieJar()
# Use urllib.request's HTTPCookieProcessor to create the cookie handler
handler = urllib.request.HTTPCookieProcessor(cj)
# Build an opener from the cookie handler
opener = urllib.request.build_opener(handler)
# opener.open works like urllib.request.urlopen and also accepts a Request object
response3 = opener.open(url)
# Print the HTTP status code
print(response3.getcode())
print(response3.read())
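To see what the CookieJar in the third method actually stores, here is a rough, self-contained sketch. It assumes a throwaway local server, the hypothetical CookieHandler class below, which sets a single made-up cookie, so that the example runs without depending on an external site. None of these names come from the original post.

```python
import threading
import urllib.request
from http import cookiejar
from http.server import BaseHTTPRequestHandler, HTTPServer


class CookieHandler(BaseHTTPRequestHandler):
    """Hypothetical local test server that sets one cookie on every GET."""

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        self.send_header("Set-Cookie", "session=abc123")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence per-request logging


# Port 0 lets the OS pick a free port.
server = HTTPServer(("127.0.0.1", 0), CookieHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = "http://127.0.0.1:%d/" % server.server_address[1]

cj = cookiejar.CookieJar()
opener = urllib.request.build_opener(
    urllib.request.ProxyHandler({}),  # empty dict: ignore any system proxy
    urllib.request.HTTPCookieProcessor(cj),
)
response = opener.open(url)
status = response.getcode()
response.read()

# The jar is iterable; each item is a Cookie with name and value attributes.
cookies = {c.name: c.value for c in cj}
print(status, cookies)

server.shutdown()
server.server_close()
```

Later requests made through the same opener send the stored cookies back automatically, which is the main reason to build an opener rather than call urlopen directly.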