
[Python] Web Crawler Development with urllib in Python 3


The following are three ways to fetch a page with urllib.

①First Method

The simplest approach: pass the URL straight to urlopen.

②Adding data and HTTP headers

Use a Request object.
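As a rough sketch of the "adding data" part, the snippet below builds a Request that carries URL-encoded form data along with a User-Agent header, then inspects it without sending anything. The helper name build_post_request and the form field q are made up for illustration; they do not come from the original post.

```python
import urllib.parse
import urllib.request


def build_post_request(url, form_fields, user_agent="Mozilla/5.0"):
    """Build (but do not send) a Request carrying form data and a User-Agent."""
    # urlencode produces a str; Request expects the body as bytes.
    data = urllib.parse.urlencode(form_fields).encode("utf-8")
    request = urllib.request.Request(url, data=data)
    request.add_header("User-Agent", user_agent)
    return request


# Hypothetical form field "q"; nothing is sent over the network here.
request = build_post_request("http://www.baidu.com", {"q": "python"})
print(request.get_method())
print(request.data)
# Request stores header names capitalized, hence "User-agent" in the lookup.
print(request.get_header("User-agent"))
```

Passing the returned object to urllib.request.urlopen would send it as a POST, because urllib switches from GET to POST whenever data is not None.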

③CookieJar

import urllib.request
from http import cookiejar
url = "http://www.baidu.com"

print("First Method")

response1 = urllib.request.urlopen(url)
# Print the HTTP status code
print(response1.getcode())
print(len(response1.read()))

print("Second Method")

request = urllib.request.Request(url)
# Send a browser-like User-Agent header
request.add_header("User-Agent", "Mozilla/5.0")
# Pass the Request object, not the bare URL, so the header is actually sent
response2 = urllib.request.urlopen(request)
# Print the HTTP status code
print(response2.getcode())
print(len(response2.read()))

print("Third Method")

# Create a CookieJar instance to store cookies
cj = cookiejar.CookieJar()
# Use urllib.request's HTTPCookieProcessor to create a cookie handler (CookieHandler)
handler = urllib.request.HTTPCookieProcessor(cj)
# Build an opener from the cookie handler
opener = urllib.request.build_opener(handler)
# open() works like urllib.request.urlopen and also accepts a Request object
response3 = opener.open(url)
# Print the HTTP status code
print(response3.getcode())
print(response3.read())
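
Once a request has gone through a cookie-aware opener, any cookies set by the server sit in the CookieJar, which can be iterated directly. A minimal sketch, assuming the hypothetical helper names make_cookie_opener, list_cookies, and fetch_with_cookies (none of them appear in the original post):

```python
import urllib.request
from http import cookiejar


def make_cookie_opener():
    """Return an opener plus the CookieJar it stores cookies in."""
    jar = cookiejar.CookieJar()
    handler = urllib.request.HTTPCookieProcessor(jar)
    opener = urllib.request.build_opener(handler)
    return opener, jar


def list_cookies(jar):
    """Format every cookie in the jar as 'name=value'."""
    return [f"{cookie.name}={cookie.value}" for cookie in jar]


def fetch_with_cookies(url):
    """Open the URL (needs network access); return status code and cookies."""
    opener, jar = make_cookie_opener()
    with opener.open(url) as response:
        status = response.getcode()
    return status, list_cookies(jar)


# No request has been made yet, so the jar starts out empty.
opener, jar = make_cookie_opener()
print(list_cookies(jar))
```

Calling fetch_with_cookies("http://www.baidu.com") would perform the actual request; which cookies come back depends entirely on the server.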
