Python: urllib.request vs. requests, usage and differences explained

阿新 • Published: 2020-05-07

urllib.request

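As we know, the urlopen() method can send the most basic kind of request, but in real applications that alone is usually not enough. We often need to add parameters such as headers, and for that we build the request with the more capable Request class.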

When no other parameters need to be configured, you can send a simple web request directly with urlopen().

Making a simple request

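import urllib.request

url = 'https://www.douban.com'
webPage = urllib.request.urlopen(url)  # sends a GET request
print(webPage)                         # an http.client.HTTPResponse object, not the page itself
data = webPage.read()                  # the body as bytes
print(data)
print(data.decode('utf-8'))            # decode the bytes into a str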

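urlopen() returns an http.client.HTTPResponse object, which needs further processing with read(). read() returns the body as bytes, so we then call decode() on it, usually with utf-8. Only after these steps do we get the web page as text.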
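Hard-coding utf-8 works for Douban, but other pages may declare a different charset. Below is a minimal sketch, assuming we prefer the charset the server announces in its Content-Type header and fall back to UTF-8 otherwise. `read_text` is a hypothetical helper name, and a `data:` URL (which urlopen() also accepts) stands in for a real page so the demo runs offline:

```python
import urllib.request


def read_text(response, default="utf-8"):
    """Read a urlopen() response and decode it with the declared charset.

    Hypothetical helper: falls back to `default` when the Content-Type
    header carries no charset parameter.
    """
    raw = response.read()  # bytes, exactly as received
    charset = response.headers.get_content_charset() or default
    return raw.decode(charset)


if __name__ == "__main__":
    # Offline stand-in for a real page such as https://www.douban.com
    url = "data:text/html;charset=utf-8,%E8%B1%86%E7%93%A3"
    with urllib.request.urlopen(url) as resp:
        print(read_text(resp))  # 豆瓣
```

For a real HTTP response, `resp.headers` is an http.client.HTTPMessage, which provides the same get_content_charset() method.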

Adding headers

import urllib.request

url = 'https://www.douban.com'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.186 Safari/537.36',
}
req = urllib.request.Request(url=url, headers=headers)  # a Request object, not a response
webPage = urllib.request.urlopen(req)                    # this call actually sends it
print(webPage.read().decode('utf-8'))

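Note that the Request class gives us a urllib.request.Request object rather than a response: it only describes the request, and urlopen() is what actually sends it.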
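Because a Request only describes the request, we can inspect it before anything is sent. A short sketch (the User-Agent value is a placeholder). One detail worth knowing: add_header(), which Request also uses for its headers argument, normalises header names with str.capitalize(), so 'User-Agent' is stored as 'User-agent':

```python
import urllib.request

# Build the request object; no network traffic happens here.
req = urllib.request.Request(
    "https://www.douban.com",
    headers={"User-Agent": "Mozilla/5.0 (placeholder)"},
)
req.add_header("Accept-Language", "zh-CN")  # headers can also be added later

print(type(req))                     # <class 'urllib.request.Request'>
print(req.get_method())              # GET, since no data is attached
print(req.get_header("User-agent"))  # looked up under the capitalized name
print(req.header_items())            # every header that will be sent
```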

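When scraping pages, we usually need to add extra information while building the HTTP request, such as a User-Agent or cookies, or route the request through a proxy server. These are often necessary to get past a site's anti-scraping measures.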
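Simple headers such as User-Agent or Cookie can go straight into Request's headers argument. For cookies that persist across requests, or for a proxy, urllib uses handlers combined with urllib.request.build_opener(). A minimal sketch, assuming one opener per scraping session; `make_opener` is a hypothetical helper and the proxy address is a placeholder:

```python
import http.cookiejar
import urllib.request


def make_opener(proxy=None, user_agent="Mozilla/5.0 (placeholder)"):
    """Return (opener, cookie_jar) for a scraping session.

    Cookies set by the server are stored in the jar and sent back on later
    requests made through the same opener. `proxy` is an optional
    "http://host:port" string (hypothetical value in the usage below).
    """
    jar = http.cookiejar.CookieJar()
    handlers = [urllib.request.HTTPCookieProcessor(jar)]
    if proxy:
        # Send both http and https traffic through the same proxy.
        handlers.append(urllib.request.ProxyHandler({"http": proxy, "https": proxy}))
    opener = urllib.request.build_opener(*handlers)
    # Default headers added to every request made via opener.open(),
    # unless the request already sets that header itself.
    opener.addheaders = [("User-Agent", user_agent)]
    return opener, jar


opener, jar = make_opener(proxy="http://127.0.0.1:8080")  # placeholder proxy
```

A real fetch would then be `opener.open("https://www.douban.com").read()`, and any cookies the site sets end up in `jar`.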

requests

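Generally, when writing scrapers in Python, the requests library is the better choice because it is more convenient than urllib: requests builds and sends a GET or POST request in a single call (requests.get(), requests.post()), whereas with urllib.request you first construct the request and then send it with urlopen().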
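To make the contrast concrete: requests.get(url, params=..., headers=...) encodes the query string and sends the request in one call, while with urllib we do those steps ourselves. A minimal sketch; `build_get` is a hypothetical helper, and the example.com URL and parameters are placeholders:

```python
import urllib.parse
import urllib.request


def build_get(url, params=None, headers=None):
    """Build (but do not send) a GET Request, roughly what
    requests.get(url, params=params, headers=headers) prepares before sending."""
    if params:
        # Append the URL-encoded query string, joining onto an existing one if present.
        sep = "&" if "?" in url else "?"
        url = url + sep + urllib.parse.urlencode(params)
    return urllib.request.Request(url, headers=headers or {})


req = build_get("https://example.com/search", params={"q": "python"})
print(req.full_url)  # https://example.com/search?q=python
# Sending is a separate step: urllib.request.urlopen(req)
```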

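import requests

url = 'https://www.douban.com'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.186 Safari/537.36',
}
# One call builds and sends each request; params= is encoded into the query string.
get_response = requests.get(url, headers=headers, params=None)
# data= sends form fields, json= sends a JSON body (both None here).
post_response = requests.post(url, data=None, json=None)
print(post_response)         # the Response object, showing its status code
print(get_response.text)     # body decoded to str
print(get_response.content)  # raw body as bytes
# json() is a method; Douban's home page is HTML, so calling it here would raise.
# print(get_response.json())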

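get_response.text is the body as a str, decoded with the encoding requests works out for the response (get_response.encoding).

get_response.content is the raw body as bytes, which you decode yourself; otherwise it serves the same purpose as get_response.text.

get_response.json() (a method, so note the parentheses) parses the body as JSON and returns the resulting Python object, usually a dict or list. It raises an exception if the body is not valid JSON.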
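Since json() raises on bodies that are not JSON, a common pattern is to check the Content-Type header first. A minimal sketch: `parse_body` is a hypothetical helper that should work on a requests Response because it only touches .headers, .json() and .text. To run without network access (and without requests installed), it is exercised here with a small stand-in class, `FakeResponse`, that only imitates those attributes:

```python
import json


def parse_body(response):
    """Return parsed JSON for JSON responses, otherwise the decoded text."""
    content_type = response.headers.get("Content-Type", "")
    if "json" in content_type.lower():
        return response.json()  # dict / list / ...
    return response.text        # str, e.g. an HTML page


class FakeResponse:
    """Offline stand-in for requests.Response, for demonstration only."""

    def __init__(self, body: bytes, content_type: str, encoding: str = "utf-8"):
        self.content = body                        # raw bytes, like .content
        self.headers = {"Content-Type": content_type}
        self.text = body.decode(encoding)          # decoded str, like .text

    def json(self):
        return json.loads(self.text)               # parsed object, like .json()


print(parse_body(FakeResponse(b'{"ok": true}', "application/json")))  # {'ok': True}
print(parse_body(FakeResponse(b"<html></html>", "text/html; charset=utf-8")))
```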

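In short, requests is a higher-level HTTP library (built on the third-party urllib3 package rather than on the standard library's urllib), which makes it more convenient to use, so in practice prefer requests wherever you can.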

Supplement: the difference between urllib.request.Request() and urllib.request.urlopen() in Python

Compared with urllib.request.urlopen(), urllib.request.Request wraps the request one step further. Below is an excerpt of the Request class source code (from Lib/urllib/request.py in CPython):

class Request:

    # The key part: the constructor lists the parameters Request accepts when
    # wrapping a request: url, data, headers, origin_req_host, unverifiable, method.
    # (Excerpt only: request_host() is a module-level function and add_header()
    # is a method defined further down in the class.)

    def __init__(self, url, data=None, headers={},
                 origin_req_host=None, unverifiable=False,
                 method=None):
        self.full_url = url
        self.headers = {}
        self.unredirected_hdrs = {}
        self._data = None
        self.data = data
        self._tunnel_host = None
        for key, value in headers.items():
            self.add_header(key, value)
        if origin_req_host is None:
            origin_req_host = request_host(self)
        self.origin_req_host = origin_req_host
        self.unverifiable = unverifiable
        if method:
            self.method = method

    # ... remaining methods omitted
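The data and method arguments in that constructor decide the HTTP verb: Request.get_method() returns method if one was given, otherwise POST when data is not None and GET when it is (so even empty bytes count as data). A quick offline check; the URL is a placeholder and nothing is sent:

```python
import urllib.request

url = "https://example.com/api"  # placeholder URL, never contacted

plain = urllib.request.Request(url)
with_data = urllib.request.Request(url, data=b"a=1")
explicit = urllib.request.Request(url, data=b"a=1", method="PUT")

for name, req in [("plain", plain), ("with_data", with_data), ("explicit", explicit)]:
    print(name, req.get_method())
# plain GET
# with_data POST
# explicit PUT
```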

We can use it like this (the example below mimics the request sent by the Youdao dictionary web translator):

import urllib.parse
import urllib.request

# Text to translate (example value)
word = 'hello'

# Request URL
url = "http://fanyi.youdao.com/translate?smartresult=dict&smartresult=rule"

# Request headers
request_headers = {
    'Host': 'fanyi.youdao.com',
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.108 Safari/537.36",
}

# Form data sent to the server
form_data = {
    "i": word, "from": "AUTO", "to": "AUTO", "smartresult": "dict", "doctype": "json",
    "version": "2.1", "keyfrom": "fanyi.web", "action": "FY_BY_REALTIME", "typoResult": "false",
}

# Data sent via POST must be bytes or an iterable of bytes, not a str
form_data = urllib.parse.urlencode(form_data).encode()

# Build the Request object (passing data makes it a POST)
req = urllib.request.Request(url, data=form_data, headers=request_headers)

# Send the request
response = urllib.request.urlopen(req)
data = response.read().decode()
print(data)
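The key step in that example is turning the form dict into bytes. A small offline check of what urlencode() produces (the field values here are examples): non-ASCII text is percent-encoded as UTF-8, and parse_qs() recovers the original fields:

```python
from urllib.parse import parse_qs, urlencode

form = {"i": "你好", "from": "AUTO", "to": "AUTO"}

body = urlencode(form).encode()  # str -> bytes, ready for Request(data=...)
print(body)  # b'i=%E4%BD%A0%E5%A5%BD&from=AUTO&to=AUTO'

# Decoding the body again gives back every field (values come back as lists).
print(parse_qs(body.decode()))
```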

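So, in short: if the request needs no extra parameters, we can simply call urllib.request.urlopen(); if it needs further wrapping (headers, form data, a specific method), we first build it with urllib.request.Request() and then pass that object to urlopen().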

That's all for this look at how to use urllib.request and requests in Python and how they differ; I hope it serves as a useful reference.
