Handling headers and network timeouts in Python crawlers

阿新 • Published: 2020-06-19

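1. Handling request headers

　　Sometimes a request to a server, whether GET or POST, comes back with a 403 error. This means the server has refused the request, often as an anti-crawler measure. In that case we can send header information that mimics a browser, which in many cases gets around the restriction.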

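import requests

# URL of the page to crawl
url = 'https://www.baidu.com/'
# Header information that mimics a browser
# (the Windows version in this User-Agent is an assumed example value)
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64; rv:59.0) Gecko/20100101 Firefox/59.0'}
# Send the request
response = requests.get(url, headers=headers)
# Print the page source as bytes
print(response.content)

Result: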

b'<!DOCTYPE html><!--STATUS OK-->\n\n\n  \n  \n              <html><head><meta http-equiv="Content-Type" content="text/html;charset=utf-8"><meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1"><meta content="always" name="referrer"><meta name="theme-color" content="#2932e1"><meta name="description" content="\xe5\x85\xa8\xe7\x90\x83\xe6\x9c\x80\xe5\xa4\xa7\xe7\x9a\x84\xe4\xb8\xad\xe6\x96\x87\xe6\x90\x9c\xe7\xb4\xa2\xe5\xbc\x95\xe6\x93\x8e\xe3\x80\x81\xe8\x87\xb4\xe5\x8a\x9b\xe4\xba\x8e\xe8\xae\xa9\xe7\xbd\x91\xe6\xb0\x91\xe6\x9b\xb4\xe4\xbe\xbf\xe6\x8d\xb7\xe5\x9c\xb0\xe8\x8e\xb7\xe5\x8f\x96\xe4\xbf\xa1\xe6\x81\xaf\xef\xbc\x8c\xe6\x89\xbe\xe5\x88\xb0\xe6\x89\x80\xe6\xb1\x82\xe3\x80\x82\xe7\x99\xbe\xe5\xba\xa6\xe8\xb6\x85\xe8\xbf\x87\xe5\x8d\x83\xe4\xba\xbf\xe7\x9a\x84\xe4\xb8\xad\xe6\x96\x87\xe7\xbd\x91\xe9\xa1\xb5\xe6\x95\xb0\xe6\x8d\xae\xe5\xba\x93\xef\xbc\x8c\xe5\x8f\xaf\xe4\xbb\xa5\xe7\x9e\xac\xe9\x97\xb4\xe6\x89\xbe\xe5\x88\xb0\xe7\x9b\xb8\xe5\x85\xb3\xe7\x9a\x84\xe6\x90\x9c\xe7\xb4\xa2\xe7\xbb\x93\xe6\x9e\x9c\xe3\x80\x82"><link rel="shortcut icon" href="/favicon.ico" rel="external nofollow" type="image/x-icon" /><link rel="search" type="application/opensearchdescription+xml" href="/content-search.xml" rel="external nofollow" title="\xe7\x99\xbe\xe5\xba\xa6\xe6\x90\x9c\xe7\xb4\xa2" /><link rel="icon" sizes="any" mask href="//www.baidu.com/img/baidu_85beaf5496f291521eb75ba38eacbd87.svg" rel="external nofollow" ><link rel="dns-prefetch" href="//dss0.bdstatic.com" rel="external nofollow" /><link rel="dns-prefetch" href="//dss1.bdstatic.com" rel="external nofollow" /><link rel="dns-prefetch" href="//ss1.bdstatic.com" rel="external nofollow" /><link rel="dns-prefetch" href="//sp0.baidu.com" rel="external nofollow" /><link rel="dns-prefetch" href="//sp1.baidu.com" rel="external nofollow" /><link rel="dns-prefetch" href="//sp2.baidu.com" rel="external nofollow" />
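
When several requests need the same headers, one option is to set them once on a requests.Session instead of passing them to every call. The sketch below is an illustration, not part of the example above: the names BROWSER_HEADERS and make_session are hypothetical, the User-Agent value is an assumed Firefox 59 string, and the request is only prepared, never sent.

```python
import requests

# Hypothetical header set; the User-Agent value is an assumed Firefox 59 string.
BROWSER_HEADERS = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64; rv:59.0) '
                  'Gecko/20100101 Firefox/59.0',
}


def make_session(headers=BROWSER_HEADERS):
    """Return a Session that sends the given headers with every request."""
    session = requests.Session()
    session.headers.update(headers)
    return session


session = make_session()
# Prepare (but do not send) a request to inspect the headers that would go out.
prepared = session.prepare_request(requests.Request('GET', 'https://www.baidu.com/'))
print(prepared.headers['User-Agent'])
```

Calling session.get(url) afterwards would send these headers automatically, and a per-request headers argument can still override them.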

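2. Network timeouts

　　When visiting a web page, if the page does not respond for a long time, the system decides that the request has timed out and the page cannot be opened. The code below simulates a network timeout.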

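import requests

# Send 50 requests in a loop
for a in range(1, 51):
    try:
        # Set the timeout to 0.5 seconds
        response = requests.get('https://www.baidu.com/', timeout=0.5)
        # Print the status code
        print(response.status_code)
    # Catch any exception raised by the request
    except Exception as e:
        # Print the exception information
        print('Exception: ' + str(e))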

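The code above sends 50 requests in a loop with a timeout of 0.5 seconds. If the server does not respond within 0.5 seconds, the request is treated as timed out and the program prints the timeout information to the console.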
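
In practice a timed-out request is often retried a few times instead of only being logged. The following is a minimal sketch under stated assumptions: the helper name get_with_retries, its parameters, and the default values are all hypothetical, and the get argument exists only so the helper can be exercised without a network connection.

```python
import time

import requests
from requests.exceptions import Timeout


def get_with_retries(url, retries=3, timeout=(3.05, 5), delay=1.0, get=requests.get):
    """Send a GET request, retrying only when it times out.

    `timeout` is handed to requests unchanged: a single number, or a
    (connect timeout, read timeout) tuple, both in seconds.
    `get` defaults to requests.get and is injectable for testing.
    """
    if retries < 1:
        raise ValueError('retries must be at least 1')
    for attempt in range(1, retries + 1):
        try:
            return get(url, timeout=timeout)
        except Timeout:
            if attempt == retries:
                raise  # give up after the last attempt
            time.sleep(delay)  # brief pause before trying again
```

A call such as get_with_retries('https://www.baidu.com/', timeout=0.5) would then make up to three attempts before re-raising the timeout exception.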

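　　On the subject of network exceptions, the requests module also provides exception classes for common failures. The example below uses three of them: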

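import requests
# Import three exception classes from the requests.exceptions module
from requests.exceptions import ReadTimeout, HTTPError, RequestException

# Send 50 requests in a loop
for a in range(1, 51):
    try:
        # Set the timeout to 0.5 seconds
        response = requests.get('https://www.baidu.com/', timeout=0.5)
        # Print the status code
        print(response.status_code)
    # Read timeout
    except ReadTimeout:
        print('timeout')
    # HTTP error (raised only if response.raise_for_status() is called)
    except HTTPError:
        print('httperror')
    # Any other request exception
    except RequestException:
        print('reqerror')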
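
The order of the except clauses matters here, because ReadTimeout and HTTPError are both subclasses of RequestException, and HTTPError only appears when raise_for_status() is called on a response. The sketch below illustrates both points without sending a request. The classify helper is hypothetical, and building Response objects by hand is an assumption made purely for demonstration.

```python
import requests
from requests.exceptions import HTTPError, ReadTimeout, RequestException

# Both specific classes derive from RequestException, so they must be
# listed before it in a chain of except clauses.
print(issubclass(ReadTimeout, RequestException))
print(issubclass(HTTPError, RequestException))


def classify(response):
    """Return 'ok' for a successful response, 'httperror' for a 4xx/5xx one."""
    try:
        response.raise_for_status()  # raises HTTPError for 4xx and 5xx codes
    except HTTPError:
        return 'httperror'
    return 'ok'


# Hand-built responses stand in for real server replies (demonstration only).
forbidden = requests.Response()
forbidden.status_code = 403
success = requests.Response()
success.status_code = 200

print(classify(forbidden))
print(classify(success))
```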

That is all for this article. We hope it helps with your learning, and we hope you will continue to support us.