Setting up an IP proxy pool in Scrapy (a custom IP proxy pool)

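Before starting, you should have a reasonably clear picture of a Scrapy project's directory structure, or at least have built one demo project. The code below assumes a Scrapy project named myproxies.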

I. Manually updating the IP pool

1. Add the IP pool to the settings file (settings.py):

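# Each entry's "ipaddr" is a proxy in host:port form. These sample
# addresses are likely outdated; replace them with proxies you can use.
# (A duplicate of 61.152.81.193:9100 was removed: duplicates make
# random.choice pick that proxy more often.)
IPPOOL = [
    {"ipaddr": "61.129.70.131:8080"},
    {"ipaddr": "61.152.81.193:9100"},
    {"ipaddr": "120.204.85.29:3128"},
    {"ipaddr": "219.228.126.86:8123"},
    {"ipaddr": "218.82.33.225:53853"},
    {"ipaddr": "223.167.190.17:42789"},
]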
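Because the pool is edited by hand, a small check can catch typos and duplicates before the crawl starts. The sketch below is a hypothetical helper (build_ippool is not part of Scrapy or the original post); it assumes you keep plain "host:port" strings and turns them into the dict format used by IPPOOL.

```python
def build_ippool(addresses):
    """Hypothetical helper: turn 'host:port' strings into IPPOOL-style dicts.

    Raises ValueError on malformed entries and silently drops duplicates,
    which would otherwise make random.choice favour one proxy.
    """
    pool, seen = [], set()
    for addr in addresses:
        addr = addr.strip()
        host, sep, port = addr.rpartition(":")
        # Require a non-empty host, a ':' separator and a valid TCP port
        if not sep or not host or not port.isdigit() or not 0 < int(port) < 65536:
            raise ValueError(f"not a host:port address: {addr!r}")
        if addr not in seen:
            seen.add(addr)
            pool.append({"ipaddr": addr})
    return pool
```

In settings.py this could be used as IPPOOL = build_ippool(["61.129.70.131:8080", "61.152.81.193:9100"]), so the middleware still sees the same list-of-dicts structure.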

2. Modify the middleware file, middlewares.py:

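import random

from myproxies.settings import IPPOOL  # the pool defined in step 1


class MyproxiesSpiderMiddleware(object):
    # Despite its name, this class is registered as a *downloader* middleware:
    # Scrapy calls process_request() for every outgoing request.

    def process_request(self, request, spider):
        # Pick a random proxy from the pool for this request
        thisip = random.choice(IPPOOL)
        spider.logger.debug("Using proxy: %s", thisip["ipaddr"])
        # Scrapy sends the request through the proxy URL stored in meta["proxy"]
        request.meta["proxy"] = "http://" + thisip["ipaddr"]
        # None means "keep processing this request normally"
        return None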
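Importing myproxies.settings directly ties the middleware to one module. A minimal alternative sketch, assuming the same IPPOOL setting, reads the pool through the crawler's settings in a from_crawler classmethod, which Scrapy calls when it builds the middleware. The class name RandomProxyMiddleware is hypothetical; it would need its own entry in DOWNLOADER_MIDDLEWARES in place of MyproxiesSpiderMiddleware.

```python
import random


class RandomProxyMiddleware:
    """Hypothetical downloader middleware that reads IPPOOL from crawler settings."""

    def __init__(self, pool):
        if not pool:
            # random.choice would fail on an empty pool, so fail early instead
            raise ValueError("IPPOOL is empty")
        self.pool = pool

    @classmethod
    def from_crawler(cls, crawler):
        # Settings.getlist returns the IPPOOL value from settings.py as a list
        return cls(crawler.settings.getlist("IPPOOL"))

    def process_request(self, request, spider):
        entry = random.choice(self.pool)
        request.meta["proxy"] = "http://" + entry["ipaddr"]
        spider.logger.debug("Using proxy %s", request.meta["proxy"])
        return None  # continue through the remaining middlewares
```

Going through crawler.settings also means per-spider overrides (for example via custom_settings) are picked up, which a direct module import would ignore.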

3. Register the middleware in DOWNLOADER_MIDDLEWARES in settings.py:

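DOWNLOADER_MIDDLEWARES = {
    # Lower numbers run process_request() first, so the custom middleware (125)
    # sets request.meta["proxy"] before HttpProxyMiddleware (543) sees the request.
    # The old scrapy.contrib.* path comes from pre-1.0 Scrapy and no longer works
    # in current releases; use scrapy.downloadermiddlewares.* instead.
    'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware': 543,
    'myproxies.middlewares.MyproxiesSpiderMiddleware': 125,
}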
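The effect of the numbers can be illustrated with a small stand-alone simulation. This is not Scrapy's own code: it is a simplified model that ignores how Scrapy merges these entries with its built-in DOWNLOADER_MIDDLEWARES_BASE. It encodes the documented rules: process_request() is called from the lowest number upward, process_response() in the reverse order, and an entry set to None is disabled.

```python
DOWNLOADER_MIDDLEWARES = {
    "scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware": 543,
    "myproxies.middlewares.MyproxiesSpiderMiddleware": 125,
}


def request_order(middlewares):
    """Paths in the order process_request() would see a request (simplified model)."""
    # A value of None disables a component, so it is skipped entirely
    enabled = {path: prio for path, prio in middlewares.items() if prio is not None}
    # Ascending priority number: closest to the engine runs first
    return sorted(enabled, key=enabled.get)


def response_order(middlewares):
    # process_response() travels back in the opposite direction
    return list(reversed(request_order(middlewares)))


print(request_order(DOWNLOADER_MIDDLEWARES))
```

With these values the custom middleware comes first, which is why it can set the proxy before HttpProxyMiddleware handles the request.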