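Setting up an IP proxy pool in Scrapy (custom IP proxy pool)
阿新 • Published: 2018-12-10
Before anything else, you should have a reasonably clear picture of the Scrapy project directory structure, and should have built at least one demo project.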
I. Manually updating the IP pool
1. Add the IP pool to the settings configuration file (settings.py):
IPPOOL = [
    {"ipaddr": "61.129.70.131:8080"},
    {"ipaddr": "61.152.81.193:9100"},
    {"ipaddr": "120.204.85.29:3128"},
    {"ipaddr": "219.228.126.86:8123"},
    {"ipaddr": "61.152.81.193:9100"},
    {"ipaddr": "218.82.33.225:53853"},
    {"ipaddr": "223.167.190.17:42789"},
]
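Because the pool is maintained by hand, it is easy to paste a malformed or repeated entry; the list above, for instance, contains 61.152.81.193:9100 twice, which makes random.choice pick that proxy twice as often. The following is a minimal sketch of a sanity check that could be run on the list before using it. The names clean_pool, HOST_PORT and CLEAN_IPPOOL are hypothetical helpers introduced here for illustration; they are not part of Scrapy or of the original article.

```python
import re

# Pool copied from the article; 61.152.81.193:9100 appears twice.
IPPOOL = [
    {"ipaddr": "61.129.70.131:8080"},
    {"ipaddr": "61.152.81.193:9100"},
    {"ipaddr": "120.204.85.29:3128"},
    {"ipaddr": "219.228.126.86:8123"},
    {"ipaddr": "61.152.81.193:9100"},
    {"ipaddr": "218.82.33.225:53853"},
    {"ipaddr": "223.167.190.17:42789"},
]

# Hypothetical pattern: dotted IPv4 address, a colon, then a port number.
HOST_PORT = re.compile(r"^(\d{1,3}(?:\.\d{1,3}){3}):(\d{1,5})$")


def clean_pool(pool):
    """Return entries with a valid 'ip:port' string, duplicates removed, order kept."""
    seen = set()
    cleaned = []
    for entry in pool:
        addr = entry.get("ipaddr", "").strip()
        match = HOST_PORT.match(addr)
        if not match or addr in seen:
            continue
        octets = [int(part) for part in match.group(1).split(".")]
        port = int(match.group(2))
        # Reject out-of-range octets and ports that the regex alone lets through.
        if all(0 <= octet <= 255 for octet in octets) and 0 < port <= 65535:
            seen.add(addr)
            cleaned.append({"ipaddr": addr})
    return cleaned


CLEAN_IPPOOL = clean_pool(IPPOOL)
print(len(IPPOOL), "entries before,", len(CLEAN_IPPOOL), "after cleaning")
```

This only checks the format of each entry; it does not test whether a proxy is actually reachable.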
2. Modify the middleware file middlewares.py
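import random

from myproxies.settings import IPPOOL


class MyproxiesSpiderMiddleware(object):
    # Despite its name, this class is registered as a downloader middleware.

    def __init__(self, ip=''):
        self.ip = ip

    def process_request(self, request, spider):
        # Pick a random proxy from the pool for every outgoing request.
        thisip = random.choice(IPPOOL)
        print("this is ip:" + thisip["ipaddr"])
        # Scrapy's HttpProxyMiddleware sends the request through meta["proxy"].
        request.meta["proxy"] = "http://" + thisip["ipaddr"]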
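To see what the middleware does to a request without starting a crawl, the selection logic can be exercised on its own. The sketch below is only an illustration under stated assumptions: FakeRequest is a hypothetical stand-in for scrapy.Request that carries nothing but a url and a meta dict, RandomProxyMiddleware is a renamed copy of the logic above that takes the pool as an argument, and the two-entry pool is a shortened copy of the one in settings.

```python
import random

# Shortened copy of the pool from settings, for illustration only.
IPPOOL = [
    {"ipaddr": "61.129.70.131:8080"},
    {"ipaddr": "120.204.85.29:3128"},
]


class FakeRequest:
    """Hypothetical stand-in for scrapy.Request: just a url and a meta dict."""

    def __init__(self, url):
        self.url = url
        self.meta = {}


class RandomProxyMiddleware:
    """Same idea as the middleware above, with the pool passed in explicitly."""

    def __init__(self, pool, rng=None):
        self.pool = list(pool)
        # An injectable random generator keeps the choice reproducible in tests.
        self.rng = rng or random.Random()

    def process_request(self, request, spider):
        chosen = self.rng.choice(self.pool)
        request.meta["proxy"] = "http://" + chosen["ipaddr"]
        # Returning None tells Scrapy to keep processing the request normally.
        return None


middleware = RandomProxyMiddleware(IPPOOL, rng=random.Random(0))
request = FakeRequest("http://example.com")
result = middleware.process_request(request, spider=None)
print(request.meta["proxy"])
```

Passing the pool in, rather than importing it at module level, is a design choice made here only so that the example runs without a Scrapy project.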
3. Configure DOWNLOADER_MIDDLEWARES in settings
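DOWNLOADER_MIDDLEWARES = {
    # 'myproxies.middlewares.MyCustomDownloaderMiddleware': 543,
    # Scrapy 1.0+ path; older releases used
    # 'scrapy.contrib.downloadermiddleware.httpproxy.HttpProxyMiddleware'.
    'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware': 543,
    # The lower number makes the custom middleware's process_request run first.
    'myproxies.middlewares.MyproxiesSpiderMiddleware': 125,
}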
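The numbers in this dict decide the calling order: Scrapy invokes process_request on downloader middlewares in increasing order of their value, so the custom middleware at 125 sets meta["proxy"] before HttpProxyMiddleware at 543 sees the request. The sketch below illustrates that ordering for the two entries above. request_order is a hypothetical helper written for this example, and it only sorts the project's own dict; a real crawl also merges in Scrapy's built-in DOWNLOADER_MIDDLEWARES_BASE.

```python
# The two entries from the settings above (using the Scrapy 1.0+ module path).
DOWNLOADER_MIDDLEWARES = {
    "scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware": 543,
    "myproxies.middlewares.MyproxiesSpiderMiddleware": 125,
}


def request_order(middlewares):
    """List enabled middleware paths in the order process_request is called."""
    # A value of None disables a middleware in Scrapy, so skip those entries.
    enabled = {path: prio for path, prio in middlewares.items() if prio is not None}
    return sorted(enabled, key=enabled.get)


for path in request_order(DOWNLOADER_MIDDLEWARES):
    print(DOWNLOADER_MIDDLEWARES[path], path)
```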