2017.08.04 Python Web Crawling with Scrapy, Practice II: Storing the Weather Forecast Data
1. Storing the data as JSON: formats such as JSON or CSV are more convenient for programs to consume, so we continue with how Scrapy saves its output, which again means working on the pipelines.py file.
(1) Create the file pipelines2json.py:
import time
import json
import codecs

class WeatherPipeline(object):
    def process_item(self, item, spider):
        # Append each item as one JSON line to a file named after today's date
        today = time.strftime('%Y%m%d', time.localtime())
        fileName = today + '.json'
        with codecs.open(fileName, 'a', encoding='utf8') as fp:
            line = json.dumps(dict(item), ensure_ascii=False) + '\n'
            fp.write(line)
        return item
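The JSON-lines logic above can be exercised on its own as a quick sanity check (the item fields here are made-up samples standing in for whatever your spider actually yields):

```python
# -*- coding: utf-8 -*-
import json
import time
import codecs

# A stand-in item; a real Scrapy item behaves like a dict here
item = {'cityDate': u'泉州 2017-08-04', 'weather': u'多云'}

fileName = time.strftime('%Y%m%d', time.localtime()) + '.json'
with codecs.open(fileName, 'a', encoding='utf8') as fp:
    # ensure_ascii=False keeps Chinese characters readable in the output file
    fp.write(json.dumps(dict(item), ensure_ascii=False) + '\n')

# Read the last line back and confirm it round-trips
with codecs.open(fileName, 'r', encoding='utf8') as fp:
    restored = json.loads(fp.readlines()[-1])
```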
(2) Modify settings.py, adding pipelines2json to ITEM_PIPELINES:
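Assuming the project is named weather and the class keeps the name WeatherPipeline, the entry in settings.py would look roughly like this (the exact module path depends on your project layout):

```python
# settings.py -- register the JSON pipeline (module path assumed)
ITEM_PIPELINES = {
    'weather.pipelines2json.WeatherPipeline': 300,
}
```

The number is the pipeline's priority; lower values run first when several pipelines are enabled.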
(3) Run the command scrapy crawl HQUSpider and check the output.
2. Storing the data in MySQL:
(1) Install MySQL on Linux: sudo apt-get install mysql-server mysql-client
(2) Log in to MySQL and check its character encoding:
mysql -u root -p
mysql> SHOW VARIABLES LIKE "character%";
(3) Rather than changing MySQL's global settings, specify the character encoding only when creating the database and table:
mysql> CREATE DATABASE scrapyDB CHARACTER SET 'utf8' COLLATE 'utf8_general_ci';
mysql> USE scrapyDB;
mysql> CREATE TABLE weather(
-> id INT AUTO_INCREMENT,
-> cityDate char(24),week char(6),
-> img char(20),
-> temperature char(20),
-> weather char(20),
-> wind char(20),
-> PRIMARY KEY(id)) ENGINE=InnoDB DEFAULT CHARSET=utf8;
(4) Create a regular user and grant it privileges on the database; at the mysql prompt, run:
mysql> GRANT ALL PRIVILEGES ON scrapyDB.* TO crawlUSER@'%' IDENTIFIED BY 'crawl123';
mysql> GRANT USAGE ON scrapyDB.* TO crawlUSER@localhost IDENTIFIED BY 'crawl123' WITH GRANT OPTION;
(5) Install the MySQLdb module on Windows:
Run the command: pip install MySQL-python
The installation fails with a build error.
So install Microsoft Visual C++ Compiler for Python 2.7 instead, from: https://www.microsoft.com/en-us/download/confirmation.aspx?id=44266
Installing VcForPython27 still leaves a problem: error: command 'C:\\Users\\\xb4\xba\xcc\xef\\AppData\\Local\\Programs\\Common\\Microsoft\\Visual C++ for Python\\9.0\\VC\\Bin\\amd64\\cl.exe' failed with exit status 2
Workaround: search for, or download from the page below, the prebuilt installer MySQL-python-1.2.3.win-amd64-py2.7.exe:
MySQL-python 1.2.3 for Windows and Python 2.7, 32bit and 64bit versions | codegood
http://www.codegood.com/archives/129
The installation succeeds!
(6) Edit pipelines2mysql.py:
# -*- coding: utf-8 -*-
# Define your item pipelines here
#
# Don't forget to add your pipeline to the ITEM_PIPELINES setting
# See: http://doc.scrapy.org/en/latest/topics/item-pipeline.html
import MySQLdb
import os.path

class WeatherPipeline(object):
    def process_item(self, item, spider):
        cityDate = item['cityDate'].encode('utf8')
        week = item['week'].encode('utf8')
        img = os.path.basename(item['img'])
        temperature = item['temperature'].encode('utf8')
        weather = item['weather'].encode('utf8')
        wind = item['wind'].encode('utf8')
        conn = MySQLdb.connect(
            host='remote host IP',   # fill in your MySQL server address
            port=3306,
            user='crawlUSER',
            passwd='password',       # fill in the password set above
            db='scrapyDB',
            charset='utf8'
        )
        cur = conn.cursor()
        # Parameterized query: MySQLdb escapes the values for us
        cur.execute("INSERT weather(cityDate,week,img,temperature,weather,wind) VALUES(%s,%s,%s,%s,%s,%s)",
                    (cityDate, week, img, temperature, weather, wind))
        conn.commit()
        cur.close()
        conn.close()
        return item
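The insert logic itself can be sketched and tested without a MySQL server by swapping in Python's built-in sqlite3 module as a stand-in (the table mirrors the CREATE TABLE above, the item values are made-up samples, and sqlite3 uses ? placeholders where MySQLdb uses %s):

```python
import sqlite3

# In-memory stand-in for the scrapyDB.weather table defined earlier
conn = sqlite3.connect(':memory:')
cur = conn.cursor()
cur.execute("""CREATE TABLE weather(
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    cityDate TEXT, week TEXT, img TEXT,
    temperature TEXT, weather TEXT, wind TEXT)""")

# Hypothetical scraped item, matching the fields the pipeline reads
item = {'cityDate': u'泉州 2017-08-04', 'week': u'星期五', 'img': 'd0.gif',
        'temperature': u'26C~34C', 'weather': u'多云', 'wind': u'微风'}

# Same parameterized-insert pattern as in the MySQLdb pipeline
cur.execute("INSERT INTO weather(cityDate,week,img,temperature,weather,wind) "
            "VALUES(?,?,?,?,?,?)",
            (item['cityDate'], item['week'], item['img'],
             item['temperature'], item['weather'], item['wind']))
conn.commit()

cur.execute("SELECT weather, wind FROM weather WHERE id=1")
row = cur.fetchone()
```

Note that opening a fresh connection for every item, as the pipeline above does, works but is wasteful; Scrapy pipelines usually open the connection once in open_spider and close it in close_spider.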
Running it raises an error.
Fix: remove the INTO from the SQL statement (the INSERT in the code above already omits it).
Done!