2017.08.04 Python Web Crawling with Scrapy, Practice II: Storing the Weather Forecast Data
1. Storing the data as JSON: formats such as JSON or CSV are more convenient for programs to consume, so we continue with how Scrapy saves its output, which again means working on the pipelines.py file.
(1) Create the file pipelines2json.py:
import time
import json
import codecs

class WeatherPipeline(object):
    def process_item(self, item, spider):
        # Append each item as one JSON line to a file named after today's date
        today = time.strftime('%Y%m%d', time.localtime())
        fileName = today + '.json'
        with codecs.open(fileName, 'a', encoding='utf8') as fp:
            line = json.dumps(dict(item), ensure_ascii=False) + '\n'
            fp.write(line)
        return item
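The JSON-lines logic above can be exercised on its own as a quick sanity check (the item fields here are made-up samples standing in for whatever your spider actually yields):

```python
# -*- coding: utf-8 -*-
import json
import time
import codecs

# A stand-in item; a real Scrapy item behaves like a dict here
item = {'cityDate': u'泉州 2017-08-04', 'weather': u'多云'}

fileName = time.strftime('%Y%m%d', time.localtime()) + '.json'
with codecs.open(fileName, 'a', encoding='utf8') as fp:
    # ensure_ascii=False keeps Chinese characters readable in the output file
    fp.write(json.dumps(dict(item), ensure_ascii=False) + '\n')

# Read the last line back and confirm it round-trips
with codecs.open(fileName, 'r', encoding='utf8') as fp:
    restored = json.loads(fp.readlines()[-1])
```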
(2) Modify settings.py, adding pipelines2json to ITEM_PIPELINES:
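Assuming the project is named weather and the class keeps the name WeatherPipeline, the entry in settings.py would look roughly like this (the exact module path depends on your project layout):

```python
# settings.py -- register the JSON pipeline (module path assumed)
ITEM_PIPELINES = {
    'weather.pipelines2json.WeatherPipeline': 300,
}
```

The number is the pipeline's priority; lower values run first when several pipelines are enabled.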
(3) Run the command scrapy crawl HQUSpider and check the output.
2. Storing the data in MySQL:
(1) Install MySQL on Linux: sudo apt-get install mysql-server mysql-client
(2) Log in to MySQL and check its character encoding:
mysql -u root -p
mysql> SHOW VARIABLES LIKE "character%";
(3) Rather than changing MySQL's global settings, specify the character encoding only when creating the database and table:
mysql> CREATE DATABASE scrapyDB CHARACTER SET 'utf8' COLLATE 'utf8_general_ci';
mysql> USE scrapyDB;
mysql> CREATE TABLE weather(
-> id INT AUTO_INCREMENT,
-> cityDate char(24),week char(6),
-> img char(20),
-> temperature char(20),
-> weather char(20),
-> wind char(20),
-> PRIMARY KEY(id)) ENGINE=InnoDB DEFAULT CHARSET=utf8;
(4) Create a regular user and grant it privileges on the database; at the mysql prompt, run:
mysql> GRANT ALL PRIVILEGES ON scrapyDB.* TO crawlUSER@'%' IDENTIFIED BY 'crawl123';
mysql> GRANT USAGE ON scrapyDB.* TO crawlUSER@localhost IDENTIFIED BY 'crawl123' WITH GRANT OPTION;
(5) Install the MySQLdb module on Windows:
Run the command: pip install MySQL-python
The installation fails with a build error.
So install Microsoft Visual C++ Compiler for Python 2.7 instead, from: https://www.microsoft.com/en-us/download/confirmation.aspx?id=44266
Installing VcForPython27 still leaves a problem: error: command 'C:\\Users\\\xb4\xba\xcc\xef\\AppData\\Local\\Programs\\Common\\Microsoft\\Visual C++ for Python\\9.0\\VC\\Bin\\amd64\\cl.exe' failed with exit status 2
Workaround: search for, or download from the page below, the prebuilt installer MySQL-python-1.2.3.win-amd64-py2.7.exe:
MySQL-python 1.2.3 for Windows and Python 2.7, 32bit and 64bit versions | codegood
http://www.codegood.com/archives/129
The installation succeeds!
(6) Edit pipelines2mysql.py:
# -*- coding: utf-8 -*-
# Define your item pipelines here
#
# Don't forget to add your pipeline to the ITEM_PIPELINES setting
# See: http://doc.scrapy.org/en/latest/topics/item-pipeline.html
import MySQLdb
import os.path

class WeatherPipeline(object):
    def process_item(self, item, spider):
        cityDate = item['cityDate'].encode('utf8')
        week = item['week'].encode('utf8')
        img = os.path.basename(item['img'])
        temperature = item['temperature'].encode('utf8')
        weather = item['weather'].encode('utf8')
        wind = item['wind'].encode('utf8')
        conn = MySQLdb.connect(
            host='remote host IP',   # fill in your MySQL server address
            port=3306,
            user='crawlUSER',
            passwd='password',       # fill in the password set above
            db='scrapyDB',
            charset='utf8'
        )
        cur = conn.cursor()
        # Parameterized query: MySQLdb escapes the values for us
        cur.execute("INSERT weather(cityDate,week,img,temperature,weather,wind) VALUES(%s,%s,%s,%s,%s,%s)",
                    (cityDate, week, img, temperature, weather, wind))
        conn.commit()
        cur.close()
        conn.close()
        return item
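The insert logic itself can be sketched and tested without a MySQL server by swapping in Python's built-in sqlite3 module as a stand-in (the table mirrors the CREATE TABLE above, the item values are made-up samples, and sqlite3 uses ? placeholders where MySQLdb uses %s):

```python
import sqlite3

# In-memory stand-in for the scrapyDB.weather table defined earlier
conn = sqlite3.connect(':memory:')
cur = conn.cursor()
cur.execute("""CREATE TABLE weather(
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    cityDate TEXT, week TEXT, img TEXT,
    temperature TEXT, weather TEXT, wind TEXT)""")

# Hypothetical scraped item, matching the fields the pipeline reads
item = {'cityDate': u'泉州 2017-08-04', 'week': u'星期五', 'img': 'd0.gif',
        'temperature': u'26C~34C', 'weather': u'多云', 'wind': u'微风'}

# Same parameterized-insert pattern as in the MySQLdb pipeline
cur.execute("INSERT INTO weather(cityDate,week,img,temperature,weather,wind) "
            "VALUES(?,?,?,?,?,?)",
            (item['cityDate'], item['week'], item['img'],
             item['temperature'], item['weather'], item['wind']))
conn.commit()

cur.execute("SELECT weather, wind FROM weather WHERE id=1")
row = cur.fetchone()
```

Note that opening a fresh connection for every item, as the pipeline above does, works but is wasteful; Scrapy pipelines usually open the connection once in open_spider and close it in close_spider.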
Running it raises an error.
Fix: remove the INTO from the SQL statement (the INSERT in the code above already omits it).
Done!