[Python Taobao Crawler] Scraping Taobao Seller Credit Scores
阿新 • Published: 2019-02-15
1. Requirements Analysis
Given a Wangwang ID, fetch the Taobao seller's credit score.
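2. Approach
Taobao itself requires a simulated login, which this script cannot get through, so the score cannot be scraped there directly. To get around the login, I found 淘一兔: querying a seller there returns the Taobao seller's credit score, so the result is the same.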
http://www.taoyizhu.com/
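After entering the Wangwang ID, you have to click the query button and wait a few seconds for the result to appear, so we use Selenium to drive the page.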
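The "click, then wait a few seconds" step can also be written as a poll rather than a fixed sleep. The following is only a minimal sketch: `wait_for_match` and its parameters are hypothetical names of my own, not part of Selenium or of the script in this post. It repeatedly calls a function that returns the current page HTML until the pattern captures a non-empty value or a timeout expires. It assumes the result element is empty or absent until the query finishes, which I have not verified against the site.

```python
import re
import time


def wait_for_match(get_html, pattern, timeout=10.0, interval=0.5):
    """Poll get_html() until pattern captures a non-empty value.

    pattern must contain one capture group. Returns the stripped
    captured text, or None if nothing matched before the timeout.
    """
    regex = re.compile(pattern, re.S)
    deadline = time.time() + timeout
    while True:
        match = regex.search(get_html())
        # Assumption: an empty capture means the result has not loaded yet
        if match and match.group(1).strip():
            return match.group(1).strip()
        if time.time() >= deadline:
            return None
        time.sleep(interval)
```

With a real browser session, `get_html` would be something like `lambda: driver.page_source`, and the pattern would be the `spanUserSellerCount` expression used in the script.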
3. Source Code (do not scrape too fast, or the results will not come back)
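# encoding: utf-8
# Python 2 script: it uses print statements and the reload(sys) encoding workaround.
import re
import sys
import time

import pandas as pd
from selenium import webdriver

# Python 2 only: make utf-8 the default codec for implicit str/unicode conversions
reload(sys)
sys.setdefaultencoding('utf-8')

time1 = time.time()  # start time, used to report the total run time
# Headless PhantomJS browser; adjust executable_path to your local install
driver = webdriver.PhantomJS(executable_path='D:\\Program Files\\Python27\\Scripts\\phantomjs.exe')
driver.set_window_size(800, 600)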
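######################## Read the data ############################
# The first column of the spreadsheet holds the Wangwang IDs
data1 = pd.read_excel(r'C:/taobao/taobao1.xlsx')
print data1
####################### Query the seller credit scores #############################
seller_credit = []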
for i in range(0, len(data1)):
    key = str(data1.iloc[i, 0])
    key1 = key.decode("utf-8")
    driver.get("http://www.taoyizhu.com/")
    time.sleep(5)  # give the page time to load
    driver.find_element_by_id("txt_name").clear()
    driver.find_element_by_id("txt_name").send_keys(key1)
    driver.find_element_by_id('search_btn').click()
    time.sleep(3)  # give the query time to return
    html2 = driver.page_source
    seller_credit1 = re.findall('<span id="spanUserSellerCount">(.*?)</span>', html2, re.S)
    # Record exactly one value per row (None if nothing matched) so that
    # seller_credit stays the same length as data1
    each = seller_credit1[0] if seller_credit1 else None
    print key, each
    seller_credit.append(each)
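####################### Add the seller credit score column #############################
# The column name means "seller credit score"
data1['店鋪信譽分'] = seller_credit
print data1
# Write the result to Excel; strings_to_urls=False keeps URL-like strings as plain text
writer = pd.ExcelWriter(r'C:/taobao/taobao1_all.xlsx', engine='xlsxwriter', options={'strings_to_urls': False})
data1.to_excel(writer, index=False)
writer.close()
driver.quit()  # shut down the PhantomJS process
time2 = time.time()
print u'OK, crawl finished!'
print u'Total time: ' + str(time2 - time1) + 's'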
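The regular expression used in the script can be factored into a small function that returns exactly one value per page, which keeps the list of scores the same length as the input table even when a query comes back empty. This is a minimal sketch under stated assumptions: `extract_seller_credit` is a hypothetical helper name, and the sample HTML fragments are made up for illustration. Only the `spanUserSellerCount` id comes from the script above.

```python
import re

# Same pattern as in the script; re.S lets "." span line breaks inside the tag
CREDIT_PATTERN = re.compile(r'<span id="spanUserSellerCount">(.*?)</span>', re.S)


def extract_seller_credit(html):
    """Return the text of the first spanUserSellerCount element, or None."""
    match = CREDIT_PATTERN.search(html)
    if match is None:
        return None
    value = match.group(1).strip()
    # An empty element is treated the same as a missing one
    return value if value else None


# Made-up page fragments, for illustration only
sample_pages = [
    '<div><span id="spanUserSellerCount">1024</span></div>',
    '<div>no result</div>',
]
scores = [extract_seller_credit(page) for page in sample_pages]
print(scores)  # ['1024', None]
```

Returning `None` for a failed lookup leaves an empty cell in the output spreadsheet rather than shifting later scores onto the wrong sellers.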