
Hands-on: single-threaded crawling, single-thread + coroutine crawling, and multi-threaded crawling

1. Target page: https://lusongsong.com/default_2.html. Crawl the detail content behind each article link on that page (17 links in total) and save it to a local file.

2. Crawl it three ways: single-threaded, multi-threaded, and single-thread + coroutines.

  2.1 Single-threaded crawling

import requests
from lxml import etree
import time

def get_request(url):
    # fetch a detail page and return its HTML text
    response = requests.get(url).text
    return response

def parse(html):
    # extract the article title and body paragraphs, then append them to a file
    tree = etree.HTML(html)
    title = tree.xpath('//div[@class="post-title"]/h1/a/text()')[0]
    text = tree.xpath('//dd[@class="con"]/p/text()')
    text = "".join(text)
    with open('1.txt', 'a+', encoding='utf-8') as fp:
        fp.write(title + '\n' + text + '\n')

if __name__ == '__main__':
    start = time.time()
    # grab the index page and pull out the 17 article links
    index = requests.get('https://lusongsong.com/default_2.html').text
    tree = etree.HTML(index)
    urls = tree.xpath('//div[@class="post"]/h2/a/@href')
    # fetch and parse each link one after another
    for url in urls:
        c = get_request(url)
        parse(c)
    print('Total time:', time.time() - start)

Total time: 13.49609375
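In practice the bare requests.get call above can hang indefinitely on a slow server or be rejected for lacking a User-Agent. A minimal hardened variant of get_request (not part of the original post; the header value is an illustrative placeholder):

import requests

HEADERS = {'User-Agent': 'Mozilla/5.0'}  # placeholder UA string; adjust as needed

def get_request(url):
    # timeout avoids hanging forever; raise_for_status surfaces HTTP errors early
    response = requests.get(url, headers=HEADERS, timeout=10)
    response.raise_for_status()
    return response.text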
  2.2 Multi-threaded crawling: first use the requests module to collect the page's article links into a list, then fetch the detail pages concurrently with a thread pool
import requests
from lxml import etree
import time
from multiprocessing.dummy import Pool  # thread pool with the multiprocessing API

def get_request(url):
    # fetch a detail page and return its HTML text
    response = requests.get(url).text
    return response

def parse(html):
    # extract title and body, append to file (same as the single-threaded version)
    tree = etree.HTML(html)
    title = tree.xpath('//div[@class="post-title"]/h1/a/text()')[0]
    text = tree.xpath('//dd[@class="con"]/p/text()')
    text = "".join(text)
    with open('1.txt', 'a+', encoding='utf-8') as fp:
        fp.write(title + '\n' + text + '\n')


if __name__ == '__main__':
    start = time.time()
    p = Pool(3)  # three worker threads
    index = requests.get('https://lusongsong.com/default_2.html').text
    tree = etree.HTML(index)
    urls = tree.xpath('//div[@class="post"]/h2/a/@href')
    # map the download over the thread pool; results come back in input order
    res_list = p.map(get_request, urls)
    for res in res_list:
        parse(res)

    print('Total time:', time.time() - start)

Total time: 1.737304925918579
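The same fan-out can also be written with the standard library's concurrent.futures, which is the more common thread-pool idiom today; a minimal sketch, assuming the get_request, parse, and urls names defined above:

from concurrent.futures import ThreadPoolExecutor

with ThreadPoolExecutor(max_workers=3) as pool:
    # like Pool.map, executor.map returns results in input order
    for html in pool.map(get_request, urls):
        parse(html)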

  2.3 Single-thread + coroutine crawling
import time
import asyncio
import aiohttp
from lxml import etree
import requests

async def get_request(url):
    # aiohttp issues the request without blocking the event loop
    async with aiohttp.ClientSession() as sess:
        async with sess.get(url=url) as response:
            page_text = await response.text()
            return page_text

def parse(task):
    # done-callback: pull the fetched HTML out of the finished task
    page_text = task.result()
    tree = etree.HTML(page_text)
    title = tree.xpath('//div[@class="post-title"]/h1/a/text()')[0]
    text = tree.xpath('//dd[@class="con"]/p/text()')
    text = "".join(text)
    with open('3.txt', 'a+', encoding='utf-8') as fp:
        fp.write(title + '\n' + text + '\n')


if __name__ == '__main__':
    start = time.time()
    # the index page itself is still fetched synchronously with requests
    index = requests.get('https://lusongsong.com/default_2.html').text
    tree = etree.HTML(index)
    urls = tree.xpath('//div[@class="post"]/h2/a/@href')
    tasks = []
    for url in urls:
        c = get_request(url)              # creates a coroutine object
        task = asyncio.ensure_future(c)   # wrap it in a Task
        task.add_done_callback(parse)     # parse the page once the fetch finishes
        tasks.append(task)
    loop = asyncio.get_event_loop()
    loop.run_until_complete(asyncio.wait(tasks))
    print('Total time:', time.time() - start)
Total time: 0.5029296875
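Since Python 3.7 the explicit event-loop management above is usually replaced by asyncio.run, and sharing one ClientSession across all requests avoids opening a new connection pool per URL. A rough equivalent sketch, not the original author's code:

import asyncio
import aiohttp

async def fetch(sess, url):
    # reuse the shared session's connection pool
    async with sess.get(url) as response:
        return await response.text()

async def crawl(urls):
    async with aiohttp.ClientSession() as sess:
        # gather runs all fetches concurrently and returns the pages in order
        return await asyncio.gather(*(fetch(sess, u) for u in urls))

# pages = asyncio.run(crawl(urls))
# each page can then be parsed with the same XPath logic as parse() above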

Conclusion: for web crawlers, both multi-threading and coroutines improve efficiency significantly. The reason: in a single thread, every I/O operation blocks the program while it waits on the network, wasting time; with multiple threads, while thread A is waiting execution switches to thread B, so CPU time is not wasted idling and the program runs faster overall.
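The effect is easy to reproduce without a network: substitute a sleep for the request, standing in for the I/O wait. A toy sketch using the same thread pool as section 2.2:

import time
from multiprocessing.dummy import Pool

def fake_io(_):
    time.sleep(1)  # stands in for waiting on a network response

start = time.time()
for i in range(3):
    fake_io(i)
print('serial:', time.time() - start)    # ~3 s: the waits happen one after another

start = time.time()
Pool(3).map(fake_io, range(3))
print('threaded:', time.time() - start)  # ~1 s: the three waits overlap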