Scrapy Crawler Tutorial for Beginners, Part 5: Selectors

阿新 • Published: 2018-12-26

Development environment:
Python 3.6.0 (latest at the time of writing)
Scrapy 1.3.2 (latest at the time of writing)

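Selectors

When you scrape web pages, the most common task you need to perform is extracting data from the HTML source. Several libraries can do this:

BeautifulSoup is a very popular web scraping library among Python programmers. It builds a Python object based on the structure of the HTML code and copes reasonably well with bad markup, but it has one drawback: it is slow.
lxml is an XML parsing library (which also parses HTML) with a pythonic API based on ElementTree. (lxml is not part of the Python standard library.)

Scrapy comes with its own mechanism for extracting data. They are called selectors because they "select" certain parts of the HTML document, specified by either XPath or CSS expressions.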

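XPath is a language for selecting nodes in XML documents, and it can also be used with HTML. CSS is a language for applying styles to HTML documents. It defines selectors to associate those styles with specific HTML elements.

Scrapy selectors are built on top of the lxml library, which means they are very similar in speed and parsing accuracy.

This page explains how selectors work and describes their API, which is very small and simple, unlike the lxml API, which is much larger because lxml can be used for many other tasks besides selecting from markup documents.

For a complete reference of the selectors API, see the Built-in Selectors reference section.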

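Using selectors

Constructing selectors

Scrapy selectors are instances of the Selector class, constructed by passing either text or a TextResponse object. The selector automatically chooses the best parsing rules (XML vs. HTML) based on the input type: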

>>> from scrapy.selector import Selector
>>> from scrapy.http import HtmlResponse

Constructing from text:

>>> body = '<html><body><span>good</span></body></html>'
>>> Selector(text=body).xpath('//span/text()').extract()
['good']

Constructing from a response (on Python 3 a str body needs an explicit encoding):

>>> response = HtmlResponse(url='http://example.com', body=body, encoding='utf-8')
>>> Selector(response=response).xpath('//span/text()').extract()
['good']

For convenience, response objects expose a selector on the .selector attribute. It is perfectly fine to use this shortcut whenever possible:

>>> response.selector.xpath('//span/text()').extract()
['good']

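Using selectors

To explain how to use selectors, we will use the Scrapy shell (which provides interactive testing) and an example page located on the Scrapy documentation server. Here is its HTML code: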

<html>
 <head>
  <base href='http://example.com/' />
  <title>Example website</title>
 </head>
 <body>
  <div id='images'>
   <a href='image1.html'>Name: My image 1 <br /><img src='image1_thumb.jpg' /></a>
   <a href='image2.html'>Name: My image 2 <br /><img src='image2_thumb.jpg' /></a>
   <a href='image3.html'>Name: My image 3 <br /><img src='image3_thumb.jpg' /></a>
   <a href='image4.html'>Name: My image 4 <br /><img src='image4_thumb.jpg' /></a>
   <a href='image5.html'>Name: My image 5 <br /><img src='image5_thumb.jpg' /></a>
  </div>
 </body>
</html>

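First, let's open the shell:

scrapy shell http://doc.scrapy.org/en/latest/_static/selectors-sample1.html

Then, after the shell loads, you will have the response available as the response shell variable, and its attached selector in the response.selector attribute.

Since we are dealing with HTML, the selector will automatically use an HTML parser.

So, by looking at the HTML code of that page, let's construct an XPath for selecting the text inside the title tag:

>>> response.selector.xpath('//title/text()')
[<Selector xpath='//title/text()' data='Example website'>]

Querying responses with XPath and CSS is so common that responses include two convenience shortcuts, response.xpath() and response.css():

>>> response.xpath('//title/text()')
[<Selector xpath='//title/text()' data='Example website'>]
>>> response.css('title::text')
[<Selector xpath='descendant-or-self::title/text()' data='Example website'>]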

As you can see, the .xpath() and .css() methods return a SelectorList instance, which is a list of new selectors. This API can be used for quickly selecting nested data:

>>> response.css('img').xpath('@src').extract()
['image1_thumb.jpg',
 'image2_thumb.jpg',
 'image3_thumb.jpg',
 'image4_thumb.jpg',
 'image5_thumb.jpg']

To actually extract the textual data, you must call the selector's .extract() method, as follows:

>>> response.xpath('//title/text()').extract()
['Example website']

If you want to extract only the first matched element, you can call the selector's .extract_first() method:

>>> response.xpath('//div[@id="images"]/a/text()').extract_first()
'Name: My image 1 '

It returns None if no element was found:

>>> response.xpath('//div[@id="not-exists"]/text()').extract_first() is None
True

A default return value can be provided as an argument, to be used instead of None:

>>> response.xpath('//div[@id="not-exists"]/text()').extract_first(default='not-found')
'not-found'
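The "first match or a default" behaviour described above can be sketched in plain Python. This is only an illustration of the semantics, not Scrapy's implementation; the helper name first_or_default and the sample lists are assumptions made for this sketch.

```python
def first_or_default(values, default=None):
    """Return the first item of an iterable, or `default` when it is empty.

    Hypothetical helper that mirrors the described behaviour of
    .extract_first(); it is not part of Scrapy.
    """
    return next(iter(values), default)


# Stand-ins for what .extract() might return (assumed sample data).
matches = ['Name: My image 1 ', 'Name: My image 2 ']
no_matches = []

first = first_or_default(matches)                             # first item
missing = first_or_default(no_matches)                        # None
fallback = first_or_default(no_matches, default='not-found')  # the default
```

Compared with indexing into the result of .extract() with [0], this pattern never raises IndexError when nothing matched.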

Notice that CSS selectors can select text or attribute nodes using the ::text and ::attr() pseudo-elements (these are selector extensions, not part of standard CSS):

>>> response.css('title::text').extract()
['Example website']

Now we are going to get the base URL and some image links:

>>> response.xpath('//base/@href').extract()
['http://example.com/']

>>> response.css('base::attr(href)').extract()
['http://example.com/']

>>> response.xpath('//a[contains(@href, "image")]/@href').extract()
['image1.html',
 'image2.html',
 'image3.html',
 'image4.html',
 'image5.html']

>>> response.css('a[href*=image]::attr(href)').extract()
['image1.html',
 'image2.html',
 'image3.html',
 'image4.html',
 'image5.html']

>>> response.xpath('//a[contains(@href, "image")]/img/@src').extract()
['image1_thumb.jpg',
 'image2_thumb.jpg',
 'image3_thumb.jpg',
 'image4_thumb.jpg',
 'image5_thumb.jpg']

>>> response.css('a[href*=image] img::attr(src)').extract()
['image1_thumb.jpg',
 'image2_thumb.jpg',
 'image3_thumb.jpg',
 'image4_thumb.jpg',
 'image5_thumb.jpg']

Nesting selectors

The selection methods (.xpath() or .css()) return a list of selectors of the same type, so you can call the selection methods on those selectors too. Here is an example:

>>> links = response.xpath('//a[contains(@href, "image")]')
>>> links.extract()
['<a href="image1.html">Name: My image 1 <br><img src="image1_thumb.jpg"></a>',
 '<a href="image2.html">Name: My image 2 <br><img src="image2_thumb.jpg"></a>',
 '<a href="image3.html">Name: My image 3 <br><img src="image3_thumb.jpg"></a>',
 '<a href="image4.html">Name: My image 4 <br><img src="image4_thumb.jpg"></a>',
 '<a href="image5.html">Name: My image 5 <br><img src="image5_thumb.jpg"></a>']

>>> for index, link in enumerate(links):
...     args = (index, link.xpath('@href').extract(), link.xpath('img/@src').extract())
...     print('Link number %d points to url %s and image %s' % args)

Link number 0 points to url ['image1.html'] and image ['image1_thumb.jpg']
Link number 1 points to url ['image2.html'] and image ['image2_thumb.jpg']
Link number 2 points to url ['image3.html'] and image ['image3_thumb.jpg']
Link number 3 points to url ['image4.html'] and image ['image4_thumb.jpg']
Link number 4 points to url ['image5.html'] and image ['image5_thumb.jpg']

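Using selectors with regular expressions

Selector also has a .re() method for extracting data using regular expressions. However, unlike .xpath() or .css(), .re() returns a list of strings, so you cannot construct nested .re() calls.

Here is an example used to extract the image names from the HTML code above (note that the captured group keeps the trailing space of each text node):

>>> response.xpath('//a[contains(@href, "image")]/text()').re(r'Name:\s*(.*)')
['My image 1 ',
 'My image 2 ',
 'My image 3 ',
 'My image 4 ',
 'My image 5 ']

There is an additional helper for .re(), the counterpart of .extract_first(), named .re_first(). Use it to extract just the first matching string:

>>> response.xpath('//a[contains(@href, "image")]/text()').re_first(r'Name:\s*(.*)')
'My image 1 '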
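To see what the pattern itself captures, independently of Scrapy, here is a minimal sketch that uses only Python's re module. The texts list is a hand-written stand-in for the text nodes of the sample page, and the helpers re_all and re_first_match are hypothetical names that imitate the described behaviour of .re() and .re_first(); they are not Scrapy functions.

```python
import re

# Hand-copied stand-ins for the text nodes selected by
# //a[contains(@href, "image")]/text() on the sample page (assumption).
texts = ['Name: My image %d ' % i for i in range(1, 6)]

pattern = re.compile(r'Name:\s*(.*)')


def re_all(strings, regex):
    """Flat list of captured groups over all strings, in the spirit of .re()."""
    found = []
    for s in strings:
        found.extend(regex.findall(s))
    return found


def re_first_match(strings, regex, default=None):
    """First captured group, or `default`, in the spirit of .re_first()."""
    for s in strings:
        groups = regex.findall(s)
        if groups:
            return groups[0]
    return default


names = re_all(texts, pattern)
first = re_first_match(texts, pattern)
```

Because (.*) is greedy and the text nodes end with a space, each captured name keeps that trailing space; call .strip() on the results if it is unwanted.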

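Working with relative XPaths

Keep in mind that if you are nesting selectors and use an XPath that starts with /, that XPath will be absolute to the document and not relative to the Selector you are calling it from.

For example, suppose you want to extract all <p> elements inside <div> elements. First, you would get all <div> elements:

>>> divs = response.xpath('//div')

At first, you may be tempted to use the following approach, which is wrong, as it actually extracts all <p> elements from the document, not only those inside <div> elements:

>>> for p in divs.xpath('//p'):  # this is wrong - gets all <p> from the whole document
...     print(p.extract())

This is the proper way to do it (note the dot prefixing the .//p XPath):

>>> for p in divs.xpath('.//p'):  # extracts all <p> inside
...     print(p.extract())

Another common case is to extract all direct <p> children:

>>> for p in divs.xpath('p'):
...     print(p.extract())

For more details about relative XPaths, see the Location Paths section in the XPath specification.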
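The difference between "all descendants" (.//p) and "direct children" (p) can be tried without a crawler. The sketch below uses the standard library's xml.etree.ElementTree, which implements only a small XPath subset and is not the engine Scrapy uses; the markup is made up for the illustration, and the whole-document search is emulated with root.iter().

```python
import xml.etree.ElementTree as ET

# Made-up document: one <p> outside the <div>, one direct child, one nested.
root = ET.fromstring(
    "<body>"
    "<p>outside</p>"
    "<div><p>direct</p><section><p>nested</p></section></div>"
    "</body>"
)
div = root.find("div")

# Relative path with a leading dot: every <p> below this <div>.
descendants = [p.text for p in div.findall(".//p")]

# Bare tag name: only the direct <p> children of this <div>.
children = [p.text for p in div.findall("p")]

# What a document-wide search returns, regardless of the <div>.
everywhere = [p.text for p in root.iter("p")]
```

Here descendants holds the two paragraphs inside the div, children holds only the direct one, and everywhere also includes the paragraph outside the div, which is what an absolute //p would pick up.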

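Variables in XPath expressions

XPath allows you to reference variables in your XPath expressions using the $somevariable syntax. This is somewhat similar to parameterized queries or prepared statements in the SQL world, where you replace some arguments in your queries with placeholders such as ?, which are then substituted with the values passed along with the query.

Here is an example that matches an element based on its "id" attribute value, without hard-coding it (as was shown previously):

>>> # `$val` used in the expression, a `val` argument needs to be passed
>>> response.xpath('//div[@id=$val]/a/text()', val='images').extract_first()
'Name: My image 1 '

Here is another example, which finds the "id" attribute of a <div> tag containing five <a> children (here we pass the value 5 as an integer):

>>> response.xpath('//div[count(a)=$cnt]/@id', cnt=5).extract_first()
'images'

All variable references must have a binding value when calling .xpath() (otherwise you will get a ValueError: XPath error: exception). This is done by passing as many named arguments as necessary.

parsel, the library powering Scrapy selectors, has more details and examples on XPath variables.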

Using EXSLT extensions

Being built atop lxml, Scrapy selectors also support some EXSLT extensions and come with these pre-registered namespaces to use in XPath expressions:

prefix    namespace                               usage
re        http://exslt.org/regular-expressions    regular expressions
set       http://exslt.org/sets                   set manipulation

Regular expressions

The test() function, for example, can prove quite useful when XPath's starts-with() or contains() are not sufficient.

Example selecting links in list items with a "class" attribute ending with a digit:

>>> from scrapy import Selector
>>> doc = """
... <div>
...     <ul>
...         <li class="item-0"><a href="link1.html">first item</a></li>
...         <li class="item-1"><a href="link2.html">second item</a></li>
...         <li class="item-inactive"><a href="link3.html">third item</a></li>
...         <li class="item-1"><a href="link4.html">fourth item</a></li>
...         <li class="item-0"><a href="link5.html">fifth item</a></li>
...     </ul>
... </div>
... """
>>> sel = Selector(text=doc, type="html")
>>> sel.xpath('//li//@href').extract()
['link1.html', 'link2.html', 'link3.html', 'link4.html', 'link5.html']
>>> sel.xpath(r'//li[re:test(@class, "item-\d$")]//@href').extract()
['link1.html', 'link2.html', 'link4.html', 'link5.html']

Warning

The C library libxslt does not natively support EXSLT regular expressions, so lxml's implementation uses hooks to Python's re module. Thus, using regexp functions in your XPath expressions may add a small performance penalty.
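For comparison, the same "class ends with a digit" filter can be written with the regular expression applied on the Python side rather than inside the XPath expression. This is a sketch under assumptions: it uses the standard library's xml.etree.ElementTree instead of Scrapy, so the markup has to be well-formed XML, and it makes no claim about which approach is faster.

```python
import re
import xml.etree.ElementTree as ET

doc = """
<div>
    <ul>
        <li class="item-0"><a href="link1.html">first item</a></li>
        <li class="item-1"><a href="link2.html">second item</a></li>
        <li class="item-inactive"><a href="link3.html">third item</a></li>
        <li class="item-1"><a href="link4.html">fourth item</a></li>
        <li class="item-0"><a href="link5.html">fifth item</a></li>
    </ul>
</div>
"""

root = ET.fromstring(doc)
ends_with_digit = re.compile(r"item-\d$")

# Keep the links of <li> elements whose class attribute matches the pattern.
hrefs = [
    a.get("href")
    for li in root.iter("li")
    if ends_with_digit.search(li.get("class", ""))
    for a in li.iter("a")
]
```

The third item is skipped because its class, item-inactive, does not end with a digit.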

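Set operations

These can be handy for excluding parts of a document tree before extracting text elements, for example.

Example extracting microdata (sample content taken from http://schema.org/Product) with groups of itemscopes and their corresponding itemprops: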

>>> doc = """
... <div itemscope itemtype="http://schema.org/Product">
...   <span itemprop="name">Kenmore White 17" Microwave</span>
...   <img src="kenmore-microwave-17in.jpg" alt='Kenmore 17" Microwave' />
...   <div itemprop="aggregateRating"
...     itemscope itemtype="http://schema.org/AggregateRating">
...    Rated <span itemprop="ratingValue">3.5</span>/5
...    based on <span itemprop="reviewCount">11</span> customer reviews
...   </div>
...
...   <div itemprop="offers" itemscope itemtype="http://schema.org/Offer">
...     <span itemprop="price">$55.00</span>
...     <link itemprop="availability" href="http://schema.org/InStock" />In stock
...   </div>
...
...   Product description:
...   <span itemprop="description">0.7 cubic feet countertop microwave.
...   Has six preset cooking categories and convenience features like
...   Add-A-Minute and Child Lock.</span>
...
...   Customer reviews:
...
...   <div itemprop="review" itemscope itemtype="http://schema.org/Review">
...     <span itemprop="name">Not a happy camper</span> -
...     by <span itemprop="author">Ellie</span>,
...     <meta itemprop="datePublished" content="2011-04-01">April 1, 2011
...     <div itemprop="reviewRating" itemscope itemtype="http://schema.org/Rating">
...       <meta itemprop="worstRating" content = "1">
...       <span itemprop="ratingValue">1</span>/
...       <span itemprop="bestRating">5</span>stars
...     </div>
...     <span itemprop="description">The lamp burned out and now I have to replace
...     it. </span>
...   </div>
...
...   <div itemprop="review" itemscope itemtype="http://schema.org/Review">
...     <span itemprop="name">Value purchase</span> -
...     by <span itemprop="author">Lucas</span>,
...     <meta itemprop="datePublished" content="2011-03-25">March 25, 2011
...     <div itemprop="reviewRating" itemscope itemtype="http://schema.org/Rating">
...       <meta itemprop="worstRating" content = "1"/>
...       <span itemprop="ratingValue">4</span>/
...       <span itemprop="bestRating">5</span>stars
...     </div>
...     <span itemprop="description">Great microwave for the price. It is small and
...     fits in my apartment.</span>
...   </div>
...   ...
... </div>
... """
>>> sel = Selector(text=doc, type="html")
>>> for scope in sel.xpath('//div[@itemscope]'):
...     print("current scope:", scope.xpath('@itemtype').extract())
...     props = scope.xpath('''
...                 set:difference(./descendant::*/@itemprop,
...                                .//*[@itemscope]/*/@itemprop)''')
...     print("    properties:", props.extract())
...     print()

current scope: ['http://schema.org/Product']
    properties: ['name', 'aggregateRating', 'offers', 'description', 'review', 'review']

current scope: ['http://schema.org/AggregateRating']
    properties: ['ratingValue', 'reviewCount']

current scope: ['http://schema.org/Offer']
    properties: ['price', 'availability']

current scope: ['http://schema.org/Review']
    properties: ['name', 'author', 'datePublished', 'reviewRating', 'description']

current scope: ['http://schema.org/Rating']
    properties: ['worstRating', 'ratingValue', 'bestRating']

current scope: ['http://schema.org/Review']
    properties: ['name', 'author', 'datePublished', 'reviewRating', 'description']

current scope: ['http://schema.org/Rating']
    properties: ['worstRating', 'ratingValue', 'bestRating']

Here we first iterate over the itemscope elements, and for each one we look for all itemprop elements and exclude those that are themselves inside another itemscope.
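The intent of that set difference (keep the properties that belong to a scope, drop the ones owned by a nested scope) can also be expressed as a small tree walk. The sketch below is an approximation written with the standard library's xml.etree.ElementTree, not a translation of the XPath expression: the markup is a shortened, XML-friendly variant of the sample (itemscope="" instead of a bare attribute), and own_props is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Shortened, well-formed variant of the microdata sample (assumption).
doc = """
<div itemscope="" itemtype="http://schema.org/Product">
  <span itemprop="name">Kenmore White 17" Microwave</span>
  <div itemprop="offers" itemscope="" itemtype="http://schema.org/Offer">
    <span itemprop="price">$55.00</span>
  </div>
</div>
"""


def own_props(scope):
    """Collect itemprop values below `scope` without entering nested scopes."""
    props = []
    for child in scope:
        if "itemprop" in child.attrib:
            props.append(child.get("itemprop"))
        # A nested itemscope owns its own properties, so do not descend into it.
        if "itemscope" not in child.attrib:
            props.extend(own_props(child))
    return props


root = ET.fromstring(doc)
scopes = [elem for elem in root.iter() if "itemscope" in elem.attrib]
props_by_type = {scope.get("itemtype"): own_props(scope) for scope in scopes}
```

The Product scope reports name and offers, while price is reported only for the nested Offer scope.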

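Some XPath tips

Here are some tips that you may find useful when using XPath with Scrapy selectors, based on this post from ScrapingHub's blog. If you are not very familiar with XPath yet, you may want to take a look at this XPath tutorial first.

Using text nodes in a condition

When you need to use the text content as an argument to an XPath string function, avoid using .//text() and use just . instead.

This is because the expression .//text() yields a collection of text elements, that is, a node-set. When a node-set is converted to a string, which happens when it is passed as an argument to a string function like contains() or starts-with(), the result is the text of the first element only.

Example:

>>> from scrapy import Selector
>>> sel = Selector(text='<a href="#">Click here to go to the <strong>Next Page</strong></a>')

Converting a node-set to a string:

>>> sel.xpath('//a//text()').extract() # take a peek at the node-set
['Click here to go to the ', 'Next Page']
>>> sel.xpath("string(//a[1]//text())").extract() # convert it to string
['Click here to go to the ']

A node converted to a string, however, puts together its own text plus that of all its descendants:

>>> sel.xpath("//a[1]").extract() # select the first node
['<a href="#">Click here to go to the <strong>Next Page</strong></a>']
>>> sel.xpath("string(//a[1])").extract() # convert it to string
['Click here to go to the Next Page']

So, using the .//text() node-set will not select anything in this case:

>>> sel.xpath("//a[contains(.//text(), 'Next Page')]").extract()
[]

But using . to mean the node works:

>>> sel.xpath("//a[contains(., 'Next Page')]").extract()
['<a href="#">Click here to go to the <strong>Next Page</strong></a>']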
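The same distinction can be reproduced with the standard library, which may make it easier to remember. In xml.etree.ElementTree (used here only as an analogy, not as Scrapy's engine), an element's .text attribute is just its first text node, while joining itertext() yields the text of the element and all of its descendants.

```python
import xml.etree.ElementTree as ET

a = ET.fromstring(
    '<a href="#">Click here to go to the <strong>Next Page</strong></a>'
)

# Roughly what a node-set of text nodes collapses to: the first text node only.
first_text_node = a.text

# Roughly the string value of the node: its text plus all descendant text.
string_value = "".join(a.itertext())

found_in_first = "Next Page" in first_text_node   # False
found_in_string = "Next Page" in string_value     # True
```

Searching only the first text node misses "Next Page" because that text lives inside the nested strong element.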

Beware of the difference between //node[1] and (//node)[1]

//node[1] selects all the nodes occurring first under their respective parents.

(//node)[1] selects all the nodes in the document, and then gets only the first of them.

Example:

>>> from scrapy import Selector
>>> sel = Selector(text="""
...     <ul class="list">
...         <li>1</li>
...         <li>2</li>
...         <li>3</li>
...     </ul>
...     <ul class="list">
...         <li>4</li>
...         <li>5</li>
...         <li>6</li>
...     </ul>""")
>>> xp = lambda x: sel.xpath(x).extract()

This gets all first <li> elements, whatever their parent is:

>>> xp("//li[1]")
['<li>1</li>', '<li>4</li>']

And this gets the first <li> element in the whole document:

>>> xp("(//li)[1]")
['<li>1</li>']

This gets all first <li> elements under a <ul> parent:

>>> xp("//ul/li[1]")
['<li>1</li>', '<li>4</li>']

And this gets the first <li> element under a <ul> parent in the whole document:

>>> xp("(//ul/li)[1]")
['<li>1</li>']
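A rough standard-library analogue of the two forms, for experimenting without Scrapy: xml.etree.ElementTree supports the positional predicate per parent, but not the parenthesised form, so "first in the whole document" is emulated by indexing the Python list. The root wrapper element is an assumption needed to make the snippet well-formed XML.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<root>"
    '<ul class="list"><li>1</li><li>2</li><li>3</li></ul>'
    '<ul class="list"><li>4</li><li>5</li><li>6</li></ul>'
    "</root>"
)

# Like //li[1]: the first <li> under each parent.
first_per_parent = [li.text for li in root.findall(".//li[1]")]

# Like (//li)[1]: collect every <li>, then keep only the first one.
first_overall = [li.text for li in root.findall(".//li")[:1]]
```

The predicate is evaluated once per parent, so two items come back in the first case and a single item in the second.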

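When querying by class, consider using CSS

Because an element can contain multiple CSS classes, the XPath way to select elements by class is rather verbose:

*[contains(concat(' ', normalize-space(@class), ' '), ' someclass ')]

If you use @class='someclass' you may end up missing elements that have other classes, and if you just use contains(@class, 'someclass') to make up for that, you may end up with more elements than you want, if they have a different class name that shares the string someclass.

As it turns out, Scrapy selectors allow you to chain selectors, so most of the time you can just select by class using CSS and then switch to XPath when needed:

>>> from scrapy import Selector
>>> sel = Selector(text='<div class="hero shout"><time datetime="2014-07-23 19:00">Special date</time></div>')
>>> sel.css('.shout').xpath('./time/@datetime').extract()
['2014-07-23 19:00']

This is cleaner than using the verbose XPath trick shown above. Just remember to use the . in the XPath expressions that follow.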
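To make the three class tests concrete, here is a plain-Python imitation of what each XPath predicate checks on a class attribute value. The function names are hypothetical and the functions only mimic the string logic; they do not call any XPath engine.

```python
def class_equals(class_attr, name):
    """Mimics @class='name': the whole attribute must equal the name."""
    return class_attr == name


def class_contains(class_attr, name):
    """Mimics contains(@class, 'name'): plain substring search."""
    return name in class_attr


def class_has_token(class_attr, name):
    """Mimics contains(concat(' ', normalize-space(@class), ' '), ' name ')."""
    normalized = " ".join(class_attr.split())      # normalize-space()
    return " %s " % name in " %s " % normalized    # whole-token match


missed = class_equals("hero shout", "shout")             # False: misses it
false_positive = class_contains("shouting", "shout")     # True: unwanted hit
token_hit = class_has_token("hero  shout", "shout")      # True
token_miss = class_has_token("shouting", "shout")        # False
```

Only the padded, whitespace-normalized comparison treats the attribute as a list of class tokens, which is also how a CSS class selector such as .shout matches.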

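Built-in Selectors reference

class scrapy.selector.Selector(response=None, text=None, type=None)

An instance of Selector is a wrapper over a response that selects certain parts of its content.

response is an HtmlResponse or XmlResponse object that will be used for selecting and extracting data.

text is a unicode string or utf-8 encoded text, for cases when a response is not available. Using text and response together is undefined behavior.

type defines the selector type. It can be "html", "xml" or None (the default).

If type is None, the selector automatically chooses the best type based on the response type (see below), or defaults to "html" when it is used together with text.

If type is None and a response is passed, the selector type is inferred from the response type as follows:

"html" for HtmlResponse type
"xml" for XmlResponse type
"html" for anything else

Otherwise, if type is set, the selector type is forced and no detection takes place.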
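The inference rule above is small enough to write down as a function. This is a simplified sketch of the documented rule only, not Scrapy's code: infer_selector_type is a hypothetical name, and the response is represented by the name of its class as a plain string, whereas a real implementation would inspect the response object itself.

```python
def infer_selector_type(type_=None, response_kind=None):
    """Return "html" or "xml" following the rule described above.

    `response_kind` is a string stand-in for the response class
    (an assumption of this sketch); None means only text was given.
    """
    if type_ is not None:
        return type_              # forced type, no detection
    if response_kind == "XmlResponse":
        return "xml"
    return "html"                 # HtmlResponse, anything else, or text only


from_html = infer_selector_type(response_kind="HtmlResponse")
from_xml = infer_selector_type(response_kind="XmlResponse")
from_other = infer_selector_type(response_kind="TextResponse")
from_text = infer_selector_type()
forced = infer_selector_type(type_="xml", response_kind="HtmlResponse")
```

The last call shows why forcing the type matters: an XML document served with an HTML response type would otherwise be parsed with the HTML rules.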

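xpath(query)
Find nodes matching the XPath query and return the result as a SelectorList instance with all elements flattened. List elements implement the Selector interface too.

query is a string containing the XPath query to apply.

Note

For convenience, this method can be called as response.xpath()

css(query)
Apply the given CSS selector and return a SelectorList instance.

query is a string containing the CSS selector to apply.

In the background, CSS queries are translated into XPath queries using the cssselect library and run with the .xpath() method.

Note

For convenience, this method can be called as response.css()

extract()
Serialize and return the matched nodes as a list of unicode strings. Percent-encoded content is unquoted.

re(regex)
Apply the given regex and return a list of unicode strings with the matches.

regex can be either a compiled regular expression or a string, which will be compiled to a regular expression using re.compile(regex)

Note

Note that re() and re_first() both decode HTML entities (except &lt; and &amp;).

register_namespace(prefix, uri)
Register the given namespace to be used in this Selector. Without registering namespaces you cannot select or extract data from non-standard namespaces. See the examples below.

remove_namespaces()
Remove all namespaces, allowing the document to be traversed using namespace-less XPaths. See the example below.

__nonzero__()
Returns True if there is any real content selected, or False otherwise. In other words, the boolean value of a Selector is given by the contents it selects. (On Python 3, truth testing goes through __bool__().)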

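SelectorList objects

class scrapy.selector.SelectorList

The SelectorList class is a subclass of the builtin list class, which provides a few additional methods.

xpath(query)
Call the .xpath() method for each element in this list and return their results flattened as another SelectorList.

query is the same argument as the one in Selector.xpath()

css(query)
Call the .css() method for each element in this list and return their results flattened as another SelectorList.

query is the same argument as the one in Selector.css()

extract()
Call the .extract() method for each element in this list and return their results flattened, as a list of unicode strings.

re()
Call the .re() method for each element in this list and return their results flattened, as a list of unicode strings.

__nonzero__()
Returns True if the list is not empty, False otherwise.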

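Selector examples on HTML response

Here are a couple of Selector examples to illustrate several concepts. In all cases, we assume there is already a Selector instantiated with an HtmlResponse object, like this:

sel = Selector(html_response)

Select all <h1> elements from an HTML response body, returning a list of Selector objects (i.e. a SelectorList object):

sel.xpath("//h1")

Extract the text of all <h1> elements from an HTML response body, returning a list of unicode strings:

sel.xpath("//h1").extract()         # this includes the h1 tag
sel.xpath("//h1/text()").extract()  # this excludes the h1 tag

Iterate over all <p> tags and print their class attribute:

for node in sel.xpath("//p"):
    print(node.xpath("@class").extract())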

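Selector examples on XML response

Here are a couple of examples to illustrate several concepts. In both cases we assume there is already a Selector instantiated with an XmlResponse object, like this:

sel = Selector(xml_response)

Select all <product> elements from an XML response body, returning a list of Selector objects (i.e. a SelectorList object):

sel.xpath("//product")

Extract all prices from a Google Base XML feed, which requires registering a namespace:

sel.register_namespace("g", "http://base.google.com/ns/1.0")
sel.xpath("//g:price").extract()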

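Removing namespaces

When dealing with scraping projects, it is often quite convenient to get rid of namespaces altogether and just work with element names, in order to write simpler and more convenient XPaths. You can use the Selector.remove_namespaces() method for that.

Let's show an example that illustrates this with the GitHub blog Atom feed.

First, we open the shell with the URL we want to scrape:

$ scrapy shell https://github.com/blog.atom

Once in the shell, we can try selecting all <link> objects and see that it does not work (because the Atom XML namespace is obfuscating those nodes):

>>> response.xpath("//link")
[]

But once we call the Selector.remove_namespaces() method, all nodes can be accessed directly by their names:

>>> response.selector.remove_namespaces()
>>> response.xpath("//link")
[<Selector xpath='//link' data='<link xmlns="http://www.w3.org/2005/Atom'>,
 <Selector xpath='//link' data='<link xmlns="http://www.w3.org/2005/Atom'>,
 ...

If you wonder why the namespace removal procedure is not always called by default instead of having to call it manually, this is because of two reasons, which, in order of relevance, are:

Removing namespaces requires iterating over and modifying all nodes in the document, which is a reasonably expensive operation to perform for all documents crawled by Scrapy.
There could be some cases where using namespaces is actually required, in case some element names clash between namespaces. These cases are very rare, though.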
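Both points can be observed on a tiny Atom-like document with the standard library. The sketch below uses xml.etree.ElementTree rather than Scrapy, a made-up feed, and a hypothetical strip_namespaces helper; it is meant only to show why unqualified names find nothing, how a registered prefix helps, and that stripping namespaces has to visit every node.

```python
import xml.etree.ElementTree as ET

# Made-up feed that declares the Atom default namespace.
feed = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example feed</title>
  <link href="http://example.com/"/>
  <entry><link href="http://example.com/post-1"/></entry>
</feed>"""

root = ET.fromstring(feed)

# Unqualified name: no match, the tags are really '{namespace}link'.
plain = root.findall(".//link")

# Registering a prefix for the namespace makes the elements reachable.
ns = {"atom": "http://www.w3.org/2005/Atom"}
qualified = root.findall(".//atom:link", ns)


def strip_namespaces(tree_root):
    """Drop the '{namespace}' part of every tag; touches each node once."""
    for elem in tree_root.iter():
        if isinstance(elem.tag, str) and "}" in elem.tag:
            elem.tag = elem.tag.split("}", 1)[1]


strip_namespaces(root)
after = root.findall(".//link")
```

The loop in strip_namespaces is the cost referred to in the first reason: it walks and rewrites the whole tree, which adds up when done for every crawled document.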
