Python Regular Expressions (the re Module)

阿新 • Published: 2020-08-08

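## What is a regular expression

A regular expression is a text pattern made up of ordinary characters (for example, the letters a to z) and special characters (called "metacharacters"). A regular expression uses a single string to describe and match a whole set of strings that follow some syntactic rule.

## A brief introduction to regex characters

### Ordinary characters

Ordinary characters are all printable and non-printable characters that are not explicitly designated as metacharacters. This covers all uppercase and lowercase letters, all digits, and any punctuation marks or other symbols that have no special meaning.

### Special characters

| Special character | Description |
| ---- | ---- |
| `$` | Matches the end of the input string (or the position just before a trailing newline). With the `re.M` (MULTILINE) flag set, `$` also matches just before each `'\n'`. |
| `()` | Marks the start and end of a subexpression (matches everything inside the parentheses). The subexpression is captured and can be used later. |
| `*` | Matches the preceding subexpression zero or more times. |
| `+` | Matches the preceding subexpression one or more times (at least once). |
| `.` | Matches any single character except the newline `\n`. |
| `[ ]` | Matches one character from the set inside the brackets; ranges are written like `[0-9a-zA-Z]`. |
| `?` | Matches the preceding subexpression zero or one time, or marks a quantifier as non-greedy. |
| `\` | Escape character; for example, `\*` matches a literal `*`. |
| `^` | Matches the start of the string. As the first character inside `[ ]` it negates the set, so the set matches any character not listed in the brackets. |
| `{}` | Limits the number of repetitions of the preceding subexpression: `{n}` matches exactly n times, `{n,}` at least n times, `{n,m}` at least n and at most m times (m and n are non-negative integers). |
| `\|` | Alternation: matches either of two alternatives. |

### Special sequences

Backslash sequences such as the following can also be part of a regular expression.

| Sequence | Description |
| --- | --- |
| `\b` | Matches a word boundary, i.e. the position between a word character and a non-word character such as a space. (To match the standalone word "is" in "This is Regex", write `\bis\b`.) |
| `\d` | Matches a digit. |
| `\w` | Matches a letter, digit, or underscore. |
| `\s` | Matches a whitespace character (space, tab, newline, and so on). |
| `\B` | Matches a position that is not a word boundary. |
| `\D` | Matches a non-digit. |
| `\W` | Matches any character that is not a letter, digit, or underscore. |
| `\S` | Matches a non-whitespace character. |

### Quantifiers

There are three important concepts for quantifiers.

**Greedy**, for example `*`. A greedy quantifier first takes as much of the string as it can. If the rest of the pattern then fails to match, it gives back one character and tries again. This giving back is called backtracking; it continues one character at a time until a match is found or there are no characters left to give back. Because of this backtracking, greedy quantifiers can be the most expensive of the three.

**Lazy** (reluctant), written by appending `?` to a quantifier, for example `*?`. A lazy quantifier works the other way round: it starts at the beginning of the target, takes as few characters as possible, and extends the match one character at a time until the rest of the pattern matches or the end of the string is reached.

**Possessive**, written by appending `+` to a quantifier, for example `*+`. A possessive quantifier is much like a greedy one: it takes as much as possible and then tries to match the rest of the pattern, but it tries only once and never backtracks. It is like grabbing a handful of stones first and then picking the gold out of that handful. Python's `re` module supports possessive quantifiers only from Python 3.11 onwards.

## Commonly used functions in the re module

**compile()**

Compiles a regular expression pattern and returns a pattern object. (Frequently used regular expressions can be compiled into pattern objects once and reused, which can improve efficiency.)

re.compile(pattern, flags=0)

pattern: the expression string to compile.

flags: compilation flags that modify how the regular expression behaves, such as case sensitivity or multi-line matching. Commonly used flags are:

| Flag | Meaning |
|---|---|
| re.S (DOTALL) | Makes `.` match every character, including the newline. |
| re.I (IGNORECASE) | Makes matching case-insensitive. |
| re.L (LOCALE) | Locale-aware matching (for French and so on). In Python 3 it can only be used with bytes patterns. |
| re.M (MULTILINE) | Multi-line matching; affects `^` and `$`. |
| re.X (VERBOSE) | Allows a more flexible layout (whitespace and comments) so that the regular expression is easier to read. |
| re.U (UNICODE) | Interprets characters according to the Unicode character set; affects `\w`, `\W`, `\b`, `\B`. This is already the default for str patterns in Python 3. |

```python
import re

# Compile the regular expression into a pattern object
pattern = re.compile(r"\d+")
```

Commonly used methods of a pattern object are match(), search(), findall(), finditer(), split(), sub(), and subn().

**(1) The match() method**

This method matches at the beginning of the string and returns as soon as it finds a match. (It is not a full match: if the string still has characters left when the pattern ends, the match still counts as successful. For a full match, append the boundary anchor `'$'` to the expression, or use `fullmatch()`.)

```python
# Pattern.match(string[, pos[, endpos]])
# re.match(pattern, string, flags=0)
```

Notes: string is the string to match against; pos and endpos give the start and end positions within the string. When they are not specified, the whole string is used and matching starts from the beginning. On success a Match object is returned; otherwise the result is None.

```python
import re

pattern = re.compile(r"\d+")

match = pattern.match("aaa123bbb123ccc123")
print(match)  # None (matching starts at the beginning of the string)

match = pattern.match("aaa123bbb123ccc123", 3, 6)
print(match)          # <re.Match object; span=(3, 6), match='123'>
print(match.group())  # 123, the matched text; group() and group(0) both return the whole match
print(match.start())  # 3, start index of the match within the whole string
print(match.end())    # 6, end index of the match within the whole string
print(match.span())   # (3, 6), i.e. (start(), end())
```

**(2) The search() method**

```python
# Pattern.search(string[, pos[, endpos]])
# re.search(pattern, string, flags=0)
```

Notes: scans the string for the first position where the pattern matches. Returns a Match object on success and None on failure.

```python
import re

pattern = re.compile(r"\d+")

match = pattern.search("aaaa1111bbbb1234cccc1243")
print(match)  # <re.Match object; span=(4, 8), match='1111'>

match = pattern.search("aaaa1111bbbb1234cccc1243", 3, 6)
print(match)          # <re.Match object; span=(4, 6), match='11'>
print(match.group())  # 11
print(match.start())  # 4
print(match.end())    # 6
print(match.span())   # (4, 6)
```

**(3) The findall() method**

This method returns all matches.

```python
# Pattern.findall(string[, pos[, endpos]])
# re.findall(pattern, string, flags=0)
```

Notes: returns a list of the matches on success and an empty list when nothing matches.

```python
import re

pattern = re.compile(r"\d+")
match = pattern.findall("aaaa1111bbbb1234cccc1243")
print(match)  # ['1111', '1234', '1243']
```

**(4) The finditer() method**

```python
# Pattern.finditer(string[, pos[, endpos]])
# re.finditer(pattern, string, flags=0)
```

Notes: finds all matches like findall(), but returns an iterator of Match objects, through which each match can be accessed in turn.

```python
import re

pattern = re.compile(r"\d+")
result_iter = pattern.finditer("aaaa1111bbbb1234cccc1243")
for result in result_iter:
    print("Found {} at position {}".format(result.group(), result.span()))

# Found 1111 at position (4, 8)
# Found 1234 at position (12, 16)
# Found 1243 at position (20, 24)
```

**(5) The split() method**

```python
# Pattern.split(string, maxsplit=0)
# re.split(pattern, string, maxsplit=0, flags=0)
```

Notes: splits the string at every match. maxsplit is the maximum number of splits; if it is not given, the string is split at every match.

```python
import re

print(re.split(r'\d+', 'one1two2three3four4five5'))
# ['one', 'two', 'three', 'four', 'five', '']
```

**(6) The sub() method**

```python
# Pattern.sub(repl, string, count=0)
# re.sub(pattern, repl, string, count=0, flags=0)
```

Notes: this method performs replacement. If repl is a string, every matched substring is replaced with repl and the resulting string is returned. If repl is a function, it must accept a single Match object and return the replacement string. count gives the maximum number of replacements (0 means replace all).

```python
import re

p = re.compile(r'(\w+) (\w+)')
s = 'test aaa,test bbb'


def func(m):
    # Build the replacement from the second captured group
    return 'hei' + ' ' + m.group(2)


print(p.sub(r'hello world', s))  # hello world,hello world (each match replaced by "hello world")
print(p.sub(r'\2 \1', s))        # aaa test,bbb test (\1 is the text captured by the first group)
print(p.sub(func, s))            # hei aaa,hei bbb (all matches replaced)
print(p.sub(func, s, count=1))   # hei aaa,test bbb (at most one replacement)
```

**(7) The subn() method**

```python
# Pattern.subn(repl, string, count=0)
# re.subn(pattern, repl, string, count=0, flags=0)
```

Notes: this method also performs replacement, but returns a tuple of two elements. The first is the same string that sub() would return; the second is the number of replacements actually made.

```python
import re

p = re.compile(r'(\w+) (\w+)')
s = 'test aaa,test bbb'


def func(m):
    # Build the replacement from the second captured group
    return 'hei' + ' ' + m.group(2)


print(p.subn(r'hello world', s))  # ('hello world,hello world', 2)
print(p.subn(r'\2 \1', s))        # ('aaa test,bbb test', 2) (\1 is the text captured by the first group)
print(p.subn(func, s))            # ('hei aaa,hei bbb', 2)
print(p.subn(func, s, count=1))   # ('hei aaa,test bbb', 1)
```

## Some points to note

**1. The difference between re.match, re.search, and re.findall**

re.match only matches at the start of the string; if the start of the string does not fit the regular expression, the match fails and the function returns None. re.search scans the whole string until it finds the first match. re.findall returns all matches.

```python
import re

p = re.compile(r'[\d]')
s = 'abc33'
print(p.search(s).group())  # 3
print(p.match(s))           # None
print(p.findall(s))         # ['3', '3']
```

**2. Greedy and non-greedy matching**

`*`, `+`, `?` and the other quantifiers are greedy by default, that is, they match as much as possible. Appending `?` turns them into lazy quantifiers.

```python
import re

a = re.findall(r'a(\d+?)', 'a23b')
print(a)  # ['2']
b = re.findall(r'a(\d+)', 'a23b')
print(b)  # ['23']
```

Note: when the quantified part is constrained on both sides, as in the example that follows, greedy and non-greedy matching give the same result, so the non-greedy modifier has no visible effect.

```python
import re

a = re.findall(r'a(\d+)b', 'a3333b')
print(a)  # ['3333']
b = re.findall(r'a(\d+?)b', 'a3333b')
print(b)  # ['3333']
```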
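The compile flags listed in the table of the compile() section can be tried out with a small sketch like the one below. The sample text and the variable names (`text`, `ci`, `lines`, `number`) are made up for illustration and do not come from the original article.

```python
import re

# Hypothetical sample text with three lines
text = "Hello\nhello world\nHELLO"

# re.I: ignore case
ci = re.findall(r"hello", text, flags=re.I)
print(ci)  # ['Hello', 'hello', 'HELLO']

# re.M: ^ and $ also match at the start and end of each line
lines = re.findall(r"^hello$", text, flags=re.I | re.M)
print(lines)  # ['Hello', 'HELLO']

# re.S: "." also matches the newline, so the match can span lines
no_dotall = re.search(r"Hello.*HELLO", text)
dotall = re.search(r"Hello.*HELLO", text, flags=re.S)
print(no_dotall)              # None
print(dotall.group() == text)  # True

# re.X: whitespace and comments inside the pattern are ignored
number = re.compile(r"""
    \d+          # integer part
    (?:\.\d+)?   # optional fractional part
""", re.X)
nums = number.findall("pi is 3.14, e is about 2.718, answer 42")
print(nums)  # ['3.14', '2.718', '42']
```

Flags are combined with `|`, as in `re.I | re.M` above.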
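The difference between greedy, lazy, and possessive quantifiers described in the quantifier section shows up most clearly when the quantified part is followed by more pattern. The following is a minimal sketch; the sample strings and variable names are assumptions chosen for illustration, and the possessive case only runs on Python 3.11 or later, where `re` accepts that syntax.

```python
import re
import sys

# Hypothetical sample input
html = "<b>bold</b> and <i>italic</i>"

# Greedy: ".*" grabs as much as possible, then backtracks to the last ">"
greedy = re.findall(r"<.*>", html)
print(greedy)  # ['<b>bold</b> and <i>italic</i>']

# Lazy: ".*?" grabs as little as possible, stopping at the first ">"
lazy = re.findall(r"<.*?>", html)
print(lazy)  # ['<b>', '</b>', '<i>', '</i>']

# Greedy "\d+" gives one digit back so the trailing "\d" can match
greedy_digits = re.search(r"\d+\d", "12345")
print(greedy_digits.group())  # 12345

# Possessive "\d++" never gives anything back, so the trailing "\d" cannot match
if sys.version_info >= (3, 11):
    possessive = re.search(r"\d++\d", "12345")
    print(possessive)  # None
```

When only the shortest tag-like pieces are wanted, the lazy form is usually the one to reach for; the greedy form is appropriate when the longest possible span is intended.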
