Python, Part 1: A Brief Summary of Regular Expression Methods

阿新 • Published: 2018-11-02


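First, a quick introduction to the commonly used matching patterns, so we can get acquainted. They may already know me, but I certainly don't know them yet...

********************************************
Metacharacters: . ^ $ * + ? {} [] \ | ()
********************************************

1. .  matches any single character except a newline (\n). To match any character including '\n', pass the re.S (re.DOTALL) flag or use a class such as [\s\S]. Note that '[.\n]' does not do this: inside [] the dot is a literal '.'.
Example:

    import re
    print(re.findall(r'a.', 'ab ac afbbbb'))

>> ['ab', 'ac', 'af']

2. |  alternation: matches either the expression on its left or the one on its right.
Example:

    import re
    str1 = 'dit dot det,dct dit dot'
    print(re.findall(r'det|dct', str1))

>> ['det', 'dct']

3. []  character set: matches any one character from the set. [ic] means i or c, so 'd[ic]t' matches both dit and dct and is equivalent to 'dit|dct'. [^abc] negates the set: it matches any one character other than a, b, or c.
Example:

    import re
    str1 = 'dit dot det,dct dit dot'
    print(re.findall(r'd[ic]t', str1))

>> ['dit', 'dct', 'dit']

4. ^  inside [] means negation: [^ic] matches any character except i and c.
Example:

    import re
    str1 = 'dit dot det,dct dit dot'
    print(re.findall(r'd[^ic]t', str1))

>> ['dot', 'det', 'dot']

4-2. ^  outside [] matches the start of the string: '^dit' requires the substring dit to be at the very beginning.
Example:

    import re
    str1 = 'dit dot det,dct dit dot'
    print(re.findall(r'^dit', str1))

>> ['dit']

5. $  matches the end of the string: 'dot$' requires the substring dot to be at the very end.
Example:

    import re
    str1 = 'dit dot det,dct dit dot'
    print(re.findall(r'dot$', str1))

>> ['dot']

6. +  matches the preceding character one or more times: 'di+t' means one or more i between d and t.
Example:

    import re
    str1 = 'dit dot det,dct diit dot'
    print(re.findall(r'di+t', str1))

>> ['dit', 'diit']

7. *  matches the preceding character zero or more times: 'di*t' means zero or more i between d and t.
Example:

    import re
    str1 = 'dit dt det,dct diit dot'
    print(re.findall(r'di*t', str1))

>> ['dit', 'dt', 'diit']

Extension: '.+' stands for one or more arbitrary characters, and '.*' for zero or more arbitrary characters. Both are greedy, so each call below returns a single match that runs from the first d to the last t.

    import re
    str1 = 'd dt dit diit dot'
    print(re.findall(r'd.*t', str1))
    print(re.findall(r'd.+t', str1))

>> ['d dt dit diit dot']
   ['d dt dit diit dot']

8. ?  matches the preceding character zero times or once: 'di?t' means the i is optional, so both dt and dit match.
Example:

    import re
    str1 = 'd dt dit diit det'
    print(re.findall(r'di?t', str1))

>> ['dt', 'dit']

8-2. ?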
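turns off greedy matching: placed right after a quantifier (*?, +?, ??, {n,m}?), it makes that quantifier match as few characters as possible.

9. {}  'di{n}t' means exactly n i's between d and t.
Example:

    import re
    str1 = 'd dt dit diit det'
    print(re.findall(r'di{2}t', str1))

>> ['diit']

9-2. {}  'di{n,m}t' means n to m i's between d and t.
Example:

    import re
    str1 = 'd dt dit diit diiiit det'
    print(re.findall(r'di{2,4}t', str1))

>> ['diit', 'diiiit']

Extension: both n and m may be omitted. {n,} means n or more; {,m} means 0 to m; {,} means any number.

    import re
    str1 = 'd dt dit diit diiit diiiit det'
    print(re.findall(r'di{3,}t', str1))
    print(re.findall(r'di{,3}t', str1))
    print(re.findall(r'di{,}t', str1))

>> ['diiit', 'diiiit']
   ['dt', 'dit', 'diit', 'diiit']
   ['dt', 'dit', 'diit', 'diiit', 'diiiit']

10. \  escapes a metacharacter: to make a special character lose the meaning the regex syntax gives it and match as an ordinary character, put \ in front of it.
Example:

    import re
    print(re.findall(r'\?.', 'ab? ac af?bab?b'))

>> ['? ', '?b', '?b']

10-2. \  also introduces the predefined character classes, such as \d (a digit) and \w (a letter, digit, or underscore). Write such patterns as raw strings (r'...') so that Python does not interpret the backslash itself.
Example:

    import re
    str1 = '12 abd 34 def'
    print(re.findall(r'\d', str1))
    print(re.findall(r'\w', str1))

>> ['1', '2', '3', '4']
   ['1', '2', 'a', 'b', 'd', '3', '4', 'd', 'e', 'f']

11. ()  grouping: when the pattern contains groups, findall returns only the text captured by the parentheses instead of the whole match. With several groups, each result is a tuple.
Example:

    import re
    str1 = '12abcd34'
    print(re.findall(r'12abcd34', str1))
    print(re.findall(r'1(2a)bcd', str1))
    print(re.findall(r'1(2a)bc(d3)4', str1))

>> ['12abcd34']
   ['2a']
   [('2a', 'd3')]

***************************************************************
Main methods of the re module:
findall() finditer() match() search() compile() split() sub() subn()
***************************************************************

1. re.match(pattern, string, flags)
The flags argument controls how the pattern is matched. match only tries at the start of the string: it returns a match object if the pattern matches there, and None otherwise.
Example:

    import re
    print(re.match(r'www', 'www.baidu.com').span())
    print(re.match(r'baidu', 'www.baidu.com'))

>> (0, 3)
   None

2. re.search(pattern, string, flags)
Scans the whole string and returns the first successful match, or None if there is none.
Example:

    import re
    print(re.search(r'www', 'www.baidu.com').span())
    print(re.search(r'baidu', 'www.baidu.com.baidu').span())

>> (0, 3)
   (4, 9)

3. re.findall()
Searches from left to right and returns all non-overlapping matches as a list; if nothing matches, it returns an empty list.
Example:

    import re
    print(re.findall(r'www', 'www.baidu.com.wwww'))
    print(re.findall(r'www', 'nishuosdhsds'))

>> ['www', 'www']
   []

4. re.finditer()
Searches from left to right like findall, but returns the results as an iterator of match objects.
Example:

    import re
    str1 = 'ab cd e'
    istr1 = re.finditer(r'\w+', str1)
    for a in istr1:
        print(a.group(), a.span())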
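Note: a.group() returns the substring that satisfied the match condition, and a.span() returns its start and end positions.

>> ab (0, 2)
   cd (3, 5)
   e (6, 7)

5. re.compile()
Compiles the pattern first and returns a pattern object, which is then used for matching. This is useful when the same pattern is reused many times.
Example:

    import re
    str1 = 'abcdeabfg'
    pre = re.compile(r'ab')
    print(pre.findall(str1))

>> ['ab', 'ab']

6. re.split()
Splits the string at every place where the pattern matches.
Example:

    import re
    str1 = 'abc.d.ea.bfg.rere'
    str2 = '12+34-56*78/90'
    print(re.split(r'\.', str1))
    print(re.split(r'[\+\-\*/]', str2))

>> ['abc', 'd', 'ea', 'bfg', 'rere']
   ['12', '34', '56', '78', '90']

7. re.sub(pattern, replacement, string, count)
Replaces the matches in the string; count is the maximum number of replacements to make.
Example:

    import re
    str1 = 'abcdefg'
    print(re.sub(r'b', '123', str1))

>> a123cdefg

8. re.subn()
Works like sub, but returns a tuple whose second element is the number of replacements made.
Example:

    import re
    str1 = 'abcdebnfgcb'
    print(re.subn(r'b', '123', str1))

>> ('a123cde123nfgc123', 3)

****************************************************
There are also some letter-based escapes:
\A \b\B \d\D \G \s\S \w\W \z\Z
and ranges such as [0-9] [a-z] [A-Z] [a-zA-Z0-9] [^0-9]
(Python's re module does not support \G.)
****************************************************

1. \A  matches only at the start of the string. Unlike ^, it does not match at the start of each line, even with re.M.
Example:

    import re
    print(re.findall(r'\Ahttp://', 'http://www.baidu.com is good\nhttp://www.sohu.com', re.M))

>> ['http://']

2. \Z  matches only at the end of the string. Unlike $, it does not match at the end of each line, even with re.M.
Example:

    import re
    print(re.findall(r'\.jpeg\Z|\.png\Z|\.gif\Z', 'touxiang.gif\nqq.png', re.M))

>> ['.png']

3. \d matches one digit, \w matches one letter, digit, or underscore, and . matches any character.
Examples:
    '00\d' matches '007' but not '00A'
    '\d\d\d' matches '010'
    '\w\w\d' matches 'py3'
    'py.' matches 'py1', 'pyc', 'py!'
Quantifier recap: * matches the preceding element any number of times (including zero); .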
matches any single character; + matches the preceding element at least once; ? matches it zero times or once; {n} matches it exactly n times; {n,m} matches it n to m times.

4. \s matches any whitespace character; \S matches any non-whitespace character.
Example: \d{3}\s+\d{3,8}
    \d{3}    matches exactly 3 digits, e.g. '010'
    \s       matches one whitespace character
    \s+      matches at least one whitespace character, e.g. ' ', '  '
    \d{3,8}  matches 3 to 8 digits, e.g. '1234567'

5. Precise matching
    [0-9a-zA-Z\_]  matches one digit, letter, or underscore
    [0-9a-zA-Z\_]+  matches a string made of at least one digit, letter, or underscore, e.g. 'a100', '0_z', 'Py003'
    [a-zA-Z\_][0-9a-zA-Z\_]*  matches a string that starts with a letter or underscore, followed by any number of digits, letters, or underscores
    [a-zA-Z\_][0-9a-zA-Z\_]{0,19}  limits the length to 1-20 characters (one leading character plus at most 19 more)
    ^   marks the start of the line; ^\d means it must start with a digit
    $   marks the end of the line; \d$ means it must end with a digit

************************************************
Groups
Besides simply testing whether a string matches, regular expressions can also extract substrings. The parts wrapped in () are the groups to extract.
************************************************

1. group() or group(0) returns the text matched by the whole pattern.
Example:

    import re
    test = '010-12345'
    m = re.match(r'^\d{3}\-\d{3,8}$', test)
    print(m.span(), m.group())

>> (0, 9) 010-12345

2. group(1) returns the part matched by the first pair of parentheses, group(2) the part matched by the second, and so on. If the pattern has no parentheses, group(1) raises IndexError.
Example:

    import re
    a = '123abc456'
    m = re.search(r'([0-9]*)([a-z]*)([0-9]*)', a)
    print(m.group())
    print(m.group(1))
    print(m.group(2))
    print(m.group(3))

>> 123abc456
   123
   abc
   456

3. groups() returns a tuple containing the text of every group, from group 1 up to the last one.
Example:

    import re
    a = '123abc456'
    m = re.search(r'([0-9]*)([a-z]*)([0-9]*)', a)
    print(m.group())
    print(m.group(1))
    print(m.group(2))
    print(m.group(3))
    print(m.groups())

>> 123abc456
   123
   abc
   456
   ('123', 'abc', '456')

Important, read this one!

**********************************************
Greedy matching
**********************************************

Regex matching is greedy by default, meaning it matches as many characters as possible.
Example:

    import re
    print(re.match(r'^(\d+)(0*)$', '1012300').groups())

>> ('1012300', '')

Here \d+ is greedy and swallows the trailing zeros as well, so 0* can only match the empty string. Adding a ? makes \d+ non-greedy.
Example:

    import re
    print(re.match(r'^(\d+?)(0*)$', '1012300').groups())

>> ('10123', '00')
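To make the greedy versus non-greedy difference easier to try out, here is a minimal sketch that runs both forms side by side. The helper name split_trailing_zeros and the '<b>bold</b>' sample string are made up for illustration; the two digit patterns are the ones from the greedy matching example above.

```python
import re


def split_trailing_zeros(digits):
    """Hypothetical helper: split a digit string into (body, trailing zeros).

    The lazy quantifier \\d+? takes as few digits as it can, which leaves
    the trailing zeros for the second group.
    """
    m = re.match(r'^(\d+?)(0*)$', digits)
    if m is None:
        # Not a pure digit string, so there is nothing to split.
        return None
    return m.groups()


# Greedy: \d+ consumes every digit, so 0* is left with the empty string.
greedy = re.match(r'^(\d+)(0*)$', '1012300').groups()

# Lazy: \d+? stops as early as the rest of the pattern allows.
lazy = split_trailing_zeros('1012300')

# The same effect with '.': greedy runs to the last '>', lazy to the first.
tags_greedy = re.findall(r'<.+>', '<b>bold</b>')
tags_lazy = re.findall(r'<.+?>', '<b>bold</b>')

print(greedy)       # ('1012300', '')
print(lazy)         # ('10123', '00')
print(tags_greedy)  # ['<b>bold</b>']
print(tags_lazy)    # ['<b>', '</b>']
```

Returning None for non-digit input mirrors what re.match itself does, so the caller can test the result before unpacking it.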
