SQL & Hadoop Series: Spark DataFrame LIKE, NOT LIKE, RLIKE
阿新 • Published: 2021-08-30
Use the LIKE condition when you don't know the exact value, or when you are looking for a specific word pattern in the output. LIKE works as it does in SQL and can specify a pattern in WHERE/FILTER clauses and even in JOIN conditions.
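As a rough illustration of LIKE inside a JOIN condition, the sketch below matches names against a small table of prefixes. The `people` and `prefixes` DataFrames, the `prefix` column, and the local SparkSession are assumptions made up for this example. Because `Column.like` takes a string literal, the pattern is assembled from a column inside a SQL expression instead.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.expr

// Assumed local session; inside spark-shell, getOrCreate() returns the existing one.
val spark = SparkSession.builder().appName("like-join-sketch").master("local[*]").getOrCreate()
import spark.implicits._

// Hypothetical inputs: a few names and the prefixes we want to match them against.
val people = Seq("James Madison", "John Adams", "George Washington").toDF("pres_name")
val prefixes = Seq("James", "John").toDF("prefix")

// Non-equi join: keep each (name, prefix) pair where the name starts with the prefix.
val joined = people.join(prefixes, expr("pres_name LIKE concat(prefix, '%')"))
joined.show()
```

A join like this has no equality key, so Spark has to compare every pair of rows; it is best kept to cases where one side is small.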
SQL has both like and rlike. The difference:
- like is not a regular expression; it uses wildcards;
- rlike is a regular expression, written the same way as in Java;
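To make the distinction concrete without a cluster, here is a minimal plain-Scala sketch that mimics the two behaviours with `java.util.regex`. The object `LikeVsRlike` and its methods are hypothetical helpers written for this post, not Spark APIs, and escape characters are deliberately ignored.

```scala
import java.util.regex.Pattern

object LikeVsRlike {
  // Hypothetical helper: turn a LIKE pattern into a Java regex.
  // '%' becomes ".*", '_' becomes ".", every other character is quoted literally.
  def likeToRegex(pattern: String): String =
    pattern.toList.map {
      case '%' => ".*"
      case '_' => "."
      case c   => Pattern.quote(c.toString)
    }.mkString

  // LIKE semantics: the wildcard pattern must cover the whole string.
  def like(value: String, pattern: String): Boolean =
    Pattern.compile(likeToRegex(pattern), Pattern.DOTALL).matcher(value).matches()

  // RLIKE semantics: the regex only has to be found somewhere in the string.
  def rlike(value: String, regex: String): Boolean =
    Pattern.compile(regex).matcher(value).find()
}
```

With these helpers, `LikeVsRlike.like("James", "Jam.s")` is false because the dot is a literal character in a LIKE pattern, while `LikeVsRlike.rlike("James", "Jam.s")` is true because the dot is a regex metacharacter.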
Spark LIKE
Find all records whose name starts with James:
scala> df_pres.filter($"pres_name".like("James%")).select($"pres_name",$"pres_dob",$"pres_bs").show()
+-----------------+----------+--------------+
|        pres_name|  pres_dob|       pres_bs|
+-----------------+----------+--------------+
|    James Madison|1751-03-16|      Virginia|
|     James Monroe|1758-04-28|      Virginia|
|    James K. Polk|1795-11-02|North Carolina|
|   James Buchanan|1791-04-23|  Pennsylvania|
|James A. Garfield|1831-11-19|          Ohio|
+-----------------+----------+--------------+
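The post does not show how df_pres was built. To try the filter locally, a minimal stand-in might look like the following. It assumes a local SparkSession and uses only a handful of rows copied from the outputs in this post; storing pres_dob as a string is also an assumption.

```scala
import org.apache.spark.sql.SparkSession

// Assumed local session; inside spark-shell, getOrCreate() returns the existing one.
val spark = SparkSession.builder().appName("like-sketch").master("local[*]").getOrCreate()
import spark.implicits._

// A small sample of the rows shown in this post (dates kept as plain strings).
val df_pres = Seq(
  ("George Washington", "1732-02-22", "Virginia"),
  ("John Adams", "1735-10-30", "Massachusetts"),
  ("James Madison", "1751-03-16", "Virginia"),
  ("James Monroe", "1758-04-28", "Virginia"),
  ("Andrew Johnson", "1808-12-29", "North Carolina")
).toDF("pres_name", "pres_dob", "pres_bs")

// Same filter as above: names that start with "James".
val jamesOnly = df_pres.filter($"pres_name".like("James%"))
jamesOnly.select($"pres_name", $"pres_dob", $"pres_bs").show()
```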
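Related rules
%: matches zero or more arbitrary characters
_: matches exactly one arbitrary character
[]: matches one character from a range (SQL Server syntax; Spark's LIKE does not support it, so use rlike for character ranges)
[^]: excludes a range (likewise SQL Server syntax, not supported by Spark's LIKE)
The ESCAPE keyword defines an escape character. For example, WHERE ColumnA LIKE '%5/%%' ESCAPE '/' matches values that contain a literal "5%". Spark SQL accepts the ESCAPE clause from version 3.0 on; without it, the escape character is the backslash.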
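The rules above can be tried out on a toy column. This is only a sketch: the `discounts` DataFrame and its `label` values are invented for the example, a local SparkSession is assumed, and the ESCAPE clause assumes Spark 3.0 or later.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.expr

// Assumed local session; inside spark-shell, getOrCreate() returns the existing one.
val spark = SparkSession.builder().appName("like-rules-sketch").master("local[*]").getOrCreate()
import spark.implicits._

// Hypothetical sample values.
val discounts = Seq("5% off", "15% off", "50 off", "5 percent").toDF("label")

// '_' needs exactly one character before the 5, '%' allows anything after it.
val oneCharBefore5 = discounts.filter($"label".like("_5%"))

// ESCAPE '/' turns "/%" into a literal percent sign, so this finds values containing "5%".
val literalPercent = discounts.filter(expr("label LIKE '%5/%%' ESCAPE '/'"))

// Character ranges are not LIKE syntax in Spark; a regex via rlike covers that case.
val startsWith1To4 = discounts.filter($"label".rlike("^[1-4]"))

oneCharBefore5.show()
literalPercent.show()
startsWith1To4.show()
```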
Spark NOT LIKE
To use NOT LIKE, write the same filter as for like and put "!" in front of the column expression; the "!" negates the whole like condition:
scala> df_pres.filter(!$"pres_name".like("James%")).select($"pres_name",$"pres_dob",$"pres_bs").show()
+--------------------+----------+--------------------+
|           pres_name|  pres_dob|             pres_bs|
+--------------------+----------+--------------------+
|   George Washington|1732-02-22|            Virginia|
|          John Adams|1735-10-30|       Massachusetts|
|    Thomas Jefferson|1743-04-13|            Virginia|
|   John Quincy Adams|1767-07-11|       Massachusetts|
|      Andrew Jackson|1767-03-15|South/North Carolina|
|    Martin Van Buren|1782-12-05|            New York|
|William Henry Har...|1773-02-09|            Virginia|
|          John Tyler|1790-03-29|            Virginia|
|      Zachary Taylor|1784-11-24|            Virginia|
|    Millard Fillmore|1800-01-07|            New York|
|     Franklin Pierce|1804-11-23|       New Hampshire|
|     Abraham Lincoln|1809-02-12|            Kentucky|
|      Andrew Johnson|1808-12-29|      North Carolina|
|    Ulysses S. Grant|1822-04-27|                Ohio|
| Rutherford B. Hayes|1822-10-04|                Ohio|
|   Chester A. Arthur|1829-10-05|             Vermont|
|    Grover Cleveland|1837-03-18|          New Jersey|
|   Benjamin Harrison|1833-08-20|                Ohio|
|    Grover Cleveland|1837-03-18|          New Jersey|
|    William McKinley|1843-01-29|                Ohio|
+--------------------+----------+--------------------+
only showing top 20 rows
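The "!" prefix is one of several equivalent spellings. The sketch below, which assumes a local SparkSession and a made-up `names` DataFrame containing one null, shows three of them and a point that is easy to miss: a null name satisfies neither LIKE nor NOT LIKE, so it disappears from both results unless it is handled explicitly.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{expr, not}

// Assumed local session; inside spark-shell, getOrCreate() returns the existing one.
val spark = SparkSession.builder().appName("not-like-sketch").master("local[*]").getOrCreate()
import spark.implicits._

// Hypothetical sample with one missing name.
val names = Seq("James Madison", "John Adams", null).toDF("pres_name")

// Three equivalent ways to express NOT LIKE.
val viaBang = names.filter(!$"pres_name".like("James%"))
val viaNot  = names.filter(not($"pres_name".like("James%")))
val viaSql  = names.filter(expr("pres_name NOT LIKE 'James%'"))

// Keep rows with a null name as well, if that is what the query should mean.
val keepNulls = names.filter($"pres_name".isNull || !$"pres_name".like("James%"))

viaBang.show()
keepNulls.show()
```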
Spark RLIKE
Find all records whose name does not start with James or John:
scala> df_pres.filter(!$"pres_name".rlike("^(James|John)(.*)+$")).select($"pres_name",$"pres_dob",$"pres_bs").show()
+--------------------+----------+--------------------+
|           pres_name|  pres_dob|             pres_bs|
+--------------------+----------+--------------------+
|   George Washington|1732-02-22|            Virginia|
|    Thomas Jefferson|1743-04-13|            Virginia|
|      Andrew Jackson|1767-03-15|South/North Carolina|
|    Martin Van Buren|1782-12-05|            New York|
|William Henry Har...|1773-02-09|            Virginia|
|      Zachary Taylor|1784-11-24|            Virginia|
|    Millard Fillmore|1800-01-07|            New York|
|     Franklin Pierce|1804-11-23|       New Hampshire|
|     Abraham Lincoln|1809-02-12|            Kentucky|
|      Andrew Johnson|1808-12-29|      North Carolina|
|    Ulysses S. Grant|1822-04-27|                Ohio|
| Rutherford B. Hayes|1822-10-04|                Ohio|
|   Chester A. Arthur|1829-10-05|             Vermont|
|    Grover Cleveland|1837-03-18|          New Jersey|
|   Benjamin Harrison|1833-08-20|                Ohio|
|    Grover Cleveland|1837-03-18|          New Jersey|
|    William McKinley|1843-01-29|                Ohio|
|  Theodore Roosevelt|1858-10-27|            New York|
| William Howard Taft|1857-09-15|                Ohio|
|      Woodrow Wilson|1856-12-28|            Virginia|
+--------------------+----------+--------------------+
only showing top 20 rows
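One detail worth knowing about rlike is that it succeeds when the regex is found anywhere in the value, so the leading "^" does the real work in the query above and the trailing "(.*)+$" can be left out. The sketch below assumes a local SparkSession and a small made-up `names` DataFrame built from names that appear in this post.

```scala
import org.apache.spark.sql.SparkSession

// Assumed local session; inside spark-shell, getOrCreate() returns the existing one.
val spark = SparkSession.builder().appName("rlike-sketch").master("local[*]").getOrCreate()
import spark.implicits._

val names = Seq("James Madison", "John Adams", "Andrew Johnson", "George Washington").toDF("pres_name")

// Anchored: only names that START with James or John are excluded.
val anchored = names.filter(!$"pres_name".rlike("^(James|John)"))

// Unanchored: "Andrew Johnson" is excluded too, because "John" occurs inside "Johnson".
val unanchored = names.filter(!$"pres_name".rlike("James|John"))

// Java regex flags work as usual, e.g. (?i) for case-insensitive matching.
val anyCaseJames = names.filter($"pres_name".rlike("(?i)^james"))

anchored.show()
unanchored.show()
```

Dropping the anchor changes the result silently, so it is worth checking a name such as "Andrew Johnson" whenever a pattern is meant to test only the start of a value.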