SQL for deleting duplicate rows from a table

Method 1:

Find the duplicated rows in the table:
  select * from employee
  where employeeId in (select employeeId from employee group by employeeId having count(employeeId) > 1)

Delete the surplus duplicates, where a duplicate is decided by a single column (employeeId), keeping only the row with the smallest rowid in each group:
  delete from employee
  where employeeId in (select employeeId from employee group by employeeId having count(employeeId) > 1)
  and rowid not in (select min(rowid) from employee group by employeeId having count(employeeId) > 1)
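To see what the single-column delete actually keeps, here is a minimal sketch that runs the same statement against an in-memory SQLite database. SQLite is only a stand-in for Oracle here (an assumption made so the example is runnable anywhere); its implicit rowid plays the role of Oracle's ROWID, and the sample rows are made up for illustration.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for Oracle (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (employeeId INTEGER, phoneNo TEXT)")

# Made-up sample data; rowids are assigned 1..6 in insertion order.
conn.executemany(
    "INSERT INTO employee (employeeId, phoneNo) VALUES (?, ?)",
    [(1, "111"), (1, "111"), (2, "222"), (3, "333"), (3, "334"), (3, "333")],
)

# Same shape as the single-column delete above: duplicates are judged on
# employeeId only, and the smallest rowid of each duplicated group is kept.
conn.execute(
    """
    DELETE FROM employee
    WHERE employeeId IN (SELECT employeeId FROM employee
                         GROUP BY employeeId HAVING COUNT(employeeId) > 1)
      AND rowid NOT IN (SELECT MIN(rowid) FROM employee
                        GROUP BY employeeId HAVING COUNT(employeeId) > 1)
    """
)

remaining = conn.execute(
    "SELECT rowid, employeeId, phoneNo FROM employee ORDER BY rowid"
).fetchall()
print(remaining)  # rows 2, 5 and 6 are gone; rows 1, 3 and 4 remain
```

Note that row 5 (employeeId 3, phoneNo "334") is removed even though its phone number differs, because only employeeId decides what counts as a duplicate here.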

Find duplicated rows judged on several columns:
  select * from employee e
  where (e.employeeId, e.phoneNo) in (select employeeId, phoneNo from employee group by employeeId, phoneNo having count(*) > 1)

Delete the surplus duplicates judged on several columns, keeping only the row with the smallest rowid in each group:
  delete from employee e
  where (e.employeeId, e.phoneNo) in (select employeeId, phoneNo from employee group by employeeId, phoneNo having count(*) > 1)
  and rowid not in (select min(rowid) from employee group by employeeId, phoneNo having count(*) > 1)

Find the surplus duplicates judged on several columns, excluding the row with the smallest rowid (that is, exactly the rows the multi-column delete would remove):
  select * from employee e
  where (e.employeeId, e.phoneNo) in (select employeeId, phoneNo from employee group by employeeId, phoneNo having count(*) > 1)
  and rowid not in (select min(rowid) from employee group by employeeId, phoneNo having count(*) > 1)

General form:

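delete from table_name t
  where (t.col1, t.col2, … , t.coln) in (select col1, col2, … , coln from table_name group by col1, col2, … , coln having count(*) > 1)
  and rowid not in (select min(rowid) from table_name group by col1, col2, … , coln having count(*) > 1)

Here table_name and col1 … coln are placeholders for your own table and for the columns that define a duplicate.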
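The general form can be filled in mechanically. The sketch below uses a hypothetical helper, build_dedup_delete (not something from the original post), to produce the statement for a given table and column list, then runs it on SQLite as a stand-in for Oracle. It assumes SQLite 3.15 or later for the row-value IN syntax, omits the table alias because SQLite's DELETE does not accept the Oracle-style bare alias, and uses made-up sample rows.

```python
import sqlite3


def build_dedup_delete(table, columns):
    """Hypothetical helper: fill in the general template for `columns`.

    Identifiers are interpolated directly into the SQL text, so only pass
    trusted table and column names.
    """
    cols = ", ".join(columns)
    return (
        f"DELETE FROM {table} "
        f"WHERE ({cols}) IN (SELECT {cols} FROM {table} "
        f"GROUP BY {cols} HAVING COUNT(*) > 1) "
        f"AND rowid NOT IN (SELECT MIN(rowid) FROM {table} "
        f"GROUP BY {cols} HAVING COUNT(*) > 1)"
    )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (employeeId INTEGER, phoneNo TEXT)")
conn.executemany(
    "INSERT INTO employee (employeeId, phoneNo) VALUES (?, ?)",
    [(1, "111"), (1, "111"), (1, "112"), (2, "222"), (2, "222"), (2, "222")],
)

sql = build_dedup_delete("employee", ["employeeId", "phoneNo"])
conn.execute(sql)

remaining = conn.execute(
    "SELECT rowid, employeeId, phoneNo FROM employee ORDER BY rowid"
).fetchall()
print(sql)
print(remaining)  # one row survives per (employeeId, phoneNo) pair
```

Because duplicates are judged on both columns, the row (1, "112") survives alongside (1, "111").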

In addition:

If you only need to hide duplicates in a query result, without deleting anything, it is enough to write select distinct column from table….

Method 2:
DELETE FROM hr.employees t1
              WHERE t1.ROWID NOT IN (
                   SELECT MIN(t2.ROWID)
                   FROM hr.employees t2
                   GROUP BY t2.employee_id -- group by the column(s) that should stay unique
                 );

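This is noticeably shorter and easier to read than method 1, since it needs only one subquery. The subquery selects rowid, groups by the column(s) we want to keep unique, and takes the smallest rowid in each group (note that this is the rowid of the table inside the subquery). NOT IN then deletes every row whose rowid is not one of those smallest values.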
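A minimal sketch of method 2 on a small table, again using an in-memory SQLite database as a stand-in for Oracle (an assumption). The hr schema prefix and the t1 alias are dropped because SQLite does not need them, and the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (employee_id INTEGER, name TEXT)")

# Made-up sample data; rowids are assigned 1..6 in insertion order.
conn.executemany(
    "INSERT INTO employees (employee_id, name) VALUES (?, ?)",
    [(100, "a"), (100, "a"), (101, "b"), (102, "c"), (102, "c"), (102, "c")],
)

# Method 2: keep the smallest rowid of every employee_id group, delete the rest.
cur = conn.execute(
    """
    DELETE FROM employees
    WHERE rowid NOT IN (
        SELECT MIN(t2.rowid)
        FROM employees AS t2
        GROUP BY t2.employee_id
    )
    """
)
deleted = cur.rowcount

kept = [r[0] for r in conn.execute("SELECT rowid FROM employees ORDER BY rowid")]

# Re-run the duplicate check from method 1 to confirm nothing is left over.
still_duplicated = conn.execute(
    "SELECT employee_id FROM employees GROUP BY employee_id HAVING COUNT(*) > 1"
).fetchall()

print(deleted, kept, still_duplicated)
```

Unlike method 1, there is no HAVING filter: every group contributes its smallest rowid, so rows that were never duplicated are protected by the NOT IN list as well.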

  So, doesn't this solve the problem quickly while staying easy to follow? But did you think that was the end of it? No, no, no.

Method 3:
DELETE FROM hr.employees t1
              WHERE t1.rowid > (
                   SELECT MIN(t2.rowid)
                   FROM hr.employees t2
                   WHERE t1.employee_id = t2.employee_id -- match on the column(s) that should stay unique
                 );

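This looks much like method 2, but the point is that it uses a join: the subquery is correlated, matching t1 to t2, instead of using GROUP BY. I won't claim that a join is always faster than GROUP BY, but it rarely loses to it, and in typical cases it is the fastest of the three. If employee_id is indexed, the equality match inside the subquery can use that index, which is where most of the speed comes from.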

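  The approach, briefly: the subquery correlates t1 and t2 on the column(s) to keep unique and picks the smallest rowid (again, the rowid of the table in the subquery). The outer ">" then keeps only the row with the smallest rowid for each match, and all the other duplicate rows are deleted.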
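The two shorter methods can be compared side by side. The sketch below runs the GROUP BY delete and the correlated delete on identical copies of a made-up table, using in-memory SQLite as a stand-in for Oracle (an assumption); surviving_rowids is a hypothetical helper written for this example. One difference worth checking on your own data: because NULL = NULL is not true in SQL, the correlated form never matches rows whose key column is NULL and leaves them all in place, while GROUP BY puts the NULLs in one group and keeps a single row.

```python
import sqlite3

# Made-up rows; the last two have a NULL employee_id on purpose.
ROWS = [(100, "a"), (100, "b"), (101, "c"), (None, "d"), (None, "e")]

GROUP_BY_DELETE = """
DELETE FROM employees
WHERE rowid NOT IN (SELECT MIN(t2.rowid)
                    FROM employees AS t2
                    GROUP BY t2.employee_id)
"""

# The outer table is referenced by its name; t2 is the copy inside the subquery.
CORRELATED_DELETE = """
DELETE FROM employees
WHERE rowid > (SELECT MIN(t2.rowid)
               FROM employees AS t2
               WHERE t2.employee_id = employees.employee_id)
"""


def surviving_rowids(delete_sql):
    """Hypothetical helper: build a fresh table, run the delete, list what is left."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employees (employee_id INTEGER, name TEXT)")
    conn.executemany("INSERT INTO employees VALUES (?, ?)", ROWS)
    conn.execute(delete_sql)
    rows = conn.execute("SELECT rowid FROM employees ORDER BY rowid").fetchall()
    conn.close()
    return [r[0] for r in rows]


group_by_result = surviving_rowids(GROUP_BY_DELETE)
correlated_result = surviving_rowids(CORRELATED_DELETE)
print(group_by_result)    # the two NULL rows collapse to one
print(correlated_result)  # both NULL rows survive
```

For non-NULL keys the two statements delete the same rows, so the choice between them mostly comes down to speed on your data and whether NULL keys should be treated as duplicates of each other.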