
Strategies for Deleting Duplicate Data from a Database Table

Tags: SQL, database, back end

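Overview

When developing client/server (C/S) or browser/server (B/S) systems, we often run into the following scenario: careless operation on the mobile client or a poor network connection causes the server to record the same data more than once. Because the project was rushed into production, the server-side code does not check for duplicates, so a large amount of duplicate data ends up in the database tables.

This article presents a solution based on a T-SQL script. The background is that a large number of duplicate records were created while ATM information for each city was being entered through a mobile client.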

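Data table

The table holds the basic information about each ATM: business Id, location, address, and city. The data initialization script is given in the appendix.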

create table t_atms(
	Id int primary key identity(1,1),
	ATMId char(4) not null,
	ATMLocation varchar(100) not null,
	ATMAddress varchar(200) not null,
	City varchar(50) not null
)

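Goal

Use a single T-SQL script to delete the ATM records that were entered more than once. Rows count as duplicates when their ATMId and ATMLocation are exactly the same. Within each set of duplicates, the most recently entered row is kept and the others are deleted.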

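Duplicate data deletion strategy

First, we define four categories of data: normal data, retained data, dirty data, and duplicate data.

  • Rows that were entered normally and have no duplicates are called normal data;
  • Among rows with exactly the same ATMId and ATMLocation, the one entered last is called retained data;
  • Among rows with exactly the same ATMId and ATMLocation, those that were not entered last are called dirty data;
  • Retained data and dirty data together are called duplicate data.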
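
To make the categories concrete, the following is a minimal sketch that labels every row of t_atms with its category. It is an illustration rather than part of the original script: it assumes the t_atms table defined above, relies on the auto-incremented Id to tell which row was entered last, and the RowCategory column name is made up for this example.

```sql
-- Label each row as 'normal', 'retained', or 'dirty'.
-- 'retained' and 'dirty' rows together form the duplicate data.
select
	a.Id,
	a.ATMId,
	a.ATMLocation,
	case
		-- only one row in the (ATMId, ATMLocation) group: no duplicates
		when count(*) over (partition by a.ATMId, a.ATMLocation) = 1 then 'normal'
		-- largest Id in a group with several rows: the last one entered
		when a.Id = max(a.Id) over (partition by a.ATMId, a.ATMLocation) then 'retained'
		-- every other row in a group with several rows
		else 'dirty'
	end as RowCategory -- hypothetical column name
from t_atms as a
order by a.ATMId, a.ATMLocation, a.Id;
```

The query only reads data, so it can be run safely before any cleanup to inspect how the rows would be classified.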

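Basic approach

  1. From all the data, filter out the normal data to obtain the duplicate data.
  2. Within the duplicate data, identify the retained data.
  3. Delete the remaining dirty data from the duplicate data.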
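
Steps 1 and 2 can be previewed without modifying anything. The following is a minimal read-only sketch, assuming the t_atms table from this article; the column aliases RowsInGroup, RetainedId, and DirtyRows are made up for illustration.

```sql
-- One row per duplicated (ATMId, ATMLocation) pair.
select
	ATMId,
	ATMLocation,
	count(*)     as RowsInGroup, -- retained row plus dirty rows
	max(Id)      as RetainedId,  -- the row step 2 would keep
	count(*) - 1 as DirtyRows    -- rows step 3 would delete
from t_atms
group by ATMId, ATMLocation
having count(*) > 1              -- step 1: drop the normal data
order by ATMId, ATMLocation;
```

Run against the appendix data, this should list six groups whose DirtyRows values add up to 10, which would leave 17 of the 27 rows after cleanup.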

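Implementation

To express the basic approach more clearly, we use CTEs to split the code into parts and implement the deletion strategy step by step.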

;with CTE_MAX_Duplicted as(
	select max(Id) as MaxId, ATMId, ATMLocation from t_atms
	group by ATMId, ATMLocation
	having count(*) > 1
)
, CTE_Duplicted_Deleted as(
	select a.Id from CTE_MAX_Duplicted d
	inner join t_atms a
		on a.ATMId = d.ATMId
		and a.ATMLocation = d.ATMLocation
		and a.Id != d.MaxId
)
merge into t_atms as a
using CTE_Duplicted_Deleted as d
	on a.Id = d.Id
when matched then delete;

select * from t_atms

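  1. Group the ATM table by ATMId and ATMLocation. A group with more than one row contains duplicate data, and this condition filters out the normal data.
  2. The Id column of the ATM table is auto-incremented, so among rows with the same ATMId and ATMLocation, a larger Id means a later entry. The max aggregate function therefore identifies the retained data.
  3. Join the retained data back to t_atms on equal ATMId and ATMLocation but unequal Id. This filters the retained data out of the duplicate data and yields the Ids of the dirty data.
  4. Delete all the dirty data: merge the dirty data in CTE_Duplicted_Deleted with t_atms and delete every row that matches. The sample data set is small, but real tables may hold tens or even hundreds of thousands of rows, which is why the merge operation is used.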
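
For comparison, a different technique that is also common on SQL Server ranks the rows of each group with row_number() and deletes through the CTE. This is not the approach taken in this article, only a minimal sketch under the same assumptions (the t_atms table, with a larger Id meaning a later entry); the names CTE_Ranked and rn are made up for illustration.

```sql
-- Rank rows inside each (ATMId, ATMLocation) group, newest first.
;with CTE_Ranked as( -- hypothetical CTE name
	select
		Id,
		row_number() over (
			partition by ATMId, ATMLocation
			order by Id desc
		) as rn -- 1 = retained or normal row, 2 and above = dirty rows
	from t_atms
)
-- Deleting from the CTE removes the underlying rows of t_atms.
delete from CTE_Ranked
where rn > 1;
```

Normal data has only one row per group, so its rank is always 1 and it is left untouched, just like the retained data.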

Appendix

if object_id('t_atms') is not null 
	drop table t_atms
create table t_atms(
Id int primary key identity(1,1),
ATMId char(4) not null,
ATMLocation varchar(100) not null,
ATMAddress varchar(200) not null,
City varchar(50) not null,
)
insert into t_atms values 
('0101',	'TEEN TALWAR BRANCH',	'TEEN TALWAR, GROUND FLOOR, HAMILTON COURT COMPLEX, BLOCK 7, KDA SCHEME # 5, CLIFTON, KARACHI',	'KARACHI'),
('0111',	'SCHON CIRCLE BRANCH',	'PLOT NO. G-19/3,BLOCK-9,KEHKHSAN, CLIFTON, KARACHI',	'KARACHI'),
('0131',	'KORANGI BRANCH ATM1',	'PLOT NO. SC-7 (ST-17), SECTOR 15, KORANGI INDUSTRIAL AREA, KARACHI',	'KARACHI'),
('0132',	'KORANGI BRANCH ATM2',	'PLOT NO. SC-7 (ST-17), SECTOR 15, KORANGI INDUSTRIAL AREA, KARACHI',	'KARACHI'),
('0211',	'MAIN BRANCH HEAD OFFICE ATM1',	'MAIN BRANCH, OPPOSITE HABIB BANK PLAZA, I.I. CHUNDRIGAR ROAD, KARACHI',	'KARACHI'),
('0213',	'MAIN BRANCH (INSIDE) ATM2',	'MAIN BRANCH, OPPOSITE HABIB BANK PLAZA, I.I. CHUNDRIGAR ROAD, KARACHI',	'KARACHI'),
('0241',	'GULSHAN-E-IQBAL BRANCH ATM1',	'SB-9 BLOCK 13-B, UNIVERSITY ROAD, GULSHAN-E-IQBAL,KARACHI',	'KARACHI'),
('0242',	'GULSHAN-E-IQBAL BRANCH ATM2',	'SB-9 BLOCK 13-B, UNIVERSITY ROAD, GULSHAN-E-IQBAL,KARACHI',	'KARACHI'),
('0721',	'CLIFTON W.T.C. BRANCH ATM1',	'WORLD TRADE CENTER, 10, KHY-E-ROOMI, CLIFTON, KARACHI',	'KARACHI'),
('0722',	'CLIFTON W.T.C. BRANCH ATM2',	'WORLD TRADE CENTER, 10, KHY-E-ROOMI, CLIFTON, KARACHI',	'KARACHI'),


('0211',	'MAIN BRANCH HEAD OFFICE ATM1',	'MAIN BRANCH, OPPOSITE HABIB BANK PLAZA, I.I. CHUNDRIGAR ROAD, KARACHI',	'KARACHI'),
('0722',	'CLIFTON W.T.C. BRANCH ATM2',	'WORLD TRADE CENTER, 10, KHY-E-ROOMI, CLIFTON, KARACHI',	'KARACHI'),
('0213',	'MAIN BRANCH (INSIDE) ATM2',	'MAIN BRANCH, OPPOSITE HABIB BANK PLAZA, I.I. CHUNDRIGAR ROAD, KARACHI',	'KARACHI'),
('0213',	'MAIN BRANCH (INSIDE) ATM2',	'MAIN BRANCH, OPPOSITE HABIB BANK PLAZA, I.I. CHUNDRIGAR ROAD, KARACHI',	'KARACHI'),
('0131',	'KORANGI BRANCH ATM1',	'PLOT NO. SC-7 (ST-17), SECTOR 15, KORANGI INDUSTRIAL AREA, KARACHI'	,'KARACHI'),


('0652',	'MODEL TOWN BRANCH ATM2',	'SHOP # 26 CENTRAL COMMERCIAL MARKET, C-BLOCK MODEL TOWN, LAHORE' ,'LAHORE'),
('0742',	'MALL ROAD BRANCH ATM2',	'47 THE MALL ROAD, LAHORE'	,'LAHORE'),
('1292',	'ALLAMA IQBAL TOWN BRANCH ATM2',	'23 - PAK BLOCK, ALLAMA IQBAL TOWN, LAHORE'	,'LAHORE'),

('0791',	'UNDP TOWER BRANCH',	'SAUDI PAK TOWER,61-A,JINNAH AVENUE, ISLAMABAD'	,'ISLAMABAD'),
('0861',	'F-8 MARKAZ BRANCH',	'GN SHOPPING CENTER, AL BABAR PLAZA, F-8 MARKAZ, ISLAMABAD'	,'ISLAMABAD'),
('0331',	'SHAHRAH-E-QUAID AZAM BRANCH ATM1',	'35-SHAHRAE QUAID-E-AZAM, THE MALL PESHAWAR'	,'PESHAWAR'),
('0321',	'HAIDER ROAD BRANCH ATM1',	'55 HAIDER ROAD BRANCH RAWALPINDI CANTT'	,'RAWALPINDI'),


('1292',	'ALLAMA IQBAL TOWN BRANCH ATM2',	'23 - PAK BLOCK, ALLAMA IQBAL TOWN, LAHORE'	,'LAHORE'),
('0331',	'SHAHRAH-E-QUAID AZAM BRANCH ATM1',	'35-SHAHRAE QUAID-E-AZAM, THE MALL PESHAWAR'	,'PESHAWAR'),
('0331',	'SHAHRAH-E-QUAID AZAM BRANCH ATM1',	'35-SHAHRAE QUAID-E-AZAM, THE MALL PESHAWAR'	,'PESHAWAR'),


('1292',	'ALLAMA IQBAL TOWN BRANCH ATM2',	'23 - PAK BLOCK, ALLAMA IQBAL TOWN, LAHORE'	,'LAHORE'),
('1292',	'ALLAMA IQBAL TOWN BRANCH ATM2',	'23 - PAK BLOCK, ALLAMA IQBAL TOWN, LAHORE'	,'LAHORE')