Finding the Median in MySQL: Two Cases
阿新 • Published: 2020-08-12
Simple case
User-variable method:
SET @rownum := -1;
SELECT
AVG(t.mark) as median_num
FROM
(SELECT @rownum:=@rownum + 1 AS rowindex,
marks AS mark
FROM median_even
# (or median_odd)
ORDER BY marks) AS t
WHERE t.rowindex IN (FLOOR(@rownum / 2) , CEIL(@rownum / 2));
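How it works:
- Sort the rows; the default order is ASC.
- Define a user variable to number the rows (numbering starts at 0, so the variable is initialised to -1).
- Whether the number of rows is odd or even, take FLOOR() and CEIL() of half the largest row number, that is, round it down and up, then average the values at those two positions. The result is always the median.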
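The index arithmetic above can be checked outside the database. The following is a minimal Python sketch of the same idea, not the query itself; the function name `median_by_row_index` and the sample lists are made up for illustration, and the input is assumed to be non-empty.

```python
import math

def median_by_row_index(values):
    """Mimic the query: number the sorted rows from 0, then average the rows
    whose index is FLOOR(max_index / 2) or CEIL(max_index / 2)."""
    ordered = sorted(values)            # ORDER BY marks (ascending)
    max_index = len(ordered) - 1        # final value of @rownum
    wanted = {math.floor(max_index / 2), math.ceil(max_index / 2)}
    picked = [v for i, v in enumerate(ordered) if i in wanted]
    return sum(picked) / len(picked)    # AVG(t.mark)

# Odd count: indices 0..2, both bounds are 1, so one row is picked.
odd_result = median_by_row_index([7, 1, 5])
# Even count: indices 0..3, bounds are 1 and 2, so two rows are averaged.
even_result = median_by_row_index([8, 2, 6, 4])
print(odd_result, even_result)
```

With an odd row count the two bounds coincide, so the average is taken over a single row; with an even count they are the two middle rows.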
Further reading on user variables (SET @xxx := x): https://blog.csdn.net/JesseYoung/article/details/40779631
Complex case
Problem: https://leetcode-cn.com/problems/median-employee-salary/
SELECT Id, Company, Salary
FROM Employee
WHERE Id IN (
    SELECT e1.Id
    FROM Employee e1
    JOIN Employee e2 ON e1.Company = e2.Company
    GROUP BY e1.Id
    HAVING SUM(CASE WHEN e1.Salary >= e2.Salary THEN 1 ELSE 0 END) >= COUNT(*)/2
       AND SUM(CASE WHEN e1.Salary <= e2.Salary THEN 1 ELSE 0 END) >= COUNT(*)/2
)
GROUP BY Company, Salary
ORDER BY Company
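-- Then compare the number of salaries that are >= with COUNT(*)/2, and the number of salaries that are <= with COUNT(*)/2
-- What I still do not understand is what the result of HAVING SUM(CASE WHEN e1.Salary >= e2.Salary THEN 1 ELSE 0 END) >= COUNT(*)/2 AND SUM(CASE WHEN e1.Salary <= e2.Salary THEN 1 ELSE 0 END) >= COUNT(*)/2 is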
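- First, self-join the table. The primary key is Id, but the join is ON Company, so an employee such as Id 1 in company A is paired with every row of company A (six in the sample data); each joined row carries company A twice and two salary columns. This inner join sets up the salary comparison. (The problem asks for the median salary of each company, so the join has to be on Company.)
- Then group by Id. Each Id's group has as many rows as there are employees in that Id's company, so each Id is paired with the salary of every employee in its company.
- My understanding of the following clause:
HAVING SUM(CASE WHEN e1.Salary >= e2.Salary THEN 1 ELSE 0 END) >= COUNT(*)/2 AND SUM(CASE WHEN e1.Salary <= e2.Salary THEN 1 ELSE 0 END) >= COUNT(*)/2
Each THEN 1 does contribute 1 for every matching row and SUM adds them up, so each SUM is a count. The first condition keeps the employees whose salary is >= at least half of the salaries in their company; the second keeps those whose salary is <= at least half of them. In effect each condition selects a subset of employees, and the AND keeps only the Ids that appear in both subsets, which are the median rows.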
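To make the two subsets concrete, here is a small Python sketch that repeats the counting for each employee. The rows are hypothetical sample data, not the LeetCode table, and the names `passes_ge` and `passes_le` are made up for illustration.

```python
from collections import defaultdict

# Hypothetical sample rows: (Id, Company, Salary)
employees = [
    (1, "A", 10), (2, "A", 20), (3, "A", 30), (4, "A", 40),
    (5, "B", 5), (6, "B", 15), (7, "B", 25),
]

# Salaries per company: what each Id is paired with after the self-join.
by_company = defaultdict(list)
for emp_id, company, salary in employees:
    by_company[company].append(salary)

passes_ge = set()   # Ids satisfying the first HAVING condition
passes_le = set()   # Ids satisfying the second HAVING condition
for emp_id, company, salary in employees:
    peers = by_company[company]          # includes the employee's own salary
    half = len(peers) / 2                # COUNT(*)/2
    ge_count = sum(1 for other in peers if salary >= other)
    le_count = sum(1 for other in peers if salary <= other)
    if ge_count >= half:
        passes_ge.add(emp_id)
    if le_count >= half:
        passes_le.add(emp_id)

# The AND in the HAVING clause corresponds to a set intersection.
median_ids = sorted(passes_ge & passes_le)
print(median_ids)
```

Company A has four employees, so its two middle rows survive; company B has three, so only the middle one does. The sketch does not reproduce the outer GROUP BY Company, Salary, which in the query collapses employees that share the same median salary.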
Extension: https://leetcode-cn.com/problems/find-median-given-frequency-of-numbers/
Method 1:
SELECT
    AVG(Number) AS median
FROM
    (SELECT n1.Number
     FROM Numbers n1
     JOIN Numbers n2 ON n1.Number >= n2.Number
     GROUP BY n1.Number
     HAVING SUM(n2.Frequency) >= (SELECT SUM(Frequency) FROM Numbers)/2
        AND SUM(n2.Frequency) - AVG(n1.Frequency) <= (SELECT SUM(Frequency) FROM Numbers)/2
    ) s
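- Core idea: if n1.Number is a median, the cumulative count of values up to and including n1.Number must be at least total/2, and the cumulative count before n1.Number (excluding it) must be at most total/2.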
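The condition can be traced by hand on a small frequency table. Below is a Python sketch of the same test; the function name `median_from_frequencies` and the sample tables are assumptions made for illustration, and each Number is assumed to appear only once in the table.

```python
def median_from_frequencies(freq):
    """freq maps Number -> Frequency. Keep the numbers whose cumulative count
    including themselves is >= total/2 and excluding themselves is <= total/2."""
    total = sum(freq.values())           # (SELECT SUM(Frequency) FROM Numbers)
    half = total / 2
    candidates = []
    running = 0                          # cumulative frequency so far
    for number in sorted(freq):
        before = running                 # SUM(n2.Frequency) - AVG(n1.Frequency)
        running += freq[number]          # SUM(n2.Frequency), includes the number itself
        if running >= half and before <= half:
            candidates.append(number)
    return sum(candidates) / len(candidates)   # AVG(Number)

# Expands to 1, 1, 2, 3, 3, 3: the middle values are 2 and 3.
even_total = median_from_frequencies({1: 2, 2: 1, 3: 3})
# Expands to 1, 2, 2, 2, 3: the middle value is 2.
odd_total = median_from_frequencies({1: 1, 2: 3, 3: 1})
print(even_total, odd_total)
```

When the total count is even and the two middle positions fall on different numbers, both numbers pass the test and are averaged; otherwise only one number passes.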
Method 2:
select avg(n) as median from
(
select Number as n, @c1 + 1 as 'c1', (@c1 := @c1 + Frequency) as 'c2', t2.s
from Numbers, (select @c1 := 0) t1, (select sum(Frequency) as s
from Numbers) t2
order by n
) tmp
where c1 <= s/2 + 1 and c2 >= s/2
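One way to read the query above: c1 and c2 are the first and last positions a number occupies once the table is expanded and sorted, and a number is kept when that range touches the middle positions s/2 and s/2 + 1. The Python sketch below follows that reading; the function name `median_by_position_ranges` and the sample tables are hypothetical.

```python
def median_by_position_ranges(freq):
    """freq maps Number -> Frequency. Each number covers positions
    start..end (1-based) of the expanded, sorted list."""
    total = sum(freq.values())           # s
    end = 0                              # plays the role of @c1
    picked = []
    for number in sorted(freq):          # ORDER BY n
        start = end + 1                  # c1: first position of this number
        end += freq[number]              # c2: last position of this number
        if start <= total / 2 + 1 and end >= total / 2:
            picked.append(number)
    return sum(picked) / len(picked)     # AVG(n)

# Expands to 1, 1, 2, 3, 3, 3: positions 3 and 4 hold 2 and 3.
even_total = median_by_position_ranges({1: 2, 2: 1, 3: 3})
# Expands to 1, 2, 2, 2, 3: position 3 holds 2.
odd_total = median_by_position_ranges({1: 1, 2: 3, 3: 1})
print(even_total, odd_total)
```

The SQL version depends on MySQL evaluating the select expressions left to right and in ORDER BY order while the user variable is updated; the sketch makes that order explicit with a loop.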
I have not fully understood the extension problem yet.