好的,阿杰,我给你整理一份 SQL 去重的三种常用方法汇总,包括 DISTINCTGROUP BY 和 ROW_NUMBER() OVER,并附上示例和适用场景。


📝 SQL 去重三种方法汇总

1️⃣ 使用 DISTINCT 去重

语法

SELECT DISTINCT column1, column2, ...
FROM table_name;

示例

假设表 users 有如下数据:

idnamecity
1AliceBeijing
2BobShanghai
3AliceBeijing
4AliceShanghai

去重查询:

SELECT DISTINCT name, city
FROM users;

结果:

namecity
AliceBeijing
AliceShanghai
BobShanghai

适用场景

  • 简单去掉 完全重复的行
  • 查询多列组合唯一值

2️⃣ 使用 GROUP BY 去重

语法

SELECT column1, column2, ...
FROM table_name
GROUP BY column1, column2, ...;

示例

SELECT name, city
FROM users
GROUP BY name, city;

结果与 DISTINCT 相同:

namecity
AliceBeijing
AliceShanghai
BobShanghai

优势

  • 可以顺便进行 聚合统计
SELECT name, city, COUNT(*) AS cnt
FROM users
GROUP BY name, city;

结果:

namecitycnt
AliceBeijing1
AliceShanghai1
BobShanghai1

适用场景

  • 去重并进行 聚合统计
  • 需要统计每类数据数量时

3️⃣ 使用 ROW_NUMBER() OVER 去重(高级去重)

语法

SELECT *
FROM (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY column1, column2 ORDER BY id) AS rn
    FROM table_name
) t
WHERE rn = 1;

示例

去掉 users 表中 name, city 重复的记录,只保留每组的第一条:

SELECT *
FROM (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY name, city ORDER BY id) AS rn
    FROM users
) t
WHERE rn = 1;

结果:

idnamecityrn
1AliceBeijing1
4AliceShanghai1
2BobShanghai1

优势

  • 精确控制保留哪一条(按 id、日期、分数等排序)
  • 可删除重复行时非常有用
DELETE FROM users
WHERE id NOT IN (
    SELECT id
    FROM (
        SELECT id,
               ROW_NUMBER() OVER (PARTITION BY name, city ORDER BY id) AS rn
        FROM users
    ) t
    WHERE rn = 1
);

适用场景

  • 表中有主键或唯一标识,需要按规则保留一条记录
  • 删除重复行

4️⃣ 总结对比

方法去重粒度是否可聚合是否可控制保留哪条适用场景
DISTINCT全行简单去重
GROUP BY指定列去重 + 聚合统计
ROW_NUMBER() OVER指定列高级去重、删除重复行、按条件保留