好的,阿杰,我给你整理一份 SQL 去重的三种常用方法汇总,包括 DISTINCT、GROUP BY 和 ROW_NUMBER() OVER,并附上示例和适用场景。
📝 SQL 去重三种方法汇总
1️⃣ 使用 DISTINCT 去重
语法
SELECT DISTINCT column1, column2, ...
FROM table_name;
示例
假设表 users 有如下数据:
| id | name | city |
|---|---|---|
| 1 | Alice | Beijing |
| 2 | Bob | Shanghai |
| 3 | Alice | Beijing |
| 4 | Alice | Shanghai |
去重查询:
SELECT DISTINCT name, city
FROM users;
结果:
| name | city |
|---|---|
| Alice | Beijing |
| Alice | Shanghai |
| Bob | Shanghai |
适用场景
- 简单去掉 完全重复的行
- 查询多列组合唯一值
2️⃣ 使用 GROUP BY 去重
语法
SELECT column1, column2, ...
FROM table_name
GROUP BY column1, column2, ...;
示例
SELECT name, city
FROM users
GROUP BY name, city;
结果与 DISTINCT 相同:
| name | city |
|---|---|
| Alice | Beijing |
| Alice | Shanghai |
| Bob | Shanghai |
优势
- 可以顺便进行 聚合统计
SELECT name, city, COUNT(*) AS cnt
FROM users
GROUP BY name, city;
结果:
| name | city | cnt |
|---|---|---|
| Alice | Beijing | 1 |
| Alice | Shanghai | 1 |
| Bob | Shanghai | 1 |
适用场景
- 去重并进行 聚合统计
- 需要统计每类数据数量时
3️⃣ 使用 ROW_NUMBER() OVER 去重(高级去重)
语法
SELECT *
FROM (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY column1, column2 ORDER BY id) AS rn
FROM table_name
) t
WHERE rn = 1;
示例
去掉 users 表中 name, city 重复的记录,只保留每组的第一条:
SELECT *
FROM (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY name, city ORDER BY id) AS rn
FROM users
) t
WHERE rn = 1;
结果:
| id | name | city | rn |
|---|---|---|---|
| 1 | Alice | Beijing | 1 |
| 4 | Alice | Shanghai | 1 |
| 2 | Bob | Shanghai | 1 |
优势
- 精确控制保留哪一条(按 id、日期、分数等排序)
- 可删除重复行时非常有用
DELETE FROM users
WHERE id NOT IN (
SELECT id
FROM (
SELECT id,
ROW_NUMBER() OVER (PARTITION BY name, city ORDER BY id) AS rn
FROM users
) t
WHERE rn = 1
);
适用场景
- 表中有主键或唯一标识,需要按规则保留一条记录
- 删除重复行
4️⃣ 总结对比
| 方法 | 去重粒度 | 是否可聚合 | 是否可控制保留哪条 | 适用场景 |
|---|---|---|---|---|
| DISTINCT | 全行 | ❌ | ❌ | 简单去重 |
| GROUP BY | 指定列 | ✅ | ❌ | 去重 + 聚合统计 |
| ROW_NUMBER() OVER | 指定列 | ✅ | ✅ | 高级去重、删除重复行、按条件保留 |
发表回复