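I need to get one row per user ID, and I've run into a problem where a county has been entered twice when there should be only one. For example, "Palm Beach" and "Palm Beach County" are both recorded for a single user. This causes my query to return two rows for that user that are identical except for the county column.
The current output looks like this (a heavily simplified version showing only two columns):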
+---------+-------------------+
| User ID | County            |
+---------+-------------------+
| 1       | Bay               |
| 2       | Palm Beach        |
| 2       | PALM BEACH COUNTY |
| 3       | Collin            |
| 3       | COLLIN COUNTY     |
| 4       | Colusa County     |
| 5       | Fresno            |
+---------+-------------------+
The output should not include the word "County" and should be all uppercase. It should look like this:
+---------+------------+
| User ID | County     |
+---------+------------+
| 1       | BAY        |
| 2       | PALM BEACH |
| 3       | COLLIN     |
| 4       | COLUSA     |
| 5       | FRESNO     |
+---------+------------+
So far I've been using
    UPPER(REPLACE(addr.COUNTY,'COUNTY','')) AS COUNTY
and then manually removing the extra entries in an Excel sheet. The problem is that there are so many of them that this has become very inefficient.
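Some users' only record does contain the word "County" (user 4 above, for example), so I tried excluding values that contain " county", but that removes the users who have just one record.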
Any ideas on how to solve this more efficiently?
Posted on 2020-04-14 15:13:50
You can use REPLACE and GROUP BY to get the desired result:
    with sample_data as
    (
        select 1 user_id, 'Bay' county from dual
        union all
        select 2 user_id, 'Palm Beach' county from dual
        union all
        select 2 user_id, 'PALM BEACH COUNTY' county from dual
        union all
        select 3 user_id, 'Collin' county from dual
        union all
        select 3 user_id, 'COLLIN COUNTY' county from dual
        union all
        select 4 user_id, 'Colusa County' county from dual
        union all
        select 5 user_id, 'Fresno' county from dual
    )
    select user_id,
           upper(replace(lower(county), ' county', '')) county
    from
        sample_data
    group by
        user_id,
        upper(replace(lower(county), ' county', ''))
    order by
        user_id;
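       USER_ID COUNTY
    ---------- -----------------
             1 BAY
             2 PALM BEACH
             3 COLLIN
             4 COLUSA
             5 FRESNO
Now you have SQL that gives you the desired result. So instead of deleting rows from the source table, create a new table from this query, which is much faster than deleting them: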
    CREATE TABLE new_table AS
    select user_id,
           upper(replace(lower(county), ' county', '')) county
    from
        sample_data  -- stands in for your source table
    group by
        user_id,
        upper(replace(lower(county), ' county', ''));
https://stackoverflow.com/questions/61210709
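As a side note on why the expression from the question still left duplicates: Oracle's REPLACE is case-sensitive, so 'Colusa County' is not touched by a search for 'COUNTY', and removing 'COUNTY' without the space in front of it leaves a trailing blank, so 'PALM BEACH ' and 'PALM BEACH' still count as different values. The sketch below puts that expression next to the one used in this answer and next to a REGEXP_REPLACE variant that strips only a trailing "county". The regular expression and the column aliases are my own illustration rather than part of the original query, and the variant assumes the suffix always sits at the end of the value.

```sql
-- Compare three ways of normalising the county name (Oracle syntax).
-- The square brackets make leading or trailing spaces visible.
with sample_data as
(
    select 2 user_id, 'Palm Beach' county from dual
    union all
    select 2 user_id, 'PALM BEACH COUNTY' county from dual
    union all
    select 4 user_id, 'Colusa County' county from dual
)
select user_id,
       county,
       -- Expression from the question: case-sensitive, keeps the space before COUNTY.
       '[' || upper(replace(county, 'COUNTY', '')) || ']' as question_expr,
       -- Expression from this answer: lower-case first, then remove ' county' with its space.
       '[' || upper(replace(lower(county), ' county', '')) || ']' as answer_expr,
       -- Hypothetical variant: remove only a trailing "county", ignoring case.
       '[' || upper(regexp_replace(county, '\s+county\s*$', '', 1, 0, 'i')) || ']' as regexp_expr
from sample_data
order by user_id, county;
```

In the question_expr column I would expect [PALM BEACH ] (with a trailing space) for 'PALM BEACH COUNTY' and [COLUSA COUNTY] for 'Colusa County', while the other two columns should give [PALM BEACH] and [COLUSA]. The regexp variant is worth considering only if " county" could also appear in the middle of a value, since the plain REPLACE would remove it there as well.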