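I have been given a source file that is used to populate a MySQL database. I use a Python script to clean and transform the data. Below is a small example of the problem I am trying to solve.
Source file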
Area                        | Team    |
---------------------------------------
NORTH EAST - CO.DURHAM BL   | Winters |
NORTH EAST - NEWCASTLE GT   | Summers |
......                      |
SOUTH EAST - SOUTHAMPTON BI | Winters |
Stage 2 - After cleaning, transformed into the DB staging table, etc.
ID | Region     | City        | State | Team    |
-------------------------------------------------
1  | NORTH EAST | DURHAM      | BL    | Winters |
2  | NORTH EAST | NEWCASTLE   | GT    | Summers |
......
80 | SOUTH EAST | SOUTHAMPTON | BI    | Winters |
Stage 3 - Generate a region ID for each region group
ID | Region ID | Region     | City        | State | Team    |
-------------------------------------------------------------
1  | 1         | NORTH EAST | DURHAM      | BL    | Winters |
2  | 1         | NORTH EAST | NEWCASTLE   | GT    | Summers |
......
80 | 10        | SOUTH EAST | SOUTHAMPTON | BI    | Winters |
Is it possible to generate a sequence number for each group in MySQL? For example, in the scenario above, can the Region ID column be generated so that the NORTH EAST group gets 1, the SOUTH EAST group gets 10, and so on? How can this be achieved?
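Note: Is there a way to generate the sequence automatically? For example, can the values in the Region ID column be generated automatically, without having to specify the IDs by hand in a regions table?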
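Answered on 2021-05-24 20:34:40
Solved with the help of the DENSE_RANK function (a window function, available in MySQL 8.0 and later). The SQL snippet below generates the region IDs dynamically, numbering the regions in alphabetical order.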
SELECT
    DENSE_RANK() OVER (ORDER BY Region) AS RegionID,
    Region
FROM StagingTable
ORDER BY Region;
Answered on 2021-05-24 19:59:54
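-- Lookup table that maps each region name to its ID
CREATE TABLE regions (
    region_id INT PRIMARY KEY,
    region VARCHAR(10),
    KEY (region)
);
INSERT INTO regions VALUES
    (1, 'NORTH EAST'),
    (10, 'SOUTH EAST');  -- etc...
-- Copy the IDs into the staging table by matching on the region name
UPDATE db_staging_table AS d
JOIN regions AS r USING (region)
SET d.region_id = r.region_id;
If you are using MySQL 8.0, you can do this instead: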
WITH cte (region, region_id) AS (
    -- One row per region; ORDER BY inside OVER() makes the numbering repeatable
    SELECT region, ROW_NUMBER() OVER (ORDER BY region)
    FROM db_staging_table
    GROUP BY region
)
UPDATE db_staging_table AS d JOIN cte ON d.region = cte.region
SET d.region_id = cte.region_id;
Source: https://stackoverflow.com/questions/67678107
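Since the question mentions a Python cleaning script, the window-function idea from the answers above can be tried locally without a MySQL server. The following is only a minimal sketch under stated assumptions: it uses Python's built-in sqlite3 module as a stand-in for MySQL 8.0 (it needs an SQLite build with window functions, 3.25 or later), the three sample rows are taken from the Stage 2 example, and the column layout of db_staging_table is assumed rather than taken from the real schema.

```python
import sqlite3

# Sample rows copied from the Stage 2 example; the real table has more rows.
rows = [
    ("NORTH EAST", "DURHAM", "BL", "Winters"),
    ("NORTH EAST", "NEWCASTLE", "GT", "Summers"),
    ("SOUTH EAST", "SOUTHAMPTON", "BI", "Winters"),
]

# In-memory SQLite database standing in for MySQL (assumed schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE db_staging_table ("
    " id INTEGER PRIMARY KEY, region_id INTEGER,"
    " region TEXT, city TEXT, state TEXT, team TEXT)"
)
conn.executemany(
    "INSERT INTO db_staging_table (region, city, state, team)"
    " VALUES (?, ?, ?, ?)",
    rows,
)

# One ID per distinct region, numbered in alphabetical order of the name.
region_ids = conn.execute(
    "SELECT region, DENSE_RANK() OVER (ORDER BY region) AS region_id"
    " FROM (SELECT DISTINCT region FROM db_staging_table)"
).fetchall()

# Write the generated IDs back onto every staging row of that region.
conn.executemany(
    "UPDATE db_staging_table SET region_id = ? WHERE region = ?",
    [(region_id, region) for region, region_id in region_ids],
)

result = conn.execute(
    "SELECT id, region_id, region, city FROM db_staging_table ORDER BY id"
).fetchall()
print(result)
```

With only these three sample rows, SOUTH EAST receives ID 2 rather than 10; on the full data set each region would get whatever position its name has in alphabetical order.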