I have a dataset with overlapping date ranges in Date1 and Date2, like this:
+------+------+------+------+-------+------------+------------+
| Key1 | Key2 | Key3 | Key4 | Value | Date1      | Date2      |
+------+------+------+------+-------+------------+------------+
| k1   | k2   | k3   | k4   | 10    | 2022-01-01 | 2026-01-30 |
| k1   | k2   | k3   | k4   | 12    | 2022-06-05 | 2026-01-10 |
| k1   | k2   | k3   | k4   | 14    | 2022-08-07 | 2026-01-15 |
+------+------+------+------+-------+------------+------------+

I want to resolve these overlaps so that the dates run consecutively, like this:
+------+------+------+------+-------+------------+------------+
| Key1 | Key2 | Key3 | Key4 | Value | Date1      | Date2      |
+------+------+------+------+-------+------------+------------+
| k1   | k2   | k3   | k4   | 10    | 2022-01-01 | 2022-06-04 |
| k1   | k2   | k3   | k4   | 12    | 2022-06-05 | 2022-08-06 |
| k1   | k2   | k3   | k4   | 14    | 2022-08-07 | 2026-01-15 |
+------+------+------+------+-------+------------+------------+

In other words, the new Date2 = Date1 of the next record, minus 1 day.
Posted on 2022-11-23 07:13:33
You can use the lead window function.
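from pyspark.sql import functions as F
# Next row's date1 (ordered by date1) minus one day; keep the row's own date2 when there is no next row.
df = df.withColumn('date2', F.expr('nvl(date_sub(lead(date1) over (order by date1), 1), date2)'))
df.show(truncate=False)

https://stackoverflow.com/questions/74542501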
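
To make the rule concrete, here is a minimal plain-Python sketch of what the lead expression computes, using the sample rows from the question. It is an illustration only, not the answer's PySpark code: the helper name resolve_overlaps and the tuple layout are assumptions made for this example, and it additionally groups rows by Key1..Key4, which the expression above does not do.

```python
from datetime import date, timedelta
from itertools import groupby

# Sample rows from the question: (Key1, Key2, Key3, Key4, Value, Date1, Date2)
rows = [
    ("k1", "k2", "k3", "k4", 10, date(2022, 1, 1), date(2026, 1, 30)),
    ("k1", "k2", "k3", "k4", 12, date(2022, 6, 5), date(2026, 1, 10)),
    ("k1", "k2", "k3", "k4", 14, date(2022, 8, 7), date(2026, 1, 15)),
]


def resolve_overlaps(rows):
    """Hypothetical helper: within each key group, set Date2 to the
    next row's Date1 minus one day; the last row keeps its own Date2."""
    def key_of(row):
        return row[:4]  # (Key1, Key2, Key3, Key4)

    # Sort by key group, then by Date1, so "next row" means next start date.
    ordered = sorted(rows, key=lambda r: (key_of(r), r[5]))
    result = []
    for _, group in groupby(ordered, key=key_of):
        group = list(group)
        for i, row in enumerate(group):
            if i + 1 < len(group):
                # Equivalent of date_sub(lead(date1), 1)
                new_date2 = group[i + 1][5] - timedelta(days=1)
            else:
                # Equivalent of the nvl(..., date2) fallback
                new_date2 = row[6]
            result.append(row[:6] + (new_date2,))
    return result


resolved = resolve_overlaps(rows)
for row in resolved:
    print(row[4], row[5], row[6])
```

If the real data holds more than one combination of Key1..Key4, the same grouping would likely be needed in Spark as well, for example by adding partition by Key1, Key2, Key3, Key4 inside the over (...) clause; without it, lead reads the next row across the whole DataFrame.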