I have a DataFrame that looks like this:
>>> df
Date Name 1st Column 2nd Column 3rd Column
0 2021/05/01 A 0.0 0.0 1.573127e+06
1 2021/07/01 A 0.0 0.0 1.507486e+06
2 2023/05/01 A 0.0 0.0 1.317854e+06
3 2016/08/01 A 0.0 0.0 0.000000e+00
4 2016/11/01 A 0.0 0.0 0.000000e+00
.. ... ... ... ... ...
160 2019/08/01 A 0.0 0.0 1.621895e+06
161 2021/01/01 A 0.0 0.0 1.693617e+06
162 2021/10/01 A 0.0 0.0 1.479616e+06
163 2025/02/01 A 0.0 0.0 1.296158e+06
164 2025/06/01 A 0.0 0.0 1.325505e
[165 rows x 5 columns]
I would like to replace some of the zeros with tab-separated values from a text file:
Date 1/2019 2/2019 3/2019 4/2019 5/2019 6/2019 7/2019 8/2019 9/2019 10/2019 11/2019 12/2019 1/2020 2/2020 3/2020 4/2020 5/2020 6/2020 7/2020 8/2020 9/2020 10/2020 11/2020 12/2020 1/2021 2/2021 3/2021 4/2021 5/2021 6/2021 7/2021 8/2021 9/2021 10/2021 11/2021 12/2021 1/2022 2/2022 3/2022 4/2022 5/2022 6/2022 7/2022 8/2022 9/2022 10/2022 11/2022 12/2022 1/2023 2/2023 3/2023 4/2023 5/2023 6/2023 7/2023 8/2023 9/2023 10/2023 11/2023 12/2023 1/2024 2/2024 3/2024 4/2024 5/2024 6/2024 7/2024 8/2024 9/2024 10/2024 11/2024 12/2024 1/2025 2/2025 3/2025 4/2025 5/2025 6/2025 7/2025 8/2025 9/2025 10/2025 11/2025 12/2025 1/2026
1st Column 3,197423109 3,199271438 3,201119768 3,205836429 3,210549655 3,139294108 3,044097425 2,948900742 2,855464295 2,842043348 2,849479962 2,856916577 2,864353191 2,87182361 2,879294029 2,883960121 2,888617532 2,969237657 3,073817093 3,172887513 3,270197696 3,24771483 3,196074523 3,149663844 3,079303752 3,035528235 2,995261308 2,919925073 2,860230995 2,800496835 2,740882913 2,710733322 2,680583731 2,690211691 2,708139529 2,72083774 2,757485364 2,768058092 2,775122231 2,81794729 2,844999222 2,872025753 2,899128487 2,81798565 2,713110718 2,608235786 2,50512109 2,474120803 2,463978077 2,453760035 2,443541992 2,419794799 2,396097461 2,372425051 2,348677859 2,424831471 2,524717178 2,619093869 2,711710323 2,704387968 2,67253926 2,645701347 2,60987964 2,598712878 2,591032768 2,55817305 2,525288737 2,491362789 2,457436841 2,415412747 2,373388653 2,359188961 2,34865806 2,356650046 2,373625828 2,379772984 2,382327375 2,410008316 2,421711325 2,41884542 2,415979515 2,350753715 2,277362479 2,203971243 2,132340243
2nd Column -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000不会搞砸日期的。例如:
>>> df_new.sort_values(["Date"])
Date Name 1st Column 2nd Column 3rd Column
100 2012/04/01 A 0.0 0.0 0.000000e+00
139 2012/05/01 A 0.0 0.0 0.000000e+00
105 2012/06/01 A 0.0 0.0 0.000000e+00
78 2012/07/01 A 0.0 0.0 0.000000e+00
16 2012/08/01 A 0.0 0.0 0.000000e+00
.. ... ... ... ... ...
45 2025/08/01 A 2,41884542 -550000 1.330365e+06
46 2025/09/01 A 2,415979515 -550000 1.328789e+06
32 2025/10/01 A 2,350753715 -550000 1.292915e+06
152 2025/11/01 A 2,277362479 -550000 1.252549e+06
8 2025/12/01 A 2,203971243 -550000 1.212184e+06
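[165 rows x 5 columns]
Note that the dates are in different formats: YYYY/MM/DD in the DataFrame and m/YYYY in the text file.
How can this be done? Thanks!
I can easily extend the tab-separated values to cover 166 months, and I can also add leading zeros to the months to make them MM/YYYY, if that makes the solution easier.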
Posted on 2020-07-03 11:27:53
The idea is to turn the column labels into a DatetimeIndex with to_datetime, then transpose with DataFrame.T:
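import pandas as pd
# `file` is the path to the tab-separated text file
df1 = pd.read_csv(file, sep="\t", index_col=[0])
# Parse the m/YYYY column labels into a DatetimeIndex
df1.columns = pd.to_datetime(df1.columns)
# Transpose so that each date becomes a row
df1 = df1.T
print(df1)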
Date 1stColumn 2ndColumn
2019-01-01 3,197423109 -550000
2019-02-01 3,199271438 -550000
2019-03-01 3,201119768 -550000
2019-04-01 3,205836429 -550000
2019-05-01 3,210549655 -550000
... ...
2025-09-01 2,415979515 -550000
2025-10-01 2,350753715 -550000
2025-11-01 2,277362479 -550000
2025-12-01 2,203971243 -550000
2026-01-01 2,132340243 -550000
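[85 rows x 2 columns]
Then create a DatetimeIndex in the original data as well, replace the 0 values with the values from the second DataFrame, and use DataFrame.fillna to set the values that found no match back to 0: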
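To make the alignment behaviour easier to see, here is a minimal, self-contained sketch of the same steps on a three-month toy version of the data. The names `tsv` and `filled` are made up for illustration, and the toy frame leaves out the Name column for brevity. Two things here are assumptions that go beyond the answer: `decimal=","` reads values such as 3,197423109 as floats instead of strings, and the explicit `format=` arguments spell out the two date layouts rather than relying on inference.

```python
import io

import pandas as pd

# Hypothetical miniature of the tab-separated file (three months only).
tsv = (
    "Date\t1/2019\t2/2019\t3/2019\n"
    "1st Column\t3,197423109\t3,199271438\t3,201119768\n"
    "2nd Column\t-550000\t-550000\t-550000\n"
)

# Assumption: decimal="," so that 3,197423109 is parsed as the float 3.197423109.
df1 = pd.read_csv(io.StringIO(tsv), sep="\t", index_col=0, decimal=",")
# Parse the m/YYYY labels explicitly, then transpose so each date is a row.
df1.columns = pd.to_datetime(df1.columns, format="%m/%Y")
df1 = df1.T

# Toy version of the original frame; 2016/08/01 is not covered by the file.
df = pd.DataFrame(
    {
        "Date": ["2019/02/01", "2016/08/01", "2019/03/01"],
        "1st Column": [0.0, 0.0, 0.0],
        "2nd Column": [0.0, 0.0, 0.0],
        "3rd Column": [1.5e6, 0.0, 1.4e6],
    }
)
df = df.set_index("Date")
df.index = pd.to_datetime(df.index, format="%Y/%m/%d")

# mask() aligns df1 on both the date index and the column names, so each
# zero is replaced by the value for the same date and column. Zeros with
# no counterpart in df1 become NaN and are reset to 0 by fillna().
filled = df.mask(df.eq(0), df1).fillna(0)
print(filled)
```

Because the replacement is driven by index and column labels rather than by position, the row order of `df` does not matter. On the real data, the answer's version of these steps is: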
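# Index the original data by Date and parse it, so both frames share a DatetimeIndex
df = df.set_index('Date')
df.index = pd.to_datetime(df.index)
# Where df is 0, take the value from df1 (aligned on date and column name);
# dates that df1 does not cover come back as NaN, so reset them to 0
df = df.mask(df.eq(0), df1).fillna(0)
print(df)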
Name 1stColumn 2ndColumn 3rdColumn
Date
2021-05-01 A 2,860230995 -550000 1.573127e+06
2021-07-01 A 2,740882913 -550000 1.507486e+06
2023-05-01 A 2,396097461 -550000 1.317854e+06
2016-08-01 A 0 0 0.000000e+00
2016-11-01 A 0 0 0.000000e+00
2019-08-01 A 2,948900742 -550000 1.621895e+06
2021-01-01 A 3,079303752 -550000 1.693617e+06
2021-10-01 A 2,690211691 -550000 1.479616e+06
2025-02-01 A 2,356650046 -550000 1.296158e+06
2025-06-01 A 2,410008316 -550000 1.325505e
EDIT: If you want to keep the same date format as in df, first convert the column labels in df1 to datetimes, then use DatetimeIndex.strftime to format them as YYYY/MM/DD:
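As a small illustration of that formatting step on its own, the sketch below converts a few m/YYYY labels into YYYY/MM/DD strings. The sample labels and the names `labels`, `as_dates` and `as_strings` are invented for the example, and the explicit `format="%m/%Y"` is an assumption added for clarity.

```python
import pandas as pd

# Hypothetical sample of the file's column labels (m/YYYY, no leading zero).
labels = pd.Index(["1/2019", "2/2019", "12/2025"])

# Parse to datetimes first; the day defaults to the first of the month.
as_dates = pd.to_datetime(labels, format="%m/%Y")

# Format back to strings in the same layout as the Date column in df.
as_strings = as_dates.strftime("%Y/%m/%d")
print(list(as_strings))  # ['2019/01/01', '2019/02/01', '2025/12/01']
```

With this variant the two frames are matched on plain strings, so the labels must be character-for-character identical on both sides, including the zero-padded month and day. Applied to the file's column labels, the same conversion is: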
df1 = pd.read_csv(file, sep="\t", index_col=[0])
# Parse the labels, then format them back to YYYY/MM/DD strings to match df['Date']
df1.columns = pd.to_datetime(df1.columns).strftime('%Y/%m/%d')
df1 = df1.T
print(df1)
Date 1stColumn 2ndColumn
2019/01/01 3,197423109 -550000
2019/02/01 3,199271438 -550000
2019/03/01 3,201119768 -550000
2019/04/01 3,205836429 -550000
2019/05/01 3,210549655 -550000
... ...
2025/09/01 2,415979515 -550000
2025/10/01 2,350753715 -550000
2025/11/01 2,277362479 -550000
2025/12/01 2,203971243 -550000
2026/01/01 2,132340243 -550000
[85 rows x 2 columns]
# Assumes the original df, indexed by its YYYY/MM/DD strings
df = df.set_index('Date')
df = df.mask(df.eq(0), df1).fillna(0).reset_index()
print(df)
Date Name 1stColumn 2ndColumn 3rdColumn
0 2021/05/01 A 2,860230995 -550000 1.573127e+06
1 2021/07/01 A 2,740882913 -550000 1.507486e+06
2 2023/05/01 A 2,396097461 -550000 1.317854e+06
3 2016/08/01 A 0 0 0.000000e+00
4 2016/11/01 A 0 0 0.000000e+00
5 2019/08/01 A 2,948900742 -550000 1.621895e+06
6 2021/01/01 A 3,079303752 -550000 1.693617e+06
7 2021/10/01 A 2,690211691 -550000 1.479616e+06
8 2025/02/01 A 2,356650046 -550000 1.296158e+06
9 2025/06/01 A 2,410008316 -550000 1.325505e
https://stackoverflow.com/questions/62714225