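I have a small problem with my pandas DataFrame.
As the sample below shows, release_date in the first row is "Released 2006", while every other value in that column has the format "Released MMM".
I want to split the first cell under release_date into "Released" and "2006", copy "2006" into the year column, and then shift everything else over by one column. Any ideas?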
Current format:
...| **release_date** | **year** | **genre** | ...
...| Released 2006 | Arcade | Comic | ...

Desired output format:
...| **release_date** | **year** | **genre** | ...
...| Released | 2006 | Arcade | ...

Thanks in advance!!

Here is the code that reads the file:
import pandas as pd
df = pd.read_csv("IndieGameCSV/page_1.csv",
                 names=["Windows", "Mac", "Linux", "engine", "release_date", "year", "genre1",
                        "theme", "players", "score_final", "rating", "link"],
                 index_col=False)

Here is the data in question:
True, False, True,Custom Built,Released 2006,Arcade,Comic,Single Player, 10,1 v, http://indiedb.com/games/tux-climber,
True, True, True,Custom Built,Released Oct 20, 2014,Role Playing,Fantasy,MMO, 7.3,45 , http://indiedb.com/games/pokemon-planet,
True, True, True,Ren'py,Released May 16, 2015,Turn Based Strategy,Noire,Single Player, 9,1 v, http://indiedb.com/games/black-closet,
True, True, False,ShiVa3D,Released Jan 2, 2015,First Person Shooter,Sci-Fi,Single Player, 7.8,4 v, http://indiedb.com/games/kumoon,

Posted on 2015-09-24 08:50:00

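You can use the str.extract method to extract the year (expand=False makes recent pandas versions return a Series rather than a one-column DataFrame):
In [11]: df["release_date"].str.extract(r"(\d{4})", expand=False)
Out[11]:
0    2006
1    2014
2    2015
3    2015
Name: release_date, dtype: object

If you want to split the DataFrame, you can also look at .str.match, which checks whether each value in the column matches a regular expression: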
In [12]: df["release_date"].str.match(r"Released \d{4}")
Out[12]:
0     True
1    False
2    False
3    False
Name: release_date, dtype: bool

Then index df with this boolean mask and with its negation (~) to handle the two groups of rows separately.
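
The mask-based idea above can be sketched end to end as follows. This is only an illustration under assumptions: the small DataFrame is built by hand to mimic what the parsed file might look like (the first row shifted one column to the left, the last cell empty), it covers only the columns from release_date onward, and the cell values have been stripped of whitespace. The variable names mask, years and shift_cols are made up for this example.

```python
import pandas as pd

# Hand-built stand-in for the parsed CSV (an assumption, not the real file):
# row 0 is misaligned, row 1 is already in the desired layout.
cols = ["release_date", "year", "genre1", "theme",
        "players", "score_final", "rating", "link"]
df = pd.DataFrame(
    [
        ["Released 2006", "Arcade", "Comic", "Single Player",
         "10", "1 v", "http://indiedb.com/games/tux-climber", None],
        ["Released Oct 20", "2014", "Role Playing", "Fantasy",
         "MMO", "7.3", "45", "http://indiedb.com/games/pokemon-planet"],
    ],
    columns=cols,
)

# True for rows whose release_date looks like "Released YYYY".
mask = df["release_date"].str.match(r"Released \d{4}")

# Pull the four-digit year out of those rows before overwriting anything.
years = df.loc[mask, "release_date"].str.extract(r"(\d{4})", expand=False)

# Move year..rating one column to the right (into genre1..link)
# for the misaligned rows only; the empty last cell is dropped.
shift_cols = cols[1:]
values = df.loc[mask, shift_cols].to_numpy()
df.loc[mask, shift_cols[1:]] = values[:, :-1]

# Fill in the extracted year and leave just "Released" in release_date.
df.loc[mask, "year"] = years
df.loc[mask, "release_date"] = "Released"

print(df[["release_date", "year", "genre1", "link"]])
```

Rows where the mask is False are never touched, so they keep whatever layout they already had. If the real file needs different handling for those rows, the same pattern applies with ~mask.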
https://stackoverflow.com/questions/32751077