我正在读潘达斯的.txt。我希望根据“=>”之前的值创建新的列名,并根据“=>”之后的值在该列中添加值。
输入
other_tags
"access"=>"agricultural","agricultural"=>"no"
"access"=>"customers"
"access"=>"customers","amenity"=>"parking"
"description"=>"GRAND PERE DE MON AMI"
"description"=>"GRAND PERE"
"design"=>"asymmetric","power"=>"tower"
"design"=>"asymmetric","power"=>"tower"码
import pandas as pd
df1 = pd.read_csv('try.txt', sep ='\t')
df1['access_type'] = df1['other_tags'].get('access')
df1['amenity'] = df1['other_tags'].get('amenity')
print(df1)expected_output
other_tags access_type amenity
"access"=>"agricultural","agricultural"=>"no" agricultural
"access"=>"customers" customers
"access"=>"customers","amenity"=>"parking" customers parking
"description"=>"GRAND PERE DE MON AMI"
"description"=>"GRAND PERE"
"design"=>"asymmetric","power"=>"tower"
"design"=>"asymmetric","power"=>"tower" 发布于 2022-02-15 07:43:15
爆炸并展开列other_tags,然后将其枢轴,最后将其加入到原始数据框架中。
df = df.join(df['other_tags'].str.replace('"', '')
.str.split(',').explode().str.split('=>', expand=True)
.reset_index().pivot('index', 0, 1).fillna(''))输出
>>> df
other_tags access agricultural amenity description design power
0 "access"=>"agricultural","agricultural"=>"no" agricultural no
1 "access"=>"customers" customers
2 "access"=>"customers","amenity"=>"parking" customers parking
3 "description"=>"GRAND PERE DE MON AMI" GRAND PERE DE MON AMI
4 "description"=>"GRAND PERE" GRAND PERE
5 "design"=>"asymmetric","power"=>"tower" asymmetric tower
6 "design"=>"asymmetric","power"=>"tower" asymmetric tower如果只想保留2列access和amenity
COLS_TO_KEEP = ['access', 'amenity']
df = df.join(df['other_tags'].str.replace('"', '')
.str.split(',').explode().str.split('=>', expand=True)
.reset_index().pivot('index', 0, 1)[COLS_TO_KEEP].fillna(''))输出:
>>> df
other_tags access amenity
0 "access"=>"agricultural","agricultural"=>"no" agricultural
1 "access"=>"customers" customers
2 "access"=>"customers","amenity"=>"parking" customers parking
3 "description"=>"GRAND PERE DE MON AMI"
4 "description"=>"GRAND PERE"
5 "design"=>"asymmetric","power"=>"tower"
6 "design"=>"asymmetric","power"=>"tower" 更新
获取此错误ValueError:索引包含重复条目,无法重新构造
如果我稍微修改一下您的dataframe:
other_tags
0 "access"=>"agricultural","agricultural"=>"no"
1 "access"=>"customers"
2 "access"=>"customers","amenity"=>"parking"
3 "description"=>"GRAND PERE DE MON AMI"
4 "description"=>"GRAND PERE"
5 "design"=>"asymmetric","power"=>"tower"
6 "design"=>"asymmetric","power"=>"tower"
7 "access"=>"BONJOUR","access"=>"TOI" # <- same tag on the same row使用pivot_table而不是pivot
COLS_TO_KEEP = ['access', 'amenity']
df = df.join(
df['other_tags'].str.replace('"', '')
.str.split(',').explode().str.split('=>', expand=True)
.rename(columns={0: 'tag', 1: 'value'}).reset_index()
.pivot_table('value', 'index', 'tag', aggfunc=','.join)[COLS_TO_KEEP].fillna('')
)输出:
>>> df
other_tags access amenity
0 "access"=>"agricultural","agricultural"=>"no" agricultural
1 "access"=>"customers" customers
2 "access"=>"customers","amenity"=>"parking" customers parking
3 "description"=>"GRAND PERE DE MON AMI"
4 "description"=>"GRAND PERE"
5 "design"=>"asymmetric","power"=>"tower"
6 "design"=>"asymmetric","power"=>"tower"
7 "access"=>"BONJOUR","access"=>"TOI" BONJOUR,TOI
# HERE ---^-----^https://stackoverflow.com/questions/71122550
复制相似问题