首页
学习
活动
专区
圈层
工具
发布
社区首页 >问答首页 >如何根据分隔符之前的值创建新列,并在Pandas中的分隔符之后添加值?

如何根据分隔符之前的值创建新列,并在Pandas中的分隔符之后添加值?
EN

Stack Overflow用户
提问于 2022-02-15 07:29:32
回答 1查看 268关注 0票数 1

我正在读潘达斯的.txt。我希望根据“=>”之前的值创建新的列名,并根据“=>”之后的值在该列中添加值。

输入

代码语言:javascript
复制
other_tags
"access"=>"agricultural","agricultural"=>"no"
"access"=>"customers"
"access"=>"customers","amenity"=>"parking"
"description"=>"GRAND PERE DE MON AMI"
"description"=>"GRAND PERE"
"design"=>"asymmetric","power"=>"tower"
"design"=>"asymmetric","power"=>"tower"

代码语言:javascript
复制
import pandas as pd
df1 = pd.read_csv('try.txt', sep ='\t')
df1['access_type'] = df1['other_tags'].get('access')
df1['amenity'] = df1['other_tags'].get('amenity')
print(df1)

expected_output

代码语言:javascript
复制
other_tags                                         access_type                            amenity
"access"=>"agricultural","agricultural"=>"no"      agricultural 
"access"=>"customers"                              customers    
"access"=>"customers","amenity"=>"parking"         customers                              parking
"description"=>"GRAND PERE DE MON AMI"      
"description"=>"GRAND PERE"     
"design"=>"asymmetric","power"=>"tower"     
"design"=>"asymmetric","power"=>"tower"     
EN

回答 1

Stack Overflow用户

回答已采纳

发布于 2022-02-15 07:43:15

爆炸并展开列other_tags,然后将其枢轴,最后将其加入到原始数据框架中。

代码语言:javascript
复制
df = df.join(df['other_tags'].str.replace('"', '')
                 .str.split(',').explode().str.split('=>', expand=True)
                 .reset_index().pivot('index', 0, 1).fillna(''))

输出

代码语言:javascript
复制
>>> df
                                      other_tags        access agricultural  amenity            description      design  power
0  "access"=>"agricultural","agricultural"=>"no"  agricultural           no                                                   
1                          "access"=>"customers"     customers                                                                
2     "access"=>"customers","amenity"=>"parking"     customers               parking                                          
3         "description"=>"GRAND PERE DE MON AMI"                                      GRAND PERE DE MON AMI                   
4                    "description"=>"GRAND PERE"                                                 GRAND PERE                   
5        "design"=>"asymmetric","power"=>"tower"                                                             asymmetric  tower
6        "design"=>"asymmetric","power"=>"tower"                                                             asymmetric  tower

如果只想保留2列accessamenity

代码语言:javascript
复制
COLS_TO_KEEP = ['access', 'amenity']

df = df.join(df['other_tags'].str.replace('"', '')
                 .str.split(',').explode().str.split('=>', expand=True)
                 .reset_index().pivot('index', 0, 1)[COLS_TO_KEEP].fillna(''))

输出:

代码语言:javascript
复制
>>> df
                                      other_tags        access  amenity
0  "access"=>"agricultural","agricultural"=>"no"  agricultural         
1                          "access"=>"customers"     customers         
2     "access"=>"customers","amenity"=>"parking"     customers  parking
3         "description"=>"GRAND PERE DE MON AMI"                       
4                    "description"=>"GRAND PERE"                       
5        "design"=>"asymmetric","power"=>"tower"                       
6        "design"=>"asymmetric","power"=>"tower"                       

更新

获取此错误ValueError:索引包含重复条目,无法重新构造

如果我稍微修改一下您的dataframe:

代码语言:javascript
复制
                                      other_tags
0  "access"=>"agricultural","agricultural"=>"no"
1                          "access"=>"customers"
2     "access"=>"customers","amenity"=>"parking"
3         "description"=>"GRAND PERE DE MON AMI"
4                    "description"=>"GRAND PERE"
5        "design"=>"asymmetric","power"=>"tower"
6        "design"=>"asymmetric","power"=>"tower"
7            "access"=>"BONJOUR","access"=>"TOI"  # <- same tag on the same row

使用pivot_table而不是pivot

代码语言:javascript
复制
COLS_TO_KEEP = ['access', 'amenity']
df = df.join(
  df['other_tags'].str.replace('"', '')
    .str.split(',').explode().str.split('=>', expand=True)
    .rename(columns={0: 'tag', 1: 'value'}).reset_index()
    .pivot_table('value', 'index', 'tag', aggfunc=','.join)[COLS_TO_KEEP].fillna('')
)

输出:

代码语言:javascript
复制
>>> df
                                      other_tags        access  amenity
0  "access"=>"agricultural","agricultural"=>"no"  agricultural         
1                          "access"=>"customers"     customers         
2     "access"=>"customers","amenity"=>"parking"     customers  parking
3         "description"=>"GRAND PERE DE MON AMI"                       
4                    "description"=>"GRAND PERE"                       
5        "design"=>"asymmetric","power"=>"tower"                       
6        "design"=>"asymmetric","power"=>"tower"                       
7            "access"=>"BONJOUR","access"=>"TOI"   BONJOUR,TOI         
#                                             HERE ---^-----^
票数 2
EN
页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持
原文链接:

https://stackoverflow.com/questions/71122550

复制
相关文章

相似问题

领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档