String cleaning function produces unexpected output
Stack Overflow user
Asked on 2020-07-30 08:19:21
1 answer, 29 views, 0 followers, 0 votes

I want to clean the profile column of the following dataframe:

    name      profile
6   Pedro     ["\n Design ...
7   Leonardo  ["\n Design ...
8   Daniel    ["\n JavaScript ...
9   Mario     ["\n JavaScript ...
10  Christi   ["\n Design ...

I have tested the following function on individual rows...

def clean_profile(row):
    for index, row in new_df2["profile"].items():
        str_row = str(row)
        clean_row = (
            '""'.join(str_row)
            .replace(",", "")
            .replace('""', "")
            .replace("\\n                    ", "")
            .replace("                ", "")
        )
    return clean_row

...and found that it converts this string:

'["\\n                    Design                ","\\n                    Design                "]'

into this cleaned string:

'["Design","Design"]'

(The additional replace calls are necessary for cleaning very messy strings, like the one below:)

'{"Tools ""    Google Analytics            ":null,"    Google Adsense            ":null,"    MailChimp            ":null,"    Google Adwords            ","Containers ""    Docker            ","Digital ""    SEO            ":null,"    Email Marketing            ":null,"    Article Writing            ":null,"    Market Research            ":null,"    Social Media            ":null,"    Inbound Marketing            ","*Nix ""    Ubuntu            ":null,"    Linux            ","Java ""    Java    ","Python ""    Django            ":null,"    Python    ":null,"    Flask            ","Databases ""    MySQL Management            ":null,"    MongoDB Management            ":null,"    PostgreSQL Management            ","Visual ""    Brand Design            ":null,"    Graphic Design            ":null,"    Logo Design            ","HTML ""    HTML    ","Version Control ""    Git            ","PHP ""    Laravel            ":null,"    Wordpress            ":null,"    PHP    ":null,"    Symfony            ","Mobile ""    React Native            ","Ruby ""    Ruby    ":null,"    Sinatra            ":null,"    Rails            ","Project Management ""    Agile Methodology            ":null,"    Client Management            ":null,"    Scrum            ","English ""    Written English    ":null,"    Spoken English            ","Configuration Management ""    Chef            ","Webserver ""    Nginx            ":null,"    Apache            ","CDN ""    AWS CloudFront            ":null,"    Cloudflare            ","Other ""    C++            ","Experience ""    Creative Direction            ":null,"    UI/UX Design            ":null,"    Wireframing            ","JavaScript ""    JavaScript    ":null,"    TypeScript            ":null,"    Redux            ":null,"    Angular JS            ":null,"    Angular            ":null,"    D3.js            ":null,"    Node.js            ":null,"    React            ":null,"    Flux            ":null,"    Express            ","CSS ""    SASS  
          ":null,"    LESS            ":null,"    CSS    ","Hosting ""    Heroku            ":null,"    Digital Ocean            ":null,"    AWS            ","Automated Testing ""    TDD            ":null,"    Automated Testing    ":null,"    BDD            ":null,"    Jest            ","Traditional ""    Outbound Marketing            ":null,"    Brand Strategy            ","Data Science ""    Data Science    ":null,"    Data Analysis            ":null,"    Machine Learning            ":null,"    Data Visualization            ":null,"    R            ":null,"    Statistics            "}'

When I loop over all rows of the dataframe, I get a result that is repeated for every row:

["JavaScriptDevOpsPHPJavaScriptDevOpsPHP"]

Or this:

<function clean_profile at 0x0845CB20>

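I have tried a few different approaches, but none of them worked. Can anyone explain what is happening here, and perhaps suggest a better way to clean these strings?

Thanks!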

1 Answer

Stack Overflow user

Accepted answer

Posted on 2020-09-10 14:06:02

It looks like you did not declare an initial clean_row variable outside the for loop, so your clean_row will always equal whichever string you cleaned last.

def clean_profile(row):
    clean_row = ""  # added this line
    for index, row in new_df2["profile"].items():
        str_row = str(row)
        clean_row = (
            '""'.join(str_row)
            .replace(",", "")
            .replace('""', "")
            .replace("\\n                    ", "")
            .replace("                ", "")
        )
    return clean_row
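
As a possible alternative, note that the function as posted ignores its row argument, loops over the whole column, and returns only the value cleaned last, which would explain why every row ends up with the same result. Output such as <function clean_profile at 0x...> typically appears when the function object itself is stored or printed instead of being called. Below is a minimal sketch of a version that cleans one value per call. The regex-based whitespace handling is my own simplification and not the asker's exact replace chain, and the sample rows are made up for illustration.

```python
import re


def clean_profile(value):
    """Clean a single profile value; the caller decides which rows to pass in."""
    text = str(value)
    # Drop the literal backslash-n sequences (two characters: "\" and "n").
    text = text.replace("\\n", "")
    # Remove any run of two or more whitespace characters (the padding).
    return re.sub(r"\s{2,}", "", text)


# Hypothetical sample rows, built to mimic the padding shown in the question.
pad_left = " " * 20
pad_right = " " * 16
rows = [
    '["\\n' + pad_left + "Design" + pad_right + '","\\n' + pad_left + "Design" + pad_right + '"]',
    '["\\n' + pad_left + "JavaScript" + pad_right + '"]',
]

# One call per value, so each row gets its own result.
cleaned = [clean_profile(r) for r in rows]
print(cleaned)
```

With pandas, the same function could be applied per row with something like new_df2["profile"] = new_df2["profile"].apply(clean_profile), passing the function itself to apply and letting pandas call it once per value.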

In addition, I would look into the strip function for string cleaning. There is a good example here: https://www.geeksforgeeks.org/clean-the-string-data-in-the-given-pandas-dataframe/
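
To illustrate the strip suggestion, here is a rough sketch that assumes a profile value is a valid JSON array of strings, as the first example in the question appears to be. The helper name parse_profile and the sample value are hypothetical. The messier dictionary-like string from the question does not look like valid JSON, so json.loads would most likely raise an error on it and it would need different handling.

```python
import json


def parse_profile(value):
    """Parse a JSON-array string and strip surrounding whitespace from each item."""
    items = json.loads(value)  # the literal \n in the text becomes a real newline
    return [item.strip() for item in items]  # strip() removes newlines and spaces at both ends


# Hypothetical sample value mimicking the padding shown in the question.
raw = (
    '["\\n' + " " * 20 + "Design" + " " * 16
    + '","\\n' + " " * 20 + "JavaScript" + " " * 16 + '"]'
)

skills = parse_profile(raw)
print(skills)
```

Returning a list keeps the individual skills separate, which may be easier to work with afterwards than one concatenated string.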

Votes: 2
Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.
Original link: https://stackoverflow.com/questions/63164346
