Q: Imputing missing values

Stack Overflow user

Asked 2020-07-07 00:19:26

Answers: 1 | Views: 65 | Followers: 0 | Votes: 1

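I want to filter the talentpool_subset DataFrame so that only the city and state are captured from the location column (it currently contains strings such as "Software Developer in London, ..."). I tried replacing the NaN values with 0, and I confirmed the replacement by subsetting the DataFrame to return only rows with NaN values, which returned an empty DataFrame as expected. But every time I run the last statement I get this error: "ValueError: cannot mask with array containing NA / NaN values". Why does this happen?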

talentpool_subset = talentpool_df[['name', 'profile', 'location','skills']]
talentpool_subset

talentpool_subset['location'].fillna(0, inplace=True)
location = talentpool_subset['location'].isna()
talentpool_subset[location]

talentpool_subset[talentpool_subset['location'].str.contains(r'(?<=in).*')]

    name    profile     url     source  github  location    skills  tags_strong     tags_expert     is_available    description
0   Hugo L. Samayoa     DevOps Developer    https://www.toptal.com/resume/hugo-l-samayoa    toptal  NaN     DevOps Developer in Long Beach, CA, United States   {"Paradigms":["Agile Software Development","Sc...   NaN     ["Linux System Administration","VMware ESXi","...   available   "DevOps before DevOps" is a term mostly associ...
1   Stepan Yakovenko    Software Developer  https://www.toptal.com/resume/stepan-yakovenko  toptal  stiv-yakovenko  Software Developer in Novosibirsk, Novosibirsk...   {"Platforms":["Debian Linux","Windows","Linux"...   ["Linux","C++","AngularJS"]     ["Java","HTML5","CSS","JavaScript","MySQL","Hi...   available   Stepan is an experienced software developer wi...
2   Slobodan Gajic  Software Developer  https://www.toptal.com/resume/slobodan-gajic    toptal  bobangajicsm    Software Developer in Sremska Mitrovica, Vojvo...   {"Platforms":["Firebase","XAMPP"],"Storage":["...   ["Firebase","Karma"]    ["jQuery","HTML5","CSS3","Git","JavaScript","S...   available   Slobodan is a front-end developer with a Bache...
4   Jennifer Aquino     Query Optimization Developer    https://www.toptal.com/resume/jennifer-aquino   toptal  BlueCamelArt    Query Optimization Developer in West Ryde, New...   {"Paradigms":["Automation","ETL Implementation...   ["Data Warehouse","Unix","Oracle 10g","Automat...   ["SQL","SQL Server Integration Services (SSIS)...   available   Jennifer has five years of professional experi...

Tags: regex, pandas, nan

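1 Answer

Stack Overflow user

Answered 2020-07-07 13:18:48

This assumes the goal is to get the location, and that a boolean mask is not needed to find it. The code below uses .str.extract() to keep only the city and state part of the location column.

For example: Long Beach, CA, United States from DevOps Developer in Long Beach, CA, United States.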

# Import libraries
import pandas as pd
import numpy as np


# Create list using text from question
name = ['Hugo L. Samayoa','Stepan Yakovenko','Slobodan Gajic','Bruno Furtado Montes Oliveira','Jennifer Aquino']
profile = ['DevOps Developer','Software Developer','Software Developer','Visual Studio Team Services (VSTS) Developer','Query Optimization Developer']
url = ['https://www.toptal.com/resume/hugo-l-samayoa','https://www.toptal.com/resume/stepan-yakovenko','https://www.toptal.com/resume/slobodan-gajic','https://www.toptal.com/resume/bruno-furtado-mo...','https://www.toptal.com/resume/jennifer-aquino']
source = ['toptal','toptal','toptal','toptal','toptal']
github = [np.nan, 'stiv-yakovenko','bobangajicsm','brunofurmon','BlueCamelArt']
location = ['DevOps Developer in Long Beach, CA, United States', 'Software Developer in Novosibirsk, Novosibirsk','Software Developer in Sremska Mitrovica, Vojvo','Visual Studio Team Services (VSTS) Developer in New York','Query Optimization Developer in West Ryde, New York']
skills = ['{"Paradigms":["Agile Software Development","Sc...', '{"Platforms":["Debian Linux","Windows","Linux"...','{"Platforms":["Firebase","XAMPP"],"Storage":["...','{"Paradigms":["Agile","CQRS","Azure DevOps"],"...','{"Paradigms":["Automation","ETL Implementation...']

# Create DataFrame using list above
talentpool_df = pd.DataFrame({
    'name':name,
    'profile':profile,
    'url':url,
    'source':source,
    'github':github,
    'location':location,
    'skills':skills
})

# Add NaN row to DataFrame
talentpool_df.loc[6,:] = np.nan

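# Subset DataFrame to get columns of interest
# (.copy() avoids a SettingWithCopyWarning on the assignment below)
talentpool_subset = talentpool_df[['name', 'profile', 'location','skills']].copy()

# Use .str.extract() to keep only the text after the word 'in' in the 'location' column.
# \bin\s+ matches 'in' as a whole word and drops the space after it; NaN rows stay NaN.
talentpool_subset['location'] = talentpool_subset['location'].str.extract(r'\bin\s+(.*)', expand=False)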

Output

talentpool_subset
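As for why the original mask fails: the sketch below reproduces the likely cause on a small, hypothetical two-row Series (the names `s`, `filled`, `mask`, and `safe_mask` are illustrative and not from the question). After `fillna(0)` the column holds the integer 0 instead of NaN, and `.str.contains()` returns NaN for any element that is not a string, so the mask still contains a missing value. Passing `na=False` is one way to get a clean boolean mask.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: one real location string and one missing value
s = pd.Series(
    ['DevOps Developer in Long Beach, CA, United States', np.nan],
    dtype=object,
    name='location',
)

# Replacing NaN with the integer 0 leaves a non-string element in the column
filled = s.fillna(0)

# .str.contains() yields NaN for the non-string 0, so the mask is not purely boolean.
# Indexing with a mask like this is what triggers the ValueError in the question.
mask = filled.str.contains(r'(?<=in).*')
print(mask.isna().any())

# na=False treats missing or non-string entries as "no match",
# which gives a plain True/False mask that is safe to index with
safe_mask = s.str.contains(r'\bin\b', na=False)
print(s[safe_mask])
```

With `na=False`, the earlier `fillna(0)` step is not needed at all, since missing locations are simply excluded from the result.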

Votes: 0

Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/62760253
