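I want to filter the talentpool_subset DataFrame so that only the city and state are captured from the location column (it currently contains strings like "Software Developer in London,in"). I tried replacing the NaN values with 0. I confirmed this worked by subsetting the DataFrame to return only NaN values, which returned an empty DataFrame as expected. But every time I run the last statement, I get this error: "ValueError: cannot mask with array containing NA / NaN values". Why does this happen?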
talentpool_subset = talentpool_df[['name', 'profile', 'location','skills']]
talentpool_subset
talentpool_subset['location'].fillna(0, inplace=True)
location = talentpool_subset['location'].isna()
talentpool_subset[location]
talentpool_subset[talentpool_subset['location'].str.contains(r'(?<=in).*')]
name profile url source github location skills tags_strong tags_expert is_available description
0 Hugo L. Samayoa DevOps Developer https://www.toptal.com/resume/hugo-l-samayoa toptal NaN DevOps Developer in Long Beach, CA, United States {"Paradigms":["Agile Software Development","Sc... NaN ["Linux System Administration","VMware ESXi","... available "DevOps before DevOps" is a term mostly associ...
1 Stepan Yakovenko Software Developer https://www.toptal.com/resume/stepan-yakovenko toptal stiv-yakovenko Software Developer in Novosibirsk, Novosibirsk... {"Platforms":["Debian Linux","Windows","Linux"... ["Linux","C++","AngularJS"] ["Java","HTML5","CSS","JavaScript","MySQL","Hi... available Stepan is an experienced software developer wi...
2 Slobodan Gajic Software Developer https://www.toptal.com/resume/slobodan-gajic toptal bobangajicsm Software Developer in Sremska Mitrovica, Vojvo... {"Platforms":["Firebase","XAMPP"],"Storage":["... ["Firebase","Karma"] ["jQuery","HTML5","CSS3","Git","JavaScript","S... available Slobodan is a front-end developer with a Bache...
4 Jennifer Aquino Query Optimization Developer https://www.toptal.com/resume/jennifer-aquino toptal BlueCamelArt Query Optimization Developer in West Ryde, New... {"Paradigms":["Automation","ETL Implementation... ["Data Warehouse","Unix","Oracle 10g","Automat... ["SQL","SQL Server Integration Services (SSIS)... available Jennifer has five years of professional experi...

Answered 2020-07-07 13:18:48
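This assumes the goal is simply to get the location, and that you don't actually need a boolean mask to find it. The code below uses .str.extract() to keep only the city and state in the location column.
For example, "Long Beach, CA, United States" is extracted from "DevOps Developer in Long Beach, CA, United States".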
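A quick check on the example string shows how the choice of pattern matters. A bare lookbehind such as `((?<=in).*)` matches right after the first occurrence of the letters "in", so it keeps the space that follows "in" and also fires inside words like "Training". Requiring "in" to be a whole word followed by whitespace (`\bin\s+(.*)`) avoids both problems. The second string below is a hypothetical profile added only for illustration.

```python
import pandas as pd

# The example string from the text, plus a hypothetical profile whose
# job title contains the letters "in" inside a word ("Training")
s = pd.Series([
    'DevOps Developer in Long Beach, CA, United States',
    'Training Developer in Berlin',  # hypothetical, for illustration only
])

# Bare lookbehind: matches right after the FIRST occurrence of the letters "in"
lookbehind = s.str.extract(r'((?<=in).*)', expand=False)

# Whole-word variant: "in" must start a word and be followed by whitespace
whole_word = s.str.extract(r'\bin\s+(.*)', expand=False)

print(lookbehind.tolist())  # [' Long Beach, CA, United States', 'ing Developer in Berlin']
print(whole_word.tolist())  # ['Long Beach, CA, United States', 'Berlin']
```

If the data might contain "in" as a standalone word earlier in the job title, anchoring on the last occurrence (for example with a greedy `.*\bin\s+(.*)`) would be the safer variant.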
# Import libraries
import pandas as pd
import numpy as np
# Create lists using text from the question
name = ['Hugo L. Samayoa','Stepan Yakovenko','Slobodan Gajic','Bruno Furtado Montes Oliveira','Jennifer Aquino']
profile = ['DevOps Developer','Software Developer','Software Developer','Visual Studio Team Services (VSTS) Developer','Query Optimization Developer']
url = ['https://www.toptal.com/resume/hugo-l-samayoa','https://www.toptal.com/resume/stepan-yakovenko','https://www.toptal.com/resume/slobodan-gajic','https://www.toptal.com/resume/bruno-furtado-mo...','https://www.toptal.com/resume/jennifer-aquino']
source = ['toptal','toptal','toptal','toptal','toptal']
github = [np.nan, 'stiv-yakovenko','bobangajicsm','brunofurmon','BlueCamelArt']
location = ['DevOps Developer in Long Beach, CA, United States', 'Software Developer in Novosibirsk, Novosibirsk','Software Developer in Sremska Mitrovica, Vojvo','Visual Studio Team Services (VSTS) Developer in New York','Query Optimization Developer in West Ryde, New York']
skills = ['{"Paradigms":["Agile Software Development","Sc...', '{"Platforms":["Debian Linux","Windows","Linux"...','{"Platforms":["Firebase","XAMPP"],"Storage":["...','{"Paradigms":["Agile","CQRS","Azure DevOps"],"...','{"Paradigms":["Automation","ETL Implementation...']
# Create a DataFrame from the lists above
talentpool_df = pd.DataFrame({
    'name': name,
    'profile': profile,
    'url': url,
    'source': source,
    'github': github,
    'location': location,
    'skills': skills
})
# Add an all-NaN row (index 6) to simulate missing values
talentpool_df.loc[6,:] = np.nan
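# Subset the DataFrame to the columns of interest; .copy() makes it an independent
# DataFrame, so the assignment below does not trigger SettingWithCopyWarning
talentpool_subset = talentpool_df[['name', 'profile', 'location', 'skills']].copy()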
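# Use .str.extract() to keep only the text after the word 'in' and the whitespace that follows it;
# expand=False returns a Series, which can be assigned directly to the column (NaN rows stay NaN)
talentpool_subset['location'] = talentpool_subset['location'].str.extract(r'\bin\s+(.*)', expand=False)
# Display the result
talentpool_subset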
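As for the error in the question, the most likely cause is that `fillna(0)` puts the integer 0 into a text column. The `.str` methods return NaN for any element that is not a string, so `str.contains` yields a mask with NaN in exactly those rows, even though `isna()` on the column itself no longer finds anything. The sketch below reproduces this on a hypothetical two-row column (`raw`, `df`, `df2` are illustrative names, not from the question) and shows two ways around it: passing `na=False` to `str.contains`, or filling with an empty string instead of 0. The exact error wording differs between pandas versions.

```python
import numpy as np
import pandas as pd

# Hypothetical two-row stand-in for the question's 'location' column
raw = pd.Series(['Software Developer in London', np.nan], dtype=object)

# What the question does: replace NaN with the integer 0
df = pd.DataFrame({'location': raw.fillna(0)})
print(df['location'].isna().any())  # False: the column itself has no NaN left

# .str methods return NaN for non-string elements such as 0,
# so this "boolean" mask still contains NaN in the second row
mask = df['location'].str.contains(r'(?<=in).*')

# Indexing with a mask that contains NaN raises the ValueError from the question
error_message = None
try:
    df[mask]
except ValueError as err:
    error_message = str(err)
print(error_message)

# Fix 1: tell str.contains what to return for non-string elements
fixed_1 = df[df['location'].str.contains(r'(?<=in).*', na=False)]

# Fix 2: fill missing values with an empty string instead of 0
df2 = pd.DataFrame({'location': raw.fillna('')})
fixed_2 = df2[df2['location'].str.contains(r'(?<=in).*')]

print(fixed_1)
print(fixed_2)
```

Of the two fixes, `na=False` works without touching the data at all, so the original missing values stay visible if you later need to know which rows had no location.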

https://stackoverflow.com/questions/62760253