I have a pandas column that stores data as a list of lists, in the following format:
text
[['Mark','PERSON'],['Data Scientist','TITLE'], ['Berlin','LOC'], ['Python','SKILLS'], ['Tableau,','SKILLS'], ['SQL','SKILLS'], ['AWS','SKILLS']]
[['John','PERSON'],['Data Engineer','TITLE'], ['London','LOC'], ['Python','SKILLS'], ['DB2,','SKILLS'], ['SQL','SKILLS']]
[['Pearson','PERSON'],['Intern','TITLE'], ['Barcelona','LOC'], ['Python','SKILLS'], ['Excel,','SKILLS'], ['SQL','SKILLS']]
[['Broody','PERSON'],['Manager','TITLE'], ['Barcelona','LOC'], ['Team Management','SKILLS'], ['Excel,','SKILLS'], ['Good Communications','SKILLS']]
[['Rita','PERSON'],['Software Developer','TITLE'], ['London','LOC'], ['Dot Net','SKILLS'], ['SQl Server,','SKILLS'], ['VS Code,','SKILLS']]

As output, I would like to see:
PERSON  TITLE           LOC     SKILLS
Mark    Data Scientist  Berlin  Python, Tableau, SQL, AWS
John    Data Engineer   London  Python, DB2, SQL

The same applies to the remaining input rows.
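So, essentially, each inner pair should be split at the comma: the right-hand part (the label) becomes the column header, and the left-hand part is stored as the value under that header.
How can I achieve this?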
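
To make the intended mapping concrete for a single row, here is a minimal sketch. It assumes each cell holds a real Python list of [value, label] pairs (not a string), and the helper name `pairs_to_record` is hypothetical, introduced only for illustration:

```python
from collections import defaultdict


def pairs_to_record(pairs):
    """Group values by label; values sharing a label (e.g. SKILLS) are joined."""
    grouped = defaultdict(list)
    for value, label in pairs:
        # Strip whitespace and stray trailing commas such as 'Tableau,'
        grouped[label].append(value.strip().rstrip(","))
    # Labels keep their order of first appearance in the row
    return {label: ", ".join(values) for label, values in grouped.items()}


row = [['Mark', 'PERSON'], ['Data Scientist', 'TITLE'], ['Berlin', 'LOC'],
       ['Python', 'SKILLS'], ['Tableau,', 'SKILLS'], ['SQL', 'SKILLS'], ['AWS', 'SKILLS']]
record = pairs_to_record(row)
print(record)
```

Each label becomes a key (a future column header), and repeated labels collapse into one comma-separated string.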
Posted on 2021-02-09 19:17:50
Suppose you have a DataFrame called "df" that looks like this:
   index  text
0  1      [[Mark, PERSON], [Data Scientist, TITLE], [Ber...
1  2      [[John, PERSON], [Data Engineer, TITLE], [Lond...
2  3      [[Pearson, PERSON], [Intern, TITLE], [Barcelon...
3  4      [[Broody, PERSON], [Manager, TITLE], [Barcelon...
4  5      [[Rita, PERSON], [Software Developer, TITLE], ...

The values can then be collected by label with a loop:

    import pandas as pd

    person = []
    title = []
    loc = []
    skills = []

    # Walk each row's list of [value, label] pairs and route the value by label.
    for pairs in df['text']:
        temp = []  # SKILLS values collected for the current row
        for value, label in pairs:
            if label == 'PERSON':
                person.append(value)
            elif label == 'TITLE':
                title.append(value)
            elif label == 'LOC':
                loc.append(value)
            elif label == 'SKILLS':
                # Drop stray commas such as the one in 'Tableau,'
                temp.append(value.replace(",", ""))
        # Join once per row, after all of its pairs have been seen
        skills.append(",".join(temp))

    # Assemble the lists into a frame (this step is assumed; the answer only shows the printed result)
    result = pd.DataFrame({'PERSON': person, 'TITLE': title, 'LOC': loc, 'SKILLS': skills})
    print(result)

   PERSON   TITLE               LOC        SKILLS
0  Mark     Data Scientist      Berlin     Python,Tableau,SQL,AWS
1  John     Data Engineer       London     Python,DB2,SQL
2  Pearson  Intern              Barcelona  Python,Excel,SQL
3  Broody   Manager             Barcelona  Team Management,Excel,Good Communications
4  Rita     Software Developer  London     Dot Net,SQl Server,VS Code

https://stackoverflow.com/questions/66123833
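
The same reshaping can also be expressed with pandas operations instead of an explicit loop. This is only a sketch under assumptions: the column is named `text`, each cell is a real list of [value, label] pairs, and the two-row sample frame and the names `tidy` and `result` are made up for illustration.

```python
import pandas as pd

# Two sample rows in the same shape as the question's data
df = pd.DataFrame({'text': [
    [['Mark', 'PERSON'], ['Data Scientist', 'TITLE'], ['Berlin', 'LOC'],
     ['Python', 'SKILLS'], ['Tableau,', 'SKILLS'], ['SQL', 'SKILLS'], ['AWS', 'SKILLS']],
    [['John', 'PERSON'], ['Data Engineer', 'TITLE'], ['London', 'LOC'],
     ['Python', 'SKILLS'], ['DB2,', 'SKILLS'], ['SQL', 'SKILLS']],
]})

# One entry per [value, label] pair; the index still points at the source row
pairs = df['text'].explode()
tidy = pd.DataFrame(pairs.tolist(), columns=['value', 'label'])
tidy['row'] = pairs.index.to_numpy()
tidy['value'] = tidy['value'].str.replace(',', '', regex=False)

# Join values that share a (row, label) key, then turn labels into columns
result = (tidy.groupby(['row', 'label'])['value']
              .agg(','.join)
              .unstack('label')
              .reindex(columns=['PERSON', 'TITLE', 'LOC', 'SKILLS']))
print(result)
```

Unlike the loop, this version does not assume that every row contains exactly one PERSON, TITLE and LOC entry: a label missing from a row simply ends up as NaN in that column.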