我有一个如下所示的df:
answerRequired answerTime choiceId \
0 NaN NaN NaN
1 NaN NaN NaN
2 NaN NaN NaN
3 NaN NaN NaN
4 NaN NaN NaN
5 NaN NaN NaN
6 NaN NaN NaN
7 NaN NaN NaN
8 NaN NaN NaN
9 NaN NaN NaN
10 NaN NaN NaN
11 NaN NaN NaN
12 NaN NaN NaN
13 False 1.564541e+12 1542213646976
14 False 1.564541e+12 1542213646984
15 True 1.564541e+12 1542213646994
16 True 1.564541e+12 1542213647040
17 True 1.564541e+12 1542213647041
18 True 1.564541e+12 1542213647042
19 True 1.564541e+12 1542213647043
20 False 1.564541e+12 NaN
choiceLabel \
0 NaN
1 NaN
2 NaN
3 NaN
4 NaN
5 NaN
6 NaN
7 NaN
8 NaN
9 NaN
10 NaN
11 NaN
12 NaN
13 Give it a shot! Hit the arrow below! Don't be ...
14
15 T-Shirts
16 Band / Music
17 Fun
18 TV
19 Movies
20 NaN
exportLabel logicalType \
0 Participant ID NaN
1 Viewed NaN
2 Started NaN
3 Completed NaN
4 Time spent (HH:MM:SS.SSS) NaN
5 Country NaN
6 City NaN
7 IP NaN
8 Operating System NaN
9 Browser NaN
10 Device NaN
11 External ID NaN
12 Warnings NaN
13 It's all about the green arrow! (not that Gree... singleSelection
14 Make your choice. 2. Hit the green arrow at th... singleSelection
15 What are you most interested in? (Pick one) (T... singleSelection
16 We have the threads that you want! What kind o... multipleSelection
17 We have the threads that you want! What kind o... multipleSelection
18 We have the threads that you want! What kind o... multipleSelection
19 We have the threads that you want! What kind o... multipleSelection
20 NaN text
question questionId \
0 NaN participantId
1 NaN viewTime
2 NaN startedTime
3 NaN completedTime
4 NaN timeSpent
5 NaN country_name
6 NaN city
7 NaN ip
8 NaN os
9 NaN browser
10 NaN device
11 NaN externalId
12 NaN warnings
13 It's all about the green arrow! (not that Gree... 1542213646975
14 Make your choice. 2. Hit the green arrow at th... 1542213646983
15 What are you most interested in? (Pick one) 1542213646991
16 We have the threads that you want! What kind o... 1542213647039
17 We have the threads that you want! What kind o... 1542213647039
18 We have the threads that you want! What kind o... 1542213647039
19 We have the threads that you want! What kind o... 1542213647039
20 Almost Done! Enter Your Email Address! 1542213647050
questionOrder subType type value \
0 NaN NaN id -Ll4truw3KbSjVRtXmJy
1 NaN NaN time 2019-07-31T02:41:34.063Z
2 NaN NaN time 2019-07-31T02:44:37.732Z
3 NaN NaN time 2019-07-31T02:44:57.936Z
4 NaN NaN time 00:00:00.000
5 NaN NaN location Unknown
6 NaN NaN location Roslindale
7 NaN NaN location
8 NaN NaN device macOS 10.14
9 NaN NaN device Firefox 68.0
10 NaN NaN device
11 NaN NaN id
12 NaN NaN info []
13 0.0 singleSelection mediaGallery True
14 2.0 singleSelection mediaGallery True
15 4.0 singleSelection mediaGallery True
16 12.0 multipleSelection mediaGallery True
17 12.0 multipleSelection mediaGallery True
18 12.0 multipleSelection mediaGallery True
19 12.0 multipleSelection mediaGallery True
20 14.0 NaN emailBox 123456789@yellow.com
visualType
0 NaN
1 NaN
2 NaN
3 NaN
4 NaN
5 NaN
6 NaN
7 NaN
8 NaN
9 NaN
10 NaN
11 NaN
12 NaN
13 mediaGallery
14 mediaGallery
15 mediaGallery
16 mediaGallery
17 mediaGallery
18 mediaGallery
19 mediaGallery
20 emailBox 我如何切割数据帧,并将其转换为如下所示:

我试过这个:
但这些都是在没有聚合数据的情况下改变数据的。
在高层次上,我想:
将exportLabel列值翻转为列,将value列值转换为来自exportLabel的列值下的值,仅在question列为空的情况下。
然后,我想将question列值翻转为非空列,并将choicelabel列中的值翻转到问题列下。注意相同问题的问题,被折叠成一列。例外情况是,question列中的最后一个值可以在value列下进行选择。
我们现在可以删除其余的列。此外,我还可以发布原来的json字符串,我正试图从API中扁平化。
编辑:
下面是json字符串:
/*
* 提示:该行代码过长,系统自动注释不进行高亮。一键复制会移除系统注释
* {"id":"4","survey_id":"-L","response_id":"-L","response_url":"data":[{"type":"id","questionId":"participantId","exportLabel":"Participant ID","value":"-Ll4truw3KbSjVRtXmJy"},{"type":"time","questionId":"viewTime","exportLabel":"Viewed","value":"2019-07-31T02:41:34.063Z"},{"type":"time","questionId":"startedTime","exportLabel":"Started","value":"2019-07-31T02:44:37.732Z"},{"type":"time","questionId":"completedTime","exportLabel":"Completed","value":"2019-07-31T02:44:57.936Z"},{"type":"time","questionId":"timeSpent","exportLabel":"Time spent (HH:MM:SS.SSS)","value":"00:00:00.000"},{"type":"location","questionId":"country_name","exportLabel":"Country","value":"Unknown"},{"type":"location","questionId":"city","exportLabel":"City","value":"Roslindale"},{"type":"location","questionId":"ip","exportLabel":"IP","value":""},{"type":"device","questionId":"os","exportLabel":"Operating System","value":"macOS 10.14"},{"type":"device","questionId":"browser","exportLabel":"Browser","value":"Firefox 68.0"},{"type":"device","questionId":"device","exportLabel":"Device","value":""},{"type":"id","questionId":"externalId","exportLabel":"External ID","value":""},{"type":"info","questionId":"warnings","exportLabel":"Warnings","value":[]},{"logicalType":"singleSelection","choiceId":"1542213646976","choiceLabel":"Give it a shot! Hit the arrow below! Don't be shy!","exportLabel":"It's all about the green arrow! (not that Green Arrow!) 1. Make your choice. 2. Hit the green arrow at the bottom! (Give it a shot! Hit the arrow below! Don't be shy!)","value":true,"type":"mediaGallery","visualType":"mediaGallery","answerRequired":false,"questionId":1542213646975,"questionOrder":0,"question":"It's all about the green arrow! (not that Green Arrow!) 1. Make your choice. 2. Hit the green arrow at the bottom!","subType":"singleSelection","answerTime":1564541080009},{"logicalType":"singleSelection","choiceId":"1542213646984","choiceLabel":"","exportLabel":"Make your choice. 2. Hit the green arrow at the bottom! ()","value":true,"type":"mediaGallery","visualType":"mediaGallery","answerRequired":false,"questionId":1542213646983,"questionOrder":2,"question":"Make your choice. 2. Hit the green arrow at the bottom!","subType":"singleSelection","answerTime":1564541081044},{"logicalType":"singleSelection","choiceId":"1542213646994","choiceLabel":"T-Shirts","exportLabel":"What are you most interested in? (Pick one) (T-Shirts)","value":true,"type":"mediaGallery","visualType":"mediaGallery","answerRequired":true,"questionId":1542213646991,"questionOrder":4,"question":"What are you most interested in? (Pick one)","subType":"singleSelection","answerTime":1564541083354},{"logicalType":"multipleSelection","choiceId":"1542213647040","choiceLabel":"Band / Music","exportLabel":"We have the threads that you want! What kind of tees live in your closet? (Pick one or more - we won't judge!) (Band / Music)","value":true,"type":"mediaGallery","visualType":"mediaGallery","answerRequired":true,"questionId":1542213647039,"questionOrder":12,"question":"We have the threads that you want! What kind of tees live in your closet? (Pick one or more - we won't judge!)","subType":"multipleSelection","answerTime":1564541086280},{"logicalType":"multipleSelection","choiceId":"1542213647041","choiceLabel":"Fun","exportLabel":"We have the threads that you want! What kind of tees live in your closet? (Pick one or more - we won't judge!) (Fun)","value":true,"type":"mediaGallery","visualType":"mediaGallery","answerRequired":true,"questionId":1542213647039,"questionOrder":12,"question":"We have the threads that you want! What kind of tees live in your closet? (Pick one or more - we won't judge!)","subType":"multipleSelection","answerTime":1564541086280},{"logicalType":"multipleSelection","choiceId":"1542213647042","choiceLabel":"TV","exportLabel":"We have the threads that you want! What kind of tees live in your closet? (Pick one or more - we won't judge!) (TV)","value":true,"type":"mediaGallery","visualType":"mediaGallery","answerRequired":true,"questionId":1542213647039,"questionOrder":12,"question":"We have the threads that you want! What kind of tees live in your closet? (Pick one or more - we won't judge!)","subType":"multipleSelection","answerTime":1564541086280},{"logicalType":"multipleSelection","choiceId":"1542213647043","choiceLabel":"Movies","exportLabel":"We have the threads that you want! What kind of tees live in your closet? (Pick one or more - we won't judge!) (Movies)","value":true,"type":"mediaGallery","visualType":"mediaGallery","answerRequired":true,"questionId":1542213647039,"questionOrder":12,"question":"We have the threads that you want! What kind of tees live in your closet? (Pick one or more - we won't judge!)","subType":"multipleSelection","answerTime":1564541086280},{"type":"emailBox","visualType":"emailBox","answerRequired":false,"questionId":1542213647050,"questionOrder":14,"question":"Almost Done! Enter Your Email Address!","answerTime":1564541097466,"logicalType":"text","value":"123456789@yellow.com"}]}
*/我将字符串转换为第一个df,如下所示:
from pandas.io.json import json_normalize
import pandas as pd
import json
with open('jsonfile') as json_file:
data = json.load(json_normalize(json_file))
df = json_normalize(data['data'])发布于 2019-08-04 05:19:04
Idea是用boolean indexing按条件对行进行过滤,然后通过GroupBy.cumcount对计数器和DataFrame.unstack进行整形,如果需要的话,与原来添加的DataFrame.reindex顺序相同。
第一部分是:
df1 = data.loc[data['question'].isna(), ['exportLabel','value']]
print (df1)
exportLabel value
0 Participant ID -Ll4truw3KbSjVRtXmJy
1 Viewed 2019-07-31T02:41:34.063Z
2 Started 2019-07-31T02:44:37.732Z
3 Completed 2019-07-31T02:44:57.936Z
4 Time spent (HH:MM:SS.SSS) 00:00:00.000
5 Country Unknown
6 City Roslindale
7 IP
8 Operating System macOS 10.14
9 Browser Firefox 68.0
10 Device
11 External ID
12 Warnings []df11 = (df1.set_index([df1.groupby('exportLabel').cumcount(),
'exportLabel'])['value']
.unstack()
.rename_axis(None, axis=1)
.reindex(df1['exportLabel'].unique(), axis=1)
)
print (df11)
Participant ID Viewed Started \
0 -Ll4truw3KbSjVRtXmJy 2019-07-31T02:41:34.063Z 2019-07-31T02:44:37.732Z
Completed Time spent (HH:MM:SS.SSS) Country City IP \
0 2019-07-31T02:44:57.936Z 00:00:00.000 Unknown Roslindale
Operating System Browser Device External ID Warnings
0 macOS 10.14 Firefox 68.0 [] 第二:
df2 = data.loc[data['question'].notna(), ['question','value','choiceLabel']]
#if need replace all missing values by value column
#df2['choiceLabel'] = df2['choiceLabel'].fillna(df2['value'])
#if need replace only last value if missing
idx = df2.index[[-1]]
df2.loc[idx,'choiceLabel'] = df2.loc[idx,'choiceLabel'].fillna(df2.loc[idx,'value'])
print (df2)
question value \
13 It's all about the green arrow! (not that Gree... True
14 Make your choice. 2. Hit the green arrow at th... True
15 What are you most interested in? (Pick one) True
16 We have the threads that you want! What kind o... True
17 We have the threads that you want! What kind o... True
18 We have the threads that you want! What kind o... True
19 We have the threads that you want! What kind o... True
20 Almost Done! Enter Your Email Address! 123456789@yellow.com
choiceLabel
13 Give it a shot! Hit the arrow below! Don't be ...
14
15 T-Shirts
16 Band / Music
17 Fun
18 TV
19 Movies
20 123456789@yellow.com df21 = (df2.set_index([df2.groupby('question').cumcount(),
'question'])['choiceLabel']
.unstack()
.rename_axis(None, axis=1)
.reindex(df2['question'].unique(), axis=1)
)print (df21)
It's all about the green arrow! (not that Green Arrow!) 1. Make your choice. 2. Hit the green arrow at the bottom! \
0 Give it a shot! Hit the arrow below! Don't be ...
1 NaN
2 NaN
3 NaN
Make your choice. 2. Hit the green arrow at the bottom! \
0
1 NaN
2 NaN
3 NaN
What are you most interested in? (Pick one) \
0 T-Shirts
1 NaN
2 NaN
3 NaN
We have the threads that you want! What kind of tees live in your closet? (Pick one or more - we won't judge!) \
0 Band / Music
1 Fun
2 TV
3 Movies
Almost Done! Enter Your Email Address!
0 123456789@yellow.com
1 NaN
2 NaN
3 NaN https://stackoverflow.com/questions/57341960
复制相似问题