I am trying to get the replies to comments in a thread. Here is what I have managed so far by parsing the JSON:
subreddit = 'wallstreetbets'
link = 'https://oauth.reddit.com/r/'+subreddit+'/hot'
hot = requests.get(link,headers = headers)
hot.json()

This is the output:
{'kind': 'Listing',
'data': {'after': 't3_x8kidp',
'dist': 27,
'modhash': None,
'geo_filter': None,
'children': [{'kind': 't3',
'data': {'approved_at_utc': None,
'subreddit': 'wallstreetbets',
'selftext': '**Read [rules](https://www.reddit.com/r/wallstreetbets/wiki/contentguide), follow [Twitter](https://twitter.com/Official_WSB) and [IG](https://www.instagram.com/official_wallstreetbets/), join [Discord](https://discord.gg/wallstreetbets), see [ban bets](https://www.reddit.com/r/wallstreetbets/wiki/banbets)!**\n\n[dm mods because why not](https://www.reddit.com/message/compose/?to=/r/wallstreetbets)\n\n[Earnings Thread](https://wallstreetbets.reddit.com/x4ryjg)',
'author_fullname': 't2_bd6q5',
'saved': False,
'mod_reason_title': None,
'gilded': 0,
'clicked': False,
'title': 'What Are Your Moves Tomorrow, September 08, 2022',
'link_flair_richtext': [{'e': 'text', 't': 'Daily Discussion'}],
'subreddit_name_prefixed': 'r/wallstreetbets',
'hidden': False,
'pwls': 7,
'link_flair_css_class': 'daily',
'downs': 0,
'thumbnail_height': None,
'top_awarded_type': None,
'hide_score': False,
'name': 't3_x8ev67',
...
'created_utc': 1662594703.0,
'num_crossposts': 0,
'media': None,
'is_video': False}}],
'before': None}}

Then I convert it into a DataFrame.
rows = []
for post in hot.json()['data']['children']:
    # DataFrame.append was removed in pandas 2.0, so collect the rows first
    rows.append({
        'subreddit': post['data']['subreddit'],
        'title': post['data']['title'],
        'selftext': post['data']['selftext'],
        'created_utc': post['data']['created_utc'],
        'id': post['data']['id']
    })
df = pd.DataFrame(rows)

With this, I was able to get a data frame like this one: DataFrame
Then, to get the comments, I built a list containing the JSON responses for all 26 posts and wrote a while loop to iterate over them.
supereme = len(list_of_comments)
indexy = pd.DataFrame()
while supereme > 0:
    supereme -= 1
    for g in range(0,len(list_of_comments[supereme]['data']['children'])-1):
        indexy = pd.concat([indexy, pd.DataFrame.from_records([{
            'body': list_of_comments[supereme]['data']['children'][g]['data']['body'],
            'post_id': list_of_comments[supereme]['data']['children'][g]['data']['parent_id'] }])], ignore_index = True)
indexy

This gives me this: DataFrame. However, I cannot get the replies to the comments. Any help? I tried this:
posts = 26
for i in np.arange(0,27):
    print('i',i)
    if len(list_of_comments[i]['data']['children']) == 0:
        continue
    for j in np.arange(0,len(list_of_comments[i]['data']['children'])):
        if len(list_of_comments[i]['data']['children'][j]['data']['replies']) == 0:
            break
        else:
            print('j',len(list_of_comments[i]['data']['children'][j]['data']['replies']))
            for z in np.arange(len(list_of_comments[i]['data']['children'][j]['data']['replies']['data']['children'])):
                if len(list_of_comments[i]['data']['children'][j]['data']['replies']['data']['children']) == 0:
                    break
                print('z',z)
                print(list_of_comments[i]['data']['children'][j]['data']['replies']['data']['children'][z]['data']['body'])

The first loop sort of works, but it does not iterate correctly to get all the replies for all the posts; it only pulls one or two. We would rather not use PRAW.
Posted on 2022-09-10 05:13:18
replies = pd.DataFrame()
for i in range(0, len(list_of_comments)):
    try:
        for j in range(0, len(list_of_comments[i]['data']['children'])):
            try:
                for z in range(0, len(list_of_comments[i]['data']['children'][j]['data']['replies']['data']['children'])):
                    #print(list_of_comments[i]['data']['children'][j]['data']['replies']['data']['children'][z]['data']['body'])
                    #print(list_of_comments[i]['data']['children'][j]['data']['replies']['data']['children'][z]['data']['link_id'])
                    replies = pd.concat([replies, pd.DataFrame.from_records([{
                        'body': list_of_comments[i]['data']['children'][j]['data']['replies']['data']['children'][z]['data']['body'],
                        'post_id': list_of_comments[i]['data']['children'][j]['data']['replies']['data']['children'][z]['data']['link_id']
                    }])], ignore_index = True)
            except (KeyError, TypeError):
                # 'replies' is an empty string when a comment has no replies,
                # and non-comment children have no 'replies' or 'body' key
                pass
    except (KeyError, TypeError):
        continue

https://stackoverflow.com/questions/73644019
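The loops above only reach the first level of replies, while Reddit nests replies to replies inside each comment's own 'replies' field. A minimal recursive sketch follows, assuming each element of list_of_comments is a comment Listing shaped like the ones indexed above. The function names walk_comments and collect_replies and the 'comment_id' key are hypothetical names chosen for illustration, not part of the Reddit API.

```python
def walk_comments(listing, rows=None):
    """Collect every comment in a Listing, descending into nested replies."""
    if rows is None:
        rows = []
    # A comment with no replies has 'replies' == '' rather than a Listing dict.
    if not isinstance(listing, dict):
        return rows
    for child in listing.get('data', {}).get('children', []):
        # Only kind 't1' is a comment; 'more' placeholders have no 'body'.
        if child.get('kind') != 't1':
            continue
        data = child['data']
        rows.append({
            'body': data.get('body'),
            'comment_id': data.get('name'),     # fullname, e.g. 't1_...'
            'parent_id': data.get('parent_id'),
            'post_id': data.get('link_id'),
        })
        # Recurse into this comment's replies, whatever their depth.
        walk_comments(data.get('replies'), rows)
    return rows


def collect_replies(list_of_comments):
    """Keep only rows whose parent is another comment (a 't1_' fullname)."""
    rows = []
    for listing in list_of_comments:
        walk_comments(listing, rows)
    return [r for r in rows if (r['parent_id'] or '').startswith('t1_')]
```

Passing the result to pd.DataFrame(collect_replies(list_of_comments)) should give a frame with one row per reply. Checking 'kind' and the type of 'replies' explicitly avoids relying on try/except to skip empty or placeholder entries.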