JSON question (without PRAW)

Stack Overflow user
Asked on 2022-09-08 05:14:44
Answers: 1 · Views: 61 · Followers: 0 · Votes: 1

I am trying to get the replies to the comments on a thread. Here is what I have managed so far by parsing the JSON:

subreddit =  'wallstreetbets'
link = 'https://oauth.reddit.com/r/'+subreddit+'/hot'
hot = requests.get(link,headers = headers)
hot.json()

This is the output:

{'kind': 'Listing',
 'data': {'after': 't3_x8kidp',
  'dist': 27,
  'modhash': None,
  'geo_filter': None,
  'children': [{'kind': 't3',
    'data': {'approved_at_utc': None,
     'subreddit': 'wallstreetbets',
     'selftext': '**Read [rules](https://www.reddit.com/r/wallstreetbets/wiki/contentguide), follow [Twitter](https://twitter.com/Official_WSB) and [IG](https://www.instagram.com/official_wallstreetbets/), join [Discord](https://discord.gg/wallstreetbets), see [ban bets](https://www.reddit.com/r/wallstreetbets/wiki/banbets)!**\n\n[dm mods because why not](https://www.reddit.com/message/compose/?to=/r/wallstreetbets)\n\n[Earnings Thread](https://wallstreetbets.reddit.com/x4ryjg)',
     'author_fullname': 't2_bd6q5',
     'saved': False,
     'mod_reason_title': None,
     'gilded': 0,
     'clicked': False,
     'title': 'What Are Your Moves Tomorrow, September 08, 2022',
     'link_flair_richtext': [{'e': 'text', 't': 'Daily Discussion'}],
     'subreddit_name_prefixed': 'r/wallstreetbets',
     'hidden': False,
     'pwls': 7,
     'link_flair_css_class': 'daily',
     'downs': 0,
     'thumbnail_height': None,
     'top_awarded_type': None,
     'hide_score': False,
     'name': 't3_x8ev67',
...
     'created_utc': 1662594703.0,
     'num_crossposts': 0,
     'media': None,
     'is_video': False}}],
  'before': None}}

Then I convert it into a DataFrame.

rows = []
for post in hot.json()['data']['children']:
    rows.append({
        'subreddit': post['data']['subreddit'],
        'title': post['data']['title'],
        'selftext': post['data']['selftext'],
        'created_utc': post['data']['created_utc'],
        'id': post['data']['id'],
    })
# DataFrame.append was removed in pandas 2.0, so build the frame from a list of rows.
df = pd.DataFrame(rows)

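With this, I was able to get a DataFrame like this one: DataFrame

Then, to get the comments, I built a list containing the JSON responses for all 26 posts and wrote a while loop to iterate over them.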

supereme = len(list_of_comments)
indexy = pd.DataFrame()
while supereme > 0:
    supereme -= 1
    children = list_of_comments[supereme]['data']['children']
    # Note: range(0, len(children) - 1) stops before the last child.
    for g in range(0, len(children) - 1):
        indexy = pd.concat([indexy, pd.DataFrame.from_records([{
            'body': children[g]['data']['body'],
            'post_id': children[g]['data']['parent_id'],
        }])], ignore_index=True)

indexy

This gives me this: DataFrame. However, I cannot get the replies to the comments. Any help? I tried this:

posts = 26 
for i in np.arange(0,27):
    print('i',i)
    if len(list_of_comments[i]['data']['children']) == 0:
        continue
    for j in np.arange(0,len(list_of_comments[i]['data']['children'])):
        if len(list_of_comments[i]['data']['children'][j]['data']['replies']) == 0:
            break
        else: 
            print('j',len(list_of_comments[i]['data']['children'][j]['data']['replies']))
            for z in np.arange(len(list_of_comments[i]['data']['children'][j]['data']['replies']['data']['children'])):
                if len(list_of_comments[i]['data']['children'][j]['data']['replies']['data']['children']) == 0:
                    break
                print('z',z)


                print(list_of_comments[i]['data']['children'][j]['data']['replies']['data']['children'][z]['data']['body'])

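The first loop sort of works, but it does not iterate correctly to get all the replies for all the posts; it only pulls one or two. We do not want to use PRAW.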

1 Answer

Stack Overflow user

Accepted answer

Answered on 2022-09-10 05:13:18

import pandas as pd

rows = []
for listing in list_of_comments:
    for comment in listing['data']['children']:
        # 'replies' is an empty string when a comment has no replies,
        # and 'more' placeholders have no 'replies' key at all.
        replies_listing = comment['data'].get('replies')
        if not replies_listing:
            continue
        for reply in replies_listing['data']['children']:
            if reply['kind'] != 't1':  # skip 'more' placeholders, which have no body
                continue
            rows.append({
                'body': reply['data']['body'],
                'post_id': reply['data']['link_id'],
            })
replies = pd.DataFrame(rows)
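
The loops above only reach replies one level below each top-level comment. If deeper reply chains are needed as well, a recursive walk is one option. The following is a minimal sketch, not part of the original answer: `iter_replies`, `collect_replies`, the `depth` column, and the hand-made `sample` data are hypothetical names introduced for illustration. It assumes each element of `list_of_comments` is a comment Listing shaped like the ones in the question, that a comment without replies carries an empty string under `'replies'`, and that `'more'` placeholders should simply be skipped.

```python
def iter_replies(comment, depth=1):
    """Yield (depth, reply_data) for every reply nested under a comment."""
    replies = comment["data"].get("replies")
    # An empty string (or a missing key) means there is nothing to descend into.
    if not isinstance(replies, dict):
        return
    for child in replies["data"]["children"]:
        if child["kind"] != "t1":  # skip "more" placeholders
            continue
        yield depth, child["data"]
        # Recurse so that replies to replies are collected too.
        yield from iter_replies(child, depth + 1)


def collect_replies(list_of_comments):
    """Flatten every reply at any depth into a list of row dicts."""
    rows = []
    for listing in list_of_comments:
        for comment in listing["data"]["children"]:
            for depth, data in iter_replies(comment):
                rows.append({
                    "body": data["body"],
                    "post_id": data["link_id"],
                    "parent_id": data["parent_id"],
                    "depth": depth,
                })
    return rows


# Hand-made sample data (an assumption, not real API output).
nested = {"kind": "t1", "data": {"body": "nested reply", "link_id": "t3_x",
                                 "parent_id": "t1_b", "replies": ""}}
reply = {"kind": "t1", "data": {"body": "reply", "link_id": "t3_x",
                                "parent_id": "t1_a",
                                "replies": {"kind": "Listing",
                                            "data": {"children": [nested]}}}}
more = {"kind": "more", "data": {"children": ["c"]}}
top = {"kind": "t1", "data": {"body": "top", "link_id": "t3_x",
                              "parent_id": "t3_x",
                              "replies": {"kind": "Listing",
                                          "data": {"children": [reply, more]}}}}
lonely = {"kind": "t1", "data": {"body": "no replies", "link_id": "t3_x",
                                 "parent_id": "t3_x", "replies": ""}}
sample = [{"kind": "Listing", "data": {"children": [top, lonely]}}]

rows = collect_replies(sample)
```

Passing `rows` to `pd.DataFrame(rows)` gives a frame comparable to `replies`, with the extra `parent_id` and `depth` columns making it possible to rebuild the thread structure later.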
Votes: 1

Original content provided by Stack Overflow.
Original link: https://stackoverflow.com/questions/73644019
