我正在尝试实现一个facebook抓取器,以获得对facebook-pages馈送帖子上的反应的洞察。我注意到当天和最后一天的结果(帖子)是正确的,但过去越远,跳过的提要帖子就越多,返回的结果数量非常少。
为什么Graph跳过了很多帖子?有时甚至会跳过完整的几个月!
下面是我使用的代码:
import json
import datetime
import csv
import time
import urllib.request
import urllib.error
import requests
import numpy as np
import matplotlib.pyplot as plt
import json
from urllib.parse import urlencode
import pandas as pd
page_id="nytimes"
token="my_User_Token_Here" #using a user token got from [https://developers.facebook.com/tools/explorer/][1]
url="https://graph.facebook.com/v2.12/"+page_id+"/posts/?fields=id,created_time,message,shares.summary(true).limit(0),comments.summary(true).limit(0),likes.summary(true),reactions.type(LOVE).limit(0).summary(total_count).as(Love),reactions.type(WOW).limit(0).summary(total_count).as(Wow),reactions.type(HAHA).limit(0).summary(total_count).as(Haha),reactions.type(SAD).limit(0).summary(1).as(Sad),reactions.type(ANGRY).limit(0).summary(1).as(Angry)&access_token="+token+"&limit=100"
posts = []
found = False
try:
while (True):
print(url)
facebook_connection = urlopen(url)
data = facebook_connection.read().decode('utf8')
json_object = json.loads(data)
allposts=json_object["data"]
allposts = np.asarray(allposts)
created = '2018-03-01'
for i in range(0,100,1):
if (pd.to_datetime(allposts[i]['created_time']) > pd.to_datetime(created)):
#print(allposts[i]['created_time'])
posts.append(allposts[i])
else:
print(i, "%i fucking here!")
posts.append(allposts[i])
found = True
break;
if (i == 99):
#print('here is: ' + i)
url = json_object["paging"]["next"]
if (found == True):
break;
df=pd.DataFrame(posts)
except Exception as ex:
print (ex)发布于 2018-05-25 21:29:07
这是一个reported bug。自从被报道以来,API2.12版的规则发生了变化,每年只能访问top 600帖子。这对开发人员和研究人员来说显然是个坏消息。
https://stackoverflow.com/questions/50495838
复制相似问题