I am trying to fetch all the tweets from one specific user:

    import json

    from twarc import Twarc2


    def get_all_tweets(user_id, DEBUG):
        # Your bearer token here
        t = Twarc2(bearer_token="blah")
        # Initialize a list to hold all the response pages
        alltweets = []
        new_tweets = {}
        if DEBUG:
            # Debug: read from file
            f = open('tweets_debug.txt',)
            new_tweets = json.load(f)
            alltweets.extend(new_tweets)
        else:
            # Make initial request for most recent tweets (3200 is the maximum allowed count)
            new_tweets = t.timeline(user=user_id)
            # Save most recent tweets
            alltweets.extend(new_tweets)
        if DEBUG:
            # Debug: write to file
            f = open("tweets_debug.txt", "w")
            f.write(json.dumps(alltweets, indent=2, sort_keys=False))
            f.close()
        # Save the id of the oldest tweet less one
        oldest = str(int(alltweets[-1]['meta']['oldest_id']) - 1)
        # Keep grabbing tweets until there are no tweets left to grab
        while len(dict(new_tweets)) > 0:
            print(f"getting tweets before {oldest}")
            # All subsequent requests use the until_id param to prevent duplicates
            new_tweets = t.timeline(user=user_id, until_id=oldest)
            # Save most recent tweets
            alltweets.extend(new_tweets)
            # Update the id of the oldest tweet less one
            oldest = str(int(alltweets[-1]['meta']['oldest_id']) - 1)
            print(f"...{len(alltweets)} tweets downloaded so far")
        res = []
        for tweetlist in alltweets:
            res.extend(tweetlist['data'])
        f = open("output.txt", "w")
        f.write(json.dumps(res, indent=2, sort_keys=False))
        f.close()
        return res

However, len(dict(new_tweets)) does not work. It always returns 0. sum(1 for dummy in new_tweets) also returns 0.
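I tried json.load(new_tweets), but that does not work either.
However, alltweets.extend(new_tweets) works fine.
It seems that timeline() returns a generator (<generator object Twarc2._timeline at 0x000001D78B3D8B30>). Is there a way to count its length so I can tell whether there are still tweets left to fetch?
Or is there a way to merge...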

    someList = []
    someList.extend(new_tweets)
    while len(someList) > 0:
        # blah blah

...into a single line with the while?
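
One way to fold the list conversion and the emptiness check into the while line is an assignment expression (Python 3.8 or later). The sketch below does not call twarc at all: fetch_pages, TWEET_IDS, and the page_size parameter are hypothetical stand-ins invented for illustration, and the way until_id is treated as an exclusive upper bound is an assumption of this example.

```python
TWEET_IDS = [50, 40, 30, 20, 10]  # made-up tweet ids, newest first


def fetch_pages(until_id=None, page_size=2):
    """Hypothetical stand-in for t.timeline(): a generator yielding at most one page."""
    older = [i for i in TWEET_IDS if until_id is None or i < until_id]
    batch = older[:page_size]
    if batch:
        yield {
            "data": [{"id": str(i)} for i in batch],
            "meta": {"oldest_id": str(batch[-1])},
        }


all_pages = []
oldest = None
# list(...) materializes the generator; an empty list is falsy, which ends the loop.
while (new_pages := list(fetch_pages(until_id=oldest))):
    all_pages.extend(new_pages)
    oldest = int(all_pages[-1]["meta"]["oldest_id"])

tweet_ids = [tweet["id"] for page in all_pages for tweet in page["data"]]
print(tweet_ids)
```

Because the generator is turned into a list before anything else touches it, the same pages can be both tested for emptiness and appended to all_pages.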
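Edit: I tried print(list(new_tweets)) right before the while loop, and it returns []. The object really does appear to be empty.
Is that because alltweets.extend(new_tweets) somehow consumes the new_tweets generator?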
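
A generator can be iterated only once, so anything that walks through it, extend included, leaves it empty. The small experiment below uses a plain Python generator rather than twarc; the pages function and its contents are made up purely to show the effect.

```python
def pages():
    """Made-up generator standing in for a paginated API call."""
    yield {"data": ["a", "b"]}
    yield {"data": ["c"]}


gen = pages()
collected = []
collected.extend(gen)               # iterates the generator to the end

leftover = list(gen)                # the same generator is now exhausted
count_after = sum(1 for _ in gen)   # nothing left to count
as_dict = dict(gen)                 # an exhausted iterable gives an empty dict

print(len(collected), leftover, count_after, len(as_dict))  # prints: 2 [] 0 0
```

This matches the symptoms in the question: extend receives every page, while every later attempt to inspect the same generator sees nothing.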
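Posted on 2021-10-16 21:10:27

I figured it out myself. The problem can be solved by converting the generator to a list, which, unlike a generator, can be measured with len() and iterated more than once:

    new_tweets = list(t.timeline(user=user_id, until_id=oldest))

https://stackoverflow.com/questions/69591930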