Q: Printing more than 20 posts from the Tumblr API

Stack Overflow user

Asked 2017-11-15 15:54:37

1 answer, 1.7K views, 0 followers, score 6

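Good afternoon,

I am very new to Python, but I am trying to write some code that lets me download all the posts (including the "notes") from a specified Tumblr account to my computer.

Given my inexperience with coding, I tried to find a pre-made script that would let me do this. I found several excellent scripts on GitHub, but none of them actually returns the notes on Tumblr posts (as far as I can see; please correct me if anyone knows of one that does!).

So I tried to write my own script, and I have had some success with the code below. It prints the 20 most recent posts from a given Tumblr (albeit in a rather ugly format: essentially hundreds of lines of text all printed on a single line of a Notepad file):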

# This script prints all the posts (including tags, comments) and also the
# first 20 notes from all the Tumblr blogs.

import pytumblr

# Authenticate via API Key
client = pytumblr.TumblrRestClient('myapikey')

#offset = 0

# Make the request
client.posts('staff', limit=2000, offset=0, reblog_info=True, notes_info=True,
             filter='html')
# print out into a .txt file
with open('out.txt', 'w') as f:
    print >> f, client.posts('staff', limit=2000, offset=0, reblog_info=True,
                             notes_info=True, filter='html')

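However, I would like the script to keep printing posts until it reaches the end of the specified blog.

I searched this site and found a very similar question (Only 20 posts returned via PyTumblr), which was answered by Stack Overflow user poke. However, I can't seem to actually implement poke's solution so that it works for my data. In fact, when I run the following script, no output is produced at all.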

import pytumblr

# Authenticate via API Key
client = pytumblr.TumblrRestClient('myapikey')
blog = ('staff')
def getAllPosts (client, blog):
    offset = 0
    while True:
        posts = client.posts(blog, limit=20, offset=offset, reblog_info=True, notes_info=True)
        if not posts:
            return

        for post in posts:
            yield post

        offset += 20

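I should note that there are several posts on this site about Tumblr notes (for example, Getting more than 50 notes with the Tumblr API), most of which ask how to download more than 50 notes per post. I am perfectly happy with just 50 notes per post; it is the number of posts that I would like to increase.

Also, I have tagged this post as Python, but if there is a better way to get the data I need using another programming language, that would be more than fine.

Thank you very much in advance for your time!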

python

tumblr

pytumblr

1 Answer

Stack Overflow user

Accepted answer

Answered 2017-11-16 13:01:54

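If you just want to see the answer, it is at the bottom under the heading "A Corrected Version".

The second code snippet is a generator that yields posts one at a time, so you have to use it as part of something like a loop and then do something with its output. Here is your code with some extra code that iterates over the generator and prints the data it gets.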

import pytumblr

def getAllPosts (client, blog):
    offset = 0
    while True:
        posts = client.posts(blog, limit=20, offset=offset, reblog_info=True, notes_info=True)
        if not posts:
            return

        for post in posts:
            yield post

        offset += 20

# Authenticate via API Key
client = pytumblr.TumblrRestClient('myapikey')
blog = ('staff')

# use the generator getAllPosts
for post in getAllPosts(client, blog):
    print(post)
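
As a side illustration of why the original script printed nothing, here is a minimal sketch of how generator functions behave. The count_up function and the calls list are made up for this example and have nothing to do with pytumblr: calling a generator function only creates a generator object, and its body does not run until something iterates over it.

```python
calls = []  # records when the generator body actually starts running


def count_up(limit):
    """Toy generator (hypothetical stand-in for getAllPosts)."""
    calls.append("started")
    for n in range(limit):
        yield n


gen = count_up(3)             # creates the generator; the body has not run yet
started_before = list(calls)  # snapshot taken before any iteration

values = list(gen)            # iterating runs the body and collects each yield
started_after = list(calls)   # snapshot taken after iteration
```

Defining getAllPosts without ever looping over its result is the same situation as stopping after the `gen = count_up(3)` line.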

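However, getAllPosts has a couple of bugs. It does not yield the individual posts; it yields the keys of the API response instead, because it iterates over the response dictionary itself, as you can see from this example that I ran in an IPython shell.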

In [7]: yielder = getAllPosts(client, 'staff')

In [8]: next(yielder)
Out[8]: 'blog'

In [9]: next(yielder)
Out[9]: 'posts'

In [10]: next(yielder)
Out[10]: 'total_posts'

In [11]: next(yielder)
Out[11]: 'supply_logging_positions'

In [12]: next(yielder)
Out[12]: 'blog'

In [13]: next(yielder)
Out[13]: 'posts'

In [14]: next(yielder)
Out[14]: 'total_posts'

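This happens because the posts object in getAllPosts is a dictionary that contains much more than just each post from the staff blog: it also contains items such as how many posts the blog contains, the blog's description, when it was last updated, and so on. The code as-is could also result in an infinite loop, because the following condition: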

if not posts:
    return

can never be true given the response structure, because an empty Tumblr response from pytumblr looks like this:

{'blog': {'ask': False,
  'ask_anon': False,
  'ask_page_title': 'Ask me anything',
  'can_send_fan_mail': False,
  'can_subscribe': False,
  'description': '',
  'followed': False,
  'is_adult': False,
  'is_blocked_from_primary': False,
  'is_nsfw': False,
  'is_optout_ads': False,
  'name': 'asdfasdf',
  'posts': 0,
  'reply_conditions': '3',
  'share_likes': False,
  'subscribed': False,
  'title': 'Untitled',
  'total_posts': 0,
  'updated': 0,
  'url': 'https://asdfasdf.tumblr.com/'},
 'posts': [],
 'supply_logging_positions': [],
 'total_posts': 0}

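The if not posts check is evaluated against that whole structure rather than against its posts field (which is an empty list here), so the condition never triggers: the response dictionary is not empty (see: Truth Value Testing in Python).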
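
The two behaviours described above can be checked without calling the API at all. The sketch below uses a hand-written, trimmed-down dictionary shaped like the empty response shown earlier (the values are illustrative, not real API output) to show that iterating a dict yields its keys and that a non-empty dict is truthy even when its 'posts' list is empty.

```python
# Hypothetical, trimmed-down response shaped like the empty one above
response = {
    "blog": {"name": "asdfasdf", "posts": 0},
    "posts": [],
    "total_posts": 0,
}

# Iterating over a dict yields its keys, not the posts inside it
keys = list(response)

# The dict has keys, so it is truthy and `not response` is False
whole_response_is_empty = not response

# The 'posts' list is empty, so `not response["posts"]` is True
posts_list_is_empty = not response["posts"]
```

Only the last check is a usable stop condition for the paging loop.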

A Corrected Version

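The code below (mostly tested/verified) fixes the loop in the getAllPosts implementation, then uses the function to retrieve the posts and dump them to a file named (BLOG_NAME)-posts.txt.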

import pytumblr


def get_all_posts(client, blog):
    offset = 0
    while True:
        response = client.posts(blog, limit=20, offset=offset, reblog_info=True, notes_info=True)

        # Get the 'posts' field of the response        
        posts = response['posts']

        # An empty list means we have paged past the last post
        if not posts:
            return

        for post in posts:
            yield post

        # move to the next offset
        offset += 20


client = pytumblr.TumblrRestClient('secrety-secret')
blog = 'staff'

# use our function
with open('{}-posts.txt'.format(blog), 'w') as out_file:
    for post in get_all_posts(client, blog):
        print(post, file=out_file)
        # if you're on Python 2.x, use the following instead
        # print >>out_file, post

This is just a straight text dump of the API's post responses, so if you need to make it look nicer or anything, that is up to you.
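
If a more readable dump is wanted, one possible approach (a sketch, not part of the original answer) is to write each post as one JSON object per line using the standard json module. The helper name write_posts_as_json_lines and the sample posts are invented for illustration, and the sketch assumes every post is a plain dictionary of JSON-serializable values.

```python
import io
import json


def write_posts_as_json_lines(posts, out_file):
    """Write each post dict as one JSON object per line; return the count."""
    count = 0
    for post in posts:
        # ensure_ascii=False keeps non-ASCII text readable in the file
        out_file.write(json.dumps(post, ensure_ascii=False, sort_keys=True))
        out_file.write("\n")
        count += 1
    return count


# Made-up sample posts standing in for what the post generator would yield
sample_posts = [
    {"id": 1, "type": "text", "note_count": 2},
    {"id": 2, "type": "photo", "note_count": 0},
]

buffer = io.StringIO()  # in-memory file so the sketch runs without disk access
written = write_posts_as_json_lines(sample_posts, buffer)
lines = buffer.getvalue().splitlines()
```

With a real blog, the same helper could be given the generator of posts and a file opened with open(..., 'w', encoding='utf-8'). Each line of the result can then be loaded back individually with json.loads.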

Score: 4

Original page content provided by Stack Overflow; translation support provided by the Tencent Cloud Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/47311845
