Q: Caching lxml.html objects with Flask-Cache

Stack Overflow user

Asked on 2012-08-20 06:06:50

Answers: 1, Views: 1.1K, Followers: 0, Votes: 2

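I'm trying to build a Flask web application that has to request an entire remote website. I'd like to know whether I can cache it to speed things up, since the site doesn't change often, but I still want the cache refreshed once a day.

Anyway, I looked around and found Flask-Cache, which seemed to do what I wanted, so I made the appropriate changes and added the following: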

from flask.ext.cache import Cache
[...]
cache = Cache()
[...]
cache.init_app(app)
[...]
@cache.cached(timeout=86400, key_prefix='content')
def get_content():
    return lxml.html.fromstring(urllib2.urlopen('http://WEBSITE.com').read())

Then I call it from the functions that need the content, like this:

content = get_content()

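I expected it to reuse the cached lxml.html object on every call, but that's not what I'm seeing. The id of the object changes on every call, and there is no speedup at all. So am I misunderstanding what Flask-Cache does, or am I doing something wrong? I tried the memoize decorator instead, and I tried reducing the timeout or removing it altogether, but nothing seems to make a difference.

Thanks.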

python

caching

flask

1 Answer

Stack Overflow user

Accepted answer

Posted on 2012-12-23 04:30:01

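The default CACHE_TYPE is null, which gives you a NullCache, so you get no caching at all, which is what you are observing. The documentation does not state this explicitly, but this line in the source of Cache.init_app does: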

self.config.setdefault('CACHE_TYPE', 'null')

To actually get some caching, initialize the Cache instance with a suitable cache type.

cache = Cache(config={'CACHE_TYPE': 'simple'})

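Aside: the simple cache (SimpleCache) is fine for development and testing, and for this example, but you shouldn't use it in production. A dedicated backend such as memcached or Redis would be a better choice there.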
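The difference between a null cache and a real cache can be illustrated with a small stand-alone sketch. The classes `NullCacheSketch` and `DictCacheSketch`, the `cached` decorator, and the `count_calls` helper are hypothetical simplifications written for this illustration; they are not Flask-Cache's actual implementation, and they ignore timeouts entirely.

```python
import functools


class NullCacheSketch:
    """Hypothetical stand-in for a null cache: it stores nothing."""

    def get(self, key):
        return None  # every lookup is a miss

    def set(self, key, value):
        pass  # the value is silently discarded


class DictCacheSketch:
    """Hypothetical stand-in for an in-memory cache backed by a dict."""

    def __init__(self):
        self._store = {}

    def get(self, key):
        return self._store.get(key)

    def set(self, key, value):
        self._store[key] = value


def cached(cache, key):
    """Decorator that consults `cache` before calling the wrapped function."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper():
            value = cache.get(key)
            if value is None:  # miss: compute the value and store it
                value = func()
                cache.set(key, value)
            return value
        return wrapper
    return decorator


def count_calls(cache):
    """Call a cached function three times; return how often its body ran."""
    calls = []

    @cached(cache, "content")
    def get_content():
        calls.append(1)
        return "<html>page</html>"

    for _ in range(3):
        get_content()
    return len(calls)


print(count_calls(NullCacheSketch()))  # the body runs on every call
print(count_calls(DictCacheSketch()))  # the body runs only on the first call
```

With the null variant the decorated function body runs on every call, which matches the "no speedup" symptom in the question.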

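Now that a real cache is in place, you will run into the next problem. On the second call, the cached lxml.html object is retrieved from the Cache, but it is broken, because these objects are not cacheable. The stack trace looks like this: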

Traceback (most recent call last):
  File "/home/day/.virtualenvs/so-flask/lib/python2.7/site-packages/flask/app.py", line 1701, in __call__
    return self.wsgi_app(environ, start_response)
  File "/home/day/.virtualenvs/so-flask/lib/python2.7/site-packages/flask/app.py", line 1689, in wsgi_app
    response = self.make_response(self.handle_exception(e))
  File "/home/day/.virtualenvs/so-flask/lib/python2.7/site-packages/flask/app.py", line 1687, in wsgi_app
    response = self.full_dispatch_request()
  File "/home/day/.virtualenvs/so-flask/lib/python2.7/site-packages/flask/app.py", line 1360, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/home/day/.virtualenvs/so-flask/lib/python2.7/site-packages/flask/app.py", line 1358, in full_dispatch_request
    rv = self.dispatch_request()
  File "/home/day/.virtualenvs/so-flask/lib/python2.7/site-packages/flask/app.py", line 1344, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/home/day/q12030403.py", line 20, in index
    return "get_content returned: {0!r}\n".format(get_content())
  File "lxml.etree.pyx", line 1034, in lxml.etree._Element.__repr__ (src/lxml/lxml.etree.c:41389)

  File "lxml.etree.pyx", line 881, in lxml.etree._Element.tag.__get__ (src/lxml/lxml.etree.c:39979)

  File "apihelpers.pxi", line 15, in lxml.etree._assertValidNode (src/lxml/lxml.etree.c:12306)

AssertionError: invalid Element proxy at 3056741852
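One related detail: a cache that serializes its values hands back a fresh copy on every read, so comparing `id()` values, as the question does, cannot show whether caching is working. The sketch below assumes the backend uses `pickle`; that is an assumption about the backend, not something confirmed here. Plain data survives the round trip, whereas objects whose state lives inside a C extension, like the lxml element in the traceback above, may not.

```python
import pickle

# Plain Python data standing in for a cached value.
page = {"title": "wishlist", "items": ["a", "b"]}

# Simulate what a pickling cache does: serialize on set, deserialize on get.
stored = pickle.dumps(page, pickle.HIGHEST_PROTOCOL)
first = pickle.loads(stored)
second = pickle.loads(stored)

print(first == page)    # the data itself survives the round trip
print(first is page)    # but the result is not the original object
print(first is second)  # and each read builds yet another object
```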

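So, instead of caching the lxml.html object, cache only a plain string, the content of the website you downloaded, and re-parse it each time to get a fresh lxml.html object. The cache still helps, because you no longer hit the other website on every request. Here is a complete program that demonstrates the solution: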

from flask import Flask
from flask.ext.cache import Cache
import time
import lxml.html
import urllib2

app = Flask(__name__)

cache = Cache(config={'CACHE_TYPE': 'simple'})
cache.init_app(app)

@cache.cached(timeout=86400, key_prefix='content')
def get_content():
    app.logger.debug("get_content called")
#    return lxml.html.fromstring(urllib2.urlopen('http://daybarr.com/wishlist').read())
    return urllib2.urlopen('http://daybarr.com/wishlist').read()

@app.route("/")
def index():
    app.logger.debug("index called")
    return "get_content returned: {0!r}\n".format(get_content())

if __name__ == "__main__":
    app.run(debug=True)

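When I run the program and make two requests to http://127.0.0.1:5000/, I get the following output. Note that get_content is not called the second time, because the content is served from the cache.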

 * Running on http://127.0.0.1:5000/
 * Restarting with reloader
--------------------------------------------------------------------------------
DEBUG in q12030403 [q12030403.py:20]:
index called
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
DEBUG in q12030403 [q12030403.py:14]:
get_content called
--------------------------------------------------------------------------------
127.0.0.1 - - [21/Dec/2012 00:03:28] "GET / HTTP/1.1" 200 -
--------------------------------------------------------------------------------
DEBUG in q12030403 [q12030403.py:20]:
index called
--------------------------------------------------------------------------------
127.0.0.1 - - [21/Dec/2012 00:03:33] "GET / HTTP/1.1" 200 -
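The program above returns the cached string directly; the "cache the text, re-parse on every call" pattern could be sketched as follows. This is a stand-alone illustration using only the standard library: `fetch_page` is a hypothetical stub replacing the network request, the dict-based cache replaces Flask-Cache, and `xml.etree.ElementTree` stands in for `lxml.html`. The `now` parameter exists only so that expiry can be exercised without waiting a day.

```python
import time
import xml.etree.ElementTree as ET

CACHE_SECONDS = 86400  # one day, matching the timeout used above
_cache = {}            # key -> (expiry timestamp, page text)
fetch_count = 0        # how many times the "remote site" was contacted


def fetch_page():
    """Hypothetical stand-in for the slow remote request."""
    global fetch_count
    fetch_count += 1
    return "<html><body><p>wishlist</p></body></html>"


def get_content(now=None):
    """Return the page text, fetching it only when the cached copy has expired."""
    now = time.time() if now is None else now
    entry = _cache.get("content")
    if entry is None or entry[0] <= now:
        entry = (now + CACHE_SECONDS, fetch_page())
        _cache["content"] = entry
    return entry[1]


def get_tree(now=None):
    """Parse the cached text into a fresh element tree on every call."""
    return ET.fromstring(get_content(now))


first = get_tree(now=0)
second = get_tree(now=10)                 # served from the cache
third = get_tree(now=CACHE_SECONDS + 1)   # cache expired, fetched again

print(fetch_count)                 # number of fetches across the three calls
print(first is second)             # each call gets its own tree object
print(second.find("body/p").text)  # the parsed tree is fully usable
```

Parsing on every call costs some CPU time, but it avoids the remote request, which is usually the dominant cost.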

Votes: 6

Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/12030403
