
CssParser error while converting HTML to PDF with pisa and Python

Stack Overflow user
Asked 2014-05-18 10:18:04
Answers: 1 | Views: 1.7K | Followers: 0 | Score: 2

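I am trying to convert an HTML document to a PDF file using pisa and Python. It works fine for small HTML snippets, but when I pass it the HTML of google.com, or really any large HTML file, it fails with the error below.

Here is the code that converts the HTML to PDF: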

Code language: python
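import ho.pisa as pisa
import sys
import os

pisa.showLogging()
args = list(sys.argv)
print args

# Read the HTML source named by the first command-line argument
html_file = open(args[1])
HTML = html_file.read()
# Name the output after the input file, with a .pdf extension, in the current directory
filename = os.path.splitext(os.path.basename(args[1]))[0] + ".pdf"
print filename
pdfFile = open(os.path.join(os.getcwd(), filename), "wb")
pdf = pisa.CreatePDF(HTML, pdfFile)

# Close both files before launching the viewer so the PDF is fully written
pdfFile.close()
html_file.close()

if not pdf.err:
    print "ds"
    pisa.startViewer(filename)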

This is the error it raises:

Code language: text
ERROR [ho.pisa] C:\Python27\lib\site-packages\sx\pisa3\pisa_document.py line 223: Document error

Traceback (most recent call last):
  File "C:\Python27\lib\site-packages\sx\pisa3\pisa_document.py", line 128, in pisaDocument
    c = pisaStory(src, path, link_callback, debug, default_css, xhtml, encoding,
 c=c, xml_output=xml_output)
  File "C:\Python27\lib\site-packages\sx\pisa3\pisa_document.py", line 73, in pisaStory
    pisaParser(src, c, default_css, xhtml, encoding, xml_output)
  File "C:\Python27\lib\site-packages\sx\pisa3\pisa_parser.py", line 626, in pisaParser
    c.parseCSS()
  File "C:\Python27\lib\site-packages\sx\pisa3\pisa_context.py", line 545, in parseCSS
    self.css = self.cssParser.parse(self.cssText)
  File "C:\Python27\lib\site-packages\sx\w3c\cssParser.py", line 358, in parse
    src, stylesheet = self._parseStylesheet(src)
  File "C:\Python27\lib\site-packages\sx\w3c\cssParser.py", line 453, in _parseStylesheet
    src, atResults = self._parseAtKeyword(src)
  File "C:\Python27\lib\site-packages\sx\w3c\cssParser.py", line 577, in _parseAtKeyword
    src, result = self._parseAtIdent(src)
  File "C:\Python27\lib\site-packages\sx\w3c\cssParser.py", line 722, in _parseAtIdent
    src, stylesheet = self._parseStylesheet(src)
  File "C:\Python27\lib\site-packages\sx\w3c\cssParser.py", line 458, in _parseStylesheet
    src, ruleset = self._parseRuleset(src)
  File "C:\Python27\lib\site-packages\sx\w3c\cssParser.py", line 737, in _parseRuleset
    src, properties = self._parseDeclarationGroup(src.lstrip())
  File "C:\Python27\lib\site-packages\sx\w3c\cssParser.py", line 922, in _parseDeclarationGroup
    raise self.ParseError('Declaration group closing \'}\' not found', src, ctxsrc)
CSSParseError: Declaration group closing '}' not found:: (u'{', u'0%{opacity:0}50%{opa')
Traceback (most recent call last):
  File "trypdf.py", line 16, in <module>
    pdf = pisa.CreatePDF(HTML,pdfFile)
  File "C:\Python27\lib\site-packages\sx\pisa3\pisa_document.py", line 128, in pisaDocument
    c = pisaStory(src, path, link_callback, debug, default_css, xhtml, encoding,
 c=c, xml_output=xml_output)
  File "C:\Python27\lib\site-packages\sx\pisa3\pisa_document.py", line 73, in pisaStory
    pisaParser(src, c, default_css, xhtml, encoding, xml_output)
  File "C:\Python27\lib\site-packages\sx\pisa3\pisa_parser.py", line 626, in pisaParser
    c.parseCSS()
  File "C:\Python27\lib\site-packages\sx\pisa3\pisa_context.py", line 545, in parseCSS
    self.css = self.cssParser.parse(self.cssText)
  File "C:\Python27\lib\site-packages\sx\w3c\cssParser.py", line 358, in parse
    src, stylesheet = self._parseStylesheet(src)
  File "C:\Python27\lib\site-packages\sx\w3c\cssParser.py", line 453, in _parseStylesheet
    src, atResults = self._parseAtKeyword(src)
  File "C:\Python27\lib\site-packages\sx\w3c\cssParser.py", line 577, in _parseAtKeyword
    src, result = self._parseAtIdent(src)
  File "C:\Python27\lib\site-packages\sx\w3c\cssParser.py", line 722, in _parseAtIdent
    src, stylesheet = self._parseStylesheet(src)
  File "C:\Python27\lib\site-packages\sx\w3c\cssParser.py", line 458, in _parseStylesheet
    src, ruleset = self._parseRuleset(src)
  File "C:\Python27\lib\site-packages\sx\w3c\cssParser.py", line 737, in _parseRuleset
    src, properties = self._parseDeclarationGroup(src.lstrip())
  File "C:\Python27\lib\site-packages\sx\w3c\cssParser.py", line 922, in _parseDeclarationGroup
    raise self.ParseError('Declaration group closing \'}\' not found', src, ctxsrc)
sx.w3c.cssParser.CSSParseError: Declaration group closing '}' not found:: (u'{', u'0%{opacity:0}50%{opa')

1 Answer

Stack Overflow user

Answered 2014-05-18 16:30:52

xhtml2pdf does not work with every website. You can use pdfkit instead:

Code language: python
import pdfkit
pdfkit.from_url('http://google.com', 'out.pdf')
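
The question converts a local HTML file rather than a URL. A minimal sketch of that case with pdfkit follows, assuming pdfkit and the wkhtmltopdf executable it wraps are both installed. `pdf_path_for` and `convert_file` are hypothetical helper names introduced here for illustration, not part of pdfkit.

```python
import os


def pdf_path_for(html_path, out_dir=None):
    """Return the output path: same base name as the input, with a .pdf extension."""
    out_dir = os.getcwd() if out_dir is None else out_dir
    base = os.path.splitext(os.path.basename(html_path))[0]
    return os.path.join(out_dir, base + ".pdf")


def convert_file(html_path, out_dir=None):
    """Convert a local HTML file to PDF and return the path of the PDF."""
    # Imported lazily: pdfkit needs the wkhtmltopdf executable to be available.
    import pdfkit

    pdf_path = pdf_path_for(html_path, out_dir)
    pdfkit.from_file(html_path, pdf_path)
    return pdf_path
```

Calling `convert_file(sys.argv[1])` would then mirror the command-line usage in the question.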

Edit: I found another solution, using PyQt (from here, thanks to Mark):

Code language: python
import sys
from PyQt4.QtCore import *
from PyQt4.QtGui import *
from PyQt4.QtWebKit import *

app = QApplication(sys.argv)
web = QWebView()
web.load(QUrl("http://www.yahoo.com"))

# Configure a printer that writes an A4 PDF file instead of printing
printer = QPrinter()
printer.setPageSize(QPrinter.A4)
printer.setOutputFormat(QPrinter.PdfFormat)
printer.setOutputFileName("fileOK.pdf")

def convertIt():
    # Called once the page has finished loading: render it to the PDF, then quit
    web.print_(printer)
    print "Pdf generated"
    QApplication.exit()

QObject.connect(web, SIGNAL("loadFinished(bool)"), convertIt)
sys.exit(app.exec_())
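
The context string in the question's traceback (`0%{opacity:0}50%{opa`) looks like the body of a CSS `@keyframes` rule, which suggests the bundled CSS parser trips over the nested blocks inside it. If staying with pisa is preferred, one possible workaround is to strip such rules from the HTML before conversion. This is a sketch built on that assumption about the cause, not a confirmed fix. `strip_keyframes` is a hypothetical helper, and it only touches CSS embedded in the HTML string, not stylesheets loaded separately.

```python
import re

# Start of a keyframes at-rule, with or without a vendor prefix such as -webkit-.
_KEYFRAMES_START = re.compile(r"@(?:-[a-z]+-)?keyframes\b", re.IGNORECASE)


def _find_block_end(text, open_brace):
    """Return the index just past the brace matching text[open_brace], or -1."""
    depth = 0
    for i in range(open_brace, len(text)):
        if text[i] == "{":
            depth += 1
        elif text[i] == "}":
            depth -= 1
            if depth == 0:
                return i + 1
    return -1


def strip_keyframes(text):
    """Return text with every complete @keyframes block removed."""
    out = []
    pos = 0
    while True:
        match = _KEYFRAMES_START.search(text, pos)
        if match is None:
            out.append(text[pos:])
            break
        out.append(text[pos:match.start()])
        brace = text.find("{", match.end())
        end = _find_block_end(text, brace) if brace != -1 else -1
        if end == -1:
            # Unterminated rule: keep the remainder unchanged rather than guess.
            out.append(text[match.start():])
            break
        pos = end
    return "".join(out)
```

In the question's script this would be applied as `pisa.CreatePDF(strip_keyframes(HTML), pdfFile)`. Animations have no meaning in a PDF, so dropping these rules should not change the rendered result, though other unsupported CSS may still cause parse errors.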
Score: 0
Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/23720852
