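When I convert HTML to docx, I run into a new problem. The conversion throws an exception:
org.xml.sax.SAXParseException; lineNumber: 4; columnNumber: 73; The entity "nbsp" was referenced, but not declared.
As I understand it, this happens because docx4j treats my input as XML before converting it to docx. XML has only five predefined entities (&amp;, &lt;, &gt;, &quot;, &apos;), and an entity such as nbsp is not among them. How can I make docx4j convert the document without declaring the nbsp entity in a doctype?
Is this a bug in docx4j, or a limitation?
Here is my code: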
package ru.simplexsoftware.constructorOfDocuments.web.rest;

import org.docx4j.convert.in.xhtml.XHTMLImporterImpl;
import org.docx4j.openpackaging.exceptions.Docx4JException;
import org.docx4j.openpackaging.exceptions.InvalidFormatException;
import org.docx4j.openpackaging.packages.WordprocessingMLPackage;
import org.docx4j.openpackaging.parts.WordprocessingML.NumberingDefinitionsPart;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.HttpRequestHandler;
import ru.simplexsoftware.constructorOfDocuments.dao.TemplateDao;

import javax.servlet.ServletException;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import javax.xml.bind.JAXBException;
import java.io.ByteArrayOutputStream;
import java.io.IOException;

public class DocxFileDownloadServlet implements HttpRequestHandler {

    @Autowired
    TemplateDao templateDao;

    @Override
    public void handleRequest(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException {
        String parameter = request.getParameter("documentId");
        Long documentId = Long.parseLong(parameter);

        // Create an empty docx package
        WordprocessingMLPackage wordMLPackage = null;
        try {
            wordMLPackage = WordprocessingMLPackage.createPackage();
        } catch (InvalidFormatException e) {
            e.printStackTrace();
        }

        // Attach a numbering part so that imported lists can be numbered
        NumberingDefinitionsPart ndp = null;
        try {
            ndp = new NumberingDefinitionsPart();
        } catch (InvalidFormatException e) {
            e.printStackTrace();
        }
        try {
            wordMLPackage.getMainDocumentPart().addTargetPart(ndp);
        } catch (InvalidFormatException e) {
            e.printStackTrace();
        }
        try {
            ndp.unmarshalDefaultNumbering();
        } catch (JAXBException e) {
            e.printStackTrace();
        }

        XHTMLImporterImpl xHTMLImporter = new XHTMLImporterImpl(wordMLPackage);
        xHTMLImporter.setHyperlinkStyle("Hyperlink");

        String htmlString = templateDao.get(documentId).html;
        // Close <br> tags so that the markup is well-formed XML
        htmlString = htmlString.replaceAll("<br>", "<br/>");

        // Convert the XHTML, and add it into the empty docx we made
        try {
            wordMLPackage.getMainDocumentPart().getContent().addAll(
                    xHTMLImporter.convert(htmlString, null));
        } catch (Docx4JException e) {
            e.printStackTrace();
        }

        ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
        try {
            wordMLPackage.save(outputStream);
        } catch (Docx4JException e) {
            e.printStackTrace();
        }

        // A docx file is binary (a zip archive), so send the raw bytes;
        // converting the stream to a String first would corrupt the file
        response.setContentType("application/vnd.openxmlformats-officedocument.wordprocessingml.document");
        response.getOutputStream().write(outputStream.toByteArray());
        response.flushBuffer();
    }
}
Posted on 2018-01-31 09:48:06
You can try using AltChunkType to insert the HTML string into the docx as an altChunk:
wordMLPackage.getMainDocumentPart().addAltChunk(AltChunkType.Xhtml, htmlString.getBytes(StandardCharsets.UTF_8));
Source: https://stackoverflow.com/questions/46391640
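If you would rather keep using XHTMLImporterImpl, another possible workaround for the undeclared-entity error is to rewrite named HTML entities as numeric character references before the string reaches the XML parser, since numeric references such as &#160; need no declaration. Below is a minimal sketch of that idea. The class name HtmlEntityFixer and the method replaceNamedEntities are hypothetical helpers, not part of docx4j, and the entity table is only an illustrative subset that you would extend to cover whatever your templates actually contain.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not part of docx4j): rewrites a few named HTML
// entities as numeric character references, which any XML parser accepts.
public class HtmlEntityFixer {

    // Illustrative subset only; extend it for the entities in your own HTML.
    private static final Map<String, String> NAMED_TO_NUMERIC = new LinkedHashMap<>();
    static {
        NAMED_TO_NUMERIC.put("&nbsp;", "&#160;");
        NAMED_TO_NUMERIC.put("&copy;", "&#169;");
        NAMED_TO_NUMERIC.put("&laquo;", "&#171;");
        NAMED_TO_NUMERIC.put("&raquo;", "&#187;");
        NAMED_TO_NUMERIC.put("&ndash;", "&#8211;");
        NAMED_TO_NUMERIC.put("&mdash;", "&#8212;");
    }

    // Returns a copy of the input with every listed named entity replaced.
    // The five entities predefined by XML (&amp; and so on) are left alone.
    public static String replaceNamedEntities(String html) {
        String result = html;
        for (Map.Entry<String, String> entry : NAMED_TO_NUMERIC.entrySet()) {
            result = result.replace(entry.getKey(), entry.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        String html = "<p>Hello&nbsp;world &amp; more<br/></p>";
        // prints: <p>Hello&#160;world &amp; more<br/></p>
        System.out.println(replaceNamedEntities(html));
    }
}
```

In the servlet from the question, this would be called on htmlString right before xHTMLImporter.convert(htmlString, null). A fixed lookup table is the simplest option; if the HTML comes from arbitrary sources, a real HTML parser or tidy step is probably more robust than string replacement.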