Q: How do I edit a docx with nokogiri and rubyzip?

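Stack Overflow user

Asked 2010-10-08 04:00:34

3 answers · 5K views · 0 followers · 8 votes

I'm using a combination of rubyzip and nokogiri to edit a .docx file. I use rubyzip to unzip the .docx file and then nokogiri to parse and change the body of word/document.xml, but every time I close rubyzip at the end it corrupts the file, and I can't open or repair it. When I unzip the .docx file on my desktop and check word/document.xml, the content is updated with my change, but all the other files are messed up. Can someone help me with this? Here is my code: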

require 'rubygems'  
require 'zip/zip'  
require 'nokogiri'  
zip = Zip::ZipFile.open("test.docx")  
doc = zip.find_entry("word/document.xml")  
xml = Nokogiri::XML.parse(doc.get_input_stream)  
wt = xml.root.xpath("//w:t", {"w" => "http://schemas.openxmlformats.org/wordprocessingml/2006/main"}).first  
wt.content = "New Text"  
zip.get_output_stream("word/document.xml") {|f| f << xml.to_s}  
zip.close

ruby-on-rails

nokogiri

docx

rubyzip

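3 Answers

Stack Overflow user

Posted 2011-01-12 22:27:57

I ran into the same corruption problem with rubyzip last night. I solved it by copying everything into a new zip file and replacing files as needed.

Here is my working proof of concept: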

#!/usr/bin/env ruby

require 'rubygems'
require 'zip/zip' # rubyzip gem
require 'nokogiri'

class WordXmlFile
  def self.open(path, &block)
    self.new(path, &block)
  end

  def initialize(path, &block)
    @replace = {}
    if block_given?
      @zip = Zip::ZipFile.open(path)
      yield(self)
      @zip.close
    else
      @zip = Zip::ZipFile.open(path)
    end
  end

  def merge(rec)
    xml = @zip.read("word/document.xml")
    doc = Nokogiri::XML(xml) {|x| x.noent}
    (doc/"//w:fldSimple").each do |field|
      if field.attributes['instr'].value =~ /MERGEFIELD (\S+)/
        text_node = (field/".//w:t").first
        if text_node
          text_node.inner_html = rec[$1].to_s
        else
          puts "No text node for #{$1}"
        end
      end
    end
    @replace["word/document.xml"] = doc.serialize :save_with => 0
  end

  def save(path)
    Zip::ZipFile.open(path, Zip::ZipFile::CREATE) do |out|
      @zip.each do |entry|
        out.get_output_stream(entry.name) do |o|
          if @replace[entry.name]
            o.write(@replace[entry.name])
          else
            o.write(@zip.read(entry.name))
          end
        end
      end
    end
    @zip.close
  end
end

if __FILE__ == $0
  file = ARGV[0]
  out_file = ARGV[1] || file.sub(/\.docx/, ' Merged.docx')
  w = WordXmlFile.open(file)
  w.merge('First_Name' => 'Eric', 'Last_Name' => 'Mason')
  w.save(out_file)
end
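
The merge method above depends on the regular expression picking the field name out of the instr attribute of each w:fldSimple element. A small sketch of just that step, using made-up instruction strings and a hypothetical helper name (merge_field_name), with no zip or XML library involved:

```ruby
# Extract the merge field name from a field instruction string, using the
# same pattern as the merge method. Returns nil when no MERGEFIELD is present.
# merge_field_name is a hypothetical helper, not part of the answer's class.
def merge_field_name(instr)
  instr[/MERGEFIELD (\S+)/, 1]
end

# Made-up instruction strings and record, for illustration only.
rec = { 'First_Name' => 'Eric', 'Last_Name' => 'Mason' }
samples = [
  ' MERGEFIELD First_Name \* MERGEFORMAT ',
  ' MERGEFIELD Middle_Name ',
  ' PAGE '
]

samples.each do |instr|
  name = merge_field_name(instr)
  # As in merge, a field name missing from rec becomes an empty string.
  value = name ? rec[name].to_s : nil
  puts "#{instr.inspect} -> name=#{name.inspect} value=#{value.inspect}"
end
```

Because the pattern captures a run of non-whitespace characters, a field name containing spaces would not be captured whole; the sketch does not try to handle that case.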

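12 votes

Stack Overflow user

Posted 2010-11-09 00:37:24

I stumbled across this post and know nothing about Ruby or nokogiri, but...

It looks like you are re-zipping the new content incorrectly. I don't know rubyzip, but you need a way to tell it to update the entry word/document.xml and then re-save/re-zip the file.

It looks like you are just overwriting the entry with new data, which will of course be a different size and will completely mess up the rest of the zip file.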
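
The point about size can be illustrated with Ruby's standard zlib alone. This is only a rough sketch on two made-up strings standing in for the old and new word/document.xml; it shows that the values a zip archive records for an entry (sizes and CRC-32) change with the content, and makes no claim about what rubyzip actually does internally. The helper name entry_stats is hypothetical.

```ruby
require 'zlib'

# Hypothetical stand-ins for the old and new contents of word/document.xml.
OLD_XML = '<w:t>Old Text</w:t>'
NEW_XML = '<w:t>New Text, now quite a bit longer than before</w:t>'

# Collect the kind of per-entry values a zip archive keeps for each file.
def entry_stats(data)
  {
    size: data.bytesize,                                   # uncompressed size
    # zlib-wrapped deflate size; a zip entry stores raw deflate, so treat
    # this as a rough proxy for the compressed size
    compressed_size: Zlib::Deflate.deflate(data).bytesize,
    crc32: Zlib.crc32(data)                                # checksum of the data
  }
end

old_stats = entry_stats(OLD_XML)
new_stats = entry_stats(NEW_XML)

puts "old: #{old_stats}"
puts "new: #{new_stats}"
```

Since entries sit back to back in the archive and the directory at the end records where each one starts, an entry whose size changes has to be followed by rewriting everything after it, which is why re-saving the whole archive is the safer route.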

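I gave an example for Excel in this post: Parse text file and create an excel report

I am using a different zip library and VB there, but I am still doing what you are trying to do (my code is about halfway down).

Here is the applicable part: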

Using z As ZipFile = ZipFile.Read(xlStream.BaseStream) 
'Grab Sheet 1 out of the file parts and read it into a string. 
Dim myEntry As ZipEntry = z("xl/worksheets/sheet1.xml") 
Dim msSheet1 As New MemoryStream 
myEntry.Extract(msSheet1) 
msSheet1.Position = 0 
Dim sr As New StreamReader(msSheet1) 
Dim strXMLData As String = sr.ReadToEnd 

'Grab the data in the empty sheet and swap out the data that I want  
Dim str2 As XElement = CreateSheetData(tbl) 
Dim strReplace As String = strXMLData.Replace("<sheetData/>", str2.ToString) 
z.UpdateEntry("xl/worksheets/sheet1.xml", strReplace) 
'This just rezips the file with the new data it doesnt save to disk 
z.Save(fiRet.FullName) 
End Using

1 vote

Stack Overflow user

Posted 2014-02-01 10:22:12

According to the official GitHub documentation, you should use write_buffer instead of open. There is also a code example at the link.

1 vote

Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/3885425
