
Removing a watermark with iTextSharp

Stack Overflow user
Asked on 2012-01-07 15:32:20
2 answers · 19.3K views · 0 followers · 5 votes

I added a watermark to a PDF using PdfStamper. The code is as follows:

for (int pageIndex = 1; pageIndex <= pageCount; pageIndex++)
{
    iTextSharp.text.Rectangle pageRectangle = reader.GetPageSizeWithRotation(pageIndex);
    PdfContentByte pdfData = stamper.GetUnderContent(pageIndex);
    pdfData.SetFontAndSize(BaseFont.CreateFont(BaseFont.HELVETICA, BaseFont.CP1252, 
        BaseFont.NOT_EMBEDDED), watermarkFontSize);
    PdfGState graphicsState = new PdfGState();
    graphicsState.FillOpacity = watermarkFontOpacity;
    pdfData.SetGState(graphicsState);
    pdfData.SetColorFill(iTextSharp.text.BaseColor.BLACK);
    pdfData.BeginText();
    pdfData.ShowTextAligned(PdfContentByte.ALIGN_CENTER, "LipikaChatterjee", 
        pageRectangle.Width / 2, pageRectangle.Height / 2, watermarkRotation);
    pdfData.EndText();
}

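This works fine. Now I want to remove this watermark from my PDF file. I searched through iTextSharp but couldn't find anything that helps. I even tried adding the watermark as a layer and then deleting the layer, but I couldn't remove the layer's content from the PDF. While looking for a way to remove layers in iText, I found a class called OCGRemover, but I couldn't find an equivalent class in iTextSharp.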


2 answers

Stack Overflow user

Accepted answer

Posted on 2012-01-09 01:11:23

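Based on the statement "I even tried adding the watermark as a layer", I'll assume you are working with content that you create yourself, and are not trying to remove watermarks from someone else's content.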

PDF uses Optional Content Groups (OCGs) to store objects as layers. If you add the watermark text to a layer, you can remove it fairly easily later.
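To make "stored as a layer" concrete: layered content appears in the page content stream as a marked-content sequence tagged `/OC`. Its operand is a name looked up in the page's `/Resources /Properties` dictionary, and that entry in turn references the OCG object, which carries the layer's `/Name`. The C# sketch below only builds such a fragment as a string. `WrapInLayer` and the property name `Pr1` are hypothetical placeholders, and the name iText actually generates may differ.

```csharp
using System.Text;

// Hypothetical illustration: the shape of a layer fragment inside a page content stream.
public static class OptionalContentSketch
{
    // propertyName is the key in the page's /Resources /Properties dictionary
    // that points at the OCG object (placeholder, e.g. "Pr1").
    public static string WrapInLayer(string propertyName, string operators)
    {
        var sb = new StringBuilder();
        sb.Append("/OC /").Append(propertyName).Append(" BDC\n"); // begin marked content bound to the OCG
        sb.Append(operators).Append('\n');                          // drawing operators that belong to the layer
        sb.Append("EMC\n");                                          // end of the marked-content sequence
        return sb.ToString();
    }
}
```

Removing the layer therefore means deleting everything from `/OC ... BDC` through the matching `EMC`, plus the references to the OCG object.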

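The following code is a complete, working C# 2010 WinForms application targeting iTextSharp 5.1.1.0. It is based on Bruno's original Java code found here. The code is split into three sections. Section 1 creates a sample PDF to work with. Section 2 creates a new PDF from the first one and applies the watermark to every page on a separate layer. Section 3 creates the final PDF from the second one, but removes the layer that contains the watermark text. See the code comments for more details.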

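When you create a PdfLayer object, you can give it a name that is displayed in the PDF reader. Unfortunately, I couldn't find a way to access that name, so the code below looks for the actual watermark text inside the layer. If you aren't using any other PDF layers, I recommend simply looking for /OC in the content stream instead of wasting time searching for your actual watermark text. If you find a way to look up /OC groups by name, please let me know!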
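A plain substring test such as `content.IndexOf("/OC")` also fires on any longer name that merely starts with `/OC`. The following C# sketch is a slightly tighter heuristic. It uses `System.Text.RegularExpressions` to require `/OC` as a whole name token, followed by a name operand and the `BDC` operator, and returns those operand names. `OcScanner` and `FindLayerProperties` are hypothetical names. The sketch assumes the stream has already been decoded to text, and because it does not skip string literals or inline image data, it is an approximation rather than a content-stream parser.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: lists the property names used by "/OC /<name> BDC" in a decoded content stream.
public static class OcScanner
{
    // "/OC" must be followed by optional whitespace, then a name operand, then the BDC operator.
    // The character class excludes PDF whitespace and delimiters, so "/OCProps" does not match.
    private static readonly Regex OcPattern =
        new Regex(@"/OC\s*/([^\s/\[\]<>(){}%]+)\s*BDC\b");

    public static List<string> FindLayerProperties(string content)
    {
        var names = new List<string>();
        foreach (Match m in OcPattern.Matches(content))
        {
            names.Add(m.Groups[1].Value); // e.g. "Pr1" for "/OC /Pr1 BDC"
        }
        return names;
    }
}
```

A non-empty result only says that the stream contains optional content. Telling which layer it belongs to still requires following the property name through the page's `/Properties` dictionary to the OCG's `/Name`.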

using System;
using System.Windows.Forms;
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;

namespace WindowsFormsApplication1 {
    public partial class Form1 : Form {
        public Form1() {
            InitializeComponent();
        }

        private void Form1_Load(object sender, EventArgs e) {
            string workingFolder = Environment.GetFolderPath(Environment.SpecialFolder.Desktop);
            string startFile = Path.Combine(workingFolder, "StartFile.pdf");
            string watermarkedFile = Path.Combine(workingFolder, "Watermarked.pdf");
            string unwatermarkedFile = Path.Combine(workingFolder, "Un-watermarked.pdf");
            string watermarkText = "This is a test";

            //SECTION 1
            //Create a 5 page PDF, nothing special here
            using (FileStream fs = new FileStream(startFile, FileMode.Create, FileAccess.Write, FileShare.None)) {
                using (Document doc = new Document(PageSize.LETTER)) {
                    using (PdfWriter writer = PdfWriter.GetInstance(doc, fs)) {
                        doc.Open();

                        for (int i = 1; i <= 5; i++) {
                            doc.NewPage();
                            doc.Add(new Paragraph(String.Format("This is page {0}", i)));
                        }

                        doc.Close();
                    }
                }
            }

            //SECTION 2
            //Create our watermark on a separate layer. The only difference here is that we are adding the watermark to a PdfLayer, which is an OCG (Optional Content Group)
            PdfReader reader1 = new PdfReader(startFile);
            using (FileStream fs = new FileStream(watermarkedFile, FileMode.Create, FileAccess.Write, FileShare.None)) {
                using (PdfStamper stamper = new PdfStamper(reader1, fs)) {
                    int pageCount1 = reader1.NumberOfPages;
                    //Create a new layer
                    PdfLayer layer = new PdfLayer("WatermarkLayer", stamper.Writer);
                    for (int i = 1; i <= pageCount1; i++) {
                        iTextSharp.text.Rectangle rect = reader1.GetPageSize(i);
                        //Get the ContentByte object
                        PdfContentByte cb = stamper.GetUnderContent(i);
                        //Tell the CB that the next commands should be "bound" to this new layer
                        cb.BeginLayer(layer);
                        cb.SetFontAndSize(BaseFont.CreateFont(BaseFont.HELVETICA, BaseFont.CP1252, BaseFont.NOT_EMBEDDED), 50);
                        PdfGState gState = new PdfGState();
                        gState.FillOpacity = 0.25f;
                        cb.SetGState(gState);
                        cb.SetColorFill(BaseColor.BLACK);
                        cb.BeginText();
                        cb.ShowTextAligned(PdfContentByte.ALIGN_CENTER, watermarkText, rect.Width / 2, rect.Height / 2, 45f);
                        cb.EndText();
                        //"Close" the layer
                        cb.EndLayer();
                    }
                }
            }

            //SECTION 3
            //Remove the layer created above
            //First we bind a reader to the watermarked file, then strip out a bunch of things, and finally use a simple stamper to write out the edited reader
            PdfReader reader2 = new PdfReader(watermarkedFile);

            //NOTE, This will destroy all layers in the document, only use if you don't have additional layers
            //Remove the OCG group completely from the document.
            //reader2.Catalog.Remove(PdfName.OCPROPERTIES);

            //Clean up the reader, optional
            reader2.RemoveUnusedObjects();

            //Placeholder variables
            PRStream stream;
            String content;
            PdfDictionary page;
            PdfArray contentarray;

            //Get the page count
            int pageCount2 = reader2.NumberOfPages;
            //Loop through each page
            for (int i = 1; i <= pageCount2; i++) {
                //Get the page
                page = reader2.GetPageN(i);
                //Get the raw content
                contentarray = page.GetAsArray(PdfName.CONTENTS);
                if (contentarray != null) {
                    //Loop through content
                    for (int j = 0; j < contentarray.Size; j++) {
                        //Get the raw byte stream
                        stream = (PRStream)contentarray.GetAsStream(j);
                        //Convert to a string. NOTE, you might need a different encoding here
                        content = System.Text.Encoding.ASCII.GetString(PdfReader.GetStreamBytes(stream));
                        //Look for the OCG token in the stream as well as our watermarked text
                        if (content.IndexOf("/OC") >= 0 && content.IndexOf(watermarkText) >= 0) {
                            //Remove it by giving it zero length and zero data
                            stream.Put(PdfName.LENGTH, new PdfNumber(0));
                            stream.SetData(new byte[0]);
                        }
                    }
                }
            }

            //Write the content out
            using (FileStream fs = new FileStream(unwatermarkedFile, FileMode.Create, FileAccess.Write, FileShare.None)) {
                using (PdfStamper stamper = new PdfStamper(reader2, fs)) {

                }
            }
            this.Close();
        }
    }
}
Votes: 13

Stack Overflow user

Posted on 2014-03-19 06:20:40

As an extension of Chris's answer, I've included a VB.Net class for removing layers at the bottom of this post. It should be somewhat more precise:

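  1. It iterates over the PDF's list of layers (stored in the OCGs array of the OCProperties dictionary in the document catalog). This array contains indirect references to the objects in the PDF file that hold the layer names.
  2. It iterates over each page's properties (also stored in a dictionary) to find the properties that point to those layer objects (via indirect references).
  3. It actually parses the content streams to find instances of the pattern /OC /{PagePropertyReference} BDC {Actual Content} EMC, so that it can remove those fragments as needed.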
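The steps above can be sketched in plain C# without iTextSharp, which makes the logic easier to check in isolation. `MarkedContentRemover`, `PropertiesForLayers` and `RemoveLayers` are hypothetical names, not iTextSharp APIs. Steps 1 and 2 are modeled as a dictionary from page property names to OCG object numbers. Step 3 runs on a decoded content stream. It tokenizes the stream with a deliberately simplified tokenizer, starts a group at each `BDC` whose two preceding tokens are `/OC` and a property name to remove, and drops everything through the matching `EMC`. Nested `BMC`/`BDC` sequences are counted so that an inner `EMC` does not end the group early. Assumptions: the stream was already decompressed (for example via `PdfReader.GetStreamBytes`), it was converted with a byte-preserving encoding such as ISO-8859-1, and it contains no inline images.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper (not part of iTextSharp). Simplified tokenizer: no inline image (BI ... ID ... EI) support.
public static class MarkedContentRemover
{
    // Steps 1-2: keep the page /Properties keys whose indirect reference points at
    // one of the OCG objects (identified by object number) that should be removed.
    public static ISet<string> PropertiesForLayers(
        IDictionary<string, int> pageProperties, ISet<int> layerObjectNumbers)
    {
        var names = new HashSet<string>();
        foreach (var entry in pageProperties)
            if (layerObjectNumbers.Contains(entry.Value)) names.Add(entry.Key);
        return names;
    }

    // Step 3: drop every "/OC /<prop> BDC ... EMC" sequence whose <prop> is listed.
    public static string RemoveLayers(string content, ISet<string> propertyNames)
    {
        var tokens = new List<(string Text, int Start, int End)>(Tokenize(content));
        var result = new StringBuilder();
        int copyFrom = 0;     // start of the next span to keep
        int groupStart = -1;  // offset of "/OC" while inside a group being removed
        int depth = 0;        // marked-content nesting inside that group

        for (int t = 0; t < tokens.Count; t++)
        {
            string text = tokens[t].Text;
            if (groupStart < 0)
            {
                string prev = t >= 1 ? tokens[t - 1].Text : "";
                if (text == "BDC" && t >= 2 && tokens[t - 2].Text == "/OC"
                    && prev.Length > 1 && prev[0] == '/' && propertyNames.Contains(prev.Substring(1)))
                {
                    groupStart = tokens[t - 2].Start;
                    depth = 1;
                }
            }
            else if (text == "BDC" || text == "BMC")
            {
                depth++; // nested marked content inside the group
            }
            else if (text == "EMC" && --depth == 0)
            {
                // Keep the text before the group, skip the group including its EMC.
                result.Append(content, copyFrom, groupStart - copyFrom);
                copyFrom = tokens[t].End;
                groupStart = -1;
            }
        }
        // Copy the tail; an unterminated group is left in place.
        result.Append(content, copyFrom, content.Length - copyFrom);
        return result.ToString();
    }

    private static bool IsWhite(char c) => c == ' ' || c == '\n' || c == '\r' || c == '\t' || c == '\f' || c == '\0';
    private static bool IsDelimiter(char c) => "()<>[]{}/%".IndexOf(c) >= 0;

    // Yields (text, start, end) for names, strings, delimiters, numbers and operators.
    private static IEnumerable<(string Text, int Start, int End)> Tokenize(string s)
    {
        int i = 0;
        while (i < s.Length)
        {
            char c = s[i];
            if (IsWhite(c)) { i++; continue; }
            if (c == '%') { while (i < s.Length && s[i] != '\n' && s[i] != '\r') i++; continue; } // comment
            int start = i;
            if (c == '(') // literal string with balanced parentheses and backslash escapes
            {
                int nesting = 0;
                do
                {
                    if (s[i] == '\\') i++;
                    else if (s[i] == '(') nesting++;
                    else if (s[i] == ')') nesting--;
                    i++;
                } while (i < s.Length && nesting > 0);
            }
            else if ((c == '<' || c == '>') && i + 1 < s.Length && s[i + 1] == c) i += 2; // << or >>
            else if (c == '<') { while (i < s.Length && s[i] != '>') i++; i++; }          // hex string
            else if (c == '/') { i++; while (i < s.Length && !IsWhite(s[i]) && !IsDelimiter(s[i])) i++; } // name
            else if (IsDelimiter(c)) i++;                                                  // other single-char delimiter
            else { while (i < s.Length && !IsWhite(s[i]) && !IsDelimiter(s[i])) i++; }     // number or operator
            i = Math.Min(i, s.Length);
            yield return (s.Substring(start, i - start), start, i);
        }
    }
}
```

The depth counter matters when the layer itself contains tagged content (for example a `/Span <</MCID 0>> BDC ... EMC` pair). Without it, the first inner `EMC` would end the removed range and leave a dangling `EMC` in the stream.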

The code then cleans up all remaining references as far as possible. The calling code might look like this:

Public Shared Sub RemoveWatermark(path As String, savePath As String)
  Using reader = New PdfReader(path)
    Using fs As New FileStream(savePath, FileMode.Create, FileAccess.Write, FileShare.None)
      Using stamper As New PdfStamper(reader, fs)
        Using remover As New PdfLayerRemover(reader)
          remover.RemoveByName("WatermarkLayer")
        End Using
      End Using
    End Using
  End Using
End Sub

The complete class:

Imports System.Collections.Generic
Imports System.Linq
Imports iTextSharp.text
Imports iTextSharp.text.io
Imports iTextSharp.text.pdf
Imports iTextSharp.text.pdf.parser

Public Class PdfLayerRemover
  Implements IDisposable

  Private _reader As PdfReader
  Private _layerNames As New List(Of String)

  Public Sub New(reader As PdfReader)
    _reader = reader
  End Sub

  Public Sub RemoveByName(name As String)
    _layerNames.Add(name)
  End Sub

  Private Sub RemoveLayers()
    Dim ocProps = _reader.Catalog.GetAsDict(PdfName.OCPROPERTIES)
    If ocProps Is Nothing Then Return
    Dim ocgs = ocProps.GetAsArray(PdfName.OCGS)
    If ocgs Is Nothing Then Return

    'Get a list of indirect references to the layer information
    Dim layerRefs = (From l In (From i In ocgs
                                Select Obj = DirectCast(PdfReader.GetPdfObject(i), PdfDictionary),
                                       Ref = DirectCast(i, PdfIndirectReference))
                     Where _layerNames.Contains(l.Obj.GetAsString(PdfName.NAME).ToString)
                     Select l.Ref).ToList
    'Get a list of numbers for these layer references
    Dim layerRefNumbers = (From l In layerRefs Select l.Number).ToList

    'Loop through the pages
    Dim page As PdfDictionary
    Dim propsToRemove As IEnumerable(Of PdfName)
    For i As Integer = 1 To _reader.NumberOfPages
      'Get the page
      page = _reader.GetPageN(i)

      'Get the page properties which reference the layers to remove
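      Dim props = _reader.GetPageResources(i).GetAsDict(PdfName.PROPERTIES)
      'Pages without a /Properties resource dictionary cannot reference any layer
      If props Is Nothing Then Continue For
      propsToRemove = (From k In props.Keys
                       Let propRef = props.GetAsIndirectObject(k)
                       Where propRef IsNot Nothing AndAlso layerRefNumbers.Contains(propRef.Number)
                       Select k).ToList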

      'Get the raw content
      Dim contentarray = page.GetAsArray(PdfName.CONTENTS)
      If contentarray IsNot Nothing Then
        For j As Integer = 0 To contentarray.Size - 1
          'Parse the stream data looking for references to a property pointing to the layer.
          Dim stream = DirectCast(contentarray.GetAsStream(j), PRStream)
          Dim streamData = PdfReader.GetStreamBytes(stream)
          Dim newData = GetNewStream(streamData, (From p In propsToRemove Select p.ToString.Substring(1)))

          'Store data without the stream references in the stream
          If newData.Length <> streamData.Length Then
            stream.SetData(newData)
            stream.Put(PdfName.LENGTH, New PdfNumber(newData.Length))
          End If
        Next
      End If

      'Remove the properties from the page data
      For Each prop In propsToRemove
        props.Remove(prop)
      Next
    Next

    'Remove references to the layer in the master catalog
    RemoveIndirectReferences(ocProps, layerRefNumbers)

    'Clean up unused objects
    _reader.RemoveUnusedObjects()
  End Sub

  Private Shared Function GetNewStream(data As Byte(), propsToRemove As IEnumerable(Of String)) As Byte()
    Dim positions As New List(Of Integer)
    positions.Add(0)

    Dim pos As Integer
    Dim inGroup As Boolean = False
    'Nesting depth of marked-content sequences inside the group being removed
    Dim depth As Integer = 0
    Dim tokenizer As New PRTokeniser(New RandomAccessFileOrArray(New RandomAccessSourceFactory().CreateSource(data)))
    While tokenizer.NextToken
      If Not inGroup AndAlso tokenizer.TokenType = PRTokeniser.TokType.NAME AndAlso tokenizer.StringValue = "OC" Then
        'Remember where the "/OC" token starts
        pos = CInt(tokenizer.FilePointer - 3)
        If tokenizer.NextToken() AndAlso tokenizer.TokenType = PRTokeniser.TokType.NAME AndAlso propsToRemove.Contains(tokenizer.StringValue) Then
          inGroup = True
          depth = 0
          positions.Add(pos)
        End If
      ElseIf inGroup AndAlso tokenizer.TokenType = PRTokeniser.TokType.OTHER Then
        Select Case tokenizer.StringValue
          Case "BDC", "BMC"
            'The group's own BDC and any nested marked content increase the depth
            depth += 1
          Case "EMC"
            depth -= 1
            If depth = 0 Then
              'Matching EMC found: resume copying right after it
              positions.Add(CInt(tokenizer.FilePointer))
              inGroup = False
            End If
        End Select
      End If
    End While
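    'An unterminated group is left untouched, so positions stays in start/end pairs
    If inGroup Then positions.RemoveAt(positions.Count - 1)
    positions.Add(data.Length)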

    If positions.Count > 2 Then
      Dim length As Integer = 0
      For i As Integer = 0 To positions.Count - 1 Step 2
        length += positions(i + 1) - positions(i)
      Next

      'VB array declarations give the upper bound, so allocate exactly "length" bytes
      Dim newData(length - 1) As Byte
      length = 0
      For i As Integer = 0 To positions.Count - 1 Step 2
        Array.Copy(data, positions(i), newData, length, positions(i + 1) - positions(i))
        length += positions(i + 1) - positions(i)
      Next

      Return newData
    Else
      Return data
    End If
  End Function

  Private Shared Sub RemoveIndirectReferences(dict As PdfDictionary, refNumbers As IEnumerable(Of Integer))
    Dim newDict As PdfDictionary
    Dim arrayData As PdfArray
    Dim indirect As PdfIndirectReference
    Dim i As Integer

    For Each key In dict.Keys
      newDict = dict.GetAsDict(key)
      arrayData = dict.GetAsArray(key)
      If newDict IsNot Nothing Then
        RemoveIndirectReferences(newDict, refNumbers)
      ElseIf arrayData IsNot Nothing Then
        i = 0
        While i < arrayData.Size
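          indirect = arrayData.GetAsIndirectObject(i)
          'Arrays such as /Order may hold direct objects, which have no reference number
          If indirect IsNot Nothing AndAlso refNumbers.Contains(indirect.Number) Then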
            arrayData.Remove(i)
          Else
            i += 1
          End If
        End While
      End If
    Next
  End Sub

#Region "IDisposable Support"
  Private disposedValue As Boolean ' To detect redundant calls

  ' IDisposable
  Protected Overridable Sub Dispose(disposing As Boolean)
    If Not Me.disposedValue Then
      If disposing Then
        RemoveLayers()
      End If

      ' TODO: free unmanaged resources (unmanaged objects) and override Finalize() below.
      ' TODO: set large fields to null.
    End If
    Me.disposedValue = True
  End Sub

  ' TODO: override Finalize() only if Dispose(ByVal disposing As Boolean) above has code to free unmanaged resources.
  'Protected Overrides Sub Finalize()
  '    ' Do not change this code.  Put cleanup code in Dispose(ByVal disposing As Boolean) above.
  '    Dispose(False)
  '    MyBase.Finalize()
  'End Sub

  ' This code added by Visual Basic to correctly implement the disposable pattern.
  Public Sub Dispose() Implements IDisposable.Dispose
    ' Do not change this code.  Put cleanup code in Dispose(ByVal disposing As Boolean) above.
    Dispose(True)
    GC.SuppressFinalize(Me)
  End Sub
#End Region

End Class
Votes: 1
Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud Xiaowei's IT-domain engine.
Original link:

https://stackoverflow.com/questions/8768130
