Getting text fields from a PNG file

Stack Overflow user
Asked 2012-05-05 03:06:43

Nothing seems to have worked so far. pnginfo gives me the following output:

concept_Sjet_dream6.png...
  Image Width: 200 Image Length: 240
  Bitdepth (Bits/Sample): 8
  Channels (Samples/Pixel): 2
  Pixel depth (Pixel Depth): 16
  Colour Type (Photometric Interpretation): GRAYSCALE with alpha channel 
  Image filter: Single row per byte filter 
  Interlacing: No interlacing 
  Compression Scheme: Deflate method 8, 32k window
  Resolution: 11811, 11811 (pixels per meter)
  FillOrder: msb-to-lsb
  Byte Order: Network (Big Endian)
  Number of text strings: 1 of 9
    Comment (xTXt deflate compressed): The comment

But the remaining text strings are missing. I also tried other solutions from Stack Overflow, and they did not work either. pngchunks reports the following:

Chunk: Data Length 13 (max 2147483647), Type 1380206665 [IHDR]
  Critical, public, PNG 1.2 compliant, unsafe to copy
  IHDR Width: 200
  IHDR Height: 240
  IHDR Bitdepth: 8
  IHDR Colortype: 4
  IHDR Compression: 0
  IHDR Filter: 0
  IHDR Interlace: 0
  IHDR Compression algorithm is Deflate
  IHDR Filter method is type zero (None, Sub, Up, Average, Paeth)
  IHDR Interlacing is disabled
  Chunk CRC: -277290027
Chunk: Data Length 1 (max 2147483647), Type 1111970419 [sRGB]
  Ancillary, public, PNG 1.2 compliant, unsafe to copy
  ... Unknown chunk type
  Chunk CRC: -1362223895
Chunk: Data Length 2 (max 2147483647), Type 1145523042 [bKGD]
  Ancillary, public, PNG 1.2 compliant, unsafe to copy
  ... Unknown chunk type
  Chunk CRC: -2020619073
Chunk: Data Length 9 (max 2147483647), Type 1935231088 [pHYs]
  Ancillary, public, PNG 1.2 compliant, safe to copy
  ... Unknown chunk type
  Chunk CRC: 2024095606
Chunk: Data Length 7 (max 2147483647), Type 1162692980 [tIME]
  Ancillary, public, PNG 1.2 compliant, unsafe to copy
  ... Unknown chunk type
  Chunk CRC: 292503155
Chunk: Data Length 19 (max 2147483647), Type 1951942004 [tEXt]
  Ancillary, public, PNG 1.2 compliant, safe to copy
  ... Unknown chunk type
  Chunk CRC: -528748773
Chunk: Data Length 8192 (max 2147483647), Type 1413563465 [IDAT]
  Critical, public, PNG 1.2 compliant, unsafe to copy
  IDAT contains image data
  Chunk CRC: -309524018
Chunk: Data Length 8192 (max 2147483647), Type 1413563465 [IDAT]
  Critical, public, PNG 1.2 compliant, unsafe to copy
  IDAT contains image data
  Chunk CRC: -1646200198
Chunk: Data Length 2301 (max 2147483647), Type 1413563465 [IDAT]
  Critical, public, PNG 1.2 compliant, unsafe to copy
  IDAT contains image data
  Chunk CRC: -810299134
Chunk: Data Length 0 (max 2147483647), Type 1145980233 [IEND]
  Critical, public, PNG 1.2 compliant, unsafe to copy
  IEND contains no data
  Chunk CRC: -1371381630
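
A note on reading this listing: the numeric Type values appear to be the four type bytes packed least-significant-byte first, the Ancillary and safe-to-copy flags follow from the letter case (bit 5) of the first and fourth characters of the type name, and the CRC is shown as a signed 32-bit integer. The following is a minimal Java sketch under those assumptions; the class and method names are hypothetical and are not part of pngchunks.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

public class ChunkListingSketch {
    /** Packs the four type bytes least-significant-byte first (assumed pngchunks convention). */
    static int typeAsLittleEndianInt(String type) {
        byte[] b = type.getBytes(StandardCharsets.US_ASCII);
        return (b[0] & 0xff) | ((b[1] & 0xff) << 8) | ((b[2] & 0xff) << 16) | ((b[3] & 0xff) << 24);
    }

    /** Bit 5 of the first type byte: a lowercase letter marks an ancillary chunk. */
    static boolean isAncillary(String type) {
        return (type.charAt(0) & 0x20) != 0;
    }

    /** Bit 5 of the fourth type byte: a lowercase letter marks a safe-to-copy chunk. */
    static boolean isSafeToCopy(String type) {
        return (type.charAt(3) & 0x20) != 0;
    }

    /** CRC-32 over the type bytes followed by the data bytes, narrowed to a signed int. */
    static int chunkCrc(String type, byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(type.getBytes(StandardCharsets.US_ASCII));
        crc.update(data);
        return (int) crc.getValue();
    }

    public static void main(String[] args) {
        System.out.println(typeAsLittleEndianInt("IHDR"));  // 1380206665, as in the listing
        System.out.println(isAncillary("tEXt") + " " + isSafeToCopy("tEXt"));
        System.out.println(chunkCrc("IEND", new byte[0]));  // -1371381630, as in the listing
    }
}
```

Read this way, the listing shows exactly one text chunk in the file: a single tEXt chunk with 19 bytes of data.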

Sample image:

I'm confused. Thanks, everyone.

PS: Apparently this happens with every image; it is not a special case but the usual one.

1 Answer

Stack Overflow user

Accepted answer

Answered 2012-05-08 00:35:56

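Here is a simple Java approach for extracting the text information from a "valid" PNG file. International characters may not display well on the console.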

import java.io.*;
import java.util.zip.InflaterInputStream;

public class PNGExtractText
{  
  /** PNG signature constant */
  public static final long SIGNATURE = 0x89504E470D0A1A0AL;
  /** PNG Chunk type constants, 4 Critical chunks */
  /** Image header */
  private static final int IHDR = 0x49484452;   // "IHDR"
  /** Image trailer */
  private static final int IEND = 0x49454E44;   // "IEND"
  /** Textual data */
  private static final int tEXt = 0x74455874;   // "tEXt"
  /** Compressed textual data */
  private static final int zTXt = 0x7A545874;   // "zTXt"
  /** International textual data */
  private static final int iTXt = 0x69545874;   // "iTXt"

  public static void showText(InputStream is) throws Exception
  {
      //Local variables for reading chunks
      int data_len = 0;
      int chunk_type = 0;
      byte[] buf=null;

      long signature = readLong(is);

      if (signature != SIGNATURE)
      {
          System.out.println("--- NOT A PNG IMAGE ---");
          return;
      }

        /** Read header */
        /** We are expecting IHDR */
      if ((readInt(is)!=13)||(readInt(is) != IHDR))
      {
          System.out.println("--- NOT A PNG IMAGE ---");
          return;
      }

      buf = new byte[13+4];//13 bytes of IHDR data plus 4 bytes CRC
      new DataInputStream(is).readFully(buf);//read() may return fewer bytes than requested

      while (true)
      {
          data_len = readInt(is);
          chunk_type = readInt(is);
          //System.out.println("chunk type: 0x"+Integer.toHexString(chunk_type));

          if (chunk_type == IEND)
          {
             System.out.println("IEND found");
             int crc = readInt(is);
             break;
          }

          switch (chunk_type)
          {
               case zTXt:
               {  
                   System.out.println("zTXt chunk:");
                   buf = new byte[data_len];
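                   new DataInputStream(is).readFully(buf);//read() may return fewer bytes than requested
                   int keyword_len = 0;
                   while(buf[keyword_len]!=0) keyword_len++;
                   //zTXt keyword and text are Latin-1 (ISO-8859-1) per the PNG specification
                   System.out.print(new String(buf,0,keyword_len,"ISO-8859-1")+": ");
                   //keyword_len+2 skips the null separator and the compression method byte
                   InflaterInputStream ii = new InflaterInputStream(new ByteArrayInputStream(buf,keyword_len+2, data_len-keyword_len-2));
                   InputStreamReader ir = new InputStreamReader(ii,"ISO-8859-1");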
                   BufferedReader br = new BufferedReader(ir);                       
                   String read = null;
                   while((read=br.readLine()) != null) {
                      System.out.println(read);
                   }
                   System.out.println("**********************");
                   is.skip(4);
                   break;
               }

               case tEXt:
               {
                   System.out.println("tEXt chunk:");
                   buf = new byte[data_len];
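                   new DataInputStream(is).readFully(buf);//read() may return fewer bytes than requested
                   int keyword_len = 0;
                   while(buf[keyword_len]!=0) keyword_len++;
                   //tEXt keyword and text are Latin-1 (ISO-8859-1) per the PNG specification
                   System.out.print(new String(buf,0,keyword_len,"ISO-8859-1")+": ");
                   System.out.println(new String(buf,keyword_len+1,data_len-keyword_len-1,"ISO-8859-1"));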
                   System.out.println("**********************");
                   is.skip(4);
                   break;
               }

               case iTXt:
               {
                  // System.setOut(new PrintStream(new File("TextChunk.txt"),"UTF-8"));
                  /**
                   * Keyword:             1-79 bytes (character string)
                   * Null separator:      1 byte
                   * Compression flag:    1 byte
                   * Compression method:  1 byte
                   * Language tag:        0 or more bytes (character string)
                   * Null separator:      1 byte
                   * Translated keyword:  0 or more bytes
                   * Null separator:      1 byte
                   * Text:                0 or more bytes
                   */
                   System.out.println("iTXt chunk:");
                   buf = new byte[data_len];
                   new DataInputStream(is).readFully(buf);//read() may return fewer bytes than requested
                   int keyword_len = 0;
                   int trans_keyword_len = 0;
                   int lang_flg_len = 0;
                   boolean compr = false;
                   while(buf[keyword_len]!=0) keyword_len++;
                   System.out.print(new String(buf,0,keyword_len,"UTF-8"));
                   if(buf[++keyword_len]==1) compr = true;
                   keyword_len++;//Skip the compression method byte.
                   while(buf[++keyword_len]!=0) lang_flg_len++;
                   //////////////////////
                   System.out.print("(");
                   if(lang_flg_len>0)
                       System.out.print(new String(buf,keyword_len-lang_flg_len, lang_flg_len, "UTF-8"));
                   while(buf[++keyword_len]!=0) trans_keyword_len++;
                   if(trans_keyword_len>0)
                       System.out.print(" "+new String(buf,keyword_len-trans_keyword_len, trans_keyword_len, "UTF-8"));
                   System.out.print("): ");
                   /////////////////////// End of key.
                   if(compr) //Compressed text
                   {
                       InflaterInputStream ii = new InflaterInputStream(new ByteArrayInputStream(buf,keyword_len+1, data_len-keyword_len-1));
                       InputStreamReader ir = new InputStreamReader(ii,"UTF-8");
                       BufferedReader br = new BufferedReader(ir);                       
                       String read = null;
                       while((read=br.readLine()) != null) {
                          System.out.println(read);
                       }
                   }
                   else //Uncompressed text
                   {
                       System.out.println(new String(buf,keyword_len+1,data_len-keyword_len-1,"UTF-8"));                           
                   }
                   System.out.println("**********************");
                   is.skip(4);
                   break;
               }    

               default:
               {
                   buf = new byte[data_len+4];//chunk data plus 4 bytes CRC, read and discarded
                   new DataInputStream(is).readFully(buf);
                   break;
               }
          }
      }
      is.close();
 }

 private static int readInt(InputStream is) throws Exception
 {
     byte[] buf = new byte[4];
     new DataInputStream(is).readFully(buf);//throws EOFException on a truncated file instead of looping forever
     return (((buf[0]&0xff)<<24)|((buf[1]&0xff)<<16)|
                            ((buf[2]&0xff)<<8)|(buf[3]&0xff));
 }

 private static long readLong(InputStream is) throws Exception
 {
     byte[] buf = new byte[8];
     new DataInputStream(is).readFully(buf);//throws EOFException if fewer than 8 bytes are available
     return (((buf[0]&0xffL)<<56)|((buf[1]&0xffL)<<48)|
                            ((buf[2]&0xffL)<<40)|((buf[3]&0xffL)<<32)|((buf[4]&0xffL)<<24)|
                              ((buf[5]&0xffL)<<16)|((buf[6]&0xffL)<<8)|(buf[7]&0xffL));
 }

 public static void main(String args[]) throws Exception
 {
    FileInputStream fs = new FileInputStream(args[0]);
    showText(fs);       
 }
}

Note: usage: java PNGExtractText image.png
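
The iTXt layout listed in the code comment above (keyword, null, compression flag, compression method, language tag, null, translated keyword, null, text) can be illustrated in isolation. This is a minimal sketch that handles only uncompressed payloads; the class name, method names, and the sample payload are hypothetical and exist only for illustration.

```java
import java.nio.charset.StandardCharsets;

public class ITXtLayoutSketch {
    /** Returns {keyword, languageTag, translatedKeyword, text} for an uncompressed iTXt payload. */
    static String[] parse(byte[] p) {
        int k = indexOfNul(p, 0);              // end of keyword
        if (p[k + 1] != 0) {                   // compression flag
            throw new IllegalArgumentException("compressed iTXt is not handled in this sketch");
        }
        int langStart = k + 3;                 // skip null, compression flag, compression method
        int l = indexOfNul(p, langStart);      // end of language tag
        int t = indexOfNul(p, l + 1);          // end of translated keyword
        return new String[] {
            new String(p, 0, k, StandardCharsets.ISO_8859_1),
            new String(p, langStart, l - langStart, StandardCharsets.US_ASCII),
            new String(p, l + 1, t - (l + 1), StandardCharsets.UTF_8),
            new String(p, t + 1, p.length - (t + 1), StandardCharsets.UTF_8)
        };
    }

    /** Finds the next null separator at or after the given index. */
    static int indexOfNul(byte[] p, int from) {
        for (int i = from; i < p.length; i++) {
            if (p[i] == 0) return i;
        }
        throw new IllegalArgumentException("missing null separator");
    }

    public static void main(String[] args) {
        // Hypothetical payload: keyword "Title", uncompressed, language "en",
        // translated keyword "Titel", text "Hello".
        byte[] payload = "Title\0\0\0en\0Titel\0Hello".getBytes(StandardCharsets.UTF_8);
        for (String field : parse(payload)) {
            System.out.println(field);
        }
    }
}
```

Unlike the listing above, this sketch reports a missing null separator with an exception instead of running past the end of the buffer.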

Here is the output for ctzn0g04.png, a test image from the official PNG test suite:

D:\tmp>java PNGExtractText ctzn0g04.png
tEXt chunk:
Title: PngSuite
**********************
tEXt chunk:
Author: Willem A.J. van Schaik
(willem@schaik.com)
**********************
zTXt chunk:
Copyright: Copyright Willem van Schaik, Singapore 1995-96
**********************
zTXt chunk:
Description: A compilation of a set of images created to test the
various color-types of the PNG format. Included are
black&white, color, paletted, with alpha channel, with
transparency formats. All bit-depths allowed according
to the spec are present.
**********************
zTXt chunk:
Software: Created on a NeXTstation color using "pnmtopng".
**********************
zTXt chunk:
Disclaimer: Freeware.
**********************
IEND found
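
Most of the entries above come from zTXt chunks, whose payload is the keyword, a null separator, one compression method byte, and then a zlib stream. As a minimal sketch of that layout, the round trip below builds such a payload in memory and decodes it again; the class and method names are hypothetical, and java.util.zip.Inflater is used here in place of the InflaterInputStream from the listing.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class ZTXtSketch {
    /** Builds a zTXt payload: keyword, null separator, method byte 0, zlib-compressed text. */
    static byte[] encode(String keyword, String text) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] kw = keyword.getBytes(StandardCharsets.ISO_8859_1);
        out.write(kw, 0, kw.length);
        out.write(0);                          // null separator
        out.write(0);                          // compression method 0 (deflate)
        Deflater deflater = new Deflater();    // default settings emit a zlib-wrapped stream
        deflater.setInput(text.getBytes(StandardCharsets.ISO_8859_1));
        deflater.finish();
        byte[] tmp = new byte[256];
        while (!deflater.finished()) {
            int n = deflater.deflate(tmp);
            out.write(tmp, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    /** Returns {keyword, text} decoded from a zTXt payload. */
    static String[] decode(byte[] p) {
        int k = 0;
        while (p[k] != 0) k++;                 // end of keyword
        Inflater inflater = new Inflater();
        inflater.setInput(p, k + 2, p.length - k - 2);  // skip null and method byte
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] tmp = new byte[256];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(tmp);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("truncated or unsupported zlib stream");
                }
                out.write(tmp, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("corrupt zlib stream", e);
        } finally {
            inflater.end();
        }
        return new String[] {
            new String(p, 0, k, StandardCharsets.ISO_8859_1),
            new String(out.toByteArray(), StandardCharsets.ISO_8859_1)
        };
    }

    public static void main(String[] args) {
        String[] r = decode(encode("Disclaimer", "Freeware."));
        System.out.println(r[0] + ": " + r[1]);  // Disclaimer: Freeware.
    }
}
```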

Edit: I just found your link and tried your image, with the following result:

D:\tmp>java PNGExtractText concept_Sjet_dream6.png
tEXt chunk:
Comment: The comment
**********************
IEND found
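
This result is consistent with the pngchunks listing in the question, which shows a single tEXt chunk with a data length of 19. A tEXt payload is the keyword, one null separator, and the text, so "Comment" (7 bytes) plus the separator plus "The comment" (11 bytes) accounts for all 19 bytes; the file seems to contain just this one text string. A minimal sketch of that check, with hypothetical class and method names:

```java
import java.nio.charset.StandardCharsets;

public class TEXtSketch {
    /** Splits a tEXt payload into {keyword, text} at the first null separator. */
    static String[] split(byte[] payload) {
        int k = 0;
        while (k < payload.length && payload[k] != 0) k++;
        if (k == payload.length) {
            throw new IllegalArgumentException("missing null separator");
        }
        return new String[] {
            new String(payload, 0, k, StandardCharsets.ISO_8859_1),
            new String(payload, k + 1, payload.length - k - 1, StandardCharsets.ISO_8859_1)
        };
    }

    public static void main(String[] args) {
        // Payload reconstructed from the output above, not read from the actual file.
        byte[] payload = "Comment\0The comment".getBytes(StandardCharsets.ISO_8859_1);
        String[] parts = split(payload);
        System.out.println(payload.length);              // 19
        System.out.println(parts[0] + ": " + parts[1]);  // Comment: The comment
    }
}
```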

The example above has since become part of a Java image library at https://github.com/dragon66/icafe

Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/10454733