Q: iconv encoding conversion problem

Stack Overflow user

Asked on 2010-01-29 22:05:25

Answers: 2 · Views: 13.8K · Followers: 0 · Votes: 0

I'm having a problem converting a string from UTF-8 to GB2312. My conversion function is as follows:

#include <errno.h>
#include <iconv.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

void convert(const char *from_charset,const char *to_charset, char *inptr, char *outptr)
{
    size_t inleft = strlen(inptr);
    size_t outleft = inleft;
    iconv_t cd;     /* conversion descriptor */

    if ((cd = iconv_open(to_charset, from_charset)) == (iconv_t)(-1)) 
    {
            fprintf(stderr, "Cannot open converter from %s to %s\n", from_charset, to_charset);
            exit(8);
    }

    /* return code of iconv() */
    int rc = iconv(cd, &inptr, &inleft, &outptr, &outleft);
    if (rc == -1) 
    {
            fprintf(stderr, "Error in converting characters\n");

            if(errno == E2BIG)
                    printf("errno == E2BIG\n");
            if(errno == EILSEQ)
                    printf("errno == EILSEQ\n");
            if(errno == EINVAL)
                    printf("errno == EINVAL\n");

            iconv_close(cd);
            exit(8);
    }
    iconv_close(cd);
}

Here is an example of how I use it:

int len = 1000;
char *result = new char[len];
convert("UTF-8", "GB2312", some_string, result);

Edit: Most of the time I get an E2BIG error.

c++

iconv

2 answers

Stack Overflow user

Accepted answer

Answered on 2010-01-30 20:47:31

outleft should be the size of the output buffer (e.g. 1000 bytes), not the size of the incoming string.
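Applied to the function in the question, the fix is to pass the output buffer's capacity in and start `outleft` from it. A minimal sketch, assuming a new `outsize` parameter and a hypothetical name `convert_sized` (neither is in the original); it also returns -1 with `errno` set instead of calling `exit()`:

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <string.h>

/* Hypothetical variant of the question's convert(): outsize is the
 * capacity of the buffer at outptr. Returns 0 on success, -1 on error
 * (errno is left as iconv set it, e.g. E2BIG or EILSEQ). */
int convert_sized(const char *from_charset, const char *to_charset,
                  char *inptr, char *outptr, size_t outsize)
{
    size_t inleft = strlen(inptr);
    size_t outleft = outsize;   /* capacity of the output buffer, not strlen(input) */
    iconv_t cd = iconv_open(to_charset, from_charset);
    if (cd == (iconv_t)-1)
        return -1;

    size_t rc = iconv(cd, &inptr, &inleft, &outptr, &outleft);
    int saved = errno;          /* iconv_close() must not clobber the error */
    iconv_close(cd);
    if (rc == (size_t)-1) {
        errno = saved;
        return -1;
    }
    return 0;
}
```

With the question's variables the call would become `convert_sized("UTF-8", "GB2312", some_string, result, len)`. Note that the output can be longer than the input: three ASCII bytes become six bytes of UTF-16LE, which is exactly the case where `outleft = strlen(input)` runs out of room.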

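The length of a string usually changes during conversion, and you can't know how long the result will be until the conversion is done. E2BIG means the output buffer is not large enough; in that case you need to give it more output buffer space (note that it has already converted some of the data and has adjusted the four variables you passed to it accordingly).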
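Because iconv() has already advanced `inbuf`/`inleft` and `outbuf`/`outleft` when it reports E2BIG, the caller can drain what was written and simply call it again. A sketch under stated assumptions: the hypothetical helper `convert_chunked` converts through a small fixed scratch buffer (8 bytes, chosen only to force E2BIG) and appends each chunk to `dest`; it assumes a stateless target encoding, so it skips the final flush call.

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <string.h>

/* Hypothetical helper: converts input through an 8-byte scratch buffer,
 * appending each filled chunk to dest. Returns the number of bytes
 * stored in dest, or (size_t)-1 on error. */
size_t convert_chunked(const char *from, const char *to,
                       const char *input, char *dest, size_t destsize)
{
    iconv_t cd = iconv_open(to, from);
    if (cd == (iconv_t)-1)
        return (size_t)-1;

    char *inbuf = (char *)input;    /* glibc's iconv() takes char ** */
    size_t inleft = strlen(input);
    size_t total = 0;

    for (;;) {
        char chunk[8];
        char *outbuf = chunk;
        size_t outleft = sizeof chunk;
        size_t rc = iconv(cd, &inbuf, &inleft, &outbuf, &outleft);
        int err = errno;
        size_t produced = sizeof chunk - outleft;

        /* keep whatever this call produced before it stopped */
        if (total + produced > destsize) {
            iconv_close(cd);
            return (size_t)-1;
        }
        memcpy(dest + total, chunk, produced);
        total += produced;

        if (rc != (size_t)-1)
            break;                              /* all input consumed */
        if (err != E2BIG || produced == 0) {    /* real error, or no progress */
            iconv_close(cd);
            return (size_t)-1;
        }
        /* E2BIG: inbuf/inleft already point past the converted input */
    }
    iconv_close(cd);
    return total;
}
```

The `produced == 0` check guards against looping forever if a single output character were ever wider than the scratch buffer.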

Votes: 5

Stack Overflow user

Answered on 2010-01-30 23:31:44

As others have pointed out, E2BIG means the output buffer is not large enough for the conversion, and you are using the wrong value for outleft.

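But I also noticed a few other potential problems with your function. Namely, the way the function is written, the caller has no way of knowing how many bytes the output string contains: your convert() function neither nul-terminates the output buffer nor tells the caller how many bytes it wrote to outptr.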
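Both missing pieces fall out of iconv()'s own bookkeeping: after the call, `outbuf` points just past the last byte written, so the byte count is `outbuf - out`, and the terminator can go right there. A minimal sketch with a hypothetical `convert_counted` that keeps 4 bytes back for the terminator (assumed enough for UTF-16 and UTF-32 output) and reports the length through `*written`:

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <string.h>

/* Hypothetical: converts input into out (capacity outsize), writes 4 zero
 * bytes after the converted text, and stores the converted length (not
 * counting the terminator) in *written. Returns 0 on success, -1 on error. */
int convert_counted(const char *from, const char *to, const char *input,
                    char *out, size_t outsize, size_t *written)
{
    if (outsize < 4)
        return -1;                  /* no room even for the terminator */

    iconv_t cd = iconv_open(to, from);
    if (cd == (iconv_t)-1)
        return -1;

    char *inbuf = (char *)input;
    size_t inleft = strlen(input);
    char *outbuf = out;
    size_t outleft = outsize - 4;   /* reserve 4 bytes for the terminator */

    size_t rc = iconv(cd, &inbuf, &inleft, &outbuf, &outleft);
    if (rc != (size_t)-1)
        rc = iconv(cd, NULL, NULL, &outbuf, &outleft);  /* flush shift state */
    int err = errno;
    iconv_close(cd);
    if (rc == (size_t)-1) {
        errno = err;
        return -1;
    }

    memset(outbuf, 0, 4);           /* outbuf points just past the output */
    *written = (size_t)(outbuf - out);
    return 0;
}
```

For UTF-16LE output, `strlen()` on the result stops at the first zero byte (after `'a'`), which is why an explicit byte count is more reliable than relying on a terminator alone.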

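If you want to work with nul-terminated strings (which seems to be what you're after, since your input string is nul-terminated), you may find the following approach much better: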

#include <errno.h>
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

char *
convert (const char *from_charset, const char *to_charset, const char *input)
{
 size_t inleft, outleft, converted = 0;
 char *output, *outbuf, *tmp;
 const char *inbuf;
 size_t outlen, rc;
 iconv_t cd;

 if ((cd = iconv_open (to_charset, from_charset)) == (iconv_t) -1)
  return NULL;

 inleft = strlen (input);
 inbuf = input;

 /* we'll start off allocating an output buffer which is the same size
  * as our input buffer. */
 outlen = inleft;

 /* allocate 4 extra bytes so there is always room for nul-termination
  * (the cast lets this compile as C++ as well) */
 if (!(output = (char *) malloc (outlen + 4))) {
  iconv_close (cd);
  return NULL;
 }

 do {
  errno = 0;
  /* resume writing right after the bytes converted so far */
  outbuf = output + converted;
  outleft = outlen - converted;

  /* keep iconv()'s return value separate from the byte count */
  rc = iconv (cd, (char **) &inbuf, &inleft, &outbuf, &outleft);
  if (rc != (size_t) -1 || errno == EINVAL) {
   /*
    * EINVAL  An incomplete multibyte sequence has been encountered
    *         in the input.
    *
    * We'll just truncate it and ignore it.
    */
   break;
  }

  if (errno != E2BIG) {
   /*
    * EILSEQ An invalid multibyte sequence has been encountered
    *        in the input.
    *
    * Bad input, we can't really recover from this.
    */
   iconv_close (cd);
   free (output);
   return NULL;
  }

  /*
   * E2BIG   There is not sufficient room at *outbuf.
   *
   * We just need to grow our outbuffer and try again.
   */

  /* number of output bytes written so far */
  converted = outbuf - output;
  outlen += inleft * 2 + 8;

  if (!(tmp = (char *) realloc (output, outlen + 4))) {
   iconv_close (cd);
   free (output);
   return NULL;
  }

  output = tmp;
  outbuf = output + converted;
 } while (1);

 /* flush the iconv conversion */
 iconv (cd, NULL, NULL, &outbuf, &outleft);
 iconv_close (cd);

 /* Note: not all charsets can be nul-terminated with a single
  * nul byte. UCS2, for example, needs 2 nul bytes and UCS4
  * needs 4. I hope that 4 nul bytes is enough to terminate all
  * multibyte charsets? */

 /* nul-terminate the string */
 memset (outbuf, 0, 4);

 return output;
}

Votes: 2

Page content provided by Stack Overflow; translation support by Tencent Cloud Xiaowei's IT-domain engine.

Original link:

https://stackoverflow.com/questions/2162390