Question: fgets() alternative

Code Review user

Asked 2017-10-30 21:26:11

1 answer · 3.2K views · 0 followers · 13 votes

size_t readline_tostring(char * restrict dest, size_t size, FILE * restrict stream)

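fgets() is fine, but it has some shortcomings that the readline_tostring() below addresses when reading a line:

When the buffer is too small, the rest of the line is consumed (and lost), and an error is indicated.
In C, a line of input is terminated by '\n' (C11 7.21.2 2), and how a stream that ends with something other than a new-line is handled is implementation-defined behavior (J.3.12). This code treats '\n' and end-of-file the same. In both cases, the saved buffer does not contain a '\n'.
If code reads a '\0', that is not practical to detect with fgets(). This code returns the size of the space used in dest, which includes the appended null character.
Lesser issues include fgets() handling of NULL arguments, small buffer sizes, the undefined buffer state on ferror(), and the use of int vs. size_t. The code below also handles these clearly, I hope.
Alternative of allocating memory: allocating memory at the whim of external input can lead to abuse, since it lets external forces overwhelm memory allocation. The code below does not allocate memory as getline or a getline replacement does; the caller's buffer size serves as the limit on characters saved. An alternative could use a bounded allocation, but that is not done here.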
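To make two of the fgets() shortcomings listed above concrete, here is a minimal sketch. The helper names (stream_from_bytes, fgets_chunks, fgets_visible_length) are hypothetical and exist only for illustration; the sketch assumes tmpfile() is usable on the host system.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write `len` bytes to a temporary file and rewind it.
 * Returns NULL if the file cannot be created or written. */
static FILE *stream_from_bytes(const char *data, size_t len) {
    FILE *f = tmpfile();
    if (f == NULL) {
        return NULL;
    }
    if (fwrite(data, 1, len, f) != len) {
        fclose(f);
        return NULL;
    }
    rewind(f);
    return f;
}

/* Count how many fgets() calls with a 4-byte buffer are needed to consume
 * the data. A long line is silently split across several calls.
 * Returns -1 if setup fails. */
int fgets_chunks(const char *data, size_t len) {
    FILE *f = stream_from_bytes(data, len);
    if (f == NULL) {
        return -1;
    }
    char buf[4];
    int calls = 0;
    while (fgets(buf, sizeof buf, f) != NULL) {
        calls++;
    }
    fclose(f);
    return calls;
}

/* Return strlen() of the first fgets() result. With an embedded null
 * character this under-reports how much was actually read.
 * Returns -1 if setup or the read fails. */
long fgets_visible_length(const char *data, size_t len) {
    FILE *f = stream_from_bytes(data, len);
    if (f == NULL) {
        return -1;
    }
    char buf[16] = "";
    long n = -1;
    if (fgets(buf, sizeof buf, f) != NULL) {
        n = (long) strlen(buf);
    }
    fclose(f);
    return n;
}
```

With a 4-byte buffer, the single line "1234567\n" takes three fgets() calls to consume, and the caller has to look for '\n' to notice that. Reading "A\0B\n" stores four characters, yet strlen() on the result reports 1.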

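Primary review request (of the non-test code)

Portability concerns: would this fail on some particular system, in common or rare situations?

Handling of exceptional/error cases: are there suggested alternatives?

Performance concerns are appreciated when backed by actual measured results.

General comments (about any of the code).

For ease of code review, the code below is listed as one file, but it would normally be split into separate .h and .c files.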

/////////////////////////////////////////////////////////////////
// Header info, usually in some *.h file

/*
 * Read a _line_ of text. Save characters up to a limit, and form a _string_.
 * The string saved in `dest` never contains a '\n'.
 * A null character is always appended ***1.
 * Reading only attempted in non-pathological cases.  
 * Otherwise the end-of-file flag and error flags are cleared before reading.
 *
 * Normal: The return value is greater than  0 and represents the _size_ of `dest` used.
 *     This includes all non-'\n' characters read and an appended null character. ***2
 *     Reading text "abc\n" forms string "abc" and return 4
 *
 * Exceptional cases:
 *   In these cases, the return value is 0 and `dest[0] = '\0'` except as noted.
 *   1: Pathological: Buffer invalid for string.
 *     `dest == NULL` or `size == 0` (No data is written into `dest`)
 *   2: Pathological: Stream invalid.
 *     `stream == NULL`
 *   3: End-of-file occurs and no data read.
 *     Typical end-of-file: `feof(stream)` will return true.
 *   4: Input error.
 *     `ferror(stream)` will return true.
 *     strlen(dest) is number of characters successfully read. ***3
 *   5: Buffer is too small.
 *     First `size-1` non-'\n' characters are saved in `dest[]`.
 *     Additional characters are read up to and including '\n'.  These are not saved.
 *     The end-of-file flag and error flags are cleared again.
 *     strlen(dest) is number of characters successfully saved. ***3
 *
 * ***1 Except when `dest == NULL` or `size == 0`
 * ***2 If code reads a null character, it is treated like any non-'\n' character.
 * ***3 strlen(dest) does not reflect the number of characters in `dest` 
 *       if a null character was read and saved.
 *
 */

#include <stdio.h>
#include <stdlib.h>
size_t readline_tostring(char * restrict dest, size_t size,
    FILE * restrict stream);

/////////////////////////////////////////////////////////////////
// Code, usually in some *.c file

size_t readline_tostring(char * restrict dest, size_t size,
    FILE * restrict stream) {
  // Handle pathological cases
  if (dest == NULL || size == 0) {
    return 0;
  }
  if (stream == NULL) {
    dest[0] = '\0';
    return 0;
  }
  clearerr(stream);

  size_t i = 0;
  int ch;
  while ((ch = fgetc(stream)) != '\n' && ch != EOF) {
    if (i < size) {
      dest[i++] = (char) ch;
    }
  }

  // Add null character termination - always
  // If too many were read
  if (i >= size) {
    dest[size - 1] = '\0';
    clearerr(stream);
    return 0;
  }
  dest[i] = '\0';

  if ((ch == EOF) && (i == 0 || ferror(stream))) { // end-of-file or error
    return 0;
  }

  clearerr(stream);
  return i + 1;
}

/////////////////////////////////////////////////////////////////
// Test code

#include <string.h>
#include <ctype.h>

// Sample usage showing how to discern the results.
void sample(char * restrict dest, size_t size, FILE * restrict stream) {
  size_t sz;
  while ((sz = readline_tostring(dest, size, stream)) > 0) {
    printf("Size:%zu string:\"%s\"\n", sz, dest);
  }

  // Well formed code need not perform this 1st test
  if (dest == NULL || size == 0 || stream == NULL) {
    puts("Pathological case");
  } else if (feof(stream)) {
    puts("End of file");
  } else if (ferror(stream)) {
    puts("Input error");
  } else {
    printf("Line too long: begins with <%s>\n", dest);
  }
  puts("");
}

void test4(const char *s) {
  FILE *stream = fopen("tmp.bin", "wb");
  size_t len = strlen(s);
  fwrite(s, 1, len, stream);
  fclose(stream);
  for (size_t i = 0; i < len; i++) {
    printf(isprint((unsigned char)s[i]) ? "%c" : "<%d>", s[i]);
  }
  puts("");

  stream = fopen("tmp.bin", "r");
  char buf[4];
  sample(buf, sizeof buf, stream);
  fclose(stream);
  fflush(stdout);
}

int main(void) {
  test4("12\nAB\n");
  test4("123\nABC\n");
  test4("1234\nABCD\n");
  test4("");
  test4("1");
  test4("12");
  test4("123");
  test4("1234");
  return 0;
}

Output

12<10>AB<10>
Size:3 string:"12"
Size:3 string:"AB"
End of file

123<10>ABC<10>
Size:4 string:"123"
Size:4 string:"ABC"
End of file

1234<10>ABCD<10>
Line too long: begins with <123>


End of file

1
Size:2 string:"1"
End of file

12
Size:3 string:"12"
End of file

123
Size:4 string:"123"
End of file

1234
Line too long: begins with <123>

Tags: strings

1 answer

Code Review user

Accepted answer

Answered 2018-02-27 11:29:56

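First impressions

The code is clear, well presented, and easy to follow. You had two options for ignoring over-long lines (the other being a second loop to absorb the overflow). The choice made here seems reasonable.

When defining the function, I'd consider making the arguments const, just to help avoid accidents and to be really clear: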

size_t readline_tostring(char *const restrict dest, size_t const size,
                         FILE *const restrict stream)
{

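Clearing stream errors

I'm not convinced that calling clearerr() is right in this code, as it potentially hides useful information from the caller. There is no comment explaining it, and I think there should be one.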
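As a minimal sketch of the concern, the hypothetical helper below (eof_visible_after_drain is not part of the reviewed code) reads a small temporary file to its end and reports whether feof() is still true afterwards, with and without a clearerr() call. It assumes tmpfile() is usable on the host system.

```c
#include <stdio.h>

/* Hypothetical demo helper, not part of the reviewed code.
 * Drains a small temporary file, optionally calls clearerr(), and reports
 * whether feof() is still true afterwards. Returns -1 if setup fails. */
int eof_visible_after_drain(int call_clearerr) {
    FILE *f = tmpfile();
    if (f == NULL) {
        return -1;
    }
    if (fputs("abc", f) == EOF) {
        fclose(f);
        return -1;
    }
    rewind(f);

    while (fgetc(f) != EOF) {
        /* discard everything up to end-of-file */
    }
    if (call_clearerr) {
        clearerr(f);  /* resets both the end-of-file and error indicators */
    }

    int visible = feof(f) != 0;
    fclose(f);
    return visible;
}
```

Once clearerr() has run, a caller that checks feof() or ferror() after the read can no longer tell that the stream hit end-of-file or an error.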

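Returning partial lines

It might be better to prevent the use of partial lines returned when EOF is reached, by setting dest[0] = '\0' in that case. Arguably, client code might want to discover that there is a possibly usable result when the return value is 0 but dest[0] is non-null (for example, if the discarded part is in an end-of-line comment).

Certainly, when we get an error, it would be better to make dest an empty string.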
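One possible shape for these suggestions is sketched below. The name readline_tostring_strict is hypothetical, and this is only one reading of the advice: it never calls clearerr(), and it leaves dest as an empty string whenever it returns 0.

```c
#include <stdio.h>

/* Hypothetical variant of readline_tostring(): a sketch, not the author's code.
 * - never calls clearerr(), so feof()/ferror() stay visible to the caller
 * - on any failure (line too long, input error, end-of-file with no data)
 *   dest is left as an empty string and 0 is returned */
size_t readline_tostring_strict(char * restrict dest, size_t size,
    FILE * restrict stream) {
  if (dest == NULL || size == 0) {
    return 0;
  }
  dest[0] = '\0';
  if (stream == NULL) {
    return 0;
  }

  size_t i = 0;
  int too_long = 0;
  int ch;
  while ((ch = fgetc(stream)) != '\n' && ch != EOF) {
    if (i + 1 < size) {
      dest[i++] = (char) ch;   // room remains for the null character
    } else {
      too_long = 1;            // keep consuming, but discard the rest
    }
  }

  if (too_long || (ch == EOF && (i == 0 || ferror(stream)))) {
    dest[0] = '\0';            // no partial data is exposed
    return 0;
  }
  dest[i] = '\0';
  return i + 1;
}
```

The trade-off is that a return of 0 with neither feof() nor ferror() set now means "line too long", but an over-long final line without a trailing '\n' looks the same as plain end-of-file. A caller that needs to tell those apart would need a separate status value.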

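Tests

The test code looks a bit hurried. How do we know that it is safe to overwrite tmp.bin in the current working directory? We ought to use tmpfile() or tmpnam(), and actually test that we succeeded in writing the temporary data. Then unlink the file (unless using tmpfile(), which creates an already-unlinked file).

Writing the file in binary mode and reading it in text mode seems a poor choice, and I'm not even convinced it is legal on record-oriented filesystems.

None of the tests is self-checking, and none includes embedded NUL characters or consecutive newlines.

Here are some self-checking tests. I'm not entirely happy with them, but hopefully they can be a start: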

#include <stdarg.h>
#include <string.h>
#include <ctype.h>

// Sample usage showing how to discern the results.
size_t sample(char *const restrict dest, size_t const size,
              FILE *const restrict stream)
{
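    size_t sz = readline_tostring(dest, size, stream);
    printf("Size:%zu string:\"%s\"\n", sz, dest);

    // Diagnose only failed reads: a successful read clears the stream flags,
    // so falling through would report every good line as too long.
    if (sz > 0) {
        return sz;
    }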

    // Well formed code need not perform this 1st test
    if (dest == NULL || size == 0 || stream == NULL) {
        puts("Pathological case");
    } else if (feof(stream)) {
        puts("End of file");
    } else if (ferror(stream)) {
        puts("Input error");
    } else {
        printf("Line too long: begins with <%s>\n", dest);
    }

    return sz;
}

/* return a count of errors (0 for success) */
/* Varargs are (size, string) pairs of expected result */
/* Terminate with -1 */
int test4(const char *s, size_t len, ...)
{
    for (size_t i = 0; i < len; i++) {
        printf(isprint((unsigned char)s[i]) ? "%c" : "<%d>", s[i]);
    }
    puts("");

    FILE *const stream = tmpfile();
    if (!stream || fwrite(s, 1, len, stream) != len) {
        perror("");
        return 1;
    }
    rewind(stream);


    va_list args;
    va_start(args, len);

    int errors = 0;
    int expected;
    while ((expected = va_arg(args, int)) >= 0) {
        // Named distinctly so it does not shadow the parameter `s`
        const char *expected_str = va_arg(args, const char*);
        char buf[4];
        size_t actual = sample(buf, sizeof buf, stream);
        if (actual != (size_t)expected) {
            // `actual` is a size_t, so it needs %zu rather than %zd
            printf("FAIL (%d): Expected %d, got %zu\n",
                   ++errors, expected, actual);
        } else if (memcmp(buf, expected_str, actual) != 0) {
            printf("FAIL (%d): Expected %s, got %s\n",
                   ++errors, expected_str, buf);
        }
    }
    va_end(args);

    if (errors) {
        puts("FAILED");
    }
    puts("");

    fclose(stream);
    fflush(stdout);

    return errors;
}

int main(void) {
    /* two lines each of 4 chars and a newline */
    static const char *test_string = "1234\n"  "A\0BC\n";

    return
        + test4(test_string, 5, 0, "123", -1)
        + test4(test_string+1, 5, 4, "234", -1)
        + test4(test_string+3, 5, 2, "4", 4, "A\0B", -1)
        + test4(test_string+4, 5, 1, "",  0, "A\0BC", -1);
}

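There are no tests for the handling of invalid arguments (null or zero-length buffer, null stream), nor for pathological cases (a one-byte buffer). These should be added.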
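A minimal sketch of such tests follows. To keep the block self-contained it repeats the readline_tostring() under review; the checker edge_case_failures() is a hypothetical name. The expected values come from the function's documented contract, and the sketch assumes tmpfile() is usable on the host system.

```c
#include <stdio.h>

/* The function under review, repeated so this block compiles on its own. */
size_t readline_tostring(char * restrict dest, size_t size,
    FILE * restrict stream) {
  if (dest == NULL || size == 0) {
    return 0;
  }
  if (stream == NULL) {
    dest[0] = '\0';
    return 0;
  }
  clearerr(stream);
  size_t i = 0;
  int ch;
  while ((ch = fgetc(stream)) != '\n' && ch != EOF) {
    if (i < size) {
      dest[i++] = (char) ch;
    }
  }
  if (i >= size) {
    dest[size - 1] = '\0';
    clearerr(stream);
    return 0;
  }
  dest[i] = '\0';
  if ((ch == EOF) && (i == 0 || ferror(stream))) {
    return 0;
  }
  clearerr(stream);
  return i + 1;
}

/* Hypothetical self-checking test: returns the number of failed checks,
 * or -1 if the temporary file could not be set up. */
int edge_case_failures(void) {
  FILE *f = tmpfile();
  if (f == NULL) {
    return -1;
  }
  if (fputs("\na\n", f) == EOF) {   // an empty line, then the line "a"
    fclose(f);
    return -1;
  }
  rewind(f);

  int failures = 0;
  char buf[4] = "xyz";

  // Invalid arguments: always 0, and nothing is read from the stream.
  failures += readline_tostring(NULL, sizeof buf, f) != 0;
  failures += readline_tostring(buf, 0, f) != 0 || buf[0] != 'x';
  failures += readline_tostring(buf, sizeof buf, NULL) != 0 || buf[0] != '\0';

  // One-byte buffer: only an empty line fits.
  char one[1] = { 'x' };
  failures += readline_tostring(one, 1, f) != 1 || one[0] != '\0';
  one[0] = 'x';
  failures += readline_tostring(one, 1, f) != 0 || one[0] != '\0'
      || feof(f) || ferror(f);       // "a" is too long; flags stay clear
  failures += readline_tostring(one, 1, f) != 0 || !feof(f);  // end-of-file

  fclose(f);
  return failures;
}
```

A test driver could add edge_case_failures() to the error total returned from main().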

Votes: 3

Original page content provided by Code Review. Translation support provided by Tencent Cloud's Xiaowei IT-domain engine.

Original link:

https://codereview.stackexchange.com/questions/179201
