
C++ cutting off characters when reading lines from a file

Stack Overflow user
Asked 2015-04-16 17:04:47
Answers: 1, Views: 1.5K, Followers: 0, Votes: 1

I know this has to do with the difference between the end-of-line indicators on Windows and Linux, but I don't know how to fix it.

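I did look at the post "Getting std::ifstream to handle LF, CR, and CRLF?", but when I used a simplified version of the code from that post (I read directly instead of using a buffered read; I know there is a performance penalty, but I wanted to keep things simple for now), it did not solve my problem, so I am hoping for some guidance here. I did test my modified version of the post's code: it successfully found the characters and replaced them with a tab that I used temporarily for the test scenario, so the logic works, but I still have the problem.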

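I know I am missing something very basic here, and I will probably feel very stupid when someone helps me solve it, so I would rather not admit my stupidity in public. But I have been working on this for a week and cannot solve it, so I am asking for help.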

I am new to C++, so if I am doing something really dumb here, please be gentle in your answers :-)

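I have the following single-file program, which I created as a prototype of what I want to do. It is a simple example, but I need to get it working properly. This is not a homework question; I really need to solve this in order to build an application.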

The program (shown below):

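  • compiles without errors or warnings and runs cleanly on a CentOS box;
  • cross-compiles without errors or warnings using mingw32 on the CentOS box, and runs cleanly on Windows;
  • produces the correct (expected) output on both Linux and Windows when using an input text file created on Linux;
  • does NOT produce the correct (expected) output when using an input text file created on Windows.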

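So yes, it has to do with the different file formats on Linux and Windows, most likely with the newline characters, but I have tried to accommodate that and it does not work.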

To complicate matters further, I found that the old Mac newline is different again:

  • Linux = \n
  • Windows = \r\n
  • Old Mac = \r

Help!

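I want to:

  1. read the contents of a txt file
  2. perform some validation checks on the contents (not done here; that is the next step)
  3. output a report to another txt file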

So I need to inspect the file, determine which newline convention it uses, and handle it accordingly.
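One way to determine which convention a file uses is to scan for the first terminator. The following is only a sketch under stated assumptions: `Eol`, `detect_eol` and `check_detect_eol` are hypothetical names that do not appear in the program below, and the stream is assumed to deliver raw bytes (for a file, that means opening it with `std::ios::binary`, so that the runtime does not translate `\r\n` before the code sees it).

```cpp
#include <cassert>
#include <istream>
#include <sstream>

// Hypothetical names for this sketch; none of them come from the original program.
enum class Eol { None, LF, CRLF, CR };

// Report the first line terminator found in the stream.
// Assumption: the stream delivers raw bytes (a file opened with std::ios::binary),
// so the runtime has not already translated "\r\n" into "\n".
Eol detect_eol( std::istream& in )
{
    char c;
    while ( in.get( c ) )
    {
        if ( c == '\n' )
        {
            return Eol::LF;
        }
        if ( c == '\r' )
        {
            // "\r\n" is the Windows pair; a lone "\r" is the old Mac style
            return ( in.peek() == '\n' ) ? Eol::CRLF : Eol::CR;
        }
    }
    return Eol::None;   // no terminator anywhere in the input
}

// Small self-check using in-memory streams instead of real files
void check_detect_eol()
{
    std::istringstream lf( "a\nb\n" );
    std::istringstream crlf( "a\r\nb\r\n" );
    std::istringstream cr( "a\rb\r" );
    assert( detect_eol( lf ) == Eol::LF );
    assert( detect_eol( crlf ) == Eol::CRLF );
    assert( detect_eol( cr ) == Eol::CR );
}
```

After calling `detect_eol` on a file stream, the stream would need `clear()` and `seekg( 0 )` before the real read starts. The sketch only looks at the first terminator, so a file with mixed line endings would be classified by whichever one comes first.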

Any suggestions?

My current (simplified) code, without the validation checks yet, is:

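#include <cstdlib>   // EXIT_SUCCESS, EXIT_FAILURE
#include <cstring>   // strcpy
#include <fstream>   // std::ifstream, std::ofstream
#include <iomanip>   // std::setw
#include <iostream>  // std::cout
#include <string>    // std::string, getline

int main(int argc, char** argv)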
{
    std::string rc_input_file_name = "rc_input_file.txt";
    std::string rc_output_file_name = "rc_output_file.txt";

    char * RC_INPUT_FILE_NAME = new char[ rc_input_file_name.length() + 1 ];
    strcpy( RC_INPUT_FILE_NAME, rc_input_file_name.c_str() );
    char * RC_OUTPUT_FILE_NAME = new char[ rc_output_file_name.length() + 1 ];
    strcpy( RC_OUTPUT_FILE_NAME, rc_output_file_name.c_str() );

    bool failure_flag = false;

    std::ifstream rc_input_file_holder;
    rc_input_file_holder.open( RC_INPUT_FILE_NAME , std::ios::in );

    if ( ! rc_input_file_holder.is_open() )
    {
       std::cout << "Error - Could not open the input file" << std::endl;
       failure_flag = true;
    }
    else
    {
       std::ofstream rc_output_file_holder;
       rc_output_file_holder.open( RC_OUTPUT_FILE_NAME , std::ios::out | std::ios::trunc );

       if ( ! rc_output_file_holder.is_open() )
       {
          std::cout << "Error - Could not open or create the output file" << std::endl;
          failure_flag = true;
       }
       else
       {
          std::streampos char_num = 0;

          long int line_num = 0;
          long int starting_char_pos = 0;

          std::string file_line = "";
          while ( getline( rc_input_file_holder , file_line ) )
          {
             line_num = line_num + 1;
             // getline() has already stripped the '\n', so look for a leftover '\r'
             long int file_line_length = file_line.length();
             long int char_num = 0;
             for ( char_num = 0 ; char_num < file_line_length ;  char_num++ )
             {
                if ( file_line[ char_num ] == '\r' )
                {
                    if ( char_num == file_line_length - 1 )
                    {
                       file_line[ char_num ] = '-';
                    }
                    else
                    {
                       if ( file_line[ char_num + 1 ] == '\n' )
                       {
                          file_line[ char_num ] = ' ';
                       }
                       else
                       {
                          file_line[ char_num ] = ' ';
                       }
                    }
                }
             }

             int field_display_width = 4;
             std::cout << "Line " << std::setw( field_display_width ) << line_num << 
                    ", starting at character position " << std::setw( field_display_width ) << starting_char_pos << 
                    ", contains " << file_line << "." << std::endl;

             starting_char_pos = rc_input_file_holder.tellg();

             rc_output_file_holder << "Line " << line_num << ": " << file_line << std::endl;
          }

          rc_input_file_holder.close();
          rc_output_file_holder.close();
          delete [] RC_INPUT_FILE_NAME;
          delete [] RC_OUTPUT_FILE_NAME;
       }
    }

    if ( failure_flag )
    {
       return EXIT_FAILURE;
    }
    else
    {
       return EXIT_SUCCESS;
    }
}


The same code with lots of comments (for the benefit of the learning experience) is:

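#include <cstdlib>   // EXIT_SUCCESS, EXIT_FAILURE
#include <cstring>   // strcpy
#include <fstream>   // std::ifstream, std::ofstream
#include <iomanip>   // std::setw
#include <iostream>  // std::cout
#include <string>    // std::string, getline

/*
 * The main function, from which all else is accessed
 */
int main(int argc, char** argv)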
{


    /*
    *Program to:
    *  1) read from a text file
    *  2) do some validation checks on the content of that text file
    *  3) output a report to another text file
    */

    // Set the filenames to be used in this file-handling program
    std::string rc_input_file_name = "rc_input_file.txt";
    std::string rc_output_file_name = "rc_output_file.txt";

    // Note that when the filenames are used in the .open statements below
    //   they have to be in a cstring format, not a string format
    //   so the conversion is done here once
    // Use the Capitalized form of the file name to indicate the converted value
    //   (remember, variable names are case-sensitive in C/C++ so NAME is different than name)
    // This conversion could be done 3 ways:
    // - done each time the cstring is needed: 
    //          file_holder_name.open( string_file_name.c_str() )
    // - done once and referred to each time
    //     simple method: 
    //          const char * converted_file_name = string_file_name.c_str()
    //     explicit method (2-step):              
    //          char * converted_file_name = new char[ string_file_name.length() + 1 ];
    //          strcpy( converted_file_name, string_file_name.c_str() );
    // This program uses the explicit method to do it once for each filename
    // because by doing so, the char array created has variable length
    // and you do not risk buffer overflow
    char * RC_INPUT_FILE_NAME = new char[ rc_input_file_name.length() + 1 ];
    strcpy( RC_INPUT_FILE_NAME, rc_input_file_name.c_str() );
    char * RC_OUTPUT_FILE_NAME = new char[ rc_output_file_name.length() + 1 ];
    strcpy( RC_OUTPUT_FILE_NAME, rc_output_file_name.c_str() );

    // This will be set to true if either the input or output file cannot be opened
    bool failure_flag = false;

    // Open the input file
    std::ifstream rc_input_file_holder;
    rc_input_file_holder.open( RC_INPUT_FILE_NAME , std::ios::in );

    // Validate that the input file was properly opened/created
    // If not, set failure flag
    if ( ! rc_input_file_holder.is_open() )
    {
       // Could not open the input file; set failure flag to true
       std::cout << "Error - Could not open the input file" << std::endl;
       failure_flag = true;
    }
    else
    {
       // Open the output file
       // Create one if none previously existed
       // Erase the contents if it already existed
       std::ofstream rc_output_file_holder;
       rc_output_file_holder.open( RC_OUTPUT_FILE_NAME , std::ios::out | std::ios::trunc );

       // Validate that the output file was properly opened/created
       // If not, set failure flag
       if ( ! rc_output_file_holder.is_open() )
       {
          // Could not open the output file; set failure flag to true
          std::cout << "Error - Could not open or create the output file" << std::endl;
          failure_flag = true;
       }
       else
       {
          // Get the current position where the character pointer is at
          // Get it before the getline is executed so it gives you where the current line starts
          std::streampos char_num = 0;

          // Initialize the line_number and starting character position to 0
          long int line_num = 0;
          long int starting_char_pos = 0;

          std::string file_line = "";
          while ( getline( rc_input_file_holder , file_line ) )
          {
             // Set the line number counter to the current line (first line is Line 1, not 0)
             line_num = line_num + 1;


             // Check if the new line designator uses the standard for:
             //   - linux (\n)
             //   - Windows (\r\n)
             //   - Old Mac (\r)
             // Convert any non-linux new line designator to linux new line designator (\n)
             long int file_line_length = file_line.length();
             long int char_num = 0;
             for ( char_num = 0 ; char_num < file_line_length ;  char_num++ )
             {
                // If a \r character is found, decide what to do with it
                if ( file_line[ char_num ] == '\r' )
                {
                    // If the \r char  is the last line character (before the null terminator)
                    //   the file uses the old Mac format to indicate new line
                    //   so replace the \r with \n
                    if ( char_num == file_line_length - 1 )
                    {
                       file_line[ char_num ] = '-';
                    }
                    else
                    // If the \r char is NOT the last line character (before the null terminator)
                    {
                       // If the next character is a \n, the file uses the Windows format to indicate new line
                       //   so replace the \r with space
                       if ( file_line[ char_num + 1 ] == '\n' )
                       {
                          file_line[ char_num ] = ' ';
                       }
                       // If the next char is NOT a \n (and the pointer is NOT at the last line character)
                       //   then for some reason, there is a \r in the interior of the string
                       // At this point, I do  not know why this would be
                       //   but I don't want it left there, so replace it with a space
                       // Yes, I  know this is the same as the above action, 
                       //   but I left it separate to allow for future flexibility
                       else
                       {
                          file_line[ char_num ] = ' ';
                       }
                    }
                }
             }


             // Output the contents of the line just fetched
             // This is done in this prototype file as a placeholder
             // In the real program, this is where the validation check(s) for the line would occur
             //   and would likely be done in a function or class
             // The setw() function requires #include <iomanip>
             int field_display_width = 4;
             std::cout << "Line " << std::setw( field_display_width ) << line_num << 
                    ", starting at character position " << std::setw( field_display_width ) << starting_char_pos << 
                    ", contains " << file_line << "." << std::endl;

             // Reset the character pointer to the end of this line => start of next line
             starting_char_pos = rc_input_file_holder.tellg();

             // Output the (edited) contents of the line just fetched
             // This is done in this prototype file as a placeholder
             // In the real program, this is where the results of the validation checks would be recorded
             // You could put this in an if statement and record nothing if the line was valid
             rc_output_file_holder << "Line " << line_num << ": " << file_line << std::endl;
          }

          // Clean up by:
          //  - closing the files that were opened (input and output)
          //  - deleting the character arrays created
          rc_input_file_holder.close();
          rc_output_file_holder.close();
          delete [] RC_INPUT_FILE_NAME;
          delete [] RC_OUTPUT_FILE_NAME;
       }
    }

    // Check to see if all operations have successfully completed
    // If so exit this program with success indicated
    // If not,exit this program with failure indicated
    if ( failure_flag )
    {
       return EXIT_FAILURE;
    }
    else
    {
       return EXIT_SUCCESS;
    }
}


I have all the correct includes, and no errors or warnings are generated when compiling for Linux or cross-compiling for Windows.

The input file I am using has just 5 lines of (silly) text:

A new beginning
just in case
the file was corrupted
and the darn program was working fine ...
at least it was on linux

The output on Linux is as expected:

Line    1, starting at character position    0, contains A new beginning.
Line    2, starting at character position   16, contains just in case.
Line    3, starting at character position   29, contains the file was corrupted.
Line    4, starting at character position   52, contains and the darn program was working fine ....
Line    5, starting at character position   94, contains at least it was on linux.

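On Windows the output is the same when I import the text file created on Linux, but when I use Notepad to manually recreate the same file on Windows, the output is as follows: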

Line    1, starting at character position    0, contains A new beginning.
Line    2, starting at character position   20, contains t in case.
Line    3, starting at character position   33, contains e file was corrupted.
Line    4, starting at character position   56, contains nd the darn program was working fine ....
Line    5, starting at character position   98, contains at least it was on linux.

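Note the differences in the starting character positions for lines 2, 3, 4 and 5. Also note the characters missing from the start of lines 2, 3 and 4:

  • Line 2 is missing 3 characters
  • Line 3 is missing 2 characters
  • Line 4 is missing 1 character
  • Line 5 is missing 0 characters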
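As a cross-check on these numbers, plain byte counting predicts where each line should start. The sketch below is an illustration only: `line_start_offsets`, `sample_lines` and `check_offsets` are hypothetical names, and it assumes the file holds exactly the five lines shown above, each followed by one terminator. With 1-byte `\n` terminators it gives 0, 16, 29, 52, 94, which matches the Linux run. With 2-byte `\r\n` terminators it gives 0, 17, 31, 55, 98. The Windows run reported 20, 33, 56, 98, which is 3, 2, 1 and 0 more than that, the same as the number of characters missing from each line.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Byte offset at which each line starts, given the line texts (without their
// terminators) and the terminator length: 1 for "\n", 2 for "\r\n".
// Hypothetical helper for this sketch; not part of the original program.
std::vector<std::size_t> line_start_offsets( const std::vector<std::string>& lines,
                                             std::size_t eol_length )
{
    std::vector<std::size_t> offsets;
    std::size_t position = 0;
    for ( const std::string& line : lines )
    {
        offsets.push_back( position );
        position += line.length() + eol_length;
    }
    return offsets;
}

// The five lines of the test input file from the question
const std::vector<std::string> sample_lines = {
    "A new beginning",
    "just in case",
    "the file was corrupted",
    "and the darn program was working fine ...",
    "at least it was on linux"
};

// Compare the computed offsets with the values worked out by hand
void check_offsets()
{
    const std::vector<std::size_t> lf   = { 0, 16, 29, 52, 94 };
    const std::vector<std::size_t> crlf = { 0, 17, 31, 55, 98 };
    assert( line_start_offsets( sample_lines, 1 ) == lf );
    assert( line_start_offsets( sample_lines, 2 ) == crlf );
}
```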

All ideas welcome...


1 Answer

Stack Overflow user

Accepted answer

Posted 2015-04-25 21:21:40

See the resolution at:

cross-compiler out of date

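To net it out: the cross-compiler installed via apt-get install was out of date. When I manually installed a newer cross-compiler and updated the settings to prevent some error messages, everything worked fine.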
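Independently of the compiler fix, a program like this can be written so that it does not rely on text-mode newline translation or on `tellg()` for line positions. The following is a minimal sketch, not the code from the resolution: it assumes the input is opened with `std::ios::binary`, and `read_line_any_eol`, its `bytes_consumed` parameter and `check_read_line_any_eol` are hypothetical names introduced here. It accepts `\n`, `\r\n` or a lone `\r` as the terminator and reports how many raw bytes each line used, so the caller can keep its own running offset.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Read one line, accepting "\n", "\r\n" or a lone "\r" as the terminator.
// bytes_consumed receives the number of raw bytes used, terminator included.
// Returns false once there is nothing left to read.
bool read_line_any_eol( std::istream& in, std::string& line, std::size_t& bytes_consumed )
{
    line.clear();
    bytes_consumed = 0;
    char c;
    while ( in.get( c ) )
    {
        ++bytes_consumed;
        if ( c == '\n' )
        {
            return true;
        }
        if ( c == '\r' )
        {
            // Swallow the '\n' of a Windows "\r\n" pair, if present
            if ( in.peek() == '\n' )
            {
                in.get( c );
                ++bytes_consumed;
            }
            return true;
        }
        line += c;
    }
    // End of input: report a final unterminated line only if it has content
    return !line.empty();
}

// Self-check on an in-memory stream that mixes all three terminators
void check_read_line_any_eol()
{
    std::istringstream in( "one\r\ntwo\rthree\nfour" );
    std::string line;
    std::size_t used = 0;
    std::size_t offset = 0;   // start of the line about to be read
    std::size_t starts[ 4 ] = { 0, 0, 0, 0 };
    int count = 0;
    while ( count < 4 && read_line_any_eol( in, line, used ) )
    {
        starts[ count++ ] = offset;
        offset += used;
    }
    assert( count == 4 );
    assert( starts[ 1 ] == 5 && starts[ 2 ] == 9 && starts[ 3 ] == 15 );
}
```

In the question's loop, `getline( rc_input_file_holder , file_line )` would be replaced by a call to this helper, and `starting_char_pos` would be advanced by `bytes_consumed` instead of being read back from the stream. Reading one character at a time is slower than a buffered read, the same trade-off the question already accepts.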

Votes: 0
Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud Xiaowei's IT-domain engine.
Original link:

https://stackoverflow.com/questions/29681515
