Minimal wc for stdin

Code Review user
Asked 2018-03-28 18:26:51
Answers: 1 · Views: 127 · Followers: 0 · Votes: 5

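This is a minimal reimplementation of wc. It currently supports only stdin and takes no command-line arguments; those are planned for a later version.

This is my first complete Rust program/package, so I am interested in any remarks, including but not limited to:

  • documentation,
  • comments,
  • general style and types,
  • tests,
  • any other remarks.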

Cargo.toml

[package]
name = "wc"
version = "0.1.0"

[dependencies]

src/lib.rs (on the playground)

use std::io::Read;

/// The statistics returned by `wordcount`.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub struct WordCountStats {
    /// number of bytes in the input
    pub bytes: usize,
    /// number of groups of consecutive non-whitespace characters
    pub words: usize,
    /// number of newline characters (`\n`)
    pub newlines: usize,
}

/// Returns the word count statistics of the given `reader`.
///
/// ```
/// use wc::{wordcount,WordCountStats};
///
/// assert_eq!(
///    wordcount("Hello, World!".as_bytes()).unwrap(),
///    WordCountStats {
///        bytes: 13,
///        words: 2,
///        newlines: 0,
///    }
/// );
/// ```
///
/// The statistics follow `wc` (`man 1 wc`) output:
///
/// * bytes is always the number of bytes (not utf8 characters or similar)
/// * words is the number of positive length consecutive non-whitespace runs
/// * newlines is the number of newlines (NOT the number of lines)
///
/// `wordcount` uses `bytes()` internally and tries not to
/// add any buffering to the `reader`. If you use an unbuffered
/// device, consider using `BufRead` around your content.
///
/// # Errors
/// If a `byte` couldn't get read you will get a `Err(std::io::Error)`.
/// This can happen if the socket disconnects suddenly, a filesystem
/// error occurred, or your scanner couldn't continue to read the stripes
/// from your cat.
pub fn wordcount<R>(reader: R) -> std::io::Result<WordCountStats>
where
    R: Read,
{
    let mut bytes = 0;
    let mut words = 0;
    let mut newlines = 0;
    let mut spacemode = true;

    for byte in reader.bytes() {
        bytes += 1;

        let c = byte?;

        if (c as char).is_whitespace() {
            spacemode = true
        } else if spacemode {
            // A non-whitespace character after a whitespace character sequence.
            words += 1;
            spacemode = false
        }

        if c as char == '\n' {
            newlines += 1
        }
    }

    Ok(WordCountStats {
        bytes,
        words,
        newlines,
    })
}

#[cfg(test)]
mod tests {
    use WordCountStats;
    fn wc_string(input: &str) -> ::WordCountStats {
        ::wordcount(input.as_bytes()).unwrap()
    }
    #[test]
    fn empty_input() {
        assert_eq!(
            wc_string(""),
            WordCountStats {
                bytes: 0,
                words: 0,
                newlines: 0,
            }
        )
    }
    #[test]
    fn single_letter_input() {
        assert_eq!(
            wc_string("a"),
            WordCountStats {
                bytes: 1,
                words: 1,
                newlines: 0,
            }
        )
    }
    #[test]
    fn single_space_input() {
        assert_eq!(
            wc_string(" "),
            WordCountStats {
                bytes: 1,
                words: 0,
                newlines: 0,
            }
        )
    }
    #[test]
    fn two_letters_separated_by_spaces() {
        assert_eq!(
            wc_string("a \t b"),
            WordCountStats {
                bytes: 5,
                words: 2,
                newlines: 0,
            }
        )
    }
    #[test]
    fn two_line_input() {
        assert_eq!(
            wc_string("\n"),
            WordCountStats {
                bytes: 1,
                words: 0,
                newlines: 1,
            }
        )
    }

    #[test]
    fn complicated_input() {
        assert_eq!(
            wc_string("Hello, World!\nHow are you today?\nI hope you're fine!"),
            WordCountStats {
                bytes: 52,
                words: 10,
                newlines: 2,
            }
        )
    }
}

src/bin/main.rs

This is just a stub main to check the library. I will extend it in a later version. You are welcome to review it, but it is not my main focus.

extern crate wc;

fn main() {
    let filename = "-";
    let stdin = std::io::stdin();
    let handle = stdin.lock();
    match wc::wordcount(handle) {
        Err(e) => eprintln!("{}: {}", filename, e),
        Ok(wc::WordCountStats {
            bytes,
            words,
            newlines,
        }) => println!("{:8} {:8} {:12} {}", newlines, words, bytes, filename),
    }
}

1 Answer

Code Review user

Accepted answer

Posted 2018-04-15 00:55:55

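For a program like this, the definition of whitespace is crucial, yet your code does not document what counts as whitespace. Reading the code shows that, because you iterate byte by byte, you probably do not handle any of the more interesting Unicode characters: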

printf %b '\u200b \u200b \u200b \u200b \u200b' | cargo run
   0        5           34 -
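One concrete consequence of casting each byte to char can be sketched as follows. This is a minimal illustration, not the author's code: count_words, words_byte_as_char, words_c_locale and is_c_space are hypothetical names, and the choice of the six bytes that C's isspace accepts in the "C" locale is an assumption about what whitespace should mean here. The input "àb" is encoded in UTF-8 as 0xC3 0xA0 0x62, and 0xA0 cast to char is U+00A0 (NO-BREAK SPACE), which char::is_whitespace accepts.

```rust
/// Counts runs of non-whitespace bytes, using `is_space` to classify each byte.
fn count_words(input: &[u8], is_space: impl Fn(u8) -> bool) -> usize {
    let mut words = 0;
    let mut in_space = true;
    for &b in input {
        if is_space(b) {
            in_space = true;
        } else if in_space {
            // First non-whitespace byte after a whitespace run starts a word.
            words += 1;
            in_space = false;
        }
    }
    words
}

/// Classifies bytes the way the reviewed code does: cast to `char` first.
fn words_byte_as_char(input: &[u8]) -> usize {
    count_words(input, |b| (b as char).is_whitespace())
}

/// Assumption: only the six bytes accepted by C's `isspace` in the "C" locale
/// count as whitespace (space, \t, \n, vertical tab, form feed, \r).
fn is_c_space(b: u8) -> bool {
    matches!(b, b' ' | b'\t' | b'\n' | 0x0b | 0x0c | b'\r')
}

fn words_c_locale(input: &[u8]) -> usize {
    count_words(input, is_c_space)
}

fn main() {
    let input = "àb".as_bytes(); // 0xC3 0xA0 0x62
    println!("{}", words_byte_as_char(input)); // prints 2: 0xA0 splits the word
    println!("{}", words_c_locale(input)); // prints 1
}
```

Whichever definition is chosen, stating it in the documentation of the word count removes the ambiguity.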

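I would probably derive Default for the struct, call it, and then mutate the fields of that struct inside the function instead of keeping three separate variables.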

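It is very odd that the for loop does not check for the error immediately: the byte counter is incremented before the read result is inspected. I would move the ? to the first expression in the loop body. I would also shadow the name byte, because the name c makes the value sound like a char when it is actually a u8. That frees up the name c for the char-typed version of the value.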
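The error path is easy to exercise with a reader that fails on purpose. The sketch below is illustrative only: FailingReader and count_bytes are hypothetical names, not part of the reviewed crate, and count_bytes keeps just the byte counter to stay short. It shows the loop shape suggested above, where the Result is unwrapped with ? before any counter is touched.

```rust
use std::io::{self, Read};

/// Hypothetical reader that yields `data` and then fails, to exercise the error path.
struct FailingReader<'a> {
    data: &'a [u8],
}

impl<'a> Read for FailingReader<'a> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        if self.data.is_empty() {
            return Err(io::Error::new(io::ErrorKind::Other, "simulated failure"));
        }
        // Copy as much as fits into the caller's buffer and advance.
        let n = self.data.len().min(buf.len());
        buf[..n].copy_from_slice(&self.data[..n]);
        self.data = &self.data[n..];
        Ok(n)
    }
}

/// Counts bytes, checking each read result before updating the counter.
fn count_bytes<R: Read>(reader: R) -> io::Result<usize> {
    let mut bytes = 0;
    for byte in reader.bytes() {
        // Shadow the `io::Result<u8>` with the successfully read `u8` first.
        let _byte = byte?;
        bytes += 1;
    }
    Ok(bytes)
}

fn main() {
    // A well-behaved reader reports every byte.
    assert_eq!(count_bytes("abc".as_bytes()).unwrap(), 3);

    // A reader that fails after two bytes surfaces the error to the caller.
    let failing = FailingReader { data: b"ab" };
    assert!(count_bytes(failing).is_err());
}
```

Because the function returns early on the first error, the ordering does not change the result here; checking first mainly keeps the loop body honest about which values are valid.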

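I would move the free function to an associated function of WordCountStats.

Use more vertical spacing in the tests. Vertical space helps the eye pick out the distinct but related parts.

Since you import WordCountStats in the tests, you do not need to use :: as a path prefix.

I am a fan of the small test utility function you added.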

use std::io::Read;

/// The statistics returned by `WordCountStats::from_reader`.
#[derive(Debug, PartialEq, Eq, Clone, Copy, Default)]
pub struct WordCountStats {
    /// number of bytes in the input
    pub bytes: usize,
    /// number of groups of consecutive non-whitespace characters
    pub words: usize,
    /// number of newline characters (`\n`)
    pub newlines: usize,
}

impl WordCountStats {
    /// Returns the word count statistics of the given `reader`.
    ///
    /// ```
    /// use wc::WordCountStats;
    ///
    /// assert_eq!(
    ///    WordCountStats::from_reader("Hello, World!".as_bytes()).unwrap(),
    ///    WordCountStats {
    ///        bytes: 13,
    ///        words: 2,
    ///        newlines: 0,
    ///    }
    /// );
    /// ```
    ///
    /// The statistics follow `wc` (`man 1 wc`) output:
    ///
    /// * bytes is always the number of bytes (not utf8 characters or similar)
    /// * words is the number of positive length consecutive non-whitespace runs
    /// * newlines is the number of newlines (NOT the number of lines)
    ///
    /// `WordCountStats::from_reader` uses `bytes()` internally and tries not to
    /// add any buffering to the `reader`. If you use an unbuffered
    /// device, consider using `BufRead` around your content.
    ///
    /// # Errors
    /// If a `byte` couldn't get read you will get a `Err(std::io::Error)`.
    /// This can happen if the socket disconnects suddenly, a filesystem
    /// error occurred, or your scanner couldn't continue to read the stripes
    /// from your cat.
    pub fn from_reader<R>(reader: R) -> std::io::Result<WordCountStats>
    where
        R: Read,
    {
        let mut stats = WordCountStats::default();
        let mut spacemode = true;

        for byte in reader.bytes() {
            let byte = byte?;
            let c = byte as char;

            stats.bytes += 1;

            if c.is_whitespace() {
                spacemode = true
            } else if spacemode {
                // A non-whitespace character after a whitespace character sequence.
                stats.words += 1;
                spacemode = false
            }

            if c == '\n' {
                stats.newlines += 1
            }
        }

        Ok(stats)
    }
}

#[cfg(test)]
mod tests {
    use WordCountStats;

    fn wc_string(input: &str) -> WordCountStats {
        WordCountStats::from_reader(input.as_bytes()).unwrap()
    }

    #[test]
    fn empty_input() {
        assert_eq!(
            wc_string(""),
            WordCountStats {
                bytes: 0,
                words: 0,
                newlines: 0,
            }
        )
    }

    #[test]
    fn single_letter_input() {
        assert_eq!(
            wc_string("a"),
            WordCountStats {
                bytes: 1,
                words: 1,
                newlines: 0,
            }
        )
    }

    #[test]
    fn single_space_input() {
        assert_eq!(
            wc_string(" "),
            WordCountStats {
                bytes: 1,
                words: 0,
                newlines: 0,
            }
        )
    }

    #[test]
    fn two_letters_separated_by_spaces() {
        assert_eq!(
            wc_string("a \t b"),
            WordCountStats {
                bytes: 5,
                words: 2,
                newlines: 0,
            }
        )
    }

    #[test]
    fn two_line_input() {
        assert_eq!(
            wc_string("\n"),
            WordCountStats {
                bytes: 1,
                words: 0,
                newlines: 1,
            }
        )
    }

    #[test]
    fn complicated_input() {
        assert_eq!(
            wc_string("Hello, World!\nHow are you today?\nI hope you're fine!"),
            WordCountStats {
                bytes: 52,
                words: 10,
                newlines: 2,
            }
        )
    }
}

Ultimately, the real test case is whether cargo run returns the same thing as wc.

So let's write that test. I like using QuickCheck:

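// Assumption: this lives inside the `tests` module shown earlier, with
// `#[macro_use] extern crate quickcheck;` at the crate root, `quickcheck`
// listed under `[dev-dependencies]`, and a `wc` binary available on PATH.
use std::{
    io::Write,
    process::{Command, Stdio},
};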

fn wc_program(bytes: &[u8]) -> WordCountStats {
    let mut child = Command::new("wc")
        .stdin(Stdio::piped())
        .stdout(Stdio::piped())
        .spawn()
        .unwrap();

    child.stdin.as_mut().unwrap().write_all(bytes).unwrap();

    let out = child.wait_with_output().unwrap();
    let out = String::from_utf8(out.stdout).unwrap();
    let mut nums = out.split_whitespace().map(|n| n.parse::<usize>().unwrap()).fuse();

    WordCountStats {
        newlines: nums.next().unwrap(),
        words: nums.next().unwrap(),
        bytes: nums.next().unwrap(),
    }
}

quickcheck! {
    fn prop(xs: Vec<u8>) -> bool {
        let me = WordCountStats::from_reader(&xs[..]).unwrap();
        let them = wc_program(&xs);
        me == them
    }
}
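The same differential-testing idea can be tried without QuickCheck or an external wc binary. The following is only a sketch under stated assumptions: Stats, streaming, oracle and next_u64 are hypothetical names, the oracle is built from standard-library string methods rather than the real wc, and the inputs are restricted to a small ASCII alphabet so that both sides agree on what whitespace means.

```rust
#[derive(Debug, PartialEq, Eq, Clone, Copy, Default)]
struct Stats {
    bytes: usize,
    words: usize,
    newlines: usize,
}

/// Streaming counter in the style of the reviewed code.
fn streaming(input: &[u8]) -> Stats {
    let mut stats = Stats::default();
    let mut spacemode = true;
    for &byte in input {
        let c = byte as char;
        stats.bytes += 1;
        if c.is_whitespace() {
            spacemode = true;
        } else if spacemode {
            stats.words += 1;
            spacemode = false;
        }
        if c == '\n' {
            stats.newlines += 1;
        }
    }
    stats
}

/// Reference oracle built from standard-library methods (ASCII input assumed).
fn oracle(input: &str) -> Stats {
    Stats {
        bytes: input.len(),
        words: input.split_whitespace().count(),
        newlines: input.matches('\n').count(),
    }
}

/// Tiny linear congruential generator so the sketch needs no external crates.
fn next_u64(state: &mut u64) -> u64 {
    *state = state
        .wrapping_mul(6364136223846793005)
        .wrapping_add(1442695040888963407);
    *state >> 33
}

fn main() {
    let alphabet = b"ab \t\n";
    let mut state: u64 = 42;
    for _ in 0..1000 {
        // Build a short pseudo-random string over the alphabet.
        let len = (next_u64(&mut state) % 16) as usize;
        let input: String = (0..len)
            .map(|_| alphabet[(next_u64(&mut state) % alphabet.len() as u64) as usize] as char)
            .collect();
        // Both implementations must agree on every generated input.
        assert_eq!(streaming(input.as_bytes()), oracle(&input), "input: {:?}", input);
    }
    println!("all inputs agree");
}
```

Unlike QuickCheck, this does not shrink failing inputs or explore arbitrary bytes, so it complements rather than replaces the comparison against the real wc.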
Votes: 1

Original content provided by Code Review.
Original link:

https://codereview.stackexchange.com/questions/190706
