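This is a minimal reimplementation of wc. It currently supports only stdin and takes no command-line arguments; those are planned for a later version.
This is my first complete Rust program/package, so I am interested in any remarks, including but not limited to: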
[package]
name = "wc"
version = "0.1.0"
[dependencies]

use std::io::Read;
/// The statistics returned by `wordcount`.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub struct WordCountStats {
/// number of bytes in the input
pub bytes: usize,
/// number of groups of consecutive non-whitespace characters
pub words: usize,
/// number of newline characters (`\n`)
pub newlines: usize,
}
/// Returns the word count statistics of the given `reader`.
///
/// ```
/// use wc::{wordcount,WordCountStats};
///
/// assert_eq!(
/// wordcount("Hello, World!".as_bytes()).unwrap(),
/// WordCountStats {
/// bytes: 13,
/// words: 2,
/// newlines: 0,
/// }
/// );
/// ```
///
/// The statistics follow `wc` (`man 1 wc`) output:
///
/// * bytes is always the number of bytes (not utf8 characters or similar)
/// * words is the number of positive length consecutive non-whitespace runs
/// * newlines is the number of newlines (NOT the number of lines)
///
/// `wordcount` uses `bytes()` internally and tries not to
/// add any buffering to the `reader`. If you use an unbuffered
/// device, consider using `BufRead` around your content.
///
/// # Errors
/// If a `byte` couldn't get read you will get a `Err(std::io::Error)`.
/// This can happen if the socket disconnects suddenly, a filesystem
/// error occurred, or your scanner couldn't continue to read the stripes
/// from your cat.
pub fn wordcount<R>(reader: R) -> std::io::Result<WordCountStats>
where
R: Read,
{
let mut bytes = 0;
let mut words = 0;
let mut newlines = 0;
let mut spacemode = true;
for byte in reader.bytes() {
bytes += 1;
let c = byte?;
if (c as char).is_whitespace() {
spacemode = true
} else if spacemode {
// A non-whitespace character after a whitespace character sequence.
words += 1;
spacemode = false
}
if c as char == '\n' {
newlines += 1
}
}
Ok(WordCountStats {
bytes,
words,
newlines,
})
}
#[cfg(test)]
mod tests {
use WordCountStats;
fn wc_string(input: &str) -> ::WordCountStats {
::wordcount(input.as_bytes()).unwrap()
}
#[test]
fn empty_input() {
assert_eq!(
wc_string(""),
WordCountStats {
bytes: 0,
words: 0,
newlines: 0,
}
)
}
#[test]
fn single_letter_input() {
assert_eq!(
wc_string("a"),
WordCountStats {
bytes: 1,
words: 1,
newlines: 0,
}
)
}
#[test]
fn single_space_input() {
assert_eq!(
wc_string(" "),
WordCountStats {
bytes: 1,
words: 0,
newlines: 0,
}
)
}
#[test]
fn two_letters_separated_by_spaces() {
assert_eq!(
wc_string("a \t b"),
WordCountStats {
bytes: 5,
words: 2,
newlines: 0,
}
)
}
#[test]
fn two_line_input() {
assert_eq!(
wc_string("\n"),
WordCountStats {
bytes: 1,
words: 0,
newlines: 1,
}
)
}
#[test]
fn complicated_input() {
assert_eq!(
wc_string("Hello, World!\nHow are you today?\nI hope you're fine!"),
WordCountStats {
bytes: 52,
words: 10,
newlines: 2,
}
)
}
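}

This is just a stub main for checking the library. I will extend it in later versions. You are welcome to review it, but I have not really focused on it.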
extern crate wc;
fn main() {
let filename = "-";
let stdin = std::io::stdin();
let handle = stdin.lock();
match wc::wordcount(handle) {
Err(e) => eprintln!("{}: {}", filename, e),
Ok(wc::WordCountStats {
bytes,
words,
newlines,
}) => println!("{:8} {:8} {:12} {}", newlines, words, bytes, filename),
}
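}

Posted on 2018-04-15 00:55:55

For a program like this, the definition of whitespace is crucial. Your code does not document what whitespace means. Inspecting the code shows that, because you iterate byte by byte, you probably do not handle any of the more interesting Unicode characters: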
printf %b '\u200b \u200b \u200b \u200b \u200b' | cargo run
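0 5 34 -

I would probably derive Default for the struct, call it, and then mutate the fields of that struct inside the function, instead of keeping 3 separate variables.
It is very strange not to perform the error check immediately inside the for loop. I would move the ? to the first expression. I would also shadow the name byte, because the name c makes it sound like a char, which it is not. That frees up the name c for the version that really has type char.
I would move the free function to an associated function of WordCountStats.
Use more vertical spacing in the tests. Vertical space helps the eye tell the different related parts apart.
Since you import WordCountStats in the tests, you do not need :: as a path prefix.
I am a fan of the small test utility function you added.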
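To make the whitespace point concrete, here is a minimal sketch (not the code under review) of what casting each byte to char implies. The helper names count_words, words_byte_as_char and words_ascii_only are made up for illustration. Because `byte as char` reads the byte as a Latin-1 code point, the UTF-8 continuation byte 0xA0 looks like a no-break space and splits a word, while a real Unicode space such as U+2003, which is encoded as three bytes, is not recognised as a separator at all.

```rust
/// Counts runs of non-separator bytes; `is_space` decides what a separator is.
/// (Hypothetical helper, written only for this illustration.)
fn count_words(input: &[u8], is_space: impl Fn(u8) -> bool) -> usize {
    let mut words = 0;
    let mut in_space = true;
    for &byte in input {
        if is_space(byte) {
            in_space = true;
        } else if in_space {
            // First non-separator byte after a run of separators.
            words += 1;
            in_space = false;
        }
    }
    words
}

/// Same rule as the reviewed code: the byte is cast to `char`, i.e. it is
/// interpreted as the code point U+0000..=U+00FF with the same number.
fn words_byte_as_char(input: &[u8]) -> usize {
    count_words(input, |b| (b as char).is_whitespace())
}

/// Assumed alternative: only ASCII whitespace separates words.
/// Note that `u8::is_ascii_whitespace` does not include vertical tab (0x0B).
fn words_ascii_only(input: &[u8]) -> usize {
    count_words(input, |b| b.is_ascii_whitespace())
}
```

With these definitions, `words_byte_as_char("\u{e0}b".as_bytes())` is 2 (the bytes are 0xC3 0xA0 0x62, and 0xA0 is taken for U+00A0), whereas `words_ascii_only` gives 1 for the same input. For `"a\u{2003}b"` both byte-wise variants give 1, while `str::split_whitespace` sees 2 words. Whichever definition you pick, it is worth stating it in the documentation.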
use std::io::Read;
/// The statistics returned by `WordCountStats::from_reader`.
#[derive(Debug, PartialEq, Eq, Clone, Copy, Default)]
pub struct WordCountStats {
/// number of bytes in the input
pub bytes: usize,
/// number of groups of consecutive non-whitespace characters
pub words: usize,
/// number of newline characters (`\n`)
pub newlines: usize,
}
impl WordCountStats {
/// Returns the word count statistics of the given `reader`.
///
/// ```
/// use wc::WordCountStats;
///
/// assert_eq!(
/// WordCountStats::from_reader("Hello, World!".as_bytes()).unwrap(),
/// WordCountStats {
/// bytes: 13,
/// words: 2,
/// newlines: 0,
/// }
/// );
/// ```
///
/// The statistics follow `wc` (`man 1 wc`) output:
///
/// * bytes is always the number of bytes (not utf8 characters or similar)
/// * words is the number of positive length consecutive non-whitespace runs
/// * newlines is the number of newlines (NOT the number of lines)
///
/// `WordCountStats::from_reader` uses `bytes()` internally and tries not to
/// add any buffering to the `reader`. If you use an unbuffered
/// device, consider using `BufRead` around your content.
///
/// # Errors
/// If a `byte` couldn't get read you will get a `Err(std::io::Error)`.
/// This can happen if the socket disconnects suddenly, a filesystem
/// error occurred, or your scanner couldn't continue to read the stripes
/// from your cat.
pub fn from_reader<R>(reader: R) -> std::io::Result<WordCountStats>
where
R: Read,
{
let mut stats = WordCountStats::default();
let mut spacemode = true;
for byte in reader.bytes() {
let byte = byte?;
let c = byte as char;
stats.bytes += 1;
if c.is_whitespace() {
spacemode = true
} else if spacemode {
// A non-whitespace character after a whitespace character sequence.
stats.words += 1;
spacemode = false
}
if c == '\n' {
stats.newlines += 1
}
}
Ok(stats)
}
}
#[cfg(test)]
mod tests {
use WordCountStats;
fn wc_string(input: &str) -> WordCountStats {
WordCountStats::from_reader(input.as_bytes()).unwrap()
}
#[test]
fn empty_input() {
assert_eq!(
wc_string(""),
WordCountStats {
bytes: 0,
words: 0,
newlines: 0,
}
)
}
#[test]
fn single_letter_input() {
assert_eq!(
wc_string("a"),
WordCountStats {
bytes: 1,
words: 1,
newlines: 0,
}
)
}
#[test]
fn single_space_input() {
assert_eq!(
wc_string(" "),
WordCountStats {
bytes: 1,
words: 0,
newlines: 0,
}
)
}
#[test]
fn two_letters_separated_by_spaces() {
assert_eq!(
wc_string("a \t b"),
WordCountStats {
bytes: 5,
words: 2,
newlines: 0,
}
)
}
#[test]
fn two_line_input() {
assert_eq!(
wc_string("\n"),
WordCountStats {
bytes: 1,
words: 0,
newlines: 1,
}
)
}
#[test]
fn complicated_input() {
assert_eq!(
wc_string("Hello, World!\nHow are you today?\nI hope you're fine!"),
WordCountStats {
bytes: 52,
words: 10,
newlines: 2,
}
)
}
}

So the real test case is whether cargo run returns the same thing as wc.

Then make that a test. I like using quickcheck:
use std::{
io::Write,
process::{Command, Stdio},
};
fn wc_program(bytes: &[u8]) -> WordCountStats {
let mut child = Command::new("wc")
.stdin(Stdio::piped())
.stdout(Stdio::piped())
.spawn()
.unwrap();
child.stdin.as_mut().unwrap().write_all(bytes).unwrap();
let out = child.wait_with_output().unwrap();
let out = String::from_utf8(out.stdout).unwrap();
let mut nums = out.split_whitespace().map(|n| n.parse::<usize>().unwrap()).fuse();
WordCountStats {
newlines: nums.next().unwrap(),
words: nums.next().unwrap(),
bytes: nums.next().unwrap(),
}
}
quickcheck! {
fn prop(xs: Vec<u8>) -> bool {
let me = WordCountStats::from_reader(&xs[..]).unwrap();
let them = wc_program(&xs);
me == them
}
}

https://codereview.stackexchange.com/questions/190706
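If spawning the system wc or pulling in the quickcheck crate is not an option, the same differential idea can be sketched with the standard library alone. This is only an illustration under stated assumptions, not the test from the answer: Stats, count_single_pass, count_reference, next_random and implementations_agree are hypothetical names, whitespace is assumed to mean ASCII whitespace, and a small xorshift generator stands in for quickcheck's input generation (without any shrinking of failing cases).

```rust
#[derive(Debug, Default, PartialEq, Eq, Clone, Copy)]
struct Stats {
    bytes: usize,
    words: usize,
    newlines: usize,
}

/// Single-pass state machine in the style of the reviewed code,
/// here assuming ASCII whitespace as the separator definition.
fn count_single_pass(input: &[u8]) -> Stats {
    let mut stats = Stats::default();
    let mut in_space = true;
    for &byte in input {
        stats.bytes += 1;
        if byte.is_ascii_whitespace() {
            in_space = true;
        } else if in_space {
            stats.words += 1;
            in_space = false;
        }
        if byte == b'\n' {
            stats.newlines += 1;
        }
    }
    stats
}

/// Deliberately naive reference: split on separators, count non-empty pieces.
fn count_reference(input: &[u8]) -> Stats {
    Stats {
        bytes: input.len(),
        words: input
            .split(|b| b.is_ascii_whitespace())
            .filter(|piece| !piece.is_empty())
            .count(),
        newlines: input.iter().filter(|&&b| b == b'\n').count(),
    }
}

/// Tiny xorshift64 generator; good enough for test data, not for anything else.
fn next_random(state: &mut u64) -> u64 {
    *state ^= *state << 13;
    *state ^= *state >> 7;
    *state ^= *state << 17;
    *state
}

/// Feeds `cases` pseudo-random byte strings to both implementations
/// and reports whether they agreed on every one of them.
fn implementations_agree(seed: u64, cases: usize) -> bool {
    let mut state = seed.max(1); // xorshift must not start at zero
    (0..cases).all(|_| {
        let len = (next_random(&mut state) % 64) as usize;
        let input: Vec<u8> = (0..len)
            .map(|_| (next_random(&mut state) >> 24) as u8)
            .collect();
        count_single_pass(&input) == count_reference(&input)
    })
}
```

A test would then simply assert `implementations_agree(42, 1000)`. The trade-off is that this only checks the counter against another in-process definition of a word; it says nothing about whether that definition matches what the installed wc does, which is what the comparison against the real program checks.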