A few weeks ago I decided to take a look at the Rust programming language. After a couple of standard exercises, I chose to try my hand at implementing the SHA256 hash algorithm, because why not?
The algorithm's state is stored in a struct, similar to Python's hashlib. This is the lib.rs file:
use std::convert::TryInto;
pub struct SHA256Hash {
h0: u32,
h1: u32,
h2: u32,
h3: u32,
h4: u32,
h5: u32,
h6: u32,
h7: u32,
finalized: bool,
total_bits: usize,
unprocessed_bytes: Vec<u8>,
input_block_size: usize
}
impl SHA256Hash {
/// Create a new hash instance
pub fn new() -> SHA256Hash {
SHA256Hash {
h0: 0x6a09e667,
h1: 0xbb67ae85,
h2: 0x3c6ef372,
h3: 0xa54ff53a,
h4: 0x510e527f,
h5: 0x9b05688c,
h6: 0x1f83d9ab,
h7: 0x5be0cd19,
finalized: false,
total_bits: 0,
unprocessed_bytes: Vec::new(),
input_block_size: 64
}
}
pub fn block_size(&self) -> usize {
self.input_block_size
}
    /// Update the state with one 512-bit block
fn update_block(&mut self, block: &[u8; 64]) {
// Initialize array of round constants:
// (first 32 bits of the fractional parts of the cube roots of the first 64 primes 2..311):
let round_constants: [u32; 64] = [
0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5, 0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5,
0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3, 0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174,
0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc, 0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da,
0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7, 0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967,
0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13, 0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85,
0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3, 0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070,
0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5, 0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3,
0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208, 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2
];
let mut w: [u32; 64] = [0; 64];
for j in 0..16 {
// let mut bytes: [u8; 4] = Default::default();
// bytes.copy_from_slice(&block[j*4..(j+1)*4]);
// w[j] = transform_array_of_u8_big_endian_to_u32(&bytes);
            w[j] = transform_array_of_u8_big_endian_to_u32(block[j*4..(j+1)*4].try_into().expect("slice is exactly 4 bytes"));
// println!("w[{:2}] {:08X?}", j, w[j]);
}
// Extend the first 16 words into the remaining 48 words w[16..63] of the message schedule array:
for j in 16..64 {
let s0 = w[j-15].rotate_right(7) ^ w[j-15].rotate_right(18) ^ (w[j-15] >> 3);
let s1 = w[j-2].rotate_right(17) ^ w[j-2].rotate_right(19) ^ (w[j-2] >> 10);
w[j] = w[j-16].wrapping_add(s0).wrapping_add(w[j-7]).wrapping_add(s1);
}
let mut a = self.h0;
let mut b = self.h1;
let mut c = self.h2;
let mut d = self.h3;
let mut e = self.h4;
let mut f = self.h5;
let mut g = self.h6;
let mut h = self.h7;
// Compression function main loop:
for j in 0..64 {
let s1 = e.rotate_right(6) ^ e.rotate_right(11) ^ e.rotate_right(25);
let ch = (e & f) ^ (!e & g);
let temp1 = h.wrapping_add(s1).wrapping_add(ch).wrapping_add(round_constants[j]).wrapping_add(w[j]);
let s0 = a.rotate_right(2) ^ a.rotate_right(13) ^ a.rotate_right(22);
let maj = (a & b) ^ (a & c) ^ (b & c);
let temp2 = s0.wrapping_add(maj);
h = g;
g = f;
f = e;
e = d.wrapping_add(temp1);
d = c;
c = b;
b = a;
a = temp1.wrapping_add(temp2);
}
self.h0 = self.h0.wrapping_add(a);
self.h1 = self.h1.wrapping_add(b);
self.h2 = self.h2.wrapping_add(c);
self.h3 = self.h3.wrapping_add(d);
self.h4 = self.h4.wrapping_add(e);
self.h5 = self.h5.wrapping_add(f);
self.h6 = self.h6.wrapping_add(g);
self.h7 = self.h7.wrapping_add(h);
}
/// consume blocks of unprocessed bytes
fn consume(&mut self) {
assert_eq!(self.unprocessed_bytes.len() % 64, 0);
let n_blocks = self.unprocessed_bytes.len() / 64;
for i in 0..n_blocks {
let mut block: [u8; 64] = [0; 64];
            // the copy seems to be necessary because of a multiple
            // mutable/immutable borrowing situation I've set up for myself
block.copy_from_slice(&self.unprocessed_bytes[i*64..(i+1)*64]);
self.update_block(&block);
            // it's a pity that try_into does not work here
}
self.unprocessed_bytes.clear();
}
/// query the hash digest
pub fn digest(&mut self) -> [u32; 8] {
self.finalize();
[self.h0, self.h1, self.h2, self.h3, self.h4, self.h5, self.h6, self.h7]
}
    /// query the digest, formatted as a hexadecimal string
pub fn hex_digest(&mut self) -> String {
let digest = self.digest();
format!(
"{:08X?}{:08X?}{:08X?}{:08X?}{:08X?}{:08X?}{:08X?}{:08X?}",
digest[0], digest[1], digest[2], digest[3],
digest[4], digest[5], digest[6], digest[7]
)
}
/// update the hash state
pub fn update(&mut self, input: &[u8]) -> bool {
if self.finalized || input.len() == 0 {
return false;
}
let input_len_bytes = input.len();
self.total_bits += input_len_bytes * 8;
let remaining_bytes = (input_len_bytes + self.unprocessed_bytes.len()) % 64;
        self.unprocessed_bytes.extend_from_slice(&input[..(input_len_bytes-remaining_bytes)]);
self.consume();
if remaining_bytes > 0 {
            self.unprocessed_bytes.extend_from_slice(&input[input_len_bytes-remaining_bytes..]);
}
true
}
    fn finalize(&mut self) {
        // guard: digest() may be called more than once; only pad once
        if self.finalized {
            return;
        }
        self.unprocessed_bytes.push(0x80);
        // the outer % 512 keeps the zero padding in [0, 511] bits when the
        // 0x80 byte and the 64-bit length already complete a block exactly
        let n_padding_bits = (512 - (self.total_bits + 8 + 64) % 512) % 512;
let n_padding_bytes = n_padding_bits / 8;
self.unprocessed_bytes.extend(vec![0; n_padding_bytes]);
let length: u64 = self.total_bits.try_into().unwrap();
self.unprocessed_bytes.extend(&transform_u64_to_array_of_u8_big_endian(length));
self.consume();
self.finalized = true;
}
}
///////////////////////////////////////////////////////////////////////////////
fn transform_u64_to_array_of_u8_big_endian(x: u64) -> [u8; 8] {
let b1 = ((x >> 56) & 0xff) as u8;
let b2 = ((x >> 48) & 0xff) as u8;
let b3 = ((x >> 40) & 0xff) as u8;
let b4 = ((x >> 32) & 0xff) as u8;
let b5 = ((x >> 24) & 0xff) as u8;
let b6 = ((x >> 16) & 0xff) as u8;
let b7 = ((x >> 8) & 0xff) as u8;
let b8 = (x & 0xff) as u8;
[b1, b2, b3, b4, b5, b6, b7, b8]
}
fn transform_array_of_u8_big_endian_to_u32(arr_of_u8: &[u8; 4]) -> u32 {
let mut x: u32 = 0;
x |= (arr_of_u8[0] as u32) << 24;
x |= (arr_of_u8[1] as u32) << 16;
x |= (arr_of_u8[2] as u32) << 8;
x |= arr_of_u8[3] as u32;
x
}
///////////////////////////////////////////////////////////////////////////////
#[cfg(test)]
mod tests {
use super::SHA256Hash;
/// Run empty test input from FIPS 180-2
#[test]
fn sha256_nist_empty() {
let mut hasher = SHA256Hash::new();
hasher.update(&[]);
let digest = hasher.digest();
assert_eq!(
digest,
[0xe3b0c442, 0x98fc1c14, 0x9afbf4c8, 0x996fb924,
0x27ae41e4, 0x649b934c, 0xa495991b, 0x7852b855]
);
}
/// Run abc test from FIPS 180-2
#[test]
fn sha256_nist_abc() {
let mut hasher = SHA256Hash::new();
hasher.update(b"abc");
let digest = hasher.digest();
assert_eq!(
digest,
[0xba7816bf, 0x8f01cfea, 0x414140de, 0x5dae2223,
0xb00361a3, 0x96177a9c, 0xb410ff61, 0xf20015ad]
)
}
/// Run two-block test from FIPS 180-2
#[test]
fn sha256_nist_two_blocks() {
let mut hasher = SHA256Hash::new();
hasher.update(b"abcdbcdecdefdefgefghfghighijhijkijkljklmklmnlmnomnopnopq");
let digest = hasher.digest();
assert_eq!(
digest,
[0x248d6a61, 0xd20638b8, 0xe5c02693, 0x0c3e6039,
0xa33ce459, 0x64ff2167, 0xf6ecedd4, 0x19db06c1]
);
}
/// Run large input test (1,000,000 x a) from FIPS 180-2
#[test]
fn sha256_nist_large_input() {
let input_str = std::iter::repeat("a").take(1_000_000).collect::<String>();
let input = input_str.as_bytes();
let mut hasher = SHA256Hash::new();
hasher.update(&input);
let digest = hasher.digest();
assert_eq!(
digest,
[0xcdc76e5c, 0x9914fb92, 0x81a1c7e2, 0x84d73e67,
0xf1809a48, 0xa497200e, 0x046d39cc, 0xc7112cd0]
);
}
}
lib.rs comes with a small unit test suite, as specified in FIPS 180-2 Appendix B. The test suite can be run with cargo test [--release].
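To get a feel for the API, here is a minimal consumer sketch (hypothetical code of mine, mirroring hashlib's update/hexdigest flow; the crate name sha256 comes from the Cargo.toml shown further down):
use sha256::SHA256Hash;

fn main() {
    let mut hasher = SHA256Hash::new();
    // feed data incrementally, as with hashlib's update()
    hasher.update(b"abc");
    // digest()/hex_digest() finalize the state before returning it
    println!("{}", hasher.hex_digest().to_ascii_lowercase());
}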
There is also a main.rs, which produces an executable for use as a command-line tool:
use std::env;
use std::fs::File;
use std::io::{BufRead, BufReader};
use sha256::SHA256Hash;
fn main() -> std::io::Result<()> {
let mut hasher = SHA256Hash::new();
let args: Vec<String> = env::args().collect();
if args.len() < 2 {
eprintln!("Please provide a filename as command line argument!");
return Ok(());
}
let filename = &args[1];
let file = File::open(filename)?;
let chunk_size: usize = hasher.block_size() * 1024;
let mut reader = BufReader::with_capacity(chunk_size, file);
loop {
let length = {
let buffer = reader.fill_buf()?;
hasher.update(buffer);
buffer.len()
};
if length == 0 {
break;
}
reader.consume(length);
}
println!("{} {}", hasher.hex_digest().to_ascii_lowercase(), filename);
Ok(())
}
As an example, let's compute the hash of lib.rs itself ;-)
cargo run --release src\lib.rs
86dccf89c7cb4de0837a2c4a3c709ae7845151338a1a21a8321fcbf9e4d8dcf7 src\lib.rs
On my laptop, hashing a 3.64 GiB ISO image takes about 39 s (35 s with 7-Zip's built-in SHA256).
I am open to feedback of all kinds, be it style, idiomatic Rust, performance, usability, or whatever else comes to mind.
For the sake of completeness, the Cargo.toml:
[package]
name = "hash_sha256"
version = "0.1.0"
authors = ["AlexV"]
edition = "2018"
[lib]
name = "sha256"
path = "src/lib.rs"我知道这里有一个关于代码评审的相似问题,但不幸的是它没有答案。此外,从我的判断,这些方法是相当不同的,以他们自己的权利。
Answered 2019-10-09 00:31:28
First things first, let's start from a simple assumption: SIMD is off the table. With that, your kernel computation (update_block) is essentially as optimized as it gets, so instead I ended up focusing on everything else.
After profiling, and as expected, most of the CPU time of the SHA256 implementation is spent in update_block(). No surprise there. So we know up front that, short of some trickery, there is little we can do to ease the pain of the 64 rounds.
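To put a rough number on that without a full profiler, a throwaway harness like the following works (my sketch, not part of the original answer; a real profiler such as perf gives the actual per-function breakdown):
use std::time::Instant;
use sha256::SHA256Hash;

fn main() {
    // 64 MiB of zeros is enough for update_block to dominate the runtime
    let data = vec![0u8; 64 * 1024 * 1024];
    let mut hasher = SHA256Hash::new();
    let start = Instant::now();
    hasher.update(&data);
    let digest = hasher.digest();
    println!("{:?} for 64 MiB (first word {:08x})", start.elapsed(), digest[0]);
}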
However, there is one simple change we can make. We will pull in the byteorder crate and use it to convert our u8 arrays to u32s. It both eases our pain and gives a nicer version: since we always feed it exactly 4 bytes (guaranteed by the code's logic), we can safely drop the fixed-size slice type and benefit from the Read implementation on &[u8].
It also looks neat, and given the 4-byte guarantee it cannot fail:
use byteorder::{BigEndian, ReadBytesExt};

fn transform_array_of_u8_big_endian_to_u32(mut arr_of_u8: &[u8]) -> u32 {
arr_of_u8.read_u32::<BigEndian>().unwrap()
}
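This assumes byteorder = "1" in Cargo.toml. As a quick check of the conversion (a sketch of mine, not from the playground; note that since Rust 1.32 the standard library's u32::from_be_bytes does the same job without the dependency):
use byteorder::{BigEndian, ReadBytesExt};

fn main() {
    // 0x12345678 in big-endian byte order
    let mut bytes: &[u8] = &[0x12, 0x34, 0x56, 0x78];
    assert_eq!(bytes.read_u32::<BigEndian>().unwrap(), 0x12345678);
    // the dependency-free std equivalent, stable since Rust 1.32:
    assert_eq!(u32::from_be_bytes([0x12, 0x34, 0x56, 0x78]), 0x12345678);
}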
We can remove a whole lot of allocations! We will change the method signature of consume from fn consume(&mut self) to fn consume(&mut self, bytes: &[u8]). The reason for doing this is that we can get a rather nice performance gain with the following (playground):
fn consume(&mut self, mut bytes: &[u8]) {
let input_len_bytes = bytes.len();
let unprocessed_len = self.unprocessed_bytes.len();
self.total_bits += input_len_bytes * 8;
// Do we have bytes in the unprocessed buffer?
if unprocessed_len > 0 {
if (unprocessed_len + input_len_bytes) < 64 {
// Nothing to do, we just append
// Copy up to 63 bytes to our Vec
self.unprocessed_bytes.extend_from_slice(bytes);
return;
}
let (additional, new_bytes) = bytes.split_at(64 - unprocessed_len);
// Reassign
bytes = new_bytes;
// Copy up to 64 bytes from what we just took
self.unprocessed_bytes.extend_from_slice(additional);
// We can afford a 64-byte clone
        // note: in this revision update_block takes &[u8] (see the playground)
        self.update_block(self.unprocessed_bytes.clone().as_slice());
self.unprocessed_bytes.clear();
// Call ourselves
//return self.inner_consume(new_bytes);
}
let iter = bytes.chunks_exact(64);
let remainder_i = iter.clone();
for block in iter {
self.update_block(&block)
}
let bytes = remainder_i.remainder();
    self.unprocessed_bytes.extend_from_slice(bytes); // max 64 bytes allocated
}
This also opens up repeated calls to update(), and the previous version's bug (the index-out-of-bounds error) is now a thing of the past. The chunks_exact() iterator takes advantage of the fact that slices are immutable and really just moves a pointer along, which frees us from data copies and ownership problems.
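To illustrate the split (a standalone sketch of mine, not from the playground): chunks_exact(64) yields only complete 64-byte blocks and exposes the tail via remainder(), which is known up front, so no bytes are copied:
fn main() {
    // two full 64-byte blocks plus a 22-byte tail
    let data = [0u8; 150];
    let iter = data.chunks_exact(64);
    let tail = iter.clone().remainder(); // available before iterating
    assert_eq!(iter.count(), 2);
    assert_eq!(tail.len(), 22);
}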
In all honesty, after a proper look at the profiling data, the only place left to venture is the wacky territory of the update function itself, namely SIMD (via packed_simd); as a learning experiment, I can highly recommend it. It is currently nightly-only (it was stable up to Rust 1.33, then broke, and is on its way to being stabilized again), but it is worth it: we are talking about a 10-15x speedup. Due to the repetitive nature of the operations in SHA256, it is an ideal candidate for 256-bit (u32x8) vectors.
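To sketch the idea (my illustration under the stated nightly + packed_simd assumptions, not code from the answer): each u32x8 lane can carry one of eight independent message schedules, so the round functions run eight-wide, e.g. the sigma0 schedule function:
// nightly-only; assumes packed_simd = "0.3" in Cargo.toml
use packed_simd::u32x8;

// sigma0(x) = (x >>> 7) ^ (x >>> 18) ^ (x >> 3), applied lane-wise to
// eight independent message schedules (rotates built from two shifts)
fn small_sigma0(w: u32x8) -> u32x8 {
    ((w >> 7) | (w << 25)) ^ ((w >> 18) | (w << 14)) ^ (w >> 3)
}

fn main() {
    let w = u32x8::splat(0xdeadbeef);
    let expected = 0xdeadbeef_u32.rotate_right(7)
        ^ 0xdeadbeef_u32.rotate_right(18)
        ^ (0xdeadbeef_u32 >> 3);
    assert_eq!(small_sigma0(w).extract(0), expected);
}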
https://codereview.stackexchange.com/questions/230384