首页
学习
活动
专区
圈层
工具
发布
社区首页 >问答首页 >SHA256在锈病中的教育实施

SHA256在锈病中的教育实施
EN

Code Review用户
提问于 2019-10-08 19:51:48
回答 1查看 800关注 0票数 5

几周前,我决定看一下Rust编程语言。经过几次标准练习之后,我选择了尝试实现SHA256哈希算法,因为为什么不呢?

该算法的状态存储在一个结构中,类似于Python的hashlib。这是lib.rs文件:

代码语言:javascript
复制
use std::convert::TryInto;


pub struct SHA256Hash {
    h0: u32,
    h1: u32,
    h2: u32,
    h3: u32,
    h4: u32,
    h5: u32,
    h6: u32,
    h7: u32,
    finalized: bool,
    total_bits: usize,
    unprocessed_bytes: Vec<u8>,
    input_block_size: usize
}


impl SHA256Hash {

    /// Create a new hash instance
    pub fn new() -> SHA256Hash {
        SHA256Hash {
            h0: 0x6a09e667,
            h1: 0xbb67ae85,
            h2: 0x3c6ef372,
            h3: 0xa54ff53a,
            h4: 0x510e527f,
            h5: 0x9b05688c,
            h6: 0x1f83d9ab,
            h7: 0x5be0cd19,
            finalized: false,
            total_bits: 0,
            unprocessed_bytes: Vec::new(),
            input_block_size: 64
        }
    }

    pub fn block_size(&self) -> usize {
        self.input_block_size
    }

    /// update with a block of 512bit
    fn update_block(&mut self, block: &[u8; 64]) {

        // Initialize array of round constants:
        // (first 32 bits of the fractional parts of the cube roots of the first 64 primes 2..311):
        let round_constants: [u32; 64] = [
            0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5, 0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5,
            0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3, 0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174,
            0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc, 0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da,
            0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7, 0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967,
            0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13, 0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85,
            0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3, 0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070,
            0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5, 0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3,
            0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208, 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2
        ];

        let mut w: [u32; 64] = [0; 64];

        for j in 0..16 {
            // let mut bytes: [u8; 4] = Default::default();
            // bytes.copy_from_slice(&block[j*4..(j+1)*4]);
            // w[j] = transform_array_of_u8_big_endian_to_u32(&bytes);
            w[j] = transform_array_of_u8_big_endian_to_u32(block[j*4..(j+1)*4].try_into().expect(""));
            // println!("w[{:2}] {:08X?}", j, w[j]);
        }

        // Extend the first 16 words into the remaining 48 words w[16..63] of the message schedule array:
        for j in 16..64 {
            let s0 = w[j-15].rotate_right(7) ^ w[j-15].rotate_right(18) ^ (w[j-15] >> 3);
            let s1 = w[j-2].rotate_right(17) ^ w[j-2].rotate_right(19) ^ (w[j-2] >> 10);
            w[j] = w[j-16].wrapping_add(s0).wrapping_add(w[j-7]).wrapping_add(s1);
        }

        let mut a = self.h0;
        let mut b = self.h1;
        let mut c = self.h2;
        let mut d = self.h3;
        let mut e = self.h4;
        let mut f = self.h5;
        let mut g = self.h6;
        let mut h = self.h7;

        // Compression function main loop:
        for j in 0..64 {
            let s1 = e.rotate_right(6) ^ e.rotate_right(11) ^ e.rotate_right(25);
            let ch = (e & f) ^ (!e & g);
            let temp1 = h.wrapping_add(s1).wrapping_add(ch).wrapping_add(round_constants[j]).wrapping_add(w[j]);

            let s0 = a.rotate_right(2) ^ a.rotate_right(13) ^ a.rotate_right(22);
            let maj = (a & b) ^ (a & c) ^ (b & c);
            let temp2 = s0.wrapping_add(maj);

            h = g;
            g = f;
            f = e;
            e = d.wrapping_add(temp1);
            d = c;
            c = b;
            b = a;
            a = temp1.wrapping_add(temp2);
        }

        self.h0 = self.h0.wrapping_add(a);
        self.h1 = self.h1.wrapping_add(b);
        self.h2 = self.h2.wrapping_add(c);
        self.h3 = self.h3.wrapping_add(d);
        self.h4 = self.h4.wrapping_add(e);
        self.h5 = self.h5.wrapping_add(f);
        self.h6 = self.h6.wrapping_add(g);
        self.h7 = self.h7.wrapping_add(h);
    }

    /// consume blocks of unprocessed bytes
    fn consume(&mut self) {
        assert_eq!(self.unprocessed_bytes.len() % 64, 0);
        let n_blocks = self.unprocessed_bytes.len() / 64;
        for i in 0..n_blocks {
            let mut block: [u8; 64] = [0; 64];
            // the copy seems to be necessary because of a multiple
            // mutable/immutable borowing situation I've set up for myself
            block.copy_from_slice(&self.unprocessed_bytes[i*64..(i+1)*64]);
            self.update_block(&block);
            // it's a pitty that try_into does not work here
        }
        self.unprocessed_bytes.clear();
    }

    /// query the hash digest
    pub fn digest(&mut self) -> [u32; 8] {
        self.finalize();
        [self.h0, self.h1, self.h2, self.h3, self.h4, self.h5, self.h6, self.h7]
    }

    /// query the hex digest, formatted as hexadecimal string
    pub fn hex_digest(&mut self) -> String {
        let digest = self.digest();
        format!(
            "{:08X?}{:08X?}{:08X?}{:08X?}{:08X?}{:08X?}{:08X?}{:08X?}",
            digest[0], digest[1], digest[2], digest[3],
            digest[4], digest[5], digest[6], digest[7]
        )
    }

    /// update the hash state
    pub fn update(&mut self, input: &[u8]) -> bool {
        if self.finalized || input.len() == 0 {
            return false;
        }

        let input_len_bytes = input.len();
        self.total_bits += input_len_bytes * 8;
        let remaining_bytes = (input_len_bytes + self.unprocessed_bytes.len()) % 64;
        self.unprocessed_bytes.extend(input[..(input_len_bytes-remaining_bytes)].iter().clone());

        self.consume();

        if remaining_bytes > 0 {
            self.unprocessed_bytes.extend(input[input_len_bytes-remaining_bytes..].iter().clone());
        }

        true
    }

    fn finalize(&mut self) {
        self.unprocessed_bytes.push(0x80);

        let n_padding_bits = 512 - (self.total_bits + 8 + 64) % 512;
        let n_padding_bytes = n_padding_bits / 8;
        self.unprocessed_bytes.extend(vec![0; n_padding_bytes]);
        let length: u64 = self.total_bits.try_into().unwrap();
        self.unprocessed_bytes.extend(&transform_u64_to_array_of_u8_big_endian(length));

        self.consume();

        self.finalized = true;
    }
}

///////////////////////////////////////////////////////////////////////////////

fn transform_u64_to_array_of_u8_big_endian(x: u64) -> [u8; 8] {
    let b1 = ((x >> 56) & 0xff) as u8;
    let b2 = ((x >> 48) & 0xff) as u8;
    let b3 = ((x >> 40) & 0xff) as u8;
    let b4 = ((x >> 32) & 0xff) as u8;
    let b5 = ((x >> 24) & 0xff) as u8;
    let b6 = ((x >> 16) & 0xff) as u8;
    let b7 = ((x >> 8) & 0xff) as u8;
    let b8 = (x & 0xff) as u8;
    [b1, b2, b3, b4, b5, b6, b7, b8]
}


fn transform_array_of_u8_big_endian_to_u32(arr_of_u8: &[u8; 4]) -> u32 {
    let mut x: u32 = 0;
    x |= (arr_of_u8[0] as u32) << 24;
    x |= (arr_of_u8[1] as u32) << 16;
    x |= (arr_of_u8[2] as u32) << 8;
    x |= arr_of_u8[3] as u32;
    x
}

///////////////////////////////////////////////////////////////////////////////

#[cfg(test)]
mod tests {
    use super::SHA256Hash;

    /// Run empty test input from FIPS 180-2
    #[test]
    fn sha256_nist_empty() {
        let mut hasher = SHA256Hash::new();
        hasher.update(&[]);
        let digest = hasher.digest();
        assert_eq!(
            digest,
            [0xe3b0c442, 0x98fc1c14, 0x9afbf4c8, 0x996fb924,
             0x27ae41e4, 0x649b934c, 0xa495991b, 0x7852b855]
        );
    }

    /// Run abc test from FIPS 180-2
    #[test]
    fn sha256_nist_abc() {
        let mut hasher = SHA256Hash::new();
        hasher.update(b"abc");
        let digest = hasher.digest();
        assert_eq!(
            digest,
            [0xba7816bf, 0x8f01cfea, 0x414140de, 0x5dae2223,
             0xb00361a3, 0x96177a9c, 0xb410ff61, 0xf20015ad]
        )
    }

    /// Run two-block test from FIPS 180-2
    #[test]
    fn sha256_nist_two_blocks() {
        let mut hasher = SHA256Hash::new();
        hasher.update(b"abcdbcdecdefdefgefghfghighijhijkijkljklmklmnlmnomnopnopq");
        let digest = hasher.digest();
        assert_eq!(
            digest,
            [0x248d6a61, 0xd20638b8, 0xe5c02693, 0x0c3e6039,
             0xa33ce459, 0x64ff2167, 0xf6ecedd4, 0x19db06c1]
        );
    }

    /// Run large input test (1,000,000 x a) from FIPS 180-2
    #[test]
    fn sha256_nist_large_input() {
        let input_str = std::iter::repeat("a").take(1_000_000).collect::<String>();
        let input = input_str.as_bytes();
        let mut hasher = SHA256Hash::new();
        hasher.update(&input);
        let digest = hasher.digest();
        assert_eq!(
            digest,
            [0xcdc76e5c, 0x9914fb92, 0x81a1c7e2, 0x84d73e67,
             0xf1809a48, 0xa497200e, 0x046d39cc, 0xc7112cd0]
        );
    }
}

lib.rs附带了一个小的单元测试套件,如FIPS 180-2附录B中指定的那样。可以使用cargo test [--release]运行测试套件。

还有一个main.rs,它生成一个可执行文件,用于命令行工具:

代码语言:javascript
复制
use std::env;
use std::fs::File;
use std::io::{BufRead, BufReader};

use sha256::SHA256Hash;


fn main() -> std::io::Result<()> {
    let mut hasher = SHA256Hash::new();

    let args: Vec<String> = env::args().collect();
    if args.len() < 2 {
        eprintln!("Please provide a filename as command line argument!");
        return Ok(());
    }

    let filename = &args[1];
    let file = File::open(filename)?;
    let chunk_size: usize = hasher.block_size() * 1024;
    let mut reader = BufReader::with_capacity(chunk_size, file);

    loop {
        let length = {
            let buffer = reader.fill_buf()?;
            hasher.update(buffer);
            buffer.len()
        };
        if length == 0 {
            break;
        }
        reader.consume(length);
    }

    println!("{} {}", hasher.hex_digest().to_ascii_lowercase(), filename);

    Ok(())
}

作为一个例子,让我们计算lib.rs的散列;-)

代码语言:javascript
复制
cargo run --release src\lib.rs
86dccf89c7cb4de0837a2c4a3c709ae7845151338a1a21a8321fcbf9e4d8dcf7 src\lib.rs

在我的笔记本电脑上,计算3.64 GiB的ISO映像的哈希值大约需要39s (35s带有7zip的内置SHA256)。

我愿意接受各种各样的反馈,不管是风格、习惯性的生锈、性能、可用性,还是你脑海中的任何东西。

为完整性起见,Cargo.toml

代码语言:javascript
复制
[package]
name = "hash_sha256"
version = "0.1.0"
authors = ["AlexV"]
edition = "2018"

[lib]
name = "sha256"
path = "src/lib.rs"

我知道这里有一个关于代码评审的相似问题,但不幸的是它没有答案。此外,从我的判断,这些方法是相当不同的,以他们自己的权利。

EN

回答 1

Code Review用户

回答已采纳

发布于 2019-10-09 00:31:28

首先,让我们从一个简单的假设开始: SIMD已经不在桌面上了。因此,您的内核计算(update_block)基本上是经过优化的,相反,我最终关注的是其余部分。

一些坏消息

在分析之后,正如预期的那样,SHA256实现的大部分CPU时间都在update_block()中。没什么好惊讶的。因此,我们事先就知道,在不采取某种诡计的情况下,我们几乎无法减轻64轮行动的痛苦。

然而,我们可以做一个简单的改变。我们将引入byteorder机箱,并使用它将我们的u8数组转换为u32s。这既可以减轻我们的痛苦,又能提供一个更好的版本。因为我们总是给它添加4个字节(通过代码逻辑验证),所以我们可以安全地删除片大小前缀,并从&[u8] Read实现中受益。

它看起来也很整洁,也是不可阻挡的:

代码语言:javascript
复制
fn transform_array_of_u8_big_endian_to_u32(mut arr_of_u8: &[u8]) -> u32 {
    arr_of_u8.read_u32::<BigEndian>().unwrap()
}

一些好消息

我们可以移除大量的拨款!

我们将把consume的方法签名从fn consume(&mut self);更改为fn consume(&mut self, bytes: &[u8]);。这样做的原因是我们可以通过执行以下操作(游乐场)来获得相当好的性能增益:

代码语言:javascript
复制
fn consume(&mut self, mut bytes: &[u8]) {
    let input_len_bytes = bytes.len();
    let unprocessed_len = self.unprocessed_bytes.len();
    self.total_bits += input_len_bytes * 8;
    // Do we have bytes in the unprocessed buffer?
    if unprocessed_len > 0 {
        if (unprocessed_len + input_len_bytes) < 64 {
            // Nothing to do, we just append
            // Copy up to 63 bytes to our Vec
            self.unprocessed_bytes.extend_from_slice(bytes);
            return;
        }
        let (additional, new_bytes) = bytes.split_at(64 - unprocessed_len);
        // Reassign
        bytes = new_bytes;
        // Copy up to 64 bytes from what we just took
        self.unprocessed_bytes.extend_from_slice(additional);
        // We can afford a 64-byte clone
        self.update_block(self.unprocessed_bytes.clone().as_slice());
        self.unprocessed_bytes.clear();
        // Call ourselves
        //return self.inner_consume(new_bytes);
    }
    let iter = bytes.chunks_exact(64);
    let remainder_i = iter.clone();
    for block in iter {
        self.update_block(&block)
    }
    let bytes = remainder_i.remainder();
    self.unprocessed_bytes.extend_from_slice(bytes); // max 64bytes allocated
}

这也打开了对update()的重复调用,而上一版本的错误(索引超出界限的错误)就已经过时了。chunks_exact()迭代器利用了切片不可变的优点,实际上只是移动了一个指针。这使我们从数据副本和所有权问题中解脱出来。

老实说,在正确地查看了分析数据之后,您将不会冒险进入更新函数的古怪领域,即simd (通过packed_simd);作为一个学习实验,我强烈推荐它。它目前只在夜间使用(它稳定到锈蚀1.33,然后就坏了,它又要稳定下来了),但这是值得的--我们说的是10-15倍的加速。由于SHA256中重复运算的性质,它是256位(u32x8)向量的理想选择。

票数 5
EN
页面原文内容由Code Review提供。腾讯云小微IT领域专用引擎提供翻译支持
原文链接:

https://codereview.stackexchange.com/questions/230384

复制
相关文章

相似问题

领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档