我在使用int / size_t时遇到了困难,因为我们实际上不能减一个size_t。所以我不知道我是否很好地使用了size_t,但是我想尽可能大地处理输入字符串。
lz77编码和ilz77解码。
作为我在这里没有复制的外部函数,有:
int bin_to_dec(std::string bin),它将二进制数字转换为十进制表示形式。
std::string dec_to_bin(int dec, int nb_bits=-1),它执行反向操作:取一个十进制数,并在必要时在左边添加额外的0,从而输出其大小为nb_bits的二进制表示。
这是我的算法:
#include "BinaryTools.h"
#include <string>
#include <cmath>
#include <algorithm>
#include <tuple>
/**
returns a pair (p, n) where p is the position of the first character of the match relatively to the end of dictionary (p = 0 means no match)
and n is the length of the match
the match is the longest substring of dictionary + lookahead_buffer which is a prefix of lookahead_buffer (where + represents string concatenation)
*/
std::pair<int, int> get_longest_match(const std::string& dictionary, const std::string& lookahead_buffer)
{
int p = 0; // q = dictionary.size() - p
int n = 0;
int dict_size = dictionary.size();
int buffer_size = lookahead_buffer.size();
for(int q = 0; q < dictionary.size(); q++)
{
int local_n = 0;
while(local_n < std::min(buffer_size, dict_size - q) && dictionary[q + local_n] == lookahead_buffer[local_n])
local_n++;
if(local_n == dict_size - q) // the match may continue in the buffer
{
int buffer_index = 0;
while(local_n < buffer_size && lookahead_buffer[buffer_index] == lookahead_buffer[local_n])
{
buffer_index++;
local_n++;
}
}
if(local_n > n)
{
n = local_n;
p = dict_size - q;
}
}
std::cout << n << " ";
return std::make_pair(p, n);
}
/**
0: char on 8 bits
1: tuple (p, n) on 15 + 8 bits (only if n >= 3, otherwise send chars)
note: p is shifted 1 down and n is shifted 3 down
*/
std::string lz77(const std::string& input)
{
size_t cursor = 0;
size_t input_size = input.size();
int dictionary_size = pow(2, 15); // (2^15 - 1) + 1 because p >= 1 if there is a match
int buffer_size = pow(2, 8) + 2; // (2^8 - 1) + 3 because n >= 3
std::string output, output_bin;
size_t output_bin_cursor = 0;
while(cursor < input_size)
{
// if cursor < dictionary_size then substr(0, cursor) else substr(cursor - dictionary_size, dictionary_size)
std::string dictionary = input.substr(std::max(0, (int)cursor - dictionary_size), std::min(cursor, (size_t)dictionary_size)); // PB
std::string buffer = input.substr(cursor, buffer_size);
std::pair<int, int> x = get_longest_match(dictionary, buffer);
int p = x.first; // in {1, ..., 2^15}, will be shifted to {0, ..., 2^15 - 1} ; 2^15 = 32768
int n = x.second; // in {3, ..., 2^8 + 2}, will be shited to {0, ..., 2^8 - 1} ; 2^8 = 256
if(n < 3)
{
output_bin += "0";
output_bin += dec_to_bin((int)(unsigned char)input[cursor], 8);
cursor++;
}
else
{
output_bin += "1";
output_bin += dec_to_bin(p - 1, 15);
output_bin += dec_to_bin(n - 3, 8);
cursor += n;
}
int output_bin_size = output_bin.size();
while(output_bin_size - output_bin_cursor >= 8)
{
std::string c = output_bin.substr(output_bin_cursor, 8);
output += (char)bin_to_dec(c);
output_bin_cursor += 8;
}
if(output_bin_size > output_bin.max_size() - 100)
{
output_bin.erase(0, cursor);
output_bin_cursor = 0;
}
}
// fill last byte with 0 (minimal size to encode a something is 9 anyway)
output_bin.erase(0, output_bin_cursor);
if(!output_bin.empty())
{
std::string fill_byte(8 - output_bin.size(), '0');
output_bin += fill_byte;
std::string c = output_bin.substr(0, 8);
output += (char)bin_to_dec(c);
}
return output;
}
std::string ilz77(const std::string& input)
{
size_t input_size = input.size();
size_t input_cursor = 0;
std::string input_bin;
size_t input_bin_cursor = 0;
// initial input_bin fill
while(input_cursor < input_size && input_bin.size() < input_bin.max_size() - 100)
{
input_bin += dec_to_bin((int)(unsigned char)input[input_cursor], 8);
input_cursor++;
}
std::string output;
while(!input_bin.empty())
{
char type = input_bin[input_bin_cursor];
input_bin_cursor++;
if(input_bin_cursor > input_bin.size() || input_bin.size() - (int)input_bin_cursor < 8) // only last byte filling bits remain
return output;
if(type == '0')
{
std::string c = input_bin.substr(input_bin_cursor, 8);
output += (char)bin_to_dec(c);
input_bin_cursor += 8;
}
else
{
int p = bin_to_dec(input_bin.substr(input_bin_cursor, 15)) + 1;
input_bin_cursor += 15;
int n = bin_to_dec(input_bin.substr(input_bin_cursor, 8)) + 3;
input_bin_cursor += 8;
size_t match_begin = output.size() - p; // not just substr because of the eventual overlap
for(int k = 0; k < n; k++)
output += output[match_begin + k];
}
// refill input_bin
if(input_bin.size() - (int)input_bin_cursor < 100 && input_cursor < input_size)
{
input_bin.erase(0, input_bin_cursor);
input_bin_cursor = 0;
while(input_cursor < input_size && input_bin.size() < input_bin.max_size() - 100)
{
input_bin += dec_to_bin((int)(unsigned char)input[input_cursor], 8);
input_cursor++;
}
}
}
return output;
}我的get_longest_match函数非常慢,但我已经看到并设想使用哈希表,稍后我将尝试实现该函数。
更新:根据请求,这是我的BinaryTools.h文件:
#ifndef BINARYTOOLS_H
#define BINARYTOOLS_H
#include <string>
std::string next_bin(std::string bin)
{
int k = bin.size() - 1;
while(bin[k] == '1')
{
bin[k] = '0';
k--;
}
if(k == -1)
bin = "1" + bin;
else
bin[k] = '1';
return bin;
}
int bin_to_dec(std::string bin)
{
int dec = 0;
int n = bin.size();
int k = 1;
for(int i = 0; i < n; i++)
{
if(bin[n-1-i] == '1')
dec += k;
k *= 2;
}
return dec;
}
// if nb_bits > 1, add 0s at the beginning of the output string so that its size is nb_bits
std::string dec_to_bin(int dec, int nb_bits=-1)
{
std::string bin;
while(dec != 0)
{
if(dec % 2 == 0)
{
bin = "0" + bin;
dec /= 2;
}
else
{
bin = "1" + bin;
dec = (dec - 1) / 2;
}
}
if(nb_bits > 0 && bin.size() < nb_bits)
{
std::string fill(nb_bits - bin.size(), '0');
return fill + bin;
}
return bin;
}
#endif // BINARYTOOLS_H发布于 2017-05-24 14:41:09
欢迎来到代码评审,这是一个不错的第一个问题。代码写得很好,可读性也很好。只是一些可能有助于改进代码的观察:
正如@TobySpeight所提到的,您应该将变量更改为size_t,这样警告消息就会消失。
代码丢失了
#include <iostream>这导致了上面提到的bug @TobySpeight。
头文件中的
显然,将函数体放入头文件是有效的,但是,将函数原型放入头文件和将函数体放入cpp源文件更为常见。这样做的原因是,如果包含函数体的头文件被多个文件包括在内,则函数现在被乘定义,用户在链接时会遇到多个定义错误。一种方法是使头文件中的函数是静态的,但是最好只是在头文件中有原型并将多个对象文件链接在一起。
单一责任原则指出,每个模块或类都应该对软件提供的功能的单个部分负有责任,这种责任应该完全由类封装。它的所有服务都应与这一责任密切配合。
Robert C. Martin expresses the principle as follows:
`A class should have only one reason to change.`虽然这主要针对面向对象语言中的类,但它很好地应用于函数和子程序。
lz77(const std::string& input)函数可以分解为多个函数,特别是外部while循环。内部虽然循环似乎是一个好的候选函数,第一个if语句的两个子句似乎也是很好的函数候选。
函数越独立,理解或读取代码就越容易。这也使任何程序员更容易维护或调试代码。
函数ilz77(const std::string& input)看起来也过于复杂,可能也会被分解。
有几个数字常量(8,15,3,100)确实没有得到解释。最好使用符号常量来给这些值赋予名称和意义。这在C++中做了很多事情
如果此代码的目标是一组库例程,那么将其封装在类中可能更有意义。这将为已经存在的功能提供一个或更少的接口来调用以实现压缩。
代码包含名字很好的变量,如dict_size和buffer_size,但是它包含的变量名为p、n和q,很难理解。请记住,代码需要维护,特性可能需要在3到5年内添加。即使是代码的原始作者,在这样长的时间之后,也可能很难维护具有单个字符变量名称的代码。
发布于 2017-05-24 12:38:36
您应该修复这些编译器警告:
g++ -std=c++17 -fPIC -g -Wall -Wextra -Wwrite-strings -Wno-parentheses -Wpedantic -Warray-bounds -Weffc++ 164064.cpp -o 164064
164064.cpp: In function ‘std::__cxx11::string dec_to_bin(int, int)’:
164064.cpp:53:34: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
if(nb_bits > 0 && bin.size() < nb_bits)
~~~~~~~~~~~^~~~~~~~~
164064.cpp: In function ‘std::pair<int, int> get_longest_match(const string&, const string&)’:
164064.cpp:84:22: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
for(int q = 0; q < dictionary.size(); q++)
~~^~~~~~~~~~~~~~~~~~~
164064.cpp: In function ‘std::__cxx11::string lz77(const string&)’:
164064.cpp:157:28: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
if(output_bin_size > output_bin.max_size() - 100)
~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~要修复其中的中间部分,我认为get_longest_match()应该只对无符号类型进行操作:
std::pair<int, int> get_longest_match(const std::string& dictionary,
const std::string& lookahead_buffer)
{
auto n = 0u;
auto const dict_size = dictionary.size();
auto const buffer_size = lookahead_buffer.size();
for (size_t q = 0; q < dictionary.size(); q++) {
auto local_n = 0u;
while (local_n < std::min(buffer_size, dict_size - q) && dictionary[q + local_n] == lookahead_buffer[local_n])
++local_n;
if (local_n == dict_size - q) {
// the match may continue in the buffer
for (int buffer_index = 0;
local_n < buffer_size && lookahead_buffer[buffer_index] == lookahead_buffer[local_n];
++buffer_index)
++local_n;
}
if (local_n > n)
n = local_n;
}
return std::make_pair(dict_size - q, n);
}另外,
164064.cpp:105:5: error: ‘cout’ is not a member of ‘std’
std::cout << n << " ";
^~~我猜那只是你无意中留下的一些错误的调试。
https://codereview.stackexchange.com/questions/164064
复制相似问题