我一直在研究,虽然我已经想出了用基排序对字符串数组进行字母化的总体想法,但我知道我走错了方向。
到目前为止,这就是我所拥有的:
void radixSort(string* sortMe, int l)
{
queue<string>* sections = new queue<string>[27]; //Have a-z, and also one for strings that are null terminated.
for(int i = 0; i < numElements; i++)
{
if(!(sortMe[i][l] == 32))
sections[sortMe[i][l]-96].push(sortMe[i]); //-96 because the ascii code for a is 97. If, for example a is the character, it will be placed at 1. 0 is left for null characters
}
for(int i =0; i < 26; i++)
{
while(!sections[i].empty())
{
temp.push_back(sections[i].front());
sections[i].pop();
}
}
}到目前为止,我已经按照第一个字符对所有字符串进行了排序,我知道接下来我必须对剩下的字符进行子数组并对它们进行排序,但是如何有效地实现它呢?字符串的大小是可变的,可以包括空格,例如:
这是我发现的似乎有用的东西:http://algs4.cs.princeton.edu/lectures/51DemoKeyIndexedCounting.pdf
发布于 2014-04-13 08:36:42
你找到的幻灯片很棒!但是,这些队列在您的代码中来自哪里呢?
总之,您在这里(实例化):
template <typename E>
size_t bin(const E& elem, size_t digit)
{
return elem.size() > digit ? size_t(elem[digit]) + 1 : 0;
}
template <size_t R, typename C, typename P>
void radix_sort(P& pos, const C& data, size_t digit)
{
using A = std::array<size_t, R + 1>;
A count = {};
P prev(pos);
for (auto i : prev)
++count[bin(data[i], digit)];
A done = {}, offset = {{0}};
std::partial_sum(count.begin(), count.end() - 1, offset.begin() + 1);
for (auto i : prev)
{
size_t b = bin(data[i], digit);
pos[offset[b] + done[b]++] = i;
}
}
struct shorter
{
template <typename A>
bool operator()(const A& a, const A& b) { return a.size() < b.size(); }
};
template <size_t R, typename C>
std::vector<size_t> radix_sort(const C& data)
{
std::vector<size_t> pos(data.size());
std::iota(pos.begin(), pos.end(), 0);
size_t width = std::max_element(data.begin(), data.end(), shorter())->size();
for (long digit = long(width) - 1; digit >= 0; --digit)
radix_sort<R>(pos, data, size_t(digit));
return pos;
}你可以这样用
int main()
{
std::vector<std::string> data = generate();
std::vector<size_t> pos = radix_sort<128>(data);
for (auto i : pos)
std::cout << data[i] << std::endl;
}其中,generate()包含在实际示例中,并生成在您的问题中给出的字符串。
我不想在这里解释这是怎么回事,我想你可以弄清楚,因为你正在解决这个问题。但有几句话是有道理的。
O(lw),其中l是输入长度(输入字符串数),w是最大输入宽度(最大)。所有输入字符串的长度)。因此,如果字符串宽度变化不大,则此算法是有意义的。R of radix_sort()是输入中每个数字(字母)的可能值数。例如,R = 128意味着可能的值是0..127。这应该对你的投入是好的。我没有尝试在ASCII代码方面做任何聪明的事情,但是您可以为此定制函数bin()。bin()的输出中,值0保留为“我们超过了这个字符串的末尾”。这些字符串放在仍在继续的其他字符串之前。发布于 2014-12-02 04:00:07
非常类似于iavr,但排序到位(使用g++ -O3对iavr的解决方案进行基准测试,与iavr的1780‘s相比需要20’s左右),享受常规接口和可恢复的代码。Iavr实现的问题在于它的逻辑只适用于字符串容器,并且不易扩展到其他类型。显然,他的专业版本更有效率,但为了规律牺牲一些性能可能是值得的。您可以在基排序实现找到其余的代码
一般基数分类:
template <typename T>
using Iter_value = std::iterator_traits<T>::value_type;
// intermediate struct to get partial template specialization
template<typename Iter, typename T, size_t range = 256>
struct rdx_impl {
static void rdx_sort(Iter begin, Iter end, int bits) {
// bits is # bits to consider up to if a max val is known ahead of time
// most efficent (theoretically) when digits are base n, having lg(n) bits
constexpr size_t digit_bits {8}; // # bits in digit, 8 works well for 32 and 64 bit vals
size_t d {0}; // current digit #
for (long long mask = (1 << digit_bits) - 1;
d * digit_bits < bits;) {// ex. 0x000000ff for setting lower 8 bits on 32 bit num
cnt_sort(begin, end, range, Digit_cmp<T>(mask, digit_bits*d));
++d;
}
}
};
// specialization of rdx_sort for strings
struct Shorter {
template <typename Seq>
bool operator()(const Seq& a, const Seq& b) { return a.size() < b.size(); }
};
template <typename Iter>
struct rdx_impl<Iter, std::string> { // enough to hold ASCII char range
static void rdx_sort(Iter begin, Iter end, int) {
// ignore additional int argument
int len_max = std::max_element(begin, end, Shorter())->size();
for (int d = len_max - 1; d >= 0; --d)
cnt_sort(begin, end, 128, Digit_cmp<std::string>(d));
}
};
// generic call interface for all iterators
template <typename Iter> // use intermediate struct for partial specialization
void rdx_sort(Iter begin, Iter end, int bits) {
rdx_impl<Iter, Iter_value<Iter>>::rdx_sort(begin, end, bits);
}将排序计数到每一位数字上(就位):
template <typename Iter, typename Op>
void cnt_sort(Iter begin, Iter end, size_t range, Op op) {
using T = typename Iter::value_type;
std::vector<int> counts(range); // init to 0
for (auto i = begin; i != end; ++i) // count # elems == i
++counts[op(*i)];
for (size_t i = 1; i < range; ++i)
counts[i] += counts[i-1]; // turn into # elems <= i
std::vector<T> res(end - begin);
for (auto j = end;;) {
--j;
res[--counts[op(*j)]] = *j;
if (j == begin) break;
}
// ~18% of time is spent on copying
std::copy(res.begin(), res.end(), begin);
}数字的提取值:
template <typename T> // overload digit_cmp for non-integral types top provide radix sort with digits
class Digit_cmp { // functor for comparing a "digit" (particular bits)
const long long mask; // 0..63 bitfield to test against
const size_t to_shift;
public:
Digit_cmp(long long m, size_t ts) : mask{m}, to_shift{ts} {}
// by default assumes integral, just shifts
size_t operator()(T n) const { // char assuming r = 8
return (n >> to_shift) & mask; // shift then mask for unit digit
}
};
// specialization for strings
template <>
class Digit_cmp<std::string> {
const size_t digit;
public:
Digit_cmp(size_t d) : digit{d} {}
size_t operator()(const std::string& str) {
// 0 indicates past the end of the string
return str.size() > digit ? str[digit] : 0;
}
};https://stackoverflow.com/questions/23038622
复制相似问题