Radix sort algorithm explanation
I am a beginner. I was searching for a radix sort implementation in C++ and found this one. Here is the code:
void countSort(string a[], int size, size_t k)
{
  string *b = NULL; int *c = NULL;
  b = new string[size];
  c = new int[257];
  for (int i = 0; i < 257; i++) {
    c[i] = 0;
  }
  for (int j = 0; j < size; j++) {
    c[k < a[j].size() ? (int)(unsigned char)a[j][k] + 1 : 0]++; // a[j] is a string
  }
  for (int f = 1; f < 257; f++) {
    c[f] += c[f - 1];
  }
  for (int r = size - 1; r >= 0; r--) {
    b[c[k < a[r].size() ? (int)(unsigned char)a[r][k] + 1 : 0] - 1] = a[r];
    c[k < a[r].size() ? (int)(unsigned char)a[r][k] + 1 : 0]--;
  }
  for (int l = 0; l < size; l++) {
    a[l] = b[l];
  }
  // avoid memory leak
  delete[] b;
  delete[] c;
}
So my question is what these lines do:
c[k < a[j].size() ? (int)(unsigned char)a[j][k] + 1 : 0]++;
b[c[k < a[r].size() ? (int)(unsigned char)a[r][k] + 1 : 0] - 1] = a[r];
c[k < a[r].size() ? (int)(unsigned char)a[r][k] + 1 : 0]--;
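Is this an MSD or an LSD radix sort?
Thanks.
This is a nice example of unnecessarily compact and, hence, hard-to-read code.
To analyze it, it helps to separate it a bit: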
// what a mess...
c[k < a[j].size() ? (int)(unsigned char)a[j][k] + 1 : 0]++;
The index calculation contains a condition:
// determine index for c
const int iC
// check whether k is (not) exceeding the size of a
= k < a[j].size()
// then
? (int)(unsigned char)a[j][k] + 1
// else
: 0;
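To make the effect of this condition visible, here is a minimal sketch that gives the index expression a name. The helper bucketIndex is hypothetical and not part of the code in the question; it only assumes that c has 257 counters, as in the question.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: the index expression from the question, given a name.
// Returns 0 if s has no character at position k, else (character value + 1),
// so the result is always in [0, 256] and fits the 257 counters of c.
int bucketIndex(const std::string &s, std::size_t k)
{
  if (k >= s.size()) return 0; // string too short: bucket 0
  // reinterpret the char as unsigned, so that the value is in [0, 255]
  const unsigned char ch = static_cast<unsigned char>(s[k]);
  const int index = ch + 1; // shift by one to keep bucket 0 free
  assert(index >= 1 && index <= 256);
  return index;
}
```

The cast to unsigned char matters: where plain char is signed, (int)a[j][k] would be negative for characters with the highest bit set, and the index would fall outside of c.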
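The second line in question,
b[c[k < a[r].size() ? (int)(unsigned char)a[r][k] + 1 : 0] - 1] = a[r];
separated in the same way, results in:
const int iC = k < a[r].size() ? (int)(unsigned char)a[r][k] + 1 : 0;
const int iB = c[iC] - 1; // the - 1 is applied to the value c[iC], not to the index iC
b[iB] = a[r];
This is easy to misread as c[iC - 1], which would access c[-1] for iC == 0. It does not: after the loop that sums up the counters, c[iC] holds the number of strings whose index is at most iC. Every string visited here was counted under its own index and has not been placed yet, so c[iC] is at least 1 and iB is at least 0. Index 0 is reserved for strings that are too short to have a character at position k.
The third line, c[iC]--, marks the slot as used, so that the next string with the same index lands one position earlier.
That such a detail is so easy to get wrong is exactly the problem with code this compact.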
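Putting the three lines from the question back together, the whole function can be sketched with named steps. This is an illustration only: the name countSortReadable and the use of std::vector instead of raw arrays are assumptions, the algorithm itself is meant to be the same as in the question.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Sketch: same algorithm as countSort() in the question, with named steps.
// Sorts a stably by the character at position k; strings without a
// character at position k go to bucket 0 and therefore come first.
void countSortReadable(std::vector<std::string> &a, std::size_t k)
{
  // bucket 0: no character at k; bucket v + 1: character value v
  auto bucketOf = [k](const std::string &s) -> std::size_t {
    return k < s.size() ? static_cast<unsigned char>(s[k]) + 1u : 0u;
  };
  std::vector<std::size_t> c(257, 0);
  // 1st line in question: count how many strings fall into each bucket
  for (const std::string &s : a) ++c[bucketOf(s)];
  // prefix sums: c[i] becomes the end offset (exclusive) of bucket i
  for (std::size_t i = 1; i < c.size(); ++i) c[i] += c[i - 1];
  std::vector<std::string> b(a.size());
  // backwards, so that equal keys keep their relative order (stability)
  for (std::size_t r = a.size(); r--;) {
    const std::size_t iC = bucketOf(a[r]);
    assert(c[iC] > 0);                // a[r] itself was counted in bucket iC
    const std::size_t iB = c[iC] - 1; // 2nd line: last free slot of the bucket
    b[iB] = std::move(a[r]);
    --c[iC];                          // 3rd line: that slot is taken now
  }
  a = std::move(b);
}
```

Calling it once per character position, from the last position down to position 0, sorts the strings completely, because each pass is stable.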
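I hope this shows that less compact code helps readability and makes it easier to spot possible issues in the code.
So, is the less compact code slower?
In my experience, it is not with modern compilers and their amazing optimization capabilities.
I once read an article about compiler optimization. Likewise, when I debug C++ code, I see all those funny $$ variables in the Visual Studio debugger watch window from time to time (in code that definitely does not contain any variable named $$). So, I believe the compiler does something similar internally anyway. Doing it explicitly to improve readability should have minimal impact on performance.
If I were really in doubt, I could still check the assembler output.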
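By the way,
c = new int[257];
Why not int c[257];?
257 int values are not so many that I would fear exceeding the stack size immediately.
Not to mention that memory allocated with new, and arrays in particular, is really bad C++ style. It is as if nothing better had been invented yet.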
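As a small illustration of the point about new, the counters could live in a fixed-size array on the stack. The sketch below uses std::array for that, which is an assumption of this example (a plain int c[257] = { 0 }; works as well), and the function name countFirstChars is hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Counts the first characters of the given strings (bucket 0: empty string).
// The counters live in a std::array on the stack: no new[], no delete[],
// and nothing can leak if an exception is thrown.
std::array<int, 257> countFirstChars(const std::vector<std::string> &data)
{
  std::array<int, 257> c{}; // value-initialized: all counters start at 0
  for (const std::string &s : data) {
    const std::size_t index
      = s.empty() ? 0u : static_cast<unsigned char>(s[0]) + 1u;
    assert(index < c.size());
    ++c[index];
  }
  return c;
}
```

The initialization loop and both delete[] statements of the original become unnecessary this way.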
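When I was a student, I somehow missed the lecture about radix sort (though I must admit that I have not missed this knowledge in my daily business yet).
So, out of curiosity, I looked it up on Wikipedia and re-implemented what is described there.
This is intended to provide a (hopefully better) alternative to what the OP found and exposed in the question.
So I implemented:
a naive approach that follows the description in the Wikipedia article on radix sort (radixSort in the code),
then the approach shown by the OP, which uses counting sort (radixCountSort with the helper countSort in the code).
Please note that the strings are sorted by the numeric values of their characters.
If an English dictionary order is intended instead, the mapping of digits to buckets has to be modified. For this, the order of the character values could be changed, and corresponding uppercase and lowercase characters could be mapped to the same bucket.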
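One possible way to map uppercase and lowercase letters to the same bucket is sketched here. The helper caseFoldedDigit is hypothetical; it is meant as a replacement for the digit computation in the sample code and assumes the default "C" locale, where std::tolower only affects the letters A to Z. It does not reorder punctuation or digits relative to letters.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical replacement for the "digit" computation in the sample code:
// folds letters to lower case so that 'a' and 'A' share one bucket.
// Positions after the end of the string still count as '\0'.
unsigned char caseFoldedDigit(const std::string &value, std::size_t i)
{
  if (i >= value.size()) return '\0';
  // std::tolower() must be called with a value representable as unsigned char
  const unsigned char ch = static_cast<unsigned char>(value[i]);
  const int folded = std::tolower(ch);
  assert(folded >= 0 && folded <= 255); // still fits one of the 256 buckets
  return static_cast<unsigned char>(folded);
}
```

With this mapping, "Hello" and "hello" fall into the same bucket at every position, so their relative order is the one they had in the input.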
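Frequently copying strings (or other containers) costs space and time, which I would rather avoid in production code.
Moving the strings instead (std::move) is one option to reduce the stress on the CPU while keeping the code quite clean and comparable to the underlying algorithm.
This is what I tried to take into account in the sample code.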
#include <cstddef>
#include <iostream>
#include <iterator>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

/* helper to find max. length in data strings
 */
size_t maxLength(const std::vector<std::string> &data)
{
  size_t lenMax = 0;
  for (const std::string &value : data) {
    if (lenMax < value.size()) lenMax = value.size();
  }
  return lenMax;
}

/* a naive implementation of radix sort
 * like described in https://en.wikipedia.org/wiki/Radix_sort
 */
void radixSort(std::vector<std::string> &data)
{
  /* A char has 8 bits - which encode (unsigned) the numbers of [0, 255].
   * Hence, 256 buckets are used for sorting.
   */
  std::vector<std::string> buckets[256];
  // determine max. length of input data:
  const size_t len = maxLength(data);
  /* iterate over the character positions according to max. length,
   * starting with the last position
   */
  for (size_t i = len; i--;) { // i-- -> check for 0 and post-decrement
    // sort data into buckets according to the current "digit":
    for (std::string &value : data) {
      /* digits after end of string are considered as '\0'
       * because 0 is the usual end-marker of C strings
       * and the least possible value of an unsigned char.
       * This shall ensure that a string goes before a longer
       * string with same prefix.
       */
      const unsigned char digit = i < value.size() ? value[i] : '\0';
      // move current string into the corresponding bucket
      buckets[digit].push_back(std::move(value));
    }
    // store buckets back into data (preserving current order)
    data.clear();
    for (std::vector<std::string> &bucket : buckets) {
      // append bucket to the data
      data.insert(data.end(),
        std::make_move_iterator(bucket.begin()),
        std::make_move_iterator(bucket.end()));
      bucket.clear();
    }
  }
}

/* counting sort as helper for the not so naive radix sort
 */
void countSort(std::vector<std::string> &data, size_t i)
{
  /* There are 256 possible values for an unsigned char
   * (which may have a value in [0, 255]).
   */
  size_t counts[256] = { 0 }; // initialize all counters with 0.
  // count how often a certain character appears at the place i
  for (const std::string &value : data) {
    /* digits after end of string are considered as '\0'
     * because 0 is the usual end-marker of C strings
     * and the least possible value of an unsigned char.
     * This shall ensure that a string goes before a longer
     * string with same prefix.
     */
    const unsigned char digit = i < value.size() ? value[i] : '\0';
    // increment the resp. bucket counter
    ++counts[digit];
  }
  // turn counts of digits into offsets in data
  size_t total = 0;
  for (size_t &count : counts) {
#if 0 // could be compact (and, maybe, confusing):
    total = count += total; // as C++ assignment is right-associative
#else // but is the same as:
    count += total; // add previous total sum to count
    total = count; // remember new total
#endif // 0
  }
  // an auxiliary buffer to sort the input data into.
  std::vector<std::string> buffer(data.size());
  /* Move input into aux. buffer
   * while using the bucket offsets (the former counts)
   * for addressing of new positions.
   * This is done backwards intentionally as the offsets
   * are decremented from end to begin of partitions.
   */
  for (size_t j = data.size(); j--;) { // j-- -> check for 0 and post-decrement
    std::string &value = data[j];
    // see comment for digit above...
    const unsigned char digit = i < value.size() ? value[i] : '\0';
    /* decrement offset and use as index
     * Arrays (and vectors) in C++ are 0-based.
     * Hence, this is adjusted respectively (compared to the source of algorithm).
     */
    const size_t k = --counts[digit];
    // move input element into auxiliary buffer at the determined offset
    buffer[k] = std::move(value);
  }
  /* That's it.
   * Move aux. buffer back into data.
   */
  data = std::move(buffer);
}

/* radix sort using count sort internally
 */
void radixCountSort(std::vector<std::string> &data)
{
  // determine max. length of input data:
  const size_t len = maxLength(data);
  /* iterate over data according to max. length
   */
  for (size_t i = len; i--;) { // i-- -> check for 0 and post-decrement
    countSort(data, i);
  }
}

/* output of vector with strings
 */
std::ostream& operator<<(std::ostream &out, const std::vector<std::string> &data)
{
  const char *sep = " ";
  for (const std::string &value : data) {
    out << sep << '"' << value << '"';
    sep = ", ";
  }
  return out;
}

/* do a test for certain data
 */
void test(const std::vector<std::string> &data)
{
  std::cout << "Data: {" << data << " }\n";
  std::vector<std::string> data1 = data;
  radixSort(data1);
  std::cout << "Radix Sorted: {" << data1 << " }\n";
  std::vector<std::string> data2 = data;
  radixCountSort(data2);
  std::cout << "Radix Count Sorted: {" << data2 << " }\n";
}

/* helper to turn a text into a vector of strings
 * (by separating at white spaces)
 */
std::vector<std::string> tokenize(const char *text)
{
  std::istringstream in(text);
  std::vector<std::string> tokens;
  for (std::string token; in >> token;) tokens.push_back(token);
  return tokens;
}

/* main program
 */
int main()
{
  // do some tests:
  test({ "Hi", "He", "Hello", "World", "Wide", "Web" });
  test({ });
  test(
    tokenize(
      "Radix sort dates back as far as 1887 to the work of Herman Hollerith on tabulating machines.\n"
      "Radix sorting algorithms came into common use as a way to sort punched cards as early as 1923.\n"
      "The first memory-efficient computer algorithm was developed in 1954 at MIT by Harold H. Seward.\n"
      "Computerized radix sorts had previously been dismissed as impractical "
      "because of the perceived need for variable allocation of buckets of unknown size.\n"
      "Seward's innovation was to use a linear scan to determine the required bucket sizes and offsets beforehand, "
      "allowing for a single static allocation of auxiliary memory.\n"
      "The linear scan is closely related to Seward's other algorithm - counting sort."));
}