C# decorate-sort-undecorate: how to sort an alphabetic field in descending order
I have a large data set for which computing the sort key is fairly expensive. What I'd like to do is use the DSU (decorate-sort-undecorate) pattern, where I take the rows and compute a sort key. For example:
        Qty   Name       Supplier
Row 1:  50    Widgets    IBM
Row 2:  48    Thingies   Dell
Row 3:  99    Googaws    IBM
To sort by Qty and Supplier, I could use the sort keys 0050 IBM, 0048 Dell, and 0099 IBM. Numbers are right-aligned, text is left-aligned, and everything is padded as needed.
If I need to sort by Qty in descending order, I can build the sort key by subtracting the value from a constant (say, 10000): 9950 IBM, 9952 Dell, 9901 IBM.
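How do I quickly and cheaply build a descending key for alphabetic fields in C#?
[My data is all 8-bit ASCII with ISO 8859 extension characters.]
Note: in Perl, this could be done by bit-complementing the string, that is, XOR-ing each byte with 0xFF.
Porting this solution directly to C# does not work: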
subkey = encoding.GetString(encoding.GetBytes(stringval).
Select(x => (byte)(x ^ 0xff)).ToArray());
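I suspect this is because C# and Perl handle strings differently. Perhaps Perl sorts in ASCII order while C# is trying to be smart.
Here is a sample piece of code that tries to build such a descending key: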
System.Text.ASCIIEncoding encoding = new System.Text.ASCIIEncoding();
List<List<string>> sample = new List<List<string>>() {
    new List<string>() { "", "apple", "table" },
    new List<string>() { "", "apple", "chair" },
    new List<string>() { "", "apple", "davenport" },
    new List<string>() { "", "orange", "sofa" },
    new List<string>() { "", "peach", "bed" },
};
foreach (List<string> line in sample)
{
    StringBuilder sb = new StringBuilder();
    string key1 = line[1].PadRight(10, ' ');
    string key2 = line[2].PadRight(10, ' ');
    // Comment the next line to sort desc, desc
    key2 = encoding.GetString(encoding.GetBytes(key2).
        Select(x => (byte)(x ^ 0xff)).ToArray());
    sb.Append(key2);
    sb.Append(key1);
    line[0] = sb.ToString();
}
List<List<string>> output = sample.OrderBy(p => p[0]).ToList();
return;
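One likely reason the XOR port above fails can be shown in isolation. ASCII is a 7-bit encoding, so every byte produced by x ^ 0xff from an ASCII input is 0x80 or higher, and ASCIIEncoding.GetString cannot map such bytes back to distinct characters; it substitutes a replacement character instead. The sketch below (the class and method names are illustrative and not part of the original post) applies the same transformation to a single string so the loss of information can be observed directly. Separately, OrderBy on string keys uses a culture-sensitive comparison by default rather than byte order.

```csharp
using System;
using System.Linq;
using System.Text;

public static class AsciiComplementDemo
{
    // Same transformation as the port above: complement every byte,
    // then decode the result as ASCII.
    public static string ComplementViaAscii(string value)
    {
        var encoding = new ASCIIEncoding();
        byte[] flipped = encoding.GetBytes(value)
                                 .Select(x => (byte)(x ^ 0xff))
                                 .ToArray();
        // Every flipped byte is >= 0x80, outside the 7-bit ASCII range,
        // so the decoder replaces it instead of preserving its value.
        return encoding.GetString(flipped);
    }
}
```

Under these assumptions, ComplementViaAscii("chair") and ComplementViaAscii("table") return strings that compare equal, so the "descending" part of the key no longer distinguishes the rows.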
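Just write an IComparer that works as a chain of comparers.
If the values compare equal at one stage, evaluation should pass on to the next key part. If the result is less than or greater than, return it immediately.
You need something like this: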
int comparision = 0;
for (int i = 0; i < n; i++)
{
    // comparisionSign[i] is +1 for an ascending key part, -1 for a descending one
    comparision = a[i].CompareTo(b[i]) * comparisionSign[i];
    if (comparision != 0)
        return comparision;
}
return comparision;
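For comparison, LINQ can express a mixed-direction sort without building a composite key at all. The following is a minimal sketch rather than code from any of the answers; the class name and the use of string arrays as rows are assumptions made for illustration. StringComparer.Ordinal is passed explicitly so that strings are compared by code unit value instead of by culture rules.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MixedDirectionSort
{
    // Sorts rows by column 0 ascending, then by column 1 descending.
    public static List<string[]> Sort(IEnumerable<string[]> rows)
    {
        return rows
            .OrderBy(r => r[0], StringComparer.Ordinal)           // primary key, ascending
            .ThenByDescending(r => r[1], StringComparer.Ordinal)  // secondary key, descending
            .ToList();
    }
}
```

ThenByDescending only breaks ties left by the preceding OrderBy, so no key padding or byte reversal is needed with this approach.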
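The first call returns an IOrderedEnumerable, which can then be ordered by additional fields.

Answered my own question (though not satisfactorily). To construct a descending alphabetic key, I used the following code, then appended the subkey to the object's sort key: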
if ( reverse )
    subkey = encoding.GetString(encoding.GetBytes(subkey)
        .Select(x => (byte)(0x80 - x)).ToArray());
rowobj.sortKey.Append(subkey);
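Once I had the keys built, I couldn't just do this:
rowobjList.Sort();
because the default comparer does not use ASCII order (which my 0x80 - x trick relies on). So I had to implement IComparable<RowObject> using ordinal comparison: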
public int CompareTo(RowObject other)
{
    return String.Compare(this.sortKey, other.sortKey,
        StringComparison.Ordinal);
}
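This seems to work. I'm a little dissatisfied because the encoding and decoding of the string feels clunky in C#.

You can get where you want to be, although I'll admit I don't know whether there is a better overall approach.
The problem with translating the Perl method directly is that .NET simply won't let you be so laissez-faire with encodings. However, if, as you say, your data is all printable ASCII (that is, characters with Unicode code points in the range 32..127; note that there is no such thing as "8-bit ASCII"), then you can do this: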
key2 = encoding.GetString(encoding.GetBytes(key2).
Select(x => (byte)(32+95-(x-32))).ToArray());
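In this expression I have been explicit about what I'm doing:
- Take x (which I assume to be in 32..127)
- Map the range to 0..95 to make it zero-based
- Reverse by subtracting from 95
- Add 32 to map back to the printable range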
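The four steps above can be sketched as a small helper that works on characters directly, which avoids the round trip through an encoding. This is an illustration under stated assumptions, not the answer's own code: the class name, method name, and width parameter are hypothetical, and every input character is assumed to lie in 32..127.

```csharp
using System;

public static class DescendingKey
{
    // Builds a fixed-width key whose ordinal order is the reverse of the
    // ordinal order of the padded input. Assumes all characters are in 32..127.
    public static string Build(string value, int width)
    {
        // Pad before reversing so a shorter string is never a prefix of a longer key.
        string padded = value.PadRight(width, ' ');
        var result = new char[padded.Length];
        for (int i = 0; i < padded.Length; i++)
        {
            int x = padded[i];
            if (x < 32 || x > 127)
                throw new ArgumentException("Character outside 32..127 at index " + i, "value");
            // Shift to 0..95, reverse within that range, shift back to 32..127.
            result[i] = (char)(32 + 95 - (x - 32));
        }
        return new string(result);
    }
}
```

The resulting keys only sort as intended when compared ordinally, for example with String.CompareOrdinal or StringComparer.Ordinal.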
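The transformation is not very nice, but it does work.

If key computation is expensive, why compute a key at all? String comparison is not free in itself; it is an expensive loop over the characters, and it will not perform any better than a custom comparison loop.
In this test, the custom comparison sort performs roughly 3 times better than DSU.
Note that the DSU key computation is not measured in this test; the keys are precomputed.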
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Text;
using Microsoft.VisualStudio.TestTools.UnitTesting;

namespace DSUPatternTest
{
    [TestClass]
    public class DSUPatternPerformanceTest
    {
        public class Row
        {
            public int Qty;
            public string Name;
            public string Supplier;
            public string PrecomputedKey;

            public void ComputeKey()
            {
                // Do not need StringBuilder here, String.Concat does better job internally.
                PrecomputedKey =
                    Qty.ToString().PadLeft(4, '0') + " "
                    + Name.PadRight(12, ' ') + " "
                    + Supplier.PadRight(12, ' ');
            }

            public bool Equals(Row other)
            {
                if (ReferenceEquals(null, other)) return false;
                if (ReferenceEquals(this, other)) return true;
                return other.Qty == Qty && Equals(other.Name, Name) && Equals(other.Supplier, Supplier);
            }

            public override bool Equals(object obj)
            {
                if (ReferenceEquals(null, obj)) return false;
                if (ReferenceEquals(this, obj)) return true;
                if (obj.GetType() != typeof (Row)) return false;
                return Equals((Row) obj);
            }

            public override int GetHashCode()
            {
                unchecked
                {
                    int result = Qty;
                    result = (result*397) ^ (Name != null ? Name.GetHashCode() : 0);
                    result = (result*397) ^ (Supplier != null ? Supplier.GetHashCode() : 0);
                    return result;
                }
            }
        }

        public class RowComparer : IComparer<Row>
        {
            public int Compare(Row x, Row y)
            {
                int comparision;
                comparision = x.Qty.CompareTo(y.Qty);
                if (comparision != 0) return comparision;
                comparision = x.Name.CompareTo(y.Name);
                if (comparision != 0) return comparision;
                comparision = x.Supplier.CompareTo(y.Supplier);
                return comparision;
            }
        }

        [TestMethod]
        public void CustomLoopIsFaster()
        {
            var random = new Random();
            var rows = Enumerable.Range(0, 5000).Select(i =>
                new Row
                {
                    Qty = (int) (random.NextDouble()*9999),
                    Name = random.Next().ToString(),
                    Supplier = random.Next().ToString()
                }).ToList();
            foreach (var row in rows)
            {
                row.ComputeKey();
            }
            var dsuSw = Stopwatch.StartNew();
            var sortedByDSU = rows.OrderBy(i => i.PrecomputedKey).ToList();
            var dsuTime = dsuSw.ElapsedMilliseconds;
            var customSw = Stopwatch.StartNew();
            var sortedByCustom = rows.OrderBy(i => i, new RowComparer()).ToList();
            var customTime = customSw.ElapsedMilliseconds;
            Trace.WriteLine(dsuTime);
            Trace.WriteLine(customTime);
            CollectionAssert.AreEqual(sortedByDSU, sortedByCustom);
            Assert.IsTrue(dsuTime > customTime * 2.5);
        }
    }
}
var comparerChain = new ComparerChain<Row>()
    .By(r => r.Qty, false)
    .By(r => r.Name, false)
    .By(r => r.Supplier, false);
var sortedByCustom = rows.OrderBy(i => i, comparerChain).ToList();

public class ComparerChain<T> : IComparer<T>
{
    private List<PropComparer<T>> Comparers = new List<PropComparer<T>>();

    public int Compare(T x, T y)
    {
        foreach (var comparer in Comparers)
        {
            var result = comparer._f(x, y);
            if (result != 0)
                return result;
        }
        return 0;
    }

    public ComparerChain<T> By<Tp>(Func<T,Tp> property, bool descending) where Tp:IComparable<Tp>
    {
        Comparers.Add(PropComparer<T>.By(property, descending));
        return this;
    }
}

public class PropComparer<T>
{
    public Func<T, T, int> _f;

    public static PropComparer<T> By<Tp>(Func<T,Tp> property, bool descending) where Tp:IComparable<Tp>
    {
        Func<T, T, int> ascendingCompare = (a, b) => property(a).CompareTo(property(b));
        Func<T, T, int> descendingCompare = (a, b) => property(b).CompareTo(property(a));
        return new PropComparer<T>(descending ? descendingCompare : ascendingCompare);
    }

    public PropComparer(Func<T, T, int> f)
    {
        _f = f;
    }
}