C# algorithm for grouping consecutive numbers

I am trying to build an efficient algorithm that can process thousands of rows of data containing customers' zip codes. I then want to cross-check those zip codes against groupings of around 1000 zip codes, but I have about 100 columns of 1000 zip codes. Many of these zip codes are consecutive numbers, but there are also a lot of random zip codes mixed in. So what I would like to do is group the consecutive zip codes together, so that I can check whether a zip code falls within a range instead of checking it against every single zip code.

Sample data:
90001
90002
90003
90004
90005
90006
90007
90008
90009
90010
90012
90022
90031
90032
90033
90034
90041
This should be grouped as follows:
{ 90001-90010, 90012, 90022, 90031-90034, 90041 }
Here is my idea for the algorithm:
public struct gRange {
    public int start, end;
    public gRange(int a, int b) {
        start = a;
        if(b != null) end = b;
        else end = a;
    }
}

function groupZips(string[] zips){
    List<gRange> zipList = new List<gRange>();
    int currZip, prevZip, startRange, endRange;
    startRange = 0;
    bool inRange = false;
    for(int i = 1; i < zips.length; i++) {
        currZip = Convert.ToInt32(zips[i]);
        prevZip = Convert.ToInt32(zips[i-1]);
        if(currZip - prevZip == 1 && inRange == false) {
            inRange = true;
            startRange = prevZip;
            continue;
        }
        else if(currZip - prevZip == 1 && inRange == true) continue;
        else if(currZip - prevZip != 1 && inRange == true) {
            inRange = false;
            endRange = prevZip;
            zipList.add(new gRange(startRange, endRange));
            continue;
        }
        else if(currZip - prevZip != 1 && inRange == false) {
            zipList.add(new gRange(prevZip, prevZip));
        }
        //not sure how to handle the last case when i == zips.length-1
    }
}
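So as of now, I am not sure how to handle the last case, and looking at this algorithm, it does not strike me as very efficient. Is there a better/simpler way to group a set of numbers like this?

In this particular case, hashing will quite likely be faster. However, a range-based solution will use less memory, so it would be appropriate if your lists were very large (and I am not convinced there are enough possible zipcodes for any list of zipcodes to be large enough).

Anyway, here is a simpler logic for creating the range list and for finding out whether a target is in a range: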
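1. Make ranges a simple list of integers (or even zipcodes), and add the first element of zip (the sorted input list) as its first element.
2. For each element of zip except the last one, if that element plus one differs from the next element, add both that element plus one and the next element to ranges.
3. Add the last element of zip plus one to the end of ranges.

To find out whether a zipcode is in ranges, do a binary search in ranges for the smallest element that is greater than the target zipcode. [Note 1] If the index of that element is odd, the target is in one of the ranges; otherwise it is not.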
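The steps above can be sketched roughly as follows. This is a minimal sketch, not the answerer's own code: the names ZipRanges, Build and Contains are hypothetical, the input is assumed to be sorted and free of duplicates, and the boundary appended after the last element is what lets the odd/even test work for the final range.

```csharp
using System;
using System.Collections.Generic;

public static class ZipRanges
{
    // Builds the flat boundary list from a sorted, duplicate-free list of zips.
    // Even indexes hold the first zip of a run, odd indexes hold one past its last zip.
    public static List<int> Build(IReadOnlyList<int> sortedZips)
    {
        var ranges = new List<int>();
        if (sortedZips.Count == 0) return ranges;

        ranges.Add(sortedZips[0]);
        for (int i = 0; i < sortedZips.Count - 1; i++)
        {
            if (sortedZips[i] + 1 != sortedZips[i + 1])
            {
                ranges.Add(sortedZips[i] + 1);   // one past the end of the current run
                ranges.Add(sortedZips[i + 1]);   // start of the next run
            }
        }
        ranges.Add(sortedZips[sortedZips.Count - 1] + 1); // one past the end of the last run
        return ranges;
    }

    // True when the zip falls inside one of the runs encoded in ranges.
    public static bool Contains(List<int> ranges, int zip)
    {
        // List<T>.BinarySearch returns the index of a match, or the bitwise
        // complement of the index of the first larger element.
        int index = ranges.BinarySearch(zip);
        int firstGreater = index >= 0 ? index + 1 : ~index;
        return (firstGreater & 1) == 1;          // odd index: inside a run
    }
}
```

With the sample data from the question, Build produces the boundaries 90001, 90011, 90012, 90013, 90022, 90023, 90031, 90035, 90041, 90042, so Contains(ranges, 90005) is true while Contains(ranges, 90011) is false.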
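Notes:
1. As I understand it, the BinarySearch method on a C# List returns the index of the element found, or the bitwise complement of the index of the first larger element. To get the result the suggested algorithm needs, you could use something like index >= 0 ? index + 1 : ~index, but it might be simpler to search for the zipcode just below the target and then use the complement of the low-order bit of the result.

I believe you are overthinking this one. Just using Linq against an IEnumerable, you can search more than 80,000 records in less than 1/10 of a second.

I used the free CSV zip code list free-zipcode-database.csv for this: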
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

namespace ZipCodeSearchTest
{
    struct zipCodeEntry
    {
        public string ZipCode { get; set; }
        public string City { get; set; }
    }

    class Program
    {
        static void Main(string[] args)
        {
            List<zipCodeEntry> zipCodes = new List<zipCodeEntry>();
            string dataFileName = "free-zipcode-database.csv";

            // Load the zip code (column 1) and the city (column 3) of every CSV row.
            using (FileStream fs = new FileStream(dataFileName, FileMode.Open, FileAccess.Read))
            using (StreamReader sr = new StreamReader(fs))
                while (!sr.EndOfStream)
                {
                    string line = sr.ReadLine();
                    string[] lineVals = line.Split(',');
                    zipCodes.Add(new zipCodeEntry { ZipCode = lineVals[1].Trim(' ', '\"'), City = lineVals[3].Trim(' ', '\"') });
                }

            bool terminate = false;
            while (!terminate)
            {
                Console.WriteLine("Enter zip code:");
                var userEntry = Console.ReadLine();
                // "x" or "q" (in either case) ends the program
                if (userEntry.ToLower() == "x" || userEntry.ToLower() == "q")
                    terminate = true;
                else
                {
                    DateTime dtStart = DateTime.Now;
                    // Plain linear scan over all entries; short input is left-padded to five digits.
                    foreach (var arrayVal in zipCodes.Where(z => z.ZipCode == userEntry.PadLeft(5, '0')))
                        Console.WriteLine(string.Format("ZipCode: {0}", arrayVal.ZipCode).PadRight(20, ' ') + string.Format("City: {0}", arrayVal.City));
                    DateTime dtStop = DateTime.Now;
                    Console.WriteLine();
                    Console.WriteLine("Lookup time: {0}", dtStop.Subtract(dtStart).ToString());
                    Console.WriteLine("\n\n");
                }
            }
        }
    }
}

Here is an O(n) solution that works even if your zip codes are not guaranteed to be in order.

If you need the output groupings to be sorted, then the best you can do is O(n*log(n)), because somewhere you will have to sort something, but if grouping the zi...
using System;
using System.Collections.Generic;

// I'm assuming zipcodes are ints... convert if desired
// jumbled up your sample data to show that the code would still work
var zipcodes = new List<int>
{
    90012, 90033, 90009, 90001, 90005, 90004, 90041, 90008, 90007,
    90031, 90010, 90002, 90003, 90034, 90032, 90006, 90022,
};

// facilitate constant-time lookups of whether zipcodes are in your set
var zipHashSet = new HashSet<int>();
// lookup zipcode -> linked list node to remove item in constant time from the linked list
var nodeDictionary = new Dictionary<int, LinkedListNode<int>>();
// linked list for iterating and grouping your zip codes in linear time
// (the built-in LinkedList<int> is used here as the doubly linked list)
var zipLinkedList = new LinkedList<int>();

// initialize our data structures from the initial list
foreach (int zipcode in zipcodes)
{
    // skip duplicates so that every zip code owns exactly one node
    if (!zipHashSet.Add(zipcode)) continue;
    nodeDictionary[zipcode] = zipLinkedList.AddLast(zipcode);
}

// object to store the groupings (ex: "90001-90010", "90022")
var groupings = new HashSet<string>();

// iterate through the linked list, but skip nodes if we group it with a zip code
// that we found on a previous iteration of the loop
var node = zipLinkedList.First;
while (node != null)
{
    var bottomZipCode = node.Value;
    var topZipCode = bottomZipCode;

    // find the lowest zip code in this group
    while (zipHashSet.Contains(bottomZipCode - 1))
    {
        // delete node from linked list so we don't observe any node more than once
        zipLinkedList.Remove(nodeDictionary[bottomZipCode - 1]);
        // see if previous zip code is in our group, too
        bottomZipCode--;
    }
    // get string version zip code bottom of the range
    var bottom = bottomZipCode.ToString();

    // find the highest zip code in this group
    while (zipHashSet.Contains(topZipCode + 1))
    {
        // delete node from linked list so we don't observe any node more than once
        zipLinkedList.Remove(nodeDictionary[topZipCode + 1]);
        // see if next zip code is in our group, too
        topZipCode++;
    }
    // get string version zip code top of the range
    var top = topZipCode.ToString();

    // add grouping in correct format
    if (top == bottom)
    {
        groupings.Add(bottom);
    }
    else
    {
        groupings.Add(bottom + "-" + top);
    }

    // onward!
    node = node.Next;
}

// print results
foreach (var grouping in groupings)
{
    Console.WriteLine(grouping);
}
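
For comparison, when sorting the input is acceptable, the grouping from the question can also be sketched as a single pass over the sorted values. This is only a sketch under assumptions: ZipRange and ZipGrouper are hypothetical names, the input strings are assumed to parse as integers, and duplicates are dropped. Adding the still-open range once more after the loop is what takes care of the last element.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Inclusive range of zip codes; a single zip code has Start == End.
public readonly struct ZipRange
{
    public int Start { get; }
    public int End { get; }

    public ZipRange(int start, int end) { Start = start; End = end; }

    public bool Contains(int zip) => zip >= Start && zip <= End;

    public override string ToString() =>
        Start == End ? Start.ToString() : Start + "-" + End;
}

public static class ZipGrouper
{
    // Sorts the zip codes (O(n log n)) and merges consecutive values into ranges.
    public static List<ZipRange> Group(IEnumerable<string> zips)
    {
        var sorted = zips.Select(z => int.Parse(z)).Distinct().OrderBy(z => z).ToList();
        var result = new List<ZipRange>();
        if (sorted.Count == 0) return result;

        int start = sorted[0], prev = sorted[0];
        for (int i = 1; i < sorted.Count; i++)
        {
            if (sorted[i] != prev + 1)
            {
                // gap found: close the current range and open a new one
                result.Add(new ZipRange(start, prev));
                start = sorted[i];
            }
            prev = sorted[i];
        }
        result.Add(new ZipRange(start, prev)); // the range still open after the loop
        return result;
    }
}
```

Joining the result for the question's sample data with ", " gives 90001-90010, 90012, 90022, 90031-90034, 90041, which matches the grouping the question asks for.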