C# Consecutive Number Grouping Algorithm

c# algorithm

I'm trying to build an efficient algorithm that can process thousands of rows of data containing customers' zip codes. I then want to cross-check those zip codes against groupings of around 1000 zip codes each, and I have about 100 columns of 1000 zip codes. Many of these zip codes are consecutive numbers, but there are also plenty of random ones mixed in. So what I'd like to do is group the consecutive zip codes together, so that I can check whether a zip code falls within a range instead of checking it against every single zip code.

Sample data:

90001 90002 90003 90004 90005 90006 90007 90008 90009 90010 90012 90022 90031 90032 90033 90034 90041

This should be grouped as follows:

{ 90001-90010, 90012, 90022, 90031-90034, 90041 }

Here's my idea for the algorithm:

public struct gRange {
   public int start, end;

   // a is the first zip of the range, b the last (pass the same value twice for a single zip)
   public gRange(int a, int b) {
      start = a;
      end = b;
   }
}

static List<gRange> groupZips(string[] zips){
    List<gRange> zipList = new List<gRange>();
    int currZip, prevZip, startRange, endRange;
    startRange = 0;

    bool inRange = false;

    for(int i = 1; i < zips.Length; i++) {
        currZip = Convert.ToInt32(zips[i]);
        prevZip = Convert.ToInt32(zips[i-1]);

        if(currZip - prevZip == 1 && inRange == false) {
            // a new run of consecutive zips starts at prevZip
            inRange = true;
            startRange = prevZip;
            continue;
        }
        else if(currZip - prevZip == 1 && inRange == true) continue;
        else if(currZip - prevZip != 1 && inRange == true) {
            // the current run ended at prevZip
            inRange = false;
            endRange = prevZip;
            zipList.Add(new gRange(startRange, endRange));
            continue;
        }
        else if(currZip - prevZip != 1 && inRange == false) {
            // prevZip stands alone
            zipList.Add(new gRange(prevZip, prevZip));
        }
        //not sure how to handle the last case when i == zips.Length-1
    }
    return zipList;
}


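So, as of now, I'm not sure how to handle the last case, and looking at this algorithm, it doesn't strike me as very efficient. Is there a better/simpler way to sort a set of numbers like this into ranges?
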
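In this particular case, a hash will most likely be faster. However, a range-based solution uses much less memory, so it would be the appropriate choice if your lists were very large (and I'm not convinced there are enough possible zipcodes to make any list of zipcodes large enough).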
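
As a rough illustration of the hash approach, the sketch below indexes every zip code by the groups that contain it, so each customer zip costs one dictionary lookup no matter how many columns there are. The class name `ZipGroupLookup`, the method `BuildIndex`, and the group names and contents are all hypothetical; nothing here comes from the question's actual data.

```csharp
using System;
using System.Collections.Generic;

static class ZipGroupLookup
{
    // Maps each zip code to the names of the groups (columns) that contain it.
    public static Dictionary<int, List<string>> BuildIndex(
        IReadOnlyDictionary<string, int[]> groups)
    {
        var index = new Dictionary<int, List<string>>();
        foreach (var group in groups)
        {
            foreach (int zip in group.Value)
            {
                if (!index.TryGetValue(zip, out var names))
                {
                    names = new List<string>();
                    index[zip] = names;
                }
                names.Add(group.Key);
            }
        }
        return index;
    }

    static void Main()
    {
        // Hypothetical groups; the real data would have ~100 groups of ~1000 zips.
        var groups = new Dictionary<string, int[]>
        {
            ["GroupA"] = new[] { 90001, 90002, 90003 },
            ["GroupB"] = new[] { 90003, 90012 },
        };
        var index = BuildIndex(groups);

        foreach (int customerZip in new[] { 90003, 90012, 90050 })
        {
            string result = index.TryGetValue(customerZip, out var names)
                ? string.Join(", ", names)
                : "(no group)";
            Console.WriteLine($"{customerZip}: {result}");
        }
    }
}
```

With roughly 100,000 entries in total, an index like this should fit comfortably in memory, which is the trade-off the paragraph above weighs against ranges.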
Anyway, here is a simpler logic for creating the list of ranges and for finding out whether a target is in a range:
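Make ranges a simple list of integers (or even zipcodes), and push the first element of zip as its first element.

For each element of zip except the last one, if that element plus one is different from the next element, push both that element plus one and the next element onto ranges.

Push one more than the last element of zip at the end of ranges.

This assumes zip is sorted. ranges then alternates between the first zipcode of a range (even indices) and one past the last zipcode of that range (odd indices).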
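
The three steps above can be sketched as follows. This is only one possible reading of them: the names `RangeBoundaries` and `Build` are made up for the example, and the input is assumed to be sorted ascending with no duplicates.

```csharp
using System;
using System.Collections.Generic;

static class RangeBoundaries
{
    // Assumption: zips is sorted ascending and contains no duplicates.
    // Returns [start0, end0 + 1, start1, end1 + 1, ...].
    public static List<int> Build(IReadOnlyList<int> zips)
    {
        var ranges = new List<int>();
        if (zips.Count == 0) return ranges;

        ranges.Add(zips[0]);                      // start of the first range
        for (int i = 0; i < zips.Count - 1; i++)
        {
            if (zips[i] + 1 != zips[i + 1])
            {
                ranges.Add(zips[i] + 1);          // one past the end of the current range
                ranges.Add(zips[i + 1]);          // start of the next range
            }
        }
        ranges.Add(zips[zips.Count - 1] + 1);     // one past the end of the last range
        return ranges;
    }

    static void Main()
    {
        int[] zips = { 90001, 90002, 90003, 90005, 90009, 90010 };
        // prints "90001 90004 90005 90006 90009 90011"
        Console.WriteLine(string.Join(" ", Build(zips)));
    }
}
```

Because the last boundary is pushed after the loop, the final range needs no special case inside it.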


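Now, to find out whether a zipcode is in ranges, do a binary search in ranges for the smallest element that is greater than the target zipcode. [Note 1] If the index of that element is odd, the target is in one of the ranges; otherwise it is not.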

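Notes:
1. AIUI, the BinarySearch method on a C# List returns the index of the element found, or the bitwise complement of the index of the first larger element. To get the result needed by the suggested algorithm, you could use index >= 0 ? index + 1 : ~index, but it might be simpler to search for the target directly and use the complement of the low-order bit of the raw result: that complement is 1 exactly when the target lies inside a range.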
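
A minimal sketch of that lookup, assuming the boundary list alternates between range starts and one-past-the-end values as described earlier. `RangeLookup`, `Contains` and `ContainsLowBit` are names invented for this example; `List<T>.BinarySearch` is the only library call relied on. The second method is the low-order-bit shortcut, and `Main` checks that both agree.

```csharp
using System;
using System.Collections.Generic;

static class RangeLookup
{
    // boundaries: sorted, strictly increasing [start0, end0 + 1, start1, end1 + 1, ...]
    public static bool Contains(List<int> boundaries, int zip)
    {
        int index = boundaries.BinarySearch(zip);
        // Index of the smallest element strictly greater than zip.
        int firstGreater = index >= 0 ? index + 1 : ~index;
        // Odd index means the next boundary is an end marker, so zip is inside a range.
        return (firstGreater & 1) == 1;
    }

    // Same test using only the low-order bit of the raw BinarySearch result.
    public static bool ContainsLowBit(List<int> boundaries, int zip)
    {
        return (boundaries.BinarySearch(zip) & 1) == 0;
    }

    static void Main()
    {
        // Boundaries for the ranges 90001-90003, 90005 and 90009-90010.
        var boundaries = new List<int> { 90001, 90004, 90005, 90006, 90009, 90011 };
        foreach (int zip in new[] { 90000, 90001, 90003, 90004, 90005, 90010, 90011 })
        {
            bool inside = Contains(boundaries, zip);
            bool sameAnswer = inside == ContainsLowBit(boundaries, zip);
            Console.WriteLine($"{zip}: {inside} (shortcut agrees: {sameAnswer})");
        }
    }
}
```

Each lookup is O(log n) in the number of boundaries, and the list holds two integers per range rather than one per zip code.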
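
I believe you are overthinking this. Just using Linq against an IEnumerable can search 80,000+ records in less than 1/10 of a second.
I used a free CSV zip code list (free-zipcode-database.csv) for the test below: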
using System;
using System.IO;
using System.Collections.Generic;
using System.Linq;

namespace ZipCodeSearchTest
{
    struct zipCodeEntry
    {
        public string ZipCode { get; set; }
        public string City { get; set; }
    }
    class Program
    {
        static void Main(string[] args)
        {
            List<zipCodeEntry> zipCodes = new List<zipCodeEntry>();

            // load the whole CSV into memory; column 1 is the zip code, column 3 the city
            string dataFileName = "free-zipcode-database.csv";
            using (FileStream fs = new FileStream(dataFileName, FileMode.Open, FileAccess.Read))
            using (StreamReader sr = new StreamReader(fs))
                while (!sr.EndOfStream)
                {
                    string line = sr.ReadLine();
                    string[] lineVals = line.Split(',');
                    zipCodes.Add(new zipCodeEntry { ZipCode = lineVals[1].Trim(' ', '\"'), City = lineVals[3].Trim(' ', '\"') });
                }

            // prompt until the user enters "x" or "q"
            bool terminate = false;
            while (!terminate)
            {
                Console.WriteLine("Enter zip code:");
                var userEntry = Console.ReadLine();
                if (userEntry.ToLower() == "x" || userEntry.ToString() == "q")
                    terminate = true;
                else
                {
                    // plain linear scan with Linq, timed from start to finish
                    DateTime dtStart = DateTime.Now;
                    foreach (var arrayVal in zipCodes.Where(z => z.ZipCode == userEntry.PadLeft(5, '0')))
                        Console.WriteLine(string.Format("ZipCode: {0}", arrayVal.ZipCode).PadRight(20, ' ') + string.Format("City: {0}", arrayVal.City));
                    DateTime dtStop = DateTime.Now;
                    Console.WriteLine();
                    Console.WriteLine("Lookup time: {0}", dtStop.Subtract(dtStart).ToString());
                    Console.WriteLine("\n\n");
                }
            }
        }
    }
}

Here is an O(n) solution, even if your zip codes are not guaranteed to be in order.
If you need the output groupings to be sorted, then the best you can do is O(n*log(n)), because somewhere you'll have to sort something, but if grouping the zi…

using System;
using System.Collections.Generic;

// I'm assuming zipcodes are ints... convert if desired
// jumbled up your sample data to show that the code would still work
var zipcodes = new List<int>
{
    90012,
    90033,
    90009,
    90001,
    90005,
    90004,
    90041,
    90008,
    90007,
    90031,
    90010,
    90002,
    90003,
    90034,
    90032,
    90006,
    90022,
};

// facilitate constant-time lookups of whether zipcodes are in your set
var zipHashSet = new HashSet<int>();

// lookup zipcode -> linked list node to remove item in constant time from the linked list
var nodeDictionary = new Dictionary<int, LinkedListNode<int>>();

// linked list for iterating and grouping your zip codes in linear time
// (LinkedList<T> is doubly linked, so Remove(node) runs in constant time)
var zipLinkedList = new LinkedList<int>();

// initialize our datastructures from the initial list
foreach (int zipcode in zipcodes)
{
    // skip duplicates so every zip code maps to exactly one node
    if (!zipHashSet.Add(zipcode))
    {
        continue;
    }
    nodeDictionary[zipcode] = zipLinkedList.AddLast(zipcode);
}

// object to store the groupings (ex: "90001-90010", "90022")
var groupings = new HashSet<string>();

// iterate through the linked list, but skip nodes if we group it with a zip code
// that we found on a previous iteration of the loop
var node = zipLinkedList.First;
while (node != null)
{
    var bottomZipCode = node.Value;
    var topZipCode = bottomZipCode;

    // find the lowest zip code in this group
    while (zipHashSet.Contains(bottomZipCode - 1))
    {
        // delete node from linked list so we don't observe any node more than once
        zipLinkedList.Remove(nodeDictionary[bottomZipCode - 1]);

        // see if previous zip code is in our group, too
        bottomZipCode--;
    }
    // get string version zip code bottom of the range
    var bottom = bottomZipCode.ToString();

    // find the highest zip code in this group
    while (zipHashSet.Contains(topZipCode + 1))
    {
        // delete node from linked list so we don't observe any node more than once
        zipLinkedList.Remove(nodeDictionary[topZipCode + 1]);

        // see if next zip code is in our group, too
        topZipCode++;
    }

    // get string version zip code top of the range
    var top = topZipCode.ToString();

    // add grouping in correct format
    if (top == bottom)
    {
        groupings.Add(bottom);
    }
    else
    {
        groupings.Add(bottom + "-" + top);
    }

    // onward! (the current node is never removed, so Next is still valid)
    node = node.Next;
}


// print results
foreach (var grouping in groupings)
{
    Console.WriteLine(grouping);
}
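
If the groupings have to come out in ascending order, the O(n*log(n)) route mentioned earlier can be as simple as sorting first and scanning once. The sketch below is an illustrative alternative, not the code from any of the answers; `ZipGrouper` and `Group` are hypothetical names. Flushing the open range after the loop is also one way to deal with the "last case" the question asks about.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class ZipGrouper
{
    // Returns inclusive (Start, End) ranges in ascending order.
    public static List<(int Start, int End)> Group(IEnumerable<int> zips)
    {
        var sorted = zips.Distinct().OrderBy(z => z).ToList();
        var result = new List<(int Start, int End)>();
        if (sorted.Count == 0) return result;

        int start = sorted[0], prev = sorted[0];
        for (int i = 1; i < sorted.Count; i++)
        {
            if (sorted[i] != prev + 1)
            {
                result.Add((start, prev));   // the run ended at prev
                start = sorted[i];           // a new run begins here
            }
            prev = sorted[i];
        }
        result.Add((start, prev));           // flush the final run after the loop
        return result;
    }

    static void Main()
    {
        int[] zips =
        {
            90012, 90033, 90009, 90001, 90005, 90004, 90041, 90008, 90007,
            90031, 90010, 90002, 90003, 90034, 90032, 90006, 90022,
        };
        // prints 90001-90010, 90012, 90022, 90031-90034, 90041 (one per line)
        foreach (var (s, e) in Group(zips))
            Console.WriteLine(s == e ? s.ToString() : $"{s}-{e}");
    }
}
```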