Regex to match the string with the largest numeric value

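I am trying to find a way, using the regular expression below, to match only the string with the largest number in each group.

Input strings: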

    https://subdomain.domain.com/e8cf09b4763e03d208dfd21121baacd4/domain_p6amv8xJVr1qto1_500.txt
    https://subdomain.domain.com/e8cf09b4763e03d208dfd21121baacd4/domain_p6amv8xJVr1qto1_400.txt
    https://subdomain.domain.com/e8cf09b4763e03d208dfd21121baacd4/domain_p6amv8xJVr1qto1_250.txt
    https://subdomain.domain.com/e8cf09b4763e03d208dfd21121baacd4/domain_p6amv8xJVr1qto1_10.txt

    https://subdomain.domain.com/163c7b0508062729dsdk1f1e264210/domain_p6amv8xJVr1wvilqto2_640.txt
    https://subdomain.domain.com/163c7b0508062729dsdk1f1e264210/domain_p6amv8xJVr1wvilqto2_1280.txt
    https://subdomain.domain.com/163c7b0508062729dsdk1f1e264210/domain_p6amv8xJVr1wvilqto2_540.txt

    https://subdomain.domain.com/adfd386be957c3247/domain_p6amv8xJVr1wvilqto3_250.txt
    https://subdomain.domain.com/adfd386be957c3247/domain_p6amv8xJVr1wvilqto3_100.txt

    https://subdomain.domain.com/25e5ccd5e95ca2888a39b939f199b822/domain_p6amv8xJVr1ilqto4_640.txt
    https://subdomain.domain.com/25e5ccd5e95ca2888a39b939f199b822/domain_p6amv8xJVr1ilqto4_540.txt
    https://subdomain.domain.com/25e5ccd5e95ca2888a39b939f199b822/domain_p6amv8xJVr1ilqto4_980.csv

Expected output:

    https://subdomain.domain.com/e8cf09b4763e03d208dfd21121baacd4/domain_p6amv8xJVr1qto1_500.txt
    https://subdomain.domain.com/163c7b0508062729dsdk1f1e264210/domain_p6amv8xJVr1wvilqto2_1280.txt
    https://subdomain.domain.com/adfd386be957c3247/domain_p6amv8xJVr1wvilqto3_250.txt
    https://subdomain.domain.com/25e5ccd5e95ca2888a39b939f199b822/domain_p6amv8xJVr1ilqto4_980.csv

I am trying the expression below, but it matches every URL. How can I restrict the results to the ones I want?

    "https://subdomain.domain.com/([^,:"]+?([_\d]*?)).(txt|csv)"

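You could use a negated character class `[^,:"]+` to avoid matching a comma, colon, or double quote. I don't think you need the `?` quantifier there.

Then use an alternation to match 1+ digits followed by an underscore and any of the listed numbers: `(?:500|1280|980)`.

For the example data you don't have to match 0+ occurrences of an underscore or digit non-greedily with `[_\d]*?`; you can instead match 1+ digits followed by an underscore with `\d+_`.

Note: escape the dot as `\.` to match it literally.

    https://subdomain\.domain\.com/[^,:"]+\d+_(?:500|1280|980)\.(?:txt|csv)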
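To see what the hard-coded alternation actually selects, here is a minimal sketch that runs the pattern over a few of the sample URLs. The variable names and the added `^`/`$` anchors are illustrative choices, not part of the answer. Because 250 is not in the alternation, the `_250` URL of the third group is not matched; adding 250 would fix that line but would also accept the `_250` entry of the first group, so this approach only works when the maxima are known in advance.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// A few of the sample URLs from the question.
string[] urls =
{
    "https://subdomain.domain.com/e8cf09b4763e03d208dfd21121baacd4/domain_p6amv8xJVr1qto1_500.txt",
    "https://subdomain.domain.com/e8cf09b4763e03d208dfd21121baacd4/domain_p6amv8xJVr1qto1_400.txt",
    "https://subdomain.domain.com/163c7b0508062729dsdk1f1e264210/domain_p6amv8xJVr1wvilqto2_1280.txt",
    "https://subdomain.domain.com/adfd386be957c3247/domain_p6amv8xJVr1wvilqto3_250.txt",
    "https://subdomain.domain.com/25e5ccd5e95ca2888a39b939f199b822/domain_p6amv8xJVr1ilqto4_980.csv",
};

// The pattern from the answer; ^ and $ are added here so each URL must match as a whole.
var pattern = new Regex(@"^https://subdomain\.domain\.com/[^,:""]+\d+_(?:500|1280|980)\.(?:txt|csv)$");

// Keep only the URLs the pattern accepts.
string[] matched = urls.Where(u => pattern.IsMatch(u)).ToArray();

foreach (string url in matched)
    Console.WriteLine(url);
```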

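Once I learned that this is nearly impossible to achieve with a regex alone, I implemented it in C# using LINQ instead of a regex. Thanks to Burdui, whose suggestion I was trying when I came up with this: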

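    using System;
    using System.Collections.Generic;
    using System.Linq;

    // Returns, for each URL prefix (everything up to and including the last '_'),
    // the URL whose trailing number is the largest.
    // Assumes every URL ends in _<number>.<extension>.
    public List<string> FindUnique(List<string> urls)
    {
        return urls
            .Distinct()
            .GroupBy(u => u.Substring(0, u.LastIndexOf('_') + 1))
            .Select(g => g.OrderByDescending(u => TrailingNumber(u)).First())
            .ToList();
    }

    // Parses the digits between the last '_' and the following '.'.
    // Comparing parsed numbers avoids the substring check (Contains), which
    // could pick the wrong URL when the number also occurs elsewhere in it.
    private static int TrailingNumber(string url)
    {
        string tail = url.Substring(url.LastIndexOf('_') + 1);
        return int.Parse(tail.Split('.')[0]);
    }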

If your blocks really are grouped as in your question, this is fairly easy to do with a regex:

    @"(?m)(?:^[^\S\r\n]*(https?://\S+?_)(\d+)\.(txt|csv)[^\S\r\n]*$\r?\n)+(?=\s*\r\n|$)"

Explanation

 (?m)
 (?:                           # Cluster group for block
      ^                             # BOL
      [^\S\r\n]*                    # Optional horizontal whitespace
      ( https?:// \S+? _ )          # (1), Location
      ( \d+ )                       # (2), Number
      \. 
      ( txt | csv )                 # (3), Extension
      [^\S\r\n]*                    # Optional horizontal whitespace
      $ \r? \n                      # EOL plus linebreak
 )+                            # End cluster, 1 to many times
 (?= \s* \r \n | $ )           # Lookahead to determine where the end of block is

C# code sample

using System;
using System.Text.RegularExpressions;

var str =
"    https://subdomain.domain.com/e8cf09b4763e03d208dfd21121baacd4/domain_p6amv8xJVr1qto1_500.txt\n" + 
"    https://subdomain.domain.com/e8cf09b4763e03d208dfd21121baacd4/domain_p6amv8xJVr1qto1_400.txt\n" +
"    https://subdomain.domain.com/e8cf09b4763e03d208dfd21121baacd4/domain_p6amv8xJVr1qto1_250.txt\n" +
"    https://subdomain.domain.com/e8cf09b4763e03d208dfd21121baacd4/domain_p6amv8xJVr1qto1_10.txt\n" +
"\n" +
"    https://subdomain.domain.com/163c7b0508062729dsdk1f1e264210/domain_p6amv8xJVr1wvilqto2_640.txt\n" +
"    https://subdomain.domain.com/163c7b0508062729dsdk1f1e264210/domain_p6amv8xJVr1wvilqto2_1280.txt\n" +
"    https://subdomain.domain.com/163c7b0508062729dsdk1f1e264210/domain_p6amv8xJVr1wvilqto2_540.txt\n" +
"\n" +
"    https://subdomain.domain.com/adfd386be957c3247/domain_p6amv8xJVr1wvilqto3_250.txt\n" +
"    https://subdomain.domain.com/adfd386be957c3247/domain_p6amv8xJVr1wvilqto3_100.txt\n" +
"\n" +
"    https://subdomain.domain.com/25e5ccd5e95ca2888a39b939f199b822/domain_p6amv8xJVr1ilqto4_640.txt\n" +
"    https://subdomain.domain.com/25e5ccd5e95ca2888a39b939f199b822/domain_p6amv8xJVr1ilqto4_540.txt\n" +
"    https://subdomain.domain.com/25e5ccd5e95ca2888a39b939f199b822/domain_p6amv8xJVr1ilqto4_980.csv\n" +
"\n";

// This regex matches a block each time
var RxBlock = new Regex(@"(?m)(?:^[^\S\r\n]*(https?://\S+?_)(\d+)\.(txt|csv)[^\S\r\n]*$\r?\n)+(?=\s*\r\n|$)");

Match M = RxBlock.Match(str);
while (M.Success)
{
    CaptureCollection ccFileLoc = M.Groups[1].Captures;  // location
    CaptureCollection ccFileNum = M.Groups[2].Captures;  // number
    CaptureCollection ccFileExt = M.Groups[3].Captures;  // extension

    String Loc = ccFileLoc[0].Value;
    String Ext = ccFileExt[0].Value;
    int Largest = 0;
    bool bValid = true;

    if (Int32.TryParse(ccFileNum[0].Value, out Largest))
    {
        int cur_num = 0;
        int cnt = ccFileLoc.Count;

        for (int i = 0; bValid && i < cnt; i++)
        {
            if (!Int32.TryParse(ccFileNum[i].Value, out cur_num) || ccFileLoc[i].Value != Loc)
                bValid = false;
            else
            if (cur_num > Largest)
            {
                Largest = cur_num;
                Ext = ccFileExt[i].Value;
            }
        }
    }
    else
        bValid = false;

    if ( bValid )
        Console.WriteLine("{0}{1}.{2} ", Loc, Largest, Ext);

    M = M.NextMatch();
}

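Even if your data is not sorted, you can still use a regex this way. The lines would have to be sorted first, and the regex would then need a slight modification. If you want to do it that way, let me know and I can probably show you how.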
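The answer does not show that modification, so the following is only one possible reading, offered as a sketch. It assumes `\n` line endings and no blank lines after sorting, and it replaces the blank-line lookahead with a backreference so that a run ends as soon as the prefix changes. The names `lines`, `rxRun`, and `results` and the named groups are illustrative.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Unsorted sample lines (a shuffled subset of the question's input).
string[] lines =
{
    "https://subdomain.domain.com/adfd386be957c3247/domain_p6amv8xJVr1wvilqto3_100.txt",
    "https://subdomain.domain.com/e8cf09b4763e03d208dfd21121baacd4/domain_p6amv8xJVr1qto1_400.txt",
    "https://subdomain.domain.com/adfd386be957c3247/domain_p6amv8xJVr1wvilqto3_250.txt",
    "https://subdomain.domain.com/e8cf09b4763e03d208dfd21121baacd4/domain_p6amv8xJVr1qto1_500.txt",
    "https://subdomain.domain.com/e8cf09b4763e03d208dfd21121baacd4/domain_p6amv8xJVr1qto1_10.txt",
};

// Ordinal sorting puts lines that share a prefix next to each other.
string sorted = string.Join("\n", lines.OrderBy(l => l, StringComparer.Ordinal));

// The first line of a run captures the prefix; following lines must repeat it via \k<loc>.
// .NET lets the same group name appear twice and accumulates the captures.
var rxRun = new Regex(
    @"(?m)^(?<loc>\S+_)(?<num>\d+)\.(?<ext>txt|csv)$" +
    @"(?:\n\k<loc>(?<num>\d+)\.(?<ext>txt|csv)$)*");

var results = new List<string>();
foreach (Match m in rxRun.Matches(sorted))
{
    CaptureCollection nums = m.Groups["num"].Captures;
    CaptureCollection exts = m.Groups["ext"].Captures;

    // Find the index of the largest number in this run.
    int best = 0;
    for (int i = 1; i < nums.Count; i++)
        if (int.Parse(nums[i].Value) > int.Parse(nums[best].Value))
            best = i;

    results.Add(m.Groups["loc"].Value + nums[best].Value + "." + exts[best].Value);
}

foreach (string r in results)
    Console.WriteLine(r);
```

The numeric comparison is still done in code; the regex only delimits each run of lines that share a prefix.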

If the only difference is the number at the end, then try

    https://subdomain\.domain\.com/[^,:"]+(?:500|1280|980)\.(?:txt|csv)

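Can you explain what criterion you are using to filter? I could write the regex `980\.csv$|((25|50|128)0\.txt)$` and it would filter your input, depending on the language.

How can you all answer without knowing which language he is using? I think it's Python, but I'm not sure.

What you want is beyond what a regular expression can do. Use a programming language to group the results by the first capture group, then extract the maximum by the second capture group.

Did you say what you want? I didn't see that stated.

That was a detailed and wonderful explanation. Thank you for the effort; it helped me solve the problem in several ways. Thanks again.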
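The suggestion in the comments (group by the first capture group, then take the maximum of the second) might look roughly like this in C#. This is a sketch under assumptions: the pattern, the variable names, and the anonymous-type properties `Url` and `Hit` are illustrative, and every URL is assumed to end in `_<number>.txt` or `_<number>.csv`.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Two of the groups from the question's input.
string[] urls =
{
    "https://subdomain.domain.com/163c7b0508062729dsdk1f1e264210/domain_p6amv8xJVr1wvilqto2_640.txt",
    "https://subdomain.domain.com/163c7b0508062729dsdk1f1e264210/domain_p6amv8xJVr1wvilqto2_1280.txt",
    "https://subdomain.domain.com/163c7b0508062729dsdk1f1e264210/domain_p6amv8xJVr1wvilqto2_540.txt",
    "https://subdomain.domain.com/25e5ccd5e95ca2888a39b939f199b822/domain_p6amv8xJVr1ilqto4_640.txt",
    "https://subdomain.domain.com/25e5ccd5e95ca2888a39b939f199b822/domain_p6amv8xJVr1ilqto4_540.txt",
    "https://subdomain.domain.com/25e5ccd5e95ca2888a39b939f199b822/domain_p6amv8xJVr1ilqto4_980.csv",
};

// Group 1: everything up to and including the last underscore.
// Group 2: the trailing number.
var rx = new Regex(@"^(.+_)(\d+)\.(?:txt|csv)$");

string[] largest = urls
    .Select(u => new { Url = u, Hit = rx.Match(u) })
    .Where(x => x.Hit.Success)                       // skip lines that do not fit the pattern
    .GroupBy(x => x.Hit.Groups[1].Value)             // group by the shared prefix
    .Select(g => g.OrderByDescending(x => int.Parse(x.Hit.Groups[2].Value)).First().Url)
    .ToArray();

foreach (string url in largest)
    Console.WriteLine(url);
```

The regex only splits each URL into prefix and number; the comparison itself is done numerically in code, which is the part a regex cannot express.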
