
Regex: parsing link tags with regular-expression groups


I want to parse a set of link tags and output two specific parts of each one.

<a href='/mysite/test/sample2/_layouts/ListEdit.aspx?List={2A1D7816-6AC1-4B3B-B9E9-9EEF1B31F812}' onclick='GoToLink(this);return false;'>Customize &quot;Sample List&quot;</a>

I need to capture the GUID "2A1D7816-6AC1-4B3B-B9E9-9EEF1B31F812" and part of the tag content, in this case "Sample List".

I can match each of them into a separate list using the following:

For guid: [a-fA-F0-9]{8}-([a-fA-F0-9]{4}-){3}[a-fA-F0-9]{12}
For tag content: (?<=Customize &quot;)((.*)(?=&quot;))
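
For readers who want to try the two expressions above, here is a minimal Python sketch that applies them separately to the sample tag. The variable names are illustrative, and the inner group of the GUID pattern is made non-capturing here (an adjustment, not part of the question) so that findall returns whole matches.

```python
import re

# The sample tag from the question, split across lines for readability.
tag = ("<a href='/mysite/test/sample2/_layouts/ListEdit.aspx?"
       "List={2A1D7816-6AC1-4B3B-B9E9-9EEF1B31F812}' "
       "onclick='GoToLink(this);return false;'>"
       "Customize &quot;Sample List&quot;</a>")

# GUID pattern from the question; (?:...) keeps findall returning full matches.
GUID_RE = re.compile(r"[a-fA-F0-9]{8}-(?:[a-fA-F0-9]{4}-){3}[a-fA-F0-9]{12}")

# Tag-content pattern from the question: text after 'Customize &quot;'
# and before the closing '&quot;'.
CONTENT_RE = re.compile(r"(?<=Customize &quot;).*(?=&quot;)")

guids = GUID_RE.findall(tag)
contents = CONTENT_RE.findall(tag)

print(guids)
print(contents)
```

This yields two independent lists, which is the limitation the question describes: nothing ties a given GUID to the text that came from the same tag.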
Description

1: (2A1D7816-6AC1-4B3B-B9E9-9EEF1B31F812)
2: (Customize "Sample List")
3: (")
4: (Sample List)
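
The listing above shows what each capture group holds. As a rough illustration only, and not the expression used in this answer, the following Python sketch captures both values in one pass with a single pattern. The group names guid and label and the pattern itself are assumptions made for this example.

```python
import re

tag = ("<a href='/mysite/test/sample2/_layouts/ListEdit.aspx?"
       "List={2A1D7816-6AC1-4B3B-B9E9-9EEF1B31F812}' "
       "onclick='GoToLink(this);return false;'>"
       "Customize &quot;Sample List&quot;</a>")

# Hypothetical combined pattern (an assumption, not the answer's regex):
#   guid  -> the GUID between the braces after "List="
#   label -> the text between the two &quot; entities in the tag body
PAIR_RE = re.compile(
    r"List=\{(?P<guid>[a-fA-F0-9]{8}-(?:[a-fA-F0-9]{4}-){3}[a-fA-F0-9]{12})\}"
    r"[^>]*>"                               # rest of the opening tag
    r"[^<]*?&quot;(?P<label>[^<]*?)&quot;"  # quoted part of the tag body
)

m = PAIR_RE.search(tag)
guid, label = m.group("guid"), m.group("label")
print(guid, label)
```

Because both groups come from the same match, the GUID and the label stay paired, which separate searches do not guarantee.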
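Disclaimer
There are edge cases this will not handle, but for input text similar to the sample shown here it should work. If it does not, you should really use an HTML parser.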
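
As a sketch of what the HTML-parsing route could look like, the following uses Python's standard html.parser module. The LinkCollector class and its attribute names are hypothetical, introduced only for this illustration; the two sample links are the ones used elsewhere in this thread.

```python
import re
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects (href, text) pairs for every <a> element fed to it."""

    def __init__(self):
        # convert_charrefs=True turns &quot; into a literal " in the text.
        super().__init__(convert_charrefs=True)
        self.links = []    # finished (href, text) pairs
        self._href = None  # href of the <a> currently open, if any
        self._text = []    # text chunks seen inside the open <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href") or ""
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text)))
            self._href = None


html = """
<a href='/mysite/test/sample2/_layouts/ListEdit.aspx?List={2A1D7816-6AC1-4B3B-B9E9-9EEF1B31F812}' onclick='GoToLink(this);return false;'>Customize &quot;Sample List&quot;</a>
bla bla
<a href='/mysite/test/sample2/_layouts/ListEdit.aspx?List={21M31F46-937B-88B3-U7Z1-99DFJZ9N249A}' onclick='GoToLink(this);return false;'>Another &quot;This is it&quot;</a>
"""

parser = LinkCollector()
parser.feed(html)
parser.close()

results = []
for href, text in parser.links:
    guid = re.search(r"List=\{([^}]*)\}", href)  # value between the braces
    quoted = re.search(r'"([^"]*)"', text)       # first double-quoted part
    if guid and quoted:
        results.append((guid.group(1), quoted.group(1)))

print(results)
```

The parser takes care of attribute quoting and entity decoding, so the regular expressions only have to deal with two short, already-isolated strings.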

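I don't know Perl, so I can't write this script in Perl right now. It is written in Python and should be very simple. If you know Perl, I'm sure you can translate it. I hope you appreciate the effort.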

The script first finds all the links, then searches each link for the GUID and the quoted part of the tag content.

import re

sample_str = """
<a href='/mysite/test/sample2/_layouts/ListEdit.aspx?List={2A1D7816-6AC1-4B3B-B9E9-9EEF1B31F812}' onclick='GoToLink(this);return false;'>Customize &quot;Sample List&quot;</a>
bla bla
<a href='/mysite/test/sample2/_layouts/ListEdit.aspx?List={21M31F46-937B-88B3-U7Z1-99DFJZ9N249A}' onclick='GoToLink(this);return false;'>Another &quot;This is it&quot;</a>
"""

# Find every <a ...>...</a> element; the non-greedy .*? keeps links separate.
links = re.findall(r'<a .*?</a>', sample_str)

for link in links:
    print('link:')
    print('    ' + link)
    print('list:')
    # The GUID is whatever sits between the braces after "List=".
    print('    ' + re.search(r'List=\{([^}]*)\}', link).group(1))
    print('quoted text:')
    # The text between the two &quot; entities inside the tag body.
    print('    ' + re.search(r'>[^<]*&quot;([^<]+)&quot;[^<]*</a>', link).group(1))
    print('')
If you have Python installed, you can easily run the script from the command line with
python scriptname.py

link:
    <a href='/mysite/test/sample2/_layouts/ListEdit.aspx?List={2A1D7816-6AC1-4B3B-B9E9-9EEF1B31F812}' onclick='GoToLink(this);return false;'>Customize &quot;Sample List&quot;</a>
list:
    2A1D7816-6AC1-4B3B-B9E9-9EEF1B31F812
quoted text:
    Sample List

link:
    <a href='/mysite/test/sample2/_layouts/ListEdit.aspx?List={21M31F46-937B-88B3-U7Z1-99DFJZ9N249A}' onclick='GoToLink(this);return false;'>Another &quot;This is it&quot;</a>
list:
    21M31F46-937B-88B3-U7Z1-99DFJZ9N249A
quoted text:
    This is it