Regular expression for a pattern in Python


python python-3.x


I'm a beginner with regular expressions and I'm trying to search for a specific numeric pattern. The following data is embedded in XML:

<Tag Name="DUT_1_PC" TagType="Base" DataType="Power" Constant="false" ExternalAccess="Read/Write">
<Data Format="xx">
<![CDATA[[10247,20000,1705,0,16384,16384,[0,0,0,0,0,0,0],[[0,0,0],[1965615,2000,2000],[1952824,50000,0],[0,10000,0],[1928064,500,0
        ],[1928064,10000,0],[0,0,0],[0,0,0],[0,0,0],[0,0,0]],4,1705,[24779,24760,24760,24760,24780,24740,24760,24780,24760,24800,24740
        ,24740,24740,24780,24740,24740,24800,24780,24760,24760,24740,24780,24760,24760,24740,24740,24780,24760
        ,24740,24740,24779,24779,24760,24819,24780,24740,24759,24780,24760,24740,24720,24780,24780,24760,24760
        ,24740,24779,24780,24740,24760,24820,24780,24740,24780,24760,24780,24780,24760,24781,24719,24779,24800
        ,24780,24780,24760,24760,24799,24780,24780,24780,24739,24780,24780,24740,24779,24741,24780,24780,24760
        ,24740,24740,24720,24740,24780,24740,24720,24760,24800,24740,24760,24760,24800,24740,24780,24760,24740,24760,24740,24740,24740,24780,24760,24780,24739,24761,24760,24800,24780,24740,24719,24739,24760,24760]]]]


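The requirement is to extract the data (the innermost list). In this example, that is the list starting at 24779 and ending at 24760.
Note: the data will not necessarily start with "24" every time.
So I plan to extract it with the following logic:
If the tag (in this case DUT_1_PC) has valid, non-zero data and the count of valid values is greater than 100, split the data on commas and extract that list together with its tag name (DUT_1_PC).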
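The filtering rule above could look roughly like this in Python. This is a minimal sketch, assuming the candidate list has already been pulled out as a comma-separated string; the function keep_if_valid and its min_count parameter are hypothetical names, not something from the question.

```python
def keep_if_valid(tag_name, raw, min_count=100):
    """Hypothetical helper: return (tag_name, values) when `raw` holds more
    than `min_count` non-zero integers, otherwise None.
    `raw` is a comma-separated string such as '24779,24760'."""
    # Split on commas; int() ignores the newlines/indentation from the XML layout
    values = [int(part) for part in raw.split(",") if part.strip()]
    # "Valid" is taken to mean non-zero, as described in the question
    nonzero = [v for v in values if v != 0]
    if len(nonzero) > min_count:
        return tag_name, values
    return None


# Usage with made-up data: 120 non-zero readings pass, a list of zeros does not
print(keep_if_valid("DUT_1_PC", ",".join(["24760"] * 120)) is not None)  # True
print(keep_if_valid("DUT_1_PC", "0,0,0"))  # None
```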
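I haven't been able to extract the data I need. I tried the pattern r'\d+(?:[\d,.]*\d)' with re.findall, but it extracts all of the list data, which doesn't meet my requirement.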
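To illustrate why that pattern over-matches, here it is run against a shortened, made-up sample with the same kind of nesting. Every comma-separated run of digits becomes its own match, and nothing in the pattern singles out the last list.

```python
import re

# Shortened, made-up sample with nested lists like the question's CDATA payload
sample = "[[10247,20000,[0,0,0],[1965615,2000,2000]],4,[24779,24760,24800]]"

# The attempted pattern: digits, then any run of digits/commas/dots ending in a
# digit. It needs at least two digits, so the lone "4" is skipped entirely.
chunks = re.findall(r'\d+(?:[\d,.]*\d)', sample)
print(chunks)
# ['10247,20000', '0,0,0', '1965615,2000,2000', '24779,24760,24800']
```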
Can anyone help me figure out a regex that extracts the required data along with its tag?
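It's not entirely clear what you mean by the innermost list. In your example, [0,0,0] is nested more deeply than the list you mention. Assuming you mean the last list, re.findall(r'\[([^]]*)]*]]>$', DATA, re.MULTILINE)
(which finds every [...] list whose closing brackets are followed by "]]>" at the end of a line) will find all of them.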
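A small runnable sketch of that regex applied directly to the raw XML text, using a shortened, made-up stand-in for the question's data. Because [^]] also matches newlines, a list that wraps across lines is still captured in one piece; the split and int() conversion at the end is an addition for convenience, not part of the answer.

```python
import re

# Shortened, made-up stand-in for the question's <Data> element
DATA = """<Data Format="xx">
<![CDATA[[10247,20000,[[0,0,0],[1928064,500,0
        ]],4,1705,[24779,24760,24760
        ,24740,24780]]]]>
</Data>"""

# Capture everything between a '[' and the run of ']' that ends in ']]>' at a
# line end; re.MULTILINE makes $ match before each newline.
pattern = re.compile(r'\[([^]]*)]*]]>$', re.MULTILINE)

# int() tolerates the newlines and indentation left inside the captured text
results = [[int(x) for x in raw.split(",")] for raw in pattern.findall(DATA)]
print(results)  # [[24779, 24760, 24760, 24740, 24780]]
```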
However, as others have mentioned, using an XML parser is much better:
DATA = """
<Outer>
<Tag Name="DUT_1_PC" TagType="Base" DataType="Power" Constant="false" ExternalAccess="Read/Write">
<Data Format="xx">
<![CDATA[[10247,20000,1705,0,16384,16384,[0,0,0,0,0,0,0],[[0,0,0],[1965615,2000,2000],[1952824,50000,0],[0,10000,0],[1928064,500,0
        ],[1928064,10000,0],[0,0,0],[0,0,0],[0,0,0],[0,0,0]],4,1705,[24779,24760,24760,24760,24780,24740,24760,24780,24760,24800,24740
        ,24740,24740,24780,24740,24740,24800,24780,24760,24760,24740,24780,24760,24760,24740,24740,24780,24760
        ,24740,24740,24779,24779,24760,24819,24780,24740,24759,24780,24760,24740,24720,24780,24780,24760,24760
        ,24740,24779,24780,24740,24760,24820,24780,24740,24780,24760,24780,24780,24760,24781,24719,24779,24800
        ,24780,24780,24760,24760,24799,24780,24780,24780,24739,24780,24780,24740,24779,24741,24780,24780,24760
        ,24740,24740,24720,24740,24780,24740,24720,24760,24800,24740,24760,24760,24800,24740,24780,24760,24740,24760,24740,24740,24740,24780,24760,24780,24739,24761,24760,24800,24780,24740,24719,24739,24760,24760]]]]>
</Data>
</Tag> 
</Outer>     
"""

import re
import xml.etree.ElementTree as ET

# Find the [...] list that closes at the end of a line (re.MULTILINE makes $ match at every line end)
pattern = re.compile(r'\[([^]]*)]+$', re.MULTILINE)
parsed = ET.fromstring(DATA)
for tag in parsed.findall('Tag'):
    if tag.attrib.get('Name') == 'DUT_1_PC':
        print(re.findall(pattern, tag.find('Data').text))
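A different technique, not the one used above: the CDATA payload in the question happens to be a valid JSON array, so once ElementTree has returned it as text, json.loads can parse the whole structure and no regex is needed. This is a sketch assuming the wanted readings are always the last element of the outer list; the sample XML is shortened and made up.

```python
import json
import xml.etree.ElementTree as ET

# Shortened, made-up sample in the same shape as the question's XML
DATA = """<Outer>
<Tag Name="DUT_1_PC" TagType="Base" DataType="Power">
<Data Format="xx">
<![CDATA[[10247,20000,[[0,0,0],[1928064,500,0
        ]],4,1705,[24779,24760,24760
        ,24740,24780]]]]>
</Data>
</Tag>
</Outer>"""

root = ET.fromstring(DATA)
extracted = {}
for tag in root.iter("Tag"):
    data = tag.find("Data")
    if data is None or data.text is None:
        continue
    # ElementTree hands back CDATA as plain text; json.loads skips the whitespace
    payload = json.loads(data.text)
    # Assumption: the readings are the last element of the outer list
    readings = payload[-1]
    if isinstance(readings, list):
        extracted[tag.get("Name")] = readings

print(extracted)  # {'DUT_1_PC': [24779, 24760, 24760, 24740, 24780]}
```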

Why not use an XML parser? If you already know pandas, manipulating the data there is easy.