Python: Iterating over files by matching regular expressions


python regex sorting


I'm writing a script that automatically parses some text data (which has a complex structure) and inserts it into a MySQL database.

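I'd like to have several for loops that iterate over a list of files, driven by regular expressions matched against the file names. At the end, I'll concatenate the files and insert them into the database.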
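If the file list has to be built first, one option is to filter a directory listing with a regex. This is a minimal sketch assuming all the files sit in one directory; `matching_files` is a hypothetical helper name, not something from the question.

```python
import os
import re


def matching_files(directory, pattern):
    """Return the sorted names of regular files in `directory` whose whole name matches `pattern`.

    `matching_files` is a hypothetical helper used only for illustration.
    """
    regex = re.compile(pattern)
    return sorted(
        name
        for name in os.listdir(directory)
        # fullmatch() requires the entire name to match, not just a prefix.
        if regex.fullmatch(name) and os.path.isfile(os.path.join(directory, name))
    )


# Example (the directory name "data" is an assumption):
# fileNameList = matching_files("data", r"ecd_[a-zA-Z0-9]+_[0-9]{10}_[0-9]{3}\.csv")
```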

Here are my regular expressions:

import re

Trgx = re.compile(r'([a-zA-Z0-9]{3,4})_.*_.*_.*$')
Dtrgx = re.compile(r'[a-zA-Z0-9]{3,4}_[a-zA-Z0-9]{3,4}_([0-9]{10})_[0-9]{3}')
Mrgx = re.compile(r'.*_([a-zA-Z0-9]{3,4})_.*$')
Hrgx = re.compile(r'.*([0-9]{3})\.csv$')
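A side note on the last pattern: in a regex an unescaped `.` matches any single character, so a pattern ending in `.csv$` also accepts a name like `ecd_cdd_2012102100_000xcsv`. Escaping it as `\.csv$` restricts the match to a literal dot. The second sample name below is made up to show the difference.

```python
import re

# Unescaped dot: '.' matches any character before "csv".
loose = re.compile(r'.*([0-9]{3}).csv$')
# Escaped dot: only a literal '.' is accepted before "csv".
strict = re.compile(r'.*([0-9]{3})\.csv$')

for name in ["ecd_cdd_2012102100_000.csv", "ecd_cdd_2012102100_000xcsv"]:
    print(name, bool(loose.match(name)), bool(strict.match(name)))
```

Both patterns accept the real file name; only the loose one also accepts the `xcsv` variant.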

My file names look like this:

ecd_cdd_2012102100_000.csv
ecd_cdd_2012102100_024.csv
ecd_hdd_2012102200_000.csv
ecd_hdd_2012102200_024.csv
ecd_hdd_2012102200_048.csv
ecd_avgt_2012102200_120.csv
ecd_avgt_2012102200_144.csv
ecd_avgt_2012102200_168.csv
ecd_mint_2012102200_192.csv
ecd_maxt_2012102200_144.csv
ecd_maxt_2012102200_168.csv
ecd_cdd_2012102200_000.csv
ecd_cdd_2012102200_024.csv


Each expression captures one part of the file name:

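Trgx captures the first part ("ecd" in every file here)
Mrgx captures the second part ("cdd", "hdd", "avgt", etc.)
Dtrgx captures the date/time part (e.g. 2012102100)
Hrgx captures the last part before the extension (e.g. 000 or 024)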

Every file name will match every regex, but each regex fills .group(1) with a different value.
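To see what `.group(1)` holds for each pattern, a quick check like the one below can help. It is a sketch that assumes every name matches all four patterns, as stated above; `parts` is a hypothetical helper, the patterns are written as raw strings, and the dot before `csv` is escaped.

```python
import re

# The four patterns from the question, as raw strings.
Trgx = re.compile(r'([a-zA-Z0-9]{3,4})_.*_.*_.*$')
Dtrgx = re.compile(r'[a-zA-Z0-9]{3,4}_[a-zA-Z0-9]{3,4}_([0-9]{10})_[0-9]{3}')
Mrgx = re.compile(r'.*_([a-zA-Z0-9]{3,4})_.*$')
Hrgx = re.compile(r'.*([0-9]{3})\.csv$')


def parts(fileName):
    """Return (T, M, Dt, H): group(1) of each pattern for one file name.

    Assumes the name matches every pattern; otherwise .match() returns None
    and .group() raises AttributeError.
    """
    return tuple(rx.match(fileName).group(1) for rx in (Trgx, Mrgx, Dtrgx, Hrgx))


for name in ["ecd_cdd_2012102100_000.csv", "ecd_avgt_2012102200_120.csv"]:
    print(name, parts(name))
```

Mrgx still finds the second field even though its leading `.*` is greedy, because backtracking moves it back to the last `_` that is followed by 3 to 4 alphanumerics and another `_`.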

I want to iterate over the files using the regexes as "grouping" keys, so that I can concatenate them in the correct order.
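One way to get a well-defined order is to sort on a tuple built from the captured parts, since Python compares tuples element by element. A minimal sketch: the key order (T, M, Dt, H) is an assumption about which field should vary slowest, and `FILENAME_RX` and `sort_key` are hypothetical names. A single pattern with four groups is used here only to keep the example short.

```python
import re

# Hypothetical pattern with one capture group per field: T, M, Dt, H.
FILENAME_RX = re.compile(r'([a-zA-Z0-9]{3,4})_([a-zA-Z0-9]{3,4})_([0-9]{10})_([0-9]{3})\.csv')


def sort_key(fileName):
    """Return (T, M, Dt, H) so sorted() orders by T, then M, then Dt, then H.

    Assumes every name matches FILENAME_RX.
    """
    # Dt and H are fixed-width digit strings, so comparing them as text
    # gives the same order as comparing them as integers.
    return FILENAME_RX.fullmatch(fileName).groups()


fileNameList = [
    "ecd_hdd_2012102200_024.csv",
    "ecd_cdd_2012102200_000.csv",
    "ecd_hdd_2012102200_000.csv",
    "ecd_cdd_2012102100_024.csv",
    "ecd_cdd_2012102100_000.csv",
]
for fileName in sorted(fileNameList, key=sort_key):
    print(fileName)
```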

Something like this:


for fileName in fileNameList:
    for each distinct value in Trgx.group(1):
        for each distinct value in Dtrgx.group(1):
            for each distinct value in Hrgx.group(1):
                do whatever
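The nested loops in the pseudocode map naturally onto nested `itertools.groupby` calls over a sorted list. This is a sketch under the assumption that the innermost group should collect the files that share T, Dt and H (and so differ only in M, e.g. cdd and hdd for the same date and hour); the outer `for fileName in fileNameList` level is folded into the parsing step. `FILENAME_RX` and `nested_groups` are hypothetical names.

```python
import re
from itertools import groupby

# Hypothetical combined pattern with named groups for the four fields.
FILENAME_RX = re.compile(
    r'(?P<T>[a-zA-Z0-9]{3,4})_(?P<M>[a-zA-Z0-9]{3,4})_(?P<Dt>[0-9]{10})_(?P<H>[0-9]{3})\.csv'
)


def nested_groups(fileNameList):
    """Yield (T, Dt, H, [file names]) following the T -> Dt -> H nesting of the pseudocode."""
    # Parse each name once; names that do not match the pattern are skipped.
    parsed = []
    for name in fileNameList:
        m = FILENAME_RX.fullmatch(name)
        if m:
            parsed.append((m['T'], m['Dt'], m['H'], m['M'], name))
    # groupby only merges *adjacent* items, so sort by the same keys first.
    parsed.sort()
    for t, by_t in groupby(parsed, key=lambda p: p[0]):
        for dt, by_dt in groupby(by_t, key=lambda p: p[1]):
            for h, by_h in groupby(by_dt, key=lambda p: p[2]):
                # Files in this innermost group differ only in M (cdd, hdd, avgt, ...).
                yield t, dt, h, [p[4] for p in by_h]


files = [
    "ecd_cdd_2012102200_000.csv",
    "ecd_hdd_2012102200_000.csv",
    "ecd_hdd_2012102200_024.csv",
    "ecd_cdd_2012102200_024.csv",
    "ecd_cdd_2012102100_000.csv",
]
for t, dt, h, names in nested_groups(files):
    print(t, dt, h, names)
```

Each inner `groupby` iterator is fully consumed before the enclosing one advances, which is what `groupby` requires.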

It might be easier to combine the regexes into a single one:

import re

re_fn = re.compile(r'(?P<T>[a-zA-Z0-9]{3,4})_(?P<M>[a-zA-Z0-9]{3,4})_(?P<Dt>[0-9]{10})_(?P<H>[0-9]{3})\.csv')
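With one pattern, a single match yields all four fields at once, and `groupdict()` returns them keyed by the group names. A minimal sketch: `parse_names` is a hypothetical helper, the sort order (T, Dt, H, M) is an assumption, and `fullmatch` is used so that trailing text after `.csv` is rejected; non-matching names are collected instead of raising an error.

```python
import re

re_fn = re.compile(
    r'(?P<T>[a-zA-Z0-9]{3,4})_(?P<M>[a-zA-Z0-9]{3,4})_(?P<Dt>[0-9]{10})_(?P<H>[0-9]{3})\.csv'
)


def parse_names(fileNameList):
    """Split names into (records, rejected); each record is the match's groupdict plus the name."""
    records, rejected = [], []
    for name in fileNameList:
        m = re_fn.fullmatch(name)
        if m is None:
            rejected.append(name)
        else:
            record = m.groupdict()  # dict with keys 'T', 'M', 'Dt', 'H'
            record['name'] = name
            records.append(record)
    # Order by T, then Dt, then H, then M (an assumed ordering).
    records.sort(key=lambda r: (r['T'], r['Dt'], r['H'], r['M']))
    return records, rejected


records, rejected = parse_names(
    ["ecd_hdd_2012102200_024.csv", "ecd_cdd_2012102200_024.csv", "readme.txt"]
)
for r in records:
    print(r['name'], r['T'], r['M'], r['Dt'], r['H'])
print("skipped:", rejected)
```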