Python line comparison


python regex hash


I have a directory whose subdirectories are full of files that all contain a line like this:

VERSION "1.0"  # THAT NUMBER VARIES.

So, as an example of the contents, one file would look like this:

 #comments about what the file program does
 #
 #

 #ifndef _AWS_THIS_READS_STUFF_H
 #define _AWS_THIS_READS_STUFF_H

 #define AWS_THIS_READS_STUFF_VERSION "1.2" <---this is the line I want 
                                            #to compare that is in all

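When I run it at my command prompt, it outputs nothing.

I also tried the code below. I got it to print the file name and the "0.0" number, but now I need to compare them.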

myfiles = glob.glob('*.h') #reads in all files ending in .h
for file in myfiles:
    for line in open(file):
        line = line.rstrip()
        if re.search('VERSION\s+("\d+\.\d+")$', line):
            list = re.findall("\d+\.\d+" , line)
            list.append(file)
            print(list)
            #print (list + ' : ' + root + "/" + myfile)
    with open(file) as f:
        version = re.findall('VERSION\s+("\d+\.\d+")$', file)
        version = re.search(next(dropwhile(lambda x: "VERSION" not in x, f)))
        print(version)

I just need to figure out how to compare the numbers in the "0.0" list (again, both before and after the decimal point).
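One way to compare the parts before and after the decimal point is to turn each version into a tuple of integers, so that "1.10" sorts after "1.9". The sketch below assumes the version is always a quoted "major.minor" pair as in the sample line; `parse_version` and `VERSION_RE` are hypothetical names used only for illustration.

```python
import re

# Assumed pattern: VERSION, whitespace, then a quoted "major.minor" number.
VERSION_RE = re.compile(r'VERSION\s+"(\d+)\.(\d+)"')


def parse_version(line):
    """Return (major, minor) as ints, or None if the line has no version."""
    match = VERSION_RE.search(line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


a = parse_version('#define AWS_THIS_READS_STUFF_VERSION "1.2"')
b = parse_version('#define AWS_THIS_READS_STUFF_VERSION "1.10"')
print(a, b, a < b)  # (1, 2) (1, 10) True
```

Tuples compare element by element, so the major number decides first and the minor number only breaks ties. Comparing the raw strings would instead rank "1.10" before "1.2".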

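If the version line is in the header of every file, you can pull the version line from the first file that iglob returns and use all to compare that line against the rest: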

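import glob
import re
from itertools import dropwhile

def all_same(patt):
    # Match the number in a line such as: #define FOO_VERSION "1.2"
    r = re.compile(r'VERSION\s+"?(\d+\.\d+)')

    def version_of(file):
        with open(file) as f:
            # Skip ahead to the first line that mentions VERSION.
            line = next(dropwhile(lambda x: "VERSION" not in x, f))
            return r.search(line).group(1)

    files = glob.iglob(patt)
    first_file = next(files, None)
    if first_file is None:
        return True  # no files matched, so nothing differs
    first = version_of(first_file)
    # all() stops at the first file whose version differs from the first one.
    return all(version_of(file) == first for file in files)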

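If you actually want to find the names of all the files with differing version numbers, I would find the version line in each file and group the files in a dict by version number, using the version number as the key and appending the file names as values: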

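import glob
import re
from collections import defaultdict
from itertools import dropwhile
def all_same(path):
    # The optional quote lets the pattern match lines such as VERSION "1.2".
    r = re.compile(r'VERSION\s+"?(\d+\.\d+)')
    files = glob.iglob(path)
    d = defaultdict(list)  # version number -> list of file names
    for file in files:
        with open(file) as f:
            # Skip ahead to the first line that mentions VERSION.
            line = next(dropwhile(lambda x: "VERSION" not in x, f))
            # group(1) is just the number, without the VERSION prefix.
            version = r.search(line).group(1)
            d[version].append(file)
    return d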
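Once the files are grouped, the dict can be read directly: a single key means every file agrees, and any extra keys point at the files that differ. The sketch below is one possible way to report that. `find_outliers` is a hypothetical helper, the file names are made up, and treating the most common version as the expected one is an assumption, not something the answer states.

```python
def find_outliers(groups):
    """Split a {version: [files]} mapping into (majority_version, outliers)."""
    if not groups:
        return None, {}
    # Assumption: the version shared by the most files is the expected one.
    majority = max(groups, key=lambda v: len(groups[v]))
    outliers = {v: files for v, files in groups.items() if v != majority}
    return majority, outliers


# Hypothetical grouping result; the file names are invented for the example.
groups = {"1.2": ["a.h", "b.h", "c.h"], "1.0": ["old.h"]}
majority, outliers = find_outliers(groups)
print(majority)  # 1.2
print(outliers)  # {'1.0': ['old.h']}
```

Each file is read once and dropped into its group, so no file ever has to be compared against every other file.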

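Have you actually tried anything?

grep VERSION *.h | sort -u ?

OK, so you are searching the file contents themselves? Are all the version lines at the start of the file?

@jornsharpe I have tried a lot. I tried filecmp, and I tried hashing too, but I am not very fluent in Python since I usually use JavaScript. But yes, all the files contain VERSION and some number, and the numbers do differ; it is always a number, then a decimal point, then another number. Also, Sven, thank you, I will look into that as well.

So please post your code and a specific error.

@Girls.Gone.Wired, the last code will give you a dict in which all files with a common version number are grouped together; if you don't group, you would be doing O(n^2) comparisons.

This doesn't seem to come up with anything? After some research I think a dict is definitely the best approach. I can't seem to get it to print anything other than all the file names. For version =, should it say "VERSION"? Because all the files say that.

Add your input to your question.

OK, I added it. It is very similar to yours. When I run it at the command prompt, it accepts it but prints nothing. Thanks for your help, I really appreciate it.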

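A script that only defines a function will run without errors and print nothing, which may be what the last comment describes; that is a guess from the thread, not something confirmed in it. The sketch below calls the function and prints the result. It also searches subdirectories, since the question mentions them. `group_by_version` is a hypothetical name, and the optional quote in the pattern is an assumption based on the sample line VERSION "1.2".

```python
import glob
import re
from collections import defaultdict

# Assumed pattern: the number may or may not be wrapped in double quotes.
VERSION_RE = re.compile(r'VERSION\s+"?(\d+\.\d+)')


def group_by_version(pattern):
    """Map each version number to the list of files that declare it."""
    groups = defaultdict(list)
    # recursive=True makes "**" match any depth of subdirectories.
    for path in glob.iglob(pattern, recursive=True):
        with open(path) as f:
            for line in f:
                match = VERSION_RE.search(line)
                if match:
                    groups[match.group(1)].append(path)
                    break  # only the first version line per file counts
    return groups


if __name__ == "__main__":
    # Defining the function is not enough: call it and print what it returns.
    for version, paths in sorted(group_by_version("**/*.h").items()):
        print(version, paths)
```

Files without a version line are skipped silently here; whether that is the right behavior depends on how strict the check needs to be.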