Python: How to display the top numbers in a text file using regular expressions
My task is to display the top views from two different text files. Each line of the text files has the format "file", followed by the path folder, the views, and open/closed. The problem I'm having is displaying the top views. Also, when two entries have the same number of views, their path folder titles must be listed in alphabetical order.

I have already read the two files with glob. I even used a regular expression to make sure the files are read the way they should be. I also know I can use sort/sorted to put things in alphabetical order. My main concern is displaying the top views from the text files.

Here are my files: file1.txt file2.txt (as you can see from the format, the third field is the views).

The output should look like this:
file GameOfThrones 900 0
file DC/Batman 504 1
file Science/Chemistry 444 1
file Marvel/CaptainAmerica 342 0
file Math/Calculus 342 0
...
Besides that, this is the function I'm currently using to display the top views:
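import re

def top_views(files):  # placeholder name; the def line was not shown
    # each line starts with the literal word "file", then path and views
    records = dict(re.findall(r"file (\S+) (\d+)", files))
    main_dict = {}
    for file in records:
        print(file)
        #idk how to display the top views
    return main_dict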
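A minimal sketch of how a dict-based function like the one above could be finished, assuming the contents of both files have already been read into one string. The helper name `rank_views` is hypothetical, and the sample `text` is taken from the example files. Converting the view counts to `int` and sorting on `(-views, path)` puts the highest views first and breaks ties alphabetically:

```python
import re

# Sample text standing in for the combined contents of the two files
text = """file Marvel/GuardiansOfGalaxy 300 1
file DC/Batman 504 1
file GameOfThrones 900 0
file Math/Calculus 342 0
file Marvel/CaptainAmerica 342 0
"""

def rank_views(text):
    # findall returns (path, views) tuples; views become ints so they
    # compare numerically rather than as strings
    records = {path: int(views)
               for path, views in re.findall(r"^file (\S+) (\d+)", text, re.MULTILINE)}
    # Highest views first; equal views fall back to alphabetical path order
    return sorted(records.items(), key=lambda item: (-item[1], item[0]))

for path, views in rank_views(text):
    print(path, views)
```

Because `records` is a dict, a path that appears in both files keeps only its last view count; sorting a list of tuples instead would keep duplicates.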
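Extracting the sort criteria
First, you need to extract the information each line will be sorted by.
You can extract the views and the path from a line with this regex: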
>>> import re
>>> criteria_re = re.compile(r'file (?P<path>\S*) (?P<views>\d*) \d*')
>>> m = criteria_re.match('file GameOfThrones 900 0')
>>> res = (int(m.group('views')), m.group('path'))
>>> res
(900, 'GameOfThrones')
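Continuing from the REPL session, one way to turn that match into a sort key is to return a `(-views, path)` tuple, so the largest view count comes first and equal counts fall back to the alphabetical order of the path. This is a sketch assuming every line matches the pattern (`match` returns `None` otherwise); `sort_key` and the sample lines are illustrative:

```python
import re

# Same pattern as in the REPL session above
criteria_re = re.compile(r'file (?P<path>\S*) (?P<views>\d*) \d*')

def sort_key(line):
    m = criteria_re.match(line)
    # Negating the view count puts the largest first; path breaks ties A-Z
    return (-int(m.group('views')), m.group('path'))

# Sample lines taken from the example files, in a deliberately mixed order
lines = [
    'file Science/Biology 200 1',
    'file Math/Calculus 342 0',
    'file DC/Superman 200 1',
    'file Marvel/CaptainAmerica 342 0',
    'file GameOfThrones 900 0',
]

for line in sorted(lines, key=sort_key):
    print(line)
```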
Continuing from my comment above:
list.txt:
file Marvel/GuardiansOfGalaxy 300 1
file DC/Batman 504 1
file GameOfThrones 900 0
file DC/Superman 200 1
file Marvel/CaptainAmerica 342 0
list2.txt:
file Science/Biology 200 1
file Math/Calculus 342 0
file Psychology 324 1
file Anthropology 234 0
file Science/Chemistry 444 1
And the code:
fileOne = 'list.txt'
fileTwo = 'list2.txt'
result = []
with open(fileOne, 'r') as file1Obj, open(fileTwo, 'r') as file2Obj:
    result.append(file1Obj.readlines())
    result.append(file2Obj.readlines())
result = sum(result, [])  # flattening the nested list
result = [i.split('\n', 1)[0] for i in result]  # removing the \n char
# sorting by the view; equal view counts keep their file order (sort is stable)
print(sorted(result, reverse=True, key=lambda x: int(x.split()[2])))
Output:
[
'file GameOfThrones 900 0', 'file DC/Batman 504 1', 'file Science/Chemistry 444 1',
'file Marvel/CaptainAmerica 342 0', 'file Math/Calculus 342 0',
'file Psychology 324 1', 'file Marvel/GuardiansOfGalaxy 300 1',
'file Anthropology 234 0', 'file DC/Superman 200 1', 'file Science/Biology 200 1'
]
Shorter version:
with open (fileOne, 'r') as file1Obj, open(fileTwo, 'r') as file2Obj: result = file1Obj.readlines() + file2Obj.readlines()
print(list(i.split('\n', 1)[0] for i in sorted(result, reverse=True, key = lambda x: int(x.split()[2])))) # sorting by the view
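Since the question mentions reading the files with `glob`, here is a sketch that combines that with the split-based key used above, adding an explicit alphabetical tie-break instead of relying on the input order. The pattern `list*.txt`, the helper names `read_lines` and `views_then_path`, and the temporary sample files are assumptions for the demo; the samples deliberately put `Math/Calculus` before `Marvel/CaptainAmerica` so the tie-break is visible:

```python
import glob
import os
import tempfile

def read_lines(pattern):
    # Collect the non-blank lines (newline stripped) of every matching file
    lines = []
    for name in sorted(glob.glob(pattern)):
        with open(name) as f:
            lines.extend(line.rstrip('\n') for line in f if line.strip())
    return lines

def views_then_path(line):
    fields = line.split()
    # Third field = views (negated so the largest sorts first); second = path
    return (-int(fields[2]), fields[1])

# Demo only: hypothetical sample files written to a temporary directory
samples = {
    'list.txt': 'file Math/Calculus 342 0\nfile Science/Biology 200 1\n',
    'list2.txt': 'file Marvel/CaptainAmerica 342 0\nfile DC/Superman 200 1\n',
}
with tempfile.TemporaryDirectory() as tmp:
    for name, content in samples.items():
        with open(os.path.join(tmp, name), 'w') as f:
            f.write(content)
    # glob.escape guards against special characters in the temp dir name
    pattern = os.path.join(glob.escape(tmp), 'list*.txt')
    result = sorted(read_lines(pattern), key=views_then_path)

for line in result:
    print(line)
```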
You can use the following code:
# open the 2 files in read mode
with open('file1.txt', 'r') as f1, open('file2.txt', 'r') as f2:
    # splitlines() per file keeps the last line of file1 separate from the
    # first line of file2 and does not produce a trailing empty string
    lines = f1.read().splitlines() + f2.read().splitlines()
# do the sorting in reverse mode, based on the 3rd word, in your case number of views
print(sorted([l for l in lines if l.strip()], reverse=True, key=lambda x: int(x.split()[2])))
Output:
['file GameOfThrones 900 0', 'file DC/Batman 504 1', 'file Science/Chemistry 444 1', 'file Marvel/CaptainAmerica 342 0', 'file Math/Calculus 342 0', 'file Psychology 324 1', 'file Marvel/GuardiansOfGalaxy 300 1', 'file Anthropology 234 0', 'file DC/Superman 200 1', 'file Science/Biology 200 1']
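The question's expected output shows one entry per line and only the top few. A sketch of that presentation, assuming the lines are already in a list (here, the contents of `list.txt` and `list2.txt` from the earlier answer): `heapq.nsmallest` with a `(-views, path)` key returns the `n` most-viewed lines in order, with ties broken alphabetically. The helper `top_n` and `n=5` are illustrative:

```python
import heapq

def sort_key(line):
    fields = line.split()
    # fields[1] is the path, fields[2] the view count
    return (-int(fields[2]), fields[1])

def top_n(lines, n):
    # nsmallest on the negated view count yields the n most-viewed lines,
    # ordered, equivalent to sorted(lines, key=sort_key)[:n]
    return heapq.nsmallest(n, lines, key=sort_key)

lines = [
    'file Marvel/GuardiansOfGalaxy 300 1',
    'file DC/Batman 504 1',
    'file GameOfThrones 900 0',
    'file DC/Superman 200 1',
    'file Marvel/CaptainAmerica 342 0',
    'file Science/Biology 200 1',
    'file Math/Calculus 342 0',
    'file Psychology 324 1',
    'file Anthropology 234 0',
    'file Science/Chemistry 444 1',
]

for line in top_n(lines, 5):
    print(line)
```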
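Look at sort(). It is used to extract text. Extract the numbers that need to be sorted, then sort them.
How about reading both files, storing the results in a list, and then sorting by value? From the data, it looks like you can extract each number with line.split()[2]; no regex needed.
Can I also open these two .txt files with import glob?
Wait, where can I see it done this way?
@smokingpenguin I mean, I showed my way. Since that wasn't asked about or mentioned in the original question, I just went with it.
Where does the "m" in m.group come from? I tried to follow your code, but I can't seem to follow the first part. Is there a way for me to see how this outputs on repl.it or something similar? Thank you so much!
My bad, I edited a whole bunch of code and forgot about that. I can reply tomorrow, no problem =)
@smokingpenguin I put up my code with your sample files, plus a few examples to test the alphabetical ordering. If the code isn't clear, feel free to ask.
Thanks. This makes a lot of sense now.
For posterity, in case the repl link rots: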