Extract lines based on values in a text file using Python


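I have a list of records in file A, and I want to extract rows from it based on the numbers in file B. Given the values 4 and 5, every row of file A whose 4th column is 4 or 5 should be extracted. How can I do this in Python? Can anyone help? The code below only extracts rows by index, and only where the value is 4.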

with open("B.txt", "rt") as f:
    classes = [int(line) for line in f.readlines()]
    with open("A.txt", "rt") as f:
        lines = [line for index, line in enumerate(f.readlines()) if classes[index]== 4]
        lines_all= "".join(lines)

with open("C.txt", "w") as f:
        f.write(lines_all)

A.txt

hg17_ct_ER_ER_1003  36  42  1
hg17_ct_ER_ER_1003  109 129 2
hg17_ct_ER_ER_1003  110 130 2
hg17_ct_ER_ER_1003  129 149 2
hg17_ct_ER_ER_1003  130 150 2
hg17_ct_ER_ER_1003  157 163 3
hg17_ct_ER_ER_1003  157 165 3
hg17_ct_ER_ER_1003  179 185 4
hg17_ct_ER_ER_1003  197 217 5
hg17_ct_ER_ER_1003  220 226 6


B.txt

4
5

Desired output

hg17_ct_ER_ER_1003  179 185 4
hg17_ct_ER_ER_1003  197 217 5

Build a set of the lines/numbers from the b file, then compare the last element of each row of the a file against the elements of that set:

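import csv

with open("a.txt") as f, open("b.txt") as f2:
    # Values to keep, stored in a set for fast membership tests
    st = set(line.rstrip() for line in f2)
    r = csv.reader(f, delimiter=" ")
    # Keep only the rows whose last field is in the set
    data = [row for row in r if row[-1] in st]
    print(data)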

[['hg17_ct_ER_ER_1003', '179', '185', '4'], ['hg17_ct_ER_ER_1003', '197', '217', '5']]

Set delimiter= to whatever separator your file uses, or do not set it at all if the file is comma-separated.
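As an illustration of the delimiter note, here is a minimal sketch that assumes a tab-separated file. The in-memory strings a_text and b_text are hypothetical stand-ins for a.txt and b.txt, and the tab separator is an assumption, not something the question states.

```python
import csv
import io

# Hypothetical stand-ins for a.txt (tab-separated here) and b.txt
a_text = "hg17_ct_ER_ER_1003\t179\t185\t4\nhg17_ct_ER_ER_1003\t220\t226\t6\n"
b_text = "4\n5\n"

# Values to keep, ignoring blank lines
wanted = {line.strip() for line in io.StringIO(b_text) if line.strip()}

# delimiter must match the real field separator; "\t" is assumed here
reader = csv.reader(io.StringIO(a_text), delimiter="\t")
rows = [row for row in reader if row and row[-1] in wanted]
print(rows)
```

If the fields are separated by runs of spaces of varying width, a single-character delimiter produces empty fields between them, although row[-1] is still the last column as long as the line has no trailing whitespace.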

Or:
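The snippet for this alternative is not present here. Judging from the comments about rsplit, it presumably split each line once from the right and tested the last field; the following is a sketch under that assumption, not the answerer's original code. The helper name extract_rows and the sample list a_lines are hypothetical; with real files you would pass the open file object for a.txt and a set built from b.txt.

```python
def extract_rows(lines, wanted):
    """Return the lines whose last whitespace-separated field is in `wanted`."""
    matches = []
    for line in lines:
        parts = line.rsplit(None, 1)  # split once, starting from the right
        if parts and parts[-1] in wanted:
            matches.append(line.rstrip("\n"))
    return matches


# Sample rows taken from A.txt in the question
a_lines = [
    "hg17_ct_ER_ER_1003  157 165 3\n",
    "hg17_ct_ER_ER_1003  179 185 4\n",
    "hg17_ct_ER_ER_1003  197 217 5\n",
    "hg17_ct_ER_ER_1003  220 226 6\n",
]
wanted = {"4", "5"}  # the values listed in B.txt

for line in extract_rows(a_lines, wanted):
    print(line)
```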

Output:

hg17_ct_ER_ER_1003  179 185 4
hg17_ct_ER_ER_1003  197 217 5
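The question's own code writes its result to C.txt, so it may help to see the filtering and the file output together. This is a rough sketch, assuming whitespace-separated columns; the function name filter_file is hypothetical, and the demo runs in a temporary directory so that no real files are touched.

```python
import os
import tempfile


def filter_file(a_path, b_path, c_path):
    """Write lines of a_path whose last field is listed in b_path to c_path."""
    with open(b_path) as fb:
        wanted = {line.strip() for line in fb if line.strip()}
    written = 0
    with open(a_path) as fa, open(c_path, "w") as fc:
        for line in fa:
            fields = line.split()
            if fields and fields[-1] in wanted:
                fc.write(line)
                written += 1
    return written


# Demo with small sample files created in a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    a_path = os.path.join(tmp, "A.txt")
    b_path = os.path.join(tmp, "B.txt")
    c_path = os.path.join(tmp, "C.txt")
    with open(a_path, "w") as f:
        f.write("hg17_ct_ER_ER_1003  157 165 3\n"
                "hg17_ct_ER_ER_1003  179 185 4\n"
                "hg17_ct_ER_ER_1003  197 217 5\n")
    with open(b_path, "w") as f:
        f.write("4\n5\n")
    count = filter_file(a_path, b_path, c_path)
    with open(c_path) as f:
        result = f.read()

print(count)
print(result, end="")
```

Reading A.txt line by line keeps memory use flat for large files, since only the set of wanted values is held in memory.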

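@Padraic, I would ... split(). Both of these are Python modules; you can look at how they might help you.

Thanks for the suggestion. Many thanks.

OK. With the edit, you have now posted a complete question. You get my upvote.

The result is not what he wants. Take a look at my answer.

@liushuaikobe, what exactly are you talking about? Your code is just a less efficient version of what I provided.

@zehnpaard, we split only once, from the right, instead of on every whitespace, i.e. 'hg17_ct_ER_ER_1003 197 217 5' -> ['hg17_ct_ER_ER_1003 197 217', '5']

Right, and since we only look at the last column, that is also more efficient. Thanks for the clarification.

@PadraicCunningham, thanks for the clarification on rsplit.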
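The difference between the two calls discussed in these comments can be checked directly. A small sketch, using one sample row from the question:

```python
line = "hg17_ct_ER_ER_1003 197 217 5"

# split() breaks the line on every run of whitespace
all_fields = line.split()

# rsplit(None, 1) performs at most one split, starting from the right
head, last = line.rsplit(None, 1)

print(all_fields)
print([head, last])
```

Both approaches give the same last field; rsplit simply avoids building the fields that are never inspected.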