Parsing a text file in Python and printing the lines that match a string

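I have a file named file.txt that contains some text. Another file, details.txt, contains the strings to extract from file.txt, and I want to print the lines that match the strings in details.txt.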

files.txt

12345 04/04/19 07:06:55  entered | computer message|  ID WRE435TW: headway | | 
23456 04/04/19 07:10:00  entered | computer message|  Double vehicle logon  | | 
23567 04/04/19 07:06:55  entered | computer message|  ID EWFRSDE3: small   | | 
09872 04/04/19 07:07:47  entered | computer message|  Double vehicle logon  | | 
76789 04/04/19 07:10:05  entered | computer message|  Veh : logoff          | | 
details.txt

headway
small
logoff
logon

I tried parsing the text files, but I am not getting correctly formatted output:

import pandas as pd
import re
import os
import glob
import csv


os.chdir("file_path")

with open("file.txt", "r") as fp:
    with open("details.txt", 'r+') as f:
        for i in f:
            for line in fp:
                if i:
                    print(i)
                else:
                    print('no Event')
Just use the power of pandas:

import pandas as pd
import numpy as np

# Read the file as CSV with custom delimiter
df = pd.read_csv(
    'files.txt',
    delimiter='|',
    header=None
)
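To see what this call produces without the real file, here is a minimal sketch that feeds two lines copied from the question's files.txt through io.StringIO (a stand-in for the file path). Splitting on '|' gives five fields per line, and the free-text message ends up in column 2.

```python
import io

import pandas as pd

# Two lines copied from files.txt in the question. io.StringIO stands in for
# the real file so the sketch runs on its own; pass 'files.txt' in practice.
sample = (
    "12345 04/04/19 07:06:55  entered | computer message|  ID WRE435TW: headway | | \n"
    "23456 04/04/19 07:10:00  entered | computer message|  Double vehicle logon  | | \n"
)

# '|' splits each line into 5 fields; header=None numbers the columns 0..4.
df = pd.read_csv(io.StringIO(sample), delimiter='|', header=None)

print(df.shape)                     # (2, 5)
print(df[2].str.strip().tolist())   # ['ID WRE435TW: headway', 'Double vehicle logon']
```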
We get:


    0                                   1                   2                     3 4
0   64834 04/04/19 07:06:55 entered     computer message    Veh SBS3797R: headway       
1   73720 04/04/19 07:10:00 entered     computer message    Double vehicle logon        
2   64840 04/04/19 07:06:55 entered     computer message    Veh SBS3755L: small         
3   67527 04/04/19 07:07:47 entered     computer message    Double vehicle logon        
4   73895 04/04/19 07:10:05 entered     computer message    Veh : logoff        
Select the third column (index 2) and transform it:

words = np.vectorize(lambda x: x.strip().split(' ')[-1])(df[2].values)
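np.vectorize applies the function lambda x: x.strip().split(' ')[-1] (which strips the surrounding whitespace and picks the last word) to every value of the third column, df[2].values.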
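For comparison, pandas can apply the same per-row transformation without np.vectorize through its .str accessor. A minimal sketch on three lines copied from the question's sample; the result is a pandas Series rather than a NumPy array, and it can be joined and written out the same way.

```python
import io

import pandas as pd

# Three lines from files.txt in the question, parsed the same way as above.
sample = (
    "12345 04/04/19 07:06:55  entered | computer message|  ID WRE435TW: headway | | \n"
    "23456 04/04/19 07:10:00  entered | computer message|  Double vehicle logon  | | \n"
    "76789 04/04/19 07:10:05  entered | computer message|  Veh : logoff          | | \n"
)
df = pd.read_csv(io.StringIO(sample), delimiter='|', header=None)

# Same transformation with pandas string methods instead of np.vectorize:
# strip the padding, split on whitespace, keep the last token of each row.
words = df[2].str.strip().str.split().str[-1]

print(words.tolist())  # ['headway', 'logon', 'logoff']
```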

Then you can write the words to the result file:

with open("details.txt", 'a+') as f:
    f.write('\n'.join(words))
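Note that you should open the result file with a+ to append to it: r opens the file read-only, so writing is not allowed (and r+ writes from the start of the file instead of appending).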
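A small sketch of the difference between the modes, run against a scratch file in a temporary directory so the real details.txt is not touched (the file name details_demo.txt is hypothetical):

```python
import io
import os
import tempfile

# Scratch file in a temporary directory so the real details.txt is untouched
# ("details_demo.txt" is a hypothetical name for this sketch).
path = os.path.join(tempfile.mkdtemp(), "details_demo.txt")

with open(path, "w") as f:
    f.write("headway\n")

# 'a+' allows reading and appending; every write lands at the end of the file.
with open(path, "a+") as f:
    f.write("small\n")

# 'r' is read-only, so write() raises io.UnsupportedOperation.
write_refused = False
try:
    with open(path, "r") as f:
        f.write("logoff\n")
except io.UnsupportedOperation:
    write_refused = True

with open(path) as f:
    contents = f.read().splitlines()

print(write_refused, contents)  # True ['headway', 'small']
```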

Note that in Python any string other than '' (the empty string) is considered True. So in your code:

with open("file.txt", "r") as fp:
    with open("details.txt", 'r+') as f:
        for i in f:
            for line in fp:
                if i:  # This is always true (for input you showed)
                    print(i)
                else:
                    print('no Event')
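A quick check of that rule on a few illustrative strings. Keep in mind that a line read from a file keeps its trailing "\n", so even a blank line in details.txt is truthy and the else branch above can never run for it.

```python
# Truthiness of a few strings you might get when iterating over a file.
samples = ["", "\n", " ", "headway\n"]
truthiness = {s: bool(s) for s in samples}

for s, value in truthiness.items():
    print(repr(s), value)  # only '' prints False
```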
You can try this:

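with open("file.txt", "r") as fp:
    lines = fp.readlines()  # read once: a file object is exhausted after one pass

with open("details.txt", "r") as f:
    for i in f:
        i = i.strip()  # drop the trailing newline, or "headway\n" never matches
        if not i:
            continue  # skip blank lines in details.txt
        matches = [line for line in lines if i in line]
        for line in matches:
            print(line, end="")  # I assume you wanted to print the line from files.txt
        if not matches:
            print('no Event')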
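Two details matter when matching line by line: each keyword read from details.txt still ends in "\n", and a file object can only be iterated once. Here is a sketch that sidesteps both by working on in-memory lists, which also makes the matching logic easy to test; find_matching_lines is a hypothetical helper name.

```python
def find_matching_lines(keywords, lines):
    """Return (keyword, line) pairs for every line that contains a keyword.

    Hypothetical helper: both arguments are lists of strings, for example
    the result of f.readlines() on details.txt and files.txt.
    """
    matches = []
    for raw in keywords:
        key = raw.strip()            # "headway\n" -> "headway"
        if not key:                  # skip blank lines in details.txt
            continue
        for line in lines:           # a list can be scanned once per keyword
            if key in line:
                matches.append((key, line.rstrip("\n")))
    return matches


# Sample data from the question, with the newlines readlines() would keep.
log_lines = [
    "12345 04/04/19 07:06:55  entered | computer message|  ID WRE435TW: headway | | \n",
    "23456 04/04/19 07:10:00  entered | computer message|  Double vehicle logon  | | \n",
    "23567 04/04/19 07:06:55  entered | computer message|  ID EWFRSDE3: small   | | \n",
    "09872 04/04/19 07:07:47  entered | computer message|  Double vehicle logon  | | \n",
    "76789 04/04/19 07:10:05  entered | computer message|  Veh : logoff          | | \n",
]
keywords = ["headway\n", "small\n", "logoff\n", "logon\n"]

# Without strip() the keyword keeps its newline and never matches.
print("headway\n" in log_lines[0])  # False

for key, line in find_matching_lines(keywords, log_lines):
    print(key, "->", line)
```

This is plain substring matching, so a keyword such as "log" would match every logon and logoff line; re.search(r"\b" + re.escape(key) + r"\b", line) is one way to require whole words.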

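Have you tried reading the files into pandas DataFrames and then performing a merge?

@KlemenKoleša I tried converting to a pandas DataFrame, but I didn't get the correct index of the strings that need to be matched.

Thanks, but I want to print the lines in files.txt that match the strings in details.txt.

OK, you will have the array of words in the words variable. Just read all the words from details.txt and check them against it. You can do this easily by converting the lists to sets, waka=set(words), and intersecting them, result=waka&waka2 (where waka2 is the set of words read from details.txt).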
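A minimal sketch of that set intersection. The words list mirrors what the np.vectorize step yields for the question's sample; "delay" is a hypothetical extra keyword added only to show one that has no match. Note that the intersection tells you which keywords occur, not which lines they occur on.

```python
# Last words from column 2, as the np.vectorize step produces for the sample.
words = ["headway", "logon", "small", "logon", "logoff"]
# Keywords as read from details.txt, newlines stripped; "delay" is a
# hypothetical extra entry added to show a keyword with no match.
detail_words = ["headway", "small", "logoff", "logon", "delay"]

waka = set(words)          # distinct last words found in files.txt
waka2 = set(detail_words)  # distinct keywords from details.txt
result = waka & waka2      # keywords that occur in both

print(sorted(result))        # ['headway', 'logoff', 'logon', 'small']
print(sorted(waka2 - waka))  # ['delay']
```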
Suppose details.txt contains the string "headway". The code should check every line of file.txt for that string and, when it finds a match, print 12345 04/04/19 07:06:55  entered | computer message|  ID WRE435TW: headway | |
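The follow-up asks for the full matching lines rather than the words. Here is a sketch in the same pandas style that keeps each original line in an extra column (named "raw" here for illustration) and selects the rows whose last word is in the keyword list. Series.isin stands in for the merge suggested in the comments, and the two-keyword list is an illustrative subset of details.txt so that some rows are filtered out.

```python
import io

import pandas as pd

# The five sample lines from files.txt in the question.
log_text = (
    "12345 04/04/19 07:06:55  entered | computer message|  ID WRE435TW: headway | | \n"
    "23456 04/04/19 07:10:00  entered | computer message|  Double vehicle logon  | | \n"
    "23567 04/04/19 07:06:55  entered | computer message|  ID EWFRSDE3: small   | | \n"
    "09872 04/04/19 07:07:47  entered | computer message|  Double vehicle logon  | | \n"
    "76789 04/04/19 07:10:05  entered | computer message|  Veh : logoff          | | \n"
)
# Illustrative subset of details.txt, to show rows being filtered out.
keywords = ["headway", "logon"]

df = pd.read_csv(io.StringIO(log_text), delimiter='|', header=None)
# Keep the untouched text of each line next to its parsed fields; this assumes
# files.txt has no blank lines, so row i of df is line i of the file.
df["raw"] = log_text.splitlines()

last_word = df[2].str.strip().str.split().str[-1]
matched = df.loc[last_word.isin(keywords), "raw"].tolist()

for line in matched:
    print(line)
```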