Parse a text file in Python and print the lines that match a string (Python, Pandas, Data Science)

python pandas

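I have a file named file.txt that contains some text. Another file, details.txt, contains the strings I want to look for in file.txt. I want to print the lines of file.txt that match the strings in details.txt.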

files.txt

12345 04/04/19 07:06:55  entered | computer message|  ID WRE435TW: headway | | 
23456 04/04/19 07:10:00  entered | computer message|  Double vehicle logon  | | 
23567 04/04/19 07:06:55  entered | computer message|  ID EWFRSDE3: small   | | 
09872 04/04/19 07:07:47  entered | computer message|  Double vehicle logon  | | 
76789 04/04/19 07:10:05  entered | computer message|  Veh : logoff          | |

details.txt

headway
small
logoff
logon

I tried to parse the text files, but I am not getting correctly formatted output:

import pandas as pd
import re
import os
import glob
import csv


os.chdir("file_path")

with open("file.txt", "r") as fp:
    with open("details.txt", 'r+') as f:
        for i in f:
            for line in fp:
                if i:
                    print(i)
                else:
                    print('no Event')

Just use the power of pandas:

import pandas as pd
import numpy as np

# Read the file as CSV with custom delimiter
df = pd.read_csv(
    'files.txt',
    delimiter='|',
    header=None
)

We get:


    0                                   1                   2                     3 4
0   64834 04/04/19 07:06:55 entered     computer message    Veh SBS3797R: headway       
1   73720 04/04/19 07:10:00 entered     computer message    Double vehicle logon        
2   64840 04/04/19 07:06:55 entered     computer message    Veh SBS3755L: small         
3   67527 04/04/19 07:07:47 entered     computer message    Double vehicle logon        
4   73895 04/04/19 07:10:05 entered     computer message    Veh : logoff

Select the third column (index 2) and transform it:

words = np.vectorize(lambda x: x.strip().split(' ')[-1])(df[2].values)

np.vectorize applies the function lambda x: x.strip().split(' ')[-1] (which cleans up the text and picks the last word) to every value of the third column, df[2].values.
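The same last-word extraction can also be written with pandas' own string accessor instead of np.vectorize. This is only a minimal sketch: the Series below is sample data shaped like column 2 of the frame above, and the names col and last_words are made up for the illustration.

```python
import pandas as pd

# Sample values shaped like column 2 of the DataFrame above (assumed data).
col = pd.Series([
    "  Veh SBS3797R: headway ",
    "  Double vehicle logon  ",
    "  Veh : logoff          ",
])

# Strip surrounding whitespace, split on whitespace, keep the last token.
last_words = col.str.strip().str.split().str[-1]

print(last_words.tolist())  # ['headway', 'logon', 'logoff']
```

With the real data, col would simply be df[2].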

You can then write the result to the output file:

with open("details.txt", 'a+') as f:
    f.write('\n'.join(words))

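Note that you should use a+ to append to the result file. A file opened with r cannot be written to, and r+ starts writing at the beginning of the file, overwriting what is there instead of appending.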
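If the goal is to print the matching rows rather than to collect the last words, the parsed frame can be filtered with Series.str.contains. The following is a sketch under assumptions: the log text is held in memory instead of being read from files.txt, and the list wanted is a hypothetical subset of the strings in details.txt.

```python
import io
import re

import pandas as pd

# In-memory stand-in for files.txt (sample rows from the question).
log_text = (
    "12345 04/04/19 07:06:55  entered | computer message|  ID WRE435TW: headway | |\n"
    "23456 04/04/19 07:10:00  entered | computer message|  Double vehicle logon  | |\n"
    "23567 04/04/19 07:06:55  entered | computer message|  ID EWFRSDE3: small   | |\n"
    "09872 04/04/19 07:07:47  entered | computer message|  Double vehicle logon  | |\n"
    "76789 04/04/19 07:10:05  entered | computer message|  Veh : logoff          | |\n"
)
df = pd.read_csv(io.StringIO(log_text), delimiter="|", header=None)

# Hypothetical subset of the strings listed in details.txt.
wanted = ["headway", "small"]

# Build one regular expression that matches any of the wanted strings.
pattern = "|".join(re.escape(w) for w in wanted)

# Keep only the rows whose message column (index 2) contains a wanted string.
mask = df[2].str.contains(pattern)
matched = df[mask]

print(matched[[0, 2]].to_string())
```

re.escape is used so that a search string containing regex metacharacters is still matched literally.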

Note that in Python any string other than the empty string '' is considered true (truthy). So, in your code:

with open("file.txt", "r") as fp:
    with open("details.txt", 'r+') as f:
        for i in f:
            for line in fp:
                if i:  # This is always true (for input you showed)
                    print(i)
                else:
                    print('no Event')

You can try this:

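with open("file.txt", "r") as fp:
    lines = fp.readlines()  # read once: a file object is exhausted after one pass

with open("details.txt", "r") as f:
    for i in f:
        i = i.strip()  # drop the trailing newline so the substring test can match
        if not i:
            continue  # skip blank lines, which would match every line
        for line in lines:
            if i in line:
                print(line.rstrip("\n"))  # I assume you wanted to print line from files.txt
            else:
                print('no Event')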
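A variant that prints each matching line only once, no matter how many search strings it contains, might look like the sketch below. The helper name matching_lines and the in-memory sample lists are assumptions made for the illustration; in practice both lists would come from reading the two files.

```python
def matching_lines(lines, words):
    """Return the lines that contain at least one of the given words."""
    # Strip newlines and drop empty entries, which would match every line.
    wanted = [w.strip() for w in words if w.strip()]
    return [line for line in lines if any(w in line for w in wanted)]


# Sample data standing in for the contents of the two files.
log_lines = [
    "12345 04/04/19 07:06:55  entered | computer message|  ID WRE435TW: headway | |",
    "23456 04/04/19 07:10:00  entered | computer message|  Double vehicle logon  | |",
    "23567 04/04/19 07:06:55  entered | computer message|  ID EWFRSDE3: small   | |",
]
detail_words = ["headway\n", "small\n", "\n"]

for line in matching_lines(log_lines, detail_words):
    print(line)
```

The search strings are cleaned once up front, so the inner test is a plain substring check.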

Did you try reading the files into pandas DataFrames and then doing a merge?

@KlemenKoleša I tried converting to a pandas DataFrame, but I did not get the correct index of the strings that need to be matched.

Thanks, but I want to print the lines in files.txt that match the strings in details.txt.

OK, you will have the array of words in the words variable. Just read all the words from details.txt and check them. You can do this easily by converting the lists to sets:

waka = set(words)

and intersecting them, like:

result = waka & waka2

Suppose details.txt contains the string "headway". It should check every line of file.txt for that string ("headway") and, if a match is found, print 12345 04/04/19 07:06:55  entered | computer message|  ID WRE435TW: headway | |
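The set intersection suggested in the comments can be sketched as follows. The comment never defines waka2; here it is assumed to be the set of strings read from details.txt, and both input lists are made-up sample data.

```python
# Stand-in for the words array extracted from the log (assumed data).
last_words = ["headway", "logon", "small", "logon", "logoff"]

# Hypothetical contents of details.txt, one string per line.
detail_words = ["headway", "small", "reboot"]

waka = set(last_words)     # words that occur in the log
waka2 = set(detail_words)  # words we are looking for (assumption)

# Words that appear both in the log and in details.txt.
result = waka & waka2

print(sorted(result))  # ['headway', 'small']
```

Note that the intersection yields the matching words, not the matching lines, so printing the lines themselves still needs a per-line check.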