
Unique lines and line breaks in a Python file

python file parsing


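I'm building a log parser from a command-line style script that prints matches. The idea is to output only the unique lines in the log that match several specific values. Here is an example of the format of the lines I want to extract: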

From source books, query: ((domain:www.users.com || username:ed || location:boston || years:2 || title:lead || last_update:{2019-09-19T16:44:36.153Z TO 2019-09-19T16:48:04.125Z] && userid_17:*).
From source books, query: ((domain:www.users.com || username:john || location:austin || years:1 || title:associate || last_update:{2019-09-19T16:44:40.133Z TO 2019-09-19T16:48:06.145Z] && userid_18:*).

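These lines are unique among the other lines in the log because they contain a user ID, a domain, and years. If those three are not all on the same line, the line does not need to be shown.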

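Every 10 minutes a new line is written to the log with an updated last_update timestamp. I only need the first hit for each user ID. In my script I remove the timestamp between { and ], which effectively makes those lines identical and so makes it easier to extract the unique ones.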
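A minimal sketch of that normalization step, using two made-up entries in the question's format (shortened: the location and title fields are omitted). The non-greedy pattern removes everything from { up to the next ], so two entries that differ only in their last_update range become identical strings:

```python
import re

# Two hypothetical entries for the same user, written ~10 minutes apart;
# only the last_update range differs (format copied from the question)
first = ("From source books, query: ((domain:www.users.com || username:ed || "
         "years:2 || last_update:{2019-09-19T16:44:36.153Z TO 2019-09-19T16:48:04.125Z] "
         "&& userid_17:*).")
later = ("From source books, query: ((domain:www.users.com || username:ed || "
         "years:2 || last_update:{2019-09-19T16:54:36.153Z TO 2019-09-19T16:58:04.125Z] "
         "&& userid_17:*).")


def strip_timestamp(line):
    # Non-greedy: remove from '{' up to the first ']' that follows it
    return re.sub(r"\{.*?\]", "", line)


print(strip_timestamp(first))
print(strip_timestamp(first) == strip_timestamp(later))  # True
```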

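My script currently "works", but I'm sure it can be cleaned up, and I'd welcome new ideas. I'm still fairly new to Python scripting, so please critique it. Right now the line breaks don't work, and I think they would make the output much easier to read.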

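I also think userid_ would be a better key for uniqueness, but I'm not sure how to say "find this unique value once, but the line must also contain search terms 2 and 3".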

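Criteria:

Unique output: only one line should be printed per user ID
Domain and years are specific to this search; other lines that contain a userid without both of these matches do not need to be printed
There must be a line break after each unique hit, for readability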
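One way to meet these criteria is to key the results on the user ID itself rather than on the whole line. This is a sketch, not the asker's code: the function name first_hit_per_user is hypothetical, it assumes the ID always appears as userid_<digits> as in the sample lines, and the sample input is made up.

```python
import re

# Assumption: the user ID always appears as "userid_<digits>", as in the sample lines
USERID_RE = re.compile(r"userid_(\d+)")


def first_hit_per_user(lines):
    """Return the first matching line for each user ID, in the order seen."""
    seen = {}  # user ID -> first normalized line (dicts keep insertion order)
    for line in lines:
        # Each term is tested on its own; all three must be on the same line
        if not all(term in line for term in ("userid_", "domain", "years")):
            continue
        match = USERID_RE.search(line)
        if match is None:
            continue
        user_id = match.group(1)
        if user_id not in seen:
            # Drop the {...] last_update range, as the question's script does
            seen[user_id] = re.sub(r"\{.*?\]", "", line)
    return list(seen.values())


sample = [
    "query: ((domain:a.com || years:2 || last_update:{t1 TO t2] && userid_17:*).\n",
    "query: ((domain:a.com || years:2 || last_update:{t3 TO t4] && userid_17:*).\n",
    "query: ((username:bob || last_update:{t1 TO t2] && userid_19:*).\n",
    "query: ((domain:b.com || years:1 || last_update:{t1 TO t2] && userid_18:*).\n",
]

for hit in first_hit_per_user(sample):
    print(hit)  # hit already ends in '\n', so print() leaves a blank line after it
```

Keying on the ID means a later line for the same user is ignored even if its other fields changed; keying on the normalized line instead keeps one entry per distinct combination of fields.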

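Note: this is part of a tool that takes other command-line arguments. This is dummy data; the live data strings contain many more items, but I truncated them to make the request easier to understand.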

import os, sys, argparse, urllib.parse, csv, re

parser = argparse.ArgumentParser(description='Choose an option')

# Setup required arguments
parser.add_argument('-b', action="store_true", help='searches users with domain and years')
args = parser.parse_args()

# Get current working directory
dirpath = os.getcwd()

if args.b:
    debug_log = dirpath + '/var/log/database/debug.log'
    # 3 items are unique to this line vs other similar "userid_" lines
    search1 = "userid_"
    search2 = "domain"
    search3 = "years"
    with open(debug_log, 'r') as search:
        unique = set()
        for line in search:
            # each term must be tested separately; "search1 and search2 and search3 in line"
            # would only check search3
            if search1 in line and search2 in line and search3 in line:
                # remove the last_update items, anything between { and ], to make it unique
                removed = re.sub(r'\{(.*?)\]', '', line)
                if removed not in unique:
                    unique.add(removed)
        print(unique)
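Separately from the line-break question, a condition written as search1 and search2 and search3 in line does not check all three terms. The in operator binds more tightly than and, and non-empty strings are truthy, so the expression reduces to search3 in line. A small demonstration with a made-up line that contains "years" but neither "userid_" nor "domain":

```python
line = "query: ((username:bob || years:3))."

search1, search2, search3 = "userid_", "domain", "years"

# Parsed as: search1 and search2 and (search3 in line)
buggy = search1 and search2 and search3 in line

# Testing every term explicitly requires all three to be present
fixed = all(term in line for term in (search1, search2, search3))

print(buggy, fixed)  # True False
```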

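The output of this script looks like the following, so it does work. However, the line breaks don't take effect, even though there is a "\n" in the output. I assume that's because I'm using a set? With more than 50 hits, the single-line output is hard to read.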

{'From source books, query: ((domain:www.users.com || username:ed || location:boston || years:2 || title:lead || last_update: && userid_17:*).\n', 'From source books, query: ((domain:www.users.com || username:john || location:austin || years:1 || title:associate || last_update: && userid_18:*).\n'}
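The guess in the question is essentially right: print() converts its argument with str(), and for a set that produces a {...} display built from the repr() of each element, in which a newline is shown as the two characters \ and n instead of a line break. It also means print(unique) and print(str(unique)) print exactly the same text. A minimal demonstration with a one-element set, so the output order is fixed:

```python
unique = {"first hit\n"}

# str() of a set uses repr() for each element, so the newline is escaped
print(unique)               # {'first hit\n'}
print(str(unique))          # identical text: print() already calls str()
print(repr("first hit\n"))  # 'first hit\n'

# Printing the element itself emits a real line break
print(next(iter(unique)), end="")
```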

Thanks!

This isn't elegant and doesn't really address the "why", but if you just need to get on with your life, can you try adding your own "\n", like:

if removed not in unique:
     unique.add(removed + '\n')
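Appending "\n" changes the stored strings, but printing the set still shows them in repr() form, so the extra newline appears as a second escaped \n rather than as a line break. If a single print call is preferred, one alternative (not part of this answer) is to join the elements into one string. The sample strings are made up:

```python
unique = {"line A\n", "line B\n"}

# The extra "\n" is still escaped when the set itself is printed
print({line + "\n" for line in unique})

# Joining produces one string with real newlines; each element already ends
# in "\n", so joining with "\n" leaves a blank line between hits.
# sorted() only makes the order stable, since sets are unordered.
report = "\n".join(sorted(unique))
print(report, end="")
```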

Unfortunately, that's just how I work :/

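Instead of printing the string form of the set unique, print each line in the set.

That is, change

print(unique)

to

for line in unique:
    print(line, end='')

(This works in Python 3, which it looks like you're using.)

(The end='' is there because each line already ends with a newline, and print also appends one by default. You don't need both.)

Hmm. So I tried something similar. I gave it another shot, and oddly it just added another newline to the output:

...((domain:www.users.com || username:john || location:austin || years:1 || title:associate || last_update: && userid_18:*).\n\n', 'From source books, query: ((domain:www.users.com || username:ed || location:boston || years:2 || title:lead || last_update: && userid_17...

The "not in" check is redundant, because a set can't contain duplicates. I'm not very familiar with sets, but replacing it with a list might work better. For example: use unique = [] instead of unique = set(), and use unique.append instead of unique.add (keeping the if ... not in ... logic). Could you edit your question to include 3-4 lines of the log file? Another approach, when using a set, is to replace print(unique) with print(str(unique)).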
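On the list suggestion: a list plus an if ... not in ... check does keep lines in the order they were first seen, but each membership test scans the whole list. A sketch that keeps first-seen order while still deduplicating with hash lookups uses dict.fromkeys, relying on dicts preserving insertion order (guaranteed since Python 3.7). The helper name unique_in_order and the sample data are hypothetical:

```python
import sys


def unique_in_order(lines):
    # dict keys are unique and keep insertion order (Python 3.7+),
    # so this drops repeats while preserving first-seen order
    return list(dict.fromkeys(lines))


hits = ["user 17\n", "user 18\n", "user 17\n"]
ordered = unique_in_order(hits)

# Each element already ends in "\n", so write them as-is
sys.stdout.writelines(ordered)
```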