Parsing a large .txt file with user-defined input - Python

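Good morning, everyone,

The code below, which I wrote for one .txt file, relies on a pattern of start/finish times. When I tried it on a different .txt file that does not follow that pattern... it (obviously) broke. The output it produces when it works is shown further below.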

import pprint  # Pretty-printer for nested data structures
import re  # Regular expressions

count = 0
d = {}  # Maps record number -> dict of that record's fields

# Iterating over a file object yields its lines one at a time
with open(r"C:\Users\cqt7wny\Desktop\test.txt", "r") as file:
    for line in file:
        # Skip separator lines (runs of = or *), blank lines and the report header
        if '==' in line or "**" in line or not line.strip() or 'countriesshipped by day' in line:
            continue

        if 'STARTED' in line:  # This line contains the start time
            # The pattern is <program name><space>STARTED<whatever>
            program_name, _ = line.split("STARTED")
            start_time = line.split(' ')[-1].strip()  # Split on spaces and take the last component
            d[count] = {'start_time': start_time}  # Initialize the nth record; count starts at 0
            continue

        if 'COMPLETED' in line:  # This line contains the end time
            end_time = line.split(' ')[-1].strip()
            d.setdefault(count, {})['end_time'] = end_time  # Works even if no STARTED line came first
            count += 1
            continue

        # Every other line is expected to be key=value or key:value
        try:
            x, y = re.split(r'=|:', line)
        except ValueError:  # Not exactly one separator: report the line and skip it
            print(line)
            continue

        x = x.strip()  # Remove leading and trailing spaces from the key
        y = y.strip()  # Remove leading and trailing spaces from the value

        d.setdefault(count, {})[x] = y  # Put the key/value pair into the current record

pp = pprint.PrettyPrinter(indent=4)
pp.pprint(d)
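
One fragile spot in the script above is the unpacking of the `re.split` result into `x, y`, which only succeeds when a line contains exactly one `=` or `:`. A minimal sketch of a more tolerant split follows. `split_key_value` is a hypothetical helper name, and the sample lines are assumptions modeled on the keys in the posted output, since the input file itself is not shown.

```python
import re


def split_key_value(line, pattern=r"[=:]"):
    """Split a line into (key, value) on the FIRST separator only.

    Hypothetical helper, not part of the original script.
    Returns None when the line contains no separator at all.
    """
    parts = re.split(pattern, line, maxsplit=1)
    if len(parts) != 2:
        return None
    key, value = (part.strip() for part in parts)
    return key, value


# Sample lines are made up for illustration.
print(split_key_value("REC READ = 265"))
print(split_key_value("Program Name: program1"))
print(split_key_value("no separator here"))
```

Because of `maxsplit=1`, a value that itself contains a separator (a time such as `14:31:34`, for instance) stays in one piece instead of raising an error during unpacking.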
 
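What I am trying to accomplish: my goal now is to build a parser program that can scan any .txt file, regardless of its format, and retrieve specific user-defined information.

My plan/idea

For this program to work on any text file, the user needs to know every detail of the information they want the program to scan for. In other words, the user tells the program what to search for... the program makes no assumptions.

I want the user running the program to 1. enter the file name, 2. enter the program name (used as the point where the search starts), 3. enter the delimiter (the one that separates key/value pairs in the file), and 4. enter the keys whose values they want (the program goes through the lines, checks whether a key matches a line, and then takes the value on the right-hand side). So the process takes only a few steps: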

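  • Get from the user: a. the file name, b. the program name, c. the delimiter, d. the list of keys
  • Open the file and read it
  • Loop over each line looking for key/value pairs
  • Split the line on the delimiter
  • Print the key and the value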
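
My current code is the last listing in this post.

Comments:

  • The code looks fine to me. Is there a specific problem you are having?
  • It doesn't work on large data that is very unique.
  • Ah, OK. Are you getting an error message, or have you established that it doesn't work in some other way?
  • I just don't know where to start. The code I'm using doesn't work for data without uniqueness...

Output of the first script when it works:
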
    {   0: {   'ADDR FOUND': '3169',
              'ADDR NOT FND': '0',
              'CALLS': '82',
              'ELIG   SYS': '3762',
              'INELIG SYS': '7',
              'Program Name': 'program1',
              'REC READ': '265',
              'REC WRITTEN': '265',
              'SHPR FOUND': '69',
              'SHPR NOT FND': '3',
              'end_time': '2017-06-07-14.35.56.067879',
              'start_time': '2017-06-07-14.31.34.827086'},
       1: {   'ADDR FOUND': '31369',
              'ADDR NOT FND': '10',
              'CALLS': '32',
              'ELIG   SYS': '762',
              'INELIG SYS': '471',
              'Program Name': 'program1',
              'REC READ': '165',
              'REC WRITTEN': '235',
              'SHPR FOUND': '649',
              'SHPR NOT FND': '23',
              'end_time': '2017-06-07-14.35.56.067879',
              'start_time': '2017-06-07-14.31.34.827086'},
       2: {   'ADDR FOUND': '3169',
              'ADDR NOT FND': '0',
              'CALLS': '82',
              'ELIG   SYS': '3762',
              'INELIG SYS': '7',
              'Program Name': 'program1',
              'REC READ': '265',
              'REC WRITTEN': '265',
              'SHPR FOUND': '69',
              'SHPR NOT FND': '3',
              'end_time': '2017-06-07-14.35.56.067879',
              'start_time': '2017-06-07-14.31.34.827086'},
       3: {   'ADDR FOUND': '31369',
              'ADDR NOT FND': '10',
              'CALLS': '32',
              'ELIG   SYS': '762',
              'INELIG SYS': '471',
              'Program Name': 'program1',
              'REC READ': '165',
              'REC WRITTEN': '235',
              'SHPR FOUND': '649',
              'SHPR NOT FND': '23',
              'end_time': '2017-06-07-14.35.56.067879',
              'start_time': '2017-06-07-14.31.34.827086'},

Current code for the user-driven version:
    
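    file_name = input("File name : ")
    program_name = input("Program name : ")  # Not used yet; meant to mark where the search starts
    delimiter = input("Delimiter : ")

    fields = input("Fields : ")
    # Comma-separated keys to look for, with surrounding spaces and empty entries removed
    field_list = [field.strip() for field in fields.split(",") if field.strip()]

    d = []  # List of {key: value} dicts, one per matching line

    with open(file_name, "r") as file:  # The file object yields one line per iteration
        for line in file:
            # Only consider lines that mention a requested field and contain the delimiter
            if delimiter in line and any(field in line for field in field_list):
                key, value = line.split(delimiter, 1)  # Split on the first delimiter only
                d.append({key.strip(): value.strip()})

    print(d)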
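
The steps in the plan can also be sketched as a single function that takes the four user inputs as arguments, which makes the logic testable without typing at a prompt. This is only a sketch under stated assumptions: `extract_fields` is a hypothetical name, "used as the start of the search" is interpreted as "ignore every line before the first one that mentions the program name", keys are matched exactly rather than by substring, and the sample lines are invented to resemble the keys in the posted output.

```python
def extract_fields(lines, program_name, delimiter, wanted_keys):
    """Return a list of {key: value} dicts for the wanted keys.

    Hypothetical sketch. Assumption: the search starts at the first line
    that contains program_name; earlier lines are ignored.
    """
    if not delimiter:
        raise ValueError("delimiter must not be empty")
    wanted = {key.strip() for key in wanted_keys if key.strip()}
    results = []
    started = False
    for line in lines:
        if not started:
            if program_name not in line:
                continue  # still before the program's section
            started = True
        if delimiter not in line:
            continue  # not a key/value line
        # Split on the first delimiter only, then trim both sides
        key, value = (part.strip() for part in line.split(delimiter, 1))
        if key in wanted:
            results.append({key: value})
    return results


# Invented sample input, for illustration only.
sample = [
    "==========",
    "program1 STARTED 2017-06-07-14.31.34.827086",
    "REC READ = 265",
    "REC WRITTEN = 265",
    "CALLS = 82",
]
print(extract_fields(sample, "program1", "=", ["REC READ", "CALLS"]))
# prints [{'REC READ': '265'}, {'CALLS': '82'}]
```

Since the function only iterates over `lines`, an open file object can be passed in place of the list, and the values collected with `input()` can be passed straight through as the other arguments.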