Python 3.x: extract n characters after the first match of a word in a file


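I am a beginner in Python. I have a file containing a single line of data. My requirement is to extract "n" characters after certain words, for the first occurrence only. The words are not consecutive.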

Data file:
{"id":"1234566jnejnwfw","displayId":"1234566jne","author":{"name":"abcd@xyz.com","datetime":15636378484,"displayId":"2342346JNE","datetime":4353453}

I want to get the value after the first match of "displayId" and before "author", i.e. 1234566jne. The same applies to "datetime".

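I tried splitting the line at the index of the word and writing the remainder to another file for further cleaning, in order to get the exact value: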

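tmpFile = "tmpFile.txt"
# Marker text to search for (assumed; the original post left displayId undefined)
displayId = '"displayId":"'

with open("data file") as openfile, open(tmpFile, "w+") as tmpFileOpen:
    for line in openfile:
        # Write everything that follows the first occurrence of the marker
        tmpFileOpen.write(line[line.index(displayId) + len(displayId):])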
However, I don't think this is a good way to proceed.

Can anyone help me?

If I understood your question correctly, you can achieve this by doing the following:

import json

tmpFile = "tmpFile.txt"

with open("data.txt") as openfile, open(tmpFile, "w+") as tmpFileOpen:
    for line in openfile:
        # Load the JSON into a dict so it is easy to manipulate
        data = json.loads(line)
        # Write only the first 3 characters of the `displayId` field
        # to the temp file
        tmpFileOpen.write(data['displayId'][:3])

This works because the data in your file is JSON; if the format changes, it will no longer work.
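If the line cannot be relied on to be valid JSON, plain string search is a format-independent alternative. The following is only a minimal sketch, assuming the wanted value starts immediately after the marker text; `chars_after_first` is a hypothetical helper name, and the sample line is a shortened stand-in for the question's data.

```python
def chars_after_first(text, marker, n):
    """Return up to n characters that follow the first occurrence of marker.

    Returns None when marker does not occur in text.
    """
    start = text.find(marker)  # str.find returns -1 when there is no match
    if start == -1:
        return None
    begin = start + len(marker)
    return text[begin:begin + n]


# Shortened sample line modelled on the question's data (an assumption).
line = '{"id":"1234566jnejnwfw","displayId":"1234566jne","author":{"displayId":"2342346JNE"}}'

first_id = chars_after_first(line, '"displayId":"', 10)
print(first_id)  # 1234566jne
```

Because `str.find` always reports the earliest position, later occurrences of the same word are ignored without any extra bookkeeping.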

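This answer should work for any displayId whose format is similar to the one in your question. I decided not to load the JSON file for this answer, because it isn't needed to complete the task.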

import re

tmpFile = "tmpFile.txt"

with open('data_file.txt', 'r') as infile:
    lines = infile.read()

# Use a regex to find each displayId element
# example: "displayId":"1234566jne
# \W matches a non-word character, such as " and :
# \d{6,8} matches 6 to 8 digits
# [A-Za-z]{3} matches 3 ASCII letters (the sample IDs end in jne or JNE)
# The parentheses capture only the ID, so findall returns it without the key
id_pattern = re.compile(r'\WdisplayId\W{3}(\d{6,8}[A-Za-z]{3})')
id_results = id_pattern.findall(lines)

with open(tmpFile, "w+") as tmpFileOpen:
    # Write each ID to the temp file on its own line
    for display_id in id_results:
        tmpFileOpen.write('{}\n'.format(display_id))

# Contents of tmpFile.txt for the sample data (assuming plain ASCII quotes):
# 1234566jne
# 2342346JNE
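
Since the question asks for the first occurrence only, `re.search` may be a closer fit than `re.findall`, because it stops at the first match. A minimal sketch under stated assumptions: `first_value` is a hypothetical helper, and the sample line is the question's data with plain ASCII quotes and a closing brace added so that it is well-formed.

```python
import re

# Sample line modelled on the question's data (quotes and final brace are assumed).
line = ('{"id":"1234566jnejnwfw","displayId":"1234566jne",'
        '"author":{"name":"abcd@xyz.com","datetime":15636378484,'
        '"displayId":"2342346JNE","datetime":4353453}}')


def first_value(text, key):
    """Return the value following the first "key": in text, or None."""
    # Group 1 captures a quoted string value, group 2 a bare number.
    pattern = r'"' + re.escape(key) + r'"\s*:\s*(?:"([^"]*)"|(\d+))'
    match = re.search(pattern, text)  # re.search stops at the first match
    if match is None:
        return None
    return match.group(1) if match.group(1) is not None else match.group(2)


print(first_value(line, "displayId"))  # 1234566jne
print(first_value(line, "datetime"))   # 15636378484
```

Both values come back as strings; convert the datetime with `int()` if a number is needed.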
This answer does load the JSON file, but it will fail if the format of the JSON file changes.

import json

tmpFile = 'tmpFile.txt'

# Load the JSON file
with open('data_file.txt') as infile:
    jdata = json.load(infile)

with open(tmpFile, "w+") as tmpFileOpen:
    # Find the first ID and write it to the temp file
    first_id = jdata['displayId']
    tmpFileOpen.write('{}\n'.format(first_id))

    # Find the second ID and write it to the temp file
    second_id = jdata['author']['displayId']
    tmpFileOpen.write('{}\n'.format(second_id))

# Contents of tmpFile.txt for the sample data:
# 1234566jne
# 2342346JNE
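
Hard-coded key paths such as `jdata['author']['displayId']` break as soon as the nesting changes. One way to tolerate that is a recursive search for the first occurrence of a key. This is a minimal sketch, not a definitive implementation; `find_first` and the reshaped sample document are hypothetical and exist only for illustration.

```python
import json


def find_first(node, key):
    """Depth-first search for the first value stored under key.

    Returns None when the key is absent (or when its value is None).
    """
    if isinstance(node, dict):
        for k, v in node.items():  # dicts keep insertion order in Python 3.7+
            if k == key:
                return v
            found = find_first(v, key)
            if found is not None:
                return found
    elif isinstance(node, list):
        for item in node:
            found = find_first(item, key)
            if found is not None:
                return found
    return None


# Hypothetical reshaped document: the first ID now sits one level deeper.
text = '{"meta":{"displayId":"1234566jne"},"author":{"displayId":"2342346JNE","datetime":4353453}}'
data = json.loads(text)

print(find_first(data, "displayId"))  # 1234566jne
print(find_first(data, "datetime"))   # 4353453
```

Because the search visits keys in document order and descends before moving on, the value returned is the one that appears first in the file, wherever it is nested.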

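Thank you very much. Your answer helped me, and I was able to derive a solution, since the JSON structure had changed. Thank you, Milox. Your answer helped me as well.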