Data file: restructuring file data with Python

python python-3.x

I am converting the contents of a data file into a new format and writing the result to a new file.

Data file:

f1.txt

user A start 10:30
user B start 10:30
user B end   10:40
user C start 10:50
user A end   11:30
user C end   12:30

New data file:

f2.txt

user A  start  10:30 and end 11:30
user B  start  10:30 and end 10:40
user C  start  10:50 and end 12:30

Please explain how to write the data in the new format.

I would group your lines by user (user A, user B, or user C), like this:

{'user A': {'start': '10:30', 'end': '11:30'}, 'user B': {'start': '10:30', 'end': '10:40'}, 'user C': {'start': '10:50', 'end': '12:30'}}

Then iterate over this dictionary and combine each user's items into a line to write to the file:

from collections import defaultdict

# open file for reading, and output file for writing to
with open("f1.txt") as f, open("f2.txt", mode="w") as out:
    d = defaultdict(dict)

    # Group lines
    for line in f:

        # Unpack all columns
        col1, col2, col3, col4 = line.split()

        # Assign start and end times to dict
        d[f"{col1} {col2}"][col3] = col4

    # Iterate dictionary keys and values
    for user, data in d.items():

        # Construct new line to write to file + newline
        line = f"{user}  start  {data['start']} and end {data['end']}\n"

        # Write line to file
        out.write(line)

f2.txt

user A  start  10:30 and end 11:30
user B  start  10:30 and end 10:40
user C  start  10:50 and end 12:30

Update (following the OP's comments)

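If the data file contains multiple records per user, we need to change the approach. One way is to collect the start and end times in lists, then take the min of the starts and the max of the ends, passing a key function that converts the '%H:%M' time strings into datetime objects for comparison.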
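To see why the key function can matter, here is a minimal sketch with made-up sample times that are not all zero-padded. Without a key, min compares the strings character by character; with the key, it compares the parsed times. The name date_compare_key mirrors the helper used in this answer; the sample values are assumptions for illustration only.

```python
from datetime import datetime

def date_compare_key(value):
    # Parse an 'H:MM' or 'HH:MM' string so that times compare chronologically
    return datetime.strptime(value, "%H:%M")

# Hypothetical sample values; note that '9:45' is not zero-padded
times = ["10:30", "9:45", "11:05"]

earliest_as_text = min(times)                        # lexicographic comparison
earliest_as_time = min(times, key=date_compare_key)  # chronological comparison

print(earliest_as_text)
print(earliest_as_time)
```

If every time in the file is zero-padded to HH:MM, plain string comparison happens to give the same order, so the key is mainly a safeguard against inconsistent formatting.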

The new defaultdict we are aiming for looks like this:

{'user A': {'start': ['10:30', '10:30'], 'end': ['11:30', '11:30']}, 'user B': {'start': ['10:30', '10:30'], 'end': ['10:40', '10:40']}, 'user C': {'start': ['10:50', '10:50'], 'end': ['12:30', '12:30']}}

Sample input file:

user A start 10:30
user B start 10:30
user B end 10:40
user C start 10:50
user A end 11:30
user C end 12:30
user A start 10:30
user B start 10:30
user B end 10:40
user C start 10:50
user A end 11:30
user C end 12:30

We can do it like this:

from collections import defaultdict
from datetime import datetime

def date_compare_key(date):
    return datetime.strptime(date, '%H:%M')

with open("f1.txt") as f, open("f2.txt", mode="w") as out:
    d = defaultdict(lambda : defaultdict(list))

    for line in f:
        col1, col2, col3, col4 = line.split()
        d[f"{col1} {col2}"][col3].append(col4)

    for user, data in d.items():
        start = min(data["start"], key=date_compare_key)
        end = max(data["end"], key=date_compare_key)
        line = f"{user}  start  {start} and end {end}\n"
        out.write(line)

f2.txt

user A  start  10:30 and end 11:30
user B  start  10:30 and end 10:40
user C  start  10:50 and end 12:30

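A more efficient approach is to keep the first implementation and simply update the start and end values as they arrive, instead of applying max and min after collecting every value. However, that leads to more complex code. I'll leave it as an exercise for you :-)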
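One possible shape for that exercise is sketched below. It is not the answer's own code: the function names summarize and parse_time and the sample lines are assumptions made for illustration. It keeps a single earliest start and latest end per user and updates them line by line, so no lists are stored.

```python
from datetime import datetime

def parse_time(value):
    # Convert an '%H:%M' string into a datetime for comparison
    return datetime.strptime(value, "%H:%M")

def summarize(lines):
    """Return {user: {'start': earliest, 'end': latest}} in a single pass."""
    summary = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 4 or parts[2] not in ("start", "end"):
            continue  # skip blank or malformed lines
        col1, col2, kind, time = parts
        record = summary.setdefault(f"{col1} {col2}", {})
        current = record.get(kind)
        if current is None:
            record[kind] = time  # first value seen for this user and kind
        elif kind == "start" and parse_time(time) < parse_time(current):
            record[kind] = time  # found an earlier start
        elif kind == "end" and parse_time(time) > parse_time(current):
            record[kind] = time  # found a later end
    return summary

# Hypothetical sample lines, in the same layout as f1.txt
sample = [
    "user A start 10:30",
    "user A end 11:30",
    "user A start 09:15",
    "user A end 12:00",
    "user B start 10:30",
    "user B end 10:40",
]
result = summarize(sample)
for user, data in result.items():
    print(f"{user}  start  {data['start']} and end {data['end']}")
```

The function takes any iterable of lines, so an open file object can be passed in place of the sample list.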

What exactly is your question? There isn't much point in writing the logic for you.

What if the data file has more records for the same users? user A start 10:30 user B start 10:30 user B end 10:40 user C start 10:50 user A end 11:30 user C end 12:30 user A start 10:30 user B start 10:30 user B end 10:40 user C start 10:50 user A end 11:30 user C end 12:30

@dineb Then the solution above needs to be modified. What would the output file look like for that data?

The new data entries in the file would look like this: user A start 10:30 and end 11:30 user B start 10:30 and end 10:40 user C start 10:50 and end 12:30 user A start 10:30 and end 11:30 user B start 10:30 and end 10:40 user C start 10:50 and end 12:30

@dineb I have updated the question with one approach.
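The output requested in the comments has one line per start/end pair rather than one line per user, which the min/max approach does not produce. A rough sketch of one way to get that output follows. It assumes each end line closes that user's most recent unclosed start, and that lines are written in the order the starts appeared; the function name pair_sessions is hypothetical.

```python
def pair_sessions(lines):
    """Return one [user, start, end] record per start/end pair, in start order."""
    sessions = []      # every session, in the order its start line appeared
    open_session = {}  # user -> that user's session still waiting for an end
    for line in lines:
        parts = line.split()
        if len(parts) != 4:
            continue  # skip blank or malformed lines
        col1, col2, kind, time = parts
        user = f"{col1} {col2}"
        if kind == "start":
            record = [user, time, None]
            sessions.append(record)
            open_session[user] = record
        elif kind == "end" and user in open_session:
            # Close the pending session for this user
            open_session.pop(user)[2] = time
    return sessions

# The repeated input given in the comments
sample = """\
user A start 10:30
user B start 10:30
user B end 10:40
user C start 10:50
user A end 11:30
user C end 12:30
user A start 10:30
user B start 10:30
user B end 10:40
user C start 10:50
user A end 11:30
user C end 12:30
""".splitlines()

sessions = pair_sessions(sample)
for user, start, end in sessions:
    if end is not None:  # ignore sessions that never received an end line
        print(f"{user}  start  {start} and end {end}")
```

For this input the loop prints six lines: users A, B and C in that order, then the same three again.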