Python 3.x: Read an unstructured txt file into a DataFrame
I have a messy text file in which the same data repeats every 15 minutes, and I want to use Python to create a DataFrame from it. So far I only read the lines with `with open(fname) as f: content = f.read().splitlines()`. The sample data and the expected DataFrame are listed below.
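This works for your sample data, but you may need to adjust some things for your real data. The data, the input code, and the output are shown below.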
Use a regular expression to filter out the lines of interest, then convert them into a DataFrame.
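The regex approach could look roughly like the sketch below. It is not the answerer's code: `RECORD_RE`, `PROFIT_RE` and `parse_lines` are hypothetical names, and the patterns assume that every record line has the form `2019-09-26 14:15:44 discount=1e` and that every block ends with a `Products: ... (NN% profit)` line, as in the sample data. It takes the list of lines that the question's `f.read().splitlines()` already produces.

```python
import re

import pandas as pd

# Hypothetical pattern for a record line such as "2019-09-26 14:15:44 discount=1e"
RECORD_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s+discount=(\d+)e")
# Hypothetical pattern for a summary line such as
# "Products: sold = 5, bought = 5, left = 0 (20% profit),"
PROFIT_RE = re.compile(r"^Products:.*\((\d+%) profit\)")


def parse_lines(lines):
    """Turn the raw lines of the log into a date/discount/profit DataFrame."""
    rows = []
    pending = []  # records seen since the last "Products:" line
    for line in lines:
        line = line.strip()
        record = RECORD_RE.match(line)
        if record:
            pending.append([record.group(1), int(record.group(2))])
            continue
        summary = PROFIT_RE.match(line)
        if summary:
            # The summary line closes a block: give its profit to every pending record.
            rows.extend(rec + [summary.group(1)] for rec in pending)
            pending = []
        # Every other line ("some text", ...) is skipped.
    df = pd.DataFrame(rows, columns=["date", "discount", "profit"])
    df["date"] = pd.to_datetime(df["date"])
    return df


# Usage with the question's code:
# with open(fname) as f:
#     df = parse_lines(f.read().splitlines())
```

Records that appear after the last `Products:` line are dropped, because their profit is not known yet. Unlike the pandas answer, this sketch converts `discount` to an integer.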
2019-09-26 14:15:44 discount=1e
2019-09-26 14:16:44 discount=4e
2019-09-26 14:17:44 discount=2e
2019-09-26 14:18:44 discount=3e
2019-09-26 14:19:44 discount=2e
some text
some text
some text
Products: sold = 5, bought = 5, left = 0 (20% profit),
(new data keeps following in the same format)
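Expected DataFrame:

| date | discount | profit |
|:-------------------:|----------|--------|
| 2019-09-26 14:15:44 | 1 | 20% |
| 2019-09-26 14:16:44 | 4 | 20% |
| 2019-09-26 14:17:44 | 2 | 20% |
| 2019-09-26 14:18:44 | 3 | 20% |
| 2019-09-26 14:19:44 | 2 | 20% |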
Data:

2019-09-26 14:15:44 discount=1e
2019-09-26 14:16:44 discount=4e
2019-09-26 14:17:44 discount=2e
2019-09-26 14:18:44 discount=3e
2019-09-26 14:19:44 discount=2e
some text
some text
some text
Products: sold = 5, bought = 5, left = 0 (20% profit),
2019-09-26 14:20:44 discount=1e
2019-09-26 14:21:44 discount=4e
2019-09-26 14:22:44 discount=2e
2019-09-26 14:23:44 discount=3e
2019-09-26 14:24:44 discount=2e
some text
some text
some text
Products: sold = 5, bought = 5, left = 0 (15% profit)
In:

import pandas as pd

# range(12) because the longest line (the "Products:" line) splits into 12 whitespace-separated fields.
# May need to change this to the number of fields in your real data.
df = pd.read_clipboard(names=list(range(12)))
# Column 10 holds the profit token, e.g. "(20%". May need to change this.
# bfill copies each block's profit up to the date rows above it.
df[10] = df[10].bfill()
df['date'] = pd.to_datetime(df[0] + ' ' + df[1], errors='coerce')
# Keep only the rows whose first two fields form a valid timestamp.
df = df[df['date'].notnull()].copy()
df['discount'] = df[2].str.strip('discount=e')
df['profit'] = df[10].str.strip('()')
df[['date', 'discount', 'profit']]
Output:

| date | discount | profit |
|:-------------------:|----------|--------|
| 2019-09-26 14:15:44 | 1 | 20% |
| 2019-09-26 14:16:44 | 4 | 20% |
| 2019-09-26 14:17:44 | 2 | 20% |
| 2019-09-26 14:18:44 | 3 | 20% |
| 2019-09-26 14:19:44 | 2 | 20% |
| 2019-09-26 14:20:44 | 1 | 15% |
| 2019-09-26 14:21:44 | 4 | 15% |
| 2019-09-26 14:22:44 | 2 | 15% |
| 2019-09-26 14:23:44 | 3 | 15% |
| 2019-09-26 14:24:44 | 2 | 15% |
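`pd.read_clipboard` only works when the sample text is on the system clipboard; it parses that text with `read_csv`, splitting on whitespace by default. A minimal, self-contained sketch of the same steps, assuming the text is read from a string through `io.StringIO` (for the real file, pass `fname` from the question instead). The `RAW` sample is a shortened copy of the data above.

```python
import io

import pandas as pd

# Shortened copy of the sample data; for a real file use read_csv(fname, ...).
RAW = """2019-09-26 14:15:44 discount=1e
2019-09-26 14:16:44 discount=4e
some text
Products: sold = 5, bought = 5, left = 0 (20% profit),
2019-09-26 14:20:44 discount=1e
some text
Products: sold = 5, bought = 5, left = 0 (15% profit)
"""

# Whitespace-separated, no header row; 12 names because the "Products:" line
# is the longest line and splits into 12 tokens. Shorter lines get NaN padding.
df = pd.read_csv(io.StringIO(RAW), sep=r"\s+", header=None, names=list(range(12)))

# Token 10 of a "Products:" line is e.g. "(20%"; back-fill copies it up to
# the record lines that precede it in the same block.
df[10] = df[10].bfill()

# Only record lines have a parseable timestamp in their first two tokens.
df["date"] = pd.to_datetime(df[0] + " " + df[1], errors="coerce")
df = df[df["date"].notna()].copy()

df["discount"] = df[2].str.strip("discount=e")  # "discount=1e" -> "1"
df["profit"] = df[10].str.strip("()")           # "(20%" -> "20%"
result = df[["date", "discount", "profit"]].reset_index(drop=True)
print(result)
```

The `.copy()` after filtering avoids pandas' `SettingWithCopyWarning` when the new columns are assigned. The hard-coded 12 and 10 still depend on the exact layout of the `Products:` line, which is why the answer warns that real data may need adjustments.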