Regex Python: reading a file with start and stop conditions
Hi all, I have the file data below and want to process it to get the expected output. As a Python learner, I'd like to know whether there is a way to do this based on start and stop boolean indexing.
Here, each record in the file starts with the string "SRV:". In some cases a record starts and ends on the same line; in other cases it continues onto the following lines.
File text data:
Expected output:
Is there a better way to achieve this? I'm fine with pandas too.
Use groupby with a helper Series of group IDs (the cumulative sum of the str.startswith('SRV') mask), then aggregate with join:
df1 = (df['col'].groupby(df['col'].str.startswith('SRV').cumsum())
                .agg(' '.join)
                .reset_index(drop=True)
                .to_frame(name='new'))
print(df1)
new
0 SRV: this is for bryan
1 SRV: this is for terry
2 SRV: this is for torain sec01: This is reserve...
3 SRV: this is for Jun
Details:
print(df['col'].str.startswith('SRV').cumsum())
0 1
1 2
2 3
3 3
4 3
5 3
6 4
Name: col, dtype: int32
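The cumsum is what "remembers" which record a line belongs to: every SRV line adds 1 to the running total and every other line adds 0, so continuation lines inherit the ID of the most recent SRV line. A minimal stdlib sketch of the same idea, using itertools.accumulate in place of pandas (group_ids is a hypothetical helper name, for illustration only):

```python
from itertools import accumulate


def group_ids(lines, marker="SRV"):
    """Return a running count of marker lines, mirroring
    df['col'].str.startswith(marker).cumsum() in pandas."""
    # Convert each True/False to 1/0 so the total only rises on marker lines
    mask = (int(line.startswith(marker)) for line in lines)
    return list(accumulate(mask))


lines = [
    "SRV: this is for bryan",
    "SRV: this is for terry",
    "SRV: this is for torain",
    "sec01: This is reserved",
    "sec02: This is open for all",
    "sec03: Closed!",
    "SRV: this is for Jun",
]
print(group_ids(lines))  # [1, 2, 3, 3, 3, 3, 4]
```

Any lines that appear before the first SRV line would get ID 0 and form their own group.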
Sample DataFrame used for the examples above:
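import pandas as pd
from io import StringIO

temp = """col
SRV: this is for bryan
SRV: this is for terry
SRV: this is for torain
sec01: This is reserved
sec02: This is open for all
sec03: Closed!
SRV: this is for Jun"""
# io.StringIO replaces pd.compat.StringIO, which recent pandas versions no longer provide.
# After testing, replace StringIO(temp) with 'filename.csv'; for a real file without
# a "col" header row, also pass header=None, names=['col'].
# sep="|" keeps each whole line in one column (assumes no line contains "|").
df = pd.read_csv(StringIO(temp), sep="|")
print(df)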
col
0 SRV: this is for bryan
1 SRV: this is for terry
2 SRV: this is for torain
3 sec01: This is reserved
4 sec02: This is open for all
5 sec03: Closed!
6 SRV: this is for Jun
Pure Python solution:
from itertools import groupby
from operator import itemgetter

# Tag every line with the index of the most recent SRV line
out = []
with open("file.csv") as f1:
    last = 0
    for i, line in enumerate(f1):
        if line.strip().startswith('SRV'):
            last = i
        out.append([line.strip(), last])

# Lines sharing a tag belong to the same record; join each group with spaces
with open("out_file.csv", "w") as f2:
    for _, g in groupby(out, key=itemgetter(1)):
        h = ' '.join(map(itemgetter(0), g))
        f2.write(h + '\n')
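For a large file, the records can also be built in a single pass without storing line indices: treat each SRV line as the "start" of a new record and the next SRV line (or end of file) as the "stop". A minimal sketch, assuming records always begin with "SRV" as in the sample; merge_records is a hypothetical helper name, and blank lines are skipped as an added assumption:

```python
from io import StringIO


def merge_records(lines, marker="SRV"):
    """Yield one space-joined string per record; a record starts at a
    marker line and stops just before the next marker line or EOF."""
    current = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(marker) and current:
            yield ' '.join(current)  # a new start closes the previous record
            current = []
        current.append(line)
    if current:
        yield ' '.join(current)  # flush the last record at EOF


sample = StringIO("""SRV: this is for bryan
SRV: this is for terry
SRV: this is for torain
sec01: This is reserved
sec02: This is open for all
sec03: Closed!
SRV: this is for Jun
""")
for record in merge_records(sample):
    print(record)
```

With a real file, pass the open file object instead, e.g. inside with open("file.csv") as f: call merge_records(f). Because it is a generator, only one record is held in memory at a time.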
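You can try something like df[0].groupby(df[0].str.startswith('SRV').cumsum()).apply(''.join), where 0 is the column name. (Note: this uses a pandas DataFrame.)
@anky_91, that works too.
This is really great, jezrael, +1. jezrael, could you explain how it knows to keep collecting data until it sees the next SRV?
@user294110 - added a pure Python solution to the edited answer.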
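Since the title mentions regex: the same grouping can be done on the whole file text with re.split and a lookahead, splitting only at newlines immediately followed by "SRV:". This is a different technique from the answers above, sketched under the assumption that the whole file fits in memory:

```python
import re

text = """SRV: this is for bryan
SRV: this is for terry
SRV: this is for torain
sec01: This is reserved
sec02: This is open for all
sec03: Closed!
SRV: this is for Jun"""

# Split at newlines followed by "SRV:"; the lookahead is zero-width,
# so "SRV:" stays at the start of the next chunk
chunks = re.split(r'\n(?=SRV:)', text.strip())
# Inside each chunk, join the continuation lines with a single space
records = [' '.join(chunk.splitlines()) for chunk in chunks]
for r in records:
    print(r)
```

With a real file, text would come from open("file.csv").read(); text mode converts Windows line endings to "\n", so the pattern still matches.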