Python: grouping () entries and appending text for the words
I have code that groups identical words appearing before "(", merges their members, and removes the redundant data. It should also take the text that runs from one such word to the next and group/append it under the same name. For example, input text file:
the cars and computers...
Car(ferrari,lamborghini,porsche)
some manufacturers specialise in "super" cars.
Most people like them.
Computer(hp,dell,apple,sony,fujitsu)
These are some laptop manufacturers
car(skoda,audi)
GOOD cars
Expected output, with indentation preserved:
the cars and computers...
Car(ferrari,lamborghini,porsche,skoda,audi)
some manufacturers specialise in "super" cars.
Most people like them.
GOOD cars
Computer(hp,dell,apple,sony,fujitsu)
These are some laptop manufacturers
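I have written code that merges the groups and removes the redundant data, but it does not move the following lines up under the first occurrence of the same word, and it does not delete the text that was moved.
My code: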
import re
import collections

class Group:
    def __init__(self):
        self.members = []
        self.text = []

with open('texta.txt', "r+") as f:
    # so specific lines can be edited
    lines = f.readlines()
    groups = collections.defaultdict(Group)
    group_pattern = re.compile(r'^(\S+)\((.*)\)$')
    current_group = None
    for line in range(len(lines)):
        curr_line = lines[line]
        # to prevent searches on lines with no group
        if "(" in curr_line:
            curr_line = curr_line.strip()
            m = group_pattern.match(curr_line)
            if m:
                group_name, group_members = m.groups()
                groups[group_name].members += filter(lambda x: x not in groups[group_name].members, group_members.split(','))
                current_group = group_name
            else:
                if (current_group is not None) and (len(line) > 0):
                    groups[current_group].text.append(line)
    already_seen = []
    for line in range(len(lines)):
        curr_line = lines[line]
        for key in groups.keys():
            if key in curr_line.strip():
                if key in already_seen:
                    lines[line] = ""
                else:
                    already_seen.append(key)
                    open_par = curr_line.index("(")
                    close_par = curr_line.index(")")
                    member_str = ",".join(groups[key].members)
                    lines[line] = curr_line[:open_par+1] + member_str + curr_line[close_par:]
    f.truncate()
    f.seek(0)
    for line in lines:
        f.write(line)
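Please help me fix my code! Answers would be much appreciated.

I am not sure exactly what is wrong with your code. It looks like you never actually add any text to the groups. In any case, you can simplify the data aggregation part to: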
import re
import collections

with open('texta.txt', "r+") as f:
    p = re.compile(r'^(\S+)\((.*)\)$')
    group_ids = collections.OrderedDict()    # group -> set of ids (?)
    group_words = collections.OrderedDict()  # group -> list of words
    group = None                             # last group, or None
    for line in f:
        match = p.match(line)
        if match:
            group, ids = match.groups()
            group_ids.setdefault(group, set()).update(ids.split(','))
        elif line.strip() and group:
            group_words.setdefault(group, []).append(line.rstrip())
After this, group_ids and group_words hold the merged ids and the text lines for each group.
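One caveat with the set used above: a set does not keep insertion order, so the ids can come out in a different order than they appear in the file. If the order matters, a minimal sketch is to use a plain dict as an ordered set. This assumes Python 3.7+, where dicts keep insertion order; the helper name merge_ids is made up for illustration and is not part of the answer's code.

```python
def merge_ids(existing, new_ids):
    """Add new_ids to `existing` (a dict used as an ordered set), skipping duplicates."""
    for item in new_ids:
        item = item.strip()
        if item:
            # setdefault only inserts the key if it is not there yet,
            # so the first-seen position of each id is preserved
            existing.setdefault(item, None)
    return existing


ids = {}
merge_ids(ids, "ferrari,lamborghini,porsche".split(","))
merge_ids(ids, "skoda,audi,ferrari".split(","))
print(",".join(ids))  # ferrari,lamborghini,porsche,skoda,audi
```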
Writing group_ids and group_words to a file in the desired format should not be a big problem, for example:
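with open('textb.txt', 'w') as f:
    for group, ids in group_ids.items():
        f.write("%s(%s)\n" % (group, ','.join(ids)))
        # .get() avoids a KeyError for a group that has no text lines
        for word in group_words.get(group, []):
            f.write(word + '\n')
        f.write('\n')  # one blank line as separator between groups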
This produces the following output (bla is another block that I added after the second car block, for testing):
car(ab,ef,ad,cd)
go
drive
enjoy
bike(ac,de)
ride
bla(xx)
blub
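The question's expected output also merges "car" into "Car" and keeps the lines before the first group, which the snippet above does not attempt. A hedged sketch of one way to do that follows. It assumes group names should be compared case-insensitively, that the first spelling and indentation of a group win, and that blank lines inside a group are dropped; the function name merge_groups is hypothetical.

```python
import re

# optional indentation, a name without spaces, then "(...)" up to end of line
GROUP_RE = re.compile(r'^(\s*)(\S+)\((.*)\)\s*$')


def merge_groups(lines):
    """Return the lines with same-named groups merged under their first occurrence."""
    header = []    # lines that appear before the first group
    groups = {}    # lowercased group name -> collected data, in first-seen order
    current = None
    for raw in lines:
        line = raw.rstrip("\n")
        m = GROUP_RE.match(line)
        if m:
            indent, name, members = m.groups()
            current = groups.setdefault(
                name.lower(),
                {"indent": indent, "name": name, "members": {}, "text": []},
            )
            for member in members.split(","):
                if member.strip():
                    # dict used as an ordered set of members
                    current["members"].setdefault(member.strip(), None)
        elif current is None:
            header.append(line)
        elif line.strip():
            current["text"].append(line)  # keeps the line's own indentation
    out = list(header)
    for g in groups.values():
        out.append("%s%s(%s)" % (g["indent"], g["name"], ",".join(g["members"])))
        out.extend(g["text"])
    return out


sample = [
    "the cars and computers...\n",
    "Car(ferrari,lamborghini,porsche)\n",
    'some manufacturers specialise in "super" cars.\n',
    "Most people like them.\n",
    "Computer(hp,dell,apple,sony,fujitsu)\n",
    "These are some laptop manufacturers\n",
    "car(skoda,audi)\n",
    "GOOD cars\n",
]
merged = merge_groups(sample)
print("\n".join(merged))
```

On the sample input from the question this prints the expected output shown there, with "GOOD cars" moved up under the merged Car(...) line.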
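If you prefer to rewrite the original file in r+ mode instead, make sure to call f.seek(0) first and then f.truncate(); otherwise the old data will not be removed completely.

Can you post your actual output?

This is the output I get! Actually, writing to the file is the riskier part for me. It does not preserve the indentation, and it should write in such a way that the text already in the file is not affected! Can you help me with that?

@Chaeltrims In your example, should the text below the second "car" block stay on the same lines, i.e. should there be many blank lines where the second block used to be, or just a single blank line as a separator?

I only need one blank line as a separator! By the way, if producing the blank lines is more comfortable for you, then go with that!

Thnx, you have done a lot of work on my problem. Actually I am weak at writing to an existing file; I did it, and it gave me a lot of indentation problems. So, can you write the output to the existing file instead of standard output?

What kind of indentation problems? Is everything on one line? You have to add the line ending characters explicitly. Update: edited again.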
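For the r+ variant discussed above, a minimal sketch of the seek-then-truncate order. The helper rewrite_in_place and its transform argument are made up for illustration; the demo writes to a temporary file rather than texta.txt.

```python
import os
import tempfile


def rewrite_in_place(path, transform):
    """Read all lines, transform them, and write the result back to the same file."""
    with open(path, "r+") as f:
        lines = f.read().splitlines()
        new_lines = transform(lines)
        f.seek(0)      # go back to the start of the file first...
        f.truncate()   # ...then cut everything from there on, so no stale tail remains
        # line endings have to be added explicitly
        f.write("\n".join(new_lines) + "\n")


# demo: the new content is shorter than the old one
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "texta.txt")
    with open(path, "w") as f:
        f.write("a\na\nb\nb\nc\n")
    rewrite_in_place(path, lambda lines: sorted(set(lines)))
    with open(path) as f:
        result = f.read()
```

Calling f.truncate() before f.seek(0), as in the question's code, truncates at the current position (the end of the file after reading), so nothing is removed and leftovers of the old content survive whenever the new content is shorter.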