Python: appending new, non-repeated information to a file

python python-3.x python-2.7

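I am trying to transfer the information from one file ('a.txt') to another ('b.txt'), but 'a.txt' contains repeated information and I want 'b.txt' to contain no repeats. A record counts as repeated if it has the same "id". This is what a.txt looks like:

name:xxxxxxxx, surnames:xxxxxxxxxxxx, id:xxxxxxxxxx, etc
name:XXXXXXXX, surnames:XXXXXXXXXXXX, id:XXXXXXXXXX, etc
name:XXXXXXXX, surnames:XXXXXXXXXXXX, id:XXXXXXXXXX, etc

I tried the following, but I get an error (unhashable type: list) at the line "if id not in user_id:":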

with open('a.txt', 'r') as f:
    user_id = set()
    user = []

    for line in f.readlines():
        id = [s[5:-1] for s in f.split() if s.startswith("id")]

        if id not in user_id:
            user_id.add(id)
            user.append(line)

    with open('b.txt', 'a') as f:
        f.writelines(user)

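So I would like to know whether there is another way to transfer the information to the other file, or how to fix the error. Thanks, everyone!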

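You are splitting the wrong variable. It should be line rather than f (f is the file, line is one line of that file; hope that makes sense). Also, I don't think the .readlines() method is needed here.

So just change it to: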

for line in f:
    id=[s[5:-1] for s in line.split() if s.startswith("id")]
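
Note that even after this change id is still a list, and a list cannot be added to a set, which is what "unhashable type: list" refers to. A minimal sketch of one way around it is to extract the id as a plain string. The names extract_id, unique_lines and ID_PATTERN are made up for illustration, and the record format (comma-separated "key:value" pairs containing "id:<value>") is an assumption based on the sample data:

```python
import re

# Assumption: each record contains "id:<value>", ended by a comma or whitespace.
ID_PATTERN = re.compile(r"\bid:\s*([^,\s]+)")

def extract_id(line):
    """Return the id as a string (hashable), or None if the line has no id."""
    match = ID_PATTERN.search(line)
    return match.group(1) if match else None

def unique_lines(lines):
    """Keep the first line seen for each id, in the original order."""
    seen_ids = set()
    kept = []
    for line in lines:
        user_id = extract_id(line)
        if user_id is None or user_id in seen_ids:
            continue  # skip lines without an id and lines whose id was already seen
        seen_ids.add(user_id)
        kept.append(line)
    return kept

sample = [
    "name:a,surnames:b,id:1\n",
    "name:c,surnames:d,id:2, etc\n",
    "name:e,surnames:f,id:1\n",
]
print(unique_lines(sample))
```

Dropping lines that have no id at all is a choice made for this sketch; keep them instead if they matter in your file.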

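f.split() cannot work, because f is the file object and split is a str method. You should look for your id in the line you are currently processing: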
t = """name:xxxxxxxxx,surnames:xxxxxxxxxxxxx,id:1xxxxxxxxxxx
name:xxxxxxxxx,surnames:xxxxxxxxxxxxx,id:2xxxxxxxxxxx
name:xxxxxxxxx,surnames:xxxxxxxxxxxxx,id:3xxxxxxxxxxx, etc
name:xxxxxxxxx,surnames:nnnnnnnxxxxxx,id:1xxxxxxxxxxx
name:xxxxxxxxx,surnames:nnnnnnnnxxxxx,id:2xxxxxxxxxxx
name:xxxxxxxxx,surnames:nnnnnnnnxxxxx,id:3xxxxxxxxxxx"""

allLines = t.split("\n")     # roughly what f.readlines() gives you (without the trailing newlines)

data = {}   # dictionary holding the id as key and the first line seen as value
for line in allLines:        # iterate over f instead when reading the real file
    idPos = line.find("id:")       # position of "id:", -1 if absent
    colPos = line.find(",", idPos) # first comma after "id:", -1 if none
    if idPos > -1:
        id = line[idPos+3: colPos if colPos > -1 else None] # slice out the id
        data.setdefault(id, line)  # stores the line only if the id is new, else does nothing

for l in data:
    print(data[l])

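Storing the id/line pairs in a dict can scramble the order of the file (a plain dict only guarantees insertion order from Python 3.7 on). If order matters, use a list, as in your own approach.
Output: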
name:xxxxxxxxx,surnames:xxxxxxxxxxxxx,id:1xxxxxxxxxxx
name:xxxxxxxxx,surnames:xxxxxxxxxxxxx,id:2xxxxxxxxxxx
name:xxxxxxxxx,surnames:xxxxxxxxxxxxx,id:3xxxxxxxxxxx, etc
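
If the order has to survive on Python 2.7 as well, one option is collections.OrderedDict, which keeps the setdefault idiom and remembers insertion order. The following is only a sketch under assumptions: first_line_per_id and copy_unique are hypothetical helper names, and b.txt is opened with 'w' (overwriting it) rather than 'a' as in the question:

```python
from collections import OrderedDict

def first_line_per_id(lines):
    """Return the first line seen for each id, preserving the input order."""
    data = OrderedDict()
    for line in lines:
        id_pos = line.find("id:")
        if id_pos == -1:
            continue  # assumption: lines without an id are dropped
        comma_pos = line.find(",", id_pos)
        end = comma_pos if comma_pos > -1 else None
        record_id = line[id_pos + 3:end].strip()  # strip() removes a trailing newline
        data.setdefault(record_id, line)          # no-op if the id is already stored
    return list(data.values())

def copy_unique(src_path, dst_path):
    """Copy src_path to dst_path, keeping one line per id."""
    with open(src_path) as src:
        kept = first_line_per_id(src)
    with open(dst_path, "w") as dst:  # "w" overwrites; use "a" to append instead
        dst.writelines(kept)

demo = ["name:a,id:1\n", "name:b,id:2, etc\n", "name:c,id:1\n", "name:d,id:2\n"]
print(first_line_per_id(demo))
```

With the files from the question this would be called as copy_unique('a.txt', 'b.txt').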

split is a str method, which is why it does not work.
What exactly is repeated in the information you mention? Words, whole lines? If you provide that (you do not have to give the actual data), maybe we can give a better answer.
Please add a few lines of data.
I have a new error now. I edited the question...