How to remove duplicate lines from a file in Python

python file

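I have a file with duplicate lines. I want to remove the duplicates so that the file contains only unique lines, but I get this error:

output.writelines(uniquelines(filelines))
TypeError: writelines() argument must be a sequence of strings

I have searched for the same problem, but I still don't understand what is wrong. My code: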

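The code uses two different kinds of open: codecs.open when reading, and open when writing.

readlines on a file object created with codecs.open returns a list of unicode strings, while writelines on a file object created with open expects a sequence of (byte) strings.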

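Replace the following lines:

output = open("wordlist_unique.txt","w")
output.writelines(uniquelines(filelines))
output.close()

with a version that opens the output file through codecs.open as well, or preferably one that does so inside a with statement, so that the file is also closed automatically.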
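A minimal sketch of such a replacement, written for Python 3 (it should also run on Python 2.6+). It uses io.open rather than codecs.open; the point is only that the reader and the writer are the same kind of text-mode file object with the same encoding. The `uniquelines` body, the `dedupe_file` helper, and the `utf-8` default are illustrative assumptions, since the asker's code and encoding are not shown.

```python
import io


def uniquelines(lines):
    """Yield each line the first time it is seen, preserving order."""
    seen = set()
    for line in lines:
        if line not in seen:
            seen.add(line)
            yield line


def dedupe_file(src, dst, encoding="utf-8"):
    # Hypothetical helper: both files are opened in text mode with the same
    # encoding, so writelines() receives the same string type that was read.
    with io.open(src, "r", encoding=encoding) as infile:
        filelines = infile.readlines()
    with io.open(dst, "w", encoding=encoding) as output:
        output.writelines(uniquelines(filelines))
```

Calling `dedupe_file("input.txt", "wordlist_unique.txt")` would then write each distinct line once, in the order of first appearance; the input file name here is a placeholder.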
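If you don't need to keep the lines in their original order afterwards, I suggest putting them into a set: set(line_list). The order of the lines will be scrambled, but the duplicates will be gone.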
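A small sketch of the set approach; the list contents are made up for illustration. Because a set has no defined order, the result is sorted here only to make the printed output predictable.

```python
line_list = ["b\n", "a\n", "b\n", "c\n", "a\n"]

# A set keeps one copy of each distinct line but forgets the original order.
unique = set(line_list)

# Sorting is optional; it just gives a predictable order for display.
print(sorted(unique))  # ['a\n', 'b\n', 'c\n']
```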
I wouldn't bother with encoding or decoding at all. Just open the files with open('organizations.txt', 'rb') and open('wordlist_unique.txt', 'wb') and you should be fine.
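In Python it is common to use a set to remove duplicate objects from a sequence. The only downside of a set is that you lose the order (just as you lose the order of dictionary keys; in fact it is for exactly the same reason, but that doesn't matter here).

If the order of the lines in the file matters, you can use the keys of an OrderedDict (in the standard library since 2.7, I think) as a pseudo-set to remove duplicate strings from a sequence of strings. If the order doesn't matter, use set() instead of collections.OrderedDict.fromkeys().

With the file modes 'rb' (read binary) and 'wb' (write binary) you don't have to worry about encoding; Python simply treats the lines as bytes. This uses the with statement, which is enabled by default from Python 2.6 (2.5 needs from __future__ import with_statement), and opening two files in a single with statement requires 2.7 or 3.1. If you get a syntax error, you may need to nest two with statements or use contextlib instead.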
import collections

with open(infile, 'rb') as inf, open(outfile, 'wb') as outf:
    outf.writelines(collections.OrderedDict.fromkeys(inf))
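The same idea applied to an in-memory list of byte strings, as a sketch with made-up sample data. On Python 3.7 and later a plain `dict` also preserves insertion order, so `dict.fromkeys` gives the same result there.

```python
import collections

lines = [b"02 WLWB64US\n", b"01 WLXB64US\n", b"02 WLWB64US\n", b"01 WLXB64US\n"]

# fromkeys() builds one key per distinct line, in first-seen order.
ordered_unique = list(collections.OrderedDict.fromkeys(lines))

# On Python 3.7+ a plain dict preserves insertion order as well.
assert ordered_unique == list(dict.fromkeys(lines))

print(ordered_unique)  # [b'02 WLWB64US\n', b'01 WLXB64US\n']
```

Lines are compared including their line endings, so a final line without a trailing newline counts as distinct from an otherwise identical line that has one.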

Hello, here is another solution.
For this file:
01 WLXB64US
01 WLXB64US
02 WLWB64US
02 WLWB64US
03 WLXB67US
03 WLXB67US
04 WLWB67US
04 WLWB67US
05 WLXB93US
05 WLXB93US
06 WLWB93US
06 WLWB93US

Solution:
def deleteDuplicate():
    try:
        # Read all lines first so the same file can be reopened for writing.
        with open('file.txt', 'r') as f:
            datos = f.readlines()
        with open('file.txt', 'w') as f:
            for i, line in enumerate(datos):
                # Compare with the previous line, ignoring spaces and line
                # endings; only adjacent duplicates are detected this way.
                if i > 0 and datos[i - 1].strip().replace(' ', '') == line.strip().replace(' ', ''):
                    print('next...')
                else:
                    f.write(line)
    except Exception as err:
        # The original handler body is missing; printing the error is an assumption.
        print(err)
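Comparing each line with the previous one removes only duplicates that are adjacent, which is enough for the sample file above. A sketch of the same comparison using `itertools.groupby`; the `normalize` and `drop_adjacent_duplicates` names are illustrative and not part of the answer.

```python
import itertools


def normalize(line):
    # Same comparison key as the answer: ignore spaces and line endings.
    return line.strip().replace(" ", "")


def drop_adjacent_duplicates(lines):
    # groupby() collects runs of consecutive lines with equal keys;
    # keeping the first line of each run drops the adjacent repeats.
    return [next(group) for _, group in itertools.groupby(lines, key=normalize)]


sample = ["01 WLXB64US\n", "01 WLXB64US\n", "02 WLWB64US\n", "02 WLWB64US\n"]
print(drop_adjacent_duplicates(sample))  # ['01 WLXB64US\n', '02 WLWB64US\n']
```

If equal lines can appear far apart in the file, a set or an ordered dictionary is the safer choice, because this approach would keep both copies.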

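You don't need readlines(): iterating over a file object yields its lines. Also, you don't need keys(): iterating over a dictionary yields its keys.
Please consider adding a brief explanation of why your answer is useful. Code alone is not enough.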