Python: remove elements from a list of strings

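I am trying to remove elements from a list of strings (read from a file). Each element is itself a list, stored as a string with comma-separated items.

I want to remove from the list those strings that contain the same elements. For example:

1: GGSIPU,RANK,BTECH,9

2: GGSIPU,BTECH,RANK,9

3: GGSIPU,BTECH,RANK,9

So lines 2 and 3 should be removed.

Here is my code: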

# to remove duplicates

with open('itemset3.txt', 'r') as f:
    lines = f.readlines()
    f.close()

i = 0

while (i<len(lines)):
    j = i + 1
    temp = []
    temp1 = lines[i].split(',')
    print 'outer %d %s' % (i,temp1)
    temp.append(temp1[0])
    temp.append(temp1[1])
    temp.append(temp1[2])
    while (j<len(lines)):
        if all(t in lines[j] for t in temp):
            print temp, ' found at ',j,': ',lines[j]
            # lines.remove(lines[j])
            del lines[j] 
        j = j + 1
    i = i + 1

f = open('itemset3.txt', 'w')
i = 0
while (i<len(lines)):
    f.write(lines[i])
    i = i + 1
f.close()

The problem is that after running the code, some redundant (duplicate) lines still remain in the output. How should I correct it?

Below is the output when the tuples are made:

('Certified', 'Winner', 'GGSIPU', '7')
('RANK', 'Application', 'GGSIPU', '7')
('Techexpo2015', 'Students', 'GGSIPU', '6')
('CHECK', 'Certified', 'GGSIPU', '7')
('RANK', 'SEMESTER', 'GGSIPU', '9')
('Application', 'Techexpo2015', 'GGSIPU', '7')
('GGSIPU', 'SEMESTER', 'RANK', '9')
('CHECK', 'Techexpo2015', 'GGSIPU', '7')
('RANK', 'Winner', 'GGSIPU', '7')
('CHECK', 'Winner', 'GGSIPU', '7')
('Winner', 'Students', 'GGSIPU', '6')
('GGSIPU', 'Winner', 'RANK', '7')
('GGSIPU', 'BTECH', 'RANK', '9')
('RANK', 'Techexpo2015', 'GGSIPU', '7')
('Certified', 'Students', 'GGSIPU', '6')
('GGSIPU', 'CHECK', 'RANK', '7')
('RANK', 'BTECH', 'GGSIPU', '9')
('GGSIPU', 'Students', 'RANK', '6')
('RANK', 'CALCULATOR', 'GGSIPU', '9')
('Winner', 'Techexpo2015', 'GGSIPU', '7')
('GGSIPU', 'Certified', 'RANK', '7')
('RANK', 'CHECK', 'GGSIPU', '7')
('CHECK', 'Application', 'GGSIPU', '7')
('RANK', 'Certified', 'GGSIPU', '7')
('GGSIPU', 'RANK', 'BTECH', '9')
('GGSIPU', 'CALCULATOR', 'RANK', '9')
('CHECK', 'Students', 'GGSIPU', '6')
('GGSIPU', 'Application', 'RANK', '7')
('GGSIPU', 'Techexpo2015', 'RANK', '7')
('Winner', 'Application', 'GGSIPU', '7')
('BTECH', 'SEMESTER', 'GGSIPU', '9')
('Certified', 'Techexpo2015', 'GGSIPU', '7')
('RANK', 'Students', 'GGSIPU', '6')
('SEMESTER', 'CALCULATOR', 'GGSIPU', '9')
('Certified', 'Application', 'GGSIPU', '7')
('Application', 'Students', 'GGSIPU', '6')
('BTECH', 'CALCULATOR', 'GGSIPU', '9')

Lines like the following still remain:

1: ('GGSIPU', 'Application', 'RANK', '7')

2: ('RANK', 'Application', 'GGSIPU', '7')
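One likely reason the question's loop leaves duplicates behind is that it deletes lines[j] and then still advances j, so the line that slides into position j is never checked. The sketch below isolates that effect on a tiny list; the helper names and the sample data are hypothetical, and it uses simple equality instead of the question's field test. It is written for Python 3.

```python
def drop_matches_buggy(items, target):
    """Mimic the question's inner loop: delete, then always advance j."""
    items = list(items)
    j = 0
    while j < len(items):
        if items[j] == target:
            del items[j]
        j += 1  # advances even after a deletion, so the next item is skipped
    return items


def drop_matches_fixed(items, target):
    """Only advance j when nothing was deleted at position j."""
    items = list(items)
    j = 0
    while j < len(items):
        if items[j] == target:
            del items[j]  # the next item now sits at index j; re-check it
        else:
            j += 1
    return items


data = ["a", "x", "x", "b"]
print(drop_matches_buggy(data, "x"))  # ['a', 'x', 'b']
print(drop_matches_fixed(data, "x"))  # ['a', 'b']
```

The question's test also looks only at the first three fields and uses substring matching against the whole line, which may cause further mismatches independent of the skipping issue.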

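I constructed tuples from the strings, split at ',', and put them in a list. Finally, I created a set, which simply eliminates the duplicates.

I used replace on the string line because a few strings in the data have no trailing newline.

I have only worked on a small portion of the data. I hope it works for you.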
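To illustrate why the newline has to be removed before building tuples, here is a minimal sketch with two made-up lines that differ only in their trailing newline (as would happen for the last line of a file). The sample data is an assumption, not taken from the real file.

```python
# Two hypothetical lines with identical fields; the second lacks "\n".
raw_lines = ["GGSIPU,RANK,BTECH,9\n", "GGSIPU,RANK,BTECH,9"]

# Without cleanup the last field is '9\n' in one tuple and '9' in the other.
without_cleanup = {tuple(line.split(",")) for line in raw_lines}

# Removing the newline first makes both tuples identical.
with_cleanup = {tuple(line.replace("\n", "").split(",")) for line in raw_lines}

print(len(without_cleanup))  # 2
print(len(with_cleanup))     # 1
```

Calling line.strip() instead of replace would also remove stray spaces and carriage returns at the ends of the line, which may matter if the file has Windows line endings.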

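I see a problem statement, a code sample, and sample input, but no question.

@Two-Bit Alchemist "I want to remove those strings from the list which have same elements"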

The whole point of using with when opening a file is that the context manager closes the file for you.

@Borja That is still not a question.
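The remark about with can be checked directly. This is a small sketch for Python 3 that writes to a temporary scratch file (a hypothetical stand-in for itemset3.txt) and inspects the file object's closed attribute after each block.

```python
import os
import tempfile

# Hypothetical scratch file so the sketch does not touch real data.
fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)

with open(path, "w") as f:
    f.write("GGSIPU,RANK,BTECH,9\n")
print(f.closed)  # True: leaving the with block closed the file

with open(path) as f:
    lines = f.readlines()
print(f.closed)  # True again, with no explicit f.close()

os.remove(path)  # clean up the scratch file
```

So the f.close() inside the with block in the question is harmless but redundant.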

"I want to remove those strings from the list which have same elements" aka "How do I remove strings that have the same elements from a list?"

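You will have to check the data file for more non-printable characters, or try the test data and see the result.

I added the output after creating the tuples.

Please run the script above with your data and see whether the duplicates are still there. If they are, skip this solution.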

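Converting the lines into tuples and building a set:

from pprint import pprint as pp

allLines = set()

with open('data') as f:
    for line in f:
        line = line.strip()            # drop the newline and surrounding whitespace
        line = tuple(line.split(','))  # tuples are hashable, so they can go in a set
        allLines.add(line)

pp(allLines)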



{('Application', 'Students', 'GGSIPU', '6'),
 ('Application', 'Techexpo2015', 'GGSIPU', '7'),
 ('BTECH', 'CALCULATOR', 'GGSIPU', '9'),
 ('BTECH', 'SEMESTER', 'GGSIPU', '9'),
 ('CHECK', 'Application', 'GGSIPU', '7'),
 ('CHECK', 'Certified', 'GGSIPU', '7'),
 ('CHECK', 'Students', 'GGSIPU', '6'),
 ('CHECK', 'Techexpo2015', 'GGSIPU', '7'),
 ('CHECK', 'Winner', 'GGSIPU', '7'),
 ('Certified', 'Application', 'GGSIPU', '7'),
 ('Certified', 'Students', 'GGSIPU', '6'),
 ('Certified', 'Techexpo2015', 'GGSIPU', '7'),
 ('Certified', 'Winner', 'GGSIPU', '7'),
 ('GGSIPU', 'Application', 'RANK', '7'),
 ('GGSIPU', 'BTECH', 'RANK', '9'),
 ('GGSIPU', 'CALCULATOR', 'RANK', '9'),
 ('GGSIPU', 'CHECK', 'RANK', '7'),
 ('GGSIPU', 'Certified', 'RANK', '7'),
 ('GGSIPU', 'RANK', 'BTECH', '9'),
 ('GGSIPU', 'SEMESTER', 'RANK', '9'),
 ('GGSIPU', 'Students', 'RANK', '6'),
 ('GGSIPU', 'Techexpo2015', 'RANK', '7'),
 ('GGSIPU', 'Winner', 'RANK', '7'),
 ('RANK', 'Application', 'GGSIPU', '7'),
 ('RANK', 'BTECH', 'GGSIPU', '9'),
 ('RANK', 'CALCULATOR', 'GGSIPU', '9'),
 ('RANK', 'CHECK', 'GGSIPU', '7'),
 ('RANK', 'Certified', 'GGSIPU', '7'),
 ('RANK', 'SEMESTER', 'GGSIPU', '9'),
 ('RANK', 'Students', 'GGSIPU', '6'),
 ('RANK', 'Techexpo2015', 'GGSIPU', '7'),
 ('RANK', 'Winner', 'GGSIPU', '7'),
 ('SEMESTER', 'CALCULATOR', 'GGSIPU', '9'),
 ('Techexpo2015', 'Students', 'GGSIPU', '6'),
 ('Winner', 'Application', 'GGSIPU', '7'),
 ('Winner', 'Students', 'GGSIPU', '6'),
 ('Winner', 'Techexpo2015', 'GGSIPU', '7')}

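# Raw string so the backslashes in the Windows path are not read as escapes.
with open(r'C:\Users\DELL\Documents\itemset3.txt', 'r') as f:
    lines = f.readlines()  # the with block closes the file; no f.close() needed

# Turn each line into a tuple of its comma-separated fields.
linesUp = []
for line in lines:
    linesUp.append(tuple(line.replace("\n", "").split(',')))

# A set keeps one copy of each identical tuple.
setOfLines = set(linesUp)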
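Note that a set of tuples only merges lines whose fields appear in the same order, so ('GGSIPU', 'Application', 'RANK', '7') and ('RANK', 'Application', 'GGSIPU', '7') both survive. If the goal is to treat lines with the same fields in any order as duplicates, one possible approach is to compare a sorted copy of the fields. The sketch below assumes that interpretation and that the first occurrence should be kept; the function name and sample data are hypothetical, and it targets Python 3.

```python
def dedupe_ignoring_order(lines):
    """Keep the first line for each distinct collection of comma-separated fields."""
    seen = set()
    kept = []
    for line in lines:
        fields = [field.strip() for field in line.strip().split(",")]
        key = tuple(sorted(fields))  # same fields in any order give the same key
        if key not in seen:
            seen.add(key)
            kept.append(line)
    return kept


sample = [
    "GGSIPU,RANK,BTECH,9\n",
    "GGSIPU,BTECH,RANK,9\n",
    "RANK,BTECH,GGSIPU,9\n",
    "GGSIPU,Application,RANK,7\n",
]
for line in dedupe_ignoring_order(sample):
    print(line, end="")
```

Sorting is used instead of a frozenset so that a line with a repeated field is not confused with one where that field appears once. Unlike a plain set, this also preserves the original order of the kept lines.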