Python: create a new list from a list, without duplicates?

I have a list:

carner_list = ['<a href="/lyric/34808442/Loyle+Carner/Damselfly">Damselfly</a>',
 '<a href="/lyric/37311114/Loyle+Carner/Damselfly">Damselfly</a>',
 '<a href="/lyric/37360958/Loyle+Carner/Damselfly">Damselfly</a>',
 '<a href="/lyric/33661937/Loyle+Carner/The+Isle+of+Arran">The Isle of Arran</a>',
 '<a href="/lyric/33661936/Loyle+Carner/Mean+It+in+the+Morning">Mean It in the Morning</a>']
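I want to create a new list from it without the duplicates, but my attempt does not work.

Is this a syntax error, or am I going in completely the wrong direction?

Best,
Russell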

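I wrote a small function called listContains that I think solves your problem. Your code does not work because you search new_list for the value i[38:], while what you append to new_list is the whole value of i. So you also need to apply the [38:] slice to every value already in the list.
I think the code below explains what I mean better: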

carner_list = ['<a href="/lyric/34808442/Loyle+Carner/Damselfly">Damselfly</a>',
 '<a href="/lyric/37311114/Loyle+Carner/Damselfly">Damselfly</a>',
 '<a href="/lyric/37360958/Loyle+Carner/Damselfly">Damselfly</a>',
 '<a href="/lyric/33661937/Loyle+Carner/The+Isle+of+Arran">The Isle of Arran</a>',
 '<a href="/lyric/33661936/Loyle+Carner/Mean+It+in+the+Morning">Mean It in the Morning</a>']
new_list = []

def listContains(myList, toSearch):
  for val in myList:
    if val[38:] == toSearch:
      return True
  return False

for i in carner_list:
  if listContains(new_list, i[38:]):
    print("found")
  else:
    new_list.append(i)
    print("not")
print(new_list)
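
The same comparison can be written more compactly with the built-in any(). This is only a sketch under the same assumption as the code above, namely that everything from index 38 onward identifies the song; KEY_START and dedupe_by_suffix are names made up for this example.

```python
carner_list = ['<a href="/lyric/34808442/Loyle+Carner/Damselfly">Damselfly</a>',
 '<a href="/lyric/37311114/Loyle+Carner/Damselfly">Damselfly</a>',
 '<a href="/lyric/37360958/Loyle+Carner/Damselfly">Damselfly</a>',
 '<a href="/lyric/33661937/Loyle+Carner/The+Isle+of+Arran">The Isle of Arran</a>',
 '<a href="/lyric/33661936/Loyle+Carner/Mean+It+in+the+Morning">Mean It in the Morning</a>']

# Assumption: the song-specific part starts at index 38 in every entry.
KEY_START = 38

def dedupe_by_suffix(items, start=KEY_START):
    """Keep the first item for each distinct suffix item[start:]."""
    result = []
    for item in items:
        # Compare suffix with suffix, not suffix with the whole stored string.
        if not any(kept[start:] == item[start:] for kept in result):
            result.append(item)
    return result

new_list = dedupe_by_suffix(carner_list)
print(len(new_list))  # 3
```

Like listContains, this scans the kept items for every candidate, which is fine for a short list but grows quadratically with the input size.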

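The part of the string you use to determine duplicates (from index 38 to the end) is not what you actually store in the list, so the in operator will not work.

You can instead use a dict to store the de-duplicated strings, with the part of the string you care about as the key, so that the in operator can work: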

new = {}
for i in carner_list:
    key = i[38:]
    if key not in new:
        new[key] = i
print(list(new.values()))
This outputs:

['<a href="/lyric/34808442/Loyle+Carner/Damselfly">Damselfly</a>', '<a href="/lyric/33661937/Loyle+Carner/The+Isle+of+Arran">The Isle of Arran</a>', '<a href="/lyric/33661936/Loyle+Carner/Mean+It+in+the+Morning">Mean It in the Morning</a>']
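
The dict idea generalizes to any notion of "same item" by passing the key as a function. A minimal sketch, assuming Python 3.7+ (where dicts keep insertion order); unique_by is a hypothetical helper name, and the second key function is an assumption about the shape of the data, not something from the question.

```python
carner_list = ['<a href="/lyric/34808442/Loyle+Carner/Damselfly">Damselfly</a>',
 '<a href="/lyric/37311114/Loyle+Carner/Damselfly">Damselfly</a>',
 '<a href="/lyric/37360958/Loyle+Carner/Damselfly">Damselfly</a>',
 '<a href="/lyric/33661937/Loyle+Carner/The+Isle+of+Arran">The Isle of Arran</a>',
 '<a href="/lyric/33661936/Loyle+Carner/Mean+It+in+the+Morning">Mean It in the Morning</a>']

def unique_by(items, key):
    """Return the first item seen for each distinct key(item), in input order."""
    first_seen = {}
    for item in items:
        # setdefault only stores the item if the key is not present yet.
        first_seen.setdefault(key(item), item)
    return list(first_seen.values())

# Same key as above: everything from index 38 onward.
by_suffix = unique_by(carner_list, key=lambda s: s[38:])

# Alternative key (an assumption about the data): the text after the last '">',
# which does not depend on the numeric ID having a fixed width.
by_label = unique_by(carner_list, key=lambda s: s.rsplit('">', 1)[-1])

print(by_suffix == by_label)  # True for this data
```

The fixed offset 38 only works while every ID has eight digits, which is why a key that does not count characters may be the safer choice.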

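With the way you are currently searching, you are checking whether the substring is equal to any item in new_list. That will never be true, because it is only a substring of the stored items.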
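
A small illustration of the difference, using two entries from the question. The variable names are made up for this example.

```python
# new_list already holds one full entry.
new_list = ['<a href="/lyric/34808442/Loyle+Carner/Damselfly">Damselfly</a>']
candidate = '<a href="/lyric/37311114/Loyle+Carner/Damselfly">Damselfly</a>'

suffix = candidate[38:]
print(suffix)  # Damselfly">Damselfly</a>

# "in" on a list tests whether some element EQUALS the value.
is_element = suffix in new_list
# "in" on a string tests whether the value is a SUBSTRING of it.
is_substring = suffix in new_list[0]

print(is_element)    # False
print(is_substring)  # True
```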

You can use filter with a lambda to keep only the items of new_list that contain the substring. Then convert the result to a list and check that its length is not 0:

len(list(filter(lambda x: i[38:] in x, new_list))) != 0
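Final code:

carner_list = ['<a href="/lyric/34808442/Loyle+Carner/Damselfly">Damselfly</a>',
 '<a href="/lyric/37311114/Loyle+Carner/Damselfly">Damselfly</a>',
 '<a href="/lyric/37360958/Loyle+Carner/Damselfly">Damselfly</a>',
 '<a href="/lyric/33661937/Loyle+Carner/The+Isle+of+Arran">The Isle of Arran</a>',
 '<a href="/lyric/33661936/Loyle+Carner/Mean+It+in+the+Morning">Mean It in the Morning</a>']
new_list = []
for i in carner_list:
    if len(list(filter(lambda x: i[38:] in x, new_list))) != 0:
        print("found")
    else:
        new_list.append(i)
        print("not")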

Use BeautifulSoup to parse the HTML, then check the link text.

Ex:

from bs4 import BeautifulSoup

carner_list = ['<a href="/lyric/34808442/Loyle+Carner/Damselfly">Damselfly</a>',
 '<a href="/lyric/37311114/Loyle+Carner/Damselfly">Damselfly</a>',
 '<a href="/lyric/37360958/Loyle+Carner/Damselfly">Damselfly</a>',
 '<a href="/lyric/33661937/Loyle+Carner/The+Isle+of+Arran">The Isle of Arran</a>',
 '<a href="/lyric/33661936/Loyle+Carner/Mean+It+in+the+Morning">Mean It in the Morning</a>']

new_list = []
check_val = set()
for i in carner_list:
    s = BeautifulSoup(i, "html.parser")
    if s.text not in check_val:    #check for text
        new_list.append(i)
        check_val.add(s.text)
print(new_list)
Output:

['<a href="/lyric/34808442/Loyle+Carner/Damselfly">Damselfly</a>',
 '<a href="/lyric/33661937/Loyle+Carner/The+Isle+of+Arran">The Isle of '
 'Arran</a>',
 '<a href="/lyric/33661936/Loyle+Carner/Mean+It+in+the+Morning">Mean It in the '
 'Morning</a>']
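
If installing BeautifulSoup is not an option, roughly the same idea can be sketched with the standard library's html.parser module. The class name LinkTextExtractor and the helper link_text are invented for this example, and the sketch assumes each entry is a single small HTML fragment.

```python
from html.parser import HTMLParser

carner_list = ['<a href="/lyric/34808442/Loyle+Carner/Damselfly">Damselfly</a>',
 '<a href="/lyric/37311114/Loyle+Carner/Damselfly">Damselfly</a>',
 '<a href="/lyric/37360958/Loyle+Carner/Damselfly">Damselfly</a>',
 '<a href="/lyric/33661937/Loyle+Carner/The+Isle+of+Arran">The Isle of Arran</a>',
 '<a href="/lyric/33661936/Loyle+Carner/Mean+It+in+the+Morning">Mean It in the Morning</a>']

class LinkTextExtractor(HTMLParser):
    """Collects the text nodes of a fragment and ignores the tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)

def link_text(fragment):
    """Return the visible text of one HTML fragment."""
    parser = LinkTextExtractor()
    parser.feed(fragment)
    parser.close()
    return "".join(parser.parts)

new_list = []
seen_text = set()
for item in carner_list:
    text = link_text(item)
    if text not in seen_text:  # first time this link text appears
        new_list.append(item)
        seen_text.add(text)

print([link_text(item) for item in new_list])
# ['Damselfly', 'The Isle of Arran', 'Mean It in the Morning']
```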

Why not use a regular expression?

import re
carner_list = ['<a href="/lyric/34808442/Loyle+Carner/Damselfly">Damselfly</a>',
 '<a href="/lyric/37311114/Loyle+Carner/Damselfly">Damselfly</a>',
 '<a href="/lyric/37360958/Loyle+Carner/Damselfly">Damselfly</a>',
 '<a href="/lyric/33661937/Loyle+Carner/The+Isle+of+Arran">The Isle of Arran</a>',
 '<a href="/lyric/33661936/Loyle+Carner/Mean+It+in+the+Morning">Mean It in the Morning</a>']

print({re.findall(r'"([^"]*)"', x)[0].split("/")[4]: x for x in carner_list })

#Below is the output generated 
'''
{'Damselfly': '<a href="/lyric/37360958/Loyle+Carner/Damselfly">Damselfly</a>', 'The+Isle+of+Arran': '<a href="/lyric/33661937/Loyle+Carner/The+Isle+of+Arran">The Isle of Arran</a>', 'Mean+It+in+the+Morning': '<a href="/lyric/33661936/Loyle+Carner/Mean+It+in+the+Morning">Mean It in the Morning</a>'}
'''
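
Note that the dict comprehension above keeps the last entry for each title (37360958 for Damselfly) and returns a dict rather than a list. If the first occurrence and a plain list are wanted, one possible variation is sketched below. The pattern HREF_TITLE is an assumption about the link format (the title is the last path segment of the href), not something stated in the question.

```python
import re

carner_list = ['<a href="/lyric/34808442/Loyle+Carner/Damselfly">Damselfly</a>',
 '<a href="/lyric/37311114/Loyle+Carner/Damselfly">Damselfly</a>',
 '<a href="/lyric/37360958/Loyle+Carner/Damselfly">Damselfly</a>',
 '<a href="/lyric/33661937/Loyle+Carner/The+Isle+of+Arran">The Isle of Arran</a>',
 '<a href="/lyric/33661936/Loyle+Carner/Mean+It+in+the+Morning">Mean It in the Morning</a>']

# Assumption: every entry has an href whose last path segment is the title.
HREF_TITLE = re.compile(r'href="[^"]*/([^"/]+)"')

first_by_title = {}
for link in carner_list:
    match = HREF_TITLE.search(link)
    if match is None:
        continue  # skip entries that do not look like the expected links
    # setdefault keeps the first link seen for each title.
    first_by_title.setdefault(match.group(1), link)

new_list = list(first_by_title.values())
print(list(first_by_title))
# ['Damselfly', 'The+Isle+of+Arran', 'Mean+It+in+the+Morning']
```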

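Do you want the items with unique text, e.g. Damselfly and The Isle of Arran?

What is wrong with the output of your current code? When you check whether an item from carner_list is also in new_list, this will always evaluate to False, because new_list is empty.

That was exactly the problem. I was able to fix it.

I'm glad it solved your problem. Please accept my answer as the solution so that future users can find it right away. Thanks!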