Comparing two lists and searching by field in Python

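I have two files to compare, and I then need to produce a specific output:

1) Below are the contents of the username text file (it stores the latest films the user has watched)

2) Below are the contents of films.txt, which stores all the films the program makes available to the user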

0,Genre, Title, Rating, Likes
1,Sci-Fi,Out of the Silent Planet, PG,3
2,Sci-Fi,Solaris, PG,0
3,Sci-Fi,Star Trek, PG,0
4,Sci-Fi,Cosmos, PG,0
5,Drama, The English Patient, 15,0
6,Drama, Benhur, PG,0
7,Drama, The Pursuit of Happiness, 12, 0
8,Drama, The Thin Red Line, 18,0
9,Romance, When Harry met Sally, 12, 0
10,Romance, You've got mail, 12, 0
11,Romance, Last Tango in Paris, 18, 0
12,Romance, Casablanca, 12, 0
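Example of the output I need: the user has so far watched two sci-fi films and one romance film. The output should therefore search the films text file by genre (identifying Sci-Fi and Romance) and list the films in films.txt that the user has not yet watched. In this case: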

3,Sci-Fi,Star Trek, PG,0
4,Sci-Fi,Cosmos, PG,0
10,Romance, You've got mail, 12, 0
11,Romance, Last Tango in Paris, 18, 0
12,Romance, Casablanca, 12, 0
I have the following code that attempts to do the above, but it produces incorrect output:

def viewrecs(username):
   #set the username variable to the text file -to use it in the next bit
   username = (username + ".txt")
   #open the username file that stores latest viewings
   with open(username,"r") as f:
      #open the csv file reader for the username file
          fReader=csv.reader(f)
          #for each row in the fReader
          for row in fReader:
             #set the genre variable to the row[0], in which row[0] is all the genres (column 1 in username file)
             genre=row[0]
             #next, open the films file
             with open("films.txt","r") as films:
                #open the csv reader for this file (filmsReader as opposed to fReader)
                filmsReader=csv.reader(films)
                #for each row in the films file
                for row in filmsReader:
                   #and for each field in the row 
                   for field in row:
                      #print(field)
                      #print(genre)
                      #print(field[0])
                      if genre in field and row[2] not in fReader:
                         print(row)
Output (undesired):

I am not looking for a rewrite or a brand-new solution; ideally I would like a fix for the solution above and its logic.
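
One plausible reason the code above misbehaves, sketched on in-memory data: a `csv.reader` is a one-shot iterator, so a test such as `row[2] not in fReader` walks through (and uses up) the remaining rows of the user file, and it compares a title string against whole rows, which never match. The inner loop also reuses the name `row`, hiding the outer row. The sample rows below are assumptions for illustration, not the asker's actual file.

```python
import csv
import io

# Hypothetical stand-in for the user's file (assumed "Genre,Title" rows).
user_file = io.StringIO("Sci-Fi,Solaris\nRomance,When Harry met Sally\n")
reader = csv.reader(user_file)

first_row = next(reader)  # ['Sci-Fi', 'Solaris']

# A membership test on a reader iterates over it, comparing the string
# with each remaining row (a list), so it never matches...
found = "When Harry met Sally" in reader

# ...and afterwards the reader is exhausted.
leftover = list(reader)

print(first_row, found, leftover)
```

Reading the user's rows into a list or set once, before looping over films.txt, avoids both effects.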

@Gypsy: your solution almost worked. I used:

def viewrecs(username):

  #set the username variable to the text file -to use it in the next bit
  username = (username + ".txt")
  #open the username file that stores latest viewings
  lookup_set = set()
  with open(username,"r") as f:
    #open the csv file reader for the username file
    fReader=csv.reader(f)
    #for each row in the fReader
    for row in fReader:
      genre = row[1]
      name = row[2]
      lookup_set.add('%s-%s' % (genre, name))
  with open("films.txt","r") as films:
    filmsReader=csv.reader(films)
    #for each row in the films file
    for row in filmsReader:
      genre = row[1]
      name = row[2]
      lookup_key = '%s-%s' % (genre, name)
      if lookup_key not in lookup_set:
        print(row)
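The output is as follows. It prints every row of the films file that is not in the first set, not just the rows whose genre appears in the first set: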

['0', 'Genre', ' Title', ' Rating', ' Likes']
['3', 'Sci-Fi', 'Star Trek', ' PG', ' 0']
['4', 'Sci-Fi', 'Cosmos', ' PG', ' 0']
['5', 'Drama', ' The English Patient', ' 15', ' 0']
['6', 'Drama', ' Benhur', ' PG', ' 0']
['7', 'Drama', ' The Pursuit of Happiness', ' 12', ' 0']
['8', 'Drama', ' The Thin Red Line', ' 18', ' 0']
['10', 'Romance', " You've got mail", ' 12', ' 0']
['11', 'Romance', ' Last Tango in Paris', ' 18', ' 0']
['12', 'Romance', ' Casablanca', ' 12', ' 0']
Note: for simplicity, I changed the format of the first set to match the entries in the films file:

1,Sci-Fi,Out of the Silent Planet, PG
2,Sci-Fi,Solaris, PG

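How about using sets and separate lists to filter out the unwatched films of the appropriate genres? For this we can even abuse a dictionary's keys and values: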

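def parse_file(file):
    # Split every non-empty line on commas and strip the spaces around each field.
    with open(file) as f:
        lines = [line for line in f.read().split('\n') if line.strip()]
    return list(map(lambda x: [w.strip() for w in x.split(',')], lines))

def movies_to_see():
    # seen.txt is assumed to hold "Genre,Title" lines. The dict is keyed by
    # title, because keying by genre would keep only one film per genre.
    seen = {film[1]: film[0] for film in parse_file('seen.txt')}
    films = parse_file('films.txt')
    to_see = []

    for film in films:
        # Keep films whose genre was already watched but whose title was not.
        if film[1] in seen.values() and film[2] not in seen.keys():
            to_see.append(film)
    return to_see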
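
A caveat when a dictionary stands in for the "seen" list, shown on made-up rows: dictionary keys are unique, so a dictionary keyed by genre keeps only one title per genre, whereas one keyed by title keeps every watched film. The rows below are illustrative assumptions.

```python
# Hypothetical watched list as (genre, title) pairs.
seen_rows = [
    ("Sci-Fi", "Out of the Silent Planet"),
    ("Sci-Fi", "Solaris"),
    ("Romance", "When Harry met Sally"),
]

# Keyed by genre: the second Sci-Fi entry overwrites the first.
by_genre = {genre: title for genre, title in seen_rows}

# Keyed by title: every watched film survives.
by_title = {title: genre for genre, title in seen_rows}

print(len(by_genre), len(by_title))
```

With the genre-keyed version, "Out of the Silent Planet" would be forgotten and could be recommended again.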

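OK, build a set by iterating over the first file, using Genre+name as the entry.

Now iterate over the second file and look up its Genre+name entry in the set created above; if it is not there, print it.

I can type up some code once I get home.

As promised, my code is below: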

import csv

def viewrecs(username):
  # Build the name of the text file that stores this user's latest viewings
  username = (username + ".txt")
  # In this set we will collect the unique combinations of genre and name
  genre_name_lookup_set = set()
  # In this set we will collect the unique genres
  genre_lookup_set = set()
  with open(username,"r") as f:
    # Open the csv reader for the username file
    fReader=csv.reader(f)
    # For each row in the username file (genre in column 0, name in column 1)
    for row in fReader:
      # strip() removes stray spaces around the fields, as seen in films.txt
      genre = row[0].strip()
      name = row[1].strip()
      # Add the genre-name combination; a set ignores duplicates automatically
      genre_name_lookup_set.add('%s-%s' % (genre, name))
      # Add the genre to this set
      genre_lookup_set.add(genre)
  with open("films.txt","r") as films:
    filmsReader=csv.reader(films)
    # For each row in the films file (genre in column 1, name in column 2)
    for row in filmsReader:
      genre = row[1].strip()
      name = row[2].strip()
      # Build a lookup key using genre and name, example: Sci-Fi-Solaris
      lookup_key = '%s-%s' % (genre, name)
      # Print films of a watched genre that the user has not watched yet
      if lookup_key not in genre_name_lookup_set and genre in genre_lookup_set:
        print(row)
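
The same idea as a minimal, testable sketch. It assumes the user's rows are `Genre,Title`, strips stray spaces around fields, and uses `(genre, title)` tuples as set entries instead of `'%s-%s' % (genre, name)` strings (that expression simply substitutes the two values into the template, giving for example `Sci-Fi-Solaris`). The function name `unwatched_in_seen_genres` and the sample data are hypothetical.

```python
import csv
import io

def unwatched_in_seen_genres(user_lines, film_lines):
    """Return film rows whose genre was watched but whose title was not."""
    seen_pairs = set()    # (genre, title) combinations already watched
    seen_genres = set()   # genres the user has watched
    for row in csv.reader(user_lines):
        if len(row) < 2:
            continue      # skip blank or malformed lines
        genre, title = row[0].strip(), row[1].strip()
        seen_pairs.add((genre, title))
        seen_genres.add(genre)

    result = []
    for row in csv.reader(film_lines):
        if len(row) < 3:
            continue
        genre, title = row[1].strip(), row[2].strip()
        if genre in seen_genres and (genre, title) not in seen_pairs:
            result.append(row)
    return result

# Assumed sample data; real code would pass open file objects instead.
user = io.StringIO("Sci-Fi,Out of the Silent Planet\nSci-Fi,Solaris\n"
                   "Romance,When Harry met Sally\n")
films = io.StringIO(
    "0,Genre, Title, Rating, Likes\n"
    "1,Sci-Fi,Out of the Silent Planet, PG,3\n"
    "2,Sci-Fi,Solaris, PG,0\n"
    "3,Sci-Fi,Star Trek, PG,0\n"
    "5,Drama, The English Patient, 15,0\n"
    "9,Romance, When Harry met Sally, 12, 0\n"
    "12,Romance, Casablanca, 12, 0\n"
)
for row in unwatched_in_seen_genres(user, films):
    print(row)
```

Tuples avoid any ambiguity that a joined string could introduce when a genre or title itself contains a hyphen.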

A solution using the str.split() and str.join() functions:

# change file paths with your actual ones
with open('./text_files/user.txt', 'r') as userfile:
    viewed = userfile.read().split('\n')
    viewed_genders = set(g.split(',')[0] for g in viewed)

with open('./text_files/films.txt', 'r') as filmsfile:
    films = filmsfile.read().split('\n')
    not_viewed = [f for f in films
                  if f.split(',')[1] in viewed_genders and ','.join(f.split(',')[1:3]) not in viewed]

print('\n'.join(not_viewed))
Output:

3,Sci-Fi,Star Trek, PG,0
4,Sci-Fi,Cosmos, PG,0
10,Romance, You've got mail, 12, 0
11,Romance, Last Tango in Paris, 18, 0
12,Romance, Casablanca, 12, 0
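
One plausible cause of the `IndexError: list index out of range` reported for this approach is an empty line in either file (for example after a trailing newline), because `''.split(',')` yields a single element. A hedged variant that skips blank lines, assuming the user's lines are `Genre,Title` text written exactly as in films.txt; the function name `not_viewed_lines` and the sample strings are made up for illustration:

```python
def not_viewed_lines(user_text, films_text):
    # splitlines() avoids a trailing empty string; the filter drops blank lines.
    viewed = [ln for ln in user_text.splitlines() if ln.strip()]
    viewed_genres = {ln.split(',')[0] for ln in viewed}

    result = []
    for line in films_text.splitlines():
        parts = line.split(',')
        if len(parts) < 3:
            continue  # blank or malformed line
        # "Genre,Title" exactly as written in films.txt (spaces included).
        key = ','.join(parts[1:3])
        if parts[1] in viewed_genres and key not in viewed:
            result.append(line)
    return result

# Assumed sample contents, including a blank line at the end.
user_text = "Sci-Fi,Solaris\nRomance, Casablanca\n\n"
films_text = (
    "2,Sci-Fi,Solaris, PG,0\n"
    "3,Sci-Fi,Star Trek, PG,0\n"
    "6,Drama, Benhur, PG,0\n"
    "12,Romance, Casablanca, 12, 0\n"
)
print('\n'.join(not_viewed_lines(user_text, films_text)))
```

Because the comparison is on raw text, a title only counts as viewed when its spacing matches films.txt exactly.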

Could you provide code for this answer so that it is easier to understand? Does Genr mean genre?

Thanks for pointing out the typo. Yes, I meant genre.

@pythoncarrot My answer has been updated with code. Please take a look.

Thanks, but when trying the code the following error occurred: lookup_set.add('%-%' % (genre, name)) TypeError: not all arguments converted during string formatting. Also, could you explain what the '%-%' bit does?

It worked! Almost... I had to add an "s" after each % to fix the formatting problem I had earlier. The problem now is that it prints every film in the films file that is not in the initial set, not just the films whose genre is in the initial set.

movies_to_see is the method, to_see is the array it returns.

I am not familiar with the use of lambda and map, so again, commenting each line would be very helpful. I will give it a try, but it is not clear where to implement it or what it means...

parse_file just produces a list in which each sublist is one split line of the file received as a parameter, and movies_to_see returns the output your post asks for.

Could you comment each line of the code and say what it does? That would be very helpful for understanding the logic. Also, when trying exactly what you suggested, the following error occurs: if f.split(',')[1] in viewed_genders and ','.join(f.split(',')[1:3]) not in viewed] IndexError: list index out of range

@pythoncarrot, there should not be any errors; I tested it on what you posted and it works fine. Check your code for mistakes, and check whether some of the files contain extra columns.

The error persists: if f.split(',')[1] in viewed_genders and ','.join(f.split(',')[1:3]) not in viewed] IndexError: list index out of range

That error can only occur if the actual contents of those files differ from what you posted. Post the actual contents, or remove the extra whitespace from the files.