Python: Trying to index a list of strings and delete strings based on their index


python string pandas list indexing


I have a list of lists (called copy), where the elements of each inner list are strings describing a movie (shown below):

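Some of these words are movie genres. In each inner list, I want to find the words that are genres (by checking whether they are in a set called set_genres), move them to the beginning of the list, and append the word "movie" after them. If a list contains more than one genre, I only want to add "movie" after the last genre. set_genres and the desired output are as follows: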

set_genres={'action',
 'adventure',
 'animation',
 'comedy',
 'crime',
 'documentary',
 'drama',
 'family',
 'fantasy',
 'foreign',
 'history',
 'horror',
 'music',
 'mystery',
 'romance',
 'science_fiction',
 'thriller',
 'tv_movie',
 'war',
 'western'}

#Output
[['history','action movie', '1960'],
 ['western','adventure movie', '1960'],
 ['fantasy movie','3d'],
 ['action', 'adventure movie', 'agent'], 
....]

This is the code I tried:

keys=[]
for list_top in copy:
        for idx, word in enumerate(list_top):
                if word in set_genres:
                        keys.append((idx,word))
        keys.sort(reverse=True)
        for idx, word in keys:
                del list_top[idx]
        for idx, word in keys:
                if idx==len(keys)-1:
                        list_top.insert(0,'{} movie'.format(word))
                else:
                        list_top.insert(0,word)

However, this does not work, and I have not been able to figure out why. It gives me the following error:

indexes=[]...
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
 in 
      8                         keys.sort(reverse=True)
      9                 for idx, word in keys:
---> 10                         del list_top[idx]
     11                 for idx, word in keys:
     12                         if idx==len(keys)-1:

IndexError: list assignment index out of range

If anyone knows what could be going wrong, I would appreciate your help.

Something like this:

set_genres={'action',
 'adventure',
 'animation',
 'comedy',
 'crime',
 'documentary',
 'drama',
 'family',
 'fantasy',
 'foreign',
 'history',
 'horror',
 'music',
 'mystery',
 'romance',
 'science_fiction',
 'thriller',
 'tv_movie',
 'war',
 'western'}

base = [['history', '1960', 'action'],
 ['1960', 'western', 'adventure'],
 ['3d', 'fantasy'],
 ['agent', 'action', 'adventure']]

print(set_genres)
print(base)

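for movie in base:
    # iterate over a snapshot: removing items from the list being iterated can skip elements
    for s in list(movie):
        if s not in set_genres:
            movie.remove(s)   # move each non-genre word...
            movie.append(s)   # ...to the end of the list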


print(base)

Output:

[['history', 'action', '1960'], ['western', 'adventure', '1960'], ['fantasy', '3d'], ['action', 'adventure', 'agent']]
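Moving items around inside a list while a `for` loop walks that same list can skip items. The sample data above happens not to trigger it, but a hypothetical sublist with two non-genre words in front does. A minimal sketch comparing iteration over the list itself with iteration over a snapshot; `move_non_genres_back` is an illustrative helper (not from the answer), and the genre set is only a subset of `set_genres`:

```python
# Subset of set_genres, enough for this demonstration.
set_genres = {'action', 'adventure', 'fantasy', 'history', 'western'}

def move_non_genres_back(movie, snapshot):
    """Hypothetical helper: move non-genre words to the end of `movie` in place."""
    # snapshot=False walks the very list that is being mutated
    items = list(movie) if snapshot else movie
    for s in items:
        if s not in set_genres:
            movie.remove(s)
            movie.append(s)
    return movie

print(move_non_genres_back(['agent', '3d', 'action'], snapshot=False))  # ['3d', 'action', 'agent']
print(move_non_genres_back(['agent', '3d', 'action'], snapshot=True))   # ['action', 'agent', '3d']
```

Without the snapshot, removing 'agent' shifts '3d' into the position the loop has already visited, so '3d' is never moved.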

Regarding your error:

You are modifying the list you are iterating over. When you do that, the list shrinks, so eventually an index goes past the end of the list.
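Tracing the question's loop by hand on the sample input used in this answer points to a more specific cause: `keys = []` runs only once, before the outer loop, so the (index, word) pairs collected for one sublist are still in `keys` when the next sublist is processed, and `del list_top[idx]` is then asked for an index that no longer exists. Separately, `idx == len(keys) - 1` compares a position in `list_top` with the number of genres found, so it does not reliably pick the last genre. A minimal corrected sketch of the question's own collect-then-delete approach, assuming the same sample input:

```python
set_genres = {'action', 'adventure', 'animation', 'comedy', 'crime', 'documentary',
              'drama', 'family', 'fantasy', 'foreign', 'history', 'horror', 'music',
              'mystery', 'romance', 'science_fiction', 'thriller', 'tv_movie',
              'war', 'western'}
copy = [['history', '1960', 'action'],
        ['1960', 'western', 'adventure'],
        ['3d', 'fantasy'],
        ['agent', 'action', 'adventure']]

for list_top in copy:
    keys = []  # reset for every sublist (the question creates it once, before the loop)
    for idx, word in enumerate(list_top):
        if word in set_genres:
            keys.append((idx, word))
    # Delete from the highest index down so the remaining indices stay valid.
    for idx, word in sorted(keys, reverse=True):
        del list_top[idx]
    # keys is already in ascending index order, i.e. the genres' original order.
    genres = [word for idx, word in keys]
    if genres:  # a sublist may contain no genre at all
        genres[-1] = '{} movie'.format(genres[-1])
    list_top[0:0] = genres  # insert all genres at the front in one step

print(copy)
# [['history', 'action movie', '1960'], ['western', 'adventure movie', '1960'],
#  ['fantasy movie', '3d'], ['action', 'adventure movie', 'agent']]
```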

This is what you need:

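copy = [['history', '1960', 'action'],
        ['1960', 'western', 'adventure'],
        ['3d', 'fantasy'],
        ['agent', 'action', 'adventure']]
set_genres = {'action',
 'adventure',
 'animation',
 'comedy',
 'crime',
 'documentary',
 'drama',
 'family',
 'fantasy',
 'foreign',
 'history',
 'horror',
 'music',
 'mystery',
 'romance',
 'science_fiction',
 'thriller',
 'tv_movie',
 'war',
 'western'}
for ind_copy, list_top in enumerate(copy):
    found_word = False
    print(list(reversed(list_top)))
    # walk the sublist from the end so the first genre found is the last one
    for ind_list_top, word in enumerate(list(reversed(list_top))):
        if not found_word:
            if word in set_genres:
                # convert the position in the reversed copy back to a position in list_top
                list_top[len(list_top) - ind_list_top - 1] = '{} movie'.format(word)
                found_word = True
    if found_word:
        copy[ind_copy] = list_top
print(copy)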
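The index arithmetic above, `len(list_top) - ind_list_top - 1`, converts a position in the reversed copy back to a position in `list_top`. A sketch of the same idea that walks the indices backwards directly, without building a reversed copy or a `found_word` flag. Like the code above, it only marks the last genre and leaves the word order alone; `mark_last_genre` is a hypothetical helper name and the genre set is a subset of `set_genres`:

```python
# Subset of set_genres, enough for the sample input.
set_genres = {'action', 'adventure', 'fantasy', 'history', 'western'}

def mark_last_genre(list_top):
    """Append ' movie' to the last genre in list_top (in place); return its index or None."""
    for i in range(len(list_top) - 1, -1, -1):  # len-1, len-2, ..., 0
        if list_top[i] in set_genres:
            list_top[i] = '{} movie'.format(list_top[i])
            return i  # stop at the first match, which is the last genre
    return None  # no genre in this sublist: leave it unchanged

copy = [['history', '1960', 'action'],
        ['1960', 'western', 'adventure'],
        ['3d', 'fantasy'],
        ['agent', 'action', 'adventure']]
for list_top in copy:
    mark_last_genre(list_top)
print(copy)
# [['history', '1960', 'action movie'], ['1960', 'western', 'adventure movie'],
#  ['3d', 'fantasy movie'], ['agent', 'action', 'adventure movie']]
```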

Edit for @БМцццСцццццццццц109:

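for ind_copy, list_top in enumerate(copy):
    keys = []
    # iterate over a snapshot: deleting from list_top while enumerating it skips the next word
    for word in list(list_top):
        if word in set_genres:
            keys.append(word)
            list_top.remove(word)
    # guard against sublists without any genre (keys[-1] would raise IndexError)
    if keys:
        keys[-1] = '{} movie'.format(keys[-1])
    copy[ind_copy] = keys + list_top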

Extended with an optimized reverse traversal:

Sorting approach:
genres_set = {'action', 'adventure', 'animation', 'comedy', 'crime', 'documentary', 'drama', 'family',
              'fantasy', 'foreign', 'history', 'horror', 'music', 'mystery', 'romance', 'science_fiction',
              'thriller', 'tv_movie', 'war', 'western'}
inp_list = [['history', '1960', 'action'],
            ['1960', 'western', 'adventure'],
            ['3d', 'fantasy'],
            ['agent', 'action', 'adventure']
            ]
genres_res = [sorted(lst, key=lambda x: x in genres_set, reverse=True) for lst in inp_list]
for lst in genres_res:
    for i, genre in enumerate(lst[::-1]):
        if genre in genres_set:
            lst[-i-1] += ' movie'   # updating the last genre in sublist
            break
print(genres_res)

Output:
[['history', 'action movie', '1960'], ['western', 'adventure movie', '1960'], ['fantasy movie', '3d'], ['action', 'adventure movie', 'agent']]
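The sorting approach relies on `sorted` being stable: the key only yields `True` (genre) or `False`, and `reverse=True` puts the `True` group first while keeping each group's original relative order (the Python documentation states that `reverse` still maintains sort stability). A quick check with a hypothetical sublist that has several words in each group, using a subset of `genres_set`:

```python
# Subset of genres_set, enough for this check.
genres_set = {'action', 'adventure', 'fantasy', 'history', 'western'}

words = ['agent', 'western', '1960', 'history', '3d', 'action']
ordered = sorted(words, key=lambda x: x in genres_set, reverse=True)
print(ordered)  # ['western', 'history', 'action', 'agent', '1960', '3d']
```

Both the genres and the remaining words come out in the order they had in `words`; only the two groups swap places.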


Another approach is to use a generator function:
def arrange_genres(inp_list):
    for lst in inp_list:
        lst = sorted(lst, key=lambda x: x in genres_set, reverse=True)
        for i, genre in enumerate(lst[::-1]):
            if genre in genres_set:
                lst[-i - 1] += ' movie'
                break
        yield lst

res = list(arrange_genres(inp_list))
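Because `sorted` returns a new list and `+=` on a string builds a new string, `arrange_genres` leaves the input sublists untouched, and it produces results lazily, one sublist per `next()` call. A short usage sketch; the `['3d', '1960']` sublist, which contains no genre, is a hypothetical addition showing that such a sublist passes through unchanged:

```python
genres_set = {'action', 'adventure', 'animation', 'comedy', 'crime', 'documentary', 'drama', 'family',
              'fantasy', 'foreign', 'history', 'horror', 'music', 'mystery', 'romance', 'science_fiction',
              'thriller', 'tv_movie', 'war', 'western'}

def arrange_genres(inp_list):
    for lst in inp_list:
        # sorted() builds a new list, so the caller's sublist is not modified
        lst = sorted(lst, key=lambda x: x in genres_set, reverse=True)
        for i, genre in enumerate(lst[::-1]):
            if genre in genres_set:
                lst[-i - 1] += ' movie'  # mark only the last genre
                break
        yield lst

inp_list = [['history', '1960', 'action'], ['3d', '1960']]
gen = arrange_genres(inp_list)
print(next(gen))  # ['history', 'action movie', '1960'] (only the first sublist processed so far)
print(next(gen))  # ['3d', '1960'] (no genre, so nothing is marked)
print(inp_list)   # [['history', '1960', 'action'], ['3d', '1960']] (input unchanged)
```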

You can use a list comprehension:
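for i, list_top in enumerate(copy):
    temp = [x for x in list_top if x in set_genres]  # genres, in their original order
    if temp:  # skip sublists without a genre (temp[-1] would raise IndexError)
        temp[-1] = temp[-1] + ' movie'  # mark only the last genre
    copy[i] = temp + [x for x in list_top if x not in set_genres]  # genres first, then the rest

print(copy)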

>>output
[['history', 'action movie', '1960'], ['western', 'adventure movie', '1960'], ['fantasy movie', '3d'], ['action', 'adventure movie', 'agent']]

Since pandas is tagged, here is an approach using np and pd:
import numpy as np
import pandas as pd

df=pd.DataFrame(l)  # l is the input list of lists



Conditions:
c1=df.ffill(axis=1).iloc[:,-1].isin(set_genres) #check whether the last non-missing element of each row is in set_genres
c2=df.eq(df.ffill(axis=1).iloc[:,-1],axis=0) #mark the cells equal to that last element
c3=df.isna() #mark missing cells (None in rows shorter than the longest one)


Choices:
choice1=df.mask(c2,df.astype(str)+' movie') #where c2 is True, append ' movie' to the element
choice2=''


Then np.sort and np.select:

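# c1 is converted to NumPy first: indexing a Series with [:, None] is not supported in newer pandas
pd.DataFrame(np.sort(np.select([c1.to_numpy()[:,None]&c2,c3],[choice1,choice2],default=df)).T[::-1].T)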


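Please also post a set_genres example that matches the sample input and output lists you provided.
Don't modify the list you are iterating over.
@anky_91 I added it above. @DanielRoseman could you tell me why? In any case, copy is already a deep copy of the original list.
@J.Doe, the second item in the desired output is not sorted: ['western', 'action movie', '1960']. Is this intentional? Does the order of the genres matter?
Thanks for your help! This does add the word "movie". However, when there are two genres, it adds "movie" to both of them instead of only the last one. It also doesn't put the genres at the beginning of the list as I wanted.
@J.Doe Do you want to change the word that comes first alphabetically?
For words that are genres, I want to move them to the beginning of the list. Then, if there is more than one genre, I only want to add the word "movie" after the last one.
@J.Doe Changed it as you described.
Thanks! This does add "movie", but when there are two genres it adds "movie" to both, not just the last one. Also, it doesn't put the genres at the beginning of the list as I'd like.
To be more specific, can you provide a sample input list containing the word "movie"? Then I will update my code.
With this I also get IndexError: list index out of range, pointing at the line keys[-1] = '{} movie'.format(keys[-1]).
Then we need to check whether keys is empty before adding movie. Can you try this: if len(keys) > 0: keys[-1] = '{} movie'.format(keys[-1]), followed by copy[ind_copy] = keys + list_top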

Output of the pandas approach:

               0                1       2
0        history     action movie    1960
1        western  adventure movie    1960
2  fantasy movie               3d        
3          agent  adventure movie  action