Find and replace list elements in Python 3.0?

python string python-3.x list replace

I have three large lists, L0, L1 and L2, of 106756, 106588 and 100 words respectively. L0 and L1 hold data tokenized into word tokens, and L2 holds the words common to both L0 and L1.

Suppose:

L1 = ['newnes', 'imprint', 'elsevier', 'corporate', 'drive', 'suite',
     'burlington', 'usa', 'linacre', 'jordan', 'hill', 'oxford', 'uk', 
     'elsevier', 'inc', 'right', 'reserved', 'exception', 'newness', 'uk', ...]

L2 = ['usa', 'uk', 'hill', 'drive', ... ]

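As you can see, the L1 list contains repeated words, such as "newness" and "uk".

For every word found in L2 (say 'newness' and 'uk'), I need to replace it with a modified, injected form, for example by attaching a special character at the start or end of the word. Moreover, every instance in L1 of a word found in L2 should be replaced with the same modified version of that word. For example, suppose the word newness occurs 100 times in L1 and also appears in L2. Likewise, L2 holds 100 words, each of which also appears in L1, with varying frequencies.

After the transformation, the entries should look something like this:

newness ------> $newness$

uk -----------> $uk$

How can I achieve this on a list? Please help. I am also new to Python. Is there some command in Python that does this? I don't know where to start.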

To count the items in a list, Python provides the dict-like Counter class in its collections module. It counts occurrences in O(n) and exposes them as a dictionary:

from collections import Counter


L1 = ['newnes', 'imprint', 'elsevier', 'corporate', 'drive', 'suite',
     'burlington', 'usa', 'linacre', 'jordan', 'hill', 'oxford', 'uk', 
     'elsevier', 'inc', 'right', 'reserved', 'exception', 'newness', 'uk', ...]

L2 = ['usa', 'uk', 'hill', 'drive', ... ]


c = Counter(L1)
print(c)

Output:

Counter({'elsevier': 2, 'uk': 2, 'newnes': 1, 'imprint': 1, 'corporate': 1, 
         'drive': 1, 'suite': 1, 'burlington': 1, 'usa': 1, 'linacre': 1, 
         'jordan': 1, 'hill': 1, 'oxford': 1, 'inc': 1, 'right': 1, 'reserved': 1,
         'exception': 1, 'newness': 1, Ellipsis: 1})

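It also provides a convenient method, most_common(), which returns the results sorted as a list of (key, count) tuples. The first tuple holds the most frequent word, which can be combined with a list comprehension to modify the source lists: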

word, _ = c.most_common()[0]  # the most frequent word

# in-place modification of both lists: wrap the most frequent word in '#'
# and keep every other element unchanged
L1[:] = [x if x != word else "#" + word + "#" for x in L1]
L2[:] = [x if x != word else "#" + word + "#" for x in L2]

print(L1)
print(L2)

Output:

['newnes', 'imprint', '#elsevier#', 'corporate', 'drive', 'suite', 'burlington', 
 'usa', 'linacre', 'jordan', 'hill', 'oxford', 'uk', '#elsevier#', 'inc', 
 'right', 'reserved', 'exception', 'newness', 'uk', Ellipsis]

['usa', 'uk', 'hill', 'drive', Ellipsis]

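The order of items in the Counter follows their order in the original list. Several items in L1 have a count of 2; elsevier is the first of them, so it also comes first in the result of most_common().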
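The tie-breaking behaviour described above can be checked with a small sketch. The word list is a made-up sample, and the ordering of equal counts assumes Python 3.7 or later, where Counter keeps elements in the order they were first seen:

```python
from collections import Counter

# hypothetical sample: 'elsevier' and 'uk' both occur twice,
# but 'elsevier' is encountered first
words = ['newnes', 'elsevier', 'uk', 'elsevier', 'uk']

ranked = Counter(words).most_common()
print(ranked)  # [('elsevier', 2), ('uk', 2), ('newnes', 1)]
```

Because of this, which word counts as "the most common" among ties depends on the order of the input list.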

Edit for the comment:

from collections import Counter

L1 = ['newnes', 'imprint', 'elsevier', 'corporate', 'drive', 'suite',
     'burlington', 'usa','imprint', 'linacre', 'jordan', 'hill', 'oxford', 'uk','uk', 
     'elsevier', 'inc', 'right', 'reserved','imprint', 'exception', 'imprint','newness', 'uk', "..."]

L2 = ['usa', 'uk', 'hill', 'drive', "..."]


c = Counter(L1) 


substs = "#*+~-:;=)(/&%$§!"
for i, (word, count) in enumerate(c.most_common()):
    # take the i-th character (wrapping around at the end of substs)
    # and repeat it count times
    temp = substs[i % len(substs)] * count
    # rebuild both lists in place, wrapping every occurrence of word
    L1[:] = [temp + x + temp if x == word else x for x in L1]
    L2[:] = [temp + x + temp if x == word else x for x in L2]

print(L1)
print(L2)

Output:

['~newnes~', '####imprint####', '++elsevier++', '-corporate-', ':drive:', ';suite;', 
 '=burlington=', ')usa)', '####imprint####', '(linacre(', '/jordan/', '&hill&', 
 '%oxford%', '***uk***', '***uk***', '++elsevier++', '$inc$', '§right§', '!reserved!', 
 '####imprint####', '#exception#', '####imprint####', '*newness*', '***uk***', 
 '+...+']

[')usa)', '***uk***', '&hill&', ':drive:', '+...+']
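The loop above rebuilds both lists once per distinct word. As a possible alternative (a sketch, not part of the original answer), the replacements can be collected in a dictionary first and applied in a single pass per list. The shortened lists and the names `mapping` and `marker` are illustrative assumptions:

```python
from collections import Counter

# shortened, hypothetical sample data
L1 = ['newnes', 'imprint', 'elsevier', 'uk', 'imprint',
      'uk', 'elsevier', 'uk', 'imprint', 'imprint']
L2 = ['uk', 'drive']

substs = "#*+~-:;=)(/&%$§!"

# build the word -> decorated word table once, from the counts in L1
mapping = {}
for i, (word, count) in enumerate(Counter(L1).most_common()):
    marker = substs[i % len(substs)] * count
    mapping[word] = marker + word + marker

# one pass per list; words without an entry are kept unchanged
L1[:] = [mapping.get(x, x) for x in L1]
L2[:] = [mapping.get(x, x) for x in L2]

print(L1)
print(L2)  # ['***uk***', 'drive']
```

For the words that occur in L1 this gives the same decoration as the loop, but each list is traversed only once, which may matter for lists with around 100,000 entries.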

Comments:

The code above only transforms the single most frequent word in L1. How do I apply the same to all words in the list, regardless of frequency? I mean, how do I transform both elsevier and uk in L1 in one step?

Not in one step, but you can iterate over the Counter results and apply any change you like to both lists. For every result in the Counter I take the i-th character and multiply it by the count before prefixing and suffixing it. This is not a single step: it modifies the lists step by step, replacing one word and then the next. Put print(L1) inside the loop to see how the data changes.

Hello Patrick. How can this be done efficiently? Suppose I have a large amount of data. Injecting special characters, as in word to #word#, takes more time.

@M#S If this is a different question, you might want to ask a new question. If you tried to solve it with this answer, please elaborate on how or what does not work the way you think it needs to.
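For the efficiency concern, and for the transformation originally asked about (wrapping every word of L1 that also appears in L2 in `$`), one possible approach is a set lookup inside a single list comprehension. This is a minimal sketch under assumptions: the helper name `mark_common`, its `marker` parameter and the short sample lists are hypothetical and do not come from the thread.

```python
def mark_common(words, common, marker="$"):
    """Return a new list in which every word found in `common`
    is wrapped in `marker`; all other words are left unchanged."""
    lookup = set(common)  # set membership tests take O(1) on average
    return [marker + w + marker if w in lookup else w for w in words]


# hypothetical sample data
L1 = ['newnes', 'imprint', 'uk', 'elsevier', 'newness', 'uk']
L2 = ['usa', 'uk', 'newness']

result = mark_common(L1, L2)
print(result)  # ['newnes', 'imprint', '$uk$', 'elsevier', '$newness$', '$uk$']
```

Converting L2 to a set first means each of the roughly 100,000 words in L1 is checked in constant time on average, instead of scanning L2 for every word.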