Python: How do I extract only the English words from a list?

python regex string nlp


I am trying to extract only the English words from the following list:

l = ['0', 'b', 'x14', 'x00', 'x1fP', 'xe0O', 'xd0', 'xea', 'i', 'x10', 'xa2', 'xd8', 'x08', 'x00', '00', 'x9d', 'x14', 'x00', 'x80', 'xcc', 'xbf', 'xb4', 'xdbLB', 'xb0', 'x7f', 'xe9', 'x9a', 'x87', 'xc6AZ', 'x005', 'x00', 'x00', 'x00', 'x00', 'x00yR', 'G', 'x10', 'x00', 'xdc', 'x05', 'xde', 'x05', 'xe2', 'x05', 'xe8', 'x05', 'xdb', 'x05', 'xea', 'x05', 'x00', 'x00', 'x00', 't', 'x00', 'x04', 'x00', 'xef', 'xbeyRnDyR', 'G', 'x00', 'x00', 'x00', 'xe5E', 'x00', 'x00', 'x00', 'x00', 'xfb', 'x05', 'x00', 'x00', 'x00', 'x00', 'x00', 'x00', 'x00', 'x00', 'x00', 'x00', 'x00', 'x00', 'x00', 'x00', 'xe2', 'x0e', 'x00', 'xdc', 'x05', 'xde', 'x05', 'xe2', 'x05', 'xe8', 'x05', 'xdb', 'x05', 'xea', 'x05', 'x00', 'x00', 'x1c', 'x00', 'x80', 'x001', 'x00', 'x00', 'x00', 'x00', 'x00yR', 'G', 'x10', 'x00VBS', '', '', 'x00', 'x00', 't', 'x00', 'x04', 'x00', 'xef', 'xbeyR', 'GyR', 'G', 'x00', 'x00', 'x00', 'x9e', 'xa5', 'x00', 'x00', 'x00', 'x00K', 'x02', 'x00', 'x00', 'x00', 'x00', 'x00', 'x00', 'x00', 'x00', 'x00', 'x00', 'x00', 'x00', 'x00', 'x00', 'xe2', 'x0e', 'x00V', 'x00B', 'x00S', 'x00', 'x00R', 'x00a', 'x00n', 'x00s', 'x00o', 'x00m', 'x00w', 'x00a', 'x00r', 'x00e', 'x00', 'x00', 'x00', 'x00d', 'x00o', 'x00n', 'x00e', 'x00', 'x00', 'x00', 'x00', 'x80', 'x001', 'x00', 'x00', 'x00', 'x00', 'x00yRmG', 'x10', 'x00VBS', '', '', 'x00', 'x00', 't', 'x00', 'x04', 'x00', 'xef', 'xbeyR', 'GyRmG', 'x00', 'x00', 'x00', 'xb6', 'xba', 'x00', 'x00', 'x00', 'x00', 'xa4', 'x01', 'x00', 'x00', 'x00', 'x00', 'x00', 'x00', 'x00', 'x00', 'x00', 'x00', 'x00', 'x00', 'x00', 'x00', 'x98w', 'x00V', 'x00B', 'x00S', 'x00', 'x00R', 'x00a', 'x00n', 'x00s', 'x00o', 'x00m', 'x00w', 'x00a', 'x00r', 'x00e', 'x00', 'x00', 'x00', 'x00d', 'x00o', 'x00n', 'x00e', 'x00', 'x00', 'x00', 'x00', 'xa4', 'x002', 'x00c', 'xf1', 'x02', 'x00oRjX', 'Test', 'For', 'SO', 'PDF', 'pdf', 'x00t', 'x00', 't', 'x00', 'x04', 'x00', 'xef', 'xbeyR', 'GyR', 'G', 'x00', 'x00', 'x00', 'xcf', 'xbc', 'x00', 'x00', 'x00', 'x00z', 'x04', 
'x00', 'x00', 'x00', 'x00', 'x00', 'x00', 'x00', 'x00', 'x00', 'x00', 'x00', 'x00', 'x00', 'x00', 'xd23', 'x98', 'x00D', 'x00e', 'x00f', 'x00e', 'x00n', 'x00s', 'x00e', 'x00', 'x00R', 'x00u', 'x00l', 'x00e', 'x00', 'x00', 'x00', 'x00V', 'x00B', 'x00S', 'x00', 'x00R', 'x00a', 'x00n', 'x00s', 'x00o', 'x00m', 'x00w', 'x00a', 'x00r', 'x00e', 'x00', 'x00p', 'x00d', 'x00f', 'x00', 'x00', 'x000', 'x00', 'x00', 'x00', '3']

From this list, the words I need are

["Test", "For", "SO", "PDF"]

I tried the following:

import re

# key and num_of_values are defined elsewhere in the asker's code (not shown)
for i in range(num_of_values):
    values = EnumValue(key, i)
    res = re.findall(r'\w+', str(values))
    print(res)

Has anyone managed to extract the words?

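If you know what you are searching for, just search for it directly:

# 'a' is your data list

search = ["Test", "For", "SO", "PDF", "pdf"]

for s in search:
    # a.index(s) raises ValueError if s is not in a
    print(a.index(s))

This prints the index in the list of each word searched for.

However, if you want to find every English word, you need a dictionary of English words and must search for each of them: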

# This finds all occurrences of every word in the list 'a'

# 'search' is your list of words to look for

for s in search:
    indices = [i for i, x in enumerate(a) if x == s]
    print(s, indices)

Output:

Test [253]
For [254]
SO [255]
PDF [256]
pdf [257]
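
When the list of words to look for grows, scanning the data once and grouping positions by word avoids one full pass per word. A minimal sketch, assuming a hypothetical helper name `index_occurrences` and a short made-up sample rather than the full list from the question:

```python
from collections import defaultdict


def index_occurrences(items, words):
    """Map each word in `words` to the positions where it occurs in `items`.

    `index_occurrences` is an illustrative name, not part of any library.
    """
    wanted = set(words)              # set membership is O(1) on average
    positions = defaultdict(list)
    for i, item in enumerate(items):
        if item in wanted:
            positions[item].append(i)
    # Words that never occur are reported with an empty list.
    return {word: positions.get(word, []) for word in words}


sample = ['x00', 'Test', 'For', 'x00', 'SO', 'PDF', 'pdf', 'Test']
print(index_occurrences(sample, ['Test', 'For', 'SO', 'PDF', 'pdf', 'Missing']))
```

Unlike `list.index`, this reports every occurrence and does not raise when a word is missing.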

It seems you know in advance what to extract from the list, so here are a few ideas:

# list_1 is the input list from the question

# Example 1: search with a loop and build a new list
list_2 = []
for element in list_1:
    if 'pdf' in element:  # substring test on each string
        list_2.append(element)
        print('the element is in the list and was added to list_2')

# Example 2: if you know in advance what to extract, check each known word
list_0 = ['Test', 'For', 'SO', 'PDF', 'pdf']
for element in list_0:
    if element in list_1:
        print(element)

# Check whether something is inside the list
for element in list_1:
    if 'Test' in element:
        print('The element is in the list')

# Return the position of the element in the list
index = list_1.index('Test')
print(index)
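
Two details in these snippets are easy to trip over: `'pdf' in element` is a substring test on each string, whereas `element in list_1` is an exact-match test against the list, and `list.index` raises `ValueError` when the value is absent. A small sketch of the difference, using a made-up sample list and a hypothetical helper `find_index`:

```python
def find_index(items, value):
    """Return the first index of `value` in `items`, or None if it is absent.

    `find_index` is an illustrative helper, not a built-in.
    """
    try:
        return items.index(value)
    except ValueError:
        return None


sample = ['x00', 'x00pdf', 'Test', 'pdf']

# Substring test: also matches 'x00pdf'.
contains_pdf = [s for s in sample if 'pdf' in s]
# Exact match: only the element equal to 'pdf'.
equals_pdf = [s for s in sample if s == 'pdf']

print(contains_pdf)                 # ['x00pdf', 'pdf']
print(equals_pdf)                   # ['pdf']
print(find_index(sample, 'Test'))   # 2
print(find_index(sample, 'Nope'))   # None
```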

Let me know if this works for you.

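You can use pyenchant for this to some extent; it lets you check whether a word is valid in a given language. Before checking validity against the language, you need to check that

the word is not empty and is longer than one character
the word consists only of letters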

So, in Python, you first need to install the pyenchant library (pip install pyenchant in a terminal/console), and then

import enchant

# l is the list from the question
d = enchant.Dict("en_US")
output = [el for el in l if len(el) > 1 and el.isalpha() and d.check(el)]
print(output)
# => ['Test', 'For', 'SO', 'PDF']
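
The two structural checks listed above (more than one character, letters only) can be tried on their own before any dictionary is involved. The sketch below uses only the standard library and a few tokens taken from the question's list; `VOCABULARY` is a tiny hypothetical stand-in for a real dictionary such as the one pyenchant provides, so its results are illustrative only.

```python
# Hypothetical stand-in for a real English dictionary (assumption for this sketch).
VOCABULARY = {"test", "for", "so", "pdf"}


def looks_like_word(token):
    """Structural pre-filter: longer than one character and letters only."""
    return len(token) > 1 and token.isalpha()


def is_known_word(token):
    """Pre-filter plus a case-insensitive lookup in the stand-in vocabulary."""
    return looks_like_word(token) and token.lower() in VOCABULARY


tokens = ['0', 'b', 'x14', 'xea', 'xdbLB', '', 'Test', 'For', 'SO', 'PDF']

candidates = [t for t in tokens if looks_like_word(t)]
words = [t for t in tokens if is_known_word(t)]

print(candidates)  # ['xea', 'xdbLB', 'Test', 'For', 'SO', 'PDF']
print(words)       # ['Test', 'For', 'SO', 'PDF']
```

The pre-filter alone still lets through hex-like fragments such as 'xea' and 'xdbLB', which is why a dictionary lookup is needed as the final step.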

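How exactly do you plan to determine whether a word is an English word? Do you have a data source against which you can compare the strings and decide whether they are English words? Have you tried any libraries? PyEnchant, NLTK? Take a look at nltk.corpus and its word list; you can then test each word individually (whether it exists in the NLTK word list).

@Jan I dropped one-letter words because the expected output has none either. You can add any exceptions to the `len(el) > 1` rule. The point is to use the pyenchant library.

Could you describe what the variables 'el' and 'd' are?

@PyberGeek el is an element of the list (your input). d is d = enchant.Dict("en_US"); I forgot to add that line to the snippet.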