Python: how do I check a list of tokenized sentences for specific words and then tag them as 1 or 0?
I am trying to map specific words from a list against another list of tokenized sentences. If a word is found in a sentence, I append 1 to the list of that word's category and 0 to the lists of the remaining categories. For example:
category_a=["stain","sweat","wet","burn"]
category_b=["love","bad","favorite"]
category_c=["packaging","delivery"]
tokenized_sentences=['this deodorant does not stain my clothes','i love this product','i sweat all day']
for i in category_a:
    for j in tokenized_sentences:
        if(i in nltk.word_tokenize(j)):
            list_a.append(j)
            tag_a,tag_b,tag_c=([],)*3
            tag_a.append(1)
            tag_b.append(0)
            tag_c.append(0)
            final=tag_a+tag_b+tag_c
I do the same for categories b and c.
Expected output:
this deodorant does not stain my clothes-->[1,0,0]
i love this product-->[0,1,0]
i sweat all day-->[1,0,0]
great fragrance-->[0,0,0]
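I get repeated output for every sentence, for example:
i love this product-->[1,0,0]
i love this product-->[1,0,0]
and also output like this:
[i love this product, i sweat all day]-->[0,1,0]
Please help me fix this and get the output in the required format.

Your order of comparisons is off. I cannot reproduce this result with what you posted, and your code never checks the right things. This is how it works: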
import nltk  # word_tokenize requires the NLTK tokenizer data to be downloaded
category_a=["stain","sweat","wet","burn"]
category_b=["love","bad","favorite"]
category_c=["packaging","delivery"]
tokenized_sentences=['this deodorant does not stain my clothes',
                     'i love this product','i sweat all day']
results = []  # collects one (sentence, tags) tuple per sentence
for j in tokenized_sentences:
    r = []  # tags for the current sentence, one entry per category
    for c in [category_a,category_b,category_c]:
        print(nltk.word_tokenize(j), c) # just a debug print of what is compared here
        if any( w in c for w in nltk.word_tokenize(j)):
            r.append(1)
        else:
            r.append(0)
    print(r) # print the result for this sentence
    results.append((j, r))
print(results)
Output:
['this', 'deodorant', 'does', 'not', 'stain', 'my', 'clothes'] ['stain', 'sweat', 'wet', 'burn']
['this', 'deodorant', 'does', 'not', 'stain', 'my', 'clothes'] ['love', 'bad', 'favorite']
['this', 'deodorant', 'does', 'not', 'stain', 'my', 'clothes'] ['packaging', 'delivery']
[1, 0, 0]
['i', 'love', 'this', 'product'] ['stain', 'sweat', 'wet', 'burn']
['i', 'love', 'this', 'product'] ['love', 'bad', 'favorite']
['i', 'love', 'this', 'product'] ['packaging', 'delivery']
[0, 1, 0]
['i', 'sweat', 'all', 'day'] ['stain', 'sweat', 'wet', 'burn']
['i', 'sweat', 'all', 'day'] ['love', 'bad', 'favorite']
['i', 'sweat', 'all', 'day'] ['packaging', 'delivery']
[1, 0, 0]
[('this deodorant does not stain my clothes', [1, 0, 0]), ('i love this product', [0, 1, 0]), ('i sweat all day', [1, 0, 0])]
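A separate pitfall in the question's code may also contribute to the confusing results: `tag_a,tag_b,tag_c=([],)*3` does not create three lists. It binds all three names to the same list object, so an append through one name shows up under the others. A minimal sketch of the effect (the names `shared_a`, `shared_b` and `shared_c` are chosen here for illustration only):

```python
# Multiplying a tuple that holds one list repeats the reference, not the list.
shared_a, shared_b, shared_c = ([],) * 3
shared_a.append(1)
shared_b.append(0)
shared_c.append(0)

print(shared_a)              # [1, 0, 0]
print(shared_a is shared_b)  # True: one list object behind all three names

# Independent lists need separate list literals.
tag_a, tag_b, tag_c = [], [], []
tag_a.append(1)
print(tag_a, tag_b, tag_c)   # [1] [] []
```

With independent lists, each category keeps its own tags.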

Your post needs clarification, but as far as I understand it, this should do the job:
category_a = ["stain", "sweat", "wet", "burn"]
category_b = ["love", "bad", "favorite"]
category_c = ["packaging", "delivery"]
sentences = ['this deodorant does not stain my clothes', 'i love this product', 'i sweat all day']
results = []
for sentence in sentences:
    cat_a = 0
    cat_b = 0
    cat_c = 0
    for word in sentence.split():
        # only test a category while it has not matched yet, so a hit is never reset to 0
        if cat_a == 0:
            cat_a = 1 if word in category_a else 0
        if cat_b == 0:
            cat_b = 1 if word in category_b else 0
        if cat_c == 0:
            cat_c = 1 if word in category_c else 0
    results.append((sentence, [cat_a, cat_b, cat_c]))
print(results)
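This code checks whether each sentence contains a word from each of the given categories and stores the outcome as a tuple (the sentence and its result). All tuples are appended to a list named results.
Output: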
[('this deodorant does not stain my clothes', [1, 0, 0]), ('i love this product', [0, 1, 0]), ('i sweat all day', [1, 0, 0])]
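The same per-sentence check can also be written more compactly with sets. This is only a sketch under a few assumptions: the helper name `tag_sentence` is made up for illustration, sentences are split on whitespace as in the code above, and the categories are stored as sets so that a match is simply a non-empty intersection. The sentence 'great fragrance' from the question's expected output is included to show the all-zero case.

```python
category_a = {"stain", "sweat", "wet", "burn"}
category_b = {"love", "bad", "favorite"}
category_c = {"packaging", "delivery"}
categories = [category_a, category_b, category_c]


def tag_sentence(sentence, categories):
    """Return one 0/1 flag per category for a single sentence."""
    words = set(sentence.split())  # whitespace split, no NLTK required
    # A category matches if it shares at least one word with the sentence.
    return [1 if words & category else 0 for category in categories]


sentences = [
    "this deodorant does not stain my clothes",
    "i love this product",
    "i sweat all day",
    "great fragrance",
]

results = [(s, tag_sentence(s, categories)) for s in sentences]
for sentence, tags in results:
    print(f"{sentence}-->{tags}")
```

Because each sentence is processed exactly once, every sentence appears once in the results, which avoids the repeated rows described in the question.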
I see you are just starting your programming adventure. To learn how to state your question more clearly, have a look at the following: