
Python: I need to convert doc string sentences into a list


The input is:

l1 = ['Passing much less urine', 'Bleeding from any body part', 'Feeling extremely lethargic/weak', 'Excessive sleepiness/restlessness', 'Altered mental status', 'Seizure/fits', 'Breathlessness', 'Blood in sputum', 'Chest pain', 'Sound/noise in breathing', 'Drooling of saliva', 'Difficulty in opening mouth']


k=[]
for n in range(0,len(l1)):
    e = l1[n]
    doc =nlp(e)
    for token in doc:
        if token.lemma_ != "-PRON-":
            temp = token.lemma_.lower().strip()
        else:
            temp = token.lower_
        k.append(temp)
    cleaned_tokens = []
    t = []
    d = []
    
    for token in k:
        li = []
        if token not in stopwords and token not in punct:
            cleaned_tokens.append(token)
            
        li= " ".join(cleaned_tokens)
    t.append(li)
    print(t)
This code gives the following output:

['pass urine']
['pass urine bleed body']
['pass urine bleed body feel extremely lethargic weak']
But the output I need is:

["pass urine", "bleed body", "feel extremely lethargic weak"]

Please suggest how I can get this result.

This will produce the result you want:

import spacy
nlp = spacy.load("en_core_web_md")

l1 = ['Passing much less urine', 'Bleeding from any body part', 'Feeling extremely lethargic/weak', 'Excessive sleepiness/restlessness', 'Altered mental status', 'Seizure/fits', 'Breathlessness', 'Blood in sputum', 'Chest pain', 'Sound/noise in breathing', 'Drooling of saliva', 'Difficulty in opening mouth']
docs = nlp.pipe(l1)

t= []
for doc in docs:
    clean_doc = " ".join([tok.text.lower() for tok in doc if not tok.is_stop and not tok.is_punct])
    t.append(clean_doc)           

print(t)

['passing urine', 'bleeding body', 'feeling extremely lethargic weak', 'excessive sleepiness restlessness', 'altered mental status', 'seizure fits', 'breathlessness', 'blood sputum', 'chest pain', 'sound noise breathing', 'drooling saliva', 'difficulty opening mouth']
If you need lemmas:

# nlp.pipe returns a generator, so recreate it once the first loop has consumed it
docs = nlp.pipe(l1)

t = []
for doc in docs:
    clean_doc = " ".join([tok.lemma_.lower() for tok in doc if not tok.is_stop and not tok.is_punct])
    t.append(clean_doc)

print(t)

['pass urine', 'bleed body', 'feel extremely lethargic weak', 'excessive sleepiness restlessness', 'alter mental status', 'seizure fit', 'breathlessness', 'blood sputum', 'chest pain', 'sound noise breathing', 'drool saliva', 'difficulty open mouth']
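
The reason the code in the question prints ever-growing strings is that the list `k` is created once, outside the loop, so tokens from every earlier sentence are still in it when the next sentence is joined. The sketch below shows the same per-sentence reset without spaCy, so it runs anywhere. The `STOPWORDS` and `PUNCT` sets and the `tokenize` helper are hypothetical stand-ins for spaCy's `is_stop`, `is_punct`, and tokenizer, chosen only to fit this sample; no lemmatization is done here.

```python
# Hypothetical stand-ins for spaCy's stop-word and punctuation checks.
STOPWORDS = {"much", "less", "from", "any", "part"}
PUNCT = {"/", ",", "."}


def tokenize(sentence):
    # Naive tokenizer (an assumption): lowercase, split "/" into its own token.
    return sentence.lower().replace("/", " / ").split()


def clean_sentences(sentences):
    result = []
    for sentence in sentences:
        # Build a fresh token list for each sentence so that nothing
        # carries over from the previous one.
        kept = [
            tok for tok in tokenize(sentence)
            if tok not in STOPWORDS and tok not in PUNCT
        ]
        # One cleaned string per input sentence.
        result.append(" ".join(kept))
    return result


sample = [
    "Passing much less urine",
    "Bleeding from any body part",
    "Feeling extremely lethargic/weak",
]
cleaned = clean_sentences(sample)
print(cleaned)
```

The result list is created once, before the loop, and printed once, after it; only the per-sentence token list is rebuilt on each iteration.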

I can't test your code, but I would remove the last 3 lines and print cleaned_tokens outside the first for loop.

Can you update your answer by applying lemmatization to the output?

@AkashBhandari Done, as you wished.