Python: find every possible word in a text


I have a text like this:

text = "Conscious of its spiritual and moral heritage, the Union is founded on the indivisible, universal values of human dignity, freedom, equality and solidarity; it is based on the principles of democracy and the rule of law."

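I want to build a list from this text that contains every word in it, with no duplicates. I was thinking of using Pandas, but I don't know how to use a DataFrame for this.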

You can add the words to a set so that there are no duplicates, and strip the commas if you don't want them:

words = set()
for word in text.split(" "):
    words.add(word.replace(',',''))
if ',' in words:
    words.remove(',')
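The loop above only removes commas, so tokens such as "solidarity;" and "law." keep their punctuation. Here is a minimal sketch of one way to handle that, assuming you want to strip any leading or trailing punctuation and treat "The" and "the" as the same word. Neither assumption comes from the question, and the helper name unique_words is made up for illustration.

```python
import string

def unique_words(text):
    """Return the set of distinct words, lowercased, without surrounding punctuation."""
    words = set()
    # split() with no argument splits on any run of whitespace
    for token in text.split():
        # strip() only removes punctuation at the ends, so "rule-of-law" stays intact
        word = token.strip(string.punctuation).lower()
        if word:  # skip tokens that consisted only of punctuation
            words.add(word)
    return words

sample = "freedom, equality and solidarity; the rule of law. Freedom!"
print(sorted(unique_words(sample)))
```

A set does not keep the original word order, so sort the result or use an order-preserving approach if the order matters.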

You can strip the ',' while adding each word to the list. You can also use OrderedDict from the collections module to remove the duplicates:

text = "Conscious of its spiritual and moral heritage, the Union is founded on the indivisible, universal values of human dignity, freedom, equality and solidarity; it is based on the principles of democracy and the rule of law. It places the individual at the heart of its activities, by establishing the citizenship of the Union and by creating an area of freedom, security and justice."
from collections import OrderedDict

words = []
for word in text.split(" "):
    words.append(word.strip(","))  # remove ',' from the word
list1 = list(OrderedDict.fromkeys(words))  # remove duplicates, keeping first-seen order
print(list1)
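Since Python 3.7 a plain dict also preserves insertion order, so the same trick should work without importing OrderedDict. A small sketch, with ordered_unique as a hypothetical helper name:

```python
def ordered_unique(words):
    """Drop duplicates while keeping the first occurrence of each word in order."""
    # dict keys are unique and keep insertion order (guaranteed since Python 3.7)
    return list(dict.fromkeys(words))

sample = "the rule of law and the rule of the Union"
print(ordered_unique(sample.split()))
```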

This is not the most efficient approach, but you can use a list:

text = "Conscious of its spiritual and moral heritage, the Union is founded on the indivisible, universal values of human dignity, freedom, equality and solidarity; it is based on the principles of democracy and the rule of law. It places the individual at the heart of its activities, by establishing the citizenship of the Union and by creating an area of freedom, security and justice."

def get_unique_words(text):
    # converts all alphabetical characters to lower
    lower_text = text.lower()
    # splits string on space character 
    split_text = lower_text.split(' ')

    # empty list to populate unique words
    results_list = []
    # iterate over the list
    for word in split_text:
        # check to see if value is already in results lists
        if word not in results_list:
            # append the word if it is unique
            results_list.append(word)
    return results_list

results = get_unique_words(text)

print(results)

This prints:

['conscious', 'of', 'its', 'spiritual', 'and', 'moral', 'heritage,', 'the', 'union', 'is', 'founded', 'on', 'indivisible,', 'universal', 'values', 'human', 'dignity,', 'freedom,', 'equality', 'solidarity;', 'it', 'based', 'principles', 'democracy', 'rule', 'law.', 'places', 'individual', 'at', 'heart', 'activities,', 'by', 'establishing', 'citizenship', 'creating', 'an', 'area', 'security', 'justice.']
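The check `word not in results_list` scans the whole list each time, so the function becomes slow for long texts. One possible variation, sketched here under the assumption that first-seen order should be kept, tracks seen words in a set for constant-time lookups. The name get_unique_words_fast is made up for this example, and it uses split() rather than split(' '), so runs of spaces or newlines do not produce empty strings.

```python
def get_unique_words_fast(text):
    """Return the unique lowercased tokens of text in first-seen order."""
    seen = set()      # fast membership test
    results = []      # keeps the order in which words first appear
    for word in text.lower().split():
        if word not in seen:
            seen.add(word)
            results.append(word)
    return results

print(get_unique_words_fast("The rule of law and the rule of the Union"))
```

Punctuation is left attached to the words here, just as in the list-based version.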

This removes the commas, but it is a bit unreadable:

list(set(''.join(text.split(",")).split(" ")))
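If readability is the concern, a regular expression may be easier to follow than the chained split and join calls. The sketch below assumes plain ASCII English text and treats anything that is not a letter or apostrophe as a separator. The function name words_via_regex is hypothetical.

```python
import re

def words_via_regex(text):
    """Return the set of distinct words found in text."""
    # [A-Za-z']+ matches runs of ASCII letters and apostrophes;
    # commas, semicolons, periods and spaces all act as separators
    return set(re.findall(r"[A-Za-z']+", text))

sample = "dignity, freedom, equality and solidarity; freedom"
print(sorted(words_via_regex(sample)))
```

Unlike the one-liner above, this also drops the ';' and '.' characters, which may or may not be what you want.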

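What is your question? This is not a discussion forum or a tutorial. Please take the time to read the linked page and the other links on it.

Why do you want to work with Pandas? You don't need it in this case.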

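words = []
for word in text.replace(',', '').split(" "):
    words.append(word)

print(words)

Did you test it? Are there still commas in the set?

Corrected now!

Thanks, I will accept in 2 minutes :)

Please post it in code format.

I added text.split(" ").strip(",") to get rid of the commas, thanks.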
