Python: How can I determine whether a string is an English word?

Tags: python, regex, string, split

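I have an input string, part of which does not contain actual words (for example, it contains mathematical formulas such as x^2 = y_2 + 4). I would like a way to split my input string according to whether each substring consists of actual English words. For example:

If my string is:

"Taking the derivative of: f(x) = \int_{0}^{1} z^3, we can see that we always get x^2 = y_2 + 4 which is the same as taking the double integral of g(x)"

then I would like to split it into a list like this: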

["Taking the derivative of: ", "f(x) = \int_{0}^{1} z^3, ", "we can see that we always get ", "x^2 = y_2 + 4 ", "which is the same as taking the double integral of ", "g(x)"]

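How can I accomplish this? I don't think a regular expression will work here, or at least I am not aware of any regex technique that detects the longest substrings of English words (including commas, periods, semicolons, and so on).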

You can simply use the pyenchant library, as described in another post. Its Dict.check() method returns True when the string you pass it is a word in the chosen dictionary.

You can install it by running
pip install pyenchant
on the command line. In your case, you need to loop over the space-separated tokens of the string and check whether each one is an English word. Here is the complete code:

import enchant

# Load the US English dictionary
d = enchant.Dict("en_US")

# Raw string, so that "\int" is not read as an escape sequence
text = r"Taking the derivative of: f(x) = \int_{0}^{1} z^3, we can see that we always get x^2 = y_2 + 4 which is the same as taking the double integral of g(x)"

# Split on whitespace and keep only the tokens the dictionary recognizes
tokens = text.split()
wordlst = [token for token in tokens if d.check(token)]

print(wordlst)
Output:

['Taking', 'the', 'derivative', 'we', 'can', 'see', 'that', 'we', 'always', 'get', '4', 'which', 'is', 'the', 'same', 'as', 'taking', 'the', 'double', 'integral', 'of']

Hope this helps.
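
The answer's code keeps only the tokens that pass the dictionary check, whereas the question asks for a list of alternating runs of English text and formula text. A minimal sketch of that grouping step follows, using itertools.groupby. The names looks_like_word, split_by_english and DEMO_WORDS are hypothetical and not part of pyenchant; DEMO_WORDS is a small stand-in vocabulary so the sketch runs without a dictionary installed. With pyenchant available, passing enchant.Dict("en_US").check as the lookup should work the same way. Stripping punctuation and requiring alphabetic tokens is an assumption made here so that "of:" counts as a word and "4" does not.

```python
from itertools import groupby
from string import punctuation

# Hypothetical stand-in vocabulary so the sketch runs without pyenchant.
# With pyenchant installed, pass enchant.Dict("en_US").check as `lookup` instead.
DEMO_WORDS = {
    "taking", "the", "derivative", "of", "we", "can", "see", "that",
    "always", "get", "which", "is", "same", "as", "double", "integral",
}


def looks_like_word(token, lookup):
    """True if the token, minus surrounding punctuation, is alphabetic and known to `lookup`."""
    core = token.strip(punctuation)
    # isalpha() is False for the empty string, so `lookup` never sees "".
    return core.isalpha() and lookup(core)


def split_by_english(text, lookup):
    """Group consecutive whitespace-separated tokens into word runs and non-word runs."""
    tokens = text.split()
    return [
        " ".join(run)
        for _, run in groupby(tokens, key=lambda t: looks_like_word(t, lookup))
    ]


text = r"Taking the derivative of: f(x) = \int_{0}^{1} z^3, we can see that we always get x^2 = y_2 + 4 which is the same as taking the double integral of g(x)"
parts = split_by_english(text, lambda w: w.lower() in DEMO_WORDS)
print(parts)
```

With the demo vocabulary this yields six segments that match the list in the question, apart from trailing spaces. One limitation to keep in mind: a real dictionary may accept single letters such as "a" or "I", so a lone variable name in a formula could be classified as a word.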


Please check whether my answer meets your requirements.

Thanks! This is exactly what I needed.