Python: How to determine whether a string is an English word?
I have an input string, parts of which do not contain actual words (for example, it contains math formulas such as x^2=y_2+4). I'd like a way to split my input string according to whether a substring consists of actual English words. For example:
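If my string is:
"Taking the derivative of: f(x) = \int_{0}^{1} z^3, we can see that we always get x^2 = y_2 + 4 which is the same as taking the double integral of g(x)"
then I want to split it into a list like this: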
["Taking the derivative of: ", "f(x) = \int_{0}^{1} z^3, ", "we can see that we always get ", "x^2 = y_2 + 4 ", "which is the same as taking the double integral of ", "g(x)"]
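How can I do this? I don't think regular expressions work for this, or at least I don't know of any way in regex to detect the longest substring of English words (including commas, periods, semicolons, etc.).

You can simply use the pyenchant library, as described in this post: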
Output:
True
['Taking', 'the', 'derivative', 'we', 'can', 'see', 'that', 'we', 'always', 'get', '4', 'which', 'is', 'the', 'same', 'as', 'taking', 'the', 'double', 'integral', 'of']
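You can install it by typing pip install pyenchant on your command line. In your case, you need to loop over every token in the string and check whether it is an English word. Here is the complete code for this: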
```python
import enchant

d = enchant.Dict("en_US")
# Raw string so the backslash in \int is kept literally.
text = r"Taking the derivative of: f(x) = \int_{0}^{1} z^3, we can see that we always get x^2 = y_2 + 4 which is the same as taking the double integral of g(x)"

wordlst = []
# Check each whitespace-separated token against the en_US dictionary.
for token in text.split():
    if d.check(token):
        wordlst.append(token)
print(wordlst)
```
Output:
True
['Taking', 'the', 'derivative', 'we', 'can', 'see', 'that', 'we', 'always', 'get', '4', 'which', 'is', 'the', 'same', 'as', 'taking', 'the', 'double', 'integral', 'of']
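The code above only keeps the tokens that pass the dictionary check, so it discards the formulas entirely, and it also loses "of:" because of the trailing colon. To get the grouped list the question asks for, one option is to walk the tokens and start a new segment whenever a token switches between "English word" and "not a word". The following is a minimal sketch under stated assumptions, not part of the original answer: split_by_englishness and VOCAB are hypothetical names, the tiny VOCAB set stands in for a real dictionary so the example runs without pyenchant, surrounding punctuation is stripped before the lookup, and punctuation-only tokens such as "=" are attached to whichever segment is currently open.

```python
import string
from typing import Callable, List, Optional


def split_by_englishness(text: str, is_word: Callable[[str], bool]) -> List[str]:
    """Hypothetical helper: split text into runs of consecutive tokens that are
    either all English words or all non-words (formulas, symbols, numbers)."""
    segments: List[str] = []
    run: List[str] = []
    run_is_english: Optional[bool] = None

    for token in text.split():
        # Drop surrounding punctuation so that "of:" is checked as "of".
        core = token.strip(string.punctuation)
        if not core:
            # Pure punctuation such as "=" or "+" joins whatever run is open.
            kind = run_is_english if run_is_english is not None else False
        else:
            # Only purely alphabetic cores are looked up, so "y_2" or "4"
            # never count as English even if the checker would accept them.
            kind = core.isalpha() and is_word(core)
        if run and kind != run_is_english:
            # The token switches category: close the current segment.
            segments.append(" ".join(run))
            run = []
        run.append(token)
        run_is_english = kind

    if run:
        segments.append(" ".join(run))
    return segments


# Hypothetical stand-in vocabulary so the sketch runs without pyenchant;
# with pyenchant you would pass enchant.Dict("en_US").check instead.
VOCAB = {"taking", "the", "derivative", "of", "we", "can", "see", "that",
         "always", "get", "which", "is", "same", "as", "double", "integral"}

text = r"Taking the derivative of: f(x) = \int_{0}^{1} z^3, we can see that we always get x^2 = y_2 + 4 which is the same as taking the double integral of g(x)"
for segment in split_by_englishness(text, lambda w: w.lower() in VOCAB):
    print(segment)
```

With the stand-in vocabulary this prints six segments, from "Taking the derivative of:" through "g(x)", matching the grouping in the question (without the trailing spaces). With a real dictionary the result depends on what it accepts: single letters such as "a" or "x" may be treated as words, so formulas containing a bare letter could be split differently.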
Hope this helps.
Please check whether my answer meets your requirements.
Thanks! This is exactly what I needed.