Python: how to check whether a string is Thai, returning a boolean like isalpha()


python regex


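I'm trying to check whether a str consists only of Thai characters, using regex or any other method that solves the problem.

I wanted to use

regexp_thai = re.compile(u"[^\u0E00-\u0E7F']|^'|'$|''")
ret = regexp_thai.sub("", s)

to strip out other languages and digits. The catch is that this only removes characters; it does not return a boolean.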
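The negated class from the question can already answer the yes/no question: if search() finds no character outside the Thai block, the string is all Thai. A minimal sketch, assuming the apostrophe rules of the original pattern can be dropped and that an empty string should give False, as "".isalpha() does. NON_THAI and is_thai_by_search are hypothetical names.

```python
import re

# Matches ONE character that lies outside the Thai block U+0E00-U+0E7F.
# (Assumption: the apostrophe alternatives from the question are omitted.)
NON_THAI = re.compile("[^\u0E00-\u0E7F]")


def is_thai_by_search(s: str) -> bool:
    """Return True if s is non-empty and contains no non-Thai character."""
    # Mirror str.isalpha(): the empty string is not considered Thai.
    return bool(s) and NON_THAI.search(s) is None


# The stripping step from the question still works with the same pattern.
s = "engภาษาไทยที่มีสระ123!@"
ret = NON_THAI.sub("", s)
print(ret)                     # ภาษาไทยที่มีสระ
print(is_thai_by_search(s))    # False
print(is_thai_by_search(ret))  # True
```

Searching for the first offending character stops early, so this avoids building a stripped copy just to compare it.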

I expect output like this:

s = "engภาษาไทยที่มีสระ123!@"
regexp_thai = re.compile(u"[^\u0E00-\u0E7F']|^'|'$|''") 
ret = regexp_thai.sub("", s)
print(ret)             # ภาษาไทยที่มีสระ
print(isthai(ret))     # True

\u0E00-\u0E7F (U+0E00 to U+0E7F) is the Unicode block for Thai.

How do I write the isthai function?
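Because the Thai block is one contiguous code-point range, an isalpha()-style check does not strictly need a regex. A sketch, assuming "Thai" simply means "inside U+0E00 to U+0E7F"; note that this block also contains Thai digits (U+0E50 to U+0E59), the baht sign ฿ (U+0E3F) and some unassigned code points, so those pass as well. is_thai_codepoints is a hypothetical name.

```python
THAI_FIRST = 0x0E00  # first code point of the Unicode "Thai" block
THAI_LAST = 0x0E7F   # last code point of the Unicode "Thai" block


def is_thai_codepoints(s: str) -> bool:
    """isalpha()-style check: False for "", otherwise True only if every
    character's code point lies in U+0E00-U+0E7F."""
    return len(s) > 0 and all(THAI_FIRST <= ord(ch) <= THAI_LAST for ch in s)


print(is_thai_codepoints("ภาษาไทยที่มีสระ"))  # True
print(is_thai_codepoints("ภาษา123"))          # False
print(is_thai_codepoints("๑๒๓"))              # True (Thai digits are in the block)
```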

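I'm not quite sure what the desired output is. My guess is that we want to capture the Thai letters: based on your original expression, we could simply add a character class, wrap it in a capturing group, and then sweep left to right over the Thai letters, perhaps with something like: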

([\u0E00-\u0E7F]+)
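In Python, the same character class can pull the Thai runs out of a mixed string with re.findall. A short sketch; with exactly one capturing group, findall returns the group's text, so the parentheses are optional here. THAI_RUN is a hypothetical name.

```python
import re

# One or more consecutive characters from the Thai block, as a capturing group.
THAI_RUN = re.compile("([\u0E00-\u0E7F]+)")

print(THAI_RUN.findall("engภาษาไทยที่มีสระ123!@"))  # ['ภาษาไทยที่มีสระ']
print(THAI_RUN.findall("abc ไทย 123 ภาษา"))        # ['ไทย', 'ภาษา']
```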

Test demo

const regex = /([\u0E00-\u0E7F]+)/gmu;
const str = `engภาษาไทยที่มีสระ123!@`;
let m;
while ((m = regex.exec(str)) !== null) {
    // This is necessary to avoid infinite loops with zero-width matches
    if (m.index === regex.lastIndex) {
        regex.lastIndex++;
    }
    // The result can be accessed through the `m`-variable.
    m.forEach((match, groupIndex) => {
        console.log(`Found match, group ${groupIndex}: ${match}`);
    });
}

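Basically,

bool(re.match("^[\u0E00-\u0E7F]*$", test))

should evaluate to True iff test consists only of Thai characters. Some fine-tuning for punctuation etc. is necessary.

Wow, this is exactly what I wanted, thank you @MichaelButscher, it's solved!!

This gave me an idea, but it isn't quite what I wanted. Sorry.

This is a great answer. You have working code. You can see how the regex works. You have a link to a site that lets you tweak the regex until it does exactly what you want. If this were the Olympics and I were a judge, I'd give your answer a perfect 10!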
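Two details of the re.match("^[\u0E00-\u0E7F]*$", test) expression are worth knowing: the * makes it accept the empty string (whereas "".isalpha() is False), and $ also matches just before a trailing newline, so "ไทย\n" passes. A sketch of an isthai built on re.fullmatch with + instead; the extra parameter is a hypothetical knob for the punctuation fine-tuning mentioned above, not part of the original comment.

```python
import re


def isthai(test: str, extra: str = "") -> bool:
    """Return True iff test is non-empty and made only of Thai-block
    characters (U+0E00-U+0E7F) plus any characters listed in `extra`.

    `extra` is hypothetical, e.g. extra=" '" to also allow spaces and
    apostrophes.
    """
    # re.escape keeps characters such as ']', '-' or '^' literal inside the class.
    pattern = "[\u0E00-\u0E7F" + re.escape(extra) + "]+"
    # '+' rejects the empty string; fullmatch must consume the whole string,
    # so a trailing '\n' (which '$' tolerates) is rejected too.
    return re.fullmatch(pattern, test) is not None


print(isthai("ภาษาไทยที่มีสระ"))          # True
print(isthai("engภาษาไทยที่มีสระ123!@"))  # False
print(isthai(""))                         # False
print(isthai("ไทย\n"))                    # False
print(isthai("ภาษา ไทย", extra=" "))      # True
```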

# coding=utf8
# the above tag defines encoding for this document and is for Python 2.x compatibility

import re

regex = r"([\u0E00-\u0E7F]+)"

test_str = "engภาษาไทยที่มีสระ123!@"

matches = re.finditer(regex, test_str, re.MULTILINE | re.UNICODE)

for matchNum, match in enumerate(matches, start=1):

    print ("Match {matchNum} was found at {start}-{end}: {match}".format(matchNum = matchNum, start = match.start(), end = match.end(), match = match.group()))

    for groupNum in range(0, len(match.groups())):
        groupNum = groupNum + 1

        print ("Group {groupNum} found at {start}-{end}: {group}".format(groupNum = groupNum, start = match.start(groupNum), end = match.end(groupNum), group = match.group(groupNum)))

# Note: for Python 2.7 compatibility, use ur"" to prefix the regex and u"" to prefix the test string and substitution.