Python: How to extract words from a string without using the split function?


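How can I extract words from a string, where the words are separated by punctuation, whitespace, digits, and so on, without using any split, replace, or a library like re? I am still learning Python, and the book suggests finding a solution without using list and string methods.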

Example Input : The@Tt11end
Example Output: ["The", "Tt", "end"]
This is my attempt so far:

def extract_words(sentence):

    words_list = []
    separator = [",",".",";","'","?","/","<",">","@","!","#","$","%","^","&","*","(",")","-","_","1","2","3","4","5","6","7","8","9"]
    counter= 0
    for i in range(len(sentence)):
        i=counter
        while(is_letter(sentence[i])):
            words+= sentence[i]
            i = i+1
            counter=counter+1
        words_list.append(words)
        words=""
    return words_list
Edit: this is my is_letter() function:

def is_letter(char):
    return ("A" <= char and char <= "Z") or \
    ("a" <= char and char <= "z")

It would be better to use regular expressions there, but if you want something exotic, here it is:

text = "The@Tt11end444sooqa"  # named "text" to avoid shadowing the built-in str
delims = [0] + [i + 1 for i, s in enumerate(text) if not s.isalpha()] + [len(text) + 1]
parts = [text[delims[i]: delims[i + 1] - 1] for i in range(len(delims) - 1) if delims[i + 1] - delims[i] != 1]

Expanded version, to make it easier to see what is going on:

text = "The@Tt11end444sooqa"  # named "text" to avoid shadowing the built-in str

# delims holds, for each non-alphabetic character, the index just after it
delims = [0]  # 0 acts as the first delimiter (start of string)
for i, s in enumerate(text):  # iterate through "text"
    if not s.isalpha():  # if the character is non-alphabetic, record its position
        delims.append(i + 1)  # add 1 so the delimiter itself is not included in the part
delims += [len(text) + 1]  # end-of-string marker so the last part is not missed

# parts holds the pieces of the original string stored in "text"
parts = []
for i in range(len(delims) - 1):  # iterate over "delims" by index
    # skip the part if two delimiters are directly next to each other
    if delims[i + 1] - delims[i] != 1:
        substr = text[delims[i]: delims[i + 1] - 1]  # copy the substring between delimiters
        parts.append(substr)
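
For comparison, the regular-expression route mentioned at the start of this answer might look like the sketch below. The question rules out re, so this is only for reference. Note one assumed difference: the pattern [A-Za-z]+ matches ASCII letters only, whereas isalpha() also accepts non-ASCII letters.

```python
import re

text = "The@Tt11end444sooqa"

# [A-Za-z]+ matches each maximal run of ASCII letters;
# findall returns those runs in order of appearance
parts = re.findall(r"[A-Za-z]+", text)
print(parts)  # ['The', 'Tt', 'end', 'sooqa']
```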

This code works:

def extract_words(sentence):
    sentence = list(sentence)
    words_list = []
    separator = [",",".",";","'","?","/","<",">","@","!","#","$","%","^","&","*","(",")","-","_","1","2","3","4","5","6","7","8","9"]
    bufferS = []
    for i in range(len(sentence)):
        if sentence[i] not in separator:
            bufferS.append(sentence[i])
        else:
            words_list.append(''.join(bufferS))
            bufferS = []
    words_list.append(''.join(bufferS))
    words_list = [x for x in words_list if x != '']
    return words_list
It returns

['aaaaaaa', 'bbbbbbb', 'ccccc', 'dddd']

No libraries are used.

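Your problem is that you set i to counter on every iteration, and counter never increments past the first non-letter.

The for loop keeps going until range(len(sentence)) is exhausted, but on each pass i is reset to the position where is_letter first failed, in this case i=3.

For example, at the point where i should move on to 4, counter is still 3, because the while(is_letter(...)) block did not run and so did not increment it. A better fit here is an if/else, as follows: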

def extract_words(sentence):
    words_list = []
    words = ""
    for i in range(len(sentence)):
        if is_letter(sentence[i]):
            words += sentence[i]
        else:
            if words != "":
                words_list.append(words)
                words = ""
    if words != "":
        words_list.append(words)
    return words_list


def is_letter(char):
    return ("A" <= char and char <= "Z") or \
    ("a" <= char and char <= "z")

if __name__ == '__main__':
    print(extract_words("The@Tt11end"))
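In this setup the loop uses only i as the incrementing variable, since it is already a for loop, and changing the value of i outside the control of the for statement can cause problems, as you saw.

Next, whenever a character of the string is a letter, it is added to the words variable. Then, if the next character is a symbol, the word is appended to the list and the symbol or digit is ignored.

Finally, for the case where two or more symbols are adjacent (which is what gave you a list of empty strings ''), it checks whether words already contains any characters, and if it does not, it moves on to the next character.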
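
To see the behaviour described in this answer, here is a hedged sketch that reruns the question's loop under one assumption: words is initialised to "" before the loop. The posted snippet never does this, so as written it fails on any non-empty input before it even reaches the counter problem. The name extract_words_buggy is made up for illustration.

```python
def is_letter(char):
    return ("A" <= char <= "Z") or ("a" <= char <= "z")


def extract_words_buggy(sentence):
    words_list = []
    words = ""  # assumption: added so the loop can run at all
    counter = 0
    for i in range(len(sentence)):
        i = counter  # overrides the value the for loop just assigned
        while is_letter(sentence[i]):
            words += sentence[i]
            i = i + 1
            counter = counter + 1
        # counter is stuck at the first non-letter, so from the second
        # pass onwards this appends an empty string every time
        words_list.append(words)
        words = ""
    return words_list


result = extract_words_buggy("The@Tt11end")
print(result)  # 'The' followed by ten empty strings
```

The first pass collects "The" and leaves counter at 3 (the "@"). Every later pass starts at index 3 again, the while loop never runs, and an empty string is appended.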


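With minimal changes to your current code, you can iterate over the string one character at a time and turn the separator list you already have into a set, which gives O(1) lookup time. This saves you from having to worry about incrementing multiple counter variables: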

def extract_words(sentence):
  separator_set = set([",",".",";","'","?","/","<",">","@","!","#","$","%","^","&","*","(",")","-","_","1","2","3","4","5","6","7","8","9"])

  words_list = []
  word = []
  for c in sentence:
    if c not in separator_set:
      word.append(c)
    else:
      if len(word) > 0:
        words_list.append(''.join(word))
        word = []

  if len(word) > 0:
    words_list.append(''.join(word))

  return words_list

def is_letter(char):
  return ("A" <= char and char <= "Z") or ("a" <= char and char <= "z")

def main():
  print(extract_words("The@Tt11end"))

if __name__ == '__main__':
  main()
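
One detail worth checking in the separator approach: the list as posted contains no "0" and no space, so those characters would be kept as part of a word. A minimal sketch, assuming you want all digits 0-9 and a plain space treated as separators too (SEPARATORS is a name introduced here for illustration, not part of the answer):

```python
# Assumed separator set: the characters from the answer plus "0" and " "
SEPARATORS = set(",.;'?/<>@!#$%^&*()-_0123456789 ")


def extract_words(sentence):
    words_list = []
    word = []  # characters of the word currently being built
    for c in sentence:
        if c not in SEPARATORS:
            word.append(c)
        elif word:  # a separator ends the current word, if there is one
            words_list.append("".join(word))
            word = []
    if word:  # flush the last word when the string ends on a letter
        words_list.append("".join(word))
    return words_list


print(extract_words("The@Tt10 end"))  # ['The', 'Tt', 'end']
```

Any character missing from the set (a tab or "+", for instance) still counts as part of a word, which is the trade-off of listing separators instead of testing for letters.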


Your code gets into a muddle with indexing into the given sentence.

You just need to iterate over the characters in the sentence:

def is_letter(char):
    return ("A" <= char <= "Z") or ("a" <= char <= "z")

def extract_words(sentence):
    word = ""
    words_list = []
    for ch in sentence:
        if is_letter(ch):
            word += ch
        else:
            if word:
                words_list.append(word)
                word = ""
    if word:
        words_list.append(word)
    return words_list


print(extract_words('The@,Tt11end'))

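The code iterates over each character in sentence. If it is a letter, it is added to the current word. If it is not, the current word (if there is one) is added to the output list. Finally, if the last character was a letter, there is a word left over, which is also added to the output.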
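
A few quick checks of the edge cases that explanation mentions, written as a sketch. The cases dictionary and its inputs are made up for illustration; the two functions are the ones from this answer.

```python
def is_letter(char):
    return ("A" <= char <= "Z") or ("a" <= char <= "z")


def extract_words(sentence):
    word = ""
    words_list = []
    for ch in sentence:
        if is_letter(ch):
            word += ch
        elif word:  # a non-letter closes the current word, if any
            words_list.append(word)
            word = ""
    if word:  # leftover word when the string ends on a letter
        words_list.append(word)
    return words_list


# Hypothetical inputs mapped to the output expected for each
cases = {
    "": [],                                  # empty input
    "@@11": [],                              # separators only
    "The@,Tt11end": ["The", "Tt", "end"],    # adjacent separators
    "end!": ["end"],                         # ends on a separator
    "!end": ["end"],                         # starts with a separator
}

for sentence, expected in cases.items():
    assert extract_words(sentence) == expected
print("all cases pass")
```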


Your code as posted does not run.
OK, this is more useful, but it still does not explain what is wrong with the OP's code. Does this code work with the test data in the OP?
@quamrana It works now.
@ggorlen Is it better now?