Python: split on whitespace, except between certain characters


I am parsing a file that contains lines like the following:

type("book") title("golden apples") pages(10-35 70 200-234) comments("good read")

I want to split each line into separate fields. In my example there are four fields: type, title, pages, and comments.

The expected result after splitting is

['type("book")', 'title("golden apples")', 'pages(10-35 70 200-234)', 'comments("good read")']

Obviously a simple string split won't work, because it splits at every space. I want to split on whitespace but keep anything between parentheses and quotes intact.

How can I split this?

This regex should work for you:

\s+(?=[^()]*(?:\(|$))

Explanation

r"""
\s             # Match a single character that is a “whitespace character” (spaces, tabs, and line breaks)
   +              # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
(?=            # Assert that the regex below can be matched, starting at this position (positive lookahead)
   [^()]          # Match a single character NOT present in the list “()”
      *              # Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
   (?:              # Match the regular expression below
                     # Match either the regular expression below (attempting the next alternative only if this one fails)
         \(             # Match the character “(” literally
      |              # Or match regular expression number 2 below (the entire group fails if this one fails to match)
         $              # Assert position at the end of a line (at the end of the string or before a line break character)
   )
)
"""

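As a quick check of the explanation above, here is a minimal sketch that applies the lookahead pattern with re.split to the sample line from the question. The names FIELD_SEP, line and fields are illustrative only, and the pattern assumes that parentheses are never nested.

```python
import re

# Lookahead pattern from the answer: split on whitespace only when the next
# parenthesis ahead (if any) is an opening one, i.e. we are outside "(...)".
FIELD_SEP = re.compile(r'\s+(?=[^()]*(?:\(|$))')

line = 'type("book") title("golden apples") pages(10-35 70 200-234) comments("good read")'
fields = FIELD_SEP.split(line)

for field in fields:
    print(field)
```

The spaces inside "golden apples" and inside pages(...) are left alone because the first parenthesis after them is a closing one, so the lookahead fails there.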
I would try a positive look-behind assertion:

r'(?<=\))\s+'

Split on ") " and add the ")" back to each element (except the last one).
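A rough sketch of the split-and-restore idea, assuming the separator meant here is a closing parenthesis followed by a single space, that every field ends with ")", and that ") " never occurs inside a field. The variable names are illustrative only.

```python
line = 'type("book") title("golden apples") pages(10-35 70 200-234) comments("good read")'

# Splitting on ") " consumes the closing parenthesis of every field but the last.
pieces = line.split(') ')

# Put the ")" back on every piece except the last, which still has its own.
fields = [piece + ')' for piece in pieces[:-1]] + pieces[-1:]

print(fields)
```

This avoids regular expressions entirely, but it is only as reliable as the assumptions above about the input format.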
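Let me add a non-regex solution:
line = 'type("book") title("golden apples") pages(10-35 70 200-234) comments("good read")'
count = 0       # bracket counter
last_break = 0  # index of the last break
parts = []
for j, char in enumerate(line):
    if char == '(':
        count += 1
    elif char == ')':
        count -= 1
    elif char == ' ' and count == 0:
        # a space outside any parentheses ends the current field
        parts.append(line[last_break:j])
        last_break = j + 1
parts.append(line[last_break:])       # add the last element
parts = tuple(p for p in parts if p)  # convert to tuple and remove empty parts
for p in parts:
    print(p)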
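The counter approach can be wrapped in a function and extended to more than one bracket type, for example () and []. The following is only a sketch: the function name split_outside_brackets and its pairs parameter are hypothetical, and it tracks nesting depth only, without checking that each closing bracket matches the type of its opener.

```python
def split_outside_brackets(line, pairs=(("(", ")"), ("[", "]"))):
    """Split line on spaces that are not enclosed in any of the bracket pairs."""
    openers = {opener for opener, _ in pairs}
    closers = {closer for _, closer in pairs}
    depth = 0       # current nesting depth across all bracket types
    last_break = 0  # index where the current field starts
    parts = []
    for j, char in enumerate(line):
        if char in openers:
            depth += 1
        elif char in closers:
            depth = max(depth - 1, 0)  # tolerate a stray closing bracket
        elif char == " " and depth == 0:
            parts.append(line[last_break:j])
            last_break = j + 1
    parts.append(line[last_break:])    # add the last field
    return [p for p in parts if p]     # drop empty fields from repeated spaces


sample = 'type("book") pages[10-35 70] notes(a [b c] d) test'
result = split_outside_brackets(sample)
print(result)
```

Because it counts depth rather than looking ahead, this version also copes with nested brackets and with fields that contain no brackets at all.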

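In general, regular expressions come with certain problems and can carry a severe performance penalty (especially with lookahead and lookbehind), which may make them not the best solution for a given problem.
Also, I thought I should mention that there is a module that can be used to create custom text parsers.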
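Nice, although it seems to add some extra parentheses to the returned list (I'm not sure where they come from either). I'm using py3.
Please try the following: re.split(r"\s+(?=[^()]*(?:\(|$))", subject)
@Keikoku Fixed it. It was because of the capturing group.
How would you extend this to support both round brackets () and square brackets []? That is, ignore all strings between any (well-matched) brackets.
If there are no parentheses in the input text, e.g. test, then this doesn't work.
The question already defines the format. test is not possible.
It has been 8 years since I originally asked this question, but I agree that using a parser is better than regex, especially for things like bracket and quote matching.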
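The "extra parentheses" reported in the comments are consistent with how re.split treats capturing groups: the text matched by each group is included in the returned list. The sketch below compares a capturing variant with the non-capturing one; the capturing pattern is an assumption about what the earlier version of the answer looked like, not a quote from it.

```python
import re

line = 'type("book") title("golden apples") pages(10-35 70 200-234) comments("good read")'

# Assumed earlier form: (\(|$) is a capturing group, so re.split also returns
# whatever the group matched at each split point.
capturing = re.split(r'\s+(?=[^()]*(\(|$))', line)

# Corrected form: (?:...) groups without capturing, so only the fields come back.
non_capturing = re.split(r'\s+(?=[^()]*(?:\(|$))', line)

print(capturing)
print(non_capturing)
```

With the capturing group, a "(" appears between each pair of fields; with the non-capturing group, only the four fields are returned.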
The look-behind pattern r'(?<=\))\s+' in an interactive session:

>>> import re
>>> result = re.split(r'(?<=\))\s+', 'type("book") title("golden apples") pages(10-35 70 200-234) comments("good read")')
>>> result
['type("book")', 'title("golden apples")', 'pages(10-35 70 200-234)', 'comments("good read")']