Python 如何在正则表达式中划分一个换行和两个换行?

Python 如何在正则表达式中划分一个换行和两个换行?,python,regex,newline,Python,Regex,Newline,我想通过以下方式对正则表达式的输出进行分组: 换行符“\n” 两个换行符“\n\n” 为了使用其他正则表达式拆分方法,我如何将其分成两组 查找单独的换行或我管理的两个换行。 例如: Facebook和谷歌利用了一项功能 适用于“企业开发人员”来 分发收集大量信息的应用程序 TechCrunch首次报告了关于私人用户的数据 苹果的策略被一些人形容为公司实力的一次令人毛骨悚然的展示 Verge总编辑Nilay Patel在一条推特中表示,这是一个值得关注的问题:首先,他们来取我们的企业证书,然后……

我想通过以下方式对正则表达式的输出进行分组:

  • 换行符“\n”
  • 两个换行符“\n\n”
  • 为了使用其他正则表达式拆分方法,我如何将其分成两组

    查找单独的换行或我管理的两个换行。 例如:

    Facebook和谷歌利用了一项功能
    适用于“企业开发人员”来
    分发收集大量信息的应用程序
    TechCrunch首次报告了关于私人用户的数据
    苹果的策略被一些人形容为公司实力的一次令人毛骨悚然的展示
    Verge总编辑Nilay Patel在一条推特中表示,这是一个值得关注的问题:首先,他们来取我们的企业证书,然后……嗯,到底是什么?\uu(\ n\n)\u)
    一些文本等等。。。
    
    我尝试了以下代码:

    def find_newlines(file):
        with open(file, "r") as content:
           text = content.read()
           content = re.split("\n+", text)
        return content
    
    结果是:

    ['Apple' , 'Something', 'Enything']
    
    我想要以下输出:

    ['Facebook and Google exploited a feature intended for “enterprise developers” to distribute apps that collect large amounts of data on private users, TechCrunch first reported.' __,__ 'Apple’s maneuver has been characterized by some as a chilling demonstration of the company’s power. Verge editor-in-chief Nilay Patel suggested in a tweet that it was cause for concern: First, they came for our enterprise certificates, then… well, what, exactly?']
    
    我想买一组新线
    和2组两个换行符

    您似乎试图将文本分组为两个(或更多)由双换行符分隔的块。因此,一种方法是首先在
    \n\n
    上拆分文本。这将导致仍然包含单个换行符的
    块。然后,每个块都可以用空格替换任何剩余的换行符。这一切都可以使用Python列表理解来完成,如下所示:

    text = """Facebook and Google exploited a feature
    intended for “enterprise developers” to
    distribute apps that collect large amounts
    of data on private users, TechCrunch first reported.
    
    Apple’s maneuver has been characterized by some as a chilling demonstration of the company’s power.
    Verge editor-in-chief Nilay Patel suggested in a tweet that it was cause for concern: First, they came for our enterprise certificates, then… well, what, exactly?"""
    
    content = [block.replace('\n', ' ') for block in text.split('\n\n')]
    
    print(content)
    
    import re
    
    text = """Facebook and Google exploited a feature
    intended for “enterprise developers” to
    distribute apps that collect large amounts
    of data on private users, TechCrunch first reported.
    
    
    
    Apple’s maneuver has been characterized by some as a chilling demonstration of the company’s power.
    Verge editor-in-chief Nilay Patel suggested in a tweet that it was cause for concern: First, they came for our enterprise certificates, then… well, what, exactly?"""
    
    content = [block.replace('\n', ' ') for block in re.split('\n{2,}', text)]
    
    print(content)
    
    给您一个包含两个条目且没有换行符的列表:

    ['Facebook and Google exploited a feature intended for “enterprise developers” to distribute apps that collect large amounts of data on private users, TechCrunch first reported.', 'Apple’s maneuver has been characterized by some as a chilling demonstration of the company’s power. Verge editor-in-chief Nilay Patel suggested in a tweet that it was cause for concern: First, they came for our enterprise certificates, then… well, what, exactly?']
    

    正则表达式可用于块由两个或多个空行分隔的情况,如下所示:

    text = """Facebook and Google exploited a feature
    intended for “enterprise developers” to
    distribute apps that collect large amounts
    of data on private users, TechCrunch first reported.
    
    Apple’s maneuver has been characterized by some as a chilling demonstration of the company’s power.
    Verge editor-in-chief Nilay Patel suggested in a tweet that it was cause for concern: First, they came for our enterprise certificates, then… well, what, exactly?"""
    
    content = [block.replace('\n', ' ') for block in text.split('\n\n')]
    
    print(content)
    
    import re
    
    text = """Facebook and Google exploited a feature
    intended for “enterprise developers” to
    distribute apps that collect large amounts
    of data on private users, TechCrunch first reported.
    
    
    
    Apple’s maneuver has been characterized by some as a chilling demonstration of the company’s power.
    Verge editor-in-chief Nilay Patel suggested in a tweet that it was cause for concern: First, they came for our enterprise certificates, then… well, what, exactly?"""
    
    content = [block.replace('\n', ' ') for block in re.split('\n{2,}', text)]
    
    print(content)
    

    你能说明输出是什么样子的吗?你能澄清你所说的组是什么意思吗?是否要两个列表,一个包含被至少一个字符分隔的所有行,另一个包含被两个字符分隔的行?您发布的结果也与您的示例不匹配,使用您当前的代码,您的结果应该是一个包含七个元素的列表:[“脸谱网…”、“预期…”、“分布式…”、“数据…”、“苹果…”、“边缘…”,一些文本…],对于该文本,您希望
    内容
    看起来像什么?您可以使用该按钮对问题进行改进。输出类似于“FOCUS\n Transform\n Deliver\n”,这就是您想要的输出?Martin,我使用了代码。做得好。我想知道我们能不能用正则表达式。哪一个更好?如果文本有时包含两个以上的换行符,而不是正好两个用于分隔块的换行符,则正则表达式版本将非常有用。在可能的情况下,我倾向于使用非正则表达式解决方案。