Truncate a string without ending in the middle of a word

I am looking for a way to truncate a string in Python that will not cut off the string in the middle of a word.


For example:

Original:         "This is really awesome."
"Dumb" truncate:  "This is real..."
"Smart" truncate: "This is really..."

I'm looking for a way to accomplish the "smart" truncate from above.

I actually wrote a solution for this on a recent project of mine. I've compressed the majority of it down to be a little smaller:

def smart_truncate(content, length=100, suffix='...'):
    if len(content) <= length:
        return content
    else:
        return ' '.join(content[:length+1].split(' ')[0:-1]) + suffix

Here's a slightly better version of the last line in Adam's solution:
return content[:length].rsplit(' ', 1)[0]+suffix

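(This is slightly more efficient, and returns a more sensible result when there are no spaces in the front of the string.)
Testing it (the call below uses a two-argument variant that returns the text without a suffix):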
>>> smart_truncate('The quick brown fox jumped over the lazy dog.', 23) + "..."
'The quick brown fox...'
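
The comments on these answers raise a few edge cases: text with no space in the first `length` characters, and results that grow past `length` once the suffix is appended. A minimal sketch that folds those guards into the rsplit approach might look like this. The name `smart_truncate_safe` and the fall-back to a hard cut are assumptions made for illustration, not part of either answer, and the sketch assumes `length >= len(suffix)`.

```python
def smart_truncate_safe(content, length=100, suffix='...'):
    """Cut at a word boundary so the result, suffix included, fits in `length`."""
    if len(content) <= length:
        return content
    # Reserve room for the suffix so the result does not exceed `length`.
    budget = max(length - len(suffix), 0)
    # Take one extra character: if it is a space, the word before it is whole.
    head = content[:budget + 1]
    cut = head.rsplit(' ', 1)[0].rstrip() if ' ' in head else ''
    if not cut:
        # Assumption: with no usable word boundary, fall back to a hard cut.
        cut = content[:budget]
    return cut + suffix


print(smart_truncate_safe('The quick brown fox jumped over the lazy dog.', 23))
print(smart_truncate_safe('This is really awesome.', 20))
```

With the OP's string and a limit of 20 this gives 'This is really...', and a string with no spaces at all is cut hard rather than returned empty.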

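OR, with a regular expression:

import re

def smart_truncate3(text, length=100, suffix='...'):
    """Truncates `text`, on a word boundary, as close to
    the target length it can come.
    """
    slen = len(suffix)
    pattern = r'^(.{0,%d}\S)\s+\S+' % (length-slen-1)
    if len(text) > length:
        match = re.match(pattern, text)
        if match:
            length0 = match.end(0)
            length1 = match.end(1)
            # Assumed completion (the source is cut off here): keep whichever
            # cut point lands closer to the target length.
            if abs(length0+slen-length) < abs(length1+slen-length):
                return match.group(0) + suffix
            else:
                return match.group(1) + suffix
    return text
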
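There are a few subtleties that may or may not be issues for you, such as the handling of tabs (e.g. if you display them as 8 spaces but treat them as 1 character internally), handling the various flavours of breaking and non-breaking whitespace, or allowing breaks on hyphens. If any of this matters, you may want to take a look at the textwrap module. For example: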
import textwrap

def truncate(text, max_size):
    if len(text) <= max_size:
        return text
    # Reserve 3 characters for the trailing "..."
    return textwrap.wrap(text, max_size-3)[0] + "..."

Depending on the exact behaviour you want, there are a few other options (such as expand_tabs) that may be of interest.
>>> import textwrap
>>> textwrap.wrap('The quick brown fox jumps over the lazy dog', 12)
['The quick', 'brown fox', 'jumps over', 'the lazy dog']

You just take the first element of that and you're done...

From Python 3.4+ you can use textwrap.shorten. With the OP's example:
>>> import textwrap
>>> original = "This is really awesome."
>>> textwrap.shorten(original, width=20, placeholder="...")
'This is really...'

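textwrap.shorten(text, width, **kwargs)
Collapse and truncate the given text to fit in the given width.
First the whitespace in text is collapsed (all whitespace is replaced by single spaces). If the result fits in the width, it is returned. Otherwise, enough words are dropped from the end so that the remaining words plus the placeholder fit within width.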
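
A short sketch of the behaviour described above, using only the standard library. The sample strings are arbitrary; note that the default placeholder is ' [...]' and that it counts against the width.

```python
import textwrap

# Runs of whitespace are collapsed first, so this fits in 12 columns unchanged.
fits = textwrap.shorten("Hello  world!", width=12)
# One column too narrow: the last word is dropped and the default placeholder added.
dropped = textwrap.shorten("Hello  world!", width=11)
# A custom placeholder also has to fit inside the width.
custom = textwrap.shorten("Hello world", width=10, placeholder="...")

print(fits)     # Hello world!
print(dropped)  # Hello [...]
print(custom)   # Hello...
```
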
For Python 3.4+, I'd use textwrap.shorten.
For older versions:
def truncate(description, max_len=140, suffix='…'):    
    description = description.strip()
    if len(description) <= max_len:
        return description
    new_description = ''
    for word in description.split(' '):
      tmp_description = new_description + word
      if len(tmp_description) <= max_len-len(suffix):
          new_description = tmp_description + ' '
      else:
          new_description = new_description.strip() + suffix
          break
    return new_description

If you actually prefer to truncate by full sentence rather than by word, here's something to start with:
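import numpy as np

def smart_truncate_by_sentence(content, length=100, suffix='...'):
    if not isinstance(content, str):
        return content
    if len(content) <= length:
        return content
    sentences = content.split('.')
    # Running total of sentence lengths (the '.' separators are not counted).
    cs = np.cumsum([len(s) for s in sentences])
    # Keep the sentences whose running total stays under `length`, at least one.
    n = max(1, len(cs[cs < length]))
    return '.'.join(sentences[:n]) + ('. ' + suffix) * (n < len(sentences))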
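
The same idea can be sketched without NumPy. This is an illustration rather than the answer's code: the name `truncate_by_sentence` is hypothetical, and it assumes a sentence ends at a period followed by whitespace. Unlike the version above it counts the separating spaces against the limit; like it, the suffix is not counted.

```python
import re

def truncate_by_sentence(content, length=100, suffix='...'):
    """Keep as many whole sentences as fit in `length`; always keep the first."""
    if len(content) <= length:
        return content
    # Assumption: a sentence ends at a period followed by whitespace.
    sentences = re.split(r'(?<=\.)\s+', content.strip())
    kept = [sentences[0]]
    for sentence in sentences[1:]:
        if len(' '.join(kept + [sentence])) > length:
            break
        kept.append(sentence)
    result = ' '.join(kept)
    # Flag the cut only if at least one sentence was dropped.
    if len(kept) < len(sentences):
        result += ' ' + suffix
    return result


print(truncate_by_sentence("One. Two is longer. Three is the longest sentence.", 20))
```
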

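Comments:

Ah, rsplit is interesting. A quick test of the two methods (Python 2.4.3). Adam's code: >>> smart_truncate('The quick brown fox jumped over the lazy dog.', 26) gives 'The quick brown fox jumped...'. With bobince's code: >>> smart_truncate('The quick brown fox jumped over the lazy dog.', 26) gives 'The quick brown fox...'.

This one is better. But I would just put it under the if and skip the else; it's more pythonic.

Well then, let's use a conditional expression: def smart_truncate(content, length=100, suffix='...'): return (content if len(content) …

So let's make sure that the resulting string is not longer than length: return content if len(content) …

I always like regex-based solutions :)

This (at least the topmost solution) even works on strings without spaces (it then cuts at a word boundary), although it doesn't add the suffix in that case :)

Note: if width > len(s), you get an out-of-bounds error on s[width]. You probably want an extra check for the case where truncation is not needed.

This is very neat... I would add one more test to avoid an empty string in the case where there are no spaces at all in the first `length` characters.

The truncation has to account for the suffix length: return ' '.join(content[:length+1-len(suffix)].split(' ')[0:-1]) + suffix

There's a corner case here that can bite someone: if content[:length+1] happens to end with a space, the returned string will be longer than length. See content[:length+1-len(suffix)] in @Stan's comment.

@Adam, well answered 11 years ago, and it has held up. Thanks for saving us lots of searching and code bugs :-)

This is old, but useful. Maybe add an rstrip after the join? ' '.join(content[:length+1].split(' ')[0:-1]).rstrip() + suffix. Otherwise you can end up with something like "Hello, how is your day going...".

textwrap.shorten("Hello World", width=10, placeholder="...") will produce "Hello...".

I just tried this and it broke in the middle of a grapheme cluster, so it isn't even doing proper character breaking, let alone word breaking.
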
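With the textwrap.wrap approach, passing break_long_words=False keeps over-long words whole instead of splitting them; the last line of truncate then becomes:
    lines = textwrap.wrap(text, max_size-3, break_long_words=False)
    return lines[0] + ("..." if len(lines)>1 else "")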
