Truncate a string without ending in the middle of a word

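I'm looking for a way to truncate a string in Python that won't cut the string off in the middle of a word.

For example:

    Original:          "This is really awesome."
    "Dumb" truncate:   "This is real..."
    "Smart" truncate:  "This is really..."

I'm looking for a way to accomplish the "smart" truncate from above.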

I actually wrote a solution for this on a recent project of mine. I've compressed the majority of it down to be a little smaller:

    def smart_truncate(content, length=100, suffix='...'):
        if len(content) <= length:
            return content
        else:
            return ' '.join(content[:length+1].split(' ')[0:-1]) + suffix

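The if statement checks whether your content is already within the cutoff length. If it isn't, it truncates the content to the desired length (plus one character), splits it on spaces, removes the last element (so that no word is cut off), and then joins the rest back together, tacking on the '...' suffix.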
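As a quick sanity check, here is the same function with its last line split into commented steps, applied to the example from the question. The cutoff values 15, 14 and 13 are arbitrary choices for illustration. Note that length bounds only the kept words; the suffix is appended on top, so the result can be up to len(suffix) characters longer than length.

```python
def smart_truncate(content, length=100, suffix='...'):
    # Already short enough: return it unchanged, without a suffix.
    if len(content) <= length:
        return content
    # Keep one extra character so that a word ending exactly at `length`
    # is followed by a space (giving an empty last piece) and survives.
    pieces = content[:length + 1].split(' ')
    # The last piece does not fit within `length` (or is empty): drop it.
    return ' '.join(pieces[:-1]) + suffix


text = "This is really awesome."
print(smart_truncate(text, 15))  # This is really...
print(smart_truncate(text, 14))  # This is really...
print(smart_truncate(text, 13))  # This is...
```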

Here's a slightly better version of the last line in Adam's solution:

        return content[:length].rsplit(' ', 1)[0] + suffix

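(This is slightly more efficient, and it returns a more sensible result when there is no space within the first length characters: the original line returns only the suffix in that case, while this one keeps the cut-off text.)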
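A side-by-side sketch of the two last lines makes the difference visible. The helper names join_version and rsplit_version are made up here just to wrap each line in a function. The sketch also shows a trade-off that is easy to miss: because the rsplit line slices [:length] rather than [:length + 1], it drops a word that ends exactly at length.

```python
def join_version(content, length, suffix='...'):
    # Hypothetical wrapper around the last line of Adam's function.
    return ' '.join(content[:length + 1].split(' ')[0:-1]) + suffix


def rsplit_version(content, length, suffix='...'):
    # Hypothetical wrapper around the suggested replacement line:
    # split once, starting from the right.
    return content[:length].rsplit(' ', 1)[0] + suffix


# No space to cut at: the join version keeps nothing but the suffix.
word = "Supercalifragilisticexpialidocious"
print(join_version(word, 10))    # ...
print(rsplit_version(word, 10))  # Supercalif...

# A word ending exactly at `length`: only the join version keeps it.
text = "This is really awesome."
print(join_version(text, 14))    # This is really...
print(rsplit_version(text, 14))  # This is...
```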

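There are a few subtleties that may or may not be an issue for you, such as handling of tabs (e.g. if you're displaying them as 8 spaces but treating them as 1 character internally), handling the various flavours of breaking and non-breaking whitespace, or allowing breaks on hyphens. If any of this is desirable, you may want to take a look at the textwrap module. For example: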

    import textwrap

    def truncate(text, max_size):
        if len(text) <= max_size:
            return text
        return textwrap.wrap(text, max_size-3)[0] + "..."

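The default behaviour for words longer than max_size is to break them (making max_size a hard limit). You can switch to the soft limit used by some of the other solutions here by passing break_long_words=False to wrap(), in which case it will return the whole word. If you want this behaviour, replace the last line with: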

        lines = textwrap.wrap(text, max_size-3, break_long_words=False)
        return lines[0] + ("..." if len(lines) > 1 else "")
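To compare the hard and soft limits, the sketch below folds both variants into one function with a break_long_words switch. The combined signature is a hypothetical convenience, not part of the answer; textwrap.wrap() and its break_long_words argument are from the standard library. The empty-list guard is also an addition, since wrap() produces no lines for whitespace-only input.

```python
import textwrap


def truncate(text, max_size, break_long_words=True):
    # Hypothetical combined version of the hard-limit and soft-limit variants.
    if len(text) <= max_size:
        return text
    lines = textwrap.wrap(text, max_size - 3,
                          break_long_words=break_long_words)
    if not lines:  # whitespace-only input wraps to an empty list
        return ""
    if break_long_words:
        # Hard limit: wrap() splits an over-long word, so the first line
        # is at most max_size - 3 characters.
        return lines[0] + "..."
    # Soft limit: an over-long first word is returned whole.
    return lines[0] + ("..." if len(lines) > 1 else "")


text = "Pneumonoultramicroscopic dust"
print(truncate(text, 13))                          # Pneumonoul...
print(truncate(text, 13, break_long_words=False))  # Pneumonoultramicroscopic...
```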

There are a few other options, such as expand_tabs, that may be of interest depending on the exact behaviour you want.
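Two of the whitespace subtleties mentioned above can be observed directly in the standard library. The sketch below shows that textwrap expands a tab to the next multiple of 8 columns by default (adjustable with tabsize, or turned into a single space with expand_tabs=False), and that it does not break at a non-breaking space (U+00A0), whereas str.split() with no arguments does. The non-breaking-space behaviour is what recent Python 3 versions do; older releases may differ.

```python
import textwrap

# Tabs are expanded to the next multiple of `tabsize` (default 8) before
# wrapping, so this single tab counts as 7 columns.
print(textwrap.wrap("a\tb", 20))                     # ['a       b']
print(textwrap.wrap("a\tb", 20, tabsize=4))          # ['a   b']
# With expand_tabs=False, replace_whitespace=True (the default) turns the
# tab into a single space instead.
print(textwrap.wrap("a\tb", 20, expand_tabs=False))  # ['a b']

# "aaa" and "bbb" are joined by a non-breaking space (U+00A0).
text = "aaa\u00a0bbb ccc"
print(text.split())            # ['aaa', 'bbb', 'ccc']  splits on U+00A0 too
print(text.split(' '))         # ['aaa\xa0bbb', 'ccc']  ASCII space only
print(textwrap.wrap(text, 7))  # ['aaa\xa0bbb', 'ccc']  keeps the pair together
```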

    import re

    def smart_truncate1(text, max_length=100, suffix='...'):
        """Returns a string of at most `max_length` characters, cutting
        only at word-boundaries. If the string was truncated, `suffix`
        will be appended.
        """
        if len(text) > max_length:
            pattern = r'^(.{0,%d}\S)\s.*' % (max_length-len(suffix)-1)
            return re.sub(pattern, r'\1' + suffix, text)
        else:
            return text

Or:

    import re

    def smart_truncate2(text, min_length=100, suffix='...'):
        """If the `text` is more than `min_length` characters long,
        it will be cut at the next word-boundary and `suffix` will
        be appended.
        """
        pattern = r'^(.{%d,}?\S)\s.*' % (min_length-1)
        return re.sub(pattern, r'\1' + suffix, text)

Or:

    import re

    def smart_truncate3(text, length=100, suffix='...'):
        """Truncates `text`, on a word boundary, as close to
        the target length as it can come.
        """
        slen = len(suffix)
        pattern = r'^(.{0,%d}\S)\s+\S+' % (length-slen-1)
        if len(text) > length:
            match = re.match(pattern, text)
            if match:
                length0 = match.end(0)
                length1 = match.end(1)
                if abs(length0+slen-length) < abs(length1+slen-length):
                    return match.group(0) + suffix
                else:
                    return match.group(1) + suffix
        return text
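To see how the three regex variants differ, the sketch below runs them side by side on the question's example with a target of 9 characters, a value picked only because it makes all three disagree. The functions are the ones above with the docstrings shortened to comments: smart_truncate1 treats the number as a maximum (suffix included), smart_truncate2 as a minimum, and smart_truncate3 picks whichever word boundary lands closest.

```python
import re


def smart_truncate1(text, max_length=100, suffix='...'):
    # At most `max_length` characters, suffix included (when a boundary fits).
    if len(text) > max_length:
        pattern = r'^(.{0,%d}\S)\s.*' % (max_length - len(suffix) - 1)
        return re.sub(pattern, r'\1' + suffix, text)
    return text


def smart_truncate2(text, min_length=100, suffix='...'):
    # At least `min_length` characters, cut at the next word boundary.
    pattern = r'^(.{%d,}?\S)\s.*' % (min_length - 1)
    return re.sub(pattern, r'\1' + suffix, text)


def smart_truncate3(text, length=100, suffix='...'):
    # Whichever of two candidate word boundaries lands closest to `length`.
    slen = len(suffix)
    pattern = r'^(.{0,%d}\S)\s+\S+' % (length - slen - 1)
    if len(text) > length:
        match = re.match(pattern, text)
        if match:
            length0 = match.end(0)
            length1 = match.end(1)
            if abs(length0 + slen - length) < abs(length1 + slen - length):
                return match.group(0) + suffix
            return match.group(1) + suffix
    return text


text = "This is really awesome."
print(smart_truncate1(text, 9))  # This...
print(smart_truncate2(text, 9))  # This is really...
print(smart_truncate3(text, 9))  # This is...
```

One caveat: if no word boundary fits, for example when the first word alone is longer than the limit, the pattern in smart_truncate1 does not match and re.sub returns the text unchanged, so the "at most" guarantee does not hold for such input.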

    >>> import textwrap
    >>> textwrap.wrap('The quick brown fox jumps over the lazy dog', 12)
    ['The quick', 'brown fox', 'jumps over', 'the lazy dog']

Just take the first element and you're done...
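One detail when taking the first element: textwrap.wrap() returns an empty list for empty or whitespace-only text, so indexing [0] directly raises IndexError there. The helper below, first_line, is a hypothetical name used only for this sketch; it guards against that case and appends a suffix only when something was actually cut. As written, the suffix is added on top of width.

```python
import textwrap


def first_line(text, width, suffix='...'):
    # Hypothetical helper built on the "take the first element" idea.
    lines = textwrap.wrap(text, width)
    if not lines:  # wrap() returns [] for empty or whitespace-only text
        return ''
    # Only add the suffix if later lines were dropped.
    return lines[0] + (suffix if len(lines) > 1 else '')


print(first_line('The quick brown fox jumps over the lazy dog', 12))  # The quick...
print(first_line('The quick', 12))    # The quick
print(repr(first_line('', 12)))       # ''
```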

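    def smart_truncate(s, width):
        # Nothing to cut (and s[width] would raise IndexError otherwise).
        if len(s) <= width:
            return s
        if s[width].isspace():
            # The cut falls on whitespace, so no word is split.
            return s[0:width]
        else:
            # Otherwise drop the last word, which the cut has split.
            return s[0:width].rsplit(None, 1)[0]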

Testing it:

    >>> smart_truncate('The quick brown fox jumped over the lazy dog.', 23) + "..."
    'The quick brown fox...'
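A few more inputs exercise each branch of this approach. The sketch assumes the length check at the top, which guards against IndexError because s[width] only exists when the string is longer than width. Since rsplit(None, 1) splits on any run of whitespace, a newline works as a break point too; the sample strings are arbitrary.

```python
def smart_truncate(s, width):
    # Guard: s[width] only exists when len(s) > width.
    if len(s) <= width:
        return s
    # The character right after the cut is whitespace: no word is split.
    if s[width].isspace():
        return s[:width]
    # Otherwise drop the last (split) word; None splits on any whitespace.
    return s[:width].rsplit(None, 1)[0]


text = "This is really awesome."
print(smart_truncate(text, 14))                    # This is really
print(smart_truncate(text, 13))                    # This is
print(smart_truncate("line one\nline two", 11))    # line one
print(smart_truncate("short", 10))                 # short
```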

As of Python 3.4, you can use textwrap.shorten. With the OP's example:

    >>> import textwrap
    >>> original = "This is really awesome."
    >>> textwrap.shorten(original, width=20, placeholder="...")
    'This is really...'

textwrap.shorten(text, width, **kwargs)

Collapse and truncate the given text to fit in the given width.

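First the whitespace in text is collapsed (all whitespace is replaced by single spaces). If the result fits in the width, it is returned. Otherwise, enough words are dropped from the end so that the remaining words plus the placeholder fit within width.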
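A short sketch of that behaviour: without a placeholder argument, shorten() uses its default placeholder ' [...]' (with a leading space), and runs of whitespace, including tabs, are collapsed before the length is measured. The "messy" input string is just an illustrative value.

```python
import textwrap

original = "This is really awesome."

# Default placeholder is ' [...]', leading space included.
print(textwrap.shorten(original, width=20))  # This is really [...]

# Runs of whitespace, including tabs, are collapsed before measuring.
messy = "This   is\treally  awesome."
print(textwrap.shorten(messy, width=20, placeholder="..."))  # This is really...

# Text that fits once collapsed comes back without a placeholder.
print(textwrap.shorten("Hello  world!", width=12))  # Hello world!
```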
