Python: How to split a text containing 18000 sentences into chunks


How can I split a text consisting of 18000 sentences into chunks, each of which should contain 2000 sentences (more or less)?

I know how to split a text into equal chunks by characters, however this is not what I need:

s = "Wrap a single paragraph of text, returning a list of wrapped lines. Reformat the single paragraph in 'text' so it fits in lines of n more than 'width' columns, and return a list of wrapped lines."
list(map(''.join, zip(*[iter(s)]*2)))
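
Try the following:

l = s.split('.')
res = {}
# Step through the list 2000 sentences at a time; the last chunk may be shorter
for i in range(0, len(l), 2000):
    res[i // 2000] = l[i:i + 2000]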

Slice the list into chunks:

SLICE_SIZE = 3 #TODO modify per your needs
doc = '''Hello python.Bon.jack.jim.summit.How can I split a text consisting of 18000 sentences into chunks, each one should contain 2000 sentences (more or less).
I know how to split a text into equal chunks by characters, however this is not what I need.Note that these methods are looked up on the type (metaclass) of a class. They cannot be defined as class methods in the actual class. This is consistent with the lookup of special methods that are called on instances, only in this case the instance is itself a class'''

chunks = []
parts = doc.split('.')
remainder = len(parts) % SLICE_SIZE
chunks_count = len(parts) // SLICE_SIZE
# Full chunks of SLICE_SIZE sentences each
for x in range(chunks_count):
    chunks.append(parts[x * SLICE_SIZE: (x + 1) * SLICE_SIZE])
# Leftover sentences go into a final, shorter chunk
if remainder:
    chunks.append(parts[-remainder:])
print(chunks)
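
Since the question allows chunks of 2000 sentences "more or less", one alternative is to spread the sentences evenly so that the last chunk is not much shorter than the others. The sketch below is only an illustration of that idea; the name `balanced_chunks` and its parameters are hypothetical, not part of either answer.

```python
def balanced_chunks(items, target_size):
    """Split items into chunks of nearly equal size, none larger than target_size."""
    if target_size <= 0:
        raise ValueError("target_size must be positive")
    if not items:
        return []
    # Smallest number of chunks such that no chunk exceeds target_size
    n_chunks = -(-len(items) // target_size)  # ceiling division
    # Every chunk gets `base` items; the first `extra` chunks get one more
    base, extra = divmod(len(items), n_chunks)
    chunks = []
    start = 0
    for i in range(n_chunks):
        size = base + (1 if i < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks
```

With 18500 sentences and a target of 2000, this yields 10 chunks of 1850 sentences each, instead of nine chunks of 2000 and one of 500.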


Comments:

How do you define a sentence?
You can split the text by ('.') as a first step.
@balderman. Split by dot.
@IoaTzimas: Sure, but what is the next step?
I guess in this case I might lose part of the text. No?
Sry, I have updated the code. All the text will be included now.
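
The comments raise two points: how a sentence is defined, and whether text is lost. Note that `str.split('.')` discards the periods themselves. A minimal sketch of one way around that is shown below, assuming a sentence ends with '.', '!' or '?' followed by whitespace; the function names `split_sentences` and `chunk_text` are made up for illustration. This simple rule will mis-split abbreviations such as "e.g." and will not split sentences that have no space after the period, so a dedicated tokenizer (for example `nltk.tokenize.sent_tokenize`) may be a better fit for real text.

```python
import re


def split_sentences(text):
    """Split text after '.', '!' or '?' when followed by whitespace.

    The punctuation stays attached to its sentence, so no characters
    other than the separating whitespace are dropped.
    """
    parts = re.split(r'(?<=[.!?])\s+', text.strip())
    return [p for p in parts if p]


def chunk_text(text, sentences_per_chunk=2000):
    """Return a list of strings, each holding up to sentences_per_chunk sentences."""
    sentences = split_sentences(text)
    return [
        ' '.join(sentences[i:i + sentences_per_chunk])
        for i in range(0, len(sentences), sentences_per_chunk)
    ]
```

For the text in the question, `chunk_text(text, 2000)` would return roughly nine strings, the last of which may hold fewer than 2000 sentences.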