Python: sliding window over the words of a sentence string

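I am looking for a sliding-window splitter for a string, where each window is made up of N words.

Input: "I love food and I like drink", window size 3

Output: ["I love food", "love food and", "food and I", "and I like" …]

All the sliding-window suggestions I have found work on the character sequence of the string, not on its terms (words). Is there anything that does this out of the box?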

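You can use iterators at different offsets and zip them all together:

>>> arr = "I love food. blah blah".split()
>>> its = [iter(arr), iter(arr[1:]), iter(arr[2:])]  # Construct the same pattern for longer windows
>>> zip(*its)
[('I', 'love', 'food.'), ('love', 'food.', 'blah'), ('food.', 'blah', 'blah')]

If you have long sentences you may want a lazy variant (such as itertools.izip in Python 2), or perhaps a plain old loop, as shown in another answer.

A plain-loop version, written as a generator:

def token_sliding_window(sentence, size):
    # Split on single spaces, then yield each run of `size` consecutive tokens.
    tokens = sentence.split(' ')
    for i in range(len(tokens) - size + 1):
        yield tokens[i: i + size]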
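The offset-iterator idea can be generalized to any window size. The following is only a sketch under assumptions: it targets Python 3 (where `zip` is already lazy, so `izip` is not needed), and the name `word_windows` is made up for illustration.

```python
from itertools import islice


def word_windows(sentence, size):
    """Lazily yield tuples of `size` consecutive words (Python 3 sketch)."""
    words = sentence.split()
    # One iterator per offset; islice skips the first `start` words
    # without copying the list the way words[start:] would.
    offset_iters = [islice(words, start, None) for start in range(size)]
    # zip stops as soon as the shortest (most shifted) iterator runs out.
    return zip(*offset_iters)


if __name__ == "__main__":
    for window in word_windows("I love food. blah blah", 3):
        print(window)
```

Because `zip` stops at the shortest iterator, a sentence with fewer than `size` words simply produces no windows.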

An approach based on subscripting (slicing) the sequence of split words:

def split_on_window(sequence="I love food and I like drink", limit=4):
    results = []
    # Split the sentence into words on whitespace.
    split_sequence = sequence.split()
    # Number of full windows that fit in the word list.
    iteration_length = len(split_sequence) - (limit - 1)
    max_window_indices = range(iteration_length)
    for index in max_window_indices:
        # Each window is a slice of `limit` consecutive words.
        results.append(split_sequence[index:index + limit])
    return results
Sample output:

>>> split_on_window("I love food and I like drink", 3)
[['I', 'love', 'food'], ['love', 'food', 'and'], ['food', 'and', 'I'], ['and', 'I', 'like'], ['I', 'like', 'drink']]
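The question asks for each window as a single string ("I love food"), while the slicing approach returns lists of words. A minimal sketch of that last step, assuming the words should be re-joined with single spaces; `joined_word_windows` is a hypothetical name, not part of any answer:

```python
def joined_word_windows(sentence, size):
    """Return every run of `size` consecutive words as one space-joined string."""
    words = sentence.split()
    # range() is empty when the sentence has fewer than `size` words,
    # so short inputs produce an empty list instead of an error.
    return [" ".join(words[i:i + size]) for i in range(len(words) - size + 1)]


if __name__ == "__main__":
    print(joined_word_windows("I love food and I like drink", 3))
```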
Here is an alternative answer inspired by @SuperSaiyan:

from itertools import izip

def split_on_window(sequence, limit):
    split_sequence = sequence.split()
    iterators = [iter(split_sequence[index:]) for index in range(limit)]
    return izip(*iterators)
Sample output:

>>> s = "I love food and I like drink"
>>> list(split_on_window(s, 4))
[('I', 'love', 'food', 'and'), ('love', 'food', 'and', 'I'),
 ('food', 'and', 'I', 'like'), ('and', 'I', 'like', 'drink')]
Benchmark:

Sequence = I love food and I like drink, limit = 3
Repetitions = 1000000
Using subscripting -> 3.8326420784
Using izip -> 5.41380286217 # Modified to return a list for the benchmark.
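A comparison like the one above can be reproduced with the standard `timeit` module. This is only a sketch under assumptions: it targets Python 3 (so `zip` stands in for `izip`), the function names are invented for illustration, and the timings will differ from the figures quoted above depending on machine and interpreter.

```python
import timeit

SENTENCE = "I love food and I like drink"
LIMIT = 3


def windows_by_slicing(sentence, limit):
    # Subscripting approach: slice the word list once per window.
    words = sentence.split()
    return [words[i:i + limit] for i in range(len(words) - limit + 1)]


def windows_by_zip(sentence, limit):
    # Offset-iterator approach, forced into a list so both variants do equal work.
    words = sentence.split()
    return list(zip(*[iter(words[i:]) for i in range(limit)]))


def run_benchmark(number=10000):
    """Return total seconds taken by `number` calls of each variant."""
    return {
        "slicing": timeit.timeit(lambda: windows_by_slicing(SENTENCE, LIMIT), number=number),
        "zip": timeit.timeit(lambda: windows_by_zip(SENTENCE, LIMIT), number=number),
    }


if __name__ == "__main__":
    for name, seconds in run_benchmark().items():
        print(name, "->", seconds)
```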

Here is what I ended up doing:

def find_ngrams(input_list, n):
    return zip(*[input_list[i:] for i in range(n)])
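The slicing-based versions need the whole word list in memory. When the tokens arrive from a generator or a file, a bounded `collections.deque` is one alternative. This is a different technique from the answers above, shown as a Python 3 sketch that follows the pattern of the `sliding_window` recipe in the itertools documentation.

```python
from collections import deque
from itertools import islice


def sliding_window(iterable, n):
    """Yield tuples of n consecutive items from any iterable."""
    iterator = iter(iterable)
    # Pre-fill with the first n - 1 items; maxlen makes the deque
    # drop its oldest item automatically on each append.
    window = deque(islice(iterator, n - 1), maxlen=n)
    for item in iterator:
        window.append(item)
        yield tuple(window)


if __name__ == "__main__":
    words = (w for w in "I love food and I like drink".split())  # a generator, not a list
    for window in sliding_window(words, 3):
        print(window)
```

Each step costs O(n) to build the tuple, but only n items are held at a time.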