Warning: file_get_contents(/data/phpspider/zhask/data//catemap/2/python/331.json): failed to open stream: No such file or directory in /data/phpspider/zhask/libs/function.php on line 167

Warning: Invalid argument supplied for foreach() in /data/phpspider/zhask/libs/tag.function.php on line 1116

Notice: Undefined index: in /data/phpspider/zhask/libs/function.php on line 180

Warning: array_chunk() expects parameter 1 to be array, null given in /data/phpspider/zhask/libs/function.php on line 181
Python 如何提高拆分列表的速度?_Python_Performance_List_Pandas - Fatal编程技术网

Python 如何提高拆分列表的速度?

Python 如何提高拆分列表的速度?,python,performance,list,pandas,Python,Performance,List,Pandas,我只是想提高拆分列表的速度。现在我有了拆分列表的方法,但是速度没有我预期的快 def split_list(lines): return [x for xs in lines for x in xs.split('-')] import time lst= [] for i in range(1000000): lst.append('320000-320000') start=time.clock() lst_new=split_list(lst) end=tim

我只是想提高拆分列表的速度。现在我有了拆分列表的方法,但是速度没有我预期的快

def split_list(lines):
        return [x for xs in lines for x in xs.split('-')]

import time

lst= []
for i in range(1000000):
    lst.append('320000-320000')

start=time.clock()
lst_new=split_list(lst)
end=time.clock()
print('time\n',str(end-start))
例如,
输入

lst
 ['320000-320000', '320000-320000']
lst_new
 ['320000', '320000', '320000', '320000']
df
 id                                         line
0    1  320000-320000, 340000-320000, 320000-340000
1    2                                380000-320000
2    3                  380000-320000,380000-310000
3    4    370000-320000,370000-320000,320000-320000
4    5  320000-320000, 340000-320000, 320000-340000
5    6                                380000-320000
6    7                  380000-320000,380000-310000
7    8    370000-320000,370000-320000,320000-320000
8    9  320000-320000, 340000-320000, 320000-340000
9   10                                380000-320000
10  11                  380000-320000,380000-310000
11  12    370000-320000,370000-320000,320000-320000
df
 id  line_start  line_destination
0    1     320000    320000
1    2     380000    320000
2    3     380000    320000
3    4     370000    320000
4    5     320000    320000
5    6     380000    320000
6    7     380000    320000
7    8     370000    320000
8    9     320000    320000
9   10     380000    320000
10  11     380000    320000
11  12     370000    320000
输出

lst
 ['320000-320000', '320000-320000']
lst_new
 ['320000', '320000', '320000', '320000']
df
 id                                         line
0    1  320000-320000, 340000-320000, 320000-340000
1    2                                380000-320000
2    3                  380000-320000,380000-310000
3    4    370000-320000,370000-320000,320000-320000
4    5  320000-320000, 340000-320000, 320000-340000
5    6                                380000-320000
6    7                  380000-320000,380000-310000
7    8    370000-320000,370000-320000,320000-320000
8    9  320000-320000, 340000-320000, 320000-340000
9   10                                380000-320000
10  11                  380000-320000,380000-310000
11  12    370000-320000,370000-320000,320000-320000
df
 id  line_start  line_destination
0    1     320000    320000
1    2     380000    320000
2    3     380000    320000
3    4     370000    320000
4    5     320000    320000
5    6     380000    320000
6    7     380000    320000
7    8     370000    320000
8    9     320000    320000
9   10     380000    320000
10  11     380000    320000
11  12     370000    320000
我对拆分的速度不满意,因为我的数据包含许多列表

但现在我不知道是否有更有效的方法


根据建议,我试图更具体地描述我的整个问题

import pandas as pd

df = pd.DataFrame({ 'line':["320000-320000, 340000-320000, 320000-340000",
                            "380000-320000",
                            "380000-320000,380000-310000",
                            "370000-320000,370000-320000,320000-320000",
                            "320000-320000, 340000-320000, 320000-340000",
                            "380000-320000",
                            "380000-320000,380000-310000",
                            "370000-320000,370000-320000,320000-320000",
                            "320000-320000, 340000-320000, 320000-340000",
                            "380000-320000",
                            "380000-320000,380000-310000",
                            "370000-320000,370000-320000,320000-320000"], 'id':[1,2,3,4,5,6,7,8,9,10,11,12],})

def most_common(lst):
    return max(set(lst), key=lst.count)

def split_list(lines):
    return [x for xs in lines for x in xs.split('-')]

df['line']=df['line'].str.split(',')
col_ix=df['line'].index.values
df['line_start'] = pd.Series(0, index=df.index)
df['line_destination'] = pd.Series(0, index=df.index)

import time 
start=time.clock()

for ix in col_ix:
    col=df['line'][ix]
    col_split=split_list(col)
    even_col_split=col_split[0:][::2]
    even_col_split_most=most_common(even_col_split)
    df['line_start'][ix]=even_col_split_most

    odd_col_split=col_split[1:][::2]

    odd_col_split_most=most_common(odd_col_split)
    df['line_destination'][ix]=odd_col_split_most

end=time.clock()
print('time\n',str(end-start))

del df['line']
print('df\n',df)
输入

lst
 ['320000-320000', '320000-320000']
lst_new
 ['320000', '320000', '320000', '320000']
df
 id                                         line
0    1  320000-320000, 340000-320000, 320000-340000
1    2                                380000-320000
2    3                  380000-320000,380000-310000
3    4    370000-320000,370000-320000,320000-320000
4    5  320000-320000, 340000-320000, 320000-340000
5    6                                380000-320000
6    7                  380000-320000,380000-310000
7    8    370000-320000,370000-320000,320000-320000
8    9  320000-320000, 340000-320000, 320000-340000
9   10                                380000-320000
10  11                  380000-320000,380000-310000
11  12    370000-320000,370000-320000,320000-320000
df
 id  line_start  line_destination
0    1     320000    320000
1    2     380000    320000
2    3     380000    320000
3    4     370000    320000
4    5     320000    320000
5    6     380000    320000
6    7     380000    320000
7    8     370000    320000
8    9     320000    320000
9   10     380000    320000
10  11     380000    320000
11  12     370000    320000
输出

lst
 ['320000-320000', '320000-320000']
lst_new
 ['320000', '320000', '320000', '320000']
df
 id                                         line
0    1  320000-320000, 340000-320000, 320000-340000
1    2                                380000-320000
2    3                  380000-320000,380000-310000
3    4    370000-320000,370000-320000,320000-320000
4    5  320000-320000, 340000-320000, 320000-340000
5    6                                380000-320000
6    7                  380000-320000,380000-310000
7    8    370000-320000,370000-320000,320000-320000
8    9  320000-320000, 340000-320000, 320000-340000
9   10                                380000-320000
10  11                  380000-320000,380000-310000
11  12    370000-320000,370000-320000,320000-320000
df
 id  line_start  line_destination
0    1     320000    320000
1    2     380000    320000
2    3     380000    320000
3    4     370000    320000
4    5     320000    320000
5    6     380000    320000
6    7     380000    320000
7    8     370000    320000
8    9     320000    320000
9   10     380000    320000
10  11     380000    320000
11  12     370000    320000
您可以考虑
线路的编号
(例如
320000-32000
代表路线的起点和终点)

预期值

让代码运行得更快。(我无法忍受代码的速度)

根据您想对列表执行的操作,使用Generator可以稍微快一点

如果需要保存输出,则列表解决方案会更快

如果您只需要对单词进行一次迭代,那么可以通过使用生成器消除一些开销

def split_list_gen(lines):
    for line in lines:
        yield from line.split('-')
基准 输出 生成器解决方案的速度始终快约10%

list time: 0.4568295369982612
generator time: 0.4020671741918084

将更多的工作推到Python级别以下似乎可以提供一个小的加速:

In [7]: %timeit x = split_list(lst)
407 ms ± 876 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [8]: %timeit x = list(chain.from_iterable(map(methodcaller("split", "-"), lst
   ...: )))
374 ms ± 2.67 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
methodcaller
为您创建调用该函数的函数:

methodcaller("split", "-")(x) == x.split("-")
chain.from_iterable
创建一个迭代器,该迭代器由一组iterable中的元素组成:

list(chain.from_iterable([[1,2], [3,4]])) == [1,2,3,4]
map
ping将
methodcaller
返回的函数ping到字符串列表中会生成一个列表列表,该列表适合通过
从\u iterable
展开。这种功能性更强的方法的好处是,所涉及的函数都是用C实现的,可以处理Python对象中的数据,而不是处理Python对象中的Python字节代码

'-'.join(lst).split('-')
似乎要快一点:

>>> timeit("'-'.join(lst).split('-')", globals=globals(), number=10)
1.0838123590219766
>>> timeit("[x for xs in lst for x in xs.split('-')]", globals=globals(), number=10)
3.1370303670410067

首先使用
timeit
为代码段计时。第二,你能提供更多关于你的实际问题的背景吗?你能给出一个你的数据的例子吗?检查。目前,你正在寻找一个列表的平坦化。投票取消它。。。这似乎是我的答案。