Python: duplicating a pandas DataFrame N times

python list pandas

So right now, if I multiply a list, for example x = [1,2,3]*2, I get x as [1,2,3,1,2,3], but this doesn't work with pandas.

So if I want to duplicate a pandas DataFrame, I have to turn a column into a list and multiply it:

col_x_duplicates = list(df['col_x']) * N

new_df = pd.DataFrame(col_x_duplicates, columns=['col_x'])

And then do a join on the original data:

pd.merge(new_df, df, on='col_x', how='left')

Now, is there an easier way to duplicate a pandas DataFrame N times? Or a quicker one?

Actually, since you want to duplicate the entire DataFrame (rather than each element), numpy.tile() may be a better fit:

In [69]: import numpy as np
    ...: import pandas as pd

In [70]: arr = np.array([[1, 2, 3], [4, 5, 6]])

In [71]: arr
Out[71]: 
array([[1, 2, 3],
       [4, 5, 6]])

In [72]: df = pd.DataFrame(np.tile(arr, (5, 1)))

In [73]: df
Out[73]: 
   0  1  2
0  1  2  3
1  4  5  6
2  1  2  3
3  4  5  6
4  1  2  3
5  4  5  6
6  1  2  3
7  4  5  6
8  1  2  3
9  4  5  6

[10 rows x 3 columns]

In [75]: df = pd.DataFrame(np.tile(arr, (1, 3)))

In [76]: df
Out[76]: 
   0  1  2  3  4  5  6  7  8
0  1  2  3  1  2  3  1  2  3
1  4  5  6  4  5  6  4  5  6

[2 rows x 9 columns]
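Note that the tile() example starts from a bare array, so the result has default integer column labels. If you start from an existing DataFrame and want to keep its column names, something like the following may work. This is only a sketch: the frame and its column names a, b, c are made up for illustration, and with mixed column types to_numpy() would return an object array, so it is best suited to all-numeric frames.

```python
import numpy as np
import pandas as pd

# Hypothetical example frame; the column names are illustrative only.
df = pd.DataFrame({"a": [1, 4], "b": [2, 5], "c": [3, 6]})
n = 5  # number of copies to stack vertically

# np.tile works on the underlying array, so the column labels
# have to be re-attached when building the new DataFrame.
tiled = pd.DataFrame(np.tile(df.to_numpy(), (n, 1)), columns=df.columns)

print(tiled.shape)  # (10, 3)
```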

Here is a one-liner to make a DataFrame containing n copies of the DataFrame df:

n_df = pd.concat([df] * n)

For example:

df = pd.DataFrame(
    data=[[34, 'null', 'mark'], [22, 'null', 'mark'], [34, 'null', 'mark']], 
    columns=['id', 'temp', 'name'], 
    index=pd.Index([1, 2, 3], name='row')
)
n = 4
n_df = pd.concat([df] * n)

Then n_df is the following DataFrame:

    id  temp    name
row         
1   34  null    mark
2   22  null    mark
3   34  null    mark
1   34  null    mark
2   22  null    mark
3   34  null    mark
1   34  null    mark
2   22  null    mark
3   34  null    mark
1   34  null    mark
2   22  null    mark
3   34  null    mark
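As the output shows, pd.concat keeps the original index labels, so 1, 2, 3 repeat in every copy. If unique row labels are needed, passing ignore_index=True is one option. A minimal sketch, reusing the example frame from this answer (the name n_df_reset is just for illustration):

```python
import pandas as pd

df = pd.DataFrame(
    data=[[34, 'null', 'mark'], [22, 'null', 'mark'], [34, 'null', 'mark']],
    columns=['id', 'temp', 'name'],
    index=pd.Index([1, 2, 3], name='row'),
)
n = 4

# ignore_index=True discards the repeated labels (and the index name)
# and builds a fresh 0..len-1 index for the concatenated result.
n_df_reset = pd.concat([df] * n, ignore_index=True)

print(len(n_df_reset))             # 12
print(n_df_reset.index.is_unique)  # True
```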

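NumPy's repeat() may be useful here (and fast). Do you want the output column to look like [1,2,3,1,2,3] or [1,1,2,2,3,3]?

Thanks, this is great! Unfortunately it seems too slow when run on a large pandas df. Do you know if there is a quicker way?

@redrubia Are you calling tile() several times? It may be slow because you are allocating extra memory each time. If you know the final size (after all the duplication), you could try initializing a NumPy array of zeros of that size and then filling it using slices.

@redrubia Alternatively, if you don't need to modify the duplicated data, see whether you can restructure your code to keep the indices somewhere and access the same DataFrame repeatedly instead of creating a new tiled DataFrame. That way you don't pay for allocating more memory.

Here is another way of doing the same thing: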