Python: How to replicate rows in a pandas DataFrame?
My pandas DataFrame looks like this:
Person ID ZipCode Gender
0 12345 882 38182 Female
1 32917 271 88172 Male
2 18273 552 90291 Female
I want to replicate each row 3 times, like this:
Person ID ZipCode Gender
0 12345 882 38182 Female
0 12345 882 38182 Female
0 12345 882 38182 Female
1 32917 271 88172 Male
1 32917 271 88172 Male
1 32917 271 88172 Male
2 18273 552 90291 Female
2 18273 552 90291 Female
2 18273 552 90291 Female
And of course, the index should be reset so that it is:
0
1
2
...
I tried solutions such as:
pd.concat([df[:5]]*3, ignore_index=True)
as well as a second approach, but neither of them worked.
Try using np.repeat on df.values:
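The snippet can be sketched roughly as follows, going by the explanation further down (repeat df.values with np.repeat, then assign new_df.columns = df.columns). The constructor call that builds df from the question's sample data is an assumption added only to make the example runnable.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question (how df was built is an assumption).
df = pd.DataFrame({
    "Person": [12345, 32917, 18273],
    "ID": [882, 271, 552],
    "ZipCode": [38182, 88172, 90291],
    "Gender": ["Female", "Male", "Female"],
})

# Repeat every row of the underlying array 3 times, row by row (axis=0).
new_df = pd.DataFrame(np.repeat(df.values, 3, axis=0))
# The raw array carries no column labels, so copy them over from df.
new_df.columns = df.columns
print(new_df)
```

Because a fresh DataFrame is built from the array, the index comes out as 0 to 8 without an explicit reset.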
Doing so will output:
Person ID ZipCode Gender
0 12345 882 38182 Female
1 12345 882 38182 Female
2 12345 882 38182 Female
3 32917 271 88172 Male
4 32917 271 88172 Male
5 32917 271 88172 Male
6 18273 552 90291 Female
7 18273 552 90291 Female
8 18273 552 90291 Female
np.repeat repeats the values of df 3 times. We then add the columns by assigning new_df.columns = df.columns.
You can also assign the column names in the first line, like this:
newdf = pd.DataFrame(np.repeat(df.values, 3, axis=0), columns=df.columns)
print(newdf)
The above code will also output:
Person ID ZipCode Gender
0 12345 882 38182 Female
1 12345 882 38182 Female
2 12345 882 38182 Female
3 32917 271 88172 Male
4 32917 271 88172 Male
5 32917 271 88172 Male
6 18273 552 90291 Female
7 18273 552 90291 Female
8 18273 552 90291 Female
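One side effect of going through df.values is worth checking: with mixed column types the array has object dtype, so the numeric columns of the new frame are no longer integers. A minimal sketch of restoring the original dtypes, assuming the sample frame from the question (the astype step is an addition, not part of the answer above):

```python
import numpy as np
import pandas as pd

# Sample data copied from the question (how df was built is an assumption).
df = pd.DataFrame({
    "Person": [12345, 32917, 18273],
    "ID": [882, 271, 552],
    "ZipCode": [38182, 88172, 90291],
    "Gender": ["Female", "Male", "Female"],
})

raw = pd.DataFrame(np.repeat(df.values, 3, axis=0), columns=df.columns)
print(raw.dtypes)    # the numeric columns are now object dtype

# Cast each column back to the dtype it had in the original frame.
fixed = raw.astype(df.dtypes.to_dict())
print(fixed.dtypes)  # same dtypes as df
```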
You can do it like this:
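import numpy as np
import pandas as pd

def do_things(df, n_times):
    # Append n_times copies of each name to the original rows,
    # so every name ends up appearing n_times + 1 times.
    repeated = pd.DataFrame({'name': np.repeat(df.name.values, n_times)})
    ndf = pd.concat([df, repeated])  # DataFrame.append was removed in pandas 2.0
    ndf = ndf.sort_values(by='name')
    ndf = ndf.reset_index(drop=True)
    return ndf

if __name__ == '__main__':
    df = pd.DataFrame({'name': ['Peter', 'Quill', 'Jackson']})
    n_times = 3
    print(do_things(df, n_times))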
And with an explanation:
import pandas as pd
import numpy as np
n_times = 3
df = pd.DataFrame({'name' : ['Peter', 'Quill', 'Jackson']})
# name
# 0 Peter
# 1 Quill
# 2 Jackson
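# Duplicating data: the repeated names are appended to the original rows.
# (pd.concat is used because DataFrame.append was removed in pandas 2.0.)
df = pd.concat([df, pd.DataFrame({'name' : np.repeat(df.name.values, n_times)})])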
# name
# 0 Peter
# 1 Quill
# 2 Jackson
# 0 Peter
# 1 Peter
# 2 Peter
# 3 Quill
# 4 Quill
# 5 Quill
# 6 Jackson
# 7 Jackson
# 8 Jackson
# The DataFrame is sorted by 'name' column.
df = df.sort_values(by=['name'])
# name
# 2 Jackson
# 6 Jackson
# 7 Jackson
# 8 Jackson
# 0 Peter
# 0 Peter
# 1 Peter
# 2 Peter
# 1 Quill
# 3 Quill
# 4 Quill
# 5 Quill
# Resetting the index.
# Pass drop=True to `reset_index()` to discard the old index instead of keeping it as a column.
df = df.reset_index()
# index name
# 0 2 Jackson
# 1 6 Jackson
# 2 7 Jackson
# 3 8 Jackson
# 4 0 Peter
# 5 0 Peter
# 6 1 Peter
# 7 2 Peter
# 8 1 Quill
# 9 3 Quill
# 10 4 Quill
# 11 5 Quill
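The drop parameter mentioned in the comments above decides whether the old index survives as a column. A small sketch of the difference, using a made-up two-row frame with a non-default index (the data here is hypothetical, chosen only to make the old index visible):

```python
import pandas as pd

# Hypothetical frame with a non-default index.
df = pd.DataFrame({'name': ['Quill', 'Peter']}, index=[5, 3])

# drop=False (the default): the old index is kept as a column named 'index'.
kept = df.reset_index()
print(kept)

# drop=True: the old index is discarded and replaced by 0, 1, ...
dropped = df.reset_index(drop=True)
print(dropped)
```

For the question as asked, where only a clean 0, 1, 2, ... index is wanted, drop=True is the variant that avoids the extra column.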
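These will repeat the indices and preserve the columns, as the OP demonstrated.
iloc version 1:
df.iloc[np.arange(len(df)).repeat(3)]
iloc version 2:
df.iloc[np.arange(len(df) * 3) // 3]
Using concat: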
pd.concat([df]*3).sort_index()
Out[129]:
Person ID ZipCode Gender
0 12345 882 38182 Female
0 12345 882 38182 Female
0 12345 882 38182 Female
1 32917 271 88172 Male
1 32917 271 88172 Male
1 32917 271 88172 Male
2 18273 552 90291 Female
2 18273 552 90291 Female
2 18273 552 90291 Female
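The output above keeps the repeated index labels 0, 0, 0, 1, ... rather than the fresh 0 to 8 index the question asks for. A minimal sketch, assuming the sample frame from the question, of chaining reset_index(drop=True) onto both the concat variant and an iloc lookup with repeated positions (the variable names are illustrative):

```python
import numpy as np
import pandas as pd

# Sample data copied from the question (how df was built is an assumption).
df = pd.DataFrame({
    "Person": [12345, 32917, 18273],
    "ID": [882, 271, 552],
    "ZipCode": [38182, 88172, 90291],
    "Gender": ["Female", "Male", "Female"],
})

# Positions 0,0,0,1,1,1,2,2,2 select each row three times in a row.
via_iloc = df.iloc[np.arange(len(df)).repeat(3)].reset_index(drop=True)

# Stack three copies, group equal labels together, then renumber.
via_concat = pd.concat([df] * 3).sort_index().reset_index(drop=True)

print(via_iloc)
```

Both routes keep the original column dtypes, since neither goes through df.values.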
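I think the index is generated automatically. It cannot be changed unless you make it a field of the DataFrame. Anyway, it's an index. It has to be unique.
pd.concat([df[:5]]*3, ignore_index=True) works for me. Can you show your df.index? If there is a problem with your index, the solutions below may not work.
Sorry, I should clarify: pd.concat([df[:5]]*3, ignore_index=True) works, but it adds the rows to the end of the DataFrame instead of repeating each row 3 times one after another.
This works like a charm for DataFrames with MultiIndex values, which does not seem to be the case for the accepted solution. The latter cannot handle a MultiIndex.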
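To illustrate the MultiIndex remark, here is a small sketch of repeating rows by position with iloc on a frame that has a two-level index. The frame, its level names, and the repeat count of 2 are all made up for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a two-level index.
idx = pd.MultiIndex.from_tuples(
    [("a", 1), ("a", 2), ("b", 1)], names=["group", "item"]
)
df = pd.DataFrame({"value": [10, 20, 30]}, index=idx)

# Positional selection does not care what the index labels look like,
# so each row (together with its full index tuple) is repeated twice.
repeated = df.iloc[np.arange(len(df)).repeat(2)]
print(repeated)
```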