Merging DataFrames in Python
I have two pandas DataFrames. The first contains 3401 rows and 1 column; the second contains 4 rows and 3 columns. But what I get is (sample output of my script):
What I would like is, for each mail, a row like this:
- mail1, Id1, Project1, Descr1, Id2, Project2, ... , Id4, Project4, Descr4
- mail2, Id1, Project1, Descr1, Id2, Project2, ... , Id4, Project4, Descr4
... ...
- mail3401, Id1, Project1, Descr1, Id2, Project2, ... , Id4, Project4, Descr4
Thanks for any suggestions.
Here is my code:
import glob
import os

import pandas as pd

path = r"/Users/kd/path"
allFiles = glob.glob(path + "/*.csv")
file_names = []
dfInternautes = {}  # one DataFrame per trailing digit of the file name
for file_ in allFiles:
    name = os.path.splitext(file_)[0]
    i = int(name[-1])  # assumes every file name ends in a digit
    file_names.append(name)
    df = pd.read_csv(file_, index_col=None, header=0)
    if i in dfInternautes:
        # put this file's columns to the right of the frame built so far
        dfInternautes[i] = pd.concat([dfInternautes[i], df], axis=1)
    else:
        dfInternautes[i] = df
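One likely reason a concat-based approach does not give the desired layout: pd.concat(..., axis=1) places frames side by side by aligning on the row index, so each e-mail is paired with at most one project row instead of all four. A small sketch with made-up data (the column names email, Id, Project and Descr are assumptions, not taken from the question's files) shows the effect:

```python
import pandas as pd

# Hypothetical stand-ins for the two frames (6 e-mails instead of 3401)
emails = pd.DataFrame({"email": [f"email{k}" for k in range(1, 7)]})
projects = pd.DataFrame(
    {
        "Id": [f"Id{k}" for k in range(1, 5)],
        "Project": [f"Project{k}" for k in range(1, 5)],
        "Descr": [f"Descr{k}" for k in range(1, 5)],
    }
)

# concat(axis=1) aligns rows by index label: rows 0-3 each get one project,
# rows 4-5 have no matching label in `projects` and are filled with NaN
side_by_side = pd.concat([emails, projects], axis=1)
print(side_by_side)
```

The result still has only 4 columns, with one project per row and missing values below row 3, rather than all 12 project values repeated on every row.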
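To turn the second DataFrame into a single row of values, use stack(). Then create the new columns in (a copy of) the first DataFrame one at a time: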
>>> df1
0
0 email1
1 email2
2 email3
3 email4
4 email5
5 email6
>>> df2
0 1 2
0 Id1 Project1 Descr1
1 Id2 Project2 Descr2
2 Id3 Project3 Descr3
3 Id4 Project4 Descr4
>>> st = df2.stack()
>>> st
0  0         Id1
   1    Project1
   2      Descr1
1  0         Id2
   1    Project2
   2      Descr2
2  0         Id3
   1    Project3
   2      Descr3
3  0         Id4
   1    Project4
   2      Descr4
dtype: object
>>> df = df1.copy()
>>> for i in st.index: df[i] = st[i]
...
>>> df
0 (0, 0) (0, 1) (0, 2) (1, 0) (1, 1) (1, 2) (2, 0) (2, 1) \
0 email1 Id1 Project1 Descr1 Id2 Project2 Descr2 Id3 Project3
1 email2 Id1 Project1 Descr1 Id2 Project2 Descr2 Id3 Project3
2 email3 Id1 Project1 Descr1 Id2 Project2 Descr2 Id3 Project3
3 email4 Id1 Project1 Descr1 Id2 Project2 Descr2 Id3 Project3
4 email5 Id1 Project1 Descr1 Id2 Project2 Descr2 Id3 Project3
5 email6 Id1 Project1 Descr1 Id2 Project2 Descr2 Id3 Project3
(2, 2) (3, 0) (3, 1) (3, 2)
0 Descr3 Id4 Project4 Descr4
1 Descr3 Id4 Project4 Descr4
2 Descr3 Id4 Project4 Descr4
3 Descr3 Id4 Project4 Descr4
4 Descr3 Id4 Project4 Descr4
5 Descr3 Id4 Project4 Descr4
Optionally, rename the columns:
df.columns = ['email', 'Id1', 'Project1', 'Descr1', 'Id2', 'Project2', 'Descr2', 'Id3', 'Project3', 'Descr3', 'Id4', 'Project4', 'Descr4']
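Hard-coding the list works for exactly four projects. If df2 can have a different number of rows, the same names can be generated in the row-major order that stack() uses. A minimal sketch; the fields list is an assumed labelling of df2's three columns, and the df2 built here is hypothetical:

```python
import pandas as pd

# Hypothetical df2 with 4 rows; in practice use the real frame
df2 = pd.DataFrame([[f"Id{k}", f"Project{k}", f"Descr{k}"] for k in range(1, 5)])

fields = ["Id", "Project", "Descr"]  # labels for df2's three columns (assumed)
# One name per stacked value, in the same row-major order as df2.stack()
columns = ["email"] + [
    f"{field}{row + 1}" for row in range(len(df2)) for field in fields
]
print(columns)
```

The list has 1 + 3 * len(df2) entries, so it can be assigned with df.columns = columns to the frame built by the loop.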
So, apart from the first column (mail1, mail2, ...), you want all rows to be identical (Id1, Project1, Descr1, Id2, Project2, ..., Id4, Project4, Descr4)?
@IanS Yes, that's exactly what I want!
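Since every row should be identical apart from the e-mail, the column-by-column loop can also be replaced by a single cross join that pairs each e-mail with one flattened row built from df2. A sketch with made-up data, assuming pandas 1.2 or newer (required for how="cross") and descriptive column names (email, Id1, Project1, ...) that are not in the original frames:

```python
import pandas as pd

# Hypothetical inputs shaped like the question's frames (6 e-mails, 4 projects)
df1 = pd.DataFrame({"email": [f"email{k}" for k in range(1, 7)]})
df2 = pd.DataFrame(
    [[f"Id{k}", f"Project{k}", f"Descr{k}"] for k in range(1, 5)],
    columns=["Id", "Project", "Descr"],
)

# Flatten df2 row by row (the same order as df2.stack()) into a single row
flat = df2.to_numpy().ravel()
names = [f"{field}{row + 1}" for row in range(len(df2)) for field in df2.columns]
one_row = pd.DataFrame([flat], columns=names)

# A cross join pairs every row of df1 with that single row
# (how="cross" needs pandas 1.2 or newer)
df = df1.merge(one_row, how="cross")
print(df)
```

Because one_row has exactly one row, the cross join yields one output row per e-mail, with the 12 project values repeated on each.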