Python: Split a DataFrame into multiple DataFrames with the same column headers based on a value
I have a DataFrame that looks like this:
+------+------+---+---+---+
| S.No | A | B | C | D |
+------+------+---+---+---+
| 1 | 0.25 | 2 | 1 | 5 |
+------+------+---+---+---+
| 2 | 1.1 | 4 | 2 | 5 |
+------+------+---+---+---+
| 3 | 1.5 | 6 | 3 | 5 |
+------+------+---+---+---+
| 4 | 0.32 | 3 | 4 | 5 |
+------+------+---+---+---+
| 5 | 1.45 | 5 | 5 | 5 |
+------+------+---+---+---+
| 6 | 1.9 | 7 | 6 | 5 |
+------+------+---+---+---+
| 7 | 0.5 | 3 | 4 | 5 |
+------+------+---+---+---+
| 8 | 1.49 | 5 | 5 | 5 |
+------+------+---+---+---+
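For anyone who wants to reproduce this, the table above can be built as a DataFrame roughly like so. This is only a sketch: the name `df` and the default integer index are assumptions, and the values are copied from the table.

```python
import pandas as pd

# Values copied from the table in the question; `df` is an assumed name.
df = pd.DataFrame({
    "S.No": [1, 2, 3, 4, 5, 6, 7, 8],
    "A": [0.25, 1.1, 1.5, 0.32, 1.45, 1.9, 0.5, 1.49],
    "B": [2, 4, 6, 3, 5, 7, 3, 5],
    "C": [1, 2, 3, 4, 5, 6, 4, 5],
    "D": [5, 5, 5, 5, 5, 5, 5, 5],
})
print(df)
```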
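I want to split it into 3 DataFrames that all keep the same column headers. The split is based on the values in column A: the first DataFrame should start at 0.25 and end at 1.5, the second should start at 0.32 and end at 1.9, and the third should start at 0.5 and end at 1.49. In other words, a new split should begin whenever the value in column A is between 0 and 1, and every piece should keep the same column headers. The expected output is shown below. I'm new to this and don't know how to do it properly, so any help would be greatly appreciated.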
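DataFrame 1: rows S.No 1 to 3 (A from 0.25 to 1.5)
DataFrame 2: rows S.No 4 to 6 (A from 0.32 to 1.9)
DataFrame 3: rows S.No 7 to 8 (A from 0.5 to 1.49)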
Let's work through a simple approach.
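First, identify the indices where the value in A is between 0 and 1. This is done by combining `between` with `index`. Once you have those indices, you can split the DataFrame with the `iloc` method.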
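To see what that first step produces, the sketch below rebuilds only column A from the question and lists the rows where a new block begins. The names `mask`, `starts` and `first_block` are chosen for illustration, and a default 0..n-1 index is assumed so that index labels and positions coincide.

```python
import pandas as pd

# Column A from the question; default RangeIndex assumed, so labels == positions.
df = pd.DataFrame({"A": [0.25, 1.1, 1.5, 0.32, 1.45, 1.9, 0.5, 1.49]})

# True wherever A lies in [0, 1]; between() includes both endpoints by default.
mask = df["A"].between(0, 1)

# Index labels of the rows that start a new block.
starts = df.index[mask].tolist()
print(starts)  # [0, 3, 6]

# With a default index, iloc can slice from one start up to the next.
first_block = df.iloc[starts[0]:starts[1]]
print(first_block)
```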
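Based on the explanation you gave, you are including an in-between condition, for example: the first DataFrame should start at 0.25 and end at 1.5. That implies a value such as 0.32 should also be included in that DataFrame. Using that logic, you can do the following: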
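# Start and end values of A for each of the three DataFrames
l = [.25, 1.5, .32, 1.9, .5, 1.49]
# Pair them up: [(0.25, 1.5), (0.32, 1.9), (0.5, 1.49)]
r = [(a, b) for a, b in zip(l[::2], l[1::2])]
for i in r:
    # between() includes both endpoints by default
    print(df[df['A'].between(*i)].sort_values('A'))
    print("----------------------------------")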
Are you splitting based on a condition on the values in A, or just by selecting row indices?
Great, this does what I expected. Thanks @Roshan Santhosh
Expected DataFrame 2 (from the question):
+------+------+---+---+---+
| S.No | A | B | C | D |
+------+------+---+---+---+
| 4 | 0.32 | 3 | 4 | 5 |
+------+------+---+---+---+
| 5 | 1.45 | 5 | 5 | 5 |
+------+------+---+---+---+
| 6 | 1.9 | 7 | 6 | 5 |
+------+------+---+---+---+
Expected DataFrame 3 (from the question):
+------+------+---+---+---+
| S.No | A | B | C | D |
+------+------+---+---+---+
| 7 | 0.5 | 3 | 4 | 5 |
+------+------+---+---+---+
| 8 | 1.49 | 5 | 5 | 5 |
+------+------+---+---+---+
# Number each block by a running count of rows where A is in [0, 1], then group on that number
d = {x: y for x, y in df.groupby(df.A.between(0, 1).cumsum())}
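The one-liner above works because every value of A in [0, 1] marks the start of a new block, so the cumulative sum of that boolean mask gives each block its own integer label. Here is a minimal sketch with the question's data, rebuilt so it runs on its own; `group_id` and `parts` are names chosen for illustration.

```python
import pandas as pd

# Data copied from the question.
df = pd.DataFrame({
    "S.No": [1, 2, 3, 4, 5, 6, 7, 8],
    "A": [0.25, 1.1, 1.5, 0.32, 1.45, 1.9, 0.5, 1.49],
    "B": [2, 4, 6, 3, 5, 7, 3, 5],
    "C": [1, 2, 3, 4, 5, 6, 4, 5],
    "D": [5, 5, 5, 5, 5, 5, 5, 5],
})

# The mask is True at rows 0, 3 and 6; cumsum turns it into block labels.
group_id = df["A"].between(0, 1).cumsum()
print(group_id.tolist())  # [1, 1, 1, 2, 2, 2, 3, 3]

# One DataFrame per label, each keeping all of the original columns.
parts = {key: part for key, part in df.groupby(group_id)}
for key, part in parts.items():
    print(f"DataFrame {key}:")
    print(part)
```

Note that if the first row's A were not in [0, 1], the rows before the first start would end up in a block labelled 0.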
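# Identify the rows where A is between 0 and 1; each one starts a new block
# (assumes the default 0..n-1 index, so index labels equal iloc positions)
splitIndices = df.index[df.A.between(0, 1)].tolist()
# Add the end of the DataFrame so the last block is not dropped
splitIndices.append(len(df))
dfList = []
for i in range(len(splitIndices) - 1):
    startIndex = splitIndices[i]
    endIndex = splitIndices[i + 1]
    tempDf = df.iloc[startIndex:endIndex]
    # Append a copy of the DataFrame subset to the output list
    dfList.append(tempDf.copy())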
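If the DataFrame does not have a default 0..n-1 index, the labels returned by `df.index[...]` no longer match the positions that `iloc` expects. One possible variant, sketched here assuming NumPy is available, takes the positions straight from the mask. The function name `split_at_starts` and its parameters are hypothetical, not part of pandas.

```python
import numpy as np
import pandas as pd


def split_at_starts(df, column, low=0, high=1):
    """Split df into consecutive blocks, starting a new block wherever
    df[column] lies in [low, high]. Hypothetical helper for illustration."""
    mask = df[column].between(low, high).to_numpy()
    # Positional offsets of the start rows, independent of the index labels.
    starts = np.flatnonzero(mask).tolist()
    # Close the last block at the end of the frame.
    bounds = starts + [len(df)]
    return [df.iloc[a:b].copy() for a, b in zip(bounds[:-1], bounds[1:])]


df = pd.DataFrame(
    {
        "S.No": [1, 2, 3, 4, 5, 6, 7, 8],
        "A": [0.25, 1.1, 1.5, 0.32, 1.45, 1.9, 0.5, 1.49],
    },
    index=list("abcdefgh"),  # non-default index on purpose
)

parts = split_at_starts(df, "A")
for part in parts:
    print(part, end="\n\n")
```

Any rows that come before the first start row are left out of the result in this sketch, which does not matter for the data in the question because its first row already has A in [0, 1].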
Output of the `between` loop over `r`:
S.No A B C D
0 1.0 0.25 2.0 1.0 5.0
3 4.0 0.32 3.0 4.0 5.0
6 7.0 0.50 3.0 4.0 5.0
1 2.0 1.10 4.0 2.0 5.0
4 5.0 1.45 5.0 5.0 5.0
7 8.0 1.49 5.0 5.0 5.0
2 3.0 1.50 6.0 3.0 5.0
----------------------------------
S.No A B C D
3 4.0 0.32 3.0 4.0 5.0
6 7.0 0.50 3.0 4.0 5.0
1 2.0 1.10 4.0 2.0 5.0
4 5.0 1.45 5.0 5.0 5.0
7 8.0 1.49 5.0 5.0 5.0
2 3.0 1.50 6.0 3.0 5.0
5 6.0 1.90 7.0 6.0 5.0
----------------------------------
S.No A B C D
6 7.0 0.50 3.0 4.0 5.0
1 2.0 1.10 4.0 2.0 5.0
4 5.0 1.45 5.0 5.0 5.0
7 8.0 1.49 5.0 5.0 5.0