Python: sort a dataframe where one column contains NaN values
I have a dataframe:
+------------+------------+------------+------+
| Item Type | Year_Month | Total Cost | Diff |
+------------+------------+------------+------+
| Baby Food | Jul-2017 | 3000 | 100 |
+------------+------------+------------+------+
| Baby Food | Jun-2017 | 2900 | 100 |
+------------+------------+------------+------+
| Cereal | Jul-2017 | 6000 | 1000 |
+------------+------------+------------+------+
| Cereal | Jun-2017 | 5000 | 1000 |
+------------+------------+------------+------+
| Snacks | Jul-2017 | 4500 | Nan |
+------------+------------+------------+------+
| Chocolates | Jul-2017 | 3000 | Nan |
+------------+------------+------------+------+
| Ice Cream | Jul-2017 | 4000 | Nan |
+------------+------------+------------+------+
I want to sort the dataframe by Diff, but rows where Diff is NaN should instead be sorted by Total Cost. So my final output should look like this:
+------------+------------+------------+------+
| Item Type | Year_Month | Total Cost | Diff |
+------------+------------+------------+------+
| Cereal | Jul-2017 | 6000 | 1000 |
+------------+------------+------------+------+
| Cereal | Jun-2017 | 5000 | 1000 |
+------------+------------+------------+------+
| Baby Food | Jul-2017 | 3000 | 100 |
+------------+------------+------------+------+
| Baby Food | Jun-2017 | 2900 | 100 |
+------------+------------+------------+------+
| Snacks | Jul-2017 | 4500 | Nan |
+------------+------------+------------+------+
| Ice Cream | Jul-2017 | 4000 | Nan |
+------------+------------+------------+------+
| Chocolates | Jul-2017 | 3000 | Nan |
+------------+------------+------------+------+
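One approach is to split the dataframe into two dataframes: one with all rows where Diff is not NaN, and one with the rows where Diff is NaN. Then sort each dataframe (by Diff and by Total Cost, respectively) and merge them back together: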
+-----------+------------+------------+------+
| Item Type | Year_Month | Total Cost | Diff |
+-----------+------------+------------+------+
| Baby Food | Jul-2017 | 3000 | 100 |
+-----------+------------+------------+------+
| Baby Food | Jun-2017 | 2900 | 100 |
+-----------+------------+------------+------+
| Cereal | Jul-2017 | 6000 | 1000 |
+-----------+------------+------------+------+
| Cereal | Jun-2017 | 5000 | 1000 |
+-----------+------------+------------+------+
+------------+------------+------------+------+
| Item Type | Year_Month | Total Cost | Diff |
+------------+------------+------------+------+
| Snacks | Jul-2017 | 4500 | Nan |
+------------+------------+------------+------+
| Ice Cream | Jul-2017 | 4000 | Nan |
+------------+------------+------------+------+
| Chocolates | Jul-2017 | 3000 | Nan |
+------------+------------+------------+------+
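For reference, the split-sort-merge approach described above might look like the following minimal sketch. It assumes the missing values in Diff are real NaN (np.nan) rather than the text "Nan", and it uses pd.concat for the "merge" step, since the two pieces are simply stacked. The Total Cost tie-break in the first sort is an addition so that rows with equal Diff come out in a predictable order.

```python
import numpy as np
import pandas as pd

# Sample data from the question; missing Diff values are assumed to be np.nan
df = pd.DataFrame({
    "Item Type": ["Baby Food", "Baby Food", "Cereal", "Cereal",
                  "Snacks", "Chocolates", "Ice Cream"],
    "Year_Month": ["Jul-2017", "Jun-2017", "Jul-2017", "Jun-2017",
                   "Jul-2017", "Jul-2017", "Jul-2017"],
    "Total Cost": [3000, 2900, 6000, 5000, 4500, 3000, 4000],
    "Diff": [100, 100, 1000, 1000, np.nan, np.nan, np.nan],
})

# Split on whether Diff is missing
mask = df["Diff"].isna()

# Rows with a Diff: largest Diff first, Total Cost breaks ties
with_diff = df[~mask].sort_values(by=["Diff", "Total Cost"], ascending=False)

# Rows without a Diff: largest Total Cost first
without_diff = df[mask].sort_values(by="Total Cost", ascending=False)

# Stack the two sorted pieces; ignore_index=True renumbers the rows 0..n-1
result = pd.concat([with_diff, without_diff], ignore_index=True)
print(result)
```

Drop ignore_index=True if you want to keep the original row labels.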
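Is there a more efficient way to do this, since splitting and merging would involve a lot of computation?
When you sort a dataframe (df) by a column (here "Diff"), the NaN values end up at the bottom (pandas places them last by default, na_position='last'). So by sorting the dataframe on two columns ("Diff" and "Total Cost"), we can get the desired result.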
Here is the code:
df = df.sort_values(by=['Diff', 'Total Cost'], ascending=False)
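A self-contained version of that one-liner on the question's data is sketched below. It only works if Diff holds real NaN values. As an assumption, the data here is loaded with the literal string "Nan" (as the tables show), so it is converted first with pd.to_numeric(..., errors="coerce"). na_position="last" is the default and is written out only for clarity. Because all NaN rows tie on Diff, the second sort column (Total Cost) decides their order.

```python
import pandas as pd

# Question's data, assuming Diff was read in as text containing "Nan"
df = pd.DataFrame({
    "Item Type": ["Baby Food", "Baby Food", "Cereal", "Cereal",
                  "Snacks", "Chocolates", "Ice Cream"],
    "Year_Month": ["Jul-2017", "Jun-2017", "Jul-2017", "Jun-2017",
                   "Jul-2017", "Jul-2017", "Jul-2017"],
    "Total Cost": [3000, 2900, 6000, 5000, 4500, 3000, 4000],
    "Diff": ["100", "100", "1000", "1000", "Nan", "Nan", "Nan"],
})

# Convert to numbers; anything unparseable (such as "Nan") becomes a real NaN
df["Diff"] = pd.to_numeric(df["Diff"], errors="coerce")

# Descending on both columns; NaN Diff rows go last and are ordered by Total Cost
df = df.sort_values(by=["Diff", "Total Cost"], ascending=False, na_position="last")
print(df)
```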
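Alternatively, if the data is a list of dicts (for example, parsed from JSON), you can simply use list.sort with a key function, as follows. Input: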
import json

jsonv = [
    {
        "Item Type": "Snacks",
        "Year_Month": "Jul-2017",
        "Total Cost": 4500,
        "Diff": "5"
    },
    {
        "Item Type": "Ice Cream",
        "Year_Month": "Jul-2017",
        "Total Cost": 4000,
        "Diff": "Nan"
    },
    {
        "Item Type": "Chocolates",
        "Year_Month": "Jul-2017",
        "Total Cost": 3000,
        "Diff": "4"
    }
]

def extract_diff(row):
    # Sort key: Diff as an int; "Nan" or a missing Diff key counts as 0.
    # (The parameter is named row so it does not shadow the json module.)
    try:
        jdiff = row['Diff']
        return int(jdiff) if jdiff != 'Nan' else 0
    except KeyError:
        return 0

# Largest Diff first
jsonv.sort(key=extract_diff, reverse=True)
print(json.dumps(jsonv, indent=4))
Out:
[
    {
        "Item Type": "Snacks",
        "Year_Month": "Jul-2017",
        "Total Cost": 4500,
        "Diff": "5"
    },
    {
        "Item Type": "Chocolates",
        "Year_Month": "Jul-2017",
        "Total Cost": 3000,
        "Diff": "4"
    },
    {
        "Item Type": "Ice Cream",
        "Year_Month": "Jul-2017",
        "Total Cost": 4000,
        "Diff": "Nan"
    }
]
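Mapping "Nan" to 0 works for the three rows above, but it does not order the "Nan" rows by Total Cost as the question asks, and a negative Diff would sort below them. One possible variation is a tuple key, sketched below on the question's seven rows. The helper name sort_key is hypothetical, and treating a missing Diff key like "Nan" is an assumption. With reverse=True, rows that have a Diff come first (True sorts above False), ordered by Diff and then Total Cost, and the "Nan" rows follow, ordered by Total Cost.

```python
import json

# Question's data as a list of dicts; Diff is stored as text, "Nan" marks a missing value
rows = [
    {"Item Type": "Baby Food", "Year_Month": "Jul-2017", "Total Cost": 3000, "Diff": "100"},
    {"Item Type": "Baby Food", "Year_Month": "Jun-2017", "Total Cost": 2900, "Diff": "100"},
    {"Item Type": "Cereal", "Year_Month": "Jul-2017", "Total Cost": 6000, "Diff": "1000"},
    {"Item Type": "Cereal", "Year_Month": "Jun-2017", "Total Cost": 5000, "Diff": "1000"},
    {"Item Type": "Snacks", "Year_Month": "Jul-2017", "Total Cost": 4500, "Diff": "Nan"},
    {"Item Type": "Chocolates", "Year_Month": "Jul-2017", "Total Cost": 3000, "Diff": "Nan"},
    {"Item Type": "Ice Cream", "Year_Month": "Jul-2017", "Total Cost": 4000, "Diff": "Nan"},
]

def sort_key(row):
    # Hypothetical helper returning (has_diff, diff, total_cost);
    # a missing Diff key is treated the same as "Nan" (assumption)
    diff = row.get("Diff", "Nan")
    has_diff = diff != "Nan"
    return (has_diff, int(diff) if has_diff else 0, row["Total Cost"])

# Descending: rows with a Diff first, then the "Nan" rows by Total Cost
rows.sort(key=sort_key, reverse=True)
print(json.dumps(rows, indent=4))
```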