Python: sorting a DataFrame in which one column contains NaN values


I have a DataFrame:

+------------+------------+------------+------+
| Item Type  | Year_Month | Total Cost | Diff |
+------------+------------+------------+------+
| Baby Food  | Jul-2017   | 3000       | 100  |
+------------+------------+------------+------+
| Baby Food  | Jun-2017   | 2900       | 100  |
+------------+------------+------------+------+
| Cereal     | Jul-2017   | 6000       | 1000 |
+------------+------------+------------+------+
| Cereal     | Jun-2017   | 5000       | 1000 |
+------------+------------+------------+------+
| Snacks     | Jul-2017   | 4500       | Nan  |
+------------+------------+------------+------+
| Chocolates | Jul-2017   | 3000       | Nan  |
+------------+------------+------------+------+
| Ice Cream  | Jul-2017   | 4000       | Nan  |
+------------+------------+------------+------+
I want to sort the DataFrame by Diff, but rows where Diff is NaN should be sorted by Total Cost instead. My final output should look like this:

+------------+------------+------------+------+
|  Item Type | Year_Month | Total Cost | Diff |
+------------+------------+------------+------+
| Cereal     | Jul-2017   | 6000       | 1000 |
+------------+------------+------------+------+
| Cereal     | Jun-2017   | 5000       | 1000 |
+------------+------------+------------+------+
| Baby Food  | Jul-2017   | 3000       | 100  |
+------------+------------+------------+------+
| Baby Food  | Jun-2017   | 2900       | 100  |
+------------+------------+------------+------+
| Snacks     | Jul-2017   | 4500       | Nan  |
+------------+------------+------------+------+
| Ice Cream  | Jul-2017   | 4000       | Nan  |
+------------+------------+------------+------+
| Chocolates | Jul-2017   | 3000       | Nan  |
+------------+------------+------------+------+
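One way is to split the DataFrame into two (one holding all rows where Diff is not NaN, the other holding the rows where Diff is NaN), sort the first by Diff and the second by Total Cost, and then concatenate them: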

+-----------+------------+------------+------+
| Item Type | Year_Month | Total Cost | Diff |
+-----------+------------+------------+------+
| Baby Food | Jul-2017   | 3000       | 100  |
+-----------+------------+------------+------+
| Baby Food | Jun-2017   | 2900       | 100  |
+-----------+------------+------------+------+
| Cereal    | Jul-2017   | 6000       | 1000 |
+-----------+------------+------------+------+
| Cereal    | Jun-2017   | 5000       | 1000 |
+-----------+------------+------------+------+


+------------+------------+------------+------+
| Item Type  | Year_Month | Total Cost | Diff |
+------------+------------+------------+------+
| Snacks     | Jul-2017   | 4500       | Nan  |
+------------+------------+------------+------+
| Ice Cream  | Jul-2017   | 4000       | Nan  |
+------------+------------+------------+------+
| Chocolates | Jul-2017   | 3000       | Nan  |
+------------+------------+------------+------+
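
The split-and-concatenate approach described above can be sketched as follows. This is only an illustration under assumptions: the frame is rebuilt by hand from the table in the question, Diff is taken to hold real NaN values rather than the text "Nan", and the variable names (has_diff, with_diff, without_diff) are made up for the example.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question; np.nan stands in for the "Nan" cells.
df = pd.DataFrame(
    {
        "Item Type": ["Baby Food", "Baby Food", "Cereal", "Cereal",
                      "Snacks", "Chocolates", "Ice Cream"],
        "Year_Month": ["Jul-2017", "Jun-2017", "Jul-2017", "Jun-2017",
                       "Jul-2017", "Jul-2017", "Jul-2017"],
        "Total Cost": [3000, 2900, 6000, 5000, 4500, 3000, 4000],
        "Diff": [100, 100, 1000, 1000, np.nan, np.nan, np.nan],
    }
)

# Boolean mask: True where Diff has a value, False where it is NaN.
has_diff = df["Diff"].notna()

# Rows with a Diff: largest Diff first, ties broken by Total Cost.
with_diff = df[has_diff].sort_values(by=["Diff", "Total Cost"], ascending=False)

# Rows without a Diff: largest Total Cost first.
without_diff = df[~has_diff].sort_values(by="Total Cost", ascending=False)

# Stack the two sorted pieces back into one frame.
result = pd.concat([with_diff, without_diff], ignore_index=True)
print(result)
```

This needs two masks, two sorts and one concatenation, which is the extra work the question is asking to avoid.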
Is there a more efficient way to achieve this, since this approach involves a lot of computation?

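When a DataFrame (df) is sorted by a column (here 'Diff'), NaN values are placed at the end by default (na_position='last'), whether the sort is ascending or descending. So sorting the DataFrame by two columns ('Diff' and 'Total Cost') gives the desired result: rows with a Diff come first, ordered by Diff, and the NaN rows follow, ordered among themselves by Total Cost.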

Here is the code:

    df=df.sort_values(by=['Diff','Total Cost'],ascending=False)
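
A fuller, runnable sketch of the one-liner above may help. It assumes the Diff column might arrive as text (with the literal "Nan" for missing entries), so it is first converted with pd.to_numeric; that conversion step and the sample frame are assumptions added for illustration, not part of the original answer.

```python
import pandas as pd

# Sample data rebuilt from the question; Diff is deliberately left as text
# to show the conversion step (an assumption about how the data was loaded).
df = pd.DataFrame(
    {
        "Item Type": ["Baby Food", "Baby Food", "Cereal", "Cereal",
                      "Snacks", "Chocolates", "Ice Cream"],
        "Year_Month": ["Jul-2017", "Jun-2017", "Jul-2017", "Jun-2017",
                       "Jul-2017", "Jul-2017", "Jul-2017"],
        "Total Cost": [3000, 2900, 6000, 5000, 4500, 3000, 4000],
        "Diff": ["100", "100", "1000", "1000", "Nan", "Nan", "Nan"],
    }
)

# Turn Diff into a numeric column; anything unparseable becomes a real NaN.
df["Diff"] = pd.to_numeric(df["Diff"], errors="coerce")

# One sort over both columns. na_position="last" is the default and is
# spelled out only to make the NaN handling explicit.
result = df.sort_values(
    by=["Diff", "Total Cost"], ascending=False, na_position="last"
).reset_index(drop=True)
print(result)
```

If Diff stays as an object column of strings, the sort compares text rather than numbers, so the conversion is worth checking before relying on the result.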

You can simply use the list sort method with a key function, as follows:

Input:

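import json

# Sample records; "Diff" is stored as a string, with "Nan" marking a missing value.
jsonv = [
 {
   "Item Type": "Snacks",
   "Year_Month": "Jul-2017",
   "Total Cost": 4500,
   "Diff": "5"
 },
 {
   "Item Type": "Ice Cream",
   "Year_Month": "Jul-2017",
   "Total Cost": 4000,
   "Diff": "Nan"
 },
 {
   "Item Type": "Chocolates",
   "Year_Month": "Jul-2017",
   "Total Cost": 3000,
   "Diff": "4"
 }
]

def extract_diff(record):
    """Return Diff as an int, or 0 when it is "Nan" or the key is absent."""
    # The parameter is named record so that it does not shadow the json module.
    try:
        jdiff = record['Diff']
        return int(jdiff) if jdiff != 'Nan' else 0
    except KeyError:
        return 0

# Sort in place, largest Diff first. Records without a usable Diff get key 0,
# so they land last as long as every real Diff is positive.
jsonv.sort(key=extract_diff, reverse=True)

print(json.dumps(jsonv, indent=4))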

Output:

[
    {
        "Item Type": "Snacks",
        "Year_Month": "Jul-2017",
        "Total Cost": 4500,
        "Diff": "5"
    },
    {
        "Item Type": "Chocolates",
        "Year_Month": "Jul-2017",
        "Total Cost": 3000,
        "Diff": "4"
    },
    {
        "Item Type": "Ice Cream",
        "Year_Month": "Jul-2017",
        "Total Cost": 4000,
        "Diff": "Nan"
    }
]
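
The key function above orders by Diff only, so records with a missing Diff keep their input order relative to each other. A possible variant that also falls back to Total Cost, as the question asks, is sketched below. The helper names parse_diff and sort_key and the sample records are hypothetical; the sketch relies on float() accepting the text "Nan" and returning a NaN.

```python
import math

def parse_diff(record):
    """Return Diff as a float, or NaN when it is missing or unparseable."""
    try:
        return float(record.get("Diff", "nan"))
    except (TypeError, ValueError):
        return math.nan

def sort_key(record):
    """Build a tuple key: records with a Diff first, then by size."""
    diff = parse_diff(record)
    cost = record["Total Cost"]
    if math.isnan(diff):
        # Group 1: no Diff, so order by Total Cost, largest first.
        return (1, 0.0, -cost)
    # Group 0: order by Diff, largest first, ties broken by Total Cost.
    return (0, -diff, -cost)

# Hypothetical sample records in the same shape as the input above.
records = [
    {"Item Type": "Snacks", "Total Cost": 4500, "Diff": "Nan"},
    {"Item Type": "Cereal", "Total Cost": 5000, "Diff": "1000"},
    {"Item Type": "Chocolates", "Total Cost": 3000, "Diff": "Nan"},
    {"Item Type": "Baby Food", "Total Cost": 3000, "Diff": "100"},
    {"Item Type": "Ice Cream", "Total Cost": 4000, "Diff": "Nan"},
]

ordered = sorted(records, key=sort_key)
names = [r["Item Type"] for r in ordered]
print(names)
```

Negating the numeric fields gives a descending order without reverse=True, which would otherwise also flip the group flag and move the NaN records to the front.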