Python: Get the non-null value of each field, partitioned by ID

I have a DataFrame like this:

  id  city      province    status    date
----  --------  ----------  --------  ----------
   1  Cainta    Rizal       failed    22/07/2020
   1  nan       nan         success   22/07/2020
   1  nan       nan         success   22/07/2020
   2  Pasig     Manila      success   22/07/2020
   2  nan       nan         failed    22/07/2020
   2  nan       nan         failed    22/07/2020
   3  Marikina  Manila      failed    22/07/2020
   3  nan       nan         success   22/07/2020
   3  nan       nan         success   22/07/2020
What I want is to transform the DataFrame above into the following one:

  id  city      province    status    date
----  --------  ----------  --------  ----------
   1  Cainta    Rizal       success   22/07/2020
   2  Pasig     Manila      success   22/07/2020
   3  Marikina  Manila      success   22/07/2020
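So the criterion is: for each id with status 'success', get the non-null values of city and province. I can achieve this in SQL with the following code, and I would like to replicate it in pandas: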

SELECT ID,
       MAX(CITY) AS CITY,
       MAX(PROVINCE) AS PROVINCE,
       'SUCCESS' AS STATUS,
       MAX(CASE WHEN STATUS = 'SUCCESS' THEN DATE END) AS "DATE"
FROM TABLE
GROUP BY ID
I hope my example is clear. Thank you very much.


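Edit: I will be running this on a DataFrame with one million rows.

Forward-fill the missing values within each id, then filter on the status column, and finally keep the first row for each id:

cols = ['city', 'province']
# Forward-fill city and province within each id group
df[cols] = df.groupby('id')[cols].ffill()
# Keep only the success rows, then the first row per id
df = df.query('status == "success"').drop_duplicates('id')
print(df)
   id      city province   status        date
1   1    Cainta    Rizal  success  22/07/2020
3   2     Pasig   Manila  success  22/07/2020
7   3  Marikina   Manila  success  22/07/2020

I'm not sure this SQL query does that for each Id whose status is 'success'.

This solution works well when the missing values are explicitly NaN, but if they are blank or whitespace-only strings, it fills the whole field with those instead. My workaround was to replace the empty and whitespace-only strings first with the replace() method. The actual code is df = df.replace(r'^\s*$', np.nan, regex=True). Thanks, by the way!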
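
For anyone who wants to run this end to end, here is a minimal self-contained sketch that combines the grouped forward fill with the whitespace workaround from the comment. The sample frame is rebuilt by hand from the question's table. Using empty and whitespace-only strings in place of the missing values is an assumption made here to exercise that workaround, and the name result is only illustrative.

```python
import numpy as np
import pandas as pd

# Sample data modelled on the question's table. Empty and whitespace-only
# strings stand in for the missing values (an assumption for illustration).
df = pd.DataFrame({
    "id": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "city": ["Cainta", "", " ", "Pasig", "", " ", "Marikina", "", " "],
    "province": ["Rizal", "", " ", "Manila", "", " ", "Manila", "", " "],
    "status": ["failed", "success", "success",
               "success", "failed", "failed",
               "failed", "success", "success"],
    "date": ["22/07/2020"] * 9,
})

cols = ["city", "province"]

# Turn empty / whitespace-only strings into real NaN so ffill treats them as missing.
df[cols] = df[cols].replace(r"^\s*$", np.nan, regex=True)

# Forward-fill within each id group.
df[cols] = df.groupby("id")[cols].ffill()

# Keep the success rows, then the first remaining row per id.
result = df[df["status"] == "success"].drop_duplicates("id")
print(result)
```

Note that a forward fill only carries values downward, so this relies on the non-null city and province appearing before the first success row of each id, as they do in the sample data.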