Python: pandas filter and create new columns


I have a question:

import pandas as pd
import numpy as np
df = pd.DataFrame(['Air type:1', 'Space kind:2', 'water', np.nan], columns=['A'])  # np.nan: the np.NaN alias was removed in NumPy 2.0

              A
0    Air type:1
1  Space kind:2
2         water
3           NaN
I want to split the entries in A that contain ":" into two new columns. So I tried combining the split with a .loc filter:

df.loc[(df.A.str.contains(':')) & (~df.A.isnull()), ['B', 'C']] = df.A.str.split(':', expand = True)
But the result is not what I hoped for:

     A            B       C
0   Air type:1   NaN    NaN
1   Space kind:2 NaN    NaN
2   water        NaN    NaN
3   NaN          NaN    NaN
It works if I do not filter:

df[['B', 'C']] = df.A.str.split(':', expand = True)

           A           B        C
0   Air type:1      Air type    1
1   Space kind:2    Space kind  2
2   water             water    None
3   NaN                NaN     NaN
The problem is that the water entry is wrongly assigned to the new column B, and I have to fix it manually afterwards.

Why does the .loc + assignment not work?

Ideally, I would like:

           A           B        C
0   Air type:1      Air type    1
1   Space kind:2    Space kind  2
2   water              NaN     NaN
3   NaN                NaN     NaN

Try checking the condition with df.where:

c = df['A'].str.contains(":", na=False)  # na=False: treat the NaN row as "no match"
# c = df['A'].str.count(":").ge(1)  # alternative way to build the same mask
df[['B', 'C']] = df['A'].str.split(":", expand=True).where(c)
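To see what the mask and .where contribute, the intermediate values can be inspected separately. This is only an illustrative sketch: the names mask, parts and masked are not from the answer, and na=False is used here so that the NaN row counts as not matching.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(['Air type:1', 'Space kind:2', 'water', np.nan], columns=['A'])

# Boolean mask: True only for rows whose string contains ':'.
# na=False makes the NaN row count as "no match" instead of propagating NaN.
mask = df['A'].str.contains(':', na=False)
print(mask.tolist())  # [True, True, False, False]

# Unfiltered split: the 'water' row still lands in the first column.
parts = df['A'].str.split(':', expand=True)

# where() keeps rows where the mask is True and blanks out the rest with NaN.
masked = parts.where(mask)
print(masked)
```

Assigning masked to df[['B', 'C']] then leaves the water and NaN rows empty in both new columns.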


Another version, using .extract():
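The snippet for this version did not survive on this page. The following is a minimal sketch of how str.extract could be used for the same task; the regular expression is an assumption, not necessarily the one the answer used.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(['Air type:1', 'Space kind:2', 'water', np.nan], columns=['A'])

# Two capture groups: the text before the first ':' and the text after it.
# Rows that do not match the pattern (no ':' at all, or NaN) give NaN in both groups,
# so no separate filter is needed.
df[['B', 'C']] = df['A'].str.extract(r'^([^:]*):(.*)$')
print(df)
```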

Prints:

              A           B    C
0    Air type:1    Air type    1
1  Space kind:2  Space kind    2
2         water         NaN  NaN
3           NaN         NaN  NaN

Another approach is to use .stack() and .join:

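df1 = df.join(
    df.loc[df['A'].str.contains(':') == True]  # == True drops the NaN row
      .stack()
      .str.split(':', expand=True)
      .unstack(1)
      .droplevel(1, axis=1)
)

              A           0    1
0    Air type:1    Air type    1
1  Space kind:2  Space kind    2
2         water         NaN  NaN
3           NaN         NaN  NaN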


Or assign directly:

df[['B','C']] = df.loc[df['A'].str.contains(':')==True]\
                              .stack()\
                              .str.split(':',expand=True)\
                              .unstack(1).droplevel(1,1)

              A           B    C
0    Air type:1    Air type    1
1  Space kind:2  Space kind    2
2         water         NaN  NaN
3           NaN         NaN  NaN

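Works like a charm, thank you. Do you know why the .loc construct does not work?

@User2321 In my pandas version, that .loc construct throws a KeyError:
KeyError: "None of [Index(['B', 'C'], dtype='object')] are in the [columns]"

I see :) OK, thanks anyway! Honestly, I thought .loc would assign by index the way I expected. It might be worth dropping into a REPL and stepping through with a debugger to see what happens under the hood.

Cannot reproduce, because in my version loc only works for assigning a Series, not a DataFrame. Not sure whether anything changed in later versions.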
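A likely explanation, offered as a sketch rather than a definitive answer: when the right-hand side of a .loc assignment is a DataFrame, pandas aligns it on index and column labels. The split result is labelled 0 and 1, so nothing matches B and C and the new columns stay NaN. The workaround below is an assumption about how to make the labels line up; the columns are created first because some versions raise the KeyError quoted above when they are missing.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(['Air type:1', 'Space kind:2', 'water', np.nan], columns=['A'])
mask = df['A'].str.contains(':', na=False)
parts = df['A'].str.split(':', expand=True)

# The split result is labelled 0 and 1, not 'B' and 'C'.
print(parts.columns.tolist())  # [0, 1]

# Aligning it on the labels 'B' and 'C' finds no matching columns,
# so every value comes back as NaN.
aligned = parts.reindex(columns=['B', 'C'])
print(aligned.isna().all().all())  # True

# Possible workaround: create the target columns up front, then give the
# right-hand side the same labels so that label alignment can succeed.
parts.columns = ['B', 'C']
df['B'] = None
df['C'] = None
df.loc[mask, ['B', 'C']] = parts
print(df)
```

Only the rows selected by mask are written, so the water and NaN rows keep their empty placeholders in B and C.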