Python: Boolean indexing that keeps falsy values as NaN

python pandas dataframe

Given a DataFrame:

                         Data
1                      246804
2                      135272
3                      898.01
4                     3453.33
5                       shine  
6                        add
7                         522
8                         Nan
9                      string
10                      29.11
11                        20
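
For anyone who wants to reproduce this, here is a minimal sketch of how the frame above could be built. It assumes every entry (the numbers and 'Nan' included) is stored as a string and that the index runs from 1 to 11; the question does not state either explicitly.

```python
import pandas as pd

# Assumption: all entries are strings (object dtype), and the index
# starts at 1 to match the printout above.
values = ['246804', '135272', '898.01', '3453.33', 'shine', 'add',
          '522', 'Nan', 'string', '29.11', '20']
df = pd.DataFrame({'Data': values}, index=range(1, len(values) + 1))
print(df)
```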

I want two new columns, Floats and Strings, both the same length as the original DataFrame. Getting the Floats column is easy:

In [176]: pd.to_numeric(df.Data, errors='coerce')
Out[176]: 
1     246804.00
2     135272.00
3        898.01
4       3453.33
5           NaN
6           NaN
7        522.00
8           NaN
9           NaN
10        29.11
11        20.00
Name: Data, dtype: float64

As you can see, the non-floats are coerced to NaN, which is exactly what I want.
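
To make the coercion explicit, here is a small sketch contrasting the default errors='raise' with errors='coerce'. The three-element sample Series is made up for illustration and is not part of the question's data.

```python
import pandas as pd

# Hypothetical sample: two numeric strings and one non-numeric string.
sample = pd.Series(['20', '29.11', 'add'])

try:
    pd.to_numeric(sample)  # default is errors='raise'
except ValueError as exc:
    print('raise:', exc)

# With errors='coerce', unparseable entries become NaN instead of raising.
coerced = pd.to_numeric(sample, errors='coerce')
print(coerced.isnull().tolist())  # [False, False, True]
```

The result keeps the original length and index, which is why it lines up with the source column.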

To get the strings, this is what I do:

In [177]: df[df.Data.str.isalpha()]
Out[177]: 
     Data
5   shine
6     add
8     Nan
9  string

But as you can see, it does not keep the non-string values as NaN. I want something like this:

1                       NaN
2                       NaN
3                       NaN
4                       NaN
5                       shine  
6                       add
7                       NaN
8                       Nan (not NaN)
9                       string
10                      NaN
11                      NaN

How can I get it to do that?

floats = pd.to_numeric(df.Data, errors='coerce')
pd.DataFrame(dict(
    floats=floats,
    strings=df.Data.mask(floats.notnull())
))

       floats strings
1   246804.00     NaN
2   135272.00     NaN
3      898.01     NaN
4     3453.33     NaN
5         NaN   shine
6         NaN     add
7      522.00     NaN
8         NaN     Nan
9         NaN  string
10      29.11     NaN
11      20.00     NaN
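
The reason this keeps the length is that mask replaces entries where the condition is True and passes everything else through, rather than dropping rows. A self-contained sketch, assuming the Data column holds strings as in the question:

```python
import pandas as pd

# Assumption: the question's data, stored as strings with a 1-based index.
values = ['246804', '135272', '898.01', '3453.33', 'shine', 'add',
          '522', 'Nan', 'string', '29.11', '20']
df = pd.DataFrame({'Data': values}, index=range(1, len(values) + 1))

floats = pd.to_numeric(df.Data, errors='coerce')

# Where a value parsed as a number, blank it out; keep the rest untouched.
strings = df.Data.mask(floats.notnull())

print(len(strings) == len(df))    # True
print(strings.dropna().tolist())  # ['shine', 'add', 'Nan', 'string']
```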

You can even make it more obvious by passing a replacement value to mask:

floats = pd.to_numeric(df.Data, errors='coerce')
pd.DataFrame(dict(
    floats=floats,
    strings=df.Data.mask(floats.notnull(), '')
))

       floats strings
1   246804.00        
2   135272.00        
3      898.01        
4     3453.33        
5         NaN   shine
6         NaN     add
7      522.00        
8         NaN     Nan
9         NaN  string
10      29.11        
11      20.00

How about:

In [186]: df.Data.where(pd.to_numeric(df.Data, errors='coerce').isnull())
Out[186]: 
      Data
1      NaN
2      NaN
3      NaN
4      NaN
5    shine
6      add
7      NaN
8      Nan #not NaN
9   string
10     NaN
11     NaN
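
The where call above is the mirror image of mask: where keeps values where the condition is True, while mask blanks them. Here is a sketch checking that the two agree on this data, and that a condition built from str.isalpha() gives the same result here. It assumes the Data column holds only strings; isalpha would behave differently on entries such as 'abc1' or 'two words'.

```python
import pandas as pd

# Assumption: the question's data, stored as strings with a 1-based index.
values = ['246804', '135272', '898.01', '3453.33', 'shine', 'add',
          '522', 'Nan', 'string', '29.11', '20']
df = pd.DataFrame({'Data': values}, index=range(1, len(values) + 1))

floats = pd.to_numeric(df.Data, errors='coerce')

via_where = df.Data.where(floats.isnull())   # keep where NOT numeric
via_mask = df.Data.mask(floats.notnull())    # blank where numeric
print(via_where.equals(via_mask))            # True

# Same idea, with the condition taken from the question's isalpha check.
via_isalpha = df.Data.where(df.Data.str.isalpha())
print(via_isalpha.equals(via_where))         # True for this data
```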

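Or build the condition from your df.Data.str.isalpha() instead.

To get Strings, you can use boolean indexing on the Data column and select the rows where Floats is null: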

df['Floats'] = pd.to_numeric(df.Data, errors='coerce')
df['Strings'] = df.Data.loc[df.Floats.isnull()]  # Optional: .astype(str)

>>> df
# Output:
#        Data     Floats Strings
# 1    246804  246804.00     NaN
# 2    135272  135272.00     NaN
# 3    898.01     898.01     NaN
# 4   3453.33    3453.33     NaN
# 5     shine        NaN   shine
# 6       add        NaN     add
# 7       522     522.00     NaN
# 8       Nan        NaN     Nan
# 9    string        NaN  string
# 10    29.11      29.11     NaN
# 11       20      20.00     NaN
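
This works even though the boolean selection returns fewer rows, because assigning a Series to a DataFrame column aligns on the index and fills the missing labels with NaN. A short sketch of that alignment, assuming the same string data as in the question:

```python
import pandas as pd

# Assumption: the question's data, stored as strings with a 1-based index.
values = ['246804', '135272', '898.01', '3453.33', 'shine', 'add',
          '522', 'Nan', 'string', '29.11', '20']
df = pd.DataFrame({'Data': values}, index=range(1, len(values) + 1))

df['Floats'] = pd.to_numeric(df.Data, errors='coerce')

# The selection itself is shorter than the frame...
subset = df.Data.loc[df.Floats.isnull()]
print(len(subset))                   # 4

# ...but assignment aligns on index labels, so unmatched rows become NaN.
df['Strings'] = subset
print(df['Strings'].isnull().sum())  # 7
```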

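I like your answer! Brb, undoing some downvotes to boost the votes on your answer.

That made me laugh.

Nitpick: that is because you modified the original column. Also, this answer overwrites the original values in Data, so the numeric data is lost.

While the other answers are very good, I honestly appreciate the simplicity of this one the most. Thanks!

I also like this answer because it taught me something. I wish I could accept more than one, but I had to accept Alexander's answer because it seems to be the simplest of them all. Still, thank you for your answer :)

Don't mind me editorializing all over this post. I upvoted his answer half a second after he posted it. I care about upvotes, not about the accepted answer. I tend to think outside the box, so my answers are usually different from everyone else's, and I expect that someone else's answer will often be the more satisfying one. I only get upset when I see an OP choose no answer at all among very good ones. Also, don't feel you have to explain or justify your choice. His answer is great, and I'm an adult, I can handle it just fine. Well... maybe not an adult, but w/e.

@piRSquared Thank you... I respect that. I'll have plenty of questions in the future... please keep gracing us all with your expertise!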