Python: Creating a NaN column in a DataFrame


I came across the following example showing how to create NaN columns in a DataFrame:

import pandas as pd
import numpy as np
import math
import copy
import datetime as dt

"""
Accepts a list of symbols along with start and end date
Returns the Event Matrix which is a pandas Datamatrix
Event matrix has the following structure :
    |IBM |GOOG|XOM |MSFT| GS | JP |
(d1)|nan |nan | 1  |nan |nan | 1  |
(d2)|nan | 1  |nan |nan |nan |nan |
(d3)| 1  |nan | 1  |nan | 1  |nan |
(d4)|nan |  1 |nan | 1  |nan |nan |
...................................
...................................
Also, d1 = start date
nan = no information about any event.
1 = status bit (positively confirms the event occurrence)
"""

def find_events(ls_symbols, d_data):
    ''' Finding the event dataframe '''
    df_close = d_data['actual_close']
    ts_market = df_close['SPY']

    print("Finding Events")

    # Creating an empty dataframe
    df_events = copy.deepcopy(df_close) # type <class 'pandas.core.frame.DataFrame'>
    df_events = df_events * np.nan # << why it works here

Q> Why doesn't the same multiplication work on my own DataFrame (the frame example at the end)?

Because you have a column, state, that contains strings, and multiplying a string by NaN raises an error. If you really want to set state to NaN, use frame['state'] = np.nan
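
A minimal sketch of that assignment, reusing the frame data from the question. The helper variables at the end are only there to check the result and are not part of the original answer.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame({
    'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada'],
    'year': [2000, 2001, 2002, 2001, 2002],
    'pop': [1.5, 1.7, 3.6, 2.4, 2.9],
})

# Plain assignment broadcasts the scalar NaN to every row of the column.
frame['state'] = np.nan

# Illustrative checks: the target column is all NaN, the others are unchanged.
state_all_nan = bool(frame['state'].isna().all())
year_unchanged = frame['year'].tolist() == [2000, 2001, 2002, 2001, 2002]
print(frame)
```

No arithmetic is involved, so the string contents of the column never matter.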

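Note that df_close is actually a column, not a DataFrame (df_close = d_data['actual_close']), and therefore so is df_events. You, on the other hand, have a DataFrame with three columns, one of which, state, holds strings stored as Python objects. You can't multiply a string/object by a number.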
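
To see which columns get in the way, one option is to inspect the dtypes and restrict the multiplication to the numeric columns. This is only a sketch on the frame from the question; the names non_numeric and nan_numeric are made up for illustration.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame({
    'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada'],
    'year': [2000, 2001, 2002, 2001, 2002],
    'pop': [1.5, 1.7, 3.6, 2.4, 2.9],
})

# Columns whose dtype is not numeric (here: the string column 'state').
non_numeric = [col for col in frame.columns
               if not pd.api.types.is_numeric_dtype(frame[col])]
print(non_numeric)

# Multiplying only the numeric columns by NaN works, because no strings are involved.
nan_numeric = frame.select_dtypes(include='number') * np.nan
print(nan_numeric)
```

This is the situation in the first example: the price data is purely numeric, so multiplying by NaN goes through there.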

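In any case, the multiplication is completely unnecessary:

  • All that df_close = df_close * np.nan does is assign NaN to the whole column in a needlessly obscure way.
  • Assigning = np.nan directly is clearer. (Older pandas versions also exposed this as pd.np.NaN, but the pd.np alias has since been removed, so use np.nan.)
  • If you want to assign NaN to multiple columns, do:
    df[['year','pop']] = np.nan
  • No real multiplication is taking place. Don't write code like that. Don't abuse operators.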
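
The points above could be sketched as follows. The small df_close table is a hypothetical stand-in for the price data in the question, and building df_events with the DataFrame constructor is one possible alternative, not necessarily what the original code's author intended.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the price table in the question.
df_close = pd.DataFrame(
    {'IBM': [100.0, 101.5], 'GOOG': [500.0, 498.2]},
    index=pd.to_datetime(['2008-01-02', '2008-01-03']),
)

# An all-NaN frame with the same row and column labels, built directly
# instead of deep-copying the data and multiplying it by NaN.
df_events = pd.DataFrame(np.nan, index=df_close.index, columns=df_close.columns)

# Assigning NaN to several existing columns at once.
frame = pd.DataFrame({
    'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada'],
    'year': [2000, 2001, 2002, 2001, 2002],
    'pop': [1.5, 1.7, 3.6, 2.4, 2.9],
})
frame[['year', 'pop']] = np.nan

print(df_events)
print(frame)
```

Both forms work regardless of what the columns contained before, which is why they don't fail on string data.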

If you look closely, df_close and df_events are really just a single column, not a DataFrame.

The code that raises the error:

import numpy as np
import pandas as pd
from pandas import Series, DataFrame

data = {'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada'],
        'year': [2000, 2001, 2002, 2001, 2002],
        'pop': [1.5, 1.7, 3.6, 2.4, 2.9]}
frame = DataFrame(data)
frame = frame * np.nan # TypeError: can't multiply sequence by non-int of type 'float'