Problem setting column data types in Python
I am trying to set the data types of two of my columns, but it is not working. I want to set [trans_typ] to "category" and [date] to datetime. There is also an index [date], which I have already set to datetime, but I want the first column to be datetime as well.
import numpy as np
import pandas as pd
import glob
df = pd.read_csv('/home/jayaramdas/anaconda3/cf_data', low_memory=False, \
parse_dates = True)
df.set_index(pd.to_datetime(df['date']), inplace=True)
df['trans_typ'].astype('category')
pd.to_datetime(df['date'])
df.dtypes
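In the code above, `df['trans_typ'].astype('category')` and `pd.to_datetime(df['date'])` each build and return a new Series. Neither modifies `df`, and the results are never stored, so `df.dtypes` stays the same. Below is a minimal sketch with a small made-up DataFrame; the values are illustrative and not taken from the real `cf_data` file.

```python
import pandas as pd

# Hypothetical data mimicking the question's columns (both start as strings).
df = pd.DataFrame({
    'date': ['2007-08-15', '2007-09-26', '2011-02-25'],
    'trans_typ': ['24K', '24K', '24K'],
})

# These calls build new Series, but the results are discarded,
# so df itself is unchanged.
df['trans_typ'].astype('category')
pd.to_datetime(df['date'])
dtypes_before = df.dtypes.copy()
print(dtypes_before)

# Assigning the results back to the columns is what changes the dtypes.
df['trans_typ'] = df['trans_typ'].astype('category')
df['date'] = pd.to_datetime(df['date'])
print(df.dtypes)
```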
My output
date object
cmte_id object
trans_typ object
amount float64
fec_id object
cand_id object
dtype: object
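There is a likely reason the `date` column is still `object` even though `parse_dates = True` is passed. With `parse_dates=True`, `read_csv` only tries to parse the index. Since no `index_col` is given, the index is a plain integer range, and the `date` column is left alone. Passing the column name parses that column instead. The sketch below is a minimal example. It assumes an in-memory CSV read through `io.StringIO` in place of the real file, and the sample rows are made up.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for the real cf_data file.
csv_text = """date,cmte_id,trans_typ,amount
2007-08-15,C00112250,24K,2000
2007-09-26,C00119040,24K,1000
"""

# parse_dates=True only targets the index; here the index is a default
# integer range, so the 'date' column is left as plain strings.
df_true = pd.read_csv(io.StringIO(csv_text), parse_dates=True)
print(df_true.dtypes)

# Naming the column explicitly parses it into a datetime64 dtype.
df_named = pd.read_csv(io.StringIO(csv_text), parse_dates=['date'])
print(df_named.dtypes)
```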
This is my data, from print(df):
I just used
df['date'] = df['date'].astype('datetime64')
and it works.

You can use:
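Comments:
Thanks, this is very helpful. Unfortunately, I still have the problem mentioned in (). It would be great if you could shed some light on it!
OK, give me a moment and I'll try.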
# if you need to copy column date to the index (and keep the column)
df.set_index(df['date'], inplace=True)
print(df)
date cmte_id trans_typ entity_typ state employer \
date
2007-08-15 2007-08-15 C00112250 24K ORG DC NaN
2007-09-26 2007-09-26 C00119040 24K CCM FL NaN
2007-09-26 2007-09-26 C00119040 24K CCM MD NaN
2011-02-25 2011-02-25 C00478404 24K COM MN NaN
2011-02-01 2011-02-01 C00140855 24K CCM DC NaN
2011-02-01 2011-02-01 C00140855 24K CCM DC NaN
2011-02-22 2011-02-22 C00140855 24K CCM MD NaN
2011-02-28 2011-02-28 C00093963 24K CCM ND NaN
occupation amount fec_id cand_id
date
2007-08-15 NaN 2000 C00431569 P00003392
2007-09-26 NaN 1000 C00367680 H2FL05127
2007-09-26 NaN 1000 C00140715 H2MD05155
2011-02-25 NaN 2400 C00326629 H8MN06047
2011-02-01 NaN 1000 C00373464 H2OH17109
2011-02-01 NaN 1000 C00289983 H4KY01040
2011-02-22 NaN 2500 C00140715 H2MD05155
2011-02-28 NaN 1000 C00474619 H0ND00135
# convert column trans_typ to category
# column date is already datetime here, so it does not need converting
df['trans_typ'] = df['trans_typ'].astype('category')
print(df)
date cmte_id trans_typ entity_typ state employer \
date
2007-08-15 2007-08-15 C00112250 24K ORG DC NaN
2007-09-26 2007-09-26 C00119040 24K CCM FL NaN
2007-09-26 2007-09-26 C00119040 24K CCM MD NaN
2011-02-25 2011-02-25 C00478404 24K COM MN NaN
2011-02-01 2011-02-01 C00140855 24K CCM DC NaN
2011-02-01 2011-02-01 C00140855 24K CCM DC NaN
2011-02-22 2011-02-22 C00140855 24K CCM MD NaN
2011-02-28 2011-02-28 C00093963 24K CCM ND NaN
occupation amount fec_id cand_id
date
2007-08-15 NaN 2000 C00431569 P00003392
2007-09-26 NaN 1000 C00367680 H2FL05127
2007-09-26 NaN 1000 C00140715 H2MD05155
2011-02-25 NaN 2400 C00326629 H8MN06047
2011-02-01 NaN 1000 C00373464 H2OH17109
2011-02-01 NaN 1000 C00289983 H4KY01040
2011-02-22 NaN 2500 C00140715 H2MD05155
2011-02-28 NaN 1000 C00474619 H0ND00135
print(df.dtypes)
date datetime64[ns]
cmte_id object
trans_typ category
entity_typ object
state object
employer float64
occupation float64
amount int64
fec_id object
cand_id object
dtype: object
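To keep `date` as a column while also using it as the index, `set_index` also accepts `drop=False`. That should produce the same layout as passing the Series `df['date']`. Below is a minimal sketch with made-up data.

```python
import pandas as pd

# Hypothetical data: 'date' is already datetime, as in the answer above.
df = pd.DataFrame({
    'date': pd.to_datetime(['2007-08-15', '2007-09-26']),
    'amount': [2000, 1000],
})

# Passing the Series copies its values into the index and keeps the column.
by_series = df.set_index(df['date'])

# drop=False does the same thing using the column label.
by_label = df.set_index('date', drop=False)

print(by_label.columns.tolist())  # ['date', 'amount']
print(by_label.index)
```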
# or, if you DON'T need to keep column date, move it to the index
df.set_index('date', inplace=True)
print(df)
cmte_id trans_typ entity_typ state employer occupation \
date
2007-08-15 C00112250 24K ORG DC NaN NaN
2007-09-26 C00119040 24K CCM FL NaN NaN
2007-09-26 C00119040 24K CCM MD NaN NaN
2011-02-25 C00478404 24K COM MN NaN NaN
2011-02-01 C00140855 24K CCM DC NaN NaN
2011-02-01 C00140855 24K CCM DC NaN NaN
2011-02-22 C00140855 24K CCM MD NaN NaN
2011-02-28 C00093963 24K CCM ND NaN NaN
amount fec_id cand_id
date
2007-08-15 2000 C00431569 P00003392
2007-09-26 1000 C00367680 H2FL05127
2007-09-26 1000 C00140715 H2MD05155
2011-02-25 2400 C00326629 H8MN06047
2011-02-01 1000 C00373464 H2OH17109
2011-02-01 1000 C00289983 H4KY01040
2011-02-22 2500 C00140715 H2MD05155
2011-02-28 1000 C00474619 H0ND00135
df['trans_typ'] = df['trans_typ'].astype('category')
print(df)
cmte_id trans_typ entity_typ state employer occupation \
date
2007-08-15 C00112250 24K ORG DC NaN NaN
2007-09-26 C00119040 24K CCM FL NaN NaN
2007-09-26 C00119040 24K CCM MD NaN NaN
2011-02-25 C00478404 24K COM MN NaN NaN
2011-02-01 C00140855 24K CCM DC NaN NaN
2011-02-01 C00140855 24K CCM DC NaN NaN
2011-02-22 C00140855 24K CCM MD NaN NaN
2011-02-28 C00093963 24K CCM ND NaN NaN
amount fec_id cand_id
date
2007-08-15 2000 C00431569 P00003392
2007-09-26 1000 C00367680 H2FL05127
2007-09-26 1000 C00140715 H2MD05155
2011-02-25 2400 C00326629 H8MN06047
2011-02-01 1000 C00373464 H2OH17109
2011-02-01 1000 C00289983 H4KY01040
2011-02-22 2500 C00140715 H2MD05155
2011-02-28 1000 C00474619 H0ND00135
print(df.dtypes)
cmte_id object
trans_typ category
entity_typ object
state object
employer float64
occupation float64
amount int64
fec_id object
cand_id object
dtype: object
print(df.index)
DatetimeIndex(['2007-08-15', '2007-09-26', '2007-09-26', '2011-02-25',
'2011-02-01', '2011-02-01', '2011-02-22', '2011-02-28'],
dtype='datetime64[ns]', name=u'date', freq=None)
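The question also mentions the `date` index. The sketch below shows one way to check whether an index really is datetime-typed and to convert it if it is not. It uses made-up data and relies on `pd.to_datetime` accepting an `Index` and returning a `DatetimeIndex`.

```python
import pandas as pd

# Hypothetical frame whose index holds date strings rather than datetimes.
df = pd.DataFrame({'amount': [2000, 1000]}, index=['2007-08-15', '2007-09-26'])
df.index.name = 'date'

print(isinstance(df.index, pd.DatetimeIndex))  # False

# Replace the index with its converted version; the name 'date' is kept.
df.index = pd.to_datetime(df.index)
print(isinstance(df.index, pd.DatetimeIndex))  # True
print(df.index.name)  # date
```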