Python: Encoding a column as categorical values


python pandas dataframe


I have a data frame as follows:

import pandas as pd

d = {'item': [1, 2, 3, 4, 5, 6], 'time': [1297468800, 1297468809, 1297468801, 1297468890, 1297468820, 1297468805]}
df = pd.DataFrame(data=d)

The output of df is as follows:

   item        time
0     1  1297468800
1     2  1297468809
2     3  1297468801
3     4  1297468890
4     5  1297468820
5     6  1297468805

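Here, time is Unix system time. My goal is to replace the time column in the data frame. For example, given

mintime = 1297468800
maxtime = 1297468890

I want to split the time range into 10 intervals (the count should be adjustable through a parameter, e.g. 20 intervals) and recode the time column in df accordingly, like this: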

   item         time
0     1          1
1     2          1
2     3          1
3     4          9
4     5          3
5     6          1

Since I have 1 billion records, what is the most efficient way to do this? Thanks.

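You can use pd.cut together with np.linspace to specify the bins. This encodes the column as a categorical, from which you can then extract the ordered integer codes: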

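import numpy as np

# Right-closed bins (a, b]; start one below the minimum so it lands in the first bin
bins = np.linspace(df.time.min() - 1, df.time.max(), 10)
df['time'] = pd.cut(df.time, bins=bins, right=True).cat.codes + 1
df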

   item  time
0     1     1
1     2     1
2     3     1
3     4     9
4     5     3
5     6     1
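
The question asks for the interval count to be adjustable, so the approach above can be wrapped in a small helper. The sketch below is illustrative only: `encode_time` and its `n_intervals` argument are hypothetical names, not part of the original answer. Note that `np.linspace` returns edge values, so `n_intervals` bins need `n_intervals + 1` edges; the call above with `10` therefore produces 9 bins.

```python
import numpy as np
import pandas as pd


def encode_time(times, n_intervals=10):
    """Map each value to a 1-based equal-width interval index (hypothetical helper)."""
    # n_intervals bins need n_intervals + 1 edges; start one unit below the
    # minimum so that the minimum falls inside the first right-closed bin.
    edges = np.linspace(times.min() - 1, times.max(), n_intervals + 1)
    # cat.codes are 0-based, so add 1 to get labels 1..n_intervals.
    return pd.cut(times, bins=edges, right=True).cat.codes + 1


df = pd.DataFrame({
    'item': [1, 2, 3, 4, 5, 6],
    'time': [1297468800, 1297468809, 1297468801,
             1297468890, 1297468820, 1297468805],
})
codes_9 = encode_time(df['time'], n_intervals=9)    # same edges as linspace(..., 10)
codes_20 = encode_time(df['time'], n_intervals=20)  # finer split of the same range
```

With `n_intervals=9` this reproduces the column shown above; assigning the result back with `df['time'] = ...` recodes the column in place.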

Alternatively, depending on how you want to treat the interval edges, you can also do this:

# Left-closed bins [a, b); apply this to the original Unix-time column, not the recoded one
bins = np.linspace(df.time.min(), df.time.max() + 1, 10)
pd.cut(df.time, bins=bins, right=False).cat.codes + 1

0    1
1    1
2    1
3    9
4    2
5    1
dtype: int8
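
For a very large column (the question mentions a billion records), the same equal-width encoding can also be computed with plain integer arithmetic on the underlying NumPy array, which skips building a categorical. This is a sketch of an alternative technique, not the method from the answer, and it makes no timing claim; benchmark it on your own data. The names `encode_time_arith` and `n_intervals` are hypothetical. It treats intervals as left-closed, like the `right=False` variant.

```python
import numpy as np
import pandas as pd


def encode_time_arith(times, n_intervals=10):
    """1-based equal-width interval index via integer arithmetic (hypothetical helper)."""
    t = times.to_numpy(dtype=np.int64)
    tmin, tmax = t.min(), t.max()
    # Split the half-open range [tmin, tmax + 1) into n_intervals equal parts.
    # Pure integer math avoids floating-point effects at the bin edges.
    span = tmax - tmin + 1
    return (t - tmin) * n_intervals // span + 1


df = pd.DataFrame({
    'item': [1, 2, 3, 4, 5, 6],
    'time': [1297468800, 1297468809, 1297468801,
             1297468890, 1297468820, 1297468805],
})
codes = encode_time_arith(df['time'], n_intervals=9)
```

With `n_intervals=9` the result agrees with the `right=False` output above, since both split `[min, max + 1)` into nine equal left-closed bins.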

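Can you explain "I want to split the time into 10 intervals (can be changed with a parameter like 20 intervals), and then recode the time column in df"? That doesn't make much sense. Can you walk us through the output here?

Sure. The whole time range is [mintime, maxtime], and I want to split this range into 10 time periods. For example, with mintime=0 and maxtime=10 we have (0,1), (1,2), (2,3), ..., (9,10). In this case, time is encoded as continuous Unix system time. I just want to split it into several intervals. Thanks.

By my calculation, the last output should be 3, not 2. Can you check?

I will correct it. Thanks.

I have given you options for getting both outputs. It depends on how you treat the intervals. Ask if you need more explanation.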
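
The toy case from the comments (mintime=0, maxtime=10, ten intervals) can be sketched as follows. The sample values in `times` are made up for illustration, and using `include_lowest=True` to keep the minimum in the first bin is one possible choice of edge handling, not something the thread prescribes.

```python
import numpy as np
import pandas as pd

mintime, maxtime, n_intervals = 0, 10, 10

# 11 edges (0, 1, ..., 10) define the 10 intervals listed in the comment.
edges = np.linspace(mintime, maxtime, n_intervals + 1)

times = pd.Series([0, 0.5, 1, 2.5, 9.9, 10])  # made-up sample values

# Right-closed bins (a, b]; include_lowest=True also admits mintime itself.
binned = pd.cut(times, bins=edges, right=True, include_lowest=True)
codes = binned.cat.codes + 1  # 1-based interval index
```

Values outside [mintime, maxtime] get no bin, so their category code is -1 and they would appear as 0 after the `+ 1`.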