Pandas: how to use k-means with categorical attributes

I am using the Cloudera 5.2 virtual machine with pandas 0.18.0. I want to apply k-means to my DataFrame, but it has a str column.

My DataFrame is:

adClicksPerTime.head(n=5)
Out[50]: 
            timestamp   adCategory  userId  totalAdClicks
0 2016-05-26 15:00:00   automotive     355              1
1 2016-05-26 15:00:00     clothing    1027              1
2 2016-05-26 15:00:00    computers    1821              1
3 2016-05-26 15:00:00    computers    2139              1
4 2016-05-26 15:00:00  electronics     253              1

for col in adClicksPerTime:
     print(col)
     print(type(adClicksPerTime[col][1]))


timestamp
<class 'pandas.tslib.Timestamp'>
adCategory
<class 'str'>
userId
<class 'numpy.int64'>
totalAdClicks
<class 'numpy.int64'>
I tried converting the strings to a categorical type so that I could then assign numeric codes:

adClicksPerTime.adCategory = pd.Categorical.from_array(adClicksPerTime.adCategory)     

adClicksPerTime.head(n=5)
Out[54]: 
            timestamp   adCategory  userId  totalAdClicks
0 2016-05-26 15:00:00   automotive     355              1
1 2016-05-26 15:00:00     clothing    1027              1
2 2016-05-26 15:00:00    computers    1821              1
3 2016-05-26 15:00:00    computers    2139              1
4 2016-05-26 15:00:00  electronics     253              1

for col in adClicksPerTime:
     print(col)
     print(type(adClicksPerTime[col][1]))


timestamp
<class 'pandas.tslib.Timestamp'>
adCategory
<class 'str'>
userId
<class 'numpy.int64'>
totalAdClicks
<class 'numpy.int64'>
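
A note on the output above: indexing a categorical column returns the original label, so the per-element type is still `str` after the conversion; it is the column's dtype that changes. The following is a minimal sketch, assuming a small made-up frame `df` in place of the real `adClicksPerTime`, of checking the dtype and reading the integer codes through the `.cat` accessor.

```python
import pandas as pd

# Small stand-in for adClicksPerTime (illustrative values only).
df = pd.DataFrame({
    "adCategory": ["automotive", "clothing", "computers", "computers", "electronics"],
    "userId": [355, 1027, 1821, 2139, 253],
})

# Convert the string column to the categorical dtype.
df["adCategory"] = df["adCategory"].astype("category")

print(df["adCategory"].dtype)      # the column dtype is now categorical
print(type(df["adCategory"][1]))   # a single element is still a plain str

# Integer codes, one per category; for string labels the categories are sorted.
df["adCategoryCode"] = df["adCategory"].cat.codes
print(df[["adCategory", "adCategoryCode"]])
```

The codes are arbitrary integers, so feeding them to k-means as if they were measurements imposes an ordering on the categories that the data does not have.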

How can I apply k-means to this str field?

get_dummies will convert the categories into dummy (indicator) columns:

dummies = pd.get_dummies(adClicksPerTime['adCategory'])
del dummies['automotive']  # drop one level; it is implied by the others
print(dummies.columns)
Then merge this DataFrame with the adClicksPerTime DataFrame, and finally apply k-means.
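
The steps described in this answer could look roughly like the sketch below. It assumes scikit-learn's `KMeans` is the k-means implementation in use (the question does not say), and the sample rows, the choice of `totalAdClicks` as the only numeric feature, and `n_clusters=2` are illustrative assumptions rather than anything from the original data.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Stand-in for adClicksPerTime; the click counts are made up for illustration.
df = pd.DataFrame({
    "adCategory": ["automotive", "clothing", "computers", "computers", "electronics"],
    "userId": [355, 1027, 1821, 2139, 253],
    "totalAdClicks": [1, 3, 2, 5, 1],
})

# One indicator column per category, dropping one level as in the answer.
dummies = pd.get_dummies(df["adCategory"]).drop(columns=["automotive"])

# Combine the numeric feature with the indicator columns (aligned on the index).
features = pd.concat([df[["totalAdClicks"]], dummies], axis=1).astype(float)

# n_clusters=2 is an arbitrary choice for this tiny example.
km = KMeans(n_clusters=2, n_init=10, random_state=0)
df["cluster"] = km.fit_predict(features)

print(features.columns.tolist())
print(df[["adCategory", "totalAdClicks", "cluster"]])
```

`userId` is left out of the feature matrix here because it is an identifier rather than a measurement; whether that is appropriate depends on the analysis.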

adClicksPerTime.info()
will give you the data types.

k-means is only suitable for continuous variables. Don't use it on this kind of data!
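
One way to see the concern: after one-hot encoding, any two different categories sit at exactly the same Euclidean distance from each other, so the encoded column gives k-means no notion of some categories being closer than others. A small sketch with illustrative labels (the names `labels`, `onehot`, and `dist` are made up for this example):

```python
import numpy as np
import pandas as pd

labels = ["automotive", "clothing", "computers", "electronics"]

# Full one-hot encoding: one row per category, one column per category.
onehot = pd.get_dummies(pd.Series(labels)).to_numpy(dtype=float)

# Pairwise Euclidean distances between the encoded categories.
diff = onehot[:, None, :] - onehot[None, :, :]
dist = np.sqrt((diff ** 2).sum(axis=-1))

# Every off-diagonal entry is sqrt(2); the diagonal is 0.
print(np.round(dist, 3))
```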