Python KeyError: "The following 'id_vars' are not present in the DataFrame"


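I'm new to pandas, and I'm trying to count how many of each product a user has bought, using two Excel exports:

1. customers: the customer ids selected for use

2. transactions: a record of all transactions, for example:

| cusid | products |
|-------|----------|
| 1     | 12,13,14 |
| 1     | 05,12,12 |

Part of the code: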


import pandas as pd
import numpy as np
import time
import turicreate as tc
from sklearn.model_selection import train_test_split

import sys
sys.path.append("..")

customers=pd.read_csv('testdata/data/recommend_1.csv')
transactions=pd.read_csv('testdata/data/trx_data.csv')
print(list(customers))
print (list(transactions))
print(customers.shape)
customers.head()
print(transactions.shape)
transactions.head()

data=pd.melt(transactions.set_index('cusid')['products'].apply(pd.Series).reset_index(drop=True),id_vars=['cusid'],value_name='products')\
        .dropna().drop(['variable'],axis=1)\
        .groupby(['cusid','products'])\
        .agg({'products':'count'})\
        .rename(columns={'products':'purchase_count'})\
        .reset_index(drop=True)\
        .rename(columns={'products':'productId'})

data['productId']=data['productId'].astype(np.int64)        
print(list(data))
print(data.shape)
data.head()


The result is as follows:

['cusid']
['cusid', 'products']
(1000, 1)
(62483, 2)
Traceback (most recent call last):
  File "recomm.py", line 20, in <module>
    data=pd.melt(transactions.set_index('cusid')['products'].apply(pd.Series).reset_index(drop=True),id_vars=['cusid'],value_name='products')\
  File "/Users/bijing/anaconda2/envs/turi/lib/python2.7/site-packages/pandas/core/reshape/melt.py", line 48, in melt
    "".format(missing=list(missing)))
KeyError: "The following 'id_vars' are not present in the DataFrame: ['cusid']"

Could you share the output of transactions.head()? The second column appears to be a string, so you need to split it and flatten it. After that the logic is straightforward:

customers = pd.read_csv('testdata/data/recommend_1.csv')
transactions = pd.read_csv('testdata/data/trx_data.csv')
transactions['products'] = transactions['products'].apply(lambda x: x.split('|'))
transactions = (transactions['products'].apply(pd.Series)
                .merge(transactions, right_index=True, left_index=True)
                .drop(['products'], axis=1)
                .melt(id_vars=['cusid'], value_name='product')
                .drop('variable', axis=1).dropna())
whitelist = set(customers['cusid'].tolist())
res = transactions[transactions['cusid'].isin(whitelist)].groupby(['cusid', 'product']).size().rename('count').reset_index()
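For comparison, here is a minimal sketch of the same split, flatten and count idea using `Series.str.split` and `DataFrame.explode`. It assumes pandas 0.25 or newer, which is not available in the Python 2.7 environment shown in the traceback, and it uses small made-up frames in place of the two CSV files. The `purchase_count` and `productId` names are taken from the question; everything else is illustrative.

```python
import pandas as pd

# Hypothetical stand-ins for recommend_1.csv and trx_data.csv.
customers = pd.DataFrame({"cusid": [0, 1]})
transactions = pd.DataFrame({
    "cusid": [0, 0, 1, 2],
    "products": ["20", "216|52|93|93", "69|69", "1|31"],
})

# Split each string into a list, then give every product its own row.
long_form = transactions.assign(
    products=transactions["products"].str.split("|")
).explode("products")

# Strip stray whitespace (in case the separator is " | ") and convert to integers.
long_form["products"] = long_form["products"].str.strip().astype("int64")

# Keep only the selected customers and count rows per (customer, product).
res = (
    long_form[long_form["cusid"].isin(customers["cusid"])]
    .groupby(["cusid", "products"])
    .size()
    .rename("purchase_count")
    .reset_index()
    .rename(columns={"products": "productId"})
)
print(res)
```

Customer 2 is absent from the made-up `customers` frame, so its rows are filtered out before counting.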

['cusid']
['cusid', 'products']
(1000, 1)
(62483, 2)
Traceback (most recent call last):
  File "recomm.py", line 20, in <module>
    data=pd.melt(transactions.set_index('cusid')['products'].apply(pd.Series).reset_index(drop=True),id_vars=['cusid'],value_name='products')\
  File "/Users/bijing/anaconda2/envs/turi/lib/python2.7/site-packages/pandas/core/reshape/melt.py", line 48, in melt
    "".format(missing=list(missing)))
KeyError: "The following 'id_vars' are not present in the DataFrame: ['cusid']"

customers = pd.read_csv('testdata/data/recommend_1.csv')
transactions = pd.read_csv('testdata/data/trx_data.csv')
transactions['products'] = transactions['products'].apply(lambda x: x.split('|'))
transactions = transactions['products'].apply(pd.Series).merge(transactions, right_index=True, left_index=True).drop(['products'], axis=1).melt(id_vars = ['cusid'], value_name='product').drop('variable', axis=1).dropna()
whitelist = set(customers['cusid'].tolist())
res = transactions[transactions['cusid'].isin(whitelist)].groupby(['cusid', 'product']).size().rename('count').reset_index()
您可以共享transactions.head输出吗?第二列似乎是字符串。因此您需要拆分并展平它。在此之后,逻辑非常简单,如下所示:

['cusid']
['cusid', 'products']
(1000, 1)
(62483, 2)
Traceback (most recent call last):
  File "recomm.py", line 20, in <module>
    data=pd.melt(transactions.set_index('cusid')['products'].apply(pd.Series).reset_index(drop=True),id_vars=['cusid'],value_name='products')\
  File "/Users/bijing/anaconda2/envs/turi/lib/python2.7/site-packages/pandas/core/reshape/melt.py", line 48, in melt
    "".format(missing=list(missing)))
KeyError: "The following 'id_vars' are not present in the DataFrame: ['cusid']"

customers = pd.read_csv('testdata/data/recommend_1.csv')
transactions = pd.read_csv('testdata/data/trx_data.csv')
transactions['products'] = transactions['products'].apply(lambda x: x.split('|'))
transactions = transactions['products'].apply(pd.Series).merge(transactions, right_index=True, left_index=True).drop(['products'], axis=1).melt(id_vars = ['cusid'], value_name='product').drop('variable', axis=1).dropna()
whitelist = set(customers['cusid'].tolist())
res = transactions[transactions['cusid'].isin(whitelist)].groupby(['cusid', 'product']).size().rename('count').reset_index()

Before running melt, try running data = data.reset_index().
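A small sketch of why that helps, using made-up data. In the question's code, `cusid` is moved into the index by `set_index` and then thrown away by `reset_index(drop=True)`, so `melt` cannot find it. Calling `reset_index()` without `drop=True` turns the index back into a column. The `p` prefix on the split columns is only an illustrative choice so that all column labels are strings.

```python
import pandas as pd

# Hypothetical sample in the shape of the question's transactions frame.
transactions = pd.DataFrame({
    "cusid": [0, 0, 1],
    "products": ["20", "216|52", "69|69"],
})

# One column per product position, with cusid as the index.
wide = (
    transactions.set_index("cusid")["products"]
    .str.split("|", expand=True)
    .add_prefix("p")
)

dropped = wide.reset_index(drop=True)  # index discarded: no 'cusid' column left
kept = wide.reset_index()              # index restored as a 'cusid' column

print("cusid" in dropped.columns)
print("cusid" in kept.columns)

# With 'cusid' present as a column, melt accepts it as an id variable.
melted = (
    kept.melt(id_vars=["cusid"], value_name="products")
    .drop("variable", axis=1)
    .dropna()
)
print(melted)
```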


Comments:

Can you check whether the column names contain any spaces?

No, I just copied my code here.

@PoppyBee Can you check the same thing in the CSV file you are reading?

Already checked, and there are no spaces. I've updated my question.

The output of transactions.head():

   cusid  products
0  0      20
1  0      216 | 52 | 260 | 93 | 93 | 93
2  0      69 | 69
3  0      1 | 1 | 31 | 31
4  0      260 | 256

I get another error:

Traceback (most recent call last):
  File "recomm.py", line 28, in <module>
    data['productId']=data['productId'].astype(np.int64)
KeyError: 'productId'
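The second KeyError looks consistent with the aggregation chain in the question. There, `agg` plus the first `rename` turns `products` into `purchase_count`, and `reset_index(drop=True)` then discards the group keys, so no `products` column is left to rename to `productId`. Below is a minimal sketch of one way to keep the keys, assuming the data is already in long form (one row per purchase); the sample values are made up.

```python
import pandas as pd

# Hypothetical long-form data: one row per purchased product.
long_form = pd.DataFrame({
    "cusid": [0, 0, 0, 1],
    "products": ["93", "93", "52", "69"],
})

counts = long_form.groupby(["cusid", "products"]).size()

# drop=True discards the (cusid, products) index, leaving only the counts.
lost = counts.reset_index(drop=True)

# Without drop=True the group keys come back as ordinary columns.
data = (
    counts.reset_index(name="purchase_count")
    .rename(columns={"products": "productId"})
)
data["productId"] = data["productId"].astype("int64")
print(data)
```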