Python 3 DataFrame KeyError problem

python python-3.x pandas


I have a DataFrame crawls that looks like this (the sample was posted as an image):

When I run this code:

crawl_stats = (
    crawls['updated']
    .groupby(crawls.index.get_level_values('url'))
    .agg({
        'number of crawls': 'count',
        'proportion of updates': 'mean',
        'number of updates': 'sum'
    })
)

I get this error:

KeyError                                  Traceback (most recent call last)
<ipython-input-62-180f1041465d> in <module>
      8 crawl_stats = (
      9     crawls['updated']
---> 10         .groupby(crawls.index.get_level_values('url'))
     11         # .groupby('url')
     12         .agg({

/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pandas/core/indexes/base.py in _get_level_values(self, level)
   3155         """
   3156 
-> 3157         self._validate_index_level(level)
   3158         return self
   3159 

/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pandas/core/indexes/base.py in _validate_index_level(self, level)
   1942         elif level != self.name:
   1943             raise KeyError('Level %s must be same as name (%s)' %
-> 1944                            (level, self.name))
   1945 
   1946     def _get_level_number(self, level):

KeyError: 'Level url must be same as name (None)'

When I use .groupby('url') instead, I also get an error:

KeyError                                  Traceback (most recent call last)
<ipython-input-63-8c5f0f6f7c86> in <module>
      9     crawls['updated']
     10         # .groupby(crawls.index.get_level_values('url'))
---> 11         .groupby('url')
     12         .agg({
     13             'number of crawls': 'count',       
3293             # Add key to exclusions

KeyError: 'url'

You need to replace this:

.groupby(crawls.index.get_level_values('url'))

with:

.groupby('url')

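because url is not a level of the DataFrame's index. The index is a single unnamed level, which is why the error reports name (None).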
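A minimal sketch reproducing the first error, assuming (as the answer implies) that url is an ordinary column and the index is the default unnamed one. The sample data is hypothetical. The index names come back as [None], matching the (None) in the error message; grouping by the column works, and if an index level is really wanted, set_index can create one first.

```python
import pandas as pd

# Hypothetical sample data: 'url' is a regular column, not an index level
crawls = pd.DataFrame({
    'url': ['a', 'a', 'a', 'a', 'b', 'b', 'b'],
    'updated': list(range(7)),
})

print(list(crawls.index.names))  # [None] -> the "(None)" in the KeyError
print(crawls.columns.tolist())   # ['url', 'updated']

# Reproduce the original error: there is no index level called 'url'
try:
    crawls.index.get_level_values('url')
except KeyError as exc:
    print('KeyError:', exc)

# Fix 1: group by the column
by_column = crawls.groupby('url')['updated'].count()

# Fix 2: move 'url' into the index first, then group by that level
indexed = crawls.set_index('url')
by_level = indexed['updated'].groupby(level='url').count()

print(by_column.equals(by_level))  # both give a -> 4, b -> 3
```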

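There are two problems: you need to group by the column url, and you also need to pass a list of tuples that pairs each new column name with its aggregation function: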

import pandas as pd

crawls = pd.DataFrame({
    'url': ['a','a','a','a','b','b','b'],
    'updated': list(range(7))
})
print(crawls)
#   url  updated
# 0   a        0
# 1   a        1
# 2   a        2
# 3   a        3
# 4   b        4
# 5   b        5
# 6   b        6

# (new column name, aggregation function) pairs
d = [('number of crawls', 'count'),
     ('proportion of updates', 'mean'),
     ('number of updates', 'sum')]
crawl_stats = crawls.groupby('url')['updated'].agg(d)
print(crawl_stats)
#      number of crawls  proportion of updates  number of updates
# url
# a                   4                    1.5                  6
# b                   3                    5.0                 15
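On pandas 0.25 or newer the same table can also be built with named aggregation, passing each output name as a keyword; because these names contain spaces, they go through ** dict unpacking. This is an alternative sketch, not part of the original answer. The dict form used in the question is not accepted for renaming on a grouped Series in pandas 1.0 and later (it raises SpecificationError), which is why the answer switches to tuples.

```python
import pandas as pd

crawls = pd.DataFrame({
    'url': ['a', 'a', 'a', 'a', 'b', 'b', 'b'],
    'updated': list(range(7)),
})

# Named aggregation (pandas >= 0.25): output column name -> aggregation function.
# The names contain spaces, so they are passed via ** dict unpacking.
crawl_stats = crawls.groupby('url')['updated'].agg(**{
    'number of crawls': 'count',
    'proportion of updates': 'mean',
    'number of updates': 'sum',
})
print(crawl_stats)
```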

Edit:

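The problem is that the columns are not numeric: np.array((url, hour, updated)) builds a single array with one common dtype, so because url holds strings, the numbers are converted to strings too. It is better to create a dict and pass it to the DataFrame constructor.

Change: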

columns = ['url','hour','updated']
data = np.array((url,hour,updated)).T
df = pd.DataFrame(data=data, columns=columns)

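to:

columns = ['url','hour','updated']
df = pd.DataFrame({'url': url, 'hour': hour, 'updated': updated}, columns=columns)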
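To see why the numpy route breaks aggregation: np.array needs one dtype for the whole 2-D array, so a mix of strings and integers is promoted to a string dtype, and none of the resulting DataFrame columns is numeric. The short lists below are hypothetical stand-ins for the question's url, hour and updated.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the question's url, hour and updated lists
url = ['a', 'a', 'b']
hour = [1, 2, 3]
updated = [0, 1, 1]
columns = ['url', 'hour', 'updated']

# numpy route: one common dtype for everything -> every value becomes a string
data = np.array((url, hour, updated)).T
df_np = pd.DataFrame(data=data, columns=columns)
print(data.dtype.kind)  # 'U': a unicode string dtype
print(df_np.dtypes)     # no numeric column left

# dict route: each column keeps its own dtype
df_dict = pd.DataFrame({'url': url, 'hour': hour, 'updated': updated}, columns=columns)
print(df_dict.dtypes)   # hour and updated stay int64

print(df_dict.groupby('url')['updated'].mean())
```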

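It is always better to post samples as text rather than images. Please edit your post and let us know. What does print(crawls.columns.tolist()) return?

I modified the code according to your guidance, but I got an error, which I added to the post. Can you help me?

Check the comment under the question: what does print(crawls.columns.tolist()) return? A KeyError means there is no column url.

I modified my code according to your guidance, but it shows the error I added when editing the post: DataError: No numeric types to aggregate

@pandalai - your updated column seems not to be numeric. Before my solution, try crawls['updated'] = crawls['updated'].astype(float).

Thank you for your help, @jezrael, it finally works!

@pandalai - You're welcome! If my answer was helpful, don't forget to accept it. Thanks.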
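If the DataFrame has already been built with string values, the fix from the comments is to convert the column before grouping. A minimal sketch with hypothetical data; pd.to_numeric is mentioned in a comment as an alternative that keeps whole numbers as integers.

```python
import pandas as pd

# Hypothetical frame whose 'updated' values are strings, as after the numpy route
crawls = pd.DataFrame({
    'url': ['a', 'a', 'b'],
    'updated': ['0', '1', '1'],
})

# Fix from the comment: cast to float before aggregating
# (pd.to_numeric(crawls['updated']) would also work and gives int64 here)
crawls['updated'] = crawls['updated'].astype(float)

d = [('number of crawls', 'count'),
     ('proportion of updates', 'mean'),
     ('number of updates', 'sum')]
crawl_stats = crawls.groupby('url')['updated'].agg(d)
print(crawl_stats)
```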
