Strange error when trying to shuffle a dataset in Python

I'm trying to shuffle my data with the following code:

import pandas as pd
import numpy as np

from sklearn.naive_bayes import MultinomialNB
data = pd.read_csv('dataset.txt')
np.random.shuffle(data)
However, running this produces the following error, and I don't understand where it comes from:

Traceback (most recent call last):
  File "sample2.py", line 12, in <module>
    np.random.shuffle(data)
  File "mtrand.pyx", line 4668, in mtrand.RandomState.shuffle (numpy/random/mtrand/mtrand.c:30498)
  File "mtrand.pyx", line 4671, in mtrand.RandomState.shuffle (numpy/random/mtrand/mtrand.c:30438)
  File "/Users/marcvanderpeet/Library/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/pandas/core/frame.py", line 1992, in __getitem__
    return self._getitem_column(key)
  File "/Users/marcvanderpeet/Library/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/pandas/core/frame.py", line 2004, in _getitem_column
    result = result[key]
  File "/Users/marcvanderpeet/Library/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/pandas/core/frame.py", line 1992, in __getitem__
    return self._getitem_column(key)
  File "/Users/marcvanderpeet/Library/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/pandas/core/frame.py", line 1999, in _getitem_column
    return self._get_item_cache(key)
  File "/Users/marcvanderpeet/Library/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/pandas/core/generic.py", line 1345, in _get_item_cache
    values = self._data.get(item)
  File "/Users/marcvanderpeet/Library/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/pandas/core/internals.py", line 3225, in get
    loc = self.items.get_loc(item)
  File "/Users/marcvanderpeet/Library/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/pandas/indexes/base.py", line 1878, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas/index.pyx", line 137, in pandas.index.IndexEngine.get_loc (pandas/index.c:4027)
  File "pandas/index.pyx", line 157, in pandas.index.IndexEngine.get_loc (pandas/index.c:3891)
  File "pandas/hashtable.pyx", line 675, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12408)
  File "pandas/hashtable.pyx", line 683, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12359)

Any ideas?

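I don't fully understand the whole traceback, but as far as I can tell the error simply comes from the fact that a DataFrame is not a numpy array. To fix it, use the DataFrame's underlying array instead, which you get with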
data.values
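A minimal sketch of this fix, assuming a small hypothetical DataFrame in place of dataset.txt (the columns text and label are made up for illustration). It copies the array first, an extra precaution not in the answer itself, so we shuffle a writable array of our own, and it shows that np.random.shuffle moves whole rows:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for pd.read_csv('dataset.txt'): two columns of
# different dtypes, which is common for real datasets.
data = pd.DataFrame({"text": ["a", "b", "c", "d"], "label": [0, 1, 0, 1]})

# .values gives the DataFrame's contents as a NumPy array. Copying makes
# sure we hold a writable array of our own (depending on the pandas version
# and the column dtypes, .values may be a copy or a read-only view).
arr = data.values.copy()

# np.random.shuffle permutes along the first axis only, so each row is
# moved as a whole and its values stay together.
np.random.shuffle(arr)

print(arr)
```

Note that the result is a plain NumPy array: the column names and the index are gone, so further work happens on arr rather than on data.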


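My guess about what happens in the traceback is that np.random.shuffle doesn't check whether its input is a real array. It tries to read and swap elements of the DataFrame by integer position, the same way it would with a regular array, but on a DataFrame data[i] is a lookup of the column labelled i, not of row i. That is why the traceback is full of __getitem__, _getitem_column and get_loc calls.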
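To see the lookup that goes wrong, here is a small sketch on a hypothetical DataFrame (columns a and b are made up for illustration). Indexing the DataFrame with an integer searches the column labels, while indexing the NumPy array returns a row:

```python
import numpy as np
import pandas as pd

# Hypothetical data; the column labels are strings, not integers.
data = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

# On a DataFrame, data[0] means "the column labelled 0". There is no such
# column, so pandas raises KeyError, the same kind of column lookup that
# fails in the traceback.
try:
    data[0]
except KeyError as exc:
    print("KeyError:", exc)

# On a NumPy array, arr[0] is the first row, which is what
# np.random.shuffle expects when it swaps elements in place.
arr = data.values
print(arr[0])  # [1 4]
```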

You are applying a numpy function to a pandas DataFrame.

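You can convert the DataFrame to a numpy array and shuffle that. Keep the shuffled array in a variable: depending on the pandas version and the column dtypes, data.values may be a fresh copy (so shuffling it in place would leave data unchanged) or a read-only view, so work on your own copy:

arr = data.values.copy()
np.random.shuffle(arr)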
Alternatively, you can use the DataFrame's own sample method, which returns a shuffled DataFrame:

data = data.sample(len(data))
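A sketch of the sample approach with a couple of optional extras. frac=1 is equivalent to passing len(data), random_state (an arbitrary seed of 0 here) makes the order reproducible, and reset_index(drop=True) renumbers the rows. The DataFrame below is a hypothetical stand-in for the CSV:

```python
import pandas as pd

# Hypothetical stand-in for pd.read_csv('dataset.txt').
data = pd.DataFrame({"x": [1, 2, 3, 4, 5], "y": ["a", "b", "c", "d", "e"]})

# frac=1 returns all rows in random order (sampling is without replacement
# by default). random_state makes the shuffle reproducible.
shuffled = data.sample(frac=1, random_state=0)

# The original row labels travel with the rows; drop them if a clean
# 0..n-1 index is wanted (for example before slicing train/test sets).
shuffled = shuffled.reset_index(drop=True)

print(shuffled)
```

Unlike shuffling data.values, this keeps the result a DataFrame with its column names intact.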
