Pandas Python groupby method


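I'm having trouble with the groupby function in the pandas module. I get the error: DataError: No numeric types to aggregate

I'm not sure what I'm doing wrong; the DataFrame does contain numeric data.

Here is my code: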

lte_columns = ['Period start','Period end','zone','usid','site id','rank','Total LCQI Impact','LTE BLOCK Impact','LTE DROP Impact','LTE TPUT Impact','engineer notes']
#lte_df = pd.DataFrame(dtype=float)
lte_df = pd.DataFrame(dtype=float)

## iterate over the CQI impact file, separate LTE from UMTS, and perform a lookup for each Technology/USID
testFile = "sample_CSCT_CQI_IMPACT_Greater Midwest_20160305_20160311.xls"
df = pd.read_excel(testFile,sheetname="Sheet1")

weekBegin = df['Date'].min()
weekEnd = df['Date'].max()

## update new dataFrames while iterating over input dataframe

for idx, row in df.iterrows():
    usid = row['USID']
    region, zone = row['District & Zone'].split('-')
    if usid in lte_lookup:
        site_id = lte_lookup[usid][1]
    else:
        site_id = "N/A"

    lte = pd.Series([weekBegin,weekEnd,zone,usid,site_id,'0','0','0','0','0','0'])
    lte_df = lte_df.append(lte,ignore_index=True)

lte_df.columns = lte_columns
grps = lte_df.groupby(['usid'])
avgs = grps.mean()
avgs.to_excel("pandas_out.xlsx",merge_cells=False) 

print "done"

Here is a sample of lte_df:

>>> print lte_df
     Period start  Period end zone      usid    site id rank Total LCQI Impact LTE BLOCK Impact LTE DROP Impact LTE TPUT Impact engineer notes
0      03/05/2016  03/11/2016  69E   56788.0   MOL02607    0                 0                0               0               0              0
1      03/05/2016  03/11/2016  70F   58438.0   KSL05065    0                 0                0               0               0              0
2      03/05/2016  03/11/2016  69A  120595.0  MOL00531W    0                 0                0               0               0              0
3      03/05/2016  03/11/2016  70D   75566.0   KSL04272    0                 0                0               0               0              0
4      03/05/2016  03/11/2016  70F   58454.0   KSL05106    0                 0                0               0               0              0
5      03/05/2016  03/11/2016  70E   41793.0   KSL04151    0                 0                0               0               0              0
6      03/05/2016  03/11/2016  70C    9500.0   KSL06382    0                 0                0               0               0              0
7      03/05/2016  03/11/2016  69A   56586.0   MOL01143    0                 0                0               0               0              0


>>> lte_df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 6565 entries, 0 to 6564
Data columns (total 11 columns):
Period start         6565 non-null object
Period end           6565 non-null object
zone                 6565 non-null object
usid                 6565 non-null float64
site id              6565 non-null object
rank                 6565 non-null object
Total LCQI Impact    6565 non-null object
LTE BLOCK Impact     6565 non-null object
LTE DROP Impact      6565 non-null object
LTE TPUT Impact      6565 non-null object
engineer notes       6565 non-null object
dtypes: float64(1), object(10)
memory usage: 615.5+ KB
>>>


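Given the data in your DataFrame, your groupby cannot work: the code tries to compute the mean of each remaining column, and it cannot, because those columns are not floats. Even the zeros in the other columns are strings.

So this will not work: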

grps = lte_df.groupby(['usid'])
avgs = grps.mean()

But, for example,

grps = lte_df[['Period start', 'usid']].groupby(['Period start'])
avgs = grps.mean()

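will work, because it groups by one column and the only remaining column is a float, so a result is returned. I realize this is not what you are trying to do, but it shows how the mechanism works.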
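If the goal is still to average the impact columns per usid, one possible fix is to convert the string columns to numbers before grouping. The sketch below is only an illustration under stated assumptions: the small lte_df and the name metric_cols are made up for the example, and the values are not taken from the question's data. It uses pd.to_numeric with errors="coerce", which turns anything unparseable into NaN rather than raising.

```python
import pandas as pd

# Made-up stand-in for lte_df: the metric columns hold strings,
# as they do in the question (dtype "object").
lte_df = pd.DataFrame({
    "usid": [56788.0, 56788.0, 58438.0],
    "zone": ["69E", "69E", "70F"],
    "rank": ["0", "2", "4"],
    "Total LCQI Impact": ["0", "1.5", "3"],
})

# Hypothetical list of the columns that should be numeric.
metric_cols = ["rank", "Total LCQI Impact"]

# Convert each listed column; values that cannot be parsed become NaN.
lte_df[metric_cols] = lte_df[metric_cols].apply(pd.to_numeric, errors="coerce")

# Select only the numeric columns so text columns such as "zone"
# are never passed to mean().
avgs = lte_df.groupby("usid")[metric_cols].mean()
print(avgs)
```

Selecting the columns explicitly before calling mean() keeps the result independent of how a given pandas version treats non-numeric columns.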

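Add the output of

lte_df.info()

to the question. The error suggests that all of your columns are being read as objects rather than ints or floats.

This is the output of lte_df.info().

Please edit the question to include it.

The only numeric column is the one you are grouping by. Having numeric columns will eliminate this problem.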
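One way to end up with numeric columns in the first place is to store numbers, not the string '0', when each row is built. The following is a minimal sketch, assuming a modern pandas and Python 3. The input df and the lte_lookup contents are hypothetical stand-ins for the question's Excel data, numeric_cols is an illustrative name, and using an empty string for "engineer notes" is an assumption. Rows are collected in a list and the DataFrame is built once, which also avoids DataFrame.append (removed in pandas 2.0).

```python
import pandas as pd

lte_columns = ["Period start", "Period end", "zone", "usid", "site id", "rank",
               "Total LCQI Impact", "LTE BLOCK Impact", "LTE DROP Impact",
               "LTE TPUT Impact", "engineer notes"]
numeric_cols = lte_columns[5:10]  # "rank" plus the four impact columns

# Hypothetical stand-ins for the question's input frame and lookup table.
df = pd.DataFrame({
    "Date": ["03/05/2016", "03/11/2016"],
    "USID": [56788, 58438],
    "District & Zone": ["GMW-69E", "GMW-70F"],
})
lte_lookup = {56788: ("placeholder", "MOL02607")}

week_begin, week_end = df["Date"].min(), df["Date"].max()

rows = []
for _, row in df.iterrows():
    usid = row["USID"]
    region, zone = row["District & Zone"].split("-")
    site_id = lte_lookup[usid][1] if usid in lte_lookup else "N/A"
    record = {"Period start": week_begin, "Period end": week_end,
              "zone": zone, "usid": usid, "site id": site_id,
              "engineer notes": ""}
    # Store real floats instead of the string '0'.
    record.update({col: 0.0 for col in numeric_cols})
    rows.append(record)

# Build the frame once from the collected rows.
lte_df = pd.DataFrame(rows, columns=lte_columns)
avgs = lte_df.groupby("usid")[numeric_cols].mean()

print(lte_df.dtypes)
print(avgs)
```

With floats stored from the start, lte_df.info() should report the rank and impact columns as float64 rather than object, and the grouped mean has numeric data to aggregate.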