Python: column omitted in split-apply-combine in pandas

Tags: python, pandas, dataframe, aggregate-functions, split-apply-combine

I am doing a split-apply-combine to get the totals for each member. The data frame I need should have 14 columns:
MemberID, DSFS_0_1, DSFS_1_2, DSFS_2_3, DSFS_3_4, DSFS_4_5, DSFS_5_6, DSFS_6_7, DSFS_7_8, DSFS_8_9, DSFS_9_10, DSFS_10_11, DSFS_11_12, DrugCount.
However, I don't get the 14th one (DrugCount). Any idea why? The variable joined contains all 14 columns, but joined_grouped_agg (the result of the aggregation) has only 13.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import sys
from sklearn.model_selection import train_test_split
from sklearn import linear_model

# this function takes the drugcount dataframe as input and outputs a tuple of 3 data frames: DrugCount_Y1, DrugCount_Y2, DrugCount_Y3
def process_DrugCount(drugcount):
    dc = drugcount.copy()  # work on a copy of the frame passed in
    sub_map = {'1' : 1, '2':2, '3':3, '4':4, '5':5, '6':6, '7+' : 7}
    dc['DrugCount'] = dc.DrugCount.map(sub_map)
    dc['DrugCount'] = dc.DrugCount.astype(int)
    dc_grouped = dc.groupby(dc.Year, as_index=False)
    DrugCount_Y1 = dc_grouped.get_group('Y1')
    DrugCount_Y2 = dc_grouped.get_group('Y2')
    DrugCount_Y3 = dc_grouped.get_group('Y3')
    DrugCount_Y1.drop('Year', axis=1, inplace=True)
    DrugCount_Y2.drop('Year', axis=1, inplace=True)
    DrugCount_Y3.drop('Year', axis=1, inplace=True)
    return (DrugCount_Y1, DrugCount_Y2, DrugCount_Y3)

# this function converts strings such as "1- 2 months" to "1_2"
def replaceMonth(string):
    replace_map = {'0- 1 month' : "0_1", "1- 2 months": "1_2", "2- 3 months": "2_3", "3- 4 months": '3_4', "4- 5 months": "4_5", "5- 6 months": "5_6", "6- 7 months": "6_7",
                   "7- 8 months" : "7_8", "8- 9 months": "8_9", "9-10 months": "9_10", "10-11 months": "10_11", "11-12 months": "11_12"}
    a_new_string = string.map(replace_map)
    return a_new_string

# this function processes one year's drug count data
def process_yearly_DrugCount(aframe):
    processed_frame = None
    aframe.drop("Year", axis = 1, inplace = True)
    reformed = aframe[['DSFS']].apply(replaceMonth)
    gd = pd.get_dummies(reformed)
    joined =  pd.concat([aframe, gd], axis = 1)
    joined.drop("DSFS", axis = 1, inplace = True)
    joined_grouped = joined.groupby("MemberID", as_index = False)
    joined_grouped_agg = joined_grouped.agg(np.sum)
    print(joined_grouped_agg)
    return processed_frame

def main():
    pd.options.mode.chained_assignment = None
    daysinhospital = pd.read_csv('DaysInHospital_Y2.csv')
    drugcount = pd.read_csv('DrugCount.csv')
    process_DrugCount(drugcount)
    process_yearly_DrugCount(drugcount)
    replaceMonth(drugcount['DSFS'])

if __name__ == '__main__':
    main()
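For comparison, here is a minimal sketch of the intended pipeline on a few hypothetical in-memory rows. SAMPLE_CSV, DRUG_MAP and drug_totals_per_member are illustrative names, not taken from the question. The sketch converts DrugCount to integers before grouping. It uses a regular expression instead of the question's replace_map to turn "0- 1 month" into "0_1", which is a swapped-in technique.

```python
import io

import pandas as pd

# Hypothetical rows shaped like DrugCount.csv (the real file has many more).
SAMPLE_CSV = """MemberID,Year,DSFS,DrugCount
1,Y1,0- 1 month,3
1,Y1,1- 2 months,7+
2,Y1,0- 1 month,2
"""

# Same capping rule as sub_map in process_DrugCount(): "7+" counts as 7.
DRUG_MAP = {"1": 1, "2": 2, "3": 3, "4": 4, "5": 5, "6": 6, "7+": 7}


def drug_totals_per_member(frame):
    """Return one row per MemberID with summed DrugCount and DSFS dummy columns."""
    df = frame.drop(columns="Year")
    # Make DrugCount numeric *before* grouping so the sum keeps the column.
    df["DrugCount"] = df["DrugCount"].astype(str).map(DRUG_MAP).astype(int)
    # "0- 1 month" -> "0_1", "9-10 months" -> "9_10", then one-hot encode.
    periods = df["DSFS"].str.replace(r"^(\d+)-\s*(\d+) months?$", r"\1_\2", regex=True)
    dummies = pd.get_dummies(periods, prefix="DSFS").astype(int)
    joined = pd.concat([df.drop(columns="DSFS"), dummies], axis=1)
    return joined.groupby("MemberID", as_index=False).sum()


result = drug_totals_per_member(pd.read_csv(io.StringIO(SAMPLE_CSV)))
print(result)
```

With the real file, pd.read_csv('DrugCount.csv') would replace the StringIO sample. Values missing from DRUG_MAP would become NaN and make astype(int) fail, the same limitation the question's sub_map has.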

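In short, the DrugCount column read directly from the CSV is not parsed as a numeric (int/float) field. This is presumably because it contains values such as "7+", which is why the question's sub_map maps '7+' to 7. If the column were numeric, it would be kept by the .agg(np.sum) step. The pandas versions of that time silently dropped non-numeric ("nuisance") columns from groupby sums, a default that pandas 2.0 removed. Before aggregating, check the data types and see whether DrugCount is of object type (i.e. a string column).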

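In fact, process_DrugCount() explicitly converts the DrugCount column to integers with astype, but process_yearly_DrugCount() does not. main() also passes the raw, unconverted frame to the latter. Run the same conversion in process_yearly_DrugCount() and DrugCount should be kept in the aggregated sums. Handle "7+" first, as process_DrugCount() does, because astype(int) raises a ValueError on that string:

aframe['DrugCount'] = aframe['DrugCount'].replace('7+', '7').astype(int)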
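Below is a small before/after sketch of this fix on hypothetical rows. Passing numeric_only=True to the groupby sum reproduces the old default of dropping non-numeric columns, so the sketch behaves the same across pandas versions. Replacing "7+" with "7" mirrors the question's sub_map.

```python
import io

import pandas as pd

# Hypothetical rows: one DSFS dummy column plus the raw DrugCount text.
raw = pd.read_csv(io.StringIO("MemberID,DSFS_0_1,DrugCount\n1,1,3\n1,0,7+\n2,1,2\n"))

# numeric_only=True mimics the old default: the text column is silently dropped.
before = raw.groupby("MemberID", as_index=False).sum(numeric_only=True)
print(before.columns.tolist())  # ['MemberID', 'DSFS_0_1']

# Convert first ("7+" capped at 7), and DrugCount survives the aggregation.
fixed = raw.copy()
fixed["DrugCount"] = fixed["DrugCount"].replace("7+", "7").astype(int)
after = fixed.groupby("MemberID", as_index=False).sum(numeric_only=True)
print(after)
```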
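Or, better, convert the column once in main(), right after reading the CSV, so the conversion is not done twice. Then drop the sub_map conversion from process_DrugCount(), which would otherwise fail on the already-converted integers:

drugcount['DrugCount'] = drugcount['DrugCount'].replace('7+', '7').astype(int)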
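Also note that read_csv() lets you specify column types explicitly with its dtype parameter:

drugcount = pd.read_csv('DrugCount.csv', dtype={'DrugCount': np.int64})

This only works if every DrugCount value is a plain integer; a "7+" entry makes read_csv() raise a ValueError.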
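If the file does contain "7+" values, one option is read_csv's converters parameter instead of dtype. The following is a minimal sketch on hypothetical in-memory data. parse_drug_count is an illustrative helper that strips the trailing "+", capping "7+" at 7 as the question's sub_map does.

```python
import io

import pandas as pd

# Hypothetical data in the same layout as DrugCount.csv.
CSV_TEXT = "MemberID,Year,DrugCount\n1,Y1,3\n1,Y1,7+\n2,Y2,2\n"


def parse_drug_count(value):
    """Hypothetical helper: '7+' -> 7, '3' -> 3."""
    return int(value.rstrip("+"))


# converters receive each raw string cell; dtype=np.int64 would fail on "7+".
drugcount = pd.read_csv(io.StringIO(CSV_TEXT), converters={"DrugCount": parse_drug_count})
print(drugcount.dtypes)
print(drugcount["DrugCount"].tolist())  # [3, 7, 2]
```

Missing cells would need extra handling, since int('') raises a ValueError.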

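Where is the line that calls the functions? This is also a lot of code to help with here. I suggest breaking each part up and adding print statements to inspect the contents, to see where the column gets dropped. Otherwise, set up a minimal example.

I printed the output along the way and split it up. That's how I know everything is fine until I do the aggregation, joined_grouped_agg.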