Python: receiving "'Series' objects are mutable, thus they cannot be hashed" error when using a DataFrame in an apply function

python pandas

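I can't for the life of me figure out how to get rid of this error! I'm trying to check whether two different DataFrames share the same values in two columns of a row (for example, whether both DataFrames have "Jacksonville" under the "City" column and "Florida" under the "State" column). I am trying to run: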

hpricesold = convert_housing_data_to_quarters()
hprices = hpricesold.copy()
hprices = hprices.reset_index(inplace=False)

def is_uni(df):
    if df in get_list_of_university_towns():
        return 1
    else:
        return 0

hprices['Is_Uni'] = hprices.apply(is_uni, axis=1)

The two functions being called are defined as:

def convert_housing_data_to_quarters():
    #create
    hd = pd.read_csv('City_Zhvi_AllHomes.csv')
    hd['State'] = hd['State'].map(states)
    hd = hd.set_index(["State", "RegionName"])
    hd = hd.drop(hd.loc[:, '1996-04':'1999-12'], inplace = False, axis = 1)


    hd = hd.loc[:, '2000-01':'2016-08']
    #finds the average value for each quarter
    hd = hd.groupby(np.arange(len(hd.columns))//3, axis=1).mean()

    #now to name the stupid thing...
    rec = pd.read_excel('gdplev.xls', header = [4])
    rec = rec.drop([0,1], axis=0)
    start = rec[rec["Unnamed: 4"] =="2000q1"].index.values.astype(int)[0]

    rec = rec.loc[start:]

    rec = rec.reset_index()
    rec = rec.drop(['index', 'Unnamed: 0', 'GDP in billions of current dollars',
           'GDP in billions of chained 2009 dollars', 'Unnamed: 3', 'Unnamed: 7'], axis = 1)
    rec = rec.rename(columns = {'Unnamed: 4' : 'Year', 
                                'GDP in billions of current dollars.1' : 'GDP (bil current)', 
                                'GDP in billions of chained 2009 dollars.1' : 'GDP (bil chained 2009)'})
    rec = rec.append({'Year' : '2016q3'}, ignore_index = True)


    for col in hd.columns:
        hd = hd.rename(columns = {hd.columns[col] : rec.loc[col, 'Year']})

    # the caller assigns the result and calls .copy() on it, so the frame must be returned
    return hd

def get_list_of_university_towns():
    #utowns = pd.read_table('university_towns.txt', header=None)
    #utowns = utowns.rename(columns = {0: 'Info'})

    lst = []
    state = ''
    regname = ''

    with open('university_towns.txt') as utowns:
        for line in utowns:
            if line.find("[edit]") != -1:
                location = line.find("[edit]")
                state = line[:location]
                #print (line[:location])

            elif line.find(" (") != -1:
                location = line.find(" (")
                regname = line[:location]
                #print (line[:location])
                lst.append([state, regname])
            #if line.find(":") != -1:
            #    location = line.find(":")
            #    regname = line[:location+1]
            #    lst.append([state, regname])
            else:
                regname = line[:-1]
                #print (regname)
                lst.append([state, regname])

    utowns = pd.DataFrame(lst, columns = ['State', 'RegionName'])
    return utowns

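I have a feeling that my error stems from how I manipulate my DataFrame in convert_housing_data_to_quarters(), but I'm a bit lost in the code. It seems to make sense that each column is a Series, but how do I make it immutable so that I can pass it through this function?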

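Note the line hprices['Is_Uni'] = hprices.apply(is_uni, axis=1). It applies the is_uni function to each row in hprices (because you passed axis=1).

Now look at the first line of that function: def is_uni(df):. This means that df is actually a whole row.

The next line contains if df in get_list_of_university_towns():, so you are trying to check whether the whole row is in the university towns DataFrame (this is probably the source of the error).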
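To make the diagnosis above concrete, here is a minimal sketch that reproduces the failure on toy data. The two small DataFrames are made-up stand-ins for the real housing and university-town data, and the exact wording of the TypeError message may differ between pandas versions.

```python
import pandas as pd

# Toy stand-ins (assumed data) for hprices and the university towns frame.
hprices = pd.DataFrame({
    "State": ["Florida", "Texas"],
    "RegionName": ["Jacksonville", "Austin"],
})
utowns = pd.DataFrame({"State": ["Texas"], "RegionName": ["Austin"]})

seen_types = []

def is_uni(row):
    # With axis=1, apply passes each row to the function as a Series.
    seen_types.append(type(row).__name__)
    # `x in some_dataframe` looks x up among the column labels, which
    # requires hashing x. A Series cannot be hashed, so this raises.
    return 1 if row in utowns else 0

try:
    hprices.apply(is_uni, axis=1)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(seen_types[0])    # the type name of what is_uni received
print(error_message)    # the hashing error raised by the `in` check
```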

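I see two points that need correcting here:

Before in, put only the column containing the city name, not df (the whole row).

After in, put only the column containing the university towns, not the whole DataFrame returned by this function.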
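One caveat when applying the two points above: the in operator on a pandas Series tests the index labels, not the stored values, so the town column should be checked through its .values (or converted to a set). The sketch below uses made-up toy data, and it assumes the city column is called RegionName as in the question's code.

```python
import pandas as pd

# Toy stand-ins (assumed data) for the two frames in the question.
utowns = pd.DataFrame({
    "State": ["Texas", "Florida"],
    "RegionName": ["Austin", "Gainesville"],
})
hprices = pd.DataFrame({
    "State": ["Florida", "Texas"],
    "RegionName": ["Jacksonville", "Austin"],
})

# `in` on a Series looks at the index labels (here 0 and 1), not the values.
in_index = "Austin" in utowns["RegionName"]
# Checking against the underlying array tests the values themselves.
in_values = "Austin" in utowns["RegionName"].values

def is_uni(row):
    # Compare a single scalar (the city name) against the town values.
    return 1 if row["RegionName"] in utowns["RegionName"].values else 0

hprices["Is_Uni"] = hprices.apply(is_uni, axis=1)
print(in_index, in_values)
print(hprices)
```

Note that this version matches on the city name alone, so two towns with the same name in different states would not be told apart.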

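Another remark: it is bad practice to call, inside a loop, a function that builds a whole DataFrame and returns the same result every time.

Instead, get this DataFrame once before the loop, save it in a variable, and pass it as an argument to this function.

One last remark: don't use the variable name df for something that is not actually a DataFrame. In this case it is a row from a DataFrame, so you could rename it to just row.

The code will then be more readable.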
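Putting these remarks together, a possible rewrite might look like the sketch below. The body of get_list_of_university_towns() here is a hypothetical stand-in that returns toy data in place of parsing university_towns.txt, and town_pairs is an illustrative name. The lookup is built once as a set of (State, RegionName) tuples and passed in through the args parameter of DataFrame.apply. A vectorized alternative using merge with indicator=True is shown at the end; it assumes the town list has no duplicate rows.

```python
import pandas as pd

def get_list_of_university_towns():
    # Hypothetical stand-in for the question's file-parsing function.
    return pd.DataFrame({
        "State": ["Texas", "Florida"],
        "RegionName": ["Austin", "Gainesville"],
    })

# Toy stand-in (assumed data) for the housing prices frame.
hprices = pd.DataFrame({
    "State": ["Florida", "Texas", "Georgia"],
    "RegionName": ["Jacksonville", "Austin", "Gainesville"],
})

def is_uni(row, town_pairs):
    # `row` is one row of hprices; tuples are hashable, so set lookup works.
    return 1 if (row["State"], row["RegionName"]) in town_pairs else 0

# Build the lookup once, outside the per-row function.
utowns = get_list_of_university_towns()
town_pairs = set(zip(utowns["State"], utowns["RegionName"]))

hprices["Is_Uni"] = hprices.apply(is_uni, axis=1, args=(town_pairs,))

# Vectorized alternative: a left merge marks the rows found in both frames.
merged = hprices[["State", "RegionName"]].merge(
    utowns, on=["State", "RegionName"], how="left", indicator=True
)
is_uni_vectorized = (merged["_merge"] == "both").astype(int)

print(hprices)
```

Matching on both columns means that Gainesville, Georgia is not flagged just because Gainesville, Florida appears in the town list.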

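Posting the error traceback would help someone pinpoint exactly where your code goes wrong. My guess is that the error occurs because you are trying to use a Series object as a key in a dictionary, and keys need to be hashable.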
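As a small illustration of the hashability point, the sketch below shows that a Series cannot be hashed, while a plain tuple built from its values can be used as a dictionary key. The row contents are made-up example values.

```python
import pandas as pd

# Made-up example row.
row = pd.Series({"State": "Florida", "RegionName": "Jacksonville"})

try:
    hash(row)
    hashable = True
except TypeError:
    # pandas deliberately makes Series unhashable because it is mutable.
    hashable = False

# An immutable tuple of the row's values is hashable and works as a key.
key = tuple(row)
lookup = {key: 1}

print(hashable)
print(lookup)
```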