Python set_index function: KeyError: "None of [] are in the columns"
I am currently working through Aurélien Géron's book "Hands-On Machine Learning". However, I get the error message below (reproducing it is somewhat cumbersome, since it requires two CSV downloads).

Error message:
File "C:\Users\xxx\Miniconda3\lib\site-packages\pandas\core\frame.py", line 4548, in set_index
    raise KeyError(f"None of {missing} are in the columns")
KeyError: "None of ['Country'] are in the columns"

Code:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import sklearn.linear_model

oecd_bli = pd.read_csv("BLI_24092020220751169.csv", thousands=',')
gdp_per_capita = pd.read_csv("gdp_per_capita.csv", thousands=',', delimiter='\t', encoding='latin1', na_values="n/a")

def prepare_country_stats(oecd_bli, gdp_per_capita):
    oecd_bli = oecd_bli[oecd_bli["INEQUALITY"]=="TOT"]
    oecd_bli = oecd_bli.pivot(index="Country", columns="Indicator", values="Value")
    gdp_per_capita.rename(columns={"2015":"GDP per capita"}, inplace=True)
    gdp_per_capita.set_index("Country", inplace=True)
    full_country_stats = pd.merge(left=oecd_bli, right=gdp_per_capita,
                                  left_index=True, right_index=True)
    full_country_stats.sort_values(by="GDP per capita", inplace=True)
    remove_indices = [0, 1, 6, 8, 33, 34, 35]
    keep_indices = list(set(range(36)) - set(remove_indices))
    return full_country_stats[["GDP per capita", 'Life satisfaction']].iloc[keep_indices]

country_stats = prepare_country_stats(oecd_bli, gdp_per_capita)
X = np.c_[country_stats["GDP per capita"]]
y = np.c_[country_stats["Life satisfaction"]]

# Visualize the data
country_stats.plot(kind='scatter', X="GDP per capita", y='Life satisfaction')
plt.show()

# Select a linear model
model = sklearn.linear_model.LinearRegression()

# Train the model
model.fit(X, y)

# Make a prediction for Cyprus
X_new = [[22587]]  # Cyprus's GDP per capita
print(model.predict(X_new))
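
However, I am already stuck inside the function. The error seems to be related to the set_index call, which I would have expected to be a very reliable function. And of course the Country column does exist in my CSV file.

Here is a screenshot of gdp_per_capita:

If anyone is willing to take the time to reproduce this, I would be very grateful.

Your code needs two changes. Change this line:
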
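gdp_per_capita = pd.read_csv("gdp_per_capita.csv", thousands=',', delimiter='\t', encoding='latin1', na_values="n/a")

to this (remove encoding='latin1'):

gdp_per_capita = pd.read_csv("gdp_per_capita.csv", thousands=',', delimiter='\t', na_values="n/a")

And change this:

country_stats.plot(kind='scatter', X="GDP per capita", y='Life satisfaction')

to this (uppercase X to lowercase x):

country_stats.plot(kind='scatter', x="GDP per capita", y='Life satisfaction')

After these two changes, I was able to get the scatter plot:
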
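- Reading the gdp_per_capita csv with latin1 encoding causes the Country column to be read under a different name. I therefore suggest using the "utf-8" encoding, which solved the problem.
- You have a typo in the scatter plot call, which has already been pointed out.
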
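One way to check whether a header was read under an unexpected name is to print the column names with repr() before calling set_index. The sketch below is only an illustration: the frame is built by hand with made-up values, and the invisible byte order mark in front of "Country" is an assumption standing in for whatever the wrong decoding produced in the real file.

```python
import pandas as pd

# Hypothetical frame with made-up values. The first header carries an invisible
# byte order mark (an assumption, used here to imitate a wrongly decoded name).
gdp = pd.DataFrame({"\ufeffCountry": ["Cyprus", "Germany"], "2015": [22587, 40996]})

# repr() exposes characters that a plain print() would hide.
print([repr(name) for name in gdp.columns])

# The lookup that set_index("Country") relies on fails for the dirty name.
has_country = "Country" in gdp.columns
print("Country column found:", has_country)

# Normalise the header names, after which set_index can find the column.
cleaned = gdp.rename(columns=lambda name: name.replace("\ufeff", "").strip())
indexed = cleaned.set_index("Country")
print(indexed.index.tolist())
```

If the printed names look nothing like the expected headers, the encoding or the separator passed to read_csv is the more likely culprit than set_index itself.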
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import sklearn.linear_model

oecd_bli = pd.read_csv("BLI_26092020152902439.csv", thousands=',')
gdp_per_capita = pd.read_csv("gdp_per_capita.csv", delimiter='\t', thousands=',', encoding='utf-8', na_values="n/a")

def prepare_country_stats(oecd_bli, gdp_per_capita):
    oecd_bli = oecd_bli[oecd_bli["INEQUALITY"]=="TOT"]
    oecd_bli = oecd_bli.pivot(index="Country", columns="Indicator", values="Value")
    gdp_per_capita.rename(columns={"2015":"GDP per capita"}, inplace=True)
    print(gdp_per_capita)
    gdp_per_capita.set_index("Country", inplace=True)
    full_country_stats = pd.merge(left=oecd_bli, right=gdp_per_capita,
                                  left_index=True, right_index=True)
    full_country_stats.sort_values(by="GDP per capita", inplace=True)
    remove_indices = [0, 1, 6, 8, 33, 34, 35]
    keep_indices = list(set(range(36)) - set(remove_indices))
    return full_country_stats[["GDP per capita", 'Life satisfaction']].iloc[keep_indices]

country_stats = prepare_country_stats(oecd_bli, gdp_per_capita)
X = np.c_[country_stats["GDP per capita"]]
y = np.c_[country_stats["Life satisfaction"]]

# Visualize the data
country_stats.plot(kind='scatter', x="GDP per capita", y='Life satisfaction')
plt.show()

# Select a linear model
model = sklearn.linear_model.LinearRegression()

# Train the model
model.fit(X, y)

# Make a prediction for Cyprus
X_new = [[22587]]  # Cyprus's GDP per capita
print(model.predict(X_new))

Output:

Are you sure the delimiter you set is correct? In the screenshot, judging by the header, ; is your delimiter.

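The effect of a mismatched delimiter can be reproduced without the original files. The sketch below uses a made-up two-row sample separated by ';' (assumed from the comment above, not taken from the real gdp_per_capita.csv) and reads it once with the tab delimiter from the question and once with ';'.

```python
import io

import pandas as pd

# Made-up sample that uses ';' as separator, as the screenshot is said to suggest.
sample = "Country;2015\nCyprus;22587\nGermany;40996\n"

# Wrong separator: no tab is found, so each line becomes a single field
# and the whole header line turns into one column name.
wrong = pd.read_csv(io.StringIO(sample), delimiter="\t")
print(list(wrong.columns))

try:
    wrong.set_index("Country")
except KeyError as err:
    # Same kind of failure as in the question.
    print("KeyError:", err)

# Matching separator: 'Country' is a real column and can become the index.
right = pd.read_csv(io.StringIO(sample), delimiter=";").set_index("Country")
print(right.loc["Cyprus", "2015"])
```

With the wrong separator pandas does not complain while reading; the problem only surfaces later, when a column is looked up by name.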
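Oh my, how could I have overlooked that. I really should have paid attention to my German computer settings. Thank you very much @รยקคгรђשค

Thank you so much for clarifying this. This comment also helped me a lot, thanks!

Strangely, I was using 'latin1'. When I use 'utf-8' I get the following: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa0 in position 235638: invalid start byte

Glad to help. If the answer helped you, please accept it and upvote. That way people know the problem has been solved.

Thank you very much. I did notice that the X was in caps, but I thought that, being in caps, it would not matter. A follow-up: I noticed that you plotted the regression in the figure above. Is that due to your IDE? I use Spyder, and it does not plot the regression.

I use PyCharm/VS Code. When you run matplotlib, a separate window opens.

Why did you accept the other answer? Just wondering.

I think I did accept your answer. Not sure what happened :) Thanks again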
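The UnicodeDecodeError quoted in the comments can be illustrated on a few raw bytes. This is a small sketch with a made-up byte string, not the content of the actual CSV: latin1 assigns a character to every byte value and therefore never fails, whereas utf-8 rejects a lone 0xa0 because that byte can only appear as a continuation byte.

```python
# Made-up bytes that end with 0xa0, the byte named in the error message.
raw = b"n/a\xa0"

# latin1 decodes any byte; 0xa0 becomes a non-breaking space (U+00A0).
as_latin1 = raw.decode("latin1")
print(repr(as_latin1))

# utf-8 refuses the same bytes, because 0xa0 cannot start a character.
decode_error = None
try:
    raw.decode("utf-8")
except UnicodeDecodeError as err:
    decode_error = err
    print(err.reason, "at position", err.start)
```

So a file that reads without errors under latin1 but fails under utf-8 is probably not UTF-8 encoded, and switching the encoding alone may not be the right fix for it.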