Python: scatter plot of two variables of the same size

python python-3.x

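Data source: folder "World Development Indicators", file Indicators.csv

I am trying to draw a scatter plot between two variables. However, the two variables are not the same size.

The data looks like this. It is stored in long format, one row per country, indicator and year: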

CountryCode IndicatorName                   Year    Value
USA         Population, total               1993    72498
USA         Population, total               1994    76700
USA         Population, female (% of total) 1993    50.52691109
USA         Population, female (% of total) 1994    50.57235984
USA         GDP per capita (const 2005 US$) 1994    23086.93795
USA         Population, female (% of total) 1988    50.91933134
USA         Population, total               1988    61077

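I want to draw a scatter plot between two things: absolute female population and GDP per capita (const 2005 US$). Absolute female population = Population, total * Population, female (% of total), with the percentage divided by 100.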
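As a quick check of the formula, here is a minimal sketch using the 1993 USA rows from the sample above. It assumes the female share is stored as a percentage, so it is divided by 100. The function name absolute_female_population is made up for illustration.

```python
def absolute_female_population(total_population, female_percent):
    """Convert a female share given in percent into an absolute head count."""
    return total_population * female_percent / 100.0


# Values taken from the 1993 USA rows in the sample table.
abs_female_1993 = absolute_female_population(72498, 50.52691109)
print(round(abs_female_1993))
```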

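The challenge is as follows:

a) For a given country, total population, female population and GDP values exist for different sets of years. For the USA, for example, suppose total population values exist for only 20 years, female population figures for 18 years, and GDP values for only 10 years.

There are no NaN/null values.

I need only the values for which all of these parameters exist for a given country in a given year.
I am new to Python, so I can't express what I want in code. Can anyone please help: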

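    import numpy as np
    import pandas as pd

    data = pd.read_csv('Indicators.csv')  # path assumed from the data source above

    # Rows for each of the three indicators
    femalepop_filter = data['IndicatorName'].str.contains('Population, female')
    FemalePop = data[femalepop_filter]
    Pop_total = data['IndicatorName'].str.contains('Population, total')
    Pop_Tot = data[Pop_total]
    # Raw string: str.contains treats the pattern as a regex, so "(" is escaped
    hist_indicator = r'GDP per capita \(const 2005'
    GDP_Filter = data['IndicatorName'].str.contains(hist_indicator)
    GDPValues = data[GDP_Filter]

    # Country codes that appear for all three indicators
    c1 = FemalePop['CountryCode']
    c2 = GDPValues['CountryCode']
    c3 = Pop_Tot['CountryCode']
    c4 = np.intersect1d(c1, c2)
    c5 = np.intersect1d(c3, c4)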

I have captured the country codes for each parameter, and their intersection is now in c5. Can someone help me get the data for the country codes in c5?
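The error is telling you that Python does not know how to combine (&) a string and a boolean variable.
Convert the bool to a string and your concatenation should work.

In general, debug your code step by step. Start by looking at what your variables contain. You can use Python's "pretty print" (pprint) module for that: print out the various variables so you can see what is in them.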
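To illustrate the debugging advice, here is a minimal sketch using the standard-library pprint module. The sample dictionary is invented for illustration. For a pandas DataFrame, print(df.head()) or df.info() would be the more usual inspection tools.

```python
from pprint import pformat, pprint

# Hypothetical nested structure standing in for "a variable you want to inspect".
sample = {
    "USA": {
        "Population, total": {1993: 72498, 1994: 76700},
        "Population, female (% of total)": {1993: 50.52691109, 1994: 50.57235984},
    }
}

pprint(sample)          # pretty-prints nested containers across several lines
text = pformat(sample)  # same formatting, returned as a string instead of printed
print(type(sample["USA"]).__name__)  # checking the type is often the quickest clue
```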
Try something like

    data[data['CountryCode'].isin(c5)]
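A minimal sketch of what that filter does, on a tiny frame in the same long format as Indicators.csv. The CAN and MEX rows and the contents of c5 are made up for illustration.

```python
import numpy as np
import pandas as pd

# Tiny frame in the same long format as Indicators.csv.
# The CAN and MEX rows are invented; the USA rows come from the sample.
data = pd.DataFrame({
    "CountryCode": ["USA", "USA", "CAN", "MEX"],
    "IndicatorName": [
        "Population, total",
        "GDP per capita (const 2005 US$)",
        "Population, total",
        "Population, total",
    ],
    "Year": [1993, 1994, 1993, 1993],
    "Value": [72498.0, 23086.93795, 100.0, 200.0],
})

# Stand-in for the intersection of country codes computed in the question.
c5 = np.array(["USA", "MEX"])

# isin builds a boolean mask that is True where CountryCode is one of c5;
# indexing with the mask keeps only those rows.
subset = data[data["CountryCode"].isin(c5)]
print(subset)
```

Note that this filters on country code only. It does not by itself guarantee that all three indicators exist for the same year.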
I found the answer:

    # All rows whose country code is in the intersection c5.
    data2 = data[data['CountryCode'].isin(c5)].copy()

    # New column: country code concatenated with year, so that rows for the
    # same country and the same year can be matched.
    data2['concatyearandCC'] = data2['CountryCode'] + data2['Year'].map(str)

    # Not shown in the original post; assumed to be the earlier indicator
    # filters re-applied to data2.
    FemalePop2 = data2[data2['IndicatorName'].str.contains('Population, female')]
    Pop_Tot2 = data2[data2['IndicatorName'].str.contains('Population, total')]
    GDPValues2 = data2[data2['IndicatorName'].str.contains(r'GDP per capita \(const 2005')]

    # Merge female population %, total population and GDP on the combined key.
    c9 = pd.merge(FemalePop2, Pop_Tot2, on='concatyearandCC')
    c10 = pd.merge(c9, GDPValues2, on='concatyearandCC')

    # Rename some columns for ease.
    c10 = c10.rename(columns={'Value_x': 'Population_female%',
                              'Value_y': 'Population Total',
                              'Value': 'GDP Per capita'})

    # Absolute female population (the share is a percentage, hence / 100).
    c10['Abs_Female_Pop'] = c10['Population_female%'] * c10['Population Total'] / 100
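As an alternative sketch of the same idea, pandas can merge on the two key columns directly, which avoids building a concatenated string key. An inner join keeps only the country and year pairs present for all three indicators. The helper names select and plot_scatter and the new column names are made up for illustration; the data is the sample from the question.

```python
import pandas as pd

# Sample rows from the question, in the long format of Indicators.csv.
data = pd.DataFrame({
    "CountryCode": ["USA"] * 7,
    "IndicatorName": [
        "Population, total", "Population, total",
        "Population, female (% of total)", "Population, female (% of total)",
        "GDP per capita (const 2005 US$)",
        "Population, female (% of total)", "Population, total",
    ],
    "Year": [1993, 1994, 1993, 1994, 1994, 1988, 1988],
    "Value": [72498, 76700, 50.52691109, 50.57235984,
              23086.93795, 50.91933134, 61077],
})


def select(indicator, new_name):
    """Rows for one indicator, with the Value column renamed to new_name."""
    rows = data[data["IndicatorName"] == indicator]
    return rows[["CountryCode", "Year", "Value"]].rename(columns={"Value": new_name})


keys = ["CountryCode", "Year"]
# Inner joins drop every (country, year) pair missing from any of the three.
merged = (
    select("Population, total", "pop_total")
    .merge(select("Population, female (% of total)", "female_pct"), on=keys, how="inner")
    .merge(select("GDP per capita (const 2005 US$)", "gdp_per_capita"), on=keys, how="inner")
)
merged["abs_female_pop"] = merged["pop_total"] * merged["female_pct"] / 100


def plot_scatter(frame):
    """Scatter plot of absolute female population against GDP per capita."""
    import matplotlib.pyplot as plt  # imported here so the merge runs without matplotlib
    ax = frame.plot.scatter(x="gdp_per_capita", y="abs_female_pop")
    plt.show()
    return ax


print(merged)
```

With the sample rows, only 1994 has all three indicators, so merged holds a single row. Calling plot_scatter(merged) would draw the plot, and both columns are guaranteed to have the same length because they come from the same rows.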

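If you have missing data, you could fill in zeros. That would even out the data, and you should be able to visualize it.

Hi, the problem is that there are no NaN values. The rows exist only for the years in which one parameter was recorded, and not for the years in which only another parameter was recorded.

So what do the rows look like where the data does not exist?

Suppose no GDP value is given for 1993, while population values (total and female) are given for 1993. Then the GDP row for 1993 simply does not exist. This is a database (not a form). For a given country, not every parameter is reported for every year. Please check now.

I don't think concatenation solves my problem. How do I get the country codes for which all 3 parameters exist?

You could create a dictionary indexed by year. Something like

    data = {}; data['1999'] = {'country': 'usa', 'population': 7, 'female_percent': 50}

You need to do a lot more than that, but that is the idea. Then, when you need to access the values, first test whether they exist and act accordingly.

There are 6 million records. How do I create a dictionary for all of them? I thought of one approach. With c1 = FemalePop['CountryCode'], c2 = GDPValues['CountryCode'] and c3 = Pop_Tot['CountryCode'], how do I get the intersection of all these country codes? The idea is to get the list of country codes for which all three values exist. If I try c1 & c2 & c3, I get the error TypeError: unsupported operand type(s) for &: 'str' and 'bool'.

You can use

    'something' in GDPValues

to determine whether the value has been set (in Python 3 this replaces GDPValues.has_key('something'), which was removed).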
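The dictionary idea from the comments can be sketched in plain Python as follows. This is an illustration under assumptions, not the commenter's code: it keys the dictionary by (country, year) rather than by year alone, and the names rows, REQUIRED, by_key and complete are made up. The rows are the sample from the question.

```python
from collections import defaultdict

# Rows as (CountryCode, IndicatorName, Year, Value), taken from the sample.
rows = [
    ("USA", "Population, total", 1993, 72498),
    ("USA", "Population, total", 1994, 76700),
    ("USA", "Population, female (% of total)", 1993, 50.52691109),
    ("USA", "Population, female (% of total)", 1994, 50.57235984),
    ("USA", "GDP per capita (const 2005 US$)", 1994, 23086.93795),
    ("USA", "Population, female (% of total)", 1988, 50.91933134),
    ("USA", "Population, total", 1988, 61077),
]

REQUIRED = {
    "Population, total",
    "Population, female (% of total)",
    "GDP per capita (const 2005 US$)",
}

# One pass over the rows: map (country, year) -> {indicator: value}.
by_key = defaultdict(dict)
for country, indicator, year, value in rows:
    by_key[(country, year)][indicator] = value

# Keep only the (country, year) pairs where every required indicator exists.
complete = {
    key: values for key, values in by_key.items() if REQUIRED.issubset(values)
}

# Membership is tested with "in", the Python 3 replacement for has_key.
print(("USA", 1994) in complete, ("USA", 1993) in complete)
```

Building the dictionary takes a single pass over the records, so the approach is not ruled out by the record count alone, although for millions of rows a pandas merge or pivot is likely to be more convenient.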