
Python Pandas: group by values in a list (concatenated)


I'm trying to group by the items in lists held within a DataFrame Series. The dataset in use is the Stack Overflow Developer Survey 2020 (survey_results_public.csv).

The layout is roughly as follows:

           ... LanguageWorkedWith ... ConvertedComp ...
Respondent
    1               Python;C              50000
    2                C++;C                70000

I want to use groupby on the unique values in the list of languages worked with and apply an averaging aggregator function to ConvertedComp, like:

LanguageWorkedWith
C++       70000
C         60000
Python    50000
I have actually managed to produce the expected output, but my solution feels somewhat rigid, and being new to pandas, I believe there is probably a better way.

My solution is as follows:

import pandas as pd

# read csv
sos = pd.read_csv("developer_survey_2020/survey_results_public.csv", index_col='Respondent')

# separate each string into a list of strings, disregarding unanswered responses
temp = sos["LanguageWorkedWith"].dropna().str.split(';')

# create a new DataFrame with the respondent index and rows populated with known languages
langs_known = pd.DataFrame(temp.tolist(), index=temp.index)

# stack columns as rows, dropping the old column names
stacked_responses = langs_known.stack().reset_index(level=1, drop=True)

# re-index the ConvertedComp series to match the stacked_responses dimension,
# then concatenate the two series column-wise
reindexed_pays = sos["ConvertedComp"].reindex(stacked_responses.index)
stacked_with_pay = pd.concat([stacked_responses, reindexed_pays], axis='columns')

# remove rows with no salary data and rename the columns
stacked_with_pay.dropna(how='any', inplace=True)
stacked_with_pay.columns = ["LWW", "Salary"]

# group by LWW and apply the median
lang_ave_pay = stacked_with_pay.groupby("LWW")["Salary"].median().sort_values(ascending=False).head()
Output:

LWW
Perl     76131.5
Scala    75669.0
Go       74034.0
Rust     74000.0
Ruby     71093.0
Name: Salary, dtype: float64
which matches the value calculated when selecting a specific language:

sos.loc[sos["LanguageWorkedWith"].str.contains('Perl').fillna(False), "ConvertedComp"].median()
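
As a sanity check (a small sketch, assuming sos and lang_ave_pay from the solution above are in scope), the same spot-check can be run over all five languages. Note that naive substring matching would conflate names like C with C++ or C#, but it is unambiguous for these five:

# compare each grouped median against a direct per-language calculation
for lang in lang_ave_pay.index:
    mask = sos["LanguageWorkedWith"].str.contains(lang, regex=False).fillna(False)
    print(lang, sos.loc[mask, "ConvertedComp"].median(), lang_ave_pay[lang])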


Any tips on how to improve this, or on functions that provide this behaviour, etc., would be much appreciated!

In a DataFrame containing only the target columns, split out the language names and combine them with the salary. The next step is to use melt to convert the data from wide (horizontal) to long (vertical) format. Then group by language name and take the median.
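
The answer's code block did not survive in this copy; what follows is a minimal sketch of the approach it describes, assuming the same sos DataFrame as in the question (split_langs and long_form are illustrative names):

import pandas as pd

sos = pd.read_csv("developer_survey_2020/survey_results_public.csv", index_col='Respondent')

# keep only the two columns of interest
target = sos[["LanguageWorkedWith", "ConvertedComp"]]

# split the ';'-concatenated names into one column per language (wide format)
split_langs = target["LanguageWorkedWith"].str.split(';', expand=True)
split_langs["ConvertedComp"] = target["ConvertedComp"]

# melt from wide to long: one (salary, language) pair per row,
# dropping padding cells and missing salaries
long_form = split_langs.melt(id_vars="ConvertedComp", value_name="Language").dropna()

# group by language name and take the median salary
lang_pay = long_form.groupby("Language")["ConvertedComp"].median().sort_values(ascending=False)
print(lang_pay.head())

A plain Series.explode on the split lists would reach the same long format in one step; melt simply goes via the intermediate wide frame.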


Nice! Expanding the list and using melt is definitely better. I believe the index reset is not strictly necessary, since melt ignores the index by default; even with ignore_index set to False, you still get the same result, just with the Respondent id as the index. Thanks again!

I noticed your point after posting, thank you very much. If my answer helped you, please accept it as the correct answer and give it a +1.
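
To illustrate the comment's point about the index (a small sketch building on the split_langs frame above; the ignore_index keyword requires pandas >= 1.1):

# keep the Respondent id as the index instead of discarding it
long_form = split_langs.melt(id_vars="ConvertedComp", value_name="Language",
                             ignore_index=False).dropna()

# the grouped medians are identical; only the row index differs
lang_pay = long_form.groupby("Language")["ConvertedComp"].median().sort_values(ascending=False)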