Python: pull a list out of a cell and use the list elements as new columns
I'm trying to create a DataFrame that contains the name of each election race, the result (the Republican minus Democrat popular vote margin), and the vote difference for each poll. My code so far is:
import pandas as pd

def results_polls_diff(editinfo, polls):
    rows = []
    for i, election in enumerate(editinfo):
        polls_key = election['slug']
        this_election = polls[polls_key]
        npolls = this_election.shape[0]
        # .ix was removed in pandas 1.0; .iloc selects by position
        diff = (this_election[candidates['R'].iloc[i]] - this_election[candidates['D or I'].iloc[i]])/100
        for c in election['estimates']:
            if c['party'] == 'Rep':
                r1 = c['value']
        for c in election['estimates']:
            if c['party'] == 'Dem' or c['party'] == 'ind':
                r2 = c['value']
        result = (r1-r2)/100
        #init_rows = []
        #for d in diff:
        #    init_rows.append((polls_key, result, d))
        #return init_rows
        rows.append((polls_key, result, [d for d in diff]))
    return rows
result_df = pd.DataFrame(results_polls_diff(editinfo, polls), columns = ['race', 'result', 'diff_list'])
result_df.head()
Output:
race result diff_list
0 2014-delaware-senate-wade-vs-coons -0.220 [-0.18, -0.16, -0.25, -0.15]
1 2014-massachusetts-senate-herr-vs-markey -0.207 [-0.2, -0.15, -0.16, -0.25, -0.22, -0.26, -0.2...
2 2014-rhode-island-senate-zaccaria-vs-reed -0.207 [-0.45, -0.42, -0.35]
3 2014-montana-senate-daines-vs-curtis 0.177 [0.14, 0.18, 0.16, 0.21, 0.13]
4 2014-hawaii-senate-cavasso-vs-schatz -0.477 [-0.52, -0.26, -0.51, -0.54, -0.37, -0.32]
What I'm after is something more like this:
race result diff_list
0 2014-delaware-senate-wade-vs-coons -0.22 -0.18
1 2014-delaware-senate-wade-vs-coons -0.22 -0.16
2 2014-delaware-senate-wade-vs-coons -0.22 -0.25
3 2014-delaware-senate-wade-vs-coons -0.22 -0.15
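If I use the commented-out part of the code and change the append to

rows.append((init_rows))

I get this result, but it no longer seems to iterate over all of editinfo. So the solution I'm looking for is either a way to get the iteration working, or a way to pull the list out of the diff_list column so that each element occupies its own cell in that column, with the rest of the row duplicated.

Here is one strategy. Consider df: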
import pandas as pd

df = pd.DataFrame(dict(A=list('ab'), B=[1, 2], C=[[1, 2, 3], [4, 5, 6]]))
df
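On pandas 0.25 or later, `DataFrame.explode` produces the same long format in a single call. This is a minimal sketch on the same toy `df`; it swaps in `explode` instead of the stack-based options and assumes pandas 0.25+.

```python
import pandas as pd

# Same toy frame as above: column C holds a list in every row
df = pd.DataFrame(dict(A=list('ab'), B=[1, 2], C=[[1, 2, 3], [4, 5, 6]]))

# explode() emits one row per list element and repeats A and B;
# it keeps the original row labels (0, 0, 0, 1, 1, 1), so renumber them
out = df.explode('C').reset_index(drop=True)

# The exploded column keeps object dtype; cast it back to integers
out['C'] = out['C'].astype(int)
print(out)
```

Because `explode` works row by row, lists of different lengths (like the question's `diff_list`) are fine too, e.g. `result_df.explode('diff_list')`.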
Option 1: using set_index, apply, and stack

df.set_index(['A', 'B']).C.apply(pd.Series).stack().reset_index(['A', 'B'], name='C')
Option 2: build a new MultiIndex and DataFrame, then stack

names = ['A', 'B']
idx = pd.MultiIndex.from_tuples(df[names].values.tolist(), names=names)
pd.DataFrame(df.C.tolist(), idx).stack().reset_index(names, name='C')
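For the other route the question mentions (making the loop itself produce one row per poll): assuming the commented-out `return init_rows` sat inside the `for` loop (the post's indentation was lost, so this is an assumption), it would end the function after the first election, which matches the described symptom. Below is a minimal sketch that appends one tuple per poll to `rows` and returns only after the loop. Passing `candidates` as a parameter is a change from the original, which reads it as a global, and the toy `editinfo`, `polls`, and `candidates` values (names such as `race-a`, `Smith`, `Jones`) are hypothetical, made up only so the block runs.

```python
import pandas as pd

def results_polls_diff(editinfo, polls, candidates):
    """Return one (race, result, diff) tuple per poll, for every election."""
    rows = []
    for i, election in enumerate(editinfo):
        polls_key = election['slug']
        this_election = polls[polls_key]
        # Poll margin (R minus D/I), one value per poll, as a fraction
        diff = (this_election[candidates['R'].iloc[i]]
                - this_election[candidates['D or I'].iloc[i]]) / 100
        for c in election['estimates']:
            if c['party'] == 'Rep':
                r1 = c['value']
            elif c['party'] in ('Dem', 'ind'):
                r2 = c['value']
        result = (r1 - r2) / 100
        # One tuple per poll; keep looping instead of returning here
        for d in diff:
            rows.append((polls_key, result, d))
    return rows  # return only after every election has been processed


# Hypothetical toy inputs, made up only so the sketch runs
candidates = pd.DataFrame({'R': ['Smith', 'Doe'], 'D or I': ['Jones', 'Roe']})
editinfo = [
    {'slug': 'race-a', 'estimates': [{'party': 'Rep', 'value': 40.0},
                                     {'party': 'Dem', 'value': 60.0}]},
    {'slug': 'race-b', 'estimates': [{'party': 'Rep', 'value': 55.0},
                                     {'party': 'ind', 'value': 45.0}]},
]
polls = {
    'race-a': pd.DataFrame({'Smith': [38, 40], 'Jones': [56, 55]}),
    'race-b': pd.DataFrame({'Doe': [53, 57, 50], 'Roe': [44, 41, 48]}),
}

result_df = pd.DataFrame(results_polls_diff(editinfo, polls, candidates),
                         columns=['race', 'result', 'diff_list'])
print(result_df)
```

Every election contributes as many rows as it has polls, and `race` and `result` repeat on each of them, which is the target layout from the question.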