Python: output results in the form of a table
My dataset looks like this:
print(df)
name      points  attempts
'Alex'    2       4
'Brian'   1       2
'Cathy'   3       5
'Daniel'  5       7
Suppose I have some code of the form
for name in df:
    if points > 2:
        grade = 'pass'
    else:
        grade = 'fail'
    average_points = points/attempts
    attempts_left = 10 - attempts
What I want to achieve here is an output table (in a dataframe) of the following form
name      grade  average_points  attempts_left
'Alex'    fail   0.5             6
'Brian'   fail   0.5             8
'Cathy'   pass   0.6             5
'Daniel'  pass   0.71            3
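The problem is that I am not sure which return/append functions I should use in the code. I also know that it might be simpler to add columns for 'grade', 'average_points' and 'attempts_left' to my original dataset, but that approach does not work in my case, because my original data is more complex than the working example above.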
Any help would be much appreciated. Thanks!
Use pandas.DataFrame() and pandas.concat (this answer originally called df.append, which was removed in pandas 2.0):
import pandas

frames = []  # one single-row DataFrame per input row
for i, row in df.iterrows():
    points = row["points"]
    attempts = row["attempts"]
    new_row = {}
    new_row["name"] = row["name"]
    if points > 2:
        new_row["grade"] = 'pass'
    else:
        new_row["grade"] = 'fail'
    new_row["average_points"] = points / attempts
    new_row["attempts_left"] = 10 - attempts
    frames.append(pandas.DataFrame(new_row, index=[i]))
# Concatenate once at the end instead of appending inside the loop
df2 = pandas.concat(frames)
print(df2)
Output:
     name grade  average_points  attempts_left
0    Alex  fail        0.500000              6
1   Brian  fail        0.500000              8
2   Cathy  pass        0.600000              5
3  Daniel  pass        0.714286              3
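As a variation on the loop above, the per-row logic can live in a small function that returns a dict, and the new dataframe can then be built in one step from the list of dicts. This is only a sketch: `summarize_row` is a hypothetical helper name, the sample frame mirrors the question's data, and the pass threshold (more than 2 points) and the limit of 10 attempts are taken from the question's pseudocode.

```python
import pandas as pd

# Sample data mirroring the question's dataset
df = pd.DataFrame(
    {
        "name": ["Alex", "Brian", "Cathy", "Daniel"],
        "points": [2, 1, 3, 5],
        "attempts": [4, 2, 5, 7],
    }
)


def summarize_row(row):
    """Hypothetical helper: compute the derived values for one input row."""
    return {
        "name": row.name,
        "grade": "pass" if row.points > 2 else "fail",
        "average_points": row.points / row.attempts,
        "attempts_left": 10 - row.attempts,
    }


# itertuples yields one namedtuple per row; index=False drops the index field
rows = [summarize_row(row) for row in df.itertuples(index=False)]

# Build the output dataframe once, from the collected dicts
df2 = pd.DataFrame(rows)
print(df2)
```

Because the function only sees one row at a time, it can be replaced by more involved logic without touching the code that assembles the table.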
You can vectorize the operations and use assign (np here is numpy, imported as np):
In [839]: df.assign(attempts_left=10 - df.attempts,
     ...:           average_points=df.points / df.attempts,
     ...:           grade=np.where(df.points > 2, 'pass', 'fail'))
Out[839]:
       name  points  attempts  attempts_left  average_points grade
0    'Alex'       2         4              6        0.500000  fail
1   'Brian'       1         2              8        0.500000  fail
2   'Cathy'       3         5              5        0.600000  pass
3  'Daniel'       5         7              3        0.714286  pass
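Since assign returns a new dataframe and leaves the original untouched, the same idea can produce exactly the output table the question asks for by selecting only the wanted columns afterwards. A self-contained sketch, with the sample frame rebuilt from the question's data and `new_df` as an illustrative name:

```python
import numpy as np
import pandas as pd

# Sample data mirroring the question's dataset
df = pd.DataFrame(
    {
        "name": ["Alex", "Brian", "Cathy", "Daniel"],
        "points": [2, 1, 3, 5],
        "attempts": [4, 2, 5, 7],
    }
)

# assign() returns a copy with the extra columns; df itself is not modified.
# The trailing column selection keeps only the columns of the desired output.
new_df = df.assign(
    grade=np.where(df["points"] > 2, "pass", "fail"),
    average_points=df["points"] / df["attempts"],
    attempts_left=10 - df["attempts"],
)[["name", "grade", "average_points", "attempts_left"]]

print(new_df)
```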
Use apply:
import pandas as pd
df = pd.DataFrame([
    ['Alex', 2, 4],
    ['Brian', 1, 2],
    ['Cathy', 3, 5],
    ['Daniel', 5, 7],
], columns=['name', 'points', 'attempts'])
# More than 2 points counts as a pass
df['grade'] = df['points'].apply(lambda points: 'pass' if points > 2 else 'fail')
# Attempts remaining out of 10
df['attempts_left'] = df['attempts'].apply(lambda attempts: 10 - attempts)
df['average_points'] = df[['points', 'attempts']].apply(lambda row: row['points']/row['attempts'], axis=1)
# Keep only the columns wanted in the output table
new_df = df[['name', 'grade', 'average_points', 'attempts_left']]
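If the original dataframe should not gain extra columns, a row-wise apply can also return all derived values at once: when the function passed to apply with axis=1 returns a pandas Series, the result is a new dataframe whose columns come from that Series. A sketch under the question's rules; `derive` is a hypothetical function name.

```python
import pandas as pd

# Sample data mirroring the question's dataset
df = pd.DataFrame([
    ['Alex', 2, 4],
    ['Brian', 1, 2],
    ['Cathy', 3, 5],
    ['Daniel', 5, 7],
], columns=['name', 'points', 'attempts'])


def derive(row):
    """Hypothetical helper: build one output row from one input row."""
    return pd.Series({
        'name': row['name'],
        'grade': 'pass' if row['points'] > 2 else 'fail',
        'average_points': row['points'] / row['attempts'],
        'attempts_left': 10 - row['attempts'],
    })


# Each returned Series becomes one row of new_df; df is left unchanged
new_df = df.apply(derive, axis=1)
print(new_df)
```

This is slower than the vectorized version on large data, but it keeps arbitrary per-row logic in one place.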
Thanks for your help, John! However, I am looking for something along the lines of df2.append('grade'/'average_points'/'attempts_left') into a new dataframe, with each name as a separate row. More about my original data: it is indexed by a time series and covers all of one player's attempts over a month. So in this example I am actually computing the player's daily results (grade, average points, attempts left) and outputting a table in which each row represents one day of the month. I am not sure whether I have been clear enough, so feel free to ask for more information. Cheers.
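For the time-indexed case described in this comment, one possible approach is to aggregate the attempts per day first and then derive the result columns from the daily totals. The sketch below is built on assumptions that the thread does not confirm: the log has one row per attempt, a DatetimeIndex, and a single `points` column; `attempt_log`, `daily` and `results` are illustrative names, and the sample values are made up.

```python
import numpy as np
import pandas as pd

# Hypothetical attempt log: one row per attempt, indexed by timestamp
attempt_log = pd.DataFrame(
    {"points": [1, 0, 1, 0, 1, 1, 1, 0]},
    index=pd.to_datetime([
        "2024-01-01 09:00", "2024-01-01 10:00",
        "2024-01-01 11:00", "2024-01-01 12:00",
        "2024-01-02 09:00", "2024-01-02 10:00",
        "2024-01-02 11:00", "2024-01-02 12:00",
    ]),
)

# Group by calendar day: total points and number of attempts per day.
# normalize() sets every timestamp to midnight of its day.
daily = attempt_log.groupby(attempt_log.index.normalize())["points"].agg(
    points="sum", attempts="count"
)

# One output row per day, using the rules from the question
results = pd.DataFrame(
    {
        "grade": np.where(daily["points"] > 2, "pass", "fail"),
        "average_points": daily["points"] / daily["attempts"],
        "attempts_left": 10 - daily["attempts"],
    },
    index=daily.index,
)
print(results)
```

Grouping on the normalized index only produces rows for days that actually have attempts, which avoids dividing by zero on empty days.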