Python: create new columns from the values in the rows of an existing column
I am working in Python with Pandas and have run into the following problem. I have a DataFrame with a large number of rows describing cryptocurrency data for each date. After the last date is reached, a new time series starts for another cryptocurrency, and all time series share the same columns. I am looking for a way to reshape the DataFrame so that, for each token_date, the data for all cryptocurrencies appears in a single row, which means the total number of rows would equal the number of distinct token_date values. Currently the df looks like this:
token_id token_caption token_date token_price_usd token_marketcap_usd
64 WAN Wanchain 2019-06-24 0.3817 40414601.0
64 WAN Wanchain 2019-07-01 0.3644 38683920.0
64 WAN Wanchain 2019-07-08 0.3557 37759781.0
64 WAN Wanchain 2019-07-15 0.2625 27824362.0
64 WAN Wanchain 2019-07-22 0.2545 27036722.0
...
57 MAID 2017-07-24 0.3775 170824959.0
57 MAID 2017-07-31 0.2917 132012254.0
57 MAID 2017-08-07 0.3589 162410652.0
57 MAID 2017-08-14 0.3763 170283706.0
57 MAID 2017-08-21 0.4615 208873303.0
...
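I am looking for code that achieves something like this (the column split would be performed about 100 times, ending up with roughly 201 columns).
Any help would be much appreciated. I am a beginner in Python and have no idea how to implement this.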
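Thanks, everyone!
If you set the index to
['token_date', 'token_caption']
and unstack the caption so that it becomes a column level, you get a very clean set of MultiIndex
columns containing what you are looking for: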
In [144]: df
Out[144]:
token_id token_caption token_date token_price_usd token_marketcap_usd
0 64 WAN Wanchain 2019-06-24 0.3817 40414601.0
1 64 WAN Wanchain 2019-07-01 0.3644 38683920.0
2 64 WAN Wanchain 2019-07-08 0.3557 37759781.0
3 64 WAN Wanchain 2019-07-15 0.2625 27824362.0
4 64 WAN Wanchain 2019-07-22 0.2545 27036722.0
5 57 MAID 2019-06-24 0.3775 170824959.0
6 57 MAID 2019-07-01 0.2917 132012254.0
7 57 MAID 2019-07-08 0.3589 162410652.0
8 57 MAID 2019-07-15 0.3763 170283706.0
9 57 MAID 2019-07-22 0.4615 208873303.0
In [145]: df.set_index(["token_date", "token_caption"])[["token_price_usd", "token_marketcap_usd"]].unstack().swaplevel(axis=1)
Out[145]:
token_caption MAID WAN Wanchain MAID WAN Wanchain
token_price_usd token_price_usd token_marketcap_usd token_marketcap_usd
token_date
2019-06-24 0.3775 0.3817 170824959.0 40414601.0
2019-07-01 0.2917 0.3644 132012254.0 38683920.0
2019-07-08 0.3589 0.3557 162410652.0 37759781.0
2019-07-15 0.3763 0.2625 170283706.0 27824362.0
2019-07-22 0.4615 0.2545 208873303.0 27036722.0
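The session above can be reproduced as a standalone script. This is only a sketch: the frame is rebuilt by hand from the rows shown, and the names value_cols, wide and grouped are introduced here for illustration and are not part of the original answer.

```python
import pandas as pd

# Sample rows copied from the session above (two tokens sharing the same dates).
dates = ["2019-06-24", "2019-07-01", "2019-07-08", "2019-07-15", "2019-07-22"]
df = pd.DataFrame({
    "token_id": [64] * 5 + [57] * 5,
    "token_caption": ["WAN Wanchain"] * 5 + ["MAID"] * 5,
    "token_date": dates * 2,
    "token_price_usd": [0.3817, 0.3644, 0.3557, 0.2625, 0.2545,
                        0.3775, 0.2917, 0.3589, 0.3763, 0.4615],
    "token_marketcap_usd": [40414601.0, 38683920.0, 37759781.0,
                            27824362.0, 27036722.0,
                            170824959.0, 132012254.0, 162410652.0,
                            170283706.0, 208873303.0],
})

value_cols = ["token_price_usd", "token_marketcap_usd"]  # illustrative name

wide = (
    df.set_index(["token_date", "token_caption"])[value_cols]
      .unstack("token_caption")  # captions move from the row index to the columns
      .swaplevel(axis=1)         # put the caption on the top column level
)

# Optional: sort the columns so each token's columns sit next to each other.
grouped = wide.sort_index(axis=1)

print(wide)
```

With about 100 tokens the same call would produce about 200 value columns, one row per token_date. Note that unstack raises a ValueError if any (token_date, token_caption) pair occurs more than once.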
Why not use pivot?
Given the data
token_id token_caption token_date token_price_usd token_marketcap_usd
64 WAN_Wanchain 2019-06-24 0.3817 40414601.0
64 WAN_Wanchain 2019-07-01 0.3644 38683920.0
64 WAN_Wanchain 2019-07-08 0.3557 37759781.0
64 WAN_Wanchain 2019-07-15 0.2625 27824362.0
64 WAN_Wanchain 2019-07-22 0.2545 27036722.0
57 MAID 2019-06-24 0.3775 170824959.0
57 MAID 2019-07-01 0.2917 132012254.0
57 MAID 2019-07-08 0.3589 162410652.0
57 MAID 2019-07-15 0.3763 170283706.0
57 MAID 2019-07-22 0.4615 208873303.0
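Note: I repeated the dates so that there are some matches.
df.pivot(index="token_date", columns="token_caption", values=["token_price_usd", "token_marketcap_usd"])
gives one row per token_date, with the value names on the top column level and token_caption on the second.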
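One thing worth knowing before choosing pivot: it refuses to reshape when an (index, columns) pair occurs more than once, whereas pivot_table aggregates such rows (mean by default). The sketch below uses a shortened copy of the data, and the duplicated row in dup is hypothetical, added only to trigger the error. Keyword arguments are used because recent pandas releases no longer accept positional arguments to pivot.

```python
import pandas as pd

# Shortened copy of the data above (two dates per token).
df = pd.DataFrame({
    "token_caption": ["WAN_Wanchain", "WAN_Wanchain", "MAID", "MAID"],
    "token_date": ["2019-06-24", "2019-07-01", "2019-06-24", "2019-07-01"],
    "token_price_usd": [0.3817, 0.3644, 0.3775, 0.2917],
    "token_marketcap_usd": [40414601.0, 38683920.0, 170824959.0, 132012254.0],
})
values = ["token_price_usd", "token_marketcap_usd"]

# Unique (token_date, token_caption) pairs: pivot works.
wide = df.pivot(index="token_date", columns="token_caption", values=values)

# Hypothetical duplicate: repeat the first row so one pair appears twice.
dup = pd.concat([df, df.iloc[[0]]], ignore_index=True)
try:
    dup.pivot(index="token_date", columns="token_caption", values=values)
    pivot_failed = False
except ValueError:
    pivot_failed = True  # pivot cannot reshape duplicate entries

# pivot_table aggregates the duplicates instead of failing.
agg = dup.pivot_table(index="token_date", columns="token_caption",
                      values=values, aggfunc="mean")
print(pivot_failed)
print(agg)
```

If the data is known to have exactly one row per token and date, pivot is the stricter choice because it surfaces unexpected duplicates instead of silently averaging them.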
I use pivot_table and construct new column names:
df = df.pivot_table(index="token_date", columns="token_caption", values=["token_price_usd", "token_marketcap_usd"])
token_marketcap_usd token_price_usd
token_caption MAID WAN Wanchain MAID WAN Wanchain
token_date
2017-07-24 170824959.0 NaN 0.3775 NaN
2017-07-31 132012254.0 NaN 0.2917 NaN
2017-08-07 162410652.0 NaN 0.3589 NaN
2017-08-14 170283706.0 NaN 0.3763 NaN
2017-08-21 208873303.0 NaN 0.4615 NaN
2019-06-24 NaN 40414601.0 NaN 0.3817
2019-07-01 NaN 38683920.0 NaN 0.3644
2019-07-08 NaN 37759781.0 NaN 0.3557
2019-07-15 NaN 27824362.0 NaN 0.2625
2019-07-22 NaN 27036722.0 NaN 0.2545
df.columns = [lev2 + " - " + lev1.split("_")[1].title() for lev1, lev2 in df.columns]
df.reindex(sorted(df.columns.values, reverse=True), axis=1)
WAN Wanchain - Price WAN Wanchain - Marketcap MAID - Price MAID - Marketcap
token_date
2017-07-24 NaN NaN 0.3775 170824959.0
2017-07-31 NaN NaN 0.2917 132012254.0
2017-08-07 NaN NaN 0.3589 162410652.0
2017-08-14 NaN NaN 0.3763 170283706.0
2017-08-21 NaN NaN 0.4615 208873303.0
2019-06-24 0.3817 40414601.0 NaN NaN
2019-07-01 0.3644 38683920.0 NaN NaN
2019-07-08 0.3557 37759781.0 NaN NaN
2019-07-15 0.2625 27824362.0 NaN NaN
2019-07-22 0.2545 27036722.0 NaN NaN
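Finally, you can apply reset_index to turn token_date back into a regular column.
For reference, the df.pivot call on the data with repeated dates gives: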
token_price_usd token_marketcap_usd
token_caption MAID WAN_Wanchain MAID WAN_Wanchain
token_date
2019-06-24 0.3775 0.3817 170824959.0 40414601.0
2019-07-01 0.2917 0.3644 132012254.0 38683920.0
2019-07-08 0.3589 0.3557 162410652.0 37759781.0
2019-07-15 0.3763 0.2625 170283706.0 27824362.0
2019-07-22 0.4615 0.2545 208873303.0 27036722.0
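The pivot_table steps, including the final reset_index, can be put together as one script. This is a sketch on a shortened copy of the original data (non-overlapping dates, so NaN values appear); the names wide and flat are introduced here for illustration.

```python
import pandas as pd

# Shortened copy of the question's data: the two tokens cover different dates.
df = pd.DataFrame({
    "token_caption": ["WAN Wanchain", "WAN Wanchain", "MAID", "MAID"],
    "token_date": ["2019-06-24", "2019-07-01", "2017-07-24", "2017-07-31"],
    "token_price_usd": [0.3817, 0.3644, 0.3775, 0.2917],
    "token_marketcap_usd": [40414601.0, 38683920.0, 170824959.0, 132012254.0],
})

wide = df.pivot_table(index="token_date", columns="token_caption",
                      values=["token_price_usd", "token_marketcap_usd"])

# Flatten the two column levels into names such as "MAID - Price".
wide.columns = [f"{caption} - {value.split('_')[1].title()}"
                for value, caption in wide.columns]

# Reverse-sort the names so each token's price comes before its market cap,
# then turn token_date from the index into an ordinary column.
flat = wide.reindex(sorted(wide.columns, reverse=True), axis=1).reset_index()

print(flat)
```

Dates on which a token has no data stay NaN in that token's columns, which is why the question's original frame produces the half-empty table shown earlier.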