Python pandas: How do I normalize a COVID-19 DataFrame when different countries have different outbreak dates?


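To compare regions meaningfully, I want to normalize COVID-19 confirmed cases by each country's outbreak start date. For any region, the day on which it reaches or exceeds 10 confirmed cases is treated as "day 0 of the outbreak".

Example DataFrame: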

[in]
import pandas as pd
confirmed_cases = {'Date':['1/22/20', '1/23/20', '1/24/20', '1/25/20', '1/26/20'], 'Australia':[0, 0, 0, 30, 50], 'Albania':[0, 20, 25, 30, 50], 'Algeria':[25, 40, 50, 50, 70]}
df = pd.DataFrame(confirmed_cases)
df

[out]
    Date    Australia   Albania     Algeria
0   1/22/20        0         0          25
1   1/23/20        0        20          40
2   1/24/20        0        25          50
3   1/25/20       30        30          50
4   1/26/20       50        50          70

Expected result:

    Day Since Outbreak     Australia    Albania     Algeria
0           0                    30         20          25
1           1                    50         25          40
2           2                   NaN         30          50
3           3                   NaN         50          50
4           4                   NaN        NaN          70

Is there a way to do this with a few simple lines of Python/pandas code?

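Determine how many rows each column needs to be shifted, based on its initial run of values < 10, then shift the columns. cummin ensures that an intermittent value < 10 later in the series is not counted toward the shift.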

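df = df.drop(columns='Date')  # Won't need the dates after alignment
s = df.lt(10).cummin().sum()  # Length of each column's leading run of values < 10

# Series.iteritems() was removed in pandas 2.0; items() works in old and new versions
for col, shift in s.items():
    df[col] = df[col].shift(-shift)

df['Days Since'] = range(len(df))  # Redundant with the index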
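
To see why cummin matters in the snippet above, here is a minimal sketch on a made-up single-column frame (the column name `Region` and its values are assumptions for illustration, not part of the question's data). The counts dip back below 10 after the outbreak has started, which is the case a plain count of sub-threshold days gets wrong.

```python
import pandas as pd

# Hypothetical data: the count drops below 10 again on the fourth day.
cases = pd.DataFrame({"Region": [0, 5, 12, 8, 30]})

below = cases.lt(10)        # True, True, False, True, False
leading = below.cummin()    # True, True, False, False, False (stays False after the first False)
shifts = leading.sum()      # number of leading sub-threshold days per column

# Without cummin, the later dip would also be counted.
naive_shifts = below.sum()

aligned = cases["Region"].shift(-shifts["Region"])

print(shifts["Region"], naive_shifts["Region"])
print(aligned.tolist())
```

With cummin the column is shifted by 2 rows, so day 0 is the row holding 12; the naive count would shift by 3 and drop that row.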


Find the index of the first value that reaches the threshold (10) for each country, and shift each column up by that amount.

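# idxmax returns the first index label where the condition is True; with the
# default RangeIndex that label is also the number of rows to shift.
# >= 10 matches the question's "reaches or exceeds 10 confirmed cases".
df2 = df[['Australia', 'Albania', 'Algeria']].apply(lambda x: x.shift(-(x >= 10).idxmax()))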
# df2
   Australia  Albania  Algeria
0       30.0     20.0       25
1       50.0     25.0       40
2        NaN     30.0       50
3        NaN     50.0       50
4        NaN      NaN       70

Reset the index to get the days-since-outbreak column:

df2.reset_index().rename(columns={'index': 'Day Since Outbreak'})

   Day Since Outbreak  Australia  Albania  Algeria
0                   0       30.0     20.0       25
1                   1       50.0     25.0       40
2                   2        NaN     30.0       50
3                   3        NaN     50.0       50
4                   4        NaN      NaN       70
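
Two edge cases are worth keeping in mind with the idxmax approach: idxmax returns an index label rather than a position, and when no value meets the threshold it returns the first label, so such a column is left unshifted. The following is a sketch of one way to handle both, not the answerer's method. The helper name `align_to_outbreak`, the `Quiet` column, and the date-indexed frame are assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def align_to_outbreak(df, threshold=10):
    """Shift each column so row 0 is its first day with >= threshold cases."""
    aligned = {}
    for col in df.columns:
        reached = (df[col] >= threshold).to_numpy()
        if not reached.any():
            # The column never reaches the threshold, so it has no day 0.
            aligned[col] = pd.Series(np.nan, index=range(len(df)))
            continue
        first = int(reached.argmax())  # position, independent of index labels
        aligned[col] = df[col].shift(-first).reset_index(drop=True)
    out = pd.DataFrame(aligned)
    out.index.name = "Day Since Outbreak"
    return out


# Hypothetical frame indexed by date, with one region that never reaches 10.
dates = pd.to_datetime(
    ["1/22/20", "1/23/20", "1/24/20", "1/25/20", "1/26/20"], format="%m/%d/%y"
)
cases = pd.DataFrame(
    {
        "Australia": [0, 0, 0, 30, 50],
        "Albania": [0, 20, 25, 30, 50],
        "Quiet": [0, 1, 2, 3, 4],
    },
    index=dates,
)

result = align_to_outbreak(cases)
print(result)
```

Here `Australia` and `Albania` line up as in the expected result, while `Quiet` comes back entirely NaN instead of appearing to start its outbreak on the first date.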

Nice use of idxmax, +1.
@ScottBoston, thanks. I have indeed been playing with COVID-19 data over the weekend and did this just recently.