Modifying/transforming a DataFrame in Python
I have a DataFrame that contains date values, categories randomly assigned as strings of one to three characters, and frequencies ['A', 'B', 'C'] (the freqA, freqB, and freqC columns). I want to reshape the initial DataFrame so that it has the date and each category as a column, with each category column holding the freqA value associated with that category and empty values left as None. How can I do this?
This is my df (I forgot to include the index):
+--------+----------+-------+-------+-------+
| Date | Category | freqA | freqB | freqC |
+--------+----------+-------+-------+-------+
| 2/1/19 | A | 2 | 89 | 7 |
+--------+----------+-------+-------+-------+
| 2/2/19 | B | 5 | 98 | 8 |
+--------+----------+-------+-------+-------+
| 2/3/19 | A | 10 | 100 | 12 |
+--------+----------+-------+-------+-------+
| 2/4/19 | A | 17 | 121 | 15 |
+--------+----------+-------+-------+-------+
| 2/5/19 | C | 21 | 133 | 25 |
+--------+----------+-------+-------+-------+
| 2/6/19 | C | 25 | 134 | 31 |
+--------+----------+-------+-------+-------+
This is my target df:
+------+-----------+-----------+-----------+-------------+
| Date | CategoryA | CategoryB | CategoryC | Category[a] |
+------+-----------+-----------+-----------+-------------+
| Date | freqA | freqA | freqA | freqA |
+------+-----------+-----------+-----------+-------------+
I am very new to Python and pandas, so I would really appreciate any help I can get.

Does this work for you:
# Working with a subset of your data
>>> import pandas as pd
>>> df = pd.DataFrame({'date': ['2/1/19', '3/1/19', '4/1/19', '5/1/19', '6/1/19'],
...                    'Category': ['A', 'B', 'A', 'A', 'C'],
...                    'freqA': [2, 5, 10, 17, 21],
...                    'freqB': [89, 98, 100, 121, 133]})
# Input data
>>> df
     date Category  freqA  freqB
0  2/1/19        A      2     89
1  3/1/19        B      5     98
2  4/1/19        A     10    100
3  5/1/19        A     17    121
4  6/1/19        C     21    133
# Use pivot to reshape the DataFrame, then rename the columns
>>> df1 = df.pivot(index='date', columns='Category', values='freqA')
>>> df1.columns = [f'Category{x}' for x in df1.columns.tolist()]
>>> print(df1)
Output:
        CategoryA  CategoryB  CategoryC
date
2/1/19        2.0        NaN        NaN
3/1/19        NaN        5.0        NaN
4/1/19       10.0        NaN        NaN
5/1/19       17.0        NaN        NaN
6/1/19        NaN        NaN       21.0
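
The target table in the question shows Date as an ordinary column, whereas pivot leaves it as the index. A minimal sketch, assuming the same sample data as above: calling reset_index() after the pivot moves the date back into a regular column. The name `wide` is only illustrative and does not appear in the original answer.

```python
import pandas as pd

# Same sample data as in the answer above
df = pd.DataFrame({
    'date': ['2/1/19', '3/1/19', '4/1/19', '5/1/19', '6/1/19'],
    'Category': ['A', 'B', 'A', 'A', 'C'],
    'freqA': [2, 5, 10, 17, 21],
    'freqB': [89, 98, 100, 121, 133],
})

# One column per category, holding that row's freqA value
wide = df.pivot(index='date', columns='Category', values='freqA')
wide.columns = [f'Category{c}' for c in wide.columns]

# Turn the 'date' index back into a regular column (illustrative addition)
wide = wide.reset_index()
print(wide)
```

Note that pivot raises an error if the same date appears more than once for one category, so this sketch assumes each (date, Category) pair is unique, as it is in the sample data.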
You can also use fillna to handle the NaN values. Here is an example:
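>>> df1 = df1.ffill()  # forward-fill; fillna(method=...) is deprecated in recent pandas versions
>>> df1 = df1.bfill()  # back-fill whatever is still missing at the top of each column
>>> df1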
        CategoryA  CategoryB  CategoryC
date
2/1/19        2.0        5.0       21.0
3/1/19        2.0        5.0       21.0
4/1/19       10.0        5.0       21.0
5/1/19       17.0        5.0       21.0
6/1/19       17.0        5.0       21.0
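
Since the question asks for empty values to stay empty rather than be filled, one possible alternative (not part of the original answer) is to skip the fill and convert the pivoted columns to pandas' nullable integer dtype. A rough sketch, assuming the same sample data; missing cells then display as <NA> and the frequencies remain integers instead of becoming floats:

```python
import pandas as pd

# Same sample data as in the answer above
df = pd.DataFrame({
    'date': ['2/1/19', '3/1/19', '4/1/19', '5/1/19', '6/1/19'],
    'Category': ['A', 'B', 'A', 'A', 'C'],
    'freqA': [2, 5, 10, 17, 21],
})

df1 = df.pivot(index='date', columns='Category', values='freqA')
df1.columns = [f'Category{x}' for x in df1.columns]

# Nullable integer dtype: missing cells stay missing (<NA>), nothing is filled in
df1 = df1.astype('Int64')
print(df1)
```

Whether <NA> is an acceptable stand-in for None depends on how the result is used downstream; that choice is an assumption of this sketch.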
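Can you show us what you have tried? Also, the target df looks odd: do you just want the string freqA in the columns, or the values of freqA?
@Grayrigel I need the values of freqA. I tried something similar with a new COVID-19 time-series dataset, but it is structured differently from this one. So far I have not written any code for this DataFrame.
I have added an answer; please let me know if it works for you.
Thank you very much, it worked!
Glad I could help. Good luck. Happy coding!! :)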