Python: setting DataFrame column names as the index (pandas)


python pandas


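I'm working with an Excel file that contains a set of gene names along with the number of times each one occurs per month over several years (if that makes sense). I currently use pandas to read the file and build a DataFrame.

Input: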

import pandas as pd
import plotly.express as px

df = pd.read_csv('genes.csv', sep = ',', header = None)
print(df)

print(df.set_index([]).stack().reset_index(name='Date'))
fig = px.line(df, title = 'Human Gene Occurances Per Month')
fig.show()

Output:

     0       1       2       3    ...     561      562      563      564
0    NaN  1971-1  1971-2  1971-3  ...  2017-9  2017-10  2017-11  2017-12
1  BRCA1       0       0       0  ...       0        0        0        0
2  BRCA2       0       0       0  ...       0        0        0        0
3   MAPK       0       0       0  ...       0        0        0        0

ValueError: Must pass non-zero number of levels/codes

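I know I want to plot this data, and I've been trying to figure out how to set the dates as the index, though I'm not entirely sure I need to. I saw some posts about using set_index, so I tried the code shown above. It just gives me an error.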


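I'm trying to use Plotly to create a chart for each gene, with the dates on the x-axis and the counts on the y-axis. Any help is greatly appreciated. Thanks, everyone.

Also, not all of the counts are zero; that is just how the truncated DataFrame happens to display when printed.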
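The ValueError reported in the question appears to come from `set_index([])` itself: an empty list gives pandas no column to build the index from. The sketch below reproduces this on a tiny made-up frame (the column name `gene` and the values are assumptions for illustration, not the asker's data) and contrasts it with a call that names a column.

```python
import pandas as pd

# Hypothetical miniature frame; 'gene' is an illustrative column name.
df = pd.DataFrame({"gene": ["BRCA1", "BRCA2"], "1971-1": [0, 1]})

message = None
try:
    # An empty key list leaves pandas with nothing to build an index from.
    df.set_index([])
except ValueError as exc:
    message = str(exc)
    print("set_index([]) failed:", message)

# Naming an existing column works as expected.
indexed = df.set_index("gene")
print(indexed)
```

In other words, `set_index` needs at least one column label (or array) to work with.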

import pandas as pd

#     0       1       2       3    ...     561      562      563      564
# 0    NaN  1971-1  1971-2  1971-3  ...  2017-9  2017-10  2017-11  2017-12
# 1  BRCA1       0       0       0  ...       0        0        0        0
# 2  BRCA2       0       0       0  ...       0        0        0        0
# 3   MAPK       0       0       0  ...       0        0        0        0

# Rebuild a small sample of the frame above from a dictionary.
# The empty top-left cell is stored as the string 'NaN' in this sample.
d = {'0': ['NaN', 'BRCA1', 'BRCA2'], '1': ['1971-1', 0, 0], '2': ['1971-2', 1, 0], '3': ['1971-3', 0, 1]}
df = pd.DataFrame(data=d)
df = df.transpose()    # after transposing, each gene's time series sits in its own column
print(df)

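pandas offers more ways to solve a problem than you can keep in your head. Unless you work with it eight hours a day, you will forget them. I manage this by keeping a few complete examples in a personal wiki, so that when I forget something I can look it up quickly.

In general, promote the first row to the column labels, drop that row, and set the column you want as the index. For your DataFrame, transpose first so that the dates become rows. Your plot is produced in the simplest way possible, with the pandas plotting backend set to plotly. It looks a little odd only because the dataset you provided is so limited; I added some dummy data so that the different traces can be told apart. Go ahead and try it with your real-world data, and I'm pretty sure it will look great.

Complete code:

See how to create a minimal verifiable example. The best way to get help is to construct a sample DataFrame in your code.

Is this more helpful? I'm not quite sure how to add a sample df. Currently reading the MRE article.

For an example of how to quickly build a sample DataFrame from a dictionary, see my answer. It comes straight from the pandas documentation on DataFrames and is just one of the ways to build a DataFrame.

@LaurenKirsch If you read this, you will learn how to share a DataFrame sample within a few minutes.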
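One common way to share a sample, sketched here under the assumption that a small corner of the frame is representative, is to slice a few rows and columns, dump them with `DataFrame.to_dict()`, and paste the resulting dictionary into the post. The values below are made up to mimic the layout in the question.

```python
import pandas as pd

# Stand-in for the frame read with header=None (illustrative values only).
df = pd.DataFrame([
    ["nan", "1971-1", "1971-2", "1971-3"],
    ["BRCA1", "0", "1", "0"],
    ["BRCA2", "0", "0", "1"],
])

# Take a small corner of the frame and turn it into a plain dictionary.
sample = df.iloc[:3, :3]
sample_dict = sample.to_dict()
print(sample_dict)  # paste this output into the question

# Anyone can rebuild the same frame from the pasted dictionary.
rebuilt = pd.DataFrame(sample_dict)
```

The dictionary has one entry per column, each mapping row labels to values, which is the same layout used for the sample data in the complete code.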

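# Promote the first row to column labels, then drop that row.
df.rename(columns=df.iloc[0], inplace = True)
df.drop(df.index[0], inplace=True)
# Replace <column name> with the label of the column to use as the index.
df.set_index(<column name>, inplace=True)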

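# transpose dataframe first so that dates run down the rows
df=df.T
# use the first row (gene names) as column labels, then drop it
df.rename(columns=df.iloc[0], inplace = True)
df.drop(df.index[0], inplace=True)
# the sample data stores the empty cell as the string 'nan';
# a real NaN read from a CSV will not match this key
df.rename(columns={'nan':'Time'}, inplace=True)
df.set_index('Time', inplace=True)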

        BRCA1 BRCA2 MAPK
Time                    
1971-1      0     0    0
1971-2      0     0    0
1971-3      0     0    0
2017-9      0     0    0
2017-10     0     0    0
2017-11     0     0    0
2017-12     0     0    0
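A similar table can probably be reached with less renaming by letting `read_csv` treat the first row as the header and the first column as the index, then transposing. This is only a sketch: the inline CSV text below is an assumed stand-in for `genes.csv` (empty top-left cell, dates in the first row, one gene per row), and `by_date` is an illustrative name.

```python
import io
import pandas as pd

# Assumed layout of genes.csv: empty top-left cell, dates in the first row.
csv_text = """,1971-1,1971-2,1971-3
BRCA1,0,1,0
BRCA2,0,0,1
MAPK,0,0,0
"""

# header=0 uses the date row as column labels; index_col=0 uses gene names as the index.
genes = pd.read_csv(io.StringIO(csv_text), header=0, index_col=0)

# Transpose so dates run down the rows and each gene is a column.
by_date = genes.T
by_date.index.name = "Time"
print(by_date)
```

Because the counts are parsed as numbers from the start, no string-to-number conversion is needed afterwards.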

import pandas as pd
pd.options.plotting.backend = "plotly"   # make df.plot() return a plotly figure

# data (a few columns of the original frame, with some dummy counts added)
df=pd.DataFrame({'0': {0: 'nan', 1: 'BRCA1', 2: 'BRCA2', 3: 'MAPK'},
                 '1': {0: '1971-1', 1: '0', 2: '0', 3: '0'},
                 '2': {0: '1971-2', 1: '0', 2: '0', 3: '0'},
                 '3': {0: '1971-3', 1: '1', 2: '0', 3: '0'},
                 '561': {0: '2017-9', 1: '1', 2: '2', 3: '0'},
                 '562': {0: '2017-10', 1: '1', 2: '2', 3: '0'},
                 '563': {0: '2017-11', 1: '1', 2: '2', 3: '3'},
                 '564': {0: '2017-12', 1: '1', 2: '2', 3: '3'}})

# transpose, promote the first row to column labels, and index by time
df=df.T
df.rename(columns=df.iloc[0], inplace = True)
df.drop(df.index[0], inplace=True)
df.rename(columns={'nan':'Time'}, inplace=True)
df.set_index('Time', inplace=True)

# one trace per gene
fig = df.plot(template='plotly_dark')
fig.show()
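Since the question asks for one chart per gene, a possible follow-up is to convert the string counts to numbers, turn the `YYYY-M` labels into real timestamps, and reshape to long form for `plotly.express`. The sketch below uses made-up values; the names `wide`, `long_df`, and `make_figure`, and the choice of day 1 for every month, are assumptions for illustration.

```python
import pandas as pd

# Same shape as the transposed frame: dates as index, one column per gene.
wide = pd.DataFrame(
    {"BRCA1": ["0", "0", "1"], "BRCA2": ["0", "1", "2"], "MAPK": ["0", "0", "3"]},
    index=pd.Index(["1971-1", "1971-2", "1971-3"], name="Time"),
)

# The counts are strings here; convert each column to a numeric dtype.
wide = wide.apply(pd.to_numeric)

# Build timestamps from the 'YYYY-M' labels (day fixed to 1 by assumption).
wide.index = pd.DatetimeIndex(
    [pd.Timestamp(year=int(y), month=int(m), day=1)
     for y, m in (label.split("-") for label in wide.index)],
    name="Time",
)

# Long form: one row per (Time, Gene) pair.
long_df = wide.reset_index().melt(id_vars="Time", var_name="Gene", value_name="Count")


def make_figure(frame):
    """Return a line chart with one facet row per gene (needs plotly installed)."""
    import plotly.express as px
    return px.line(frame, x="Time", y="Count", facet_row="Gene",
                   title="Human Gene Occurrences Per Month")
```

Calling `make_figure(long_df).show()` should then display one panel per gene; passing `color="Gene"` instead of `facet_row="Gene"` would overlay the genes in a single chart.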