Python: Converting column elements to column names in pandas (part 2)


This is a follow-up to my earlier post.

How can I convert the following rows:

   time1,stockA,bid,1
   time2,stockA,ask,1.1
   time3,stockB,ask,2.1
   time4,stockB,bid,2.0
   time5,stockA,bid,1.1
   time6,stockA,ask,1.2
   time7,stockA,high,1.5
   time8,stockA,low,0.5

into the following pandas DataFrame:

  time     stock       bid    ask    high    low
  time1    stockA      1      
  time2    stockA             1.1
  time3    stockB             2.1
  time4    stockB      2.0    
  time5    stockA      1.1
  time6    stockA             1.2
  time7    stockA                     1.5
  time8    stockA                            0.5

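Thanks for your help.

My approach is to read the CSV into two DataFrames, one without the bid/ask columns and one with them:

In [99]:

import io
import pandas as pd

# The column names follow the question's wording: 'bid' actually holds the
# quote type (bid/ask/high/low) and 'ask' holds the price.
t="""time1,stockA,bid,1
time2,stockA,ask,1.1
time3,stockB,ask,2.1
time4,stockB,bid,2.0
time5,stockA,bid,1.1
time6,stockA,ask,1.2
time7,stockA,high,1.5
time8,stockA,low,0.5"""

# First DataFrame: keep only the time and stock columns.
df = pd.read_csv(io.StringIO(t), header=None, names=['time', 'stock', 'bid', 'ask'], usecols=['time', 'stock'])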
df
Out[99]:
     time   stock
0   time1  stockA
1   time2  stockA
2   time3  stockB
3   time4  stockB
4   time5  stockA
5   time6  stockA
6   time7  stockA
7   time8  stockA

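For the second DataFrame, we can call pivot to create columns from the values in the 'bid' column. We then need to reset the index, after which the two DataFrames can be merged to get the desired result. If you like, you can replace the NaN values with blank strings: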

In [102]:

df_new = pd.read_csv(io.StringIO(t), header=None, names=['time', 'stock', 'bid', 'ask'], usecols=['time','bid','ask'])
df_new = df_new.pivot(columns ='bid', values='ask', index='time')
df_new = df_new.reset_index()
df = df.merge(df_new)
df
Out[102]:
     time   stock  ask  bid  high  low
0   time1  stockA  NaN  1.0   NaN  NaN
1   time2  stockA  1.1  NaN   NaN  NaN
2   time3  stockB  2.1  NaN   NaN  NaN
3   time4  stockB  NaN  2.0   NaN  NaN
4   time5  stockA  NaN  1.1   NaN  NaN
5   time6  stockA  1.2  NaN   NaN  NaN
6   time7  stockA  NaN  NaN   1.5  NaN
7   time8  stockA  NaN  NaN   NaN  0.5
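The NaN replacement mentioned above is not shown in the answer, so here is a minimal sketch of one way it might be done. It assumes the same misleading column names as the answer ('bid' for the quote type, 'ask' for the value); the names `raw`, `wide`, `merged` and `blank` are illustrative only. The frame is cast to object before filling so that numeric columns can hold empty strings.

```python
import io
import pandas as pd

t = """time1,stockA,bid,1
time2,stockA,ask,1.1
time3,stockB,ask,2.1
time4,stockB,bid,2.0
time5,stockA,bid,1.1
time6,stockA,ask,1.2
time7,stockA,high,1.5
time8,stockA,low,0.5"""

# Read once; 'bid' holds the quote type and 'ask' the value, as in the answer.
raw = pd.read_csv(io.StringIO(t), header=None,
                  names=['time', 'stock', 'bid', 'ask'])

# Pivot the quote type into columns, then merge back onto time/stock.
wide = raw.pivot(index='time', columns='bid', values='ask').reset_index()
wide.columns.name = None  # drop the leftover 'bid' label on the column axis
merged = raw[['time', 'stock']].merge(wide, on='time')

# Put the columns in the order the question asks for.
merged = merged[['time', 'stock', 'bid', 'ask', 'high', 'low']]

# Cast to object first so the float columns can hold blank strings.
blank = merged.astype(object).fillna('')
print(blank)
```

Note that after filling, the value columns are no longer numeric, so this is best left as a final display step.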


What you want to do is pivot the table. The following approach reads time, stock and type into a MultiIndex:

import pandas as pd

df = pd.read_csv('prices.csv', header=None,
                 names=['time', 'stock', 'type', 'prices'],
                 index_col=['time', 'stock', 'type'])

In [1062]:

df
Out[1062]:
                   prices
time  stock  type
time1 stockA bid      1.0
time2 stockA ask      1.1
time3 stockB ask      2.1
time4 stockB bid      2.0
time5 stockA bid      1.1
time6 stockA ask      1.2
time7 stockA high     1.5
time8 stockA low      0.5

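I think this is how the DataFrame should look. Then do

df = df.unstack()

You can use df.fillna to fill the NaNs with whatever you like. In general, converting column values into column headers is called pivoting; .unstack pivots one level of a MultiIndex. You can also look at .pivot. To get rid of the outer column level, which contains 'prices' for every column, you can do

df.columns = df.columns.droplevel(0)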
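The unstack route described in this answer can be sketched end to end as follows. This is only an illustration: the question's rows are placed in an in-memory string (`csv_text`, a stand-in for prices.csv), and the names `wide` and `flat` are not from the original answer.

```python
import io
import pandas as pd

# Stand-in for the contents of prices.csv (rows taken from the question).
csv_text = """time1,stockA,bid,1
time2,stockA,ask,1.1
time3,stockB,ask,2.1
time4,stockB,bid,2.0
time5,stockA,bid,1.1
time6,stockA,ask,1.2
time7,stockA,high,1.5
time8,stockA,low,0.5"""

df = pd.read_csv(io.StringIO(csv_text), header=None,
                 names=['time', 'stock', 'type', 'prices'],
                 index_col=['time', 'stock', 'type'])

# unstack() moves the innermost index level ('type') into the columns,
# giving two-level columns such as ('prices', 'bid').
wide = df.unstack()

# Drop the outer 'prices' level so only ask/bid/high/low remain.
wide.columns = wide.columns.droplevel(0)

# Optionally turn the (time, stock) index back into ordinary columns.
flat = wide.reset_index()
print(flat)
```

The resulting columns come out in sorted order (ask, bid, high, low); selecting them explicitly, for example `flat[['time', 'stock', 'bid', 'ask', 'high', 'low']]`, restores the order shown in the question.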


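Interesting idea. But when I try it, I get: ValueError: Index contains duplicate entries, cannot reshape.

But it does work when I add some headers to the original data file, e.g. datetime, stock, type, prices.

You should be able to add those headers with the names parameter when reading the file, as I did in my code. Anyway, I'm glad you got it to work.

I had to actually add the headers and also use the names parameter to make it work, which means I ended up with a redundant header row. I don't know why that is. As I found out, both unstack and pivot seem to have a problem with "duplicate entries". After that, I came across the pivot_table function, which seems to work. So replacing df.unstack() in your solution with pd.pivot_table(df, index=['time', 'stk'], columns='type', values='value') finally worked for me.

Yes, which of pivot, pivot_table and unstack you want to use will depend on the structure of your data and on what you want to achieve.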
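To illustrate the duplicate-entries problem discussed in these comments, here is a small sketch with hypothetical data in which the same (time, stock, type) combination occurs twice. The rows are made up for the example; the column names `stk` and `value` follow the commenter's call. `unstack` refuses to reshape such data, while `pivot_table` aggregates the duplicates (by default with the mean).

```python
import io
import pandas as pd

# Hypothetical data: time1/stockA/bid appears twice.
csv_text = """time1,stockA,bid,1.0
time1,stockA,bid,1.5
time2,stockA,ask,1.1
time3,stockB,ask,2.1
"""

df = pd.read_csv(io.StringIO(csv_text), header=None,
                 names=['time', 'stk', 'type', 'value'])

# unstack cannot place two values in one cell, so it raises ValueError.
try:
    df.set_index(['time', 'stk', 'type']).unstack()
    unstack_failed = False
except ValueError as exc:
    unstack_failed = True
    print('unstack failed:', exc)

# pivot_table aggregates duplicates instead (default aggfunc is the mean).
wide = pd.pivot_table(df, index=['time', 'stk'], columns='type', values='value')
print(wide)
```

Whether averaging is appropriate depends on the data; passing `aggfunc='first'` or `aggfunc='last'` would keep a single observation instead.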
