Python: Create a new DataFrame by taking values from a different DataFrame and performing some mathematical operations on them


python pandas dataframe


Suppose I have a pandas DataFrame with 16 columns and roughly 1000 rows, in this format:

date_time            sec01  sec02  sec03  sec04  sec05  sec06  sec07  sec08  sec09  sec10  sec11  sec12  sec13  sec14  sec15  sec16
1970-01-01 05:54:17   8.50   8.62   8.53   8.45   8.50   8.62   8.53   8.45   8.42   8.39   8.39   8.40   8.47   8.54   8.65   8.70
1970-01-01 05:56:55   8.43   8.62   8.55   8.45   8.43   8.62   8.55   8.45   8.42   8.39   8.39   8.40   8.46   8.53   8.65   8.71

Now I need to build another pandas DataFrame with 32 columns:

x_sec01 y_sec01 x_sec02 y_sec02 x_sec03 y_sec03 x_sec04 y_sec04 x_sec05 y_sec05 x_sec06 y_sec06 x_sec07 ...

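where each column's values are multiplied by a mathematical constant that depends on the column (sector) number.

So each column of the original DataFrame (sec01 to sec16) has to be converted into two columns (x_sec01, y_sec01), and the multiplication factor depends on the sector number.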

Currently I am using this function and calling it for every row inside a for loop, which takes far too long:

import math

def sec_to_xy(sec_no, sec_data):
    """Convert one sector reading to the xy coordinate system."""
    # Author's note: "goes from 79 till 49" (that holds when sec_no runs from 0 to 15)
    sector_number = 77 - (sec_no - 1) * 2
    angle = math.radians(1.40625 * sector_number)
    x = sec_data * math.cos(angle)
    y = sec_data * math.sin(angle)
    return x, y

The general idea is to stack your values so that numpy's fast vectorized functions can be applied.

# stack the dataframe
df2 = df.stack().reset_index(level=1)
df2.columns = ['sec', 'value']
# extract the sector number
df2['sec_no'] = df2['sec'].str.slice(-2).astype(int)

# apply numpy's vectorized functions
import numpy as np
df2['x'] = df2['value'] * (np.cos(np.radians(1.40625*(df2['sec_no']))))
df2['y'] = df2['value'] * (np.sin(np.radians(1.40625*(df2['sec_no']))))

At this stage, df2 looks like this:

                       sec  value  sec_no         x         y
1970-01-01 05:54:17  sec01   8.50       1  8.497440  0.208600
1970-01-01 05:54:17  sec02   8.62       2  8.609617  0.422963
1970-01-01 05:54:17  sec03   8.53       3  8.506888  0.627506
1970-01-01 05:54:17  sec04   8.45       4  8.409311  0.828245
1970-01-01 05:54:17  sec05   8.50       5  8.436076  1.040491
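
The steps up to this point can be reproduced on a small frame. The sketch below is only an illustration: it uses the first three sectors of the two sample rows from the question and assumes date_time is already the index rather than a regular column. Like the answer, it multiplies the angle step by sec_no directly; the question's own function first maps the sector through 77-(sec_no-1)*2, which is left out here.

```python
import numpy as np
import pandas as pd

# First three sectors of the two sample rows from the question;
# date_time is assumed to be the index, not a regular column.
df = pd.DataFrame(
    {"sec01": [8.50, 8.43], "sec02": [8.62, 8.62], "sec03": [8.53, 8.55]},
    index=pd.to_datetime(["1970-01-01 05:54:17", "1970-01-01 05:56:55"]),
)

# Long format: one row per (timestamp, sector) pair.
df2 = df.stack().reset_index(level=1)
df2.columns = ["sec", "value"]
df2["sec_no"] = df2["sec"].str.slice(-2).astype(int)

# One vectorized call per coordinate instead of a Python loop over rows.
angle = np.radians(1.40625 * df2["sec_no"])
df2["x"] = df2["value"] * np.cos(angle)
df2["y"] = df2["value"] * np.sin(angle)

print(df2.head(3))
```

The trigonometric functions run once over the whole column, which is where the speed-up over a per-row Python call comes from.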

Now pivot the table to get back to the original shape:

df2[['sec', 'x', 'y']].pivot(columns='sec')

All that is left to do is rename the columns.
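
One possible way to do the renaming is sketched below; it is not part of the original answer. It assumes a recent pandas release, rebuilds a two-sector df2 so the block runs on its own, flattens the two-level columns produced by pivot into names such as x_sec01, and reorders them so each sector's x and y sit side by side. The names wide_in, wide and sectors are illustrative.

```python
import numpy as np
import pandas as pd

# Small stand-in for df2: two timestamps, two sectors (illustrative data).
idx = pd.DatetimeIndex(
    ["1970-01-01 05:54:17", "1970-01-01 05:56:55"], name="date_time"
)
wide_in = pd.DataFrame({"sec01": [8.50, 8.43], "sec02": [8.62, 8.62]}, index=idx)

df2 = wide_in.stack().reset_index(level=1)
df2.columns = ["sec", "value"]
angle = np.radians(1.40625 * df2["sec"].str.slice(-2).astype(int))
df2["x"] = df2["value"] * np.cos(angle)
df2["y"] = df2["value"] * np.sin(angle)

# Pivot back to one row per timestamp; the columns become
# a (coordinate, sector) MultiIndex such as ('x', 'sec01').
wide = df2[["sec", "x", "y"]].pivot(columns="sec")

# Flatten ('x', 'sec01') into 'x_sec01'.
wide.columns = [f"{coord}_{sec}" for coord, sec in wide.columns]

# Reorder so each sector's x and y columns are adjacent.
sectors = sorted(df2["sec"].unique())
wide = wide[[f"{coord}_{sec}" for sec in sectors for coord in ("x", "y")]]

print(list(wide.columns))
```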

Here is an approach using NumPy:

import numpy as np
import pandas as pd

# Extract as float array
a = df.values # Extract all 16 columns
m,n = a.shape

# Scaling array: one angle per sector, for sector numbers 79, 77, ..., 49
s = np.radians(1.40625*(np.arange(79,47,-2)))

# Initialize output array and set cosine and sine values
out = np.zeros((m,n,2))
out[:,:,0] = a*np.cos(s)
out[:,:,1] = a*np.sin(s)

# Transfer to a dataframe output
df_out = pd.DataFrame(out.reshape(-1,n*2),index=df.index)
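
The frame built this way has integer column labels. As a sketch only, the block below repeats the same computation on the two sample rows from the question and attaches the x_secNN / y_secNN names asked for; the names list is an addition, not part of the original answer. Because the reshape flattens the last axis of out, the columns come out interleaved as x, y, x, y, and so on, and the name list is built in that order.

```python
import numpy as np
import pandas as pd

# The two sample rows from the question, with date_time as the index.
cols = [f"sec{i:02d}" for i in range(1, 17)]
df = pd.DataFrame(
    [[8.50, 8.62, 8.53, 8.45, 8.50, 8.62, 8.53, 8.45,
      8.42, 8.39, 8.39, 8.40, 8.47, 8.54, 8.65, 8.70],
     [8.43, 8.62, 8.55, 8.45, 8.43, 8.62, 8.55, 8.45,
      8.42, 8.39, 8.39, 8.40, 8.46, 8.53, 8.65, 8.71]],
    index=pd.to_datetime(["1970-01-01 05:54:17", "1970-01-01 05:56:55"]),
    columns=cols,
)

a = df.values
m, n = a.shape
s = np.radians(1.40625 * np.arange(79, 47, -2))  # one angle per sector

# a has shape (m, n) and s has shape (n,), so s broadcasts across rows.
out = np.zeros((m, n, 2))
out[:, :, 0] = a * np.cos(s)
out[:, :, 1] = a * np.sin(s)

# Column names in the same interleaved order as the reshaped array.
names = [f"{coord}_{sec}" for sec in df.columns for coord in ("x", "y")]
df_out = pd.DataFrame(out.reshape(-1, n * 2), index=df.index, columns=names)

print(df_out.iloc[:, :4])
```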

Note that if there are actually 17 columns, with date_time as the first one, we need to skip that first column. So at the start, get a with:

a = df.iloc[:, 1:].values
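
As an illustration of skipping the date_time column, the sketch below shows two options on a hypothetical two-sector frame. It uses the positional indexer .iloc, since .ix is no longer available in current pandas releases. The second option, moving date_time into the index, is a suggestion rather than something the answer proposes; it has the side effect that the timestamps are available as df.index for the output frame.

```python
import pandas as pd

# Hypothetical frame where date_time is an ordinary first column.
df = pd.DataFrame({
    "date_time": pd.to_datetime(["1970-01-01 05:54:17", "1970-01-01 05:56:55"]),
    "sec01": [8.50, 8.43],
    "sec02": [8.62, 8.62],
})

# Option 1: skip the first column by position.
a = df.iloc[:, 1:].values

# Option 2: move date_time into the index, so only sector columns remain
# and the timestamps can be reused as the index of the output frame.
indexed = df.set_index("date_time")
a2 = indexed.values

print(a.shape, a2.shape)
```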

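I understand the first part, but the second part doesn't seem to work; I get a "cannot label index with a null key" error.

Yes, I'm looking at it now. Sorry, I'm very new to programming.

What pandas version do you have (pd.__version__)? Can you try .pivot(index=df2.index, columns='sec')?

The pandas version is '0.13.1'. I tried that as well, but now I get 'ValueError: Index contains duplicate entries, cannot reshape'. I think I'm doing something really silly here.

If you upgrade to the latest pandas version it will work. I'm having a hard time solving this with the old version. Have you tried the other answer?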
