Python pandas: linearly interpolating multiple steps between multiple columns

python pandas dataframe interpolation

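I have looked through multiple versions of my question here, but could not find an answer for what I am trying to do.

Problem:
I have a pandas dataframe holding decimal data collected over many iterations of an experiment (one per row) for multiple wavelengths of light (one per column). The wavelengths are the column headers and, because of the limits of our machine, the spacing between wavelengths/columns is currently 2.5.

I now need to calculate what the values in each row would be at a wavelength spacing of 0.1 instead of 2.5. That requires creating new column headers with 0.1 spacing (so 24 new columns between each pair of current columns) and then linearly interpolating the values in each row at every 0.1 step.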
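For reference, the linear interpolation being asked for can be written out explicitly. The symbols here are notation introduced for this note, not taken from the post: \lambda_0 and \lambda_1 are two neighbouring measured wavelengths, and y_0 and y_1 are the values one row holds at them.

```latex
y(\lambda) = y_0 + (y_1 - y_0)\,\frac{\lambda - \lambda_0}{\lambda_1 - \lambda_0},
\qquad \lambda_0 \le \lambda \le \lambda_1
```

With a spacing of 2.5 and a step of 0.1 there are 2.5 / 0.1 = 25 steps between neighbours, hence the 24 new columns per gap. A range from 400 to 900 then holds (900 - 400) / 0.1 + 1 = 5001 columns in total.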

Can anyone help? I have no idea how to do this.

What I have so far:

import numpy as np
import pandas as pd

# data_in = my original pandas dataframe with experiment data.
# Wavelengths (column headers) go from 400 to 900 in 2.5 nm intervals.
# I want 400 to 900 in 0.1 nm intervals.

# Create a copy dataframe for generating the interpolated columns,
# copying the structure of the original file for the first 3 columns.
# (I need the first 3 columns intact for an unimportant reason)
data_interp = data_in[data_in.columns[0:3]].copy()

# Column headers from 400 to 900 nm in 0.1 nm steps: 5001 points including
# both endpoints (num=5000 would give a step of about 0.10002, not 0.1).
# Rounding to 1 decimal makes the labels match the 2.5 nm headers exactly.
wave_array = np.around(np.linspace(400, 900, num=5001, endpoint=True), decimals=1)

# Add the new wavelengths as (empty) column headers in the new dataframe.
data_interp = pd.concat([data_interp, pd.DataFrame(columns=wave_array)])

# Use the pandas 'update' function to copy the data of any matching columns
# from 'data_in' into 'data_interp' (i.e. put all the 2.5 nm interval data
# from the old dataframe in its proper place in the new dataframe).
data_interp.update(data_in)

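Now I have a new pandas dataframe (data_interp) that contains all my original 2.5 nm interval data, plus a large number of empty columns with 0.1 nm interval headers.

I need to fill all these empty cells with interpolated data, calculated from the data at the 2.5 nm intervals.

Any help is welcome, thanks.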

Edit 1: Here are a couple of screenshots of my input dataframe (data_in) and the new interpolated dataframe (data_interp).


data_in: [image]

data_interp: [image]

Edit 2: A mini example.

import numpy as np
import pandas as pd

# Mini data.
data_mini = [[10, 13, 11], [15, 14, 15], [19, 18, 22]]

# Convert to pandas dataframe
data_mini_pd = pd.DataFrame(data_mini, columns=[400, 402.5, 405])

# Start a new dataframe with the same rows as the original but no columns
data_mini_pd_interp = data_mini_pd[data_mini_pd.columns[0:0]].copy()

# Column headers from 400 to 405 nm in 0.1 nm steps: 51 points including both
# endpoints (num=50 would give a step of about 0.102 and skip 402.5).
wave_array_mini = np.linspace(400, 405, num=51, endpoint=True)

# Round all numbers to 1 decimal place, to avoid floating-point noise
# in the pandas column headers.
wave_array_mini_round = np.around(wave_array_mini, decimals=1)

# Add the new wavelengths as (empty) column headers in the new dataframe.
data_mini_pd_interp = pd.concat([data_mini_pd_interp, pd.DataFrame(columns=wave_array_mini_round)])

# Use the pandas 'update' function to copy the data of any matching columns
# from 'data_mini_pd' into 'data_mini_pd_interp' (i.e. put the 2.5 nm interval
# data from the old dataframe in its proper place in the new dataframe).
data_mini_pd_interp.update(data_mini_pd)

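This solution is a bit ugly, but it should do the job:

import numpy as np
import pandas as pd

# Generate test data: 100 rows, columns labelled 0.0, 2.5, 5.0, 7.5
nrows = 100
cols = [x/10.0 for x in range(0, 100, 25)]
data = {c: np.random.uniform(0, 1, nrows) for c in cols}

df = pd.DataFrame(data)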
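The interpolation step of this answer is not reproduced here, so the following is one possible way to finish it, offered as a sketch only and not necessarily what the answerer wrote. It assumes `np.interp` applied row by row; the names `new_cols` and `df_fine` and the fixed random seed are choices made for this example.

```python
import numpy as np
import pandas as pd

# Same kind of random test data as in the answer, seeded for repeatability.
rng = np.random.default_rng(0)
nrows = 100
cols = [x / 10.0 for x in range(0, 100, 25)]  # 0.0, 2.5, 5.0, 7.5
df = pd.DataFrame({c: rng.uniform(0, 1, nrows) for c in cols})

# Target grid: 0.1 steps from the first to the last existing column label.
n_steps = int(round((cols[-1] - cols[0]) / 0.1))
new_cols = np.round(np.linspace(cols[0], cols[-1], n_steps + 1), 1)

# Interpolate each row from the coarse grid onto the fine grid.
old_x = df.columns.to_numpy(dtype=float)
values = np.vstack([np.interp(new_cols, old_x, row) for row in df.to_numpy()])

df_fine = pd.DataFrame(values, index=df.index, columns=new_cols)
```

The original columns survive unchanged inside `df_fine`; only the columns in between are new. For a real frame with extra non-numeric columns, those would have to be set aside first and joined back afterwards.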


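I would transpose the matrix and trick the (new) index into being a DatetimeIndex. The absolute values will be reduced by a factor of 1000, but that does not matter for the data. The dataframe can then be resampled at a different frequency.

After that, converting the index back to floats and transposing again gives the expected result.

Starting from your data_mini_pd, it could be:

# Transpose so that wavelengths label the rows, then read each wavelength
# (times 10, as an integer) as a fractional-second timestamp: 400 -> 0.4 s.
# 100000 ns = 0.0001 s, i.e. a step of 0.1 in the original wavelength units.
df = data_mini_pd.T.set_index(pd.to_datetime(
    (data_mini_pd.columns * 10).astype(int), format='%f')
                              ).resample('100000ns').interpolate()

# Turn the microsecond part of each timestamp back into a wavelength.
df.index = df.index.strftime('%f').astype('float64')/1000

result = df.T

This gives: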

   400.0  400.1  400.2  400.3  400.4  400.5  400.6  400.7  400.8  400.9  401.0  401.1  401.2  401.3  401.4  401.5  401.6  401.7  401.8  401.9  402.0  402.1  402.2  402.3  402.4  402.5  402.6  402.7  402.8  402.9  403.0  403.1  403.2  403.3  403.4  403.5  403.6  403.7  403.8  403.9  404.0  404.1  404.2  404.3  404.4  404.5  404.6  404.7  404.8  404.9  405.0
0   10.0  10.12  10.24  10.36  10.48   10.6  10.72  10.84  10.96  11.08   11.2  11.32  11.44  11.56  11.68   11.8  11.92  12.04  12.16  12.28   12.4  12.52  12.64  12.76  12.88   13.0  12.92  12.84  12.76  12.68   12.6  12.52  12.44  12.36  12.28   12.2  12.12  12.04  11.96  11.88   11.8  11.72  11.64  11.56  11.48   11.4  11.32  11.24  11.16  11.08   11.0
1   15.0  14.96  14.92  14.88  14.84   14.8  14.76  14.72  14.68  14.64   14.6  14.56  14.52  14.48  14.44   14.4  14.36  14.32  14.28  14.24   14.2  14.16  14.12  14.08  14.04   14.0  14.04  14.08  14.12  14.16   14.2  14.24  14.28  14.32  14.36   14.4  14.44  14.48  14.52  14.56   14.6  14.64  14.68  14.72  14.76   14.8  14.84  14.88  14.92  14.96   15.0
2   19.0  18.96  18.92  18.88  18.84   18.8  18.76  18.72  18.68  18.64   18.6  18.56  18.52  18.48  18.44   18.4  18.36  18.32  18.28  18.24   18.2  18.16  18.12  18.08  18.04   18.0  18.16  18.32  18.48  18.64   18.8  18.96  19.12  19.28  19.44   19.6  19.76  19.92  20.08  20.24   20.4  20.56  20.72  20.88  21.04   21.2  21.36  21.52  21.68  21.84   22.0
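If the datetime detour feels uncomfortable, the same numbers can probably be obtained without it. The sketch below assumes `DataFrame.reindex` to add the empty 0.1 nm columns and `interpolate(method="index")` on the transposed frame, so that the wavelength labels themselves serve as the x-values. The names `new_cols`, `wide` and `result_alt` are made up for this example.

```python
import numpy as np
import pandas as pd

data_mini_pd = pd.DataFrame(
    [[10, 13, 11], [15, 14, 15], [19, 18, 22]],
    columns=[400, 402.5, 405],
)

# 51 labels: 400.0, 400.1, ..., 405.0, rounded so that 402.5 matches exactly.
new_cols = np.round(np.linspace(400, 405, num=51), 1)

# reindex keeps the measured columns and adds the missing ones as NaN.
wide = data_mini_pd.reindex(columns=new_cols)

# interpolate works down the index, so transpose, fill, and transpose back.
result_alt = wide.T.interpolate(method="index").T
```

For the first row, the value at 400.1 comes out as 10 + (13 - 10) * 0.1 / 2.5 = 10.12, in line with the table above.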


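Could you give us some sample data (a few raw rows) and show what you currently get?

@SergeBallesta I have added links to two screenshots of the dataframes.

We cannot copy anything from an image... You should provide copyable data. BTW, why is the dataframe column-wise? It is more common to label the rows with a DatetimeIndex.

@SergeBallesta OK, I have provided a small example of the data I am working with. I have no control over the format of the incoming data; this is how I receive it. As you said, the data is organised by a datetime index that labels the rows (I cropped those columns from the display because they contain some sensitive information).

This looks like it might be what I need. I'm in the middle of a meal and will look at it when I'm back at my desk. Either way, thanks for taking the time to write it out!

Now I have had more time to work with this. It does exactly what I need, even after I adjusted the file format/columns. Thanks! I come from MATLAB, where loops have to be written more explicitly (/clumsily?) than in Python, and the compressed shorthand is still something I am getting used to. If you see this and are willing to spend more time, I wouldn't mind an explanation of exactly what the "inter_df = ..." command does. If not, I'll keep reading up on it, thanks!

Thanks for taking the time to do this. It is certainly more concise than the other solution, although I am wary of the datetime conversion. Is all the heavy lifting done by "data_mini_pd.resample(..." and "data_mini_pd.interpolate()", and does ".resample" need a datestamp rather than a float? I'm just surprised that it handles all the data without explicitly specifying the rows, which columns to interpolate between, which intermediate columns to create, and so on.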
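On the last question, whether `.resample` needs timestamps: as far as I know it only accepts a datetime-like index (DatetimeIndex, TimedeltaIndex or PeriodIndex), which would explain why the answer converts the wavelengths first. A small check, using a made-up frame `by_wavelength` with float row labels:

```python
import pandas as pd

# Float-labelled rows, as obtained after transposing the wavelength columns.
by_wavelength = pd.DataFrame(
    {"row0": [10.0, 13.0, 11.0]},
    index=[400.0, 402.5, 405.0],
)

# resample is expected to reject an index that is not datetime-like.
try:
    by_wavelength.resample("1s")
    resample_error = None
except TypeError as exc:
    resample_error = str(exc)

print(resample_error)
```

Once the index is datetime-like, `resample` lays out the new, finer grid and `interpolate` fills the gaps in every column at once, which is why no rows or columns have to be named explicitly.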