Python: How to restructure a dataframe to create new column labels based on column [se] values, then populate those new columns with column [value] values

python pandas

Original dataframe

index   Date                 Device  Element  Sub_Element  Value
179593  2017-11-28 16:39:00  x       y        eth_txload   9
179594  2017-11-28 16:39:00  x       y        eth_rxload   30
179595  2017-11-28 16:39:00  x       y        eth_ip_addr  x.x.x.x
179596  2017-11-28 16:39:00  x       y        description  string

Desired dataframe

Date                 Device  Element  description  eth_txload  eth_rxload  eth_ip_addr
2017-11-28 16:39:00  x       y        string       9           30          x.x.x.x

What is the best way to do this?

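Create a dataframe for each Sub_Element and merge them on=['Date', 'Device', 'Element']?

Or use some df.iloc magic to build a boolean mask and apply the value to a new column?

Or maybe I'm missing a better / more efficient way.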
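For reference, the boolean-mask idea from the question could be sketched roughly as follows. This is only an illustration under assumptions: the sample frame is rebuilt by hand from the table above, `Series.where` stands in for the "df.iloc magic", and the names `keys`, `wide` and `result` are made up for the example.

```python
import pandas as pd

# Sample data rebuilt by hand from the question (all values kept as strings).
df = pd.DataFrame({
    "index": [179593, 179594, 179595, 179596],
    "Date": ["2017-11-28 16:39:00"] * 4,
    "Device": ["x"] * 4,
    "Element": ["y"] * 4,
    "Sub_Element": ["eth_txload", "eth_rxload", "eth_ip_addr", "description"],
    "Value": ["9", "30", "x.x.x.x", "string"],
})

keys = ["Date", "Device", "Element"]  # hypothetical name for the grouping columns
wide = df[keys + ["Sub_Element", "Value"]].copy()

# One new column per Sub_Element: keep Value where the mask is True, NaN elsewhere.
for name in wide["Sub_Element"].unique():
    mask = wide["Sub_Element"] == name
    wide[name] = wide["Value"].where(mask)

# Collapse to one row per (Date, Device, Element); first() skips the NaNs.
result = (
    wide.drop(columns=["Sub_Element", "Value"])
        .groupby(keys, as_index=False)
        .first()
)
print(result)
```

With hundreds of distinct Sub_Element values this adds one column per loop pass, so it is likely slower than a single reshape, but it makes each step easy to inspect.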

IIUC, given:

print(df)

    index                 Date Device Element  Sub_Element    Value
0  179593  2017-11-28 16:39:00      x       y   eth_txload        9
1  179594  2017-11-28 16:39:00      x       y   eth_rxload       30
2  179595  2017-11-28 16:39:00      x       y  eth_ip_addr  x.x.x.x
3  179596  2017-11-28 16:39:00      x       y  description   string

Then:
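The answer's code snippet did not survive in this copy. A reshape that is consistent with the output shown (columns sorted alphabetically under a `Sub_Element` header) might look like the sketch below. It is a reconstruction under assumptions, not necessarily the exact code that was posted; the frame is rebuilt by hand and `wide` is a made-up name.

```python
import pandas as pd

# Sample data rebuilt by hand from the question (all values kept as strings).
df = pd.DataFrame({
    "index": [179593, 179594, 179595, 179596],
    "Date": ["2017-11-28 16:39:00"] * 4,
    "Device": ["x"] * 4,
    "Element": ["y"] * 4,
    "Sub_Element": ["eth_txload", "eth_rxload", "eth_ip_addr", "description"],
    "Value": ["9", "30", "x.x.x.x", "string"],
})

# Put the grouping keys and Sub_Element into a MultiIndex, then move the
# Sub_Element level into the columns. The unique 'index' column is left out.
wide = (
    df.set_index(["Date", "Device", "Element", "Sub_Element"])["Value"]
      .unstack("Sub_Element")
      .reset_index()
)
print(wide)
```

Note that `unstack` raises a ValueError if the same (Date, Device, Element, Sub_Element) combination occurs more than once, so duplicates would need to be aggregated or dropped first.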

Output:

Sub_Element                 Date Device Element description eth_ip_addr eth_rxload eth_txload
0            2017-11-28 16:39:00      x       y      string     x.x.x.x         30          9

Here is how I did it. My solution isn't as "fancy" as Scott's, but I broke the logic down into steps. For a plug-and-play solution, his is probably better:

import pandas as pd

# reading in the dataframe from your text
df1 = pd.read_clipboard()

# creating an untouched copy of df1 for manipulation
df2 = df1.copy()    

# dropping the duplicates of index and Date to get one row
df1 = df1.drop_duplicates(subset=['index', 'Date'])

# creating a dictionary of key, value pairs for each column and value
kv = dict(zip(df2.Sub_Element, df2.Value))

# creating a dataframe out of the above dictionary
new_df = pd.DataFrame(kv, index=[0])

# creating temp values to merge on
df1['tmp'] = 1
new_df['tmp'] = 1

# merging on the tmp values
output_df = df1.merge(new_df, on='tmp')

# cleaning up for the output
del output_df['Sub_Element']
del output_df['Value']
del output_df['tmp']

#output
        index      Date Device Element description eth_ip_addr eth_rxload eth_txload
0  2017-11-28  16:39:00      x       y      string     x.x.x.x         30          9

An admittedly more SQL-like solution, but one that avoids dealing with indexes:

import pandas as pd

# read in the dataframe
df = pd.read_clipboard()

# set up what we will be joining to
anchor = df[['Date','Device','Element']].drop_duplicates()

# loop through the values we want to pivot out
for element in df['Sub_Element'].unique():

    # filter the original dataframe for the value for Sub_Element
    # using the copy method avoids SettingWithCopyWarning
    temp = df[df['Sub_Element']==element].copy() 

    temp.rename(columns={'Value':element},inplace=True) #rename the header

    # left join the new dataframe to the anchor in case of NaNs
    anchor = anchor.merge(temp[['Date','Device','Element',element]],
                          on=['Date','Device','Element'],how='left')
print(anchor)

Output:

                  Date Device Element eth_txload eth_rxload eth_ip_addr description
0  2017-11-28 16:39:00      x       y          9         30     x.x.x.x string

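Can you explain the logic behind the desired output? I'm missing the condition. Does it take the Sub_Element column and make each value its own column?

@MattR I'm basically working with sensor-like data, millions of rows. The data is collected with the schema [collection_timestamp, device, element (system, interface, etc.), sub_element (cpu, mem, load, description, etc.), value (data type specific to the sub_element)].

@MattR element and sub_element have hundreds of unique values. Ideally, I want to pop out all the sub_elements specific to a device and element and put them on one row, for readability and for future masking and df iteration. Maybe a bit like an expand-and-reduce operation. Hope this helps, thanks.

@Matt: Thanks! I appreciate the comments.

@ScottBoston What is the purpose of sum(level=[0,1,2])? Thanks.

@ScottBoston Thanks Scott, once I unpacked it and removed the .sum(level=[0,1,2]) part it worked great, and it processes much faster than Tad's solution.

@Evan Yes, the sum(level) was left over from when I was developing; once I dropped the 'index' column I no longer needed it. Thanks.

@泰勒巴斯托 I'm glad Evan questioned the sum(level) thing. It was unnecessary. Glad this could help.

This works perfectly, and I can follow the flow! Thanks, Tad.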
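To illustrate the point about the 'index' column in the comments above, here is a small sketch. It assumes the pivot was done with set_index and unstack, which this copy does not show; the frame is rebuilt by hand and the names `sparse`, `collapsed` and `direct` are made up. As far as I know, `sum(level=...)` was deprecated and then removed in pandas 2.0, so the sketch collapses with `groupby(level=...)` instead.

```python
import pandas as pd

# Sample data rebuilt by hand from the question (all values kept as strings).
df = pd.DataFrame({
    "index": [179593, 179594, 179595, 179596],
    "Date": ["2017-11-28 16:39:00"] * 4,
    "Device": ["x"] * 4,
    "Element": ["y"] * 4,
    "Sub_Element": ["eth_txload", "eth_rxload", "eth_ip_addr", "description"],
    "Value": ["9", "30", "x.x.x.x", "string"],
})
keys = ["Date", "Device", "Element"]

# With the unique 'index' column among the keys, every source row stays on
# its own row after unstack, and most cells are NaN.
sparse = (
    df.set_index(["index"] + keys + ["Sub_Element"])["Value"]
      .unstack("Sub_Element")
)
print(sparse.shape)  # 4 rows, 4 Sub_Element columns

# Collapsing over the real keys merges those rows back into one;
# first() takes the first non-null value per column.
collapsed = sparse.groupby(level=keys).first().reset_index()

# Leaving 'index' out of the keys gives the same result without that step.
direct = (
    df.set_index(keys + ["Sub_Element"])["Value"]
      .unstack("Sub_Element")
      .reset_index()
)
print(collapsed)
print(direct)
```

So the extra aggregation is only needed while a per-row unique column is part of the index, which matches what the comments conclude.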