Python: move group header data into rows and delete the header rows
I have a CSV that contains product data like this:
Item,Val1,Val2,Val3,Val4,Val5
SomeProductName1,,,,,
SomeProductDetails1,,,,,
ProductGroupHeader1,,,,,
ProductInfo1,39,8,6,94,112
ProductInfo2,32,7,4,94,112
ProductGroupHeader2,,,,,
ProductInfo3,39,8,6,94,112
ProductInfo4,32,7,4,94,112
SomeProductName2,,,,,
SomeProductDetails2,,,,,
ProductGroupHeader21,,,,,
ProductInfo21,39,8,6,94,112
ProductInfo22,32,7,4,94,112
ProductGroupHeader2,,,,,
ProductInfo23,39,8,6,94,112
ProductInfo24,32,7,4,94,112
I need it to look like this:
Item,Val1,Val2,Val3,Val4,Val5
SomeProductName1, SomeProductDetails1, ProductGroupHeader1,,,,,
SomeProductName1, SomeProductDetails1, ProductInfo1,39,8,6,94,112
SomeProductName1, SomeProductDetails1, ProductInfo2,32,7,4,94,112
SomeProductName1, SomeProductDetails1, ProductGroupHeader2,,,,,
SomeProductName1, SomeProductDetails1, ProductInfo3,39,8,6,94,112
SomeProductName1, SomeProductDetails1, ProductInfo4,32,7,4,94,112
SomeProductName2, SomeProductDetails2, ProductGroupHeader21,,,,,
SomeProductName2, SomeProductDetails2, ProductInfo21,39,8,6,94,112
SomeProductName2, SomeProductDetails2, ProductInfo22,32,7,4,94,112
SomeProductName2, SomeProductDetails2, ProductGroupHeader2,,,,,
SomeProductName2, SomeProductDetails2, ProductInfo23,39,8,6,94,112
SomeProductName2, SomeProductDetails2, ProductInfo24,32,7,4,94,112
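Essentially, I want to take SomeProductName and SomeProductDetails from their own rows, delete those rows, and then add the two values as columns on the ProductInfo rows.
The CSV has a few thousand rows. My initial idea was to loop over it, updating and deleting rows as needed.
Then I intended to base it on ProductName, and possibly ProductName + ProductDetails.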
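I'm new to pandas and Python, so I'm wondering whether there is a simpler or more efficient way to do this.
To get your expected output, you can build a mask of the rows where all the Val columns are NaN, using filter and isna. Assuming the structure is strict, you can then use shift to find the name and details rows. Finally, concat the name and details columns (created with where and ffill) onto df, and keep only the rows you need: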
import pandas as pd

df = pd.read_csv('products.csv')  # placeholder path, use your own file

# get the rows with NaN in all the Val columns
m = df.filter(like='Val').isna().all(axis=1)
# get the ProductName rows: all Vals are NaN here and also
# two rows later (the GroupHeader row)
name = m & m.shift(-2, fill_value=False)
# get the ProductDetails rows: all Vals are NaN here, in the row
# before (the ProductName row) and in the row after (the GroupHeader row)
details = m & m.shift(-1, fill_value=False) & m.shift(1, fill_value=False)
# build the dataframe with concat; where + ffill carry the name
# and details down onto the following rows
df_ = (pd.concat([df['Item'].where(name).ffill().rename('Item_name'),
                  df['Item'].where(details).ffill().rename('Item_details'),
                  df],
                 axis=1)
       [~(name | details)]  # remove the rows holding only name and details
      )
This gives you:
print(df_)
Item_name Item_details Item Val1 Val2 \
2 SomeProductName1 SomeProductDetails1 ProductGroupHeader1 NaN NaN
3 SomeProductName1 SomeProductDetails1 ProductInfo1 39.0 8.0
4 SomeProductName1 SomeProductDetails1 ProductInfo2 32.0 7.0
5 SomeProductName1 SomeProductDetails1 ProductGroupHeader2 NaN NaN
6 SomeProductName1 SomeProductDetails1 ProductInfo3 39.0 8.0
7 SomeProductName1 SomeProductDetails1 ProductInfo4 32.0 7.0
10 SomeProductName2 SomeProductDetails2 ProductGroupHeader21 NaN NaN
11 SomeProductName2 SomeProductDetails2 ProductInfo21 39.0 8.0
12 SomeProductName2 SomeProductDetails2 ProductInfo22 32.0 7.0
13 SomeProductName2 SomeProductDetails2 ProductGroupHeader2 NaN NaN
14 SomeProductName2 SomeProductDetails2 ProductInfo23 39.0 8.0
15 SomeProductName2 SomeProductDetails2 ProductInfo24 32.0 7.0
Val3 Val4 Val5
2 NaN NaN NaN
3 6.0 94.0 112.0
4 4.0 94.0 112.0
5 NaN NaN NaN
6 6.0 94.0 112.0
7 4.0 94.0 112.0
10 NaN NaN NaN
11 6.0 94.0 112.0
12 4.0 94.0 112.0
13 NaN NaN NaN
14 6.0 94.0 112.0
15 4.0 94.0 112.0
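To check which rows the name and details masks pick out on their own, here is a minimal sketch on the question's sample rows. It assumes the same strict layout as the answer and reads the data from an in-memory string with io.StringIO instead of the real file. `fill_value=False` is passed to shift so the rows shifted in at the top and bottom are False rather than NaN, which keeps the masks boolean.

```python
import io

import pandas as pd

# The sample CSV from the question; a real run would pass a file path to pd.read_csv.
SAMPLE = """Item,Val1,Val2,Val3,Val4,Val5
SomeProductName1,,,,,
SomeProductDetails1,,,,,
ProductGroupHeader1,,,,,
ProductInfo1,39,8,6,94,112
ProductInfo2,32,7,4,94,112
ProductGroupHeader2,,,,,
ProductInfo3,39,8,6,94,112
ProductInfo4,32,7,4,94,112
SomeProductName2,,,,,
SomeProductDetails2,,,,,
ProductGroupHeader21,,,,,
ProductInfo21,39,8,6,94,112
ProductInfo22,32,7,4,94,112
ProductGroupHeader2,,,,,
ProductInfo23,39,8,6,94,112
ProductInfo24,32,7,4,94,112
"""

df = pd.read_csv(io.StringIO(SAMPLE))

# True where every Val column is empty (name, details and group header rows).
m = df.filter(like="Val").isna().all(axis=1)

# Name row: empty, and the row two below (the first group header) is empty too.
name = m & m.shift(-2, fill_value=False)
# Details row: empty, with an empty row directly above (name) and below (group header).
details = m & m.shift(-1, fill_value=False) & m.shift(1, fill_value=False)

print(df.loc[name, "Item"].tolist())     # the two SomeProductName rows
print(df.loc[details, "Item"].tolist())  # the two SomeProductDetails rows
```

Each group header row is also all-NaN, but it fails both tests because the row after it (a ProductInfo row) has values.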
Edit: to add the group header as a column as well, you can create a similar mask and use it in concat the same way:
# rows where all values are NaN but the next row's values are not
groupHeader = m & (~m).shift(-1, fill_value=False)
df_ = (pd.concat([df['Item'].where(name).ffill().rename('Item_name'),
                  df['Item'].where(details).ffill().rename('Item_details'),
                  df['Item'].where(groupHeader).ffill().rename('Item_group'),  # new column
                  df],
                 axis=1)
       [~(name | details | groupHeader)]  # also remove the group-header-only rows
      )
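Putting the pieces together, here is a minimal end-to-end sketch that applies the edited version to the question's sample data and writes the result back out as CSV. It assumes the same strict layout as the answer; io.StringIO stands in for the real input and output files, and `group_header` is just a snake_case name for the answer's `groupHeader`.

```python
import io

import pandas as pd

# The sample CSV from the question.
SAMPLE = """Item,Val1,Val2,Val3,Val4,Val5
SomeProductName1,,,,,
SomeProductDetails1,,,,,
ProductGroupHeader1,,,,,
ProductInfo1,39,8,6,94,112
ProductInfo2,32,7,4,94,112
ProductGroupHeader2,,,,,
ProductInfo3,39,8,6,94,112
ProductInfo4,32,7,4,94,112
SomeProductName2,,,,,
SomeProductDetails2,,,,,
ProductGroupHeader21,,,,,
ProductInfo21,39,8,6,94,112
ProductInfo22,32,7,4,94,112
ProductGroupHeader2,,,,,
ProductInfo23,39,8,6,94,112
ProductInfo24,32,7,4,94,112
"""

df = pd.read_csv(io.StringIO(SAMPLE))

# Same masks as in the answer, with fill_value=False so the edges stay boolean.
m = df.filter(like="Val").isna().all(axis=1)
name = m & m.shift(-2, fill_value=False)
details = m & m.shift(-1, fill_value=False) & m.shift(1, fill_value=False)
# Group header: an empty row whose next row has values.
group_header = m & (~m).shift(-1, fill_value=False)

df_ = pd.concat(
    [
        df["Item"].where(name).ffill().rename("Item_name"),
        df["Item"].where(details).ffill().rename("Item_details"),
        df["Item"].where(group_header).ffill().rename("Item_group"),
        df,
    ],
    axis=1,
)[~(name | details | group_header)]  # keep only the ProductInfo rows

# Write the flattened table; index=False drops the original row numbers.
out = io.StringIO()
df_.to_csv(out, index=False)
print(out.getvalue())
```

With a real file you would pass paths to `pd.read_csv` and `to_csv` instead. The Val columns come back as floats (e.g. `39.0`) because the empty cells make pandas read them as float.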
Great, thanks. If I also want the ProductGroupHeaders as a column like the other two, how would I do that?