Python: using a regex to match a pattern in one column and create a new column based on the column's attribute names
I have a pandas dataframe in the following format:
Current
product_typ
[Milo, Milk, Sugar]
[Water, Tea, Milo]
[Bread, Water]
[Bread, Water, Milo]
[Salt, Water, Milo]
[Milo, Milk, Water, Bread]
[Salt, Milk, Bread]
[Milo, Milk]
I want to create a new column of this form using a regex. Bear in mind that this is a dataframe.
Expected output
product_typ matched_col
[Milo, Milk, Sugar] Product_Milo_Milk_Sugar
[Water, Tea, Milo] Product_Water_Tea_Milo
[Bread, Water] Product_Bread_Water
[Bread, Water, Milo] Product_Bread_Water_Milo
[Salt, Water, Milo] Product_Salt_Water_Milo
[Milo, Milk, Water, Bread] Product_Milo_Milk_Water_Bread
[Salt, Milk, Bread] Product_Salt_Milk_Bread
[Milo, Milk] Product_Milo_Milk
I tried matching the pattern with str.findall, which works, but I have been stuck on the replacement for a long time.

Maybe something like this:
df['matched_col'] = ['Product_' + '_'.join(map(str, l)) for l in df['product_typ']]
Or, for example:
In [1681]: df = pd.DataFrame({'A': [['a','b','c'], ['b','c']]})
In [1682]: df
Out[1682]:
A
0 [a, b, c]
1 [b, c]
In [1684]: df['b'] = ['_'.join(map(str, l)) for l in df['A']]
In [1685]: df
Out[1685]:
A b
0 [a, b, c] a_b_c
1 [b, c] b_c
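As an alternative to the list comprehension, pandas' Series.str.join joins the elements of each list with a separator. A minimal sketch, assuming product_typ holds Python lists of strings (the sample rows are copied from the question); the Product_ prefix from the expected output is prepended separately:

```python
import pandas as pd

# Sample rows shaped like the question's product_typ column (lists of strings)
df = pd.DataFrame({'product_typ': [['Milo', 'Milk', 'Sugar'],
                                   ['Water', 'Tea', 'Milo'],
                                   ['Bread', 'Water']]})

# Series.str.join glues each row's list together with '_';
# adding a plain string to the resulting Series prefixes every row
df['matched_col'] = 'Product_' + df['product_typ'].str.join('_')

print(df)
```

Unlike '_'.join(...), Series.str.join does not raise on bad rows: pandas documents that a list containing a non-string item yields NaN for that row, and missing values stay NaN.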
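If product_typ instead stores each row as a plain string such as "[Milo, Milk, Sugar]" (an assumption; the question does not say whether the cells are real lists or their text form), the str.findall approach mentioned in the question can extract the item names before joining. A sketch, where the pattern \w+ is chosen for illustration and assumes every item name is a single word:

```python
import pandas as pd

# Hypothetical case: the column holds strings, not real Python lists
df = pd.DataFrame({'product_typ': ['[Milo, Milk, Sugar]', '[Bread, Water]']})

# str.findall returns a list of regex matches per row ...
words = df['product_typ'].str.findall(r'\w+')
# ... and str.join then combines those matches with '_'
df['matched_col'] = 'Product_' + words.str.join('_')

print(df)
```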
What have you tried so far?
I get an error TypeError: sequence item 0: expected str instance, float found, and when I check the first item it is --> ['Milk','Milo']
Please apply this only to the column that contains the lists, not to the whole dataframe. Another reason could be that this column contains float values. Try it on the example dataframe, where you can be sure the column contains proper lists of string values.
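That error appears whenever str.join meets a float in the sequence it joins; a missing cell is one common source, since pandas stores it as NaN, which is a float. A sketch for checking the comment's hypothesis, assuming non-list rows (such as NaN) should simply get no matched_col value; the sample rows and the helper name to_matched are made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({'product_typ': [['Milo', 'Milk'],
                                   float('nan'),      # missing value, stored as a float
                                   ['Salt', 3.5]]})   # list holding a non-string item

# Rows that are not lists are the likely culprits behind the TypeError
bad_rows = df[~df['product_typ'].map(lambda v: isinstance(v, list))]
print(bad_rows)

def to_matched(value):
    # Skip anything that is not a list (e.g. NaN) instead of raising
    if not isinstance(value, list):
        return None
    # map(str, ...) converts every item, so join never receives a float
    return 'Product_' + '_'.join(map(str, value))

df['matched_col'] = df['product_typ'].map(to_matched)
print(df)
```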