Python: Use regex to match a pattern in a column and create a new column based on the column attribute names

python, regex, pandas, dataframe

I have a pandas DataFrame in the following format:

Current

product_typ

[Milo, Milk, Sugar]
[Water, Tea, Milo]
[Bread, Water]
[Bread, Water, Milo]
[Salt, Water, Milo]
[Milo, Milk, Water, Bread]
[Salt, Milk, Bread]
[Milo, Milk]

I want to create a new column of the following form using regex. Keep in mind that this is a DataFrame.

Expected output

product_typ                          matched_col

[Milo, Milk, Sugar]                Product_Milo_Milk_Sugar
[Water, Tea, Milo]                 Product_Water_Tea_Milo
[Bread, Water]                     Product_Bread_Water
[Bread, Water, Milo]               Product_Bread_Water_Milo
[Salt, Water, Milo]                Product_Salt_Water_Milo
[Milo, Milk, Water, Bread]         Product_Milo_Milk_Water_Bread
[Salt, Milk, Bread]                Product_Salt_Milk_Bread
[Milo, Milk]                       Product_Milo_Milk

I tried using str.findall to match the pattern, which works well, but the replacement step has had me stuck for a long time.

Something like this, maybe:

df['matched_col'] = ['_'.join(map(str, l)) for l in df['product_typ']]

Or, for example:

In [1681]: df = pd.DataFrame({'A': [['a','b','c'], ['b','c']]})                                                                                                                                             

In [1682]: df                                                                                                                                                                                               
Out[1682]: 
           A
0  [a, b, c]
1     [b, c]

In [1684]: df['b'] = ['_'.join(map(str, l)) for l in df['A']]                                                                                                                                               

In [1685]: df                                                                                                                                                                                               
Out[1685]: 
           A      b
0  [a, b, c]  a_b_c
1     [b, c]    b_c
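
The expected output in the question also carries a leading "Product_" prefix, which the join alone does not produce. A minimal sketch of one way to add it, assuming every cell of product_typ is a list of strings (the sample rows below are copied from the question; the variable name items is only illustrative):

```python
import pandas as pd

# Three sample rows taken from the question.
df = pd.DataFrame({
    "product_typ": [
        ["Milo", "Milk", "Sugar"],
        ["Water", "Tea", "Milo"],
        ["Bread", "Water"],
    ]
})

# Put the literal "Product" in front of the items, then join everything with "_".
df["matched_col"] = [
    "_".join(["Product", *map(str, items)]) for items in df["product_typ"]
]

print(df)
```

No regex is needed for this step, since the new value is built directly from the list items rather than extracted from a string.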

What have you tried so far?

I get an error, TypeError: sequence item 0: expected str instance, float found, and when I check the first item, it is ['Milk', 'Milo'].

Please apply this only to the column that contains the lists, not to the whole DataFrame. Another possible cause is that the column contains float values. Try it on the sample DataFrame, where you can be sure the column holds proper lists of strings.
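
str.join raises this kind of TypeError when an element it receives is a float rather than a string, for example a NaN stored inside a list. The sketch below is one defensive way to build the column, assuming that missing values may appear either as whole cells or as list elements. The helper name join_products and the sample data are hypothetical, not taken from the question.

```python
import pandas as pd

# Hypothetical data: one proper list, one missing cell, one list holding a NaN.
df = pd.DataFrame({
    "product_typ": [
        ["Milk", "Milo"],
        float("nan"),
        ["Bread", float("nan"), "Water"],
    ]
})

def join_products(cell):
    """Hypothetical helper: return 'Product_a_b' for a list cell, else None."""
    # A missing cell is a bare float (NaN), not a list, so skip it.
    if not isinstance(cell, (list, tuple)):
        return None
    # Drop NaN elements and convert the remaining items to strings before joining.
    parts = [str(item) for item in cell if not pd.isna(item)]
    return "_".join(["Product", *parts]) if parts else None

# Apply the helper to the single column only, not to the whole DataFrame.
df["matched_col"] = df["product_typ"].apply(join_products)

print(df)
```

Whether NaN items should be dropped or kept as a literal "nan" is a design choice; this version drops them.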