Python: replacing parts of strings in messy data (a faster alternative to repeated string replacement)?

I want to replace many of the values in a product variant column:

Big Ben Personalized Products AVENGERS – Stark / 2 set                                                2
BigBen Personalized Products Expendables – Statham / 2 set                                            2
BigBen Personalized Toy 20.00% Off Auto renew Adults Toy / 5 set                                      2
BigBen Personalized Toy 20.00% Off Auto renew Adults Toy / 3 set                                       1
Personalized Toy 5 set                                                                                  1
BIG BEN Personalized  Machine 20.00% Off Auto renew (Versand jeden 3 Monate) Kids Toy / 3 set    1
BigBen Personalized Toy 20.00% Off Auto renew (Versand jeden 2 Monate) Kids Toy / 5 set            1
BigBen Personalized Toy 20.00% Off Auto renew (Versand jeden 2 Monate) Adults Toy / 5 set              1
BigBen Personalized Products 20.00% Off Auto renew (Versand jeden 5 Monate) Adults Toy / 5 set                   
Many of these product variants actually refer to the same value.

Is there a faster way to do this than the following?

df["product_variant"]= df["product_variant"].str.replace('BigBen Personalized', '',case = False) 
df["product_variant"]= df["product_variant"].str.replace('Big Ben Personalized ', '',case = False)
df["product_variant"]= df["product_variant"].str.replace('BigBen Personalized', '',case = False)
df["product_variant"]= df["product_variant"].str.replace('Auto renew', '',case = False) 

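One option is to create a specific pattern for these examples with 2 capture groups.

For most items, the captured part comes after Products, or starts at Adult or Kids:

  • Group 1 captures the part before the / (if present)
  • Group 2 captures one or more digits followed by set

Example pattern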

^(?:big\s*ben personalized (?:products\s+)?(?:.*?(?=Adult|Kids))?|personalized\s+)(\w+(?: \w+)*(?: – \w+(?: \w+)*)?)(?: /)? (\d+ set)\b.*

Use the 2 capture groups in the replacement:
\1 (\2)
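
To make the two capture groups easier to follow, here is a minimal sketch that spells out the same pattern in verbose mode with Python's re module and applies it to two rows from the question. The names PATTERN and clean are illustrative and not part of the original answer. Literal spaces are written as [ ] because re.VERBOSE ignores unescaped whitespace.

```python
import re

# Same pattern as above, spelled out with re.VERBOSE (illustrative rewrite).
PATTERN = re.compile(
    r"""
    ^
    (?:
        big\s*ben[ ]personalized[ ]   # brand prefix, with or without a space in Big Ben
        (?:products\s+)?              # optional word Products
        (?:.*?(?=Adult|Kids))?        # optionally skip ahead to Adult or Kids
      |
        personalized\s+               # or a bare Personalized prefix
    )
    (\w+(?:[ ]\w+)*(?:[ ]–[ ]\w+(?:[ ]\w+)*)?)   # group 1: variant name
    (?:[ ]/)?                         # optional slash separator
    [ ](\d+[ ]set)\b                  # group 2: digits followed by set
    .*                                # discard the rest of the line
    """,
    re.IGNORECASE | re.VERBOSE,
)


def clean(text: str) -> str:
    """Return 'group 1 (group 2)' for a matching row, else the text unchanged."""
    return PATTERN.sub(r"\1 (\2)", text)


rows = [
    "Big Ben Personalized Products AVENGERS – Stark / 2 set      2",
    "BigBen Personalized Toy 20.00% Off Auto renew (Versand jeden 2 Monate) Kids Toy / 5 set   1",
]

for row in rows:
    # Show what each group captured, then the rewritten value
    print(PATTERN.match(row).groups(), "->", clean(row))
```

Rows that do not match the pattern are returned unchanged, so unexpected formats are left as they are instead of being truncated.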

Full example with pandas:

import pandas as pd

# Sample data taken from the question
items = [
    "Big Ben Personalized Products AVENGERS – Stark / 2 set                                                2",
    "BigBen Personalized Products Expendables – Statham / 2 set                                            2",
    "BigBen Personalized Toy 20.00% Off Auto renew Adults Toy / 5 set                                      2",
    "BigBen Personalized Toy 20.00% Off Auto renew Adults Toy / 3 set                                       1",
    "Personalized Toy 5 set                                                                                  1",
    "BIG BEN Personalized  Machine 20.00% Off Auto renew (Versand jeden 3 Monate) Kids Toy / 3 set    1",
    "BigBen Personalized Toy 20.00% Off Auto renew (Versand jeden 2 Monate) Kids Toy / 5 set            1",
    "BigBen Personalized Toy 20.00% Off Auto renew (Versand jeden 2 Monate) Adults Toy / 5 set              1",
    "BigBen Personalized Products 20.00% Off Auto renew (Versand jeden 5 Monate) Adults Toy / 5 set                   "
]

df = pd.DataFrame(items, columns=["product_variant"])

# (?i) makes the pattern case-insensitive; \1 and \2 refer to the two capture groups
df["product_variant"] = df["product_variant"].replace(
    r'(?i)^(?:big\s*ben personalized (?:products\s+)?(?:.*?(?=Adult|Kids))?|personalized\s+)(\w+(?: \w+)*(?: – \w+(?: \w+)*)?)(?: /)? (\d+ set)\b.*',
    r'\1 (\2)',
    regex=True
)
print(df)

Output

                 product_variant
0       AVENGERS – Stark (2 set)
1  Expendables – Statham (2 set)
2             Adults Toy (5 set)
3             Adults Toy (3 set)
4                    Toy (5 set)
5               Kids Toy (3 set)
6               Kids Toy (5 set)
7             Adults Toy (5 set)
8             Adults Toy (5 set)
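
If the goal is only to strip a handful of fixed phrases, as in the question's chained calls, one possible sketch is to join them into a single alternation so that the column is scanned once for all phrases. The phrases list, the two sample rows, and the whitespace clean-up step are assumptions for illustration and are not part of the original answer. Whether this is measurably faster depends on the data and should be timed.

```python
import re

import pandas as pd

# Hypothetical sample frame; the column name follows the question
df = pd.DataFrame({"product_variant": [
    "BigBen Personalized Toy 20.00% Off Auto renew Adults Toy / 5 set",
    "Big Ben Personalized Products AVENGERS – Stark / 2 set",
]})

# Fixed phrases to strip; re.escape keeps any regex metacharacters literal
phrases = ["Big Ben Personalized", "BigBen Personalized", "Auto renew"]
combined = re.compile("|".join(re.escape(p) for p in phrases), flags=re.IGNORECASE)

stripped = (
    df["product_variant"]
    .str.replace(combined, "", regex=True)   # one pass for all phrases
    .str.replace(r"\s+", " ", regex=True)    # collapse the gaps left behind
    .str.strip()
)

print(stripped.tolist())
```

Passing a compiled pattern means the case-insensitive flag is set once on the pattern, so the case argument of str.replace is not needed.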