Python: Remove duplicates within strings, but for an entire DataFrame (tags: python, pandas, transform)

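I want to achieve something similar to what is described in this post, but efficiently and for an entire DataFrame.

My data looks like this: it is a pandas DataFrame with many columns. The cells hold comma-separated strings that contain many duplicates, and I want to remove all duplicates within each individual string.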

+--------------------+---------+---------------------+
|        Col1        |  Col2   |        Col3         |
+--------------------+---------+---------------------+
| Dog, Dog, Dog      | India   | Facebook, Instagram |
| Dog, Squirrel, Cat | Norway  | Facebook, Facebook  |
| Cat, Cat, Cat      | Germany | Twitter             |
+--------------------+---------+---------------------+
Reproducible example:

import pandas as pd

df = pd.DataFrame({"col1": ["Dog, Dog, Dog", "Dog, Squirrel, Cat", "Cat, Cat, Cat"],
                   "col2": ["India", "Norway", "Germany"],
                   "col3": ["Facebook, Instagram", "Facebook, Facebook", "Twitter"]})
I would like to transform it into:

+--------------------+---------+---------------------+
|        Col1        |  Col2   |        Col3         |
+--------------------+---------+---------------------+
| Dog                | India   | Facebook, Instagram |
| Dog, Squirrel, Cat | Norway  | Facebook            |
| Cat                | Germany | Twitter             |
+--------------------+---------+---------------------+

Let's use get_dummies followed by dot:

s = df.col1.str.get_dummies(', ')
df['Col1'] = s.dot(s.columns + ',').str[:-1]
df
Out[460]: 
                 col1     col2                 col3              Col1
0       Dog, Dog, Dog    India  Facebook, Instagram               Dog
1  Dog, Squirrel, Cat   Norway   Facebook, Facebook  Cat,Dog,Squirrel
2       Cat, Cat, Cat  Germany              Twitter               Cat

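You can do it like this:

for col in df.columns.tolist():
    # pandas 2.0 and later treat the pattern as a literal string unless regex=True is passed
    df[col] = df[col].str.replace(r'\b(\w+)(,+\s+\1)+\b', r'\1', regex=True)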
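To see what this pattern does and does not catch, here is a minimal sketch that runs the same regular expression through Python's built-in re module on plain strings. The helper name collapse_adjacent and the "Dog, Cat, Dog" input are made up for illustration and are not part of the answer.

```python
import re

# Same pattern as in the answer: a word followed by one or more
# comma-separated repeats of that exact word.
ADJACENT_DUPES = re.compile(r'\b(\w+)(,+\s+\1)+\b')


def collapse_adjacent(text):
    """Collapse runs of an identical word that directly follow each other."""
    return ADJACENT_DUPES.sub(r'\1', text)


examples = [
    "Dog, Dog, Dog",       # adjacent repeats: collapsed
    "Facebook, Facebook",  # adjacent repeats: collapsed
    "Dog, Squirrel, Cat",  # no repeats: unchanged
    "Dog, Cat, Dog",       # hypothetical input: repeat is not adjacent, so unchanged
]

for text in examples:
    print(f"{text!r} -> {collapse_adjacent(text)!r}")
```

The backreference only matches a word that is immediately repeated, so this approach fits the sample data in the question but would leave a duplicate in place when another item sits between the two copies.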
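Try:

for col in ["col1", "col2", "col3"]:
    # set() drops duplicates but does not preserve the original item order
    df[col] = df[col].str.split(", ").map(set).str.join(", ")
Output:

>>> df
                 col1     col2                 col3
0                 Dog    India  Facebook, Instagram
1  Dog, Cat, Squirrel   Norway             Facebook
2                 Cat  Germany              Twitter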
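If the original item order matters (the desired output in the question keeps "Dog, Squirrel, Cat" as is), one possible variation is to deduplicate with dict.fromkeys, which keeps first-seen order, instead of set. The sketch below is plain Python; the helper name dedupe_items and the dict-of-lists layout are assumptions for illustration, not something taken from the answers.

```python
def dedupe_items(text):
    """Remove repeated items from a comma-separated string, keeping first-seen order."""
    # Split on commas, trim surrounding whitespace, and skip empty pieces.
    items = (item.strip() for item in text.split(","))
    # dict.fromkeys keeps only the first occurrence of each key, in insertion order.
    return ", ".join(dict.fromkeys(item for item in items if item))


# Same values as the reproducible example, held as plain lists per column.
rows = {
    "col1": ["Dog, Dog, Dog", "Dog, Squirrel, Cat", "Cat, Cat, Cat"],
    "col2": ["India", "Norway", "Germany"],
    "col3": ["Facebook, Instagram", "Facebook, Facebook", "Twitter"],
}

cleaned = {col: [dedupe_items(value) for value in values]
           for col, values in rows.items()}

for col, values in cleaned.items():
    print(col, values)
```

With pandas, the same helper could be applied column by column, for example df[col] = df[col].map(dedupe_items) inside a loop over df.columns, assuming every column holds strings with no missing values.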