Collapsing categories in a CSV with Python
I have a dataframe "locations" that contains the types of some stores. It is very messy, with many different categories, so I would like to merge some of them to end up with fewer, simpler categories. How can I do this? For example:
store type
mcdonalds fast-food
nandos sit-down-food
wetherspoons tech-pub
southsider pub-and-dine
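I would like to merge fast-food and sit-down-food into "food", and tech-pub and pub-and-dine into "pub". How do I do this?

You can use a dict whose keys are the types to be replaced and whose values are the desired types. Then rebuild the column as a list, replacing the types that appear as keys and keeping the ones that don't.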
# Dict specifying the types to replace
type_dict = {'fast-food':'food','sit-down-food':'food',
'tech-pub':'pub','pub-and-dine':'pub'}
# Replace types that are dict keys but keep the values that aren't dict keys
df['type'] = [type_dict.get(i,i) for i in df['type']]
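The same mapping can also be applied with pandas' own `Series.replace`, which leaves values that are not keys of the dict untouched. A minimal sketch, assuming the column is named `type` as in the question; the sample frame is rebuilt here so the snippet runs on its own, and the `costa`/`cafe` row is a made-up example of a type with no entry in the mapping:

```python
import pandas as pd

# Sample data modelled on the question; the "costa"/"cafe" row is a
# hypothetical extra entry that has no key in the mapping.
df = pd.DataFrame({
    "store": ["mcdonalds", "nandos", "wetherspoons", "southsider", "costa"],
    "type": ["fast-food", "sit-down-food", "tech-pub", "pub-and-dine", "cafe"],
})

# Dict specifying the types to replace
type_dict = {"fast-food": "food", "sit-down-food": "food",
             "tech-pub": "pub", "pub-and-dine": "pub"}

# Values that are dict keys are swapped; everything else is kept as-is
df["type"] = df["type"].replace(type_dict)
print(df)
```

This matches whole values only, so a type such as `fast-food-van` would not be collapsed unless it is added to the dict.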
My first instinct was to use the pandas apply function to map the desired values. Roughly as follows:
import pandas as pd

def nameMapper(name):
    # Collapse a raw type into a broader category by substring match
    if "food" in name:
        return "food"
    elif "pub" in name:
        return "pub"
    else:
        return "something else"

data = [
    ["mcdonalds", "fast-food"],
    ["nandos", "sit-down-food"],
    ["wetherspoons", "tech-pub"],
    ["southsider", "pub-and-dine"]
]

# Column names are passed as a list: a set has no guaranteed order
df = pd.DataFrame(data, columns=["store", "type"])
print(df)
print("---------------------------")
df["type"] = df["type"].apply(nameMapper)
print(df)
When I ran this program, it produced the following output:
$ python3 answer.py
          store           type
0     mcdonalds      fast-food
1        nandos  sit-down-food
2  wetherspoons       tech-pub
3    southsider   pub-and-dine
---------------------------
          store  type
0     mcdonalds  food
1        nandos  food
2  wetherspoons   pub
3    southsider   pub
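The same substring rule can also be written without calling a Python function per row. A sketch, assuming NumPy is installed alongside pandas: `np.select` takes the first condition that matches, which mirrors the if/elif order in nameMapper. The `costa`/`cafe` row is a made-up example added to show the fallback value.

```python
import numpy as np
import pandas as pd

# Sample data modelled on the question; the last row is hypothetical
df = pd.DataFrame({
    "store": ["mcdonalds", "nandos", "wetherspoons", "southsider", "costa"],
    "type": ["fast-food", "sit-down-food", "tech-pub", "pub-and-dine", "cafe"],
})

# One boolean mask per target category, checked in order
conditions = [
    df["type"].str.contains("food", regex=False),
    df["type"].str.contains("pub", regex=False),
]
choices = ["food", "pub"]

# Rows matching no condition get the default label
df["type"] = np.select(conditions, choices, default="something else")
print(df)
```

As with nameMapper, any type containing neither keyword is overwritten with "something else", so this suits cases where every category is expected to match one of the keywords.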