Python: splitting values in a dataframe column
Tags: python, dataframe, split, index-error
I have a dataframe named df, and I want to get rid of the "|" in the fuel column by splitting each value into its own row:
id car fuel
1 Mercedes petrol|diesel|gas
2 Audi gas|petrol
so that my data looks like this:
id car fuel
1 Mercedes petrol
1 Mercedes diesel
1 Mercedes gas
2 Audi gas
2 Audi petrol
This is the code I tried:
df_1 = hb.copy()
df_2 = hb.copy()
df_3 = hb.copy()
df_1['fuel'] = df_1['fuel'].apply(lambda x: x.split('|')[0])
df_2['fuel'] = df_2['fuel'].apply(lambda x: x.split('|')[1])
df_3['fuel'] = df_3['fuel'].apply(lambda x: x.split('|')[2])
This raises IndexError: list index out of range.
You can try the following:
import pandas as pd

# Create the dataframe
df = pd.DataFrame({
    "id": [1, 2],
    "car": ["Mercedes", "Audi"],
    "fuel": ["petrol|diesel|gas", "gas|petrol"]
})
# Build one column per fuel with car as the index, then stack the columns into rows
# (dropna() removes the padding left by rows with fewer fuels)
new_df = pd.DataFrame(df.fuel.str.split('|').tolist(), index=df.car).stack().dropna()
# Move 'car' back into a column and get rid of the secondary index
new_df = new_df.reset_index(level='car', name='fuel').reset_index(drop=True)
# Add the 'id' back to the dataframe (this lookup assumes car names are unique)
# Note: There is probably a much more elegant way of doing this
new_df.loc[:, 'id'] = new_df.car.apply(lambda x: df[df.loc[:, 'car'] == x].id.values[0])
# Put the columns in the original order
new_df = new_df[["id", "car", "fuel"]]
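As for the "more elegant way" mentioned in the comment above, one option is to keep both id and car in the index before stacking, so that id never has to be looked up again. This is only a sketch on the sample data from the question; the names split and long_df are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 2],
    "car": ["Mercedes", "Audi"],
    "fuel": ["petrol|diesel|gas", "gas|petrol"],
})

# Keep 'id' and 'car' in the index so both survive the reshaping
split = df.set_index(["id", "car"])["fuel"].str.split("|", expand=True)

# stack() turns the wide fuel columns into rows;
# dropna() removes the padding for rows that have fewer fuels
long_df = (
    split.stack()
    .dropna()
    .rename("fuel")
    .reset_index(level=["id", "car"])
    .reset_index(drop=True)
)
print(long_df)
```

Since id travels along in the index, this does not rely on car names being unique.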
Try this:
import pandas as pd

df = pd.DataFrame({'car': ['Mercedes', 'Audi'], 'fuel': ['petrol|diesel|gas', 'gas|petrol']})  # your dataframe
rows = []  # collects one dict per (car, fuel) combination
for i in range(len(df)):  # iterating over df
    list1 = df.iloc[i, 1].split('|')  # split each value of 'fuel' and store it in a list
    for j in range(len(list1)):  # iterating over list1
        rows.append({'car': df.iloc[i, 0], 'fuel': list1[j]})  # one row per combination of 'car' and 'fuel'
df2 = pd.DataFrame(rows)  # build the new dataframe in one go (DataFrame.append was removed in pandas 2.0)
Here is one approach.
Ex:
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "id": [1, 2],
    "car": ["Mercedes", "Audi"],
    "fuel": ["petrol|diesel|gas", "gas|petrol"]
})
# Turn each "a|b|c" string into a list
df["fuel"] = df["fuel"].str.split("|")
# Ref https://stackoverflow.com/a/48532692/532312
lst_col = 'fuel'
# Repeat every other column once per list element, then flatten the lists
df = pd.DataFrame({
    col: np.repeat(df[col].values, df[lst_col].str.len())
    for col in df.columns.drop(lst_col)}
).assign(**{lst_col: np.concatenate(df[lst_col].values)})[df.columns]
print(df)
Output:
   id       car    fuel
0   1  Mercedes  petrol
1   1  Mercedes  diesel
2   1  Mercedes     gas
3   2      Audi     gas
4   2      Audi  petrol
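If your pandas version is 0.25 or later, DataFrame.explode does the same job without the NumPy bookkeeping, and it may also be worth trying on a large dataset. A minimal sketch on the sample data from the question; the name out is just for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 2],
    "car": ["Mercedes", "Audi"],
    "fuel": ["petrol|diesel|gas", "gas|petrol"],
})

out = (
    df.assign(fuel=df["fuel"].str.split("|"))  # each cell becomes a list of fuels
    .explode("fuel")                           # one row per list element
    .reset_index(drop=True)                    # replace the repeated index with 0..n-1
)
print(out)
```

The other columns (id and car here) are repeated automatically for each exploded row, so nothing has to be joined back afterwards.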
Comments:
The possible duplicate is so confusing.
Thanks for your answer, but it is not the right answer for my large dataset.
What is the problem? Is it too slow?