Python: populate a new column based on previous values in other columns
I have a dataset of users, visit types (booking or search), and hotels. I need to fill a new column with the most-booked hotel, based on the hotels booked in the rows before the current one. For example:
    user   visit_type  hotel_code  most_booked
1   user1  search      1           NaN
2   user1  search      2           NaN
3   user1  booking     1           NaN
4   user1  search      8           NaN
5   user1  booking     8           1
6   user2  search      6           NaN
7   user2  booking     6           NaN
8   user2  search      4           NaN
9   user2  booking     4           6
10  user2  booking     6           4
11  user2  booking     4           6
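In this example:
For user1, most_booked in row 3 is NaN because no hotel had been booked before it, and in row 5 it is hotel 1.
For user2, row 7 is NaN, row 9 is hotel 6, and row 10 is hotel 4 (the two hotels are tied with one booking each, and hotel 4 was booked last). For the last row, 11, it is hotel 6, because that is the most-booked hotel so far (not counting row 11 itself).

This will do what you want: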
import operator
from collections import defaultdict

import pandas as pd

d = {
    "user": ["user1", "user1", "user1", "user1", "user1", "user2", "user2", "user2", "user2", "user2", "user2"],
    "visit_type": ["search", "search", "booking", "search", "booking", "search", "booking", "search", "booking", "booking", "booking"],
    "hotel_code": [1, 2, 1, 8, 8, 6, 6, 4, 4, 6, 4],
}
df = pd.DataFrame(data=d)

# Default to a real missing value (not the string 'NaN');
# object dtype keeps the hotel codes printed as integers
df["most_booked"] = pd.Series(float("nan"), index=df.index, dtype=object)

for user in df["user"].unique():
    # Ignore searches; only consider this user's bookings
    df_bookings = df.loc[(df["visit_type"] == "booking") & (df["user"] == user)]
    last_booked = None
    booking_counts = defaultdict(int)
    for i, entry in df_bookings.iterrows():
        # Skip the user's first booking: nothing was booked before it
        if last_booked is not None:
            highest = max(booking_counts.values())
            # Prefer the last booked hotel if it ties for the highest count
            if booking_counts[last_booked] == highest:
                max_booked = last_booked
            # Otherwise choose the hotel with the highest count
            else:
                max_booked = max(booking_counts.items(), key=operator.itemgetter(1))[0]
            df.loc[i, "most_booked"] = max_booked
        # Update the booking count for the current hotel
        current_booking = entry["hotel_code"]
        booking_counts[current_booking] += 1
        last_booked = current_booking

print(df)
Output (older pandas versions sort the columns alphabetically, as shown here; newer versions keep the dictionary's insertion order):

    hotel_code   user visit_type most_booked
0            1  user1     search         NaN
1            2  user1     search         NaN
2            1  user1    booking         NaN
3            8  user1     search         NaN
4            8  user1    booking           1
5            6  user2     search         NaN
6            6  user2    booking         NaN
7            4  user2     search         NaN
8            4  user2    booking           6
9            6  user2    booking           4
10           4  user2    booking           6
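
The same rule can also be pulled out into a small helper that is applied once per user, which avoids writing into the DataFrame row by row. The following is only a sketch, not part of the original answer: the function name most_booked_so_far is hypothetical, and it assumes hotel codes are integers so that the result can be stored in pandas' nullable Int64 dtype. It keeps the same tie-breaking rule (prefer the last booked hotel if it ties for the highest count, otherwise take the first-seen hotel with the highest count).

```python
from collections import Counter

import pandas as pd


def most_booked_so_far(hotel_codes):
    """For each booking, return the hotel booked most often before it.

    Hypothetical helper; assumes integer hotel codes.
    """
    counts = Counter()
    last_booked = None
    result = []
    for hotel in hotel_codes:
        if last_booked is None:
            # First booking for this user: nothing was booked before it
            result.append(None)
        else:
            # First-seen hotel with the highest count so far
            best = max(counts, key=counts.get)
            # If the last booked hotel ties for the highest count, prefer it
            if counts[last_booked] == counts[best]:
                best = last_booked
            result.append(best)
        counts[hotel] += 1
        last_booked = hotel
    # Nullable integer dtype: None becomes <NA>, codes stay integers
    return pd.Series(result, index=hotel_codes.index, dtype="Int64")


df = pd.DataFrame({
    "user": ["user1"] * 5 + ["user2"] * 6,
    "visit_type": ["search", "search", "booking", "search", "booking",
                   "search", "booking", "search", "booking", "booking", "booking"],
    "hotel_code": [1, 2, 1, 8, 8, 6, 6, 4, 4, 6, 4],
})

# Only bookings count; search rows get no value and stay missing
bookings = df[df["visit_type"] == "booking"]
parts = [most_booked_so_far(hotels)
         for _, hotels in bookings.groupby("user")["hotel_code"]]

# Assignment aligns on the original row index
df["most_booked"] = pd.concat(parts)
print(df)
```

Because the helper only sees one user's bookings at a time, the per-user state (counts and last booked hotel) never leaks between users. Note that pd.concat raises an error on an empty list, so a dataset with no bookings at all would need a separate guard.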