Python: merge rows that share the same key into one row


python python-3.x pandas dataframe


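I have a DataFrame and would like to create another column that combines the values of the columns whose names start with Answer, for rows that have the same value in QID.
That is, given the following DataFrame: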
    QID     Category    Text    QType   Question:   Answer0     Answer1     Country
0   16  Automotive  Access to car   Single  Do you have access to a car?    I own a car/cars    I own a car/cars  UK
1   16  Automotive  Access to car   Single  Do you have access to a car?    I lease/ have a company car     I lease/have a company car  UK
2   16  Automotive  Access to car   Single  Do you have access to a car?    I have access to a car/cars     I have access to a car/cars     UK
3   16  Automotive  Access to car   Single  Do you have access to a car?    No, I don’t have access to a car/cars   No, I don't have access to a car    UK
4   16  Automotive  Access to car   Single  Do you have access to a car?    Prefer not to say   Prefer not to say   UK

So I would like to get the following result:
        QID     Category    Text    QType   Question:   Answer0     Answer1     Answer2    Answer3  Country    Answers
    0   16  Automotive  Access to car   Single  Do you have access to a car?    I own a car/cars    I lease/ have a company car      I have access to a car/cars    No, I don’t have access to a car/cars    UK    ['I own a car/cars', 'I lease/ have a company car'   ,'I have access to a car/cars', 'No, I don’t have access to a car/cars', 'Prefer not to say     Prefer not to say']

So far I have tried the following:
previous_qid = None
i = 0
j = 0
answers = []
new_row = {}
new_df = pd.DataFrame(columns=df.columns)
for _, row in df.iterrows():
    # get QID
    qid = row['QID']
    if qid == previous_qid:
        i+=1
        new_row['Answer'+str(i)]=row['Answer0']
        answers.append(row['Answer0'])
    elif new_row != {}:
        # we moved to a new row
        new_row['QID'] = qid
        new_row['Question'] = row['Question']
        new_row['Answers'] = answers
        # we create a new row in the new_dataframe
        new_df.append(new_row, ignore_index=True)
        # we clean up everything to receive the next row
        answers = []
        i=0
        j+=1
        new_row = {}
        # we add the information of the current row
        new_row['Answer'+str(i)]=row['Answer0']
        answers.append(row['Answer0'])
    previous_qid = qid

But new_df ends up empty.
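A likely reason new_df stays empty: DataFrame.append returned a new DataFrame instead of modifying new_df in place, and the loop throws that return value away (the method was also removed in pandas 2.0). The loop also never writes out the last group. The following is only a sketch of one way around both problems, collecting plain dicts in a list and building the frame once at the end; the small sample frame and the name rows are made up for illustration.

```python
import pandas as pd

# Made-up sample with the same shape as the question's data.
df = pd.DataFrame({
    "QID": [16, 16, 17],
    "Question:": ["Car?", "Car?", "Bike?"],
    "Answer0": ["I own a car", "I lease a car", "Yes"],
})

rows = []  # plain dicts, one per QID
for qid, group in df.groupby("QID", sort=False):
    answers = group["Answer0"].tolist()
    row = {"QID": qid, "Question:": group["Question:"].iloc[0]}
    # One AnswerN column per collected answer.
    row.update({f"Answer{i}": a for i, a in enumerate(answers)})
    row["Answers"] = answers
    rows.append(row)

# Build the frame once; list.append mutates in place, unlike DataFrame.append.
new_df = pd.DataFrame(rows)
print(new_df.to_string(index=False))
```

Groups with fewer answers get NaN in the AnswerN columns they do not fill.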
This logically groups by QID to get a list of answers, then splits the list back out into columns.
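import re
import pandas as pd
data = """    QID     Category    Text    QType   Question:   Answer0     Answer1     Country
0   16  Automotive  Access to car   Single  Do you have access to a car?    I own a car/cars    I own a car/cars  UK
1   16  Automotive  Access to car   Single  Do you have access to a car?    I lease/ have a company car     I lease/have a company car  UK
2   16  Automotive  Access to car   Single  Do you have access to a car?    I have access to a car/cars     I have access to a car/cars     UK
3   16  Automotive  Access to car   Single  Do you have access to a car?    No, I don’t have access to a car/cars   No, I don't have access to a car    UK
4   16  Automotive  Access to car   Single  Do you have access to a car?    Prefer not to say   Prefer not to say   UK"""
# strip the leading index from each line, then split fields on runs of two or more spaces
a = [[t.strip() for t in re.split("  ", l) if t != ""] for l in [re.sub("([0-9]?[ ])*(.*)", r"\2", l) for l in data.split("\n")]]
df = pd.DataFrame(data=a[1:], columns=a[0])
# lazy - want the first value of all attributes except the QID and Answer columns
agg = {col: "first" for col in list(df.columns) if col != "QID" and "Answer" not in col}
# get a list of all answers in Answer0 for a QID
agg = {**agg, **{"Answer0": lambda s: list(s)}}
# helper function for the row call. not needed, but more readable
def ans(r, i):
    return "" if i >= len(r["AnswerT"]) else r["AnswerT"][i]
# split the list from the aggregation back out into columns using assign
# rename Answer0 from the aggregation to AnswerT so that it can be referred to,
# then drop AnswerT once it is no longer needed
dfgrouped = df.groupby("QID").agg(agg).reset_index().rename(columns={"Answer0": "AnswerT"}).assign(
    Answer0=lambda dfa: dfa.apply(lambda r: ans(r, 0), axis=1),
    Answer1=lambda dfa: dfa.apply(lambda r: ans(r, 1), axis=1),
    Answer2=lambda dfa: dfa.apply(lambda r: ans(r, 2), axis=1),
    Answer3=lambda dfa: dfa.apply(lambda r: ans(r, 3), axis=1),
    Answer4=lambda dfa: dfa.apply(lambda r: ans(r, 4), axis=1),
    Answer5=lambda dfa: dfa.apply(lambda r: ans(r, 5), axis=1),
    Answer6=lambda dfa: dfa.apply(lambda r: ans(r, 6), axis=1),
).drop("AnswerT", axis=1)
print(dfgrouped.to_string(index=False))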

Output
QID    Category           Text   QType                     Question: Country           Answer0                      Answer1                      Answer2                                Answer3            Answer4 Answer5 Answer6
 16  Automotive  Access to car  Single  Do you have access to a car?      UK  I own a car/cars  I lease/ have a company car  I have access to a car/cars  No, I don’t have access to a car/cars  Prefer not to say                
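To see what the aggregation step produces before the lists are split back into columns, here is a minimal sketch on a made-up frame. The data and the names small and grouped are illustrative and not part of the answer.

```python
import pandas as pd

# Made-up data: two rows for QID 16, one row for QID 17.
small = pd.DataFrame({
    "QID": ["16", "16", "17"],
    "Country": ["UK", "UK", "UK"],
    "Answer0": ["a", "b", "c"],
})

# "first" keeps one value per group; list gathers every answer of the group.
grouped = small.groupby("QID").agg({"Country": "first", "Answer0": list}).reset_index()
print(grouped)
```

Each row of grouped then holds one Python list per QID, which is what the ans helper indexes into.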

More dynamic
This goes deeper into advanced Python, using **kwargs and functools.partial. In practice it is still static: the number of columns is defined by the constant MAXANS.

import functools
MAXANS = 8
def ansassign(dfa, row=0):
    return dfa.apply(lambda r: "" if row >= len(r["AnswerT"]) else r["AnswerT"][row], axis=1)
dfgrouped = df.groupby("QID").agg(agg).reset_index().rename(columns={"Answer0": "AnswerT"}).assign(
    **{f"Answer{i}": functools.partial(ansassign, row=i) for i in range(MAXANS)}
).drop("AnswerT", axis=1)
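One possible way to avoid the fixed MAXANS, offered as a sketch and not as part of the answer above: build a second frame directly from the lists, so the number of Answer columns follows the longest list. The sample data and the names wide and result are assumptions for illustration.

```python
import pandas as pd

# Made-up data: three answers for QID 16, one for QID 17.
df = pd.DataFrame({
    "QID": ["16", "16", "16", "17"],
    "Country": ["UK", "UK", "UK", "UK"],
    "Answer0": ["a", "b", "c", "d"],
})

grouped = df.groupby("QID").agg({"Country": "first", "Answer0": list}).reset_index()

# One column per list position; shorter lists are padded with missing values,
# which are then replaced by empty strings.
wide = pd.DataFrame(grouped["Answer0"].tolist(), index=grouped.index).fillna("")
wide.columns = [f"Answer{i}" for i in range(wide.shape[1])]

result = pd.concat([grouped.drop(columns="Answer0"), wide], axis=1)
print(result.to_string(index=False))
```

The trade-off is that the column set now depends on the data, so downstream code cannot assume a fixed number of Answer columns.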
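Please post a more basic example and the expected result. The expected result above makes no sense to me.
Thanks a lot! Actually, I may have more than 7 answers. How can I make it dynamic, so that I get as many answer columns as there are rows with the same (QID, Question:)?
@RevolucionforMonica I can't think of a truly dynamic way, because there is no way to find the number of items in the list row by row. I have updated the answer, but be aware that few people are experts in this kind of coding.
Your update is really cool! So advanced! I was thinking of sharing this. Maybe it could help to get the number dynamically.
@RevolucionforMonica In many ways I prefer the first approach: it is more transparent. 80% of the time is spent maintaining code... Code that uses a lot of advanced concepts is very expensive to maintain.
Yes, actually I think you are right. I am trying to make your code dynamic using the DataFrame I shared. I don't know how yet, but I believe I can work something out.
I posted this separately instead of bothering you here with the dynamic problem, in case you want more points ^^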