Python: summing chains of transactions in a DataFrame, with rows linked by column values

python pandas

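I am trying to link multiple rows of a DataFrame so that, by connecting receiver IDs to sender IDs, I get all possible paths.

Here is a sample of my DataFrame: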

   transaction_id sender_id receiver_id  amount
0          213234       002         125      10
1          223322       017         354      90
2          343443       125         689      70
3          324433       689         233       5
4          328909       354         456      10

Created with:

import pandas as pd

df = pd.DataFrame(
    {'transaction_id': {0: '213234', 1: '223322', 2: '343443', 3: '324433', 4: '328909'},
     'sender_id': {0: '002', 1: '017', 2: '125', 3: '689', 4: '354'},
     'receiver_id': {0: '125', 1: '354', 2: '689', 3: '233', 4: '456'},
     'amount': {0: 10, 1: 90, 2: 70, 3: 5, 4: 10}}
)

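The result of my code should be a list of the chained IDs together with the total amount of the transaction chain. For the first two rows in the example above, something like: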

[('002', '125', '689', '233'), 85]
[('017', '354', '456'), 100]

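I have already tried iterating over the rows and converting each row into an instance of a Node class, then walking the linked list with a traverse method, but I don't know what the next step is: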

class Node:
    def __init__(self, transaction_id, sender_id, receiver_id, amount):
        self.transac = transaction_id
        self.val = sender_id
        self.next = receiver_id  # note: this holds an id string, not another Node
        self.amount = amount
    def traverse(self):
        node = self # start from the head node
        while node is not None:
            print(node.val) # access the node value
            node = node.next # move on to the next node

for index, row in customerTransactionSqlDf3.iterrows():
    index = Node( 
        row["transaction_id"],
        row["sender_id"],
        row["receiver_id"],
        row["amount"]
    )

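Additional information:

sender_id values are unique; for each sender id there is only one possible chain of transactions
There are no cycles, and no chain in which a receiver_id points back to a sender_id in the same path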
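
The two constraints above can be checked before any chains are built. The following is a minimal sketch, not part of the original question: `has_cycle` is a hypothetical helper name, and the DataFrame repeats only the columns needed from the sample data.

```python
import pandas as pd

df = pd.DataFrame({
    "sender_id": ["002", "017", "125", "689", "354"],
    "receiver_id": ["125", "354", "689", "233", "456"],
    "amount": [10, 90, 70, 5, 10],
})

def has_cycle(frame):
    """Return True if following sender -> receiver links ever revisits an id.

    Hypothetical helper; assumes sender_id is unique, so each id has at most
    one outgoing link.
    """
    nxt = dict(zip(frame["sender_id"], frame["receiver_id"]))
    for start in nxt:
        seen = {start}
        node = nxt[start]
        while node in nxt:      # stop at an id that never sends anything
            if node in seen:
                return True
            seen.add(node)
            node = nxt[node]
    return False

senders_unique = df["sender_id"].is_unique
acyclic = not has_cycle(df)
print(senders_unique, acyclic)
```

If either check fails, a walk that simply follows receiver ids could pick the wrong row or never terminate, so it seems worth running this first on real data.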


# receiver and amount rows, indexed by sender
edges = df[['sender_id', 'receiver_id', 'amount']].set_index('sender_id')
paths = {}   # sender -> [sender, receiver, receiver, receiver, ...]
totals = {}  # sender -> total amount

for sender, next_, amount in edges.itertuples():
    path = paths[sender] = [sender, next_]
    totals[sender] = amount
    while True:
        if next_ in paths:
            # re-use already found path
            # paths[next_] starts with next_, which is already the last element of path
            path += paths[next_][1:]
            totals[sender] += totals[next_]
            break

        try:
            next_, amount = edges.loc[next_]
        except KeyError:
            break  # path complete

        path.append(next_)
        totals[sender] += amount

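# Variant of the loop above, starting again from empty paths and totals:
# it also records a path and total for every sender id met along the way,
# so no part of a chain is walked twice.
for sender, next_, amount in edges.itertuples():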
    if sender in paths:
        # already handled as part of a longer path
        continue

    paths[sender], totals[sender] = [sender, next_], amount
    senders = [sender]  # all sender ids along the path

    while True:
        if next_ in paths:
            # re-use already found path
            for sender in senders:
                # skip the first id of paths[next_]; each path already ends with next_
                paths[sender] += paths[next_][1:]
                totals[sender] += totals[next_]
            break

        if next_ not in edges.index:
            break  # path complete

        # start a new path from this sender id
        paths[next_], totals[next_] = [next_], 0
        senders.append(next_)

        next_, amount = edges.loc[next_]
        for sender in senders:
            paths[sender].append(next_)
            totals[sender] += amount

df['path'], df['total'] = df.sender_id.map(paths), df.sender_id.map(totals)

   transaction_id sender_id receiver_id  amount                  path  total
0          213234       002         125      10  [002, 125, 689, 233]     85
1          223322       017         354      90       [017, 354, 456]    100
2          343443       125         689      70       [125, 689, 233]     75
3          324433       689         233       5            [689, 233]      5
4          328909       354         456      10            [354, 456]     10

for id, path in paths.items():
    print(id, path, totals[id])

002 ['002', '125', '689', '233'] 85
125 ['125', '689', '233'] 75
689 ['689', '233'] 5
017 ['017', '354', '456'] 100
354 ['354', '456'] 10
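
The output above has one entry per sender id, while the question asks only for complete chains. A minimal sketch of one way to narrow it down, assuming a complete chain is one whose first id never appears as a receiver; the `paths` and `totals` values are copied from the printed output, and `full_chains` is a name introduced here for illustration.

```python
# paths and totals as printed above, keyed by sender id
paths = {
    "002": ["002", "125", "689", "233"],
    "125": ["125", "689", "233"],
    "689": ["689", "233"],
    "017": ["017", "354", "456"],
    "354": ["354", "456"],
}
totals = {"002": 85, "125": 75, "689": 5, "017": 100, "354": 10}

# every id after the first position of some path has received a transaction
receivers = {node for path in paths.values() for node in path[1:]}

# a chain head is a sender that never receives, so its path is a complete chain
full_chains = [
    [tuple(path), totals[sender]]
    for sender, path in paths.items()
    if sender not in receivers
]

for chain in full_chains:
    print(chain)
```

With the sample data this keeps only the chains starting at '002' and '017', which matches the format requested in the question.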

The following keeps the Node class and adds a visited attribute so that traverse returns only the unique chains:
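import pandas as pd

df = pd.DataFrame(
    {'transaction_id': {0: '213234', 1: '223322', 2: '343443', 3: '324433', 4: '328909'},
     'sender_id': {0: '002', 1: '017', 2: '125', 3: '689', 4: '354'},
     'receiver_id': {0: '125', 1: '354', 2: '689', 3: '233', 4: '456'},
     'amount': {0: 10, 1: 90, 2: 70, 3: 5, 4: 10}}
)

class Node:
    def __init__(self, transaction_id, sender_id, receiver_id, amount):
        self.transac = transaction_id
        self.sender = sender_id
        self.receiver = receiver_id
        self.next = None
        self.amount = amount
        self.visited = False
    def traverse(self, chain=None, total=0):
        if self.visited:  # skip nodes that were already visited
            return
        self.visited = True
        if chain is None:  # this is the beginning of the traversal
            chain = [self.sender]
        chain += [self.receiver]
        total += self.amount
        if self.next is not None:
            return self.next.traverse(chain, total)
        return chain, total

transc = [Node(
        row["transaction_id"],
        row["sender_id"],
        row["receiver_id"],
        row["amount"]
    ) for i, row in df.iterrows()]

# connect the nodes
for i, v in enumerate(transc):
    for j, k in enumerate(transc):
        # if the receiver of v is the same as the sender of k
        if v.receiver == k.sender:
            v.next = k

summary = [i.traverse() for i in transc]
summary = [i for i in summary if i is not None]  # remove None
print(summary)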

Output:
[
(['002', '125', '689', '233'], 85), 
(['017', '354', '456'], 100)
]
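
One caveat, as far as I can tell: because traverse marks nodes as visited, the result depends on row order. If a row from the middle of a chain comes before the chain's first row, the later traversal from the head reaches a visited node and returns None. The sketch below is one possible way around that, not the answer's own method: it links nodes with a dictionary lookup and starts only from chain heads. The trimmed-down `Node` dataclass and the reordered `rows` list are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    sender: str
    receiver: str
    amount: int
    next: Optional["Node"] = None

# sample rows, deliberately listing an interior link ('125') before its head ('002')
rows = [("125", "689", 70), ("002", "125", 10), ("689", "233", 5),
        ("017", "354", 90), ("354", "456", 10)]

# sender id -> Node; relies on sender ids being unique
nodes = {sender: Node(sender, receiver, amount) for sender, receiver, amount in rows}

# link each node to the node whose sender is this node's receiver
for node in nodes.values():
    node.next = nodes.get(node.receiver)

# a head is a node whose sender never appears as a receiver
receivers = {node.receiver for node in nodes.values()}

summary = []
for head in nodes.values():
    if head.sender in receivers:
        continue  # interior node, covered by the chain starting at its head
    chain, total, current = [head.sender], 0, head
    while current is not None:  # terminates because there are no cycles
        chain.append(current.receiver)
        total += current.amount
        current = current.next
    summary.append((chain, total))

print(summary)
```

Linking through the dictionary also replaces the nested loop over all pairs of nodes with a single lookup per node.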
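I think it does the job, but the expected result should be: [(['002', '125', '689', '233'], 85), (['017', '354', '456'], 100)]
I don't need the chains for every transaction id, I just need all possible chains.
Use my solution to get the unique chains. It would be very helpful if you marked this as the answer, if it solved your problem.
Great! That solved my problem.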