Pandas: concatenate certain columns if other columns are empty


I have a CSV file that is supposed to look like this:

ID, years_active, issues
-------------------------------
'Truck1', 8, 'In dire need of a paintjob'
'Car 5', 3,  'To small for large groups'

However, the CSV is slightly malformed and currently looks like this:

ID, years_active, issues
------------------------
'Truck1', 8, 'In dire need'
'','', 'of a'
'','', 'paintjob'
'Car 5', 3, 'To small for'
'', '', 'large groups'

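I can identify the bad rows by their missing 'ID' and 'years_active' values, and I would like to append the 'issues' value of each such row to the previous row that does have 'ID' and 'years_active' values.

I am not very experienced with pandas, but I came up with the following code: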

for index, row in df.iterrows():
        if row['years_active'] == None:
            df.loc[index-1]['issues'] += row['issues']

However, the if condition never triggers.

Is what I am trying to do possible? If so, does anyone know what I am doing wrong?

Given your example input:

df = pd.DataFrame({
    'ID': ['Truck1', '', '', 'Car 5', ''],
    'years_active': [8, '', '', 3, ''],
    'issues': ['In dire need', 'of a', 'paintjob', 'To small for', 'large groups']
})

You can use:

new_df = df.groupby(df.ID.replace('', method='ffill')).agg({'years_active': 'first', 'issues': ' '.join})

This gives you:

        years_active                      issues
ID                                              
Car 5              3   To small for large groups
Truck1             8  In dire need of a paintjob

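So what we do here is forward-fill the non-empty IDs into the empty IDs that follow them, and use those IDs to group the related rows. We then aggregate, taking the first occurrence of years_active and joining the issues values in the order they appear, to produce a single row per ID.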
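As far as I know, recent pandas releases deprecate the `method` argument of `replace`, so the one-liner above may warn or fail there. A minimal sketch of the same forward-fill-then-group idea that avoids that argument, assuming blank cells are empty strings as in the example (the names `key` and `new_df` are just illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    'ID': ['Truck1', '', '', 'Car 5', ''],
    'years_active': [8, '', '', 3, ''],
    'issues': ['In dire need', 'of a', 'paintjob', 'To small for', 'large groups'],
})

# Turn empty IDs into missing values, then carry the last real ID forward.
key = df['ID'].mask(df['ID'] == '').ffill()

# Group on the filled IDs; sort=False keeps the groups in order of first appearance.
new_df = df.groupby(key, sort=False).agg(
    {'years_active': 'first', 'issues': ' '.join}
)
print(new_df)
```

Passing `sort=False` is optional; without it the groups come back sorted by ID, as in the output shown above.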

The following uses a for loop to find and append the strings (using the DataFrame from Jon Clements' answer):

Output:

       ID                       issues years_active
0  Truck1   In dire need of a paintjob            8
3   Car 5    To small for large groups            3
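The loop itself is not shown here. One way such a loop might look, as a sketch only: it assumes continuation rows have an empty-string ID and that the first row is always a complete record. The names `last_valid`, `to_drop` and `result` are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': ['Truck1', '', '', 'Car 5', ''],
    'years_active': [8, '', '', 3, ''],
    'issues': ['In dire need', 'of a', 'paintjob', 'To small for', 'large groups'],
})

last_valid = None  # index label of the most recent row that has an ID
to_drop = []       # continuation rows to remove once their text is merged

for index, row in df.iterrows():
    if row['ID'] == '':
        # Continuation row: append its text to the last complete row.
        df.loc[last_valid, 'issues'] += ' ' + row['issues']
        to_drop.append(index)
    else:
        last_valid = index

result = df.drop(to_drop)
print(result)
```

Compared with the loop in the question, the test is against the empty string rather than `== None` (a blank CSV field is read as `''` or NaN, never `None`; for NaN, `pd.isna(...)` would be the check), and the update goes through a single `df.loc[row, column]` call instead of the chained `df.loc[index-1]['issues']`, which may modify a temporary copy.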

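In the context of this question it may be worth mentioning an often-overlooked way of dealing with awkward input: the StringIO library.

The important point is that read_csv can read from a StringIO "file".

In this case I strip out the single quotes and the multiple commas, which would confuse read_csv, and append the second and subsequent input lines to the first, forming complete, regular csv lines for read_csv.

This is what read_csv gives back: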

       ID   years_active                      issues
0  Truck1              8  In dire need of a paintjob
1   Car 5              3   To small for large groups

The code is ugly, but easy to follow.

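import pandas as pd
from io import StringIO

for_pd = StringIO()
with open('jasper.txt') as jasper:
    # Copy the header line, then skip the dashed separator line.
    print(jasper.readline().rstrip(), file=for_pd)
    jasper.readline()
    complete_record = ''
    for line in jasper:
        # Drop trailing whitespace, spaces after commas, and single quotes.
        line = line.rstrip().replace(', ', ',').replace("'", '')
        if line.startswith(','):
            # Continuation line: keep only its text and append it to the current record.
            complete_record += line.replace(',,', ',').replace(',', ' ')
        else:
            # A new record starts here, so write out the previous one first.
            if complete_record:
                print(complete_record, file=for_pd)
            complete_record = line
# Write out the final record.
if complete_record:
    print(complete_record, file=for_pd)

# Rewind so read_csv starts reading from the beginning of the buffer.
for_pd.seek(0)

df = pd.read_csv(for_pd)
print(df)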

Another option is:

df.groupby(df.ID.str.len().gt(0).cumsum()).agg({'issues': ' '.join, 'years_active': 'first'})
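To spell out what that one-liner does: `df.ID.str.len().gt(0)` is True on every row that starts a new record, and `cumsum()` turns those flags into a running record number that the continuation rows share. A small sketch with the example data, assuming blank IDs are empty strings; the names `record` and `merged` are illustrative, and 'ID' is added to the aggregation so it is kept in the result.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': ['Truck1', '', '', 'Car 5', ''],
    'years_active': [8, '', '', 3, ''],
    'issues': ['In dire need', 'of a', 'paintjob', 'To small for', 'large groups'],
})

# True where a row has an ID (start of a record); the cumulative sum
# numbers the records 1, 1, 1, 2, 2 so continuation rows join their parent.
record = df['ID'].str.len().gt(0).cumsum().rename('record')

merged = df.groupby(record).agg(
    {'ID': 'first', 'years_active': 'first', 'issues': ' '.join}
)
print(merged)
```

Unlike grouping on the forward-filled ID, this keeps two records separate even if they happen to share the same ID.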
