Pandas: concatenating certain columns when other columns are empty

I have a CSV file that should look like this:

ID, years_active, issues
-------------------------------
'Truck1', 8, 'In dire need of a paintjob'
'Car 5', 3,  'To small for large groups'
However, the CSV is slightly malformed and currently looks like this:

ID, years_active, issues
------------------------
'Truck1', 8, 'In dire need'
'','', 'of a'
'','', 'paintjob'
'Car 5', 3, 'To small for'
'', '', 'large groups'
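Now, I can identify the bad rows by their missing 'ID' and 'years_active' values, and I would like to append each such row's 'issues' value to the previous row that does have 'ID' and 'years_active' values.

I'm not very experienced with pandas, but I came up with the following code: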

for index, row in df.iterrows():
        if row['years_active'] == None:
            df.loc[index-1]['issues'] += row['issues']
However, the if condition never triggers.
Is what I'm trying to do possible? If so, does anyone know what I'm doing wrong?

Given your example input:

df = pd.DataFrame({
    'ID': ['Truck1', '', '', 'Car 5', ''],
    'years_active': [8, '', '', 3, ''],
    'issues': ['In dire need', 'of a', 'paintjob', 'To small for', 'large groups']
})
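You can use (on older pandas versions, df.ID.replace('', method='ffill') produces the same grouping key, but that argument is deprecated in recent releases):

new_df = df.groupby(df['ID'].mask(df['ID'] == '').ffill()).agg({'years_active': 'first', 'issues': ' '.join})
This will give you: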

        years_active                      issues
ID                                              
Car 5              3   To small for large groups
Truck1             8  In dire need of a paintjob

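So what we are doing here is forward-filling each non-empty ID into the empty IDs that follow it, and using those IDs to group the related rows. We then aggregate, taking the first occurrence of years_active and joining the issues values together in the order they appear, to produce a single result per ID.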
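The two steps described above can be looked at separately. The following is a minimal sketch, assuming the same example frame; the names group_key and merged are illustrative only, and sort=False plus reset_index() are additions that keep the original row order and turn ID back into a regular column.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': ['Truck1', '', '', 'Car 5', ''],
    'years_active': [8, '', '', 3, ''],
    'issues': ['In dire need', 'of a', 'paintjob', 'To small for', 'large groups'],
})

# Step 1: turn empty IDs into missing values, then forward-fill them
# so that every continuation row carries the ID of its parent row.
group_key = df['ID'].mask(df['ID'] == '').ffill()
print(group_key.tolist())
# ['Truck1', 'Truck1', 'Truck1', 'Car 5', 'Car 5']

# Step 2: group on the filled key, keep the first years_active value
# and join the issue fragments in their original order.
merged = (
    df.groupby(group_key, sort=False)
      .agg({'years_active': 'first', 'issues': ' '.join})
      .reset_index()
)
print(merged)
```

Printing group_key first makes it easy to check that every fragment has been assigned to the intended record before anything is aggregated.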

The following uses a for loop to find and append the strings (with the DataFrame from Jon Clements' answer):

Output:

       ID                       issues years_active
0  Truck1   In dire need of a paintjob            8
3   Car 5    To small for large groups            3
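Output of this shape could be produced by a loop along the following lines. This is only a sketch and not the answerer's original code: the variable names last_full and result are assumptions, and it presumes that the first row of the frame is a complete record.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': ['Truck1', '', '', 'Car 5', ''],
    'years_active': [8, '', '', 3, ''],
    'issues': ['In dire need', 'of a', 'paintjob', 'To small for', 'large groups'],
})

last_full = None  # index label of the most recent complete row
for idx in df.index:
    if df.at[idx, 'ID'] == '':
        # Continuation row: append its text to the last complete row.
        df.at[last_full, 'issues'] += ' ' + df.at[idx, 'issues']
    else:
        # Complete row: remember it as the target for later fragments.
        last_full = idx

# Keep only the complete rows; their original index labels are preserved.
result = df[df['ID'] != '']
print(result)
```

Writing through df.at with an explicit row label and column name updates the frame itself, whereas a chained expression such as df.loc[i]['issues'] += ... may modify a temporary copy instead.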

In the context of this question it may be worth mentioning an often-overlooked way of handling awkward input: the StringIO class from the io module.

The important point is that read_csv can read from a StringIO "file".
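To illustrate that point in isolation, here is a minimal sketch in which the CSV text lives in a string rather than on disk. The text in csv_text is a hypothetical, already-cleaned version of the question's data.

```python
import pandas as pd
from io import StringIO

# Hypothetical, already-cleaned CSV text held in memory instead of in a file.
csv_text = (
    "ID,years_active,issues\n"
    "Truck1,8,In dire need of a paintjob\n"
    "Car 5,3,To small for large groups\n"
)

# StringIO wraps the string in a file-like object that read_csv accepts
# wherever it would accept an open file.
buffer = StringIO(csv_text)
df_from_buffer = pd.read_csv(buffer)
print(df_from_buffer)
```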

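In this case I would discard the single quotes and the multiple commas, which confuse read_csv, and append the second and subsequent input lines of each record to its first line, forming complete, regular CSV lines for read_csv.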

Here is what read_csv produces from the cleaned-up input:

       ID   years_active                      issues
0  Truck1              8  In dire need of a paintjob
1   Car 5              3   To small for large groups
The code is ugly but easy to follow.

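import pandas as pd
from io import StringIO

for_pd = StringIO()  # in-memory "file" that will hold the repaired CSV
with open('jasper.txt') as jasper:
    # Copy the header line through unchanged.
    print(jasper.readline(), end='', file=for_pd)
    # Skip the dashed separator line under the header.
    jasper.readline()
    complete_record = ''
    for line in jasper:
        # Drop the single quotes and the spaces that follow commas.
        line = line.rstrip().replace(', ', ',').replace("'", '')
        if line.startswith(','):
            # Continuation row: ',,text' becomes ' text' and is appended.
            complete_record += line.replace(',,', ',').replace(',', ' ')
        else:
            # A new record starts, so write out the previous one first.
            if complete_record:
                print(complete_record, file=for_pd)
            complete_record = line
# Write out the final record.
if complete_record:
    print(complete_record, file=for_pd)

for_pd.seek(0)  # rewind so that read_csv starts at the beginning

df = pd.read_csv(for_pd)
print(df)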

Another option is:
df.groupby(df.ID.str.len().gt(0).cumsum()).agg({'issues': ' '.join, 'years_active': 'first'})
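The idea behind this variant is that a running count of the non-empty IDs gives every record its own group number. Below is a minimal sketch, assuming the example frame from the earlier answer; the names starts_record, group_id and merged are illustrative, and 'ID': 'first' is added to the aggregation so that the ID survives as a column.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': ['Truck1', '', '', 'Car 5', ''],
    'years_active': [8, '', '', 3, ''],
    'issues': ['In dire need', 'of a', 'paintjob', 'To small for', 'large groups'],
})

# True on every row that starts a new record (non-empty ID).
starts_record = df['ID'].str.len().gt(0)

# The cumulative sum increases by one at each record start, so all
# fragments of one record share the same group number.
group_id = starts_record.cumsum().rename('record')
print(group_id.tolist())
# [1, 1, 1, 2, 2]

merged = (
    df.groupby(group_id)
      .agg({'ID': 'first', 'years_active': 'first', 'issues': ' '.join})
      .reset_index(drop=True)
)
print(merged)
```

Because the groups are numbered rather than keyed by ID, this approach also keeps two records apart when they happen to share the same ID.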