Python 2.7: Combining data from two DataFrame columns into one column

python-2.7 pandas

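I have time series data in two separate DataFrame columns. They refer to the same parameter but have different lengths.

For dates where data exists in only one column, I would like that value placed in my new column. For dates where both columns have an entry, I would like the mean. (I want to join on the index, which is a datetime value.)

Can anyone suggest a way to combine my two columns? Thanks.

Edit2: I wrote some code that should merge the data from the two columns, but I get a KeyError when I try to set the new values using the index generated from the rows where the first df has a value and the second df does not. The code is as follows: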

def merge_func(df):
    null_index = df[(df['DOC_mg/L'].isnull() == False) & (df['TOC_mg/L'].isnull() == True)].index
    df['TOC_mg/L'][null_index] = df[null_index]['DOC_mg/L']
    notnull_index = df[(df['DOC_mg/L'].isnull() == True) & (df['TOC_mg/L'].isnull() == False)].index
    df['DOC_mg/L'][notnull_index] = df[notnull_index]['TOC_mg/L']

    df.insert(len(df.columns), 'Mean_mg/L', 0.0)
    df['Mean_mg/L'] = (df['DOC_mg/L'] + df['TOC_mg/L']) / 2
    return df

merge_func(sve)

Here is the error:

KeyError: "['2004-01-14T01:00:00.000000000+0100' '2004-03-04T01:00:00.000000000+0100'\n '2004-03-30T02:00:00.000000000+0200' '2004-04-12T02:00:00.000000000+0200'\n '2004-04-15T02:00:00.000000000+0200' '2004-04-17T02:00:00.000000000+0200'\n '2004-04-19T02:00:00.000000000+0200' '2004-04-20T02:00:00.000000000+0200'\n '2004-04-22T02:00:00.000000000+0200' '2004-04-26T02:00:00.000000000+0200'\n '2004-04-28T02:00:00.000000000+0200' '2004-04-30T02:00:00.000000000+0200'\n '2004-05-05T02:00:00.000000000+0200' '2004-05-07T02:00:00.000000000+0200'\n '2004-05-10T02:00:00.000000000+0200' '2004-05-13T02:00:00.000000000+0200'\n '2004-05-17T02:00:00.000000000+0200' '2004-05-20T02:00:00.000000000+0200'\n '2004-05-24T02:00:00.000000000+0200' '2004-05-28T02:00:00.000000000+0200'\n '2004-06-04T02:00:00.000000000+0200' '2004-06-10T02:00:00.000000000+0200'\n '2004-08-27T02:00:00.000000000+0200' '2004-10-06T02:00:00.000000000+0200'\n '2004-11-02T01:00:00.000000000+0100' '2004-12-08T01:00:00.000000000+0100'\n '2011-02-21T01:00:00.000000000+0100' '2011-03-21T01:00:00.000000000+0100'\n '2011-04-04T02:00:00.000000000+0200' '2011-04-11T02:00:00.000000000+0200'\n '2011-04-14T02:00:00.000000000+0200' '2011-04-18T02:00:00.000000000+0200'\n '2011-04-21T02:00:00.000000000+0200' '2011-04-25T02:00:00.000000000+0200'\n '2011-05-02T02:00:00.000000000+0200' '2011-05-09T02:00:00.000000000+0200'\n '2011-05-23T02:00:00.000000000+0200' '2011-06-07T02:00:00.000000000+0200'\n '2011-06-21T02:00:00.000000000+0200' '2011-07-04T02:00:00.000000000+0200'\n '2011-07-18T02:00:00.000000000+0200' '2011-08-31T02:00:00.000000000+0200'\n '2011-09-13T02:00:00.000000000+0200' '2011-09-28T02:00:00.000000000+0200'\n '2011-10-10T02:00:00.000000000+0200' '2011-10-25T02:00:00.000000000+0200'\n '2011-11-08T01:00:00.000000000+0100' '2011-11-28T01:00:00.000000000+0100'\n '2011-12-20T01:00:00.000000000+0100' '2012-01-19T01:00:00.000000000+0100'\n '2012-02-14T01:00:00.000000000+0100' '2012-03-13T01:00:00.000000000+0100'\n 
'2012-03-27T02:00:00.000000000+0200' '2012-04-02T02:00:00.000000000+0200'\n '2012-04-10T02:00:00.000000000+0200' '2012-04-17T02:00:00.000000000+0200'\n '2012-04-26T02:00:00.000000000+0200' '2012-04-30T02:00:00.000000000+0200'\n '2012-05-03T02:00:00.000000000+0200' '2012-05-07T02:00:00.000000000+0200'\n '2012-05-10T02:00:00.000000000+0200' '2012-05-14T02:00:00.000000000+0200'\n '2012-05-22T02:00:00.000000000+0200' '2012-06-05T02:00:00.000000000+0200'\n '2012-06-19T02:00:00.000000000+0200' '2012-07-03T02:00:00.000000000+0200'\n '2012-07-17T02:00:00.000000000+0200' '2012-07-31T02:00:00.000000000+0200'\n '2012-08-14T02:00:00.000000000+0200' '2012-08-28T02:00:00.000000000+0200'\n '2012-09-11T02:00:00.000000000+0200' '2012-09-25T02:00:00.000000000+0200'\n '2012-10-10T02:00:00.000000000+0200' '2012-10-24T02:00:00.000000000+0200'\n '2012-11-21T01:00:00.000000000+0100' '2012-12-18T01:00:00.000000000+0100'] not in index"

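You're close, but you don't actually need to iterate over the rows when you use isnull(). On its own,

df[(df['DOC_mg/L'].isnull() == False) & (df['TOC_mg/L'].isnull() == True)].index

returns only the index labels of the rows where DOC_mg/L is not null and TOC_mg/L is null.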

Now you can do this to set the TOC_mg/L values:

null_index = df[(df['DOC_mg/L'].isnull() == False) & \
                (df['TOC_mg/L'].isnull() == True)].index
df['TOC_mg/L'][null_index] = df['DOC_mg/L'][null_index] # EDIT To switch the index position.

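This takes the index of the rows where TOC_mg/L is null and DOC_mg/L is not null, and sets TOC_mg/L in each of those rows to the DOC_mg/L value from the same row.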
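For comparison, the same fill can be written without building an index at all. This is only a sketch of an alternative technique (Series.fillna), not the approach used in this answer, and the small frame below is made-up sample data rather than the asker's:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, invented for illustration.
df = pd.DataFrame({'DOC_mg/L': [2.0, np.nan, 4.0],
                   'TOC_mg/L': [4.0, 3.0, np.nan]})

# fillna aligns the two Series on the index and only touches the rows
# where TOC_mg/L is NaN, taking the DOC_mg/L value from the same row.
df['TOC_mg/L'] = df['TOC_mg/L'].fillna(df['DOC_mg/L'])

toc_after_fill = df['TOC_mg/L'].tolist()
```

Rows where TOC_mg/L already had a value are left untouched, so only the last row changes here.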

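Note: this is not the accepted way of setting values with an index, but it is how I have been doing it for a while. Just make sure that when you set values, the left-hand side of the assignment is

df['col_name'][index]

If you swap col_name and index, the values are set on a copy, and that copy is never written back to the original.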
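A way to sidestep the copy problem entirely is to select the rows and the column in a single .loc call. The following is a minimal sketch on made-up sample data; the variable names mask and toc_filled are only for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with a datetime index, invented for illustration.
df = pd.DataFrame(
    {'DOC_mg/L': [2.0, np.nan, 4.0],
     'TOC_mg/L': [4.0, 3.0, np.nan]},
    index=pd.to_datetime(['2004-01-14', '2004-03-04', '2004-03-30']))

# Boolean mask: rows where DOC has a value but TOC does not.
mask = df['DOC_mg/L'].notnull() & df['TOC_mg/L'].isnull()

# One .loc call picks rows and column together, so the assignment writes
# into df itself rather than into a temporary copy.
df.loc[mask, 'TOC_mg/L'] = df.loc[mask, 'DOC_mg/L']

toc_filled = df['TOC_mg/L'].tolist()
```

Because there is only one indexing step on the left-hand side, there is no intermediate object for pandas to copy.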

Now, to set the mean, you can create a new column, which we will call Mean_mg/L, and set its values to 0.0. Then set this new column to the mean of the two columns:

# Insert a new col at the end of the dataframe columns name 'Mean_mg/L' 
#     with default value 0.0
df.insert(len(df.columns), 'Mean_mg/L', 0.0)
# Set this columns value to the average of DOC_mg/L and TOC_mg/L
df['Mean_mg/L'] = (df['DOC_mg/L'] + df['TOC_mg/L']) / 2

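In the rows where we filled a null value with the corresponding value from the other column, the mean will be equal to that value.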
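If the filled-in columns are not needed for anything else, the mean can probably be computed directly, since DataFrame.mean skips NaN by default. This is a sketch of that alternative on made-up sample data, not part of the original answer:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the dates and values are invented for illustration.
idx = pd.to_datetime(['2004-01-14', '2004-03-04', '2004-03-30', '2004-04-12'])
df = pd.DataFrame({'DOC_mg/L': [2.0, np.nan, 4.0, np.nan],
                   'TOC_mg/L': [4.0, 3.0, np.nan, np.nan]}, index=idx)

# mean(axis=1) averages across the two columns row by row. NaNs are skipped
# by default (skipna=True), so a lone value comes back unchanged and a row
# with no values at all stays NaN.
df['Mean_mg/L'] = df[['DOC_mg/L', 'TOC_mg/L']].mean(axis=1)

mean_values = df['Mean_mg/L'].tolist()
```

With this data, the first row averages 2.0 and 4.0, the second and third rows keep their single value, and the last row remains NaN.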

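Take a look at this merge answer [in case you need to add suffixes to similarly named columns]: "", and then read here: "" for how to select certain indexes. What you want to do after the merge is find the rows with missing values using a variation of df[(df['colA'].isnull() == True) & (df['colB'].isnull() == False)] and set the missing values. Then take the average of colA and colB.

Thanks for your comment @RyanG. My columns are already in the same DataFrame and share an index (which is a datetime). I wrote a function to find the null rows and to take the mean if both rows contain values. However, I get a syntax error when I try to assign the new values. I have added the new code to the original question.

I would return the dataframe after the loop is done. Your implementation is close, but there is another way of solving this in the answer below.

Thanks for your solution. Before computing the mean, I added an extra line of code to copy the values from the second df back into the first (I wasn't sure whether Pandas skips NaNs when computing the mean). I get a KeyError saying that null_index is not in the index. I don't understand why this error occurs, because the df used to generate the index is the same one I'm applying it to. Am I misunderstanding the error?

Hmm. Depending on how the index is set up, it seems you can't use df[null_index]['DOC_mg/L'], and should use df['DOC_mg/L'][null_index] instead. That's interesting, because it has never happened to me before, but that must have been luck. I have updated the post to include the fix.

Thanks, that works. I also found another solution using .loc:

df.loc[null_index, 'TOC_mg/L'] = df['DOC_mg/L']

I'm glad you found it. Using .loc[] is the accepted way of performing these kinds of operations. However, since I have done it the old way for so long, I never fully adopted .loc[]. Now that it's clear a KeyError can occur, I need to learn to use that method too :D. Thank you.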
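A likely explanation for the KeyError discussed above: indexing a DataFrame with a list-like of labels, as in df[null_index], looks those labels up among the columns, not the rows, so the dates are reported as missing. The sketch below reproduces this on made-up sample data; the names raised_key_error and doc_values are only for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, invented for illustration.
df = pd.DataFrame(
    {'DOC_mg/L': [2.0, np.nan, 4.0],
     'TOC_mg/L': [np.nan, 3.0, 1.0]},
    index=pd.to_datetime(['2004-01-14', '2004-03-04', '2004-03-30']))

# Row labels where DOC has a value and TOC does not.
null_index = df[df['DOC_mg/L'].notnull() & df['TOC_mg/L'].isnull()].index

# df[null_index] treats the dates as column labels, and no such columns exist.
try:
    df[null_index]['DOC_mg/L']
    raised_key_error = False
except KeyError:
    raised_key_error = True

# .loc treats the same labels as row labels, which is what was intended.
doc_values = df.loc[null_index, 'DOC_mg/L'].tolist()
```

Selecting the column first, as in df['DOC_mg/L'][null_index], works for the same reason: by then the object being indexed is a Series, whose labels are the dates.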