How do I join two files in Python and keep the non-matching lines?

python unix join

How can I merge two files and keep the lines that have no match?

My first file looks like this:

apples          1.4       
grapes          1.3
pears           2.1
oranges         1.1
grapefruit      1.0

My second file looks like this:

apples         Alex
grapes         Margery
grapefruit     Francis

My output should be:

apples          1.4     Alex  
grapes          1.3     Margery
pears           2.1
oranges         1.1
grapefruit      1.0     Francis

Any help with this would be much appreciated, thanks.

Given that file 2 contains no entries that are missing from file 1, here is an (untested) solution:

import re

names = {}

# Read the second file into a dict mapping fruit -> name.
with open("second.txt") as second:
    for line in second:
        m = re.match(r"(\S+)\s+(\S+)", line.strip())
        if m:
            names[m.group(1)] = m.group(2)

# Copy the first file, appending the name wherever the fruit has one.
with open("first.txt") as first, open("output.txt", "w") as out:
    for line in first:
        writeline = line.rstrip()  # drop the newline before appending
        m = re.match(r"(\S+)", writeline)
        if m:
            name = names.get(m.group(1))
            if name:
                writeline += "     " + name
        out.write(writeline + "\n")

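What I am doing is first parsing the second file and reading every fruit and its corresponding name into a dictionary. Then I go through the first file and check whether the fruit on each line has an entry in the dictionary; if it does, the name is appended to that line in the output.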
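The dictionary lookup described above can also be sketched without regular expressions. This is a minimal sketch, assuming whitespace-separated columns with the fruit in the first column; the function name `left_join_lines` and the in-memory sample lists are hypothetical and only serve as an illustration.

```python
def left_join_lines(first_lines, second_lines, sep="\t"):
    """Append the second input's value to each matching line of the first.

    Lines of the first input that have no match are kept as they are.
    """
    # Build the lookup table: first column -> second column.
    names = {}
    for line in second_lines:
        parts = line.split()
        if len(parts) >= 2:
            names[parts[0]] = parts[1]

    joined = []
    for line in first_lines:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        key = parts[0]
        if key in names:
            parts.append(names[key])
        joined.append(sep.join(parts))
    return joined


# Sample data copied from the question (hypothetical in-memory stand-ins for the files).
first = ["apples 1.4", "grapes 1.3", "pears 2.1", "oranges 1.1", "grapefruit 1.0"]
second = ["apples Alex", "grapes Margery", "grapefruit Francis"]

for row in left_join_lines(first, second):
    print(row)
```

With real files, the open file objects can be passed in place of the lists, since iterating over a file yields its lines.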

You can do this with DataFrames in pandas. Convert your inputs into DataFrames, say a and b.

 import pandas as pd

DataFrame a

           x    y
 0      apples  1.4
 1      grapes  1.3
 2       pears  2.1
 3     oranges  1.1
 4  grapefruit  1.0

DataFrame b

        k        l
 0  apples     Alex
 1  grapes  Margery
 2   pears  Francis

Now, if the column names differ, rename the column that holds the fruit names so that it matches in both DataFrames

 b.columns=['x','l']

Then merge on that column name

new=pd.merge(a, b, on='x', how='outer')

Your new DataFrame looks like this

           x    y        l
 0      apples  1.4     Alex
 1      grapes  1.3  Margery
 2       pears  2.1  Francis
 3     oranges  1.1      NaN
 4  grapefruit  1.0      NaN
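
The answer above does not show how the inputs become DataFrames. Here is a minimal sketch, assuming whitespace-separated files without a header row. The column names `fruit`, `value` and `name` are made up for illustration, and `io.StringIO` stands in for the real files, using the data from the question. It uses `how="left"` rather than `how="outer"`, which keeps every row of the first file in its original order; since the second file contains no fruit that is missing from the first, the set of rows is the same.

```python
import io

import pandas as pd

# Stand-ins for the two input files (assumption: whitespace-separated, no header).
first_txt = """apples 1.4
grapes 1.3
pears 2.1
oranges 1.1
grapefruit 1.0
"""
second_txt = """apples Alex
grapes Margery
grapefruit Francis
"""

a = pd.read_csv(io.StringIO(first_txt), sep=r"\s+", header=None,
                names=["fruit", "value"])
b = pd.read_csv(io.StringIO(second_txt), sep=r"\s+", header=None,
                names=["fruit", "name"])

# Left join: every row of `a` is kept; fruits without a name get NaN.
merged = a.merge(b, on="fruit", how="left")
print(merged)
```

To write the result back out, something like `merged.to_csv("output.txt", sep="\t", header=False, index=False)` should work; missing names are then written as empty fields.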

Using awk, you can do the following:

$ awk 'FNR==NR{seen[$1]=$2; next}         # read first file and construct array
       $1 in seen{seen[$1]=seen[$1] OFS $2} # add entry from second file
       END{ for (e in seen) print e, seen[e]}' file1 file2
apples 1.4 Alex
grapefruit 1.0 Francis
oranges 1.1
pears 2.1
grapes 1.3 Margery

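The order will differ from the original file, but preserving it was not stated as a requirement.

If you want the same order as the original file, and output closer to your example, you can do the following: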

$ awk 'BEGIN{OFS="\t"}
       FNR==NR{ord[FNR]=$1
               seen[$1]=$2
               next}
       $1 in seen {seen[$1]=seen[$1] OFS $2}
       END{ for (i=1;i in ord;i++)
               printf "%-10s\t%s\n", ord[i], seen[ord[i]]}' f1 f2
apples      1.4 Alex
grapes      1.3 Margery
pears       2.1
oranges     1.1
grapefruit  1.0 Francis

Please read and follow the posting guidelines in the help documentation; they apply here. StackOverflow is not a coding or tutorial service. What have you tried yourself? Where is your code? Which part are you stuck on?

I have tried various approaches, such as paste and join (i.e. join -o 1.1 1.2 2.1 2.2)