Separating first, middle, and last names (Python)

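I have a list of a few hundred members that I want to split into first, middle, and last names, but some members have a prefix (denoted by "P"). All possible combinations: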

First Middle Last
P First Middle Last
First P Middle Last
P First p Middle Last
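In Python, how can I separate the first name (with its prefix, if any), the middle name (with its prefix, if any), and the last name? This is what I came up with, but it doesn't quite work: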

import csv
inPath = "input.txt"
outPath = "output.txt"

newlist = []

file = open(inPath, 'rU')
if file:
    for line in file:
        member = line.split()
        newlist.append(member)
    file.close()
else:
    print "Error Opening File."

file = open(outPath, 'wb')
if file:
    for i in range(len(newlist)):
        print i, newlist[i][0] # Should get the First Name with Prefix
        print i, newlist[i][1] # Should get the Middle Name with Prefix
        print i, newlist[i][-1]
    file.close()
else:
    print "Error Opening File."
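What I want:

  • Get the first and middle names together with their prefixes (if present)
  • Write each part (first, middle, last) to a separate txt file, or to a single CSV file (preferred)

Any help is much appreciated.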

    names = [('A', 'John', 'Paul', 'Smith'),
    ('Matthew', 'M', 'Phil', 'Bond'),
    ('A', 'Morris', 'O', 'Reil', 'M', 'Big')]
    
    def getItem():
        for name in names:
            for (pos,item) in enumerate(name):
                yield item
    
    itembase = getItem()
    
    for i in enumerate(names):
        element = itembase.next()
        if len(element) == 1: firstName = element+" "+itembase.next()
        else: firstName = element
        element = itembase.next()
        if len(element) == 1: mName = element+" "+itembase.next()
        else: mName = element
        element = itembase.next()
        if len(element) == 1: lastName = element+" "+itembase.next()
        else: lastName = element
    
        print "First Name: "+firstName
        print "Middle Name: "+mName
        print "Last Name: "+lastName
        print "--"
    
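    This seems to work. Replace the len(element) == 1 condition with one that checks for the three prefixes (I didn't know you only needed to check for three, so I wrote it to match any single letter).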

    **Output**
    First Name: A John
    Middle Name: Paul
    Last Name: Smith
    
    First Name: Matthew
    Middle Name: M Phil
    Last Name: Bond
    
    First Name: A Morris
    Middle Name: O Reil
    Last Name: M Big
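The suggested replacement of the `len(element) == 1` test can be sketched as follows. This is a minimal Python 3 sketch, not the answerer's code: the function name `split_name` is hypothetical, and the prefix set `PREFIXES` is an assumption taken from the asker's comment that only "M", "Shk", and "BS" occur.

```python
# Assumed prefix set (from the asker's comment); adjust as needed.
PREFIXES = {"M", "Shk", "BS"}

def split_name(words, prefixes=PREFIXES):
    """Group each prefix with the word that follows it."""
    parts = []
    it = iter(words)
    for word in it:
        if word in prefixes:
            # Glue the prefix to the next word; a trailing prefix stays alone.
            nxt = next(it, None)
            parts.append(word if nxt is None else word + " " + nxt)
        else:
            parts.append(word)
    return parts

print(split_name("BS First M Middle Last".split()))
# ['BS First', 'M Middle', 'Last']
```

Checking membership in a set rather than checking the length means a genuine one-letter name is only treated as a prefix if it is listed in `PREFIXES`.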
    

    How about this complete, tested script:

    import sys
    
    def process(file):
        for line in file:
            arr = line.split()
            if not arr:
                continue
            last = arr.pop()
            n = len(arr)
            if n == 4:
                first, middle = ' '.join(arr[:2]), ' '.join(arr[2:])
            elif n == 3:
                if arr[0] in ('M', 'Shk', 'BS'):
                    first, middle = ' '.join(arr[:2]), arr[-1]
                else:
                    first, middle = arr[0], ' '.join(arr[1:])
            elif n == 2:
                first, middle = arr
            else:
                continue
            print 'First: %r' % first
            print 'Middle: %r' % middle
            print 'Last: %r' % last
    
    if __name__ == '__main__':
        process(sys.stdin)
    
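    If you run this on Linux, type the sample lines and then press Ctrl+D to signal end of input. On Windows, use Ctrl+Z instead of Ctrl+D. Of course, you can also pipe a file in.

    The following input file: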

    First Middle Last
    M First Middle Last
    First Shk Middle Last
    BS First M Middle Last
    
    gives this output:

    First: 'First'
    Middle: 'Middle'
    Last: 'Last'
    First: 'M First'
    Middle: 'Middle'
    Last: 'Last'
    First: 'First'
    Middle: 'Shk Middle'
    Last: 'Last'
    First: 'BS First'
    Middle: 'M Middle'
    Last: 'Last'
    
    Here is another solution (obtained by modifying the source code given in the question):


    Now, in an object-oriented way:

    class Name(object):
        def __init__(self, fullname):
            self.full = fullname
            s = self.full.split()
    
            try:
                self.first = " ".join(s[:2]) if len(s[0]) == 1 else s[0]
                s = s[len(self.first.split()):]
    
                self.middle = " ".join(s[:2]) if len(s[0]) == 1 else s[0]
                s = s[len(self.middle.split()):]
    
                self.last = " ".join(s[:2]) if len(s[0]) == 1 else s[0]
            finally:
                pass
    
    names = [
        "First Middle Last",
        "P First Middle Last",
        "First P Middle Last",
        "P First p Middle Last",
    ]
    
    for fullname in names:
        name = Name(fullname)
        print (name.first, name.middle, name.last)
    

    If "M", "Shk", and "BS" are not valid first or last names, i.e. you don't care about their exact position, you can filter them out with a one-liner:

    first, middle, last = filter(lambda x: x not in ('M','Shk','BS'), yourNameHere.split())
    
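    where, of course, yourNameHere is the string containing the name to be parsed.

    Warning: this code assumes that there is always a middle name, as in the examples you gave above. If there isn't, you have to take the whole list and count its elements to determine whether a middle name is present.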
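The counting step mentioned in the warning could look roughly like this. It is a Python 3 sketch under stated assumptions: the function name `parse_without_prefixes` is hypothetical, the prefix tuple is the one used in the one-liner above, and a missing middle name is represented by an empty string.

```python
PREFIXES = ("M", "Shk", "BS")  # same prefixes as in the one-liner above

def parse_without_prefixes(full_name):
    """Drop prefixes, then decide by word count whether a middle name exists."""
    words = [w for w in full_name.split() if w not in PREFIXES]
    if len(words) == 3:
        first, middle, last = words
    elif len(words) == 2:
        # No middle name: represent it as an empty string (an assumption).
        first, last = words
        middle = ""
    else:
        raise ValueError("unexpected number of name parts: %r" % (words,))
    return first, middle, last

print(parse_without_prefixes("M First Last"))
# ('First', '', 'Last')
```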

    Edit: if you do care about the prefix position:

    first, middle, last = map(
        lambda x: x[1],
        filter(
            lambda (i,x): i not in (0, 2) or x not in ('M','Shk','BS'),
            enumerate(yourNameHere.split())))
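The snippet above relies on a tuple-parameter lambda (`lambda (i,x): ...`), which is Python 2 only; that syntax was removed in Python 3. A rough Python 3 equivalent, offered as a sketch with a hypothetical function name `strip_prefixes_at`, is a list comprehension over `enumerate`:

```python
PREFIXES = ("M", "Shk", "BS")

def strip_prefixes_at(full_name, positions=(0, 2)):
    """Drop a word only if it is a prefix AND sits at one of the given indexes."""
    return [
        word
        for index, word in enumerate(full_name.split())
        if index not in positions or word not in PREFIXES
    ]

print(strip_prefixes_at("BS First M Middle Last"))
# ['First', 'Middle', 'Last']
```

Like the original, this only removes prefixes found at indexes 0 and 2. For "First Shk Middle Last" the prefix sits at index 1, so four words come back and unpacking them into three names would fail.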
    

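    I would use a regular expression, which is designed for exactly this kind of task. This solution is easy to maintain and understand.

    It's worth a try.

    Because the pattern is precompiled, it will also run much faster.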
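The regular expression from this answer is not preserved on this page, so the following is only a sketch of what a precompiled-pattern approach might look like in Python 3, not the answerer's pattern. The names `PREFIX`, `NAME_RE`, and `parse_name` are hypothetical, and the prefix alternatives are assumed to be "M", "Shk", and "BS".

```python
import re

# Optional prefix followed by whitespace; the alternatives are an assumption.
PREFIX = r"(?:(?:M|Shk|BS)\s+)?"

# Compiled once at import time and reused for every line.
NAME_RE = re.compile(
    r"^\s*(?P<first>" + PREFIX + r"\S+)\s+"
    r"(?P<middle>" + PREFIX + r"\S+)\s+"
    r"(?P<last>\S+)\s*$"
)

def parse_name(line):
    """Return (first, middle, last), or None if the line does not match."""
    match = NAME_RE.match(line)
    if match is None:
        return None
    # Collapse any repeated whitespace inside a "prefix name" group.
    return tuple(" ".join(match.group(key).split())
                 for key in ("first", "middle", "last"))

print(parse_name("BS First M Middle Last"))
# ('BS First', 'M Middle', 'Last')
```

This pattern assumes a middle name is always present; a line such as "M First Last" would be read as first name "M", middle name "First".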

    Full script:

    import csv
    
    class CsvWriter(object):
        """
        Wraps csv.writer in a partial file-API compatibility layer
        """
        def __init__(self, fname, mode='w', *args, **kwargs):
            super(CsvWriter, self).__init__()
            self.f = open(fname, mode)
            self.writer = csv.writer(self.f, *args, **kwargs)
    
        def write(self, *args):
            """
            Writes a row of data to the csv file
    
            Can be called as
              .write()         puts a blank row
              .write(2)        puts a single cell
              .write([1,2,3])  puts 3 cells
              .write(1,2,3)    puts 3 cells
            """
            if len(args)==1 and hasattr(args[0], ('__iter__')):
                # single argument, and it's a sequence - let it be the row data
                rowdata = args[0]
            else:
                rowdata = args
    
            self.writer.writerow(rowdata)
    
        def close(self):
            self.writer = None
            self.f.close()
    
        def __enter__(self):
            return self
    
        def __exit__(self, *exc):
            self.close()
    
    class NameSplitter(object):
        def __init__(self, pre=None):
            super(NameSplitter, self).__init__()
    
            # list of accepted prefixes
            if pre is None:
                self.pre = set(['m','shk','bs'])
            else:
                self.pre = set([s.lower() for s in pre])
    
            # is-a-prefix word tester
            self.isPre = lambda x,p=self.pre: x.lower() in p
    
            # join the given words with single spaces; str.join takes ONE iterable
            jn = lambda *args: ' '.join(args)
    
            # signature-based dispatch table
            self.match = {}
            self.match[(3,())]    = lambda w,j=jn: (w[0],         w[1],         w[2])
            self.match[(4,(0,))]  = lambda w,j=jn: (j(w[0],w[1]), w[2],         w[3])
            self.match[(4,(1,))]  = lambda w,j=jn: (w[0],         j(w[1],w[2]), w[3])
            self.match[(5,(0,2))] = lambda w,j=jn: (j(w[0],w[1]), j(w[2],w[3]), w[4])
    
        def __call__(self, nameStr):
            words = nameStr.split()
    
            # build hashable signature
            pres  = tuple(n for n,word in enumerate(words) if self.isPre(word))
            sig   = (len(words), pres)
    
            try:
                do = self.match[sig]
                return do(words)
            except KeyError:
                return None
    
    def process(inf, outf, fn):
        for line in inf:
            res = fn(line)
            if res is not None:
                outf.write(res)
    
    def main():
        infname = "input.txt"
        outfname = "output.csv"
    
        with open(infname,'rU') as inf:
            with CsvWriter(outfname) as outf:
                process(inf, outf, NameSplitter())
    
    if __name__=="__main__":
        main()
    
    import sys
    
    def f(a,b):
        if b in ('M','Shk','BS'):
            return '%s %s' % (b,a)
        else:
            return '%s,%s' % (b,a)
    
    for line in sys.stdin:
        sys.stdout.write(reduce(f, reversed(line.split(' '))))
    
    Input:

    First Middle Last
    M First Middle Last
    First Shk Middle Last
    BS First M Middle Last
    
    CSV output:

    First,Middle,Last
    M First,Middle,Last
    First,Shk Middle,Last
    BS First,M Middle,Last
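The script above uses the built-in `reduce`, which in Python 3 lives in `functools`. A Python 3 sketch of the same right-to-left fold is shown below; the names `fold` and `to_csv_row` are hypothetical, and using `split()` without an argument (so the trailing newline is dropped) is a small deviation from the original.

```python
from functools import reduce

PREFIXES = ("M", "Shk", "BS")

def fold(acc, word):
    """acc is the row built so far, i.e. everything to the right of word."""
    if word in PREFIXES:
        # A prefix belongs to the name on its right: join with a space.
        return "%s %s" % (word, acc)
    # Any other word starts a new CSV field: join with a comma.
    return "%s,%s" % (word, acc)

def to_csv_row(line):
    # Walk the words from right to left, starting with the last name.
    return reduce(fold, reversed(line.split()))

print(to_csv_row("BS First M Middle Last"))
# BS First,M Middle,Last
```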
    

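    It is unclear from the example what a "prefix" is; for example, how do you tell whether "A B C D" is ("A B", "C", "D") or ("A", "B C", "D")? Please give a more complete example and explain more specifically what a "prefix" is.

    If a prefix is only one letter long and no name is one letter long, you could test with len() and filter the prefixes out, grouping each with its respective name. Just a thought.

    There are only three prefixes: "M", "Shk", and "BS". It doesn't seem to work for: First Middle Last | M First Middle Last | First Shk Middle Last | Shk First M Middle Last

    I already said that you have to replace the len(element) == 1 condition with the one you need. I can't do all the work for you; this is just an example. Others have provided better ones, and we are all here to learn.

    Why is a class needed here? By the way, a prefix is not just a one-character string, although that is not stated clearly in the question.

    ..class? Probably aesthetics, readability, and code reuse. Here's one from the Zen of Python: Namespaces are one honking great idea -- let's do more of those! ;) The prefixes, and the expression in the if statement, can always be adjusted. Yes, it's unclear in the Q.

    I mean, why use a class when a function is all you need?

    Or maybe [x[1] for x in filter(…)]. I'm not sure which one performs better, but the second avoids creating a function…

    Awesome! Works like a charm.

    It would help me to know the reason for the vote on this post, so that I can avoid such posts in the future. I tried to make minimal modifications to the source code given in the question and give the answer.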