Python: extracting departure and arrival from a list


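I am trying to extract some parameters from lists whose structure and length vary. These parameters are the departure address and the arrival address of a route. The lists are built from natural-language sentences, so they don't follow any particular template: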

1st example : ['go', 'Buzenval', 'from', 'Chatelet']
2nd example : ['How', 'go', 'street', 'Saint', 'Augustin', 'from', 'Buzenval']
3rd example : ['go', 'from', '33', 'street', 'Republique', 'to', '12','street','Napoleon']
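For each case I have managed to build a second, very similar list in which the departure and the arrival are replaced by the literal words "departure" and "arrival". For the examples above, I get: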

1st example : ['go', 'arrival', 'from', 'departure']
2nd example : ['How', 'go', 'arrival', 'from', 'departure']
3rd example : ['go', 'from', 'departure', 'to', 'arrival']
Now that I have both lists, I want to identify the departure and the arrival:

1st example : departure = ['Chatelet'], arrival = ['Buzenval']
2nd example : departure = ['Buzenval'], arrival = ['street','Saint','Augustin']
3rd example : departure = ['33','street','Republique'], arrival = ['12','street','Napoleon']
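In other words, the parameters are exactly the elements that differ between the two lists, but I still need to determine which one is the departure and which one is the arrival. I thought regex could help me with this, but I don't know how.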


Thanks for your help.
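The regex idea from the question can work if each sentence is compared against its own template. A minimal sketch, assuming the placeholder words are exactly "departure" and "arrival": every other template token is escaped and matched literally, each placeholder becomes a lazy named group, and the pattern must match the whole space-joined sentence. The names template_to_regex and extract_with_regex are hypothetical.

```python
import re

# Assumed placeholder words, taken from the question's second list
PLACEHOLDERS = ("departure", "arrival")

def template_to_regex(template):
    """Turn a token template into a compiled regex with one named group per placeholder."""
    parts = []
    for token in template:
        if token in PLACEHOLDERS:
            # lazy group: the placeholder spans as few words as the full match allows
            parts.append(rf"(?P<{token}>.+?)")
        else:
            # every other token must appear literally
            parts.append(re.escape(token))
    return re.compile(" ".join(parts))

def extract_with_regex(words, template):
    """Return a dict of placeholder -> list of words, or None if the sentence doesn't fit."""
    match = template_to_regex(template).fullmatch(" ".join(words))
    if match is None:
        return None
    return {name: text.split(" ") for name, text in match.groupdict().items()}

print(extract_with_regex(
    ['go', 'from', '33', 'street', 'Republique', 'to', '12', 'street', 'Napoleon'],
    ['go', 'from', 'departure', 'to', 'arrival']))
```

Because the groups are lazy, a place name that itself contains a literal template word (for example a street literally called "from") would be cut at its first occurrence, and a template using the same placeholder twice would fail to compile.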

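Regex would certainly help here, but I tried a simpler approach. It works as long as the pattern you described holds for all your sentences. I did it for the first example; you can apply the same logic to the others and adapt the code: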

Code:

first = ['go', 'Buzenval', 'from', 'Chatelet']  # first example
start = first.index('go')    # the arrival follows 'go'
end = first.index('from')    # the departure follows 'from'
arrival = first[start+1:end]
departure = first[end+1:]
print("Departure: {0} , Arrival: {1}".format(departure, arrival))
Output:

Departure: ['Chatelet'] , Arrival: ['Buzenval']
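Applying the same index-slicing logic to the other two examples mostly means choosing, per sentence shape, which marker words bound each place. A minimal sketch with a hypothetical helper slice_between; the choice of markers for each example is made by hand, exactly as in the answer.

```python
def slice_between(words, start_marker, end_marker=None):
    """Hypothetical helper: the words after start_marker, up to end_marker (or the end)."""
    start = words.index(start_marker) + 1
    end = words.index(end_marker, start) if end_marker is not None else len(words)
    return words[start:end]

first = ['go', 'Buzenval', 'from', 'Chatelet']
second = ['How', 'go', 'street', 'Saint', 'Augustin', 'from', 'Buzenval']
third = ['go', 'from', '33', 'street', 'Republique', 'to', '12', 'street', 'Napoleon']

# Examples 1 and 2: the arrival sits between 'go' and 'from', the departure follows 'from'
for words in (first, second):
    print("Departure: {0} , Arrival: {1}".format(
        slice_between(words, 'from'), slice_between(words, 'go', 'from')))

# Example 3: the departure sits between 'from' and 'to', the arrival follows 'to'
print("Departure: {0} , Arrival: {1}".format(
    slice_between(third, 'from', 'to'), slice_between(third, 'to')))
```

This prints the three expected departure/arrival pairs, but list.index raises ValueError whenever a marker word is missing, so it only covers sentences of these exact shapes.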

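I found a solution that works for your three examples. One thing you should change is the variable names; I didn't know what to call them. (The first version is the old one: slow and hard to understand. The later version is better.)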

Both versions do exactly the same thing:

for example in ((['go', 'Buzenval', 'from', 'Chatelet'],
                 ['go', 'arrival', 'from', 'departure']
                 ),
                (['How', 'go', 'street', 'Saint', 'Augustin', 'from', 'Buzenval'],
                 ['How', 'go', 'arrival', 'from', 'departure']
                 ),
                (['go', 'from', '33', 'street', 'Republique', 'to', '12', 'street', 'Napoleon'],
                 ['go', 'from', 'departure', 'to', 'arrival']
                 )):
    print(extract_places(*example))
Both versions print:

(['Buzenval'], ['Chatelet'])
(['street', 'Saint', 'Augustin'], ['Buzenval'])
(['12', 'street', 'Napoleon'], ['33', 'street', 'Republique'])
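The extract_places function itself does not appear in this thread. The following is a hypothetical reconstruction, not the answerer's code: it only matches the call signature and the printed (arrival, departure) tuples. It walks the template and the sentence together and assumes each placeholder ends where the next literal template token appears in the sentence.

```python
# Assumed placeholder words, as used in the question's second list
PLACEHOLDERS = ("arrival", "departure")

def extract_places(words, template):
    """Return (arrival, departure) by walking the template alongside the sentence."""
    found = {}
    pos = 0  # current index into words
    for k, token in enumerate(template):
        if token in PLACEHOLDERS:
            if k + 1 < len(template):
                # the placeholder runs up to the next literal template token
                end = words.index(template[k + 1], pos)
            else:
                # last template token: the placeholder runs to the end of the sentence
                end = len(words)
            found[token] = words[pos:end]
            pos = end
        else:
            # literal token: assumed to line up with words[pos], so just step over it
            pos += 1
    return found.get("arrival"), found.get("departure")
```

With this definition, the loop in the answer prints the three tuples shown above. Two placeholders directly next to each other in a template cannot be separated this way (words.index would look for the second placeholder word in the sentence and raise ValueError).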

Example from the Python interpreter:

>>> import itertools
>>> key = None
>>> arr = ['go', 'from', '33', 'street', 'Republique', 'to', '12','street','Napoleon']
>>>
>>> for k, group in itertools.groupby(arr, lambda x: x in ['go', 'to','from']):
...     if k:
...         key = list(group)[-1]
...         continue
...     if key is not None:
...         if key == 'from':
...             tag = 'departure'
...         else:
...             tag = 'arrival'
...         print(tag, list(group))
...     key = None
...
departure ['33', 'street', 'Republique']
arrival ['12', 'street', 'Napoleon']
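The interpreter session can be wrapped in a reusable function. A minimal sketch, assuming a hypothetical MARKERS table that maps each marker word to the role of the words following it ('go' and 'to' to arrival, 'from' to departure, which reproduces the if/else in the session). Synonyms, for example in another language, would simply be extra entries in that table.

```python
from itertools import groupby

# Hypothetical marker table: marker word -> role of the words that follow it
MARKERS = {"go": "arrival", "to": "arrival", "from": "departure"}

def tag_places(words, markers=MARKERS):
    """Return {'departure': [...], 'arrival': [...]} using runs of marker / non-marker words."""
    result = {}
    key = None
    for is_marker, group in groupby(words, lambda w: w in markers):
        group = list(group)  # consume the group before groupby advances
        if is_marker:
            key = group[-1]  # in a run like 'go from', the last marker wins
            continue
        if key is not None:
            result[markers[key]] = group
        key = None  # words not directly preceded by a marker are ignored
    return result

print(tag_places(['How', 'go', 'street', 'Saint', 'Augustin', 'from', 'Buzenval']))
```

Leading words such as 'How' are skipped because no marker precedes them.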

This should work for you:

l1 = ['go', 'Buzenval', 'from', 'Chatelet']
l2 = ['How', 'go', 'street', 'Saint', 'Augustin', 'from', 'Buzenval']
l3 = ['go', 'from', '33', 'street', 'Republique', 'to', '12', 'street', 'Napoleon']

def get_locations(inlist):
    start_dep = 0  # index where the departure words start
    end_dep = 0    # index where the arrival words start

    for marker, word in enumerate(inlist):
        # guard against 'go' being the last word (would raise IndexError)
        if word == "go" and marker + 1 < len(inlist):
            if inlist[marker + 1] != "from":
                end_dep = marker + 1       # "go X ...": arrival starts right after "go"
            else:
                start_dep = marker + 2     # "go from X ...": departure starts after "from"

        if word == "from" and start_dep == 0:
            start_dep = marker + 1

        if word == "to":
            end_dep = marker + 1

    if end_dep > start_dep:
        # departure comes first: "... from DEP to ARR"
        start_loc = inlist[start_dep:end_dep - 1]
        end_loc = inlist[end_dep:]
    else:
        # arrival comes first: "go ARR from DEP"
        start_loc = inlist[start_dep:]
        end_loc = inlist[end_dep:start_dep - 1]

    return start_loc, end_loc

directions = get_locations(l3)  # change to l1 / l2 to see the other outputs

print("departure = " + str(directions[0]))
print("arrival = " + str(directions[1]))

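Hi Sam, thanks for your reply! This does work for the examples I gave, but unfortunately in my language "from" and "to" have many synonyms. I'm not even sure I will always get the word "go", because sometimes the sentence is "what is the route from x to y"... But if I can't solve this with regex, I will accept your solution.

Maybe you have to go through your data and maintain a list or dictionary of these words. Regular expressions also rely on a pattern; they cannot handle arbitrary structures.

Hi Megalng, thank you very much! If I understand correctly, I have to put into "keywords" all the words that can appear in both lists, right? If so, and I set keywords=list(set(names).intersection(modes)), I should be able to generalize your code to many use cases.

@BenjaminBB I added a new version. Is one of them easier for you to understand than the other?

@BenjaminBB The second version is also ten times faster.

I really like the second one! Speed matters for my program, so the faster the better. Thanks for taking the time on this!

@… That was interesting; I like tasks like this.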
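The keywords-by-intersection idea from the comments can be taken one step further: the template also says which placeholder each shared word introduces. A minimal sketch, with a hypothetical derive_markers function. names and modes belong to code that is not shown in the thread, so the two lists are called words and template here.

```python
def derive_markers(words, template, placeholders=("departure", "arrival")):
    """Hypothetical: learn which literal word introduces which placeholder."""
    # literal tokens present in both the sentence and its template (the intersection idea)
    shared = set(words) & set(template)
    markers = {}
    # a shared token immediately followed by a placeholder introduces that placeholder
    for token, following in zip(template, template[1:]):
        if token in shared and following in placeholders:
            markers[token] = following
    return shared, markers

shared, markers = derive_markers(
    ['go', 'from', '33', 'street', 'Republique', 'to', '12', 'street', 'Napoleon'],
    ['go', 'from', 'departure', 'to', 'arrival'])
print(sorted(shared), markers)  # ['from', 'go', 'to'] {'from': 'departure', 'to': 'arrival'}
```

A markers dictionary built this way could replace the hard-coded ['go', 'to', 'from'] list and if/else in the interpreter example, so synonyms are learned per sentence instead of listed by hand. A place name that happens to equal a template word would wrongly land in shared.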