Python conditional list comprehension on tuples with length greater than 1




I have a sentence, along with tuples that give the positions of countries or numbers in it:

sample = "In the first 11 months of 2004 Hong Kong 's international airport at Chek Lap Kok handled daily an average of 592 flights , 92,630 passengers , and more than 7,734 tons of cargo."

Then:

tokenIDs2number = {(22,): 592.00, (25,): 92630.00, (34,): 7734.00}
tokenIDs2location = {(8, 9): 'Hong Kong'}
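The keys in both dictionaries are token positions. A minimal sketch, assuming they index into sample.split() with 0-based numbering (the question does not state the convention), lists each token with its index so a key can be checked against the words it is meant to cover:

```python
sample = ("In the first 11 months of 2004 Hong Kong 's international airport at Chek Lap Kok "
          "handled daily an average of 592 flights , 92,630 passengers , and more than "
          "7,734 tons of cargo.")

tokens = sample.split()
# Print every token next to its 0-based index in sample.split()
for index, token in enumerate(tokens):
    print(index, token)
```

For this exact string, 0-based indexing puts "Hong Kong" at positions 7 and 8 and 592 at position 21, so the keys have to follow the same convention as whatever produced them.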

I need to create a different sentence for each combination of these tuples, which I call slot sentences:

In the first 11 months of 2004 LOCATION_SLOT 's international airport at Chek Lap Kok handled daily an average of NUMBER_SLOT flights , 92,630 passengers , and more than 7,734 tons of cargo.

In the first 11 months of 2004 LOCATION_SLOT 's international airport at Chek Lap Kok handled daily an average of 592 flights , NUMBER_SLOT passengers , and more than 7,734 tons of cargo.

In the first 11 months of 2004 LOCATION_SLOT 's international airport at Chek Lap Kok handled daily an average of 592 flights , 92,630 passengers , and more than NUMBER_SLOT tons of cargo.

However, my current code basically takes combinations of the elements within the tuples, so I get two sentences such as:

In the first 11 months of 2004 LOCATION_SLOT Kong 's international airport at Chek Lap Kok handled daily an average of NUMBER_SLOT flights , 92,630 passengers , and more than 7,734 tons of cargo.

In the first 11 months of 2004 Hong LOCATION_SLOT 's international airport at Chek Lap Kok handled daily an average of NUMBER_SLOT flights , 92,630 passengers , and more than 7,734 tons of cargo.

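That is just for one example.

How can I fix this so that, when a tuple key has len > 1, all the tokens in that key are filled with a single location or number slot, as appropriate?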

Current code:

for locationTokenIDs, location in tokenIDs2location.items():
    for numberTokenIDs, number in tokenIDs2number.items():
        sentenceDict = {}
        sentenceDict["sentence"] = sample
        sentenceDict["location-value-pair"] = {location: number}
        # One slot per token: this is what produces "LOCATION_SLOT Kong" and "Hong LOCATION_SLOT"
        for locationTokenID in locationTokenIDs:
            for numberTokenID in numberTokenIDs:
                finalTokens = sample.split()
                finalTokens[numberTokenID] = "NUMBER_SLOT"
                finalTokens[locationTokenID] = "LOCATION_SLOT"
                slotSentence = " ".join(finalTokens)
                sentenceDict["parsedSentence"] = slotSentence

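Note that I have to create a dictionary that also keeps track of the location-value pair and the original sentence for each slot-sentence combination. The key part is generating the correct

slotSentence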

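Note that this is just an example: the number might even be 24000000 where the value in the sentence is "24 million", and the same goes for trillion, billion and thousand.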
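Such values can be normalised before they are stored. A minimal sketch, assuming the number text has already been isolated as a string and uses English scale words; parse_scaled_number and SCALES are hypothetical names:

```python
# Multipliers for the scale words mentioned in the question (assumed English)
SCALES = {"thousand": 10**3, "million": 10**6, "billion": 10**9, "trillion": 10**12}


def parse_scaled_number(text):
    """Convert text such as '24 million' or '92,630' to a float."""
    parts = text.replace(",", "").lower().split()
    value = float(parts[0])
    for word in parts[1:]:
        value *= SCALES[word]  # raises KeyError for a word not in SCALES
    return value


print(parse_scaled_number("24 million"))  # 24000000.0
```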

If that is not possible, another option is to fill every slot in the combination:

In the first 11 months of 2004 LOCATION_SLOT LOCATION_SLOT 's international airport at Chek Lap Kok handled daily an average of NUMBER_SLOT flights , 92,630 passengers , and more than 7,734 tons of cargo.

and then perhaps modify the sentence to remove consecutive slots, but my preference is to do it all in one go.
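A one-pass approach can be sketched by writing the slot into the first position of each key, blanking the key's remaining positions, and dropping the blanks only when joining. Because nothing is deleted until the join, the original indices stay valid while editing. The function name build_slot_sentences is hypothetical, and the indices below are assumed to be 0-based positions in sample.split() for this exact sentence:

```python
def build_slot_sentences(sample, tokenIDs2location, tokenIDs2number):
    """Return one dict per (location, number) combination.

    Each key is a tuple of token indices into sample.split(); all tokens of a
    key are collapsed into a single slot.
    """
    results = []
    for locationTokenIDs, location in tokenIDs2location.items():
        for numberTokenIDs, number in tokenIDs2number.items():
            tokens = sample.split()
            for ids, slot in ((locationTokenIDs, "LOCATION_SLOT"),
                              (numberTokenIDs, "NUMBER_SLOT")):
                first, *rest = sorted(ids)
                tokens[first] = slot      # the slot goes where the key starts
                for extra in rest:
                    tokens[extra] = None  # mark, don't delete, so indices stay valid
            results.append({
                "sentence": sample,
                "location-value-pair": {location: number},
                "parsedSentence": " ".join(t for t in tokens if t is not None),
            })
    return results


sample = ("In the first 11 months of 2004 Hong Kong 's international airport at Chek Lap Kok "
          "handled daily an average of 592 flights , 92,630 passengers , and more than "
          "7,734 tons of cargo.")
# Assumed 0-based positions in sample.split()
tokenIDs2location = {(7, 8): 'Hong Kong'}
tokenIDs2number = {(21,): 592.00, (24,): 92630.00, (30,): 7734.00}

for d in build_slot_sentences(sample, tokenIDs2location, tokenIDs2number):
    print(d["parsedSentence"])
```

A key may list any number of positions, so a longer place name works the same way; overlapping location and number keys are not handled.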

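The code treats each locationTokenID as a slot, whereas the locationTokenIDs actually represent the endpoints of a slice of tokens that should be treated as one slot. So we need to remove the for locationTokenID in locationTokenIDs: loop (which loops over each locationTokenID as if it were a slot) and replace the slice of words defined by the locationTokenIDs pair with a single slot.
The code below solves the problem raised in the OP, but other problems remain (e.g. only the last generated slotSentence is kept; I'll leave that to you, since I don't know what data structure you want to store the slot sentences in):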
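Output:

In the first 11 months of 2004 LOCATION_SLOT 's international airport at Chek Lap Kok handled daily an average of NUMBER_SLOT flights , 92,630 passengers , and more than 7,734 tons of cargo.

In the first 11 months of 2004 LOCATION_SLOT 's international airport at Chek Lap Kok handled daily an average of 592 flights , NUMBER_SLOT passengers , and more than 7,734 tons of cargo.

In the first 11 months of 2004 LOCATION_SLOT 's international airport at Chek Lap Kok handled daily an average of 592 flights , 92,630 passengers , and more than NUMBER_SLOT tons of cargo.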

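This can be extended to locations and numbers that contain any number of spaces. We do this by making both numberTokenIDs and locationTokenIDs 2-tuples that give the range of tokens (first and last index, inclusive) for each location/number: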
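sample = "In the first 11 months of 2004 Hong Kong Central 's international airport at Chek Lap Kok handled daily an average of 592 flights , 92 630 passengers , and more than 7 734 tons of cargo."

# Keys are (first, last) token indices, both inclusive
tokenIDs2number = {(22, 22): '592', (25, 26): '92 630', (32, 33): '7 734'}
tokenIDs2location = {(7, 9): 'Hong Kong Central'}

for locationTokenIDs, location in tokenIDs2location.items():
    for numberTokenIDs, number in tokenIDs2number.items():
        finalTokens = sample.split()
        spans = [(numberTokenIDs, "NUMBER_SLOT"), (locationTokenIDs, "LOCATION_SLOT")]
        # Replace the right-most span first so the earlier indices do not shift
        for (start, end), slot in sorted(spans, reverse=True):
            # A one-element list keeps the slot as a single token; assigning
            # the bare string would insert one token per character
            finalTokens[start:end + 1] = [slot]
        slotSentence = " ".join(finalTokens)
        print(slotSentence)

Output:

In the first 11 months of 2004 LOCATION_SLOT 's international airport at Chek Lap Kok handled daily an average of NUMBER_SLOT flights , 92 630 passengers , and more than 7 734 tons of cargo.

In the first 11 months of 2004 LOCATION_SLOT 's international airport at Chek Lap Kok handled daily an average of 592 flights , NUMBER_SLOT passengers , and more than 7 734 tons of cargo.

In the first 11 months of 2004 LOCATION_SLOT 's international airport at Chek Lap Kok handled daily an average of 592 flights , 92 630 passengers , and more than NUMBER_SLOT tons of cargo.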
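The same slice idea extends to filling several spans of one sentence at once, for example one location and two numbers. A sketch with a hypothetical helper replace_spans, assuming spans are inclusive (start, end) pairs that do not overlap:

```python
def replace_spans(tokens, spans):
    """Return a copy of tokens with each inclusive (start, end) span
    replaced by its slot label.

    spans maps (start, end) -> label. Spans are applied from right to left
    so earlier indices are not shifted by a replacement.
    """
    result = list(tokens)
    previous_start = len(result)
    for (start, end), label in sorted(spans.items(), reverse=True):
        if end >= previous_start:
            raise ValueError(f"span {(start, end)} overlaps another span or runs past the end")
        result[start:end + 1] = [label]  # one-element list -> one token
        previous_start = start
    return result


sample = ("In the first 11 months of 2004 Hong Kong Central 's international airport at "
          "Chek Lap Kok handled daily an average of 592 flights , 92 630 passengers , "
          "and more than 7 734 tons of cargo.")
tokens = sample.split()
spans = {(7, 9): "LOCATION_SLOT", (22, 22): "NUMBER_SLOT", (25, 26): "NUMBER_SLOT"}
print(" ".join(replace_spans(tokens, spans)))
```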
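Consider using str.replace() instead of splitting the sentence string. To do this, you need to convert the elements of tokenIDs2number into strings with thousands separators, which for Python 2.7+ you can do with format(int, ',').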
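A small sketch of the formatting step, assuming Python 3 (where round() with one argument returns an int). number_as_written is a hypothetical helper; it rounds floats first, as suggested in the comments, because format(7734.0, ',') gives '7,734.0', which would not match the sentence:

```python
def number_as_written(value):
    """Render a number with thousands separators, e.g. 92630 -> '92,630'.

    Floats are rounded to the nearest integer first, so 7734.0 -> '7,734'.
    """
    return format(round(value), ",")


sample = "an average of 592 flights , 92,630 passengers , and more than 7,734 tons"
print(sample.replace(number_as_written(92630.0), "NUMBER_SLOT", 1))
```

str.replace matches inside longer strings and replaces every occurrence by default; the optional third argument (a count of 1 here) limits it to the first match. Also note that round() rounds halves to the nearest even integer.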
Addressing @JonClements' comment:
sample = ("In the first 11 months of 2004 Hong Kong 's international airport "
          "at Chek Lap Kok handled daily an average of 592 flights , "
          "92,630 passengers , and more than 7,734 tons of cargo.")
tokenIDs2number = {(22,): 592, (25,): 92630, (34,): 7734}
tokenIDs2location = {(8, 9): 'Hong Kong'}

sentenceList = []
# Iterate over a list comprehension of all possible combinations
for item in [[s, i, j] for s in [sample]
                       for i in tokenIDs2location.items()
                       for j in tokenIDs2number.items()]:
    sentenceDict = {}
    sentenceDict["sentence"] = item[0]
    sentenceDict["location-value-pair"] = {item[1][1]: item[2][1]}
    # format(n, ',') turns 92630 into '92,630' so it matches the sentence text
    sentenceDict["parsedSentence"] = (item[0].replace(item[1][1], 'LOCATION_SLOT')
                                             .replace(format(item[2][1], ','), 'NUMBER_SLOT'))
    sentenceList.append(sentenceDict)

Output (of sentenceList):
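I have solved my use case, but in a roundabout way.
I first create slot sentences that may contain more than one LOCATION_SLOT or NUMBER_SLOT: if a tuple in a combination covers two or more tokens, I fill all of them: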
sentences2location2values = []

for locationTokenIDs, location in tokenIDs2location.items():
    for numberTokenIDs, number in tokenIDs2number.items():
        # Start from a fresh token list for every combination
        sampleTokens = sample.split()
        sentenceDict = {}
        sentenceDict["sentence"] = sample
        sentenceDict["location-value-pair"] = {location: number}
        # Fill every token of each key; repeated slots are collapsed afterwards
        for locationTokenID in locationTokenIDs:
            sampleTokens[locationTokenID] = "LOCATION_SLOT"

        for numberTokenID in numberTokenIDs:
            sampleTokens[numberTokenID] = "NUMBER_SLOT"

        slotSentence = " ".join(sampleTokens)
        sentenceDict["parsedSentence"] = slotSentence
        sentences2location2values.append(sentenceDict)

Then I modify the parsed sentences to remove consecutive location and number slots:
for sentence in sentences2location2values:
    sampleTokens = sentence['parsedSentence'].split()
    newTokens = []
    for i, token in enumerate(sampleTokens):
        # Skip a slot token that repeats the slot immediately before it
        if i > 0 and token in ("LOCATION_SLOT", "NUMBER_SLOT") and token == sampleTokens[i - 1]:
            continue
        newTokens.append(token)

    sentence['parsedSentence'] = ' '.join(newTokens)
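The clean-up loop can also be written with itertools.groupby, which yields runs of equal consecutive tokens. This is an alternative technique, not the poster's code; collapse_slots and SLOT_TOKENS are hypothetical names:

```python
from itertools import groupby

SLOT_TOKENS = {"LOCATION_SLOT", "NUMBER_SLOT"}


def collapse_slots(parsed_sentence):
    """Merge runs of the same slot token into one; other tokens are kept as-is."""
    tokens = []
    for token, run in groupby(parsed_sentence.split()):
        if token in SLOT_TOKENS:
            tokens.append(token)  # keep a single copy of the repeated slot
        else:
            tokens.extend(run)    # ordinary words keep every repetition
    return " ".join(tokens)


print(collapse_slots("of 2004 LOCATION_SLOT LOCATION_SLOT 's airport"))
```

Like the loop, this would also merge two different adjacent entities that received the same slot label, so it relies on adjacent slots of one kind coming from a single key.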

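Mike DeSimone's recipe is nice... but for 2.7+ you can now just write it as format(int_value, ',').

@JonClements Does that mean I can replace replace(intWithCommas(item[2][1]), 'NUMBER_SLOT') with replace(format(item[2][1], ','), 'NUMBER_SLOT')?

@JonClements What happens if the tuple values are actually floats? Note that these may even come from values in the sentence such as 24 million, converted to 24000000.00.

@JonClements - Thanks very much! I wasn't aware of that; I've updated the answer. @dhruvghuati, if the values are floats, just convert them to the nearest integer with round() (int() truncates instead) before calling format(int, ','). Let me know whether this solution works.

This is a great answer and makes sense logically. Can you explain why the location slot is split by spaces? Also, how do I make this generic? Sometimes a slot spans more than two tokens, e.g. "Democratic Republic of the Congo", and there can be several number slots, not just locations. I've been playing around with len(locationTokenIDs), but I'm not masking the necessary countries.

This works for countries with any number of spaces, because the values in locationTokenIDs represent slice endpoints and are treated as slice endpoints in the code. The same logic that applies to locations also applies to numbers. I've updated the answer with code that works for locations.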
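The "split by spaces" effect asked about here comes from assigning a string to a list slice: slice assignment accepts any iterable, and iterating a str yields its characters. A minimal illustration with made-up tokens:

```python
tokens = ["of", "2004", "Hong", "Kong", "airport"]

# Assigning a str to a slice: the str is iterated, one element per character
spread = list(tokens)
spread[2:4] = "SLOT"
print(spread)  # ['of', '2004', 'S', 'L', 'O', 'T', 'airport']

# Assigning a one-element list: the whole span becomes a single token
single = list(tokens)
single[2:4] = ["SLOT"]
print(single)  # ['of', '2004', 'SLOT', 'airport']
```

Joining the first list with spaces gives "of 2004 S L O T airport", which is the spaced-out slot seen in that output.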