Python: return statement inside an if condition only returns the last value


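I have a fruit dataframe with the columns (Name, Color, ID) and a sentence dataframe with the columns (Sentence, ID). I need to compare each record of the fruit dataframe against the sentence dataframe, and if the fruit name appears in the sentence as an exact match, insert its color in front of the fruit name in the sentence.

This is what I have done: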

import pandas as pd
import regex as re

# create fruit dataframe 
fruit_data = [['Apple', 'Red', 1], ['Mango', 'Yellow', 2], ['Grapes', 'Green', 3]] 
fruit_df = pd.DataFrame(fruit_data, columns = ['Name', 'Color', 'ID']) 
print(fruit_df)

# create sentence dataframe 
sentence = [['I like Apple', 1], ['I like ripe Mango', 2], ['Grapes are juicy', 3]] 
sentence_df = pd.DataFrame(sentence, columns = ['Sentence', 'ID']) 
print(sentence_df)


def search(desc, name, color, id):
    flag = 0
    if re.findall(r"\b" + name + r"\b", desc):
        desc_id = (sentence_df[sentence_df['Sentence'] == desc]['ID'].values[0])
        if desc_id == id:
            flag = 1
        
        if flag == 1:
            # for loop is used because fruit can appear more than once in sentence
            all_indexes = []
            for match in re.finditer(r"\b" + name + r"\b", desc):
                     all_indexes.append(match.start())
            
            arr = list(desc)
            for idx in sorted(all_indexes, reverse=True):
                       arr.insert(idx, color + " ")

            new_desc = ''.join(arr)
           
            print("modified sentence: ", new_desc)
            return new_desc 

def compare(name, color, id):
    sentence_df['Result'] = sentence_df['Sentence'].apply(lambda x: search(x, name, color, id))
    

fruit_df.apply(lambda x: compare(x['Name'], x['Color'], x['ID']), axis=1)
print ("The final result is: ")
print(sentence_df['Result'])

The output of the code is:

     Name    Color  ID
0   Apple     Red    1
1   Mango  Yellow    2
2  Grapes   Green    3

            Sentence  ID
0       I like Apple   1
1  I like ripe Mango   2
2   Grapes are juicy   3


modified sentence:  I like Red Apple
modified sentence:  I like ripe Yellow Mango
modified sentence:  Green Grapes are juicy


The final result is: 
0                      None
1                      None
2    Green Grapes are juicy
Name: Result, dtype: object

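The sentences are modified correctly, but the problem is that the first two sentences are not stored in the Result column of the sentence dataframe; only the last one is. Is this the right approach, or am I missing something?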

After some modifications:

import pandas as pd
import re
# create fruit dataframe 
fruit_data = [['Apple', 'Red', 1], ['Mango', 'Yellow', 2], ['Grapes', 'Green', 3]] 
fruit_df = pd.DataFrame(fruit_data, columns = ['Name', 'Color', 'ID']) 
print(fruit_df)

# create sentence dataframe 
sentence = [['I like Apple', 1], ['I like ripe Mango', 2], ['Grapes are juicy', 3]] 
sentence_df = pd.DataFrame(sentence, columns = ['Sentence', 'ID']) 
print(sentence_df)


def search(ids):
    # look up the fruit and the sentence that share this ID
    name = fruit_df[fruit_df['ID']==ids]['Name'].values[0]
    desc = sentence_df[sentence_df['ID']==ids]['Sentence'].values[0]
    color = fruit_df[fruit_df['ID']==ids]['Color'].values[0]

    # for loop is used because fruit can appear more than once in sentence
    all_indexes = []
    for match in re.finditer(r"\b" + name + r"\b", desc):
        all_indexes.append(match.start())

    # insert from the rightmost match so the earlier indexes stay valid
    arr = list(desc)
    for idx in sorted(all_indexes, reverse=True):
        arr.insert(idx, color + " ")

    new_desc = ''.join(arr)

    print("modified sentence: ", new_desc)
    return new_desc


sentence_df['Result'] = sentence_df['ID'].apply(lambda x: search(x))
    

print("The final result is: ")
print(sentence_df['Result'])

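Changes:

The main problem is the fruit_df.apply call. It calls the compare function once for every row of fruit_df, which means 3 times in the example provided.

Each time, compare overwrites every entry in the Result column based on the row that fruit_df.apply is currently passing in, so only the values from the last call survive.

So the first step is to make just one call.

The other change needed is to use the foreign key: ID.

ID is present in both dataframes, so it is enough to identify name, desc and color inside the search function.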

Output:

      Name   Color  ID
0   Apple     Red   1
1   Mango  Yellow   2
2  Grapes   Green   3


            Sentence  ID
0       I like Apple   1
1  I like ripe Mango   2
2   Grapes are juicy   3


modified sentence:  I like Red Apple
modified sentence:  I like ripe Yellow Mango
modified sentence:  Green Grapes are juicy


The final result is: 
0            I like Red Apple
1    I like ripe Yellow Mango
2      Green Grapes are juicy
Name: Result, dtype: object

Edit: a quick-fix version of the solution, as requested by the OP

Just change the bottom of the original code as follows:

def compare(name, color, id):
    sentence_df['Result'] = sentence_df['Result'].apply(lambda x: search(x, name, color, id) or x)
    
sentence_df['Result'] = sentence_df['Sentence']

fruit_df.apply(lambda x: compare(x['Name'], x['Color'], x['ID']), axis=1)
print ("The final result is: ")
print(sentence_df['Result'])

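Note: in this patch, the problem described above is technically not solved. search returns None whenever a sentence does not belong to the current fruit, so `or x` simply keeps the value already in Result. It is a small workaround to get the desired output.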
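For comparison, here is one way to avoid the repeated overwriting altogether. This is a sketch of a different technique from the answer above, not the answerer's method: it joins the two frames on ID with `DataFrame.merge` and then makes a single row-wise `apply`. The helper name `add_color` is made up for illustration, and the sketch assumes every sentence ID has exactly one matching fruit.

```python
import re

import pandas as pd

fruit_df = pd.DataFrame(
    [["Apple", "Red", 1], ["Mango", "Yellow", 2], ["Grapes", "Green", 3]],
    columns=["Name", "Color", "ID"],
)
sentence_df = pd.DataFrame(
    [["I like Apple", 1], ["I like ripe Mango", 2], ["Grapes are juicy", 3]],
    columns=["Sentence", "ID"],
)


def add_color(row):
    """Hypothetical helper: prefix every whole-word match of Name with Color."""
    pattern = r"\b" + re.escape(row["Name"]) + r"\b"
    # A function replacement keeps the color text from being read as a regex template.
    return re.sub(pattern, lambda m: row["Color"] + " " + m.group(0), row["Sentence"])


# One row per sentence, with the matching fruit's Name and Color alongside it.
merged = sentence_df.merge(fruit_df, on="ID", how="left")

# A single pass over the merged rows; to_numpy() sidesteps index alignment.
sentence_df["Result"] = merged.apply(add_color, axis=1).to_numpy()

print(sentence_df["Result"])
```

Because the column is assigned once, there is nothing for a later call to overwrite.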

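Using DataFrame.apply in both places is the problem in the code. The apply method helps you apply a function along either axis of a dataframe; it is not meant for adding a new column and assigning values to it. If you want to work on the same dataframe and do the above, you can use the .assign method, which lets you specify a new column whose values are computed from the values of other columns. For your code, if you want to keep it as it is rather than refactor it as suggested above, all you need is a loop: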

# assumes fruit_df and sentence_df share the same index labels
for idx, row in fruit_df.iterrows():
    result = search(sentence_df.loc[idx,"Sentence"], row["Name"], row["Color"], row["ID"])
    sentence_df.loc[idx,"Result"] = result
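The .assign route mentioned in this answer could look roughly like the sketch below. It is an illustration under assumptions rather than the answerer's code: `fruit_by_id` and `add_color` are hypothetical names, fruit IDs are assumed to be unique, and the lookup is done by ID rather than by index position.

```python
import re

import pandas as pd

fruit_df = pd.DataFrame(
    [["Apple", "Red", 1], ["Mango", "Yellow", 2], ["Grapes", "Green", 3]],
    columns=["Name", "Color", "ID"],
)
sentence_df = pd.DataFrame(
    [["I like Apple", 1], ["I like ripe Mango", 2], ["Grapes are juicy", 3]],
    columns=["Sentence", "ID"],
)

# Lookup table: ID -> {"Name": ..., "Color": ...}
fruit_by_id = fruit_df.set_index("ID")[["Name", "Color"]].to_dict("index")


def add_color(sentence, fruit_id):
    """Hypothetical helper: put the fruit's color before each whole-word match."""
    fruit = fruit_by_id[fruit_id]
    pattern = r"\b" + re.escape(fruit["Name"]) + r"\b"
    return re.sub(pattern, lambda m: fruit["Color"] + " " + m.group(0), sentence)


# assign returns a new dataframe with Result computed from Sentence and ID.
result_df = sentence_df.assign(
    Result=lambda df: [add_color(s, i) for s, i in zip(df["Sentence"], df["ID"])]
)

print(result_df["Result"])
```

Unlike the loop, this does not depend on the two dataframes having matching index labels, and it leaves the original `sentence_df` untouched.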

Comments:

Thanks for the solution, but I have heard that dataframe.apply() is more efficient than iterrows(), so I wrote it with df.apply(). Is it possible to fix this using df.apply() alone?

Since DataFrame.apply is applied over columns or rows, the answer is "no". Unless you want to add an empty "Result" column from the get-go and run .apply on the Result column series.

Thanks for the solution, it works great, but I am not supposed to refactor the code.

@Animeartist Please check the edit.