Python: Split a string so that each substring is a key in a dictionary


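I have a sample string:

    "green apple, sly fox, cunning quick fox fur, cool water, yellow sand"

And a dictionary:

    strr_dict = {"green": "color", "apple": "fruit", "sly": "behavior", "fox": "animal", "cunning": "behavior", "quick fox": "animal", "cool water": "drink", "yellow": "color", "sand": "matter"}

I want to display the substrings of the string, together with their values from the dictionary, as a DataFrame. This is what I did: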

    import pandas as pd

    sample_str = "green apple, sly fox, cunning quick fox fur, cool water, yellow sand"
    strr_dict = {"green": "color", "apple": "fruit", "sly": "behavior", "fox": "animal", "cunning": "behavior", "quick fox": "animal", "cool water": "drink", "yellow": "color", "sand": "matter"}

    df_list = []
    stripped_list = [i.strip() for i in sample_str.split(',')]
    
    for i in stripped_list:
        if i in strr_dict:
            df_list.append([i, strr_dict[i]])
        else:
            for j in i.split():
                if j in strr_dict:
                    df_list.append([j, strr_dict[j]])
                else:
                    df_list.append([j, ""])
    
    strr_df = pd.DataFrame(df_list, columns=['Text', 'Value'])
    print(strr_df)

The output I get is:

             Text      Value
    0        green     color
    1        apple     fruit
    2          sly     behavior
    3          fox     animal
    4      cunning     behavior
    5        quick          
    6          fox     animal
    7          fur          
    8   cool water     drink
    9       yellow     color
    10        sand     matter

My expected output is:

             Text      Value
    0        green     color
    1        apple     fruit
    2          sly     behavior
    3          fox     animal
    4      cunning     behavior
    5    quick fox     animal
    6          fur          
    7   cool water     drink
    8       yellow     color
    9         sand     matter

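I want to display the values whenever a substring exactly matches a dictionary key, and I would like to know how to split the string accordingly. In this case, `cunning quick fox fur` should be split into `cunning`, `quick fox`, and `fur`. But that is not always so: sometimes it should be split into `cunning` and `quick fox fur` in order to get the value from the dictionary. I am quite confused about how to handle this.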

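So this does give the output you specified. I don't know why you would want to do it this way, and I don't know whether it works for the other inputs you may have, but it should. Feel free to test it with whatever other eldritch datasets you have prepared.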

    import pandas as pd

    sample_str = "green apple, sly fox, cunning quick fox fur, cool water, yellow sand"
    strr_dict = {"green": "color", "apple": "fruit", "sly": "behavior", "fox": "animal", "cunning": "behavior",
                 "quick fox": "animal", "cool water": "drink", "yellow": "color", "sand": "matter"}

    df_list = []
    stripped_list = [i.strip() for i in sample_str.split(',')]

    # Keys (or whole phrases) that have already been added to df_list
    checklist = []

    for i in stripped_list:
        if i in strr_dict:
            # The whole comma-separated phrase is a key
            df_list.append([i, strr_dict[i]])
            checklist.append(i)
        else:
            for z in list(strr_dict.keys()):
                # Skip keys that were already emitted (substring test on the list's repr)
                if z in str(checklist):
                    continue
                if z in i:
                    try:
                        df_list.append([i, strr_dict[i]])
                        checklist.append(i)
                    except KeyError:
                        df_list.append([z, strr_dict[z]])
                        checklist.append(z)
        # Words not covered by any emitted key get an empty value
        for x in i.split():
            if x not in str(checklist) and x not in list(strr_dict.keys()):
                df_list.append([x, ""])

    strr_df = pd.DataFrame(df_list, columns=['Text', 'Value'])
    print(strr_df)

Output:

             Text     Value
    0       green     color
    1       apple     fruit
    2         sly  behavior
    3         fox    animal
    4     cunning  behavior
    5   quick fox    animal
    6         fur          
    7  cool water     drink
    8      yellow     color
    9        sand    matter
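
A different way to get the same table is a greedy longest-match split. The sketch below is not the code from the answer above. It assumes that, at each word position, the longest run of words that is a dictionary key should win, and that a word starting no key is kept with an empty value. The function name `longest_match_split` is made up for illustration.

```python
def longest_match_split(phrase, mapping):
    """Greedily split `phrase` into the longest word runs that are keys of `mapping`."""
    words = phrase.split()
    rows = []
    start = 0
    while start < len(words):
        # Try the longest candidate first, then shrink the window one word at a time.
        for end in range(len(words), start, -1):
            candidate = " ".join(words[start:end])
            if candidate in mapping:
                rows.append([candidate, mapping[candidate]])
                start = end
                break
        else:
            # No key starts at this word: keep the word with an empty value.
            rows.append([words[start], ""])
            start += 1
    return rows


sample_str = "green apple, sly fox, cunning quick fox fur, cool water, yellow sand"
strr_dict = {"green": "color", "apple": "fruit", "sly": "behavior", "fox": "animal",
             "cunning": "behavior", "quick fox": "animal", "cool water": "drink",
             "yellow": "color", "sand": "matter"}

rows = []
for part in sample_str.split(","):
    rows.extend(longest_match_split(part.strip(), strr_dict))

for text, value in rows:
    print(f"{text:>12}  {value}")
```

The resulting `rows` list can be passed to `pd.DataFrame(rows, columns=['Text', 'Value'])` exactly as in the question. Because longer windows are tried first, adding a key such as `quick fox fur` to the dictionary would make `cunning quick fox fur` split into `cunning` and `quick fox fur` with no further changes.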

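So sometimes each word, separated by a space, is a key, but sometimes two words together form one key? The input is very messy.

@Flying Thunder, exactly. Sometimes each word is a key, and sometimes two or more words together form a key.

@动物学家, what is the logic? How is the computer supposed to know when two words belong together and when they don't? You would have to check each comma-separated string against all of the dictionary keys to see whether it contains one, and then what happens when several keys match (a one-word key and a two-word key, e.g. `quick fox` and `fox`)? Your example seems to want only the longest match, so that sounds doable, but (I know this is a Stack Overflow cliché) it sounds easier to just make sure your input is properly formatted.

@FlyingThunder, yes, checking every key like that is possible, but I was looking for a more efficient solution.

Hi, thanks a lot, it works in most cases. For the following case, `cunning quick fox fur yellow sand`, it works for this string, but the `yellow sand` at the end, after `cool water`, is not displayed. This is part of an NLP process I am trying to carry out, and I want to display the values as a DataFrame.

What do you mean? If an input doesn't work, what is your input? When I use this input, "green apple, sly fox, cunning fox, yellow sand, cool water", it still works.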
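
The two points raised in these comments, preferring the longest match and showing a key every time it occurs, can also be sketched with a single regular expression. This is only an illustration under assumptions: the pattern name `token_re`, the helper `tokenize`, and the test string are made up here (the exact failing input is not given in the comments), and words are assumed to be separated by single spaces. Python's `re` tries alternatives from left to right, so the keys are sorted longest first. Whether this is actually faster than looping over the keys for a large dictionary would need to be measured.

```python
import re

strr_dict = {"green": "color", "apple": "fruit", "sly": "behavior", "fox": "animal",
             "cunning": "behavior", "quick fox": "animal", "cool water": "drink",
             "yellow": "color", "sand": "matter"}

# Longest keys first, so a multi-word key is tried before any shorter key
# that shares its beginning.
alternatives = "|".join(re.escape(k) for k in sorted(strr_dict, key=len, reverse=True))
# Either a whole-word dictionary key, or any run of non-space characters as a fallback.
token_re = re.compile(rf"\b(?:{alternatives})\b|\S+")


def tokenize(text):
    """Return [token, value] rows; tokens that are not keys get an empty value."""
    rows = []
    for part in text.split(","):
        for match in token_re.finditer(part):
            token = match.group(0)
            rows.append([token, strr_dict.get(token, "")])
    return rows


# Hypothetical input in which "yellow sand" occurs twice.
for row in tokenize("cunning quick fox fur yellow sand, cool water, yellow sand"):
    print(row)
```

Nothing here remembers which keys were already seen, so the second `yellow sand` is reported again instead of being skipped.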