Detecting whether a string is present in a pattern using Python (tags: python, string, pandas, dataframe)


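I'm a beginner at coding in Python. I need a hand finding an elegant way to do this:

I have the following DataFrame:

  pattern   nb
1   a,b,c  150
2       b  100
3     c,b   30
4       c   10

Based on which strings are present in pattern, I would like a DataFrame like this: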

  pattern   nb    a    b     c
1   a,b,c   150   150  150   150
2       b   100   0    100   0
3     c,b   30    0    30    30
4       c   10    0    0     10
Thank you very much.

Greetings from France,


Arnaud

There may be a better way, but this will do what you need:

import pandas as pd

pattern = ['a,b,c', 'b', 'c,b', 'c']
nb = [150, 100, 30, 10]

# build the DataFrame from a dict so that 'nb' stays numeric
# (np.column_stack would turn every value, including 'nb', into a string)
df = pd.DataFrame({'pattern': pattern, 'nb': nb})
df
>>>   pattern   nb
    0   a,b,c  150
    1       b  100
    2     c,b   30
    3       c   10
You can then check each value, collect the matching numbers in a list, and add that list to the DataFrame as a new column at the end:

# we want to check whether a, b, or c is in the original pattern,
# so we loop over a, b, and c one at a time
for value in ['a', 'b', 'c']:
    # we need somewhere to store the results of the check,
    # so we initialise an empty list to collect the values for the new column
    new = []

    # now we loop over each pattern in the original DataFrame
    # enumerate gives us back an index 'i' and a value 'p' ('p' for pattern in this case),
    # just like a normal for loop
    # we need the index 'i' later to access the DataFrame values
    for i, p in enumerate(df['pattern']):

        # we now test whether value (i.e. a, b, or c) is in 'p'
        # note: this is a substring test, which is enough for single-character tokens
        if value in p:
            # if it is, we get the number for this pattern from the original DataFrame -> df['nb'].iloc[i]
            # df['nb'] selects the column in the DataFrame
            # and .iloc[i] gets the correct row
            # and we add it to the list
            new.append(df['nb'].iloc[i])
        else:
            # if a, b, or c is not in the pattern we add 0 to the list
            new.append(0)

    # after the inner loop has tested every pattern for the current value
    # we add a new column to the DataFrame
    # value in this case is 'a', 'b', or 'c'
    # so the column names are 'a', 'b', or 'c'
    df[value] = new

df
>>>   pattern   nb    a    b    c
    0   a,b,c  150  150  150  150
    1       b  100    0  100    0
    2     c,b   30    0   30   30
    3       c   10    0    0   10
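
One caveat with the loop above: `value in p` is a substring test. That is fine for the single-character tokens in the question, but it would also match 'a' inside a longer token such as 'ab'. Below is a minimal sketch of an exact-token variant, assuming the separator is a comma. The sample data is made up purely to show the difference, and the name `tokens` is only illustrative:

```python
import pandas as pd

# hypothetical data with a multi-character token ('ab'), not from the question
df = pd.DataFrame({'pattern': ['ab,c', 'a', 'c,a'], 'nb': [5, 7, 9]})

tokens = ['a', 'ab', 'c']
for value in tokens:
    # split each pattern on the separator and test for an exact token match;
    # keep the row's 'nb' when the token is present, otherwise use 0
    df[value] = [
        n if value in p.split(',') else 0
        for p, n in zip(df['pattern'], df['nb'])
    ]

print(df)
```

With a plain substring test, the first row ('ab,c') would have been counted for 'a' as well; splitting first avoids that.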


Here is an approach that takes advantage of the fact that the pattern is separated by a delimiter:

def splitter(row):
    """Split pattern and return a Series object"""
    return pd.Series(row['nb'], index=row['pattern'].split(','))

# Apply this function to each row of the DataFrame and fill in the blanks
# (missing entries come back as NaN, so cast to int once they are filled with 0)
extra_cols = df.apply(splitter, axis=1).fillna(0).astype(int)

# join the new columns back to the main dataframe
df.join(extra_cols)
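
Another way to lean on the delimiter, offered as a sketch rather than as part of the answer above, is `Series.str.get_dummies`, which builds a 0/1 indicator column per token. Multiplying those indicators by `nb` row by row should give the same table. The names `indicators`, `weighted`, and `result` are illustrative:

```python
import pandas as pd

df = pd.DataFrame({'pattern': ['a,b,c', 'b', 'c,b', 'c'],
                   'nb': [150, 100, 30, 10]})

# one 0/1 column per token found in 'pattern', split on the comma
indicators = df['pattern'].str.get_dummies(sep=',')

# scale each row of indicators by that row's 'nb' value
weighted = indicators.mul(df['nb'], axis=0)

# attach the new columns to the original frame
result = df.join(weighted)
print(result)
```

This avoids the explicit Python loop and the row-wise `apply`, and the columns stay integers because no NaN values are ever created.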


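Are your patterns always separated by commas like that? And they never contain commas themselves?

Hi, it's not necessarily a comma, but the separator is always the same.

Paddy, thank you very much for your quick answer. I will think it over to understand how the code is structured. As I said: I'm new to Python.

Hi Arnaud, I added more comments to the code to hopefully explain it more thoroughly. I hope this helps!

Hi, thanks a lot for your efficient answer.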
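
Since the separator is not necessarily a comma, the splitting approach from the answer could be wrapped in a small function that takes the separator as a parameter. This is only a sketch: the function name `expand_pattern` and its `sep` argument are hypothetical, and the semicolon-separated data is invented for illustration.

```python
import pandas as pd

def expand_pattern(df, sep=','):
    """Return df with one extra column per token found in 'pattern'.

    Hypothetical helper: assumes columns named 'pattern' and 'nb'
    and a single, consistent separator string.
    """
    def splitter(row):
        # one entry per token, each carrying the row's 'nb' value
        return pd.Series(row['nb'], index=row['pattern'].split(sep))

    # tokens absent from a row come back as NaN; fill with 0 and cast to int
    extra_cols = df.apply(splitter, axis=1).fillna(0).astype(int)
    return df.join(extra_cols)

# made-up example using ';' instead of ','
sample = pd.DataFrame({'pattern': ['x;y', 'y'], 'nb': [3, 4]})
out = expand_pattern(sample, sep=';')
print(out)
```

Passing `sep` explicitly keeps the rest of the logic unchanged when the data uses a different delimiter.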