Detect whether a string is present in a pattern with Python


python string pandas dataframe


I'm a beginner at Python. I need a hand finding an elegant way to do this:

I have the following DataFrame:

  pattern  nb
1   a,b,c  150
2       b  100
3     c,b  30
4       c  10

Depending on whether each string is present in the pattern, I would like to get this DataFrame:

  pattern   nb    a    b     c
1   a,b,c   150   150  150   150
2       b   100   0    100   0
3     c,b   30    0    30    30
4       c   10    0    0     10

Many thanks

Greetings from France

Arnaud

There may be a better way, but this will do what you need:

import pandas as pd

pattern = ['a,b,c', 'b', 'c,b', 'c']
nb = [150, 100, 30, 10]

# build the DataFrame from a dict so that 'nb' keeps its integer dtype
# (np.column_stack would turn the numbers into strings)
df = pd.DataFrame({'pattern': pattern, 'nb': nb})
print(df)
#   pattern   nb
# 0   a,b,c  150
# 1       b  100
# 2     c,b   30
# 3       c   10

You can then check each value, collect the right numbers in a list, and finally add that list to the DataFrame as a new column:

# we want to check whether a, b, or c is in the original pattern
# so we loop over a, b, and c one at a time
for value in ['a', 'b', 'c']:
    # while checking, we need somewhere to store the results,
    # so we initialise an empty list to append the values to
    new = []

    # now we loop over each pattern in the original DataFrame
    # enumerate gives us back an index 'i' and a value 'p' ('p' for pattern in this case),
    # just like a normal for loop but with a counter
    # we need the index 'i' later to access the DataFrame values
    for i, p in enumerate(df['pattern']):

        # test whether value (i.e. a, b, or c) occurs in 'p'
        # (a substring test, which is fine for single-letter tokens)
        if value in p:
            # if it does, we take that row's number from the original DataFrame -> df['nb'].iloc[i]
            # df['nb'] selects the column in the DataFrame
            # and .iloc[i] gets the correct row
            # and we append it to the list
            new.append(df['nb'].iloc[i])
        else:
            # if a, b, or c is not in the pattern, we append 0 to the list
            new.append(0)

    # after all the tests for one value (a, b, or c)
    # we add a new column to the DataFrame
    # value in this case is 'a', 'b', or 'c'
    # so the column names are 'a', 'b', and 'c'
    df[value] = new

print(df)
#   pattern   nb    a    b    c
# 0   a,b,c  150  150  150  150
# 1       b  100    0  100    0
# 2     c,b   30    0   30   30
# 3       c   10    0    0   10
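The loop above hard-codes the tokens `['a', 'b', 'c']`, and `value in p` is a substring test, so a token such as `'a'` would also match a pattern like `'ab,c'`. Assuming the tokens may be longer than one character, a minimal sketch that discovers the tokens from the data and only matches whole tokens could look like this. `add_token_columns` and its `sep` parameter are hypothetical names; `sep` is there for patterns that use a delimiter other than a comma.

```python
import pandas as pd


def add_token_columns(df, sep=','):
    """Return a copy of df with one column per token found in df['pattern'].

    Hypothetical helper: a cell holds the row's 'nb' value when the token
    appears in that row's pattern, and 0 otherwise.
    """
    # Split every pattern once with plain str.split, e.g. 'c,b' -> ['c', 'b']
    tokens_per_row = [p.split(sep) for p in df['pattern']]
    # Collect every distinct token, sorted so the column order is predictable
    all_tokens = sorted({tok for toks in tokens_per_row for tok in toks})

    out = df.copy()
    for token in all_tokens:
        # Exact token match: 'a' does not match inside a token such as 'ab'
        has_token = pd.Series([token in toks for toks in tokens_per_row],
                              index=df.index)
        # Keep 'nb' where the token is present, 0 elsewhere
        out[token] = df['nb'].where(has_token, 0)
    return out


df = pd.DataFrame({'pattern': ['a,b,c', 'b', 'c,b', 'c'],
                   'nb': [150, 100, 30, 10]})
print(add_token_columns(df))
```

Returning a copy instead of adding columns in place means the function can be called again on the same `df` without the new columns piling up.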

Here is an approach that takes advantage of the fact that the patterns are separated by a delimiter:

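import pandas as pd

def splitter(row):
    """Split the pattern and return a Series with row['nb'] under each token"""
    # change ',' if your patterns use another delimiter
    return pd.Series(row['nb'], index=row['pattern'].split(','))

# df is assumed to hold only the original 'pattern' and 'nb' columns
# Apply this function to each row of the DataFrame and fill in the blanks with 0;
# the blanks are NaN, which makes the columns float, so cast back to int
extra_cols = df.apply(splitter, axis=1).fillna(0).astype(int)

# join the new columns back onto the main DataFrame
df.join(extra_cols)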
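pandas also has a built-in for the split-and-indicate step, `Series.str.get_dummies(sep=...)`, which returns one 0/1 column per distinct token. A minimal sketch of the same delimiter-based idea using it (a swapped-in alternative, not the answer's code), multiplying each indicator row by that row's `nb`; change `sep` if the patterns use another delimiter:

```python
import pandas as pd

df = pd.DataFrame({'pattern': ['a,b,c', 'b', 'c,b', 'c'],
                   'nb': [150, 100, 30, 10]})

# One 0/1 indicator column per distinct token in 'pattern'
dummies = df['pattern'].str.get_dummies(sep=',')

# Multiply each indicator row by that row's 'nb' value (aligned on the index)
extra_cols = dummies.mul(df['nb'], axis=0)

result = df.join(extra_cols)
print(result)
```

`get_dummies` sorts the token columns alphabetically, and because no NaN is involved the new columns stay integers.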

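Are your patterns always comma-separated like this? And do the patterns themselves never contain other commas? – Hi, not necessarily a comma, but it is always the same delimiter.

Paddy, thank you very much for your quick answer. I will think it over to understand how the code is structured. As I said, I am new to Python. – Hi Arnaud, I've added more comments to the code to explain it more thoroughly. I hope this helps!

Hi, thank you very much for your efficient answer.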