Detecting whether a string is present in a pattern with Python
I am a beginner in Python coding. I need a hand finding an elegant way to do this. I have the following dataframe:
   pattern   nb
1    a,b,c  150
2        b  100
3      c,b   30
4        c   10
Depending on which strings are present in each pattern, I would like a dataframe like this:
   pattern   nb    a    b    c
1    a,b,c  150  150  150  150
2        b  100    0  100    0
3      c,b   30    0   30   30
4        c   10    0    0   10
Thanks a lot,
Greetings from France,
Arnaud

There may be a better way, but this will do what you need:
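import pandas as pd

pattern = ['a,b,c', 'b', 'c,b', 'c']
nb = [150, 100, 30, 10]
# build the DataFrame from a dict so each column keeps its own dtype
# (np.column_stack forces both lists into one NumPy dtype, which turns the numbers into strings)
df = pd.DataFrame({'pattern': pattern, 'nb': nb})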
df
   pattern   nb
0    a,b,c  150
1        b  100
2      c,b   30
3        c   10
Then you can check the values, append the right value to a list, and finally add that list to the DataFrame as a new column:
# we want to check whether a, b, or c is in the original pattern,
# so we loop over a, b, and c one at a time
for value in ['a', 'b', 'c']:
    # when we do our check we want to store the values,
    # so we initialise an empty list that we will add the values to
    new = []
    # now we loop over each pattern in the original DataFrame
    # enumerate gives us back an index 'i' and a value 'p' ('p' for pattern in this case),
    # just like a normal for loop
    # we need the index 'i' later to access the DataFrame values
    for i, p in enumerate(df['pattern']):
        # we now test whether value (i.e. a, b, or c) is in 'p'
        # (a plain substring test: 'a' would also match a pattern such as 'ab')
        if value in p:
            # if it is, we get the nb value from the original DataFrame -> df['nb'].iloc[i]
            # df['nb'] selects the column in the DataFrame
            # and .iloc[i] gets the correct row
            # and we add it to the list
            new.append(df['nb'].iloc[i])
        else:
            # if a, b, or c is not in the pattern we add 0 to the list
            new.append(0)
    # after all rows have been tested for this value (a, b, or c),
    # we add a new column to the DataFrame
    # value in this case is 'a', 'b', or 'c',
    # so the column names are 'a', 'b', and 'c'
    df[value] = new
df
   pattern   nb    a    b    c
0    a,b,c  150  150  150  150
1        b  100    0  100    0
2      c,b   30    0   30   30
3        c   10    0    0   10
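The same idea can be expressed without the inner Python loop. This is a minimal sketch (not the answerer's code) that swaps in pandas' vectorised `Series.str.contains` with `regex=False`, so it performs the same plain substring test as `value in p`, together with `numpy.where` to pick `nb` or 0 per row. It rebuilds the example frame so it runs on its own.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'pattern': ['a,b,c', 'b', 'c,b', 'c'],
                   'nb': [150, 100, 30, 10]})

for value in ['a', 'b', 'c']:
    # boolean mask: True where the pattern contains `value` as a plain substring
    mask = df['pattern'].str.contains(value, regex=False)
    # take the row's nb where the mask is True, otherwise 0
    df[value] = np.where(mask, df['nb'], 0)

print(df)
```

Like the loop, this matches substrings, so a token `a` would also be found inside a pattern such as `ab`; splitting on the separator avoids that.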
Here is an approach that takes advantage of the fact that the patterns are split by a separator:
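import pandas as pd

def splitter(row):
    """Split pattern and return a Series object"""
    # one entry per token in the pattern, each holding this row's nb value
    return pd.Series(row['nb'], index=row['pattern'].split(','))

# Apply this function to each row of the dataframe and fill in the blanks
# (the missing entries make the columns float, so cast back to int)
extra_cols = df.apply(splitter, axis=1).fillna(0).astype(int)
# join the new columns back to the main dataframe (join returns a new DataFrame)
df.join(extra_cols)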
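Pandas also has a built-in for splitting delimited strings into indicator columns, `Series.str.get_dummies(sep=...)`. A minimal sketch using it instead of a row-wise `apply` (a swapped-in technique, not the answerer's code): each 0/1 indicator column is multiplied by that row's `nb`.

```python
import pandas as pd

df = pd.DataFrame({'pattern': ['a,b,c', 'b', 'c,b', 'c'],
                   'nb': [150, 100, 30, 10]})

# one 0/1 indicator column per token found in `pattern`, split on ','
indicators = df['pattern'].str.get_dummies(sep=',')
# scale each row's indicators by that row's nb, then attach them to the frame
result = df.join(indicators.mul(df['nb'], axis=0))
print(result)
```

Because it splits on the separator, it matches whole tokens only, and `sep` can be set to whatever separator the data uses.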
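Are your patterns always separated by commas like this? Are you sure they never contain commas themselves?
Hello, not necessarily commas, but it is always the same separator.
Paddy, thank you very much for the quick answer. I will study it to understand how the code is structured. As I said, I am new to Python.
Hi Arnaud, I have added more comments to the code to explain it more thoroughly. I hope this helps!
Hi, thank you very much for your efficient answer.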
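Since the separator is not necessarily a comma but always the same, the split-based idea can take it as a parameter. A minimal sketch under that assumption; the helper name `expand_patterns` and its `sep` parameter are hypothetical, and it uses `DataFrame.explode` (pandas 0.25 or later) rather than the row-wise `apply`. A single-character `sep` is treated literally by `str.split`.

```python
import pandas as pd

def expand_patterns(df, sep=','):
    """Hypothetical helper: add one column per token, holding nb where the token occurs."""
    # split each pattern into a list of tokens, then give each token its own row;
    # explode repeats the original index label for every token
    long = df.assign(token=df['pattern'].str.split(sep)).explode('token')
    # sum nb per (original row, token), move tokens into columns, fill gaps with 0
    wide = long.groupby([long.index, 'token'])['nb'].sum().unstack(fill_value=0)
    # align on the original index and attach the new columns
    return df.join(wide)

df = pd.DataFrame({'pattern': ['a;b;c', 'b', 'c;b', 'c'],
                   'nb': [150, 100, 30, 10]})
print(expand_patterns(df, sep=';'))
```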