Python: populating a DataFrame with data found in substrings of strings in a list

python python-3.x regex string pandas

Suppose I have a list of strings:

lst1 = ['A1 B1 C1', 'A2 B2 D1', 'S1 M1 A3', 'A4 B3 G1','H1 K1 W1']

I want to build a table by searching each string for specific values (where present) and use them to populate a DataFrame.

Like this:

           A      B      C      D
string1   A1     B1     C1    NaN
string2   A2     B2    NaN     D1
string3   A3    NaN    NaN    NaN
string4   A4     B3    NaN    NaN
string5  NaN    NaN    NaN    NaN

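To search within each string, I split every string into a list, which gives me a nested list that I can loop over with a for loop. My regex skills are not very strong, but I suspect this could be done neatly with a regular expression.

My current code: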

import pandas as pd
lst1 = ['A1 B1 C1', 'A2 B2 D1', 'S1 M1 A3', 'A4 B3 G1','H1 K1 W1']
modlst1 = []
for each in lst1:
    modlst1.append(each.split())

rows = range(len(modlst1)) ### rows for each string
cols = ['A','B','C','D']   ### cols for each string
df = pd.DataFrame(index = rows, columns = cols)
df = df.fillna(0)

### Populating values
for each in rows:
    for stuff in modlst1[each]:
        if stuff.startswith('A'):
           df['A'] = stuff
        elif stuff.startswith('B'):
           df['B'] = stuff
        elif stuff.startswith('C'):
           df['C'] = stuff
        elif stuff.startswith('D'):
           df['D'] = stuff

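I am very new to Python, so I am still learning string manipulation and searching. I am sure there must be a better way to do this. My solution does not work: when I try to put the values into the DataFrame, the same values end up filling every row. But when I do this instead: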

        if stuff.startswith('A'):
           print(stuff)

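the loop runs fine and I get the distinct 'A', 'B', 'C', 'D' values. The DataFrame I actually end up with (which is not what I want) is shown in the final table of this post.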

Here is one way to do it:

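import pandas as pd
lst1 = ['A1 B1 C1', 'A2 B2 D1', 'S1 M1 A3', 'A4 B3 G1', 'H1 K1 W1']
cols = ['A', 'B', 'C', 'D']   ### cols for each string
df = pd.DataFrame(columns=cols)
### Populating values
for elt in lst1:
    new = {}
    for sub_elt in elt.split(" "):
        # keep the token only if its first character is a column name
        if sub_elt[0] in cols:
            new[sub_elt[0]] = sub_elt
    # note: DataFrame.append was removed in pandas 2.0
    df = df.append(pd.Series(new), ignore_index=True)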

Feel free to ask if any part of this is unclear.
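Because `DataFrame.append` is no longer available in pandas 2.0 and later, the same idea can be sketched by collecting one dict per string and building the DataFrame in a single call. This is only a minimal variant of the approach above, not the answerer's original code; the names `records`, `row` and `token` are introduced here for illustration.

```python
import pandas as pd

lst1 = ['A1 B1 C1', 'A2 B2 D1', 'S1 M1 A3', 'A4 B3 G1', 'H1 K1 W1']
cols = ['A', 'B', 'C', 'D']

records = []  # one dict per input string
for elt in lst1:
    row = {}
    # split() with no argument splits on any run of whitespace
    for token in elt.split():
        if token[0] in cols:
            row[token[0]] = token
    records.append(row)

# Keys missing from a dict become NaN; an empty dict gives an all-NaN row
df = pd.DataFrame(records, columns=cols)
print(df)
```

Building the frame once from a list of dicts also avoids copying the whole DataFrame on every iteration, which is what repeated appending does.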

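Thanks. However, when I apply it to my real data I get a traceback on the line `if sub_elt[0] in cols:` saying `IndexError: string index out of range`. My data is in exactly this format, so I cannot work out the reason for this error.

It is hard to tell without the data. Maybe there is a stray space before the first element?

I thought that must be it, but when I tested that theory on my data I got the same error. Can you explain what `if sub_elt[0] in cols` is actually doing?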
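To illustrate the question in the comments: `sub_elt[0]` is the first character of a token, and `in cols` checks whether that character is one of the column names. One plausible cause of the `IndexError` (an assumption, since the real data is not shown) is that `split(" ")` produces empty strings when there is a leading or doubled space, and `''[0]` fails. The input `line` and the helper `first_char_matches` below are hypothetical.

```python
cols = ['A', 'B', 'C', 'D']

# Hypothetical input with a leading space and a doubled space
line = ' A1  B1 C1'

# split(' ') keeps empty strings around extra spaces; split() drops them
print(line.split(' '))
print(line.split())

def first_char_matches(tokens, cols):
    """Return the tokens whose first character is one of the column names."""
    return [t for t in tokens if t[0] in cols]

try:
    first_char_matches(line.split(' '), cols)
except IndexError as exc:
    # indexing the empty token '' with [0] raises this error
    print('IndexError:', exc)

safe = first_char_matches(line.split(), cols)
print(safe)
```

Calling `split()` without an argument, or testing `t[:1] in cols` (slicing an empty string returns an empty string instead of raising), would both sidestep the error under this assumption.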

Output from the question's current code (not wanted):

           A      B      C      D
string1   A1     B1     C1    NaN
string2   A1     B1     C1     D1
string3   A1     B1     C1     D1
string4   A1     B1     C1     D1
string5   A1     B1     C1     D1
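The repeated values come from assignments such as `df['A'] = stuff`, which write the scalar into every row of column `'A'`. A minimal sketch of a fix that stays close to the question's loop is to assign to a single cell with `df.loc[row, column]`; the loop variables `i`, `line` and `token` are names chosen here for illustration.

```python
import pandas as pd

lst1 = ['A1 B1 C1', 'A2 B2 D1', 'S1 M1 A3', 'A4 B3 G1', 'H1 K1 W1']
cols = ['A', 'B', 'C', 'D']

# Empty frame: one row per string, every cell starts as NaN
df = pd.DataFrame(index=range(len(lst1)), columns=cols)

for i, line in enumerate(lst1):
    for token in line.split():
        # token[:1] is safe even for an empty token
        if token[:1] in cols:
            # write to one cell only, not the whole column
            df.loc[i, token[0]] = token

print(df)
```

Cells that never receive a value stay as NaN, so the `fillna(0)` step from the question is not needed if NaN is the desired placeholder.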