Python: How can I extract a 2D array encoded as list strings in a DataFrame column?


python string list pandas dataframe


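I messed up a DataFrame. I have a column containing strings that encode lists of numbers (edit: actually, the commas between the numbers are missing too), e.g.: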

df=
                                    mycol
0   '[ 0.5497076   0.59722222  0.42361111]'  
1   '[ 0.8030303   0.69090909  0.52727273]'  
2   '[ 0.51461988  0.38194444  0.66666667]'

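Each string encodes a list with a fixed number of elements. I want to convert mycol into 3 columns (in general N, where N = len(df['mycol'][0]) once each entry has been parsed into a list). Each new column should be numeric and contain one element of the original lists in mycol.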
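For reference, the target transformation can be sketched with pandas' vectorized string methods, a technique different from the per-row apply used in the answers: strip the brackets, split on whitespace with expand=True, and cast to float. This is a minimal sketch assuming every string has the same bracketed, whitespace-separated layout as the sample; the mycol1...mycolN naming follows the question's example.

```python
import pandas as pd

# Sample data in the question's format: bracketed, whitespace-separated, no commas
df = pd.DataFrame({"mycol": [
    "[ 0.5497076   0.59722222  0.42361111]",
    "[ 0.8030303   0.69090909  0.52727273]",
    "[ 0.51461988  0.38194444  0.66666667]",
]})

# Strip the bracket characters from both ends, split on runs of whitespace
# (pat=None), and spread the pieces over one column per element
parts = df["mycol"].str.strip("[]").str.split(expand=True).astype(float)

# Name the columns mycol1..mycolN, with N taken from the data
parts.columns = [f"mycol{i + 1}" for i in range(parts.shape[1])]

df = df.join(parts)
print(df.dtypes)
```

Because str.split(expand=True) sizes the result from the data, the same code handles any fixed N without hard-coding column names.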

I tried the following, but it didn't work:

df['mycol'] = df['mycol'].apply(lambda s: s.split())
df['mycol'] = df['mycol'].apply(lambda s: np.fromstring(s))

df[['mycol1','mycol2','mycol3']] = pd.DataFrame(df['mycol'].values.tolist(), index=df.index)
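The attempt fails for two reasons: s.split() keeps the brackets as tokens ('[' and '0.42361111]'), and np.fromstring is then handed a list rather than a string. A minimal repair of the same idea, assuming every row has the same number of elements, strips the brackets before splitting and lets NumPy do the float conversion (np.array with dtype=float instead of np.fromstring):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"mycol": [
    "[ 0.5497076   0.59722222  0.42361111]",
    "[ 0.8030303   0.69090909  0.52727273]",
]})

# Strip the brackets before splitting, then let NumPy convert the
# string tokens to floats (one 1D array per row)
arrays = df["mycol"].apply(lambda s: np.array(s.strip("[]").split(), dtype=float))

# Stack the per-row arrays into a 2D array of shape (rows, N);
# this requires every row to have the same N
matrix = np.vstack(arrays.tolist())

cols = [f"mycol{i + 1}" for i in range(matrix.shape[1])]
df = df.join(pd.DataFrame(matrix, index=df.index, columns=cols))
print(df)
```

Joining a DataFrame with explicit column names avoids relying on how pandas aligns a multi-column assignment like df[[...]] = ....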

This should help.

Example:

import pandas as pd
df = pd.DataFrame({"mycol": ['[ 0.5497076   0.59722222  0.42361111]', '[ 0.8030303   0.69090909  0.52727273]']})
df[['mycol1','mycol2','mycol3']]  = df["mycol"].apply(lambda x: x.replace("[", "").replace("]", "").split()).apply(pd.Series)
print(df)

Output:

                                   mycol     mycol1      mycol2      mycol3
0  [ 0.5497076   0.59722222  0.42361111]  0.5497076  0.59722222  0.42361111
1  [ 0.8030303   0.69090909  0.52727273]  0.8030303  0.69090909  0.52727273
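As written, the answer's new columns hold strings (object dtype), while the question asks for numeric columns. A sketch that keeps the answer's bracket removal and split(), but builds the frame from the list of lists (giving the same columns as .apply(pd.Series)), casts to float, and derives the column names from N instead of hard-coding three:

```python
import pandas as pd

df = pd.DataFrame({"mycol": ['[ 0.5497076   0.59722222  0.42361111]',
                             '[ 0.8030303   0.69090909  0.52727273]']})

# Same parsing as the answer: drop the brackets and split on whitespace
split_lists = df["mycol"].apply(lambda x: x.replace("[", "").replace("]", "").split())

# One column per list element; the values are still strings, so cast to float
expanded = pd.DataFrame(split_lists.tolist(), index=df.index).astype(float)

# Derive N from the data instead of hard-coding three column names
expanded.columns = [f"mycol{i + 1}" for i in range(expanded.shape[1])]
df = df.join(expanded)
print(df.dtypes)
```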

You can convert each list to a dictionary and then build a DataFrame directly from those dictionaries:

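import re
import pandas as pd

def stringtodict(x):
    d = {}
    # Remove the brackets and any surrounding whitespace
    x = x.replace("[", "").replace("]", "").strip()
    # Split on one or more whitespace characters (the strings have no commas)
    x = re.split(r"\s+", x)
    for i in range(len(x)):
        d[str(i)] = float(x[i])  # keys "0", "1", ... become the column names
    return d

pd.DataFrame(df['mycol'].apply(stringtodict).tolist())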

I have edited the code to use whitespace as the delimiter.
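The answer's function returns a separate DataFrame with columns named "0", "1", "2" and does not attach it to df. A compact sketch of the same dictionary idea that names the keys mycol1...mycolN and joins the result back onto df; string_to_dict and its prefix parameter are hypothetical names used only for illustration:

```python
import re
import pandas as pd

df = pd.DataFrame({"mycol": ['[ 0.5497076   0.59722222  0.42361111]',
                             '[ 0.8030303   0.69090909  0.52727273]']})

def string_to_dict(x, prefix="mycol"):
    """Parse '[ a  b  c]' into {'mycol1': a, 'mycol2': b, ...} as floats.

    Hypothetical helper: a variant of the answer's stringtodict.
    """
    # Strip brackets and spaces from both ends, then split on whitespace runs
    tokens = re.split(r"\s+", x.strip("[] "))
    return {f"{prefix}{i + 1}": float(tok) for i, tok in enumerate(tokens)}

# One dict per row -> one column per key; reuse df's index so join lines up rows
parsed = pd.DataFrame(df["mycol"].apply(string_to_dict).tolist(), index=df.index)
df = df.join(parsed)
print(df)
```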

Your answer is good… however, my question was flawed: my strings don't contain commas, so it doesn't work.

This should help:

print(df["mycol"].apply(lambda x: x.replace("[", "").replace("]", "").split()))

Your comment combined with df["mycol"].apply(pd.Series) worked. Please edit the answer so that I can accept the updated snippet.

I like your solution, but @Rakesh got there first with something very similar.