Python: converting a list of lists into a DataFrame

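I have a DataFrame in the format shown below. It has 907 rows and 2 columns, named Audio and Session (the second column appears as sentence in the sample below). The Audio column contains a list of lists, as you can see. Each of these lists has a total length of 10000.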

Audio                                                     sentence
[[-0.32357552647590637], [-0.4721883237361908], .....]    the kind of them is a relative all the little old lady is it to confide in them and head for buying them hate it consists of a vertical schrock
[[-0.32357552647590637], [-0.4721883237361908], .....]    the kind of them is a relative all the little old lady is it to confide in them and head for buying them hate it consists of a vertical schrock


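I tried to convert the list into a DataFrame, but it splits every character into its own column, which is not what I want:

aa = pd.DataFrame.from_records(X_tra)
This is what it produces: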

0   1   2   3   4   5   6   7   8   9   ...     269990  269991  269992  269993  269994  269995  269996  269997  269998  269999
0   [   [   0   .   0   0   3   9   1   1   ...     None    None    None    None    None    None    None    None    None    None
The output above is the actual output. The expected output looks like this:

Audio                  Audio1                    sentence
-0.32357552647590637 -0.4721883237361908 ..... the kind of them is a relative all the little old lady is it to confide in them and head for buying them hate it consists of a vertical schrock
-0.32357552647590637 -0.4721883237361908 ......the kind of them is a relative all the little old lady is it to confide in them and head for buying them hate it consists of a vertical schrock

I want to use this output to train a neural network, so the sentence column will be Y and the rest of the DataFrame will be X.
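
A note on the actual output above: columns that hold single characters such as `[`, `[`, `0`, `.` suggest that each Audio entry is stored as a string (for example after a round trip through CSV) rather than as a real Python list. That is an assumption about the data; the post does not confirm it. Under that assumption, a minimal sketch of parsing the strings back into nested lists with `ast.literal_eval` might look like this (the sample strings are made up for illustration):

```python
import ast

import pandas as pd

# Hypothetical sample: each Audio entry is a *string* that looks like a list of lists
df = pd.DataFrame({
    "Audio": ["[[-0.5], [0.25]]", "[[0.125], [-1.0]]"],
    "sentence": ["first sentence", "second sentence"],
})

# Parse each entry into a real nested list, leaving non-string entries untouched
df["Audio"] = df["Audio"].apply(
    lambda v: ast.literal_eval(v) if isinstance(v, str) else v
)

print(type(df["Audio"].iloc[0]))  # <class 'list'>
print(df["Audio"].iloc[0])        # [[-0.5], [0.25]]
```

If the entries are already real lists, this step is a no-op and the answers below apply directly.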

As a first step, I would generate a list of column names:

N = 10000
colNames = ["Audio" + str(i) for i in range(N)]
Then I would create a second DataFrame, df2, from the previous DataFrame, df, using:

# Each row of df["Audio"] is a list of N one-element lists, so this yields N columns
df2 = pd.DataFrame(df["Audio"].values.tolist(), index=df.index, columns=colNames)
This should be very close to what you want, except that each value is still wrapped in a list. The result should look similar to this:

>>> df2
     Audio0                    Audio1                   Audio2
0    [-0.32357552647590637]    [-0.4721883237361908]    ...
1    [-0.32357552647590637]    [-0.4721883237361908]    ...
2    ...

Hope this helps.
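
To get rid of the remaining one-element lists, one option is to unwrap each cell after building `df2`. The following is only a sketch on made-up data, with `N = 2` instead of 10000, and it assumes every inner list holds exactly one value:

```python
import pandas as pd

# Hypothetical miniature version of the data (N = 2 instead of 10000)
df = pd.DataFrame({
    "Audio": [[[-0.5], [0.25]], [[0.125], [-1.0]]],
    "sentence": ["first sentence", "second sentence"],
})

N = 2
colNames = ["Audio" + str(i) for i in range(N)]

# One column per inner list; each cell is still a one-element list at this point
df2 = pd.DataFrame(df["Audio"].values.tolist(), index=df.index, columns=colNames)

# Unwrap every cell: [x] -> x
df2 = df2.apply(lambda col: col.map(lambda cell: cell[0]))

print(df2)
```

Using `apply` with `Series.map` avoids depending on a particular pandas version for element-wise DataFrame mapping.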

How about this solution:

import pandas as pd
import numpy as np

data = pd.DataFrame({'Audio':[[[-0.32357552647590637],[-0.4721883237361908]], [[-0.32357552647590637], [-0.4721883237361908]]],
        'sentence':['the kind of them is a relative all the little old', 'More text']})

audios = data.Audio.apply(lambda x: np.ravel(np.array(x))).apply(pd.Series)
audios.columns = ['Audio'+ str(i) for i in range(len(audios.columns))]

audios['sentence'] = data['sentence']
The sample data is:


                  Audio                                    sentence
0   [[-0.32357552647590637], [-0.4721883237361908]] the kind of them is a relative all the little old
1   [[-0.32357552647590637], [-0.4721883237361908]] More text
and the result (in the audios DataFrame) is:

    Audio0       Audio1      sentence
0   -0.323576   -0.472188   the kind of them is a relative all the little old
1   -0.323576   -0.472188   More text
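
For the full data (907 rows with 10000 values each), `.apply(pd.Series)` can be slow because it builds one Series per row. If every row holds the same number of values, which is an assumption about the data, a NumPy reshape is one possible alternative. A sketch on made-up sample data:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with the same nesting as the Audio column
data = pd.DataFrame({
    "Audio": [[[-0.5], [0.25]], [[0.125], [-1.0]]],
    "sentence": ["first sentence", "second sentence"],
})

# Stack all rows into one array of shape (n_rows, n_values, 1),
# then collapse the trailing axis so each row becomes a flat vector
values = np.array(data["Audio"].tolist(), dtype=float).reshape(len(data), -1)

audios = pd.DataFrame(
    values,
    index=data.index,
    columns=["Audio" + str(i) for i in range(values.shape[1])],
)
audios["sentence"] = data["sentence"]

print(audios)
```

If the rows differ in length, `np.array` cannot build a rectangular array and this sketch does not apply.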

What you can do is flatten each entry of df.Audio and construct a new DataFrame with the correct column names:

# Flatten list in each row
audio_list_flat = []
for nested_list in list(df["Audio"]):
    audio_list_flat.append([y for x in nested_list for y in x])

# Get row with max length, assuming the length of Audio could be different
max_len = max([len(x) for x in audio_list_flat])

# Construct new dataframe
flat_df = pd.DataFrame(audio_list_flat,
                       columns=[f"Audio{i}" for i in range(max_len)],
                       index=df.index)
flat_df["sentence"] = df.sentence

This way you can solve the problem with plain pandas, without adding any further dependencies.
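
With a flattened frame in place, the split described in the question (sentence as Y, everything else as X) could be sketched as follows. The sample data is invented and far smaller than the real 907 x 10000 frame, and the names `X` and `y` are chosen here only for illustration:

```python
import pandas as pd

# Hypothetical sample data with the same nesting as the Audio column
df = pd.DataFrame({
    "Audio": [[[-0.5], [0.25]], [[0.125], [-1.0]]],
    "sentence": ["first sentence", "second sentence"],
})

# Flatten each row's list of one-element lists into a flat list of floats
audio_list_flat = [[y for x in nested for y in x] for nested in df["Audio"]]
max_len = max(len(row) for row in audio_list_flat)

flat_df = pd.DataFrame(
    audio_list_flat,
    columns=[f"Audio{i}" for i in range(max_len)],
    index=df.index,
)
flat_df["sentence"] = df["sentence"]

# Features: every Audio column as a float array; targets: the sentence column
X = flat_df.drop(columns="sentence").to_numpy(dtype=float)
y = flat_df["sentence"].to_numpy()

print(X.shape)  # (2, 2)
```

The sentences in `y` are still raw text; how to encode them for a network is outside what the thread covers.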

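Comments:

What is x_tra? Is it a pandas Series?

@MKPatel x_tra = x_train.tolist(). The original Audio column is a pandas Series, so I converted it to a list.

So that means it gives you a list of audio, right?

Yes, the output of my code is shown in the question.

Yes, got it. I am trying to solve your error.

Your solution seems a bit more general!

The problem is that you only used 4 values in the Audio column. I can't manually write out 10000 values for each of the 907 rows.

@shahidhamdam You don't have to write them out manually; this is just an example to show what is going on. To use this approach, apply the code that builds the audios DataFrame to your own data.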