Python: converting a list of lists to a DataFrame

python pandas list numpy dataframe


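I have a DataFrame in the format below. It has 907 rows and 2 columns, named Audio and sentence. The Audio column contains a list of lists, as shown. Each of these lists has a total length of 10000.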

Audio                                                       sentence
[[-0.32357552647590637], [-0.4721883237361908], .....]      the kind of them is a relative all the little old lady is it to confide in them and head for buying them hate it consists of a vertical schrock
[[-0.32357552647590637], [-0.4721883237361908], .....]      the kind of them is a relative all the little old lady is it to confide in them and head for buying them hate it consists of a vertical schrock

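I tried to convert the list to a DataFrame, but it splits the data into individual characters, which is not what I want:

aa = pd.DataFrame.from_records(X_tra)

This is what it produces: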

0   1   2   3   4   5   6   7   8   9   ...     269990  269991  269992  269993  269994  269995  269996  269997  269998  269999
0   [   [   0   .   0   0   3   9   1   1   ...     None    None    None    None    None    None    None    None    None    None

The output above is what I actually get. The expected output is shown below:

Audio                  Audio1                 .....   sentence
-0.32357552647590637   -0.4721883237361908    .....   the kind of them is a relative all the little old lady is it to confide in them and head for buying them hate it consists of a vertical schrock
-0.32357552647590637   -0.4721883237361908    .....   the kind of them is a relative all the little old lady is it to confide in them and head for buying them hate it consists of a vertical schrock

I want to use this output to train a neural network, so the sentence column will be Y and the rest of the DataFrame will be X.
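A note on the actual output above: one character per cell suggests that the entries of X_tra are strings that merely look like nested lists, not real Python lists. The following is a minimal sketch under that assumption; the `raw` list is a hypothetical stand-in for X_tra, and its values are made up for illustration. It parses each string with `ast.literal_eval` and flattens the result before building the DataFrame.

```python
import ast

import numpy as np
import pandas as pd

# Hypothetical stand-in for X_tra: each entry is a *string*, not a list.
raw = ["[[-0.5], [0.25]]", "[[1.5], [2.0]]"]

# Parse the string form back into nested Python lists.
# Entries that are already lists are passed through unchanged.
parsed = [ast.literal_eval(s) if isinstance(s, str) else s for s in raw]

# Flatten each [[a], [b], ...] entry into a 1-D array [a, b, ...].
flat = [np.ravel(p) for p in parsed]

# One column per value, one row per original entry.
X = pd.DataFrame(flat, columns=[f"Audio{i}" for i in range(len(flat[0]))])
print(X)
```

If the entries turn out to be real lists already, the parsing step is a no-op and only the flattening matters.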

As a first step, I would generate a list of column names:

N = 10000
colNames = ["Audio" + str(i) for i in range(N)]

Then I would create a second DataFrame, df2, from the previous DataFrame, df, using:

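# Build df2 in one step: each element of an Audio entry becomes its own column.
# Assumes every Audio entry has exactly N elements, so that it matches colNames.
df2 = pd.DataFrame(df["Audio"].tolist(), index=df.index, columns=colNames)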

This should be very close to what you want, except that each value is still wrapped in a list. The result should look similar to this:

>>> df2
     Audio0                    Audio1                   Audio2
0    [-0.32357552647590637]    [-0.4721883237361908]    ...
1    [-0.32357552647590637]    [-0.4721883237361908]    ...
2    ...

Hope this helps.
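To also remove the inner one-element lists, one option is to go through NumPy. This is only a sketch, not part of the answer above: it assumes every Audio entry has the same number of elements, and the small `df` here is a made-up stand-in for the real 907 x 10000 data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real data (the real rows hold 10000 entries each).
df = pd.DataFrame({
    "Audio": [[[-0.5], [0.25], [1.0]], [[0.5], [-0.25], [2.0]]],
    "sentence": ["first sentence", "second sentence"],
})

# Shape (n_rows, n_values, 1); requires all rows to have the same length.
arr = np.array(df["Audio"].tolist(), dtype=float)

# Drop the trailing axis of size 1 so each row becomes a flat vector.
flat = arr.reshape(len(df), -1)

col_names = [f"Audio{i}" for i in range(flat.shape[1])]
df2 = pd.DataFrame(flat, index=df.index, columns=col_names)
print(df2)
```

Each cell of `df2` is then a plain float instead of a one-element list.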

How about this solution?

import pandas as pd
import numpy as np

data = pd.DataFrame({'Audio':[[[-0.32357552647590637],[-0.4721883237361908]], [[-0.32357552647590637], [-0.4721883237361908]]],
        'sentence':['the kind of them is a relative all the little old', 'More text']})

audios = data.Audio.apply(lambda x: np.ravel(np.array(x))).apply(pd.Series)
audios.columns = ['Audio'+ str(i) for i in range(len(audios.columns))]

audios['sentence'] = data['sentence']

The sample data is:


                  Audio                                    sentence
0   [[-0.32357552647590637], [-0.4721883237361908]] the kind of them is a relative all the little old
1   [[-0.32357552647590637], [-0.4721883237361908]] More text

and the result (in the DataFrame audios) is:

    Audio0       Audio1      sentence
0   -0.323576   -0.472188   the kind of them is a relative all the little old
1   -0.323576   -0.472188   More text
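Since the question asks for the sentence column as Y and everything else as X, the split could look roughly like this. It is a sketch only: the `audios` frame is rebuilt here from the rounded values shown in the result, and the float32 dtype is an assumption, not something the question specifies.

```python
import numpy as np
import pandas as pd

# Rebuild the result shown above so the example is self-contained.
audios = pd.DataFrame({
    "Audio0": [-0.323576, -0.323576],
    "Audio1": [-0.472188, -0.472188],
    "sentence": ["the kind of them is a relative all the little old", "More text"],
})

# X: every column except the label, as a 2-D float array.
X = audios.drop(columns="sentence").to_numpy(dtype=np.float32)

# y: the sentence column as a 1-D array of strings.
y = audios["sentence"].to_numpy()

print(X.shape, y.shape)
```

The sentences are still raw text at this point, so they would need to be encoded numerically before a network can train on them.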

What you can do is flatten each entry of df.Audio and construct a new DataFrame with the correct column names:
# Flatten list in each row
audio_list_flat = []
for nested_list in list(df["Audio"]):
    audio_list_flat.append([y for x in nested_list for y in x])

# Get row with max length, assuming the length of Audio could be different
max_len = max([len(x) for x in audio_list_flat])

# Construct new dataframe
flat_df = pd.DataFrame(audio_list_flat,
                       columns=[f"Audio{i}" for i in range(max_len)],
                       index=df.index)
flat_df["sentence"] = df.sentence

This way you can solve the problem with pure pandas, without adding any more dependencies.
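The comment in the code above allows for Audio entries of different lengths. A small sketch of what happens in that case, using made-up values: shorter rows are padded with NaN when the DataFrame is built. Filling those gaps with 0.0 is just one possible choice and is an assumption here, not something the answer prescribes.

```python
import pandas as pd

# Hypothetical data with rows of unequal length.
df = pd.DataFrame({
    "Audio": [[[0.1], [0.2], [0.3]], [[0.4]]],
    "sentence": ["a", "b"],
})

# Flatten each nested list, as in the answer above.
audio_list_flat = [[y for x in nested for y in x] for nested in df["Audio"]]
max_len = max(len(row) for row in audio_list_flat)

# Rows shorter than max_len are padded with NaN by pandas.
flat_df = pd.DataFrame(audio_list_flat,
                       columns=[f"Audio{i}" for i in range(max_len)],
                       index=df.index)

# One possible way to handle the padding before training (assumption).
padded = flat_df.fillna(0.0)
padded["sentence"] = df["sentence"]
print(padded)
```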
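What is X_tra? Is it a pandas Series?
@MKPatel X_tra = x_train.tolist(). The original Audio column is a pandas Series, so I converted it to a list.
That means it gives you a list of audio, right?
Yes, and the output of my code is the one given.
Yes, got it. I am trying to solve your error.
Your solution seems to be a bit more general!
The problem is that you used only 4 values in the Audio column. I can't write out 10000 values for each of the 907 rows by hand.
@shahidhamdam You don't have to write them by hand; this is just an example to see what is going on. To use this approach, you should run the code that builds the audios DataFrame on your own data.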