Python: convert a list of lists to a DataFrame
I have a DataFrame in the format below. It has 907 rows and 2 columns, named Audio and sentence. The Audio column contains a list of lists, as you can see. The total length of each of these lists is 10000.
Audio                                                     sentence
[[-0.32357552647590637], [-0.4721883237361908], .....]    the kind of them is a relative all the little old lady is it to confide in them and head for buying them hate it consists of a vertical schrock
[[-0.32357552647590637], [-0.4721883237361908], .....]    the kind of them is a relative all the little old lady is it to confide in them and head for buying them hate it consists of a vertical schrock
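I tried to convert the list to a DataFrame, but it splits the data into individual characters, which is not what I want:
aa = pd.DataFrame.from_records(X_tra)
This is what it produces: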
0 1 2 3 4 5 6 7 8 9 ... 269990 269991 269992 269993 269994 269995 269996 269997 269998 269999
0 [ [ 0 . 0 0 3 9 1 1 ... None None None None None None None None None None
The output shown above is the actual output.
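A note on the output above: one character per column suggests that each entry of X_tra is a string such as "[[0.0039...], ...]" rather than a real nested list (this can happen after a round trip through CSV). That is an assumption based only on the printed output. If it holds, a minimal sketch of parsing the strings back into lists before building the DataFrame could look like this; the sample values and the helper name `parse_if_string` are hypothetical:

```python
import ast

import pandas as pd

# Hypothetical stand-in for the real data: each cell holds the *text*
# of a nested list, not the list itself.
df = pd.DataFrame({
    "Audio": ["[[-0.5], [0.25]]", "[[0.125], [-1.0]]"],
    "sentence": ["first sentence", "second sentence"],
})


def parse_if_string(cell):
    """Turn a stringified nested list back into real Python lists."""
    return ast.literal_eval(cell) if isinstance(cell, str) else cell


df["Audio"] = df["Audio"].apply(parse_if_string)
print(type(df["Audio"].iloc[0]))
```

`ast.literal_eval` only evaluates Python literals, so it is a safer choice than `eval` for data read from a file.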
The expected output looks like this:
Audio                   Audio1                 .....    sentence
-0.32357552647590637    -0.4721883237361908    .....    the kind of them is a relative all the little old lady is it to confide in them and head for buying them hate it consists of a vertical schrock
-0.32357552647590637    -0.4721883237361908    .....    the kind of them is a relative all the little old lady is it to confide in them and head for buying them hate it consists of a vertical schrock
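I want to use this output to train a neural network, so the sentence column will be Y and the rest of the DataFrame will be X.

As a first step, I would generate a list of column names: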
N = 10000
colNames = ["Audio" + str(i) for i in range(N)]
Then I would create a second DataFrame, df2, from the previous DataFrame df, using:
df2 = pd.DataFrame()
df2[colNames] = pd.DataFrame(df["Audio"].values.tolist(), index=df.index)
This should be very close to what you want, except that each value is still wrapped in a list. The result should look similar to this:
>>> df2
Audio0 Audio1 Audio2
0 [-0.32357552647590637] [-0.4721883237361908] ...
1 [-0.32357552647590637] [-0.4721883237361908] ...
2 ...
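To get rid of the remaining one-element lists, one option is to go through a NumPy array and reshape it. This is only a sketch: it assumes that every row has the same number of samples, and the small sample frame is made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the question's frame: 2 rows, 3 samples per row.
df = pd.DataFrame({
    "Audio": [[[-0.5], [0.25], [1.0]], [[0.125], [-1.0], [2.0]]],
    "sentence": ["first sentence", "second sentence"],
})

# Shape (rows, samples, 1) -> (rows, samples); requires equal-length rows.
values = np.array(df["Audio"].tolist(), dtype=float).reshape(len(df), -1)

col_names = ["Audio" + str(i) for i in range(values.shape[1])]
df2 = pd.DataFrame(values, columns=col_names, index=df.index)
print(df2)
```

Each cell of df2 now holds a plain float instead of a one-element list.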
Hope this helps.

How about this solution?
import pandas as pd
import numpy as np
data = pd.DataFrame({'Audio':[[[-0.32357552647590637],[-0.4721883237361908]], [[-0.32357552647590637], [-0.4721883237361908]]],
'sentence':['the kind of them is a relative all the little old', 'More text']})
audios = data.Audio.apply(lambda x: np.ravel(np.array(x))).apply(pd.Series)
audios.columns = ['Audio'+ str(i) for i in range(len(audios.columns))]
audios['sentence'] = data['sentence']
The sample data is:
Audio sentence
0 [[-0.32357552647590637], [-0.4721883237361908]] the kind of them is a relative all the little old
1 [[-0.32357552647590637], [-0.4721883237361908]] More text
and the result (in the DataFrame audios) is:
Audio0 Audio1 sentence
0 -0.323576 -0.472188 the kind of them is a relative all the little old
1 -0.323576 -0.472188 More text
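Since the goal in the question is to use the sentence column as Y and everything else as X, the flattened frame can then be split as sketched below. The sample frame stands in for `audios`, and the float32 dtype is an assumption (a common choice for neural network input), not something the question specifies.

```python
import pandas as pd

# Hypothetical flattened frame in the same layout as `audios` above.
audios = pd.DataFrame({
    "Audio0": [-0.5, 0.125],
    "Audio1": [0.25, -1.0],
    "sentence": ["first sentence", "second sentence"],
})

# Features: every column except the label. Labels: the sentence column.
X = audios.drop(columns=["sentence"]).to_numpy(dtype="float32")
y = audios["sentence"].to_numpy()

print(X.shape, y.shape)
```

The sentences are still raw text at this point; turning them into numeric targets is a separate step that depends on the model.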
What you can do is flatten each entry of "df.Audio" and construct a new DataFrame with the right column names:
# Flatten the nested list in each row
audio_list_flat = []
for nested_list in list(df["Audio"]):
    audio_list_flat.append([y for x in nested_list for y in x])
# Get the length of the longest row, assuming the Audio lengths could differ
max_len = max([len(x) for x in audio_list_flat])
# Construct the new DataFrame
flat_df = pd.DataFrame(audio_list_flat,
                       columns=[f"Audio{i}" for i in range(max_len)],
                       index=df.index)
flat_df["sentence"] = df.sentence
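If the rows really do differ in length, pandas pads the shorter rows with missing values, which a neural network cannot consume directly. The sketch below repeats the same flattening on made-up rows of unequal length and fills the padding with zeros. Zero padding is an assumption; truncating every row to a common length would be another option.

```python
import pandas as pd

# Hypothetical rows of unequal length (3 samples vs. 2 samples).
df = pd.DataFrame({
    "Audio": [[[-0.5], [0.25], [1.0]], [[0.125], [-1.0]]],
    "sentence": ["first sentence", "second sentence"],
})

# Same flattening as above, written as a nested comprehension.
audio_list_flat = [[y for x in nested for y in x] for nested in df["Audio"]]
max_len = max(len(row) for row in audio_list_flat)

flat_df = pd.DataFrame(audio_list_flat,
                       columns=[f"Audio{i}" for i in range(max_len)],
                       index=df.index)

# The shorter row was padded with NaN; replace the padding with zeros.
flat_df = flat_df.fillna(0.0)
flat_df["sentence"] = df["sentence"]
print(flat_df)
```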
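This way you can solve the problem with pure pandas, without adding more dependencies.

What is x_tra? A pandas Series, or ...?

@MKPatel x_tra = x_train.tolist(). The original Audio column is a pandas Series, so I converted it to a list.

That means it gives you a list of audio values, right?

Yes, the output of my code is given above.

Yes, got it. I am trying to solve your error.

Your solution seems a bit more general!

The problem is that you only used 4 values in the Audio column. I cannot manually write 10000 values for each of the 907 rows.

@shahidhamdam You don't have to write them manually; this is just an example to show what happens. To use this approach, you should use the code together with the audios DataFrame.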