
How do I select one row for each distinct value of a particular column and merge them into a new DataFrame in Python?


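The dataset I am working with looks like the sample below. It is a video captioning dataset, with the captions stored in the Description column.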

Video_ID       Description
mv89psg6zh4    A bird is bathing in a sink.
mv89psg6zh4    A faucet is running while a bird stands.
mv89psg6zh4    A bird gets washed.
mv89psg6zh4    A parakeet is taking a shower in a sink.
mv89psg6zh4    The bird is taking a bath under the faucet.
mv89psg6zh4    A bird is standing in a sink drinking water.
R2DvpPTfl-E    PLAYING GAME ON LAPTOP.
R2DvpPTfl-E    THE MAN IS WATCHING LAPTOP.
l7x8uIdg2XU    A woman is pouring ingredients into a bowl.
l7x8uIdg2XU    A woman is adding milk to some pasta.
l7x8uIdg2XU    A person adds ingredients to pasta. 
l7x8uIdg2XU    the girls are doing the cooking.
However, the number of captions is not uniform; it varies from video to video.

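I want to extract one row for each unique Video_ID and combine those rows into a new DataFrame. The same rows should also be removed from the existing DataFrame.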

The result I want should look like this:

DataFrame 1:

Video_ID       Description
mv89psg6zh4    A faucet is running while a bird stands.
mv89psg6zh4    A bird gets washed.
mv89psg6zh4    A parakeet is taking a shower in a sink.
mv89psg6zh4    The bird is taking a bath under the faucet.
mv89psg6zh4    A bird is standing in a sink drinking water.
R2DvpPTfl-E    THE MAN IS WATCHING LAPTOP.
l7x8uIdg2XU    A woman is adding milk to some pasta.
l7x8uIdg2XU    A person adds ingredients to pasta. 
l7x8uIdg2XU    the girls are doing the cooking.
DataFrame 2:

Video_ID       Description
mv89psg6zh4    A bird is bathing in a sink.
R2DvpPTfl-E    PLAYING GAME ON LAPTOP.
l7x8uIdg2XU    A woman is pouring ingredients into a bowl.
So, in effect, rows are moved out of the existing DataFrame to form the new one.

You can use groupby() to sample the index:

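# df is the DataFrame from the question; its index is assumed to be unique.
# For each Video_ID, sample one index label at random.
s = df.index.to_series().groupby(df['Video_ID']).apply(lambda x: x.sample(n=1))

# one randomly chosen row per Video_ID
df.loc[s]

# the rest of the data
df.drop(s)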
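
Note that the desired output in the question keeps the first caption of each video rather than a random one. Here is a minimal sketch of that deterministic variant, assuming the index is unique and using a shortened copy of the sample data. The names `picked` and `rest` are only illustrative and do not come from the question.

```python
import pandas as pd

# Shortened copy of the sample data from the question.
df = pd.DataFrame({
    "Video_ID": [
        "mv89psg6zh4", "mv89psg6zh4",
        "R2DvpPTfl-E", "R2DvpPTfl-E",
        "l7x8uIdg2XU", "l7x8uIdg2XU",
    ],
    "Description": [
        "A bird is bathing in a sink.",
        "A faucet is running while a bird stands.",
        "PLAYING GAME ON LAPTOP.",
        "THE MAN IS WATCHING LAPTOP.",
        "A woman is pouring ingredients into a bowl.",
        "A woman is adding milk to some pasta.",
    ],
})

# Index labels of the first row seen for each Video_ID (original order kept).
first_idx = df.groupby("Video_ID", sort=False).head(1).index

picked = df.loc[first_idx]  # one row per video (hypothetical name)
rest = df.drop(first_idx)   # every other row (hypothetical name)
```

Because both frames are selected by index label, the split only works cleanly when the labels are unique. Calling `df.reset_index(drop=True)` first is one way to guarantee that.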

Great, efficient solution! Could you tell me how to select the row with the longest description instead of an arbitrary one?
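
One possible way to do this, sketched under the assumption that the index is unique and that Description contains no missing values, is to group the string lengths by Video_ID and take `idxmax()` of each group. The names `longest` and `rest` are illustrative, and the data is a shortened copy of the sample from the question.

```python
import pandas as pd

# Shortened copy of the sample data from the question.
df = pd.DataFrame({
    "Video_ID": [
        "mv89psg6zh4", "mv89psg6zh4",
        "R2DvpPTfl-E", "R2DvpPTfl-E",
        "l7x8uIdg2XU", "l7x8uIdg2XU",
    ],
    "Description": [
        "A bird gets washed.",
        "A bird is standing in a sink drinking water.",
        "PLAYING GAME ON LAPTOP.",
        "THE MAN IS WATCHING LAPTOP.",
        "A woman is adding milk to some pasta.",
        "the girls are doing the cooking.",
    ],
})

# Character count of every caption.
lengths = df["Description"].str.len()

# For each Video_ID, the index label of its longest caption.
# On ties, idxmax() returns the first occurrence.
longest_idx = lengths.groupby(df["Video_ID"]).idxmax()

longest = df.loc[longest_idx]  # longest caption per video (hypothetical name)
rest = df.drop(longest_idx)    # all remaining captions (hypothetical name)
```

Since groupby() sorts its keys by default, `longest` comes back ordered by Video_ID rather than in the original row order. `longest.sort_index()` restores the original order if that matters.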