Python: create a DataFrame from a list of lists, but with a different delimiter
I have a list of lists:
[['1', 'Toy Story (1995)', "Animation|Children's|Comedy"],
['2', 'Jumanji (1995)', "Adventure|Children's|Fantasy"],
['3', 'Grumpier Old Men (1995)', 'Comedy|Romance']]
I would like to end up with a pandas DataFrame with these columns:
cols = ['MovieID', 'Name', 'Year', 'Adventure', 'Children', 'Comedy', 'Fantasy', 'Romance']
For the Adventure, Children, Comedy, Fantasy, and Romance columns, the value should be 1 or 0.
I tried:
for row in movies_list:
    for element in row:
        if '|' in element:
            element = element.split('|')
However, the original list is unchanged. I'm completely stumped here.
Use the DataFrame constructor.
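A minimal sketch of this step, assuming the three raw fields are named MovieID, Name, and Data (Data and df1 are the names the later steps refer to). Splitting the genres with Series.str.get_dummies is an assumption here, not something the text confirms:

```python
import pandas as pd

movies_list = [['1', 'Toy Story (1995)', "Animation|Children's|Comedy"],
               ['2', 'Jumanji (1995)', "Adventure|Children's|Fantasy"],
               ['3', 'Grumpier Old Men (1995)', 'Comedy|Romance']]

# Column names are assumed; 'Data' holds the pipe-separated genres.
df = pd.DataFrame(movies_list, columns=['MovieID', 'Name', 'Data'])

# One 0/1 indicator column per distinct genre, split on '|'.
df1 = df['Data'].str.get_dummies(sep='|')
print(df1)
```

The indicator columns come out as integers, which matches the 1 or 0 values the question asks for.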
For the Name and Year columns, split is needed, then rstrip to remove the trailing ), and Year is also converted to int:
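# Split "Title (1995)" at the whitespace before "(" into Name and Year.
df[['Name','Year']] = df['Name'].str.split(r'\s\(', expand=True)
# Strip the trailing ")" and convert Year to int.
df['Year'] = df['Year'].str.rstrip(')').astype(int)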
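Splitting on the whitespace before "(" assumes each title contains exactly one parenthesized group. As a hedged alternative (not part of the original answer), Series.str.extract with a pattern anchored at the end of the string keeps any earlier parentheses in the name. The second sample title below is made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Toy Story (1995)',
                            'Some Film (Alternate Title) (1996)']})  # second title is hypothetical

# Group 0: everything before the final " (YYYY)"; group 1: the four-digit year.
parts = df['Name'].str.extract(r'^(.*)\s\((\d{4})\)$')
df['Name'] = parts[0]
df['Year'] = parts[1].astype(int)
print(df)
```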
Finally, drop the Data column and add df1 to the original columns.
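A minimal end-to-end sketch of these steps, assuming the raw columns are named MovieID, Name, and Data, that df1 is built with str.get_dummies, and that the final step is a drop followed by a join. The rename of Children's to Children and the final column selection are additions made here to match the cols list in the question:

```python
import pandas as pd

movies_list = [['1', 'Toy Story (1995)', "Animation|Children's|Comedy"],
               ['2', 'Jumanji (1995)', "Adventure|Children's|Fantasy"],
               ['3', 'Grumpier Old Men (1995)', 'Comedy|Romance']]
cols = ['MovieID', 'Name', 'Year', 'Adventure', 'Children', 'Comedy', 'Fantasy', 'Romance']

# Assumed column names for the three raw fields.
df = pd.DataFrame(movies_list, columns=['MovieID', 'Name', 'Data'])

# Genre indicators (0/1); rename so the column matches 'Children' in cols.
df1 = df['Data'].str.get_dummies(sep='|').rename(columns={"Children's": 'Children'})

# Split the title into Name and Year, then clean up Year.
df[['Name', 'Year']] = df['Name'].str.split(r'\s\(', expand=True)
df['Year'] = df['Year'].str.rstrip(')').astype(int)

# Drop the raw genre string and attach the indicator columns by index.
df = df.drop(columns='Data').join(df1)

# Keep only the requested columns (this leaves out 'Animation').
df = df[cols]
print(df)
```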
Here is my version. It is not a one-line answer, but I hope it helps you.
import pandas as pd

data = [['1', 'Toy Story (1995)', "Animation|Children's|Comedy"],
        ['2', 'Jumanji (1995)', "Adventure|Children's|Fantasy"],
        ['3', 'Grumpier Old Men (1995)', 'Comedy|Romance']]
cols = ['MovieID', 'Name', 'Year', 'Adventure', 'Children', 'Comedy', 'Fantasy', 'Romance']

final = []
for x in data:
    output = []
    output.append(x[0])                         # MovieID
    output.append(x[1].split("(")[0].strip())   # title without the "(year)" part
    output.append(x[1].split("(")[1][:4])       # four-digit year, kept as a string
    # Substring test against the genre string; yields True/False per genre.
    for h in ['Adventure', 'Children', 'Comedy', 'Fantasy', 'Romance']:
        output.append(h in x[2])
    final.append(output)

df = pd.DataFrame(final, columns=cols)
print(df)
Output:
  MovieID              Name  Year  Adventure  Children  Comedy  Fantasy  \
0       1         Toy Story  1995      False      True    True    False
1       2           Jumanji  1995       True      True   False     True
2       3  Grumpier Old Men  1995      False     False    True    False
   Romance
0    False
1    False
2     True
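The question asks for 1 or 0, while the membership test above produces True and False. A minimal sketch of one way to convert, using astype on the genre columns; the one-row frame here stands in for the df printed above:

```python
import pandas as pd

genres = ['Adventure', 'Children', 'Comedy', 'Fantasy', 'Romance']

# Stand-in for the frame built by the loop above (one row is enough to show the idea).
df = pd.DataFrame([['1', 'Toy Story', '1995', False, True, True, False, False]],
                  columns=['MovieID', 'Name', 'Year'] + genres)

# Cast only the genre columns from bool to int, leaving the others untouched.
df = df.astype({g: int for g in genres})
print(df)
```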
Thanks again, jezrael!
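On the loop in the question: element = element.split('|') only rebinds the loop variable to a new list, so movies_list itself is never touched. A minimal sketch of writing the result back by index instead (an illustration of the rebinding issue, not a step from the answers above):

```python
movies_list = [['1', 'Toy Story (1995)', "Animation|Children's|Comedy"],
               ['2', 'Jumanji (1995)', "Adventure|Children's|Fantasy"],
               ['3', 'Grumpier Old Men (1995)', 'Comedy|Romance']]

for row in movies_list:
    for i, element in enumerate(row):
        if '|' in element:
            # Assign into the list slot; assigning to `element` would not change `row`.
            row[i] = element.split('|')

print(movies_list[0])
```

This changes the nested lists in place, but building the DataFrame first and splitting the column there is usually simpler.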