
How do I process a dataset with Python?


I have an input dataset named data.csv with the following contents:

id ,   name
1  ,  Jone/Elvis/Tom
2  ,  Elvis/Tonny
The name column uses a slash as the separator. I need to process data.csv, and my expected output is:

id, Jone, Elvis, Tom, Tonny
1,   1  ,  1   ,  1 ,  0
2,   0  ,  1   ,  0 ,  1
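A 1 means the column's name appears in that row's name field; a 0 means it does not. How can I transform the input this way with Python and pandas? My attempt so far: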

import pandas as pd

# The sample file pads its commas with spaces, so split on a regex separator
data = pd.read_csv("./data.csv", sep=r"\s*,\s*", engine="python")
# Turn "Jone/Elvis/Tom" into ["Jone", "Elvis", "Tom"]
data["name"] = data["name"].str.split("/")

# One 0/1 column per known name; list membership is an exact match,
# so "Tom" does not accidentally match a longer name such as "Tommy"
for person in ["Jone", "Elvis", "Tom", "Tonny"]:
    data[person] = [1 if person in names else 0 for names in data["name"]]

Let's use pandas and .str.get_dummies with the sep parameter:

Read the data from the clipboard into a DataFrame:

df = pd.read_clipboard(sep=r'\s+,\s+')
df
Input DataFrame:

   id            name
0   1  Jone/Elvis/Tom
1   2     Elvis/Tonny
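
read_clipboard depends on whatever happens to be copied at the moment, so for a reproducible run the same frame can be built from text instead. A minimal sketch, assuming the file has exactly the layout shown in the question; the `raw` string is only a stand-in for data.csv:

```python
import io

import pandas as pd

# Stand-in for data.csv (assumption: same layout as the sample in the question)
raw = """id ,   name
1  ,  Jone/Elvis/Tom
2  ,  Elvis/Tonny
"""

# A regex separator swallows the spaces around each comma;
# regex separators need the python parsing engine
df = pd.read_csv(io.StringIO(raw), sep=r"\s*,\s*", engine="python")
print(df)
```

With a real file, passing the path "./data.csv" in place of the StringIO object should behave the same way.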
Set the index and use the string accessor with get_dummies:

df1 = df.set_index('id')
df1['name'].str.get_dummies(sep='/').reset_index()
Output:

   id  Elvis  Jone  Tom  Tonny
0   1      1     1    1      0
1   2      1     0    0      1
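
Note that get_dummies returns its columns sorted alphabetically, while the question lists them in order of first appearance (Jone, Elvis, Tom, Tonny). If that order matters, one possible way to restore it is to select the columns in the order the names first occur. This is a sketch under the assumption that the frame looks like the sample; the `order` and `result` names are illustrative only:

```python
import pandas as pd

# Same data as the sample in the question
df = pd.DataFrame({"id": [1, 2], "name": ["Jone/Elvis/Tom", "Elvis/Tonny"]})

# Indicator columns, one per distinct name (alphabetical order by default)
dummies = df.set_index("id")["name"].str.get_dummies(sep="/")

# Names in order of first appearance; dict keys preserve insertion order
order = list(dict.fromkeys(n for names in df["name"].str.split("/") for n in names))

# Reorder the indicator columns and bring id back as a regular column
result = dummies[order].reset_index()
print(result)
```

From here, result.to_csv("out.csv", index=False) would write the table out without the row index; the output file name is hypothetical.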