
Python: split one column into multiple columns and append the split columns back to the original dataframe


I want to split a column in my dataframe into multiple columns, append those columns back to the original dataframe, and then split the original dataframe based on whether the split column contains a particular string.

I have a dataframe with a column whose values are separated by semicolons, like this:

import pandas as pd
data = {'ID':['1','2','3','4','5','6','7'], 
    'Residence':['USA;CA;Los Angeles;Los Angeles', 'USA;MA;Suffolk;Boston', 'Canada;ON','USA;FL;Charlotte', 'NA', 'Canada;QC', 'USA;AZ'],
    'Name':['Ann','Betty','Carl','David','Emily','Frank', 'George'],
    'Gender':['F','F','M','M','F','M','M']} 
df = pd.DataFrame(data) 
Then I split that column as shown below, and divided the result in two based on whether it contains the string `USA`:

address = df['Residence'].str.split(';',expand=True)
country = address[0] != 'USA'
USA, nonUSA = address[~country], address[country]
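For context on why the pieces end up with different widths: `str.split(';', expand=True)` pads shorter rows with `None` so that every row has one cell per token of the longest row. A minimal sketch:

```python
import pandas as pd

# expand=True gives one column per token of the longest row;
# shorter rows are padded on the right with None.
s = pd.Series(['USA;CA;Los Angeles;Los Angeles', 'Canada;ON', 'NA'])
parts = s.str.split(';', expand=True)
print(parts.shape)             # (3, 4)
print(parts.iloc[1].tolist())  # ['Canada', 'ON', None, None]
```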
Now, if you inspect `USA` and `nonUSA`, you'll notice that `nonUSA` has extra columns as well as a row with no country information, so I got rid of those `NA` values:

USA.columns = ['Country', 'State', 'County', 'City']
nonUSA.columns = ['Country', 'State']
nonUSA = nonUSA.dropna(axis=0, subset=[1])
nonUSA = nonUSA[nonUSA.columns[0:2]]
Now I want to append `USA` and `nonUSA` to my original dataframe, so that I end up with two dataframes like these:

USAdata = pd.DataFrame({'ID':['1','2','4','7'], 
    'Name':['Ann','Betty','David','George'],
    'Gender':['F','F','M','M'],
    'Country':['USA','USA','USA','USA'],
    'State':['CA','MA','FL','AZ'],
    'County':['Los Angeles','Suffolk','Charlotte','None'],
    'City':['Los Angeles','Boston','None','None']})
nonUSAdata = pd.DataFrame({'ID':['3','6'], 
    'Name':['Carl','Frank'],
    'Gender':['M','M'],
    'Country':['Canada', 'Canada'],
    'State':['ON','QC']})
But I'm stuck here. How do I split my original dataframe according to whether a person's residence includes `USA`, and append the split columns (`USA` and `nonUSA`) back onto the corresponding parts of the original dataframe?

(Also, I've just posted everything I have so far, but I'm curious whether there's a cleaner/smarter way to do this.)

The original data has a unique index, and neither dataframe changes it in the code below, so you can join them back by index: use `DataFrame.join` with `concat`, or `concat` with `axis=1`.
address = df['Residence'].str.split(';',expand=True)
country = address[0] != 'USA'
USA, nonUSA = address[~country], address[country]
USA.columns = ['Country', 'State', 'County', 'City']

nonUSA = nonUSA.dropna(axis=0, subset=[1])
nonUSA = nonUSA[nonUSA.columns[0:2]]
# changed the order to avoid an error: rename only after dropping the extra columns
nonUSA.columns = ['Country', 'State']
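As a quick sanity check, the fixed-order cleanup leaves `USA` with the four USA rows and `nonUSA` with just the two Canadian rows (a self-contained sketch of the example data):

```python
import pandas as pd

# Rebuild the question's example and apply the cleanup in the corrected order.
data = {'ID': ['1', '2', '3', '4', '5', '6', '7'],
        'Residence': ['USA;CA;Los Angeles;Los Angeles', 'USA;MA;Suffolk;Boston',
                      'Canada;ON', 'USA;FL;Charlotte', 'NA', 'Canada;QC', 'USA;AZ'],
        'Name': ['Ann', 'Betty', 'Carl', 'David', 'Emily', 'Frank', 'George'],
        'Gender': ['F', 'F', 'M', 'M', 'F', 'M', 'M']}
df = pd.DataFrame(data)

address = df['Residence'].str.split(';', expand=True)
country = address[0] != 'USA'
USA, nonUSA = address[~country], address[country]
USA.columns = ['Country', 'State', 'County', 'City']

nonUSA = nonUSA.dropna(axis=0, subset=[1])  # drops the 'NA' row (no state part)
nonUSA = nonUSA[nonUSA.columns[0:2]]        # keep only the first two columns
nonUSA.columns = ['Country', 'State']       # rename last, once the shapes match

print(USA.index.tolist())     # rows 0, 1, 3, 6
print(nonUSA.index.tolist())  # rows 2, 5
```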

Or, with `concat` and `axis=1`:

df = pd.concat([df, pd.concat([USA, nonUSA])], axis=1)

But it seems this can be simplified:

c = ['Country', 'State', 'County', 'City']
df[c] = df['Residence'].str.split(';',expand=True)
print (df)
  ID                       Residence    Name Gender Country State  \
0  1  USA;CA;Los Angeles;Los Angeles     Ann      F     USA    CA   
1  2           USA;MA;Suffolk;Boston   Betty      F     USA    MA   
2  3                       Canada;ON    Carl      M  Canada    ON   
3  4                USA;FL;Charlotte   David      M     USA    FL   
4  5                              NA   Emily      F      NA  None   
5  6                       Canada;QC   Frank      M  Canada    QC   
6  7                          USA;AZ  George      M     USA    AZ   

        County         City  
0  Los Angeles  Los Angeles  
1      Suffolk       Boston  
2         None         None  
3    Charlotte         None  
4         None         None  
5         None         None  
6         None         None  
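To get the two separate frames the question asks for from this simplified version, a boolean mask on `Country` does the split; a sketch, where `dropna(axis=1, how='all')` trims the all-empty `County`/`City` columns from the non-USA part:

```python
import pandas as pd

# Attach the split columns, then split the whole frame on Country.
data = {'ID': ['1', '2', '3', '4', '5', '6', '7'],
        'Residence': ['USA;CA;Los Angeles;Los Angeles', 'USA;MA;Suffolk;Boston',
                      'Canada;ON', 'USA;FL;Charlotte', 'NA', 'Canada;QC', 'USA;AZ'],
        'Name': ['Ann', 'Betty', 'Carl', 'David', 'Emily', 'Frank', 'George'],
        'Gender': ['F', 'F', 'M', 'M', 'F', 'M', 'M']}
df = pd.DataFrame(data)

c = ['Country', 'State', 'County', 'City']
df[c] = df['Residence'].str.split(';', expand=True)

mask = df['Country'] == 'USA'
USAdata = df[mask]
# exclude the literal 'NA' row, then drop columns that are entirely empty
nonUSAdata = df[~mask & (df['Country'] != 'NA')].dropna(axis=1, how='all')

print(USAdata['ID'].tolist())     # ['1', '2', '4', '7']
print(nonUSAdata['ID'].tolist())  # ['3', '6']
```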

Your first approach works perfectly; I didn't know I could join dataframes together. For the simplified version, I wonder whether something similar is possible if my `USA` and `nonUSA` column names don't overlap. (For example, if the `USA` column names were still `['Country', 'State', 'County', 'City']` but the `nonUSA` column names were `['CountryB', 'StateB']` or something like that, would a similar approach still work?) – Jen

@Jen - If you need different column names, the solution still works: you'd get missing values in `['CountryB', 'StateB']` for the `USA` rows, and missing values in `['Country', 'State', 'County', 'City']` for the `nonUSA` rows.
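To illustrate the reply's point: `concat` of frames with non-overlapping column names keeps the union of the columns and fills the gaps with `NaN`. A minimal sketch, using the hypothetical `CountryB`/`StateB` names from the comment:

```python
import pandas as pd

# Two pieces with different column names: concat keeps the union of the
# columns and fills the missing cells with NaN.
USA = pd.DataFrame({'Country': ['USA'], 'State': ['CA'],
                    'County': ['Los Angeles'], 'City': ['Los Angeles']}, index=[0])
nonUSA = pd.DataFrame({'CountryB': ['Canada'], 'StateB': ['ON']}, index=[2])

combined = pd.concat([USA, nonUSA])
print(list(combined.columns))
print(combined.loc[2, 'CountryB'])  # 'Canada'; combined.loc[2, 'Country'] is NaN
```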
df = df.join(pd.concat([USA, nonUSA]))
print (df)
  ID                       Residence    Name Gender Country State  \
0  1  USA;CA;Los Angeles;Los Angeles     Ann      F     USA    CA   
1  2           USA;MA;Suffolk;Boston   Betty      F     USA    MA   
2  3                       Canada;ON    Carl      M  Canada    ON   
3  4                USA;FL;Charlotte   David      M     USA    FL   
4  5                              NA   Emily      F     NaN   NaN   
5  6                       Canada;QC   Frank      M  Canada    QC   
6  7                          USA;AZ  George      M     USA    AZ   

        County         City  
0  Los Angeles  Los Angeles  
1      Suffolk       Boston  
2          NaN          NaN  
3    Charlotte         None  
4          NaN          NaN  
5          NaN          NaN  
6         None         None  