Python: splitting one column of values into two columns

python python-3.x pandas python-2.7

The data I have looks like this:

Col
Texas[x]
Dallas
Austin
California[x]
Los Angeles
San Francisco

What I want is:

col1              Col2
Texas[x]          Dallas
                  Austin
California[x]     Los Angeles
                  San Francisco

Please help.

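Use str.extract to create the two columns, then clean up:

import numpy as np

(df.Col.str.extract(r'(.*\[x\])?(.*)')  # group 0: state ending in [x], group 1: city
   .ffill()                             # carry each state down to its cities
   .replace('', np.nan).dropna()        # drop the state-only rows
   .rename(columns={0: 'Col1', 1: 'Col2'})
   .set_index('Col1'))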

               Col2
Col1
Texas[x]       Dallas
Texas[x]       Austin
California[x]  Los Angeles
California[x]  San Francisco

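Update: addressing the follow-up question. To keep Col1 and Col2 as regular columns, drop the final set_index call:

(df.Col.str.extract(r'(.*\[x\])?(.*)')
   .ffill()
   .replace('', np.nan).dropna()
   .rename(columns={0: 'Col1', 1: 'Col2'}))

This gives: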

    Col1            Col2
1   Texas[x]        Dallas
2   Texas[x]        Austin
4   California[x]   Los Angeles
5   California[x]   San Francisco
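
The chain above can be easier to follow when it is split into steps. The following is a minimal sketch, not the answerer's exact code: the construction of df is an assumption (the question does not show how the frame is built), and the empty-string rows are filtered with a boolean mask instead of replace/dropna, which should be equivalent for this sample.

```python
import pandas as pd

# Assumed reconstruction of the sample data from the question.
df = pd.DataFrame({'Col': ['Texas[x]', 'Dallas', 'Austin',
                           'California[x]', 'Los Angeles', 'San Francisco']})

# Group 0 captures a value ending in "[x]" (a state); group 1 captures the rest.
# State rows give ('Texas[x]', ''), city rows give (NaN, 'Dallas').
parts = df['Col'].str.extract(r'(.*\[x\])?(.*)')

# Carry each state down onto the city rows that follow it.
parts[0] = parts[0].ffill()

# State rows have an empty string in group 1; keep only the city rows.
result = parts[parts[1] != ''].rename(columns={0: 'Col1', 1: 'Col2'})

print(result)
```

The original row labels are preserved, so the state rows leave gaps in the index; calling reset_index(drop=True) on the result would renumber it if that matters.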

It looks like [x] marks the states in the list. You could try iterating over the DataFrame with iterrows, something like this:

import pandas as pd

state = None  # stays None if no state row has been seen yet
rowlist = []
for idx, row in df.iterrows():
    # a row containing '[x]' is a state: remember it and move on
    if '[x]' in row['Col']:
        state = row['Col']
        continue
    # any other row is a city belonging to the most recent state
    city = row['Col']
    rowlist.append([state, city])
df2 = pd.DataFrame(rowlist, columns=['Col1', 'Col2'])

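This assumes your initial DataFrame is named df and its column is named Col, and it only works if each state row is followed by the cities that belong to it, as in your data sample.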
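
The same state-then-cities logic can also be written without an explicit loop. This is a hedged sketch of an alternative technique (a boolean mask plus forward fill), not the code from the answer; the construction of df and the names is_state and state are assumptions made for illustration.

```python
import pandas as pd

# Assumed reconstruction of the sample data from the question.
df = pd.DataFrame({'Col': ['Texas[x]', 'Dallas', 'Austin',
                           'California[x]', 'Los Angeles', 'San Francisco']})

# Rows containing the literal marker "[x]" are states
# (regex=False so the brackets are not read as a character class).
is_state = df['Col'].str.contains('[x]', regex=False)

# Keep the value on state rows only, then forward-fill it onto the city rows.
state = df['Col'].where(is_state).ffill()

# Pair every city row with the most recent state above it.
df2 = pd.DataFrame({'Col1': state[~is_state],
                    'Col2': df.loc[~is_state, 'Col']}).reset_index(drop=True)

print(df2)
```

Like the loop, this relies on every state row appearing before its cities.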

Please format your DataFrame properly. I can't tell what it looks like.

Does your data contain [x] to mark the states among the cities? Based on your DataFrame, you need a MultiIndex, with the states in col1 as the index and col2 holding the cities associated with each state.

@EdekiOkoh Yes, it does have an identifier for the states.

@Chris Sorry, this is my first time posting here, and I realized it as soon as I posted. It's fine now.

So you want a MultiIndex based on the state? Please post the whole script so I can see how you create this DataFrame.

Thanks! It worked! The only small tweak I had to make was using df[0].str instead of df.Col.str, because the Python notebook didn't recognize it.

You're welcome. Col is the column name in the sample DataFrame. In your actual data the columns are probably labeled 0, 1, and so on.

Just remove the last line, set_index().

Extending the solution a bit further to address another question: what if we keep Col1 and Col2 as plain columns instead of setting Col1 as the index? Can the DataFrame be displayed like this: DataFrame([['Texas[x]', 'Dallas'], ['Texas[x]', 'Austin'], ['California[x]', 'Los Angeles'], ['California[x]', 'San Francisco']], columns=['col1', 'col2'])

I posted my comment by mistake before finishing it. I deleted that comment and posted a new one asking for the output in the specific format described there. Sorry for the inconvenience!

I haven't tried your solution yet; the answer above was more concise, so I went with it. I'll definitely try it when I have time. Thanks for the reply!

Yes, Vaishali's answer is great. It's faster and more Pythonic than mine. I think mine is simpler and more readable, though :)
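
On the df[0].str versus df.Col.str point raised in the comments: a frame built without column names gets integer labels, so attribute access by name is unavailable. The sketch below illustrates two ways around that under the assumption that the asker's frame was created without a header; the frame construction here is hypothetical.

```python
import pandas as pd

# Hypothetical frame created without column names, so pandas labels the column 0.
df = pd.DataFrame(['Texas[x]', 'Dallas', 'Austin',
                   'California[x]', 'Los Angeles', 'San Francisco'])

# Option 1: select the column by its integer label.
parts = df[0].str.extract(r'(.*\[x\])?(.*)')

# Option 2: rename the column once, after which df.Col works as in the answer.
named = df.rename(columns={0: 'Col'})
same = named.Col.str.extract(r'(.*\[x\])?(.*)')

print(parts.equals(same))
```

Bracket access such as df['Col'] or df[0] works for any label, which makes it a safer default than attribute access.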