Python: Using if/else to build a DataFrame


python pandas


I am currently rearranging a DataFrame based on the characters in its first column. I used the following code to rearrange the data:

df['RegionName'] = df.loc[df.text.str.contains('(', regex=False), 'text'].str.extract(r'(.*?)\s*[\(\[]+.*[\n]*', expand=False)

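My problem is the last step: once the initial rearrangement is done, I need to select the remaining data. I believe I need an if-else statement, where the else branch would let me finish that last step. In my attempts I keep getting an error saying that my boolean statement is ambiguous. How can I use the code above inside an if-else statement to accomplish this?

Thanks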

It looks like you need:

# Assign values only where mask is True; all other rows get NaN
mask = df.text.str.contains('(', regex=False)
df.loc[mask, 'RegionName'] = df.loc[mask, 'text'].str.extract(r'(.*?)\s*[\(\[]+.*[\n]*', 
                                                               expand=False)
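
To make the effect of the masked assignment concrete, here is a minimal sketch. The three sample rows are hypothetical (the question does not show its data), and the final fillna step is only one assumed reading of "select the remaining data":

```python
import pandas as pd

# Hypothetical sample rows; the real data is not shown in the question.
df = pd.DataFrame({'text': ['Alabama (AL)', 'Auburn [edit]', 'Texas']})

# True for rows whose text contains a literal '('
mask = df['text'].str.contains('(', regex=False)

# The "if" branch: only rows where mask is True are assigned,
# every other row of the new column is NaN.
df.loc[mask, 'RegionName'] = df.loc[mask, 'text'].str.extract(
    r'(.*?)\s*[\(\[]+.*[\n]*', expand=False)

# An assumed "else" branch: rows still missing fall back to the original text.
df['RegionName'] = df['RegionName'].fillna(df['text'])
print(df)
```

The mask plays the role of the if condition, and whatever is done with the rows it leaves as NaN plays the role of the else.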

For a better understanding, here is a small example:

import numpy as np
import pandas as pd

df = pd.DataFrame({'text': [' (1', '(', '4', '[7', '{8', '{7', ' [1']})
print(df)
  text
0   (1
1    (
2    4
3   [7
4   {8
5   {7
6   [1

# One boolean mask per condition
mask1 = df.text.str.contains('(', regex=False)
mask2 = df.text.str.contains('{', regex=False)
mask3 = df.text.str.contains('[', regex=False)

# Nested np.where acts like if / elif / elif / else, evaluated row by row
df['d'] = np.where(mask1, 1,
          np.where(mask2, 3,
          np.where(mask3, 2, 4)))
print(df)
  text  d
0   (1  1
1    (  1
2    4  4
3   [7  2
4   {8  3
5   {7  3
6   [1  2
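
As a possible alternative to nesting np.where calls, the same if / elif / else chain can be written with np.select, which takes a list of conditions, a list of values, and a default. This is not part of the original answer, only a sketch on the same sample data; the names conditions and choices are illustrative:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'text': [' (1', '(', '4', '[7', '{8', '{7', ' [1']})

# Conditions are checked in order; the first one that is True wins.
conditions = [
    df['text'].str.contains('(', regex=False),
    df['text'].str.contains('{', regex=False),
    df['text'].str.contains('[', regex=False),
]
# One value per condition, in the same order.
choices = [1, 3, 2]

# default is used for rows where no condition is True (the "else").
df['d'] = np.select(conditions, choices, default=4)
print(df)
```

With more than two or three branches this form tends to be easier to read than deeply nested np.where calls.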

Another, more complicated example:

df = pd.DataFrame({'text': [' (1', '(', '4', '[ur', '{dFd', '{fGf', ' [io']})
print(df)

mask1 = df.text.str.contains('(', regex=False)
mask2 = df.text.str.contains('{', regex=False)
mask3 = df.text.str.contains('[', regex=False)

# Each branch extracts a different pattern; rows matching no mask get 4
df['parsed'] = np.where(mask1, df.text.str.extract(r'(\d+)', expand=False),
               np.where(mask2, df.text.str.extract(r'([A-Z]+)', expand=False),
               np.where(mask3, df.text.str.extract(r'([uo])+', expand=False), 4)))
print(df)

   text parsed
0    (1      1
1     (    NaN
2     4      4
3   [ur      u
4  {dFd      F
5  {fGf      G
6   [io      o
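
If a literal if / else is preferred, it can be applied to one string at a time instead of to the whole column. The sketch below is not from the original answer: parse is a hypothetical helper, and Series.apply calls it once per row, which is usually slower than the vectorized np.where version but mirrors the same branches:

```python
import re

import numpy as np
import pandas as pd

df = pd.DataFrame({'text': [' (1', '(', '4', '[ur', '{dFd', '{fGf', ' [io']})

def parse(value):
    """Hypothetical helper: plain if/elif/else on a single string."""
    if '(' in value:
        match = re.search(r'(\d+)', value)
    elif '{' in value:
        match = re.search(r'([A-Z]+)', value)
    elif '[' in value:
        match = re.search(r'([uo])+', value)
    else:
        return 4  # no bracket at all: same default as the np.where version
    # Mirror str.extract: NaN when the pattern is not found
    return match.group(1) if match else np.nan

df['parsed'] = df['text'].apply(parse)
print(df)
```

Inside parse the condition is a single True/False value, so the ordinary if statement is unambiguous there.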

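Could you explain in a bit more detail? What do I need to use to check whether the mask is True? @jezrael

mask is a condition such as df.text.str.contains('(', regex=False). It returns a Series of True/False values, which is called a mask.

Another good note on this; although I don't understand why, I realize this works without using an if/else statement. I left part of the problem out of the question because I wanted to keep it clear, and I thought I could work the rest out after some guidance (that turned out not to be the case). The problem is that I now actually need some, but not all, of the values for which the mask returns False. For example, I need all items containing '[' in one column, and all items that contain '(' or neither of the two in one column. The problem is the 'neither' rows: they are converted to NaN in the first column, while the second column keeps the original value so that the '[' entries are preserved.

OK, I'll try to explain more. pandas works with Series (simplified, 1-D arrays) and DataFrame (simplified, 2-D arrays). So the classic

if (condition): code else: code

cannot be used, because it expects a single True/False value, while here the condition is a whole array of them. That is why the truth value is reported as ambiguous.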
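
The ambiguity error mentioned in the question can be reproduced in a few lines. This is a minimal sketch on two made-up rows, showing the failing if statement and two ways around it:

```python
import pandas as pd

df = pd.DataFrame({'text': [' (1', '4']})
mask = df['text'].str.contains('(', regex=False)

try:
    if mask:  # a Series holds one True/False per row, not a single value
        pass
except ValueError as err:
    message = str(err)
    print(message)

# Reduce the mask explicitly when one overall answer is wanted
print(mask.any(), mask.all())

# Or use the mask elementwise to pick the matching rows
print(df[mask])
```

Which of the two fits depends on whether the decision is about the column as a whole or about each row separately; the question here is about each row, so the elementwise form applies.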
