How to slice each individual element of a Python list or array


python pandas


I have a Python list that is derived from a pandas Series, as follows:

dsa = pd.Series(crew_data['Work Type'])
disc = [dsa]
print(disc)

The result is as follows:

[0      Disc - Standard Removal & Herbicide 
 1      Disc - Standard Removal & Herbicide  
 2                            Standard Trim  
 3                       Disc - Hazard Tree  
 4                       Disc - Hazard Tree  
                  ...                   
 134                     Disc - Hazard Tree  
 135                     Disc - Hazard Tree  
 136                     Disc - Hazard Tree  
 137                     Disc - Hazard Tree  
 138                     Disc - Hazard Tree  
 Name: Work Type, Length: 139, dtype: object]


Now the next step is to slice the first 4 characters of each element, so that the returned value is Disc.
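
Why the single-string case is easy and the list case is not: the same [:4] slice means "first 4 characters" on a string but "first 4 elements" on a list. A small plain-Python sketch; the values are typed in from the printed output above:

```python
# On a single string, [:4] takes the first 4 characters
work_type = "Disc - Hazard Tree"
print(work_type[:4])       # Disc

# On a list, the same slice takes the first 4 elements and leaves each string untouched
work_types = ["Disc - Standard Removal & Herbicide",
              "Disc - Standard Removal & Herbicide",
              "Standard Trim",
              "Disc - Hazard Tree",
              "Disc - Hazard Tree"]
first_four = work_types[:4]
print(len(first_four))     # 4
print(first_four[2])       # Standard Trim
```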

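This seems simple enough on a single string, but for some reason doing it on a list seems almost impossible. In Excel it can be done simply with the formula =LEFT(A1,4), so surely it can be done just as simply in Python?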
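
For comparison with =LEFT(A1,4): a minimal pandas sketch, assuming crew_data is a DataFrame with a 'Work Type' column. The three-row frame and the "Prefix" column name below are made up for illustration, not the asker's data. The .str accessor applies the slice to every row, much like filling the formula down a column:

```python
import pandas as pd

# Hypothetical stand-in for the asker's crew_data DataFrame
crew_data = pd.DataFrame({
    "Work Type": ["Disc - Standard Removal & Herbicide",
                  "Standard Trim",
                  "Disc - Hazard Tree"],
})

# Like filling =LEFT(A1,4) down the column:
# take the first 4 characters of every value in 'Work Type'
crew_data["Prefix"] = crew_data["Work Type"].str[:4]
print(crew_data["Prefix"].tolist())  # ['Disc', 'Stan', 'Disc']
```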

If anyone has a solution, that would be great.

With a sample DataFrame:

In [138]: df                                                                                     
Out[138]: 
  col1  col2 col3 newcol
0    a     1    x    Wow
1    b     2    y    Dud
2    c     1    z    Wow
In [139]: df['newcol']                                                                           
Out[139]: 
0    Wow
1    Dud
2    Wow
Name: newcol, dtype: object
In [140]: type(_)                                                                                
Out[140]: pandas.core.series.Series

Selecting a column gives me a Series; there's no need for another Series wrapper:

In [141]: pd.Series(df['newcol'])                                                                
Out[141]: 
0    Wow
1    Dud
2    Wow
Name: newcol, dtype: object

We could put it in a list, but that doesn't gain anything:

In [142]: [pd.Series(df['newcol'])]                                                              
Out[142]: 
[0    Wow
 1    Dud
 2    Wow
 Name: newcol, dtype: object]
In [143]: len(_)                                                                                 
Out[143]: 1
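
To see why the extra list defeats per-element slicing: iterating over [ds] yields the whole Series once, so whatever slice is applied inside the loop acts on the Series (its rows), not on the strings. A sketch on the same toy frame, rebuilt by hand; .iloc is used so the slice is unambiguously positional:

```python
import pandas as pd

# The sample DataFrame from the session above, rebuilt by hand
df = pd.DataFrame({"col1": ["a", "b", "c"],
                   "col2": [1, 2, 1],
                   "col3": ["x", "y", "z"],
                   "newcol": ["Wow", "Dud", "Wow"]})

ds = df["newcol"]
print(type(ds).__name__)        # Series

wrapped = [ds]
print(len(wrapped))             # 1: the list holds a single object, the whole Series

# Each item of the list is the Series itself, so a slice applied "per element"
# takes rows of the Series, not characters of the strings
for item in wrapped:
    print(item.iloc[:2].tolist())   # ['Wow', 'Dud']
```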

We can extract the values as a NumPy array:

In [144]: pd.Series(df['newcol']).values                                                         
Out[144]: array(['Wow', 'Dud', 'Wow'], dtype=object)
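
The pandas documentation for Series.values recommends Series.to_numpy() (added in pandas 0.24) when a NumPy array is what you need. A minimal sketch on the same toy column:

```python
import pandas as pd

ds = pd.Series(["Wow", "Dud", "Wow"], name="newcol")

# to_numpy() returns the Series' values as a NumPy array, like .values above
arr = ds.to_numpy()
print(type(arr).__name__)   # ndarray
print(arr.tolist())         # ['Wow', 'Dud', 'Wow']
```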

We can apply string slicing to each element of the array or Series with a list comprehension:

In [145]: [astr[:2] for astr in _144]                                                            
Out[145]: ['Wo', 'Du', 'Wo']
In [146]: [astr[:2] for astr in _141]                                                            
Out[146]: ['Wo', 'Du', 'Wo']
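
The comprehension returns a plain list. A short sketch of two practical follow-ups on a toy frame (the column name prefix is made up): assigning the result back as a column, and guarding against missing values, which are not strings and would otherwise make the slice raise TypeError:

```python
import pandas as pd

df = pd.DataFrame({"newcol": ["Wow", "Dud", "Wow"]})

# A list with one entry per row can be assigned straight back as a new column
df["prefix"] = [astr[:2] for astr in df["newcol"]]
print(df["prefix"].tolist())    # ['Wo', 'Du', 'Wo']

# Missing values (None/NaN) cannot be sliced,
# so a column that may contain them needs a guard
with_missing = pd.Series(["Wow", None, "Dud"], dtype=object)
guarded = [s[:2] if isinstance(s, str) else s for s in with_missing]
print(guarded)                  # ['Wo', None, 'Du']
```

pandas' .str accessor leaves missing values as missing without needing such a guard.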

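A list comprehension isn't necessarily the most "advanced" way, but it's a good start. In fact it's close to optimal, since slicing strings has to use string methods; nothing else implements string slicing.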

pandas has a str accessor for applying string methods to a Series:

In [147]: ds = df['newcol']  
In [151]: ds.str.slice(0,2)        # or ds.str[:2]                                                               
Out[151]: 
0    Wo
1    Du
2    Wo
Name: newcol, dtype: object

This is clearer and prettier than the list comprehension, but actually slower.
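
How much slower depends on the data size and the pandas version, so it is worth timing on your own data. A minimal timeit sketch on a synthetic column (the question's work-type strings repeated, not the asker's data); it prints timings but implies no particular result:

```python
import timeit

import pandas as pd

# Synthetic column: two of the question's work types repeated many times
ds = pd.Series(["Disc - Hazard Tree", "Standard Trim"] * 25_000)

def with_comprehension():
    return [astr[:4] for astr in ds]

def with_str_accessor():
    return ds.str[:4]

# Both approaches must produce the same values before timing them
assert with_comprehension() == with_str_accessor().tolist()

for func in (with_comprehension, with_str_accessor):
    seconds = timeit.timeit(func, number=10)
    print(f"{func.__name__}: {seconds:.3f} s for 10 runs")
```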


I may be missing the point of the question, but here is a regular-expression implementation:

import re

# Sample data
disc = ['                       Disc - Standard Removal & Herbicide ',
        '      Disc - Standard Removal & Herbicide  ',
        '                           Standard Trim  ',
        '                       Disc - Hazard Tree',
        '                      Disc - Hazard Tree ']

# Regular expression pattern
# Disc is in parentheses because that is the part we want to capture.
# re.search(<pattern>, <string>).group(1) returns the first captured group; plain
# re.search(<pattern>, <string>).group() would return the whole match ('Disc' plus
# the whitespace the pattern consumed around it), not the entire row.
disc_pattern = r"\s+?(Disc)\s+?"

# List comprehension that skips rows without 'Disc'
# (the pattern requires at least one whitespace character before 'Disc')
result = [re.search(disc_pattern, i).group(1) for i in disc if re.match(disc_pattern, i)]
print(result)  # ['Disc', 'Disc', 'Disc', 'Disc']
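
A vectorized variant of the same idea uses pandas' Series.str.extract, which returns the captured group for matching rows and NaN for the rest. It is a sketch under one assumption: that the real 'Work Type' values may have little or no leading whitespace (the padding in the printed output in the question most likely comes from pandas right-aligning values for display), so the pattern makes leading whitespace optional rather than required. The sample Series is made up:

```python
import pandas as pd

# Made-up values in the shape of the question's 'Work Type' column
work_type = pd.Series(["Disc - Standard Removal & Herbicide",
                       "Standard Trim",
                       "  Disc - Hazard Tree"])

# ^\s* allows, but does not require, leading whitespace; (Disc) is the captured group.
# expand=False returns a Series rather than a one-column DataFrame.
prefix = work_type.str.extract(r"^\s*(Disc)\b", expand=False)
print(prefix.isna().tolist())     # [False, True, False]

# Dropping the non-matches gives the same kind of result as the list comprehension
print(prefix.dropna().tolist())   # ['Disc', 'Disc']
```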


Is this list one big string, or are there multiple objects in the list? Could you provide a better example?

No, these are separate objects. They represent the category code of each individual task in the system database.

Is there a reason for calling pd.Series() on crew_data['column']? Normally, if crew_data is a DataFrame, getting a column already gives you a Series, doesn't it?

Depending on some details that are unclear in your question, your question may already have been answered here.

Thanks for the link. That was well put. Everything I found when searching this topic gave a function with a for loop, or something more complicated that didn't work...

Very nice, +1. In the last code block, was ds.str.slice(0,2) meant to be df.str.slice(0,2)?

I missed a copy line that assigns the Out[141] Series to ds. @优点2