Python: select partial strings from a list inside a DataFrame column (tags: python, regex, pandas)

I have a DataFrame:

import pandas as pd

d = {'fruit': ['apple', 'pear', 'peach'], 'values': ['apple_1_0,peach_1_5','pear_1_3','mango_1_0,banana_1_0,pineapple_1_10']}
df = pd.DataFrame(data=d)
df

fruit   values
0   apple   apple_1_0,peach_1_5
1   pear    pear_1_3
2   peach   mango_1_0,banana_1_0,pineapple_1_10
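The strings in the "values" column are comma-separated, and I want to keep only the ones that contain the substring "_1_0".

Desired output:

Something like this comes close to what I'm trying to do, but it is very slow on ~100,000 rows: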

for row in range(len(df)):
    print([zero for zero in df['values'].str.split(',', expand=False)[row] if "_1_0" in zero])

['apple_1_0']
[]
['mango_1_0', 'banana_1_0']
Let's try explode.
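The code for this answer did not survive in the page, so what follows is only a minimal sketch of how an explode-based approach might look, not the answerer's original code. It assumes the same sample DataFrame as the question and uses a plain substring test for "_1_0"; the names `exploded`, `kept` and `result` are illustrative.

```python
import pandas as pd

d = {'fruit': ['apple', 'pear', 'peach'],
     'values': ['apple_1_0,peach_1_5', 'pear_1_3',
                'mango_1_0,banana_1_0,pineapple_1_10']}
df = pd.DataFrame(data=d)

# Split each string into a list, then give every entry its own row.
# The original row label is repeated for each entry of that row.
exploded = df['values'].str.split(',').explode()

# Keep only the entries that contain the literal substring "_1_0".
kept = exploded[exploded.str.contains('_1_0', regex=False)]

# Join the surviving entries back together per original row.
# Rows with no surviving entry are restored as NaN by reindex.
result = kept.groupby(level=0).agg(','.join).reindex(df.index)
print(result)
```

Assigning `result` back with `df['values'] = result` would overwrite the column; whether this is faster than the loop on ~100,000 rows would need to be measured on the real data.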


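A simple solution:

import numpy as np
import pandas as pd

d = {'fruit': ['apple', 'pear', 'peach'], 'values': ['apple_1_0,peach_1_5','pear_1_3','mango_1_0,banana_1_0,pineapple_1_10']}
df = pd.DataFrame(data=d)

# Split each string into a list of entries
new_data = df['values'].str.split(',')
# Keep only the entries that contain "_1_0"
new_data = new_data.apply(lambda lst: [element for element in lst if '_1_0' in element])
# Join the kept entries back into one comma-separated string
new_data = new_data.str.join(',')
# Rows with no match are now empty strings; turn them into NaN
new_data = new_data.replace('', np.nan)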
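Here is an alternative, as a list comprehension:

    # Filter inside the join so that dropped entries leave no stray commas
    df["values"] = [",".join(entry
                             for entry in val.split(",")
                             if entry.endswith("_1_0"))
                    for val in df["values"]
                    ]

    # Rows with no match are empty strings; turn them into NaN
    df = df.replace({"": np.nan})

    df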


   fruit    values
0   apple   apple_1_0
1   pear    NaN
2   peach   mango_1_0,banana_1_0
Using findall, you can do the following:

import numpy as np
import pandas as pd

d = {'fruit': ['apple', 'pear', 'peach'], 'values': ['apple_1_0,peach_1_5','pear_1_3','mango_1_0,banana_1_0,pineapple_1_10']}
df = pd.DataFrame(data=d)

# np.nan is used because the np.NaN alias was removed in NumPy 2.0
df['values'] = df['values'].str.findall(r'[^,]*_1_0(?=,|$)').apply(','.join).replace('', np.nan)
print(df)
The regex [^,]*_1_0(?=,|$) matches a run of non-comma characters that ends with _1_0 and is followed by a comma or by the end of the string.
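To see what the lookahead contributes, here is a small sketch using the standard `re` module on the question's three strings. The extra value `kiwi_1_01` is a made-up example, not from the question, chosen to show an entry that contains "_1_0" without ending in it.

```python
import re

# Same pattern as in the answer: non-comma characters ending in "_1_0",
# followed by a comma or the end of the string (lookahead, not consumed).
pattern = re.compile(r'[^,]*_1_0(?=,|$)')

samples = ['apple_1_0,peach_1_5',
           'pear_1_3',
           'mango_1_0,banana_1_0,pineapple_1_10']
matches = [pattern.findall(s) for s in samples]
print(matches)

# Hypothetical entry: "_1_0" is followed by another digit.
with_lookahead = pattern.findall('kiwi_1_01')
without_lookahead = re.findall(r'[^,]*_1_0', 'kiwi_1_01')
print(with_lookahead, without_lookahead)
```

With the lookahead, `kiwi_1_01` yields no match; without it, the pattern would return the truncated piece `kiwi_1_0`.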

We can also use a lambda:

df['values'] = df['values'].str.findall(r'[^,]*_1_0(?=,|$)').apply(lambda items: ','.join(items) if len(items) > 0 else np.nan)

You're right, I forgot the split part. Nice.
Do you need the word boundary \b in your regex?
I have changed it to [^,]* so that it also matches hyphenated words such as banana-fruit_1_0.
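The earlier version of the regex is not shown in the thread, so the `\b\w*` pattern below is only an assumption about what a word-boundary variant might have looked like. The sketch compares it with `[^,]*` on the hyphenated example from the comment; `kiwi_1_5` is an added filler entry.

```python
import re

s = 'banana-fruit_1_0,kiwi_1_5'

# Assumed word-boundary variant: \w does not match "-", so the match
# can only start after the hyphen.
word_boundary = re.findall(r'\b\w*_1_0(?=,|$)', s)

# Variant from the answer: anything except a comma, so the hyphenated
# prefix is kept.
non_comma = re.findall(r'[^,]*_1_0(?=,|$)', s)

print(word_boundary)
print(non_comma)
```

Since the entries are delimited by commas, excluding only the comma keeps each entry whole regardless of which other characters it contains.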
Output of the findall version:

   fruit                values
0  apple             apple_1_0
1   pear                   NaN
2  peach  mango_1_0,banana_1_0