Python regex for securities

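I have a text file containing security names, $ amounts, and % of portfolio. I'm trying to figure out how to use regex to separate the companies. My original solution used .split('%') and then created the 3 variables I need, but I found that some securities have a % in their name, so that solution isn't adequate.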

Example string:

Pinterest, Inc. Series F, 8.00%$24,808,9320.022%ResMed,Inc.$23,495,3260.021%Eaton Corp. PLC$53,087,8430.047%
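A minimal sketch of the .split('%') approach described above, run on this example string, shows where it breaks down: the % inside Pinterest's name ("8.00%") produces an extra piece, so the name and its dollar amount end up in different entries. The variable names here are illustrative only.

```python
# Example string from the question
data = ("Pinterest, Inc. Series F, 8.00%$24,808,9320.022%"
        "ResMed,Inc.$23,495,3260.021%"
        "Eaton Corp. PLC$53,087,8430.047%")

# Naive approach: treat every % as the end of a record
pieces = data.split('%')
for piece in pieces:
    print(repr(piece))
# 'Pinterest, Inc. Series F, 8.00'   <- name cut at its own %
# '$24,808,9320.022'                 <- amount detached from the name
# 'ResMed,Inc.$23,495,3260.021'
# 'Eaton Corp. PLC$53,087,8430.047'
# ''                                 <- empty piece after the final %
```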

Current regex:

[a-zA-Z0-9,$.\s]+[.0-9%]$

My current regex only finds the last company, for example:

Eaton Corp. PLC$53,087,8430.047%
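The behavior described can be reproduced with a short sketch. It assumes the pattern is applied with re.findall (the question does not say which function was used; re.search returns the same single match here). Because the character class before the final character excludes %, no match can span a %, and the trailing $ forces the match to end at the end of the string, so only the last record qualifies.

```python
import re

data = ("Pinterest, Inc. Series F, 8.00%$24,808,9320.022%"
        "ResMed,Inc.$23,495,3260.021%"
        "Eaton Corp. PLC$53,087,8430.047%")

# The question's current pattern; the trailing $ anchors it to the end of the string
current = re.compile(r'[a-zA-Z0-9,$.\s]+[.0-9%]$')
matches = current.findall(data)
print(matches)  # ['Eaton Corp. PLC$53,087,8430.047%']
```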

Any help on how to find every instance of a company?

Desired output:

["Pinterest, Inc. Series F, 8.00%$24,808,9320.022%","ResMed,Inc.$23,495,3260.021%","Eaton Corp. PLC$53,087,8430.047%"]

In Python 3:

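import re
# Each record: everything up to a $, the $ itself, then everything up to and including the next %
p = re.compile(r'[^$]+\$[^%]+%')
print(p.findall('Pinterest, Inc. Series F, 8.00%$24,808,9320.022%ResMed,Inc.$23,495,3260.021%Eaton Corp. PLC$53,087,8430.047%'))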

Result:

['Pinterest, Inc. Series F, 8.00%$24,808,9320.022%', 'ResMed,Inc.$23,495,3260.021%', 'Eaton Corp. PLC$53,087,8430.047%']

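Your original problem was that the $ anchor makes the regex match only at the end of the string. However, removing the $ would still split Pinterest into two entries at the % after 8.00.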
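A quick sketch of the question's pattern with the $ anchor dropped confirms this: findall now returns four pieces instead of three, because the match ends at the first % it reaches, which is the one inside Pinterest's name.

```python
import re

data = ("Pinterest, Inc. Series F, 8.00%$24,808,9320.022%"
        "ResMed,Inc.$23,495,3260.021%"
        "Eaton Corp. PLC$53,087,8430.047%")

# Same pattern as the question, minus the end-of-string anchor
unanchored = re.compile(r'[a-zA-Z0-9,$.\s]+[.0-9%]')
pieces = unanchored.findall(data)
for piece in pieces:
    print(piece)
# Pinterest, Inc. Series F, 8.00%
# $24,808,9320.022%
# ResMed,Inc.$23,495,3260.021%
# Eaton Corp. PLC$53,087,8430.047%
```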

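To get around this, the regex looks for a $, then for a %, and takes everything up through that % as one entry. This pattern works for the examples you gave, but of course I don't know whether it works for all your data.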
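The question's end goal was three variables per security (name, dollar amount, portfolio %), but each match from [^$]+\$[^%]+% still fuses the amount and the percentage (for example $24,808,9320.022%). One possible way to separate them is sketched below. It assumes, without confirmation from the data, that every dollar amount is a whole number with a comma every three digits and that every percentage has a decimal point followed by digits. The pattern name HOLDING and the helper parse_holdings are hypothetical, not part of either answer.

```python
import re

# Hypothetical pattern (an assumption about the data format):
#   group 1: the name, i.e. everything up to the $
#   group 2: a whole-dollar amount with a comma every three digits
#   group 3: a decimal percentage, followed by a literal %
HOLDING = re.compile(r'([^$]+)\$(\d{1,3}(?:,\d{3})*)(\d*\.\d+)%')

def parse_holdings(text):
    """Return (name, amount, percent) tuples.

    amount is an int in dollars; percent is a float in percent units
    (0.022 means 0.022%, not 2.2%).
    """
    rows = []
    for name, amount, pct in HOLDING.findall(text):
        rows.append((name.strip(), int(amount.replace(',', '')), float(pct)))
    return rows

data = ("Pinterest, Inc. Series F, 8.00%$24,808,9320.022%"
        "ResMed,Inc.$23,495,3260.021%"
        "Eaton Corp. PLC$53,087,8430.047%")

for row in parse_holdings(data):
    print(row)
# ('Pinterest, Inc. Series F, 8.00%', 24808932, 0.022)
# ('ResMed,Inc.', 23495326, 0.021)
# ('Eaton Corp. PLC', 53087843, 0.047)
```

The fixed three-digit comma groups are what let the pattern decide where the amount ends and the percentage begins. An amount under 1,000 (no comma) directly followed by a percentage of 10% or more would be ambiguous under this assumption.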

Edit: here is how the regex works:

r'               Use a raw string so you don't have to double the backslashes
  [^$]+          Look for anything up to the next $
       \$        Match the $ itself (\$ because $ alone means end-of-line)
         [^%]+   Now anything up to the next %
              %  And the % itself
               ' End of the string
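The same breakdown can live inside the pattern itself if it is compiled with re.VERBOSE, which ignores unescaped whitespace in the pattern and treats # through end of line as a comment. This is a sketch of an equivalent spelling of the answer's pattern, not a change to its logic.

```python
import re

data = ("Pinterest, Inc. Series F, 8.00%$24,808,9320.022%"
        "ResMed,Inc.$23,495,3260.021%"
        "Eaton Corp. PLC$53,087,8430.047%")

# Equivalent to r'[^$]+\$[^%]+%', written in verbose mode
record = re.compile(r"""
    [^$]+   # anything up to the next dollar sign (the security name)
    \$      # the literal dollar sign (a bare one would be an anchor)
    [^%]+   # anything up to the next percent sign (amount and percentage)
    %       # the closing percent sign itself
""", re.VERBOSE)

print(record.findall(data))
```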

A working Python solution, including named groups:

(?P<item>(?P<name>.*?)\$(?P<usd>[\d,\.]*?%))

In the link I provided, you can see the changes take effect live, and the sidebar explains the syntax used.
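A short sketch of how the named groups in that pattern might be consumed from Python, using re.finditer and Match.groupdict(); the loop and variable names are illustrative. The lazy .*? in name stops at the first $, and the lazy [\d,\.]*? in usd stops at the first % after it.

```python
import re

data = ("Pinterest, Inc. Series F, 8.00%$24,808,9320.022%"
        "ResMed,Inc.$23,495,3260.021%"
        "Eaton Corp. PLC$53,087,8430.047%")

pattern = re.compile(r'(?P<item>(?P<name>.*?)\$(?P<usd>[\d,\.]*?%))')

# groupdict() maps each group name to the text it captured in that match
results = [m.groupdict() for m in pattern.finditer(data)]
for r in results:
    print(r['name'], '|', r['usd'])
# Pinterest, Inc. Series F, 8.00% | 24,808,9320.022%
# ResMed,Inc. | 23,495,3260.021%
# Eaton Corp. PLC | 53,087,8430.047%
```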

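It does grab what I want. Besides the decimal to the thousandths place (.XXX), the usd group also includes the % sign. This is exactly what I needed! Thanks a lot.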

(?P<item>(?P<name>.*?)\$(?P<usd>[\d,\.]*?%))