Python: match a regular expression in a source code file and save the matches to a list/DataFrame

python regex pandas

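I want to match a regular expression against a TypeScript file I have and add each match to a list. My end goal is to build a DataFrame from the matches. The TypeScript file looks like this: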

 ApplicationStarted: {
            gaData: {
                eventCategory: AnalyticsConstants.EventCategories.UserInteraction,
                eventAction: 'Application Started',
                eventLabel: ''
            },
            eventName: 'Application Started',
            description: 'Raised when the Fiddler Everywhere application is started'
        },

        InstanceStarted: {
            gaData: {
                eventCategory: AnalyticsConstants.EventCategories.UserInteraction,
                eventAction: 'New Instance Started',
                eventLabel: 'Instance started for path {0}'
            },
            eventName: 'New Instance Started',
            description: 'Raised when a new instance of the Angular application is started - this could be due to opening a new window or trying it in browser.'
        },

The DataFrame I want to end up with looks like this:

eventAction                    description
Application Started        Raised when the Fiddler Everywhere application is started
New Instance Started       Raised when a new instance of the Angular application is started ..

The regex I want to match is:

(?

Using the regex patterns provided, the extracted text can be loaded into a DataFrame:
import re
import pandas as pd

text = """
ApplicationStarted: {
            gaData: {
                eventCategory: AnalyticsConstants.EventCategories.UserInteraction,
                eventAction: 'Application Started',
                eventLabel: ''
            },
            eventName: 'Application Started',
            description: 'Raised when the Fiddler Everywhere application is started'
        },

        InstanceStarted: {
            gaData: {
                eventCategory: AnalyticsConstants.EventCategories.UserInteraction,
                eventAction: 'New Instance Started',
                eventLabel: 'Instance started for path {0}'
            },
            eventName: 'New Instance Started',
            description: 'Raised when a new instance of the Angular application is started - this could be due to opening a new window or trying it in browser.'
        },
"""
# The lookbehinds match everything after each key, up to the end of the line.
regex_1 = re.compile(r"(?<=eventAction:).*")
regex_2 = re.compile(r"(?<=description:).*")
# Strip the surrounding spaces, quotes, and trailing comma from each match.
res_1 = [res.strip(" ',") for res in regex_1.findall(text)]
res_2 = [res.strip(" '") for res in regex_2.findall(text)]
cols = ['eventAction', 'description']
df = pd.DataFrame(list(zip(res_1, res_2)), columns=cols)
print(df)
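
The two-pattern approach above pairs values by position, so a block that lacks one of the keys would shift every later row. As a rough alternative sketch (not part of the original answer), a single pattern with two named groups keeps each eventAction together with the description that follows it. The names `EVENT_RE` and `extract_events` are hypothetical, and the pattern assumes every value is single-quoted and contains no embedded apostrophes.

```python
import re

import pandas as pd

# One pattern with two named groups, so each eventAction stays paired
# with the description that follows it in the same event block.
# Assumption: values are single-quoted and contain no apostrophes.
EVENT_RE = re.compile(
    r"eventAction:\s*'(?P<eventAction>[^']*)'"   # quoted value after eventAction:
    r".*?"                                       # skip ahead lazily, across lines
    r"description:\s*'(?P<description>[^']*)'",  # quoted value after description:
    re.DOTALL,
)


def extract_events(source: str) -> pd.DataFrame:
    """Return one row per eventAction/description pair found in source."""
    rows = [m.groupdict() for m in EVENT_RE.finditer(source)]
    return pd.DataFrame(rows, columns=["eventAction", "description"])


sample = """
ApplicationStarted: {
    gaData: {
        eventAction: 'Application Started',
        eventLabel: ''
    },
    eventName: 'Application Started',
    description: 'Raised when the Fiddler Everywhere application is started'
},
"""
events = extract_events(sample)
print(events)
```

To run this against the actual file rather than an inline string, the source could be read first, for example with `pathlib.Path(...).read_text(encoding="utf-8")`, and the result passed to `extract_events`.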

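To view the whole df, set the maximum column width (the pandas option display.max_colwidth; a value of 150 works for this data).
We get: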
            eventAction                                                                                                                            description
0   Application Started                                                                              Raised when the Fiddler Everywhere application is started
1  New Instance Started  Raised when a new instance of the Angular application is started - this could be due to opening a new window or trying it in browser.
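
The answer mentions the column-width setting but does not show it in code. A minimal sketch of one way to apply it follows, assuming the pandas display option `display.max_colwidth` is what was meant. The `display.width` setting is an extra assumption added here so that the wide frame is not wrapped across several blocks of lines.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "eventAction": ["Application Started"],
        "description": ["Raised when the Fiddler Everywhere application is started"],
    }
)

# Temporarily widen the display so long strings are not cut off with "...".
# display.max_colwidth caps the characters shown per cell; display.width caps
# the total line width before pandas wraps the columns.
with pd.option_context("display.max_colwidth", 150, "display.width", 250):
    rendered = repr(df)  # repr() honors the display options
    print(rendered)
```

Using `pd.option_context` keeps the change local to the `with` block; `pd.set_option("display.max_colwidth", 150)` would apply it for the rest of the session instead.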
