Python: match a regex in a source code file and save the matches to a list/DataFrame
I want to match a regular expression against a TypeScript file I have and add each match to a list. My end goal is to build a DataFrame from the matches. The TypeScript file looks like this:
ApplicationStarted: {
gaData: {
eventCategory: AnalyticsConstants.EventCategories.UserInteraction,
eventAction: 'Application Started',
eventLabel: ''
},
eventName: 'Application Started',
description: 'Raised when the Fiddler Everywhere application is started'
},
InstanceStarted: {
gaData: {
eventCategory: AnalyticsConstants.EventCategories.UserInteraction,
eventAction: 'New Instance Started',
eventLabel: 'Instance started for path {0}'
},
eventName: 'New Instance Started',
description: 'Raised when a new instance of the Angular application is started - this could be due to opening a new window or trying it in browser.'
},
The DataFrame I want to end up with looks like this:
eventAction             description
Application Started     Raised when the Fiddler Everywhere application is started
New Instance Started    Raised when a new instance of the Angular application is started ..
The regex matches I want are:
(?
Using the provided regex patterns, the extracted text can be loaded into a DataFrame:
import re
import pandas as pd
text = """
ApplicationStarted: {
gaData: {
eventCategory: AnalyticsConstants.EventCategories.UserInteraction,
eventAction: 'Application Started',
eventLabel: ''
},
eventName: 'Application Started',
description: 'Raised when the Fiddler Everywhere application is started'
},
InstanceStarted: {
gaData: {
eventCategory: AnalyticsConstants.EventCategories.UserInteraction,
eventAction: 'New Instance Started',
eventLabel: 'Instance started for path {0}'
},
eventName: 'New Instance Started',
description: 'Raised when a new instance of the Angular application is started - this could be due to opening a new window or trying it in browser.'
},
"""
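# The lookbehind keeps the key name out of the match; .* runs to the end of the line.
regex_1 = re.compile(r"(?<=eventAction:).*")
regex_2 = re.compile(r"(?<=description:).*")
# Each raw match still has a leading space, quotes and possibly a trailing comma; strip them.
res_1 = [res.strip(" ',") for res in regex_1.findall(text)]
res_2 = [res.strip(" ',") for res in regex_2.findall(text)]
cols = ['eventAction', 'description']
# Pair the two lists row by row; this assumes every block has both keys, in the same order.
df = pd.DataFrame(list(zip(res_1, res_2)), columns=cols)
print(df)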
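The raw lookbehind matches include the surrounding quotes and punctuation, which is why the code above has to strip them afterwards. As an alternative sketch (not the method used in the answer), a capture group can return only the quoted value. The helper name `extract` and the shortened `sample` string are hypothetical, and the pattern assumes values are single-quoted and contain no escaped quotes:

```python
import re

# Shortened stand-in for the TypeScript source shown in the question.
sample = """
eventAction: 'Application Started',
eventLabel: ''
description: 'Raised when the Fiddler Everywhere application is started'
"""

def extract(key, text):
    """Return the single-quoted values that follow `key:` in text."""
    # The capture group holds only what sits between the quotes,
    # so no stripping is needed afterwards.
    pattern = re.compile(rf"{re.escape(key)}:\s*'([^']*)'")
    return pattern.findall(text)

actions = extract("eventAction", sample)
descriptions = extract("description", sample)
print(actions)
print(descriptions)
```

Because `findall` returns the group contents when a pattern has exactly one group, both lists can go straight into `zip` and `pd.DataFrame` as before.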
To view the whole df, set the maximum column width (a value of 150 works for this data):
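The call that sets the width is not shown here. A minimal sketch, assuming the pandas display option `display.max_colwidth` is what is meant; the small `df` below is a hypothetical stand-in for the frame built earlier:

```python
import pandas as pd

# Hypothetical two-row frame standing in for the df built from the regex matches.
df = pd.DataFrame({
    "eventAction": ["Application Started", "New Instance Started"],
    "description": [
        "Raised when the Fiddler Everywhere application is started",
        "Raised when a new instance of the Angular application is started - "
        "this could be due to opening a new window or trying it in browser.",
    ],
})

# Raise the per-cell display limit so long descriptions are not cut off
# with "..." when the frame is printed.
pd.set_option("display.max_colwidth", 150)
print(df)
```

The option only affects how the frame is displayed; the stored strings are unchanged either way.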
We get:
eventAction description
0 Application Started Raised when the Fiddler Everywhere application is started
1 New Instance Started Raised when a new instance of the Angular application is started - this could be due to opening a new window or trying it in browser.
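The question starts from a TypeScript file rather than an inline string, so the text would normally be read from disk first. A minimal sketch, assuming UTF-8 encoding; the file name `analytics-events.ts` is hypothetical, and a small stand-in file is written to a temporary directory so the example runs on its own:

```python
import re
import tempfile
from pathlib import Path

# Write a small stand-in file; in practice `path` would point at the
# real TypeScript file (the name used here is hypothetical).
path = Path(tempfile.mkdtemp()) / "analytics-events.ts"
path.write_text(
    "eventAction: 'Application Started',\n"
    "description: 'Raised when the Fiddler Everywhere application is started'\n",
    encoding="utf-8",
)

# Read the whole file into one string and apply the same lookbehind patterns.
text = path.read_text(encoding="utf-8")
actions = [m.strip(" ',") for m in re.findall(r"(?<=eventAction:).*", text)]
descriptions = [m.strip(" ',") for m in re.findall(r"(?<=description:).*", text)]

# One (eventAction, description) tuple per event block.
rows = list(zip(actions, descriptions))
print(rows)
```

The resulting `rows` list can be passed to `pd.DataFrame(rows, columns=['eventAction', 'description'])` to get the same frame as above.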