Reading a substring inside quotation marks from a large string in Python

python string

I have the following string:

{"name":"INPROCEEDINGS","__typename":"PublicationConferencePaper"},"hasPermiss
ionToLike":true,"hasPermissionToFollow":true,"publicationCategory":"researchSu
mmary","hasPublicFulltexts":false,"canClaim":false,"publicationType":"inProcee
dings","fulltextRequesterCount":0,"requests":{"__pagination__":
[{"offset":0,"limit":1,"list":[]}]},"activeFiguresCount":0,"activeFigures":
{"__pagination__":[{"offset":0,"limit":100,"list":
[]}]},"abstract":"Heterogeneous Multiprocessor System-on-Chip (MPSoC) are 
progressively becoming predominant in most modern mobile devices. These 
devices are required to perform processing of applications within thermal,
 energy and performance constraints. However, most stock power and thermal
 management mechanisms either neglect some of these constraints or rely on 
frequency scaling to achieve energy-efficiency and temperature reduction on 
the device. Although this inefficient technique can reduce temporal thermal
 gradient, but at the same time hurts the performance of the executing task.
 In this paper, we propose a thermal and energy management mechanism which 
achieves reduction in thermal gradient as well as energy-efficiency through 
resource mapping and thread-partitioning of applications with online 
optimization in heterogeneous MPSoCs. The efficacy of the proposed approach is 
experimentally appraised using different applications from Polybench benchmark 
suite on Odroid-XU4 developmental platform. Results show 28% performance 
improvement, 28.32% energy saving and reduced thermal variance of over 76%
 when compared to the existing approaches. Additionally, the method is able to
 free more than 90% in memory storage on the MPSoC, which would have been 
previously utilized to store several task-to-thread mapping 
configurations.","hasRequestedAbstract":false,"lockedFields"

I am trying to get the substring between "abstract":" and ","hasRequestedAbstract". To do this, I use the following code:

    import re
    import requests
    #some more codes here........
    to_visit_url = 'https://www.researchgate.net/publication/328749434_TEEM_Online_Thermal-_and_Energy-Efficiency_Management_on_CPU-GPU_MPSoCs'
    this_page = requests.get(to_visit_url)
    content = str(this_page.content, encoding="utf-8")
    abstract = re.search('\"abstract\":\"(.*)\",\"hasRequestedAbstract\"', content)
    print('Abstract:\n' + str(abstract))

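But the abstract variable ends up with the value None. What could be the problem? How can I get the substring mentioned above?

Note: although it looks as if I could read this as a JSON object, that is not an option, because the sample text above is only a small part of the full HTML content, and extracting the JSON object from it is very difficult.

Note that the full content of the page, i.e. page.content, can be downloaded from here:

Alternatively, the source can be downloaded directly from the URL: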

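re.search does not return a list of parsed results. It returns a match object (SRE_Match in older Python versions, re.Match since Python 3.7), or None when nothing matches. To get a list of matches, you need to use the re.findall method.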

Test code

import re
import requests

test_pattern = re.compile('\"abstract\":\"(.*)\",\"hasRequestedAbstract\"')
test_requests = requests.get("https://www.researchgate.net/publication/328749434_TEEM_Online_Thermal-_and_Energy-Efficiency_Management_on_CPU-GPU_MPSoCs")

print(test_pattern.findall(test_requests.text)[0])

Result

'Heterogeneous Multiprocessor System-on-Chip (MPSoC) are progressively becoming predominant in most modern mobile devices. These devices are required to perform processing of applications within thermal, energy and performance constraints. However, most stock power and thermal management mechanisms either neglect some of these constraints or rely on frequency scaling to achieve energy-efficiency and temperature reduction on the device. Although this inefficient technique can reduce temporal thermal gradient, but at the same time hurts the performance of the executing task. In this paper, we propose a thermal and energy management mechanism which achieves reduction in thermal gradient as well as energy-efficiency through resource mapping and thread-partitioning of applications with online optimization in heterogeneous MPSoCs. The efficacy of the proposed approach is experimentally appraised using different applications from Polybench benchmark suite on Odroid-XU4 developmental platform. Results show 28% performance improvement, 28.32% energy saving and reduced thermal variance of over 76% when compared to the existing approaches. Additionally, the method is able to free more than 90% in memory storage on the MPSoC, which would have been previously utilized to store several task-to-thread mapping configurations.'
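
The difference between re.search and re.findall can be seen on a small inline sample. The sketch below is only an illustration: the helper name extract_abstract, the sample string, the non-greedy (.*?) group and the re.DOTALL flag are assumptions added here, not part of the answer above. It shows that re.search yields either a match object or None, so the None case has to be handled before reading the captured group.

```python
import re

# Non-greedy capture between the two markers; DOTALL lets "." span newlines.
# (Both choices are assumptions for this sketch, not taken from the answer.)
ABSTRACT_RE = re.compile(r'"abstract":"(.*?)","hasRequestedAbstract"', re.DOTALL)


def extract_abstract(content):
    """Return the text captured between the markers, or None if absent."""
    match = ABSTRACT_RE.search(content)  # a match object, or None
    return match.group(1) if match else None


# Made-up sample that mimics the structure of the string in the question.
sample = (
    '{"activeFigures":[],"abstract":"Heterogeneous MPSoC are\n'
    'becoming predominant.","hasRequestedAbstract":false}'
)

print(extract_abstract(sample))
print(extract_abstract("<html>no abstract here</html>"))  # prints None
print(ABSTRACT_RE.findall(sample))  # list of captured groups, possibly empty
```

If findall returns an empty list, indexing it with [0] raises IndexError, so checking the result first is usually safer when the downloaded page may differ from the expected one.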

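When you call requests.get(...), you get back a Response object. These objects are quite smart: their built-in .json() method can return the string posted in the question as a Python dictionary, provided the response body is valid JSON.

However, I notice that the link you posted does not point to anything like that, but to a full HTML document. If you want to parse a site like this, you should look at BeautifulSoup instead.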
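
When the JSON is embedded in a larger HTML page, a middle path is to let the standard json module decode only the value that follows the key. The sketch below is a hedged illustration: the function name abstract_via_json and the sample string are made up, and it assumes the key appears in the page exactly as "abstract": followed directly by a JSON string. An advantage over a plain regex is that escaped quotes inside the value are handled by the decoder.

```python
import json


def abstract_via_json(content, key='"abstract":'):
    """Decode the JSON value that directly follows `key`, or return None."""
    start = content.find(key)
    if start == -1:
        return None
    decoder = json.JSONDecoder()
    try:
        # raw_decode parses one JSON value starting at the given index and
        # ignores whatever text comes after it.
        value, _end = decoder.raw_decode(content, start + len(key))
    except json.JSONDecodeError:
        return None
    return value if isinstance(value, str) else None


# Made-up fragment: JSON embedded in HTML, with escaped quotes in the value.
html = (
    '<script>{"list":[]}]},"abstract":"A \\"quoted\\" word and 28% gain.",'
    '"hasRequestedAbstract":false,"lockedFields"</script>'
)

print(abstract_via_json(html))
```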

This answer does not use a regular expression, but it gets the job done. The answer is as follows:

import requests

def fetch_abstract(url = "https://www.researchgate.net/publication/328749434_TEEM_Online_Thermal-_and_Energy-Efficiency_Management_on_CPU-GPU_MPSoCs"):
    test_requests = requests.get(url)
    index = 0
    inner_count = 0
    while index < len(test_requests.text):
        # Find the next "[Show full abstract]" marker, searching from index.
        index = test_requests.text.find('[Show full abstract]</a><span class=\"lite-page-hidden', index)
        if index == -1:
            break
        inner_count += 1
        # Only the fourth occurrence of the marker is used.
        if inner_count == 4:
            #extract the abstract from here -->
            temp = test_requests.text[index-1:]
            index2 = temp.find('</span></div><a class=\"nova-e-link nova-e-link--color-blue')
            quote_index = temp.find('\">')
            abstract = test_requests.text[index + quote_index + 2 : index - 1 + index2]
            print(abstract)
        # Step past the current marker before searching again.
        index += 52

if __name__ == '__main__':
    fetch_abstract()
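
The same marker-based idea can be written more compactly for the string shown in the question. This is a minimal sketch under stated assumptions: the helper name between and the sample string are hypothetical, and the two markers are taken from the question text rather than from the HTML markers used in the answer above.

```python
def between(text, start_marker, end_marker):
    """Return the text between the first start_marker and the next
    end_marker, or None if either marker is missing."""
    start = text.find(start_marker)
    if start == -1:
        return None
    start += len(start_marker)          # skip past the opening marker
    end = text.find(end_marker, start)  # look for the end only after the start
    if end == -1:
        return None
    return text[start:end]


# Made-up sample that mimics the structure of the string in the question.
sample = '"list":[]}]},"abstract":"Results show 28% improvement.","hasRequestedAbstract":false'

print(between(sample, '"abstract":"', '","hasRequestedAbstract"'))
```

Returning None instead of raising keeps the caller's check simple when the downloaded page does not contain the expected markers.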

content[content.index("abstract:") + 9 : content.index("hasRequestedAbstract")]?

The first thing you need to know is that search() returns a match object, or None if no string matches. This means your regular expression could not find a string matching your pattern.

This depends on both keys being present in the JSON object, and both in the order posted. Evidently it works in this case, but in general I personally would work with it as JSON rather than using a regular expression.

@kde713 I get the following error --> abstract = test_pattern.findall(test_requests.text)[0] Traceback (most recent call last): File "", line 1, in IndexError: list index out of range

@TheCoder Can you update the question with the result of print(test_requests.text)? I think the result of your request is different from mine.

@kde713 Please take a look at my updated question. You can download the content from there.

@TheCoder I cannot find the abstract part in your Google Docs file. Can you highlight the target part in the document?