Python: extracting specific portions of a URL


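For the given URL:


Since no specific div or class attributes are given, how can I extract the abstract and claims sections in Python using the BeautifulSoup and requests libraries? For the abstract I can use the p tag, but what can I use for the claims section?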

I have never used BeautifulSoup before, and I haven't used Python in a long time, but something like this should work (though you should add error checking):

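# Gets the content of the claims section
# Input:
#   soup - the root soup document
def getClaimsContent(soup):
    # The text content of the claims section
    claimsContent = ""
    # The first horizontal rule from the start of the claims section
    # (find_next also covers an <hr> nested inside the header tag)
    header = soup.find(isClaimsHeader)
    firstHR = header.find_next('hr') if header is not None else None
    if firstHR is None:
        return claimsContent
    # For each sibling until the next horizontal rule...
    node = firstHR.next_sibling
    while node is not None and getattr(node, 'name', None) != 'hr':
        # Text nodes are str subclasses; tags expose get_text()
        claimsContent += str(node) if isinstance(node, str) else node.get_text()
        node = node.next_sibling
    return claimsContent

# Returns True if the "<center>...Claims...</center>" is the current tag
def isClaimsHeader(tag):
    # Could be simplified, just expanded for debugging
    if tag.name.lower() == 'center':
        text = tag.get_text().strip().lower()
        # For debugging, make sure the text is actually equal to 'claims'
        print('center text: ' + text)
        return text == 'claims'
    return False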

I hope I got everything right. Please let me know if this works.
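A quick way to try the sibling-walking idea without hitting the real page is to run it against a small snippet. The sketch below is only an illustration: the `SAMPLE_HTML` markup and the `text_between_rules` helper are hypothetical, and the real page may be laid out differently.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: a centered "Claims" header, then text between two <hr> tags.
SAMPLE_HTML = (
    "<html><body>"
    "<center><b><i>Claims</i></b></center>"
    "<hr>What is claimed is:<br>1. A method.<br>2. The method of claim 1."
    "<hr><center><b><i>Description</i></b></center>"
    "</body></html>"
)


def text_between_rules(soup, header_text="claims"):
    """Return the text between the first <hr> after the header and the next <hr>."""
    header = soup.find(
        lambda tag: tag.name == "center"
        and tag.get_text(strip=True).lower() == header_text
    )
    if header is None:
        return ""
    first_hr = header.find_next("hr")
    if first_hr is None:
        return ""
    parts = []
    for node in first_hr.next_siblings:
        # Stop at the next horizontal rule; text nodes have no tag name.
        if getattr(node, "name", None) == "hr":
            break
        # Text nodes are str subclasses; tags such as <br> expose get_text().
        parts.append(str(node) if isinstance(node, str) else node.get_text())
    return "".join(parts)


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
claims = text_between_rules(soup)
print(claims)
```

Returning an empty string when the header or rule is missing is one possible form of the error checking mentioned above; raising an exception would work just as well.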

It looks like the first <p> tag on the page contains the abstract. So, as you said, you can get it directly with soup.find('p').text.
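As a minimal sketch of that step, the snippet below pulls the first paragraph out of a stand-in document. The `SAMPLE_HTML` string and the `get_abstract` helper are assumptions made for illustration, not part of the original page or answer.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page.
SAMPLE_HTML = """
<html><body>
<center><b>Abstract</b></center>
<p>A method for compressing an enumeration space.</p>
<p>Some later paragraph.</p>
</body></html>
"""


def get_abstract(soup):
    """Return the text of the first <p> tag, or None if the page has none."""
    first_p = soup.find("p")
    return first_p.get_text(strip=True) if first_p is not None else None


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
abstract = get_abstract(soup)
print(abstract)
```

The `None` check matters because `soup.find` returns `None` when nothing matches, and calling `.text` on that would raise an `AttributeError`.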

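To get the text from the claims section, I used logic similar to the following. First, find the <i>Claims</i> tag using soup.find('i', text='Claims'). Then use the find_all_next(text=True) function to get all the text that follows this tag.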

However, this returns all the text up to the end of the page, so break out of the loop when text == 'Description'.

claims_tag = soup.find('i', text='Claims')
for text in claims_tag.find_all_next(text=True):
    if text == 'Description':
        break
    print(text)
Output:

Claims


What is claimed is: 
 1.  A computer implemented method for providing lossless compression of an enumeration space for genetic founder lines, the computer implemented method comprising obtaining
an input comprising a set of genetic founder lines and a maximum number of generations G, wherein the following is iteratively performed for h=0, 1, .  . . , G: generating a set of genetic crossing templates of a height h, wherein each of the set of
genetic crossing templates represents a binary tree, the binary tree comprising h levels representing a given generation, each of the h levels comprising a set of nodes wherein when h>0 one or more of the h levels of the binary tree correspond to at
least one cross between at least one pair of nodes in the set of nodes, wherein each of the set of genetic crossing templates comprises an array of h entries, wherein a position of an entry within the array corresponds to a level in the binary tree
represented by the genetic crossing template, each of the h entries in the array comprising a value indicating a number of leaf nodes in the set of nodes for the level in the binary tree;  and determining if at least a first genetic crossing template in
the set of genetic crossing templates is redundant with respect to a second genetic crossing template in the set of genetic crossing templates;  and based on the at least first genetic crossing template being redundant, removing the at least first
genetic crossing template from the set of genetic crossing templates, the removing creating an updated set of genetic crossing templates.

 2.  The computer implemented method of claim 1, wherein the determining comprises: comparing the array of the at least first genetic crossing template with the array of the second genetic crossing template;  determining, based on the comparing,
if the value within each entry of the array for the at least first genetic crossing template matches the value within each corresponding entry of the array for the second genetic crossing template;  and determining that the at least first genetic
crossing template is redundant with respect to the second genetic crossing template based on the value within each entry of the array for the at least first genetic crossing template matching the value within each corresponding entry of the array for the
second genetic crossing template.

 3.  The computer implemented method of claim 1, wherein at least one of the set of genetic crossing templates of height h is generated by combining a previously generated genetic crossing template of a height less than h with another previously
generated genetic crossing template of a height less than h.

 4.  The computer implemented method of claim 1, further comprising: generating a set of genetic crossing instances for each genetic crossing template in the updated set of genetic crossing templates based on the set of founder lines, wherein
each genetic crossing instance in the set of genetic crossing instances is the binary tree represented by the genetic crossing template in the updated set of genetic crossing templates with leaf nodes at each of the h levels labeled with one of the set
of genetic founder lines, wherein the set of genetic crossing instances comprises a genetic crossing instance for each of a plurality of different leaf node labeling variations based on the set of genetic founder lines.

 5.  The computer implemented method of claim 4, further comprising: determining if at least a first genetic crossing instance in the set of genetic crossing instances is redundant with respect to a second genetic crossing instance in the set of
genetic crossing instances;  and based on the at least first genetic crossing instance being redundant, removing the at least first genetic crossing instance from the set of genetic crossing instances, the removing creating an updated set of genetic
crossing instances.

 6.  The computer implemented method of claim 5, wherein the determining comprises: comparing the binary tree of the at least a first genetic crossing instance with the binary tree of the second genetic crossing instance;  determining, based on
the comparing, if the label of each leaf node in the binary tree of the at least first genetic crossing instance matches a label of a corresponding leaf node in the binary tree of the second genetic crossing instance;  and determining that the at least
first genetic crossing instance is redundant with respect to the second genetic crossing instance based on the label of each leaf node in the binary tree of the at least first genetic crossing instance matching a label of a corresponding leaf node in the
binary tree of the second genetic crossing instance.

 7.  The computer implemented method of claim 4, wherein at least one of the set of genetic crossing instances is generated by combining each of a previously generated set of crossing instances of a given genetic crossing template in the update
set of genetic crossing templates with each of a previously generated set of crossing instances of at least one other genetic crossing template in the update set of genetic crossing templates. 

Nice and clean! I came up with exactly the same idea: '\n'.join(itertools.takewhile(lambda x: x != "Description", start.find_all_next(string=True))), where start is the Claims tag.
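The one-liner from that comment can be wrapped in a small function to make it easier to try. This is only a sketch: the `SAMPLE_HTML` markup and the `claims_text` name are hypothetical, and it uses the `string=` keyword (the newer spelling of `text=`) as the comment does.

```python
import itertools

from bs4 import BeautifulSoup

# Hypothetical markup: a "Claims" header, some claim text, then a "Description" header.
SAMPLE_HTML = (
    "<html><body>"
    "<center><b><i>Claims</i></b></center><hr>"
    "What is claimed is:<br>1. A method.<br>"
    "<hr><center><b><i>Description</i></b></center><hr>"
    "Background text."
    "</body></html>"
)


def claims_text(soup, start_label="Claims", stop_label="Description"):
    """Join every string after the start tag, stopping at the stop label."""
    start = soup.find("i", string=start_label)
    if start is None:
        return ""
    strings = start.find_all_next(string=True)
    # takewhile yields strings until the first one equal to stop_label.
    kept = itertools.takewhile(lambda s: s != stop_label, strings)
    return "\n".join(kept)


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
result = claims_text(soup)
print(result)
```

Compared with the explicit loop and `break`, `takewhile` returns the text as a value instead of printing it, which is handier if the claims need further processing.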