
Python: Using Beautiful Soup to conditionally get the contents of a class

Tags: python, xml, beautifulsoup

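I want to use Beautiful Soup to find the tags whose child tag (gains or losses) is greater than 0. Then I want to print the contents of the inner tags "gains", "losses" and "band.textualrepresentation". This is basically the script I want (although this script does not work).

I ran into trouble early on: I cannot even print the contents of gains, let alone print only the contents that meet a specific criterion. My current script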

def parseLog(file):
        file = sys.argv[1]
        handler = open(file).read()
        soup = Soup(handler)
        for anytype in soup.findall('anytype'):
                gain = anytype.fetch('gains')
                print gain

parseLog(sys.argv[1])
returns

Traceback (most recent call last):
  File "./soup.py", line 13, in <module>
    parseLog(sys.argv[1])
  File "./soup.py", line 9, in parseLog
    for anytype in soup.findall('anytype'):
TypeError: 'NoneType' object is not callable

Update: the current solution

import sys
from BeautifulSoup import BeautifulSoup as Soup

def parseLog(file):
        file = sys.argv[1]
        handler = open(file).read()
        soup = Soup(handler)
        for anytype in soup(lambda x: x.name=='anytype' and (hasattr(x, 'gains') and int(x.gains.string) > 0 or hasattr(x, 'losses') and int(x.losses.string) > 0)):
                gain = anytype.gains.string
                loss = anytype.losses.string
                band = anytype.band.textualrepresentation.string
                print gain, loss, band

parseLog(sys.argv[1])
still returns an error

Traceback (most recent call last):
  File "./soup.py", line 15, in <module>
    parseLog(sys.argv[1])
  File "./soup.py", line 9, in parseLog
    for anytype in soup(lambda x: x.name=='anytype' and (hasattr(x, 'gains') and int(x.gains.string) > 0 or hasattr(x, 'losses') and int(x.losses.string) > 0)):
  File "/Users/jacob/homebrew/lib/python2.7/site-packages/BeautifulSoup.py", line 659, in __call__
    return apply(self.findAll, args, kwargs)
  File "/Users/jacob/homebrew/lib/python2.7/site-packages/BeautifulSoup.py", line 849, in findAll
    return self._findAll(name, attrs, text, limit, generator, **kwargs)
  File "/Users/jacob/homebrew/lib/python2.7/site-packages/BeautifulSoup.py", line 377, in _findAll
    found = strainer.search(i)
  File "/Users/jacob/homebrew/lib/python2.7/site-packages/BeautifulSoup.py", line 966, in search
    found = self.searchTag(markup)
  File "/Users/jacob/homebrew/lib/python2.7/site-packages/BeautifulSoup.py", line 924, in searchTag
    or (markup and self._matches(markup, self.name)) \
  File "/Users/jacob/homebrew/lib/python2.7/site-packages/BeautifulSoup.py", line 983, in _matches
    result = matchAgainst(markup)
  File "./soup.py", line 9, in <lambda>
    for anytype in soup(lambda x: x.name=='anytype' and (hasattr(x, 'gains') and int(x.gains.string) > 0 or hasattr(x, 'losses') and int(x.losses.string) > 0)):
AttributeError: 'NoneType' object has no attribute 'string'
I still get

Traceback (most recent call last):
  File "./soup.py", line 13, in <module>
    parseLog(sys.argv[1])
  File "./soup.py", line 10, in parseLog
    gain = anytype.gains.string
AttributeError: 'NoneType' object has no attribute 'string'
The code should be:

for anytype in soup(lambda x: x.name=='anytype' and (int(x.gains.string) > 0 or int(x.losses.string) > 0)):
    gain = anytype.gains.string
    loss = anytype.losses.string
    band = anytype.band.textualrepresentation.string
    print gain loss band

We need to convert the strings to numbers before performing the integer comparison, e.g. int(x.gains.string). Hope that helps.
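
To illustrate why the conversion matters, here is a minimal sketch in Python 3 (the thread itself uses Python 2). The helper name is_positive is hypothetical and not part of Beautiful Soup. It only shows that comparing strings is lexicographic, while int() either compares numerically or raises.

```python
def is_positive(text):
    """Return True if text holds an integer greater than zero.

    Hypothetical helper for illustration only; int() raises ValueError
    on malformed input instead of silently giving a wrong answer.
    """
    return int(text) > 0


# String comparison is lexicographic, so it can disagree with numeric order.
print("10" > "9")       # False: "1" sorts before "9"
print(int("10") > 9)    # True: numeric comparison

# A malformed value still compares "successfully" as a string ...
print("foo" > "0")      # True, although "foo" is not a number
# ... but fails loudly once it is converted.
try:
    is_positive("foo")
except ValueError as exc:
    print("rejected:", exc)
```

The design choice is simply to prefer an exception over a silently wrong match when the text inside a tag is not a number.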

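I would parse the whole document into a data frame before doing anything else; this may make the data-cleaning process more transparent and easier to follow.

I will use xmltojson here because I am not familiar with Beautiful Soup (although, to make it valid XML, I had to wrap the whole thing in a "document" tag):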

import xmltojson
import pandas as pd

with open(file) as f:
    j = eval(xmltojson.parse("<document> "+ f.read() + "</document>"))

df = pd.io.json.json_normalize(j['document']['anytype'])
df.columns = ['type', 'band', 'gain', 'loss', 'struct']
df[(df.gain > '0') | (df.loss > '0')][['band', 'gain', 'loss']]

      band gain loss
0  22q11.1    2    1
1  22q11.2    0    1

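Actually, you could leave the values as strings and do x.gains.string > '0'.

That is a meaningful operation, but only if the data is well formed. Otherwise, with a comparison such as 'foo' > '0', bugs become hard to find. We want errors rather than silently produced incorrect results.

I still get File "./soup.py", line 13 print gain loss band ^ SyntaxError: invalid syntax as an error with this code.

That means some of your elements do not contain gains or losses sub-elements. You can guard against it further with soup(lambda x: x.name=='anytype' and (hasattr(x, 'gains') and int(x.gains.string) > 0 or hasattr(x, 'losses') and int(x.losses.string) > 0)).

@Jacob At this point x is sure to have gains, but it is a None object. So you may need more protection: (hasattr(x, 'gains') and x.gains is not None)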
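
The guard discussed in these comments can be sketched as follows. This is not the Beautiful Soup code from the thread: it swaps in the standard-library xml.etree.ElementTree so that it runs on Python 3 without extra packages. The sample XML, the band 22q11.3, and the helper names count and positive_records are all hypothetical, and treating a missing or empty child as 0 is an assumption made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data shaped like the records discussed in the thread.
# The second record has no <gains> child and the third has an empty one.
SAMPLE = """
<document>
  <anytype>
    <band><textualrepresentation>22q11.1</textualrepresentation></band>
    <gains>2</gains>
    <losses>1</losses>
  </anytype>
  <anytype>
    <band><textualrepresentation>22q11.2</textualrepresentation></band>
    <losses>0</losses>
  </anytype>
  <anytype>
    <band><textualrepresentation>22q11.3</textualrepresentation></band>
    <gains></gains>
    <losses>3</losses>
  </anytype>
</document>
"""


def count(record, tag):
    """Return the integer in record/<tag>, or 0 if the child is missing or empty."""
    text = record.findtext(tag)  # None when the child element is absent
    if text is None or not text.strip():
        return 0  # assumption: absent or empty counts as zero
    return int(text)  # still raises ValueError on non-numeric text


def positive_records(xml_text):
    """Yield (gains, losses, band) for records with gains > 0 or losses > 0."""
    root = ET.fromstring(xml_text.strip())
    for record in root.iter("anytype"):
        gains, losses = count(record, "gains"), count(record, "losses")
        if gains > 0 or losses > 0:
            band = record.findtext("band/textualrepresentation", default="")
            yield gains, losses, band


for row in positive_records(SAMPLE):
    print(*row)
```

Checking for the missing child before reading its text is the same idea as the x.gains is not None test suggested above, just expressed once in a helper rather than inside the lambda.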
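I get this error when using this code:
Traceback (most recent call last): File "./script.py", line 5, in <module> with open(file) as f: TypeError: coercing to Unicode: need string or buffer, type found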
for anytype in soup(lambda x: x.name=='anytype' and (hasattr(x, 'gains'))):
        gain = anytype.gains.string
        print gain