Python 3.x: Using Selenium and XPath to bypass/skip table rows that contain cells with no text

python-3.x selenium xpath

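I'm sure the answer to this question is simple, but after hours of research and testing I still haven't solved it.

Here is the problem. I recently started using Selenium to collect information from a website that builds a table dynamically. During testing I ran into issues when reviewing the collected data. After some checking, I found that some table fields have no text, which causes the errors that show up in the second part of my code. I decided to bypass these table entries in code, but the errors still occur, so my code must be incorrect.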

# I'm obtaining the <td> tags in the table
# with this.
td = row.find_elements_by_xpath(".//td")

# I slice out the desired items this way
# This outputs a <class 'str'>
td[3].text

# I found that this item has no text in some 
# table rows, which causes issues. I have tried 
# using the following to catch and bypass these
# rows

if not td[3].text:
   pass
else:
  # run some code
  # harvest the entire row


if len(td[3].text) != 0:
  # run some code
  # harvest the entire row
else:
  pass 


if len(td[3].text) == 11:
  # run some code
  # harvest the entire row
else:
  pass 


if td[3].text != '':
  # run some code
  # harvest the entire row
else:
  pass 

# this element is the one that might be empty
td_time = row.find_element_by_xpath(".//td[4]/span/time")
if (len(td_time.text)) != 11:
   print ('no')
elif (len(td_time.text)) == 11:
   print ('yes')

The table I'm scraping has five columns. The last column contains a date, and that date is missing in some rows that hold older data.

# Example with date
<td headers="th-date th-4206951" class="td-date">
   <b class="cell-label ng-binding">Publish Date</b>
   <span class="cell-content"><time datetime="2019-06-05T00:00:00Z" class="ng-binding">04 Jun 2019</time></span>
</td>

# Example without date
<td headers="th-date th-2037023" class="td-date">
  <b class="cell-label ng-binding">Publish Date</b>
  <span class="cell-content"><time datetime="" class="ng-binding"></time></span>
</td>

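None of these code samples catches the empty text block, which causes problems when I post-process the collected data.

So my question is: how do I bypass elements obtained with XPath that have no text?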

I would simply check the elements as follows:

rows = driver.find_elements_by_xpath("//table[starts-with(@id,'mytable')]/tbody/tr[not(td[string-length(normalize-space(text()))=0])]")
for r in rows:
    columns = r.find_elements_by_tag_name('td')
    for col in columns:
        print (col.text)

Sample HTML (only the cell text survives):

1. FR
2. SR
TR
4. (check for a cell that contains only spaces)
5. All Rows
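
As an alternative to filtering inside the XPath expression, the blank-cell check can be done in Python after the cells are fetched. One reason to consider this: in XPath 1.0, passing `text()` to `normalize-space()` evaluates only the first direct text node of each `<td>`, so text nested in child elements such as `<span>` or `<time>` may not be taken into account. The sketch below is a minimal illustration, not the answerer's method. The function name `harvest_rows` and the parameter `required_index` are hypothetical, and it assumes Selenium 4's `find_elements(by, value)` call, where the string `"xpath"` is the value behind `By.XPATH`.

```python
def harvest_rows(rows, required_index=3):
    """Return the stripped cell texts of every row whose required cell has text.

    `rows` is any iterable of objects that offer Selenium's
    find_elements(by, value) method, e.g. <tr> WebElements.
    `required_index` is the 0-based position of the cell that may be blank
    (hypothetical default: 3, matching td[3] in the question).
    """
    harvested = []
    for row in rows:
        # "xpath" is the string value of By.XPATH in Selenium 4.
        cells = row.find_elements("xpath", ".//td")
        texts = [cell.text.strip() for cell in cells]
        # Skip rows that are too short or whose required cell is blank.
        if len(texts) <= required_index or not texts[required_index]:
            continue
        harvested.append(texts)
    return harvested
```

With a live driver this could be called as `harvest_rows(driver.find_elements("xpath", "//table/tbody/tr"))`, where the table locator is an assumption that would need adjusting to the real page.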

Try

"//td/*[text()]"

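You are suggesting: td = row.find_elements_by_xpath(".//td/*[text()]"). Yes, based on my limited knowledge of XPath, that might work.

Unfortunately, your suggestion did not work. I'll do some research based on it.

Can you post some HTML for at least part of the table, showing td[3] with and without text?

Thanks for your answer. I'm currently running some tests to see how it works with the site I'm scraping. I may have to specify each td class, because the code still produces issues after using td = row.find_elements_by_xpath(".//td"). I'm working on the latter now.

I still have empty strings in my result set. I'm currently trying to make the XPath lookups as precise as possible. For example, I tried this, but it failed: row.find_element_by_xpath("//td[starts-with(@class,'th-synopsis')]/span").text. Any guidance is greatly appreciated.

So basically, you only want to get the text if all the td elements in a row have data, right?

Correct. I'm new to Selenium and need to look at this module's documentation in more detail.

Let me update the answer so that it only gets the rows without any empty cells.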
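
To make the two cases from the question concrete without a browser, the sketch below parses the two sample `<td>` snippets with the standard library's `xml.etree.ElementTree` and reports whether the nested `<time>` element holds any text. This is a stand-in for the Selenium lookup, not the thread's solution: the helper name `publish_date` is hypothetical, and ElementTree returns `None` for an empty element's text, whereas Selenium's `.text` returns an empty string.

```python
import xml.etree.ElementTree as ET

# The two sample cells from the question, copied as-is.
WITH_DATE = """<td headers="th-date th-4206951" class="td-date">
   <b class="cell-label ng-binding">Publish Date</b>
   <span class="cell-content"><time datetime="2019-06-05T00:00:00Z" class="ng-binding">04 Jun 2019</time></span>
</td>"""

WITHOUT_DATE = """<td headers="th-date th-2037023" class="td-date">
  <b class="cell-label ng-binding">Publish Date</b>
  <span class="cell-content"><time datetime="" class="ng-binding"></time></span>
</td>"""


def publish_date(td_markup):
    """Return the date text inside <span><time>, or None when it is blank."""
    td = ET.fromstring(td_markup)
    time_el = td.find("./span/time")
    if time_el is None:
        return None
    # An empty element has text None in ElementTree, so fall back to "".
    text = (time_el.text or "").strip()
    return text or None


for label, markup in (("with date", WITH_DATE), ("without date", WITHOUT_DATE)):
    print(label, "->", publish_date(markup))
```

The populated sample yields an 11-character string, which is consistent with the `len(...) == 11` check tried in the question. Testing for blank text is likely more robust than testing for an exact length.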