CSS selection in Python BeautifulSoup

I have a page that contains:

    <td class="tablec" width="50%">
        <div class="atomic"><strong>Date:</strong> Tuesday, October 31, 2017 At 08:00</div>
        <div class="atomic"><strong>Duration:</strong> 1 day</div>
    </td>
But the result is empty. Can you give me some hints?

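If you are using bs4, the correct way to extract the text from a tag is the .text attribute (or the equivalent .get_text() method). Note that .contents returns a list of child nodes rather than text, and .string returns None when a tag has more than one child, which is the case for these <div> elements.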
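A minimal sketch comparing these attributes on one row of the sample markup from the question (the variable names are only illustrative):

```python
from bs4 import BeautifulSoup

# One row copied from the markup in the question
html = '<div class="atomic"><strong>Date:</strong> Tuesday, October 31, 2017 At 08:00</div>'
div = BeautifulSoup(html, "html.parser").div

# .text joins every text node inside the tag, including the one inside <strong>
text_value = div.text
# .string is None because the <div> has two children: <strong> and a text node
string_value = div.string
# .string does work on <strong>, whose only child is a single text node
label = div.strong.string
# .contents is a list of child nodes, not a string
children = div.contents

print(text_value)     # Date: Tuesday, October 31, 2017 At 08:00
print(string_value)   # None
print(label)          # Date:
print(len(children))  # 2
```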

    for line in soup.select("td.tablec > div.atomic"):
        print(line.text)

    # Output:
    # Date: Tuesday, October 31, 2017 At 08:00
    # Duration: 1 day
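For reference, a self-contained version of the loop above that parses the sample markup from the question directly, assuming the built-in "html.parser" backend. In the real script the HTML would come from the downloaded page rather than a string literal:

```python
from bs4 import BeautifulSoup

# The markup from the question, wrapped in <table><tr> so the fragment is valid HTML
html = """
<table><tr>
<td class="tablec" width="50%">
    <div class="atomic"><strong>Date:</strong> Tuesday, October 31, 2017 At 08:00</div>
    <div class="atomic"><strong>Duration:</strong> 1 day</div>
</td>
</tr></table>
"""

soup = BeautifulSoup(html, "html.parser")

# "td.tablec > div.atomic" matches <div class="atomic"> elements that are
# direct children of <td class="tablec">
lines = [div.text for div in soup.select("td.tablec > div.atomic")]
for line in lines:
    print(line)
```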
If you only need the value from the last line (the duration), you can use:

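    # Take the last matching <div>, get its text, and drop the label
    # before the first whitespace: "Duration: 1 day" -> "1 day"
    dur = soup.select("td.tablec > div.atomic")[-1].text.split(None, 1)[-1]
    print(dur)

    # Output: 1 day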
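Here split(None, 1) splits on the first run of whitespace only, so the label ("Duration:") is separated from a value that itself contains spaces, and [-1] keeps the value. A sketch that applies the same idea to every row and collects the results in a dict. As a design choice (an alternative to the answer's split, not part of it), it takes the key from the <strong> tag, so a multi-word label would still work. It assumes every row has a <strong> label, as in the sample:

```python
from bs4 import BeautifulSoup

html = """
<table><tr>
<td class="tablec" width="50%">
    <div class="atomic"><strong>Date:</strong> Tuesday, October 31, 2017 At 08:00</div>
    <div class="atomic"><strong>Duration:</strong> 1 day</div>
</td>
</tr></table>
"""
soup = BeautifulSoup(html, "html.parser")

# split(None, 1) splits once, on the first run of whitespace
print("Duration: 1 day".split(None, 1))  # ['Duration:', '1 day']

details = {}
for div in soup.select("td.tablec > div.atomic"):
    # Key: the text of <strong>, without the trailing colon
    label = div.strong.get_text(strip=True).rstrip(":")
    # Value: the text node immediately after </strong>, trimmed
    value = div.strong.next_sibling.strip()
    details[label] = value

print(details["Duration"])  # 1 day
```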

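OK, thanks. But it still doesn't print anything. It seems soup.select("td.tablec > div.atomic") returns nothing.

@bLAZ When I copy your snippet into my terminal, it works fine. My guess is that you are scraping a site that loads its content dynamically with JS.

@COLDSPEED Hmm, is it a problem if it's loaded with JS? print(soup_cal) shows the page source.

@bLAZ Yes, bs4 only scrapes static content. You need a webdriver for dynamic content. Take a look at selenium.

@COLDSPEED But with soup_cal = BeautifulSoup(page_cal.content, 'html.parser') I can get that page's content.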
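One way to test the JavaScript explanation from these comments: print(soup_cal) showing some page source does not prove that the target rows are in it. The hypothetical helper below (diagnose and SELECTOR are illustrative names, not part of bs4) looks for the rows in the HTML that was actually downloaded. It assumes page_cal is a requests Response, so the decoded text is available as page_cal.text; the substring check needs a str, not bytes:

```python
from bs4 import BeautifulSoup

# Selector from the answer above
SELECTOR = "td.tablec > div.atomic"


def diagnose(html: str, selector: str = SELECTOR) -> str:
    """Hypothetical helper: report whether the target rows exist in downloaded HTML."""
    soup = BeautifulSoup(html, "html.parser")
    if soup.select(selector):
        return "found"
    if "tablec" not in html:
        # The class never appears in the static HTML, so the rows are most likely
        # added later by JavaScript; a browser-driven tool such as Selenium is needed
        return "missing from static HTML"
    # The class is in the page but the selector does not match: inspect the real markup
    return "present but selector does not match"


# A page whose content is filled in by a script: the table is not in the source
js_page = '<html><body><div id="app"></div><script src="app.js"></script></body></html>'
print(diagnose(js_page))  # missing from static HTML

# In the question's script this would be: print(diagnose(page_cal.text))
```

The substring check is deliberately crude; it only narrows down whether to fix the selector or switch to a tool that runs the page's JavaScript.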