Python: Extract/scrape text from a href inside p inside div

Tags: python, html, web-scraping, beautifulsoup, screen-scraping

I am using BeautifulSoup (bs4) with Python, and I currently have this structure:

<div class="class1">
  <a class="name" href="/doctor/dr-xxxxxxxxx"><h2>Dr. XX XXXX</h2></a>
  <p class="specialties"><a href="/location/abcd">ab cd</a></p>
  <p class="doc-clinic-name">
     <a class="light_grey link" href="/clinic/fff">f ff</a>
  </p>
</div>


<div class="class2">
  <p class="locality">
    <a class="link grey" href="/location/doctors/ccc">c cc</a>
  </p>
  <p class="fees">INR 999</p>
  <div class="timings">
       <p><span class="strong">MON-SAT</span><br/><span>11:00AM-1:00PM</span>                                   <span>6:00PM-8:00PM</span></p>
  </div>
  <div class="clear"></div>
</div>
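So basically post and x contain div class1 and div class2. The information I want to extract is:

Dr. XXXXXX abcd fff ccc INR 999 MON-SAT 11:00AM-1:00PM

How do I navigate within the post and x variables to get the required information? Thanks.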

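Edit

I added spaces in the HTML. Is it possible to generate a CSV in this format without damaging the spaces?
Dr. XX XXXX,ab cd,f ff,c cc,INR 999,MON-SAT 11:00AM-1:00PM

First, your indentation looks wrong. Second, I don't think you need a for loop when using find, since it only returns the first match.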
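To illustrate the difference between find and find_all, here is a minimal sketch. It assumes a trimmed copy of the question's markup is available as a string (the `html` variable is an assumption, not part of the original post) and uses the standard-library `html.parser` backend.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the question's markup (assumed to be available as a string).
html = """
<div class="class1">
  <a class="name" href="/doctor/dr-xxxxxxxxx"><h2>Dr. XX XXXX</h2></a>
  <p class="specialties"><a href="/location/abcd">ab cd</a></p>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# find() returns the first matching Tag (or None), so no loop is needed.
first_div = soup.find("div", {"class": "class1"})
first_link = first_div.find("a")

# find_all() returns every match, which is the thing worth iterating over.
all_links = first_div.find_all("a")

first_text = first_link.get_text(strip=True)
link_count = len(all_links)
print(first_text, link_count)
```

Looping only makes sense over the result of find_all; a single find call already gives you one tag to work with.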

If you just want the text of the links, you can try:

for link in soup.find("div", {"class": "class1"}).find_all("a"):
    print(link.text)
Or, if you want the link targets themselves:

for link in soup.find("div", {"class": "class1"}).find_all("a"):
    print(link.get("href"))
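Also note the way the class is searched for here: by passing a dict to the find method. (Edit: I suspect there are other ways to do this. This is just the way I learned!)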
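For what it's worth, Beautiful Soup does offer a few equivalent ways to match on a class. The sketch below compares them on a one-line copy of the question's markup (the `html` string is an assumption for illustration); all four lookups should land on the same tag.

```python
from bs4 import BeautifulSoup

# One-line copy of the relevant markup from the question (assumed input).
html = ('<div class="class1"><a class="name" href="/doctor/dr-xxxxxxxxx">'
        '<h2>Dr. XX XXXX</h2></a></div>')
soup = BeautifulSoup(html, "html.parser")

# 1. Attribute dict, as used in this answer.
by_dict = soup.find("a", {"class": "name"})

# 2. The class_ keyword (trailing underscore because "class" is reserved in Python).
by_keyword = soup.find("a", class_="name")

# 3. A plain string as the second positional argument is treated as a CSS class.
by_positional = soup.find("a", "name")

# 4. A CSS selector, which also expresses the enclosing div in one step.
by_selector = soup.select_one("div.class1 a.name")

hrefs = [tag.get("href") for tag in (by_dict, by_keyword, by_positional, by_selector)]
print(hrefs)
```

Which form to use is mostly a matter of taste; the selector form tends to read more compactly once the nesting gets deeper.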

So you can be as specific as you need to be:

doctorlink = soup.find("div", {"class": "class1"}).find("a", {"class": "name"})
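One caveat with chaining: find returns None when nothing matches, so a chained call fails with an AttributeError if the outer div is absent. A defensive variant might look like the sketch below; `find_doctor_link` is a hypothetical helper name introduced here, not something from the original answer.

```python
from bs4 import BeautifulSoup

# One-line copy of the relevant markup from the question (assumed input).
html = ('<div class="class1"><a class="name" href="/doctor/dr-xxxxxxxxx">'
        '<h2>Dr. XX XXXX</h2></a></div>')


def find_doctor_link(soup):
    """Hypothetical helper: return the a.name tag inside div.class1, or None."""
    container = soup.find("div", {"class": "class1"})
    if container is None:
        # No enclosing div, so there is nothing to search inside.
        return None
    return container.find("a", {"class": "name"})


doctorlink = find_doctor_link(BeautifulSoup(html, "html.parser"))
missing = find_doctor_link(BeautifulSoup("<p>no match</p>", "html.parser"))

print(doctorlink.get("href"))
print(missing)
```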

Is there a way to introduce commas between the different entries, for CSV format?
>>> ' '.join(soup.find("div", "class1").getText().split())
u'Dr. XXXXXX abcd fff'
>>> ' '.join(soup.find("div", "class2").getText().split())
u'ccc INR 999 MON-SAT11:00AM-1:00PM 6:00PM-8:00PM'
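For the CSV format requested in the question's edit, one possible approach is to pull each field out separately and let the standard csv module join them, so spaces inside a field such as "ab cd" survive. This is only a sketch under a few assumptions: the markup is held in a string named `html`, the `text_of` helper is hypothetical, and only the first time slot is kept because that is what the requested output shows.

```python
import csv
import io

from bs4 import BeautifulSoup

# The question's markup (assumed to be available as a string).
html = """
<div class="class1">
  <a class="name" href="/doctor/dr-xxxxxxxxx"><h2>Dr. XX XXXX</h2></a>
  <p class="specialties"><a href="/location/abcd">ab cd</a></p>
  <p class="doc-clinic-name">
     <a class="light_grey link" href="/clinic/fff">f ff</a>
  </p>
</div>
<div class="class2">
  <p class="locality">
    <a class="link grey" href="/location/doctors/ccc">c cc</a>
  </p>
  <p class="fees">INR 999</p>
  <div class="timings">
    <p><span class="strong">MON-SAT</span><br/><span>11:00AM-1:00PM</span> <span>6:00PM-8:00PM</span></p>
  </div>
</div>
"""


def text_of(tag):
    """Hypothetical helper: tag text with whitespace runs collapsed to one space."""
    return " ".join(tag.get_text(" ").split())


soup = BeautifulSoup(html, "html.parser")
div1 = soup.find("div", {"class": "class1"})
div2 = soup.find("div", {"class": "class2"})

# The day range and the first time slot are the first two spans under div.timings.
timing_spans = div2.find("div", {"class": "timings"}).find_all("span")

row = [
    text_of(div1.find("a", {"class": "name"})),
    text_of(div1.find("p", {"class": "specialties"})),
    text_of(div1.find("p", {"class": "doc-clinic-name"})),
    text_of(div2.find("p", {"class": "locality"})),
    text_of(div2.find("p", {"class": "fees"})),
    text_of(timing_spans[0]) + " " + text_of(timing_spans[1]),
]

# csv.writer adds the commas and quotes a field only if it contains a delimiter.
buffer = io.StringIO()
csv.writer(buffer).writerow(row)
csv_line = buffer.getvalue().strip()
print(csv_line)
```

Writing to a real file instead of io.StringIO would work the same way; opening it with newline="" is the usual recommendation for the csv module.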