
Python web scraping returns 'None'

Tags: python, python-3.x, pandas, beautifulsoup, jupyter-notebook

I'm trying to pull event data from a site.

The code below runs and finds the event data and event links on the main landing page:

import pandas as pd
import bs4 as bs
import urllib.request

source = urllib.request.urlopen('https://10times.com/losangeles-us/technology/conferences').read()
soup = bs.BeautifulSoup(source,'html.parser')

aa = []
bb = []

#---Get Event Data---
table = soup.find('tbody')
table_rows = table.find_all('tr') #find table rows (tr)
for x in table_rows:   
    data = x.find_all('td')  #find table data
    row = [x.text for x in data]
    if len(row) > 2: #Excludes rows with only event name/link, but no data.
        aa.append(row)
df_event = pd.DataFrame(aa, columns=['Date', 'Event Name', 'Venue', 'Description', 'Type', 'Interested/Following Count'])

#---Get Links---
h2 = soup.find_all('h2')
for i in h2:
    links = i.a['href']
    bb.append(links)
df_link = pd.DataFrame(bb)
df_link.columns = ['Links']

#---Combines dfs---#
df = pd.concat([df_event,df_link],sort=False, axis=1)
df.index += 1       

#--Export to HTML---
df.to_html('10times_Scrape.html',render_links=True)
I now want to go to each of the event links I pulled and grab the address / full event description from that page. Example link:

Both the event description and the address can be found in p tags. However, when I read the links I just get nothing back. Am I using the correct div class below? What am I doing wrong? I want to look in 'class': 'col-md-6' and extract the address.

#---Get Address---
for i in bb:
    soup2 = bs.BeautifulSoup(i, 'html.parser')
    text2 = soup2.find('div', attrs={'class':'col-md-6'})
    print(text2)

It seems you missed the urllib request that fetches each inner link's page:

#---Get Address---
for i in bb:
    inner_source = urllib.request.urlopen(i).read()  # fetch the event page's HTML first

    soup2 = bs.BeautifulSoup(inner_source, 'html.parser')
    text2 = soup2.find('div', 'col-md-6')  # class given as the second positional argument
    print(text2)
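
If you then want the address text itself rather than the whole div, here is a minimal follow-up sketch (get_address is a hypothetical helper, not part of the original answer; it assumes the address sits in p tags inside the first col-md-6 div, as described in the question):

import time
import urllib.request
import bs4 as bs

def get_address(url):
    # Fetch the event page first, then parse the HTML (same fix as above).
    inner_source = urllib.request.urlopen(url).read()
    soup2 = bs.BeautifulSoup(inner_source, 'html.parser')
    div = soup2.find('div', 'col-md-6')
    if div is None:  # guard: not every page is guaranteed to have this div
        return None
    # Join the text of all p tags inside the div (assumed to hold the address).
    return ' '.join(p.get_text(strip=True) for p in div.find_all('p'))

for link in bb:
    print(get_address(link))
    time.sleep(1)  # pause between requests to be polite to the site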
With find / find_all you can also pass the class as the second positional argument. Also note that find returns only the first match, even when there are many.
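
A quick illustration of both points (reusing soup2 from the snippet above):

# 'col-md-6' as the second positional argument is shorthand for the class filter
first_div = soup2.find('div', 'col-md-6')     # first match only, or None
all_divs = soup2.find_all('div', 'col-md-6')  # every match, returned as a list
print(len(all_divs))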