Python scraping a table from a website: cannot address the correct table
I am a Python newbie, just starting to learn it, and have the following problem: I want to scrape portfolio data from a website (scroll down on the page and click "Portfolio"), but I cannot address the correct tr class "c-portfolio"; I always end up with the values of the first table on the right, "Erstemission 20.09.2019". I have tried more than 15 web tutorials and questions/answers from reddit/stackoverflow but could not solve it, so I suspect something peculiar about this site. Below is my most advanced code. Any suggestions are greatly appreciated! :)
Best, Julian
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
from urllib.request import urlopen
from bs4 import BeautifulSoup
url ='https://www.wikifolio.com/de/de/w/wffalkinve'
html = urlopen(url)
soup = BeautifulSoup(html, 'lxml')
type(soup)
soup.find_all('tr')
# Print the first 10 rows for sanity check
rows = soup.find_all('tr')
print(rows[:10])
for row in rows:
    row_td = row.find_all('td')
print(row_td)
type(row_td)
str_cells = str(row_td)
cleantext = BeautifulSoup(str_cells, "lxml").get_text()
print(cleantext)
import re
list_rows = []
for row in rows:
    cells = row.find_all('td')
    str_cells = str(cells)
    clean = re.compile('<.*?>')
    clean2 = (re.sub(clean, '', str_cells))
    list_rows.append(clean2)
print(clean2)
type(clean2)
df = pd.DataFrame(list_rows)
df.head(10)
df1 = df[0].str.split(',', expand=True)
df1.head(10)
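The regex tag-stripping step in the loop above can be seen in isolation on one static row (the cell values here are made up for illustration):

```python
import re

# One stringified row of <td> cells, as produced by str(row.find_all('td')).
row_html = '[<td>ACME AG</td>, <td>12,3 %</td>]'
# The same pattern as above: non-greedily match any HTML tag and drop it.
clean = re.compile('<.*?>')
stripped = re.sub(clean, '', row_html)
print(stripped)  # [ACME AG, 12,3 %]
```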
Edit: an easier way to find the corresponding table: tr class c-portfolio
If the content you are trying to scrape is loaded via JavaScript, the Pandas/BeautifulSoup solutions will not work. I suggest a Selenium headless-Chrome solution.

I will give it a try; right now I do not know what Selenium is :D Thank you very much! Do you or anyone else know of resources for trying Selenium on tables? I could not find a suitable solution, but I may look at the related answers.
Another attempt:
# Create empty list
col = []
i = 0
# For each row, store each first element (header) and an empty list
# (tr_elements comes from an earlier lxml step, e.g. tr_elements = doc.xpath('//tr'))
for t in tr_elements[0]:
    i += 1
    name = t.text_content()
    print('%d: %s' % (i, name))
    col.append((name, []))
from bs4 import BeautifulSoup
import requests
a = requests.get("https://www.wikifolio.com/de/de/w/wffalkinve")
soup = BeautifulSoup(a.text, 'lxml')
# searching for the rows directly
rows = soup.find_all('tr', {'class': 'c-portfolio'})
print(rows[:100])
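Note that if the table is built by JavaScript, this `rows` list may come back empty from the server-delivered HTML. Once rows are actually retrieved (by whatever route), their cell texts can be assembled into a DataFrame; the column names and sample values below are assumptions for illustration, not the site's actual headers:

```python
import pandas as pd

# Hypothetical parsed rows: one list of cell texts per c-portfolio <tr>.
rows = [["ACME AG", "12,3 %"], ["Foo SE", "4,5 %"]]
df = pd.DataFrame(rows, columns=["Name", "Anteil"])
# Turn "12,3 %" into a float: drop the unit, swap the German decimal comma.
df["Anteil"] = (df["Anteil"].str.replace(" %", "", regex=False)
                            .str.replace(",", ".", regex=False)
                            .astype(float))
print(df)
```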