
Scraping a table from a website with Python: cannot address the correct table


I am a Python newbie and have only just started learning it, but I have the following problem: I want to scrape the portfolio data from a website (see the URL used in the code below; scroll down and click on "Portfolio"). However, I cannot address the correct tr class "c-portfolio" and always end up with the values of the first table on the right, "Erstemission 20.09.2019".

I have tried more than 15 web tutorials and questions/answers on reddit/stackoverflow but could not solve it; I suspect there is something special about this site. Below is my most advanced code.

I would be grateful for any suggestions! :)

Best, Julian

Other attempts:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

from urllib.request import urlopen
from bs4 import BeautifulSoup

url ='https://www.wikifolio.com/de/de/w/wffalkinve'
html = urlopen(url)

soup = BeautifulSoup(html, 'lxml')
type(soup)



soup.find_all('tr')

# Print the first 10 rows for sanity check
rows = soup.find_all('tr')
print(rows[:10])

for row in rows:
    row_td = row.find_all('td')
print(row_td)
type(row_td)


str_cells = str(row_td)
cleantext = BeautifulSoup(str_cells, "lxml").get_text()
print(cleantext)

import re

list_rows = []
for row in rows:
    cells = row.find_all('td')
    str_cells = str(cells)
    clean = re.compile('<.*?>')
    clean2 = (re.sub(clean, '',str_cells))
    list_rows.append(clean2)
print(clean2)
type(clean2)

df = pd.DataFrame(list_rows)
df.head(10)

df1 = df[0].str.split(',', expand=True)
df1.head(10)
Edit: to make the relevant table easier to find: its rows use the tr class "c-portfolio".
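
A minimal sketch of addressing those rows directly via BeautifulSoup's CSS-selector interface (the URL and the class name are taken from the question's code; this only works if the rows are present in the static HTML):

from bs4 import BeautifulSoup
import requests

# Fetch the page (URL taken from the question's code)
resp = requests.get("https://www.wikifolio.com/de/de/w/wffalkinve")
soup = BeautifulSoup(resp.text, "lxml")

# select() takes a CSS selector; "tr.c-portfolio" matches <tr> elements
# whose class attribute contains "c-portfolio"
portfolio_rows = soup.select("tr.c-portfolio")
print(len(portfolio_rows))  # 0 here suggests the table is rendered by JavaScript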


If the content you are trying to scrape is loaded via JavaScript, a Pandas/BeautifulSoup solution will not work. I suggest a headless Selenium + Chrome setup.

I'll give it a try; right now I don't even know what Selenium is :D Thanks a lot! Do you or anyone else know of resources for trying Selenium on tables? I could not find a suitable solution, but I might look at related answers.
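
The comment above suggests Selenium with headless Chrome; here is a minimal sketch of that idea (assuming Selenium 4 with a local Chrome install; the "c-portfolio" row class and the URL are taken from the question):

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By

options = Options()
options.add_argument("--headless=new")  # run Chrome without a visible window

driver = webdriver.Chrome(options=options)
try:
    driver.get("https://www.wikifolio.com/de/de/w/wffalkinve")
    driver.implicitly_wait(10)  # give the JavaScript-rendered table time to appear
    rows = driver.find_elements(By.CSS_SELECTOR, "tr.c-portfolio")
    for row in rows:
        print(row.text)
finally:
    driver.quit()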

# This snippet assumes an lxml-based parse in which tr_elements holds the
# page's <tr> elements, e.g. obtained like this:
import requests
from lxml import html

page = requests.get('https://www.wikifolio.com/de/de/w/wffalkinve')
doc = html.fromstring(page.content)
tr_elements = doc.xpath('//tr')

# Create an empty list of (header, values) pairs
col = []
i = 0

# For the first row, store each cell's text (the header) together with an empty list
for t in tr_elements[0]:
    i += 1
    name = t.text_content()
    print('%d: %s' % (i, name))
    col.append((name, []))
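
A hedged continuation of that header-and-columns pattern, as a sketch only: it assumes tr_elements comes from the lxml parse above and that the data rows have the same number of cells as the header row:

# Fill each column list from the remaining rows
for row in tr_elements[1:]:
    # skip rows whose cell count does not match the header row
    if len(row) != len(col):
        continue
    for idx, cell in enumerate(row.iterchildren()):
        col[idx][1].append(cell.text_content().strip())

# Turn the (header, values) pairs into a pandas DataFrame
import pandas as pd
df = pd.DataFrame({name: values for name, values in col})
print(df.head())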
from bs4 import BeautifulSoup
import requests
a = requests.get("https://www.wikifolio.com/de/de/w/wffalkinve")
soup = BeautifulSoup(a.text, 'lxml')
# searching for the rows directly
rows = soup.find_all('tr', {'class': 'c-portfolio'})
print(rows[:100])
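
If those rows do come back non-empty, a small follow-up sketch (the portfolio table's column layout is not shown in the question, so the DataFrame below keeps default integer column names) could collect the cell text:

import pandas as pd

# Collect the text of each <td> cell for every matched portfolio row
data = [[td.get_text(strip=True) for td in row.find_all('td')] for row in rows]
df = pd.DataFrame(data)
print(df.head())

If rows is empty, the portfolio table is most likely rendered by JavaScript, in which case the Selenium approach suggested in the comments above applies.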