Extracting text from a site with Python


I am trying to extract some of the text from this page, but I am stuck:

import pandas as pd
import requests
from bs4 import BeautifulSoup

res = requests.get('https://www.legit.ng/1087216-igbo-proverbs-meaning.html')
soup = BeautifulSoup(res.content, 'html.parser')

data = []
for i in soup.findAll('div'):
    # find() returns only the first blockquote inside each div
    name = i.find('blockquote')
    if name:
        data.append(name.text.split('–')[1])

data
Result:

[' Gidi gidi bụ ugwu eze.',
 ' Gidi gidi bụ ugwu eze.',
 ' Gidi gidi bụ ugwu eze.',
 ' Gidi gidi bụ ugwu eze.',
 ' Gidi gidi bụ ugwu eze.',
 ' Gidi gidi bụ ugwu eze.']

I only want the blockquote elements that contain the Igbo proverbs, not the English meanings given on the site.

Each div contains many blockquote elements, so you should use find_all('blockquote') to get all of them. Currently, you get only the first blockquote from each div.

import requests
from bs4 import BeautifulSoup

res = requests.get('https://www.legit.ng/1087216-igbo-proverbs-meaning.html')
soup = BeautifulSoup(res.content, 'html.parser')

data = []
for div in soup.find_all('div'):
    for block in div.find_all('blockquote'):
        text = block.text
        if text.startswith('Igbo Proverb – '):
            #name = text.split('–')[1].strip()
            name = text[15:]  # len('Igbo Proverb – ') == 15
            data.append(name)

for item in data:
    print(item)
Result:

Gidi gidi bụ ugwu eze.
Otu onye tuo izu, o gbue ochu
Oge adighi eche mmadu
Ihe di woro ogori azuala na ahia.
O bulu na i taa m aru n'ike, ma i zeghi nshi; mu taa gi aru n'isi, agaghi m ezere uvulu.
Ihere adịghị eme onye ara ka ọ na-eme ụmụ-nna ya.
Nwunye awo si na di atoka uto, ya jiri nuta nke ya kworo ya n'azu.
Nwaanyi muta ite ofe mmiri mmiri, di ya amuta ipi utara aka were suru ofe.
Gidi gidi bụ ugwu eze.
Otu onye tuo izu, o gbue ochu
Oge adighi eche mmadu
Ihe di woro ogori azuala na ahia.
O bulu na i taa m aru n'ike, ma i zeghi nshi; mu taa gi aru n'isi, agaghi m ezere uvulu.
Ihere adịghị eme onye ara ka ọ na-eme ụmụ-nna ya.
Nwunye awo si na di atoka uto, ya jiri nuta nke ya kworo ya n'azu.
Nwaanyi muta ite ofe mmiri mmiri, di ya amuta ipi utara aka were suru ofe.
Gidi gidi bụ ugwu eze.
Otu onye tuo izu, o gbue ochu
Oge adighi eche mmadu
Ihe di woro ogori azuala na ahia.
O bulu na i taa m aru n'ike, ma i zeghi nshi; mu taa gi aru n'isi, agaghi m ezere uvulu.
Ihere adịghị eme onye ara ka ọ na-eme ụmụ-nna ya.
Nwunye awo si na di atoka uto, ya jiri nuta nke ya kworo ya n'azu.
Nwaanyi muta ite ofe mmiri mmiri, di ya amuta ipi utara aka were suru ofe.
Gidi gidi bụ ugwu eze.
Otu onye tuo izu, o gbue ochu
Oge adighi eche mmadu
Ihe di woro ogori azuala na ahia.
O bulu na i taa m aru n'ike, ma i zeghi nshi; mu taa gi aru n'isi, agaghi m ezere uvulu.
Ihere adịghị eme onye ara ka ọ na-eme ụmụ-nna ya.
Nwunye awo si na di atoka uto, ya jiri nuta nke ya kworo ya n'azu.
Nwaanyi muta ite ofe mmiri mmiri, di ya amuta ipi utara aka were suru ofe.
Gidi gidi bụ ugwu eze.
Otu onye tuo izu, o gbue ochu
Oge adighi eche mmadu
Ihe di woro ogori azuala na ahia.
O bulu na i taa m aru n'ike, ma i zeghi nshi; mu taa gi aru n'isi, agaghi m ezere uvulu.
Ihere adịghị eme onye ara ka ọ na-eme ụmụ-nna ya.
Nwunye awo si na di atoka uto, ya jiri nuta nke ya kworo ya n'azu.
Nwaanyi muta ite ofe mmiri mmiri, di ya amuta ipi utara aka were suru ofe.
Gidi gidi bụ ugwu eze.
Otu onye tuo izu, o gbue ochu
Oge adighi eche mmadu
Ihe di woro ogori azuala na ahia.
O bulu na i taa m aru n'ike, ma i zeghi nshi; mu taa gi aru n'isi, agaghi m ezere uvulu.
Ihere adịghị eme onye ara ka ọ na-eme ụmụ-nna ya.
Nwunye awo si na di atoka uto, ya jiri nuta nke ya kworo ya n'azu.
Nwaanyi muta ite ofe mmiri mmiri, di ya amuta ipi utara aka were suru ofe.
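
The same eight proverbs repeat in the output above, most likely because the page nests div elements: find_all('div') returns every outer and inner div, and each one yields the same blockquote elements again. A minimal sketch of one way to avoid this is to search for blockquote directly on the parsed document. The SAMPLE_HTML string, its "Meaning" line, and the extract_proverbs helper are hypothetical stand-ins for illustration, not the real page markup:

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the page: nested divs that share the same blockquotes.
# The "Meaning" line is placeholder text, not copied from the site.
SAMPLE_HTML = """
<div class="outer">
  <div class="inner">
    <blockquote>Igbo Proverb – Gidi gidi bụ ugwu eze.</blockquote>
    <blockquote>Meaning – placeholder text</blockquote>
    <blockquote>Igbo Proverb – Oge adighi eche mmadu</blockquote>
  </div>
</div>
"""

PREFIX = 'Igbo Proverb – '


def extract_proverbs(html):
    """Return the text of each blockquote that starts with PREFIX, once each."""
    soup = BeautifulSoup(html, 'html.parser')
    proverbs = []
    # Searching from the document root visits every blockquote exactly once,
    # no matter how many nested divs enclose it.
    for block in soup.find_all('blockquote'):
        text = block.get_text().strip()
        if text.startswith(PREFIX):
            proverbs.append(text[len(PREFIX):].strip())
    return proverbs


proverbs = extract_proverbs(SAMPLE_HTML)
for item in proverbs:
    print(item)
```

To run this against the live page, pass res.content from the requests.get call to extract_proverbs instead of SAMPLE_HTML. Using len(PREFIX) rather than the literal 15 keeps the slice correct if the prefix ever changes.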
