Python 3.x TypeError while web scraping (Python 3.x, Pandas, Web Scraping) - Fatal编程技术网


I'm just scraping data and want to build two columns, title and date, but I get a TypeError:

TypeError: from_dict() got an unexpected keyword argument 'columns'

Code:

import requests
from bs4 import BeautifulSoup
import pandas as pd
url = 'https://timesofindia.indiatimes.com/topic/Hiv'

while True:
    response=requests.get(url)
    soup = BeautifulSoup(response.content,'html.parser')
    content = soup.find_all('div',{'class': 'content'})


    for contents in content:
        title_tag = contents.find('span',{'class':'title'})
        title= title_tag.text[1:-1] if title_tag else 'N/A'
        date_tag = contents.find('span',{'class':'meta'})
        date = date_tag.text if date_tag else 'N/A'

        hiv={title : date}
        print(' title : ', title ,' \n date : ' ,date )



    url_tag = soup.find('div',{'class':'pagination'})
    if url_tag.get('href'):
        url = 'https://timesofindia.indiatimes.com/' + url_tag.get('href')
        print(url)    
    else:
        break
hiv1 = pd.DataFrame.from_dict(hiv , orient = 'index' , columns = ['title' ,'date'])    

I updated pandas to version 0.23.4, but the error still occurs.

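The first thing I noticed is that the dictionary construction is off. I assume you want one dictionary holding every title: date pair. The way you have it now, hiv={title : date} creates a brand-new dictionary on every pass of the loop, so only the last pair is kept.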
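The difference is easy to see without any network access. In this sketch the (title, date) pairs are made up for illustration and both function names are hypothetical; the first function mirrors the question's loop, the second mirrors the fix.

```python
def build_by_rebinding(pairs):
    """Mirrors the question's loop: the name is rebound to a new one-item dict each pass."""
    hiv = {}
    for title, date in pairs:
        hiv = {title: date}  # the previous dict is discarded here
    return hiv


def build_by_accumulating(pairs):
    """Creates the dict once, then adds one key per pass."""
    hiv = {}
    for title, date in pairs:
        hiv[title] = date  # same effect as hiv.update({title: date})
    return hiv


# Hypothetical sample data standing in for the scraped results.
sample = [
    ("Busted:5 HIV AIDS myths", "30 Nov 2018"),
    ("Myths and taboos related to AIDS", "01 Dec 2018"),
]
print(build_by_rebinding(sample))     # only the last pair
print(build_by_accumulating(sample))  # both pairs
```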

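Then, when you build the DataFrame this way, the dictionary keys become the index and the values become the series/column, so technically there is only one column. (That is also why two names in columns = ['title' ,'date'] cannot line up: there is only one data column to name.) I create the two columns by naming the index 'title' and then resetting it, which moves the index into a regular column.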
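The index-to-column step can be checked in isolation on a small hypothetical dict, without any scraping. This sketch uses the same from_dict, rename_axis and reset_index calls as the answer (so it needs pandas 0.23.0 or later for the columns keyword); the sample titles and dates are only illustrative.

```python
import pandas as pd

# Hypothetical sample of the scraped {title: date} dict.
hiv = {
    "Busted:5 HIV AIDS myths": "30 Nov 2018",
    "Myths and taboos related to AIDS": "01 Dec 2018",
}

# orient="index": keys become the row index, values become one data column.
one_col = pd.DataFrame.from_dict(hiv, orient="index", columns=["date"])
print(one_col.shape)  # (2, 1): the titles sit in the index, not in a column

# Name the index "title", then move it out into an ordinary column.
two_cols = one_col.rename_axis("title").reset_index()
print(list(two_cols.columns))  # ['title', 'date']
```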

import requests
from bs4 import BeautifulSoup
import pandas as pd
url = 'https://timesofindia.indiatimes.com/topic/Hiv'


# Fetch one results page (this version has no pagination loop)
response=requests.get(url)
soup = BeautifulSoup(response.content,'html.parser')
content = soup.find_all('div',{'class': 'content'})

hiv = {}  # create the dict once, outside the loop
for contents in content:
    title_tag = contents.find('span',{'class':'title'})
    title= title_tag.text[1:-1] if title_tag else 'N/A'
    date_tag = contents.find('span',{'class':'meta'})
    date = date_tag.text if date_tag else 'N/A'

    hiv.update({title : date})  # add this pair instead of replacing the dict
    print(' title : ', title ,' \n date : ' ,date )

# keys -> index, values -> the single 'date' column
hiv1 = pd.DataFrame.from_dict(hiv , orient = 'index' , columns = ['date'])
# name the index 'title', then turn it into a regular column
hiv1 = hiv1.rename_axis('title').reset_index()
Output:

print (hiv1)
                                                title                  date
0   I told my boyfriend I was HIV positive and thi...           01 Dec 2018
1   Pay attention to these 7 very common HIV sympt...           30 Nov 2018
2   Transfusion of HIV blood: Panel seeks time til...  2019-01-06T03:54:33Z
3   No. of pregnant women testing HIV+ dips; still...           01 Dec 2018
4                             Busted:5 HIV AIDS myths           30 Nov 2018
5                    Myths and taboos related to AIDS           01 Dec 2018
6                                                 N/A                   N/A
7   Mumbai: Free HIV tests at six railway stations...           23 Nov 2018
8   HIV blood tranfusion: Tamil Nadu govt assures ...  2019-01-05T09:05:27Z
9     Autopsy performed on HIV+ve donor’s body at GRH  2019-01-03T07:45:03Z
10  Madras HC directs to videograph HIV+ve donor’s...  2019-01-01T01:23:34Z
11  HIV +ve Tamil Nadu teen who attempted suicide ...  2018-12-31T03:37:56Z
12    Another woman claims she got HIV-infected blood  2018-12-31T06:34:32Z
13    Another woman says she got HIV from donor blood           29 Dec 2018
14  HIV case: Five-member panel begins inquiry in ...           29 Dec 2018
15  Pregnant woman turns HIV positive after blood ...           26 Dec 2018
16  Pregnant woman contracts HIV after blood trans...           26 Dec 2018
17  Man attacks niece born with HIV for sleeping i...           16 Dec 2018
18  Health ministry implements HIV AIDS Act 2017: ...           11 Sep 2018
19  When meds don’t heal: HIV+ kids fight daily wa...           03 Sep 2018
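But I'm not sure why you get the error. It doesn't make sense: the columns keyword of DataFrame.from_dict was added in pandas 0.23.0, and you are using the updated 0.23.4. Maybe uninstall pandas and then reinstall it.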
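One way to see which pandas the script really imports, and whether its from_dict has a columns parameter, is to inspect it from the same interpreter that runs the scraper. This is a sketch: the helper name from_dict_accepts_columns is hypothetical, and the idea that a second Python or pandas installation is involved is only an assumption about the setup.

```python
import inspect
import sys

import pandas as pd


def from_dict_accepts_columns():
    """Hypothetical helper: True if this pandas' DataFrame.from_dict has a 'columns' parameter."""
    return "columns" in inspect.signature(pd.DataFrame.from_dict).parameters


# Report the interpreter and the pandas copy it actually imported. If these are not
# the ones that were upgraded (assumption: more than one install exists), that
# would explain the TypeError.
print("python executable:", sys.executable)
print("pandas version:", pd.__version__)
print("pandas location:", pd.__file__)
print("from_dict accepts columns:", from_dict_accepts_columns())
```

If it prints False, reinstalling through that same interpreter (for example python -m pip install --upgrade pandas) targets the copy that is actually being imported.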

Otherwise, I guess you can just do it in two lines and name the columns after converting to a DataFrame:

hiv1 = pd.DataFrame.from_dict(hiv, orient = 'index').reset_index()
hiv1.columns = ['title','date']
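Both forms can be tried on a small hypothetical dict. The second frame below is built differently, from a list of (title, date) pairs with the plain DataFrame constructor; that variant is not from the original answer, but like the two-line version it does not depend on the columns keyword of from_dict.

```python
import pandas as pd

# Hypothetical sample of the scraped {title: date} dict.
hiv = {
    "Busted:5 HIV AIDS myths": "30 Nov 2018",
    "Myths and taboos related to AIDS": "01 Dec 2018",
}

# The answer's two-line form: no columns keyword on from_dict.
hiv1 = pd.DataFrame.from_dict(hiv, orient="index").reset_index()
hiv1.columns = ["title", "date"]  # renames 'index' and 0 in one assignment

# Alternative (not from the answer): plain constructor on (title, date) pairs.
hiv2 = pd.DataFrame(list(hiv.items()), columns=["title", "date"])

print(hiv1)
print(hiv2)
```

Collecting (title, date) tuples in a list during scraping, instead of a dict, would also keep rows whose titles repeat: a dict keeps only one entry per key, so several 'N/A' titles, for example, would collapse into a single row.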

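The question has nothing to do with machine-learning; please don't spam tags (removed and replaced with pandas).
Can you post a sample of your data/dictionary? I don't get that error when I try it on my dataset.
Are you sure your pandas version is 0.23.4? Run import pandas as pd and then pd.__version__ to make sure you are using 0.23.4.
I am using pandas 0.23.4.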