Python: Removing NoneType from BeautifulSoup

I am trying to remove the commas from the numbers I extracted with the following code:

with requests.Session() as s:
    url = 'https://www.zoopla.co.uk/for-sale/property/london/paddington/?q=Paddington%2C%20London&results_sort=newest_listings&search_source=home'
    r = s.get(url, headers=req_headers)
    soup = BeautifulSoup(r.content, 'lxml')
    prices = []
    for price in soup.find_all('a', {"class":"listing-results-price text-price"}):
        prices.append(price.text)
        if price is None:
            print('none')
    df['price'] = prices
    df['price'] = df['price'].str.extract('(\d+([\d,]?\d)*(\.\d+)?)', expand=True) #remove extract numbers with commas
    df['price'] = df['price'].replace(',','', inplace = True)
This returns a column in which every value is None. Is there a way to get rid of this NoneType problem?

Before the last line runs, the DataFrame looks like this:

         price
0          NaN
1    1,875,000
2    4,950,000
3      500,000
4      675,000
5      980,000
6      475,000
7      849,950
8    1,050,000
9    1,050,000
10     650,000
11   1,100,000
12   1,300,000
13     895,000
14   1,000,000
15  26,800,000
16   1,600,000
17     695,000
18   2,100,000
19     510,000
20   1,200,000
21   3,000,000
22     599,000
23  26,800,000
24   1,550,000
25     750,000
26   1,600,000
27   1,025,000

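With df['price'].replace(',','',inplace=True) you are replacing in place, so the call does not return anything. Assigning that result back to df['price'] overwrites the column with None.

You need: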

df['price'] = df['price'].str.replace(',','')
Output:

0        NaN
1    1875000
2    4950000
3     500000
4     675000
5     980000
6     475000
7     849950
8    1050000
9    1050000
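
As a minimal sketch of the fix above: the sample values are copied from the question's DataFrame, and None stands in for the NaN row. The price_num column and the pd.to_numeric step are assumptions added for illustration, in case numeric values are wanted afterwards; they are not part of the original answer.

```python
import pandas as pd

# Sample rows copied from the question; None stands in for the NaN row.
df = pd.DataFrame({"price": [None, "1,875,000", "4,950,000", "500,000"]})

# .str.replace returns a new Series, so assign it back to the column
# instead of combining an assignment with inplace=True.
df["price"] = df["price"].str.replace(",", "", regex=False)

# Assumed extra step: convert the cleaned strings to numbers.
# The missing row stays NaN, so the resulting column is float.
df["price_num"] = pd.to_numeric(df["price"])

print(df)
```

The strings in price stay strings after the replacement; only price_num holds numbers.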

For reference, take a look at the documentation.

I suggest handling this on the data extraction side, before you build the DataFrame. You can build the list like this:

from bs4 import BeautifulSoup
import requests
url = 'https://www.zoopla.co.uk/for-sale/property/london/paddington/?q=Paddington%2C%20London&results_sort=newest_listings&search_source=home'
r = requests.get(url)
soup = BeautifulSoup(r.content, 'html.parser')
res_lis = [int(price.text.strip().split('\n')[0].replace('£', '').replace(',', '')) for price in soup.find_all('a', {"class":"listing-results-price text-price"}) if price]
print(res_lis)
Result:

[2000000, 549950, 1050000, 500000, 675000, 980000, 475000, 849950, 1050000, 1050000, 650000, 1100000, 1300000, 895000, 1000000, 26800000, 1600000, 695000, 2100000, 510000, 3000000, 1200000, 599000, 26800000, 1550000, 750000, 1600000, 1025000]

It is best to shape and clean the data as far as your requirements call for during the data extraction stage, before you store it.
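
One caveat with cleaning at extraction time is that int() raises ValueError when a listing's text contains no digits, which may be what produced the NaN row in the question. The sketch below is one way to guard against that. The parse_price helper and the sample strings (including "POA") are hypothetical and are not taken from the site's actual markup.

```python
import re
from typing import Optional


def parse_price(text: str) -> Optional[int]:
    """Return the first comma-grouped number in text as an int, or None."""
    # Find the first run of digits and commas, e.g. "1,875,000".
    match = re.search(r"\d[\d,]*", text)
    if match is None:
        # No digits at all (hypothetical example: "POA"), so there is no price.
        return None
    return int(match.group().replace(",", ""))


# Hypothetical strings shaped like the text of a price element.
samples = ["£1,875,000\n  Offers over", "POA", "  £500,000  "]
print([parse_price(s) for s in samples])
```

Returning None for unparseable entries keeps the list the same length as the scraped elements, so it can still be assigned to a DataFrame column.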

Can you show us the input data? I suggest moving your if condition up one line. :-)

@HarvIpan I have edited the post to show the data as it is before the last line tries to replace the commas.

@cWallenPole Unfortunately, the values still come back as None.

Thanks. Unfortunately, this just returns exactly the same column as the input data (numbers with commas).

@James, hmm, are you sure you are using df['price'] = df['price'].str.replace(',','')? Note the .str accessor. If you do not use the .str accessor, it returns the same data you mentioned.
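
The difference the last comment describes can be sketched as follows, using two sample values from the question. The variable names are illustrative only, and the regex=True variant is an addition that the thread does not mention.

```python
import pandas as pd

prices = pd.Series(["1,875,000", "500,000"])

# Without .str, replace only swaps cells whose entire value equals ",",
# so these strings come back unchanged.
whole_cell = prices.replace(",", "")

# With the .str accessor, the comma is replaced inside every string.
substring = prices.str.replace(",", "", regex=False)

# Series.replace also does substring replacement when regex=True is passed.
regex_mode = prices.replace(",", "", regex=True)

print(whole_cell.tolist())
print(substring.tolist())
print(regex_mode.tolist())
```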