Python: using str.extract to extract DataFrame columns from a list of text

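I am trying to convert a list of strings containing div and span tags into a pandas DataFrame with two columns: one for the price and one for the car model.

Here is a sample of the initial list: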

[<div class="related-ad-content"><div class="title mult-lines-lt-1280"><a class="related-ad-title" href="/a-cars-bakkies/foreshore/available-24-7+call-now+jeep-wrangler-unlimited-3-8l-rubicon/1007198433990910332475709"><span>AVAILABLE 24/7-CALL NOW-Jeep Wrangler Unlimited 3.8L Rubicon</span> </a></div><div class="price"><span class="value wrapper"><span class="ad-price">
                     R 279,900

                 </span></span></div><div class="property-info"><span class="icon-calendar-green"></span><span class="property-label">2008</span><span class="icon-mileageV2"></span><span class="property-label">95,000km</span><span class="icon-fuel-type hidden-when-lt-320"></span><span class="property-label hidden-when-lt-320">Petrol</span><span class="icon-transmission hidden-lt-small"></span><span class="property-label hidden-lt-small">Manual</span></div><div class="description-content has-seller-avatar" data-desc-cfg='{"toggleable":true,"splitMin":550,"splitMax":900}' data-is-desc-toggable="true" data-is-pre-desc-shorter-than-split-min="true"><span class="related-ad-description"><span class="description-text">AVAILABLE 24/7-CALL NOW-FINANCE TEAM READY FOR YOUR CALL.WE HAVE A SOLUTION.Jeep Wrangler Unlimited 3.8 Rubicon Manual with only 95000km, last service done at 95000km. 2008Extras include: Tow bar, Spot lamps, Rock sliders, FOX suspension with body lift kit, Alloy wheels, 5 x Mud Terrain Tyres, Maniac Front and rear off –road bumpers, Navigation and smash &amp; grab tint. Spare keys also available. This RUBICON JEEP has superb off-road capabilities with unrivalled reliability in the 4x4 market.Interior is untarnished and she was always garage with a<span> meticulous previous owner. This vehicle offers superb value for money. Come in today and test drive this car.Don’t delay its priced to sell. Fuel consumption 11.2km/L or 8.9/100km on highway use. Tank capacity 85L and service interval’s 12000km. 
FINANCE AVAILABLE WITH ALL THE MAJOR BANKS.Same day APPROVAL and DELIVERY, call us to get pre-approved.</span><span class="toggle-suffix-description hidden">...</span></span><span class="link-go-vip">Read More</span></span></div><div class="seller-avatar"><!--M^s0-0-2-0-16-23-86-3-10-29-srpPremiumCarAds-seller-avatar-0 s0-0-2-0-16-23-86-3-10-29 srpPremiumCarAds-seller-avatar-0--><div class="bolt-img bolt-image loading-container"><img alt="Alpine Autohaus" class="lazyload" data-src="https://i.ebayimg.com/images/g/eG4AAOSwlSZefGC3/s-l100.jpg" onload="this.parentNode.classList.add('lazyloaded');"/></div><!--M/--></div><div class="location-date"><i class="icon-location-related-ads"></i><span>Foreshore </span><span class="creation-date"><span>20 mins ago</span></span></div><div class="actions-bar"><div class="watchListV2" data-adid="1007198433990910332475709" data-is-user-logged-in="false" data-short-adid="719843399"><div class="save"><i class="icon icon-love-red"></i><span class="text-save-full"><span class="save-added hidden">Added to List</span><span class="save-add">Add to My List</span></span><span class="text-save-short"><span class="save-added hidden">Added</span><span class="save-add">My List</span></span></div></div><span class="separator"></span><span class="contact lt-1280">Contact</span><span class="contact gt-1280">Contact Seller</span></div></div>,
 <div class="related-ad-content"><div class="title mult-lines-lt-1280"><a class="related-ad-title" href="/a-cars-bakkies/foreshore/available-24-7+call-now+chevrolet-utility-1-4-ac/1007198427660910332475709"><span>AVAILABLE 24/7-CALL NOW-Chevrolet Utility 1.4 AC</span> </a></div><div class="price"><span class="value wrapper"><span class="ad-price">
                     R 124,900

                 </span></span></div>]

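I would suggest a different approach: first parse the HTML with BeautifulSoup, extract all the relevant tags, and then build a DataFrame from the extracted data.

Something like this: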

import pandas as pd
from bs4 import BeautifulSoup

# `listings` is the list from the question; join its items into one HTML string
html = ''.join(str(item) for item in listings)
soup = BeautifulSoup(html, 'html.parser')
ads_nodes = soup.find_all('div', class_='related-ad-content')

def get_price(ad):
    # look for the span tag with class ad-price
    return ad.find('span', {'class': 'ad-price'}).get_text(strip=True)

def get_model(ad):
    # look for the span tag inside the a tag with class related-ad-title
    return ad.find('a', {'class': 'related-ad-title'}).find('span').get_text(strip=True)

def parse_ads(ads):
    # yield one dict per ad; pandas turns each dict into a row
    for ad in ads:
        yield {
            'model': get_model(ad),
            'price': get_price(ad)
        }

df = pd.DataFrame(parse_ads(ads_nodes))


model   price
0   AVAILABLE 24/7-CALL NOW-Jeep Wrangler Unlimite...   R 279,900
1   AVAILABLE 24/7-CALL NOW-Chevrolet Utility 1.4 AC    R 124,900
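
One thing to watch with this approach: `find` returns `None` when a tag is missing, so chaining `.get_text()` on it raises an `AttributeError`. Below is a minimal sketch of a more defensive variant. The helper name `text_or_none` and the two-ad HTML snippet are made up for illustration and are not part of the original data.

```python
from bs4 import BeautifulSoup

def text_or_none(node):
    # Return the stripped text of a tag, or None if the tag was not found
    return node.get_text(strip=True) if node is not None else None

# Hypothetical, trimmed-down HTML: the second ad has no price element
html = (
    '<div class="related-ad-content">'
    '<a class="related-ad-title" href="#"><span>Car A</span></a>'
    '<span class="ad-price">\n R 1,000\n </span>'
    '</div>'
    '<div class="related-ad-content">'
    '<a class="related-ad-title" href="#"><span>Car B</span></a>'
    '</div>'
)

soup = BeautifulSoup(html, "html.parser")
rows = []
for ad in soup.find_all("div", class_="related-ad-content"):
    link = ad.find("a", class_="related-ad-title")
    title = link.find("span") if link is not None else None
    price = ad.find("span", class_="ad-price")
    rows.append({"model": text_or_none(title), "price": text_or_none(price)})

print(rows)
```

An ad without a price then produces a row with `None` in that column instead of stopping the whole parse.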

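Comments:

What output do you expect?
@wwnde I have updated the question with my expected output.
@Emm [[24/7-CALL NOW Jeep Wrangler Unlimited 3.8L Rubicon,] Is this a list or a dataframe?
@komatiraju032 Sorry, this is one row in the dataframe.
Quick question: why use the yield keyword in this context? Could I also write a for loop that iterates over each ad, using model as the key and the function's output as the value? Just curious.
That would be another option. In general, I am a big fan of generators :-)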
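
On the yield-versus-loop question above, here is a small sketch showing that both styles give the same DataFrame. The `ads` list is a hypothetical stand-in made of plain dicts rather than parsed HTML nodes, and the function names are invented for the comparison.

```python
import pandas as pd

# Hypothetical stand-ins for parsed ads (plain dicts instead of bs4 nodes)
ads = [
    {"title": "Jeep Wrangler", "price": "R 279,900"},
    {"title": "Chevrolet Utility", "price": "R 124,900"},
]

def parse_ads_generator(ads):
    # Generator version: rows are produced lazily, one dict at a time
    for ad in ads:
        yield {"model": ad["title"], "price": ad["price"]}

def parse_ads_list(ads):
    # Loop version: rows are collected into a list and returned at the end
    rows = []
    for ad in ads:
        rows.append({"model": ad["title"], "price": ad["price"]})
    return rows

df_gen = pd.DataFrame(parse_ads_generator(ads))
df_list = pd.DataFrame(parse_ads_list(ads))
print(df_gen.equals(df_list))
```

The generator only saves you from naming and returning the intermediate list; pandas consumes it fully when it builds the frame, so the choice is mostly a matter of taste.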

a = pd.DataFrame([str(item) for item in listings])
# (?s) lets '.' match the newlines around the price text
a[0].str.extract(r'(?s)<span>(?P<first>.*?)</span>.*?<span class="ad-price">(?P<price>.*?)</span>')
first                                                            price

AVAILABLE 24/7-CALL NOW-Jeep Wrangler Unlimited 3.8L Rubicon     \n R 279,900\n \n
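
The extracted price still carries the newlines and spaces from the HTML. A possible cleanup step, sketched under the assumption that every price looks like `R 279,900` (a currency prefix and thousands separators, no decimals); the sample `prices` Series is typed in by hand to mimic the extracted column.

```python
import pandas as pd

# Hand-written sample mimicking the raw 'price' column from str.extract
prices = pd.Series(["\n    R 279,900\n\n  ", "\n    R 124,900\n\n  "])

# Remove the surrounding whitespace but keep the displayed format
cleaned = prices.str.strip()

# Drop every non-digit character and convert to integers
# (assumes whole-number prices; decimals would need different handling)
numeric = prices.str.replace(r"\D", "", regex=True).astype(int)

print(cleaned.tolist())
print(numeric.tolist())
```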