Regex: problem using a regular expression on href with BeautifulSoup tags

regex python-3.x web-scraping

I am trying to extract text from tags based on an href that contains a specific string. Here is part of my sample code:

Experience = soup.find_all(id='background-experience-container')

Exp = {}

for element in Experience:
    Exp['Experience'] = {}


for element in Experience:
    role = element.find(href=re.compile("title").get_text()
    Exp['Experience']["Role"] = role


for element in Experience:
    company = element.find(href=re.compile("exp-company-name").get_text()
    Exp['Experience']['Company'] = company

It does not like the syntax I use to define Exp['outer_key']['inner_key'] = value; it returns a SyntaxError.

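I am trying to create a dict of dicts containing information about the role and the company. I would also like to include the dates for each role and company, but I have not got that far yet.

Can anyone spot any obvious errors in my code? Any help is much appreciated.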
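
The nested assignment Exp['outer_key']['inner_key'] = value is valid Python on its own. A likely cause of the SyntaxError is the line before it: element.find(href=re.compile("title").get_text() opens three parentheses but closes only two. The sketch below checks this with the built-in compile(), which parses the source without running it. The names broken, fixed and parses are only for illustration.

```python
# The question's line, copied as-is: three "(" but only two ")".
broken = 'role = element.find(href=re.compile("title").get_text()\n'
# Same line with re.compile(...) closed before .get_text() is called.
fixed = 'role = element.find(href=re.compile("title")).get_text()\n'


def parses(source):
    """Return True if source is syntactically valid Python (nothing is executed)."""
    try:
        compile(source, "<example>", "exec")
        return True
    except SyntaxError:
        return False


print(parses(broken))                                  # False
print(parses(fixed))                                   # True
print(parses("Exp['Experience']['Role'] = role\n"))    # True
```

With the parenthesis restored, .get_text() is called on the tag returned by find() rather than on the compiled pattern.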
find_all can return many values (even when you search by id), so it is better to use a list to keep all of them: Exp = []

Experience = soup.find_all(id='background-experience-container')

# create empty list
Exp = []

for element in Experience:
    # create empty dictionary
    dic = {}

    # add elements to dictionary
    dic['Role'] = element.find(href=re.compile("title")).get_text()
    dic['Company'] = element.find(href=re.compile("exp-company-name")).get_text()

    # add dictionary to list
    Exp.append(dic)

# display

print(Exp[0]['Role'])
print(Exp[0]['Company'])

print(Exp[1]['Role'])
print(Exp[1]['Company'])

# or

for x in Exp:
    print(x['Role'])
    print(x['Company'])
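
For a version that runs on its own, here is a minimal sketch of the same list-of-dictionaries approach. The HTML snippet and its href values are invented for illustration, so the real page will differ. It also guards against find() returning None, which would otherwise raise AttributeError on .get_text().

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup, invented for illustration; the real page will differ.
html = """
<div id="background-experience-container">
  <a href="/search?title=engineer">Engineer</a>
  <a href="/company/exp-company-name/acme">Acme</a>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

experience = []
for element in soup.find_all(id="background-experience-container"):
    # find() returns the first tag whose href matches the pattern, or None.
    role_link = element.find(href=re.compile("title"))
    company_link = element.find(href=re.compile("exp-company-name"))
    experience.append({
        "Role": role_link.get_text(strip=True) if role_link else None,
        "Company": company_link.get_text(strip=True) if company_link else None,
    })

print(experience)
```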

If you are sure that find_all gives only one element (and you need the key "Experience"), then you can do this:

Experience = soup.find_all(id='background-experience-container')

# create main dictionary
Exp = {}

for element in Experience:
    # create empty dictionary
    dic = {}

    # add elements to dictionary
    dic['Role'] = element.find(href=re.compile("title")).get_text()
    dic['Company'] = element.find(href=re.compile("exp-company-name")).get_text()

    # add dictionary to main dictionary
    Exp['Experience'] = dic

# display

print(Exp['Experience']['Role'])
print(Exp['Experience']['Company'])

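Possible duplicate. It seems that Exp['Experience']["Role"] = role cannot work because it is essentially uninitialized. Judging from the other question, it seems you could use .append(...) instead, or initialize the array beforehand.

@Lupinity mine is a slightly different problem. I want to build output like this: {Experience: {role: role_name, company: company_name}, {role: role_name, company: company_name}, }

Thanks for your reply. I tried modifying my code using your first solution, but I get an error: dic['Company'] = element.find(href=re.compile("exp-company-name")).get_text() ^ SyntaxError: invalid syntax

You forgot a ) on the line before dic['Company'] = element.find(href=re.compile("exp-company-name")).get_text()

Or: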

Experience = soup.find_all(id='background-experience-container')

# create main dictionary
Exp = {}

for element in Experience:
    Exp['Experience'] = {
       # the comma after each entry is required in a dict literal
       'Role': element.find(href=re.compile("title")).get_text(),
       'Company': element.find(href=re.compile("exp-company-name")).get_text(),
    }

# display

print(Exp['Experience']['Role'])
print(Exp['Experience']['Company'])
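
Note that in the dictionary versions each loop iteration overwrites Exp['Experience'], so only the last element is kept. The output asked for in the comments, several role/company pairs under one 'Experience' key, cannot be written as {Experience: {...}, {...}}, but a list of dictionaries stored under that key carries the same information. A minimal sketch, assuming the role and company strings have already been extracted; the sample values and the name rows are invented for illustration:

```python
# Hypothetical (role, company) pairs standing in for text pulled from the page.
rows = [
    ("Engineer", "Acme"),
    ("Analyst", "Globex"),
]

# One outer key holding a list with one inner dictionary per position.
exp = {"Experience": []}
for role, company in rows:
    exp["Experience"].append({"Role": role, "Company": company})

# Access one entry by index, or loop over all of them.
print(exp["Experience"][1]["Company"])
for entry in exp["Experience"]:
    print(entry["Role"], "at", entry["Company"])
```

Dates for each position could later be added as a further key in each inner dictionary.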