Python: Using a for loop to iterate over a list of multiple strings

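I'm fairly new to coding in Python. For a personal project, I'm looking for different ways to retrieve birth and death dates from a list of Wikipedia pages. I am using the wikipedia package.

One way I've tried to do this is to iterate over the Wikipedia summary and return the index once I've counted four digits in a row.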

import wikipedia as wp

names = ('Zaha Hadid', 'Rem Koolhaas')
wiki_summary = wp.summary(names)
b_counter = 0
i_b_year = []
d_counter = 0
i_d_year = []

for i,x in enumerate(wiki_summary):
    if x.isdigit() == True:
        b_counter += 1
        if b_counter == 4:
           i_b_year = i
           break
        else:
            continue        
    else:
        b_counter = 0

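So far this works for the first person in my list, but I would like to iterate over all the names in my names list. Is there a way to find the index with a for loop while also iterating over names with a for loop?

I know there are other ways, such as parsing the page to find the bday tag, but I'd like to try a few different solutions.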

I'm not familiar with the wikipedia package, but it looks like you can simply iterate over the names tuple:

import wikipedia as wp  # the module name is all lowercase

names = ('Zaha Hadid', 'Rem Koolhaas')

i_b_year = []
i_d_year = []
for name in names:  # This line is new
    wiki_summary = wp.summary(name)  # Just changed names to name
    b_counter = 0
    d_counter = 0

    for i, x in enumerate(wiki_summary):
        if x.isdigit():
            b_counter += 1
            if b_counter == 4:
                # I am guessing you want this list to grow with each name in names. Thus, 'append'.
                i_b_year.append(i)
                break
        else:
            b_counter = 0
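
The digit-counting step can also be pulled out into a small function, which makes it possible to try it without any network access. The following is only a sketch: the function name first_year_index and the sample strings are made up for illustration and stand in for real wp.summary() results.

```python
def first_year_index(text, run_length=4):
    """Return the index of the last digit of the first run of `run_length`
    consecutive digits in `text`, or None if there is no such run."""
    counter = 0
    for i, ch in enumerate(text):
        if ch.isdigit():
            counter += 1
            if counter == run_length:
                return i
        else:
            counter = 0  # the run was interrupted, start counting again
    return None


# Invented stand-ins for wp.summary(name); not real Wikipedia text.
samples = {
    "Person A": "Person A (born 1 May 1950) is an architect.",
    "Person B": "Person B has no dates here.",
}

# One index (or None) per name, keyed by name.
indices = {name: first_year_index(text) for name, text in samples.items()}
print(indices)
```

Returning None when nothing is found keeps a missing year distinguishable from a real index, and a dictionary keyed by name avoids having to keep two parallel lists in step.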

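You are trying to:

Declare two empty lists to store each person's birth year and death year

Get each person's Wikipedia summary from the tuple

Parse the first two 4-digit numbers in the summary and append them to the birth year and death year lists

The problem is that a person's summary may not have the birth year and the death year as its first two 4-digit numbers. For example, Rem Koolhaas's Wikipedia summary lists his birth year as the first 4-digit number, but the second 4-digit number appears in this line:

In 2005, he co-founded Volume Magazine together with Mark Wigley and Ole Bouman.

So we can see that the birth year and death year lists may not contain accurate information.

Here is code that does what you are trying to achieve: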

import wikipedia as wp

names = ('Zaha Hadid', 'Rem Koolhaas')
i_b_year = []
i_d_year = []

for person_name in names:
    wiki_summary = wp.summary(person_name)
    birth_year_found = False
    death_year_found = False
    digits = ""    

    for c in wiki_summary:
        if c.isdigit() == True:
            if birth_year_found == False:                
                digits += c
                if len(digits) == 4:
                    birth_year_found = True
                    i_b_year.append(int(digits))
                    digits = ""
            elif death_year_found == False:
                digits += c
                if len(digits) == 4:
                    death_year_found = True
                    i_d_year.append(int(digits))
                    break
        else:
            digits = ""
    if birth_year_found == False:
        i_b_year.append(0)
    if death_year_found == False:
        i_d_year.append(0)

for i in range(len(names)):
    print(names[i], i_b_year[i], i_d_year[i])

Output:

Zaha Hadid 1950 2016
Rem Koolhaas 1944 2005

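Disclaimer: in the code above, I append 0 if two 4-digit numbers cannot be found in a person's summary. As I already mentioned, nothing guarantees that a Wikipedia summary lists a person's birth year and death year as its first two 4-digit numbers. These lists may contain wrong information.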
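
The same heuristic can be written more compactly with a regular expression. This is a sketch under assumptions: the helper name first_two_years and the sample text are invented for illustration, and it deliberately differs from the loop above in two ways. It ignores digit runs longer than four, and it uses None instead of 0 for a missing year.

```python
import re

# Exactly four digits that are not part of a longer number.
YEAR = re.compile(r"(?<!\d)\d{4}(?!\d)")


def first_two_years(summary):
    """Return the first two standalone 4-digit numbers in `summary`,
    padding with None when fewer than two are found."""
    years = [int(y) for y in YEAR.findall(summary)[:2]]
    years += [None] * (2 - len(years))
    return tuple(years)


# Invented sample that mimics the pitfall described above:
# the second 4-digit number is not a death year.
sample = "He was born in 1944. In 2005, he co-founded Volume Magazine."
print(first_two_years(sample))
```

The pitfall is unchanged: for a living person, the second value is simply the next year mentioned in the text, so the result still has to be treated as a guess.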

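First of all, your code can't work, for a couple of reasons:

wikipedia can only be imported with a lowercase first letter: import wikipedia

The summary method accepts a string (in your case, a name), so you have to call it for every name in your collection

With all that aside, let's try to achieve your goal: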

import wikipedia as wp
import re

# First thing we see (at least for pages provided) is that dates all share the same format:
# For those who are no longer with us 31 October 1950 – 31 March 2016
# For those who are still alive 17 November 1944
# So we have to build regex patterns to find those
# First is the months pattern, since it's quite a big one
MONTHS_PATTERN = r"January|February|March|April|May|June|July|August|September|October|November|December"
# Next we build our date pattern, double curly braces are used for literal text
DATE_PATTERN = re.compile(fr"\d{{1,2}}\s({MONTHS_PATTERN})\s\d{{4}}")
# Declare our set of names, great choice of architects BTW :)
names = ('Zaha Hadid', 'Rem Koolhaas')
# Since we're trying to get birthdays and dates of death, we will create a dictionary for storing values
lifespans = {}
# Iterate over them in a loop
for name in names:
    lifespan = {'birthday': None, 'deathday': None}
    try:
        summary = wp.summary(name)
        # First we find the first date in summary, since it's most likely to be the birthday
        first_date = DATE_PATTERN.search(summary)
        if first_date:
            # If we've found a date – suppose it's birthday
            bday = first_date.group()
            lifespan['birthday'] = bday
            # Let's check whether the person is no longer with us
            LIFESPAN_PATTERN = re.compile(fr"{bday}\s–\s{DATE_PATTERN.pattern}")
            lifespan_found = LIFESPAN_PATTERN.search(summary)
            if lifespan_found:
                lifespan['deathday'] = lifespan_found.group().replace(f"{bday} – ", '')
            lifespans[name] = lifespan
        else:
            print(f'No dates were found for {name}')
    except wp.exceptions.PageError:
        # Handle not found page, so that code won't break
        print(f'{name} was not found on Wikipedia')
        pass

# Print result
print(lifespans)

Output for the provided names:

{'Zaha Hadid': {'birthday': '31 October 1950', 'deathday': '31 March 2016'}, 'Rem Koolhaas': {'birthday': '17 November 1944', 'deathday': None}}

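This approach is inefficient and has plenty of flaws, for example when a page contains dates that match the regex but are not a birthday or a date of death. It's ugly (even though I tried my best :) ), and you would still be better off parsing the tags.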
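
For comparison, here is a rough sketch of what parsing the bday tag could look like. It rests on assumptions that should be checked against a real page: that the article HTML contains the birth date as a span with the class bday, and that the HTML can be obtained with something like wp.page(name).html(). The helper name extract_bday and the sample HTML are invented for illustration.

```python
import re

# Assumption: the birth date appears as <span class="bday">YYYY-MM-DD</span>.
BDAY_PATTERN = re.compile(
    r'<span[^>]*class="[^"]*\bbday\b[^"]*"[^>]*>([^<]+)</span>'
)


def extract_bday(html):
    """Return the text of the first bday span in `html`, or None."""
    match = BDAY_PATTERN.search(html)
    return match.group(1).strip() if match else None


# Invented HTML fragment standing in for a real article page.
# With the wikipedia package it might come from wp.page(name).html() (untested here).
sample_html = '<td>(<span class="bday">1950-10-31</span>)</td>'
print(extract_bday(sample_html))
```

A regex is enough for a single, simple tag like this one; for anything more involved, a real HTML parser would be the safer choice.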

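If you are not happy with Wikipedia's date format, I suggest you look into datetime. Also, keep in mind that these regexes fit these two specific pages; I did not do any research on how dates are represented across Wikipedia. So, if there are any inconsistencies, I suggest you stick with parsing the tags.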
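
As a small illustration of the datetime suggestion, the date strings found by the regex can be turned into date objects with strptime. The helper name parse_wiki_date is made up for this sketch, and the format string assumes the "31 October 1950" layout seen on these two pages. Note that %B follows the current locale, so English month names are only recognized under an English or default locale.

```python
from datetime import datetime


def parse_wiki_date(text):
    """Parse a date such as '31 October 1950' into a datetime.date."""
    # %d = day of month, %B = full month name, %Y = four-digit year
    return datetime.strptime(text, "%d %B %Y").date()


birthday = parse_wiki_date("31 October 1950")
deathday = parse_wiki_date("31 March 2016")
print(birthday.isoformat(), deathday.isoformat())
```

Once the values are date objects, they can be compared, sorted, or reformatted (for example with strftime) instead of being handled as plain strings.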