Python generates two instances of a dataset


python python-2.7


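I'm new to programming and this seems like a basic question, but I can't work it out. The code below creates a .txt file that contains two instances of the last dataset.

Can someone explain why this code writes the last dataset twice? Thanks.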

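import urllib
import re
##NL East stats.
teamstate = ["wsh","phi","nym","mia","atl"]
teamnamelist = ["washington-nationals","philadelphia-phillies","new-york-mets","miami-marlins","atlanta-braves"]
teamlist = ["Washington Nationals","Philadelphia Phillies","New York Mets","Miami Marlins","Atlanta Braves"]
j=0
i=0
while (i

A few things: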
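1. Use zip on the lists. It combines them into a list of tuples whose elements correspond to each other. Since your lists are already in matching order, this works painlessly.
2. If you look at the page, about 7 or 8 elements match your regex. re.findall returns a list, so to get the batting average (the second item in that list) you need to index into it.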
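A quick offline illustration of the zip suggestion, using the three lists from the question. Nothing is fetched; it only shows how zip lines the lists up and how each tuple turns into the same URL the answer's loop requests.

```python
# Offline sketch: how zip() lines up the three parallel lists from the question.
teamstate = ["wsh", "phi", "nym", "mia", "atl"]
teamnamelist = ["washington-nationals", "philadelphia-phillies",
                "new-york-mets", "miami-marlins", "atlanta-braves"]
teamlist = ["Washington Nationals", "Philadelphia Phillies",
            "New York Mets", "Miami Marlins", "Atlanta Braves"]

# zip() yields one tuple per position: (teamstate[i], teamnamelist[i], teamlist[i]).
# Wrapping it in list() gives a real list on both Python 2 and Python 3.
rows = list(zip(teamstate, teamnamelist, teamlist))
print(rows[0])  # ('wsh', 'washington-nationals', 'Washington Nationals')

# Build the URLs the scraping loop would request, without fetching anything.
urls = ["http://espn.go.com/mlb/team/_/name/%s/%s" % (abbr, slug)
        for abbr, slug, _name in rows]
for url in urls:
    print(url)
```

Note that zip stops at the shortest input, so if one list is missing an entry the trailing teams are silently dropped rather than raising an error. Keeping the three lists the same length matters.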
Point 2 is the main reason your code returns the following:
the batting average of the Washington Nationals  is:  ['22', '.304', '.362', '.530', '3.21', '2', '0.93', '.179']
the batting average of the Philadelphia Phillies  is:  ['19', '.306', '.364', '.468', '5.96', '2', '1.75', '.311']
the batting average of the New York Mets  is:  ['10', '.179', '.243', '.337', '6.75', '2', '1.64', '.304']
the batting average of the Miami Marlins  is:  ['27', '.301', '.358', '.451', '3.00', '2', '1.31', '.268']
the batting average of the Atlanta Braves  is:  ['6', '.179', '.225', '.337', '1.38', '3', '0.85', '.184']
[Finished in 19.0s]
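Each line above shows a whole list because re.findall returns every match on the page, not just the first. The sketch below reproduces that offline. The HTML fragment is hypothetical: it simply wraps the eight Washington values from the output above in the `<span class="stat">` tag that the question's regex looks for.

```python
import re

# Hypothetical HTML fragment: the eight Washington values from the output above,
# each wrapped in the tag the question's regex matches.
values = ['22', '.304', '.362', '.530', '3.21', '2', '0.93', '.179']
htmltext = "".join('<span class="stat">%s</span>' % v for v in values)

pattern = re.compile('<span class="stat">(.+?)</span>')

# findall() returns a list with one string per match (the captured group).
# The non-greedy (.+?) stops at the first </span>, so each span is its own match.
matches = pattern.findall(htmltext)
print(matches)  # the full list, just like the question's output

# Indexing picks out a single value; the batting average is the second match.
batting_average = matches[1]
print(batting_average)  # .304
```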

Changing your approach slightly:
import urllib
import re
##NL East stats.
teamstate = ["wsh","phi","nym","mia","atl"]
teamnamelist = ["washington-nationals","philadelphia-phillies","new-york-mets","miami-marlins","atlanta-braves"]
teamlist = ["Washington Nationals","Philadelphia Phillies","New York Mets","Miami Marlins","Atlanta Braves"]

# Compile the pattern once, outside the loop.
pattern = re.compile('<span class="stat">(.+?)</span>')

# zip() pairs the i-th entries of the three lists, so no manual i/j counters are needed.
for x, y, z in zip(teamstate, teamnamelist, teamlist):
    url = "http://espn.go.com/mlb/team/_/name/%s/%s" % (x, y)
    htmltext = urllib.urlopen(url).read()
    # findall() returns every match on the page; the batting average is the second one.
    val = pattern.findall(htmltext)[1]
    print "The batting average of the %s is %s." % (z, val)
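The question is about writing results to a .txt file, while the loop above only prints. A minimal sketch, assuming the goal is exactly one line per team: open the file once with `with`, write inside the loop, and let `with` close it. The helper names `batting_average` and `write_report` are hypothetical, and the HTML fragments in the demo are made up (not real ESPN markup); in the real script the HTML would come from urllib or requests.

```python
import os
import re
import tempfile

# Same pattern as the question; compiled once at module level.
STAT_RE = re.compile('<span class="stat">(.+?)</span>')


def batting_average(htmltext):
    """Return the second stat on the page, or None if there are fewer than two matches."""
    matches = STAT_RE.findall(htmltext)
    return matches[1] if len(matches) > 1 else None


def write_report(pages, path):
    """Write one line per (team name, html text) pair to `path`.

    The html is passed in rather than fetched so the function can be tried offline.
    """
    with open(path, "w") as out:  # opened once, closed automatically
        for name, htmltext in pages:
            # Exactly one write per team, inside the loop.
            out.write("The batting average of the %s is %s.\n"
                      % (name, batting_average(htmltext)))


# Demo with hypothetical HTML fragments, written to a temporary directory.
fake_pages = [
    ("Washington Nationals", '<span class="stat">22</span><span class="stat">.304</span>'),
    ("Atlanta Braves", '<span class="stat">6</span><span class="stat">.179</span>'),
]
report_path = os.path.join(tempfile.mkdtemp(), "batting_averages.txt")
write_report(fake_pages, report_path)

with open(report_path) as f:
    lines = f.read().splitlines()
for line in lines:
    print(line)
```

Keeping the parsing in its own function also makes it easy to check the "second match" assumption against a saved copy of a page before hitting the network.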

Using lxml and requests (since it's faster in the long run):

import requests as rq
from lxml import html

teamstate = ["wsh","phi","nym","mia","atl"]
teamnamelist = ["washington-nationals","philadelphia-phillies","new-york-mets","miami-marlins","atlanta-braves"]
teamlist = ["Washington Nationals","Philadelphia Phillies","New York Mets","Miami Marlins","Atlanta Braves"]

for x, y, z in zip(teamstate, teamnamelist, teamlist):
    url = "http://espn.go.com/mlb/team/_/name/%s/%s" % (x, y)
    r = rq.get(url)
    # Parse the page and collect the text of every <span class="stat"> element.
    tree = html.fromstring(r.text)
    # As in the regex version, the batting average is the second stat.
    val = tree.xpath("//span[@class='stat']/text()")[1]
    print "The batting average of the %s is %s." % (z, val)

Results (regex version):
The batting average of the Washington Nationals is .304.
The batting average of the Philadelphia Phillies is .306.
The batting average of the New York Mets is .179.
The batting average of the Miami Marlins is .301.
The batting average of the Atlanta Braves is .179.
[Finished in 22.5s]

Results (lxml + requests version):
The batting average of the Washington Nationals is .304.
The batting average of the Philadelphia Phillies is .306.
The batting average of the New York Mets is .179.
The batting average of the Miami Marlins is .301.
The batting average of the Atlanta Braves is .179.
[Finished in 10.6s]

Let us know if this helps.

Can you double-check your indentation? It doesn't look like it survived the copy/paste. Also, i and j always have exactly the same value, so for a start you can get rid of one of them.

Using for x, y, z in zip(teamstate, teamnamelist, teamlist) looks a lot cleaner than the while-loop approach. Also, a lot of elements on the page match your regex.

As you can see, my skill set is practically nonexistent.

If you found this helpful, please accept the answer by clicking the green check mark near the top of my post.
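The lxml version relies on the XPath `//span[@class='stat']/text()`. If lxml isn't available, a rough stand-in using only the standard library's HTMLParser is sketched below. This is a swapped-in technique, not what the answer used; `StatSpanParser` and `stat_values` are hypothetical names, and the HTML fragment is made up. Unlike XPath's `text()`, it also collects text from elements nested inside a stat span, and it assumes stat spans do not contain nested spans.

```python
try:
    from html.parser import HTMLParser  # Python 3
except ImportError:
    from HTMLParser import HTMLParser  # Python 2.7


class StatSpanParser(HTMLParser):
    """Collect the text inside every <span class="stat">, roughly like
    the XPath //span[@class='stat']/text()."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.in_stat = False
        self.stats = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; exact match on class="stat".
        if tag == "span" and ("class", "stat") in attrs:
            self.in_stat = True

    def handle_endtag(self, tag):
        # Any closing span ends the current stat (assumes no nested spans).
        if tag == "span":
            self.in_stat = False

    def handle_data(self, data):
        if self.in_stat:
            self.stats.append(data)


def stat_values(htmltext):
    """Return the text of each <span class="stat"> in document order."""
    parser = StatSpanParser()
    parser.feed(htmltext)
    parser.close()
    return parser.stats


# Hypothetical fragment: two stat spans plus one span that should be ignored.
stats = stat_values('<div><span class="stat">19</span><span class="stat">.306</span>'
                    '<span class="other">x</span></div>')
print(stats[1])  # .306
```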