
Python generates two instances of the data set


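I'm new to programming, and this seems like a basic problem, but I can't figure it out. The code below creates a .txt file that contains two instances of the last data set.

Can someone explain why this code writes the last data set twice? Thanks.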

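    import urllib
    import re
    ##NL East stats.
    teamstate = ["wsh","phi","nym","mia","atl"]
    teamnamelist = ["washington-nationals","philadelphia-phillies","new-york-mets","miami-marlins","atlanta-braves"]
    teamlist = ["Washington Nationals","Philadelphia Phillies","New York Mets","Miami Marlins","Atlanta Braves"]
    j=0
    i=0
    while (i ...

(The rest of the question's code is cut off in the source.)

A few things: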

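  • Use zip on the lists. It combines them into a list of tuples whose elements are matched by position. Since your lists are already in matching order, this works painlessly.
  • If you look at the page, about 7 or 8 elements match your regex. re.findall returns a list, so to get the batting average (the second item in that list) you need to index into it.
  • Point 2 above is the main reason your code returns the following: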

    the batting average of the Washington Nationals  is:  ['22', '.304', '.362', '.530', '3.21', '2', '0.93', '.179']
    the batting average of the Philadelphia Phillies  is:  ['19', '.306', '.364', '.468', '5.96', '2', '1.75', '.311']
    the batting average of the New York Mets  is:  ['10', '.179', '.243', '.337', '6.75', '2', '1.64', '.304']
    the batting average of the Miami Marlins  is:  ['27', '.301', '.358', '.451', '3.00', '2', '1.31', '.268']
    the batting average of the Atlanta Braves  is:  ['6', '.179', '.225', '.337', '1.38', '3', '0.85', '.184']
    [Finished in 19.0s]
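The first two points can be tried offline. The following is only a minimal sketch: `sample_html` is a hypothetical stand-in for a team page (it reuses the first three values from the Washington Nationals line above), and the shortened lists are there purely for illustration. It is written to run unchanged on Python 2.7 and Python 3.

```python
import re

# Hypothetical HTML fragment standing in for a real team page.
sample_html = (
    '<span class="stat">22</span>'
    '<span class="stat">.304</span>'
    '<span class="stat">.362</span>'
)

pattern = re.compile(r'<span class="stat">(.+?)</span>')

# findall returns every captured value, in page order, as a list.
matches = pattern.findall(sample_html)

# The batting average is the second match, so index into the list.
batting_average = matches[1]

# zip pairs up the items that share a position in each list.
teamstate = ["wsh", "phi"]
teamlist = ["Washington Nationals", "Philadelphia Phillies"]
pairs = list(zip(teamstate, teamlist))

print(matches)
print(batting_average)
print(pairs)
```

Printing `matches` shows why the whole list appeared in the output above, while `matches[1]` isolates the single value that is wanted.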
    
Changing your approach a bit:

    import urllib
    import re

    ## NL East stats.
    teamstate = ["wsh","phi","nym","mia","atl"]
    teamnamelist = ["washington-nationals","philadelphia-phillies","new-york-mets","miami-marlins","atlanta-braves"]
    teamlist = ["Washington Nationals","Philadelphia Phillies","New York Mets","Miami Marlins","Atlanta Braves"]

    # Compile once, outside the loop; matches every <span class="stat"> value.
    pattern = re.compile('<span class="stat">(.+?)</span>')

    for x, y, z in zip(teamstate, teamnamelist, teamlist):
        # Build the team page URL from the abbreviation and the URL slug.
        url = "http://espn.go.com/mlb/team/_/name/%s/%s" % (x, y)
        htmltext = urllib.urlopen(url).read()
        # findall returns all matches; the batting average is the second one.
        val = pattern.findall(htmltext)[1]
        print "The batting average of the %s is %s." % (z, val)
    
Result:

    The batting average of the Washington Nationals is .304.
    The batting average of the Philadelphia Phillies is .306.
    The batting average of the New York Mets is .179.
    The batting average of the Miami Marlins is .301.
    The batting average of the Atlanta Braves is .179.
    [Finished in 22.5s]

Using lxml and requests (because it is faster in the long run):

    import requests as rq
    from lxml import html

    teamstate = ["wsh","phi","nym","mia","atl"]
    teamnamelist = ["washington-nationals","philadelphia-phillies","new-york-mets","miami-marlins","atlanta-braves"]
    teamlist = ["Washington Nationals","Philadelphia Phillies","New York Mets","Miami Marlins","Atlanta Braves"]

    for x, y, z in zip(teamstate, teamnamelist, teamlist):
        # Build the team page URL from the abbreviation and the URL slug.
        url = "http://espn.go.com/mlb/team/_/name/%s/%s" % (x, y)
        r = rq.get(url)
        # Parse the page and take the text of the second <span class="stat">.
        tree = html.fromstring(r.text)
        val = tree.xpath("//span[@class='stat']/text()")[1]
        print "The batting average of the %s is %s." % (z, str(val))

Result:

    The batting average of the Washington Nationals is .304.
    The batting average of the Philadelphia Phillies is .306.
    The batting average of the New York Mets is .179.
    The batting average of the Miami Marlins is .301.
    The batting average of the Atlanta Braves is .179.
    [Finished in 10.6s]

Let us know if this helps.

Can you double-check your indentation? It doesn't look like it survived the copy/paste.

i and j always have exactly the same value, so you can start by getting rid of one of them.

Looping with for x, y, z in zip(teamstate, teamnamelist, teamlist) seems prettier than the original approach. Also, a lot of elements match your regex.

As you can see, my skill set is practically non-existent.

If you found this helpful, please accept the answer by clicking the green check icon near the top of my post.
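The remark that i and j always hold the same value can be checked with a small offline sketch. The question's loop body is not available, so the while loop below is a hypothetical reconstruction that assumes both counters are incremented together on every pass. It runs on Python 2.7 and Python 3.

```python
teamstate = ["wsh", "phi", "nym", "mia", "atl"]
teamlist = ["Washington Nationals", "Philadelphia Phillies", "New York Mets",
            "Miami Marlins", "Atlanta Braves"]

# Hypothetical reconstruction: two counters that always move in lockstep.
with_counters = []
i = 0
j = 0
while i < len(teamstate):
    with_counters.append((teamstate[i], teamlist[j]))
    i += 1
    j += 1

# The same pairs, built without any counters.
with_zip = [(abbr, name) for abbr, name in zip(teamstate, teamlist)]

print(with_counters == with_zip)
```

Because the two counters never diverge, one of them is redundant, and zip removes the need for either.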