Python: data from a <pre> tag


python html python-2.7 file-io


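So, I'm writing some code in Python 2.7 that pulls some information from a website, extracts the relevant data from that set, and then formats that data in a more useful way. Specifically, I want to pull it from the HTML <pre> element: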


#Pulling data from GFS lamps

from lxml import html
import requests
import numpy as np

ICAO = raw_input("What station would you like GFS lamps data for? ")

page = requests.get('http://www.nws.noaa.gov/cgi-bin/lamp/getlav.pl?sta=' + ICAO)
tree = html.fromstring(page.content)
Lamp = tree.xpath('//pre/text()') #stores class of //pre html element in list Lamp
gfsLamps = open('ICAO', 'w') #stores text of Lamp into a new file
gfsLamps.write(Lamp[0])

array = np.genfromtxt('ICAO') #puts file into an array

array[5]

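You can use KOGD as the ICAO to test this. As is, I get ValueError: Some errors were detected, and it lists lines 2-23 (got 26 columns instead of 8). What is the first step I'm getting wrong? Or is my whole approach wrong?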
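The problem isn't in putting the data into the file, it's in getting it back out with genfromtxt. genfromtxt is a very strict function and generally expects complete, regular data (the same number of columns on every row) unless you specify a number of options to skip columns and rows. Use this instead:
array = [np.array(map(str, line.split())) for line in open('ICAO')]

The array variable will then hold one array per line, containing each whitespace-separated element of that line. For example, if a line has the following data:
a b cdef 124

the array for that line will be:
['a','b','cdef','124']

array will contain an array like this for every line, which you can process further as needed.
So the complete code is:
from lxml import html
import requests
import numpy as np

ICAO = raw_input("What station would you like GFS lamps data for? ")

page = requests.get('http://www.nws.noaa.gov/cgi-bin/lamp/getlav.pl?sta=' + ICAO)
tree = html.fromstring(page.content)
Lamp = tree.xpath('//pre/text()') # stores the text of every <pre> element in the list Lamp
gfsLamps = open('ICAO', 'w') # opens a new file named 'ICAO' for writing
gfsLamps.write(Lamp[0]) # writes the text of the first <pre> element to the file
gfsLamps.close() # closes the file so the data is flushed before it is read back
array = [np.array(map(str, line.split())) for line in open('ICAO')] # one array of tokens per line
print array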
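To illustrate the difference between the two approaches, here is a minimal sketch that runs on made-up text held in memory rather than on the real LAMP page. SAMPLE, try_genfromtxt, and split_rows are hypothetical names introduced only for this example, and the sketch is written for Python 3, so it is not a drop-in replacement for the 2.7 code in the question.

```python
import io

import numpy as np

# Made-up ragged text (NOT the real LAMP bulletin): the first row has
# four tokens, the remaining rows have five.
SAMPLE = "KOGD GFS LAMP GUIDANCE\nTMP 50 51 52 53\nDPT 40 41 42 43\n"


def try_genfromtxt(text):
    """Return the parsed array, or the ValueError message if parsing fails."""
    try:
        return np.genfromtxt(io.StringIO(text))
    except ValueError as exc:
        return str(exc)


def split_rows(text):
    """Return one list of whitespace-separated tokens per non-empty line."""
    return [line.split() for line in text.splitlines() if line.strip()]


result = try_genfromtxt(SAMPLE)  # an error message here, because the rows are ragged
rows = split_rows(SAMPLE)        # succeeds regardless of row width

print(result)
print(rows[1])
```

genfromtxt takes its column count from the first row it reads, so every row with a different width is reported as an error, while plain splitting accepts rows of any width. The same splitting could presumably be applied to Lamp[0].splitlines() directly, which would avoid the intermediate file, but that is an untested assumption about the page's contents.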
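Please check whether my answer helped you, or whether you ran into any problems.
@iamnotgoogle So the problem I had before is gone, but I'm not getting what I want, or I'm not handling it correctly. If I do array[0] or array[1], there is neither output nor an error. If I use anything larger than that, it says IndexError: list index out of range. I decided to try the print function, based on the code you gave me, to see what it could tell me about array. I got: [array(['I'], dtype='|S1'), array(['C'], dtype='|S1'), array(['A'], dtype='|S1'), array(['O'], dtype='|S1')]. However, the letters change
@iamnotgoogle The letters are L, a, m, p instead of I, C, A, and O.
That's just the file name. Your question mentioned ICAO, so I used that. I have run the code with the line I wrote and it gives the desired output. I'll add the complete code to my answer so you can see whether you have a mistake somewhere.
I wasn't closing the file after writing to it. That's the only difference I can see. It works now. Thanks, everyone!
@SamuelSmith Oh, right, I forgot: if the file isn't closed, it can't be opened :D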
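The fix the comments arrive at (closing the file before reading it back) can be made automatic with a with block, which closes the file when the block exits. The sketch below is only an illustration under assumed names: write_then_read, demo_path, and the file name lamp_demo.txt are hypothetical, and it writes to a temporary directory rather than to a file called ICAO.

```python
import os
import tempfile


def write_then_read(text, path):
    """Write text to path, then read it back as rows of tokens."""
    # Leaving the with block closes the file, which flushes any buffered
    # data to disk before the file is opened again for reading.
    with open(path, "w") as out:
        out.write(text)
    with open(path) as src:
        return [line.split() for line in src]


# Hypothetical file name and contents, for illustration only.
demo_path = os.path.join(tempfile.mkdtemp(), "lamp_demo.txt")
rows = write_then_read("a b cdef 124\n1 2 3\n", demo_path)
print(rows)
```

With this pattern there is no separate close() call to forget, so the read never sees a file whose contents are still sitting in a write buffer.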