Python: How do I change the date format of HTML I'm scraping?

python html date web-scraping

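I am scraping a date from a website. The date displays in a usable format in my browser, but when I extract the data string from the site, the format changes. What is the simplest way to get the date in MM/DD/YYYY format?

On the website the date is displayed as "12/05/2013 9:26 PM GMT", but when I extract it with my script it comes out as "Thu Dec 05 16:26:24 EST 2013 GMT". I only want to capture the value "12/05/2013".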

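There are several problems in the code. You should try using a loop, so that you don't need to repeat the same code five times.

With BeautifulSoup, you can use the function find_all instead of find to get all matches of a tag.

BeautifulSoup apparently returns the time in one specific format, so one way to accomplish the task is to parse the string that BeautifulSoup returns.

I made a number of changes to your code: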

# Import libraries (Python 3)
from urllib.request import urlopen
from bs4 import BeautifulSoup
import datetime

# Create soup
url = 'https://www.theice.com/marketdata/DelayedMarkets.shtml?productId=3418&hubId=4080'
soup = BeautifulSoup(urlopen(url).read(), 'html.parser')
table = soup.find('table', {"class": "data default borderless"})

#Find and record time
time_idx = -1
for idx, th in enumerate(table.find_all('th')):
    # Find the column index of Time
    if th.get_text() == 'Time':
        time_idx = idx
        break

timevar = []
for tr in table.find_all('tr'):
    # Extract the content of each column in a list
    td_contents = [td.get_text() for td in tr.find_all('td')]
    # If this row matches our requirement, take the Time column
    if 'Dec13' in td_contents:
        time_str = td_contents[time_idx]
        # time_str looks like 'Thu Dec 05 16:26:24 EST 2013 GMT'; convert it to a datetime object
        time_obj = datetime.datetime.strptime(time_str, '%a %b %d %H:%M:%S EST %Y GMT')
        # Format explicitly as MM/DD/YYYY; '%x' is locale-dependent and may give a two-digit year
        timevar.append(time_obj.strftime('%m/%d/%Y'))

#create output document
with open('CarbonPrice.txt','a') as f:
    f.write(timevar[0])
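
The strptime pattern above hard-codes the "EST" abbreviation, so it would presumably fail if the site ever printed a different zone such as "EDT". A minimal sketch of one way around that, assuming the scraped string always has the shape "weekday month day time ZONE year ZONE"; the helper name to_mmddyyyy is hypothetical and not part of the original answer:

```python
import datetime
import re


def to_mmddyyyy(raw):
    """Convert e.g. 'Thu Dec 05 16:26:24 EST 2013 GMT' to '12/05/2013'.

    Hypothetical helper: it assumes time zone names appear as short
    all-uppercase tokens, which is an assumption about the site's output.
    """
    # Drop all-uppercase alphabetic tokens such as 'EST', 'EDT' or 'GMT'.
    # 'Thu' and 'Dec' are mixed case, so they are kept.
    tokens = [t for t in raw.split() if not re.fullmatch(r'[A-Z]{2,5}', t)]
    parsed = datetime.datetime.strptime(' '.join(tokens), '%a %b %d %H:%M:%S %Y')
    # Explicit format codes give a zero-padded month/day and a four-digit year.
    return parsed.strftime('%m/%d/%Y')


print(to_mmddyyyy('Thu Dec 05 16:26:24 EST 2013 GMT'))  # 12/05/2013
```

Note that this takes the date exactly as written in the string and does no time zone conversion, so near midnight it could differ by a day from the GMT date shown in the browser.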

Here is one way:

>>> import time
>>> date_time = 'Thu Dec 05 16:26:24 EST 2013 GMT'
>>> parsed = time.strptime(date_time, "%a %b %d %H:%M:%S EST %Y GMT")
>>> print(time.strftime("%m/%d/%Y", parsed))
12/05/2013

BeautifulSoup can also search by a tag's content: table.find('th', text='Time')

Thanks for the additional information, but I think we would not be able to get the column index that way?

I appreciate the extra help with this. I knew my approach was not a good one, but I was not sure of the for-loop syntax. Two notes: 1) to get it to run, I had to add an extra carriage return after "break" so that the code exits the if block before timevar is defined; 2) I am writing the variable to a .txt file, so I ended with "f.write(timevar[0])".

Yes, you can change the output to a file like that. As for the extra line after break, I'm not sure.