Custom datetime parsing: combining date and time after reading a CSV


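While reading a text file I ran into an odd format in which the date and the time are stored in separate columns (the file is tab-delimited).

I would like to:

  • parse the date and time across the two columns ([0], [1])

  • shift all timestamps by 30 minutes, i.e. replace :30 with :00

I used the following code: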

timeparse = lambda x: pd.datetime.strptime(x.replace(':30',':00'), '%H:%M')

df = pd.read_csv('Chart_1.txt',
    sep='\t',
    skiprows=1,
    date_parser=timeparse,
    parse_dates=['Time'],
    header=1)
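This seems to parse the time but not the date (obviously, since that is what I told it to do). Also, skipping rows helps it find the Date and Time headers, but it discards the headers I need, temp and room 1.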

You can use:

import pandas as pd


df = pd.read_csv('Chart_1.txt', sep='\t')
# store the temperature label (name of the third column) in temp
temp = df.columns[2]
print(temp)
Dry resultant temperature (°C)

# store the .aps file name (second row, third column) in aps
aps = df.iloc[1, 2]
print(aps)
AE4854c_Campshill_openings reduced_communal areas increased openings2.aps

# boolean mask on the first column: True where the value contains '/' (a date)
mask = df.iloc[:, 0].str.contains('/', na=False)
# rows that do NOT contain a date are shifted one column to the right
df1 = df[~mask].shift(1, axis=1)
# rows that do contain a date
df2 = df[mask]
# concatenate df1 and df2, then restore the original row order
df = pd.concat([df1, df2]).sort_index()
# assign new column names:
# the first 3 are custom, the rest come from the first row (fourth column onwards)
df.columns = ['date', 'time', 'no name'] + df.iloc[0, 3:].tolist()
# drop the first 2 rows
df = df[2:]
# fill NaN values in the date column by forward filling
df.date = df.date.ffill()
# convert the date column to datetime (no year in the data, so it defaults to 1900)
df.date = pd.to_datetime(df.date, format='%a, %d/%b')
# replace :30 with :00 in the time column
df.time = df.time.str.replace(':30', ':00')

There is a problem with the copy of your data with tabs - I cannot tell where the tabs are and where they are not. Could you share your sample file via WeTransfer, gdocs or Dropbox? Or another question - is the data from the 5th row parsed correctly into the Time and simulation columns or not?

@jezrael, I have uploaded the file to [(dropbox)
print (df.head())
       date   time no name 3F_T09_SE_SW_Bed1 GF_office_S GF_office_W_tea  \
2 1900-01-01  00:00   11.94             11.47       14.72           16.66   
3 1900-01-01  01:00   12.00             11.63       14.83           16.69   
4 1900-01-01  02:00   12.04             11.73       14.85           16.68   
5 1900-01-01  03:00   12.06             11.80       14.83           16.65   
6 1900-01-01  04:00   12.08             11.84       14.79           16.62   

  GF_Act.Room GF_Communal areas GF_Reception GF_Ent Lobby   ...    \
2       17.41             12.74        12.93        10.85   ...     
3       17.45             12.74        13.14        11.00   ...     
4       17.44             12.71        13.23        11.09   ...     
5       17.41             12.68        13.27        11.16   ...     
6       17.36             12.65        13.28        11.21   ...     

  2F_S01_SE_SW_Bedroom 2F_S01_SE_SW_Int Circ 2F_S01_SE_SW_Storage_int circ  \
2                12.58                 12.17                         12.54   
3                12.64                 12.22                         12.49   
4                12.68                 12.27                         12.48   
5                12.70                 12.30                         12.49   
6                12.71                 12.31                         12.51   

  GF_G01_SE_SW_Bedroom GF_G01_SE_SW_Storage_Bed 3F_T09_SE_SW_Bathroom  \
2                14.51                    14.61                 11.49   
3                14.55                    14.59                 11.50   
4                14.56                    14.59                 11.52   
5                14.55                    14.58                 11.54   
6                14.54                    14.57                 11.56   

  3F_T09_SE_SW_Circ 3F_T09_SE_SW_Storage_int circ GF_Lounge GF_Cafe  
2             11.52                         11.38     12.83   12.86  
3             11.56                         11.35     13.03   13.03  
4             11.61                         11.36     13.13   13.13  
5             11.65                         11.39     13.17   13.17  
6             11.68                         11.42     13.18   13.18  

[5 rows x 31 columns]
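
The result above still keeps date and time in two separate columns. If a single timestamp column is wanted, one possible approach is sketched below. It is only an illustration under assumptions: the small frame is a hypothetical stand-in for the real data, the time column is assumed to hold clean "HH:MM" strings that have not yet had :30 replaced, and the new column name datetime is made up for the example. Instead of editing the string, the half hour is subtracted as a Timedelta after the two columns are combined.

```python
import pandas as pd

# Hypothetical stand-in for the parsed frame: 'date' is already datetime64,
# 'time' is still raw "HH:MM" text (assumption: no stray whitespace).
df = pd.DataFrame({
    "date": pd.to_datetime(["1900-01-01", "1900-01-01", "1900-01-02"]),
    "time": ["00:30", "01:30", "23:30"],
})

# Append seconds so pd.to_timedelta accepts the string ("HH:MM" -> "HH:MM:00").
time_of_day = pd.to_timedelta(df["time"] + ":00")

# Combine date and time of day, then move every timestamp back by 30 minutes.
df["datetime"] = df["date"] + time_of_day - pd.Timedelta(minutes=30)

print(df[["date", "time", "datetime"]])
```

If the time column has already been rewritten with str.replace as in the answer, the Timedelta subtraction should be dropped, otherwise the shift would be applied twice.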