Python: How do I extract specific words from a string?
I have a file containing multiple lines and want to extract the first three words of each line.
str = []
str = [
Feb 17 07:10:07 afg-prod-web2 journal: afg-prod-web2 statistics: 192.168.28.12 - 200 - "{\x0A \x22identifier\x22: {\x0A \x22company_code\x22: \x22TSC\x22,\x0A \x22product_type\x22: \x22airtime-ctg\x22,\x0A \x22host_type\x22: \x22android\x22\x0A },\x0A \x22id\x22: {\x0A \x22type\x22: \x22guest\x22,\x0A \x22group\x22: \x22guest\x22,\x0A \x22uuid\x22: \x22fd2dfcdc-ade2-11e6-8404-0242ac110003\x22,\x0A \x22device_id\x22: \x222f504f5ed3c64934\x22\x0A },\x0A \x22stats\x22: [\x0A {\x0A \x22timestamp\x22: \x222017-02-16T23:29:57+0000\x22,\x0A \x22software_id\x22: \x22A-ACTG\x22,\x0A \x22action_id\x22: \x22open_app\x22,\x0A \x22values\x22: {\x0A \x22device_id\x22: \x222f504f5ed3c64934\x22,\x0A \x22language\x22: \x22en\x22\x0A }\x0A }\x0A ]\x0A}"
Feb 17 07:10:07 afg-prod-web2 journal: afg-prod-web2 statistics: 192.168.28.12 - 200 - "{\x0A \x22identifier\x22: {\x0A \x22company_code\x22: \x22TSC\x22,\x0A \x22product_type\x22: \x22airtime-ctg\x22,\x0A \x22host_type\x22: \x22android\x22\x0A },\x0A \x22id\x22: {\x0A \x22type\x22: \x22guest\x22,\x0A \x22group\x22: \x22guest\x22,\x0A \x22uuid\x22: \x22fd2dfcdc-ade2-11e6-8404-0242ac110003\x22,\x0A \x22device_id\x22: \x222f504f5ed3c64934\x22\x0A },\x0A \x22stats\x22: [\x0A {\x0A \x22timestamp\x22: \x222017-02-16T23:29:57+0000\x22,\x0A \x22software_id\x22: \x22A-ACTG\x22,\x0A \x22action_id\x22: \x22open_app\x22,\x0A \x22values\x22: {\x0A \x22device_id\x22: \x222f504f5ed3c64934\x22,\x0A \x22language\x22: \x22en\x22\x0A }\x0A }\x0A ]\x0A}"
Feb 17 07:10:07 afg-prod-web2 journal: afg-prod-web2 statistics: 192.168.28.12 - 200 - "{\x0A \x22identifier\x22: {\x0A \x22company_code\x22: \x22TSC\x22,\x0A \x22product_type\x22: \x22airtime-ctg\x22,\x0A \x22host_type\x22: \x22android\x22\x0A },\x0A \x22id\x22: {\x0A \x22type\x22: \x22guest\x22,\x0A \x22group\x22: \x22guest\x22,\x0A \x22uuid\x22: \x22fd2dfcdc-ade2-11e6-8404-0242ac110003\x22,\x0A \x22device_id\x22: \x222f504f5ed3c64934\x22\x0A },\x0A \x22stats\x22: [\x0A {\x0A \x22timestamp\x22: \x222017-02-16T23:29:57+0000\x22,\x0A \x22software_id\x22: \x22A-ACTG\x22,\x0A \x22action_id\x22: \x22open_app\x22,\x0A \x22values\x22: {\x0A \x22device_id\x22: \x222f504f5ed3c64934\x22,\x0A \x22language\x22: \x22en\x22\x0A }\x0A }\x0A ]\x0A}"
Feb 17 07:10:07 afg-prod-web1 journal: afg-prod-web1 statistics: 192.168.28.12 - 200 - "{\x0A \x22identifier\x22: {\x0A \x22company_code\x22: \x22TSC\x22,\x0A \x22product_type\x22: \x22airtime-ctg\x22,\x0A \x22host_type\x22: \x22android\x22\x0A },\x0A \x22id\x22: {\x0A \x22type\x22: \x22guest\x22,\x0A \x22group\x22: \x22guest\x22,\x0A \x22uuid\x22: \x22fd2dfcdc-ade2-11e6-8404-0242ac110003\x22,\x0A \x22device_id\x22: \x222f504f5ed3c64934\x22\x0A },\x0A \x22stats\x22: [\x0A {\x0A \x22timestamp\x22: \x222017-02-16T23:29:57+0000\x22,\x0A \x22software_id\x22: \x22A-ACTG\x22,\x0A \x22action_id\x22: \x22open_app\x22,\x0A \x22values\x22: {\x0A \x22device_id\x22: \x222f504f5ed3c64934\x22,\x0A \x22language\x22: \x22en\x22\x0A }\x0A }\x0A ]\x0A}"]
I want to extract the date from each line, i.e. Feb 17 07:10:07, and put it into an array.
I tried using a for loop, but I get an error:
IndexError: list index out of range
The code I tried:
for i in splitdata:
    abc = splitdata[logcount]
    aa = abc.split()
    if(aa[0] == "Feb"):
        aaa = "".join([aa[0],' ',aa[1],' ',aa[2]])
        logtime.append(aaa)
        logcount += 2
    else:
        pass
print logtime
If your log is saved in a file named log.log, you can get the dates like this:
with open('log.log') as f:
    log_time = []
    for line in f:
        log_time.append(line[:15])
print(log_time)
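The slice `line[:15]` works only if every line begins with a fixed-width timestamp. As a minimal sketch, assuming the prefix always looks like `Feb 17 07:10:07`, a regular expression can skip blank or malformed lines instead of returning arbitrary text. The names `TIMESTAMP_RE` and `extract_timestamps` are made up for this illustration, and the sample lines are shortened from the log in the question.

```python
import re

# Assumed prefix format: three-letter month, day, HH:MM:SS (e.g. "Feb 17 07:10:07").
TIMESTAMP_RE = re.compile(r"^[A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2}")


def extract_timestamps(lines):
    """Return the leading timestamp of every line that starts with one."""
    timestamps = []
    for line in lines:
        match = TIMESTAMP_RE.match(line)
        if match:  # blank or malformed lines are silently skipped
            timestamps.append(match.group(0))
    return timestamps


# Shortened stand-ins for the log lines in the question.
sample_lines = [
    "Feb 17 07:10:07 afg-prod-web2 journal: afg-prod-web2 statistics: ...",
    "",
    "not a log line",
    "Feb 17 07:10:07 afg-prod-web1 journal: afg-prod-web1 statistics: ...",
]
print(extract_timestamps(sample_lines))
```

Because a file object yields its lines when iterated, the same function could be called as `extract_timestamps(f)` inside the `with open('log.log') as f:` block.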
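You can avoid this kind of error simply by checking len() of the split string. There is a lot of room to improve the code:
- Use reusable functions
- Check the length of a list before accessing it by index
- In Python, an if condition does not need parentheses
- Use list comprehensions where they make sense
- The code you use to join the list suggests you still have a lot to learn in Python. Good luck.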
In [1]: sample_text = """Feb 17 07:10:07 afg-prod-web2 journal: afg-prod-web2 statistics: 192.168.28.12 - 200 - "{\x0A
   ...: \x22identifier\x22: {\x0A \x22company_code\x22: \x22TSC\x22,\x0A \x22product_type\x22: \x22airtime
   ...: -ctg\x22,\x0A \x22host_type\x22: \x22android\x22\x0A },\x0A \x22id\x22: {\x0A \x22type\x22: \
   ...: x22guest\x22,\x0A \x22group\x22: \x22guest\x22,\x0A \x22uuid\x22: \x22fd2dfcdc-ade2-11e6-8404-0242a
   ...: c110003\x22,\x0A \x22device_id\x22: \x222f504f5ed3c64934\x22\x0A },\x0A \x22stats\x22: [\x0A
   ...: {\x0A \x22timestamp\x22: \x222017-02-16T23:29:57+0000\x22,\x0A \x22software_id\x22: \x22A-A
   ...: CTG\x22,\x0A \x22action_id\x22: \x22open_app\x22,\x0A \x22values\x22: {\x0A
   ...: \x22device_id\x22: \x222f504f5ed3c64934\x22,\x0A \x22language\x22: \x22en\x22\x0A }\x0A
   ...: }\x0A ]\x0A}"""
In [2]: def get_time_from_log(log_text):
   ...:     log_text_split = log_text.split(" ")
   ...:     if len(log_text_split) < 3:
   ...:         pass
   ...:     elif log_text_split[0] == "Feb":
   ...:         return " ".join(log_text_split[0:3])
   ...:
In [3]: get_time_from_log(sample_text)
Out[3]: 'Feb 17 07:10:07'
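The suggestions in this answer (a reusable function, a length check before indexing, and a list comprehension) can be combined for a whole list of lines. The following is only a sketch: `first_three_words` and `collect_log_times` are hypothetical names, and the input is assumed to be a list of strings such as the result of `f.readlines()`.

```python
def first_three_words(line):
    """Return the first three whitespace-separated words, or None if there are fewer."""
    words = line.split()
    if len(words) < 3:  # check the length before indexing
        return None
    return " ".join(words[:3])


def collect_log_times(lines):
    """Apply first_three_words to every line and drop the lines that were too short."""
    candidates = [first_three_words(line) for line in lines]
    return [c for c in candidates if c is not None]


# Shortened stand-ins for the log lines in the question.
lines = [
    "Feb 17 07:10:07 afg-prod-web2 journal: ...",
    "",
    "Feb 17 07:10:07 afg-prod-web1 journal: ...",
]
print(collect_log_times(lines))
```

Unlike the version above, this sketch does not test for "Feb", so it also handles lines from other months; add that check back if only February entries are wanted.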
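Could you mention what the splitdata and logcount values are?
splitdata is the str I mentioned, and logcount is the counter for the next line, i.e. the index used to get the next date.
I think the error occurs because there are empty lines in between; in that case "aa" will not contain any data.
There are no empty lines in the file.
Shouldn't the list items be separated by commas? The commas between the items in the list are missing. Also, is logcount = 0 when the loop starts? If the logcount value goes above 0, the index will be out of range, because the list str contains only one element.
I think it would produce the same error, IndexError: list index out of range, which the OP is trying to solve. There are no blank lines in the file.
Check @tintintin's comment on his question.
So why do you think the IndexError is raised?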
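One plausible explanation, assuming logcount starts at 0 and every line begins with "Feb": the loop body runs once per element, but logcount advances by 2 each time, so the index overtakes the end of the list about halfway through. The sketch below replays that indexing pattern on four placeholder lines; `indices_visited` is a hypothetical helper written only for this illustration.

```python
# Four placeholder entries standing in for the four log lines in the question.
splitdata = [
    "Feb 17 07:10:07 line-a",
    "Feb 17 07:10:07 line-b",
    "Feb 17 07:10:07 line-c",
    "Feb 17 07:10:07 line-d",
]


def indices_visited(data, step):
    """Replay the question's indexing pattern.

    Returns (indices read successfully, index that raised IndexError or None).
    """
    logcount = 0
    visited = []
    for _ in data:  # one iteration per element, as in the question
        try:
            data[logcount]
        except IndexError:
            return visited, logcount
        visited.append(logcount)
        logcount += step
    return visited, None


print(indices_visited(splitdata, 2))  # ([0, 2], 4): the third iteration fails
print(indices_visited(splitdata, 1))  # ([0, 1, 2, 3], None)

# Iterating over the items directly avoids the manual counter altogether.
logtime = [" ".join(line.split()[:3]) for line in splitdata if len(line.split()) >= 3]
print(logtime)
```

If the list really holds a single string, as one comment suggests, the same pattern fails even earlier, because any index other than 0 is out of range.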