Python: how do I extract specific words from a string?


I have a file containing multiple lines, and I want to extract the first three words of each line.

str = []

str = [
Feb 17 07:10:07 afg-prod-web2 journal: afg-prod-web2 statistics: 192.168.28.12 - 200 - "{\x0A    \x22identifier\x22: {\x0A        \x22company_code\x22: \x22TSC\x22,\x0A        \x22product_type\x22: \x22airtime-ctg\x22,\x0A        \x22host_type\x22: \x22android\x22\x0A    },\x0A    \x22id\x22: {\x0A        \x22type\x22: \x22guest\x22,\x0A        \x22group\x22: \x22guest\x22,\x0A        \x22uuid\x22: \x22fd2dfcdc-ade2-11e6-8404-0242ac110003\x22,\x0A        \x22device_id\x22: \x222f504f5ed3c64934\x22\x0A    },\x0A    \x22stats\x22: [\x0A        {\x0A            \x22timestamp\x22: \x222017-02-16T23:29:57+0000\x22,\x0A            \x22software_id\x22: \x22A-ACTG\x22,\x0A            \x22action_id\x22: \x22open_app\x22,\x0A            \x22values\x22: {\x0A                \x22device_id\x22: \x222f504f5ed3c64934\x22,\x0A                \x22language\x22: \x22en\x22\x0A            }\x0A        }\x0A    ]\x0A}"

Feb 17 07:10:07 afg-prod-web2 journal: afg-prod-web2 statistics: 192.168.28.12 - 200 - "{\x0A    \x22identifier\x22: {\x0A        \x22company_code\x22: \x22TSC\x22,\x0A        \x22product_type\x22: \x22airtime-ctg\x22,\x0A        \x22host_type\x22: \x22android\x22\x0A    },\x0A    \x22id\x22: {\x0A        \x22type\x22: \x22guest\x22,\x0A        \x22group\x22: \x22guest\x22,\x0A        \x22uuid\x22: \x22fd2dfcdc-ade2-11e6-8404-0242ac110003\x22,\x0A        \x22device_id\x22: \x222f504f5ed3c64934\x22\x0A    },\x0A    \x22stats\x22: [\x0A        {\x0A            \x22timestamp\x22: \x222017-02-16T23:29:57+0000\x22,\x0A            \x22software_id\x22: \x22A-ACTG\x22,\x0A            \x22action_id\x22: \x22open_app\x22,\x0A            \x22values\x22: {\x0A                \x22device_id\x22: \x222f504f5ed3c64934\x22,\x0A                \x22language\x22: \x22en\x22\x0A            }\x0A        }\x0A    ]\x0A}"

Feb 17 07:10:07 afg-prod-web2 journal: afg-prod-web2 statistics: 192.168.28.12 - 200 - "{\x0A    \x22identifier\x22: {\x0A        \x22company_code\x22: \x22TSC\x22,\x0A        \x22product_type\x22: \x22airtime-ctg\x22,\x0A        \x22host_type\x22: \x22android\x22\x0A    },\x0A    \x22id\x22: {\x0A        \x22type\x22: \x22guest\x22,\x0A        \x22group\x22: \x22guest\x22,\x0A        \x22uuid\x22: \x22fd2dfcdc-ade2-11e6-8404-0242ac110003\x22,\x0A        \x22device_id\x22: \x222f504f5ed3c64934\x22\x0A    },\x0A    \x22stats\x22: [\x0A        {\x0A            \x22timestamp\x22: \x222017-02-16T23:29:57+0000\x22,\x0A            \x22software_id\x22: \x22A-ACTG\x22,\x0A            \x22action_id\x22: \x22open_app\x22,\x0A            \x22values\x22: {\x0A                \x22device_id\x22: \x222f504f5ed3c64934\x22,\x0A                \x22language\x22: \x22en\x22\x0A            }\x0A        }\x0A    ]\x0A}"

Feb 17 07:10:07 afg-prod-web1 journal: afg-prod-web1 statistics: 192.168.28.12 - 200 - "{\x0A    \x22identifier\x22: {\x0A        \x22company_code\x22: \x22TSC\x22,\x0A        \x22product_type\x22: \x22airtime-ctg\x22,\x0A        \x22host_type\x22: \x22android\x22\x0A    },\x0A    \x22id\x22: {\x0A        \x22type\x22: \x22guest\x22,\x0A        \x22group\x22: \x22guest\x22,\x0A        \x22uuid\x22: \x22fd2dfcdc-ade2-11e6-8404-0242ac110003\x22,\x0A        \x22device_id\x22: \x222f504f5ed3c64934\x22\x0A    },\x0A    \x22stats\x22: [\x0A        {\x0A            \x22timestamp\x22: \x222017-02-16T23:29:57+0000\x22,\x0A            \x22software_id\x22: \x22A-ACTG\x22,\x0A            \x22action_id\x22: \x22open_app\x22,\x0A            \x22values\x22: {\x0A                \x22device_id\x22: \x222f504f5ed3c64934\x22,\x0A                \x22language\x22: \x22en\x22\x0A            }\x0A        }\x0A    ]\x0A}"]
I want to extract the date Feb 17 07:10:07 from each line and put it into an array.

I tried using a for loop, but I get an error:

IndexError: list index out of range

The code I tried:

for i in splitdata:
    abc = splitdata[logcount]
    aa = abc.split()
    if(aa[0] == "Feb"):
        aaa = "".join([aa[0],' ',aa[1],' ',aa[2]])
        logtime.append(aaa)
        logcount += 2
    else:
        pass
print logtime

If your log is saved in a file named log.log, you can get the dates like this:

with open('log.log') as f: 
    log_time = []
    for line in f:
        log_time.append(line[:15])
print(log_time) 
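
The slice line[:15] relies on every line starting with a timestamp that is exactly 15 characters wide. As a rough alternative, the sketch below takes the first three whitespace-separated words instead and skips blank or too-short lines. The function name first_three_words and the shortened sample lines are made up for illustration, and io.StringIO stands in for the real file so the example runs on its own.

```python
import io

def first_three_words(lines):
    """Return the first three whitespace-separated words of each usable line."""
    result = []
    for line in lines:
        # Split at most 3 times; anything after the third word stays in parts[3].
        parts = line.split(None, 3)
        if len(parts) >= 3:  # skips blank lines and lines with fewer than 3 words
            result.append(" ".join(parts[:3]))
    return result

# io.StringIO stands in for open('log.log'); the lines are shortened samples.
sample = io.StringIO(
    "Feb 17 07:10:07 afg-prod-web2 journal: afg-prod-web2 statistics\n"
    "\n"
    "Feb 17 07:10:09 afg-prod-web1 journal: afg-prod-web1 statistics\n"
)
log_time = first_three_words(sample)
print(log_time)  # ['Feb 17 07:10:07', 'Feb 17 07:10:09']
```

With a real file, passing the open file object (with open('log.log') as f: first_three_words(f)) should behave the same way, since iterating a file yields one line at a time.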
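You can avoid this kind of error simply by checking the length of the split string with len(). There is a lot of room to improve the code:

  • Use reusable functions
  • Check the length of a list before accessing it by index
  • In Python, an if condition does not need parentheses
  • Use list comprehensions in a smart way
  • The code you use to join the list shows that you still have a lot to learn in Python. Good luck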
In [1]: sample_text = """Feb 17 07:10:07 afg-prod-web2 journal: afg-prod-web2 statistics: 192.168.28.12 - 200 - "{\x0A
   ...:     \x22identifier\x22: {\x0A        \x22company_code\x22: \x22TSC\x22,\x0A        \x22product_type\x22: \x22airtime
   ...: -ctg\x22,\x0A        \x22host_type\x22: \x22android\x22\x0A    },\x0A    \x22id\x22: {\x0A        \x22type\x22: \
   ...: x22guest\x22,\x0A        \x22group\x22: \x22guest\x22,\x0A        \x22uuid\x22: \x22fd2dfcdc-ade2-11e6-8404-0242a
   ...: c110003\x22,\x0A        \x22device_id\x22: \x222f504f5ed3c64934\x22\x0A    },\x0A    \x22stats\x22: [\x0A
   ...: {\x0A            \x22timestamp\x22: \x222017-02-16T23:29:57+0000\x22,\x0A            \x22software_id\x22: \x22A-A
   ...: CTG\x22,\x0A            \x22action_id\x22: \x22open_app\x22,\x0A            \x22values\x22: {\x0A
   ...: \x22device_id\x22: \x222f504f5ed3c64934\x22,\x0A                \x22language\x22: \x22en\x22\x0A            }\x0A
   ...: }\x0A    ]\x0A}"""

In [2]: def get_time_from_log(log_text):
   ...:     log_text_split = log_text.split(" ")
   ...:     if len(log_text_split) < 3:
   ...:         pass
   ...:     elif log_text_split[0] == "Feb":
   ...:         return " ".join(log_text_split[0:3])
   ...:

In [3]: get_time_from_log(sample_text)
Out[3]: 'Feb 17 07:10:07'
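
The function in the session above only recognizes lines that start with "Feb" and handles one string at a time. As a possible extension, and not part of the original answer, the sketch below matches any syslog-style "Mon DD HH:MM:SS" prefix with a regular expression and collects the results with a list comprehension. The names TIMESTAMP_RE, extract_timestamp and get_times, as well as the "Mar" sample line, are assumptions made for illustration.

```python
import re

# Syslog-style prefix: month abbreviation, day of month, HH:MM:SS.
TIMESTAMP_RE = re.compile(r"^([A-Z][a-z]{2})\s+(\d{1,2})\s+(\d{2}:\d{2}:\d{2})\b")

def extract_timestamp(log_text):
    """Return 'Mon DD HH:MM:SS' from the start of log_text, or None if absent."""
    match = TIMESTAMP_RE.match(log_text)
    if match is None:
        return None
    return " ".join(match.groups())

def get_times(lines):
    """Collect the timestamp of every line that has one."""
    candidates = (extract_timestamp(line) for line in lines)
    return [stamp for stamp in candidates if stamp is not None]

lines = [
    "Feb 17 07:10:07 afg-prod-web2 journal: afg-prod-web2 statistics: 192.168.28.12 - 200",
    "",  # blank lines are simply skipped
    "Mar  3 12:00:01 afg-prod-web1 journal: afg-prod-web1 statistics: 192.168.28.12 - 200",
]
print(get_times(lines))  # ['Feb 17 07:10:07', 'Mar 3 12:00:01']
```

Note that joining the matched groups normalizes the spacing, so a padded day such as "Mar  3" comes back as "Mar 3".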

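Can you mention what splitdata and logcount are?
splitdata is the str I mentioned, and logcount is the counter for the next line, i.e. the index used to get the next date.
I think the error is because there are blank lines in between; in that case "aa" will not contain any data.
In the file, there are no blank lines in between.
Shouldn't proper list items be separated by commas? The commas between the items in the list are missing. Also, is logcount = 0 when the loop starts?
If logcount goes above 0, the index will be out of range, because the list str contains only one element.
I think it will produce the same error, IndexError: list index out of range, that the OP is trying to solve.
There are no blank lines in the file.
Check @tintintin's comment on his question. So why do you think the IndexError is being raised?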
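
One plausible cause, going by the code in the question: the loop runs once per element of splitdata, but the index logcount advances by 2 on every pass, so it runs past the end of the list about halfway through. The question does not show how splitdata or logcount are initialized, so the sketch below assumes logcount starts at 0 and splitdata holds one string per log line; the four shortened lines and both function names are made up for illustration.

```python
# Hypothetical stand-in for the asker's splitdata: four shortened log lines.
splitdata = [
    "Feb 17 07:10:07 afg-prod-web2 journal: first",
    "Feb 17 07:10:07 afg-prod-web2 journal: second",
    "Feb 17 07:10:07 afg-prod-web2 journal: third",
    "Feb 17 07:10:07 afg-prod-web1 journal: fourth",
]

def extract_with_counter(lines):
    """Mimics the question's loop: a separate index that grows by 2 per pass."""
    logtime = []
    logcount = 0  # assumed starting value
    for _ in lines:
        aa = lines[logcount].split()  # IndexError once logcount >= len(lines)
        if aa[0] == "Feb":
            logtime.append(" ".join(aa[:3]))
            logcount += 2
    return logtime

def extract_direct(lines):
    """Uses the loop variable itself, so no index can run past the end."""
    logtime = []
    for line in lines:
        aa = line.split()
        if len(aa) >= 3 and aa[0] == "Feb":  # length check guards blank lines
            logtime.append(" ".join(aa[:3]))
    return logtime

try:
    extract_with_counter(splitdata)
except IndexError as exc:
    print("IndexError:", exc)  # raised on the third pass, when logcount is 4

print(extract_direct(splitdata))
```

If every other element of the real list were a blank line, stepping by 2 would read the right entries, but the loop would still run once per element and overshoot the end in the same way.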