How do I search multiple files for 3 separate strings and print them to an Excel file with Python?


python regex excel string file


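I have a script that looks for 4 separate strings before printing them to an Excel file. The first 3 are strings that each sit on a single line of their own, which I search for with regular expressions; the 4th is a block of code, which I search for with Beautiful Soup 4. I can get the text from Beautiful Soup, but for some reason the first 3 are not found.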

import xlwt 
from xlwt import Workbook 
import os
from bs4 import BeautifulSoup
import re
from os import listdir

fileNumber = 1
cve = ""
titlePrint = ""
titleStrip = ""
date = ""
code = ""
col = 0
row = 0


directory = "/Users/Documents/databasescript/web_scrape_db/exploits_test_folder"

for filename in os.listdir(directory):
    if filename.endswith(".txt"):
        with open(str(fileNumber) + ".txt") as f:

            for line in f:
                #CVE

                if re.search(r'https://nvd.nist.gov/vuln/detail/', line):
                    cve = line[118:131]
                    print"found 'https://nvd.nist.gov/vuln/detail/'"

                #Title
                if '<h1 class="card-title text-secondary text-center"' in line:
                    titlePrint = f.next().translate(None, '&#039;').strip()
                    print "found title"
                #Date
                if '<meta property="article:published_time"' in line:
                    date = line[53:63]
                    print "found date"

                if fileNumber == 6:
                    break       
                #Source Code
                soup = BeautifulSoup(open("/Users/Documents/databasescript/web_scrape_db/exploits_test_folder/"+(str(fileNumber))+".txt"), "html.parser")




                #increment file number      
                fileNumber+=1

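Your script is probably hitting the fileNumber == 6 condition and breaking out of the loop before it reaches the 3 strings you are looking for. As written, it only searches the first 6 lines of the first .txt file, and then a single line for each .txt file after that. Maybe you want to move fileNumber += 1 back into the if filename.endswith(".txt"): condition?

Consider simplifying this. Is this a question about writing to xlsx or about finding strings?

@user1558604 Finding strings, because it can't find them. I can write to the Excel file, but it doesn't find the first 3 strings: CVE, Title and Date.

Please edit your question and provide a minimal, reproducible example that doesn't contain any xlsx code.

@user1558604 I edited it.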
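The suggestion about where the counter is advanced can be sketched roughly as follows, in Python 3 rather than the Python 2 of the question. This sketch goes one step further than the suggestion and drops the counter entirely, opening each file that os.listdir actually returned. The names scan_file and scan_directory, the regular expressions, and the assumption that the date appears as YYYY-MM-DD on the meta line are all hypothetical, inferred only from the slices and markers in the question's code. html.unescape is swapped in for the original translate call, which deleted individual characters rather than decoding the entity.

```python
import html
import os
import re

# Hypothetical patterns: the page layout is assumed from the question's code.
CVE_RE = re.compile(r"https://nvd\.nist\.gov/vuln/detail/(CVE-\d{4}-\d{4,})")
DATE_RE = re.compile(r"\d{4}-\d{2}-\d{2}")
TITLE_MARKER = '<h1 class="card-title text-secondary text-center"'
DATE_MARKER = '<meta property="article:published_time"'


def scan_file(path):
    """Return the CVE id, title and date found in one file ('' if absent)."""
    found = {"cve": "", "title": "", "date": ""}
    with open(path, encoding="utf-8", errors="replace") as f:
        for line in f:
            cve_match = CVE_RE.search(line)
            if cve_match:
                found["cve"] = cve_match.group(1)
            if TITLE_MARKER in line:
                # As in the question, the title is assumed to be on the next line.
                found["title"] = html.unescape(next(f, "")).strip()
            if DATE_MARKER in line:
                date_match = DATE_RE.search(line)
                if date_match:
                    found["date"] = date_match.group(0)
    return found


def scan_directory(directory):
    """Scan every .txt file in the directory, producing one result per file."""
    results = {}
    for filename in sorted(os.listdir(directory)):
        if filename.endswith(".txt"):
            # Open the listed file itself, joined to the directory, instead of
            # rebuilding a name from a counter that changes inside the line loop.
            results[filename] = scan_file(os.path.join(directory, filename))
    return results
```

Because nothing is incremented or broken out of inside the per-line loop, every line of every .txt file gets checked, and the returned dictionary can then be written to the spreadsheet one row per file.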