How can I search multiple files for 3 separate strings and print them to an Excel file with Python?
I have a script that looks for 4 separate strings before printing them to an Excel file. The first 3 are strings that each sit on a single line, and I search for them with regular expressions; the 4th is a block of code, which I search for with Beautiful Soup 4. I can get the text with Beautiful Soup, but for some reason the first 3 are not found.
import xlwt
from xlwt import Workbook
import os
from bs4 import BeautifulSoup
import re
from os import listdir
fileNumber = 1
cve = ""
titlePrint = ""
titleStrip = ""
date = ""
code = ""
col = 0
row = 0
directory = "/Users/Documents/databasescript/web_scrape_db/exploits_test_folder"
for filename in os.listdir(directory):
    if filename.endswith(".txt"):
        with open(str(fileNumber) + ".txt") as f:
            for line in f:
                #CVE
                if re.search(r'https://nvd.nist.gov/vuln/detail/', line):
                    cve = line[118:131]
                    print "found 'https://nvd.nist.gov/vuln/detail/'"
                #Title
                if '<h1 class="card-title text-secondary text-center"' in line:
                    titlePrint = f.next().translate(None, '\'').strip()
                    print "found title"
                #Date
                if '<meta property="article:published_time"' in line:
                    date = line[53:63]
                    print "found date"
                if fileNumber == 6:
                    break
                #Source Code
                soup = BeautifulSoup(open("/Users/Documents/databasescript/web_scrape_db/exploits_test_folder/"+(str(fileNumber))+".txt"), "html.parser")
                #increment file number
                fileNumber+=1
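Your script may be hitting the fileNumber == 6 condition and breaking out of the loop before it reaches the 3 strings you're looking for. As written, it only searches the first 6 lines of the first .txt file, and then a single line of each remaining .txt file. Perhaps you meant to move fileNumber += 1 back into the if filename.endswith(".txt"): block?

Consider simplifying this. Is this a question about writing to xlsx or about finding the strings?
@user1558604 Finding the strings, because it can't find them. I can write to the Excel file, but it can't find the first 3 strings: CVE, Title and DATE.
Please edit your question and provide a minimal, reproducible example that doesn't contain any xlsx code.
@user1558604 I've edited it.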
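A minimal sketch of the restructuring the answer suggests, assuming the goal is to scan every line of every .txt file. The per-file bookkeeping (the counter and the stop condition) happens outside the inner line loop, and the file that os.listdir returned is opened with os.path.join; the original instead opens str(fileNumber) + ".txt" relative to the current working directory, not to directory. scan_files and max_files are hypothetical names, and max_files assumes the fileNumber == 6 check was meant to stop after five files. Only the CVE marker is checked here so the loop structure stays visible.

```python
import os

CVE_MARKER = "https://nvd.nist.gov/vuln/detail/"


def scan_files(directory, max_files=None):
    """Return {filename: [lines containing CVE_MARKER]} for .txt files.

    `max_files` is a hypothetical stand-in for the question's
    `if fileNumber == 6: break`, assumed to mean "stop after 5 files".
    """
    results = {}
    file_number = 0
    for filename in sorted(os.listdir(directory)):
        if not filename.endswith(".txt"):
            continue
        if max_files is not None and file_number >= max_files:
            break  # stop before opening another file, never mid-file
        # Open the file listdir actually returned, relative to `directory`.
        path = os.path.join(directory, filename)
        matches = []
        with open(path, encoding="utf-8") as f:
            for line in f:  # every line of the file is examined
                if CVE_MARKER in line:
                    matches.append(line.strip())
        results[filename] = matches
        file_number += 1  # incremented once per file, after the line loop
    return results
```

For example, scan_files("/Users/Documents/databasescript/web_scrape_db/exploits_test_folder", max_files=5) would return the matching lines from the first five .txt files. Note that sorted() orders names as strings, so "10.txt" comes before "2.txt".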
The three strings being searched for (CVE, title, date):
https://nvd.nist.gov/vuln/detail/
<h1 class="card-title text-secondary text-center"
<meta property="article:published_time"
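Even once every line is scanned, the fixed slices line[118:131] and line[53:63] only work if each value starts at exactly that character position. A sketch that captures the values with regular expressions keyed on the three markers above instead, assuming the CVE ID directly follows the NVD URL, the date is the YYYY-MM-DD start of the meta tag's content attribute, and (as in the question) the title text is on the line after the <h1> tag. extract_fields is a hypothetical helper; it uses next(it, "") where the Python 2 code used f.next().

```python
import re

# Markers copied from the question; the capture groups are assumptions
# about what follows each marker in the saved pages.
CVE_RE = re.compile(r"https://nvd\.nist\.gov/vuln/detail/(CVE-\d{4}-\d{4,})")
TITLE_MARKER = '<h1 class="card-title text-secondary text-center"'
DATE_RE = re.compile(
    r'<meta property="article:published_time"\s+content="(\d{4}-\d{2}-\d{2})'
)


def extract_fields(lines):
    """Return (cve, title, date) from an iterable of lines (e.g. a file).

    Any field that is not found is returned as an empty string.
    """
    cve = title = date = ""
    it = iter(lines)
    for line in it:
        if not cve:
            m = CVE_RE.search(line)
            if m:
                cve = m.group(1)
        if not title and TITLE_MARKER in line:
            # Assumed layout: title text on the next line. Consuming it here
            # means that line is not checked for the other two markers.
            title = next(it, "").replace("'", "").strip()
        if not date:
            m = DATE_RE.search(line)
            if m:
                date = m.group(1)
    return cve, title, date
```

Usage: with open(path, encoding="utf-8") as f: cve, title, date = extract_fields(f). The encoding is an assumption about how the pages were saved.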