Python: How do I write the file ID, sentence number, and sentence (separated by '|') to a CSV file?
I'm trying to read a list of files and extract the file ID and abstract from each one. Each sentence of the abstract should be written to a CSV file, with the file ID, sentence number, and sentence separated by '|'. I was told to use NLTK's tokenizer. I have installed NLTK, but I don't know how to make it work with my code. My Python is 3.2.2. Here is my code:
import re, os, sys
import csv

# Read into the list of files.
topdir = r'E:\Grad\LIS\LIS590 Text mining\Part1\Part1' # Topdir has to be an object rather than a string, which means that there is no parenthesis.
matches = []
for root, dirnames, filenames in os.walk(topdir):
    for filename in filenames:
        if filename.endswith(('.txt','.pdf')):
            matches.append(os.path.join(root, filename))

# Create lists and fill them with the file IDs and abstracts. Every abstract is a string in the list.
capturedfiles = []
capturedabstracts = []
for filepath in matches[:10]: # Testing with the first 10 files.
    with open (filepath,'rt') as mytext:
        mytext=mytext.read()
        # code to capture files
        matchFile=re.findall(r'File\s+\:\s+(\w\d{7})',mytext)[0]
        capturedfiles.append(matchFile)
        # code to capture abstracts
        matchAbs=re.findall(r'Abstract\s+\:\s+(\w.+)'+'\n',mytext)[0]
        capturedabstracts.append(matchAbs)
print (capturedabstracts)

with open('Abstract.csv', 'w') as csvfile:
    writer = csv.writer(csvfile)
    for data in capturedabstracts:
        writer.writerow([data])
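I'm a Python beginner and may not understand your comments, so it would be great if you could provide the revised code with comments.
First, split the text into a list of sentences with NLTK's sentence tokenizer, then write each sentence to the CSV with writerow: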
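import csv
import nltk

# Load NLTK's pre-trained English Punkt sentence tokenizer (the 'punkt' data must be downloaded first).
sent_detector = nltk.data.load('tokenizers/punkt/english.pickle')
# `text` is the abstract string to split into sentences.
list_of_sentences = sent_detector.tokenize(text.strip())
with open('Abstract.csv', 'w', newline='') as outfile:
    # DictWriter takes `fieldnames`; QUOTE_NONE needs an escapechar in case a sentence contains '|'.
    writer = csv.DictWriter(outfile, fieldnames=['phrase'], delimiter='|', quotechar=None, quoting=csv.QUOTE_NONE, escapechar='\\')
    for phrase in list_of_sentences:
        phrasedict = {'phrase': phrase}
        writer.writerow(phrasedict)
# No explicit close is needed: the `with` block closes the file.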
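The effect of combining `quoting=csv.QUOTE_NONE` with an `escapechar` can be checked without touching the disk. The following is only a minimal sketch: the helper name `write_numbered` and the sample sentences are made up for illustration, and an in-memory `io.StringIO` buffer stands in for the real output file.

```python
import csv
import io

def write_numbered(sentences, outfile):
    """Write 'number|sentence' rows without quoting (hypothetical helper)."""
    writer = csv.writer(outfile, delimiter='|',
                        quoting=csv.QUOTE_NONE, escapechar='\\')
    # enumerate(..., start=1) gives the 1-based sentence number.
    for number, sentence in enumerate(sentences, start=1):
        writer.writerow([number, sentence])

# Illustrative data only; the second sentence contains the delimiter.
buffer = io.StringIO()
write_numbered(['A plain sentence.', 'One with a | inside.'], buffer)
print(buffer.getvalue())
```

A '|' inside a sentence is written as `\|`, so the row still has exactly two fields when it is read back with the same dialect settings.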
Try using writerow, like this:
with open('Abstract.csv', 'w') as csvfile:
    writer = csv.writer(csvfile)
    for data in capturedabstracts:
        writer.writerow([data])
"...should be written to a CSV file, with the file ID, sentence number, and sentence separated by '|'"?
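For the format quoted in that comment, one possible sketch is shown below. It makes several assumptions: the file IDs and abstracts have already been collected as pairs, the sample records are invented, and `split_sentences` is a naive regex stand-in for a real sentence tokenizer such as NLTK's (it will mis-split abbreviations like "e.g."). Output goes to an in-memory buffer here; in practice you would pass a file opened with `open('Abstract.csv', 'w', newline='')`.

```python
import csv
import io
import re

def split_sentences(abstract):
    """Naive stand-in for a sentence tokenizer: split after '.', '!' or '?'."""
    return [s for s in re.split(r'(?<=[.!?])\s+', abstract.strip()) if s]

def write_rows(records, outfile):
    """Write one 'file_id|sentence_number|sentence' row per sentence."""
    writer = csv.writer(outfile, delimiter='|',
                        quoting=csv.QUOTE_NONE, escapechar='\\')
    for file_id, abstract in records:
        # Sentence numbering restarts at 1 for every file.
        for number, sentence in enumerate(split_sentences(abstract), start=1):
            writer.writerow([file_id, number, sentence])

# Hypothetical (file ID, abstract) pairs for illustration.
records = [
    ('a0000001', 'First sentence. Second sentence.'),
    ('a0000002', 'Only one here.'),
]
buffer = io.StringIO()
write_rows(records, buffer)
print(buffer.getvalue())
```

With the question's variables, `records` could be built as `zip(capturedfiles, capturedabstracts)`, since both lists are filled in the same loop.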