Python: How do I count the words in LibreOffice files with a script?
Tags: python, libreoffice, odf, odfpy

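I'm trying to write a script that takes a folder containing a number of .odt files and counts the words in each one. It has to write the result to a CSV file, together with the date.

I tried doing it with odfpy: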

import odf
import glob
import pandas as pd
import os
from odf.opendocument import load as load_odt

filenames = []
word_counts = []
for f in glob.glob('*.odt'):
    print(f)
    doc = load_odt(f)
    if doc.text.hasChildNodes():
        n = 0
        for e in doc.text.childNodes:
            if ":text:" in e.qname[0]:
                words = [w for w in str(e).split(" ") if len(w) > 0]
                n += len(words)
            else:
                print(e.qname[0])
        filenames.append(f)
        word_counts.append(n)

df = pd.DataFrame({'date':[pd.Timestamp.now() for i in range(len(filenames))], 'filename':filenames, 'word_count':word_counts})
print(df)
csv_filename = 'word_count.csv'


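It sort of works, but some files are missing from the CSV. Any ideas?

This seems to work. It walks doc.body instead of doc.text, and it records every file, even one in which no text nodes are found: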

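import glob
import os

import odf.element  # imported explicitly because odf.element is referenced below
import pandas as pd
from odf.opendocument import load as load_odt

filenames = []
word_counts = []
for f in glob.glob('*.odt'):
    print(f)
    doc = load_odt(f)
    n = 0
    # Walk the top-level children of the document body.
    for e in doc.body.childNodes:
        if type(e) == odf.element.Text or type(e) == odf.element.Element:
            # str(e) yields the text content of the node; split on spaces
            # and ignore the empty strings produced by repeated spaces.
            words = [w for w in str(e).split(" ") if len(w) > 0]
            n += len(words)
        else:
            # Report node types that are skipped.
            print(type(e))

    # Record every file, including files with a count of 0.
    filenames.append(f)
    word_counts.append(n)

df = pd.DataFrame({'date':[pd.Timestamp.now() for i in range(len(filenames))], 'filename':filenames, 'word_count':word_counts})
print(df)
csv_filename = 'word_count.csv'

# Append to the CSV; write the header only when the file does not exist yet.
df.to_csv(csv_filename, index = False, mode='a', header=not os.path.exists(csv_filename))
# Total word count across all files (summing the date column is not meaningful).
print(df['word_count'].sum())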

The count is not exactly the same as the one LibreOffice reports, but it is close enough.
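One plausible contributor to the difference (an assumption, not something verified in the answer) is that split(" ") breaks only on the space character, so words separated by a tab or a line break are counted as a single word. The sketch below compares the two splitting rules on a made-up string; both function names are hypothetical.

```python
def count_space_split(text):
    # Same rule as the answer: split on the space character only.
    return len([w for w in text.split(" ") if len(w) > 0])


def count_whitespace_split(text):
    # str.split() without an argument splits on any run of whitespace
    # (spaces, tabs, newlines) and never yields empty strings.
    return len(text.split())


sample = "one two\nthree\tfour"  # illustrative text, not taken from a real file
print(count_space_split(sample))       # 2
print(count_whitespace_split(sample))  # 4
```

Whether this accounts for the whole gap depends on how LibreOffice itself defines a word, which this sketch does not try to reproduce.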

"The count is not exactly the same": then why not ask DocumentInfo.DocumentStatistic?

How would I do that?
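One possible way to get at that statistic without starting LibreOffice, offered as a sketch rather than as what the commenter had in mind: an .odt file is a ZIP archive, and its meta.xml normally contains a meta:document-statistic element whose meta:word-count attribute holds the count LibreOffice stored when the file was last saved. The helper name stored_word_count is hypothetical, and the code assumes the file was saved by an application that writes this attribute.

```python
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces defined by the OpenDocument format.
NS = {
    "office": "urn:oasis:names:tc:opendocument:xmlns:office:1.0",
    "meta": "urn:oasis:names:tc:opendocument:xmlns:meta:1.0",
}


def stored_word_count(odt_file):
    """Return the word count saved in meta.xml, or None if it is absent.

    Hypothetical helper; odt_file may be a path or a binary file object.
    Raises KeyError if the archive has no meta.xml.
    """
    with zipfile.ZipFile(odt_file) as archive:
        root = ET.fromstring(archive.read("meta.xml"))
    # root is <office:document-meta>; the statistic sits under <office:meta>.
    stat = root.find("office:meta/meta:document-statistic", NS)
    if stat is None:
        return None
    value = stat.get("{%s}word-count" % NS["meta"])
    return int(value) if value is not None else None
```

In the loop from the answer, n = stored_word_count(f) could stand in for the manual counting. The value reflects the document as of its last save, so it may be missing or stale for files produced by other tools.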