Comparing original data with the data imported into Elasticsearch using Python

python csv elasticsearch import

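I am using Python to import data into Elasticsearch. To make sure the transferred data is identical to the original, I need to compare the original data (in *.csv) with the ES index. The original data includes some fields containing free-text data and may include non-ASCII characters. The first check compares the number of rows in the csv with the number of documents in the index. For this I wrote the following function: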

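import csv


def match_csv_index(es_con, path, filename, enc, sep, index=None):
    """Return True if the CSV row count equals the index document count."""
    if index is None:
        index = filename.lower()

    # newline='' lets the csv module handle line breaks inside quoted fields
    with open(path + filename + '.csv', encoding=enc, newline='') as f:
        reader = csv.DictReader(f, delimiter=sep)
        # DictReader already consumes the header row, so no "- 1" is needed
        total_lines = sum(1 for _ in reader)
    print('CSV total lines excluding header', total_lines)

    # cat.count returns text ("epoch timestamp count"); convert the count to int
    index_count = int(es_con.cat.count(index=index).split()[2])
    print('ES total hits', index_count)

    if total_lines == index_count:
        return True
    print('Data mismatch (origin-destination)', total_lines - index_count)
    return False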

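I have the following questions:

Is this the best way to handle large files?
If the number of rows does not match the number of documents, how can I find the differences?
What would you recommend for further checks, such as data ranges and data types?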
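
For the second question, one possible approach is to compare document identifiers rather than counts. The sketch below is only an illustration. It assumes the CSV has a column (hypothetically named `id` here) whose value was used as the document `_id` during import, which the question does not state. `iter_csv_ids`, `iter_es_ids` and `diff_ids` are made-up helper names. `iter_es_ids` relies on `elasticsearch.helpers.scan`, which scrolls through all hits. It is imported lazily so the rest of the sketch runs without the client installed.

```python
import csv


def iter_csv_ids(csv_path, enc, sep, id_field="id"):
    """Stream the id column from the CSV one row at a time.

    id_field is an assumption: the column that was used as the ES _id.
    """
    with open(csv_path, encoding=enc, newline="") as f:
        for row in csv.DictReader(f, delimiter=sep):
            yield row[id_field]


def iter_es_ids(es_con, index):
    """Stream every document _id in the index using the scan helper."""
    # Lazy import: only needed when an Elasticsearch connection is used.
    from elasticsearch.helpers import scan

    for hit in scan(es_con, index=index, query={"query": {"match_all": {}}}):
        yield hit["_id"]


def diff_ids(csv_ids, es_ids):
    """Return (ids only in the CSV, ids only in the index), both sorted."""
    # Compare as strings, since ES document ids are strings.
    csv_set = {str(i) for i in csv_ids}
    es_set = {str(i) for i in es_ids}
    return sorted(csv_set - es_set), sorted(es_set - csv_set)
```

It could be called as `missing, unexpected = diff_ids(iter_csv_ids(csv_path, enc, sep), iter_es_ids(es_con, index))`. Both id sets are held in memory, so this may not suit very large files. Duplicate ids in the CSV collapse into one set entry. That is worth checking separately, because rows that share an id overwrite each other in the index and can explain a count mismatch on their own.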