Comparing Excel data in two separate worksheets using python/xlrd

python excel vba

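I have two lists of lists extracted from two separate Excel workbooks. Each element contains two elements of its own. The lists represent the data found in the first two columns of each workbook. For example: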

search_terms = [['term1',300],['term2',400],['term3',200]...] #words searched on our website with number of hits for each
item_description = [[900001,'a string with term1'],[900002,'a string with term 2'],[900003,'a string with term 1 and 2']...] #item numbers with matching descriptions

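My goal is to compare the strings in search_terms against the strings in item_description and, for each search term, compile a list of the item numbers whose descriptions match. Then I'd like to pick the top 250 terms, along with their matching item numbers, based on the number of hits they generated.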
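The "top 250 by hits" step can be sketched on its own, before any matching is done. This is a minimal illustration, assuming the `[term, hits]` shape shown in the question; `TOP_N` and `top_terms` are hypothetical names introduced here.

```python
# Sample data in the same [term, hits] shape as search_terms in the question.
search_terms = [['term1', 300], ['term2', 400], ['term3', 200]]

TOP_N = 250  # hypothetical constant for the cutoff mentioned in the question

# Sort by hit count, highest first, then keep at most TOP_N entries.
top_terms = sorted(search_terms, key=lambda pair: pair[1], reverse=True)[:TOP_N]

print(top_terms)
```

Slicing with `[:TOP_N]` is safe even when there are fewer than 250 terms, so no length check is needed.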

I generated the two lists with xlrd, and I figure I should convert them to tuples and then build a list that looks something like this:

results = [['term1',300,900001,900003],['term2',400,900002,900003],['term3',200]] #search term, number of hits, and matching item numbers based on item description

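Then I'll use xlwt to write the item numbers into the columns next to the matching term/hits in the parent Excel file, for display/presentation purposes.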

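I'm a novice when it comes to python, xlrd, and programming in general. I appreciate any input and guidance, as well as some patience with the naivety of my approach.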

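You're on the right track, but I think what you want here is a dictionary with the term as the key and a list of values as the value. It would end up looking like this: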

{
    'term1': [300, 900001,900003],
    'term2': [400,900002,900003],
    'term3': [200]  # are there numbers missing from this one?
}

Here's the code for that:

import re
from collections import defaultdict

# words searched on our website, with the number of hits for each
search_terms = [['term1', 300], ['term2', 400], ['term3', 200]]
# item numbers with matching descriptions
item_description = [[900001, 'a string with term1'], [900002, 'a string with term2'], [900003, 'a string with term1 and term2']]

d = defaultdict(list)

for item in search_terms:
    d[item[0]].append(item[1])  # the hit count goes first in the list
    rgx = re.compile(item[0])  # the search term is used as a regular expression
    for info in item_description:
        matches = rgx.findall(info[1])
        if matches:
            d[item[0]].append(info[0])  # record the matching item number
        print(matches)
print(d)

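defaultdict checks whether a key already exists in the dictionary and, if it doesn't, adds it with an empty list as its value. From there, iterate over the dictionary and put each key in the first column, then iterate over its list and put each value in its own column. If this doesn't fit your data structure, let me know and I can try to adjust it.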
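The "key in the first column, values in the following columns" layout described above can be sketched without touching a workbook. This is only an illustration: `to_rows`, `rows`, and `cells` are hypothetical names, and the final write step assumes xlwt's `sheet.write(row, col, value)` call, which is left as a comment so the snippet runs without xlwt installed.

```python
from collections import defaultdict

# The dictionary in the shape produced by the answer's code.
d = defaultdict(list)
d['term1'].extend([300, 900001, 900003])
d['term2'].extend([400, 900002, 900003])
d['term3'].append(200)

def to_rows(term_dict):
    """Return one row per term: [term, hits, item numbers...], highest hits first."""
    ordered = sorted(term_dict.items(), key=lambda kv: kv[1][0], reverse=True)
    return [[term] + values for term, values in ordered]

rows = to_rows(d)

# Flatten to (row index, column index, value) triples. With xlwt, each triple
# would correspond to one sheet.write(r, c, value) call (assumed usage).
cells = [(r, c, value) for r, row in enumerate(rows) for c, value in enumerate(row)]

print(rows)
```

Sorting by the first list element works here because the hit count is always appended before any item numbers.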

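I didn't put any 900000-series numbers in results[2] because the keyword 'term3' in this example doesn't appear in any item description. This looks like the approach I'm looking for. I'm going to noodle around with it! Thanks for your help so far!

Will this search for partial matches? For example, the term is 'apple', the description is 'a product containing apples', and the item number is 900001. Will the item be correctly identified even though it's a partial-string match for apple/apples inside a sentence, rather than simply 'apple'? Thanks for your help so far! Edit: OK, never mind, I just looked up the regex operation you're using. This is a great piece of code for what I want to do! Thank you very much.

Yes! findall(pattern, string) finds all matching substrings and puts them in a list. If you only want to match at the start or end of the string, you can use the ^ (start) and $ (end) anchors; for word boundaries, use \b.

Quick question. If a search term is multiple words, how can we modify this code to match item numbers? For example, 'apple skin' is the searched term and we want either apple or skin to match the description. Also, does the case of the strings matter? If the searched word is apple but the description has Apple, will it still match?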
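The last two questions (multi-word terms and letter case) are left open in the thread. One possible approach, offered as a sketch and not as the answerer's method: split each term on whitespace, escape each word, join the words with `|` so that any one of them matches, and compile with `re.IGNORECASE`. The helper name `build_pattern` and the sample data are made up for illustration.

```python
import re
from collections import defaultdict

# Hypothetical sample data in the same shapes as the question's lists.
search_terms = [['apple skin', 300], ['pear', 100]]
item_description = [
    [900001, 'A product containing Apples'],
    [900002, 'Skin care lotion'],
    [900003, 'Canned peaches'],
]

def build_pattern(term):
    """Compile a case-insensitive pattern that matches any single word of the term."""
    words = [re.escape(word) for word in term.split()]
    return re.compile('|'.join(words), re.IGNORECASE)

d = defaultdict(list)
for term, hits in search_terms:
    d[term].append(hits)  # hit count first, as in the original code
    rgx = build_pattern(term)
    for number, description in item_description:
        if rgx.search(description):  # substring match anywhere in the description
            d[term].append(number)

print(dict(d))
```

This still matches substrings, so 'apple' matches 'Apples'. Wrapping each escaped word in `\b...\b` would restrict matching to whole words, at the cost of no longer matching plurals.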