Python entity matching package deepmatcher throws an error


I am getting the following error:

ValueError                                Traceback (most recent call last)
<ipython-input-6-2d323ffe212f> in <module>()
----> 1 train, validation, test = dm.data.process(path='/content/', train='train.csv', validation='validation.csv', test='test.csv')

1 frames
/usr/local/lib/python3.7/dist-packages/deepmatcher/data/process.py in _check_header(header, id_attr, left_prefix, right_prefix, label_attr, ignore_columns)
     32         if attr not in (id_attr, label_attr) and attr not in ignore_columns:
     33             if not attr.startswith(left_prefix) and not attr.startswith(right_prefix):
---> 34                 raise ValueError('Attribute ' + attr + ' is not a left or a right table '
     35                                  'column, not a label or id and is not ignored. Not sure '
     36                                  'what it is...')

ValueError: Attribute ltable_id is not a left or a right table column, not a label or id and is not ignored. Not sure what it is...


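I am using this dataset for this exercise, because earlier tests with my own dataset gave the same error.

Code:

import deepmatcher as dm
train, validation, test = dm.data.process(path='/content/', train='train.csv', validation='validation.csv', test='test.csv')

That's all. I am following the repo github.com/anhaidgroup/deepmatcher

I am looking for a better understanding and a possible fix. Thanks in advance.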

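I didn't test it, but the error message suggests that it needs specially named columns to work.

A first look at the repo page seems to confirm this.

There is an example table whose column names start with Left… and Right…

There is also a link where you can see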

Step 1. Process labeled data

"Left" attributes (required): ...
   These column names are expected to be prefixed with "left_" by default.

"Right" attributes (required): ...
   These column names are expected to be prefixed with "right_" by default.

"Left" attributes (required): ...
   This can be customized by setting the left_prefix parameter (e.g., use "ltable_" as the prefix).

"Right" attributes (required): ...
   This can be customized by setting the right_prefix parameter (e.g., use "rtable_" as the prefix).

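This shows that the columns need the prefixes left_ and right_, but your data has the columns ltable_id and rtable_id. So you would have to rename the columns after loading the data and before using it with deepmatcher.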
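The renaming idea can be sketched with the standard library alone. This is only an illustration: the helper names `rename_column` and `rename_header`, the prefix mapping, and the sample columns other than ltable_id and rtable_id are assumptions, not part of deepmatcher.

```python
import csv
import io

# Hypothetical mapping from the dataset's prefixes to the documented defaults.
PREFIX_MAP = {"ltable_": "left_", "rtable_": "right_"}

def rename_column(name, prefix_map=PREFIX_MAP):
    """Swap a known prefix on one column name; other names pass through."""
    for old, new in prefix_map.items():
        if name.startswith(old):
            return new + name[len(old):]
    return name

def rename_header(csv_text, prefix_map=PREFIX_MAP):
    """Rewrite only the header row of a CSV string, leaving the data rows alone."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    rows[0] = [rename_column(col, prefix_map) for col in rows[0]]
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()

# Made-up sample data, for illustration only.
sample = "id,label,ltable_id,rtable_id,ltable_name,rtable_name\n0,1,10,20,Acme,ACME Inc\n"
renamed = rename_header(sample)
print(renamed.splitlines()[0])  # id,label,left_id,right_id,left_name,right_name
```

The same function would have to be applied to the train, validation and test files, so that all three end up with identical headers.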

Edit:

The documentation linked from the repo (the "Step 1. Process labeled data" section quoted above) also shows example code

dm.data.process(... left_prefix='left_', right_prefix='right_', ...)

which means you can do

dm.data.process(... left_prefix='ltable_', right_prefix='rtable_', ...)
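To see why changing the prefixes helps, the check visible in the traceback (lines 32 to 34 of process.py) can be re-created as a small standalone function. This is a sketch, not deepmatcher's actual code: the function name `unrecognized_columns` is hypothetical, the defaults for id_attr and label_attr are assumed, and every column in the example header except ltable_id and rtable_id is made up.

```python
def unrecognized_columns(header, id_attr="id", label_attr="label",
                         left_prefix="left_", right_prefix="right_",
                         ignore_columns=()):
    """Return the column names that the check in the traceback would reject."""
    rejected = []
    for attr in header:
        # id, label and explicitly ignored columns are always accepted.
        if attr in (id_attr, label_attr) or attr in ignore_columns:
            continue
        # Everything else must carry the left or the right prefix.
        if not attr.startswith(left_prefix) and not attr.startswith(right_prefix):
            rejected.append(attr)
    return rejected

# Hypothetical header modelled on the columns named in the error message.
header = ["id", "label", "ltable_id", "rtable_id", "ltable_name", "rtable_name"]

with_defaults = unrecognized_columns(header)
with_custom = unrecognized_columns(header, left_prefix="ltable_", right_prefix="rtable_")

print(with_defaults)  # ['ltable_id', 'rtable_id', 'ltable_name', 'rtable_name']
print(with_custom)    # []
```

With the assumed defaults the first rejected column is ltable_id, which matches the column named in the ValueError.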

Edit:

I tested it with company_exp_data.zip, and this resolves the problem:

import deepmatcher as dm 

train, validation, test = dm.data.process(
    path='/content/',
    #path='exp_data', 
    train='train.csv', 
    validation='valid.csv', 
    test='test.csv',
    
    left_prefix='ltable_',
    right_prefix='rtable_',
)

But then it raises another problem:

RuntimeError: Google drive link https://drive.google.com/uc?export=download&id=1Vih8gAmgBnuYDxfblbT94P6WjB7s1ZSh is currently unavailable, because the quota was exceeded.

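It tries to read some data from Google Drive, but the quota was exceeded.

Perhaps the data needs to be downloaded manually and the source code changed to load it from the local machine. But that is a new problem. Alternatively, this problem should be reported to the author of the module, who should host the data on another server and change the source code.

Summary: your problem is that you didn't read the documentation.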

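You could show minimal working code, so we can see what you did and which modules you imported, and so we can test the code on our computers.

import deepmatcher as dm train, validation, test = dm.data.process(path='/content/', train='train.csv', validation='validation.csv', test='test.csv') That's all. I am following this repo ------

Put it in the question. It will be more readable and more people will see it, so more people may be able to help you.

I don't know this module, but based on the error message the whole problem may be the data in the csv. It may need some preprocessing before it can be used with deepmatcher. It seems to have a column ltable_id, but it may need a column named id, or that column may have to be the first or the last one (label). Or it may need only two columns, left and right.

Yes, I see it now, I didn't read the documentation completely. I was stuck in the Getting Started section. I had tried left_ and right_ with my own dataset, and when that didn't work I was quite lost, but your explanation makes sense. Thank you very much.

Additional info (may or may not be useful): as I mentioned, I started with my own dataset, which is where I first saw the error, and I eventually solved it. I split the data into train and test and then created the csv files. When creating the files I did not set index to False, which created a blank header, which in turn triggered the error in process.py. Now the problem is solved, and I am still using left_. Everything is fine. Thanks again!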
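The blank header described in the last comment can be illustrated as follows. pandas writes an unnamed index as a first column with an empty header cell unless index=False is passed to to_csv, so the simplest fix is at export time. When the files already exist, a cleanup along these lines may help; the helper name `drop_blank_header_columns` and the sample content are assumptions, and the sketch expects every row to have as many cells as the header.

```python
import csv
import io

def drop_blank_header_columns(csv_text):
    """Remove every column whose header cell is empty or whitespace only."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    keep = [i for i, name in enumerate(rows[0]) if name.strip()]
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in rows:
        writer.writerow([row[i] for i in keep])
    return out.getvalue()

# Hypothetical file content: the leading empty header cell is the exported index.
exported = ",id,label,left_name,right_name\n0,0,1,Acme,ACME Inc\n1,1,0,Acme,Apex\n"
cleaned = drop_blank_header_columns(exported)
print(cleaned.splitlines()[0])  # id,label,left_name,right_name
```

An empty column name does not start with either prefix, which is presumably why the header check in process.py rejected the file.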