Deep learning: sequence labeling of sentences rather than tokens

deep-learning nlp pytorch


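I have sentences that belong to a paragraph. Each sentence has a label: [s1, s2, s3, …], [l1, l2, l3, …]

I know that I have to encode each sentence with an encoder and then apply sequence labeling. Could you guide me on how to combine the two?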

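If I understand your question correctly, you are looking for a way to encode sentences as numeric representations.

Suppose you have data like this: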

data = ["Sarah, is that you? Hahahahahaha  Todd give you another black eye??",
        "Well, being slick comes with the job of being a propagandist, Andi...",
        "Sad to lose a young person who was earnestly working for the common good and public safety when so many are in the basement smoking pot and playing computer games."]

labels = [0,1,0]

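Now you need to build a classifier. A classifier has to be trained on numeric data, so the text must first be converted into a numeric representation. Here we use a TF-IDF vectorizer to turn the text into a feature matrix, and then apply a classification algorithm to it.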

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.pipeline import Pipeline

vectorizerPipe = Pipeline([
                     ('tfidf', TfidfVectorizer(lowercase=True,stop_words='english')),
                     ('classification', LinearSVC(penalty='l2',loss='hinge'))])

trained_model = vectorizerPipe.fit(data,labels)

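The code above builds a pipeline: the first step is feature extraction (converting the text into numeric form), and the second step applies the classification algorithm to those features. Both steps have many parameters that you can experiment with.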
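As a rough illustration of the two steps and their parameters, the sketch below sets one parameter on each step and then inspects the output of the feature-extraction step on its own. The shortened sentences and the `ngram_range` and `C` values are arbitrary assumptions chosen for the example, not tuned recommendations.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Shortened stand-ins for the example sentences (assumed for illustration).
data = [
    "Sarah, is that you? Todd give you another black eye??",
    "Well, being slick comes with the job of being a propagandist, Andi...",
    "Sad to lose a young person who was earnestly working for the common good.",
]
labels = [0, 1, 0]

pipe = Pipeline([
    ("tfidf", TfidfVectorizer(lowercase=True, stop_words="english")),
    ("classification", LinearSVC(penalty="l2", loss="hinge")),
])

# Step parameters are addressed as <step name>__<parameter name>.
# The values below are arbitrary, picked only to show the mechanism.
pipe.set_params(tfidf__ngram_range=(1, 2), classification__C=0.5)
pipe.fit(data, labels)

# Step 1 on its own: each sentence becomes one row of a sparse TF-IDF matrix.
features = pipe.named_steps["tfidf"].transform(data)
print(features.shape)  # (number of sentences, vocabulary size)
```

The same `<step>__<parameter>` names can be used as keys in a parameter grid if you later search over several values instead of setting one by hand.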

Finally, we fit the pipeline by calling its .fit method with the data and labels.
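Once fitted, the pipeline can label unseen sentences directly, since it applies the same TF-IDF transformation before classifying. A minimal sketch follows; the sentences in `new_sentences` are made up for the example, and with only three training sentences the predictions are not meaningful.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Shortened stand-ins for the example sentences (assumed for illustration).
data = [
    "Sarah, is that you? Todd give you another black eye??",
    "Well, being slick comes with the job of being a propagandist, Andi...",
    "Sad to lose a young person who was earnestly working for the common good.",
]
labels = [0, 1, 0]

vectorizerPipe = Pipeline([
    ("tfidf", TfidfVectorizer(lowercase=True, stop_words="english")),
    ("classification", LinearSVC(penalty="l2", loss="hinge")),
])

# fit() returns the fitted pipeline itself.
trained_model = vectorizerPipe.fit(data, labels)

# Hypothetical unseen sentences; predict() takes raw text because the
# pipeline vectorizes it internally before calling the classifier.
new_sentences = [
    "Sarah, is that really you?",
    "Being slick comes with the job.",
]
predictions = trained_model.predict(new_sentences)
print(predictions)  # one label per input sentence
```

Note that this treats every sentence independently; it does not use the labels of neighbouring sentences in the paragraph.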
