Python: generating the FastText multi-label format
I want to apply FastText to a Stack Overflow tag predictor. I have the tags as a DataFrame column:
df['Tags']
0 [php]
1 [firefox]
2 [r]
3 [c#]
4 [php, api]
...
179994 [php, flash]
179995 [delphi]
179996 [c]
179997 [android]
179998 [java, email]
Name: Tags, Length: 134222, dtype: object
I want to convert each element into a single string of the form __label__xx__label__yy, so I tried:
tags=['__label__'.join(s) for s in df['Tags']]
This results in:
['php', 'firefox', 'r', 'c#', 'php__label__api', 'c#__label__asp.net', '.net__label__javascript', 'sql', '.net', 'algorithm', 'windows-7']
But I want my result to be:
['__label__php', '__label__firefox', '__label__r', '__label__c#', '__label__php__label__api', '__label__c#__label__asp.net', '__label__.net__label__javascript', '__label__sql', '__label__.net', '__label__algorithm', '__label__windows-7']
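str.join only places the separator between elements, so the first tag never gets a prefix. Prepend the prefix once yourself. Try:
tags = ['__label__' + '__label__'.join(s) for s in df['Tags']]
Test:
labels = [['foo'], ['bar', 'baz']]
j = '__label__'
[j + j.join(l) for l in labels]
# out: ['__label__foo', '__label__bar__label__baz']
Also worth a look:
result = df['Tags'].apply(lambda s: j + j.join(s))
but I haven't tested it. (A Series has apply and map, but no applymap; applymap is a DataFrame method.)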
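One more thing that may matter for training: as far as I know, fastText's supervised mode splits each line on whitespace, so a string like __label__php__label__api would be read as one label rather than two. Below is a minimal sketch that builds both the concatenated form asked for in the question and a space-separated form. The list tags_column is made-up sample data standing in for df['Tags'], and the names PREFIX, concatenated and spaced are only illustrative.

```python
# Hypothetical sample data standing in for the question's df['Tags'] column;
# iterating over the real Series yields the same kind of per-row tag lists.
tags_column = [["php"], ["c#"], ["php", "api"]]

PREFIX = "__label__"

# Concatenated form, as requested in the question:
# the prefix is prepended once, then used as the join separator.
concatenated = [PREFIX + PREFIX.join(tags) for tags in tags_column]

# Space-separated form: each tag gets its own prefixed token.
spaced = [" ".join(PREFIX + tag for tag in tags) for tags in tags_column]

print(concatenated)
# ['__label__php', '__label__c#', '__label__php__label__api']
print(spaced)
# ['__label__php', '__label__c#', '__label__php __label__api']
```

If the space-separated form is what you need, the question text would normally follow the labels on the same line, separated by a space.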