NumPy: importing a txt file that contains two mixed columns

I want to import a txt file that looks like this:

0 @switchfoot http://twitpic.com/2y1zl - Awww  that's a bummer.  You shoulda got David Carr of Third Day to do it. ;D
0 is upset that he can't update his Facebook by texting it... and might cry as a result  School today also. Blah!
0 @Kenichan I dived many times for the ball. Managed to save 50%  The rest go out of bounds
4 my whole body feels itchy and like its on fire 
4 @nationwideclass no  it's not behaving at all. i'm mad. why am i here? because I can't see you all over there. 
0 @Kwesidei not the whole crew

The desired result is a numpy.array with two columns, i.e. sentiment = '0' or '4' and tw = 'string'. But it keeps giving me an error. Can anyone help?

Train_tw=np.genfromtxt("classified_tweets0.txt",dtype=(int,str),names=['sentiment','tw'])

The error from this expression is

ValueError: mismatch in size of old and new data-descriptor

If I use dtype=None, I get

ValueError: Some errors were detected !
    Line #2 (got 22 columns instead of 20)
    Line #3 (got 19 columns instead of 20)
    Line #4 (got 11 columns instead of 20)
    Line #5 (got 22 columns instead of 20)
    Line #6 (got 6 columns instead of 20)

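With the default "whitespace" delimiter, genfromtxt splits each line into 20, 22, etc. fields. The spaces inside the text count as delimiters just like the first one does.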
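To make the column counts in the error message concrete, here is a small sketch that splits two of the sample lines on whitespace the way the default delimiter does. The variable names `lines` and `counts` are only for illustration, and plain `str.split()` is used as a stand-in for the loader's own splitting.

```python
# Two lines copied from the sample file in the question.
lines = [
    "0 @switchfoot http://twitpic.com/2y1zl - Awww  that's a bummer.  You shoulda got David Carr of Third Day to do it. ;D",
    "0 @Kwesidei not the whole crew",
]

# Splitting on any run of whitespace turns every word of the tweet
# into its own field, so the field count differs from line to line.
counts = [len(line.split()) for line in lines]
print(counts)  # [20, 6]
```

These counts match "20 columns" for the first line and "got 6 columns" for line #6 in the error above.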

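One option is to edit the file and replace the first space with some unique delimiter. Another option is to use the field-length version of delimiter. After a bit of experimenting, this load looks reasonable (this is Py3, so I use a Unicode string dtype):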

In [32]: np.genfromtxt("stack42754603.txt",dtype='int,U100',delimiter=[2,100],names=['sentiment','tw'])
Out[32]: 
array([ (0, "@switchfoot http://twitpic.com/2y1zl - Awww  that's a bummer.  You shoulda got David Carr of Third D"),
       (0, "is upset that he can't update his Facebook by texting it... and might cry as a result  School today "),
       (0, '@Kenichan I dived many times for the ball. Managed to save 50%  The rest go out of bounds\n'),
       (4, 'my whole body feels itchy and like its on fire\n'),
       (4, "@nationwideclass no  it's not behaving at all. i'm mad. why am i here? because I can't see you all o"),
       (0, '@Kwesidei not the whole crew')], 
      dtype=[('sentiment', '<i4'), ('tw', '<U100')])

What is the error? It can't tell the space after the number apart from the spaces inside the string. Use a unique delimiter, or try the field-length version.

Thanks! But I have already tried that and got the following error. I am using Python 3.

TypeError                                 Traceback (most recent call last)
/usr/local/lib/python3.4/dist-packages/numpy/lib/_iotools.py in easy_dtype(ndtype, names, defaultfmt, **validationargs)
    893     try:
--> 894         ndtype = np.dtype(ndtype)
    895     except TypeError:

TypeError: data type "u100" not understood
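The fixed-width load above cuts tweets off at 100 characters and keeps trailing newlines. As a rough sketch of the other option (treating only the first space as the delimiter) without editing the file, each line can be split once by hand and the pieces packed into a structured array. The helper `parse_line` and the in-memory `sample` list are hypothetical stand-ins for reading the real file line by line.

```python
import numpy as np

# Stand-in for the lines of the text file (assumption: one tweet per line).
sample = [
    "0 @Kwesidei not the whole crew",
    "4 my whole body feels itchy and like its on fire ",
]

def parse_line(line):
    """Split on the first space only: label before it, tweet text after it."""
    label, _, text = line.partition(" ")
    return int(label), text.strip()

records = [parse_line(line) for line in sample if line.strip()]

# Size the string field to the longest tweet so nothing is truncated.
width = max(len(text) for _, text in records)
tweets = np.array(records, dtype=[("sentiment", "i4"), ("tw", "U%d" % width)])

print(tweets["sentiment"])  # [0 4]
print(tweets["tw"][0])      # @Kwesidei not the whole crew
```

With a real file, `sample` would be replaced by iterating over the open file object. The field names follow the `names=['sentiment','tw']` used in the question.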