Numpy: importing a txt file that contains two mixed-type columns


I want to import a txt file that looks like this:

0 @switchfoot http://twitpic.com/2y1zl - Awww  that's a bummer.  You shoulda got David Carr of Third Day to do it. ;D
0 is upset that he can't update his Facebook by texting it... and might cry as a result  School today also. Blah!
0 @Kenichan I dived many times for the ball. Managed to save 50%  The rest go out of bounds
4 my whole body feels itchy and like its on fire 
4 @nationwideclass no  it's not behaving at all. i'm mad. why am i here? because I can't see you all over there. 
0 @Kwesidei not the whole crew 
The desired result is a numpy.array with two columns, namely sentiment='0' or '4' and tw='string'. But it always gives me an error. Can anyone help?

Train_tw=np.genfromtxt("classified_tweets0.txt",dtype=(int,str),names=['sentiment','tw'])

The error raised by this expression is:

ValueError: mismatch in size of old and new data-descriptor
If I use dtype=None, I get:

ValueError: Some errors were detected !
    Line #2 (got 22 columns instead of 20)
    Line #3 (got 19 columns instead of 20)
    Line #4 (got 11 columns instead of 20)
    Line #5 (got 22 columns instead of 20)
    Line #6 (got 6 columns instead of 20)
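With the default whitespace delimiter, genfromtxt splits each line into 20, 22, etc. fields. The spaces inside the text count as delimiters just like the first one.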
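To see where those column counts come from, here is a small sketch that splits a few of the sample lines on whitespace, which is roughly what the default delimiter does. The `lines` list and the `field_counts` name are illustrative only; the strings are copied from the file in the question.

```python
# Sample lines copied from the question's file.
lines = [
    "0 @switchfoot http://twitpic.com/2y1zl - Awww  that's a bummer.  You shoulda got David Carr of Third Day to do it. ;D",
    "0 is upset that he can't update his Facebook by texting it... and might cry as a result  School today also. Blah!",
    "0 @Kwesidei not the whole crew ",
]

# str.split() with no arguments splits on any run of whitespace,
# so every word of the tweet becomes its own field.
field_counts = [len(line.split()) for line in lines]
print(field_counts)  # [20, 22, 6]
```

The counts match the "got 22 columns instead of 20" and "got 6 columns instead of 20" messages: the first line sets the expected width, and every other line has a different number of words.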

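One option is to edit the file and replace the first space with some unique delimiter. Another option is to use the field-width version of delimiter. After a little experimentation, this load looks reasonable (this is Py3, so I'm using the Unicode string dtype):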

In [32]: np.genfromtxt("stack42754603.txt",dtype='int,U100',delimiter=[2,100],names=['sentiment','tw'])
Out[32]: 
array([ (0, "@switchfoot http://twitpic.com/2y1zl - Awww  that's a bummer.  You shoulda got David Carr of Third D"),
       (0, "is upset that he can't update his Facebook by texting it... and might cry as a result  School today "),
       (0, '@Kenichan I dived many times for the ball. Managed to save 50%  The rest go out of bounds\n'),
       (4, 'my whole body feels itchy and like its on fire\n'),
       (4, "@nationwideclass no  it's not behaving at all. i'm mad. why am i here? because I can't see you all o"),
       (0, '@Kwesidei not the whole crew')], 
      dtype=[('sentiment', '<i4'), ('tw', '<U100')])

What is the error? It can't distinguish the space after the number from the spaces in the string. Use a unique delimiter, or try the field-width version.

Thanks! But I have tried that and got the following error. I'm using Python 3.

TypeError                                 Traceback (most recent call last)
/usr/local/lib/python3.4/dist-packages/numpy/lib/_iotools.py in easy_dtype(ndtype, names, defaultfmt, **validationargs)
    893     try:
--> 894         ndtype = np.dtype(ndtype)
    895     except TypeError:

TypeError: data type "u100" not understood
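The first option from the answer (treating only the first space as the separator) can also be done in plain Python without editing the file. The following is a minimal sketch rather than the answerer's method: `load_tweets` is a hypothetical helper, and the in-memory `sample` stands in for the real file. Unlike the fixed-width load, which cuts each tweet at 100 characters, it sizes the string field to the longest line.

```python
import io

import numpy as np


def load_tweets(fileobj):
    """Hypothetical helper: split each line on its first space only."""
    records = []
    for line in fileobj:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # partition() splits at the first space and leaves the rest intact.
        label, _, text = line.partition(" ")
        records.append((int(label), text))
    # Size the Unicode field to the longest tweet so nothing is truncated.
    width = max((len(text) for _, text in records), default=1)
    dtype = [("sentiment", "i4"), ("tw", "U%d" % max(width, 1))]
    return np.array(records, dtype=dtype)


# Two lines from the question, used here in place of the real file.
sample = io.StringIO(
    "0 @Kwesidei not the whole crew \n"
    "4 my whole body feels itchy and like its on fire \n"
)
tweets = load_tweets(sample)
print(tweets["sentiment"])
print(tweets["tw"])
```

For the real data, the same function could be called on an open file, for example `with open("classified_tweets0.txt") as f: tweets = load_tweets(f)`. Writing the dtype as an explicit list of `(name, format)` pairs also avoids relying on the comma-separated dtype string.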