Separating columns in Python (python, csv)

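I am writing my bachelor's thesis and am analyzing my data with Python. Unfortunately, I am no programming expert, and I don't know anyone who works with Python.

I have some code that separates the columns in a CSV file by commas. I want the code to separate the columns by | instead.

I tried replacing the comma in line 58 with a |, but surprisingly that did not work. Since I am a complete beginner at programming, googling did not get me anywhere. Any help would be greatly appreciated.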

from sklearn.feature_extraction.text import CountVectorizer
from sklearn import linear_model
import csv
import cPickle
from sklearn.metrics import accuracy_score

def main():
    train_file = "train.csv"
    test_file  = "test.csv"
    # Read documents
    train_docs, Y = read_docs(train_file)

    # Define which features to extract (character bigrams in this case)
    extract = CountVectorizer(lowercase=False, ngram_range=(2,2), 
                              analyzer="char")

    extract.fit(train_docs) # create vocabulary from training data

    # Extract features from train data
    X = extract.transform(train_docs)

    # Initialize model
    model = linear_model.LogisticRegression()

    # Train model
    model.fit(X, Y)

    # Write model to file so it can be reused
    cPickle.dump((extract,model),open("model.pickle","w")) 

    # Print coefficients to see which features are important
    for i,f in enumerate(extract.get_feature_names()):
        print f, model.coef_[0][i]

    # Testing
    # Read test data
    test_docs, Y_test = read_docs(test_file)

    # Extract features from test data
    X_test = extract.transform(test_docs)

    # Apply model to test data
    Y_predict = model.predict(X_test)

    # Evaluation
    print accuracy_score(Y_test, Y_predict)

def read_docs(filename):
    '''
    Return X,Y where X is the list of documents and Y the list of their
    labels.
    '''
    X = []
    Y = []
    with open(filename) as f:
        r = csv.reader(f)
        for row in r:
            text,label = row
            X.append(text)
            Y.append(int(label))
    return X,Y


main()

So far, I have done this:

    csv.register_dialect('pipes', delimiter='|')

    with open(filename) as f:
        r = csv.reader(f, dialect='pipes')
        for row in r:
            text,label = row
            X.append(text)
            Y.append(int(label))
    return X,Y

But now I keep getting this error:

Traceback (most recent call last):
  File "D:/python/logreggwen.py", line 67, in <module>
    main()
  File "D:/python/logreggwen.py", line 11, in main
    train_docs, Y = read_docs(train_file)
  File "D:/python/logreggwen.py", line 61, in read_docs
    text,label = row
ValueError: need more than 1 value to unpack

You need to tell the CSV reader which delimiter your data file uses:

csv.reader(f, delimiter='|')

But really, you should read the documentation for the csv module.
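To see what the delimiter argument changes, here is a minimal sketch. It is written for Python 3 (the code in the question is Python 2), and the sample data and the helper name read_rows are made up for illustration; they are not from the question. It reads the same text once with a comma delimiter and once with a pipe delimiter. A row that comes back with a single field is the kind of row that makes text,label = row fail.

```python
import csv
import io

# Hypothetical sample standing in for train.csv: one "text|label" pair per line.
SAMPLE = u"hello world|1\ngoodbye, world|0\n"


def read_rows(handle, delimiter="|"):
    """Return every row of a delimited file as a list of strings."""
    return list(csv.reader(handle, delimiter=delimiter))


# Comma delimiter: the first line contains no comma, so it stays in one
# piece, and the second line is split at the comma inside the text.
comma_rows = read_rows(io.StringIO(SAMPLE), delimiter=",")

# Pipe delimiter: every line splits into exactly two fields.
pipe_rows = read_rows(io.StringIO(SAMPLE), delimiter="|")

print(comma_rows[0])
print(pipe_rows[0])
```

Passing delimiter='|' directly and registering a dialect with csv.register_dialect are two ways of giving the reader the same setting, so either should work as long as the file really uses | between the columns.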

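Thanks! I figured that out, but now I am getting some errors that I can't place.

Debug your code: use print repr(row) instead of text,label = row. That will give you your next clue.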
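The debugging advice above can be taken one step further. The following is only a sketch for Python 3, and the function name read_docs_checked and the sample data are assumptions made for illustration. Instead of unpacking every row blindly, it prints the repr of any row that does not have exactly two fields and keeps going, so all the problem rows show up in one run.

```python
import csv
import io


def read_docs_checked(handle, delimiter="|"):
    """Collect texts and labels; report rows that do not have two fields."""
    X, Y, bad = [], [], []
    reader = csv.reader(handle, delimiter=delimiter)
    for row_no, row in enumerate(reader, start=1):
        if len(row) != 2:
            # %r prints the raw list, which makes the unexpected shape visible.
            print("row %d: unexpected content %r" % (row_no, row))
            bad.append((row_no, row))
            continue
        text, label = row
        X.append(text)
        Y.append(int(label))
    return X, Y, bad


# Hypothetical data: the second line has no | and would break the unpacking.
sample = io.StringIO(u"good text|1\nno delimiter here\nbad text|0\n")
X, Y, bad = read_docs_checked(sample)
```

Empty lines are reported too, because csv.reader returns an empty list for them. Whether such rows should be skipped or fixed in the data file depends on the data.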