Column separation in Python

I am writing my bachelor's thesis and am analyzing my data with Python. Unfortunately, I am not a programming expert and I don't know anyone who uses Python.

I have some code that splits the columns of a CSV file on commas. I want the code to split the columns on | instead.

I tried replacing the comma on line 58 with a |, but, surprise surprise, that didn't work. Since I am a complete beginner at programming, googling didn't make much sense to me. Any help would be greatly appreciated.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn import linear_model
import csv
import cPickle
from sklearn.metrics import accuracy_score

def main():
    train_file = "train.csv"
    test_file = "test.csv"

    # Read documents
    train_docs, Y = read_docs(train_file)

    # Define which features to extract (character bigrams in this case)
    extract = CountVectorizer(lowercase=False, ngram_range=(2,2),
                              analyzer="char")
    extract.fit(train_docs) # create vocabulary from training data

    # Extract features from train data
    X = extract.transform(train_docs)

    # Initialize model
    model = linear_model.LogisticRegression()

    # Train model
    model.fit(X, Y)

    # Write model to file so it can be reused
    cPickle.dump((extract,model),open("model.pickle","w"))

    # Print coefficients to see which features are important
    for i,f in enumerate(extract.get_feature_names()):
        print f, model.coef_[0][i]

    # Testing
    # Read test data
    test_docs, Y_test = read_docs(test_file)

    # Extract features from test data
    X_test = extract.transform(test_docs)

    # Apply model to test data
    Y_predict = model.predict(X_test)

    # Evaluation
    print accuracy_score(Y_test, Y_predict)

def read_docs(filename):
    '''
    Return X,Y where X is the list of documents and Y the list of their
    labels.
    '''
    X = []
    Y = []
    with open(filename) as f:
        r = csv.reader(f)
        for row in r:
            text,label = row
            X.append(text)
            Y.append(int(label))
    return X,Y

main()

At the moment, I have this:
csv.register_dialect('pipes', delimiter='|')

with open(filename) as f:
    r = csv.reader(f, dialect='pipes')
    for row in r:
        text,label = row
        X.append(text)
        Y.append(int(label))
return X,Y

But now I keep getting this error:
Traceback (most recent call last):
  File "D:/python/logreggwen.py", line 67, in <module>
    main()
  File "D:/python/logreggwen.py", line 11, in main
    train_docs, Y = read_docs(train_file)
  File "D:/python/logreggwen.py", line 61, in read_docs
    text,label = row
ValueError: need more than 1 value to unpack

You need to tell the CSV reader which delimiter your data file uses:

    csv.reader(f, delimiter='|')

But really, you should read the relevant documentation.
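
For illustration, here is a minimal sketch of that fix applied to a function like read_docs. It is written in Python 3 syntax (the question's code is Python 2), and it reads from an in-memory sample rather than train.csv; the sample rows and the fileobj/delimiter parameters are assumptions made for the example, not part of the original script.

```python
import csv
import io

# Hypothetical sample standing in for train.csv: one "text|label" pair per line.
SAMPLE = "hello, world|1\ngoodbye|0\n"

def read_docs(fileobj, delimiter="|"):
    """Return (X, Y): the list of documents and the list of their integer labels."""
    X, Y = [], []
    # Passing delimiter= tells the reader to split on pipes instead of commas.
    for row in csv.reader(fileobj, delimiter=delimiter):
        text, label = row
        X.append(text)
        Y.append(int(label))
    return X, Y

X, Y = read_docs(io.StringIO(SAMPLE))
print(X, Y)
```

Note that the comma inside the first document is kept as ordinary text, because only | is treated as a column separator.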
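
Thanks! I figured that out, but now I'm getting some errors that I can't place.

Debug your code. Use print repr(row) instead of text,label = row. That will give you the next clue.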
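
The debugging hint above could be sketched roughly like this. It uses Python 3 syntax, so the call is print(repr(row)); the sample data and the helper name check_rows are made up for illustration. The idea is that a line which does not contain the expected delimiter comes back as a single-field row, which is the kind of row that makes text, label = row fail with an unpacking error.

```python
import csv
import io

# Hypothetical sample: the second line still uses a comma instead of a pipe.
SAMPLE = "first text|1\nsecond text,0\n"

def check_rows(fileobj, delimiter="|", expected=2):
    """Return (line_number, row) pairs whose field count differs from `expected`."""
    bad = []
    reader = csv.reader(fileobj, delimiter=delimiter)
    for row in reader:
        if len(row) != expected:
            # Show exactly what the reader produced for the offending line.
            print(reader.line_num, repr(row))
            bad.append((reader.line_num, row))
    return bad

bad_rows = check_rows(io.StringIO(SAMPLE))
```

Running a check like this over the real file before unpacking should show which lines are empty, use a different separator, or contain extra | characters.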