How do I start Udacity's machine learning course with Anaconda, Jupyter Notebook, and Python 2.7?

I want to start Udacity's machine learning course. I downloaded ud120-projects-master.zip and extracted it into my Downloads folder. I have already installed Anaconda with Jupyter Notebook (Python 2.7).

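The first mini-project is Naïve Bayes, so I opened a Jupyter notebook and used %load nb_author_id.py to bring the script into an .ipynb notebook. But I think I first have to run startup.py in the tools folder to extract the data.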

So I ran startup.ipynb:

# %load startup.py
print
print "checking for nltk"
try:
    import nltk
except ImportError:
    print "you should install nltk before continuing"

print "checking for numpy"
try:
    import numpy
except ImportError:
    print "you should install numpy before continuing"

print "checking for scipy"
try:
    import scipy
except:
    print "you should install scipy before continuing"

print "checking for sklearn"
try:
    import sklearn
except:
    print "you should install sklearn before continuing"

print
print "downloading the Enron dataset (this may take a while)"
print "to check on progress, you can cd up one level, then execute <ls -lthr>"
print "Enron dataset should be last item on the list, along with its current size"
print "download will complete at about 423 MB"
import urllib
url = "https://www.cs.cmu.edu/~./enron/enron_mail_20150507.tgz"
urllib.urlretrieve(url, filename="../enron_mail_20150507.tgz") 
print "download complete!"


print
print "unzipping Enron dataset (this may take a while)"
import tarfile
import os
os.chdir("..")
tfile = tarfile.open("enron_mail_20150507.tgz", "r:gz")
tfile.extractall(".")

print "you're ready to go!"
Error/warning:

C:\Users\jr31964\AppData\Local\Continuum\Anaconda2\lib\site-packages\sklearn\cross_validation.py:44: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.
  "This module will be removed in 0.20.", DeprecationWarning)




no. of Chris training emails: 7936
no. of Sara training emails: 7884

How do I get started with the Naïve Bayes mini-project, and what prerequisite steps are needed?

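Using Anaconda was the right decision: it resolves a whole series of incompatibilities between Python 2 and Python 3, along with various package dependencies. I got this working the hard way and am converting the code (and its dependencies) to Python 3, because I want an up-to-date environment and up-to-date programming skills when I finish; but that's just me.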
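
As an illustration of what such a conversion can look like, here is a minimal sketch of the dependency checks from startup.py written so that it runs on both Python 2.7 and Python 3. The function name find_missing is hypothetical and is not part of the course code.

```python
import importlib


def find_missing(module_names):
    """Return the names in module_names that cannot be imported."""
    missing = []
    for name in module_names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # The four packages that startup.py checks for.
    for name in find_missing(["nltk", "numpy", "scipy", "sklearn"]):
        print("you should install {} before continuing".format(name))
```

Using print as a function with str.format keeps the same message text while avoiding the Python 2 print statement.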

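You can evidently ignore this deprecation warning: the code still works with sklearn 0.19.0. Anyone who tries to run it with 0.20.0 or later will hit a problem, because the warning says the cross_validation module is removed in that release. If you find the warning annoying (as I do), you can edit tools/email_preprocess.py and change the train_test_split line so that it calls train_test_split from sklearn.model_selection instead of cross_validation.train_test_split (keeping the original line as a comment).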
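
If the same script has to run against both older and newer sklearn releases, one possible approach is to try the new module first and fall back to the old one. The sketch below assumes nothing about email_preprocess.py itself; resolve_train_test_split is a hypothetical helper name, and it returns None when sklearn is not installed at all.

```python
def resolve_train_test_split():
    """Return train_test_split from whichever sklearn module provides it, or None."""
    try:
        # Location since sklearn 0.18.
        from sklearn.model_selection import train_test_split
    except ImportError:
        try:
            # Deprecated location named in the warning above.
            from sklearn.cross_validation import train_test_split
        except ImportError:
            return None
    return train_test_split


if __name__ == "__main__":
    splitter = resolve_train_test_split()
    if splitter is None:
        print("sklearn does not appear to be installed")
    else:
        print("using train_test_split from {}".format(splitter.__module__))
```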

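Also, some installs depend on others. An earlier successful install (such as numpy) can cause the install of another package (such as scipy) to fail, because that package's prerequisite is numpy+mkl. If you have just installed plain numpy, you need to uninstall it and replace it. See https://github.com/scipy/scipy/issues/7221 for more on this.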

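The next problem I hit was that, on my machine, the number of email files in enron_mail_20150507.tgz is so large that the script ran for hours without reaching the completion message: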

print "you're ready to go!"
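It turned out that my IDE (PyCharm) was indexing the files as they were unpacked, which was thrashing the disk. Since indexing these text files is unnecessary, I turned off indexing for the maildir directory. That allowed startup.py to finish.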
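
To tell a slow extraction apart from a stalled one, it can help to print a running count while unpacking. This is a sketch only, not what startup.py does: extract_with_progress and its report_every parameter are hypothetical names, and the member-by-member loop replaces the single extractall call.

```python
import tarfile


def extract_with_progress(archive_path, dest_dir, report_every=10000):
    """Extract a gzip-compressed tar archive entry by entry, printing a running count."""
    count = 0
    with tarfile.open(archive_path, "r:gz") as tfile:
        for member in tfile:
            tfile.extract(member, dest_dir)
            count += 1
            if count % report_every == 0:
                print("extracted {} entries so far".format(count))
    return count
```

For example, extract_with_progress("enron_mail_20150507.tgz", ".") would unpack the archive into the current directory and return the number of entries written.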

The error you got from urllib is due to a change in the package: you need to change the import statement to:

import urllib.request
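…and then change line 34 (the one in the error message above) so that it calls urllib.request.urlretrieve instead of urllib.urlretrieve.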
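
If the script needs to keep working on Python 2.7 as well, a common pattern is to import urlretrieve from whichever location exists. The sketch below is an assumption about how one might structure this, not the course's code; download is a hypothetical wrapper name.

```python
try:
    from urllib.request import urlretrieve  # Python 3 location
except ImportError:
    from urllib import urlretrieve  # Python 2 location


def download(url, filename):
    """Fetch url into filename and return the local path that was written."""
    local_path, _headers = urlretrieve(url, filename=filename)
    return local_path
```

With this in place, the download step in startup.py could be written as download(url, "../enron_mail_20150507.tgz") on either Python version.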

Note that this GitHub link is very useful:

The rest of this answer relates to Windows 10, so Linux users can skip it.

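The next problem I hit was that some package imports failed because the installs were not properly optimized for Windows 10. An invaluable resource for solving this is a set of Windows-optimized .whl (wheel) files, available at: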

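The next problem was that unpacking the .tgz file introduced the LF/CRLF line-ending mismatch between Linux and Windows files, which you may already be familiar with. @monkshow92 on GitHub has a fix for this here: https://github.com/udacity/ud120-projects/issues/46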
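
For readers who have not met the line-ending problem before, the general idea of such a fix is to rewrite the affected file with CRLF replaced by LF. The sketch below shows that idea only; it is not necessarily the fix from the linked issue, crlf_to_lf is a hypothetical name, and which files need converting is not stated here. It is only appropriate for text-like files, since it changes bytes in place.

```python
def crlf_to_lf(path):
    """Rewrite the file at path in place, replacing CRLF line endings with LF.

    Returns the number of carriage-return bytes that were removed.
    """
    with open(path, "rb") as f:
        data = f.read()
    converted = data.replace(b"\r\n", b"\n")
    with open(path, "wb") as f:
        f.write(converted)
    return len(data) - len(converted)
```

Working on a copy of the file first is a sensible precaution, because the original bytes are overwritten.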


Other than that, it was a breeze…

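Since I believe this course is taught in Python 3, I suggest creating a conda environment with Python 3. You can do this even if your base Python install is Python 2. It will save you the time of converting all the course code from Python 3 to Python 2.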

conda create --name UdacityCourseEnvironment python=3.6

# to get into your new environment (mac/linux)
source activate UdacityCourseEnvironment

# to get into your new environment (windows)
activate UdacityCourseEnvironment

# When you need new packages inside your new environment 
conda install nameOfPackage
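
After activating the environment, a quick way to confirm that a notebook or script is really running inside it is to print the interpreter path and version. This is a small sketch; interpreter_summary is a hypothetical helper name.

```python
import sys


def interpreter_summary():
    """Return the running interpreter's path and its (major, minor) version."""
    return {"executable": sys.executable, "version": tuple(sys.version_info[:2])}


if __name__ == "__main__":
    info = interpreter_summary()
    print("{executable} -> Python {version[0]}.{version[1]}".format(**info))
```

If the printed path does not point inside the environment's directory, the notebook kernel is probably still using the base install.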

Source:

I think you are more likely to get useful help from the Udacity support forum. Have you tried there?
# import needed for the bare train_test_split call below
from sklearn.model_selection import train_test_split
#features_train, features_test, labels_train, labels_test = cross_validation.train_test_split(word_data, authors, test_size=0.1, random_state=42)
features_train, features_test, labels_train, labels_test = train_test_split(word_data, authors, test_size=0.1, random_state=42)
import urllib.request
urllib.request.urlretrieve(url, filename="../enron_mail_20150507.tar.gz")