Python: trying to extract text data with the Keras get_file function


I am currently working on a Keras program that tries to generate text data using a CNN. In the code my professor gave me, I use the following function:

path = get_file('input.txt', origin='https://www.dropbox.com/s/2z0zdn54cqu3cqj/input.txt?dl=0')

get_file is imported as follows:

from keras.utils.data_utils import get_file

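The original text corpus we were given works fine. However, whenever I change the origin of the file in the get_file function and rename the file it is saved as, I start getting HTML code. Is there a particular reason for this? For example, I have tried both

https://github.com/nlp-compromise/nlp-corpus/blob/master/poe/man_of_crowd.txt

and

https://raw.githubusercontent.com/nlp-compromise/nlp-corpus/master/poe/man_of_crowd.txt

(the second link is the raw file).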

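As for the first link: even though it looks like it resolves to a text file resource, it is an HTML page on GitHub, which is why you get HTML code when you download from it.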
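To make the difference between the two links concrete, here is a minimal sketch that rewrites a GitHub page URL into its raw counterpart. The helper name github_blob_to_raw is hypothetical and not part of Keras or GitHub tooling, and it assumes the common URL shape owner/repo/blob/branch/path with a branch name that contains no slashes.

```python
from urllib.parse import urlsplit


def github_blob_to_raw(url):
    """Turn a github.com '/blob/' page URL into a raw.githubusercontent.com URL.

    Hypothetical helper for illustration only. URLs that do not match the
    expected pattern are returned unchanged.
    """
    parts = urlsplit(url)
    segments = parts.path.strip("/").split("/")
    # Expected shape: <owner>/<repo>/blob/<branch>/<path...>
    # Assumption: the branch name itself contains no slash.
    if parts.netloc != "github.com" or len(segments) < 5 or segments[2] != "blob":
        return url
    owner, repo, _, branch = segments[:4]
    file_path = "/".join(segments[4:])
    return f"https://raw.githubusercontent.com/{owner}/{repo}/{branch}/{file_path}"


page_url = "https://github.com/nlp-compromise/nlp-corpus/blob/master/poe/man_of_crowd.txt"
raw_url = github_blob_to_raw(page_url)
print(raw_url)
```

For the page URL from the question, this yields the raw link from the question, which is the one to pass as origin.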

As for the second, raw link, it does point to the text file resource. When you download the file with:

>> from keras.utils.data_utils import get_file
>> path = get_file('man_of_crowd.txt', 
                'https://raw.githubusercontent.com/nlp-compromise/nlp-corpus/master/poe/man_of_crowd.txt')

Downloading data from https://raw.githubusercontent.com/nlp-compromise/nlp-corpus/master/poe/man_of_crowd.txt
16384/20391 [=======================>......] - ETA: 0s

the file is indeed downloaded as a text file, at the path:

>> print(path)
/home/<username>/.keras/datasets/man_of_crowd.txt
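If it is unclear whether a download produced the corpus or a web page, a quick check of the first bytes of the file can help. The following is only a heuristic sketch: looks_like_html is a hypothetical helper, and the demonstration writes two small local files rather than performing a real download.

```python
import os
import tempfile


def looks_like_html(path, probe_bytes=512):
    """Heuristic: does the start of the file look like an HTML document?"""
    with open(path, "rb") as f:
        head = f.read(probe_bytes).lstrip().lower()
    return head.startswith(b"<!doctype html") or head.startswith(b"<html")


# Demonstration with two small local files instead of a real download.
with tempfile.TemporaryDirectory() as tmp:
    html_path = os.path.join(tmp, "page.txt")
    text_path = os.path.join(tmp, "corpus.txt")
    with open(html_path, "w", encoding="utf-8") as f:
        f.write("<!DOCTYPE html>\n<html><body>not the corpus</body></html>")
    with open(text_path, "w", encoding="utf-8") as f:
        f.write("Some plain corpus text.")
    html_result = looks_like_html(html_path)
    text_result = looks_like_html(text_path)

print(html_result, text_result)
```

In practice the function would be called on the path returned by get_file. Note that get_file reuses a file that is already in its cache directory under the same name, so a previously saved HTML page may need to be deleted before downloading again.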


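The Keras utility function actually uses a wrapper from the six library to do the download. The code for the get_file method can be found in their GitHub repository.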
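As a rough illustration of the idea, and not the actual Keras implementation, the sketch below downloads a URL into a cache directory with urlretrieve and skips the download when the file already exists. On Python 3, six.moves.urllib.request.urlretrieve is the standard library's urllib.request.urlretrieve, which is used here directly. The name fetch_once and the cache layout are assumptions for this example, and a local file:// URL stands in for a real web address so the code runs offline.

```python
import os
import tempfile
from pathlib import Path
from urllib.request import urlretrieve  # what six.moves resolves to on Python 3


def fetch_once(fname, origin, cache_dir):
    """Download origin to cache_dir/fname unless that file already exists.

    Hypothetical helper; a simplified sketch, not Keras's get_file.
    """
    os.makedirs(cache_dir, exist_ok=True)
    target = os.path.join(cache_dir, fname)
    if not os.path.exists(target):
        urlretrieve(origin, target)
    return target


with tempfile.TemporaryDirectory() as tmp:
    # A local file served through a file:// URL replaces a real web resource.
    source = Path(tmp) / "source.txt"
    source.write_text("plain text corpus", encoding="utf-8")
    origin = source.as_uri()
    cache_dir = os.path.join(tmp, "datasets")

    path = fetch_once("copy.txt", origin, cache_dir)
    content = Path(path).read_text(encoding="utf-8")

    # Changing the source does not affect the cached copy on a second call.
    source.write_text("changed", encoding="utf-8")
    path_again = fetch_once("copy.txt", origin, cache_dir)
    content_again = Path(path_again).read_text(encoding="utf-8")

print(content, "|", content_again)
```

The sketch saves whatever bytes the URL returns, so pointing it at an HTML page stores that page's markup under the chosen file name, which matches the behaviour described in the question.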

In my experience, passing

'xxx.txt', origin='

does not work, whereas simply typing

'xxx.txt', 'originurl'

works better.