Python 3.x: Using textract on PDF files hosted on Google Cloud Storage

python-3.x google-cloud-platform google-cloud-storage

I want to apply textract to a PDF file hosted on Google Cloud Storage.

I have already fetched and parsed a txt file (everything works) using download_as_string():

from google.cloud import storage

storage_client = storage.Client()
bucket = storage_client.get_bucket('bucket-for-pdf')
blob = bucket.get_blob('keywords.txt')
keywords_file = blob.download_as_string().decode('utf8')

How exactly does download_as_string() work? Can I do something similar for a PDF file? Something like this:

storage_client = storage.Client()
bucket = storage_client.get_bucket('bucket-for-pdf')
blob = bucket.get_blob(file_name)
file_name = blob.download_to_file(file_name)
return textract.process(file_name, language='eng',
                        encoding='utf-8').decode('utf-8')

The code above raises an error:

self._stream.write(chunk)
AttributeError: 'str' object has no attribute 'write'

Update: so far, the only workaround I have found is to download the file locally and delete it once processing is done.
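The download-then-delete workaround from the update could be sketched roughly as follows. This is only a sketch under assumptions: the helper names `extract_text_from_blob` and `extract_pdf_text` are made up for illustration, the `.pdf` suffix is there because textract normally picks its parser from the file extension, and the Google and textract imports are deferred so the temp-file helper can be tried without them.

```python
import os
import tempfile


def extract_text_from_blob(blob, extract, suffix=".pdf"):
    """Download `blob` to a temporary file, run `extract(path)` on it,
    and delete the file afterwards, even if extraction fails.

    `blob` only needs a download_to_filename(path) method, which
    google.cloud.storage.Blob provides.
    """
    fd, path = tempfile.mkstemp(suffix=suffix)
    os.close(fd)  # the blob will reopen the path itself
    try:
        blob.download_to_filename(path)
        return extract(path)
    finally:
        os.remove(path)


def extract_pdf_text(bucket_name, blob_name):
    """Hypothetical wrapper: fetch a PDF from a bucket and run textract on it."""
    # Imported here so the helper above works without these packages installed.
    from google.cloud import storage
    import textract

    client = storage.Client()
    blob = client.bucket(bucket_name).blob(blob_name)
    return extract_text_from_blob(
        blob,
        # textract.process returns bytes, hence the decode
        lambda path: textract.process(path, language="eng").decode("utf-8"),
    )
```

With the names from the question, a call would look like `extract_pdf_text('bucket-for-pdf', file_name)`. The `finally` block is what removes the local copy whether or not textract succeeds.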

The download_to_file method takes a file object, not a file name. Try the following:

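file_name = "/tmp/my file"
storage_client = storage.Client()
bucket = storage_client.get_bucket('bucket-for-pdf')
# blob_name is assumed here: the name of the PDF object in the bucket
blob = bucket.get_blob(blob_name)
with open(file_name, "wb") as file_obj:
    blob.download_to_file(file_obj)
return textract.process(file_name, language='eng', encoding='utf-8').decode('utf-8')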
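As for how download_as_string() works: as far as I know, it does roughly what the sketch below does. It hands an in-memory binary buffer to download_to_file and returns the collected bytes, which is why a plain str fails with "no attribute 'write'". The helper name `blob_to_bytes` is made up for illustration. In recent releases of google-cloud-storage, download_as_bytes() is the preferred name for the same operation.

```python
import io


def blob_to_bytes(blob):
    """Collect a blob's content in memory via download_to_file.

    `blob` only needs a download_to_file(file_obj) method that writes
    bytes to the given file object, as google.cloud.storage.Blob does.
    """
    buffer = io.BytesIO()           # any writable binary file object works
    blob.download_to_file(buffer)   # the library calls buffer.write(chunk)
    return buffer.getvalue()
```

This gives you the PDF as bytes, but textract.process expects a path on disk, so for textract you still need to write the data to a file first.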

Let me give it a try.