Facing an issue with wget in Python
I am new to Python. I am facing an issue with `wget` and `urllib.urlretrieve(str(myurl), tail)`. When I run the script, it downloads the files, but the file names end with "?". My full code:
import os
import wget
import urllib
import subprocess

with open('/var/log/na/na.access.log') as infile, open('/tmp/reddy_log.txt', 'w') as outfile:
    results = set()
    for line in infile:
        if ' 200 ' in line:
            tokens = line.split()
            results.add(tokens[6])  # 7th token
    for result in sorted(results):
        print >>outfile, result

with open ('/tmp/reddy_log.txt') as infile:
    results = set()
    for line in infile:
        head, tail = os.path.split(line)
        print tail
        myurl = "http://data.xyz.com" + str(line)
        print myurl
        wget.download(str(myurl))
        # urllib.urlretrieve(str(myurl),tail)
Output:
# python last.py
0011400026_recap.xml
http://data.na.com/feeds/mobile/android/v2.0/video/games/high/0011400026_recap.xml
latest_1.xml
http://data.na.com/feeds/mobile/iphone/article/league/news/latest_1.xml
currenttime.js
Listing the files:
# ls
0011400026_recap.xml? currenttime.js? latest_1.xml? today.xml?
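A possible explanation of the behavior you are seeing is that you do not sanitize your input `line`.

When you iterate over a file object (`for line in infile:`), each string you get is terminated by a newline character (`'\n'`). If you do not remove the newline before using `line`, then the newline is still present in whatever you build from `line`, including the file name.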
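A minimal sketch of the effect, written for Python 3 and using an in-memory `io.StringIO` as a stand-in for the file of paths. The two sample paths are invented for illustration and are not taken from the real log.

```python
import io

# Stand-in for the file of paths; the two sample lines are made up.
infile = io.StringIO("/feeds/a/latest_1.xml\n/feeds/b/currenttime.js\n")

raw_lines = list(infile)  # iterating a file keeps the trailing '\n'
clean_lines = [line.rstrip("\n") for line in raw_lines]

for raw, clean in zip(raw_lines, clean_lines):
    # repr() makes the otherwise invisible newline visible
    print(repr(raw), "->", repr(clean))
```

Printing with `repr()` rather than plain `print` is what exposes the trailing `'\n'`; a plain `print` just looks like an extra blank line.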
As an illustration of this concept, have a look at the transcript of a test that I ran:
08:28 $ cat > a_file
a
b
c
08:29 $ cat > test.py
data = open('a_file')
for line in data:
    new_file = open(line, 'w')
    new_file.close()
08:31 $ ls
a_file  test.py
08:31 $ python test.py
08:31 $ ls
a?  a_file  b?  c?  test.py
08:31 $ ls -b
a\n  a_file  b\n  c\n  test.py
08:31 $
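As you can see, I read lines from a file and create files using `line` as the file name and, guess what, the file names listed by `ls` have a `?` at the end.

But we can do better. As explained in the fine manual page of `ls`, the `-b` option prints C-style escapes for nongraphic characters. As you can see in the output of `ls -b`, the file names are not terminated by a question mark (that is just a placeholder used by default by the `ls` program) but by a newline character.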
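The same check can be done from Python instead of `ls -b`. The sketch below assumes Python 3 and a POSIX file system that allows newlines in file names; it recreates one of the files from the transcript in a temporary directory and prints the directory entries with `repr()`.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as workdir:
    # Create a file whose name ends in a newline, as test.py did,
    # plus a normally named file for comparison.
    open(os.path.join(workdir, "a\n"), "w").close()
    open(os.path.join(workdir, "a_file"), "w").close()
    names = sorted(os.listdir(workdir))

for name in names:
    # repr() shows the newline as an escape, much like `ls -b`
    print(repr(name))
```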
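While I'm at it, I have to say that you should avoid using a temporary file to store the intermediate results of your computation.

A nice feature of Python is the presence of generator expressions. If you want, you can write your code as follows: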
import wget
# you matched on a '200' on the whole line, I assume that what
# you really want is to match a specific column, the 'error_column'
# that I symbolically load from an external resource
from my_constants import error_column, payload_column
# here it is a sequence of generator expressions, each one relying
# on the previous one
# 1. the lines in the file, stripped from the white space
# on the right (the newline is considered white space)
# === not strictly necessary, just convenient because
# === below we want to test for non-empty lines
lines = (line.rstrip() for line in open('whatever.csv'))
# 2. the lines are converted to a list of 'tokens'
all_tokens = (line.split() for line in lines if line)
# 3. for each 'tokens' in the 'all_tokens' generator expression, we
# check for the code '200' and possibly generate a new target
targets = (tokens[payload_column] for tokens in all_tokens if tokens[error_column]=='200')
# eventually, use the 'targets' generator to proceed with the downloads
for target in targets: wget.download(target)
Don't be fooled by the large number of comments; without them my code is just
import wget
from my_constants import error_column, payload_column
lines = (line.rstrip() for line in open('whatever.csv'))
all_tokens = (line.split() for line in lines if line)
targets = (tokens[payload_column] for tokens in all_tokens if tokens[error_column]=='200')
for target in targets: wget.download(target)
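To try the generator pipeline without a log file or a network connection, here is a self-contained Python 3 sketch. The sample log lines and `STATUS_COLUMN` are assumptions made for illustration; `PAYLOAD_COLUMN = 6` and the host name follow the question's code. The download itself is replaced by building the URLs.

```python
# Hypothetical column positions; adjust them to the real log format.
STATUS_COLUMN = 8    # assumed position of the HTTP status code
PAYLOAD_COLUMN = 6   # the question reads the path from tokens[6]
BASE_URL = "http://data.xyz.com"

# Invented sample lines: one success, one 404 whose size happens to be 200,
# and one empty line.
sample_log = [
    '1.2.3.4 - - [01/Jan/2015:00:00:00 +0000] "GET /feeds/latest_1.xml HTTP/1.1" 200 512 "-"\n',
    '1.2.3.4 - - [01/Jan/2015:00:00:01 +0000] "GET /feeds/missing.xml HTTP/1.1" 404 200 "-"\n',
    '\n',
]

lines = (line.rstrip() for line in sample_log)
all_tokens = (line.split() for line in lines if line)
targets = [BASE_URL + tokens[PAYLOAD_COLUMN]
           for tokens in all_tokens
           if tokens[STATUS_COLUMN] == "200"]

# For comparison: matching ' 200 ' anywhere in the line also accepts the 404.
naive_matches = [line for line in sample_log if " 200 " in line]

print(targets)
print(len(naive_matches))
```

Testing one specific column keeps the 404 request out of `targets`, while the whole-line test lets it through because its response size is also `200`.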
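It looks like a newline character, since it prints one per line each time. It is hard to be sure without seeing `line`.

@CoryMadden, what more information should I provide?

`line`, for starters.

myurl = "" + str(line) print myurl # wgproc = subprocess.Popen(['wget', '-r', '--tries=10', str(url), '-o', 'log'], stdout=subprocess.PIPE, stderr=subprocess.STDOUT) # (standardout, junk) = wgproc.communicate() wget.download(str(myurl)) # urllib.urlretrieve(str(myurl), tail)

The code you show cannot produce the output you show. Also, the indentation is wrong, not to mention that you are posting code in comments. Furthermore, there is no need for a temporary file, and matching `200` against the whole line will sooner or later produce a false match. That said, my crystal ball tells me that `myurl = "http://data.xyz.com" + str(line.strip())` is what you really want.

Now wget is working, using `strip()`: myurl = "" + str(line.strip()) print myurl filename = wget.download(myurl) print filename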
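Putting the `strip()` fix together with the file name handling from the question, a minimal Python 3 sketch could look like this. The helper name `url_and_filename` and the two sample paths are made up for illustration; the host name is the one used in the question's code. The download call is left as a comment so the example runs without network access.

```python
import posixpath

BASE_URL = "http://data.xyz.com"  # host taken from the question's code

def url_and_filename(line):
    """Return (url, local file name) for one line of the path list."""
    path = line.strip()  # drop the trailing newline and other whitespace
    return BASE_URL + path, posixpath.basename(path)

# Invented sample lines, each ending in a newline as when read from a file.
pairs = [url_and_filename(line)
         for line in ["/feeds/news/latest_1.xml\n", "/js/currenttime.js\n"]]

for url, filename in pairs:
    print(url, filename)
    # The download is omitted here; wget.download(url) from the question,
    # or urllib.request.urlretrieve(url, filename) in Python 3, would go here.
```

Stripping once, before both the URL and the file name are derived, avoids the trailing newline ending up in either of them.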