Facing an issue with "wget" in Python


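I am new to Python. I am facing an issue with "wget" and with "urllib.urlretrieve(str(myurl), tail)".

When I run the script, it downloads the files, but the file names end with "?".

My complete code: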

import os
import wget
import urllib
import subprocess
with open('/var/log/na/na.access.log') as infile, open('/tmp/reddy_log.txt', 'w') as outfile:
    results = set()
    for line in infile:
        if ' 200 ' in line:
            tokens = line.split()
            results.add(tokens[6]) # 7th token
    for result in sorted(results):
        print >>outfile, result
with open ('/tmp/reddy_log.txt') as infile:
     results = set()
     for line in infile:
     head, tail = os.path.split(line)
                print tail
                myurl = "http://data.xyz.com" + str(line)
                print myurl
                wget.download(str(myurl))
                #  urllib.urlretrieve(str(myurl),tail)

Output:

# python last.py
0011400026_recap.xml

http://data.na.com/feeds/mobile/android/v2.0/video/games/high/0011400026_recap.xml

latest_1.xml

http://data.na.com/feeds/mobile/iphone/article/league/news/latest_1.xml

currenttime.js

Listing the files:

# ls
0011400026_recap.xml?                   currenttime.js?  latest_1.xml?      today.xml?

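A possible explanation for the behavior you are experiencing is that you do not sanitize your input, line.

When you iterate over a file object (for line in infile:), each string you get is terminated by a newline character ('\n'). If you do not remove the newline before using line, then the newline is still present in whatever you produce from line.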
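A minimal sketch of this behavior, assuming an in-memory io.StringIO object as a stand-in for the log file (the sample paths are made up for illustration):

```python
import io

# In-memory stand-in for a text file; the paths are illustrative only.
infile = io.StringIO("/feeds/latest_1.xml\n/feeds/today.xml\n")

# Iterating over a file object keeps the newline on every line.
raw_lines = [line for line in infile]

# Stripping the newline gives the text you actually want to use.
clean_lines = [line.rstrip("\n") for line in raw_lines]

print(repr(raw_lines[0]))    # repr() makes the trailing newline visible
print(repr(clean_lines[0]))
```

Printing with repr() rather than a plain print is what exposes the hidden character, since a plain print just shows an extra blank line.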

As an illustration of this concept, have a look at the transcript of a test I ran:

08:28 $ cat > a_file
a
b
c
08:29 $ cat > test.py
data = open('a_file')
for line in data:
    new_file = open(line, 'w')
    new_file.close()
08:31 $ ls
a_file  test.py
08:31 $ python test.py
08:31 $ ls
a?  a_file  b?  c?  test.py
08:31 $ ls -b
a\n  a_file  b\n  c\n  test.py
08:31 $

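As you can see, I read lines from a file and created some files using line as the file name, and guess what: the file names listed by ls have a ? at the end. But we can do better. As explained in the fine manual page of ls, the -b option prints C-style escapes for nongraphic characters, and, as you can see in the output of ls -b, the file names are not terminated by a question mark (that is just a placeholder used by default by the ls program) but by a newline character.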
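The same check can be sketched from Python itself. This is only an illustration, assuming a POSIX file system (where a newline is a legal character in a file name) and a throwaway temporary directory:

```python
import os
import tempfile

# Create a file whose name ends with a newline inside a throwaway directory.
# Assumes a POSIX file system, where newlines are legal in file names.
with tempfile.TemporaryDirectory() as workdir:
    open(os.path.join(workdir, "a\n"), "w").close()
    names = os.listdir(workdir)

# repr() shows the escape sequence, much like ls -b does.
print([repr(name) for name in names])
```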

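While I am at it, I have to say that you should avoid using a temporary file to store the intermediate results of your computation.

A nice feature of Python is the presence of generator expressions. If you want, you can write your code as follows: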

import wget

# you matched on a '200' on the whole line, I assume that what
# you really want is to match a specific column, the 'error_column'
# that I symbolically load from an external resource
from my_constants import error_column, payload_column

# here it is a sequence of generator expressions, each one relying
# on the previous one

# 1. the lines in the file, stripped from the white space
#    on the right (the newline is considered white space)
#    === not strictly necessary, just convenient because
#    === below we want to test for non-empty lines
lines = (line.rstrip() for line in open('whatever.csv'))

# 2. the lines are converted to a list of 'tokens' 
all_tokens = (line.split() for line in lines if line)

# 3. for each 'tokens' in the 'all_tokens' generator expression, we
#    check for the code '200' and possibly generate a new target
targets = (tokens[payload_column] for tokens in all_tokens if tokens[error_column]=='200')

# eventually, use the 'targets' generator to proceed with the downloads
for target in targets: wget.download(target)

Do not be fooled by the amount of comments. Without them my code is just:

import wget
from my_constants import error_column, payload_column

lines = (line.rstrip() for line in open('whatever.csv'))
all_tokens = (line.split() for line in lines if line)
targets = (tokens[payload_column] for tokens in all_tokens if tokens[error_column]=='200')

for target in targets: wget.download(target)
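For a version that runs without my_constants or a real log file, here is a sketch under stated assumptions: the column positions and the sample log lines below are hypothetical (they follow the common Apache "combined" layout, which matches the "7th token" used in the question), and the download step is left out.

```python
# Hypothetical column positions for an access log in the common
# "combined" format; adjust them to the real log layout.
PAYLOAD_COLUMN = 6   # requested path (the "7th token" in the question)
STATUS_COLUMN = 8    # HTTP status code

# Made-up sample lines standing in for the access log.
sample_log = [
    '1.2.3.4 - - [10/Oct/2014:13:55:36 -0700] "GET /feeds/latest_1.xml HTTP/1.1" 200 2326 "-" "agent"\n',
    '\n',
    # Status 404 with a response size of 200: a whole-line ' 200 ' test
    # would wrongly match this line, a column test does not.
    '1.2.3.4 - - [10/Oct/2014:13:55:40 -0700] "GET /feeds/missing.xml HTTP/1.1" 404 200 "-" "agent"\n',
]

lines = (line.rstrip() for line in sample_log)
all_tokens = (line.split() for line in lines if line)
targets = (tokens[PAYLOAD_COLUMN] for tokens in all_tokens
           if tokens[STATUS_COLUMN] == '200')

# The pipeline is lazy: nothing is processed until it is consumed here.
found = list(targets)
print(found)
```

Testing the status column instead of the whole line is the design choice that avoids the false match on the second request.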

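It looks like a newline character, since each one is printed on its own line. It is hard to be sure without seeing line.

@CoryMadden, what more information should I provide?

line, for starters.

myurl = '' + str(line)
print myurl
# wgproc = subprocess.Popen(['wget', '-r', '--tries=10', str(url), '-o', 'log'], stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
# (standardout, junk) = wgproc.communicate()
wget.download(str(myurl))
# urllib.urlretrieve(str(myurl), tail)

The code you show cannot produce the output you show. Also, the indentation is wrong, not to mention that you are posting code in comments. Besides, the temporary file is not needed, and matching on the whole line will sooner or later lead to false matches. That said, my crystal ball tells me that myurl = "http://data.xyz.com" + str(line.strip()) is what you really want.

Now wget is working, using strip():

myurl = "" + str(line.strip())
print myurl
filename = wget.download(myurl)
print filename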
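The strip() fix from the comments can be sketched as a small helper. The function name url_and_filename is hypothetical, the host is the one written in the question, and the actual download call is left out so the sketch runs without network access:

```python
import os

BASE_URL = "http://data.xyz.com"   # host as written in the question

def url_and_filename(line):
    """Return the download URL and local file name for one log path."""
    path = line.strip()                 # drop the trailing newline first
    head, tail = os.path.split(path)    # tail is the bare file name
    return BASE_URL + path, tail

url, name = url_and_filename("/feeds/mobile/iphone/article/league/news/latest_1.xml\n")
print(url)
print(name)
```

The resulting url can then be passed to wget.download(url), as in the last comment above, and name no longer carries the newline that ls displayed as a question mark.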
