How can I crawl only specific URLs from a CSV file with Python?


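I have a CSV file with many URLs that use different domain extensions (`.com`, `.eu`, `.org`, etc.). However, I only want to crawl the domains with the `.nl` extension, using the line `if '.nl' in row:` in Python 2.7: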

from selenium import webdriver
import csv

fieldnames = ['Website', '@media', 'googleadservices.com/pagead/conversion']

def csv_writerheader(path):
    with open(path, 'w') as csvfile:
        writer = csv.DictWriter(csvfile, fieldnames=fieldnames, lineterminator='\n')
        writer.writeheader()

def csv_writer(dictdata, path):
    with open(path, 'a') as csvfile:
        writer = csv.DictWriter(csvfile, fieldnames=fieldnames, lineterminator='\n')
        writer.writerow(dictdata)

csv_output_file = 'output!.csv'

driver = webdriver.Chrome(executable_path=r'C:\Users\Jacob\PycharmProjects\Testing\chromedriver_win32\chromedriver.exe')    

keywords = ['@media', 'googleadservices.com/pagead/conversion']

csv_writerheader(csv_output_file)

with open('top1m-edited.csv') as example_file:
    example_reader = csv.reader(example_file)
    for row in example_reader:

        # INITIALIZE DICT
        data = {'Website': row}

        if '.nl' in row:  # MAKING THE DOMAIN DISTINCTION HERE
            try:
                driver.get(row[0])
                html = driver.page_source    

                for searchstring in keywords:
                    if searchstring.lower() in html.lower():
                        print (row, searchstring, 'FOUND!')
                        data[searchstring] = 'FOUND!'
                    else:
                        print (row, searchstring, 'not found')
                        data[searchstring] = 'not found'    

                csv_writer(data, csv_output_file)

            except:
                pass
Output:

C:\Python27\python.exe "C:/Users/Jacob/PycharmProjects/Testing/fooling around 2.py"

Process finished with exit code 0
So in this state my script basically does nothing, apart from exporting a CSV file with almost no results.

However, when I leave out the `if '.nl' in row:` line, the script works perfectly.

What do I need to change so that the script only imports/scrapes `.nl` domain URLs?

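In `for row in example_reader:`, `row` is a list. So `'.nl' in row` looks for an item in the list that is exactly `'.nl'`. You have a few options. If the CSV file contains only one column of URLs, you can change: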

if '.nl' in row:
to this:

if '.nl' in row[0]:
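A quick way to see the difference: `in` on a list compares against whole elements, while `in` on a string does a substring search. The sketch below uses a made-up URL. It also shows that a plain substring test accepts any URL that merely contains `.nl` somewhere (for example `www.nlnet.org`), so checking the suffix of the parsed hostname is a stricter alternative. `is_nl_url` is a hypothetical helper, not part of the original answer, and the import fallback is there because the question uses Python 2.7.

```python
try:
    from urllib.parse import urlparse  # Python 3
except ImportError:
    from urlparse import urlparse      # Python 2.7

# What csv.reader yields for a one-column line: a list holding one string
row = ['http://www.example.nl/']

print('.nl' in row)     # False: no list element is exactly '.nl'
print('.nl' in row[0])  # True: substring search inside the URL string

# Substring matching also accepts URLs that only contain '.nl' somewhere
print('.nl' in 'http://www.nlnet.org/')  # True, although this is a .org site


def is_nl_url(url):
    """Hypothetical helper: True only if the URL's hostname ends in .nl."""
    url = url.strip()
    # Assumption: bare domains such as 'example.nl' may appear without a
    # scheme; a leading '//' makes urlparse read them as a hostname.
    if '//' not in url:
        url = '//' + url
    host = urlparse(url).hostname or ''  # urlparse lowercases .hostname
    return host.endswith('.nl')


print(is_nl_url(row[0]))                   # True
print(is_nl_url('http://www.nlnet.org/'))  # False
```

In the question's loop, this would replace `if '.nl' in row:` with `if row and is_nl_url(row[0]):`; the `row and` guard skips blank CSV lines, which `csv.reader` returns as empty lists.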
Edit: also, any assignment that uses `row` needs to be changed to use `row[0]`, for example `data = {'Website': row[0]}`.
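The edit matters because the `csv` module converts non-string values with `str()`, so passing the whole list writes its Python representation into the `Website` column instead of the bare URL. Below is a minimal sketch that reuses the question's `fieldnames` and writes to an in-memory buffer; the buffer, the `write_row` helper and the URL are assumptions made only for this demonstration.

```python
import csv

try:
    from StringIO import StringIO  # Python 2.7: csv writes byte strings
except ImportError:
    from io import StringIO        # Python 3: csv writes text

fieldnames = ['Website', '@media', 'googleadservices.com/pagead/conversion']
row = ['http://www.example.nl/']  # one row as returned by csv.reader


def write_row(website_value):
    """Hypothetical helper: write one DictWriter row to memory, return it."""
    buf = StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames, lineterminator='\n')
    # Missing keys are filled with DictWriter's default restval ('')
    writer.writerow({'Website': website_value})
    return buf.getvalue()


print(write_row(row))     # the list's repr lands in the Website column
print(write_row(row[0]))  # only the URL is written
```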