Python 2.7 pandas: iterating over one column at a time to automate Google searches?


python-2.7 csv pandas automation


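I am trying to automate 100 Google searches (with Python 2.7) over a specific column in a CSV: one search per row string, with each query returning URLs. However, I can't get pandas to feed the row contents to the Google search automation.

*Google search source =

In general, when I use the following code, I can successfully print the URLs for a query: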

from google import search

query = "apples"
for url in search(query, stop=5, pause=2.0):
    print(url)

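However, when I add pandas (to read each "query"), the rows are not read as expected: the query that gets sent is the literal string "data.irow(n)" rather than the row contents, one query at a time.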

from google import search
import pandas as pd
from pandas import DataFrame

query_performed = 0
querying = True
query = 'data.irow(n)'

#read the excel file at column 2 (i.e. "Fruit")
df = pd.read_csv('C:\Users\Desktop\query_results.csv', header=0, sep=',', index_col= 'Fruit')

# need to specify "Column2" and one "data.irow(n)" queried at a time
while querying: 
    if query_performed <= 100:
        print("query") 
        query_performed +=1
    else:
        querying =  False
    print("Asked all 100 query's")


#prints initial urls for each "query" in a google search
for url in search(query, stop=5, pause=2.0):
    print(url)

FYI: my Excel CSV is formatted as follows:

     B
1   **Fruit**
2   apples
3   oranges
4   mangos
5   mangos
6   mangos
...
101 mangos

Thank you very much for any advice on next steps! Thanks in advance.

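Here's what I came up with. As I mentioned in the comments, I couldn't get the stop parameter to work the way I thought it would. Maybe I'm misunderstanding its usage. I'm assuming you only want the first 5 URLs per search.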
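If stop does not cap the results the way you expect, one option is to cap them yourself with itertools.islice, which stops pulling from a generator after a fixed number of items. A minimal sketch; fake_search and first_urls are hypothetical names, and fake_search stands in for search() so the example runs without network access:

```python
from itertools import islice


def fake_search(query):
    """Hypothetical stand-in for google's search(): yields URLs endlessly."""
    n = 0
    while True:
        yield "http://example.com/%s/%d" % (query, n)
        n += 1


def first_urls(query, limit=5, search_func=fake_search):
    """Return at most `limit` URLs for `query`, whatever the generator does."""
    return list(islice(search_func(query), limit))


urls = first_urls("apples")
print(urls)
```

With the real library you would pass a function that wraps search(query, pause=2.0) as search_func.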

Sample df

d = {"B" : ["mangos", "oranges", "apples"]}
df = pd.DataFrame(d)

Then

That gives you the following. The formatting here is a bit rough
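A minimal sketch of this step, assuming the result columns are named C through G and that at most 5 URLs are kept per search. fake_search is a hypothetical offline stand-in for search(), so the URLs it produces are placeholders:

```python
import pandas as pd


def fake_search(query, stop=5, pause=2.0):
    """Hypothetical offline stand-in for google's search()."""
    # Deliberately yields more than `stop` items to mimic the behaviour
    # described above, where stop did not limit the results.
    for n in range(stop + 3):
        yield "http://example.com/%s/%d" % (query, n)


stop = 5  # number of URLs to keep per search term
url_cols = ["C", "D", "E", "F", "G"]  # assumed column names

df = pd.DataFrame({"B": ["mangos", "oranges", "apples"]})

# apply() with a function that returns a Series yields one column per URL.
urls = df["B"].apply(
    lambda fruit: pd.Series(list(fake_search(fruit, stop=stop))[:stop]))
urls.columns = url_cols

# Put the URL columns next to the search terms.
df = pd.concat([df, urls], axis=1)
print(df)
```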

    B   C   D   E   F   G
0    mangos  http://en.wikipedia.org/wiki/Mango  http://en.wikipedia.org/wiki/Mango_(disambigua...   http://en.wikipedia.org/wiki/Mangifera  http://en.wikipedia.org/wiki/Mangifera_indica   http://en.wikipedia.org/wiki/Purple_mangosteen
1    oranges     http://en.wikipedia.org/wiki/Orange_(fruit)     http://en.wikipedia.org/wiki/Bitter_orange  http://en.wikipedia.org/wiki/Valencia_orange    http://en.wikipedia.org/wiki/Rutaceae   http://en.wikipedia.org/wiki/Cherry_Orange
2    apples  https://www.apple.com/  http://desmoines.citysearch.com/review/692986920    http://local.yahoo.com/info-28919583-apple-sto...   http://www.judysbook.com/Apple-Store-BtoB~Cell...   https://tr.foursquare.com/v/apple-store/4b466b...

If you don't want to specify the columns (i.e. "C", "D", ...), you can do the following

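stop = 5  # number of URLs to keep per search
# slice with [:stop] because stop alone may not limit the results
df.join(df["B"].apply(lambda fruit: pd.Series(
    [url for url in search(fruit, stop=stop, pause=2.0)][:stop])))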
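To tie this back to the question, here is a rough sketch that reads the search terms from the Fruit column of a CSV, collects up to 5 URLs per term, and writes everything back out as CSV. The inline CSV text, the url_1 ... url_5 column names, and fake_search are assumptions for illustration. Note that Fruit is read as an ordinary column rather than with index_col='Fruit', so that df["Fruit"] can be iterated:

```python
import io

import pandas as pd


def fake_search(query, stop=5, pause=2.0):
    """Hypothetical offline stand-in for google's search()."""
    for n in range(stop):
        yield "http://example.com/%s/%d" % (query, n)


# Stand-in for the real file; with a file on disk, pass its path instead.
csv_text = u"Fruit\napples\noranges\nmangos\n"
df = pd.read_csv(io.StringIO(csv_text))

stop = 5  # number of URLs to keep per search term

# One row of URLs per search term; each row holds the term's own results.
urls = df["Fruit"].apply(
    lambda fruit: pd.Series(list(fake_search(fruit, stop=stop))[:stop]))
urls.columns = ["url_%d" % (i + 1) for i in range(urls.shape[1])]

result = df.join(urls)

# Write to an in-memory buffer here; use a file path to write to disk.
out = io.StringIO()
result.to_csv(out, index=False)
print(out.getvalue())
```

Each row of the output holds the search term followed by its URLs, which is the layout described in the comments below.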

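What do you want to do with the URLs returned by search()?

Hi @Bob Haffner, I want to export the URLs to the CSV (in columns C, D, E, ...) that I "query" from. So I would have the search term and, in the same row, the corresponding URLs. Maybe there is a more efficient process, but this is my initial attempt. Thank you very much for any suggestions! Thanks

The stop parameter doesn't seem to limit the number of URLs the way I thought it would. Does it work for you?

What is the name of the column that contains the search strings?

Fruit is the name of the column that contains the search strings (located in column B).

Sorry, I'm not following you. Would you mind posting the first few rows of your DataFrame? What happens when you print df.columns?

No problem, I just learned how to import images, so I took screenshots of the target Excel file ("CSV (Excel) header:") and of the command-line error I get when running the updated code. Thanks again, I really appreciate any additional help!

... and added to my updated code. Thanks again, I really appreciate your help in solving this!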
