Web scraping: fetching data for a list stored in Excel


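I have a list in Excel. One code is in column A and another is in column B.

There is a website where I need to enter these two details in two different boxes, which then takes me to another page.

That page contains certain details that I need to scrape into Excel.

Any help with this?

OK. Give this a try: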

from io import StringIO

import pandas as pd
import requests

# Column A holds the IEC code, column B the party name.
df = pd.read_excel('C:/test/data.xlsx')

url = 'http://rla.dgft.gov.in:8100/dgft/IecPrint'
frames = []  # one single-row DataFrame per input row
for row in df.itertuples():
    # row[0] is the index; row[1] and row[2] are columns A and B.
    iec = '%010d' % row[1]  # zero-pad the code to 10 digits
    payload = {'iec': iec, 'name': row[2]}
    response = requests.post(url, params=payload)
    print('IEC: %s\tName: %s' % (iec, row[2]))
    try:
        dfs = pd.read_html(StringIO(response.text))
    except ValueError:
        # read_html raises ValueError when the page contains no tables.
        print('The name given by you does not match the data OR you have entered less than three letters')
        frames.append(pd.DataFrame([[iec, row[2], 'ERROR']],
                                   columns=['IEC', 'Party Name and Address', 'ERROR']))
        continue

    # First table: label/value pairs, reshaped into a single row.
    generalData = dfs[0]
    generalData = generalData.iloc[:, [0, -1]].set_index(generalData.columns[0]).T.reset_index(drop=True)

    # Second table: one director per row, spread into numbered columns.
    directorData = dfs[1]
    directorData = directorData.iloc[:, [-1]].T.reset_index(drop=True)
    directorData.columns = ['director_%02d' % (each + 1) for each in directorData.columns]

    # Third table (optional): one branch per row.
    try:
        branchData = dfs[2]
        branchData = branchData.iloc[:, [-1]].T.reset_index(drop=True)
        branchData.columns = ['branch_%02d' % (each + 1) for each in branchData.columns]
    except IndexError:
        branchData = pd.DataFrame()
        print('No Branch Data.')

    frames.append(pd.concat([generalData, directorData, branchData], axis=1))

# DataFrame.append was removed in pandas 2.0, so combine all rows once at the end.
results = pd.concat(frames, sort=False).reset_index(drop=True) if frames else pd.DataFrame()
results.to_excel('path.new_file.xlsx', index=False)
Output:

print (results.to_string())
          IEC IEC Allotment Date            File Number   File Date                             Party Name and Address      Phone No                     e_mail            Exporter Type IEC Status Date of Establishment BIN (PAN+Extension) PAN ISSUE DATE PAN ISSUED BY  Nature Of Concern                                      Banker Detail                                        director_01                                        director_02                                        director_03                                          branch_01                                          branch_02                                          branch_03                                          branch_04                                          branch_05                                          branch_06                                          branch_07                                          branch_08                                          branch_09
0  0305008111         03.05.2005  04/04/131/51473/AM20/  20.08.2019  NISSAN MOTOR INDIA PVT. LTD. PLOT-1A,SIPCOT IN...  918939917907  shailesh.kumar@rnaipl.com  5 Merchant/Manufacturer  Valid IEC            2005-02-07    AACCN0695D FT001            NaN           NaN  3 Private Limited  STANDARD CHARTERED BANK A/C Type:1 CA A/C No :...  HARDEEP SINGH BRAR GURMEL SINGH BRAR HOUSE NO ...  JEROME YVES MARIE SAIGOT THIERRY SAIGOT A9/2, ...  KOJI KAWAKITA KIHACHI KAWAKITA 3-21-3, NAGATAK...  Branch Code:165TH FLOOR ORCHID BUSINESS PARK,S...  Branch Code:14NRPDC , WAREHOUSE NO.B -2A,PATAU...  Branch Code:12EQUINOX BUSINESS PARK TOWER 3 4T...  Branch Code:8GRAND PALLADIUM,5TH FLR.,B WING,,...  Branch Code:6TVS LOGISTICS SERVICES LTD.SING,C...  Branch Code:2PLOT 1A SIPCOT INDUL PARK,ORAGADA...  Branch Code:5BLDG.NO.3 PART,124A,VALLAM A,SRIP...  Branch Code:15SURVEY NO. 678 679 680 681 682 6...  Branch Code:10INDOSPACE SKCL INDL.PARK,BULD.NO...
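
To make the reshaping step in the answer easier to follow, here is a small sketch that applies the same `iloc` / `set_index` / `.T` pattern to made-up tables. The sample tables and the helper names `key_value_table_to_row` and `numbered_row` are assumptions for illustration only; they are not part of the original answer, and the real site's tables may be laid out differently.

```python
import pandas as pd

# Hypothetical stand-in for a table returned by pd.read_html:
# a label column, a separator column, and a value column.
general_table = pd.DataFrame([
    ["IEC", ":", "0305008111"],
    ["IEC Status", ":", "Valid IEC"],
])

# Hypothetical stand-in for the directors table: serial number, then details.
director_table = pd.DataFrame([
    [1, "DIRECTOR A"],
    [2, "DIRECTOR B"],
])


def key_value_table_to_row(table):
    """Turn a label/value table into a one-row frame whose columns are the labels."""
    pairs = table.iloc[:, [0, -1]]              # keep only first and last columns
    row = pairs.set_index(pairs.columns[0]).T   # labels become column names
    row.columns.name = None
    return row.reset_index(drop=True)


def numbered_row(table, prefix):
    """Spread the last column of a table across columns prefix_01, prefix_02, ..."""
    row = table.iloc[:, [-1]].T.reset_index(drop=True)
    row.columns = ["%s_%02d" % (prefix, i + 1) for i in range(row.shape[1])]
    return row


general = key_value_table_to_row(general_table)
directors = numbered_row(director_table, "director")

# Side-by-side concatenation yields one wide row per company.
combined = pd.concat([general, directors], axis=1)
print(combined.to_string())
```

Because each company ends up as a single row, companies with different numbers of directors or branches can still be stacked afterwards; the missing columns are simply filled with NaN.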

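You have to provide more information. Give us the data, and please describe the output you want.

Here is the data: IEC number 0305008111, company name Nissan Motor India Pvt. Ltd. Website: After entering these two values you get various details, which I need to scrape for about 5000 companies. Any help?

OK. Got it. Yes, I can do that. It will be a little while before I can get to it, but from a quick look it seems straightforward. I guess the next question is: do you want to do this strictly through Excel, or with another programming language such as Python?

Python would be great, since I have just started learning it.

Buddy, a few issues. 1. My data is in Excel format because many entries start with zero, and when I read a csv file instead, the resulting file contains nothing, and neither does print(results). Printing df does give values, but only when the csv file holds a single entry. With multiple entries I get an error. Can I share the Excel file?

Sure. Send me an email: jason.schvach@gmail.com

Can you include the Excel file as well? I'll work on it.

Bro, I'm also trying to understand each line... Thanks for your help.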
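
On the leading-zero problem raised in the comments: one likely contributor (an assumption, since the actual file was never posted) is that pandas parses a column of digit-only codes as integers, which drops the leading zeros. The sketch below uses an in-memory CSV with made-up column names `iec` and `name` to show two ways around that: reading the column as text with `dtype=str`, or zero-padding afterwards with `str.zfill`. `pd.read_excel` accepts the same `dtype` argument.

```python
from io import StringIO

import pandas as pd

# Hypothetical input; the column names are assumptions for illustration.
csv_text = "iec,name\n0305008111,NISSAN MOTOR INDIA PVT. LTD.\n"

# Default parsing: the code becomes an integer and loses its leading zero.
as_numbers = pd.read_csv(StringIO(csv_text))

# Option 1: read every column as text so the code is kept exactly as written.
as_text = pd.read_csv(StringIO(csv_text), dtype=str)

# Option 2: repair already-parsed integers by zero-padding to 10 digits.
repaired = as_numbers["iec"].astype(str).str.zfill(10)

print(as_numbers["iec"].iloc[0])
print(as_text["iec"].iloc[0])
print(repaired.iloc[0])
```

Reading the codes as text also means the request payload can use the value directly instead of formatting it with `'%010d'`, which only works for integers.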