Why does the search results table show only the table header, and not the data, when parsed with BeautifulSoup (Python)?

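I am trying to parse this page for its search results.

Please select:

School = All, Sport = Football, Conference = All, Year = 2005-2006, State = All

This search returns 226 entries. I would like to parse all 226 entries and convert them into a pandas dataframe containing School, Conference, GSR, "FGR" and "State". So far I have been able to parse the table header, but not the data in the table. Please advise with code and an explanation.

Note: I am new to Python and BeautifulSoup.

Code I have tried so far: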

import requests
import lxml.html as lh
import pandas as pd

url = 'https://web3.ncaa.org/aprsearch/gsrsearch'

# Fetch the page
page = requests.get(url)

# Parse the response body into an lxml document
doc = lh.fromstring(page.content)

# Collect every <tr>..</tr> element in the HTML
tr_elements = doc.xpath('//tr')

# One (column name, values) pair per header cell
col = []

# The first row is the header: store each column name with an empty list
for i, t in enumerate(tr_elements[0], start=1):
    name = t.text_content()
    print('%d:"%s"' % (i, name))
    col.append((name, []))

# Since our first row is the header, data is stored on the second row onwards
for j in range(1, len(tr_elements)):
    # T is our j'th row
    T = tr_elements[j]

    # If row is not of size 10, the //tr data is not from our table
    if len(T) != 10:
        break

    # Iterate through each cell of the row; i is the column index
    for i, t in enumerate(T.iterchildren()):
        data = t.text_content()
        # Leave the first column as text; try to convert the others to int
        if i > 0:
            try:
                data = int(data)
            except ValueError:
                pass
        # Append the data to the list of the i'th column
        col[i][1].append(data)

Dict = {title: column for (title, column) in col}
df = pd.DataFrame(Dict)
Output so far:

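Comments:

Can we see what the output looks like?

@LucasDurand.. Please see the updated question.

@Data_is_Power You may have to use Selenium.

@BittoBennichan.. Thanks for the prompt reply! Is there anything we can do in Python?

Yes. I tried to simulate the AJAX request, but without success. I think Selenium should be the natural choice here.

Answer:

You can paste in the headers and the payload and then use .post. I am still learning how to use it properly, and I am not sure exactly what is required or what counts as sensitive information, which is why I masked some of it. As I said, I am still learning, but I managed to get it to return JSON.

The request returns JSON, which is then simply converted into a dataframe.

You can get the headers and the payload by inspecting the page and then clicking on XHR; you may need to refresh the page so that gsrsearch shows up. Then click on it and scroll to find them. You will have to add the quotation marks yourself, though.

Code and output:
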
import requests
import pandas as pd

url = 'https://web3.ncaa.org/aprsearch/gsrsearch'

# Here's where you'll put your headers from Inspect
headers = {
    'Accept': 'application/json, text/javascript, */*; q=0.01',
    'Accept-Encoding': 'gzip, deflate, br',
    'Accept-Language': 'en-US,en;q=0.9',
    'Connection': 'keep-alive',
    # ... (remaining headers elided in the original)
    'X-Requested-With': 'XMLHttpRequest',
}

# Here's where you put Form Data from Inspect
payload = {
    'schoolOrgId': '',
    'conferenceOrgId': '',
    'sportCode': 'MFB',
    'cohortYear': '2005',  # I changed this to year 2005
    'state': '',
    # ... (remaining form fields elided in the original)
}

# Send the same POST request the page makes and decode the JSON response
r = requests.post(url, headers=headers, data=payload)
jsonObj = r.json()

# Flatten the JSON records into a DataFrame
# (pd.json_normalize replaces the deprecated pandas.io.json.json_normalize)
df = pd.json_normalize(jsonObj)
print(df)
     cohortYear  conferenceId  ...   sportDesc  state
0          2005           875  ...    Football     OH
1          2005           916  ...    Football     AL
2          2005           916  ...    Football     AL
3          2005           911  ...    Football     AL
4          2005         24312  ...    Football     AL
5          2005           846  ...    Football     NY
6          2005           916  ...    Football     MS
7          2005           912  ...    Football     NC
8          2005           905  ...    Football     AZ
9          2005           905  ...    Football     AZ
10         2005           818  ...    Football     AR
11         2005           911  ...    Football     AR
12         2005           911  ...    Football     AL
13         2005           902  ...    Football     TN
14         2005           875  ...    Football     IN
15         2005           826  ...    Football     SC
16         2005         25354  ...    Football     TX
17         2005           876  ...    Football     FL
18         2005          5486  ...    Football     ID
19         2005           821  ...    Football     MA
20         2005           875  ...    Football     OH
21         2005             0  ...    Football     UT
22         2005           865  ...    Football     RI
23         2005           846  ...    Football     RI
24         2005           838  ...    Football     PA
25         2005           875  ...    Football     NY
26         2005         21451  ...    Football     IN
27         2005             0  ...    Football     CA
28         2005           923  ...    Football     CA
29         2005           825  ...    Football     CA
..          ...           ...  ...         ...    ...
210        2005             0  ...    Football     MD
211        2005           923  ...    Football     UT
212        2005           905  ...    Football     UT
213        2005         21451  ...    Football     IN
214        2005           911  ...    Football     TN
215        2005           837  ...    Football     PA
216        2005           826  ...    Football     VA
217        2005           821  ...    Football     VA
218        2005           821  ...    Football     VA
219        2005           846  ...    Football     NY
220        2005           821  ...    Football     NC
221        2005           905  ...    Football     WA
222        2005           905  ...    Football     WA
223        2005           825  ...    Football     UT
224        2005           823  ...    Football     WV
225        2005           912  ...    Football     NC
226        2005           853  ...    Football     IL
227        2005           818  ...    Football     KY
228        2005           875  ...    Football     MI
229        2005           837  ...    Football     VA
230        2005           827  ...    Football     WI
231        2005          5486  ...    Football     WY
232        2005           865  ...    Football     CT
233        2005           853  ...    Football     OH
234        2005           914  ...    Football     AR
235        2005           912  ...    Football     NC
236        2005           826  ...    Football     NC
237        2005           826  ...    Football     SC
238        2005           916  ...    Football     AR
239        2005           912  ...    Football     SC

[240 rows x 12 columns]
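
As for why the original lxml attempt found only the header row: the POST approach above suggests that the rows are fetched by a separate XHR request after the page loads, so the HTML returned by a plain requests.get would contain the table header but no data rows. The sketch below only illustrates that situation, using the standard library's html.parser. The HTML string is a made-up stand-in and not the actual NCAA page markup, and RowCollector is a hypothetical helper name.

```python
from html.parser import HTMLParser

# Hypothetical stand-in for the HTML a plain GET might return: the header row
# is in the markup, but <tbody> is empty because a script fills it in later.
STATIC_HTML = """
<table id="results">
  <thead>
    <tr><th>School</th><th>Conference</th><th>GSR</th><th>FGR</th><th>State</th></tr>
  </thead>
  <tbody></tbody>
</table>
"""


class RowCollector(HTMLParser):
    """Collect the text of every <th>/<td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []         # one list of cell strings per <tr>
        self._in_cell = False  # True while inside a <th> or <td>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("th", "td") and self.rows:
            self._in_cell = True
            self.rows[-1].append("")

    def handle_endtag(self, tag):
        if tag in ("th", "td"):
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.rows[-1][-1] += data


parser = RowCollector()
parser.feed(STATIC_HTML)
parser.close()

header, *data_rows = parser.rows
print(header)          # the column names are present in the static markup
print(len(data_rows))  # no data rows: they are not part of this markup
```

If the real page behaves this way, no HTML parser can recover the rows from the initial response, which is why the answer requests the JSON directly.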