Python: Trying to parse the names (and PhDs) off a university faculty website. Just this one piece is proving hard to do. Tags: python, parsing, web

You can fix this by using the 'text' attribute on each element that find_all('td', ...) returns.

So from the results you get back from find_all, you just iterate, take the 'text' part of each element, and put it into a names list.

Here is a list-comprehension approach, starting from the original script:

from bs4 import BeautifulSoup #imports beautifulSoup package
import urllib2

url = 'https://www.marshall.usc.edu/faculty/phd' #sets url to a variable
page = urllib2.urlopen(url)
soup = BeautifulSoup(page.read(), "lxml") #sets the contents of the page to the variable soup

#names = soup.find_all('tr', {'class': 'odd views-row-first'})

names = soup.find_all('td', {'class': 'views-field views-field-field-faculty-name-last-value active'}) #sets the name 'cell' and tags
#namesU = names.replaceAll("<[^>]*>","")

#names.strip('<td class="views-field views-field-field-faculty-name-last-value active">') 
#names2 = names.sub('<td class="views-field views-field-field-faculty-name-last-value active">', '')

print(names)
Replace the line that assigns 'names' with this list comprehension:

names = [i.text.strip() for i in soup.find_all('td', {'class': 'views-field views-field-field-faculty-name-last-value active'})]

After running this, the output produces:

['Amato, Andrea', 'Banerjee, Trambak', 'Basu, Pallavi', 'Chang, Wayne', 'Chung, Sung Hun', 'Comings, Alison', 'Cui, Hailong', 'DeGroot, Tyler', 'Dutton, Chaumanix', 'Fu, Luella', 'Golrezaei, Negin', 'Grandy, Jake', 'Han, Rong Qing', 'Han, Ju Rie (Alyssa)', 'Harmon, Derek', 'Hong, Jihoon', 'Jia, He', 'Joshi, Priyanka', 'Kays, Allison', 'Kfir, Alon', 'Kim, Jeunghyun', 'Kim, Pureum', 'Kim, Yookyoung', 'Kim , Jennifer', 'Krikorian, Mariam', 'Lang, Tina', 'Lee, Jennifer', 'Lee, Suk won', 'Lee, Yoonju', 'Li, Guang', 'Li, Yuan', 'Ling, Yun', 'Magkotsios, Georgios', 'Min, Bora', 'Newman, David', 'Oh, Seung Hwan', 'Ozkan, Erhun', 'Paulson, Courtney', 'Pei, Lei', 'Pyun, Sung June', 'Raj, Medha', 'Raveendhran, Roshni', 'Rich, Beverly', 'Ritter, Stacey', 'Sahoo, Satish', 'Skripnik, Roman', 'Smallets, Stephanie', 'Song, Shiwon', 'Stamenov, Ventsislav', 'Subler, Megan', 'Talijan, Vuk', 'Uhalde, Arianna', 'Valsesia, Francesca', 'Wan, Yuan', 'Wang, Jue', 'Wang, Weinan', 'Wang, Xuan', 'Wang, Yongzhi (Alex)', 'Wang, Yingfei (Fiona)', 'Wong, Vivian', 'Xia, Jingjing', 'Xing, Zhe (Adele)', 'Xu, Zibin', 'Yang, Louis', 'Yao, Yao', 'Yi, Irene', 'Yordanov, Kristian', 'Yu, Xiaoqian', 'Zhang, Heng', 'Zhang, Yanwei (Wayne)', 'Zhang, Yingguang', 'Zhang, Mengxia']

Comment (asker): Thanks for helping me with all of this! Although I keep getting the character 'u' in front of the first quotation mark of every array cell, like [u'Amato, Andrea', u'Banerjee, Trambak' ...]. Any ideas?

Comment (answerer): That means the strings are in Unicode format, which should not be a problem. If you really don't want Unicode, just make sure you convert to str() inside the list comprehension: str(i.text.strip()). If you find this answer acceptable, please upvote and accept it.
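As an aside, the script above needs live network access and the urllib2 module, which exists only on Python 2. The same extract-and-strip-the-text idea can be sketched offline with just the standard library. The snippet below is a minimal sketch under stated assumptions: MOCK_HTML is a made-up stand-in for two cells of the real faculty table, and TdTextExtractor is a hypothetical helper name, not anything from the original post.

```python
from html.parser import HTMLParser  # stdlib HTML parser (Python 3)

# Made-up stand-in for two cells of the faculty table on the real page.
MOCK_HTML = """
<table>
  <tr><td class="views-field views-field-field-faculty-name-last-value active"> Amato, Andrea </td></tr>
  <tr><td class="views-field views-field-field-faculty-name-last-value active"> Banerjee, Trambak </td></tr>
</table>
"""

TARGET_CLASS = "views-field views-field-field-faculty-name-last-value active"

class TdTextExtractor(HTMLParser):
    """Collects stripped text from <td> cells carrying the target class."""
    def __init__(self):
        super().__init__()
        self.in_target_td = False
        self.names = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs
        if tag == "td" and dict(attrs).get("class") == TARGET_CLASS:
            self.in_target_td = True

    def handle_endtag(self, tag):
        if tag == "td":
            self.in_target_td = False

    def handle_data(self, data):
        # Ignore the whitespace-only runs between tags
        if self.in_target_td and data.strip():
            self.names.append(data.strip())

parser = TdTextExtractor()
parser.feed(MOCK_HTML)
print(parser.names)  # ['Amato, Andrea', 'Banerjee, Trambak']
```

On Python 3 every str is already Unicode, so the u'' prefix discussed in the comments simply disappears: u'Amato, Andrea' and 'Amato, Andrea' are the same value. For the real page you would still fetch the HTML first, e.g. with urllib.request.urlopen, which is the Python 3 replacement for urllib2.urlopen.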