Regex: How do I convert multi-line content into a list?

regex python-2.7 web-scraping

I am trying to convert scraped content into a list for data manipulation, but I get the following error: TypeError: 'NoneType' object is not callable

#! /usr/bin/python

from urllib import urlopen
from BeautifulSoup import BeautifulSoup
import os
import re

# Copy all of the content from the provided web page
webpage = urlopen("http://www.optionstrategist.com/calculators/free-volatility-data").read()

# Locate the <pre> block with plain string searches (no regex is involved)
preBegin = webpage.find('<pre>') # Index of the opening <pre> tag
preEnd = webpage.find('</pre>') # Index of the closing </pre> tag

# Copy the content between the pre tags
voltable = webpage[preBegin:preEnd]

# Pass the content to the Beautiful Soup module
# (this is the line that raises the TypeError)
raw_data = BeautifulSoup(voltable).splitline()
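
A likely explanation for the error, sketched under assumptions: BeautifulSoup 3 treats an unknown attribute on a soup object as a tag lookup, so `.splitline` searches for a `<splitline>` tag, finds none, and evaluates to `None`. Calling that `None` raises the TypeError. The `TagLookup` class below is a hypothetical stand-in that only mimics this lookup behaviour. It is not BeautifulSoup itself, and the sample text is invented. The string method is spelled `splitlines()` and belongs to strings, not to the soup object.

```python
class TagLookup(object):
    """Hypothetical stand-in mimicking BeautifulSoup 3 attribute lookup."""

    def __init__(self, text):
        self.text = text

    def __getattr__(self, name):
        # Only reached for attributes that do not exist on the object.
        if name.startswith("__"):
            raise AttributeError(name)
        # Treat the name as a tag to search for; "not found" is None.
        return None


soup_like = TagLookup("Symbol  Price\nAAA  10\nBBB  20")

try:
    soup_like.splitline()  # evaluates to None(), which raises TypeError
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Splitting the underlying string with splitlines() gives a list of rows.
rows = soup_like.text.splitlines()
print(error_message)
print(rows)
```

With a real soup object, the equivalent would be to take a string first (for example the text of the `pre` element) and call `splitlines()` on that string.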


The code is very simple. Here it is for BeautifulSoup 4:

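from bs4 import BeautifulSoup

# webpage is the raw HTML string fetched in the question
soup = BeautifulSoup(webpage, "html.parser")

# Find all <pre> tags in the HTML page
preTags = soup.find_all('pre')
for tag in preTags:
    # Get the text inside the tag
    print(tag.get_text())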
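
If installing bs4 is not an option, the same idea can be sketched with the standard library. Note the swap: this uses Python 3's `html.parser.HTMLParser` instead of BeautifulSoup. The `PreTextCollector` class and the sample HTML are hypothetical and only illustrate the approach.

```python
from html.parser import HTMLParser


class PreTextCollector(HTMLParser):
    """Collect the text of every <pre> element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.depth = 0      # greater than 0 while inside a <pre> element
        self.blocks = []    # text of each completed <pre> element
        self._chunks = []   # text pieces of the <pre> currently open

    def handle_starttag(self, tag, attrs):
        if tag == "pre":
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "pre" and self.depth:
            self.depth -= 1
            if self.depth == 0:
                self.blocks.append("".join(self._chunks))
                self._chunks = []

    def handle_data(self, data):
        if self.depth:
            self._chunks.append(data)


# Invented sample page, not the real site content.
sample_html = "<html><body><p>intro</p><pre>line one\nline two</pre></body></html>"

collector = PreTextCollector()
collector.feed(sample_html)
collector.close()

# Turn the text of the first <pre> element into a list of lines.
pre_lines = collector.blocks[0].splitlines()
print(pre_lines)
```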

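To get the text from the first pre element, use soup.pre.string.

To extract the lines containing data, drop everything before the table header and keep only the lines that contain '%ile'.

Now each line contains data in a fixed-column format. You can use slicing and/or regular expressions to parse and validate the individual fields in each line.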
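
The field parsing mentioned above could look roughly like this. The row layout, column widths, and field names are assumptions made for illustration. The real page may use different columns, so the pattern and slice positions would need adjusting.

```python
import re

# Hypothetical row built with assumed column widths (10, 7, 7, 7).
row = "%-10s%-7s%-7s%-7s%s" % ("AAPL", "25.3", "27.1", "30.4", "45%ile")

# Option 1: split on runs of whitespace.
fields = row.split()

# Option 2: validate the row shape with a regular expression.
pattern = re.compile(
    r"^(?P<symbol>\S+)\s+(?P<values>(?:\d+\.\d+\s+)+)(?P<pct>\d+)%ile$"
)
match = pattern.match(row)
record = None
if match:
    record = {
        "symbol": match.group("symbol"),
        "values": [float(v) for v in match.group("values").split()],
        "percentile": int(match.group("pct")),
    }

# Option 3: slice by fixed character positions (assumed width of 10).
symbol = row[0:10].strip()

print(fields)
print(record)
print(symbol)
```

Splitting on whitespace is the simplest choice when no field contains spaces. The regular expression is useful when malformed rows should be rejected rather than silently parsed.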

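Use an HTML parser (IIRC, there is BeautifulSoup).

Just realized that you already import it, but you never use it to extract the tag and chose regex instead…

I've added it in the new code, but I'm having trouble converting the content into a list for data manipulation. I get TypeError: 'NoneType' object is not callable. raw_data is [...]. I have never run into this before.

Everything you need to complete this task: [...], [...]. Please learn to read the manual.

Actually, I did. But I don't understand it.

It is not valid Python. BeautifulSoup 3, which the OP uses, spells find_all() and get_text() differently from the bs4 version.

@J.F.Sebastian: Whether the Python is valid or invalid is a separate matter from whether the code is valid for a given BS version.

Your answer has both problems: it has invalid Python syntax, and it uses an API that is incompatible with the code in the question.

@J.F.Sebastian: Thanks for the comment about the bad Python syntax. But I don't care whether the BeautifulSoup versions are compatible.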

#!/usr/bin/env python
# Python 2.7 with BeautifulSoup 3, as used in the question
from urllib2 import urlopen
from BeautifulSoup import BeautifulSoup

url = "http://www.optionstrategist.com/calculators/free-volatility-data"
soup = BeautifulSoup(urlopen(url))
# text of the first <pre> element
print soup.pre.string

from itertools import dropwhile

lines = soup.pre.string.splitlines()
# drop lines before the data table header
lines = dropwhile(lambda line: not line.startswith("Symbol"), lines)
# extract lines with data
lines = (line for line in lines if '%ile' in line)
# lines is a lazy generator; use list(lines) if an actual list is needed
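
To try the filtering step without network access, here is a minimal self-contained sketch. The `pre_text` string is an invented stand-in for `soup.pre.string`. Its column names and values are assumptions, not real data from the page.

```python
from itertools import dropwhile

# Hypothetical stand-in for soup.pre.string; not real data from the page.
pre_text = """Volatility data (sample)
Generated for illustration

Symbol   hv20   hv50   hv100  curiv  pct
AAA      20.1   21.0   22.5   19.8   35%ile
BBB      40.2   38.7   37.1   45.0   90%ile
* footnote without data
"""

lines = pre_text.splitlines()
# Skip everything before the table header.
lines = dropwhile(lambda line: not line.startswith("Symbol"), lines)
# Keep only the rows that carry data and materialise them as a list.
data_rows = [line for line in lines if "%ile" in line]

for data_row in data_rows:
    print(data_row)
```

A list comprehension is used here instead of a generator expression so that the result can be indexed and iterated more than once.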