Python: I need to remove extra characters from BeautifulSoup's string output (tags: python, unicode, beautifulsoup)

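I need to remove the [u' prefix and the '] suffix that surround the data I care about. The data goes into a database and, from what I can see, those extra characters are stored along with it. How can I remove them? I have tried calling .replace on the variable, but it returns an error.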

import urllib
import mechanize
from bs4 import BeautifulSoup
import requests
import re
import MySQLdb
import time

db = MySQLdb.connect(
  host=" ",
  user=" ",
  passwd=" ",
  db=" ")

inc = 0

# while inc != 3289:
c = db.cursor()
c.execute("""SELECT `symbol` FROM `stocks` LIMIT %s,1""", (inc,))
result = c.fetchall()
result = str(result)

user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
br = mechanize.Browser()
br.set_handle_robots(False)
br.addHeaders = [('User-agent',user_agent)]

term = result.replace('((','').replace(',)','').replace("'",'')
url = "http://www.marketwatch.com/investing/stock/"+term
soup = BeautifulSoup(requests.get(url).text)
search = soup.find('p', attrs = {'class':'data bgLast'})
cur = search.findAll(text = True)
search2 = soup.find('span', attrs = {'class':'bgChange'})
diff = search2.findAll(text = True)
print term
print cur
print diff

c.execute("""UPDATE stocks SET cur = %s WHERE symbol = %s""", (cur,term))
c.execute("""UPDATE stocks SET diff = %s WHERE symbol = %s""", (diff,term))
db.commit()
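No thanks to you, jonrsharpe, I found the answer. In the original code, .findAll was returning a result set. All I had to do was convert it to a str, which allowed strip to be called on it. The revised code is below: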

import urllib
import mechanize
from bs4 import BeautifulSoup
import requests
import re
import MySQLdb
import time

db = MySQLdb.connect(
  host=" ",
  user=" ",
  passwd=" ",
  db=" ")

inc = 0

# while inc != 3289:
c = db.cursor()
c.execute("""SELECT `symbol` FROM `stocks` LIMIT %s,1""", (inc,))
result = c.fetchall()
result = str(result)

user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
br = mechanize.Browser()
br.set_handle_robots(False)
br.addHeaders = [('User-agent',user_agent)]

term = result.replace('((','').replace(',)','').replace("'",'')
url = "http://www.marketwatch.com/investing/stock/"+term
soup = BeautifulSoup(requests.get(url).text)
search = soup.find('p', attrs = {'class':'data bgLast'})
cur = str(search.findAll(text = True))
search2 = soup.find('span', attrs = {'class':'bgChange'})
diff = str(search2.findAll(text = True))
cur = cur.strip("'[]u")
diff = diff.strip("'[]u")
print term
print cur
print diff

c.execute("""UPDATE stocks SET cur = %s WHERE symbol = %s""", (cur,term))
c.execute("""UPDATE stocks SET diff = %s WHERE symbol = %s""", (diff,term))
db.commit()
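One caveat about the strip call in the revised code: str.strip treats its argument as a set of characters to remove from both ends, not as a literal prefix or suffix. A minimal sketch of what that implies, written in Python 3 syntax; the helper name strip_repr and the sample strings are made up for illustration and are not real MarketWatch output:

```python
def strip_repr(serialized):
    # Same call as in the revised code: removes any of ' [ ] u from both ends.
    return serialized.strip("'[]u")

# The Python 2 string form of a one-element list of unicode text.
print(strip_repr("[u'12.34']"))        # the simple case looks fine

# Data that itself starts with one of the stripped characters is damaged.
print(strip_repr("[u'unch']"))         # the leading "u" of the value is lost

# A list with more than one text node keeps its inner quotes and commas.
print(strip_repr("[u'+', u'1.20']"))

# Indexing the list instead returns the value untouched.
values = [u"unch"]
print(values[0])
```

So the approach works only while the scraped text never begins or ends with u, a quote, or a bracket, and while findAll returns exactly one text node.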
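Don't do that! There are data types other than strings.

result is a list of lists (MySQLdb actually hands back a tuple of row tuples, which is why its string form starts with "(("), and search.findAll gives you a list of text nodes. You can get the symbol value of the first row with result[0][0], and you can get the text of an element simply by calling search.getText().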
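As a rough illustration of the BeautifulSoup side, here is a minimal sketch in Python 3 syntax. The HTML snippet is invented to mimic the two elements the question looks up, and it uses get_text() and find_all(string=True), the current spellings of getText() and findAll(text=True):

```python
from bs4 import BeautifulSoup

# Made-up markup standing in for the page fetched in the question.
html = """
<p class="data bgLast">12.34</p>
<span class="bgChange">+0.56</span>
"""
soup = BeautifulSoup(html, "html.parser")

search = soup.find('p', attrs={'class': 'data bgLast'})
nodes = search.find_all(string=True)   # a list of text nodes, not a string
cur = search.get_text()                # the element's text as a plain string

search2 = soup.find('span', attrs={'class': 'bgChange'})
diff = search2.get_text()

print(nodes)
print(cur)
print(diff)
```

With cur and diff already plain strings, there is nothing to strip before handing them to the database.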


Serializing a structured object (such as a list) into a flat string and then trying to pick pieces back out of it is bad practice.
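The same applies to the database side. Below is a minimal sketch, assuming sqlite3 as a stand-in for MySQLdb so that it runs without a server (sqlite3 uses ? placeholders where MySQLdb uses %s); the sample symbol and prices are invented:

```python
import sqlite3

# In-memory database standing in for the MySQL connection in the question.
db = sqlite3.connect(":memory:")
c = db.cursor()
c.execute("CREATE TABLE stocks (symbol TEXT, cur TEXT, diff TEXT)")
c.execute("INSERT INTO stocks (symbol) VALUES (?)", ("AAPL",))  # sample row

c.execute("SELECT symbol FROM stocks LIMIT 1")
result = c.fetchall()   # a sequence of rows, each row a sequence of columns
term = result[0][0]     # first row, first column; no str() or replace() needed

# Hard-coded sample values in place of the scraped text.
cur_price, change = "12.34", "+0.56"
c.execute("UPDATE stocks SET cur = ?, diff = ? WHERE symbol = ?",
          (cur_price, change, term))
db.commit()

c.execute("SELECT symbol, cur, diff FROM stocks")
stored = c.fetchall()
print(stored)
```

Because term is taken by index, it does not depend on how the driver happens to print its rows.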

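You do realize that what you are seeing is a single-element list containing a Unicode string, right?
Yes, but how can I make the variable contain only the text, without the u and the square brackets? Or at least display only the text.
...?!
Not helpful. Is there any other documentation?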
result = str(result)
...
cur = str(search.findAll(text = True))