Python: putting a large string (from BeautifulSoup) into an SQLite table - the traditional approach doesn't work
I hope you can help.

I'm currently writing code to extract the historical data for various games from SteamSpy.com. After backing the project on Patreon, you can view a long history of various metrics for each game. I want to compare several games, so I'd like to extract the data.

I know from earlier work that BeautifulSoup is very useful for this kind of task, but unfortunately I can't use it the way I did before. I'll describe it in detail below, but the main problem is that all the relevant data is contained in a single tag.

- Example: Dota2
- URL:
- Source code:

This is the part of the source code I'm interested in (it is obviously much longer when you are logged in and have access to the historical data).

Output:
C:\Python35\python.exe "C:/Users/nohgjk/Dropbox/Gaming/Project steamSpy/Python/steamSpy - Test/steamSpySoup2.py"
var data2sales= [
{
"key": "Owners",
"bar": true,
"values": [
[1489363200000, 549045, ""],
[1489449600000, 550812, ""],
[1489536000000, 550773, ""],
[1489622400000, 544180, ""],
[1489708800000, 532284, ""],
[1489795200000, 546592, ""],
[1489881600000, 545925, "#2B6A94"],
[1489968000000, 550721, ""],
[1490054400000, 539253, ""],
[1490140800000, 536258, ""],
[1490227200000, 544210, ""],
[1490313600000, 560977, ""],
[1490400000000, 562907, ""],
[1490486400000, 554817, "#2B6A94"],
[1490572800000, 552973, ""],
[1490659200000, 551875, ""],
[1490745600000, 554853, ""],
[1490832000000, 553309, ""],
[1490918400000, 551987, ""],
[1491004800000, 551671, ""],
[1491091200000, 541915, "#2B6A94"],
[1491177600000, 541280, ""] ]},{
"key" : "Price",
"values" : [
[1489363200000, 19.99, ""],
[1489449600000, 19.99, ""],
[1489536000000, 19.99, ""],
[1489622400000, 19.99, ""],
[1489708800000, 19.99, ""],
[1489795200000, 19.99, ""],
[1489881600000, 19.99, "#2B6A94"],
[1489968000000, 19.99, ""],
[1490054400000, 19.99, ""],
[1490140800000, 19.99, ""],
[1490227200000, 19.99, ""],
[1490313600000, 19.99, ""],
[1490400000000, 19.99, ""],
[1490486400000, 19.99, "#2B6A94"],
[1490572800000, 19.99, ""],
[1490659200000, 19.99, ""],
[1490745600000, 19.99, ""],
[1490832000000, 19.99, ""],
[1490918400000, 19.99, ""],
[1491004800000, 19.99, ""],
[1491091200000, 19.99, "#2B6A94"],
[1491177600000, 19.99, ""]] } ];
Process finished with exit code 0
I used your example for this. Use regex to find all the text inside square brackets, like this:
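import re

ownr = soup.find("div", {"id": "tab-sales"}).find("script").get_text()
some_data = []
# Capture the text inside each pair of square brackets
matches = re.findall(r'\[(.*?)\]', ownr)
for match in matches:
    some_data.append(match.split(','))
# Split into two halves; // keeps the chunk size an int on Python 3
owners, price = zip(*[iter(some_data)] * (len(some_data) // 2))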
Which outputs (pre-edit):

Read up on regex. This looks like a job for regex:
lists = re.findall(pattern=r"([0-9]{13},.*, \".*\")", string=ownr)
lists
Output:
['1489363200000, 97073321, ""',
'1489449600000, 97138657, ""',
'1489536000000, 97126694, ""',
'1489622400000, 98535521, ""',
'1489708800000, 98482905, ""',
'1489795200000, 98496091, ""',
'1489881600000, 98627987, "#2B6A94"',
'1489968000000, 98798351, ""',
'1490054400000, 98936652, ""',
'1490140800000, 99025494, ""',
'1490227200000, 99208644, ""',
'1490313600000, 99163634, ""',
'1490400000000, 99097059, ""',
'1490486400000, 98986347, "#2B6A94"',
'1490572800000, 99005343, ""',
'1490659200000, 99023673, ""',
'1490745600000, 99084059, ""',
'1490832000000, 98988641, ""',
'1490918400000, 99120523, ""',
'1491004800000, 99058884, ""',
'1491091200000, 99206546, "#2B6A94"',
'1491177600000, 99155567, ""',
'1489363200000,, "#ffffff"',
'1489449600000,, "#ffffff"',
'1489536000000,, "#ffffff"',
'1489622400000,, "#ffffff"',
'1489708800000,, "#ffffff"',
'1489795200000,, "#ffffff"',
'1489881600000,, "#ffffff"',
'1489968000000,, "#ffffff"',
'1490054400000,, "#ffffff"',
'1490140800000,, "#ffffff"',
'1490227200000,, "#ffffff"',
'1490313600000,, "#ffffff"',
'1490400000000,, "#ffffff"',
'1490486400000,, "#ffffff"',
'1490572800000,, "#ffffff"',
'1490659200000,, "#ffffff"',
'1490745600000,, "#ffffff"',
'1490832000000,, "#ffffff"',
'1490918400000,, "#ffffff"',
'1491004800000,, "#ffffff"',
'1491091200000,, "#ffffff"',
'1491177600000,, "#ffffff"']
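The matches above are still plain strings, and some rows (such as `1489363200000,, "#ffffff"`) have an empty value field. As a minimal sketch, assuming every row has exactly three fields in the order timestamp, value, quoted note, the fields could be captured separately and converted to Python types. The names `ROW`, `parse_rows` and `sample` are made up for this illustration, and `sample` is an abbreviated stand-in for `ownr`:

```python
import re

# Abbreviated stand-in for the script text held in `ownr`
sample = '''
[1489363200000, 97073321, ""],
[1489881600000, 98627987, "#2B6A94"],
[1489363200000,, "#ffffff"],
'''

# One group per field: 13-digit timestamp, optional number, quoted note
ROW = re.compile(r'\[\s*(\d{13})\s*,\s*([0-9.]*)\s*,\s*"([^"]*)"\s*\]')

def parse_rows(text):
    """Return (timestamp_ms, value_or_None, note) tuples."""
    rows = []
    for ts, value, note in ROW.findall(text):
        # An empty value field becomes None instead of raising ValueError
        rows.append((int(ts), float(value) if value else None, note))
    return rows

rows = parse_rows(sample)
print(rows)
```

Rows with more than three fields, or notes containing a double quote, would not match this pattern and would be skipped silently, so it is worth comparing `len(rows)` against the number of rows you expect.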
I tried parsing the output of your code:
data = '''
var data2sales= [
{
"key": "Owners",
"bar": true,
"values": [
[1489363200000, 549045, ""],
[1489449600000, 550812, ""],
[1489536000000, 550773, ""],
[1489622400000, 544180, ""],
[1489708800000, 532284, ""],
[1489795200000, 546592, ""],
[1489881600000, 545925, "#2B6A94"],
[1489968000000, 550721, ""],
[1490054400000, 539253, ""],
[1490140800000, 536258, ""],
[1490227200000, 544210, ""],
[1490313600000, 560977, ""],
[1490400000000, 562907, ""],
[1490486400000, 554817, "#2B6A94"],
[1490572800000, 552973, ""],
[1490659200000, 551875, ""],
[1490745600000, 554853, ""],
[1490832000000, 553309, ""],
[1490918400000, 551987, ""],
[1491004800000, 551671, ""],
[1491091200000, 541915, "#2B6A94"],
[1491177600000, 541280, ""] ]},{
"key" : "Price",
"values" : [
[1489363200000, 19.99, ""],
[1489449600000, 19.99, ""],
[1489536000000, 19.99, ""],
[1489622400000, 19.99, ""],
[1489708800000, 19.99, ""],
[1489795200000, 19.99, ""],
[1489881600000, 19.99, "#2B6A94"],
[1489968000000, 19.99, ""],
[1490054400000, 19.99, ""],
[1490140800000, 19.99, ""],
[1490227200000, 19.99, ""],
[1490313600000, 19.99, ""],
[1490400000000, 19.99, ""],
[1490486400000, 19.99, "#2B6A94"],
[1490572800000, 19.99, ""],
[1490659200000, 19.99, ""],
[1490745600000, 19.99, ""],
[1490832000000, 19.99, ""],
[1490918400000, 19.99, ""],
[1491004800000, 19.99, ""],
[1491091200000, 19.99, "#2B6A94"],
[1491177600000, 19.99, ""]] } ];
'''
using this code:
import json
json_data = data.split('=')[1].split(';')[0]
data_dict = json.loads(json_data)
print(data_dict[0]['key'])
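Now data_dict is a Python list of dictionaries, one per series.

Absolutely brilliant, mate, this is exactly what I wanted. It works fine once I import regex as re.

I've updated the code per your last request to split the data into owners/price.

@user3494191 When I run the code, I get the following error:
Traceback (most recent call last): File "C:/Users/nohgjk/Dropbox/Gaming/Project steamSpy/Python/steamSpy - Test/steamSpySoup5.py", line 71, in <module> owners, price = zip(*[iter(some_data)] * (len(some_data) / 2)) TypeError: can't multiply sequence by non-int of type 'float'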
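@zroq Thanks for your answer, that is indeed the solution.

I've marked @Zrog's answer as accepted, but thank you for taking the time to answer as well. :) This looks like an even better solution; I'll look into it later.

Thanks for your help. It works great, and in the end I think I'll go with this solution. I had completely forgotten about using JSON for this kind of task.

Thanks again. I do have one question: when I run the code while logged in, the third "column" is in some places filled with a very long string (related to Steam sales, I guess). For example, "#2B6A94" from the output sample is replaced with "http://store.steampowered.com/news/20987/", or "Oh...Sir!!", or even "Shadow Warrior 2 25% off + \'The Way of the Wang\' DLC". Also, there are now 5 "columns" instead of 3, but I don't suspect that has anything to do with it. Can you help me solve this? Otherwise I really like the JSON approach!

What do you want to do when the third column has a long string?

It was the \ character. I managed to get it running with a few .replace calls. Thanks for your help.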
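For anyone landing here for the SQLite part of the title, here is a rough sketch of how the JSON approach and the `.replace` fix might be combined and the result written to a table. It rests on several assumptions that are not in the thread: the offending sequence is `\'` (which is not a valid JSON escape), the first three entries of each row are timestamp, value and note, and the table name `history`, its columns, and the helpers `parse_series` and `store` are invented for illustration. The `script_text` sample is abbreviated and its note string is made up.

```python
import json
import sqlite3

# Abbreviated, made-up stand-in for the script text; the \' escapes mimic
# the long strings that appear when logged in.
script_text = r'''
var data2sales= [
{ "key": "Owners", "bar": true, "values": [
[1489363200000, 549045, ""],
[1489449600000, 550812, "sale: \'example\' DLC"] ]},
{ "key": "Price", "values": [
[1489363200000, 19.99, ""],
[1489449600000, 19.99, ""] ]} ];
'''

def parse_series(text):
    """Slice out the array literal, drop backslash-escaped single quotes, parse as JSON."""
    start = text.index('[')        # first bracket after "var data2sales="
    end = text.rindex(']') + 1     # last bracket before the trailing ";"
    literal = text[start:end].replace("\\'", "'")
    return json.loads(literal)

def store(series, conn, game):
    """Insert one row per data point; returns the number of rows written."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS history ("
        "game TEXT, metric TEXT, ts_ms INTEGER, value REAL, note TEXT)"
    )
    rows = [
        # Only the first three entries are used, so 5-column rows also fit
        (game, s["key"], point[0], point[1], point[2])
        for s in series
        for point in s["values"]
    ]
    conn.executemany("INSERT INTO history VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)

conn = sqlite3.connect(":memory:")  # pass a file path instead to keep the data
inserted = store(parse_series(script_text), conn, "Dota2")
print(inserted)
print(conn.execute(
    "SELECT metric, COUNT(*) FROM history GROUP BY metric ORDER BY metric"
).fetchall())
```

Slicing between the first `[` and the last `]` avoids splitting on `=` or `;`, which could also occur inside the long strings. The parameterized `executemany` call lets SQLite handle quoting, so notes containing quotes or URLs need no further escaping.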