Python: putting a large string (from BeautifulSoup) into an SQLite table when the conventional approach does not work
Tags: python, sqlite, beautifulsoup, python-requests

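I hope you can help.

I'm currently writing code to extract historical data for various games from SteamSpy.com. Once you support the project on Patreon, you can view a long history of various metrics for each game. I want to compare several games, so I'd like to extract the data.

I know from earlier work that BeautifulSoup is very helpful for this kind of task, but unfortunately I can't use it the way I did before. I describe this in detail below, but the main problem is that all of the relevant data sits inside a single tag.

  • Example: Dota2
  • URL:
  • Source code:
Source code: this is the part of the source I'm interested in (it is obviously much longer when you are logged in and have access to the historical data).

Output: this is my output, which matches the content of the tag in the source: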

C:\Python35\python.exe "C:/Users/nohgjk/Dropbox/Gaming/Project steamSpy/Python/steamSpy - Test/steamSpySoup2.py"

var data2sales= [
{
  "key": "Owners",
  "bar": true,
  "values": [
[1489363200000, 549045, ""],
[1489449600000, 550812, ""],
[1489536000000, 550773, ""],
[1489622400000, 544180, ""],
[1489708800000, 532284, ""],
[1489795200000, 546592, ""],
[1489881600000, 545925, "#2B6A94"],
[1489968000000, 550721, ""],
[1490054400000, 539253, ""],
[1490140800000, 536258, ""],
[1490227200000, 544210, ""],
[1490313600000, 560977, ""],
[1490400000000, 562907, ""],
[1490486400000, 554817, "#2B6A94"],
[1490572800000, 552973, ""],
[1490659200000, 551875, ""],
[1490745600000, 554853, ""],
[1490832000000, 553309, ""],
[1490918400000, 551987, ""],
[1491004800000, 551671, ""],
[1491091200000, 541915, "#2B6A94"],
[1491177600000, 541280, ""] ]},{
  "key" : "Price",
  "values" : [
[1489363200000, 19.99, ""],
[1489449600000, 19.99, ""],
[1489536000000, 19.99, ""],
[1489622400000, 19.99, ""],
[1489708800000, 19.99, ""],
[1489795200000, 19.99, ""],
[1489881600000, 19.99, "#2B6A94"],
[1489968000000, 19.99, ""],
[1490054400000, 19.99, ""],
[1490140800000, 19.99, ""],
[1490227200000, 19.99, ""],
[1490313600000, 19.99, ""],
[1490400000000, 19.99, ""],
[1490486400000, 19.99, "#2B6A94"],
[1490572800000, 19.99, ""],
[1490659200000, 19.99, ""],
[1490745600000, 19.99, ""],
[1490832000000, 19.99, ""],
[1490918400000, 19.99, ""],
[1491004800000, 19.99, ""],
[1491091200000, 19.99, "#2B6A94"],
[1491177600000, 19.99, ""]] } ];


Process finished with exit code 0

I used your example for this. Use a regex to find all of the text inside square brackets, like this:

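import re

ownr = soup.find("div", {"id": "tab-sales"}).find("script").get_text()

some_data = []

# Capture the contents of every [...] pair that sits on a single line
matches = re.findall(r'\[(.*?)\]', ownr)
for match in matches:
    some_data.append(match.split(','))

# First half is the Owners series, second half is Price.
# Use // here: on Python 3, / returns a float and list * float raises TypeError.
owners, price = zip(*[iter(some_data)] * (len(some_data) // 2))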
This outputs (pre-edit):


Look up more information about regex.
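
The `zip(*[iter(seq)] * n)` idiom used in this answer groups a flat list into chunks of `n` items. A minimal sketch of how it behaves, using a few rows copied from the output in the question (the names `rows` and `half` are illustrative, not from the original answer):

```python
# Illustrative rows: two "Owners" entries followed by two "Price" entries,
# in the shape produced by match.split(',').
rows = [
    ["1489363200000", " 549045", ' ""'],
    ["1489449600000", " 550812", ' ""'],
    ["1489363200000", " 19.99", ' ""'],
    ["1489449600000", " 19.99", ' ""'],
]

half = len(rows) // 2  # integer division; len(rows) / 2 is a float on Python 3

# [iter(rows)] * half repeats the SAME iterator object, so zip pulls
# `half` consecutive items into each tuple.
owners, price = zip(*[iter(rows)] * half)

# Equivalent, and arguably easier to read: plain slicing.
owners_sliced, price_sliced = rows[:half], rows[half:]
```

Both forms assume the two series have the same length and appear back to back in the script text.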

This looks like a job for regex:

lists = re.findall(pattern=r"([0-9]{13},.*, \".*\")", string=ownr)
lists
Output:

['1489363200000, 97073321, ""',
 '1489449600000, 97138657, ""',
 '1489536000000, 97126694, ""',
 '1489622400000, 98535521, ""',
 '1489708800000, 98482905, ""',
 '1489795200000, 98496091, ""',
 '1489881600000, 98627987, "#2B6A94"',
 '1489968000000, 98798351, ""',
 '1490054400000, 98936652, ""',
 '1490140800000, 99025494, ""',
 '1490227200000, 99208644, ""',
 '1490313600000, 99163634, ""',
 '1490400000000, 99097059, ""',
 '1490486400000, 98986347, "#2B6A94"',
 '1490572800000, 99005343, ""',
 '1490659200000, 99023673, ""',
 '1490745600000, 99084059, ""',
 '1490832000000, 98988641, ""',
 '1490918400000, 99120523, ""',
 '1491004800000, 99058884, ""',
 '1491091200000, 99206546, "#2B6A94"',
 '1491177600000, 99155567, ""',
 '1489363200000,, "#ffffff"',
 '1489449600000,, "#ffffff"',
 '1489536000000,, "#ffffff"',
 '1489622400000,, "#ffffff"',
 '1489708800000,, "#ffffff"',
 '1489795200000,, "#ffffff"',
 '1489881600000,, "#ffffff"',
 '1489968000000,, "#ffffff"',
 '1490054400000,, "#ffffff"',
 '1490140800000,, "#ffffff"',
 '1490227200000,, "#ffffff"',
 '1490313600000,, "#ffffff"',
 '1490400000000,, "#ffffff"',
 '1490486400000,, "#ffffff"',
 '1490572800000,, "#ffffff"',
 '1490659200000,, "#ffffff"',
 '1490745600000,, "#ffffff"',
 '1490832000000,, "#ffffff"',
 '1490918400000,, "#ffffff"',
 '1491004800000,, "#ffffff"',
 '1491091200000,, "#ffffff"',
 '1491177600000,, "#ffffff"']
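
The matches above are still plain strings, and the second series has an empty value field. As a rough sketch of one way to turn each string into typed fields (the helper `parse_row` is hypothetical, and it assumes the three-field layout shown in the output and millisecond UTC timestamps):

```python
from datetime import datetime, timezone

def parse_row(text):
    """Turn '1489363200000, 97073321, ""' into (date, value, label)."""
    # Split on the first two commas only, so a comma inside the label survives.
    ts, value, label = (part.strip() for part in text.split(",", 2))
    day = datetime.fromtimestamp(int(ts) / 1000, tz=timezone.utc).date()
    number = float(value) if value else None  # the second series has no value
    return day, number, label.strip('"')

sample = ['1489363200000, 97073321, ""', '1489881600000,, "#ffffff"']
parsed = [parse_row(s) for s in sample]
```

Rows with more than three fields, such as the ones reported for logged-in pages, would need a different split.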

I tried parsing the output of your code:

data = '''
var data2sales= [
{
  "key": "Owners",
  "bar": true,
  "values": [
[1489363200000, 549045, ""],
[1489449600000, 550812, ""],
[1489536000000, 550773, ""],
[1489622400000, 544180, ""],
[1489708800000, 532284, ""],
[1489795200000, 546592, ""],
[1489881600000, 545925, "#2B6A94"],
[1489968000000, 550721, ""],
[1490054400000, 539253, ""],
[1490140800000, 536258, ""],
[1490227200000, 544210, ""],
[1490313600000, 560977, ""],
[1490400000000, 562907, ""],
[1490486400000, 554817, "#2B6A94"],
[1490572800000, 552973, ""],
[1490659200000, 551875, ""],
[1490745600000, 554853, ""],
[1490832000000, 553309, ""],
[1490918400000, 551987, ""],
[1491004800000, 551671, ""],
[1491091200000, 541915, "#2B6A94"],
[1491177600000, 541280, ""] ]},{
  "key" : "Price",
  "values" : [
[1489363200000, 19.99, ""],
[1489449600000, 19.99, ""],
[1489536000000, 19.99, ""],
[1489622400000, 19.99, ""],
[1489708800000, 19.99, ""],
[1489795200000, 19.99, ""],
[1489881600000, 19.99, "#2B6A94"],
[1489968000000, 19.99, ""],
[1490054400000, 19.99, ""],
[1490140800000, 19.99, ""],
[1490227200000, 19.99, ""],
[1490313600000, 19.99, ""],
[1490400000000, 19.99, ""],
[1490486400000, 19.99, "#2B6A94"],
[1490572800000, 19.99, ""],
[1490659200000, 19.99, ""],
[1490745600000, 19.99, ""],
[1490832000000, 19.99, ""],
[1490918400000, 19.99, ""],
[1491004800000, 19.99, ""],
[1491091200000, 19.99, "#2B6A94"],
[1491177600000, 19.99, ""]] } ];
'''
using this code:

import json

# Strip the 'var data2sales=' prefix and the trailing ';' so only the JSON array remains
json_data = data.split('=')[1].split(';')[0]
data_dict = json.loads(json_data)
print(data_dict[0]['key'])  # Owners

Now data_dict is a Python list of dictionaries, one per series ("Owners" and "Price").
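
Since the goal in the title is to get the data into an SQLite table, here is a minimal sketch of how the parsed series could be stored with the standard `sqlite3` module. The table name `history`, its columns, and the shortened `data` string are assumptions made for illustration; they do not come from the original thread.

```python
import json
import sqlite3

# Shortened stand-in for the script text shown above.
data = '''
var data2sales= [
{ "key": "Owners", "bar": true, "values": [
[1489363200000, 549045, ""],
[1489449600000, 550812, ""] ]},{
  "key" : "Price", "values" : [
[1489363200000, 19.99, ""],
[1489449600000, 19.99, ""]] } ];
'''

series = json.loads(data.split('=')[1].split(';')[0])

conn = sqlite3.connect(":memory:")  # pass a file path instead to persist the data
conn.execute(
    "CREATE TABLE history "
    "(game TEXT, metric TEXT, ts_ms INTEGER, value REAL, label TEXT)"
)

# One row per data point, tagged with the series name ("Owners" or "Price").
rows = [
    ("Dota2", s["key"], ts, value, label)
    for s in series
    for ts, value, label in s["values"]
]
conn.executemany("INSERT INTO history VALUES (?, ?, ?, ?, ?)", rows)
conn.commit()

owners = conn.execute(
    "SELECT ts_ms, value FROM history WHERE metric = 'Owners' ORDER BY ts_ms"
).fetchall()
```

Storing one row per data point, rather than the whole string in a single cell, is what makes comparing several games possible with ordinary SQL queries.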

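Absolutely brilliant, mate, this is exactly what I was looking for. It works fine once I import regex as re.

I've updated the code per your last request so that it splits into owners/price.

@user3494191 When I run the code I get the following error:
Traceback (most recent call last):
  File "C:/Users/nohgjk/Dropbox/Gaming/Project steamSpy/Python/steamSpy - Test/steamSpySoup5.py", line 71, in <module>
    owners, price = zip(*[iter(some_data)] * (len(some_data) / 2))
TypeError: can't multiply sequence by non-int of type 'float'

@zroq Thanks for your answer, that is indeed the solution.

I've marked @Zrog's answer as accepted, but thank you for taking the time to answer as well. :) This looks like an even better solution, and I'll look into it later. Thanks for your help.

It works really well. In the end I think I'll go with this solution; I had completely forgotten about using JSON for this kind of task. Thanks again.

I have a question. When I run the code after logging in, in some places the third "column" is filled with a very long string (related to Steam sales, I guess). For example, "#2B6A94" in the sample output is replaced by "http://store.steampowered.com/news/20987/", "Oh...Sir!!", or even "Shadow Warrior 2 25% off + \'Way of the Wang\' DLC". There are also 5 "columns" now instead of 3, but I doubt that has anything to do with it. Can you help me with this? Otherwise I really like the JSON approach!

What do you want to do when the third column contains a long string?

It was the \ character. I managed to get it running with a few .replace calls. Thanks for your help.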
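
The failure described in the last comments is consistent with `\'` being a legal escape in JavaScript but not in JSON, so `json.loads` rejects it. A hedged sketch of the `.replace` workaround follows; the helper `extract_series` is hypothetical, and the sample strings are modeled on the ones quoted in the comment. It also slices between the first `[` and the last `]` instead of splitting on `=` and `;`, since either character could appear inside a label or URL.

```python
import json

def extract_series(script_text):
    """Parse the array assigned in a 'var name = [...];' script body."""
    start = script_text.index("[")
    end = script_text.rindex("]") + 1
    payload = script_text[start:end]
    # \' is a valid JavaScript escape but not a valid JSON one.
    payload = payload.replace("\\'", "'")
    return json.loads(payload)

# Raw string so the backslashes reach the parser exactly as the page sends them.
sample = r'''var data2sales= [{"key": "Owners", "values": [
[1489363200000, 549045, "Shadow Warrior 2 25% off + \'Way of the Wang\' DLC"],
[1489449600000, 550812, "http://store.steampowered.com/news/20987/"]]}];'''

series = extract_series(sample)
labels = [row[2] for row in series[0]["values"]]
```

This only covers the `\'` case mentioned in the thread; other JavaScript-only syntax in the script would still need its own handling.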