Tags: python, json, beautifulsoup

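It depends on what your original content is. If it is a requests response object (as in my example), you need to get its text content to parse. In your case the object appears to be a str, so passing the str itself is enough. If you are having trouble parsing the json, I suggest looking for more information on it.

I have certainly learned how fussy the json module is. Some may find this ugly, but to get the json module to parse the string I had to do this:
pre_json = '{"' + "".join(my_list)
and then
my_json = json.loads(pre_json)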
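
A likely reason the leading {" has to be glued back on: str.lstrip and str.rstrip treat their argument as a set of characters to remove, not as an exact prefix or suffix, so they can keep eating into the JSON itself. The sketch below shows the effect on a made-up one-line sample and one possible alternative; strip_prefix is a hypothetical helper, not something from the original answer.

```python
import json

text = 'q("talkPage.init", {"event": "TEDGlobal 2005"})'

# lstrip removes every leading character that occurs anywhere in the argument,
# so it also removes '{', '"' and the 'e' of "event".
stripped = text.lstrip('q("talkPage.init", {')
print(stripped)  # vent": "TEDGlobal 2005"})


def strip_prefix(s, prefix):
    """Remove prefix only when s really starts with it (hypothetical helper)."""
    return s[len(prefix):] if s.startswith(prefix) else s


# Remove the exact call prefix, then the single closing ')'.
payload = strip_prefix(text, 'q("talkPage.init", ')
if payload.endswith(')'):
    payload = payload[:-1]

data = json.loads(payload)
print(data['event'])  # TEDGlobal 2005
```

With an exact prefix removal the opening brace survives, so no manual repair of the string should be needed before json.loads.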
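@JohnLaudun That sounds more like the string passed to the json module is not well-formed. You need to make sure the string you parse out of the html can be read as a dictionary. If the { is missing, it was probably stripped by mistake while parsing the html.

I like the idea of doing this with regex, which is always beyond my understanding, but the line
script_tag = re.findall(r'<script>q\((?s:.+)\)', text)
throws
error: unknown extension
It looks like it does not like the closing parenthesis?
739 if char == ")":
740 break
--> 741 raise error("unknown extension")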
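
The closing parenthesis is probably not the culprit. Scoped inline flags such as (?s:...) were only added in Python 3.6, and older versions reject them with "unknown extension". A minimal sketch of a version-independent alternative is to pass re.DOTALL for the whole pattern instead. The sample text and the assumption that the call is immediately followed by </script> are mine, not from the thread.

```python
import json
import re

# Made-up sample in the shape shown in the question.
text = '<script>q("talkPage.init", {\n"event": "TEDGlobal 2005"\n})</script>'

# re.DOTALL lets '.' match newlines everywhere in the pattern,
# which replaces the scoped (?s:...) group.
pattern = re.compile(r'<script>q\((.+?)\)</script>', re.DOTALL)

match = pattern.search(text)
call_args = match.group(1)          # '"talkPage.init", {...}'

# The JSON object is the second argument of q(...).
json_part = call_args.split(',', 1)[1]
data = json.loads(json_part)
print(data['event'])  # TEDGlobal 2005
```

The non-greedy .+? stops at the first )</script>, so several q(...) blocks on one page are matched one at a time.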
<script>q("talkPage.init", {
"event":"TEDGlobal 2005",
"filmed":1120694400,
"published":1158019860,
import json

text = open('dawkins_script_element.txt', 'r').read()
data = json.loads(text)
<script>q("talkPage.init", {
"el": "[data-talk-page]",
"__INITIAL_DATA__":
<script>foo</script>
<script>bar</script>
<script>q("talkPage.init",{
"foo1":"bar1",
"event":"TEDGlobal 2005",
"filmed":1120694400,
"published":1158019860,
"foo2":"bar2"
})</script>
<script>q("talkPage.init",{
"foo1":"bar1",
"event":"foobar",
"filmed":123,
"published":456,
"foo2":"bar2"
})</script>
<script>foo</script>
<script>bar</script>
import json

import bs4
import requests

res = requests.get(url)  # your link here
soup = bs4.BeautifulSoup(res.content, 'html.parser')
my_list = [i.string.lstrip('q("talkPage.init", ').rstrip(')') for i in soup.select('script') if i.string and i.string.startswith('q')]

# my_list now holds the JSON text of every <script> tag whose content starts with 'q'.
# Note that the lstrip and rstrip calls are based on your sample (assuming there is a closing bracket); if the convention is different you will need to update them accordingly.
# Caveat: lstrip/rstrip remove any of the listed characters, not an exact prefix or suffix.

#...#
my_jsons = []
for json_string in my_list:
    my_jsons.append(json.loads(json_string))

# parse your my_jsons however you want.
print(my_jsons[0]['event'])
print(my_jsons[0]['filmed'])
print(my_jsons[0]['published'])

# Output:
# TEDGlobal 2005
# 1120694400
# 1158019860
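
If stripping the q(...) wrapper by hand turns out to be fragile, one alternative is to let the json module find the end of the object itself with JSONDecoder.raw_decode, which parses one JSON value from the start of a string and ignores whatever follows. This is only a sketch using the standard library; extract_q_payloads is a hypothetical name and the sample is the one from this thread.

```python
import json
import re


def extract_q_payloads(html):
    """Return the first JSON object found inside each <script>q(...) call."""
    decoder = json.JSONDecoder()
    payloads = []
    for m in re.finditer(r'<script>\s*q\(', html):
        # Assumption: the first '{' after 'q(' opens the JSON argument.
        start = html.find('{', m.end())
        if start == -1:
            continue
        # raw_decode parses one value and returns it with the end offset,
        # so the trailing ')</script>' does not matter.
        obj, _end = decoder.raw_decode(html[start:])
        payloads.append(obj)
    return payloads


sample = '''<script>foo</script>
<script>q("talkPage.init",{
"foo1":"bar1",
"event":"TEDGlobal 2005",
"filmed":1120694400,
"published":1158019860,
"foo2":"bar2"
})</script>
<script>bar</script>'''

talks = extract_q_payloads(sample)
print(talks[0]['event'])   # TEDGlobal 2005
print(talks[0]['filmed'])  # 1120694400
```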
import json
import re

# t holds the page HTML as a string.
# Filters the script tag all the way to the end ')' of q.
# Note: the scoped flag (?s:...) needs Python 3.6 or later.
script_tag = re.findall(r'<script>q\((?s:.+)\)', t)
json_content = re.search(r'(?<=q\()(?s:.+)\)', script_tag[0]).group()
json_content = json_content[:-1]  # Strip last ')'
json_content = json.loads(json_content)
json_content['event']  # or whatever
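def get_val(a, text):
    # The original call omitted the string to search; it is passed in here as text (an assumption).
    # Returns the rest of the line that follows the key a and '": '.
    return re.search(r'(?<=' + re.escape(a) + r'\": )(.+)', text).group(0)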
def get_metadata(the_file):

    # Load the modules we need
    from bs4 import BeautifulSoup
    import json
    import re
    from datetime import datetime

    # Read the file, load it into BS, then grab section we want
    text = the_file.read()
    soup = BeautifulSoup(text, "html5lib")
    my_list = [i.string.lstrip('q("talkPage.init", {\n\t"el": "[data-talk-page]",\n\t "__INITIAL_DATA__":')
               .rstrip('})')
               for i in soup.select('script') 
               if i.string and i.string.startswith('q')]

    # Read first layer of JSON and get out those elements we want
    pre_json = '{"' + "".join(my_list)
    my_json = json.loads(pre_json)
    slug = my_json['slug']
    vcount = my_json['viewed_count']
    event = my_json['event']

    # Read second layer of JSON and get out listed elements:
    properties = "filmed,published" # No spaces between terms!
    talks_listed = str(my_json['talks']).split(",")
    regex_list = [".*("+i+").*" for i in properties.split(",")]
    matches = []
    for e in regex_list:
        filtered = filter(re.compile(e).match, talks_listed)
        indexed = "".join(filtered).split(":")[1]
        matches.append(indexed)
    filmed = datetime.utcfromtimestamp(float(matches[0])).strftime('%Y-%m-%d')
    # published = datetime.utcfromtimestamp(float(matches[1])).strftime('%Y-%m-%d')
    return slug, vcount, event, filmed, #published
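
Once the JSON is parsed, the second layer can usually be read by key instead of turning it back into a string and matching it with regexes. The sketch below assumes, purely for illustration, that my_json['talks'] is a list of dicts carrying filmed and published Unix timestamps; the real layout of the TED data may differ, and to_date is a hypothetical helper.

```python
from datetime import datetime, timezone


def to_date(ts):
    """Format a Unix timestamp (seconds) as a YYYY-MM-DD date in UTC."""
    return datetime.fromtimestamp(float(ts), tz=timezone.utc).strftime('%Y-%m-%d')


# Hypothetical structure standing in for the parsed page data.
my_json = {"talks": [{"filmed": 1120694400, "published": 1158019860}]}

talk = my_json["talks"][0]
filmed = to_date(talk["filmed"])
published = to_date(talk["published"])
print(filmed)     # 2005-07-07
print(published)  # 2006-09-12
```

Reading the values by key avoids depending on the order in which the properties happen to appear in the string.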