Python: How to handle an unexpected JSON response in a while loop (Python, Python 3.x, Python Requests)


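I'm building a Python script that collects data from Instagram based on a list of users stored in my database. However, I'm running into problems handling an unexpected JSON response.

To give some context: the program fetches a username from my database table (24/7, looping over hundreds of accounts, hence the while True: loop), requests a URL with that username, and expects a certain JSON response (specifically, it looks for ['entry_data']['ProfilePage'][0] in the response). However, when the username can't be found on Instagram, the JSON is different and the expected part (['entry_data']['ProfilePage'][0]) isn't in it, so my script crashes.

With the current code: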

def get_username_from_db():
    try:
        with connection.cursor() as cursor:
            cursor.execute("SELECT * FROM ig_users_raw WHERE `username` IS NOT NULL ORDER BY `ig_users_raw`.`last_checked` ASC LIMIT 1")
            row = cursor.fetchall()
            username = row[0]['username']
    except pymysql.IntegrityError:
        print('ERROR: ID already exists in PRIMARY KEY column')
    return username

def request_url(url):
    try:
        response = requests.get(url)
    except requests.HTTPError:
        raise requests.HTTPError(f'Received non 200 status code from {url}')
    except requests.RequestException:
        raise requests.RequestException
    else:
        return response.text

def extract_json_data(url):
    try:
        r = requests.get(url, headers=headers)
    except requests.HTTPError:
        raise requests.HTTPError('Received non-200 status code.')
    except requests.RequestException:
        raise requests.RequestException
    else:
        print(url)
        soup = BeautifulSoup(r.content, "html.parser")
        scripts = soup.find_all('script', type="text/javascript", text=re.compile('window._sharedData'))
        stringified_json = scripts[0].get_text().replace('window._sharedData = ', '')[:-1]
        j = json.loads(stringified_json)['entry_data']['ProfilePage'][0]
        return j

if __name__ == '__main__':
    while True:
        sleep(randint(5,15))
        username = get_username_from_db()
        url = f'https://www.instagram.com/{username}/'
        j = extract_json_data(url)
        json_string = json.dumps(j)
        user_id = j['graphql']['user']['id']
        username = j['graphql']['user']['username']
        #print(user_id)
        try:
            with connection.cursor() as cursor:
                db_data = (json_string, datetime.datetime.now(),user_id)
                sql = "UPDATE `ig_users_raw` SET json=%s, last_checked=%s WHERE `user_id`= %s "
                cursor.execute(sql, db_data)
                connection.commit()
                print(f'{datetime.datetime.now()} - data inserted for user: {user_id} - {username}')
        except pymysql.Error:
            print('ERROR: ', pymysql.Error)

I get the following error/traceback:

https://www.instagram.com/geloria.itunes/
Traceback (most recent call last):
  File "D:\Python\Ministry\ig_raw.py", line 63, in <module>
    j = extract_json_data(url)
  File "D:\Python\Ministry\ig_raw.py", line 55, in extract_json_data
    j = json.loads(stringified_json)['entry_data']['ProfilePage'][0]
  File "C:\Users\thoma\AppData\Local\Programs\Python\Python36-32\lib\json\__init__.py", line 354, in loads
    return _default_decoder.decode(s)
  File "C:\Users\thoma\AppData\Local\Programs\Python\Python36-32\lib\json\decoder.py", line 339, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "C:\Users\thoma\AppData\Local\Programs\Python\Python36-32\lib\json\decoder.py", line 357, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 2 column 1 (char 1)


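Ideally, I'd like it to skip that account (in this case geloria.itunes) and move on to the next account in the database. I might want to remove the account, or at least remove the username from the row.

To solve this myself I experimented with if/else, but in the case where it continues I would just keep looping over the same account.

Do you have any suggestions on how I can tackle this specific issue?

Thanks!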

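First of all, you need to find out why the exception occurred.

You get this error because you are asking json to parse an invalid (non-JSON) string.

Just run this example with the URL you provided in the traceback: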

import re
import requests
from bs4 import BeautifulSoup

r = requests.get("https://www.instagram.com/geloria.itunes/")
print(r.status_code)  # outputs 404(!)

soup = BeautifulSoup(r.content, "html.parser")
scripts = soup.find_all('script', type="text/javascript", text=re.compile('window._sharedData'))
stringified_json = scripts[0].get_text().replace('window._sharedData = ', '')[:-1]
print(stringified_json)
# j = json.loads(stringified_json)  # will raise an exception

Output:

\n(function(){\n function normalizeError(err) {\n...
...
stringify(normalizedError));\n}\n}\n}\n}\n}()
As you can see, stringified_json is not a valid JSON string.
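
To make the failure concrete, here is a minimal sketch that feeds a JavaScript-like string to json.loads() and reports the resulting error. The helper name decode_error_message and the shortened js_text are made up for illustration; they are not part of the original script or of the real Instagram page.

```python
import json


def decode_error_message(text):
    """Return the JSONDecodeError message for text, or None if it parses."""
    try:
        json.loads(text)
    except json.JSONDecodeError as exc:
        return str(exc)
    return None


# Shortened, hypothetical stand-in for the JavaScript shown in the output above.
js_text = "\n(function(){\n function normalizeError(err) {}\n}())"

print(decode_error_message(js_text))
print(decode_error_message('{"entry_data": {}}'))
```

For the JavaScript-like input the message is "Expecting value: line 2 column 1 (char 1)", the same one that appears at the bottom of the traceback in the question: the parser skips the leading newline and then hits "(" where a JSON value should start.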

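As you mentioned, it is invalid because this Instagram page is hidden or does not exist (the HTTP status code is 404 Not Found). You pass the wrong response to json.loads() because your script never checks the response status code.

The following except clauses do not catch the "404 case" because you received a valid HTTP response, so no exception was raised:

except requests.HTTPError:
    raise requests.HTTPError('Received non-200 status code.')
except requests.RequestException:
    raise requests.RequestException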
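
The point can be checked without touching the network. The sketch below builds a requests.Response by hand (purely for illustration; real code gets the response from requests.get()) and shows that requests.HTTPError only appears once raise_for_status() is called. The helper name is_http_error is an assumption of this example.

```python
import requests


def is_http_error(status_code):
    """Return True if raise_for_status() raises for this status code."""
    response = requests.Response()  # built by hand, no request is sent
    response.status_code = status_code
    try:
        response.raise_for_status()
    except requests.HTTPError:
        return True
    return False


print(is_http_error(200))
print(is_http_error(404))
```

Simply creating or receiving a response with status 404 raises nothing; the exception has to be requested explicitly, which is why the except clauses in the question never fire.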

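So basically you have two ways to deal with this issue:

check the response HTTP status code manually, like if r.status_code != 200 ...
or use the raise_for_status() method to throw an exception if 400 <= r.status_code < 600

Thanks Ivan! Your suggestion helped overcome this "issue".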
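
As a rough sketch of the first option, the parsing step can be wrapped in a function that returns None whenever the account should be skipped. The function name parse_profile and the SHARED_DATA regular expression are assumptions made for this example (the question uses BeautifulSoup plus a string replace instead), and the real page markup may differ.

```python
import json
import re

# Hypothetical pattern for the inline script; the real markup may differ.
SHARED_DATA = re.compile(r'window\._sharedData\s*=\s*(\{.*?\});', re.DOTALL)


def parse_profile(status_code, html):
    """Return the ProfilePage dict, or None if the account should be skipped."""
    if status_code != 200:
        # Hidden or missing account (for example 404): nothing to parse.
        return None
    match = SHARED_DATA.search(html)
    if match is None:
        return None
    try:
        return json.loads(match.group(1))['entry_data']['ProfilePage'][0]
    except (json.JSONDecodeError, KeyError, IndexError):
        # Not JSON after all, or JSON without the expected profile section.
        return None


page = 'window._sharedData = {"entry_data": {"ProfilePage": [{"graphql": {"user": {"id": "1"}}}]}};'
print(parse_profile(200, page))
print(parse_profile(404, page))
```

In the question's loop, this would be called as parse_profile(r.status_code, r.text), followed by continue when the result is None. Since the SELECT orders by last_checked, the skipped row presumably also needs its last_checked updated (or its username cleared) by username before continuing; otherwise the same account would be picked again on the next iteration.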