Python: How can I better automate a procedure that needs multiple programs and URLs?

python awk sed

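I am trying to collect some specific data from a website called 82games.com. I currently have a solution that uses BeautifulSoup, awk and sed, but it is not ideal. For starters, I would like to be able to iterate over multiple HTML pages and run my program on all of them, rather than repeating the process by hand, e.g. typing in a new URL and a new destination .txt file for each page.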

Python, BS4, awk, sed:

import requests
from bs4 import BeautifulSoup

def function():
    # Download the page; the '#bypos' fragment is not sent to the server.
    page = requests.get('http://www.82games.com/1819/18ATL16.HTM#bypos')

    soup = BeautifulSoup(page.text, 'html.parser')
    cleantext = soup.text  # visible text with the HTML tags stripped
    text = str(soup)       # full markup as a string
    print(type(text))
    print(cleantext)       # this printed text is what the shell pipeline captures
    ans = remove(text)
    return ans

def remove(string):
    # Strip all whitespace from the string.
    return "".join(string.split())

if __name__ == '__main__':
    function()

Driver code (in bash):

python nba_stats.py | awk NF > JohnCollinsAH2.txt

sed -i '1,/Production By position/d' JohnCollinsAH2.txt
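The two shell steps above can also be done in Python, which removes the need for a separate awk and sed pass per file. This is only a sketch of one way to do it: `awk NF` keeps lines that contain at least one field, and `sed '1,/Production By position/d'` deletes everything from line 1 through the first later line that matches the marker. The function names `drop_blank_lines`, `delete_through_match` and `clean` are hypothetical, not part of the original script.

```python
import re


def drop_blank_lines(lines):
    """Rough equivalent of `awk NF`: keep lines with at least one field."""
    return [line for line in lines if line.split()]


def delete_through_match(lines, pattern):
    """Rough equivalent of `sed '1,/pattern/d'`.

    Deletes line 1 through the first matching line. As in sed, the search
    for the end of the range starts at line 2, and if nothing matches,
    every line is deleted.
    """
    regex = re.compile(pattern)
    for index, line in enumerate(lines):
        if index > 0 and regex.search(line):
            return lines[index + 1:]
    return []


def clean(text, marker="Production By position"):
    """Apply both steps to the printed page text and return what is left."""
    lines = drop_blank_lines(text.splitlines())
    # re.escape treats the marker as literal text, not as a regex.
    return "\n".join(delete_through_match(lines, re.escape(marker)))
```

Calling `clean(cleantext)` inside the script, instead of printing and piping, would leave only the lines after the "Production By position" marker, assuming that marker appears in the page text as it does in the sed command.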

Take a look. It is a web spider engine that lets you follow links and output structured data. That way you can keep your parsing logic
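If a full spider engine is more than you need, a plain loop over a list of URLs already removes the manual retyping. The sketch below is not the spider-engine approach described above. It swaps in `requests` and BeautifulSoup, which the question already uses, and derives each output file name from the URL. The names `output_name`, `fetch_text` and `scrape_all` are hypothetical, and the `fetch` parameter exists only so the loop can be tried without network access.

```python
from pathlib import Path, PurePosixPath
from urllib.parse import urlparse


def output_name(url):
    """Build a .txt file name from the last path segment of the URL."""
    # urlparse keeps the '#bypos' fragment out of the path component.
    stem = PurePosixPath(urlparse(url).path).stem
    return f"{stem}.txt"


def fetch_text(url):
    """Download one page and return its visible text."""
    # Imported here so the other helpers work without these packages.
    import requests
    from bs4 import BeautifulSoup

    response = requests.get(url, timeout=30)
    response.raise_for_status()
    return BeautifulSoup(response.text, "html.parser").get_text()


def scrape_all(urls, out_dir=".", fetch=fetch_text):
    """Fetch every URL and write each page's text to its own file."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for url in urls:
        target = out_dir / output_name(url)
        target.write_text(fetch(url), encoding="utf-8")
        written.append(target)
    return written


# Example call (performs real HTTP requests):
# scrape_all(["http://www.82games.com/1819/18ATL16.HTM#bypos"], out_dir="output")
```

With the URL from the question, this would write to `output/18ATL16.txt`. Adding more pages then means appending URLs to the list rather than editing the script and the shell commands each time.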