Replace incorrect URLs in a text file and fix them in Python

I'm getting URLs whose forward slashes have been removed, and I basically need to correct the URLs in a text file.

The URLs in the file look like this:

https:www.ebay.co.ukitmReds-Challenge-184-214-Holo-shinny-Rare-Pokemon-Card-SM-unbreken-Bonds-rare14315281970?hash=item1cf1c4aa32%3Ag%3axbaaoswjgrfsgi1&LH_BIN=1

I need to correct it to:

https://www.ebay.co.uk/itm/Reds-Challenge-184-214-Holo-Shiny-Rare-Pokemon-Card-SM-Unbroken-Bonds-Rare/124315281970?hash=item1cf1c4aa32%3Ag%3AXBAAAOSwJGRfSGI1&LH_BIN=1

So basically I need a regex, or some other method, that puts these forward slashes back into every URL in the file and replaces the broken URLs in the file.
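A minimal regex sketch for a single URL, assuming every broken link has the shape of the example: scheme, host www.ebay.co.uk, the "itm" segment, a title slug and a 12-digit item number run together with all slashes removed, optionally followed by a "?" query string. The 12-digit item number is taken from the corrected URL; the function name repair_item_url is hypothetical.

```python
import re

# Assumed shape: "https:www.ebay.co.ukitm<slug><12-digit item number>?<query>",
# with any of the slashes possibly present already.
BROKEN_ITEM_URL = re.compile(
    r"^(?P<scheme>https?):/*"          # "https:" followed by zero or more slashes
    r"(?P<host>www\.ebay\.co\.uk)/?"   # host, slash optional
    r"itm/?"                           # the "itm" path segment
    r"(?P<slug>.+?)/?"                 # title slug, lazy so it stops before the ID
    r"(?P<item>\d{12})"                # 12-digit item number (assumed length)
    r"(?P<query>\?.*)?$"               # optional query string
)


def repair_item_url(url: str) -> str:
    """Re-insert the missing slashes; return the URL unchanged if it doesn't match."""
    m = BROKEN_ITEM_URL.match(url)
    if m is None:
        return url
    return (
        f"{m['scheme']}://{m['host']}/itm/{m['slug']}/{m['item']}"
        f"{m['query'] or ''}"
    )
```

Because the slug is matched lazily and the item number must be followed by "?" or the end of the string, the 12 digits captured are always the last 12 before the query, so digits inside the title (such as "184-214") are left alone. Already-correct URLs come back unchanged.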

import re
import time

# A 12-digit eBay item number; a "/" is inserted in front of it
ITEM_NUMBER = re.compile(r"\d{12}")

while True:
    # Pass 1: restore the scheme and "/itm/" slashes, writing to out.txt.
    # Leaving the "with" block closes (and flushes) out.txt before it is read back.
    with open("ebay2.csv", "rt") as fin, open("out.txt", "wt") as fout:
        for line in fin:
            fout.write(
                line.replace('https://www.ebay.co.uk/sch/', 'https://')
                    .replace('itm', '/itm/')
                    .replace('https:www.ebay', 'https://www.ebay')
            )

    # Pass 2: insert the slash before the item number and print each result
    with open('out.txt', 'rt') as f:
        for l in f:
            result = ITEM_NUMBER.sub(r"/\g<0>", l)
            if result.strip():  # skip blank lines
                print(result, end="")  # the line already ends with "\n"

    # Re-run the whole conversion once per second
    time.sleep(1)

The code above is what I finally came up with. It's a bit clunky, but it's fast enough.
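Since the corrected URL can be built directly from the captured parts, the round trip through out.txt is not strictly necessary. A one-pass sketch, assuming broken links start with "https:www.ebay.co.ukitm", end in a 12-digit item number plus an optional "?" query, and are delimited by whitespace, commas or quotes (as in a CSV file). The names fix_text and fix_file are hypothetical.

```python
import re

# Assumed shape of a broken link inside a line of text: "https:www.ebay.co.ukitm",
# then the title slug, the 12-digit item number and an optional "?query".
# Whitespace, commas and quotes are treated as delimiters (e.g. CSV columns).
BROKEN_LINK = re.compile(
    r"https:www\.ebay\.co\.ukitm"
    r"(?P<slug>[^\s,\"']+?)"       # lazy, so it stops right before the item number
    r"(?P<item>\d{12})"
    r"(?P<query>\?[^\s,\"']*)?"
    r"(?=[\s,\"']|$)"              # the link must end at a delimiter or end of text
)


def fix_text(text: str) -> str:
    """Rewrite every broken link in text; links that already have slashes are left alone."""
    return BROKEN_LINK.sub(
        lambda m: (
            f"https://www.ebay.co.uk/itm/{m['slug']}/{m['item']}"
            f"{m['query'] or ''}"
        ),
        text,
    )


def fix_file(src: str, dst: str) -> None:
    """Read src once, fix all links and write the result to dst."""
    with open(src, encoding="utf-8") as fin:
        text = fin.read()
    with open(dst, "w", encoding="utf-8") as fout:
        fout.write(fix_text(text))
```

For example, fix_file("ebay2.csv", "out.txt") would replace the two-pass loop body. Because the pattern requires "https:" to be followed directly by "www", a URL that already reads "https://" never matches, so running it twice is harmless.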

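Do the URLs follow any particular pattern? For example, if we have "http:example.combacd", how do we know whether it should be "" or ""? If your URLs do follow a pattern, it would be best to include a few more examples so people can see it.

The pattern is just a standard eBay item URL. They're all the same.

Have you tried anything? Did you run into a problem? It would help to add an example of what you've tried, so we can try it and help you fix it.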