Reading an HTML page with LibreOffice Basic
I'm new to LibreOffice Basic. I'm trying to write a macro in LibreOffice Calc that reads the name of a noble house of Westeros from a cell (for example, Stark) and outputs that house's Words by looking the house up on A Wiki of Ice and Fire. It should work like this. Here is the pseudocode:
Read HouseName from column A
Open HtmlFile at "http://www.awoiaf.westeros.org/index.php/House_" & HouseName
Iterate through HtmlFile to find line which begins "<table class="infobox infobox-body"" // Finds the info box for the page.
Read Each Row in the table until Row begins Words
Read the contents of the next <td> tag, and return this as a string.
There are two main problems with this.
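1. Performance
Your UDF has to fetch the HTTP resource separately in every cell where it is used.
2. HTML
Unfortunately, neither OpenOffice nor LibreOffice includes an HTML parser, only an XML parser. That is why the UDF cannot parse the HTML directly and has to search the page text instead.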
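To soften the performance problem, one option is a small cache so that each house page is downloaded only once, however many cells ask for it. This is a hypothetical sketch: the names FETCHHOUSE_CACHED, gCachedHouses, gCachedWords and gCachedCount are made up for illustration. It wraps the FETCHHOUSE function from this answer and assumes that Global variables keep their values between calls, as the LibreOffice Basic help describes for Global.

```basic
' Hypothetical cache for FETCHHOUSE results (names are illustrative).
' Global declarations go at the top of a module; per the LibreOffice
' Basic help, Global variables keep their values after a macro finishes.
Global gCachedHouses() As String
Global gCachedWords() As String
Global gCachedCount As Long

Public Function FETCHHOUSE_CACHED(sHouse As String) As String
    Dim i As Long
    Dim sResult As String

    ' Return a previously fetched result without another HTTP request
    For i = 0 To gCachedCount - 1
        If gCachedHouses(i) = sHouse Then
            FETCHHOUSE_CACHED = gCachedWords(i)
            Exit Function
        End If
    Next i

    ' Not cached yet: download and parse the page once
    sResult = FETCHHOUSE(sHouse)

    ' Grow both arrays by one entry and remember the result
    ReDim Preserve gCachedHouses(gCachedCount)
    ReDim Preserve gCachedWords(gCachedCount)
    gCachedHouses(gCachedCount) = sHouse
    gCachedWords(gCachedCount) = sResult
    gCachedCount = gCachedCount + 1

    FETCHHOUSE_CACHED = sResult
End Function
```

In a cell this would be used as `=FETCHHOUSE_CACHED(A1)`. Note that error texts such as "House name does not exist" are cached too, so a failed lookup is not retried until the cache is cleared (for example by restarting LibreOffice).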
The following function works, but it is slow and not very general:
Public Function FETCHHOUSE(sHouse As String) As String
    Dim sURL As String, sContent As String, sTable As String
    Dim sRow As String, sWords As String
    Dim lStartPos As Long, lEndPos As Long
    Dim delimiters() As Long
    ' UNO services and structs are kept in Variants
    Dim oSimpleFileAccess, oInpDataStream, oTextSearch, oOptions, oFound

    ' Open the wiki page for the house, e.g. .../House_Stark
    sURL = "http://awoiaf.westeros.org/index.php/House_" & sHouse
    oSimpleFileAccess = createUNOService("com.sun.star.ucb.SimpleFileAccess")
    oInpDataStream = createUNOService("com.sun.star.io.TextInputStream")
    ' If the page cannot be opened (e.g. no such house), jump to the handler
    On Error GoTo falseHouseName
    oInpDataStream.setInputStream(oSimpleFileAccess.openFileRead(sURL))
    On Error GoTo 0

    ' With no delimiters, readString reads up to the end of the stream
    sContent = oInpDataStream.readString(delimiters(), False)
    oInpDataStream.closeInput()

    ' Cut out the infobox table
    lStartPos = InStr(1, sContent, "<table class=" & Chr(34) & "infobox infobox-body")
    If lStartPos = 0 Then
        FETCHHOUSE = "no infobox on page"
        Exit Function
    End If
    lEndPos = InStr(lStartPos, sContent, "</table>")
    sTable = Mid(sContent, lStartPos, lEndPos - lStartPos + 8)

    ' Cut out the table row that contains the "Words" header
    lStartPos = InStr(1, sTable, "Words")
    If lStartPos = 0 Then
        FETCHHOUSE = "no Words on page"
        Exit Function
    End If
    lEndPos = InStr(lStartPos, sTable, "</tr>")
    sRow = Mid(sTable, lStartPos, lEndPos - lStartPos + 5)

    ' Locate the opening <td ...> tag of the value cell with a regular expression
    oTextSearch = CreateUnoService("com.sun.star.util.TextSearch")
    oOptions = CreateUnoStruct("com.sun.star.util.SearchOptions")
    oOptions.algorithmType = com.sun.star.util.SearchAlgorithms.REGEXP
    oOptions.searchString = "<td[^<]*>"
    oTextSearch.setOptions(oOptions)
    oFound = oTextSearch.searchForward(sRow, 0, Len(sRow))
    If oFound.subRegExpressions = 0 Then
        FETCHHOUSE = "Words header but no Words content on page"
        Exit Function
    End If

    ' endOffset is zero-based and exclusive, so +1 gives the 1-based
    ' position of the first character after the <td> tag
    lStartPos = oFound.endOffset(0) + 1
    lEndPos = InStr(lStartPos, sRow, "</td>")
    sWords = Mid(sRow, lStartPos, lEndPos - lStartPos)
    FETCHHOUSE = sWords
    Exit Function

falseHouseName:
    FETCHHOUSE = "House name does not exist"
End Function
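Another way around the per-cell cost is to skip the UDF and run a macro once that writes the Words next to each house name as plain text, so nothing is re-downloaded on recalculation. This is a hypothetical sketch (FillHouseWords is an illustrative name). It assumes the house names start in A1 of the first sheet with no header row, that the list ends at the first empty cell, and that column B may be overwritten.

```basic
' Hypothetical one-shot macro: reads house names from column A of the
' first sheet (starting at A1, no header) and writes the Words to column B.
Sub FillHouseWords
    Dim oSheet As Object
    Dim sHouse As String
    Dim nRow As Long

    oSheet = ThisComponent.Sheets.getByIndex(0)
    nRow = 0
    Do
        ' getCellByPosition takes (column, row), both zero-based
        sHouse = oSheet.getCellByPosition(0, nRow).String
        If sHouse = "" Then Exit Do   ' stop at the first empty cell in column A
        oSheet.getCellByPosition(1, nRow).String = FETCHHOUSE(sHouse)
        nRow = nRow + 1
    Loop
End Sub
```

It can be started from Tools > Macros > Run Macro... with the spreadsheet as the active document.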