我正在处理Excel列表,希望转到:
栓皮栎[医]氧化腺苷(氧化腺苷)豪厄尔
转入:
<i>Quercus agrifolia</i> var. <i>oxyadenia</i> (Torr.) J.T. Howell我已经应用了格式设置的富文本格式化列表,但我希望将其发送到Access,其中显式地包含在相关文本周围的格式标签。
发布于 2015-11-19 06:21:48
我也想做同样的事情,在MSDN上找到了一个答案:将格式化excel单元格的内容转换为HTML格式。
我希望这也能帮助你,它使用了excel宏。
编辑:当我需要修改嵌套标记的代码时,请查找我对下面宏的更新:
Function fnConvert2HTML(myCell As Range) As String
Dim bldTagOn, itlTagOn, ulnTagOn, colTagOn As Boolean
Dim i, chrCount As Integer
Dim chrCol, chrLastCol, htmlTxt, htmlEnd As String
bldTagOn = False
itlTagOn = False
ulnTagOn = False
colTagOn = False
chrCol = "NONE"
'htmlTxt = "<html>"
htmlTxt = ""
chrCount = myCell.Characters.Count
For i = 1 To chrCount
htmlEnd = ""
With myCell.Characters(i, 1)
If (.Font.Color) Then
chrCol = fnGetCol(.Font.Color)
If Not colTagOn Then
htmlTxt = htmlTxt & "<font color=#" & chrCol & ">"
colTagOn = True
Else
If chrCol <> chrLastCol Then htmlTxt = htmlTxt & "</font><font color=#" & chrCol & ">"
End If
Else
chrCol = "NONE"
If colTagOn Then
htmlEnd = "</font>" & htmlEnd
'htmlTxt = htmlTxt & "</font>"
colTagOn = False
End If
End If
chrLastCol = chrCol
If .Font.Bold = True Then
If Not bldTagOn Then
htmlTxt = htmlTxt & "<b>"
bldTagOn = True
End If
Else
If bldTagOn Then
'htmlTxt = htmlTxt & "</b>"
htmlEnd = "</b>" & htmlEnd
bldTagOn = False
End If
End If
If .Font.Italic = True Then
If Not itlTagOn Then
htmlTxt = htmlTxt & "<i>"
itlTagOn = True
End If
Else
If itlTagOn Then
'htmlTxt = htmlTxt & "</i>"
htmlEnd = "</i>" & htmlEnd
itlTagOn = False
End If
End If
If .Font.Underline > 0 Then
If Not ulnTagOn Then
htmlTxt = htmlTxt & "<u>"
ulnTagOn = True
End If
Else
If ulnTagOn Then
'htmlTxt = htmlTxt & "</u>"
htmlEnd = "</u>" & htmlEnd
ulnTagOn = False
End If
End If
If (Asc(.Text) = 10) Then
htmlTxt = htmlTxt & htmlEnd & "<br>"
Else
htmlTxt = htmlTxt & htmlEnd & .Text
End If
End With
Next
If colTagOn Then
htmlTxt = htmlTxt & "</font>"
colTagOn = False
End If
If bldTagOn Then
htmlTxt = htmlTxt & "</b>"
bldTagOn = False
End If
If itlTagOn Then
htmlTxt = htmlTxt & "</i>"
itlTagOn = False
End If
If ulnTagOn Then
htmlTxt = htmlTxt & "</u>"
ulnTagOn = False
End If
'htmlTxt = htmlTxt & "</html>"
fnConvert2HTML = htmlTxt
End Function
Function fnGetCol(strCol As String) As String
Dim rVal, gVal, bVal As String
strCol = Right("000000" & Hex(strCol), 6)
bVal = Left(strCol, 2)
gVal = Mid(strCol, 3, 2)
rVal = Right(strCol, 2)
fnGetCol = rVal & gVal & bVal
End Function发布于 2018-08-01 01:36:51
这里有一个替代解决方案,它更快,但产生更混乱的输出(因为它使用Word的HTML引擎)。您需要向VBA项目添加以下引用:
然后,通过运行例如调用下面的代码。立即窗口中的convertToHtml(Range("A1:A100")):
' Given a temporary file path, load the HTML in that file
' and return the first paragraph's inner HTML.
Function extractFirstParagraph(filePath As String) As String
Dim fs As New FileSystemObject, _
html As New MSHTML.HTMLDocument, _
par As MSHTML.HTMLGenericElement
html.body.innerHTML = fs.OpenTextFile(filePath).ReadAll
Set par = html.getElementsByTagName("P")(0)
extractFirstParagraph = par.innerHTML
End Function
Sub convertToHtml(rng As Range)
' Open a single Word instance.
Dim w As New Word.Application, d As Word.Document
Set d = w.Documents.Add
Dim cell As Range
Const tempFile As String = "c:\temp\msword.html"
' For each cell in the range ...
For Each cell In rng
If cell.Value <> "" Then
' ... copy it into the Word document ...
cell.Copy
d.Range.PasteSpecial xlPasteFormats
' ... save the Word document as HTML
' in a temporary file ...
d.SaveAs2 tempFile, wdFormatHTML
' ... and extract the first paragraph.
cell.Value = extractFirstParagraph(tempFile)
Debug.Print "Cell " & cell.Address & " done."
End If
Next cell
' Close Word once you're done. Note that if a bug
' is encountered, this cleanup won't occur and the
' Word process will need to be killed to release
' file locks, otherwise you get an unhelpful error.
w.Quit False
End Sub您可以使用正则表达式来清理输出,方法是添加对Microsoft VBScript正则表达式5.5的引用,并运行如下函数:
' Used to avoid duplication in cleanWordHtml.
Function eraseInPlace(ByRef r As RegExp, _
ByRef s As String, p As String) As String
r.Pattern = p
s = r.Replace(s, "")
End Function
' Eliminate junk tags from HTML generated by Word.
Function cleanWordHtml(inputString As String)
Dim r As New RegExp
r.Global = True
eraseInPlace r, inputString, "mso-[^;""]*(; )?"
eraseInPlace r, inputString, " style="""""
eraseInPlace r, inputString, "<\?xml[^>]*>"
eraseInPlace r, inputString, "<\/?o:[^>]*>"
eraseInPlace r, inputString, "<SPAN><\/SPAN>"
cleanWordHtml = inputString
End Function如果您需要将<span>标记转换为<font>标记(我也需要这样做,因为我正在导入Access富文本字段,该字段不支持<font>),请尝试调用此函数并传入在extractFirstParagraph函数中构造的MSHTML对象:
' Given a <p> DOM node, replace any children of the
' form <span style="color: foo"> with <font color="foo">.
Function convertSpanToFont(ByRef par As MSHTML.HTMLGenericElement, _
ByRef doc As MSHTML.HTMLDocument)
Dim span As MSHTML.HTMLSpanElement, _
font As MSHTML.HTMLFontElement
For Each span In par.getElementsByTagName("span")
Set font = doc.createElement("font")
If IsNull(span.Style.Color) _
Or span.Style.Color <> "" Then
font.Color = span.Style.Color
font.innerHTML = span.innerHTML
span.insertAdjacentElement "afterEnd", font
span.removeNode True
End If
Next span
End Function我还考虑将整个电子表格从Excel中保存为HTML,然后使用另一个工具将其转换为Access可以处理的格式,但Excel的HTML导出会生成CSS类,而不是内联样式。如果只需要将电子表格的一部分转换为HTML,则此方法也很有用。
发布于 2021-03-03 21:10:05
实际上,使用Excel的内部XML表示并使用XSL样式表将其转换为XML的解决方案要干净得多,速度更快。
您可以在https://github.com/HeimMatthias/Excel-HTML-Tools-Public上找到VBA代码和所需的XSLT。
免责声明:多年来,我在使用了不可饶恕的缓慢VBA脚本之后,自己编写了这个工具。在这个小提琴中可以看到示例输出。
技术细节:
-Property可以接受一个参数xlRangeValueXMLSpreadsheet,用它返回一个完全格式化的xml-字符串,其中包含其内容的html-对象。在包含以下内容的单元格上运行ActiveSheet.Range("A1").Value(xlRangeValueXMLSpreadsheet)
栓皮栎[医]氧化腺苷(氧化腺苷)豪厄尔
返回以下字符串:
<?xml version="1.0"?>
<?mso-application progid="Excel.Sheet"?>
<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
xmlns:o="urn:schemas-microsoft-com:office:office"
xmlns:x="urn:schemas-microsoft-com:office:excel"
xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet"
xmlns:html="http://www.w3.org/TR/REC-html40">
<Styles>
<Style ss:ID="Default" ss:Name="Normal">
<Alignment ss:Vertical="Bottom"/>
<Borders/>
<Font ss:FontName="Calibri" ss:Size="11" ss:Color="#000000"/>
<Interior/>
<NumberFormat/>
<Protection/>
</Style>
<Style ss:ID="s62">
<Font ss:FontName="Calibri" ss:Size="11" ss:Color="#000000"
ss:Italic="1"/>
</Style>
</Styles>
<Worksheet ss:Name="Tabelle1">
<Table ss:ExpandedColumnCount="1" ss:ExpandedRowCount="1"
ss:DefaultColumnWidth="61.714285714285708"
ss:DefaultRowHeight="14.571428571428571">
<Row>
<Cell ss:StyleID="s62"><ss:Data ss:Type="String"
xmlns="http://www.w3.org/TR/REC-html40"><I><Font html:Color="#000000">Quercus agrifolia</Font></I><Font
html:Color="#000000"> var. </Font><I><Font html:Color="#000000">oxyadenia</Font></I><Font
html:Color="#000000"> (Torr.) J.T. Howell</Font></ss:Data></Cell>
</Row>
</Table>
</Worksheet>
</Workbook>Cell-Tag包围了包含--或多或少--干净的html-代码的ss:Data-Object。无论如何,净化这些数据以获得干净的html比对每个字母进行解析和访问其样式要简单得多,速度也快得多。(要快得多,我指的是比以前快100倍的轻松程度)。
需要注意的事项:单元格样式(来自模板和单个单元格)并不表示为html。这并不总是特别明显。在上面的例子中,单元格自动收到斜体样式,因为第一个单词已经被斜体化了。由于字符串中也有罗马(直立)字,斜体段落的代码用html表示。但如果整个细胞被斜体化,<i>-tags就会丢失。这尤其令人困惑,因为如果对应的样式以<i>-Tag作为属性,则不能仅仅用ss:Italic="1"来包围整个html,因此还需要检查它是否包括html-部件中的-Tags。当然,这适用于所有单元格样式。
https://stackoverflow.com/questions/33620147
复制相似问题