首页
学习
活动
专区
圈层
工具
发布
社区首页 >问答首页 >将富文本转换为HTML格式标记

将富文本转换为HTML格式标记
EN

Stack Overflow用户
提问于 2015-11-09 23:16:46
回答 3查看 14.8K关注 0票数 6

我正在处理Excel列表,希望转到:

栓皮栎[医]氧化腺苷(氧化腺苷)豪厄尔

转入:

代码语言:javascript
复制
<i>Quercus agrifolia</i> var. <i>oxyadenia</i> (Torr.) J.T. Howell

我已经应用了格式设置的富文本格式化列表,但我希望将其发送到Access,其中显式地包含在相关文本周围的格式标签。

EN

回答 3

Stack Overflow用户

回答已采纳

发布于 2015-11-19 06:21:48

我也想做同样的事情,在MSDN上找到了一个答案:将格式化excel单元格的内容转换为HTML格式

我希望这也能帮助你,它使用了excel宏。

编辑:当我需要修改嵌套标记的代码时,请查找我对下面宏的更新:

代码语言:javascript
复制
Function fnConvert2HTML(myCell As Range) As String
    Dim bldTagOn, itlTagOn, ulnTagOn, colTagOn As Boolean
    Dim i, chrCount As Integer
    Dim chrCol, chrLastCol, htmlTxt, htmlEnd As String

    bldTagOn = False
    itlTagOn = False
    ulnTagOn = False
    colTagOn = False
    chrCol = "NONE"
    'htmlTxt = "<html>"
    htmlTxt = ""
    chrCount = myCell.Characters.Count

    For i = 1 To chrCount
    htmlEnd = ""
        With myCell.Characters(i, 1)
            If (.Font.Color) Then
                chrCol = fnGetCol(.Font.Color)
                If Not colTagOn Then
                    htmlTxt = htmlTxt & "<font color=#" & chrCol & ">"
                    colTagOn = True
                Else
                    If chrCol <> chrLastCol Then htmlTxt = htmlTxt & "</font><font color=#" & chrCol & ">"
                End If
            Else
                chrCol = "NONE"
                If colTagOn Then
                    htmlEnd = "</font>" & htmlEnd
                    'htmlTxt = htmlTxt & "</font>"
                    colTagOn = False
                End If
            End If
            chrLastCol = chrCol

            If .Font.Bold = True Then
                If Not bldTagOn Then
                    htmlTxt = htmlTxt & "<b>"
                    bldTagOn = True
                End If
            Else
                If bldTagOn Then
                    'htmlTxt = htmlTxt & "</b>"
                    htmlEnd = "</b>" & htmlEnd
                    bldTagOn = False
                End If
            End If

            If .Font.Italic = True Then
                If Not itlTagOn Then
                    htmlTxt = htmlTxt & "<i>"
                    itlTagOn = True
                End If
            Else
                If itlTagOn Then
                    'htmlTxt = htmlTxt & "</i>"
                    htmlEnd = "</i>" & htmlEnd
                    itlTagOn = False
                End If
            End If

            If .Font.Underline > 0 Then
                If Not ulnTagOn Then
                    htmlTxt = htmlTxt & "<u>"
                    ulnTagOn = True
                End If
            Else
                If ulnTagOn Then
                    'htmlTxt = htmlTxt & "</u>"
                    htmlEnd = "</u>" & htmlEnd
                    ulnTagOn = False
                End If
            End If

            If (Asc(.Text) = 10) Then
                htmlTxt = htmlTxt & htmlEnd & "<br>"
            Else
                htmlTxt = htmlTxt & htmlEnd & .Text
            End If

        End With
    Next

    If colTagOn Then
        htmlTxt = htmlTxt & "</font>"
        colTagOn = False
    End If
    If bldTagOn Then
        htmlTxt = htmlTxt & "</b>"
        bldTagOn = False
    End If
    If itlTagOn Then
        htmlTxt = htmlTxt & "</i>"
        itlTagOn = False
    End If
    If ulnTagOn Then
        htmlTxt = htmlTxt & "</u>"
        ulnTagOn = False
    End If
    'htmlTxt = htmlTxt & "</html>"
    fnConvert2HTML = htmlTxt
End Function

Function fnGetCol(strCol As String) As String
    Dim rVal, gVal, bVal As String
    strCol = Right("000000" & Hex(strCol), 6)
    bVal = Left(strCol, 2)
    gVal = Mid(strCol, 3, 2)
    rVal = Right(strCol, 2)
    fnGetCol = rVal & gVal & bVal
End Function
票数 11
EN

Stack Overflow用户

发布于 2018-08-01 01:36:51

这里有一个替代解决方案,它更快,但产生更混乱的输出(因为它使用Word的HTML引擎)。您需要向VBA项目添加以下引用:

  • Microsoft HTML对象库
  • Microsoft脚本运行时
  • Microsoft 16.0对象库

然后,通过运行例如调用下面的代码。立即窗口中的convertToHtml(Range("A1:A100"))

代码语言:javascript
复制
' Given a temporary file path, load the HTML in that file
' and return the first paragraph's inner HTML.
Function extractFirstParagraph(filePath As String) As String
    Dim fs As New FileSystemObject, _
        html As New MSHTML.HTMLDocument, _
        par As MSHTML.HTMLGenericElement
    html.body.innerHTML = fs.OpenTextFile(filePath).ReadAll
    Set par = html.getElementsByTagName("P")(0)
    extractFirstParagraph = par.innerHTML
End Function

Sub convertToHtml(rng As Range)
    ' Open a single Word instance.
    Dim w As New Word.Application, d As Word.Document
    Set d = w.Documents.Add

    Dim cell As Range
    Const tempFile As String = "c:\temp\msword.html"
    ' For each cell in the range ...
    For Each cell In rng
        If cell.Value <> "" Then
            ' ... copy it into the Word document ...
            cell.Copy
            d.Range.PasteSpecial xlPasteFormats
            ' ... save the Word document as HTML
            ' in a temporary file ...
            d.SaveAs2 tempFile, wdFormatHTML
            ' ... and extract the first paragraph.
            cell.Value = extractFirstParagraph(tempFile)
            Debug.Print "Cell " & cell.Address & " done."
        End If
    Next cell

    ' Close Word once you're done. Note that if a bug
    ' is encountered, this cleanup won't occur and the 
    ' Word process will need to be killed to release
    ' file locks, otherwise you get an unhelpful error.
    w.Quit False
End Sub

您可以使用正则表达式来清理输出,方法是添加对Microsoft VBScript正则表达式5.5的引用,并运行如下函数:

代码语言:javascript
复制
' Used to avoid duplication in cleanWordHtml.
Function eraseInPlace(ByRef r As RegExp, _
    ByRef s As String, p As String) As String
    r.Pattern = p
    s = r.Replace(s, "")
End Function

' Eliminate junk tags from HTML generated by Word.
Function cleanWordHtml(inputString As String)
    Dim r As New RegExp
    r.Global = True
    eraseInPlace r, inputString, "mso-[^;""]*(; )?"
    eraseInPlace r, inputString, " style="""""
    eraseInPlace r, inputString, "<\?xml[^>]*>"
    eraseInPlace r, inputString, "<\/?o:[^>]*>"
    eraseInPlace r, inputString, "<SPAN><\/SPAN>"
    cleanWordHtml = inputString
End Function

如果您需要将<span>标记转换为<font>标记(我也需要这样做,因为我正在导入Access富文本字段,该字段不支持<font>),请尝试调用此函数并传入在extractFirstParagraph函数中构造的MSHTML对象:

代码语言:javascript
复制
' Given a <p> DOM node, replace any children of the
' form <span style="color: foo"> with <font color="foo">.
Function convertSpanToFont(ByRef par As MSHTML.HTMLGenericElement, _
    ByRef doc As MSHTML.HTMLDocument)
    Dim span As MSHTML.HTMLSpanElement, _
        font As MSHTML.HTMLFontElement
    For Each span In par.getElementsByTagName("span")
        Set font = doc.createElement("font")
        If IsNull(span.Style.Color) _
            Or span.Style.Color <> "" Then
            font.Color = span.Style.Color
            font.innerHTML = span.innerHTML
            span.insertAdjacentElement "afterEnd", font
            span.removeNode True
        End If
    Next span
End Function

我还考虑将整个电子表格从Excel中保存为HTML,然后使用另一个工具将其转换为Access可以处理的格式,但Excel的HTML导出会生成CSS类,而不是内联样式。如果只需要将电子表格的一部分转换为HTML,则此方法也很有用。

票数 3
EN

Stack Overflow用户

发布于 2021-03-03 21:10:05

实际上,使用Excel的内部XML表示并使用XSL样式表将其转换为XML的解决方案要干净得多,速度更快。

您可以在https://github.com/HeimMatthias/Excel-HTML-Tools-Public上找到VBA代码和所需的XSLT。

免责声明:多年来,我在使用了不可饶恕的缓慢VBA脚本之后,自己编写了这个工具。在这个小提琴中可以看到示例输出。

技术细节:

-Property可以接受一个参数xlRangeValueXMLSpreadsheet,用它返回一个完全格式化的xml-字符串,其中包含其内容的html-对象。在包含以下内容的单元格上运行ActiveSheet.Range("A1").Value(xlRangeValueXMLSpreadsheet)

栓皮栎[医]氧化腺苷(氧化腺苷)豪厄尔

返回以下字符串:

代码语言:javascript
复制
<?xml version="1.0"?>
<?mso-application progid="Excel.Sheet"?>
<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
 xmlns:o="urn:schemas-microsoft-com:office:office"
 xmlns:x="urn:schemas-microsoft-com:office:excel"
 xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet"
 xmlns:html="http://www.w3.org/TR/REC-html40">
 <Styles>
  <Style ss:ID="Default" ss:Name="Normal">
   <Alignment ss:Vertical="Bottom"/>
   <Borders/>
   <Font ss:FontName="Calibri" ss:Size="11" ss:Color="#000000"/>
   <Interior/>
   <NumberFormat/>
   <Protection/>
  </Style>
  <Style ss:ID="s62">
   <Font ss:FontName="Calibri" ss:Size="11" ss:Color="#000000"
    ss:Italic="1"/>
  </Style>
 </Styles>
 <Worksheet ss:Name="Tabelle1">
  <Table ss:ExpandedColumnCount="1" ss:ExpandedRowCount="1"
   ss:DefaultColumnWidth="61.714285714285708"
   ss:DefaultRowHeight="14.571428571428571">
   <Row>
    <Cell ss:StyleID="s62"><ss:Data ss:Type="String"
      xmlns="http://www.w3.org/TR/REC-html40"><I><Font html:Color="#000000">Quercus agrifolia</Font></I><Font
       html:Color="#000000"> var. </Font><I><Font html:Color="#000000">oxyadenia</Font></I><Font
       html:Color="#000000"> (Torr.) J.T. Howell</Font></ss:Data></Cell>
   </Row>
  </Table>
 </Worksheet>
</Workbook>

Cell-Tag包围了包含--或多或少--干净的html-代码的ss:Data-Object。无论如何,净化这些数据以获得干净的html比对每个字母进行解析和访问其样式要简单得多,速度也快得多。(要快得多,我指的是比以前快100倍的轻松程度)。

需要注意的事项:单元格样式(来自模板和单个单元格)并不表示为html。这并不总是特别明显。在上面的例子中,单元格自动收到斜体样式,因为第一个单词已经被斜体化了。由于字符串中也有罗马(直立)字,斜体段落的代码用html表示。但如果整个细胞被斜体化,<i>-tags就会丢失。这尤其令人困惑,因为如果对应的样式以<i>-Tag作为属性,则不能仅仅用ss:Italic="1"来包围整个html,因此还需要检查它是否包括html-部件中的-Tags。当然,这适用于所有单元格样式。

票数 2
EN
页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持
原文链接:

https://stackoverflow.com/questions/33620147

复制
相关文章

相似问题

领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档