文章/答案/技术大牛

发布

社区首页 >问答首页 >将富文本转换为HTML格式标记

问将富文本转换为HTML格式标记
EN

Stack Overflow用户

提问于 2015-11-09 23:16:46

回答 3查看 14.8K关注 0票数 6

我正在处理Excel列表，希望转到：

栓皮栎[医]氧化腺苷(氧化腺苷)豪厄尔

转入：

<i>Quercus agrifolia</i> var. <i>oxyadenia</i> (Torr.) J.T. Howell

我已经应用了格式设置的富文本格式化列表，但我希望将其发送到Access，其中显式地包含在相关文本周围的格式标签。

excel

vba

回答 3

Stack Overflow用户

回答已采纳

发布于 2015-11-19 06:21:48

我也想做同样的事情，在MSDN上找到了一个答案：将格式化excel单元格的内容转换为HTML格式。

我希望这也能帮助你，它使用了excel宏。

编辑:当我需要修改嵌套标记的代码时，请查找我对下面宏的更新：

Function fnConvert2HTML(myCell As Range) As String
    Dim bldTagOn, itlTagOn, ulnTagOn, colTagOn As Boolean
    Dim i, chrCount As Integer
    Dim chrCol, chrLastCol, htmlTxt, htmlEnd As String

    bldTagOn = False
    itlTagOn = False
    ulnTagOn = False
    colTagOn = False
    chrCol = "NONE"
    'htmlTxt = "<html>"
    htmlTxt = ""
    chrCount = myCell.Characters.Count

    For i = 1 To chrCount
    htmlEnd = ""
        With myCell.Characters(i, 1)
            If (.Font.Color) Then
                chrCol = fnGetCol(.Font.Color)
                If Not colTagOn Then
                    htmlTxt = htmlTxt & "<font color=#" & chrCol & ">"
                    colTagOn = True
                Else
                    If chrCol <> chrLastCol Then htmlTxt = htmlTxt & "</font><font color=#" & chrCol & ">"
                End If
            Else
                chrCol = "NONE"
                If colTagOn Then
                    htmlEnd = "</font>" & htmlEnd
                    'htmlTxt = htmlTxt & "</font>"
                    colTagOn = False
                End If
            End If
            chrLastCol = chrCol

            If .Font.Bold = True Then
                If Not bldTagOn Then
                    htmlTxt = htmlTxt & "<b>"
                    bldTagOn = True
                End If
            Else
                If bldTagOn Then
                    'htmlTxt = htmlTxt & "</b>"
                    htmlEnd = "</b>" & htmlEnd
                    bldTagOn = False
                End If
            End If

            If .Font.Italic = True Then
                If Not itlTagOn Then
                    htmlTxt = htmlTxt & "<i>"
                    itlTagOn = True
                End If
            Else
                If itlTagOn Then
                    'htmlTxt = htmlTxt & "</i>"
                    htmlEnd = "</i>" & htmlEnd
                    itlTagOn = False
                End If
            End If

            If .Font.Underline > 0 Then
                If Not ulnTagOn Then
                    htmlTxt = htmlTxt & "<u>"
                    ulnTagOn = True
                End If
            Else
                If ulnTagOn Then
                    'htmlTxt = htmlTxt & "</u>"
                    htmlEnd = "</u>" & htmlEnd
                    ulnTagOn = False
                End If
            End If

            If (Asc(.Text) = 10) Then
                htmlTxt = htmlTxt & htmlEnd & "<br>"
            Else
                htmlTxt = htmlTxt & htmlEnd & .Text
            End If

        End With
    Next

    If colTagOn Then
        htmlTxt = htmlTxt & "</font>"
        colTagOn = False
    End If
    If bldTagOn Then
        htmlTxt = htmlTxt & "</b>"
        bldTagOn = False
    End If
    If itlTagOn Then
        htmlTxt = htmlTxt & "</i>"
        itlTagOn = False
    End If
    If ulnTagOn Then
        htmlTxt = htmlTxt & "</u>"
        ulnTagOn = False
    End If
    'htmlTxt = htmlTxt & "</html>"
    fnConvert2HTML = htmlTxt
End Function

Function fnGetCol(strCol As String) As String
    Dim rVal, gVal, bVal As String
    strCol = Right("000000" & Hex(strCol), 6)
    bVal = Left(strCol, 2)
    gVal = Mid(strCol, 3, 2)
    rVal = Right(strCol, 2)
    fnGetCol = rVal & gVal & bVal
End Function

票数 11

Stack Overflow用户

发布于 2018-08-01 01:36:51

这里有一个替代解决方案，它更快，但产生更混乱的输出(因为它使用Word的HTML引擎)。您需要向VBA项目添加以下引用：

Microsoft HTML对象库
Microsoft脚本运行时
Microsoft 16.0对象库

然后，通过运行例如调用下面的代码。立即窗口中的convertToHtml(Range("A1:A100"))：

' Given a temporary file path, load the HTML in that file
' and return the first paragraph's inner HTML.
Function extractFirstParagraph(filePath As String) As String
    Dim fs As New FileSystemObject, _
        html As New MSHTML.HTMLDocument, _
        par As MSHTML.HTMLGenericElement
    html.body.innerHTML = fs.OpenTextFile(filePath).ReadAll
    Set par = html.getElementsByTagName("P")(0)
    extractFirstParagraph = par.innerHTML
End Function

Sub convertToHtml(rng As Range)
    ' Open a single Word instance.
    Dim w As New Word.Application, d As Word.Document
    Set d = w.Documents.Add

    Dim cell As Range
    Const tempFile As String = "c:\temp\msword.html"
    ' For each cell in the range ...
    For Each cell In rng
        If cell.Value <> "" Then
            ' ... copy it into the Word document ...
            cell.Copy
            d.Range.PasteSpecial xlPasteFormats
            ' ... save the Word document as HTML
            ' in a temporary file ...
            d.SaveAs2 tempFile, wdFormatHTML
            ' ... and extract the first paragraph.
            cell.Value = extractFirstParagraph(tempFile)
            Debug.Print "Cell " & cell.Address & " done."
        End If
    Next cell

    ' Close Word once you're done. Note that if a bug
    ' is encountered, this cleanup won't occur and the 
    ' Word process will need to be killed to release
    ' file locks, otherwise you get an unhelpful error.
    w.Quit False
End Sub

您可以使用正则表达式来清理输出，方法是添加对Microsoft VBScript正则表达式5.5的引用，并运行如下函数：

' Used to avoid duplication in cleanWordHtml.
Function eraseInPlace(ByRef r As RegExp, _
    ByRef s As String, p As String) As String
    r.Pattern = p
    s = r.Replace(s, "")
End Function

' Eliminate junk tags from HTML generated by Word.
Function cleanWordHtml(inputString As String)
    Dim r As New RegExp
    r.Global = True
    eraseInPlace r, inputString, "mso-[^;""]*(; )?"
    eraseInPlace r, inputString, " style="""""
    eraseInPlace r, inputString, "<\?xml[^>]*>"
    eraseInPlace r, inputString, "<\/?o:[^>]*>"
    eraseInPlace r, inputString, "<SPAN><\/SPAN>"
    cleanWordHtml = inputString
End Function

如果您需要将标记转换为标记(我也需要这样做，因为我正在导入Access富文本字段，该字段不支持)，请尝试调用此函数并传入在extractFirstParagraph函数中构造的MSHTML对象：

' Given a <p> DOM node, replace any children of the
' form <span style="color: foo"> with <font color="foo">.
Function convertSpanToFont(ByRef par As MSHTML.HTMLGenericElement, _
    ByRef doc As MSHTML.HTMLDocument)
    Dim span As MSHTML.HTMLSpanElement, _
        font As MSHTML.HTMLFontElement
    For Each span In par.getElementsByTagName("span")
        Set font = doc.createElement("font")
        If IsNull(span.Style.Color) _
            Or span.Style.Color <> "" Then
            font.Color = span.Style.Color
            font.innerHTML = span.innerHTML
            span.insertAdjacentElement "afterEnd", font
            span.removeNode True
        End If
    Next span
End Function

我还考虑将整个电子表格从Excel中保存为HTML，然后使用另一个工具将其转换为Access可以处理的格式，但Excel的HTML导出会生成CSS类，而不是内联样式。如果只需要将电子表格的一部分转换为HTML，则此方法也很有用。

票数 3

Stack Overflow用户

发布于 2021-03-03 21:10:05

实际上，使用Excel的内部XML表示并使用XSL样式表将其转换为XML的解决方案要干净得多，速度更快。

您可以在https://github.com/HeimMatthias/Excel-HTML-Tools-Public上找到VBA代码和所需的XSLT。

免责声明:多年来，我在使用了不可饶恕的缓慢VBA脚本之后，自己编写了这个工具。在这个小提琴中可以看到示例输出。

技术细节：

-Property可以接受一个参数xlRangeValueXMLSpreadsheet，用它返回一个完全格式化的xml-字符串，其中包含其内容的html-对象。在包含以下内容的单元格上运行ActiveSheet.Range("A1").Value(xlRangeValueXMLSpreadsheet)

栓皮栎[医]氧化腺苷(氧化腺苷)豪厄尔

返回以下字符串：

<?xml version="1.0"?>
<?mso-application progid="Excel.Sheet"?>
<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
 xmlns:o="urn:schemas-microsoft-com:office:office"
 xmlns:x="urn:schemas-microsoft-com:office:excel"
 xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet"
 xmlns:html="http://www.w3.org/TR/REC-html40">
 <Styles>
  <Style ss:ID="Default" ss:Name="Normal">
   <Alignment ss:Vertical="Bottom"/>
   <Borders/>
   <Font ss:FontName="Calibri" ss:Size="11" ss:Color="#000000"/>
   <Interior/>
   <NumberFormat/>
   <Protection/>
  </Style>
  <Style ss:ID="s62">
   <Font ss:FontName="Calibri" ss:Size="11" ss:Color="#000000"
    ss:Italic="1"/>
  </Style>
 </Styles>
 <Worksheet ss:Name="Tabelle1">
  <Table ss:ExpandedColumnCount="1" ss:ExpandedRowCount="1"
   ss:DefaultColumnWidth="61.714285714285708"
   ss:DefaultRowHeight="14.571428571428571">
   <Row>
    <Cell ss:StyleID="s62"><ss:Data ss:Type="String"
      xmlns="http://www.w3.org/TR/REC-html40"><I><Font html:Color="#000000">Quercus agrifolia</Font></I><Font
       html:Color="#000000"> var. </Font><I><Font html:Color="#000000">oxyadenia</Font></I><Font
       html:Color="#000000"> (Torr.) J.T. Howell</Font></ss:Data></Cell>
   </Row>
  </Table>
 </Worksheet>
</Workbook>

Cell-Tag包围了包含--或多或少--干净的html-代码的ss:Data-Object。无论如何，净化这些数据以获得干净的html比对每个字母进行解析和访问其样式要简单得多，速度也快得多。(要快得多，我指的是比以前快100倍的轻松程度)。

需要注意的事项：单元格样式(来自模板和单个单元格)并不表示为html。这并不总是特别明显。在上面的例子中，单元格自动收到斜体样式，因为第一个单词已经被斜体化了。由于字符串中也有罗马(直立)字，斜体段落的代码用html表示。但如果整个细胞被斜体化，-tags就会丢失。这尤其令人困惑，因为如果对应的样式以-Tag作为属性，则不能仅仅用ss:Italic="1"来包围整个html，因此还需要检查它是否包括html-部件中的-Tags。当然，这适用于所有单元格样式。

票数 2

页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持

原文链接：

https://stackoverflow.com/questions/33620147

复制

相似问题

问将富文本转换为HTML格式标记
EN

回答 3

Stack Overflow用户

Stack Overflow用户

Stack Overflow用户

社区

活动

圈层

关于

腾讯云开发者

热门产品

热门推荐

更多推荐

问将富文本转换为HTML格式标记EN

回答 3

Stack Overflow用户

Stack Overflow用户

Stack Overflow用户

社区

活动

圈层

关于

腾讯云开发者

热门产品

热门推荐

更多推荐

问将富文本转换为HTML格式标记
EN