How to extract specific text from an HTML response in Java

Stack Overflow user
Asked 2021-04-02 18:43:07
Answers: 1 · Views: 60 · Followers: 0 · Votes: 1

I need to extract specific text (the API names) from an HTML response.

Below is the HTML response from the server.

<!DOCTYPE html>
<html lang="en">
   <head>
      <meta charset="UTF-8">
      <meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no">
      <title>Ambassador Developer Portal</title>
      <link
         rel="stylesheet"
         href="https://fonts.googleapis.com/css?family=Source+Sans+Pro:300,400,600,700,900"
         type="text/css"
         media="all"
         >
      <link rel="stylesheet" href="/docs/styles/master.css" type="text/css" media="all">
   </head>
   <body>
      <div class="o-page">
         <header class="o-page__header c-header">
            <a href="/docs/" >
            <img class="c-header__brand" src="/docs/assets/svg/AmbassadorType.svg" width="180px" height="18px"/>
            </a>
            <nav class="c-header__nav">
               <ul>
                  <li><a href="https://www.getambassador.io">Ambassador</a></li>
                  <li><a href="https://www.getambassador.io/products/">Products</a></li>
                  <li><a href="https://blog.getambassador.io/">Blog</a></li>
               </ul>
            </nav>
            <div class="c-header__misc">
               <ul>
                  <form class="c-search-box">
                     <label>
                     <input type="search" placeholder="Search">
                     </label>
                  </form>
               </ul>
            </div>
         </header>
         <nav class="o-page__nav c-nav">
            <div>
               <strong>APIs</strong>
               <ul>
                  <li>
                     <a class="" href="/docs/doc/ambassador/netbanking">
                     ambassador.netbanking
                     </a>
                  </li>
                  <li>
                     <a class="" href="/docs/doc/ambassador/regular-httpbin">
                     ambassador.regular-httpbin
                     </a>
                  </li>
                  <li>
                     <a class="" href="/docs/doc/default/petstore">
                     default.petstore
                     </a>
                  </li>
               </ul>
            </div>
            <br />
            <div>
               <strong>Reference</strong>
               <ul>
                  <li><a class="" href="/docs/page/Content">Content</a></li>
                  <li><a class="" href="/docs/page/Introduction">Introduction</a></li>
               </ul>
            </div>
            <br />
            <div>
               <strong>Services without documentation</strong>
               <ul>
                  <li>
                     <samp>ambassador.quote-backend</samp>
                  </li>
                  <li>
                     <samp>ambassador.service-a</samp>
                  </li>
                  <li>
                     <samp>ambassador.service-b</samp>
                  </li>
                  <li>
                     <samp>default.sample-app</samp>
                  </li>
                  <li>
                     <samp>default.sample-app-backend-route</samp>
                  </li>
                  <li>
                     <samp>keycloak.keycloak</samp>
                  </li>
               </ul>
            </div>
         </nav>
         <main class="o-page__main">
            <div>
               <article>
                  <section>
                     <div>
                        <p><span>
                           </span>
                        </p>
                        <h1>Welcome to the Ambassador Dev Portal</h1>
                        <h2>Customizing the Portal</h2>
                        <p>This content is fully customizable for your specific needs.
                           For details on customizing the portal, see <a href="https://www.getambassador.io/reference/dev-portal">https://www.getambassador.io/reference/dev-portal</a>.
                        </p>
                        <h2>Available Services</h2>
                        <p>The following services are exposed through this Ambassador instance:</p>
                        <table cellpadding="2em" width="100%">
                           <thead>
                              <tr>
                                 <td><b>Service Name</b></td>
                                 <td><b>Swagger URL</b></td>
                              </tr>
                           </thead>
                           <tbody>
                              <tr style="background: rgba(86,61,124,.05);">
                                 <td>
                                    <samp>ambassador.netbanking</samp>
                                 </td>
                                 <td>
                                    <a href="/docs/doc/ambassador/netbanking"><code>API Documentation</code></a>
                                 </td>
                              </tr>
                              <tr>
                                 <td>
                                    <samp>ambassador.quote-backend</samp>
                                 </td>
                                 <td>
                                    <code><span style="color:red">No API Documentation</span></code>
                                 </td>
                              </tr>
                              <tr style="background: rgba(86,61,124,.05);">
                                 <td>
                                    <samp>ambassador.regular-httpbin</samp>
                                 </td>
                                 <td>
                                    <a href="/docs/doc/ambassador/regular-httpbin"><code>API Documentation</code></a>
                                 </td>
                              </tr>
                              <tr>
                                 <td>
                                    <samp>ambassador.service-a</samp>
                                 </td>
                                 <td>
                                    <code><span style="color:red">No API Documentation</span></code>
                                 </td>
                              </tr>
                              <tr style="background: rgba(86,61,124,.05);">
                                 <td>
                                    <samp>ambassador.service-b</samp>
                                 </td>
                                 <td>
                                    <code><span style="color:red">No API Documentation</span></code>
                                 </td>
                              </tr>
                              <tr>
                                 <td>
                                    <samp>default.petstore</samp>
                                 </td>
                                 <td>
                                    <a href="/docs/doc/default/petstore"><code>API Documentation</code></a>
                                 </td>
                              </tr>
                              <tr style="background: rgba(86,61,124,.05);">
                                 <td>
                                    <samp>default.sample-app</samp>
                                 </td>
                                 <td>
                                    <code><span style="color:red">No API Documentation</span></code>
                                 </td>
                              </tr>
                              <tr>
                                 <td>
                                    <samp>default.sample-app-backend-route</samp>
                                 </td>
                                 <td>
                                    <code><span style="color:red">No API Documentation</span></code>
                                 </td>
                              </tr>
                              <tr style="background: rgba(86,61,124,.05);">
                                 <td>
                                    <samp>keycloak.keycloak</samp>
                                 </td>
                                 <td>
                                    <code><span style="color:red">No API Documentation</span></code>
                                 </td>
                              </tr>
                           </tbody>
                        </table>
                     </div>
                  </section>
               </article>
            </div>
         </main>
         <footer class="o-page__footer c-footer">
            <nav>
               <ul>
                  <li><a href="https://d6e.co/slack">Slack</a></li>
                  <li><a href="https://github.com/datawire/ambassador">GitHub</a></li>
                  <li><a href="https://www.getambassador.io/contact">Sales</a></li>
               </ul>
            </nav>
         </footer>
      </div>
   </body>
</html>

From the response above, I only need this text:

ambassador.netbanking
ambassador.regular-httpbin
default.petstore

It comes from this block of the page:

      <div>
               <strong>APIs</strong>
               <ul>
                  <li>
                     <a class="" href="/docs/doc/ambassador/netbanking">
                     ambassador.netbanking
                     </a>
                  </li>
                  <li>
                     <a class="" href="/docs/doc/ambassador/regular-httpbin">
                     ambassador.regular-httpbin
                     </a>
                  </li>
                  <li>
                     <a class="" href="/docs/doc/default/petstore">
                     default.petstore
                     </a>
                  </li>
               </ul>
            </div>

So far I have tried this code to get the desired output:

public JSONArray getApiList(){
        JSONArray apiSpecList = new JSONArray();
        String res = this.getApiResponse("https://gifted-wiles-4865.edgestack.me/docs/");
        Document document = Jsoup.parse(res);
        Elements divs = document.select("samp");
        Elements divs1 = document.getElementsByClass("o-page__nav c-nav");
        //Elements divs1 = document.getElementsBy("/docs/doc/");
        Element link = document.select("a").first();
        String test = link.text();

        System.out.println("Text: " + link.text());
        //res=res.substring(res.indexOf("{"),res.lastIndexOf("}") );
        //System.out.println(res);
       // @data = Hash.from_xml(res).to_json;
        return apiSpecList;
    }

public String getApiResponse(String url) {
        RestTemplate restTemplate = new RestTemplate();
        logger.info("Ambassador , Connecting [{}] ", url);
        HttpHeaders headers = new HttpHeaders();
        //headers.set("Authorization", "Basic " + access_token);
        headers.setContentType(MediaType.APPLICATION_JSON);

        // GET request that carries headers only (no body).
        HttpEntity<String> request = new HttpEntity<String>(null, headers);
        String resp = "";
        try {
            ResponseEntity<String> result = restTemplate.exchange(url, HttpMethod.GET, request, String.class);
            resp = result.getBody();
        } catch (Exception err) {
            // On failure, log the error and return the empty string.
            logger.error("Ambassador , Error [{}] ", err.getMessage());
        }
        return resp;
    }

So how can I get this specific text out of the HTML response?
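
A minimal jsoup sketch of one way to pull those names, assuming the page structure shown above: the API links sit inside `nav.c-nav` and their `href` starts with `/docs/doc/`. The class name `ApiNameExtractor`, the method `extractApiNames`, and the trimmed `SAMPLE_HTML` are illustrative and not part of the original code.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

class ApiNameExtractor {

    // Trimmed copy of the <nav> block from the question. Attributes are
    // single-quoted only to keep the Java string readable.
    static final String SAMPLE_HTML =
          "<nav class='o-page__nav c-nav'>"
        + "<div><strong>APIs</strong><ul>"
        + "<li><a class='' href='/docs/doc/ambassador/netbanking'>\n ambassador.netbanking\n </a></li>"
        + "<li><a class='' href='/docs/doc/ambassador/regular-httpbin'>\n ambassador.regular-httpbin\n </a></li>"
        + "<li><a class='' href='/docs/doc/default/petstore'>\n default.petstore\n </a></li>"
        + "</ul></div>"
        + "<div><strong>Reference</strong><ul>"
        + "<li><a class='' href='/docs/page/Content'>Content</a></li>"
        + "</ul></div>"
        + "</nav>";

    // Assumption: API entries are exactly the links inside nav.c-nav
    // whose href starts with /docs/doc/.
    static List<String> extractApiNames(String html) {
        Document document = Jsoup.parse(html);
        List<String> names = new ArrayList<>();
        for (Element link : document.select("nav.c-nav a[href^=/docs/doc/]")) {
            // text() returns the link text with whitespace normalized and trimmed.
            names.add(link.text());
        }
        return names;
    }

    public static void main(String[] args) {
        extractApiNames(SAMPLE_HTML).forEach(System.out::println);
    }
}
```

In `getApiList`, the same selector could be applied to the `document` that is already parsed there, with each name added to `apiSpecList`. The "Reference" links are skipped because their `href` starts with `/docs/page/`.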

1 Answer

Stack Overflow user

Posted 2021-04-02 18:57:36

I suggest using Selenium, which is better suited to website-related tasks. You can try this:

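import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.chrome.ChromeDriver;

// The statements below belong inside a method.
WebDriver driver = new ChromeDriver();
driver.get("link of the website");

// Locate the element by XPath and print its visible text.
WebElement content = driver.findElement(By.xpath("your xpath link"));
System.out.println(content.getText());
driver.quit();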
Votes: -1
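
The answer leaves the XPath as a placeholder. As a sketch of what such an expression might look like for the page in the question, the block below evaluates `//div[strong='APIs']/ul/li/a` with the JDK's built-in `javax.xml.xpath` engine against a trimmed, well-formed copy of the navigation block. The class name `ApiXPathDemo`, the `FRAGMENT` string, and the expression itself are assumptions for illustration. The full page is not well-formed XML, so this parser would not accept it as is.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ApiXPathDemo {

    // Assumed expression: links in the list under the <div> whose <strong> reads "APIs".
    static final String API_LINKS_XPATH = "//div[strong='APIs']/ul/li/a";

    // Trimmed, well-formed copy of the navigation block from the question.
    static final String FRAGMENT =
          "<nav>"
        + "<div><strong>APIs</strong><ul>"
        + "<li><a href='/docs/doc/ambassador/netbanking'>\n ambassador.netbanking\n </a></li>"
        + "<li><a href='/docs/doc/ambassador/regular-httpbin'>\n ambassador.regular-httpbin\n </a></li>"
        + "<li><a href='/docs/doc/default/petstore'>\n default.petstore\n </a></li>"
        + "</ul></div>"
        + "<div><strong>Reference</strong><ul>"
        + "<li><a href='/docs/page/Content'>Content</a></li>"
        + "</ul></div>"
        + "</nav>";

    static List<String> apiNames(String xml) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            NodeList links = (NodeList) xpath.evaluate(
                    API_LINKS_XPATH,
                    new InputSource(new StringReader(xml)),
                    XPathConstants.NODESET);
            List<String> names = new ArrayList<>();
            for (int i = 0; i < links.getLength(); i++) {
                // Link text spans several lines in the source, so trim it.
                names.add(links.item(i).getTextContent().trim());
            }
            return names;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Could not evaluate XPath on input", e);
        }
    }

    public static void main(String[] args) {
        apiNames(FRAGMENT).forEach(System.out::println);
    }
}
```

With Selenium, the same expression could presumably be passed to `By.xpath(...)` through `driver.findElements`, reading each element's `getText()`.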
Original content provided by Stack Overflow.
Original link: https://stackoverflow.com/questions/66918117
