Getting empty values when parsing Microdata

Stack Overflow user
Asked 2015-09-11 08:44:16
Answers: 1, Views: 654, Followers: 0, Votes: 1

I am trying to use the jsoup library to read the values of all itemprop attributes that sit inside elements carrying an itemtype attribute.

Here is the body of the sample HTML page:

<body class="single single-post postid-2334 single-format-standard custom-header header-image header-full-width full-width-content" itemscope="itemscope" itemtype="http://schema.org/WebPage"><div class="site-container"><header class="site-header" role="banner" itemscope="itemscope" itemtype="http://schema.org/WPHeader"><div class="wrap"><div class="title-area"><p class="site-title" itemprop="headline"><a href="http://blogbasics.com/">Blog Basics</a></p><div id="title_image"><a href="http://blogbasics.com/" title="Blog Basics"><img src="http://blogbasics.com/wp-content/uploads/cropped-cropped-Win-1.png" title="Blog Basics" /></a><style>#title { display:none; }</style></div><p class="site-description" itemprop="description">Starting a blog? Learn how to make it amazing.</p></div></div></header><nav class="nav-primary" role="navigation" itemscope="itemscope" itemtype="http://schema.org/SiteNavigationElement"><div class="wrap"><ul id="menu-primary-navigation" class="menu genesis-nav-menu menu-primary"><li id="menu-item-2590" class="menu-item menu-item-type-custom menu-item-object-custom menu-item-home menu-item-2590"><a title="Blog Basics" href="http://blogbasics.com">Home</a></li>
<li id="menu-item-3187" class="menu-item menu-item-type-custom menu-item-object-custom menu-item-3187"><a href="http://blogbasics.com/blog">Blog</a></li>
<li id="menu-item-3722" class="menu-item menu-item-type-custom menu-item-object-custom menu-item-3722"><a href="http://blogbasics.com/welcome">Free Updates</a></li>
<li id="menu-item-2578" class="menu-item menu-item-type-post_type menu-item-object-page menu-item-2578"><a title="Blogging Tools" href="http://blogbasics.com/blogging-tools/">Blogging Tools</a></li>
</ul></div></nav><div class="site-inner"><div class="feature-area widget-area">
<div id="spyr_tru_notifybar-2" class="widget notify_bar"><div class="widget-wrap">Starting a blog? Learn how to make it awesome!</div></div>

<div id="spyr_tru_twocolumn-3" class="widget widget_spyr_tru_twocolumn"><div class="widget-wrap">
<div class="column one-half first original"><div align="middle"><a href="https://curlcentric.leadpages.net/leadbox/14581e773f72a2%3A12e927026b46dc/5758142528880640/" target="_blank"><img src="https://lh3.ggpht.com/wMcDasimpny0oKmsIxI0xowRIxbFpaa9Rjg3aAUMG8UUtp4XamG03gYcGlsXTRmvkFqgySaihhq2_KCSr8cN7Q=s0"></a><script data-leadbox="14581e773f72a2:12e927026b46dc" data-url="https://curlcentric.leadpages.net/leadbox/14581e773f72a2%3A12e927026b46dc/5758142528880640/" data-config="%7B%7D" type="text/javascript" src="https://curlcentric.leadpages.net/leadbox-910.js"></script></div>
</div>
<div class="column one-half last original"><p>Learn how to build a blog that generates traffic, revenue, & popularity in 30 days.</p>
<p>Just enter your email address in the box below and click "Submit".</p>
</div>
<div class="clear"></div>
</div></div>
<div id="spyr_tru_subscribesocial-2" class="widget feature-area-bottom tru_subscribe_social"><div class="widget-wrap">
<div class="tru_subscribesocial_wrap">
    <form action="http://www.aweber.com/scripts/addlead.pl" method="post" target="_blank">
        <div class="hidden_fields"><input type="hidden" name="meta_web_form_id" value="276964962" />
<input type="hidden" name="meta_split_id" value="" />
<input type="hidden" name="listname" value="awlist3567293" />
<input type="hidden" name="redirect" value="http://www.aweber.com/thankyou-coi.htm?m=text" id="redirect_f956eccce03104dc62dec5f8c897285e" />

<input type="hidden" name="meta_adtracking" value="Blog_Basics" />
<input type="hidden" name="meta_message" value="1" />
<input type="hidden" name="meta_required" value="email" />

<input type="hidden" name="meta_tooltip" value="" /></div>
        <input type="email" class="default_value" name="email" value="Enter email to get updates" /></span>
        <input type="submit" value="Submit" />
        </form>
    <div class="social_menu">
        <ul id="menu-social" class="menu superfish">

            </ul>
        </div>
    <div class="clear"></div>
    </div>
</div></div>
</div><div class="content-sidebar-wrap"><main class="content" role="main" itemprop="mainContentOfPage" itemscope="itemscope" itemtype="http://schema.org/Blog"><article class="post-2334 post type-post status-publish format-standard has-post-thumbnail category-blog-basics entry" itemscope="itemscope" itemtype="http://schema.org/BlogPosting" itemprop="blogPost"><header class="entry-header"><h1 class="entry-title" itemprop="headline">Examples of Blogs</h1> 
<p class="entry-meta">by <span class="entry-author" itemprop="author" itemscope="itemscope" itemtype="http://schema.org/Person"><span class="entry-author-name" itemprop="name">Kenneth Byrd</span></span> | Go from 0 to 5,000 blog subscribers in 60 days <a href="https://curlcentric.leadpages.net/leadbox/14581e773f72a2%3A12e927026b46dc/5758142528880640/" rel="nofollow">(Click Here)</a></p></header><img src="http://blogbasics.com/wp-content/uploads/Examples-of-Blogs.jpg" width="5315" height="3543" alt="examples of blogs" title="" class="attachment-tru-post wp-post-image" /><div class="entry-content" itemprop="text"><h3>Overview</h3>
<p>This article includes examples of blogs from various niches. There are millions of example blogs out there in all different shapes and sizes. A good place to start is <a href="http://technorati.com/" target="_blank">Technorati</a>, a directory of blogs, or <a href="http://alltop.com/" target="_blank">Alltop</a>. Search these websites and then come back and tell us about the good blogs and the bad blogs that you found. Below are also more examples of blogs that you should look at:</p>
<h3><strong>Personal blogs</strong></h3>
<p><a title="Curl Centric" href="http://www.curlcentric.com/natural-hair-101/" target="_blank">Curl Centric</a>: Dedicated to providing healthy hair care information.</p>
<h3>Travel</h3>
<p><a href="http://boardingarea.com/" target="_blank">Boarding Area</a>:  A collection of bloggers on travel.  Range from personal stories to specific advice on airlines, hotels and places.</p>
<div class="content-box-yellow"><a href="https://curlcentric.leadpages.net/leadbox/14581e773f72a2%3A12e927026b46dc/5758142528880640/">Learn how to start a blog that generates traffic, revenue, &amp; popularity.</a></div>
<p><a href="http://vivisrandomramblings.blogspot.com/" target="_blank">Vivi&#8217;s Random Ramblings</a>: A nice collection of random posts mostly demonstrating that Violy is a well-travelled, excellent photographer.</p>
<!-- Quick Adsense WordPress Plugin: http://quicksense.net/ -->
<div style="float:none;margin:5px 0 5px 0;text-align:center;">
<script async src="//pagead2.googlesyndication.com/pagead/js/adsbygoogle.js"></script>
<!-- Blog Basics - 300 x 250 -->
<ins class="adsbygoogle"
     style="display:inline-block;width:300px;height:250px"
     data-ad-client="ca-pub-5556427932737077"
     data-ad-slot="6553509385"></ins>
<script>
(adsbygoogle = window.adsbygoogle || []).push({});
</script>
</div>

<p><a href="http://www.whygo.com/" target="_blank">Why go network of blogs</a>: Another group of travel bloggers.  Each blogger has their own patch, which range from Portland, which looks a nice city, to Iceland and France.</p>
<h3>Technical</h3>
<p><a href="http://techcrunch.com/" target="_blank">Techcrunch</a>:  This is the one to learn all about technology and in particular technology business, technology start-ups and gadgets.  You&#8217;ll usually hear the techie gossip here first.</p>
<p><a href="http://speckyboy.com/2010/02/25/50-amazing-personal-blog-web-designs/" target="_blank">Speckyboy.com</a>: Great blog on the design of websites.  Good on lists, (usually 50) of well researched examples of good or unusual design.  Gives even the least technical good ideas to discuss with their own designers.</p>
<h3>On Blogging</h3>
<p><a href="http://www.trafficgenerationcafe.com/" target="_blank">Traffic Generation Cafe</a>: Ana Hoffman&#8217;s very friendly, very knowledgeable blog on building traffic for your blog.</p>
<div class="content-box-yellow"><a href="https://curlcentric.leadpages.net/leadbox/14581e773f72a2%3A12e927026b46dc/5758142528880640/">Learn how to start a blog that generates traffic, revenue, &amp; popularity.</a></div>
<p><a href="http://blogbasics.com/blog/" target="_blank">Blog Basics</a>: This website is a blog that is focused on topics like &#8216;how to blog&#8217; and &#8216;how to make money blogging&#8217;.</p>
<h3>Over to you</h3>
<p>Which blogs do you like?  Are you writing a blog?  Then tell us about it.</p>

<!-- Quick Adsense WordPress Plugin: http://quicksense.net/ -->
<div style="float:none;margin:5px 0 5px 0;text-align:center;">
<script async src="//pagead2.googlesyndication.com/pagead/js/adsbygoogle.js"></script>
<!-- Banner -->
<ins class="adsbygoogle"
     style="display:inline-block;width:468px;height:60px"
     data-ad-client="ca-pub-5556427932737077"
     data-ad-slot="1983708988"></ins>
<script>
(adsbygoogle = window.adsbygoogle || []).push({});
</script>
</div>

<div style="font-size:0px;height:0px;line-height:0px;margin:0;padding:0;clear:both"></div><div style="clear:both;"></div><div id='ois-1' class='ois-design' ><div class="ois-outer ois-8-outer">
    <div class="ois-8-call-top"></div>
    <div class="ois-8-inner ois-inner">
        <div class="col-md-7 ois-8-left">
            <div class="ois-8-title">Get Exclusive Tips</div>
            <div class="ois-8-subtitle">Instantly discover how you can start a blog that generates traffic and income when you join the Blog Basics Tribe (It’s Free). Here's your chance. Just type in your email address.</div>
        </div> <!-- .span7 left side -->    
        <div class="col-md-5 ois-8-right">
            <div class="ois-8-img-wrapper">
                <img src="data:image/gif;base64,R0lGODlhAQABAAAAACH5BAEKAAEALAAAAAABAAEAAAICTAEAOw==" data-src="https://lh3.ggpht.com/wMcDasimpny0oKmsIxI0xowRIxbFpaa9Rjg3aAUMG8UUtp4XamG03gYcGlsXTRmvkFqgySaihhq2_KCSr8cN7Q=s0" class="ois-img ois-8-img" /><noscript><img src="https://lh3.ggpht.com/wMcDasimpny0oKmsIxI0xowRIxbFpaa9Rjg3aAUMG8UUtp4XamG03gYcGlsXTRmvkFqgySaihhq2_KCSr8cN7Q=s0" class="ois-img ois-8-img" /></noscript>
            </div>
            <div class="ois-8-form">
                <form action="http://www.aweber.com/scripts/addlead.pl" method="post" id="ois-form-1" data-service="aweber" ><div id="ois-8-email-input-wrapper">
    <input type="text" name="email" class="ois-8-email-input ois-email-input ois-form-control" placeholder="Your Email"/>
</div>
<div id="ois-8-button-wrapper">
    <input type="submit" class="ois-btn ois-8-button" value="Submit"/>
</div><input type='hidden' name='listname' value='awlist3567293'/>
<input type='hidden' name='meta_message' value='1'/>
<input type='hidden' name='redirect' value='http://www.aweber.com/thankyou-coi.htm?m=video&e=example%40example.com&name=Example%20Subscriber&l=awlist3567293'/>
</form>
            </div> <!-- #ois-8-form -->
        </div><!-- .right .col-md-5 right side-->
        <div style="clear:both"></div>
    </div> <!-- inner -->
</div> <!-- outer --></div></div>
<div class="spyr_sliding_share">
    <div class="spyr_sliding_share_text">Share this article</div>
    <div class="spyr_sliding_share_wrap">
            <div class="spyr_sliding_share_button spyr_sb_facebook">
                <a href="#" class="icon icon-facebook"><span>Facebook</span></a>
                <div class="spyr_sb_inner"><div class="fb-like" data-href="http://blogbasics.com/examples-of-blogs/" data-send="false" data-layout="button_count" data-width="100" data-show-faces="false"></div></div>
                </div>
            <div class="spyr_sliding_share_button spyr_sb_twitter">
                <a href="#" class="icon icon-twitter"><span>Twitter</span></a>
                <div class="spyr_sb_inner"><a href="https://twitter.com/share" class="twitter-share-button" data-url="http://blogbasics.com/examples-of-blogs/" data-text="Examples of Blogs | Blog Basics" data-via="kbyrdjr">Tweet</a></div>
                </div>
            <div class="spyr_sliding_share_button spyr_sb_gplus">
                <a href="#" class="icon icon-gplus"><span>Google+</span></a>
                <div class="spyr_sb_inner"><div class="g-plusone" data-size="medium" data-href="http://blogbasics.com/examples-of-blogs/"></div></div>
                </div>
            <div class="spyr_sliding_share_button spyr_sb_pinterest">
                <a href="#" class="icon icon-pinterest"><span>Pinterest</span></a>
                <div class="spyr_sb_inner"><a href="http://pinterest.com/pin/create/button/?url=http://blogbasics.com/examples-of-blogs/&media=http://blogbasics.com/wp-content/uploads/Examples-of-Blogs-550x367.jpg&description=Examples of Blogs" class="pin-it-button" count-layout="horizontal"><img border="0" src="//assets.pinterest.com/images/PinExt.png" title="Pin It" /></a></div>
                </div>
            <div class="spyr_sliding_share_button spyr_sb_mail">
                <a href="#" class="icon icon-mail"><span>Email a Friend</span></a>
                <div class="spyr_sb_inner"><a href="mailto:?subject=Examples of Blogs&body=I found value in this and I think you will too.%0A%0AExamples of Blogs: http://blogbasics.com/examples-of-blogs/">Email a Friend</a></div>
                </div>
        </div>
    <div class="clear"></div>
    </div><footer class="entry-footer"></footer></article><div class="entry-comments" id="comments"><h3>Comments</h3><ol class="comment-list">
    <li class="comment even thread-even depth-1" id="comment-261">
    <article itemprop="comment" itemscope="itemscope" itemtype="http://schema.org/UserComments">


        <header class="comment-header">
            <p class="comment-author" itemprop="creator" itemscope="itemscope" itemtype="http://schema.org/Person">
                <img alt='' src="data:image/gif;base64,R0lGODlhAQABAAAAACH5BAEKAAEALAAAAAABAAEAAAICTAEAOw==" data-src="http://0.gravatar.com/avatar/9e0fce9a7b5f7de7f6c528f448007f08?s=48&#038;d=mm&#038;r=g" srcset='http://0.gravatar.com/avatar/9e0fce9a7b5f7de7f6c528f448007f08?s=96&amp;d=mm&amp;r=g 2x' class='avatar avatar-48 photo' height='48' width='48' /><noscript><img alt='' src="http://0.gravatar.com/avatar/9e0fce9a7b5f7de7f6c528f448007f08?s=48&#038;d=mm&#038;r=g" srcset='http://0.gravatar.com/avatar/9e0fce9a7b5f7de7f6c528f448007f08?s=96&amp;d=mm&amp;r=g 2x' class='avatar avatar-48 photo' height='48' width='48' /></noscript><span itemprop="name">violy</span> <span class="says">says</span>            </p>

            <p class="comment-meta">
                <time class="comment-time" datetime="2012-01-09T04:42:00+00:00" itemprop="commentTime"><a href="http://blogbasics.com/examples-of-blogs/#comment-261" class="comment-time-link" itemprop="url">January 9, 2012 at 4:42 am</a></time>            </p>
        </header>

        <div class="comment-content" itemprop="commentText">

            <p>Hi sir thank you so much for the nice compliment about my blog (Vivi&#8217;s Random Ramblings&#8221;), I&#8217;m blogging for not even 2 months now and it&#8217;s really overwhelming to see this compliment and getting a lot of good feedback  too and traffic which is a real surprise .. thank you so much!! &#8211; violy</p>
        </div>

        <div class="comment-reply"><a rel='nofollow' class='comment-reply-link' href='#comment-261' onclick='return addComment.moveForm( "comment-261", "261", "respond", "2334" )' aria-label='Reply to violy'>Reply</a></div>

    </article>
    <ul class="children">

    <li class="comment odd alt depth-2" id="comment-262">
    <article itemprop="comment" itemscope="itemscope" itemtype="http://schema.org/UserComments">


        <header class="comment-header">
            <p class="comment-author" itemprop="creator" itemscope="itemscope" itemtype="http://schema.org/Person">
                <img alt='' src="data:image/gif;base64,R0lGODlhAQABAAAAACH5BAEKAAEALAAAAAABAAEAAAICTAEAOw==" data-src="http://1.gravatar.com/avatar/4383ac0510061b683651a0eca3d58e42?s=48&#038;d=mm&#038;r=g" srcset='http://1.gravatar.com/avatar/4383ac0510061b683651a0eca3d58e42?s=96&amp;d=mm&amp;r=g 2x' class='avatar avatar-48 photo' height='48' width='48' /><noscript><img alt='' src="http://1.gravatar.com/avatar/4383ac0510061b683651a0eca3d58e42?s=48&#038;d=mm&#038;r=g" srcset='http://1.gravatar.com/avatar/4383ac0510061b683651a0eca3d58e42?s=96&amp;d=mm&amp;r=g 2x' class='avatar avatar-48 photo' height='48' width='48' /></noscript><span itemprop="name"><a href="http://blogbasics.com" class="comment-author-link" rel="external nofollow" itemprop="url">Paul Odtaa</a></span> <span class="says">says</span>            </p>

            <p class="comment-meta">
                <time class="comment-time" datetime="2012-01-09T09:44:00+00:00" itemprop="commentTime"><a href="http://blogbasics.com/examples-of-blogs/#comment-262" class="comment-time-link" itemprop="url">January 9, 2012 at 9:44 am</a></time>            </p>
        </header>

        <div class="comment-content" itemprop="commentText">

            <p>Hi Violy, </p>
<p>I really like your blog and your photography is great. </p>
        </div>

        <div class="comment-reply"><a rel='nofollow' class='comment-reply-link' href='#comment-262' onclick='return addComment.moveForm( "comment-262", "262", "respond", "2334" )' aria-label='Reply to Paul Odtaa'>Reply</a></div>

    </article>
    </li><!-- #comment-## -->
</ul><!-- .children -->
</li><!-- #comment-## -->

    <li class="comment even thread-odd thread-alt depth-1" id="comment-270">
    <article itemprop="comment" itemscope="itemscope" itemtype="http://schema.org/UserComments">


        <header class="comment-header">
            <p class="comment-author" itemprop="creator" itemscope="itemscope" itemtype="http://schema.org/Person">
                <img alt='' src="data:image/gif;base64,R0lGODlhAQABAAAAACH5BAEKAAEALAAAAAABAAEAAAICTAEAOw==" data-src="http://1.gravatar.com/avatar/ad28f49a806d0e3abd1a94013a259b9b?s=48&#038;d=mm&#038;r=g" srcset='http://1.gravatar.com/avatar/ad28f49a806d0e3abd1a94013a259b9b?s=96&amp;d=mm&amp;r=g 2x' class='avatar avatar-48 photo' height='48' width='48' /><noscript><img alt='' src="http://1.gravatar.com/avatar/ad28f49a806d0e3abd1a94013a259b9b?s=48&#038;d=mm&#038;r=g" srcset='http://1.gravatar.com/avatar/ad28f49a806d0e3abd1a94013a259b9b?s=96&amp;d=mm&amp;r=g 2x' class='avatar avatar-48 photo' height='48' width='48' /></noscript><span itemprop="name"><a href="http://allisondduncan.com" class="comment-author-link" rel="external nofollow" itemprop="url">Allison Duncan</a></span> <span class="says">says</span>            </p>

            <p class="comment-meta">
                <time class="comment-time" datetime="2012-01-20T21:17:00+00:00" itemprop="commentTime"><a href="http://blogbasics.com/examples-of-blogs/#comment-270" class="comment-time-link" itemprop="url">January 20, 2012 at 9:17 pm</a></time>           </p>
        </header>

        <div class="comment-content" itemprop="commentText">

            <p>Hi there,</p>
<p>Thanks for featuring my blog on your site. It&#8217;s always nice to see your work being appreciated and linked to.</p>
<p>I look forward to seeing what your site has coming down the pike.</p>
<p>Thanks for reading!</p>
<p>Allison</p>
        </div>

        <div class="comment-reply"><a rel='nofollow' class='comment-reply-link' href='#comment-270' onclick='return addComment.moveForm( "comment-270", "270", "respond", "2334" )' aria-label='Reply to Allison Duncan'>Reply</a></div>

    </article>
    </li><!-- #comment-## -->

I am using the jsoup library to parse the HTML and extract the data. This is the code I am trying:

try {
    Document doc = Jsoup.connect("http://blogbasics.com/examples-of-blogs/").get();

    Elements links = doc.select("itemtype > [itemprop]");

    for (Element element : links) {
        System.out.println(" itemprop :" + element.attr("itemprop"));
    }
} catch (IOException e) {
    e.printStackTrace();
}

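But I get empty values. I am new to this, so please let me know the correct code. If there is any other way to extract itemtype and itemprop from HTML, please share it; that would be a great help.
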
<div class="content-sidebar-wrap">
<main class="content" role="main" itemprop="mainContentOfPage" itemscope="itemscope" 
itemtype="http://schema.org/Blog"><article class="post-2334 post type-post status-publish 
format-standard has-post-thumbnail category-blog-basics entry" itemscope="itemscope" 
itemtype="http://schema.org/BlogPosting" itemprop="blogPost"><header class="entry-header">
<h1 class="entry-title" itemprop="headline">Examples of Blogs</h1> 
<p class="entry-meta">by <span class="entry-author" itemprop="author" itemscope="itemscope" 
itemtype="http://schema.org/Person"><span class="entry-author-name" itemprop="name">Kenneth Byrd</span></span> |
 Go from 0 to 5,000 blog subscribers in 60 days
 <a href="https://curlcentric.leadpages.net/leadbox/14581e773f72a2%3A12e927026b46dc/5758142528880640/" rel="nofollow">(Click Here)</a>
 </p></header><img src="http://blogbasics.com/wp-content/uploads/Examples-of-Blogs.jpg" width="5315" height="3543" 
 alt="examples of blogs" title="" class="attachment-tru-post wp-post-image" /><div class="entry-content"
 itemprop="text"><h3>Overview</h3><p>This article includes examples of blogs
 from various niches. There are millions of example blogs out there in all 
 different shapes and sizes. A good place to start is 
 </p>

Expected output:

itemtype="http://schema.org/Blog"
itemprop="mainContentOfPage"

itemtype="http://schema.org/BlogPosting"
itemprop="blogPost"

itemtype="http://schema.org/Person"
itemprop="author"
itemprop="name"
itemprop="text"

1 Answer

Stack Overflow user

Accepted answer

Posted 2015-09-12 16:54:10

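I am not entirely sure what you really want, but it seems you need every element that has both the itemtype and itemprop attributes, together with every element that has itemprop and is a direct child of an element with itemtype. If that is the case, you can use the following approach:
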
String html = ""
        +"<div class=\"content-sidebar-wrap\">"
        +"<main class=\"content\" role=\"main\" itemprop=\"mainContentOfPage\" itemscope=\"itemscope\" "
        +"itemtype=\"http://schema.org/Blog\"><article class=\"post-2334 post type-post status-publish "
        +"format-standard has-post-thumbnail category-blog-basics entry\" itemscope=\"itemscope\" "
        +"itemtype=\"http://schema.org/BlogPosting\" itemprop=\"blogPost\"><header class=\"entry-header\">"
        +"<h1 class=\"entry-title\" itemprop=\"headline\">Examples of Blogs</h1> "
        +"<p class=\"entry-meta\">by <span class=\"entry-author\" itemprop=\"author\" itemscope=\"itemscope\" "
        +"itemtype=\"http://schema.org/Person\"><span class=\"entry-author-name\" itemprop=\"name\">Kenneth Byrd</span></span> |"
        +" Go from 0 to 5,000 blog subscribers in 60 days"
        +" <a href=\"https://curlcentric.leadpages.net/leadbox/14581e773f72a2%3A12e927026b46dc/5758142528880640/\" rel=\"nofollow\">(Click Here)</a>"
        +" </p></header><img src=\"http://blogbasics.com/wp-content/uploads/Examples-of-Blogs.jpg\" width=\"5315\" height=\"3543\" "
        +" alt=\"examples of blogs\" title=\"\" class=\"attachment-tru-post wp-post-image\" /><div class=\"entry-content\""
        +" itemprop=\"text\"><h3>Overview</h3><p>This article includes examples of blogs"
        +" from various niches. There are millions of example blogs out there in all "
        +" different shapes and sizes. A good place to start is "
        +" </p>"
        ;

Document doc = Jsoup.parse(html,"");

Elements els = doc.select("*[itemtype][itemprop], *[itemtype] > *[itemprop]");
for (Element el:els){

    System.out.print(el.attr("itemtype").isEmpty()?"":("\n" +el.attr("itemtype")+"\n"));
    System.out.println(el.attr("itemprop"));
}

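The important part is the jsoup CSS selector *[itemtype][itemprop], *[itemtype] > *[itemprop], which has two parts:

  1. *[itemtype][itemprop] selects elements that have both attributes.
  2. *[itemtype] > *[itemprop] selects elements with the itemprop attribute that are direct children of an element with the itemtype attribute. If you want all descendants, not only direct children, leave out the > and keep the space between the two parts.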
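
To make the two parts concrete, here is a minimal sketch (not part of the original answer) that counts matches on a trimmed-down, hypothetical fragment modelled on the question's markup. The class name SelectorPartsDemo and the helper count are invented for illustration. It also includes the question's selector, itemtype > [itemprop]: a bare word in a CSS selector is read as a tag name, so it looks for <itemtype> elements rather than an itemtype attribute and finds none.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class SelectorPartsDemo {
    // Hypothetical fragment: main and article carry both attributes,
    // h1 is a grandchild of article (via header), div is a direct child.
    static final String HTML =
          "<main itemprop=\"mainContentOfPage\" itemscope itemtype=\"http://schema.org/Blog\">"
        + "<article itemprop=\"blogPost\" itemscope itemtype=\"http://schema.org/BlogPosting\">"
        + "<header><h1 itemprop=\"headline\">Examples of Blogs</h1></header>"
        + "<div itemprop=\"text\">Overview</div>"
        + "</article></main>";

    // Number of elements in the fragment that match the given CSS selector.
    static int count(String css) {
        Document doc = Jsoup.parse(HTML);
        return doc.select(css).size();
    }

    public static void main(String[] args) {
        // Tag selector: there is no <itemtype> element, so nothing matches.
        System.out.println(count("itemtype > [itemprop]"));   // 0
        // Part 1: both attributes on the same element (main, article).
        System.out.println(count("[itemtype][itemprop]"));    // 2
        // Part 2: direct children only (article, div).
        System.out.println(count("[itemtype] > [itemprop]")); // 2
        // Descendant variant: also picks up the nested h1.
        System.out.println(count("[itemtype] [itemprop]"));   // 3
    }
}
```

Swapping > for a space raises the count from 2 to 3 here because the headline h1 sits inside header and is therefore a grandchild, not a child, of the article.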

The comma between the selectors works as "OR", so every element that matches any of the listed selectors is returned.
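
The OR behaviour can be checked with a small sketch (hypothetical markup and names again, not taken from the original answer). The article element below satisfies both halves of the selector; as far as I understand jsoup's matching, it is still returned only once, and the results come back in document order.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class SelectorUnionDemo {
    // Hypothetical fragment modelled on the question's page.
    static final String HTML =
          "<main itemprop=\"mainContentOfPage\" itemscope itemtype=\"http://schema.org/Blog\">"
        + "<article itemprop=\"blogPost\" itemscope itemtype=\"http://schema.org/BlogPosting\">"
        + "<header><h1 itemprop=\"headline\">Examples of Blogs</h1></header>"
        + "<div itemprop=\"text\">Overview</div>"
        + "</article></main>";

    // Collect the itemprop value of every element matched by the selector.
    static List<String> itemprops(String css) {
        Document doc = Jsoup.parse(HTML);
        List<String> values = new ArrayList<>();
        for (Element el : doc.select(css)) {
            values.add(el.attr("itemprop"));
        }
        return values;
    }

    public static void main(String[] args) {
        // Same selector as in the answer, without the optional "*".
        String css = "[itemtype][itemprop], [itemtype] > [itemprop]";
        // article matches both halves but is listed once;
        // headline is skipped because h1 is not a direct child.
        System.out.println(itemprops(css));
    }
}
```

On this fragment the headline property is left out, which mirrors the expected output in the question, where headline does not appear either.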

Votes: 1
Original content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/32519235
