Regex to find theme information on a WordPress-based site

Stack Overflow user
Asked 2015-01-19 22:34:13
Answers: 1 | Views: 198 | Followers: 0 | Votes: 1

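I'm trying to build a script that goes into the CSS file of any given WordPress site and retrieves the theme information. The problem is that when I scrape the page, all the line breaks turn into spaces, and the order of the fields is not always the same. For example: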

/*
Theme Name: ColorWay
Theme URI: http://www.inkthemes.com/wp-themes/colorway-wp-theme/
Description: Colorway is Simple, Elegant, Responsive and beautiful Theme with Easy Customization Options built by InkThemes.com. The Customization Options includes using your own Logos, Backgrounds, Analytics and your own Custom Footer Texts and Analytics that can be tweaked using Theme Options Panel. Colorway Theme is Single Click Intall feature, Just press activate button and your website will get ready with all the dummy content. Just set the content from the Themes Options Panel. Colorway by InkThemes.com is suitable for any business or personal website. The Theme can work for various different niches. It includes special styles for Gallery pages, and has an optional fullwidth page template as well.
Author: InkThemes.com
Author URI: http://www.inkthemes.com
Version: 2.5.1
License: GNU General Public License
License URI: license.txt
Tags: black, blue, green, white, gray, custom-menu, dark, two-columns, fixed-width, custom-header, custom-background, threaded-comments, sticky-post, custom-colors, custom-header, custom-menu, light, theme-options, editor-style
*/

That would probably be simple to parse, but after scraping I get the following:

/* Theme Name: ColorWay Theme URI: www.inkthemes. com/wp-themes/colorway-wp-theme/ Description: Colorway is Simple, Elegant, Responsive and beautiful Theme with Easy Customization Options built by InkThemes.com. The Customization Options includes using your own Logos, Backgrounds, Analytics and your own Custom Footer Texts and Analytics that can be tweaked using Theme Options Panel. Colorway Theme is Single Click Intall feature, Just press activate button and your website will get ready with all the dummy content. Just set the content from the Themes Options Panel. Colorway by InkThemes .com is suitable for any business or personal website. The Theme can work for various different niches. It includes special styles for Gallery pages, and has an optional fullwidth page template as well. Author: InkThemes.com Author URI: www. inkthemes. com Version: 2.5.1 License: GNU General Public License License URI: license.txt Tags: black, blue, green, white, gray, custom-menu, dark, two-columns, fixed-width, custom-header, custom-background, threaded-comments, sticky-post, custom-colors, custom-header, custom-menu, light, theme-options, editor-style */

It is just one block of text. How would you go about this?

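Edit:

Here is an example that works when the code is not all on one line:

Sorry, I thought you meant the URL I wanted to scrape.

Here is an example that works on the source code if I simply copy it: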

// file_get_html() is assumed to come from the Simple HTML DOM library.
$html = file_get_html('http://website-addons.net/wp-content/themes/powermag/style.css?ver=all');

// Each pattern relies on a line break (or the next whitespace) to end the value.
preg_match("/Theme\sName:\s?(.+)/", $html, $themename);
preg_match("/Theme\sURI:\s?(.+?)\s/", $html, $uri);
preg_match("/Version:(\s?.+?)\s/", $html, $version);
preg_match("/Description:(.+)\s/", $html, $desc);
preg_match("/Author:(.+?)\s/", $html, $author);

echo $themename[1];

With the scraped page, this no longer works. I just get one big block of code.
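
The patterns above end a value at a line break because `.` does not match a newline by default. Once the breaks have become spaces, `(.+)` runs on to the end of the input. A minimal sketch of that difference, using a shortened, hypothetical header rather than the real stylesheet:

```php
<?php
// Hypothetical sample header; the two strings differ only in their line breaks.
$multiLine  = "/*\nTheme Name: ColorWay\nVersion: 2.5.1\n*/";
$singleLine = "/* Theme Name: ColorWay Version: 2.5.1 */";

// Same pattern as in the question.
$pattern = '/Theme\sName:\s?(.+)/';

preg_match($pattern, $multiLine, $a);
preg_match($pattern, $singleLine, $b);

echo $a[1], "\n"; // ColorWay
echo $b[1], "\n"; // ColorWay Version: 2.5.1 */
```

In the single-line case nothing tells the pattern where the value ends, so any fix has to stop at the next field label instead of at a newline.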


1 Answer

Stack Overflow user

Answered 2015-01-20 04:45:38

Sure. I think this should handle both single-line and multi-line input.

 # '/(?s)^(?=.*?\bTheme[ ]+Name:[ ]*(?<theme_name>(?&info)))?(?=.*?\bTheme[ ]+URI:[ ]*(?<theme_uri>(?&info)))?(?=.*?\bDescription:[ ]*(?<desc>(?&info)))?(?=.*?\bAuthor:[ ]*(?<author>(?&info)))?(?=.*?\bAuthor[ ]+URI:[ ]*(?<author_uri>(?&info)))?(?=.*?\bVersion:[ ]*(?<version>(?&info)))?(?=.*?\bLicense:[ ]*(?<license>(?&info)))?(?=.*?\bLicense[ ]+URI:[ ]*(?<license_uri>(?&info)))?(?=.*?\bTags:[ ]*(?<tags>(?&info)))?(?(1)|(?(2)|(?(3)|(?(4)|(?(5)|(?(6)|(?(7)|(?(8)|(?(9)|(?!))))))))))(?(DEFINE)(?<info>(?-s:(?![ ]*\b(?:Theme[ ]+Name:|Theme[ ]+URI:|Description:|Author:|Author[ ]+URI:|Version:|License:|License[ ]+URI:|Tags:)).)*))/'


 (?s)                                    # Dot all modifier
 ^                                       # BOS
                                         # Series of lookaheads, optional (ie. independent order, change if need be)
 (?=
      .*? \b Theme [ ]+ Name: [ ]* 
      (?<theme_name> (?&info) )               # (1), Theme, optional
 )?
 (?=
      .*? \b Theme [ ]+ URI: [ ]* 
      (?<theme_uri> (?&info) )                # (2), Theme URI, optional
 )?
 (?=
      .*? \b Description: [ ]* 
      (?<desc> (?&info) )                     # (3), Description
 )?
 (?=
      .*? \b Author: [ ]* 
      (?<author> (?&info) )                   # (4), Author
 )?
 (?=
      .*? \b Author [ ]+ URI: [ ]* 
      (?<author_uri> (?&info) )               # (5), Author URI
 )?
 (?=
      .*? \b Version: [ ]* 
      (?<version> (?&info) )                  # (6), Version
 )?
 (?=
      .*? \b License: [ ]* 
      (?<license> (?&info) )                  # (7), License
 )?
 (?=
      .*? \b License [ ]+ URI: [ ]* 
      (?<license_uri> (?&info) )              # (8), License URI
 )?
 (?=
      .*? \b Tags: [ ]* 
      (?<tags> (?&info) )                     # (9), Tags
 )?

 (?(1)                                   # Conditional, Fail if nothing matched
   |  (?(2)
        |  (?(3)
             |  (?(4)
                  |  (?(5)
                       |  (?(6)
                            |  (?(7)
                                 |  (?(8)
                                      |  (?(9)
                                           |  (?!)
                                         )
                                    )
                               )
                          )
                     )
                )
           )
      )
 )

 (?(DEFINE)                              # Subroutines
      (?<info>                                # (10 start), Info
           (?-s:                                   # Cluster, data is on same line (remove '-s' if need be)
                (?!                                     # Blacklist - Not any other categories, add more here
                     [ ]*                                    # trim spaces before next block
                     \b 
                     (?:
                          Theme [ ]+ Name: 
                       |  Theme [ ]+ URI: 
                       |  Description: 
                       |  Author: 
                       |  Author [ ]+ URI: 
                       |  Version: 
                       |  License: 
                       |  License [ ]+ URI: 
                       |  Tags: 
                     )
                )
                .                                       # Grab a data character
           )*                                      # End Cluster, do 0 to many times
      )                                       # (10 end)
 )
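
One way the pattern above might be used from PHP is sketched below. Instead of pasting the long one-line form, it assembles the same pattern from a list of labels. The `$fields` array, the variable names, and the shortened sample header are assumptions made for illustration, not part of the original answer.

```php
<?php
// Capture-group names and label sub-patterns, in the order the answer lists them.
$fields = [
    'theme_name'  => 'Theme[ ]+Name:',
    'theme_uri'   => 'Theme[ ]+URI:',
    'desc'        => 'Description:',
    'author'      => 'Author:',
    'author_uri'  => 'Author[ ]+URI:',
    'version'     => 'Version:',
    'license'     => 'License:',
    'license_uri' => 'License[ ]+URI:',
    'tags'        => 'Tags:',
];

// One optional lookahead per field, so the fields may appear in any order.
$lookaheads = '';
foreach ($fields as $name => $label) {
    $lookaheads .= "(?=.*?\\b{$label}[ ]*(?<{$name}>(?&info)))?";
}

// Nested conditionals: fail with (?!) only if none of groups 1..9 took part.
$guard = '(?!)';
for ($i = count($fields); $i >= 1; $i--) {
    $guard = "(?({$i})|{$guard})";
}

// The "info" subroutine: same-line characters up to the next known label.
$define = '(?(DEFINE)(?<info>(?-s:(?![ ]*\b(?:' . implode('|', $fields) . ')).)*))';

$pattern = '/(?s)^' . $lookaheads . $guard . $define . '/';

// Hypothetical single-line header, shortened from the question's sample.
$css = '/* Theme Name: ColorWay Version: 2.5.1 Author: InkThemes.com Tags: black, blue */';

$found = [];
if (preg_match($pattern, $css, $m) === 1) {
    foreach (array_keys($fields) as $name) {
        // Fields that did not take part in the match come back empty or unset.
        if (($m[$name] ?? '') !== '') {
            $found[$name] = $m[$name];
        }
    }
}
print_r($found);
```

As in the single-line output listed in this answer, the last field in the header keeps the trailing `*/`, so the caller may want to trim it.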

Output for the multi-line sample:

 **  Grp 0 -  ( pos 0 , len 0 )  EMPTY 
 **  Grp 1 -  ( pos 16 , len 8 ) 
ColorWay
 **  Grp 2 -  ( pos 37 , len 53 ) 
http://www.inkthemes.com/wp-themes/colorway-wp-theme/
 **  Grp 3 -  ( pos 105 , len 699 ) 
Colorway is Simple, Elegant, Responsive and beautiful Theme with Easy Customization Options built by InkThemes.com. The Customization Options includes using your own Logos, Backgrounds, Analytics and your own Custom Footer Texts and Analytics that can be tweaked using Theme Options Panel. Colorway Theme is Single Click Intall feature, Just press activate button and your website will get ready with all the dummy content. Just set the content from the Themes Options Panel. Colorway by InkThemes.com is suitable for any business or personal website. The Theme can work for various different niches. It includes special styles for Gallery pages, and has an optional fullwidth page template as well.
 **  Grp 4 -  ( pos 814 , len 13 ) 
InkThemes.com
 **  Grp 5 -  ( pos 841 , len 24 ) 
http://www.inkthemes.com
 **  Grp 6 -  ( pos 876 , len 5 ) 
2.5.1
 **  Grp 7 -  ( pos 892 , len 26 ) 
GNU General Public License
 **  Grp 8 -  ( pos 933 , len 11 ) 
license.txt
 **  Grp 9 -  ( pos 952 , len 221 ) 
black, blue, green, white, gray, custom-menu, dark, two-columns, fixed-width, custom-header, custom-background, threaded-comments, sticky-post, custom-colors, custom-header, custom-menu, light, theme-options, editor-style
 **  Grp 10 -  NULL 

Output for the single-line sample:

 **  Grp 0 -  ( pos 0 , len 0 )  EMPTY 
 **  Grp 1 -  ( pos 15 , len 8 ) 
ColorWay
 **  Grp 2 -  ( pos 35 , len 47 ) 
www.inkthemes. com/wp-themes/colorway-wp-theme/
 **  Grp 3 -  ( pos 96 , len 700 ) 
Colorway is Simple, Elegant, Responsive and beautiful Theme with Easy Customization Options built by InkThemes.com. The Customization Options includes using your own Logos, Backgrounds, Analytics and your own Custom Footer Texts and Analytics that can be tweaked using Theme Options Panel. Colorway Theme is Single Click Intall feature, Just press activate button and your website will get ready with all the dummy content. Just set the content from the Themes Options Panel. Colorway by InkThemes .com is suitable for any business or personal website. The Theme can work for various different niches. It includes special styles for Gallery pages, and has an optional fullwidth page template as well.
 **  Grp 4 -  ( pos 805 , len 13 ) 
InkThemes.com
 **  Grp 5 -  ( pos 831 , len 19 ) 
www. inkthemes. com
 **  Grp 6 -  ( pos 860 , len 5 ) 
2.5.1
 **  Grp 7 -  ( pos 875 , len 26 ) 
GNU General Public License
 **  Grp 8 -  ( pos 915 , len 11 ) 
license.txt
 **  Grp 9 -  ( pos 933 , len 224 ) 
black, blue, green, white, gray, custom-menu, dark, two-columns, fixed-width, custom-header, custom-background, threaded-comments, sticky-post, custom-colors, custom-header, custom-menu, light, theme-options, editor-style */
 **  Grp 10 -  NULL 
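
The listings above amount to a label-to-value map. As a hedged alternative to the single large pattern (a different technique, not the answer's method), the same map could be built with `preg_match_all`, letting each value end just before the next known label, the closing `*/`, or the end of the input. The function name `parseThemeHeader` and the sample strings are hypothetical, and the type hints assume PHP 7 or later.

```php
<?php
// Labels taken from the sample header in the question.
$labels = 'Theme[ ]+Name|Theme[ ]+URI|Description|Author[ ]+URI|Author'
        . '|Version|License[ ]+URI|License|Tags';

// Label, colon, then the shortest run of text that ends right before
// the next label, the closing "*/", or the end of the input.
$pattern = "~\\b({$labels}):\\s*(.*?)\\s*(?=\\b(?:{$labels}):|\\*/|\\z)~s";

function parseThemeHeader(string $css, string $pattern): array
{
    $info = [];
    preg_match_all($pattern, $css, $matches, PREG_SET_ORDER);
    foreach ($matches as $m) {
        // Collapse inner whitespace so "Theme   Name" and "Theme Name" share a key.
        $key = preg_replace('/\s+/', ' ', $m[1]);
        $info[$key] = $m[2];
    }
    return $info;
}

// Hypothetical inputs: the same kind of header with and without line breaks.
$single = '/* Theme Name: ColorWay Theme URI: http://example.com/theme/ Version: 2.5.1 Tags: black, blue */';
$multi  = "/*\nTheme Name: ColorWay\nVersion: 2.5.1\n*/";

$fromSingle = parseThemeHeader($single, $pattern);
$fromMulti  = parseThemeHeader($multi, $pattern);
print_r($fromSingle);
print_r($fromMulti);
```

Like the answer's pattern, this sketch would split a value early if the text itself contained one of the labels followed by a colon.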
Votes: 0
Original content provided by Stack Overflow.
Original link: https://stackoverflow.com/questions/28027012
