Q: Replace all relative URLs with absolute URLs

Stack Overflow user

Asked 2018-02-16 23:27:01

2 answers · 5.2K views · 0 followers · 4 votes

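I've seen a few answers (like this one), but I have a more complex scenario that I'm not sure how to account for.

I essentially have full HTML documents, and I need to replace every relative URL with an absolute URL.

Elements in the HTML may look like the following, though there may be other cases as well: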

<img src="/relative/url/img.jpg" />
<form action="/">
<form action="/contact-us/">
<a href='/relative/url/'>Note the Single Quote</a>
<img src="//example.com/protocol-relative-img.jpg" />

The expected output would be:

// "//example.com/" is ideal, but "http(s)://example.com/" are acceptable

<img src="//example.com/relative/url/img.jpg" />
<form action="//example.com/">
<form action="//example.com/contact-us/">
<a href='//example.com/relative/url/'>Note the Single Quote</a>
<img src="//example.com/protocol-relative-img.jpg" /> <!-- Unmodified -->

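I don't want to replace protocol-relative URLs, since they already function as absolute URLs. I've come up with some code that works, but I'm wondering whether I can clean it up a bit, because it's extremely repetitive.

I have to account for both single- and double-quoted attribute values for src, href, and action (am I missing any attributes that can hold relative URLs?) while leaving protocol-relative URLs alone.

Here is what I have so far: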

// Make URL replacement protocol relative to not break insecure/secure links
$url = str_replace( array( 'http://', 'https://' ), '//', $url );

// Temporarily Modify Protocol-Relative URLS
$str = str_replace( 'src="//', 'src="::TEMP_REPLACE::', $str );
$str = str_replace( "src='//", "src='::TEMP_REPLACE::", $str );
$str = str_replace( 'href="//', 'href="::TEMP_REPLACE::', $str );
$str = str_replace( "href='//", "href='::TEMP_REPLACE::", $str );
$str = str_replace( 'action="//', 'action="::TEMP_REPLACE::', $str );
$str = str_replace( "action='//", "action='::TEMP_REPLACE::", $str );

// Replace all other Relative URLS
$str = str_replace( 'src="/', 'src="'. $url .'/', $str );
$str = str_replace( "src='/", "src='". $url ."/", $str );
$str = str_replace( 'href="/', 'href="'. $url .'/', $str );
$str = str_replace( "href='/", "href='". $url ."/", $str );
$str = str_replace( 'action="/', 'action="'. $url .'/', $str );
$str = str_replace( "action='/", "action='". $url ."/", $str );

// Change Protocol Relative URLs back
$str = str_replace( 'src="::TEMP_REPLACE::', 'src="//', $str );
$str = str_replace( "src='::TEMP_REPLACE::", "src='//", $str );
$str = str_replace( 'href="::TEMP_REPLACE::', 'href="//', $str );
$str = str_replace( "href='::TEMP_REPLACE::", "href='//", $str );
$str = str_replace( 'action="::TEMP_REPLACE::', 'action="//', $str );
$str = str_replace( "action='::TEMP_REPLACE::", "action='//", $str );

I mean, it works, but it's ugly, and I suspect there's a better way to do it.

Tags: php, url, str-replace, relative-path, absolute-path

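2 Answers

Stack Overflow user

Accepted answer

Posted 2018-02-17 04:46:44

New answer

If your real HTML document is valid (and has a parent/containing tag), the most appropriate and reliable technique is to use a proper DOM parser.

Here is how DOMDocument and XPath can be used to target and replace the designated tag attributes:

Code 1 - Nested XPath queries: (Demo)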

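$domain = '//example.com';
$tagsAndAttributes = [
    'img' => 'src',
    'form' => 'action',
    'a' => 'href'
];

$dom = new DOMDocument;
// $html holds the markup to process; with these flags it needs a single containing element
$dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$xpath = new DOMXPath($dom);
foreach ($tagsAndAttributes as $tag => $attr) {
    // Match only root-relative values: one leading slash, not two.
    // Requiring the leading slash also skips missing attributes and full URLs.
    $query = "//{$tag}[starts-with(@{$attr}, '/') and not(starts-with(@{$attr}, '//'))]";
    foreach ($xpath->query($query) as $node) {
        $node->setAttribute($attr, $domain . $node->getAttribute($attr));
    }
}
echo $dom->saveHTML();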

Code 2 - Single XPath query with a conditional block: (Demo)

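$domain = '//example.com';
// Each target matches only root-relative values: one leading slash, not two.
// Requiring the leading slash also skips missing attributes and full URLs.
$targets = [
    "//img[starts-with(@src, '/') and not(starts-with(@src, '//'))]",
    "//form[starts-with(@action, '/') and not(starts-with(@action, '//'))]",
    "//a[starts-with(@href, '/') and not(starts-with(@href, '//'))]"
];

$dom = new DOMDocument;
// $html holds the markup to process; with these flags it needs a single containing element
$dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$xpath = new DOMXPath($dom);
// Join the targets with the XPath union operator and handle all matches in one pass
foreach ($xpath->query(implode('|', $targets)) as $node) {
    if ($src = $node->getAttribute('src')) {
        $node->setAttribute('src', $domain . $src);
    } elseif ($action = $node->getAttribute('action')) {
        $node->setAttribute('action', $domain . $action);
    } else {
        $node->setAttribute('href', $domain . $node->getAttribute('href'));
    }
}
echo $dom->saveHTML();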
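
The question also asks whether other tags or attributes can carry relative URLs (for example <link href> or <script src>). As a rough sketch only, the same DOMDocument/XPath idea can be wrapped in a function that checks an attribute on any element rather than on a fixed tag list. The function name prefixRootRelativeUrls and its default attribute list are assumptions made for illustration, not part of the original answer, and like the snippets above it expects markup with a single containing element.

```php
<?php
// Hypothetical helper (not from the original answer): prefix root-relative URLs
// found in the given attributes, on any element, with $domain.
function prefixRootRelativeUrls(
    string $html,
    string $domain,
    array $attributes = ['src', 'href', 'action']
): string {
    $dom = new DOMDocument;

    // Collect parser warnings internally instead of emitting them,
    // since real-world markup is rarely perfectly valid.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    foreach ($attributes as $attr) {
        // Any element whose attribute starts with exactly one slash.
        $query = "//*[starts-with(@{$attr}, '/') and not(starts-with(@{$attr}, '//'))]";
        foreach ($xpath->query($query) as $node) {
            $node->setAttribute($attr, $domain . $node->getAttribute($attr));
        }
    }

    return $dom->saveHTML();
}

// Usage with made-up sample markup:
$sample = '<div><img src="/a.jpg"><a href=\'/b/\'>x</a>'
        . '<img src="//cdn.example.com/c.jpg"></div>';
echo prefixRootRelativeUrls($sample, '//example.com');
```

Document-relative values such as img.jpg or ../img.jpg are deliberately left alone here, because resolving them requires knowing the URL of the page itself.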

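Old answer: (regex is not "DOM-aware" and is vulnerable to unexpected breakage)

If I understand you correctly, you have a base value in mind and you only want to apply it to relative paths.

Pattern Demo

Code: (Demo)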

$html=<<<HTML
<img src="/relative/url/img.jpg" />
<form action="/">
<a href='/relative/url/'>Note the Single Quote</a>
<img src="//site.com/protocol-relative-img.jpg" />
HTML;

$base='https://example.com';

echo preg_replace('~(?:src|action|href)=[\'"]\K/(?!/)[^\'"]*~',"$base$0",$html);

Output:

<img src="https://example.com/relative/url/img.jpg" />
<form action="https://example.com/">
<a href='https://example.com/relative/url/'>Note the Single Quote</a>
<img src="//site.com/protocol-relative-img.jpg" />

Pattern breakdown:

~                      #Pattern delimiter
(?:src|action|href)    #Match: src or action or href
=                      #Match equal sign
[\'"]                  #Match single or double quote
\K                     #Restart fullstring match (discard previously matched characters)
/                      #Match slash
(?!/)                  #Negative lookahead (zero-length assertion): must not be a slash immediately after first matched slash
[^\'"]*                #Match zero or more non-single/double quote characters
~                      #Pattern delimiter
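
To make the "not DOM-aware" caveat concrete, here is a small sketch that runs the same pattern over a few inputs where it behaves differently from a parser. The sample strings are invented for illustration.

```php
<?php
// Same pattern and base value as in the answer above.
$pattern = '~(?:src|action|href)=[\'"]\K/(?!/)[^\'"]*~';
$base = 'https://example.com';

// Hypothetical inputs, chosen to show where a text-level match goes wrong.
$samples = [
    // A custom attribute whose name merely ends in "href" is rewritten.
    '<div data-href="/panel/">',
    // Plain text that looks like an attribute is rewritten too.
    '<p>Write href="/docs/" in your markup.</p>',
    // Whitespace around "=" is valid HTML, but the pattern does not match it.
    '<a href = "/spaced/">',
];

foreach ($samples as $sample) {
    echo preg_replace($pattern, $base . '$0', $sample), PHP_EOL;
}
// Prints:
// <div data-href="https://example.com/panel/">
// <p>Write href="https://example.com/docs/" in your markup.</p>
// <a href = "/spaced/">
```

Whether these cases matter depends on the markup being processed; for tightly controlled input the regex may be perfectly adequate.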

Votes: 7

Stack Overflow user

Posted 2018-02-17 02:57:49

I think the <base> element is what you're looking for.

https://developer.mozilla.org/en-US/docs/Web/HTML/Element/base

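<base> is a void element that belongs in <head>. Adding <base href="https://example.com/path/" /> makes every relative URL in the document resolve against https://example.com/path/ instead of the document's own URL. Root-relative URLs such as /relative/url/ resolve against the origin of that base (https://example.com), ignoring its path.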
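
Since the question is about PHP, one way to apply this might be to inject the <base> element on the server before sending the document. The following is a minimal sketch under that assumption; the function name addBaseElement is made up for illustration, and it expects a full document that already contains a <head>.

```php
<?php
// Hypothetical helper: insert <base href="..."> as the first child of <head>.
function addBaseElement(string $html, string $baseHref): string
{
    $dom = new DOMDocument;

    // Collect parser warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $head = $dom->getElementsByTagName('head')->item(0);
    if ($head === null) {
        // No <head> to attach to; return the input untouched.
        return $html;
    }

    $base = $dom->createElement('base');
    $base->setAttribute('href', $baseHref);
    // Put <base> first so it precedes any element in <head> that carries a URL.
    $head->insertBefore($base, $head->firstChild);

    return $dom->saveHTML();
}

// Usage with made-up sample markup:
$page = '<html><head><title>Demo</title></head>'
      . '<body><img src="/relative/url/img.jpg"></body></html>';
echo addBaseElement($page, 'https://example.com/path/');
```

Unlike the rewriting approaches, this leaves the attribute values in the markup unchanged and relies on the browser to resolve them, so it only helps where the HTML is rendered by something that honors <base>.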

Votes: 6

Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/48836281
