I have a txt file with many short URLs, one URL per line. I want to resolve each URL to get its final link; some of the URLs are redirected twice. How can I automate this so that the output is one final URL per line?

Update: input text file:
http://www.example.com/go/post-page-1
http://www.example.com/go/post-page-2
http://www.example.com/go/post-page-3

Desired output format in the txt file:
http://www.example.org/post-page-name
http://www.example.org/post-page-name
http://www.example.org/post-page-name

Here is how the links redirect:
Initial URL: http://www.example.com/go/post-page
==>301 Permanent Redirect
Intermediate URL: http://click.affiliate.com/tracking?url=http://www.example.org/post-page-name
==>302 Temporary Redirect
Final URL: http://www.example.org/post-page-name

Here is the code I have tried, but it resolves the URLs only to the intermediate link, not to the final one:
#!/bin/bash
rm resolved_urls.txt
for url in $(cat url.txt); do
wget -S "$url" 2>&1 | grep ^Location >> resolved_urls.txt
done

Posted on 2014-08-26 02:02:15
So, it's not 100% clear what you want. But from what I can see, and what I'm guessing, I think this will do it for you:
#! /bin/bash
# Use the urls.txt as your input file for wget
# Use the url-redirect.txt as your output file from wget.
wget -S -i urls.txt -o url-redirect.txt
# Grep for your "Final URL" output, extract the URL, assuming
# the output you provided is what you're looking for, and is
# uniform, and redirect to your resolved_urls.txt file.
grep 'Final URL' url-redirect.txt | cut -d ' ' -f3>resolved_urls.txt
# Remove your trash temp file.
rm url-redirect.txt

This could be a lot faster without all the redirects, but I think it meets your needs.
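One caveat with the approach above: `wget -S` does not print a literal "Final URL" line; it dumps the raw HTTP response headers, so the final target of a redirect chain is the value of the *last* `Location:` header. A minimal sketch of extracting it with awk, using invented sample headers in place of real wget output (this handles a single redirect chain; with `-i urls.txt` you would need to split the log per input URL first):

```shell
#!/bin/bash
# Sample of the headers wget -S prints for one redirect chain
# (invented for illustration, matching the chain in the question).
log='  HTTP/1.1 301 Moved Permanently
  Location: http://click.affiliate.com/tracking?url=http://www.example.org/post-page-name
  HTTP/1.1 302 Found
  Location: http://www.example.org/post-page-name
  HTTP/1.1 200 OK'

# awk keeps only the value of the last Location header seen.
final=$(printf '%s\n' "$log" | awk '$1 == "Location:" { url = $2 } END { print url }')
echo "$final"    # http://www.example.org/post-page-name
```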
Posted on 2014-08-26 02:07:37
Try something like this:
#!/bin/bash
function getFinalRedirect {
    local url=$1
    while true; do
        # Fetch the headers only; strip the trailing carriage return
        # that HTTP header lines carry, or it ends up in the URL.
        nextloc=$( curl -s -I "$url" | grep '^Location:' | tr -d '\r' )
        if [ -n "$nextloc" ]; then
            url=${nextloc##Location: }
        else
            break
        fi
    done
    echo "$url"
}
url="http://stackoverflow.com/q/25485374/1563512"
getFinalRedirect "$url"

Beware of infinite redirects. This produces:
$ ./test.bash
http://stackoverflow.com/questions/25485374/how-to-resolve-url-redirects

Then, call the function on the file:
while read -r url; do
getFinalRedirect $url
done < urls.txt > finalurls.txt

https://stackoverflow.com/questions/25485374
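To guard against the infinite redirects warned about above, a hop limit can be added. The sketch below is a hedged variant of the function; `resolve` is a hypothetical stand-in for the `curl -s -I "$url" | grep '^Location:'` call, so the guard logic can be demonstrated without a network:

```shell
#!/bin/bash
# Variant of getFinalRedirect with a maximum hop count.
function getFinalRedirectSafe {
    local url=$1 max_hops=${2:-10} hops=0 nextloc
    while [ "$hops" -lt "$max_hops" ]; do
        nextloc=$(resolve "$url")   # stand-in for: curl -s -I "$url" | grep '^Location:' | tr -d '\r'
        if [ -n "$nextloc" ]; then
            url=${nextloc##Location: }
            hops=$((hops + 1))
        else
            echo "$url"
            return 0
        fi
    done
    echo "too many redirects: $url" >&2
    return 1
}

# Hypothetical resolver: "a" -> "b" -> "c" (final); "x" redirects to itself.
function resolve {
    case $1 in
        a) echo "Location: b" ;;
        b) echo "Location: c" ;;
        x) echo "Location: x" ;;
        *) ;;  # no Location header: this is the final URL
    esac
}

getFinalRedirectSafe a               # prints: c
getFinalRedirectSafe x 2>/dev/null   # returns 1 after 10 hops
```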