
How to scrape a login-protected .jsp page?

Stack Overflow user
Asked on 2020-04-08 21:01:12
1 answer · 94 views · 0 followers · 1 vote

I want to scrape some data from a website that is built with JavaServer Pages (JSP) and protected by a login.

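The problem is that the login page is generated dynamically. At first I found that I could not log in because I could not even load the login page. The login page's URL looks like https://xxxx.xxxxxxx.edu.au/login/pages/login.jsp. Here is my Python code: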

import urllib.request as req
import bs4

def print_HTML(url):
    # Send a browser-like User-Agent, then fetch and pretty-print the page.
    request = req.Request(url, headers={"User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:31.0) Gecko/20100101 Firefox/31.0"})
    with req.urlopen(request) as response:
        data = response.read().decode("utf-8")
    html = bs4.BeautifulSoup(data, "html.parser")
    print(html.prettify())

Here is the output:

<head>
 <meta content="no-cache" http-equiv="pragma"/>
 <meta content="no-cache" http-equiv="cache-control"/>
 <meta content="0" http-equiv="expires"/>
 <noscript>
  <meta content="1;url=https://my.xxxxxxx.edu.au/studentportal/faces/home" http-equiv="refresh"/>
 </noscript>
 <script type="text/javascript">
  function delayedRedirect(){
                        window.location = "https://my.xxxxxxx.edu.au/studentportal/faces/home";
                }
 </script>
 <title>
  Login redirect page
 </title>
</head>
<body onload="setTimeout('delayedRedirect()', 1000)">
 <i> 
  Redirecting to... https://my.xxxxxxx.edu.au/studentportal/faces/home
 </i>
</body>

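After that, I went back to the last page before the login, which looks like https://xxxxxx.xxx.xxxxxxx.edu.au, printed its HTML, and found that the href leading to the login page is https://xxxxxx.xxx.xxxxxxx.edu.au/login/saml. However, when I try to print that page, I get: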

<html>
 <head>
  <base target="_self"/>
 </head>
 <body onload="document.myForm.submit()">
  <noscript>
   <p> 
    JavaScript is required. Enable JavaScript to use OAM Server.
   </p>
  </noscript>
  <form action="https://auth.xxxxxxx.edu.au/login/pages/login.jsp" method="post" name="myForm">
   <!------------ DO NOT REMOVE ------------->
   <!----- loginform renderBrowserView ----->
   <!-- Required for SmartView Integration --> 
   <input name="contextType" type="hidden" value="external"/>
   <input name="username" type="hidden" value="string"/>
   <input name="OverrideRetryLimit" type="hidden" value="6"/>
   <input name="password" type="hidden" value="secure_string"/>
   <input name="challenge_url" type="hidden" value="https%3A%2F%2Fauth.unimelb.edu.au%2Flogin%2Fpages%2Flogin.jsp"/>
   <input name="request_id" type="hidden" value="1031689933436939677"/>
   <input name="authn_try_count" type="hidden" value="0"/>
   <input name="OAM_REQ" type="hidden" value="VERSION_4~Dx0y9HYwplTsrfWQuqCU5Y2hQlk96FnIkBSXmLxTfuyLy0XUtqGK20TF4Z7nTGFfHouR5m7KmcK96in%2f670EPaaukVhyOLld36hlyZe4ZtPW9Bvz%2bs%2fN%2fQXgcBw5z5ppJksT6HckJtxSI1TSWL5fPHKBjCQk0MuIzrxmH%2b%2fP4NnoeL73NCL4mCLoIu6NrPQ8q28kYR8Gi2Qh9i1mqOtr1QXl%2bXzeAXMS6ShA307odSH%2fT1GzsEcxTEEPKd7JLXUd8Z28iQM4t5PyVQJVHqiqTgyVxvFgiPlsrs%2bBb%2bhJ1tmCyvuPPsCc9cOsX7p1Jg0gHZkoRJjxrbYhXKVqJvAj9HhBve5zI6Hs73m6YyKyWgztO3gmlj5clBHMAzEY5EJ4MU8OojP6fxdd5cRL2GQPUQ6cGk9IV4HOSV2SPCaKdzkXGt5DwLXnMLsx3AJpiPEXptSns%2fDm%2fzPcnWbtD%2fZrFKgM%2b6hatFtlsFPk65N0fbNu1T5FMGVNioqIVBbkdcNyEHyoPmioCBXb9eB5KWXdGDudLApKy0nVdLjrYE14hRDwZstX8SkpqvKhjKB5JeiWCKuPvPe%2bWFg6ZcVftSj3UuaNaH%2f4Wst4suXGKq9t8di2e1kbJAV5pBamxkwVKrHJ9cz%2boJzqgJ5Cx6s1dxb%2brHBxTw6VJ%2f9otIlaplxNvKwilRUOhXqgoGVJxsVp5z9BDdnWt%2fzgjK8Rxq6qtQt1LfmM5pSdNB3Rn%2b6Q3S0kgofs7goOr%2bEqo1Fc3kTxn%2ffMjvASU%2fdYwFuVafahA4lkgplHT7986SdHt8V1A5dLLRSdX8PgwHMd4XlJHYEkw1Neeoog%2fG7Lq%2fysG%2bfDc5rCvjoj0gLZy%2fowUhgqYwaZvfNGLNkH7H802e0bP59Ms3IU605%2f9o7in%2bS1u3ZE3PnNabP1pu0somVqcRxz8hxOEkRbRLHZwYB%2fTNvAalywCAZ9sCwweH8tU0oFAuXwWdUDuviq8Hz%2bBWwhHEJkSfv%2b100lgRBlX6p%2b9HJYW4cqgcXU1oT%2f8qBywYHw1Ap6DmZb6L0S7MUNw%2ft8%2fg%2bO5NwGRbrOjlV0cQ4tCEU3ehZiEnXwuunjVOAfjjiyACjkfstnY7vSsFbcWEeBwtvZIW2RXFFV2qYPaS6iqZxlt0fWpV2VvL%2fb9BipKOgtJxFigvnsSa5a9THBrlBM%2byA1pNNI2dm3s18Fx68z0oIQhNDEVVx7Q7oOl5TBdUxYgU7uWrkqtKf%2flxvGrsKEmhdWModmOIiYKq2I4U6KcYmN2fogi7neh6t%2fZbg7%2bMQ%2fvQAeVOrKpJWB558DXm0qDW65msxQgmwhg6ct7D29iSOVDyLGpnrMAw5QU%2fB7jwx5OinbJ83UyGCJqTm0T9%2fm9fAq4ofjQ0p3YV62iokrCC0E0ZR7GBh6%2bFaaElOSdoL1nxdVJN2KNXTuwFgg8iK5%2fPVcoYgLsCRXGq0Dutwaf%2fp6UgjdTKHz5y0W4DO3ZTsPF4jhhWUJ%2fvtG3slDJHN1EOb78ACnrAi5S4q109xFPqj8s5U835yUdaHIMFXxMpT2pWutWtbC39p09y9LXuwUM4obMutVmA5EvYvSLqPnu3KAiMGDttfbvmkA9AjSDvV6mAwv8k9urj%2bo%2fSQkFxNt3aUD4ymERZ7ksyjQbm2ud%2f15gFvfNizTRE6JsamIWO4UICJUX6Pr7A%3d%3d"/>
   <input name="locale" type="hidden" value="en_AU"/>
   <input name="resource_url" type="hidden" value="%252Fuser%252Floginsso"/>
  </form>
 </body>
</html>

Is there a way for my Python program to get to the .jsp page? Thanks.


1 Answer

Stack Overflow user

Answered on 2020-04-08 21:14:51

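Open DevTools in your browser, click the Network tab, go to https://auth.xxxxxxx.edu.au/login/pages/login.jsp, and log in. Inspect what the browser sends in the POST request. If it is just simple authentication, copy the whole POST request, including its headers and so on, and put all of it into your own request.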
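As a rough illustration of replaying that POST from Python, the sketch below reads an auto-submitting form like the one shown in the question, collects its action URL and hidden inputs, and builds (but does not send) a matching POST request using only the standard library. The names `FormParser`, `build_login_request`, and `make_opener` are hypothetical helpers, and it is an assumption that the server accepts the credentials in the `username` and `password` fields; the real field names and any extra headers should be taken from what DevTools shows.

```python
from html.parser import HTMLParser
from http.cookiejar import CookieJar
from urllib.parse import urlencode
import urllib.request as req


class FormParser(HTMLParser):
    """Collect the action URL and named <input> values of the first form."""

    def __init__(self):
        super().__init__()
        self.action = None
        self.fields = {}
        self._in_form = False
        self._done = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and not self._done:
            self._in_form = True
            self.action = attrs.get("action")
        elif tag == "input" and self._in_form and attrs.get("name"):
            self.fields[attrs["name"]] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form" and self._in_form:
            self._in_form = False
            self._done = True


def build_login_request(form_html, username, password):
    """Build a POST request that mirrors the form, with credentials filled in."""
    parser = FormParser()
    parser.feed(form_html)
    if parser.action is None:
        raise ValueError("no <form> with an action attribute found")
    fields = dict(parser.fields)
    # Assumption: these two field names carry the credentials.
    fields["username"] = username
    fields["password"] = password
    body = urlencode(fields).encode("ascii")
    # Supplying data makes urllib issue a POST instead of a GET.
    return req.Request(parser.action, data=body,
                       headers={"User-Agent": "Mozilla/5.0"})


def make_opener():
    """Return an opener that keeps cookies between requests, like a browser."""
    return req.build_opener(req.HTTPCookieProcessor(CookieJar()))
```

One possible use is to create a single opener with `make_opener()`, fetch the `/login/saml` page through it, pass the returned HTML to `build_login_request`, and send the result with `opener.open(...)`, so that any session cookies set along the way are reused. Values such as `OAM_REQ` and `request_id` appear to be issued per request, so they likely need to be read fresh each time rather than hard-coded.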

Or, even simpler, use Copy as cURL in DevTools and convert the result to requests code (for example at https://curl.trillworks.com/).
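To give an idea of what such a conversion involves, here is a minimal sketch that turns a simple copied `curl` command into a `urllib.request.Request` without sending it. The helper `curl_to_request` is hypothetical and only understands the URL, `-H`, `-X`, and the common data flags. It ignores every other option and does not handle bash `$'...'` quoting, so a dedicated converter remains the more robust choice.

```python
import shlex
import urllib.request as req

DATA_FLAGS = {"-d", "--data", "--data-raw", "--data-binary"}


def curl_to_request(command):
    """Translate a simple curl command line into a urllib Request (not sent)."""
    # Join lines that end with a shell continuation backslash.
    tokens = shlex.split(command.replace("\\\n", " "))
    if not tokens or tokens[0] != "curl":
        raise ValueError("expected a command starting with 'curl'")

    url, method, data, headers = None, None, None, {}
    i = 1
    while i < len(tokens):
        tok = tokens[i]
        if tok in ("-H", "--header"):
            name, _, value = tokens[i + 1].partition(":")
            if name.strip():
                headers[name.strip()] = value.strip()
            i += 2
        elif tok in DATA_FLAGS:
            data = tokens[i + 1].encode("utf-8")
            i += 2
        elif tok in ("-X", "--request"):
            method = tokens[i + 1]
            i += 2
        elif tok.startswith("-"):
            i += 1  # skip flags this sketch does not understand
        else:
            url = tok
            i += 1

    if url is None:
        raise ValueError("no URL found in the curl command")
    # With data and no explicit method, urllib defaults to POST.
    return req.Request(url, data=data, headers=headers, method=method)
```

The returned request can then be sent through a cookie-aware opener so that the session established by the login is kept for later page fetches.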

Votes: 0
Original content provided by Stack Overflow.
Original link: https://stackoverflow.com/questions/61101314
