I wrote a small tool that checks the availability of a product (yes, a PS5) by inspecting its store page:
var client = new HttpClient();
HttpResponseMessage response = await client.GetAsync("https://www.mediamarkt.de/de/product/_sony-playstation®5-2661938.html");
HttpContent responseContent = response.Content;
using (var reader = new StreamReader(await responseContent.ReadAsStreamAsync()))
{
    // Await the read instead of blocking on .Result
    var output = await reader.ReadToEndAsync();
    Console.WriteLine(output);
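}
For some reason, the returned page asks me to solve a CAPTCHA, whereas opening exactly the same URL in a browser returns the correct page without a CAPTCHA.
What causes this behavior, and how can I avoid it?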
Posted on 2020-11-27 21:55:16
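This is not a direct answer, but a workaround.
The site is protected by Cloudflare, which presents a reCAPTCHA that can only be solved in a JavaScript environment. HttpClient obviously has no such capability. There are solutions for other languages, but I could not find one for C#. Instead, I will show an example that uses Selenium, a web testing framework, with a web browser driver (Chrome in my case).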
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;
using OpenQA.Selenium.Support.UI;
using System;

class Program
{
    public static void Main(string[] args)
    {
        using (var driver = new ChromeDriver())
        {
            driver.Url = "https://www.mediamarkt.de/de/product/_sony-playstation®5-2661938.html";
            // Selenium does not behave well when the element you are looking for is not visible;
            // this method closes the cookie banner that blocks the view.
            CloseCookieBannerIfAppears(driver);
            var buyButton = driver.FindElement(By.XPath("//div[contains(@class, \"Badge\")]"));
            Console.WriteLine(buyButton.Text); // Ausverkauft
        }
    }

    private static void CloseCookieBannerIfAppears(IWebDriver driver)
    {
        var buttonInAcceptCookieBannerSelector = By.XPath("//button[@id=\"privacy-layer-accept-all-button\"]");
        var waitForCookieBanner = new WebDriverWait(driver, TimeSpan.FromSeconds(5));
        try
        {
            // Until throws WebDriverTimeoutException if the banner does not show up within 5 seconds.
            waitForCookieBanner.Until(x => x.FindElements(buttonInAcceptCookieBannerSelector).Count > 0);
            driver.FindElement(buttonInAcceptCookieBannerSelector).Click();
        }
        catch (WebDriverTimeoutException)
        {
            // No cookie banner appeared, so there is nothing to close.
        }
    }
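}
It looks like they also have an unprotected API, so you should be able to fetch this data directly. Note that the same ID appears in both your link and the API call: _sony-playstation®5-2661938.html vs. productId=2661938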
using Newtonsoft.Json.Linq;
using System;
using System.Net.Http;
using System.Threading.Tasks;

class Program
{
    public static async Task Main(string[] args)
    {
        var httpClient = new HttpClient();
        var response = await httpClient.GetAsync("https://delivery-prod-teasermanagement.cloud.mmst.eu/api/teaser/find?productId=2661938");
        var content = await response.Content.ReadAsStringAsync();
        var status = JArray.Parse(content)[0]["promotionData"]["badge"];
        Console.WriteLine(status); // Ausverkauft
    }
}
There may be some other edge cases, but you should get the idea.
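If you want to check more than one product, the relationship between the page URL and the productId parameter can be sketched as below. This is only a sketch: the helper name BuildTeaserUrl is made up for illustration, and it assumes that every product page URL ends in "-<digits>.html" the way the PS5 link does, which I have not verified for other products.

```csharp
using System;
using System.Text.RegularExpressions;

class ProductIdExample
{
    // Hypothetical helper: pulls the numeric ID from the end of a product page URL
    // and builds the teaser API URL used above.
    // Assumption: product URLs always end in "-<digits>.html".
    static string BuildTeaserUrl(string productPageUrl)
    {
        var match = Regex.Match(productPageUrl, @"-(\d+)\.html$");
        if (!match.Success)
        {
            throw new ArgumentException("No product ID found in URL.", nameof(productPageUrl));
        }

        return "https://delivery-prod-teasermanagement.cloud.mmst.eu/api/teaser/find?productId="
            + match.Groups[1].Value;
    }

    public static void Main()
    {
        var pageUrl = "https://www.mediamarkt.de/de/product/_sony-playstation®5-2661938.html";
        // Prints the same API URL that is hard-coded in the example above.
        Console.WriteLine(BuildTeaserUrl(pageUrl));
    }
}
```

The resulting string can be passed to httpClient.GetAsync in place of the hard-coded URL.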
https://stackoverflow.com/questions/65037771