
How to extract course title/school/description from a website (such as Udacity)

Stack Overflow user
Asked on 2021-08-02 00:35:29
1 answer, 126 views, 0 followers, 0 votes
import requests
from bs4 import BeautifulSoup

r = requests.get("https://www.udacity.com/courses/all")
soup = BeautifulSoup(r.text)
summaries = soup.find_all("li", class_="")  # using class_="card-list_catalogCardListItem__aUQtx" returned 0 matches
print('Number of Courses:', len(summaries))  # this finds 225 items

summaries[7].select_one("li").get_text().strip() #output: 'AI for Business Leaders'
summaries[7].select_one("a").get_text().strip() #output:'Artificial Intelligence'

courses = []
for summary in summaries:
    title = summary.select_one("a").get_text().strip()
    school = summary.select_one("li").get_text().strip()
    courses.append((title, school))
# Extracting the text of every summary this way raises "AttributeError: 'NoneType' object has no attribute 'get_text'"

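For educational purposes, I would like to extract (1) all the courses and (2) the school each one belongs to, along with its description.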

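I tried to do this with find_all, using the code above. A manual search shows 264 courses on the page. I first used find_all("li", class_="card-list_catalogCardListItem__aUQtx"), which returned 0 results. When I tested with class_ left empty, the closest count I got was 225. However, when I then use a for loop to extract all the courses, it ends in an AttributeError. This is probably because not every summary found is readable: "'NoneType' object has no attribute 'get_text'".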

My question is: how can I achieve this? (The find_all search on the tag seems to fail.)
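
On the AttributeError specifically: select_one returns None when an element has no matching descendant, so calling get_text() on that result fails. Below is a minimal sketch of guarding against this. The HTML snippet is made up for illustration and is not Udacity's real markup.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only; it is not Udacity's real HTML.
html = """
<ul>
  <li><a href="/ai">Artificial Intelligence</a><ul><li>AI for Business Leaders</li></ul></li>
  <li>Footer item without a link or nested list</li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")

courses = []
for summary in soup.find_all("li"):
    link = summary.select_one("a")
    nested = summary.select_one("li")
    # select_one returns None when nothing matches, so skip incomplete items
    if link is None or nested is None:
        continue
    courses.append((link.get_text().strip(), nested.get_text().strip()))

print(courses)
```

Skipping incomplete items avoids the crash, but it does not recover items that are missing from the downloaded HTML in the first place.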


1 Answer

Stack Overflow user

Answered on 2021-08-02 00:59:11

The page is loaded dynamically by sending a GET request to:

https://www.udacity.com/data/catalog.json?v=%223cd8649e%22

You can send a request to that link to receive all the data, where each entry can be accessed by key/value as a Python dictionary (dict):

import requests


url = "https://www.udacity.com/data/catalog.json?v=%223cd8649e%22"
response = requests.get(url).json()

for data in response:
    course = data["payload"]
    if "shortSummary" in course:
        print("{:<50} {:<60} {:<50}".format(course["school"], course["title"], course["shortSummary"]))

Output (truncated):

School of Data Science                             Data Engineer                                                Data Engineering is the foundation for the new world of Big Data. Enroll now to build production-ready data infrastructure, an essential skill for advancing your data career.
School of Data Science                             Data Scientist                                               Build effective machine learning models, run data pipelines, build recommendation systems, and deploy solutions to the cloud with industry-aligned projects.
School of Data Science                             Data Analyst                                                 Use Python, SQL, and statistics to uncover insights, communicate critical findings, and create data-driven solutions.
School of Data Science                             Programming for Data Science with Python                     Learn the fundamental programming tools for data professionals: Python, SQL, the Terminal and Git.
School of Autonomous Systems                       C++                                                          Get hands-on experience by building five real-world projects.
School of Product Management                       Product Manager                                              Envision and execute the development of industry-defining products, and learn how to successfully bring them to market.

The format specifiers {:<50} {:<60} {:<50} left-align each field and pad it to the given width.
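
To try the same pattern without a network call, here is a small sketch. It assumes records shaped like the ones the loop above iterates over. The sample values are invented, and collect_courses and format_row are hypothetical helper names. Only the keys "payload", "school", "title", and "shortSummary" come from the code above.

```python
# Sample records shaped like the entries iterated over in the answer.
# The values are invented for illustration; only the key names come
# from the answer's code.
sample_response = [
    {"payload": {"school": "School of Data Science",
                 "title": "Data Analyst",
                 "shortSummary": "Use Python, SQL, and statistics."}},
    {"payload": {"school": "Example School",
                 "title": "Entry without a summary"}},
]


def collect_courses(response):
    """Return (school, title, summary) tuples for entries that have a summary."""
    rows = []
    for data in response:
        course = data["payload"]
        if "shortSummary" in course:
            rows.append((course["school"], course["title"], course["shortSummary"]))
    return rows


def format_row(row, widths=(30, 20, 40)):
    """Left-align each field, padding with spaces up to its width."""
    return " ".join("{:<{w}}".format(field, w=w) for field, w in zip(row, widths))


rows = collect_courses(sample_response)
for row in rows:
    print(format_row(row))
```

Note that {:<N} only pads. A value longer than N characters is printed in full and pushes the following columns to the right, which is why the long summaries in the output above run past their column.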

Votes: 1
Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.
Original link:

https://stackoverflow.com/questions/68615350
