BeautifulSoup4 - Collating multiple HTML elements between two different tags

Stack Overflow user
Asked 2020-02-16 19:22:13
Answers: 1 · Views: 95 · Followers: 0 · Votes: 1

I am scraping a page using Python and bs4.

The HTML source I get from bs4 looks like this (slightly cleaned up for readability):

<p style="text-align:justify;font-size:12.0px;font-family:Arial, Helvetica, sans-serif">
<span style="font-size:14.0px"><span style="font-family:Arial, Helvetica, sans-serif">

<strong>COMPANY DESCRIPTION</strong><br>
Here goes the first para of company description</span></span></p>

<p style="text-align:justify;font-size:12.0px;font-family:Arial, Helvetica, sans-serif">
<span style="font-size:14.0px"><span style="font-family:Arial, Helvetica, sans-serif">
Here goes the second para of company description</span></span></p>

<p><strong>PURPOSE AND OBJECTIVES</strong></p>
<p>To address requirements in the area of Supply Chain Management Extended Warehouse Management solutions, Build competencies at Solution Delivery Center to deliver solutions<br>

<strong>EXPECTATIONS AND TASKS&nbsp;</strong></p>
<ul>
    <li>Independently handle large implementation projects with focus on Warehouse Management processes such as inbound, outbound and internal processes. RF Device functions and Barcode support experience is desirable</li>
    <li>Able to lead EWM discussions, assessments and detail requirement studies with customers</li>
</ul>

<strong>KEY PERFORMANCE INDICATORS</strong></p>
<ul>
    <li>Customer Feedback/customer satisfaction scores</li>
    <li>Productive days/utilization as defined by the organization for projects/assessments/etc.</li>
    <li>Knowledge Management and creation of effective reusable components</li>
</ul>

<strong>EXPERIENCE REQUIREMENTS</strong></p>
<ul>
    <li>Minimum of 4+ years industry experience and a minimum of 5 to 6 years of SAP EWM experience</li>
    <li>Domain knowledge in Supply Chain Management in the areas of Planning, Manufacturing &amp; warehousing processes is a must</li>
</ul>

<p><strong>EDUCATION AND QUALIFICATIONS/SKILLS AND COMPETENCIES</strong></p>
<ul>
    <li>Degree in Engineering or IT</li>
    <li>SAP Certification in Extended Warehouse Management (EWM) desirable</li>
</ul>

<p><span style="font-family:Arial,Helvetica,sans-serif"><span style="font-size:14.0px"><strong>WHAT YOU GET FROM US </strong></span></span></p>

Observations:

In the HTML above, every section heading sits between <strong> </strong> tags. The headings may vary from page to page.

My requirement:

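  • Collate all the HTML text and tags starting from the second <strong> tag after COMPANY DESCRIPTION, i.e. PURPOSE AND OBJECTIVES, up to but not including the tag that contains WHAT YOU GET FROM US.
  • I am not looking for a solution that uses Selenium, as it would be slower.

The page I am scraping: (link I am scraping)

Here is my Python code: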

import requests
from bs4 import BeautifulSoup

def scrape_url(url, method='bs4'):
    session = requests.session()
    page = session.get(url)
    soup = BeautifulSoup(page.text, 'html.parser')
    return soup

url = 'https://jobs.sap.com/job/Mumbai-Senior-Account-Executive-Job-MH/539212101/'
soup = scrape_url(url)
job_page = soup.body.find('div', attrs={'class': 'job'})
print(job_page)

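1 Answer

Stack Overflow user

Accepted answer

Posted 2020-02-16 21:09:23

First use a regular expression to find the tag that contains the heading text, then use find_next_siblings() to get all of the siblings that follow it, and stop as soon as a sibling contains the text WHAT YOU GET FROM US.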

import re
import requests
from bs4 import BeautifulSoup


def scrape_url(url, method='bs4'):
    # Fetch the page and parse it with Python's built-in html.parser
    session = requests.session()
    page = session.get(url)
    soup = BeautifulSoup(page.text, 'html.parser')
    return soup


url = 'https://jobs.sap.com/job/Kuala-Lumpur-Business-Processes-Consultant-%28FICO%29-Job-14/541909901/'
soup = scrape_url(url)
# Find the <p> whose string matches the first heading of interest
# (string= is the current name of the older text= argument)
findtag = soup.find('p', string=re.compile("PURPOSE AND OBJECTIVES"))
print(findtag.text)
# Walk the following siblings and stop at the closing heading
for item in findtag.find_next_siblings():
    if 'WHAT YOU GET FROM US' in item.text:
        break
    print(item.text.strip())

Output on the console:

PURPOSE AND OBJECTIVES

To address requirements in the area of Supply Chain Management Extended Warehouse Management solutions, Build competencies at Solution Delivery Center to deliver solutions especially in areas relating to SAP EWM

EXPECTATIONS AND TASKS

Independently handle large implementation projects with focus on Warehouse Management processes such as inbound, outbound and internal processes. RF Device functions and Barcode support experience is desirable
Able to lead EWM discussions, assessments and detail requirement studies with customers
Leading the team that are assigned to, in functional capacity, adding value to the project and to the final deliverables
Be actively involved in the preparation, conception, realization and Go Live of customer implementation projects
Demonstrate the ability to plan, run, and manage blueprint workshops / meetings with internal and external clients
Responsible for defining the scope of a project / opportunities, estimating efforts and project timelines
Participating in RFP discussions and estimating under guidance from a Bid Manager
Providing a creative source of ideas/solutions to address problems
Delivering billable components that meets a customer’s needs
KEY PERFORMANCE INDICATORS

Customer Feedback/customer satisfaction scores
Productive days/utilization as defined by the organization for projects/assessments/etc.
Knowledge Management and creation of effective reusable components
EXPERIENCE REQUIREMENTS

Minimum of 4+ years industry experience and a minimum of 5 to 6 years of SAP EWM experience
Domain knowledge in Supply Chain Management in the areas of Planning, Manufacturing & warehousing processes is a must
Must have strong ERP implementation experience
Experience in SAP Material Flow Systems (MFS) or any other third party automation tools will be desirable
Experience in EWM technical knowledge will be an added advantage
Knowledge on SAP S/4HANA Public Cloud solution and SAP IOT/Leonardo portfolio will be preferred but not mandatory
Good understanding of S/4HANA Order to Cash and Procure to Pay business processes
Good understanding of SAP ACTIVATE implementation methodology
Use of Solution Manager as a part of implementation life cycle is desirable
Good Communication skill in English.

EDUCATION AND QUALIFICATIONS/SKILLS AND COMPETENCIES

Degree in Engineering or IT
SAP Certification in Extended Warehouse Management (EWM) desirable
Minimum 4 to 5 full life cycle SAP EWM implementations
Strong knowledge in SAP SCM Extended Warehouse Management Solutions and S/4HANA Embedded EWM Solution
Good integration knowledge with other components with SAP S/4HANA (WM, SD, MM, PP) and other SAP or Non-SAP legacy applications
Knowledge of SCOR, APICS certification preferable
Strong client-facing experience and well-developed customer focus
Solid oral and written communication skills, with the demonstrated ability to communicate complex technical topics to management and non-technical audiences
Mobility is must – candidate must be ready to travel to project locations (short term and long term)
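
The question asks for the HTML (tags and text) rather than text alone. The same sibling walk can collect str(tag) instead of tag.text. The following is only a minimal sketch under stated assumptions: SAMPLE_HTML is a hypothetical, shortened stand-in for the real page, html_between is an illustrative helper name that is not part of bs4, and the start heading is assumed to be a <strong> that usually sits inside a <p>.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical sample markup, loosely modelled on the HTML in the question.
SAMPLE_HTML = """
<div class="job">
<p><strong>COMPANY DESCRIPTION</strong><br>First para</p>
<p><strong>PURPOSE AND OBJECTIVES</strong></p>
<p>To address requirements</p>
<ul><li>Degree in Engineering or IT</li></ul>
<p><span><strong>WHAT YOU GET FROM US </strong></span></p>
<p>Benefits</p>
</div>
"""


def html_between(soup, start_text, end_text):
    """Return the HTML from the block holding start_text up to, but not
    including, the first following sibling that contains end_text."""
    # Locate the heading through its <strong> tag, then climb to the <p>
    # around it; fall back to the <strong> itself if there is no <p>.
    strong = soup.find('strong', string=re.compile(re.escape(start_text)))
    if strong is None:
        return ''
    start = strong.find_parent('p') or strong
    parts = [str(start)]
    for sibling in start.find_next_siblings():
        if end_text in sibling.get_text():
            break
        parts.append(str(sibling))
    return '\n'.join(parts)


soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')
section_html = html_between(soup, 'PURPOSE AND OBJECTIVES', 'WHAT YOU GET FROM US')
print(section_html)
```

Because headings vary from page to page, the two marker strings are passed in as arguments rather than hard-coded. On a live page, the soup returned by scrape_url(url) could be passed in place of the sample.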
Votes: 2
Original page content provided by Stack Overflow.
Original link: https://stackoverflow.com/questions/60252301
