I'm having trouble getting the inner text of elements scraped with BeautifulSoup. This is my current code:
from bs4 import BeautifulSoup
import requests
url = 'https://patents.google.com/patent/AU2016304408B2/en'
soup = BeautifulSoup(requests.get(url).content, 'html.parser')
results = soup.find(class_="claim", num="1")
results

This prints something like the following (there is more below):
<div class="claim" num="1">
<div class="claim-text">The claims defining the invention are as follows:</div>
<div class="claim-text">1. A compound of Formula l-a or l-a1:</div>
<div class="claim-text">or a pharmaceutically acceptable salt thereof, wherein:</div>
<div class="claim-text">the moiety of “-N(R<sup>1</sup>)(R<sup>2</sup>)” is a moiety of Formula a-26:</div>
<div class="claim-text">a-26 ring A<sup>2</sup> is 5- or 6- membered cycloalkyl or heterocycloalkyl; t2 is 0, 1, 2, or 3; t3 is 0, 1,2, or 3;</div>
<div class="claim-text">each of R<sup>5</sup>and R<sup>6</sup> is independently H or Ci-<sub>4</sub> alkyl;</div>
<div class="claim-text">R<sup>7</sup> is H, C1-6 alkyl, C3-7 cycloalkyl, or R<sup>10</sup>, wherein the C1-6 alkyl of R<sup>7</sup>is optionally substituted with one or more substituents each independently selected from the group consisting of OH, halogen, Ci-4 alkoxy, Ci-4 haloalkoxy, and C3-6 cycloalkyl, and wherein the C3-7 cycloalkyl of R<sup>7</sup> is optionally substituted with one or more substituents each independently selected from the group consisting of OH, halogen, Cm alkyl, Ci-4 haloalkyl, Ci-<sub>4</sub> alkoxy, and Ci<sub>4</sub> haloalkoxy;</div>

I want to retrieve the text of the elements with class "claim-text" inside the .find result and join it into a single string. I'm not sure whether I should iterate over the result and run another search for the inner divs to get their text, or how to go about it.
Answered on 2021-06-02 23:20:42
Yes, you can iterate over those elements and join their text.
from bs4 import BeautifulSoup
import requests
url = 'https://patents.google.com/patent/AU2016304408B2/en'
soup = BeautifulSoup(requests.get(url).content, 'html.parser')
results = soup.find(class_="claim", num="1")
claim_texts = results.find_all('div',{'class':'claim-text'})
fullStr = ' '.join([x.text for x in claim_texts])

Output:
print(fullStr)
The claims defining the invention are as follows: 1. A compound of Formula l-a or l-a1: or a pharmaceutically acceptable salt thereof, wherein: the moiety of “-N(R1)(R2)” is a moiety of Formula a-26: a-26 ring A2 is 5- or 6- membered cycloalkyl or heterocycloalkyl; t2 is 0, 1, 2, or 3; t3 is 0, 1,2, or 3; each of R5and R6 is independently H or Ci-4 alkyl; R7 is H, C1-6 alkyl, C3-7 cycloalkyl, or R10, wherein the C1-6 alkyl of R7is optionally substituted with one or more substituents each independently selected from the group consisting of OH, halogen, Ci-4 alkoxy, Ci-4 haloalkoxy, and C3-6 cycloalkyl, and wherein the C3-7 cycloalkyl of R7 is optionally substituted with one or more substituents each independently selected from the group consisting of OH, halogen, Cm alkyl, Ci-4 haloalkyl, Ci-4 alkoxy, and Ci4 haloalkoxy; R8 is -L1-R11, -L2-R12, -L3-R13, -L4-R14, -C(R15)(Cy1)(Cy2), -C(R15)(Cy1)[-NR23-S(=O)2Cy2], or-L5-N(-L6-Cy3)(-L7-Cy4); each R9 is independently OH, oxo, halogen, optionally substituted Ci-4 alkyl, optionally substituted Cm alkoxy, or optionally substituted C3-6 cycloalkyl; R10 is -P(=O)(OR81)(OR82) or -S(=O)2OR90; each of L1, L2, L3, and L4 is independently absent, -(CR21R22)m-, -NR23-, -O-, -C(=O)-, S(=O)2- -S(=O)2-(CR21R22)n-, -C(=O)-(CR21R22)n-, -S(=O)2-NR23-, -C(=O)-NR23-, -(CR21R22)fi2016304408 21 Jan 2019 220 NR23-(CR21R22)t2-, -(CR21R22)fi-O-(CR21R22)f2-, -C(=O)-NR23-(CR21R22)P-, or -S(=O)2-NR23(CR21R22)P-; L5 is absent or -(CR21R22)-; L6 is absent or -(CR21R22)-; L7 is absent, -(CR21R22)-, or-S(=O)2-; R11 is 5- to 10-membered heteroaryl optionally substituted with one or more independently selected R31; R12 is 4- to 14-membered heterocycloalkyl optionally substituted with one or more independently selected R32; R13 is C6-10 aryl optionally substituted with one or more independently selected R33; R14 is C3-14 cycloalkyl optionally substituted with one or more independently selected R34; R15 is H, OH, halogen, Cm alkoxy, C1-4 alkyl, or 
cyclopropyl; each of R21 and R22 is independently H, OH, halogen, C1-3 alkyl, or cyclopropyl, wherein the C1-3 alkyl is optionally substituted with one or more substituents each independently selected from the group consisting of OH, halogen, C1-3 alkoxy, C1-3 haloalkoxy, and cyclopropyl; R23 is H, C1-4 alkyl, or cyclopropyl; each of R30, R31, R32, R33, and R34 is independently selected from the group consisting of halogen, -N(Ra)(Rb), -N(Rc)(C(=O)Rd), -N(Rc)(S(=O)2Rd), -C(=O)-N(Ra)(Rb), -C(=O)-Rd, -C(=O)ORd, -OC(=O)-Rd, -N(Rc)(S(=O)2Rd), -S(=O)2-N(Ra)(Rb), -SRd, -S(=O)2Rd, -ORd, -OR35, -CN, Cie alkyl, C2-6 alkenyl, C2-6 alkynyl, C3-10 cycloalkyl, 4- to 10-membered heterocycloalkyl, Ce-io aryl, 5- to 10-membered heteroaryl, (C3-10 cycloalkyl)-Ci-4 alkyl-, (4- to 10-membered heterocycloalkyl)-Ci-4 alkyl-, (Ce-io aryl)-Ci-4 alkyl-, and (5- to 10-membered heteroaryl)-Ci-4 alkyl-, wherein each of the C1-6 alkyl, C2-6 alkenyl, C2-6 alkynyl, C3-10 cycloalkyl, 4- to 10membered heterocycloalkyl, Ce-io aryl, 5- to 10-membered heteroaryl, (C3-10 cycloalkyl)-Ci-4 alkyl-, (4- to 10-membered heterocycloalkyl)-Ci-4 alkyl-, (Ce-io aryl)-Ci-4 alkyl-, and (5- to 10membered heteroaryl)-Ci-4 alkyl- is optionally substituted with one or more independently selected R36; and wherein each of the C1-6 alkyl, C3-10 cycloalkyl, 4- to 10-membered heterocycloalkyl, (C3-10 cycloalkyl)-Ci-4 alkyl-, (4-to 10-membered heterocycloalkyl)-Ci-4 alkyl-, (C6-10 aryl)-Ci-4 alkyl-, and (5- to 10-membered heteroaryl)-Ci-4 alkyl- is further optionally substituted one or more oxo; each R35 is independently selected from the group consisting of H, C1-6 alkyl, C3-10 cycloalkyl, 4- to 10-membered heterocycloalkyl, C6-10 aryl, 5- to 10-membered heteroaryl, (C3-10 cycloalkyl)-Ci-4 alkyl-, (4- to 10-membered heterocycloalkyl)-Ci-4 alkyl-, (Ce-io aryl)-Ci-4 alkyl-, 2016304408 21 Jan 2019 221 and (5- to 10-membered heteroaryl)-Ci-4 alkyl-, wherein each of the C1-6 alkyl, C3-10 cycloalkyl, 4- to 10-membered 
heterocycloalkyl, C6-10 aryl, 5- to 10-membered heteroaryl, (C3-10 cycloalkyl)C1-4 alkyl-, (4- to 10-membered heterocycloalkyl)-Ci-4 alkyl-, (Ce-io aryl)-Ci-4 alkyl-, and (5- to 10membered heteroaryl)-Ci-4 alkyl- is optionally substituted with one or more substituents independently selected from the group consisting of halogen, -CN, -C(=O)Ci-4 alkyl, -C(=O)OH, -C(=O)O-Ci-4 alkyl, -C(=O)NHCi-4 alkyl, -C(=O)N(Ci-4 alkyl)2, oxo,...
https://stackoverflow.com/questions/67807949
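
For anyone who wants to try the same approach without hitting the network, here is a minimal sketch that runs against a small inline HTML sample. The sample markup and the helper name join_claim_text are made up for illustration and are not part of the original answer. It also guards against find returning None, and it normalizes whitespace inside each div, which are additions of mine rather than something the original code does.

```python
from bs4 import BeautifulSoup

# Hypothetical sample standing in for the downloaded patent page.
SAMPLE_HTML = """
<div class="claim" num="1">
  <div class="claim-text">1. A compound of Formula l-a:</div>
  <div class="claim-text">R<sup>7</sup> is H or C1-6 alkyl;</div>
</div>
"""


def join_claim_text(html, num="1"):
    """Join the text of every claim-text div inside the claim with the given num."""
    soup = BeautifulSoup(html, "html.parser")
    claim = soup.find(class_="claim", num=num)
    if claim is None:
        # find returns None when nothing matches; return an empty string
        # instead of failing on claim.find_all below.
        return ""
    parts = []
    for div in claim.find_all("div", class_="claim-text"):
        # get_text() merges nested tags such as <sup>; split/join collapses
        # runs of whitespace without removing spaces between words.
        parts.append(" ".join(div.get_text().split()))
    return " ".join(parts)


print(join_claim_text(SAMPLE_HTML))
```

One caveat: if the claim-text divs on the real page are nested inside each other, find_all returns both the outer and the inner div, so the inner text would appear twice in the joined string. The sample above assumes a flat structure like the one shown in the question.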