文章/答案/技术大牛

发布

社区首页 >问答首页 >在Python (BeautifulSoup)中，如何防止字符串被解释为科学符号？

问在Python (BeautifulSoup)中，如何防止字符串被解释为科学符号？
EN

Stack Overflow用户

提问于 2022-10-31 21:50:18

回答 1查看 37关注 0票数 1

我使用BeautifulSoup解析XML并返回一个5个字符的字符串。Python将一些字符串解释为科学表示法(例如，0E001，1E001)，但我需要得到字符串本身。

我尝试在多个地方应用str(some_code)和f"{some_code}“，但似乎找不出如何强迫Python将结果解释为字符串而不是科学表示法。以下是有关守则的摘录：

功能定义：

def find_eccn(text:str) -> str:
    """
    This function returns a string with the substring preceding the first whitespace.
    """
    return text[:text.find(" ")]

在main()函数中：

for i in soup.find_all(["P"]):
    # Get text in line and strip extra whitespace.
    line = i.get_text().strip()
    # Parse the line into the string.
    eccn = find_eccn(line)

如有任何建议，将不胜感激！

编辑:下面是标签中引起问题的文本示例：“1E001”技术，根据“通用技术说明”，用于“开发”或“生产”受控项目--该函数返回10而不是1E001。

编辑2:这是完整的代码：

# Import required modules.
from datetime import date
import requests
from bs4 import BeautifulSoup
import csv

# Define a function to find the ECCN in the CCL.
def find_eccn(text:str) -> str:
    result = text[:text.find(" ")]
    return result

# Define a function to find the ECCN description in the CCL.
def find_eccn_description(text):
    result = text[text.find(" ")+1:]
    return result

# Define a main function which generates a CSV file with CCL data.
def main():
    # Determine today's date.
    today = date(2022, 10, 27)
    # Define the URL to be scraped.
    url = f"https://www.ecfr.gov/api/versioner/v1/full/{today}/title-15.xml?subtitle=B&chapter=VII&subchapter=C&part=774"
    # Initialize a requests Response object.
    page = requests.get(url)
    # Parse the XML data with BeautifulSoup.
    soup = BeautifulSoup(page.content, features="xml")
    # Remove all of the <note> tags.
    for x in soup.find_all("NOTE"):
        x.decompose()
    # Reduce the soup object to only the first <div9> tag.
    reduction = soup.find("DIV9")
    # Define CCL field headers.
    fields = ["classification", "eccn", "eccn_description", "item", "item_description"]
    # Define a list for CCL rows.
    rows = []
    # Loop through the <*insert applicable*> tags in the soup object.
    for i in reduction.find_all(["FP-2"]):
        # Get text in line and strip extra whitespace.
        line = i.get_text().strip()
        # Parse the line into the ECCN and its description.
        eccn = find_eccn(line)
        eccn_description = find_eccn_description(line)
        # Append data to the 'rows' list.
        rows.append(["classification", eccn, eccn_description, "item", "item_description"])
    # Create CSV file and write USML data to it.
    with open(f"ccl_{today}.csv", "w", encoding="utf-8", newline="") as csvfile:
        csvwriter = csv.writer(csvfile, delimiter="|")
        csvwriter.writerow(fields)
        csvwriter.writerows(rows)

# Set main function to run when the file is run.
if __name__ == "__main__":
    main()

python

beautifulsoup

回答 1

Stack Overflow用户

回答已采纳

发布于 2022-10-31 22:58:56

我认为您的代码工作正常:可能是1E001被用于查看.csv文件(即Excel)的程序解析为科学符号。

但是，从我的角度来看，输出eccn标识符似乎是正确的(除非我遗漏了什么)。下面是从发布的代码中生成的输出的屏幕截图：

您可以尝试在'数字前面插入一个eccn，以确保它在eccn中不被视为科学符号。

票数 3

页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持

原文链接：

https://stackoverflow.com/questions/74269460

复制

相似问题

问在Python (BeautifulSoup)中，如何防止字符串被解释为科学符号？
EN

回答 1

Stack Overflow用户

社区

活动

圈层

关于

腾讯云开发者

热门产品

热门推荐

更多推荐

问在Python (BeautifulSoup)中，如何防止字符串被解释为科学符号？EN

回答 1

Stack Overflow用户

社区

活动

圈层

关于

腾讯云开发者

热门产品

热门推荐

更多推荐

问在Python (BeautifulSoup)中，如何防止字符串被解释为科学符号？
EN