首页
学习
活动
专区
圈层
工具
发布
社区首页 >问答首页 >在Python (BeautifulSoup)中,如何防止字符串被解释为科学符号?

在Python (BeautifulSoup)中,如何防止字符串被解释为科学符号?
EN

Stack Overflow用户
提问于 2022-10-31 21:50:18
回答 1查看 37关注 0票数 1

我使用BeautifulSoup解析XML并返回一个5个字符的字符串。Python将一些字符串解释为科学表示法(例如,0E001,1E001),但我需要得到字符串本身。

我尝试在多个地方应用str(some_code)和f"{some_code}“,但似乎找不出如何强迫Python将结果解释为字符串而不是科学表示法。以下是有关守则的摘录:

功能定义:

代码语言:javascript
复制
def find_eccn(text:str) -> str:
    """
    This function returns a string with the substring preceding the first whitespace.
    """
    return text[:text.find(" ")]

在main()函数中:

代码语言:javascript
复制
for i in soup.find_all(["P"]):
    # Get text in line and strip extra whitespace.
    line = i.get_text().strip()
    # Parse the line into the string.
    eccn = find_eccn(line)

如有任何建议,将不胜感激!

编辑:下面是标签中引起问题的文本示例:“1E001”技术,根据“通用技术说明”,用于“开发”或“生产”受控项目--该函数返回10而不是1E001。

编辑2:这是完整的代码:

代码语言:javascript
复制
# Import required modules.
from datetime import date
import requests
from bs4 import BeautifulSoup
import csv

# Define a function to find the ECCN in the CCL.
def find_eccn(text:str) -> str:
    result = text[:text.find(" ")]
    return result

# Define a function to find the ECCN description in the CCL.
def find_eccn_description(text):
    result = text[text.find(" ")+1:]
    return result

# Define a main function which generates a CSV file with CCL data.
def main():
    # Determine today's date.
    today = date(2022, 10, 27)
    # Define the URL to be scraped.
    url = f"https://www.ecfr.gov/api/versioner/v1/full/{today}/title-15.xml?subtitle=B&chapter=VII&subchapter=C&part=774"
    # Initialize a requests Response object.
    page = requests.get(url)
    # Parse the XML data with BeautifulSoup.
    soup = BeautifulSoup(page.content, features="xml")
    # Remove all of the <note> tags.
    for x in soup.find_all("NOTE"):
        x.decompose()
    # Reduce the soup object to only the first <div9> tag.
    reduction = soup.find("DIV9")
    # Define CCL field headers.
    fields = ["classification", "eccn", "eccn_description", "item", "item_description"]
    # Define a list for CCL rows.
    rows = []
    # Loop through the <*insert applicable*> tags in the soup object.
    for i in reduction.find_all(["FP-2"]):
        # Get text in line and strip extra whitespace.
        line = i.get_text().strip()
        # Parse the line into the ECCN and its description.
        eccn = find_eccn(line)
        eccn_description = find_eccn_description(line)
        # Append data to the 'rows' list.
        rows.append(["classification", eccn, eccn_description, "item", "item_description"])
    # Create CSV file and write USML data to it.
    with open(f"ccl_{today}.csv", "w", encoding="utf-8", newline="") as csvfile:
        csvwriter = csv.writer(csvfile, delimiter="|")
        csvwriter.writerow(fields)
        csvwriter.writerows(rows)

# Set main function to run when the file is run.
if __name__ == "__main__":
    main()
EN

回答 1

Stack Overflow用户

回答已采纳

发布于 2022-10-31 22:58:56

我认为您的代码工作正常:可能是1E001被用于查看.csv文件(即Excel)的程序解析为科学符号。

但是,从我的角度来看,输出eccn标识符似乎是正确的(除非我遗漏了什么)。下面是从发布的代码中生成的输出的屏幕截图:

您可以尝试在'数字前面插入一个eccn,以确保它在eccn中不被视为科学符号。

票数 3
EN
页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持
原文链接:

https://stackoverflow.com/questions/74269460

复制
相关文章

相似问题

领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档