我使用BeautifulSoup解析XML并返回一个5个字符的字符串。Python将一些字符串解释为科学表示法(例如,0E001,1E001),但我需要得到字符串本身。
我尝试在多个地方应用str(some_code)和f"{some_code}“,但似乎找不出如何强迫Python将结果解释为字符串而不是科学表示法。以下是有关守则的摘录:
功能定义:
def find_eccn(text:str) -> str:
"""
This function returns a string with the substring preceding the first whitespace.
"""
return text[:text.find(" ")]在main()函数中:
for i in soup.find_all(["P"]):
# Get text in line and strip extra whitespace.
line = i.get_text().strip()
# Parse the line into the string.
eccn = find_eccn(line)如有任何建议,将不胜感激!
编辑:下面是标签中引起问题的文本示例:“1E001”技术,根据“通用技术说明”,用于“开发”或“生产”受控项目--该函数返回10而不是1E001。
编辑2:这是完整的代码:
# Import required modules.
from datetime import date
import requests
from bs4 import BeautifulSoup
import csv
# Define a function to find the ECCN in the CCL.
def find_eccn(text:str) -> str:
result = text[:text.find(" ")]
return result
# Define a function to find the ECCN description in the CCL.
def find_eccn_description(text):
result = text[text.find(" ")+1:]
return result
# Define a main function which generates a CSV file with CCL data.
def main():
# Determine today's date.
today = date(2022, 10, 27)
# Define the URL to be scraped.
url = f"https://www.ecfr.gov/api/versioner/v1/full/{today}/title-15.xml?subtitle=B&chapter=VII&subchapter=C&part=774"
# Initialize a requests Response object.
page = requests.get(url)
# Parse the XML data with BeautifulSoup.
soup = BeautifulSoup(page.content, features="xml")
# Remove all of the <note> tags.
for x in soup.find_all("NOTE"):
x.decompose()
# Reduce the soup object to only the first <div9> tag.
reduction = soup.find("DIV9")
# Define CCL field headers.
fields = ["classification", "eccn", "eccn_description", "item", "item_description"]
# Define a list for CCL rows.
rows = []
# Loop through the <*insert applicable*> tags in the soup object.
for i in reduction.find_all(["FP-2"]):
# Get text in line and strip extra whitespace.
line = i.get_text().strip()
# Parse the line into the ECCN and its description.
eccn = find_eccn(line)
eccn_description = find_eccn_description(line)
# Append data to the 'rows' list.
rows.append(["classification", eccn, eccn_description, "item", "item_description"])
# Create CSV file and write USML data to it.
with open(f"ccl_{today}.csv", "w", encoding="utf-8", newline="") as csvfile:
csvwriter = csv.writer(csvfile, delimiter="|")
csvwriter.writerow(fields)
csvwriter.writerows(rows)
# Set main function to run when the file is run.
if __name__ == "__main__":
main()发布于 2022-10-31 22:58:56
我认为您的代码工作正常:可能是1E001被用于查看.csv文件(即Excel)的程序解析为科学符号。
但是,从我的角度来看,输出eccn标识符似乎是正确的(除非我遗漏了什么)。下面是从发布的代码中生成的输出的屏幕截图:

您可以尝试在'数字前面插入一个eccn,以确保它在eccn中不被视为科学符号。
https://stackoverflow.com/questions/74269460
复制相似问题