public DocumentContent readPath(InputStream stream, Path path)
GitHub - apache/tika: The Apache Tika toolkit detects and extracts metadata and text from over a thousand different file types (such as PPT, XLS, and PDF).
<dependency> <groupId>org.jsoup</groupId> <artifactId>jsoup</artifactId> <version>1.11.3</version> </dependency>
Tika Apache-Tika
doc langchain4j + poi: a first try, document-parsers/apache-tika