Error while crawling: java.net.ConnectException: path\to\file_folder: Connection timed out: connect
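I am trying to use FSCrawler to ingest files from a remote server into an existing index in Elasticsearch (running on my local machine), but I get the exception above.
Here is FSCrawler's _settings.yml file: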
---
name: "index_in_es_onefsc"
server:
  hostname: "machinename.abc.com"
  port: 22
  username: "username"
  password: "password@20"
  protocol: "ssh"
fs:
  url: "E:\\TestFilesToBeIndexed"
  update_rate: "15m"
  excludes:
  - "*/~*"
  json_support: false
  filename_as_id: false
  add_filesize: true
  remove_deleted: true
  add_as_inner_object: false
  store_source: false
  index_content: true
  attributes_support: false
  raw_metadata: false
  xml_support: false
  index_folders: true
  lang_detect: false
  continue_on_error: false
  ocr:
    language: "eng"
    enabled: true
    pdf_strategy: "ocr_and_text"
  follow_symlinks: false
elasticsearch:
  nodes:
  - url: "http://127.0.0.1:9200"
  bulk_size: 100
  flush_interval: "5s"
  byte_size: "10mb"

Posted on 2020-06-10 23:12:21
The documentation says that on Windows, when using SSH from one Windows machine to another, you have to use the following form:
I think that on Windows you need to use:
name: "index_in_es_onefsc"
fs:
  url: "/E:/TestFilesToBeIndexed"
server:
  hostname: "machinename.abc.com"
  port: 22
  username: "username"
  password: "password@20"
  protocol: "ssh"

Note that there is a known issue when running FSCrawler on a Windows machine. It has been fixed, but if you are using a SNAPSHOT version older than the one published on June 26, you will most likely need to upgrade.
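
The only difference between the two `fs.url` values in this thread is the path form: `E:\\TestFilesToBeIndexed` in the question versus `/E:/TestFilesToBeIndexed` in the answer. If you have several Windows paths to rewrite, the conversion can be sketched in Python as below. This is only an illustration: `to_fscrawler_url` is a hypothetical helper name and is not part of FSCrawler, and it assumes the input is an absolute path on a lettered drive.

```python
from pathlib import PureWindowsPath


def to_fscrawler_url(windows_path: str) -> str:
    """Hypothetical helper (not part of FSCrawler).

    Rewrites an absolute Windows path such as E:\\dir into the
    slash-prefixed form /E:/dir used in the answer above.
    """
    path = PureWindowsPath(windows_path)
    # Accept only absolute paths on a lettered drive ("E:"), not UNC shares
    # or drive-relative paths such as "E:dir".
    if len(path.drive) != 2 or not path.is_absolute():
        raise ValueError(f"expected an absolute drive path, got {windows_path!r}")
    # as_posix() turns backslashes into forward slashes: E:/dir
    return "/" + path.as_posix()


if __name__ == "__main__":
    print(to_fscrawler_url("E:\\TestFilesToBeIndexed"))  # /E:/TestFilesToBeIndexed
```

The result is what would go into `fs.url` in _settings.yml; whether your FSCrawler version accepts it still depends on the known issue mentioned above.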
https://stackoverflow.com/questions/61949295