Python and LanguageTool encoding error

Stack Overflow user
Asked 2022-01-23 13:12:16
1 answer, 193 views, 0 followers, score -1

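I'm trying to POST text data to a LanguageTool server. My text includes characters such as the trademark symbol (™) and the copyright symbol (©).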

In my first attempt, I simply posted the text like this:

response = requests.post(
    LANGUAGETOOL_URL,
    data=f"language=en-US&text={text}"
    )

I got an error from requests:

UnicodeEncodeError: 'latin-1' codec can't encode character '\u2122' in position 317: Body ('™') is not valid Latin-1. Use body.encode('utf-8') if you want to send it encoded in UTF-8.
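The wording of this error comes from Python's standard-library http.client, which encodes a str request body as Latin-1 before sending it. That also explains why only ™ is reported: © (U+00A9) lies inside the Latin-1 range, while ™ (U+2122) does not. A minimal offline sketch of that check; `fits_latin1` is a hypothetical helper, not part of requests or http.client:

```python
def fits_latin1(body: str) -> bool:
    """Hypothetical helper: True if `body` survives the Latin-1 encoding
    that http.client applies to a str request body."""
    try:
        body.encode("latin-1")
    except UnicodeEncodeError:
        return False
    return True


print(fits_latin1("text=© 2022"))  # True: © is U+00A9, inside Latin-1
print(fits_latin1("text=Acme™"))   # False: ™ is U+2122, outside Latin-1
```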

Following this post, I updated the request as follows:

response = requests.post(
    LANGUAGETOOL_URL,
    data=f"language=en-US&text={text}".encode('utf-8')
    )

Now requests no longer raises an error, but LanguageTool complains that it cannot decode the query:

2022-01-23 13:09:47.366 +0000 INFO  [lt-server-thread-6] [logError] rID:- org.languagetool.server.LanguageToolHttpHandler An error has occurred: 'Could not decode query. Query length: 3085 Request method: POST', sending HTTP code 400. Access from 172.17.0.1, HTTP user agent: python-requests/2.27.1, User agent param: null, Referrer: null, language: null, h: 1, r: 29, time: 0m: ALL, l: DEFAULT, Stacktrace follows:org.languagetool.server.BadRequestException: Could not decode query. Query length: 3085 Request method: POST
    at org.languagetool.server.LanguageToolHttpHandler.getParameterMap(LanguageToolHttpHandler.java:470)
    at org.languagetool.server.LanguageToolHttpHandler.parseQuery(LanguageToolHttpHandler.java:452)
    at org.languagetool.server.LanguageToolHttpHandler.getRequestQuery(LanguageToolHttpHandler.java:417)
    at org.languagetool.server.LanguageToolHttpHandler.handle(LanguageToolHttpHandler.java:152)
    at jdk.httpserver/com.sun.net.httpserver.Filter$Chain.doFilter(Filter.java:77)
    at jdk.httpserver/sun.net.httpserver.AuthFilter.doFilter(AuthFilter.java:82)
    at jdk.httpserver/com.sun.net.httpserver.Filter$Chain.doFilter(Filter.java:80)
    at jdk.httpserver/sun.net.httpserver.ServerImpl$Exchange$LinkHandler.handle(ServerImpl.java:725)
    at jdk.httpserver/com.sun.net.httpserver.Filter$Chain.doFilter(Filter.java:77)
    at jdk.httpserver/sun.net.httpserver.ServerImpl$Exchange.run(ServerImpl.java:694)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:829)
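Encoding the string as UTF-8 removes the Latin-1 error, but the body is still built by hand, and it differs from a normal form post in two ways. First, requests only adds `Content-Type: application/x-www-form-urlencoded` when `data` is a dict (or a list of pairs), not when it is a str or bytes. Second, nothing in `text` is percent-encoded, so a `&`, `%` or `+` inside the text changes how the body is parsed. The LanguageTool log does not say which character it choked on, so treat this as a likely explanation rather than a diagnosis. The sketch below uses only `urllib.parse` (whose `urlencode` is what requests uses for dict bodies); the sample text is hypothetical:

```python
from urllib.parse import parse_qs, urlencode

# Hypothetical sample text with characters that are special in form bodies
text = "Acme™ © 2022 & 100% sure"

# What the second attempt sends: UTF-8 bytes, but no percent-encoding
hand_built = f"language=en-US&text={text}".encode("utf-8")

# What requests sends for data={...}: values percent-encoded as UTF-8
form_encoded = urlencode({"language": "en-US", "text": text})

print(hand_built)
print(form_encoded)
# → language=en-US&text=Acme%E2%84%A2+%C2%A9+2022+%26+100%25+sure

# Parse each body the way a form decoder would
print(parse_qs(hand_built.decode("utf-8"))["text"])  # cut off at the bare '&'
print(parse_qs(form_encoded)["text"])                # survives intact
```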

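I've checked all of the LanguageTool documentation and couldn't find anything about encoding. At this point I don't know whether the problem lies with requests, with LanguageTool, or with something else I'm doing wrong. Is it possible to POST characters like the trademark symbol to LanguageTool, and if so, how?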


1 Answer

Stack Overflow user

Accepted answer

Answered 2022-01-23 17:33:40

Pass the parameters as a dictionary. There's no need to encode anything manually:

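import requests
import json

# Pass the form fields as a dict: requests percent-encodes them as UTF-8
# and sets Content-Type: application/x-www-form-urlencoded for you.
response = requests.post(
    'https://api.languagetoolplus.com/v2/check',
    data={'text': 'check for mispelling™ © 2022', 'language': 'en-US'}
)

# ensure_ascii=False prints ™ and © as-is instead of \u escapes
print(json.dumps(response.json(), ensure_ascii=False, indent=2))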

Output:

{
  "software": {
    "name": "LanguageTool",
    "version": "5.7-SNAPSHOT",
    "buildDate": "2022-01-18 13:50:09 +0000",
    "apiVersion": 1,
    "premium": true,
    "premiumHint": "You might be missing errors only the Premium version can find. Contact us at support<at>languagetoolplus.com.",
    "status": ""
  },
  "warnings": {
    "incompleteResults": false
  },
  "language": {
    "name": "English (US)",
    "code": "en-US",
    "detectedLanguage": {
      "name": "English (US)",
      "code": "en-US",
      "confidence": 0.924
    }
  },
  "matches": [
    {
      "message": "This sentence does not start with an uppercase letter.",
      "shortMessage": "",
      "replacements": [
        {
          "value": "Check"
        }
      ],
      "offset": 0,
      "length": 5,
      "context": {
        "text": "check for mispelling™ © 2022",
        "offset": 0,
        "length": 5
      },
      "sentence": "check for mispelling™ © 2022",
      "type": {
        "typeName": "Other"
      },
      "rule": {
        "id": "UPPERCASE_SENTENCE_START",
        "description": "Checks that a sentence starts with an uppercase letter",
        "issueType": "typographical",
        "category": {
          "id": "CASING",
          "name": "Capitalization"
        },
        "isPremium": false
      },
      "ignoreForIncompleteSentence": true,
      "contextForSureMatch": -1
    },
    {
      "message": "Possible spelling mistake found.",
      "shortMessage": "Spelling mistake",
      "replacements": [
        {
          "value": "misspelling"
        },
        {
          "value": "dispelling"
        },
        {
          "value": "mi spelling"
        }
      ],
      "offset": 10,
      "length": 10,
      "context": {
        "text": "check for mispelling™ © 2022",
        "offset": 10,
        "length": 10
      },
      "sentence": "check for mispelling™ © 2022",
      "type": {
        "typeName": "Other"
      },
      "rule": {
        "id": "MORFOLOGIK_RULE_EN_US",
        "description": "Possible spelling mistake",
        "issueType": "misspelling",
        "category": {
          "id": "TYPOS",
          "name": "Possible Typo"
        },
        "isPremium": false
      },
      "ignoreForIncompleteSentence": false,
      "contextForSureMatch": 0
    }
  ]
}
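In this output, each entry in `matches` locates the flagged span with `offset` and `length` and lists suggestions in `replacements`, best first. A minimal sketch of applying the first suggestion of each match, working from the end of the string so earlier offsets stay valid. Assumptions: `apply_first_replacements` is a hypothetical helper, matches do not overlap, and offsets line up with Python string indices (true for this sample text). With a live request you would pass `response.json()["matches"]`:

```python
from __future__ import annotations


def apply_first_replacements(text: str, matches: list[dict]) -> str:
    """Hypothetical helper: splice each match's first suggestion into `text`.

    Assumes non-overlapping matches whose offsets are Python str indices.
    """
    # Go right to left so splicing does not shift offsets still to be applied
    for match in sorted(matches, key=lambda m: m["offset"], reverse=True):
        if not match["replacements"]:
            continue  # nothing suggested for this match
        start = match["offset"]
        end = start + match["length"]
        text = text[:start] + match["replacements"][0]["value"] + text[end:]
    return text


# Trimmed-down copy of the "matches" array from the output above
matches = [
    {"offset": 0, "length": 5, "replacements": [{"value": "Check"}]},
    {"offset": 10, "length": 10,
     "replacements": [{"value": "misspelling"}, {"value": "dispelling"}]},
]
print(apply_first_replacements("check for mispelling™ © 2022", matches))
# → Check for misspelling™ © 2022
```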
Score: 1
Page content provided by Stack Overflow. Original link:

https://stackoverflow.com/questions/70822417
