Q: Fetching Chinese characters over HTTP with luasocket

Stack Overflow user

Asked 2012-11-24 09:51:27

Answers: 1 · Views: 467 · Followers: 0 · Votes: 2

I'm using luasocket to fetch a web page that contains the Chinese characters "开奖结果" (the page itself is encoded with charset="gb2312"), as follows:

require "socket"
host = '61.129.89.226'
fileformat = '/fcopen/cp_kjgg_dfw.jsp?lottery_type=ssq&lottery_issue=%s'
function getlottery(num)
  c = assert(socket.connect(host, 80))
  c:send('GET ' .. string.format(fileformat, num)  .. " HTTP/1.0\r\n\r\n")
  content = c:receive('*l')
  while content do
    if content and content:find('开奖结果') then -- failed
      print(content)
    end
    content = c:receive('*l')
  end
  c:close()
end

--http://61.129.89.226/fcopen/cp_kjgg_dfw.jsp?lottery_type=ssq&lottery_issue=2012138
getlottery('2012138')

Unfortunately, it fails to match the expected characters:

content:find('开奖结果') -- failed

I know that Lua is able to find Unicode characters:

Lua 5.1.4  Copyright (C) 1994-2008 Lua.org, PUC-Rio
> if string.find("This is 开奖结果", "开奖结果") then print("found!") end
found!

So I guess this is caused by the way luasocket retrieves data from the web. Can anyone shed some light on this?

Thanks.

Tags: lua, cjk, luasocket, sockets, unicode

1 Answer

Stack Overflow user

Accepted answer

Posted 2012-11-24 12:46:49

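If the page is encoded in GB2312 and the script (the file itself) is saved as UTF-8, the match cannot succeed. Lua strings are plain byte sequences, so .find() searches for the UTF-8 bytes of your literal and slides right past the characters you are looking for, because the page encodes them with different bytes: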

        开      奖      结      果
GB2312  bfaa    bdb1    bde1    b9fb
UTF-16  5f00    5956    7ed3    679c
UTF-8   e5bc80  e5a596  e7bb93  e69e9c
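
One possible workaround, sketched here under the assumption that the script stays saved as UTF-8, is to build the search string from the GB2312 byte values in the table above instead of typing the characters literally. The helpers from_hex and to_hex and the sample line are hypothetical names introduced only for this illustration; they are not part of luasocket.

```lua
-- Hypothetical helper: turn a hex string such as "bfaa" into raw bytes.
local function from_hex(hex)
  return (hex:gsub("%x%x", function(pair)
    return string.char(tonumber(pair, 16))
  end))
end

-- Hypothetical helper: dump a string as hex, useful for inspecting
-- what c:receive('*l') actually returned.
local function to_hex(s)
  return (s:gsub(".", function(ch)
    return string.format("%02x", ch:byte())
  end))
end

-- "开奖结果" in each encoding, using the byte values from the table above.
local needle_gb   = from_hex("bfaabdb1bde1b9fb")
local needle_utf8 = from_hex("e5bc80e5a596e7bb93e69e9c")

-- Stand-in for one line received from the GB2312 page (made-up content).
local line = "<td>" .. needle_gb .. "</td>"

-- The fourth argument `true` asks for a plain substring search (no patterns).
print(line:find(needle_utf8, 1, true))  -- the UTF-8 bytes are not in the line
print(line:find(needle_gb, 1, true))    -- the GB2312 bytes are
print(to_hex(needle_gb))
```

In the original loop, this would mean replacing content:find('开奖结果') with content:find(needle_gb, 1, true). Saving the script itself as GB2312, or converting each received line to UTF-8 with an external conversion library, would be alternative routes.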

Votes: 4

Original page content provided by Stack Overflow.

Original link: https://stackoverflow.com/questions/13537579
