Q: Casting between uint and char for a Unicode character code

Stack Overflow user

Asked 2015-05-19 11:30:22

2 answers, 1.6K views, 0 followers, 3 votes

Can someone explain exactly what this code is doing?

var letter= 'J';
char c = (char)(0x000000ff & (uint)letter);

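I understand that it gets the Unicode representation of the character, but I don't fully understand the role of this part:

(0x000000ff & (uint)letter)

What is the purpose of 0x000000ff and of casting the letter to (uint), and is there a shorthand way to achieve the same thing?

Thanks

Update

OK, it looks like most people think this is a bad example. I didn't want to include the whole class, but I suppose I might as well, so that the context is visible. From the reference source for WebHeaderCollection: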

  private static string CheckBadChars(string name, bool isHeaderValue)
    {
        if (name == null || name.Length == 0)
        {
            // empty name is invalid
            if (!isHeaderValue)
            {
                throw name == null ? 
                    new ArgumentNullException("name") :
                    new ArgumentException(SR.GetString(SR.WebHeaderEmptyStringCall, "name"), "name");
            }

            // empty value is OK
            return string.Empty;
        }

        if (isHeaderValue)
        {
            // VALUE check
            // Trim spaces from both ends
            name = name.Trim(HttpTrimCharacters);

            // First, check for correctly formed multi-line value
            // Second, check for absence of CTL characters
            int crlf = 0;
            for (int i = 0; i < name.Length; ++i)
            {
                char c = (char)(0x000000ff & (uint)name[i]);
                switch (crlf)
                {
                    case 0:
                        if (c == '\r')
                        {
                            crlf = 1;
                        }
                        else if (c == '\n')
                        {
                            // Technically this is bad HTTP.  But it would be a breaking change to throw here.
                            // Is there an exploit?
                            crlf = 2;
                        }
                        else if (c == 127 || (c < ' ' && c != '\t'))
                        {
                            throw new ArgumentException(SR.GetString(SR.WebHeaderInvalidControlChars), "value");
                        }

                        break;

                    case 1:
                        if (c == '\n')
                        {
                            crlf = 2;
                            break;
                        }

                        throw new ArgumentException(SR.GetString(SR.WebHeaderInvalidCRLFChars), "value");

                    case 2:
                        if (c == ' ' || c == '\t')
                        {
                            crlf = 0;
                            break;
                        }

                        throw new ArgumentException(SR.GetString(SR.WebHeaderInvalidCRLFChars), "value");
                }
            }

            if (crlf != 0)
            {
                throw new ArgumentException(SR.GetString(SR.WebHeaderInvalidCRLFChars), "value");
            }
        }
        else
        {
            // NAME check
            // First, check for absence of separators and spaces
            if (name.IndexOfAny(InvalidParamChars) != -1)
            {
                throw new ArgumentException(SR.GetString(SR.WebHeaderInvalidHeaderChars), "name");
            }

            // Second, check for non CTL ASCII-7 characters (32-126)
            if (ContainsNonAsciiChars(name))
            {
                throw new ArgumentException(SR.GetString(SR.WebHeaderInvalidNonAsciiChars), "name");
            }
        }

        return name;
    }

The line of interest is:

char c = (char)(0x000000ff & (uint)name[i]);

.net

unicode

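2 Answers

Stack Overflow user

Answered 2015-05-19 11:43:30

What this code does is not a conversion to Unicode. If anything, it is the other way round:

The 0x000000ff & part throws away the second byte of the Unicode letter and converts it to a letter that is only one byte long. More precisely: it keeps only the least significant byte and discards all the others. For a char that means discarding exactly one byte, because a char is two bytes in size.

I still think it makes no sense, because it leads to false positives: a Unicode letter that actually uses both bytes simply loses one of them and therefore becomes a different letter.

I would simply get rid of this code and use name[i] everywhere you currently use c.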
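The false positive described above can be made concrete with a small sketch. The class and method names here (LowByteMaskDemo, MaskToLowByte) are made up for illustration and are not part of the reference source; only the masking expression is taken from the question.

```csharp
using System;

// Hypothetical demo class; only the masking expression comes from the question.
public static class LowByteMaskDemo
{
    // Keep only the low 8 bits of the UTF-16 code unit, as CheckBadChars does.
    public static char MaskToLowByte(char ch)
    {
        return (char)(0x000000ff & (uint)ch);
    }

    public static void Main()
    {
        char plain = 'J';          // U+004A, fits in one byte
        char accented = '\u010D';  // U+010D, low byte is 0x0D

        // An ASCII character is unchanged by the mask.
        Console.WriteLine((int)MaskToLowByte(plain));        // 74

        // A two-byte character collapses onto an unrelated one, here carriage return.
        Console.WriteLine(MaskToLowByte(accented) == '\r');  // True
    }
}
```

With the mask in place, a header value containing U+010D would be handled by the `c == '\r'` branch even though the string contains no carriage return.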

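Votes: 1

Stack Overflow user

Answered 2015-05-19 11:44:00

What is the purpose of 0x000000ff and of casting the letter to (uint)

To get a character whose code is in the range 0..255: a char occupies 2 bytes in memory.

For example: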

var letter= (char)4200; // ၩ
char c = (char)(0x000000ff & (uint)letter); // h

// or
// char c = (char)(0x00ff & (ushort)letter);

// ushort (2-byte unsigned integer) is enough: uint is 4-byte unsigned integer
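On the question of a shorthand, here is a minimal sketch comparing the original expression with two shorter spellings that should give the same result. The class and method names are hypothetical, and the unchecked keyword is an assumption added so that the narrowing cast truncates even if the project is compiled with overflow checking enabled.

```csharp
using System;

// Hypothetical helper class used only to compare the three spellings.
public static class LowByteShorthandDemo
{
    // The expression from the question.
    public static char Original(char letter) => (char)(0x000000ff & (uint)letter);

    // char is promoted to int for the & operator, so the (uint) cast is optional.
    public static char MaskOnly(char letter) => (char)(letter & 0xFF);

    // Narrowing to byte drops the high byte; unchecked makes the truncation explicit.
    public static char ViaByte(char letter) => (char)unchecked((byte)letter);

    public static void Main()
    {
        foreach (char letter in new[] { 'J', (char)4200 })
        {
            bool same = Original(letter) == MaskOnly(letter)
                        && MaskOnly(letter) == ViaByte(letter);
            Console.WriteLine(same);  // True for both inputs
        }
    }
}
```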

Votes: 1

Original content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/30324391
