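I'm just curious how native applications, including browsers, read and interpret MIME types. Is a plugin for reading MIME types built into each application, or is there a specific system folder in the operating system that is consulted when a MIME type is interpreted?
The RFC refers to character sets when it defines MIME types:
(1) textual message bodies in character sets other than US-ASCII
MDN, on the other hand, makes it sound as if a MIME type is what goes in content-type, which you can find in things like HTML.
Do values such as content-type=image/jpeg or content-type=application/javascript use the UTF-8 character set to determine their characters (glyphs), with some other logic then deciding what those characters should be interpreted as?
Or does it mean that each content type has its own special character chart (something like utf-8 -> js-8??)? Is this a conversion of characters to glyphs plus a logical interpretation of the characters into binary?
Why does it sound as if both character sets and content types mean MIME? And where is the folder path on Mac and Linux systems that holds the content-type charts and interpretation logic?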
Answered on 2017-12-13 12:21:07
On macOS, you can use file --mime "/path/to/filename" to report the MIME type of a file.
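As a rough sketch, the same lookup can be scripted from Python by shelling out to file. The helper name file_mime is made up for this illustration, and the code assumes a file binary on the PATH that accepts the --mime and --brief options; if none is found it simply returns None.

```python
import os
import shutil
import subprocess
import tempfile


def file_mime(path):
    """Return the output of `file --mime --brief` for path, or None if `file` is unavailable."""
    exe = shutil.which("file")  # locate the file(1) binary on PATH
    if exe is None:
        return None
    result = subprocess.run(
        [exe, "--mime", "--brief", path],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


# Demonstration on a small temporary text file (hypothetical sample content).
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("hello world\n")
    sample_path = tmp.name

sample_mime = file_mime(sample_path)
print(sample_mime)
os.remove(sample_path)
```

The exact string that comes back depends on the installed version of file and its magic database, so treat the printed value as informational rather than something to match literally.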
The man page for file reveals what happens before the MIME type lookup:
file tests each argument in an attempt to classify it. There are three sets of tests, performed in this order: filesystem tests, magic tests, and language tests. The first test that succeeds causes the file type to be printed.

The filesystem tests are based on examining the return from a stat(2) system call. The program checks to see if the file is empty, or if it's some sort of special file. Any known file types appropriate to the system you are running on (sockets, symbolic links, or named pipes (FIFOs) on those systems that implement them) are intuited if they are defined in the system header file <sys/stat.h>.

The magic tests are used to check for files with data in particular fixed formats. The canonical example of this is a binary executable (compiled program) a.out file, whose format is defined in <elf.h>, <a.out.h> and possibly <exec.h> in the standard include directory. These files have a ``magic number'' stored in a particular place near the beginning of the file that tells the UNIX operating system that the file is a binary executable, and which of several types thereof. The concept of a ``magic'' has been applied by extension to data files. Any file with some invariant identifier at a small fixed offset into the file can usually be described in this way. The information identifying these files is read from the compiled magic file /usr/share/file/magic.mgc, or the files in the directory /usr/share/file/magic if the compiled file does not exist.

If a file does not match any of the entries in the magic file, it is examined to see if it seems to be a text file. ASCII, ISO-8859-x, non-ISO 8-bit extended-ASCII character sets (such as those used on Macintosh and IBM PC systems), UTF-8-encoded Unicode, UTF-16-encoded Unicode, and EBCDIC character sets can be distinguished by the different ranges and sequences of bytes that constitute printable text in each set. If a file passes any of these tests, its character set is reported. ASCII, ISO-8859-x, UTF-8, and extended-ASCII files are identified as ``text'' because they will be mostly readable on nearly any terminal; UTF-16 and EBCDIC are only ``character data'' because, while they contain text, it is text that will require translation before it can be read. In addition, file will attempt to determine other characteristics of text-type files. If the lines of a file are terminated by CR, CRLF, or NEL, instead of the Unix-standard LF, this will be reported. Files that contain embedded escape sequences or overstriking will also be identified.

Once file has determined the character set used in a text-type file, it will attempt to determine in what language the file is written. The language tests look for particular strings (cf. <names.h>) that can appear anywhere in the first few blocks of a file. For example, the keyword .br indicates that the file is most likely a troff(1) input file, just as the keyword struct indicates a C program. These tests are less reliable than the previous two groups, so they are performed last. The language test routines also test for some miscellany (such as tar(1) archives).

Any file that cannot be identified as having been written in any of the character sets listed above is simply said to be ``data''.

Answered on 2017-12-10 11:56:35
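They are mostly located in /usr/share/mime. Also, Linux and macOS (almost the entire Unix tree) do not go by the file extension; the extension is there only for the user's convenience.
Note: application-specific entries are located in "/usr/share/mimelnk" (thanks to David C. Rankin).
(You can also try running locate mime in a terminal for more information.)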
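To illustrate the difference between going by the extension and going by the content, here is a minimal sketch in Python. It is not how file or any desktop environment is actually implemented: the MAGIC_SIGNATURES table and both helper names are invented for this example, and the table covers only a handful of well-known leading-byte signatures, whereas a real magic database holds far more.

```python
import mimetypes

# A few well-known leading-byte signatures (illustrative subset only).
MAGIC_SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"%PDF-", "application/pdf"),
]


def sniff_by_content(data):
    """Guess a MIME type from the first bytes of the data, or return None."""
    for signature, mime in MAGIC_SIGNATURES:
        if data.startswith(signature):
            return mime
    return None


def guess_by_extension(filename):
    """Guess a MIME type from the file name alone, using the standard library table."""
    mime, _encoding = mimetypes.guess_type(filename)
    return mime


# PNG data stored under a misleading .jpg name: the two approaches disagree.
png_bytes = b"\x89PNG\r\n\x1a\n" + b"\x00" * 8
print(guess_by_extension("photo.jpg"))
print(sniff_by_content(png_bytes))
```

The name-based guess only reflects what the file is called, while the content-based guess reflects what the bytes actually are, which is the sense in which the extension is just a convenience.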
https://stackoverflow.com/questions/46217787