The Linux Console Tools
Yann Dirson, dirson@debian.org
18 May 1999
____________________________________________________________

Table of Contents

1. Status of this document
   1.1 Other documents
2. What the Linux Console Tools are
3. Understanding the big picture of the console
4. What is Unicode
5. Understanding and setting up the keyboard driver
   5.1 How it works
   5.2 See also
6. Understanding and setting up the screen driver
   6.1 Unicode is everywhere
      6.1.1 Screen Font Maps
      6.1.2 SFM Fallback tables
   6.2 The unicode screen-mode
   6.3 The byte screen-mode
      6.3.1 Charset slots
   6.4 Special UCS2 codes
   6.5 About the old 8-bit ``screen maps''
   6.6 See also
7. Font files
   7.1 The formats
   7.2 Tools
      7.2.1 Font-files manipulation tools
      7.2.2 Font editors
8. The libraries
   8.1 libconsole
   8.2 libcfont
   8.3 libctutils
9. The future of the console driver and of the Linux Console Tools
______________________________________________________________________

1. Status of this document

This is an introduction to the Linux Console Tools package. Refer to the manpages for more details.

1.1. Other documents

The Linux Console Tools WWW site may contain additional information, the latest news, and such.

Files in the doc/contrib/ directory are unsupported and may be obsolete, but are provided in case someone needs them.

README.{acm,sfm,keytables} give some information about the corresponding included data files.

kbd.FAQ.* is the Console and Keyboard HOWTO by Andries Brouwer, as included in kbd 0.97. It would need some corrections, though.

2. What the Linux Console Tools are

The Linux Console Tools are a set of programs that let users set up and customize their console (in the restricted meaning of text-mode screen and keyboard only). They are derived from version 0.94 of the kbd package, and have benefited from most features introduced in kbd up to version 0.97.
The Linux Console Tools are still under development, but using them just as a replacement for kbd should be quite safe, as they fix many bugs kbd has.

3. Understanding the big picture of the console

The console driver is currently made of 2 sub-drivers: the keyboard driver and the screen driver. Basically, the keyboard driver sends characters to your application; the application does its own job, and then sends the characters to be displayed to the screen driver.

4. What is Unicode

Traditionally, character encodings use 8 bits, and are thus limited to 256 characters. This causes problems because:

  1. it is not enough for some languages;

  2. people speaking languages that use different encodings have to choose which one they use, and have to switch the system's state when changing language, which makes it difficult to mix several languages in the same file;

  3. etc.

Thus the UCS (Universal Character Set), also known as Unicode, was created to handle and mix all of the world's scripts. This is a 32-bit (4-byte) encoding, otherwise known as UCS4 because of the size of its characters, and is standardized by ISO as ISO 10646-1.

The most widely used characters from UCS are contained in UCS2, the 16-bit subset of UCS; this is the subset used by the Linux console.

For convenience, the UTF8 encoding was designed as a variable-length encoding (with a maximum length of 6 bytes) that is compatible with ASCII; every character that has a UCS4 encoding can be expressed as a UTF8 sequence, and vice versa.

The Unicode consortium defines additional properties for UCS2 characters.

See: unicode(7), utf-8(7).

5. Understanding and setting up the keyboard driver

5.1. How it works

The keyboard driver is made up of several levels:

  o the keyboard hardware, which turns the user's finger movements into so-called scancodes (Disclaimer: this is not really part of the software driver itself; no support is provided for bugs in this domain ;-). An event (key pressed or released) generates from 1 to 6 scancodes.

  o a mechanism turning scancodes into keycodes using a translation table, which you can access with the getkeycodes(8) and setkeycodes(8) utilities. You will only need to look at that if you have some sort of non-standard (or programmable?) keys on your keyboard. AFAIK, these keycodes are the same across keyboards that share the same hardware but differ in the symbols drawn on the keys.

  o a mechanism turning keycodes into characters using a keymap. You can access this keymap with the loadkeys(1) and dumpkeys(1) utilities.

The keyboard driver can be in one of 4 modes (which you can access using kbd_mode(1)), which influence what type of data applications get as keyboard input:

  o the scancode (K_RAW) mode, in which the application gets scancodes for input. It is used by applications that implement their own keyboard driver; X11, for example, does that.

  o the keycode (K_MEDIUMRAW) mode, in which the application gets information on which keys (identified by their keycodes) get pressed and released. AFAIK, no real-life application uses this mode.

  o the ASCII (K_XLATE) mode, in which the application effectively gets the characters as defined by the keymap, using an 8-bit encoding. In this mode, the Ascii_0 to Ascii_9 keymap symbols let you compose characters by typing their decimal 8-bit code, and Hex_0 to Hex_F do the same with (2-digit) hexadecimal codes.
  o the Unicode (K_UNICODE) mode, which at this time only differs from the ASCII mode by letting the user compose UTF8 Unicode characters by their decimal value, using Ascii_0 to Ascii_9 (who needs that?), or by their hexadecimal (4-digit) value, using Hex_0 to Hex_F.

    A keymap can be set up to produce UTF8 sequences (with a U+XXXX pseudo-symbol, where each X is a hexadecimal digit), but be warned that these UTF8 sequences will be produced even in ASCII mode. I think this is a bug in the kernel.

BE WARNED that putting the keyboard in RAW or MEDIUMRAW mode will make it unusable for most applications. Use showkey(1) to get a demo of these special modes, or to find out which scancodes/keycodes a specific key produces.

5.2. See also

keytables(5), setleds(1), setmetamode(1).

6. Understanding and setting up the screen driver

6.1. Unicode is everywhere

6.1.1. Screen Font Maps

In recent (as of 1998/08/11) kernels, the screen driver is based on the 16-bit Unicode (UCS2) encoding, which means that every console font loaded should be defined using a Unicode Screen Font Map (SFM for short), which tells, for each character in the font, the list of UCS2 characters it will render.

(SFMs were formerly called ``Unicode Maps'', or ``unimaps'' for short, but this term should be dropped: what used to be called ``screen maps'' now uses Unicode as well, so the old name probably confuses many people.)

6.1.2. SFM Fallback tables

Starting with release 1997.11.13 of the Linux Console Tools, consolechars(8) understands SFM fallback tables. Before that, an SFM had to contain both the Unicode values of the characters its font was primarily meant to render and any approximations the user wanted.
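As an illustration of the SFM idea, here is a minimal Python sketch in which an approximation is listed directly in the SFM, as had to be done before fallback tables: the glyph at font position 0x41 renders both U+0041 (LATIN CAPITAL LETTER A) and U+0391 (GREEK CAPITAL LETTER ALPHA). The names build_sfm and glyph_for are hypothetical, not part of the kernel or of the Linux Console Tools. Codes missing from the map fall back to the glyph for U+FFFD, the replacement character described under ``Special UCS2 codes''. When a code is listed for several positions, the sketch simply keeps the first one; it does not model how the kernel resolves such duplicates.

```python
REPLACEMENT = 0xFFFD  # U+FFFD, the replacement character


def build_sfm(font_map):
    """Invert {font position: [UCS2 codes]} into {UCS2 code: font position}.

    Hypothetical helper: if a code is listed for several positions,
    this sketch keeps the first one seen.
    """
    index = {}
    for fontpos, codes in font_map.items():
        for code in codes:
            index.setdefault(code, fontpos)
    return index


def glyph_for(index, code):
    """Return the font position used to draw `code`.

    Unmapped codes use the glyph mapped to U+FFFD; None is returned
    if even U+FFFD has no glyph in this font.
    """
    if code in index:
        return index[code]
    return index.get(REPLACEMENT)


# A tiny font: position 0 holds the replacement glyph, and position 0x41
# draws both LATIN CAPITAL LETTER A and, as an approximation listed
# directly in the SFM, GREEK CAPITAL LETTER ALPHA.
sfm = build_sfm({
    0x00: [0xFFFD],
    0x41: [0x0041, 0x0391],
    0x61: [0x0061],
})
```

With this map, glyph_for(sfm, 0x0391) gives position 0x41, while an unmapped code such as U+3042 gets position 0, the replacement glyph.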
Fallback tables allow you to put only the primary mappings in the SFM provided with the font file, and to keep separately a list saying ``if no glyph for that character is available in the current font, then try to display it with the glyph for this one, or else the one for that one, or ...''. This keeps all possible fallbacks in a single place, and everyone can choose which fallback tables they want. Have a look at data/consoletrans/*.fallback for examples.

A fallback-table file is made of fallback entries, one entry per line. Empty lines and lines beginning with the # comment character are ignored.

A fallback entry is a series of 2 or more UCS2 codes. The first one is the character for which we want a glyph; the following ones are those whose glyph we want to use when no glyph designed specially for our character is available. The order of the codes defines a priority order (own glyph if available, then the second character's, then the third's, etc.).

If an SFM is to be loaded, the fallback mappings are added to this map before it is loaded. If there is none (i.e. a font without an SFM was loaded and no --sfm option was given to consolechars, or the --force-no-sfm option was given), then the current SFM is requested from the kernel, the fallback mappings are added, and the resulting SFM is loaded back into the kernel.

Note that each fallback entry is checked against the original SFM (the one read from a file, or given by the kernel), not against the SFM obtained by adding previous fallback entries to it; this applies even to entries in different files, and thus the order of -k options has no effect. If you want some entries to be influenced by previous ones, you will have to put them in different fallback files and load them with several consecutive invocations of consolechars -k.

6.2. The unicode screen-mode

There are basically 2 screen-modes: byte mode and UTF mode. The simpler to explain is UTF mode, in which the bytes received from the application (i.e. written to the console screen) are interpreted as UTF8 sequences, converted into the ``equivalent UCS2 codes'', and then looked up in the SFM to determine the glyphs used to display each character.

Switching to and from UTF mode is done by sending the escape sequences ESC % G and ESC % @ respectively to the screen. You may use the unicode_start(1) and unicode_stop(1) scripts instead, as they also change the keyboard mode and optionally let you change the screen font. Use vt-is-UTF8(1) to find out whether the active VT is in UTF mode.

6.3. The byte screen-mode

Byte mode is a bit more complicated, as it uses an additional map to transform the bytes sent by the application into UCS2 characters, which are then handled as described above. I call this map the Application Charset Map (ACM), because it defines the encoding the application uses, but it used to be called a ``screen map'' or ``console map'' (a name from the time when the screen driver didn't use Unicode and there was only one map down there).

Although only one ACM is active at a given time, there are always 4 of them in the kernel. 3 of them are built in and never change; they define IBM codepage 437 (the i386's default, and thus the kernel's default even on other architectures), the DEC VT100 charset, and the ISO latin1 charset. The 4th is user-definable, and defaults at boot to the ``straight to font'' mapping, described below under ``Special UCS2 codes''.

The consolechars(1) command can be used to change the ACM, as well as the font and its associated SFM.

6.3.1. Charset slots

The Linux Console Driver has 2 slots for charsets, labeled G0 and G1.
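The unicode and byte screen-modes differ only in how the bytes written by an application become UCS2 codes before the SFM lookup, and in byte mode the G0/G1 slot in use decides which ACM does the translation. The following Python sketch models that first stage under simplifying assumptions: control characters and escape-sequence processing are ignored, only two of the four kernel ACMs are modelled (ISO latin1, and the user-defined ACM in its boot-time ``straight to font'' state), a VT's slots are a plain dict naming the ACM each slot refers to, and invalid UTF-8 is replaced with U+FFFD. All function and variable names are hypothetical.

```python
# Hypothetical model of the screen driver's first translation stage; none of
# these names exist in the kernel or in the Linux Console Tools.

ENTER_UTF_MODE = b"\x1b%G"  # ESC % G: switch the VT to UTF mode
LEAVE_UTF_MODE = b"\x1b%@"  # ESC % @: switch back to byte mode


def iso01_acm():
    """ISO latin1 ACM: byte b maps to the UCS2 code with the same value."""
    return list(range(256))


def straight_to_font_acm():
    """Boot-time user-defined ACM: byte b maps to U+F000 + b."""
    return [0xF000 + b for b in range(256)]


# Two of the four ACMs the kernel holds (cp437 and vt100 are omitted here).
KERNEL_ACMS = {"iso01": iso01_acm(), "user": straight_to_font_acm()}


def byte_mode_to_ucs2(data, slots, active="G0"):
    """Byte mode: translate each byte through the ACM the active slot names."""
    acm = KERNEL_ACMS[slots[active]]
    return [acm[b] for b in data]


def utf_mode_to_ucs2(data):
    """UTF mode: decode UTF-8 and keep only codes that fit in UCS2.

    In this sketch, invalid sequences and characters beyond U+FFFF
    both become U+FFFD.
    """
    codes = []
    for ch in data.decode("utf-8", errors="replace"):
        cp = ord(ch)
        codes.append(cp if cp <= 0xFFFF else 0xFFFD)
    return codes
```

For instance, on a VT whose slots are {"G0": "iso01", "G1": "user"}, the byte 0x41 becomes U+0041 through G0 but U+F041 (font position 0x41, straight to font) through G1. Because the model has a single "user" entry shared by all VTs, it also mirrors the kernel's one-user-ACM limitation.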
Each of the G0 and G1 slots contains a reference to one of the 4 kernel ACMs, 3 of which are predefined to provide the cp437, iso01, and vt100 graphics charsets. The 4th one is user-definable; this is the one you can set with consolechars --acm and get with consolechars --old-acm.

Versions of the Linux Console Tools prior to 1998.08.11, as well as all versions of kbd at least up to 0.96a, always assumed you wanted to use the G0 slot, pointing to the user-defined ACM. You can now use the charset utility to tune your charset slots.

Note that, although each VT has its own slot settings, there is only one user-defined ACM, shared by all the VTs. That is, whereas you can have tty1 using G0=cp437 and G1=vt100 at the same time as tty2 using G0=iso01 and G1=iso02 (user-defined), you cannot have tty1 using iso02 and tty2 using iso03 at the same time. This is a limitation of the Linux kernel.

You can emulate such a setting with the filterm utility: put your console in UTF8 mode and tell filterm to translate screen output to UTF8 on the fly. You'll find filterm in the konwert package by Marcin Kowalczyk, which is available from his WWW site.

6.4. Special UCS2 codes

There are special UCS2 values you should care about; the present list is probably not exhaustive:

  o codes C from U+F000 to U+F1FF are not looked up in the SFM; they directly access the character at font position C & 0x01FF (yes, a font can have 512 characters on many hardware platforms, such as VGA). This is referred to as the straight to font zone.

  o code U+FFFD is the replacement character, usually at font position 0 in a font. The kernel displays it each time the application requests a Unicode character that is not present in the SFM.
    This not only keeps the driver safe in Unicode mode, but also prevents invalid characters from being displayed when the ACM on a particular VT contains characters that are not in the current font!

6.5. About the old 8-bit ``screen maps''

There was a time when the kernel didn't know anything about Unicode. In those days, Application Charset Maps were called ``screen maps'', and just mapped the application's characters to font positions. The file format used for these 8-bit ACMs is still supported for backward compatibility, but should not be used any more.

The old way of using custom ACMs didn't know about Unicode, so the ACM had to depend on the font. Now, as each VT chooses its own ACM (from the 4 in the kernel at a given time), and as the console font is common to all VTs, we can use a charset even if the font can't display all of its characters; the replacement character (U+FFFD) is then displayed instead.

6.6. See also

psfaddtable(1), psfgettable(1), psfstriptable(1), showfont(1).

7. Font files

7.1. The formats

The primary font file format for the Linux Console Tools, as of version 0.2.x, is the PSF format, which is also used by kbd. Version 0.3.x will introduce the XPSF format, which will be able to replace all existing file formats.

Raw fonts can be converted into PSF files with font2psf(1) (written by Martin Lohner, SuSE GmbH).

Versions 0.2.x do not support the CP format any more; support for it will come back in the 0.3.x development branch.

7.2. Tools

7.2.1. Font-files manipulation tools

The psfaddtable(1), psfgettable(1), and psfstriptable(1) tools are provided by the Linux Console Tools to manipulate the SFM embedded in PSF files. These are the only font-manipulation tools provided by the Linux Console Tools as of version 0.2.x.

The font2psf(1) tool is available in the contrib directory to convert old raw fonts into PSF fonts.
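To show where the SFM embedded in a PSF file lives, here is a minimal Python sketch that reads a PSF font image. It assumes the version 1 PSF layout as documented by the kbd project (this layout is not described in this document): a 4-byte header made of the magic bytes 0x36 0x04, a mode byte, and the number of bytes per glyph (glyphs are 8 pixels wide, so this is the glyph height); then 256 glyph bitmaps, or 512 if mode bit 0x01 is set; then, if mode bit 0x02 (or 0x04) is set, the Unicode table, which lists for each glyph its 16-bit little-endian UCS2 codes ended by 0xFFFF. The function name read_psf1 is hypothetical, and character sequences (introduced by 0xFFFE) are skipped rather than decoded.

```python
import struct

PSF1_MAGIC = b"\x36\x04"
PSF1_MODE512 = 0x01     # 512 glyphs instead of 256
PSF1_MODEHASTAB = 0x02  # a Unicode table (the SFM) follows the glyphs
PSF1_MODEHASSEQ = 0x04  # the table may also contain character sequences
PSF1_SEPARATOR = 0xFFFF  # ends the table entry for one glyph
PSF1_STARTSEQ = 0xFFFE   # introduces a character sequence


def read_psf1(data):
    """Parse a PSF1 font image (hypothetical helper).

    Returns (charsize, glyphs, sfm): the bytes per glyph, the list of
    glyph bitmaps, and a dict mapping each font position to the UCS2
    codes it renders (empty if the font carries no Unicode table).
    """
    if len(data) < 4 or data[:2] != PSF1_MAGIC:
        raise ValueError("not a PSF1 font")
    mode, charsize = data[2], data[3]
    count = 512 if mode & PSF1_MODE512 else 256
    start = 4
    end = start + count * charsize
    if len(data) < end:
        raise ValueError("truncated glyph data")
    glyphs = [data[start + i * charsize:start + (i + 1) * charsize]
              for i in range(count)]

    sfm = {}
    if mode & (PSF1_MODEHASTAB | PSF1_MODEHASSEQ):
        pos = end
        for fontpos in range(count):
            codes = []
            in_sequences = False
            while True:
                (code,) = struct.unpack_from("<H", data, pos)
                pos += 2
                if code == PSF1_SEPARATOR:
                    break
                if code == PSF1_STARTSEQ:
                    # Everything up to the separator is sequence data,
                    # which this sketch does not decode.
                    in_sequences = True
                elif not in_sequences:
                    codes.append(code)
            sfm[fontpos] = codes
    return charsize, glyphs, sfm
```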
There are plans for a more generic font-conversion tool based on libcfont. It will be mostly trivial to write once work on libcfont is far enough along.

The only way provided by the Linux Console Tools to display a font's contents is to load it, and then to display it using showfont(1).

7.2.2. Font editors

I do not currently know of a good font editor suitable for editing console fonts. I tried fonter, but it has a bad design flaw: you can only properly edit cp437 fonts (or maybe ASCII-based fonts if you like unreadable screens), because it works on the console and loads the font you are editing.

I was told about cse, which I have not tried yet.

Marcin Kowalczyk is working on the fonty tool (which I have not checked yet either), which will help font designers but is not, AFAIK, a real editor.

Robert de Bath is working on his own tools, which handle a variety of file formats and table formats.

8. The libraries

There are several shared libraries installed by the Linux Console Tools. They were at first meant just to share code between the various utilities (kbd has lots of duplicated code), but they could be used as a base to build new tools. However, they are not yet ready for production use (hence the version number 0.0.0), and are far from complete or coherent at this time. Here is a summary of what they are meant to become:

8.1. libconsole

libconsole is meant to be a collection of:

  o wrappers around the kernel-level functionalities, which should be as kernel-version-independent as reasonable;

  o higher-level interfaces to these functionalities.

Maybe this goal overlaps with some part of libggi (see ``The future''), but I haven't investigated that yet.

8.2. libcfont

libcfont is meant to provide a high-level interface to console-font file handling. It also exports the lower-level functions used to construct the higher-level ones.
For now it only supports some low- to medium-level functions that ease writing programs, but I hope to make it a lot more than that, especially with the coming of the XPSF file format (see doc/font-formats/xpsf.draft for details). As of release 1998.08.11, implementation of the higher-level interface has just started.

8.3. libctutils

libctutils is a collection of miscellaneous utility functions for use by the 2 other libraries and by the tools. I hope most of this stuff will one day make its way into an existing general-purpose utility library. Any offers welcome.

9. The future of the console driver and of the Linux Console Tools

The Linux Console Tools were derived from kbd. It is not a good thing to have two distinct distributions for these tools, so we once hoped to finally merge the two packages back together, with Andries Brouwer, who still maintains kbd. However, due to the lack of technical cooperation from kbd's maintainer, and to the growing gap with kbd, this project is now on hold.

The console driver in the 2.2.x kernels has been reworked a lot, and it seems it will continue to evolve in 2.3.x. There are already some new features, such as fonts whose width is not 8 pixels, which will be supported in the future.

There is an ongoing project, known as GGI (General Graphics Interface), which is in the process of, among other things, revolutionizing the way the console is handled. Have a look at their WWW site for details.

As far as possible, I will try to keep the Linux Console Tools in sync with what is developed for the kernel and with what gets added to new releases of kbd, but I have to take a closer look at the current state of the GGI project before I can give any more information.