What are language settings and the Unicode concepts of character production and character encoding?

Thu, 22/10/2009 – 13:04 — Admin

The world of character encoding is changing as the rapid development of computer-based technology demands more efficient applications. International standards bodies are working to fold many first-generation character encodings, such as ISO 8859-1, into the broader and more complete character set of ISO/IEC 10646, which is aligned with Unicode. Unicode is a computing industry standard that allows computers to consistently represent and manipulate text expressed in most of the world’s writing systems.

ISO 8859 is a family of 8-bit character sets, each with up to 256 code positions, used to represent many of the alphabetic languages of the West and beyond. The standard originally had about ten parts; fifteen have been published to date. These sets were designed by the standards group ECMA (European Computer Manufacturers Association) and are included in the Internet charset registry for use with MIME identification.

Why is ISO 8859 important, you might ask? ISO 8859-1 (also called ISO Latin-1) was for many years the default character set for text sent over HTTP (the transport protocol for web documents) and was widely used in the creation of HTML documents. It contains the characters needed to type the major West European languages and was long one of the preferred encodings on the Internet. The following languages are supported by the ISO 8859-1 character set:

Afrikaans, Basque, Catalan, Danish, Dutch, English, Faeroese, Finnish, French, Galician, German, Icelandic, Irish, Italian, Kurdish (Yekgirtú/Yekgirtí), Norwegian, Portuguese, Spanish, Swedish
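Whether a given piece of text is covered by ISO 8859-1 can be checked by simply trying to encode it. The following is a minimal sketch using Python's built-in codec; the helper name `fits_latin1` and the sample words are illustrative choices, not part of any standard.

```python
def fits_latin1(text: str) -> bool:
    """Return True if every character of `text` has a byte in ISO 8859-1."""
    try:
        text.encode("iso-8859-1")
        return True
    except UnicodeEncodeError:
        # At least one character lies outside the 256 positions of the set.
        return False


# Illustrative samples: three Latin-script words, one Greek, one Arabic-script.
samples = ["Yekgirtú", "Føroyskt", "Français", "Ελληνικά", "کوردی"]
for word in samples:
    print(word, fits_latin1(word))
```

Words written with the accented Latin letters of the languages listed above encode without error, while Greek or Arabic-script text does not, which is why other parts of ISO 8859, and later Unicode, were needed.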

The newer international standard ISO 10646 is essentially a character map, an extension of earlier standards such as ISO 8859. Unicode, in contrast, adds rules for collation, normalization forms, and the bidirectional algorithm for right-to-left scripts such as Hebrew, Arabic and Arabic-based scripts. For interoperability between platforms, especially where bidirectional scripts are used, supporting ISO 10646 is not enough; Unicode must be implemented.
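Two of these additional rules can be observed with Python's standard `unicodedata` module. The sketch below is only an illustration: it shows that one visible character may be stored as different code point sequences until it is normalized, and that each character carries a directional class that the bidirectional algorithm consumes. The variable names are arbitrary.

```python
import unicodedata

composed = "\u00e9"      # é as a single precomposed code point
decomposed = "e\u0301"   # e followed by COMBINING ACUTE ACCENT

# The two strings look the same on screen but are different code point sequences.
differ_raw = composed != decomposed

# Normalization maps both spellings onto one agreed form.
same_nfc = unicodedata.normalize("NFC", decomposed) == composed
same_nfd = unicodedata.normalize("NFD", composed) == decomposed

# Directional classes used by the bidirectional algorithm:
# "L" = left-to-right, "R" = right-to-left, "AL" = right-to-left Arabic letter.
bidi_classes = {
    "latin a": unicodedata.bidirectional("a"),
    "hebrew alef": unicodedata.bidirectional("\u05d0"),
    "arabic beh": unicodedata.bidirectional("\u0628"),
}

print(differ_raw, same_nfc, same_nfd)
print(bidi_classes)
```

A plain character map cannot express either property, which is the practical difference between supporting only the ISO 10646 repertoire and implementing Unicode.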

The rapid advance of Internet technology and electronic communication, together with multi-purpose search engines, has brought to life an almost unlimited exposure to the languages of the world. Unicode has the explicit aim of transcending the limitations of traditional character encodings, such as those defined by the ISO 8859 standard, which are widely used in various countries but remain largely incompatible with each other. In text processing, Unicode provides a unique number, called a code point, for each character; a code point is a number, not a glyph. In other words, Unicode represents a character in an abstract way and leaves the visual rendering (size, shape, font or style) to other software, such as a web browser or word processor. The first 256 code points of Unicode (U+0000 to U+00FF) are identical to the content of ISO 8859-1, which makes it trivial to convert existing Western text, including Kurdish Yekgirtú/Yekgirtí text.
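Both points, the incompatibility between ISO 8859 parts and the identity between ISO 8859-1 and the first 256 Unicode code points, can be checked in a few lines. This is a small sketch using Python's built-in codecs; the sample word and the byte value 0xE9 are arbitrary choices for illustration.

```python
word = "Yekgirt\u00fa"  # "Yekgirtú", with ú as the single code point U+00FA

# Each character is identified by a number (its code point), not by a shape.
code_points = [f"U+{ord(ch):04X}" for ch in word]

# In ISO 8859-1 every byte value equals the Unicode code point of its character.
latin1_bytes = word.encode("iso-8859-1")
bytes_match_code_points = all(b == ord(ch) for b, ch in zip(latin1_bytes, word))

# The same holds for all 256 byte values, so conversion is a plain widening.
all_bytes = bytes(range(256))
first_256_identical = [ord(ch) for ch in all_bytes.decode("iso-8859-1")] == list(range(256))

# By contrast, the same byte means different characters in different ISO 8859 parts.
raw = b"\xe9"
decoded = {name: raw.decode(name) for name in ("iso-8859-1", "iso-8859-5", "iso-8859-7")}

print(code_points)
print(bytes_match_code_points, first_256_identical)
print(decoded)
```

The last dictionary holds a Latin, a Cyrillic and a Greek letter for the one byte 0xE9, which is the kind of ambiguity that a single code point per character removes.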

The term Unicode is generally used to cover both ISO 10646 and Unicode. The computer’s software uses the code point to look up the appropriate glyph in the font file, so that the character can be displayed on the page or screen. To produce these code points, the computer relies on predefined keyboard layouts and language (writing system) settings. Different ISO/IEC standards define how each individual writing system is implemented on computers.

A keyboard or keypad is the device most commonly used for writing on computers, mobile phones and other data entry devices. Each key is associated with a standard code, which the keyboard sends to the computer when the key is pressed. Combining alphabetic keys with modifier keys such as Ctrl, Alt, Shift and AltGr generates further key codes, which are sent to the computer in the same way. The operating system intercepts these signals and converts them to the appropriate characters based on the keyboard layout and input method, then delivers the resulting code points to the running application, which in turn looks up the matching glyph in the current font file and asks the operating system to draw it on the screen. The resulting code points are readable by any application that supports Unicode.
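The chain from key press to glyph can be sketched as two table lookups. Everything in the example below is hypothetical and heavily simplified: the `LAYOUT` and `FONT_GLYPHS` tables, the AltGr assignments and the function names are invented for illustration and do not describe any real keyboard layout, operating system API or font file.

```python
# Hypothetical keyboard layout: (key, modifier) -> character produced.
LAYOUT = {
    ("U", None): "u",
    ("U", "Shift"): "U",
    ("U", "AltGr"): "\u00fa",  # ú (assumed assignment, for illustration only)
    ("I", None): "i",
    ("I", "AltGr"): "\u00ed",  # í (assumed assignment, for illustration only)
}

# Hypothetical font table: code point -> glyph name inside the font.
FONT_GLYPHS = {
    0x0075: "u",
    0x0055: "U",
    0x00FA: "uacute",
    0x0069: "i",
    0x00ED: "iacute",
}


def key_to_code_point(key, modifier=None):
    """Operating system step: translate a key press into a Unicode code point."""
    return ord(LAYOUT[(key, modifier)])


def glyph_for(code_point):
    """Application/font step: pick the glyph to draw; '.notdef' if the font lacks one."""
    return FONT_GLYPHS.get(code_point, ".notdef")


for key, modifier in [("U", None), ("U", "Shift"), ("U", "AltGr"), ("I", "AltGr")]:
    cp = key_to_code_point(key, modifier)
    print(key, modifier, f"U+{cp:04X}", glyph_for(cp))
```

The code point in the middle is the only part shared between systems: the layout that produced it and the font that draws it can both change without affecting the stored text.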

Currently, neither the Kurdo-Latin (Hawar version) nor the Kurdo-Arabic writing system is represented by an internationally recognised standard keyboard. Many users are forced to install specially designed keyboard layouts on their systems in order to write in Kurdish. Installation requires administrator privileges; in other words, the user generally has to be the owner of the computer. The character selection of the Kurdish Unified writing system, by contrast, enables users to write Kurdish on any platform or keyboard and to produce Kurdish Unified code points. Such freedom in writing Kurdish will make the language more widely usable by anyone, anywhere, at any time.

Representation has never been the objective of our work; the objective is unrestricted experimentation in writing Kurdish without any engineering effort. If Kurds wanted to, they could write their language in Chinese script, but it all comes down to feasibility, workability and the future of a deprived language that is limited in every respect. This can be seen in the flexibility with which self-taught users adapt to local conventions, expressing their Kurdishness in the way their dialect is written.