...making Linux just a little more fun!

<-- prev | next -->

2-cent Tips

By Kat Tanaka Okopnik

2-Cent Tips

2-cent Tip: Unicode conversion
2-cent Tips: Same prompt different term

2-cent Tip: Unicode conversion

Benjamin A. Okopnik ([ben at linuxgazette.net])
Mon, 4 Sep 2006 15:27:16 -0400

A couple of years ago, I decided to stop wrestling with what I call "encoding craziness" for various bits of non-English text that I have scattered around my file system. Russian, for example, has at least four different encodings that I've run into - and guessing which one a given text file was written in was like a game of darts played in the dark. At 300 yards. With your hands tied behind you, so you had to use your toes. Oh, and while you were severely drunk on Stoli vodka. :) UTF-8 (Unicode) allowed me to, well, unify all of that into one single encoding that was readable without scrambling for whichever character set I needed (and may or may not have installed.) Better yet, Unicode usually displays just fine in HTML browsers - no special entity encoding is required.

For some reason, though, good converters appear to be something of a black art - and finding one that works, as opposed to all those that claim to work, was rather frustrating. Therefore, I decided to write one in my favorite language, Perl - only to find that the job has already been done for me, via the 'encoding' pragma. In other words, conversion from, say, KOI8-R to UTF-8 is no more complex than this:

# Convert and write to a file
perl -Mencoding=koi8r,STDOUT,utf8 -pe0 < file.koi8r > file.utf8
# Or just display it in a pager:
perl -Mencoding=koi8r,STDOUT,utf8 -pe0 < file.koi8r|less
It is literally that simple. Pretty much every encoding you can imagine is available (see 'perldoc Encode::Supported' for the naming conventions and charsets). The conversion does not have to be to UTF-8 - it'll do any of the listed charsets - but why would you care? :)

# Print the Kanji for 'Rakuda' (Camel) from multibyte strings:
perl -Mencoding=euc-jp,STDOUT,utf-8 -wle'print "Follow the
Follow the 駱駝!
# Or you can do it in Hiragana, but using Unicode values instead:
perl -Mencoding=shift-jis,STDOUT,utf8 -wle'print "Follow the
Follow the らくだ!
* Ben Okopnik * Editor-in-Chief, Linux Gazette * http://LinuxGazette.NET *

[ Discussion continued (10 messages/38.02kB) ]

2-cent Tips: Same prompt different term

Andrew Elian ([a_elian at sympatico.ca])
Wed, 25 Oct 2006 14:18:37 -0400


Here's a quick tidbit to help the PS1 variable do the right thing depending on the terminal - X or otherwise. I've added these lines to my .bash_profile and found them useful:

case $TERM in
                export TERM=xterm-color
                export PROMPT_COMMAND='echo -ne "\033]0;${USER}:${PWD/#$HOME/~}\007"'
                export PS1="$ "

                export PROMPT_COMMAND='echo -ne "\033]0;${USER}:${PWD/#$HOME/~}\007"'
                export PS1="$ "

                export PS1="\[\033[0;32m\]\u \[\033[1;32m\]\W]\[\033[0;32m\] "
Sincerely, Andrew

[ Discussion continued (5 messages/4.85kB) ]

Talkback: Discuss this article with The Answer Gang

Bio picture

Kat likes to tell people she's one of the youngest people to have learned to program using punchcards on a mainframe (back in '83); but the truth is that since then, despite many hours in front of various computer screens, she's a computer user rather than a computer programmer.

When away from the keyboard, her hands have been found full of knitting needles, various pens, henna, red-hot welding tools, upholsterer's shears, and a pneumatic scaler.

Copyright © 2006, Kat Tanaka Okopnik. Released under the Open Publication license unless otherwise noted in the body of the article. Linux Gazette is not produced, sponsored, or endorsed by its prior host, SSC, Inc.

Published in Issue 132 of Linux Gazette, November 2006

<-- prev | next -->