The OLED 128x64 Bricklet 2.0 has an embedded font. Code points 32 through 126 overlap with ASCII.

Is there a table somewhere that maps Unicode characters to the font's other code points, where a match exists? E.g. "½" => 171?


Hi,

The embedded font is Code page 437. For the openHAB bindings I've used the following mapping to convert from UTF-16:

static final List<Integer> CP437 = Arrays.asList(0x0000, 0x263A, 0x263B, 0x2665, 0x2666, 0x2663, 0x2660, 0x2022,
        0x25D8, 0x25CB, 0x25D9, 0x2642, 0x2640, 0x266A, 0x266B, 0x263C, 0x25BA, 0x25C4, 0x2195, 0x203C, 0x00B6,
        0x00A7, 0x25AC, 0x21A8, 0x2191, 0x2193, 0x2192, 0x2190, 0x221F, 0x2194, 0x25B2, 0x25BC, 0x0020, 0x0021,
        0x0022, 0x0023, 0x0024, 0x0025, 0x0026, 0x0027, 0x0028, 0x0029, 0x002A, 0x002B, 0x002C, 0x002D, 0x002E,
        0x002F, 0x0030, 0x0031, 0x0032, 0x0033, 0x0034, 0x0035, 0x0036, 0x0037, 0x0038, 0x0039, 0x003A, 0x003B,
        0x003C, 0x003D, 0x003E, 0x003F, 0x0040, 0x0041, 0x0042, 0x0043, 0x0044, 0x0045, 0x0046, 0x0047, 0x0048,
        0x0049, 0x004A, 0x004B, 0x004C, 0x004D, 0x004E, 0x004F, 0x0050, 0x0051, 0x0052, 0x0053, 0x0054, 0x0055,
        0x0056, 0x0057, 0x0058, 0x0059, 0x005A, 0x005B, 0x005C, 0x005D, 0x005E, 0x005F, 0x0060, 0x0061, 0x0062,
        0x0063, 0x0064, 0x0065, 0x0066, 0x0067, 0x0068, 0x0069, 0x006A, 0x006B, 0x006C, 0x006D, 0x006E, 0x006F,
        0x0070, 0x0071, 0x0072, 0x0073, 0x0074, 0x0075, 0x0076, 0x0077, 0x0078, 0x0079, 0x007A, 0x007B, 0x007C,
        0x007D, 0x007E, 0x2302, 0x00C7, 0x00FC, 0x00E9, 0x00E2, 0x00E4, 0x00E0, 0x00E5, 0x00E7, 0x00EA, 0x00EB,
        0x00E8, 0x00EF, 0x00EE, 0x00EC, 0x00C4, 0x00C5, 0x00C9, 0x00E6, 0x00C6, 0x00F4, 0x00F6, 0x00F2, 0x00FB,
        0x00F9, 0x00FF, 0x00D6, 0x00DC, 0x00A2, 0x00A3, 0x00A5, 0x20A7, 0x0192, 0x00E1, 0x00ED, 0x00F3, 0x00FA,
        0x00F1, 0x00D1, 0x00AA, 0x00BA, 0x00BF, 0x2310, 0x00AC, 0x00BD, 0x00BC, 0x00A1, 0x00AB, 0x00BB, 0x2591,
        0x2592, 0x2593, 0x2502, 0x2524, 0x2561, 0x2562, 0x2556, 0x2555, 0x2563, 0x2551, 0x2557, 0x255D, 0x255C,
        0x255B, 0x2510, 0x2514, 0x2534, 0x252C, 0x251C, 0x2500, 0x253C, 0x255E, 0x255F, 0x255A, 0x2554, 0x2569,
        0x2566, 0x2560, 0x2550, 0x256C, 0x2567, 0x2568, 0x2564, 0x2565, 0x2559, 0x2558, 0x2552, 0x2553, 0x256B,
        0x256A, 0x2518, 0x250C, 0x2588, 0x2584, 0x258C, 0x2590, 0x2580, 0x03B1, 0x00DF, 0x0393, 0x03C0, 0x03A3,
        0x03C3, 0x00B5, 0x03C4, 0x03A6, 0x0398, 0x03A9, 0x03B4, 0x221E, 0x03C6, 0x03B5, 0x2229, 0x2261, 0x00B1,
        0x2265, 0x2264, 0x2320, 0x2321, 0x00F7, 0x2248, 0x00B0, 0x2219, 0x00B7, 0x221A, 0x207F, 0x00B2, 0x25A0,
        0x00A0);

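// Maps each code point to its index in the CP437 table above.
// Characters that are not in the table become 0xDB (the full block glyph).
// Requires java.util.Arrays and java.util.List.
public static String utf16ToCP437(String utf16) {
    StringBuilder result = new StringBuilder();
    utf16.codePoints().map(c -> CP437.indexOf(c)).map(i -> i == -1 ? 0xDB : i)
            .forEach(c -> result.append((char) c));
    return result.toString();
}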

The lookup table should work in any language, and the standard libraries of many programming languages can do this conversion directly. For example, in Python:

'test ½'.encode('cp437', 'replace')

will return

b'test \xab'

With 'replace', any Unicode character that cannot be encoded in CP437 is replaced by an encoded '?'.
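To get an actual table out of this, one option is to let Python's built-in `cp437` codec generate it. This is only a sketch: the name `UNICODE_TO_CP437` is made up for illustration, and it covers code points 32..126 and 128..255 only, because the codec decodes 0..31 and 127 as control characters rather than as glyphs.

```python
# Build a Unicode character -> CP437 code point table from the built-in codec.
# UNICODE_TO_CP437 is a hypothetical name, not part of any Tinkerforge API.
codes = list(range(32, 127)) + list(range(128, 256))
UNICODE_TO_CP437 = {bytes([code]).decode("cp437"): code for code in codes}

# Look up a single character, as in the question above.
half = UNICODE_TO_CP437["½"]  # 171
```

Printing the dictionary (or iterating over `sorted(UNICODE_TO_CP437.items(), key=lambda kv: kv[1])`) gives the full mapping in code point order.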



Genius. This saves me a lot of work. I completely missed that it is the character set of the IBM PC.

In Ruby:

'test ½'.encode('cp437')
=> "test \xAB"

'test ½'.encode('cp437', :undef => :replace, :replace => '?')
=> "test \xAB"

'test ‹'.encode('cp437', :undef => :replace, :replace => '?')
=> "test ?"

Thanks!

Edited by Superp
clarify invalid chars in example

  • 2 weeks later...

...but things are not that simple.

The IBM437 encoding in Ruby (and some other languages) does not map code points 0..31 and 127 to the glyphs in the font. It treats them as the traditional control characters, such as bell and tab.

This means you can either:

  1. Build your own complete lookup table with 256 code points, duplicating the encoding already available on your system, but adding 0..31 and 127. Not DRY.
  2. Use the encoding, and miss •, ○, ♫, →, ♥, ⌂ and other useful characters.
  3. Use the encoding, with a fallback table for 0..31 and 127.

I opted to do 3. Commit here.
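For comparison, option 3 might look roughly like this in Python, whose `cp437` codec has the same gap. This is a sketch and not the code from the commit: `LOW_GLYPHS`, `FALLBACK` and `to_display_bytes` are illustrative names, the glyph values are copied from the table earlier in the thread, and using 0xDB for unknown characters is an assumption borrowed from the Java example.

```python
# Glyphs the display font shows for code points 1..31, in order
# (taken from the CP437 table posted earlier in this thread).
LOW_GLYPHS = [
    0x263A, 0x263B, 0x2665, 0x2666, 0x2663, 0x2660, 0x2022, 0x25D8,
    0x25CB, 0x25D9, 0x2642, 0x2640, 0x266A, 0x266B, 0x263C, 0x25BA,
    0x25C4, 0x2195, 0x203C, 0x00B6, 0x00A7, 0x25AC, 0x21A8, 0x2191,
    0x2193, 0x2192, 0x2190, 0x221F, 0x2194, 0x25B2, 0x25BC,
]

# Fallback table: Unicode character -> code point for 1..31 and 127.
FALLBACK = {chr(cp): i for i, cp in enumerate(LOW_GLYPHS, start=1)}
FALLBACK[chr(0x2302)] = 127  # the "house" glyph


def to_display_bytes(text, unknown=0xDB):
    """Encode text for the display: fallback table first, then the codec.

    Characters found in neither are replaced by `unknown`
    (0xDB, the full block, is an assumed choice).
    """
    out = bytearray()
    for ch in text:
        if ch in FALLBACK:
            out.append(FALLBACK[ch])
            continue
        try:
            out += ch.encode("cp437")
        except UnicodeEncodeError:
            out.append(unknown)
    return bytes(out)


encoded = to_display_bytes("♥ → ½")
```

Note that real control characters in the input, such as a tab, still pass through the codec unchanged and would show up as the corresponding glyph on the display.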

