Unicode Quick Guide

From Digital Scholarship Group
Revision as of 13:05, 25 May 2015 by Sconnell

Unicode is a standard for representing the characters of human writing systems using numeric codes; you can enter any Unicode character in Oxygen by its code. Unicode Hex Input is a keyboard input source that lets you type a character by its hexadecimal code, even if your keyboard doesn’t have that character.

For Mac users, it’s very easy to set yourself up to enter characters using Unicode:

  • Go to System Preferences
  • Choose “Keyboard”
  • Select “Input Sources” from the options at the top
  • Click the + in the bottom-left of the window; this will give you the option to add keyboard input sources
  • A long list of options will come up. The one you want is “Unicode Hex Input”; select that one and choose “Add”
  • Now you should be able to choose between the default input (U.S., indicated by an American flag) and Unicode Hex Input (indicated by a black box with “U+”); you switch between these in the menu bar at the top-right of your screen (where the date, wifi signal, volume, etc. display)
  • As long as you have Unicode Hex Input selected, you can hold down the Option key and type the four-digit hexadecimal code for any Unicode character (for example, 017F for the long s), and it will appear in your text.
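The four digits you type with Option held down are the character's code written in hexadecimal. As a rough illustration of that mapping outside the keyboard workflow, here is a minimal Python sketch; Python is not part of the WWP setup, and the helper name char_from_code is made up for this example.

```python
def char_from_code(code: str) -> str:
    """Turn a string like 'U+017F' or '017F' into the character it names."""
    # Drop the optional 'U+' prefix used when writing Unicode codes.
    if code.upper().startswith("U+"):
        code = code[2:]
    # Read the remaining digits as a base-16 number and convert to a character.
    return chr(int(code, 16))


print(char_from_code("U+017F"))  # the long s
print(char_from_code("00E9"))    # e with an acute accent
```

Typing Option + 017F with Unicode Hex Input selected should give the same character as the first call.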

If you don’t know the code for a character you need (for example, an é or a letter from the Greek alphabet), you can use the UnicodeChecker software (already installed on the encoding computers, and available to download for your own machine) or look it up online. A Google search for “unicode” followed by the name of the character will almost always work, and Google is usually forgiving if you don’t know the character’s exact name. The Unicode Consortium site also has links to code charts. Remember, too, that the WWP doesn’t include many of the ligatures that you might encounter in your text: see the entry on typography and special characters.
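If you happen to have Python available, its standard unicodedata module offers another way to do this kind of lookup. The sketch below is only an illustration, not part of the WWP workflow, and the function names describe and find_by_name are hypothetical. Unlike a Google search, the name lookup needs the official Unicode character name and raises a KeyError if there is no match.

```python
import unicodedata


def describe(char: str) -> str:
    """Return the code and official Unicode name of a single character."""
    return f"U+{ord(char):04X} {unicodedata.name(char, '<no name>')}"


def find_by_name(name: str) -> str:
    """Return the code for an official Unicode character name."""
    return f"U+{ord(unicodedata.lookup(name)):04X}"


# Paste a character in to find out its code...
print(describe("\u00e9"))
# ...or start from a name to get the code to type.
print(find_by_name("GREEK SMALL LETTER ALPHA"))
```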

Here are some common codes for WWP encoding:

Character                Code     Glyph / note
long s                   U+017F   ſ
em dash                  U+2014
en dash                  U+2013
super dash               U+2015
left curly quotes        U+201C
right curly quotes       U+201D
soft hyphen              U+00AD   (looks like a regular hyphen)
lowercase ae ligature    U+00E6   æ
uppercase ae ligature    U+00C6   Æ
lowercase oe ligature    U+0153   œ
uppercase oe ligature    U+0152   Œ
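The labels in this list are the WWP's working names, which do not always match the official Unicode names (U+2015, for instance, is officially HORIZONTAL BAR). As a hedged sketch, the list can be written as a small Python table and printed with the official names alongside; the dictionary name WWP_CODES is an assumption made for this example.

```python
import unicodedata

# WWP label -> code, copied from the list above.
WWP_CODES = {
    "long s": 0x017F,
    "em dash": 0x2014,
    "en dash": 0x2013,
    "super dash": 0x2015,
    "left curly quotes": 0x201C,
    "right curly quotes": 0x201D,
    "soft hyphen": 0x00AD,
    "lowercase ae ligature": 0x00E6,
    "uppercase ae ligature": 0x00C6,
    "lowercase oe ligature": 0x0153,
    "uppercase oe ligature": 0x0152,
}

# Print each WWP label with its code and the official Unicode name.
for label, code_point in WWP_CODES.items():
    official = unicodedata.name(chr(code_point))
    print(f"{label:24} U+{code_point:04X}  {official}")
```

Searching online for the official name is often the quickest way to confirm that you have the right character.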