Unicode Quick Guide

From Digital Scholarship Group

Unicode is a standard that aims to represent every character in every human writing system with a numeric code, called a code point; in Oxygen, you can enter any character in any language using its code. Unicode Hex Input lets you type a character by its code, even if your keyboard doesn’t have a key for that character.

For Mac users, it’s very easy to set yourself up to enter characters using Unicode:

  • Go to System Preferences
  • Choose “Keyboard”
  • Select “Input Sources” from the options at the top
  • Hit the + in the bottom-left of the screen; this will give you the option to add keyboard input sources
  • A long list of options will come up. The one you want is “Unicode Hex Input”; select that one and choose “Add”
  • Now, you should be able to choose between the default input (U.S.; indicated by an American flag) and Unicode Hex (indicated by a black box with “U+”); you switch between these in the menu bar at the top right of your screen (where the date, wifi signal, volume, etc. display)
  • As long as you have Unicode Hex Input selected, you can hold down the Option key and type the four-digit hexadecimal code for any Unicode character (for example, 017F for the long s), and the character will appear in your text.
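
If you have a character in hand (say, copied from a web page) and want to know which digits to type, the following Python sketch may help. It is only an illustration: the helper name hex_input_codes is made up for this example, and it assumes that Unicode Hex Input expects UTF-16 code units, so a character above U+FFFF would be typed as two four-digit codes rather than one.

```python
def hex_input_codes(char):
    """Return the four-digit hex codes to type for a single character.

    Assumption: the input method takes UTF-16 code units, so characters
    above U+FFFF come back as two codes (a surrogate pair).
    """
    if len(char) != 1:
        raise ValueError("expected exactly one character")
    data = char.encode("utf-16-be")  # two bytes per UTF-16 code unit
    return [data[i:i + 2].hex().upper() for i in range(0, len(data), 2)]


print(hex_input_codes("\u017f"))  # long s -> ['017F']
print(hex_input_codes("\u00e9"))  # e with acute -> ['00E9']
```

Every character in the list of common WWP codes fits in a single four-digit code, so the two-code case should rarely come up in encoding work.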

For PC users, you can enter Unicode characters with Oxygen's built-in character insertion option. Go to "Edit" and then "Insert from Character Map." You can search for the character you want by description or by its Unicode code point. When you've found the character, hit "Insert" and it will be inserted into your text wherever your cursor is.

If you don’t know the code for a character you need (for example, an é or a letter from the Greek alphabet), you can use the UnicodeChecker software (already installed on the encoding computers, and available to download for your own machine) or look up what you need online. A Google search for “unicode” followed by the name of the character you need will almost always work, and Google is usually forgiving if you don’t know the exact name of the character you’re looking for. The Unicode Consortium site also has links to code charts. Remember that the WWP doesn’t include many of the ligatures that you might encounter in your text: see the entry on typography and special characters.
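
Another way to look up a code, if you happen to have Python installed, is its standard unicodedata module. The sketch below is not part of the WWP workflow, and the helper name describe is just for illustration; it reports the code point and formal Unicode name of a character, and also goes the other way, from a formal name to the character.

```python
import unicodedata


def describe(char):
    """Return 'U+XXXX NAME' for a single character."""
    return f"U+{ord(char):04X} {unicodedata.name(char, '<unnamed>')}"


print(describe("\u00e9"))  # U+00E9 LATIN SMALL LETTER E WITH ACUTE
print(describe("\u017f"))  # U+017F LATIN SMALL LETTER LONG S

# Going the other way: from a formal Unicode name to the character.
alpha = unicodedata.lookup("GREEK SMALL LETTER ALPHA")
print(describe(alpha))     # U+03B1 GREEK SMALL LETTER ALPHA
```

Note that unicodedata.lookup needs the exact formal name, so it is less forgiving than a web search.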

Here are some common codes for WWP encoding:

colloquial name             code point   character
long s                      U+017F       ſ
em dash                     U+2014       —
en dash                     U+2013       –
super dash                  U+2015       ―
left double curly quote     U+201C       “
right double curly quote    U+201D       ”
left single curly quote     U+2018       ‘
right single curly quote    U+2019       ’
soft hyphen                 U+00AD       (looks like a regular hyphen)
a-e ligature                U+00E6       æ
A-E ligature                U+00C6       Æ
o-e ligature                U+0153       œ
O-E ligature                U+0152       Œ
e with acute                U+00E9       é
a with macron               U+0101       ā
e with macron               U+0113       ē
o with macron               U+014D       ō
u with macron               U+016B       ū
o with circumflex           U+00F4       ô
u with circumflex           U+00FB       û
e with diaeresis            U+00EB       ë
section symbol              U+00A7       §
eszett                      U+00DF       ß
pilcrow                     U+00B6       ¶
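
To double-check a code point from the list above before typing it, a short Python sketch like the following can turn the “U+” notation into the actual character and print its formal Unicode name. The helper name from_code_point is invented for this example, and the labels in the loop are just a sample of the codes listed above.

```python
import unicodedata


def from_code_point(label):
    """Convert a label such as 'U+017F' into the character it names."""
    if not label.upper().startswith("U+"):
        raise ValueError(f"not a code point label: {label!r}")
    return chr(int(label[2:], 16))  # the part after 'U+' is hexadecimal


for label in ["U+017F", "U+2014", "U+2013", "U+2015", "U+00AD", "U+00B6"]:
    char = from_code_point(label)
    # repr() makes near-invisible characters such as the soft hyphen visible
    print(label, repr(char), unicodedata.name(char))
```

The formal names do not always match the colloquial ones: U+2015, for example, is formally named HORIZONTAL BAR.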

At the moment Syd also has both a sortable table of the top 20 characters and a sortable table of all characters in the textbase. Note that your pop-up blocker or ad blocker may prevent sorting, and that these pages are subject to change without notice.