How to break lines in Asian languages


When we write texts in Russian or English, we don’t think about how to correctly break, or split, text over lines. Usually we don’t even notice it happening. The text adapts to the size of the window: you type, and the words move to the next line by themselves. In most programs, this function is called “text wrapping”.

But what happens with texts in Asian languages, which use characters that are less familiar to us instead of letters? We’re going to explain how to wrap text in Chinese, Japanese, Thai, and Korean so that a native speaker can read it easily.

How computers break lines

When dealing with texts written in European languages, the computer wraps words easily, and a human can readily check the result. This is possible because words in Western languages are separated by spaces. Breaking lines only at spaces is enough to keep words whole and texts readable.
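As a rough illustration of the space-based approach, here is a minimal Python sketch built on the standard `textwrap` module. The helper name `wrap_at_spaces` and the sample sentence are made up for this example, and the width is counted in characters rather than pixels, which is a simplification compared with a real UI.

```python
import textwrap


def wrap_at_spaces(text: str, width: int) -> list[str]:
    """Greedy wrap that breaks only at whitespace, never inside a word."""
    return textwrap.wrap(
        text,
        width=width,
        break_long_words=False,   # keep words whole even if they are too long
        break_on_hyphens=False,   # treat hyphenated words as single units
    )


for line in wrap_at_spaces("Unlock new achievements every day", 12):
    print(line)
```

With a width of 12, the word “achievements” lands on its own line instead of being cut in half.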

Incorrect line break of the word “achievements” in the English version of the text

With texts written in characters, it is much harder to tell whether the wrapping is correct, especially if you are not a native speaker. It is almost impossible to judge whether everything in the text is in its right place.

Can you determine whether the text wrapping is correct or not in this text? 

It is pretty normal to have difficulties with text wrapping in Asian languages. After all, it is an unfamiliar written format. Even so, to avoid mistakes, it is important to study in detail the specifics of line breaking in this group of languages. Let’s have a look together.

Korean

This is the simplest of the Asian languages to work with. In Korean, words are separated by spaces, which makes it easier to identify places for potential line breaks. It’s not even a mistake to have a break in the middle of a word, with the word’s characters spread over two lines. However, you must remember not to separate a word from the punctuation mark that follows it. 

There are some exceptions to these rules, though. For example, some game stores require line breaks to be placed only between words, claiming this improves readability. For the same reason, it’s better not to leave a single orphaned Korean character on a line of its own.

Incorrect line break
Correct line break

Chinese 

If your software uses spaces as a guide to wrap texts, you will encounter problems. The thing is, Chinese doesn’t have spaces. The text on the page is arranged in unbroken lines. The smallest unit in a line is not a word but a character. 

In Russian or English texts, we can use spaces and punctuation as a guide to wrap the text correctly. Chinese texts offer no spaces to rely on, so finding word boundaries takes a dictionary and an understanding of the context.

A word can comprise one, two, three or more characters. If you don’t know the content, it is pretty difficult to figure out where you can split characters and yet maintain the correct meaning of the text. But it’s not too much of a problem because Chinese doesn’t have strict rules about wrapping text by the word. 

There are some rules concerning punctuation, however: 

  • No periods, commas, closing parentheses, exclamation points, or question marks at the beginning of a line;
  • No opening parentheses, opening quotation marks, or currency symbols at the end of a line;
  • Double punctuation marks, e.g. the traditional Chinese double dashes or ellipses, should not be split up;
  • Numerals with prefixes and postfixes must be kept together.
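The rules above can be sketched as a character-by-character wrapper that moves a break point to the left whenever it would violate one of them. This is only a simplified illustration: the function names, the two character sets, and the way numerals are handled (digits stay together, and the single character right after a digit is treated as its postfix) are assumptions made for this example, not a complete implementation of Chinese typesetting rules. Width is counted in characters.

```python
# Characters that must not start a line (closing punctuation, percent signs).
NO_LINE_START = set("。，、！？；：）」』”’%％")
# Characters that must not end a line (opening punctuation, currency symbols).
NO_LINE_END = set("（「『“‘￥$")
# Punctuation that is written doubled and must not be split.
DOUBLED = set("…—")


def can_break(text: str, i: int) -> bool:
    """Return True if a line may end just before text[i]."""
    prev, nxt = text[i - 1], text[i]
    if nxt in NO_LINE_START or prev in NO_LINE_END:
        return False
    if prev == nxt and prev in DOUBLED:
        return False
    if prev.isdigit():  # keep digits together with what follows them
        return False
    return True


def wrap_cjk(text: str, width: int) -> list[str]:
    """Wrap by character, shifting breaks left to respect the rules."""
    lines = []
    start = 0
    while start < len(text):
        end = min(start + width, len(text))
        if end < len(text):
            brk = end
            while brk > start + 1 and not can_break(text, brk):
                brk -= 1
            if can_break(text, brk):
                end = brk
            # Otherwise no legal break exists: fall back to a hard break.
        lines.append(text[start:end])
        start = end
    return lines


for line in wrap_cjk("今天天气很好，我们去公园。", 6):
    print(line)
```

In the sample, a plain six-character wrap would push the comma to the start of the second line; the sketch ends the first line one character earlier instead.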

Although there are no strict rules pertaining to text wrapping in Chinese, the line breaks have a tangible effect on the readability and aesthetics of the text. Therefore, it is worth wrapping Chinese text over lines according to the meaning, for marketing texts at least. 

At the very least:  

  • Do not split key words, names, and titles;
  • Do not leave emphatic reinforcing particles dangling.

This can be done manually, if you have someone who speaks Chinese and can place line breaks in the text, or automatically using special programs. As a rule, texts of this kind are written by the companies themselves, taking the features of the language into account.

Japanese

In Japanese, words are not separated by spaces either. As in Chinese, the text is wrapped not by word but by character. This means you’ll encounter similar problems but also similar solutions.

It is worth noting that Japanese pays more attention to word boundaries when breaking lines than Chinese. This could be because of differences in the grammar. In Chinese, words almost never change, whereas Japanese has suffixes that carry grammatical meaning. Splitting them from the root word can make comprehension difficult, so in Japanese texts it is more important to break lines correctly to keep the text comprehensible.

It’s difficult to write a program that is good at automatically breaking lines in Japanese text. This would require not only a dictionary but also rules describing how words change and which suffixes they can take.

If you don’t have the chance to ask a native speaker to check the line breaks, the best solution is to use the standard method: wrapping by character. The text is likely to be comprehensible even if there are aesthetic issues.

However, it is important that you follow the same rules as in Chinese: for example, lines should not begin with periods or end with opening parentheses. 

Thai

There are no simple solutions for Thai that provide adequate results. The text-wrapping rules are strict: text should be wrapped by the word. But it isn’t easy to find where a word begins and ends. Although Thai does use spaces, they are not between words but between key parts of the sentence. 

This means that neither the European standard of text-wrapping by word nor the East Asian standard of text-wrapping by character will work here. It is almost certain that the Thai localization will not display correctly if you do not designate specific text-wrapping points. To do this, special characters and tags have to be used. 
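One common special character for this purpose is the zero-width space (U+200B), an invisible character that marks a place where a line may break; in HTML, the `<wbr>` tag plays a similar role. The sketch below assumes the Thai text has already been split into words by a person or a tool. The helper names are hypothetical, and width is counted in code points, which overestimates the visual width of Thai because vowel and tone marks sit above or below the consonants.

```python
ZWSP = "\u200b"  # zero-width space: invisible, but allows a line break


def mark_break_points(words: list[str]) -> str:
    """Join pre-segmented words with zero-width spaces."""
    return ZWSP.join(words)


def wrap_at_marks(marked: str, width: int) -> list[str]:
    """Wrap text only at the zero-width spaces placed in it."""
    lines = []
    current = ""
    for word in marked.split(ZWSP):
        if current and len(current) + len(word) > width:
            lines.append(current)
            current = word
        else:
            current += word
    if current:
        lines.append(current)
    return lines


marked = mark_break_points(["สวัสดี", "ภาษา", "ไทย"])
for line in wrap_at_marks(marked, 8):
    print(line)
```

The marked string looks identical to the original on screen, but a renderer that honors U+200B, like the toy wrapper here, will only break between the marked words.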

Lao and Khmer share these features. The best solution in this case is to use an automatic tokenizer and then have the result checked by a native speaker.
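To give an idea of what such a tokenizer does, here is a toy dictionary-based version that always takes the longest known word at the current position. The function name and the three-word dictionary are invented for this example. Production tokenizers rely on much larger dictionaries and usually on statistical models as well, which is exactly why a native speaker should still review the output.

```python
def tokenize_longest_match(text: str, dictionary: set[str]) -> list[str]:
    """Greedy longest-match segmentation against a word list.

    Characters that match no dictionary entry become one-character tokens.
    """
    max_len = max((len(word) for word in dictionary), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink it.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens


words = {"สวัสดี", "ภาษา", "ไทย"}  # toy dictionary for illustration only
print(tokenize_longest_match("สวัสดีภาษาไทย", words))
```

The resulting tokens show where line breaks are allowed; everything inside a token has to stay on one line.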

Stay tuned for more about the features of translating into different languages, interviews with industry professionals, and case studies from real localization projects.
