Szathmáry László honlapja @ DEIK | DLang / Loop over a string char by char

If the string is ASCII text

(1) ASCII text
Python:

def main():
    text = "abc"
    for c in text:
        print(c)

if __name__ == "__main__":
    main()

D:

import std.stdio;

void main()
{
    string text = "abc";
    foreach (c; text)  // c's type is char
    {
        writeln(c);
    }
}

Output:

a
b
c

If the string is Unicode text

(2) problem with Unicode text
Python:

def main():
    text = "görög"
    for c in text:
        print(c)

if __name__ == "__main__":
    main()

D:

import std.stdio;

void main()
{
    string text = "görög";
    foreach (c; text)  // c's type is char (1 byte)
    {
        writeln(c);
    }
}

Output (Python):

g
ö
r
ö
g

Output (D):

g
�
�
r
�
�
g

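In D, a string stores text in UTF-8 format. The character "ö" takes 2 bytes in UTF-8, but the char type is just 1 byte (one UTF-8 code unit). Because c is inferred as char here, the foreach loop iterates over every byte, not every character, so each half of "ö" is printed on its own as an invalid sequence.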
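To make the byte-level view concrete, the following sketch prints the length of the string and its individual code units in hexadecimal. The helper name utf8Bytes is made up for this illustration, and the expected values assume the source file is saved as UTF-8 with "ö" as a single precomposed character (U+00F6).

```d
import std.stdio;

// Hypothetical helper: collects the UTF-8 code units of s as plain bytes.
ubyte[] utf8Bytes(string s)
{
    ubyte[] bytes;
    foreach (char c; s)          // char: one UTF-8 code unit per iteration
    {
        bytes ~= cast(ubyte) c;
    }
    return bytes;
}

void main()
{
    string text = "görög";

    // .length counts code units (bytes), not characters.
    writeln(text.length);        // 7: three 1-byte letters + two 2-byte letters

    foreach (b; utf8Bytes(text))
    {
        writef("%02x ", b);      // 67 c3 b6 72 c3 b6 67
    }
    writeln();
}
```

The pair c3 b6 is the UTF-8 encoding of "ö"; printing either byte alone is what produces the � marks above.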

(3) solution for Unicode text
Python:

def main():
    text = "görög"
    for c in text:
        print(c)

if __name__ == "__main__":
    main()

D:

import std.stdio;

void main()
{
    string text = "görög";
    foreach (dchar c; text)  // !! notice the dchar type !!
    {
        writeln(c);
    }
}

Output in both cases:

g
ö
r
ö
g

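dchar is 4 bytes long (one UTF-32 code unit), so it can hold any Unicode code point. With c declared as dchar, foreach decodes the UTF-8 string on the fly, and the loop iterates over the Unicode characters instead of the bytes.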
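The decoding can be observed by adding an index variable to the loop. In the sketch below, the index is the byte offset at which each decoded character starts, so it jumps by 2 after every "ö". The helper name codePointOffsets is invented for this example, and the expected values again assume a UTF-8 source file with precomposed "ö".

```d
import std.stdio;
import std.range : walkLength;

// Hypothetical helper: byte offsets at which each code point of s begins.
size_t[] codePointOffsets(string s)
{
    size_t[] offsets;
    foreach (i, dchar c; s)      // i: start offset of the decoded code point
    {
        offsets ~= i;
    }
    return offsets;
}

void main()
{
    string text = "görög";

    foreach (i, dchar c; text)
    {
        writeln(i, ": ", c);     // 0: g, 1: ö, 3: r, 4: ö, 6: g
    }

    writeln(text.length);        // 7 (UTF-8 code units)
    writeln(text.walkLength);    // 5 (decoded code points)
}
```

walkLength counts the elements by walking the string as a range of decoded characters, which is why it reports 5 where .length reports 7.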

A more explicit solution would be the following:

import std.stdio;
import std.conv;

void main()
{
    string text = "görög";
    foreach (dchar c; text.to!dstring)
    {
        writeln(c);
    }
}

Here, the UTF-8-encoded string is first converted to a UTF-32-encoded string (called dstring in D), in which every character occupies exactly 4 bytes. The conversion allocates a new copy of the text, and, as we saw in the earlier example, it is not necessary just for iterating.
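One case where the explicit conversion may still be handy is indexing or slicing by character position, since every element of a dstring is a whole code point. The following is a minimal sketch of that idea. The helper name codePointAt is hypothetical, and it converts the whole string on every call, so it is meant only as an illustration.

```d
import std.stdio;
import std.conv : to;

// Hypothetical helper: returns the n-th code point of a UTF-8 string
// by converting it to UTF-32 first (allocates a new dstring each call).
dchar codePointAt(string s, size_t n)
{
    return s.to!dstring[n];
}

void main()
{
    string text = "görög";
    dstring wide = text.to!dstring;

    writeln(text.length);          // 7 (bytes in UTF-8)
    writeln(wide.length);          // 5 (code points in UTF-32)
    writeln(wide[1]);              // ö
    writeln(wide[3 .. 5]);         // ög
    writeln(codePointAt(text, 3)); // ö
}
```

By contrast, text[1] on the original string would yield only the first byte of "ö".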