See also std.conv.
If the string is ASCII text
(1) ASCII text
|
def main():
    text = "abc"
    for c in text:
        print(c)

if __name__ == "__main__":
    main()
|
import std.stdio;
void main()
{
    string text = "abc";
    foreach (c; text) // c's type is char
    {
        writeln(c);
    }
}
|
Output in both cases:
a
b
c
If the string is Unicode text
(2) problem with Unicode text
|
def main():
    text = "görög"
    for c in text:
        print(c)

if __name__ == "__main__":
    main()
|
import std.stdio;
void main()
{
    string text = "görög";
    foreach (c; text) // c's type is char (1 byte)
    {
        writeln(c);
    }
}
|
g
ö
r
ö
g
|
g
�
�
r
�
�
g
|
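string stores text in UTF-8 format, in which the character "ö" takes 2 bytes. The char type is only 1 byte long, so this foreach loop iterates over every byte, not every character. Neither half of "ö" is valid UTF-8 on its own, which is why each one shows up as a replacement character in the output.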
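The byte-level behaviour described above can be reproduced from Python by encoding the text explicitly. This is only an illustrative sketch: the names `data` and `per_byte` are made up for the example, and decoding each byte on its own with `errors="replace"` merely approximates what a terminal does with an incomplete UTF-8 sequence.

```python
text = "g\u00f6r\u00f6g"  # "görög", written with escapes to pin down the exact code points

# Encode the string as UTF-8, the same encoding D's string uses.
data = text.encode("utf-8")

print(len(text))  # number of characters (code points)
print(len(data))  # number of bytes in UTF-8
print("\u00f6".encode("utf-8"))  # the two bytes that make up "ö"

# Decode every byte on its own, roughly what the char-based loop ends up printing.
# A byte that is not a complete UTF-8 sequence becomes U+FFFD, the replacement character.
per_byte = [bytes([b]).decode("utf-8", errors="replace") for b in data]
print(per_byte)
```

The list has seven entries, and the four bytes belonging to the two "ö" characters all come out as replacement characters, matching the D output shown in the table.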
(3) solution for Unicode text
|
def main():
    text = "görög"
    for c in text:
        print(c)

if __name__ == "__main__":
    main()
|
import std.stdio;
void main()
{
    string text = "görög";
    foreach (dchar c; text) // !! notice the dchar type !!
    {
        writeln(c);
    }
}
|
Output in both cases:
g
ö
r
ö
g
dchar is 4 bytes long, so it can hold any Unicode code point. With a dchar loop variable, foreach decodes the UTF-8 string as it goes, and the loop iterates over the Unicode characters instead of the bytes.
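To get a feel for what iterating by character over UTF-8 data involves, here is a rough Python sketch that groups bytes into characters one byte at a time. It is not how D implements the dchar loop. `code_points` is a hypothetical helper written for this illustration, built on the standard library's `codecs.getincrementaldecoder`.

```python
import codecs
import sys


def code_points(data):
    """Yield one character per complete UTF-8 sequence found in data."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    for byte in data:
        # The decoder returns "" until it has seen every byte of a character.
        ch = decoder.decode(bytes([byte]))
        if ch:
            yield ch


data = "g\u00f6r\u00f6g".encode("utf-8")  # "görög" as 7 UTF-8 bytes
chars = list(code_points(data))
print(chars)

# The largest Unicode code point needs 21 bits, so a 4-byte value can hold it.
print(hex(sys.maxunicode), sys.maxunicode.bit_length())
```

Seven bytes go in and five characters come out, because the two bytes of each "ö" are combined into a single code point.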
A more explicit solution would be the following:
import std.stdio;
import std.conv;
void main()
{
    string text = "görög";
    foreach (dchar c; text.to!dstring)
    {
        writeln(c);
    }
}
Here, the UTF-8-encoded string is converted to a UTF-32-encoded string (called dstring in D), in which every character occupies 4 bytes. As the earlier example showed, however, this explicit conversion is not necessary.
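The fixed 4-byte width of UTF-32 can also be checked from Python. The sketch below assumes the little-endian variant ("utf-32-le") so that no byte order mark is added. It illustrates the encoding itself, not D's dstring type, and the name `units` is chosen only for this example.

```python
import struct

text = "g\u00f6r\u00f6g"  # "görög"

# "utf-32-le" is used so that no byte order mark is prepended.
data = text.encode("utf-32-le")
print(len(data))  # 4 bytes per character

# Read the buffer back as little-endian 32-bit unsigned integers,
# one integer per character.
units = struct.unpack("<%dI" % len(text), data)
print(units)
print([chr(u) for u in units])
```

Each 32-bit unit is exactly one code point, so indexing and iterating need no decoding step, at the cost of using 20 bytes here instead of the 7 that UTF-8 needs.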