*******************************************************************************
*                                                                             *
*                    IDNA Convert (idna_convert.class.php)                    *
*                                                                             *
* http://idnaconv.phlymail.de                     mailto:phlymail@phlylabs.de *
*******************************************************************************
* (c) 2004-2010 phlyLabs, Berlin                                              *
* This file is encoded in UTF-8                                               *
*******************************************************************************
    
Introduction
------------

The class idna_convert converts internationalized domain names (see RFC 3490,
3491, 3492 and 3454 for details), as they can be used with various registries
worldwide, between their original (localized) form and their encoded form as it
is used in the DNS (Domain Name System).
    
The class provides two public methods, encode() and decode(), which do exactly
what you would expect them to do. You may pass complete domain names, simple
strings and complete email addresses. That means you might use any of the
following notations:

- www.nörgler.com
- xn--nrgler-wxa
- xn--brse-5qa.xn--knrz-1ra.info
    
Errors, incorrectly encoded strings or invalid strings lead either to a FALSE
response (when in strict mode) or to only partially converted strings.
You can query the error that occurred by calling the method get_last_error().
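
The error handling described above can be wrapped in a small helper. The
following is only a sketch: encode_or_fail() is a hypothetical name and not part
of the class. It relies solely on encode() returning FALSE in strict mode and on
get_last_error(), as described above.

```php
<?php
// Hypothetical helper (not part of idna_convert): encode a name and turn a
// strict-mode failure into an exception that carries the error text.
function encode_or_fail($idn, $input)
{
    $output = $idn->encode($input);
    if ($output === false) {
        // get_last_error() is the query method mentioned in this ReadMe
        throw new RuntimeException('IDNA encoding failed: ' . $idn->get_last_error());
    }
    return $output;
}
```

With the real class this might be called as encode_or_fail(new idna_convert(), $input).
Outside strict mode a partially converted string is returned instead of FALSE,
which this helper cannot detect.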
    
Unicode strings are expected to be either UTF-8 strings, UCS-4 strings or UCS-4
arrays. The default format is UTF-8. To set a different encoding, call the
method setParams(); please see the inline documentation for details.
ACE strings (the Punycode form) are always 7-bit ASCII strings.
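
Since ACE strings are plain ASCII, it is cheap to guess whether a string has
already been encoded. A minimal sketch, assuming the "xn--" label prefix defined
in RFC 3490 (it is visible in the notations listed earlier); looks_like_ace() is
a hypothetical helper, not part of the class:

```php
<?php
// Hypothetical helper (not part of idna_convert): guess whether a domain
// name or email address is already in ACE (Punycode) form.
function looks_like_ace($input)
{
    // ACE strings are pure 7-bit ASCII, so any byte above 0x7F rules it out
    if (preg_match('/[^\x00-\x7F]/', $input)) {
        return false;
    }
    // For email addresses, only look at the part after the last "@"
    $at = strrpos($input, '@');
    $host = ($at === false) ? $input : substr($input, $at + 1);
    // At least one label must carry the "xn--" ACE prefix (assumption: RFC 3490)
    foreach (explode('.', $host) as $label) {
        if (strncasecmp($label, 'xn--', 4) === 0) {
            return true;
        }
    }
    return false;
}
```

This is a heuristic only: it does not verify that the part after the prefix is
valid Punycode.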
    
ATTENTION: As of version 0.6.0 this class is written in the OOP style of PHP5.
Since PHP4 is no longer actively maintained, you should switch to PHP5 as soon
as possible.
We do not expect any compatibility issues with the upcoming PHP6 either.
    
ATTENTION: BC break! As of version 0.6.4 the class by default allows the German
ligature ß to be encoded, since the DeNIC, the registry for .DE, allows domains
containing ß.
In older builds "ß" was mapped to "ss". Should you still need this behaviour,
see example 5 below.


Files
-----
idna_convert.class.php         - The actual class
example.php                    - An example web page for converting
transcode_wrapper.php          - Convert various encodings, see below
uctc.php                       - phlyLabs' Unicode Transcoder, see below
ReadMe.txt                     - This file
LICENCE                        - The LGPL licence file
    
The class is contained in idna_convert.class.php.


Examples
--------
1. Say we wish to encode the domain name nörgler.com:
    
// Include the class
require_once('idna_convert.class.php');
// Instantiate it
$IDN = new idna_convert();
// The input string; if it is not UTF-8 or UCS-4, it must be converted first
$input = utf8_encode('nörgler.com');
// Encode it to its punycode presentation
$output = $IDN->encode($input);
// Output what we got
echo $output; // This will read: xn--nrgler-wxa.com


2. We received an email from a punycoded domain and want to learn how
   the domain name reads originally:
    
// Include the class
require_once('idna_convert.class.php');
// Instantiate it
$IDN = new idna_convert();
// The input string
$input = 'andre@xn--brse-5qa.xn--knrz-1ra.info';
// Decode it from its punycode presentation
$output = $IDN->decode($input);
// Output what we got; if the output should be in a format other than UTF-8
// or UCS-4, you will have to convert it before outputting it
echo utf8_decode($output); // This will read: andre@börse.knörz.info


3. The input is read from a UCS-4 coded file and encoded line by line. By
   appending the optional second parameter we tell encode() about the input
   format to be used:
    
// Include the class
require_once('idna_convert.class.php');
// Instantiate it
$IDN = new idna_convert();
// Iterate through the input file line by line
foreach (file('ucs4-domains.txt') as $line) {
    echo $IDN->encode(trim($line), 'ucs4_string');
    echo "\n";
}


4. We wish to convert a whole URI into the IDNA form, but leave the path or
   query string component of it alone. Just using encode() would lead to mangled
   paths or query strings. Here the public method encode_uri() comes into play:
    
// Include the class
require_once('idna_convert.class.php');
// Instantiate it
$IDN = new idna_convert();
// The input string, a whole URI in UTF-8 (!)
$input = 'http://nörgler:secret@nörgler.com/my_päth_is_not_ÄSCII/';
// Encode it to its punycode presentation
$output = $IDN->encode_uri($input);
// Output what we got
echo $output; // http://nörgler:secret@xn--nrgler-wxa.com/my_päth_is_not_ÄSCII/


5. Since by default this class no longer maps "ß" to "ss", we wish to enforce
   the mapping anyway. Thus we need to pass a parameter to the constructor:
    
// Include the class
require_once('idna_convert.class.php');
// Instantiate it
$IDN = new idna_convert(array('encode_german_sz' => false));
// Something containing the German letter ß
$input = 'meine-straße.de';
// Encode it to its punycode presentation
$output = $IDN->encode($input);
// Output what we got
echo $output; // meine-strasse.de



Transcode wrapper
-----------------
In case you have strings in an encoding other than ISO-8859-1 or UTF-8, you might
need to translate these strings to UTF-8 before feeding them to the IDNA converter.
PHP's built-in functions utf8_encode() and utf8_decode() can only deal with ISO-8859-1.
Use the file transcode_wrapper.php for the conversion. It requires either iconv,
libiconv or mbstring to be installed together with one of the relevant PHP extensions.
The functions you will find useful are
encode_utf8() as a replacement for utf8_encode() and
decode_utf8() as a replacement for utf8_decode().
    
Example usage:
<?php
require_once('idna_convert.class.php');
require_once('transcode_wrapper.php');
// Instantiate the converter
$IDN = new idna_convert();
$mystring = '<something in e.g. ISO-8859-15>';
$mystring = encode_utf8($mystring, 'ISO-8859-15');
echo $IDN->encode($mystring);
?>
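
For illustration, the kind of conversion the wrapper is meant for can also be
done with mbstring directly. This is a sketch under the assumption that the
mbstring extension is available; to_utf8() is a hypothetical name and says
nothing about how transcode_wrapper.php works internally:

```php
<?php
// Sketch only: converts a byte string to UTF-8 with mbstring directly.
// to_utf8() is a hypothetical name, not a function of transcode_wrapper.php.
function to_utf8($string, $from_encoding)
{
    if (!function_exists('mb_convert_encoding')) {
        throw new RuntimeException('The mbstring extension is required');
    }
    // Argument order of mb_convert_encoding(): string, target, source
    return mb_convert_encoding($string, 'UTF-8', $from_encoding);
}

// 0xF6 is "ö" in ISO-8859-15; in UTF-8 it becomes the two bytes 0xC3 0xB6
$utf8 = to_utf8("n\xF6rgler.com", 'ISO-8859-15');
```

The resulting UTF-8 string can then be passed to encode() as in the examples above.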
    
    
UCTC - Unicode Transcoder
-------------------------
UCTC is another class you might find useful when dealing with one or more of the
Unicode encoding flavours. The class is static and requires PHP5. It can transcode
between the following formats:
- UCS-4 string / array
- UTF-8
- UTF-7
- UTF-7 IMAP (modified UTF-7)
All encodings expect / return a string in the given format, with one major exception:
A UCS-4 array is just an array in which each value represents one codepoint of the
string, i.e. every value is a 32-bit integer.
    
Example usage:
<?php
require_once('uctc.php');
$mystring = 'nörgler.com';
echo uctc::convert($mystring, 'utf8', 'utf7imap');
?>
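
To make the UCS-4 array format more tangible, here is a sketch that builds such
an array without uctc, assuming the mbstring extension is available.
utf8_to_codepoints() is a hypothetical name; whether uctc indexes its arrays in
the same way is not verified here:

```php
<?php
// Sketch only: builds an array with one integer per code point from a
// UTF-8 string. utf8_to_codepoints() is a hypothetical name.
function utf8_to_codepoints($utf8)
{
    // UCS-4BE stores every code point as 4 bytes, big-endian
    $ucs4 = mb_convert_encoding($utf8, 'UCS-4BE', 'UTF-8');
    // "N*" unpacks consecutive unsigned 32-bit big-endian integers;
    // array_values() re-indexes from 0 because unpack() starts at 1
    return array_values(unpack('N*', $ucs4));
}

$codepoints = utf8_to_codepoints("n\xC3\xB6");   // the UTF-8 bytes of "nö"
// $codepoints is now array(0x6E, 0xF6), i.e. array(110, 246)
```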
    
    
Contact us
----------
In case of errors, bugs, questions or wishes, please don't hesitate to contact us
at the email address above.
    
The team of phlyLabs
http://phlylabs.de
mailto:phlymail@phlylabs.de