*******************************************************************************
*                                                                             *
*                    IDNA Convert (idna_convert.class.php)                    *
*                                                                             *
*         http://idnaconv.phlymail.de     mailto:phlymail@phlylabs.de         *
*******************************************************************************
* (c) 2004-2010 phlyLabs, Berlin                                              *
* This file is encoded in UTF-8                                               *
*******************************************************************************

Introduction
------------

The class idna_convert converts internationalized domain names (see RFC 3490,
3491, 3492 and 3454 for details), as used by various registries worldwide,
between their original (localized) form and their encoded form as used in the
DNS (Domain Name System).

The class provides two public methods, encode() and decode(), which do exactly
what you would expect them to do. Both accept complete domain names, simple
strings and complete email addresses. That means you may use any of the
following notations:

- www.nörgler.com
- xn--nrgler-wxa
- xn--brse-5qa.xn--knrz-1ra.info

Errors and incorrectly encoded or invalid strings lead either to a FALSE
response (in strict mode) or to only partially converted strings.
You can query the error that occurred by calling the method get_last_error().

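The error handling described above can be wrapped in a small helper. This is
only a sketch: the function name idn_encode_checked() and the shape of the
returned array are made up for illustration and are not part of the class. It
relies only on encode() returning FALSE on failure and on get_last_error(),
both as described above.

```php
<?php
// Hypothetical helper, not part of idna_convert. $converter is assumed to be
// an idna_convert instance (or any object offering the same two methods).
function idn_encode_checked($converter, $input)
{
    $result = $converter->encode($input);
    if ($result === false) {
        // Strict comparison: only a real FALSE counts as a failure
        return array(
            'ok'    => false,
            'value' => null,
            'error' => $converter->get_last_error(),
        );
    }
    return array('ok' => true, 'value' => $result, 'error' => null);
}
```

Outside strict mode a failure may only show up as a partially converted
string, so a FALSE check like this one will not catch every problem.
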
Unicode strings are expected to be either UTF-8 strings, UCS-4 strings or UCS-4
arrays. The default format is UTF-8. To set a different encoding, you can
call the method setParams() - please see the inline documentation for details.
ACE strings (the Punycode form) are always 7-bit ASCII strings.

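To tell the two forms apart before calling the class, a check like the
following might help. It is a minimal sketch using only PHP built-ins. The
function names are hypothetical and not part of idna_convert. "xn--" is the ACE
prefix defined in RFC 3490.

```php
<?php
// Hypothetical helpers for illustration; they are not part of idna_convert.

// True if the string consists of 7-bit ASCII bytes only.
function is_7bit_ascii($string)
{
    return preg_match('/^[\x00-\x7F]*$/', $string) === 1;
}

// True if at least one dot-separated label carries the "xn--" ACE prefix.
function has_ace_label($domain)
{
    foreach (explode('.', $domain) as $label) {
        if (strncasecmp($label, 'xn--', 4) === 0) {
            return true;
        }
    }
    return false;
}
```

A string that passes is_7bit_ascii() but has no ACE label is plain ASCII and
needs no conversion in either direction.
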
ATTENTION: As of version 0.6.0 this class is written in the OOP style of PHP5.
Since PHP4 is no longer actively maintained, you should switch to PHP5 as soon
as possible.
We do not expect any compatibility issues with the upcoming PHP6 either.

ATTENTION: BC break! As of version 0.6.4 the class by default allows the German
ligature ß to be encoded, since DeNIC, the registry for .DE, allows domains
containing ß.
In older builds "ß" was mapped to "ss". Should you still need this behaviour,
see example 5 below.


Files
-----
idna_convert.class.php         - The actual class
example.php                    - An example web page for converting
transcode_wrapper.php          - Convert various encodings, see below
uctc.php                       - phlyLabs' Unicode Transcoder, see below
ReadMe.txt                     - This file
LICENCE                        - The LGPL licence file

The class is contained in idna_convert.class.php.


Examples
--------
1. Say we wish to encode the domain name nörgler.com:

// Include the class
require_once('idna_convert.class.php');
// Instantiate it
$IDN = new idna_convert();
// The input string; if it is not UTF-8 or UCS-4, it must be converted first.
// utf8_encode() assumes ISO-8859-1 input.
$input = utf8_encode('nörgler.com');
// Encode it to its Punycode representation
$output = $IDN->encode($input);
// Output what we got
echo $output; // This will read: xn--nrgler-wxa.com


2. We received an email from a punycoded domain and would like to learn how
   the domain name reads originally

// Include the class
require_once('idna_convert.class.php');
// Instantiate it
$IDN = new idna_convert();
// The input string
$input = 'andre@xn--brse-5qa.xn--knrz-1ra.info';
// Decode it to its readable presentation
$output = $IDN->decode($input);
// Output what we got; if the output should be in a format other than UTF-8
// or UCS-4, you will have to convert it before outputting it
echo utf8_decode($output); // This will read: andre@börse.knörz.info


3. The input is read from a UCS-4 coded file and encoded line by line. By
   appending the optional second parameter we tell encode() about the input
   format to be used

// Include the class
require_once('idna_convert.class.php');
// Instantiate it
$IDN = new idna_convert();
// Iterate through the input file line by line
foreach (file('ucs4-domains.txt') as $line) {
    echo $IDN->encode(trim($line), 'ucs4_string');
    echo "\n";
}


4. We wish to convert a whole URI into the IDNA form, but leave the path or
   query string component of it alone. Just using encode() would lead to mangled
   paths or query strings. Here the public method encode_uri() comes into play:

// Include the class
require_once('idna_convert.class.php');
// Instantiate it
$IDN = new idna_convert();
// The input string, a whole URI in UTF-8 (!)
$input = 'http://nörgler:secret@nörgler.com/my_päth_is_not_ÄSCII/';
// Encode it to its Punycode representation
$output = $IDN->encode_uri($input);
// Output what we got
echo $output; // http://nörgler:secret@xn--nrgler-wxa.com/my_päth_is_not_ÄSCII/


5. Since by default this class no longer maps "ß" to "ss", we wish to enforce
   the mapping anyway. Thus we need to pass a parameter to the constructor:

// Include the class
require_once('idna_convert.class.php');
// Instantiate it
$IDN = new idna_convert(array('encode_german_sz' => false));
// Something containing the German letter ß
$input = 'meine-straße.de';
// Encode it to its Punycode representation (a plain domain name, so encode())
$output = $IDN->encode($input);
// Output what we got
echo $output; // meine-strasse.de



Transcode wrapper
-----------------
If your strings are in an encoding other than ISO-8859-1 or UTF-8, you might
need to translate them to UTF-8 before feeding them to the IDNA converter.
PHP's built-in functions utf8_encode() and utf8_decode() can only deal with
ISO-8859-1.
Use the file transcode_wrapper.php for the conversion. It requires either
iconv, libiconv or mbstring installed together with one of the relevant PHP
extensions.
The functions you will find useful are
encode_utf8() as a replacement for utf8_encode() and
decode_utf8() as a replacement for utf8_decode().

Example usage:
<?php
require_once('idna_convert.class.php');
require_once('transcode_wrapper.php');
// Instantiate the converter
$IDN = new idna_convert();
$mystring = '<something in e.g. ISO-8859-15>';
$mystring = encode_utf8($mystring, 'ISO-8859-15');
echo $IDN->encode($mystring);
?>

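Why the source charset matters can be seen with a single byte. The sketch
below does not use transcode_wrapper.php. It calls mbstring directly, on the
assumption that the extension is installed. It converts the byte 0xA4 once as
ISO-8859-15 and once as ISO-8859-1, which is the only charset utf8_encode()
handles.

```php
<?php
// Requires the mbstring extension. The byte 0xA4 is the euro sign in
// ISO-8859-15 but the generic currency sign in ISO-8859-1.
$raw = "\xA4";

// Correct source charset: yields U+20AC (euro sign)
$fromLatin9 = mb_convert_encoding($raw, 'UTF-8', 'ISO-8859-15');
// Wrong assumption (ISO-8859-1): yields U+00A4 (currency sign)
$fromLatin1 = mb_convert_encoding($raw, 'UTF-8', 'ISO-8859-1');

echo bin2hex($fromLatin9), "\n"; // e282ac
echo bin2hex($fromLatin1), "\n"; // c2a4
```
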

UCTC - Unicode Transcoder
-------------------------
Another class you might find useful when dealing with one or more of the Unicode
encoding flavours. The class is static and requires PHP5. It can transcode
between the following encodings:
- UCS-4 string / array
- UTF-8
- UTF-7
- UTF-7 IMAP (modified UTF-7)
All encodings expect / return a string in the given format, with one major
exception: a UCS-4 array is just an array in which each value represents one
code point of the string, i.e. every value is a 32-bit integer.

Example usage:
<?php
require_once('uctc.php');
$mystring = 'nörgler.com';
echo uctc::convert($mystring, 'utf8', 'utf7imap');
?>

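To picture the difference between a UCS-4 string and a UCS-4 array, the
following sketch builds both without uctc.php. It assumes the mbstring
extension is available and uses big-endian byte order. That byte order is an
assumption of this sketch, not a statement about what uctc or idna_convert
expect.

```php
<?php
// Requires the mbstring extension.
// "\xC3\xB6" is the UTF-8 byte sequence of the letter ö.
$utf8 = "n\xC3\xB6rgler";

// Convert to big-endian UCS-4 (4 bytes per code point) ...
$ucs4String = mb_convert_encoding($utf8, 'UCS-4BE', 'UTF-8');
// ... and unpack each 4-byte group into one unsigned 32-bit integer
$ucs4Array = array_values(unpack('N*', $ucs4String));

echo strlen($ucs4String), "\n";      // 28 (7 code points * 4 bytes)
echo implode(',', $ucs4Array), "\n"; // 110,246,114,103,108,101,114
```
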

Contact us
----------
In case of errors, bugs, questions or wishes, please don't hesitate to contact
us at the email address above.

The team of phlyLabs
http://phlylabs.de
mailto:phlymail@phlylabs.de