Regarding "å" etc., from the HTML 3.0 specification:
Character sets
The charset parameter (as defined in section 7.1.1 of RFC 1521)
may be used with the text/html content type to specify the
encoding used to represent the HTML document as a sequence of
bytes. Normally, text/* media types specify a default of
US-ASCII for the charset parameter. However, for text/html, if
* the byte stream contains data that is not in the 7-bit US-ASCII
* set, the HTML interpreting agent should assume a default
* charset of ISO-8859-1.
Read the last three lines in particular. They state plainly that ISO 8859-1 is the default character set for HTML documents that contain non-ASCII data.
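The default rule in the quoted passage reduces to a small decision function. The following is a minimal Python sketch, not code from any real browser. The function name `html3_charset` and its parameters are hypothetical, and the sketch assumes the charset parameter has already been pulled out of the Content-Type header.

```python
from typing import Optional


def html3_charset(data: bytes, charset_param: Optional[str] = None) -> str:
    """Pick the charset for a text/html body (hypothetical helper).

    An explicit charset parameter wins. Otherwise, pure 7-bit data is
    US-ASCII, and anything containing a byte >= 0x80 falls back to the
    ISO-8859-1 default described in the HTML 3.0 text.
    """
    if charset_param:
        return charset_param
    if all(b < 0x80 for b in data):
        return "US-ASCII"
    return "ISO-8859-1"
```

For example, the ISO 8859-1 bytes `b"\xc5sa"` have no charset parameter and contain a non-ASCII byte, so they get `"ISO-8859-1"` and decode to `"Åsa"`.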
Regarding "~" in URLs, from the HTTP/1.0 specification:
3.2.1 General Syntax
URIs in HTTP/1.0 can be represented in absolute form or relative to
some known base URI [9], depending upon the context of their use. The
two forms are differentiated by the fact that absolute URIs always
begin with a scheme name followed by a colon.
URI            = ( absoluteURI | relativeURI ) [ "#" fragment ]
absoluteURI    = scheme ":" *( uchar | reserved )
relativeURI    = net_path | abs_path | rel_path
net_path       = "//" net_loc [ abs_path ]
abs_path       = "/" rel_path
rel_path       = [ path ] [ ";" params ] [ "?" query ]
path           = fsegment *( "/" segment )
fsegment       = 1*pchar
segment        = *pchar
params         = param *( ";" param )
param          = *( pchar | "/" )
scheme         = 1*( ALPHA | DIGIT | "+" | "-" | "." )
net_loc        = *( pchar | ";" | "?" )
query          = *( uchar | reserved )
fragment       = *( uchar | reserved )
pchar          = uchar | ":" | "@" | "&" | "="
uchar          = unreserved | escape
unreserved     = ALPHA | DIGIT | safe | extra | national
escape         = "%" hex hex
hex            = "A" | "B" | "C" | "D" | "E" | "F"
               | "a" | "b" | "c" | "d" | "e" | "f" | DIGIT
reserved       = ";" | "/" | "?" | ":" | "@" | "&" | "="
safe           = "$" | "-" | "_" | "." | "+"
extra          = "!" | "*" | "'" | "(" | ")" | ","
* national     =
* For definitive information on URL syntax and semantics, see RFC 1738
* [4] and RFC 1808 [9]. The BNF above includes national characters not
* allowed in valid URLs as specified by RFC 1738, since HTTP servers are
* not restricted in the set of unreserved characters allowed to
* represent the rel_path part of addresses, and HTTP proxies may receive
* requests for URIs not defined by RFC 1738.
Note especially the last lines (marked with "*"). They state plainly that characters such as the ISO 8859-1 letters ÅÄÖ, and "~", _are_ allowed in URLs. They belong to "national" and not to "reserved". The text also states plainly that this differs from the definition of URLs in the RFCs (RFC 1738).
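To make the argument concrete, here is a minimal Python sketch. It classifies single ISO 8859-1 characters according to the quoted grammar and checks a string against `path = fsegment *( "/" segment )`. All function names are hypothetical. The quote omits the `unsafe` class and the body of `national`, so the sketch makes two assumptions. It borrows the `unsafe` set from RFC 1945, and it treats `national` as every remaining octet, which is how RFC 1945 defines it. Following the quoted draft, it places "+" in `safe`.

```python
import string
from typing import Optional

# Classes as listed in the quoted HTTP/1.0 draft ("+" is in safe here).
RESERVED = set(";/?:@&=")
SAFE = set("$-_.+")
EXTRA = set("!*'(),")
# Assumption: `unsafe` is not in the quote; taken from RFC 1945:
# CTL (octets 0-31 and 127), SP, <">, "#", "%", "<", ">".
UNSAFE = {chr(o) for o in range(32)} | {chr(127)} | set(' "#%<>')
HEX = set(string.hexdigits)
UNRESERVED = {"ALPHA", "DIGIT", "safe", "extra", "national"}
PCHAR_EXTRA = set(":@&=")  # pchar = uchar | ":" | "@" | "&" | "="


def char_class(ch: str) -> Optional[str]:
    """Classify one ISO 8859-1 character; None if it is not one octet."""
    if len(ch) != 1 or ord(ch) > 0xFF:
        return None
    if ch in string.ascii_letters:
        return "ALPHA"
    if ch in string.digits:
        return "DIGIT"
    for name, chars in (("reserved", RESERVED), ("safe", SAFE),
                        ("extra", EXTRA), ("unsafe", UNSAFE)):
        if ch in chars:
            return name
    # Whatever is left over, e.g. "~" or "Å", is national.
    return "national"


def _all_pchar(segment: str) -> bool:
    """True if `segment` is a sequence of pchar (escapes allowed)."""
    i = 0
    while i < len(segment):
        ch = segment[i]
        if ch == "%":
            esc = segment[i + 1:i + 3]  # escape = "%" hex hex
            if len(esc) != 2 or not all(c in HEX for c in esc):
                return False
            i += 3
        elif ch in PCHAR_EXTRA or char_class(ch) in UNRESERVED:
            i += 1
        else:
            return False
    return True


def is_valid_path(path: str) -> bool:
    """path = fsegment *( "/" segment ); fsegment needs >= 1 pchar."""
    first, *rest = path.split("/")
    return bool(first) and all(_all_pchar(s) for s in [first, *rest])


def escape_national(path: str) -> str:
    """Percent-escape only the national characters, as ISO 8859-1 octets."""
    return "".join(
        "%{:02X}".format(ord(ch)) if char_class(ch) == "national" else ch
        for ch in path
    )
```

With these definitions, both `char_class("~")` and `char_class("Å")` return `"national"`. As a result, `~user/Åsa.html` passes `is_valid_path` unescaped. `escape_national` turns it into `%7Euser/%C5sa.html`, which is roughly the form RFC 1738 would require. `escape_national` handles only the national class, not every character RFC 1738 considers unsafe.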