encoding - Why is "&#8203;" being injected into my HTML?
EDIT: You can see the issue on the live example (look in the source).
EDIT2: Interesting, it's not an issue in the source. It shows up in the console (in Firebug as well).
I have the following markup in a file called test.html:
    <!doctype html>
    <html>
    <head>
        <title>test harness</title>
        <link href='/css/main.css' rel='stylesheet' type='text/css' />
    </head>
    <body>
        <h3>test harness</h3>
    </body>
    </html>
But in Chrome I see:
    <!doctype html>
    <html>
    <head>
    </head>
    <body>
        "&#8203; "
        <title>test harness</title>
        <link href='/css/main.css' rel='stylesheet' type='text/css' />
        <h3>test harness</h3>
    </body>
    </html>
It looks like a 0 width space; what is causing it? I'm using Sublime Text 2 with UTF-8 encoding, and Google App Engine with Jinja2 (but Jinja is just loading test.html). Thoughts?
Thanks in advance.
It is an issue in the source. The live example provided starts with the following bytes (i.e., they appear before <!doctype html>): 0xE2 0x80 0x8B. This can be seen e.g. by using Rex Swain’s HTTP Viewer and selecting “Hex” under “Display format”. Note also that validating the page with the W3C Markup Validator gives information that suggests something is wrong at the start of the document, namely the message “Line 1, Column 1: Non-space characters found without seeing a doctype first.”
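If you want to check a local copy of the template rather than the served response, a few lines of Python will show the same hex view. This is a minimal sketch: `leading_bytes_hex` is a hypothetical helper, and it inspects a file on disk, so it will not catch bytes that are added later by the server or the template engine.

```python
import os
import tempfile

def leading_bytes_hex(path, count=8):
    # Open in binary mode so nothing is decoded: an invisible character
    # shows up as its raw encoded bytes instead of disappearing.
    with open(path, "rb") as f:
        head = f.read(count)
    return " ".join(f"0x{b:02x}" for b in head)

if __name__ == "__main__":
    # Demo: a throwaway file that starts with U+200B before the doctype.
    with tempfile.NamedTemporaryFile("wb", suffix=".html", delete=False) as tmp:
        tmp.write("\u200b<!doctype html>".encode("utf-8"))
    print(leading_bytes_hex(tmp.name))  # 0xe2 0x80 0x8b 0x3c 0x21 0x64 0x6f 0x63
    os.remove(tmp.name)
```

A clean file would start with 0x3c (`<`) instead, or with 0xef 0xbb 0xbf if it carries a UTF-8 byte order mark.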
What happens in the validator, and in browser tools such as Chrome's developer tools or Firebug, is that the bytes 0xE2 0x80 0x8B are taken as character data, which implicitly starts the body element (since character data cannot validly appear in the head element or before it), implying an empty head element before it.
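The reason this one invisible character matters is that, at the start of a document, the HTML parser skips only ASCII whitespace (tab, line feed, form feed, carriage return, space); any other character is treated as text. The sketch below is a simplified check under that rule, not a real HTML parser, and it assumes the markup is already decoded with any byte order mark removed. `stray_leading_text` is a hypothetical name.

```python
# ASCII whitespace as defined by the HTML standard: TAB, LF, FF, CR, SPACE.
# Only these are skipped before the doctype and before <html>/<head>;
# any other character is character data.
HTML_WHITESPACE = "\t\n\x0c\r "

def stray_leading_text(markup):
    """Return any non-whitespace text that precedes the first '<' in decoded markup."""
    prefix = markup.split("<", 1)[0]       # everything before the first tag or doctype
    return prefix.strip(HTML_WHITESPACE)   # drop only the whitespace the parser skips

page = "\u200b<!doctype html>\n<html><head><title>test harness</title></head></html>"
leftover = stray_leading_text(page)
print(repr(leftover))                                   # '\u200b'
print(leftover.isspace())                               # False
print(repr(stray_leading_text("\n  <!doctype html>")))  # ''
```

An empty result means the document starts cleanly; anything else is text that would open the body element early.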
The solution, of course, is to remove those bytes. Browsers may appear to ignore them, but you should not rely on such error handling, and the bytes prevent useful HTML validation. How to remove them, and how they got there in the first place, depends on your authoring environment.
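As one possible one-off cleanup, assuming the templates are UTF-8 files on disk, a small script can strip leading U+200B characters in place. `strip_leading_zwsp` is a hypothetical helper; it does not address how the bytes got there, which still depends on your editor or tooling.

```python
import os
import tempfile

ZWSP = "\u200b"  # ZERO WIDTH SPACE, encoded in UTF-8 as 0xE2 0x80 0x8B

def strip_leading_zwsp(path):
    """Remove U+200B characters from the start of a UTF-8 file, in place.

    Returns the number of characters removed. newline="" keeps the file's
    original line endings untouched on both read and write.
    """
    with open(path, encoding="utf-8", newline="") as f:
        text = f.read()
    cleaned = text.lstrip(ZWSP)
    removed = len(text) - len(cleaned)
    if removed:
        with open(path, "w", encoding="utf-8", newline="") as f:
            f.write(cleaned)
    return removed

if __name__ == "__main__":
    with tempfile.NamedTemporaryFile("wb", suffix=".html", delete=False) as tmp:
        tmp.write(b"\xe2\x80\x8b<!doctype html>\n")
    print(strip_leading_zwsp(tmp.name))  # 1
    os.remove(tmp.name)
```

Only leading occurrences are removed, so a zero width space used deliberately elsewhere in the page is left alone.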
Since the page is declared (in HTTP headers) as being UTF-8 encoded, those bytes represent the ZERO WIDTH SPACE (U+200B) character. It has no visible glyph and no width, so you won’t notice anything in the visual presentation, even though browsers treat it as data at the start of the body element. The notation &#8203; is a character reference to it, presumably used by browser tools to indicate the presence of an invisible character.
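The step from the three bytes to U+200B and to the notation &#8203; can be checked directly. This sketch decodes the bytes once by hand, using the UTF-8 three-byte pattern 1110xxxx 10xxxxxx 10xxxxxx, and once with Python's codec, then confirms that 8203 is the decimal form of 0x200B.

```python
import html
import unicodedata

raw = bytes([0xE2, 0x80, 0x8B])  # the bytes seen before <!doctype html>

# By hand: keep the x bits of 1110xxxx 10xxxxxx 10xxxxxx and concatenate them.
code_point = ((raw[0] & 0x0F) << 12) | ((raw[1] & 0x3F) << 6) | (raw[2] & 0x3F)
print(f"U+{code_point:04X}")           # U+200B

ch = raw.decode("utf-8")               # the same result via the codec
print(ord(ch) == code_point)           # True
print(unicodedata.name(ch))            # ZERO WIDTH SPACE

# The decimal character reference shown by the browser tools:
print(code_point)                      # 8203
print(html.unescape("&#8203;") == ch)  # True
```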
It is possible that the software that produced the HTML document was meant to insert ZERO WIDTH NO-BREAK SPACE (U+FEFF) instead. That would have been valid, since by a special convention, UTF-8 encoded data may start with this character, known as the byte order mark (BOM) when it appears at the start of the data. Using U+200B instead of U+FEFF sounds like an error that software is unlikely to make, but human beings may be mistaken that way if they think of the Unicode names of the characters.
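The two characters differ only slightly in name but not at all in how they are encoded. The sketch below prints both names and UTF-8 encodings, and uses Python's `utf-8-sig` codec (a Python convenience, shown only for illustration) to show that a leading BOM is treated as a signature and dropped, while a leading zero width space survives as real content.

```python
import codecs
import unicodedata

BOM = "\ufeff"   # acts as a byte order mark when it starts the data
ZWSP = "\u200b"

for c in (BOM, ZWSP):
    # Formal Unicode name and UTF-8 bytes of each character.
    print(f"U+{ord(c):04X}", unicodedata.name(c), c.encode("utf-8"))
# U+FEFF ZERO WIDTH NO-BREAK SPACE b'\xef\xbb\xbf'
# U+200B ZERO WIDTH SPACE b'\xe2\x80\x8b'

print(codecs.BOM_UTF8 == BOM.encode("utf-8"))  # True

# "utf-8-sig" drops a leading BOM, but a leading ZWSP is kept as content.
doc = "<!doctype html>"
print(repr((BOM + doc).encode("utf-8").decode("utf-8-sig")))   # '<!doctype html>'
print(repr((ZWSP + doc).encode("utf-8").decode("utf-8-sig")))  # '\u200b<!doctype html>'
```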