[SOLVED] Trouble with some HTML entities

xss
Developer
Posts: 22
Joined: Thu May 26, 2011 9:07 pm


Post by xss »

Hello rob, hello everyone,

I've been playing around a bit with WonderCMS and I really like this approach. Thank you, rob, for providing this!

However, I have come across an issue which is quite a buzzkill when it comes to using „special” characters like umlauts or other non-English characters (my site is in German), or even some simple HTML entities like the „&copy;” entity (while „<” and „>” are not affected...). When I write, for example, an „ä” character as „&auml;”, it is saved to the text file as exactly that string and displayed correctly on the website. But as soon as the same page is edited again, those characters are displayed as real characters („ä”, „©”) in the edit box, and a different character („�”) is saved to the text file in their place. These are displayed merely as a square box (Windows) or a black rhombus with a question mark (Linux) on the website, and also when editing or viewing the text file directly.

When I edit the same page yet again and then click outside the edit box, only the word „false” is shown in place of the changed content, and the edits are lost without being saved to the file.

I suspect this issue is UTF-8-related and might be caused by the browser itself when editInPlace is activated. Is there some easy trick to get around this issue?
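A minimal PHP sketch of the kind of mismatch suspected here, assuming the browser sends UTF-8 while some step along the way treats the bytes as ISO-8859-1 (that mismatch is a hypothesis, not something confirmed in this thread). „ä” is two bytes in UTF-8 but a single byte in ISO-8859-1, and that lone byte is not valid UTF-8, which is what browsers render as „�”. The \u{...} escape needs PHP 7+.

```php
<?php
// "ä" (U+00E4), written as an escape so this file's own encoding doesn't matter.
$utf8 = "\u{00E4}";

// In UTF-8, "ä" takes two bytes: 0xC3 0xA4.
echo bin2hex($utf8), "\n"; // prints c3a4

// The same character in ISO-8859-1 is the single byte 0xE4.
$latin1 = mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');
echo bin2hex($latin1), "\n"; // prints e4

// That lone byte is not valid UTF-8, so a page read as UTF-8 shows "�" instead.
var_dump(mb_check_encoding($latin1, 'UTF-8')); // bool(false)
```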
¿ן ʇ,uop 'spɹɐʍʞɔɐq ןןɐ ʇı ʇoƃ ן 'ɹɐǝp ɥo

wiz
Admin
Posts: 749
Joined: Sat Oct 30, 2010 12:23 am

Re: Trouble with some HTML entities

Post by wiz »

You're welcome, I'm glad you're enjoying it and hope I'll be able to fix this.

I tried it out and I get the same effect as you did; however, writing &uuml; didn't (well, it did turn into a „�” the second time).
I'm thinking of a couple of if statements, something like: if the content contains "ü", replace it with &uuml;. This would probably display it correctly every time it comes back to the browser. (a couple of if statements would do since there aren't that many umlauts in the German language (as far as I know?)) ;)
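The "if statements" idea can be collapsed into a single lookup table. A minimal sketch, assuming $content holds the UTF-8 text posted by the editor and PHP 7+; the function name germanToEntities is hypothetical, and it only covers the German umlauts and ß.

```php
<?php
// Hypothetical helper: replace German special characters with named HTML entities.
// Keys are \u{...} escapes so the table works no matter how this PHP file is saved.
function germanToEntities(string $content): string
{
    $map = [
        "\u{00E4}" => '&auml;',  // ä
        "\u{00C4}" => '&Auml;',  // Ä
        "\u{00F6}" => '&ouml;',  // ö
        "\u{00D6}" => '&Ouml;',  // Ö
        "\u{00FC}" => '&uuml;',  // ü
        "\u{00DC}" => '&Uuml;',  // Ü
        "\u{00DF}" => '&szlig;', // ß
    ];
    // strtr() with an array replaces all keys in a single pass.
    return strtr($content, $map);
}

echo germanToEntities("Gr\u{00FC}\u{00DF}e"), "\n"; // prints Gr&uuml;&szlig;e
```

Each additional language would need its own entries, which is the scaling problem raised in the reply below.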

You can try playing around with it; if not, I'll be able to test this out myself in a couple of hours and make it work.

But I do know that special characters work on some servers, so I'm going to look into that as well, to see what the difference is between our server and those that support special characters.

xss
Developer
Posts: 22
Joined: Thu May 26, 2011 9:07 pm

Re: Trouble with some HTML entities

Post by xss »

Thank you so much for replying, rob!
rob wrote:however, writing &uuml; didn't (well, it did turn into a „�” the second time).
Strange, it does on my webspace.
rob wrote:(a couple of if statements would do since there aren't that many umlauts in the German language (as far as I know?)) ;)
Well, for German you'd basically need the following: ä, Ä, ö, Ö, ü, Ü and ß, i.e. &auml; &Auml; &ouml; &Ouml; &uuml; &Uuml; &szlig;

And, as I stated before, the copyright character is also affected, at least in my installation of WonderCMS, so simply editing the page footer would also cause these problems.

I suppose a fix that doesn't only fix German text would be preferable, as this also seems to affect simple accents (é, è, ...) and other diacritics as they are used in many languages like French, Spanish, Portuguese, Polish, Hungarian, Romanian, Icelandic, Esperanto, and many, many more. And really, by the time you get to Vietnamese, that would require a whole lot of if statements; you'd probably have to change your slogan where it says "only 10kb"... :mrgreen:
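One way to avoid a hand-written list per language is to let PHP generate it. A sketch, assuming UTF-8 input and PHP 7+; the helper name toNamedEntities is hypothetical. get_html_translation_table() returns PHP's own character-to-entity table; the entries for &, < and > are dropped so existing markup and entities such as &copy; survive. Characters without a named HTML 4.01 entity (much of Vietnamese, for instance) pass through unchanged.

```php
<?php
// Hypothetical helper: convert every character that has a named HTML 4.01 entity,
// while leaving HTML markup (&, <, >) untouched.
function toNamedEntities(string $content): string
{
    static $table = null;
    if ($table === null) {
        $table = get_html_translation_table(HTML_ENTITIES, ENT_NOQUOTES | ENT_HTML401, 'UTF-8');
        // Keep tags and already-written entities intact.
        unset($table['&'], $table['<'], $table['>']);
    }
    return strtr($content, $table);
}

echo toNamedEntities("<p>\u{00E9}t\u{00E9} \u{00A9} 2011</p>"), "\n";
// prints <p>&eacute;t&eacute; &copy; 2011</p>
```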
rob wrote:You can try playing around with it; if not, I'll be able to test this out myself in a couple of hours and make it work.
Where does this belong? Into the editInPlace.js, I assume?
rob wrote:But I do know that special characters work on some servers, so I'm going to look into that as well, to see what the difference is between our server and those that support special characters.
Good idea. If you need the specs of the server I'm on, just let me know (ideally with a way to provide the info, in case it's anything more than a simple phpinfo();). My webspace is with a web hoster and I have no root access, but I can (and probably will, if I find the time) also run some tests on my own local web server.
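If only the encoding-related settings are needed rather than the full phpinfo() page, a small script like this could collect them. A sketch; which settings matter for this bug is an assumption, and the function name encodingReport is hypothetical.

```php
<?php
// Hypothetical helper: collect the PHP settings most likely to affect character encoding.
function encodingReport(): array
{
    $mb = extension_loaded('mbstring');
    return [
        'php_version'       => PHP_VERSION,
        'default_charset'   => ini_get('default_charset'),
        'mbstring_loaded'   => $mb,
        // mb_internal_encoding() is what mb_convert_encoding() uses when no source encoding is given.
        'internal_encoding' => $mb ? mb_internal_encoding() : null,
    ];
}

print_r(encodingReport());
```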

I hope this can be solved, as it would add a great deal of usability to WonderCMS, especially for international users. :-) If you need backup, I will try to assist you as well as I can, though my PHP and JS skills are rather basic.

wiz
Admin
Posts: 749
Joined: Sat Oct 30, 2010 12:23 am

Re: Trouble with some HTML entities

Post by wiz »

Yeah, it's strange. Would you mind running phpinfo() on your server, saving the resulting HTML page, and attaching it here or sending it to my email?

I really think it's the server settings, since some users have reported that any language works with their installation.

It belongs in both editInPlace.js and editText.php (and maybe in the main file, index.php).
I could use any testing or backup you can provide, I'd be more than glad to use your help. :)

It would be sad if it had to increase the size; that's why I'm going to have to find a universal solution.

Hopefully it's an encode/decode problem; I'll ask around to see what the best way to solve this is.

xss
Developer
Posts: 22
Joined: Thu May 26, 2011 9:07 pm

Re: Trouble with some HTML entities

Post by xss »

Hello rob,

Sorry for the delay, I was kinda offline over the weekend. :) As I cannot send you a PM (too new a user, it says...), I'll attach the PHP info files here instead; they're in the attached ZIP file, completely anonymized:

„phpinfo_webhoster.html“ is the PHP info file from my web hoster's server. „phpinfo_homeserver.html“ is that of my LAMP server at home.

EDIT: I totally forgot: The WonderCMS installation on my home server appears to behave exactly the same concerning the encoding of non-English characters in the course of multiple edits. /EDIT

If you need any additional info, please, let me know. I didn't get to do any further testing yet, though. I hope I find some time later this week.

One more thing I noticed:
I had to disable PHP error reporting ('php_value display_errors 0' in .htaccess) on my home server. The errors seem to be theme- and cookie-related, and some of them remain even after logging in and getting a cookie. Apart from the giant error lines, all page content is there, and with error display disabled, everything looks fine. But it could be a bit confusing for a new user...
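If those errors are "Undefined index" notices from reading a cookie before it has been set (an assumption; the actual messages aren't quoted in this thread), checking for the key first avoids them without switching off error display. The helper cookieValue and the cookie name 'theme' are hypothetical.

```php
<?php
// Hypothetical helper: read a cookie value, falling back to a default when the
// cookie hasn't been set yet, instead of triggering an "Undefined index" notice.
function cookieValue(array $cookies, string $name, string $default = ''): string
{
    return isset($cookies[$name]) ? (string) $cookies[$name] : $default;
}

// 'theme' is an illustrative cookie name, not necessarily one WonderCMS uses.
$theme = cookieValue($_COOKIE ?? [], 'theme', 'default');
echo $theme, "\n";
```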
Attachments
phpinfos.zip
PHP infos of my webhoster's server and my own home server.
(67.12 KiB)

wiz
Admin
Posts: 749
Joined: Sat Oct 30, 2010 12:23 am

Re: Trouble with some HTML entities

Post by wiz »

As far as I understand, special characters did not work on either of those servers?
Are those server related problems? (I'm guessing they only show up on your home server?)

I just need confirmation so I can start comparing those phpinfo() files.

And private messages are not enabled for new users. (Also, no problem about being offline; I need a rest myself sometimes.)

dwfee
Posts: 2
Joined: Tue Jun 07, 2011 8:55 am

Re: Trouble with some HTML entities

Post by dwfee »

I inserted:

$ent = array(
    'À'=>'&Agrave;', 'à'=>'&agrave;', 'Á'=>'&Aacute;', 'á'=>'&aacute;', 'Â'=>'&Acirc;', 'â'=>'&acirc;',
    'Ã'=>'&Atilde;', 'ã'=>'&atilde;', 'Ä'=>'&Auml;', 'ä'=>'&auml;', 'Å'=>'&Aring;', 'å'=>'&aring;',
    'Æ'=>'&AElig;', 'æ'=>'&aelig;', 'Ç'=>'&Ccedil;', 'ç'=>'&ccedil;', 'Ð'=>'&ETH;', 'ð'=>'&eth;',
    'È'=>'&Egrave;', 'è'=>'&egrave;', 'É'=>'&Eacute;', 'é'=>'&eacute;', 'Ê'=>'&Ecirc;', 'ê'=>'&ecirc;',
    'Ë'=>'&Euml;', 'ë'=>'&euml;', 'Ì'=>'&Igrave;', 'ì'=>'&igrave;', 'Í'=>'&Iacute;', 'í'=>'&iacute;',
    'Î'=>'&Icirc;', 'î'=>'&icirc;', 'Ï'=>'&Iuml;', 'ï'=>'&iuml;', 'Ñ'=>'&Ntilde;', 'ñ'=>'&ntilde;',
    'Ò'=>'&Ograve;', 'ò'=>'&ograve;', 'Ó'=>'&Oacute;', 'ó'=>'&oacute;', 'Ô'=>'&Ocirc;', 'ô'=>'&ocirc;',
    'Õ'=>'&Otilde;', 'õ'=>'&otilde;', 'Ö'=>'&Ouml;', 'ö'=>'&ouml;', 'Ø'=>'&Oslash;', 'ø'=>'&oslash;',
    'Œ'=>'&OElig;', 'œ'=>'&oelig;', 'ß'=>'&szlig;', 'Þ'=>'&THORN;', 'þ'=>'&thorn;',
    'Ù'=>'&Ugrave;', 'ù'=>'&ugrave;', 'Ú'=>'&Uacute;', 'ú'=>'&uacute;', 'Û'=>'&Ucirc;', 'û'=>'&ucirc;',
    'Ü'=>'&Uuml;', 'ü'=>'&uuml;', 'Ý'=>'&Yacute;', 'ý'=>'&yacute;', 'Ÿ'=>'&Yuml;', 'ÿ'=>'&yuml;',
    '&' =>'&'
);
// Replace each accented character with its named HTML entity before saving.
$content = strtr($content, $ent);

before:

fwrite($file, $content);
fclose($file);

in editText.php. Now it works. It's not complete, but it works for me.

xss
Developer
Posts: 22
Joined: Thu May 26, 2011 9:07 pm

Re: Trouble with some HTML entities

Post by xss »

Hello rob,

Sorry for this late reply. Somehow I didn't get notified of the replies, and with all the fuss I have these days I totally forgot about these threads...
rob wrote:As far as I understand, special characters did not work on either of those servers?
Correct, both servers seem to behave the same when it comes to non-English characters.
rob wrote:Are those server related problems? (I'm guessing they only show up on your home server?)
Do you mean the PHP error reporting thingy? Yes, that only happens on my home server, I assume because error reporting is activated via the php.ini on my local server but not on my web hoster's server.

dwfee's solution looks like it would handle most European languages (though &copy; is missing), which is a start. Unfortunately, it does not work for me, probably due to some UTF-8/ISO-8859-1 discrepancy on my Linux system. So I still hope there is an ultimate solution. :-)
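dwfee's table only works if editText.php is itself saved as UTF-8, because its array keys are literal characters: saved as ISO-8859-1, the key for „ä” would be the single byte 0xE4 and would never match the two-byte UTF-8 text sent by the browser. Whether that is the discrepancy here is unconfirmed. A sketch that sidesteps it by building the keys from code points with mb_chr() (PHP 7.2+) and adds the missing ©; the variable names are hypothetical and the list is abbreviated.

```php
<?php
// Variant of a character-to-entity table keyed by Unicode code points, so it
// no longer depends on the encoding the PHP file is saved in. Abbreviated list.
$codePoints = [
    0x00C4 => '&Auml;',   0x00E4 => '&auml;',
    0x00D6 => '&Ouml;',   0x00F6 => '&ouml;',
    0x00DC => '&Uuml;',   0x00FC => '&uuml;',
    0x00DF => '&szlig;',
    0x00E9 => '&eacute;', 0x00E8 => '&egrave;',
    0x00A9 => '&copy;',   // copyright sign
];

// Turn each code point into its UTF-8 string to use as the strtr() key.
$ent = [];
foreach ($codePoints as $cp => $entity) {
    $ent[mb_chr($cp, 'UTF-8')] = $entity;
}

$content = "\u{00A9} 2011 M\u{00FC}ller";
$content = strtr($content, $ent);
echo $content, "\n"; // prints &copy; 2011 M&uuml;ller
```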

wiz
Admin
Posts: 749
Joined: Sat Oct 30, 2010 12:23 am

Re: Trouble with some HTML entities

Post by wiz »

I've managed to find a way to fix some special characters (I think all of the ones in dwfee's list), and &copy; now stays the same at all times (it doesn't turn into any weird code). How to fix it:

Open up editText.php and add the following line of code

Code:

$content = mb_convert_encoding($content,"UTF-8");
above

Code:

fwrite($file, $content);
fclose($file);
So the end result will be:

Code:

$content = mb_convert_encoding($content,"UTF-8");

fwrite($file, $content);
fclose($file);
Or just download the editText.php attached to this post and replace it with the original one.
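Called with two arguments, mb_convert_encoding() uses mb_internal_encoding() as the source encoding, so what that line does depends on the server's mbstring configuration. A sketch that states the source explicitly, assuming that input which is not valid UTF-8 is ISO-8859-1 (an assumption, not something tested in this thread); the function name ensureUtf8 is hypothetical.

```php
<?php
// Hypothetical helper: make sure text is UTF-8 before it is written to disk.
function ensureUtf8(string $content): string
{
    // Valid UTF-8 (what a UTF-8 page normally posts) is left as it is.
    if (mb_check_encoding($content, 'UTF-8')) {
        return $content;
    }
    // Otherwise assume ISO-8859-1 and convert explicitly.
    return mb_convert_encoding($content, 'UTF-8', 'ISO-8859-1');
}

echo bin2hex(ensureUtf8("\xE4")), "\n";     // prints c3a4 (Latin-1 ä converted)
echo bin2hex(ensureUtf8("\u{00E4}")), "\n"; // prints c3a4 (already UTF-8)
```

In editText.php, one possible placement is $content = ensureUtf8($content); in place of the mb_convert_encoding() line, right before fwrite().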
Attachments
editText.zip
Unzip this and replace it with the original on your server.
(1.03 KiB)

xss
Developer
Posts: 22
Joined: Thu May 26, 2011 9:07 pm

Re: Trouble with some HTML entities

Post by xss »

From what I have tested so far, it appears to work nicely. :) What a treat, thank you!

But, as a thought: wouldn't it still be preferable if HTML entity markup (like „&#x00E4;“ or „&#228;“ or „&auml;“, which all result in „ä“) were not converted to its plain-text form in the first place? Sure, plain characters make the source much easier to read, but in the past I often encountered encoding issues when using umlauts or other non-English characters in my HTML source code, especially when I viewed the pages on different OSes or with different browsers (the aforementioned UTF-8/ISO-8859 conflicts, AFAIK), and therefore resorted to always using the appropriate HTML entity.
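Keeping entities in the stored file could be done by converting every non-ASCII character to a numeric entity just before saving. A sketch using mbstring's mb_encode_numericentity(), assuming UTF-8 input and PHP 7+; the helper name toNumericEntities is hypothetical. ASCII, including <, > and &, lies outside the conversion range, so markup and named entities already in the page are left alone.

```php
<?php
// Hypothetical helper: store every non-ASCII character as a numeric entity (&#NNN;).
function toNumericEntities(string $content): string
{
    // Convert code points U+0080..U+10FFFF; offset 0 and mask 0x1FFFFF keep the value unchanged.
    $convmap = [0x80, 0x10FFFF, 0, 0x1FFFFF];
    return mb_encode_numericentity($content, $convmap, 'UTF-8');
}

echo toNumericEntities("<p>\u{00E4} &copy; \u{00A9}</p>"), "\n";
// prints <p>&#228; &copy; &#169;</p>
```

Because the saved text is then pure ASCII, the encoding of the text file no longer matters for what is stored.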
