PEAK XOOPS - A Trap with get_html_translation_table()

Poster : GIJOE on 2008-05-16 04:55:48 (11642 reads)

As you may know, an equivalent of unhtmlspecialchars(), htmlspecialchars_decode(), has been implemented in PHP (since 5.1.0).
With earlier versions of PHP, we wrote a custom function for this using get_html_translation_table(), like this:


function my_unhtmlspecialchars( $text , $quotes = ENT_QUOTES )
{
	return strtr( $text , array_flip( get_html_translation_table( HTML_SPECIALCHARS , $quotes ) ) ) ;
}


Recently, I found that this function never converts &#039; into a single quote.
Here is the result of my investigation.
Code:

<?php
        var_dump( htmlspecialchars( '"\'<>&' , ENT_QUOTES ) ) ;
        var_dump( get_html_translation_table( HTML_SPECIALCHARS , ENT_QUOTES ) ) ;
?>

Result:

string(25) "&quot;&#039;&lt;&gt;&amp;"
array(5) {
  ["""]=>
  string(6) "&quot;"
  ["'"]=>
  string(5) "&#39;"
  ["<"]=>
  string(4) "&lt;"
  [">"]=>
  string(4) "&gt;"
  ["&"]=>
  string(5) "&amp;"
}

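&#039; and &#39; ...
htmlspecialchars() outputs the zero-padded &#039;, but the translation table lists &#39;, so strtr() never finds a match for it.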

This bug(?) is present in PHP versions from 4.3.10 to 5.2.5 at least.
We will have to keep this trap in mind for some time.
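
One possible way around the trap is to add both spellings of the single-quote entity to the flipped table by hand. The following is only a minimal sketch under that assumption; the function name my_unhtmlspecialchars_fixed() is hypothetical and does not come from any library.

```php
<?php
// Hypothetical helper (the name is an assumption for illustration only).
function my_unhtmlspecialchars_fixed( $text , $quotes = ENT_QUOTES )
{
	// Flip the table PHP reports so that entities map back to characters.
	$table = array_flip( get_html_translation_table( HTML_SPECIALCHARS , $quotes ) ) ;

	// When single quotes are being handled, register both spellings,
	// whichever one the running PHP version put into the table.
	if( ( $quotes & ENT_QUOTES ) == ENT_QUOTES ) {
		$table['&#039;'] = "'" ;
		$table['&#39;'] = "'" ;
	}

	// strtr() with an array replaces each position only once,
	// so "&amp;lt;" becomes "&lt;" and is not decoded a second time.
	return strtr( $text , $table ) ;
}
```

With this version, the output of htmlspecialchars( $text , ENT_QUOTES ) should round-trip back to $text regardless of which form the table contains.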


Comments list

GIJOE  Posted on 2008/7/6 18:06
hi vaughan.

Thank you for your kindness.
vaughan  Posted on 2008/7/5 19:17
not sure to be fair. but they may do, i will try to find out.

I thought it strange too as i am aware that you have been using utf-8 for a long while and didn't have these problems.

this however could also be related to us being all PHP 5 now and icms will not work on PHP4.

If I ever discover exactly what was causing it, i'll be sure to let you know :)
GIJOE  Posted on 2008/7/5 6:57 | Last modified
hi vaughan.

hmmm..
It sounds strange.

We Japanese have been using UTF-8 with htmlspecialchars() for some years.
But we have never run into trouble with the third parameter.
(We always leave it blank.)

Does Russian HTML use any special characters other than <>'"& ?


# If they use UTF-7, I could understand it, though.
vaughan  Posted on 2008/7/4 2:47
me neither. initially we thought it was html purifier causing the problem, but after looking at the function, html purifier was not used at all, so it had to be with that function. the only difference i could see was on php.net they mentioned to set the charset if you use utf-8 encoding. was worth a try anyway and solved the problem in our case.
GIJOE  Posted on 2008/7/3 13:37
hi vaughan.

Quote:

we had trouble with some russian languages using utf-8 with the textsanitizer. and we had to add the charset to the end of function htmlSpecialChars() in order to solve it. this could have been caused through various server configs though, because some of us couldn't replicate the issue ourselves.
hmmm...
Admittedly, I don't know Russian encodings at all.

But _CHARSET is just 'UTF-8' for Russian UTF-8.
I cannot see what could go wrong with that.
vaughan  Posted on 2008/7/2 17:52
we had trouble with some russian languages using utf-8 with the textsanitizer. and we had to add the charset to the end of function htmlSpecialChars() in order to solve it. this could have been caused through various server configs though, because some of us couldn't replicate the issue ourselves.
GIJOE  Posted on 2008/7/2 12:17
hi vaughan.

Quote:

on a sidenote: we also discovered when using UTF-8 charsets fully, we ran into a few small issues with icms, which will affect xoops as well.

in certain conditions, it is not enough to just use htmlspecialchars. but you also need to set which character set you are using as well.
Which conditions?

As far as I can tell from reading the PHP source code in ext/standard/html.c, the third parameter (charset) looks almost meaningless.

And I have never run into trouble with the charset for htmlspecialchars().
vaughan  Posted on 2008/7/1 18:27
ok thanks :)

on a sidenote: we also discovered when using UTF-8 charsets fully, we ran into a few small issues with icms, which will affect xoops as well.

in certain conditions, it is not enough to just use htmlspecialchars. but you also need to set which character set you are using as well.

by default htmlspecialchars encodes with the ISO-8859-1 charset, for utf-8 you need to specify utf-8

example: $text = htmlspecialchars($text, ENT_QUOTES, _CHARSET);

especially with function htmlSpecialChars() in xoops textsanitizer
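
The suggestion above could be wrapped roughly as follows. This is a sketch only: my_htmlspecialchars() is a hypothetical name, not the actual textsanitizer method, and _CHARSET is defined here solely so that the snippet runs on its own (in XOOPS it is assumed to come from the language files). Whether the explicit charset changes the result depends on the PHP version and server configuration.

```php
<?php
// Assumption: _CHARSET is normally defined elsewhere (language files);
// this fallback exists only to keep the sketch self-contained.
if( ! defined( '_CHARSET' ) ) {
	define( '_CHARSET' , 'UTF-8' ) ;
}

// Hypothetical wrapper that always passes the charset explicitly
// instead of relying on PHP's default.
function my_htmlspecialchars( $text )
{
	return htmlspecialchars( $text , ENT_QUOTES , _CHARSET ) ;
}

// Usage: multi-byte UTF-8 text passes through, only the special characters are escaped.
$escaped = my_htmlspecialchars( '<Привет & "мир">' ) ;
```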
GIJOE  Posted on 2008/7/1 12:40 | Last modified
hi vaughan.

htmlspecialchars_decode() requires PHP 5 (5.1.0 or later).

We have to write code that works with both PHP 4 and PHP 5.
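
A common way to support both is to test for the native function at run time and fall back to the table-based approach otherwise. The sketch below assumes that pattern; compat_htmlspecialchars_decode() is a hypothetical name, and the sketch does not check how the native function treats &#39; on every PHP version.

```php
<?php
// Hypothetical compatibility wrapper (the name is an assumption).
function compat_htmlspecialchars_decode( $text , $quotes = ENT_QUOTES )
{
	// Prefer the native function where it exists (PHP 5.1.0 or later).
	if( function_exists( 'htmlspecialchars_decode' ) ) {
		return htmlspecialchars_decode( $text , $quotes ) ;
	}

	// Fallback for older PHP: flip the translation table and make sure
	// the zero-padded form emitted by htmlspecialchars() is covered.
	$table = array_flip( get_html_translation_table( HTML_SPECIALCHARS , $quotes ) ) ;
	if( ( $quotes & ENT_QUOTES ) == ENT_QUOTES ) {
		$table['&#039;'] = "'" ;
	}
	return strtr( $text , $table ) ;
}
```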
vaughan  Posted on 2008/6/16 8:58
what's the difference between using this method and php native function htmlspecialchars_decode()

or is this just for php 4 backward compatibility?