PEAK XOOPS - The get_html_translation_table() trap

Poster : GIJOE on 2008-05-16 04:55:48 (11681 reads)

The get_html_translation_table() trap
It seems a native inverse, htmlspecialchars_decode(), has been added recently, but many of you have probably written your own inverse of htmlspecialchars() using get_html_translation_table().


function my_unhtmlspecialchars( $text , $quotes = ENT_QUOTES )
{
	return strtr( $text , array_flip( get_html_translation_table( HTML_SPECIALCHARS , $quotes ) ) ) ;
}


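However, I only just noticed that, for some reason, this approach fails to handle single quotes.

Thinking that was odd, I looked into it and found that get_html_translation_table( HTML_SPECIALCHARS , ENT_QUOTES ) and htmlspecialchars( $text , ENT_QUOTES ) produce different results.

Code: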

<?php
        var_dump( htmlspecialchars( '"\'<>&' , ENT_QUOTES ) ) ;
        var_dump( get_html_translation_table( HTML_SPECIALCHARS , ENT_QUOTES ) ) ;
?>

Result:

string(25) "&quot;&#039;&lt;&gt;&amp;"
array(5) {
  ["""]=>
  string(6) "&quot;"
  ["'"]=>
  string(5) "&#39;"
  ["<"]=>
  string(4) "&lt;"
  [">"]=>
  string(4) "&gt;"
  ["&"]=>
  string(5) "&amp;"
}

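&#39; versus &#039;...
htmlspecialchars() emits the zero-padded form, while the table only contains the unpadded one, so of course the flipped table cannot reverse the conversion.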

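"Oh come on, really?!" is how I feel. I seem to recall that the PHP manual itself once showed code that applies array_flip() to the result of get_html_translation_table() as the way to build the inverse function...

This may well be a well-known bug, but every CLI build of PHP I compiled here, from 4.3.10 through 5.2.5, showed this discrepancy, so I wrote it up in case nobody had noticed.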
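One way to sidestep the mismatch is to add both spellings of the single-quote entity to the flipped table by hand. The following is only a minimal sketch under that assumption; the function name my_unhtmlspecialchars_both() is hypothetical and does not come from the post.

```php
<?php
// Hypothetical helper (the name is an assumption, not from the original post).
function my_unhtmlspecialchars_both($text, $quotes = ENT_QUOTES)
{
    // Entity => character map built from PHP's own translation table.
    $map = array_flip(get_html_translation_table(HTML_SPECIALCHARS, $quotes));

    // When single quotes are in scope, accept both spellings explicitly,
    // regardless of which one the table happens to contain.
    if (($quotes & ENT_QUOTES) === ENT_QUOTES) {
        $map['&#039;'] = "'";
        $map['&#39;'] = "'";
    }

    // strtr() with an array matches the longest keys first and does not
    // rescan text it has already replaced, so "&amp;lt;" becomes "&lt;".
    return strtr($text, $map);
}

$escaped = htmlspecialchars('"\'<>&', ENT_QUOTES);
var_dump(my_unhtmlspecialchars_both($escaped) === '"\'<>&'); // bool(true)
```

Because the two entries are added unconditionally, the sketch should behave the same whether or not the installed PHP version has the discrepancy described above.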


Comments list

GIJOE  Posted on 2008/7/6 18:06
hi vaughan.

Thank you for your kindness.
vaughan  Posted on 2008/7/5 19:17
not sure to be fair. but they may do, i will try to find out.

I thought it strange too as i am aware that you have been using utf-8 for a long while and didn't have these problems.

this however could also be related to us being all PHP 5 now and icms will not work on PHP4.

If I ever discover exactly what was causing it, i'll be sure to let you know :)
GIJOE  Posted on 2008/7/5 6:57 | Last modified
hi vaughan.

hmmm..
It sounds strange.

We Japanese have been using UTF-8 with htmlspecialchars() for some years,
but we've never had trouble with the third parameter.
(We always leave it blank.)

Does Russian HTML use any special characters other than <>'"& ??


# If they use UTF-7, I could understand it, though.
vaughan  Posted on 2008/7/4 2:47
me neither. initially we thought it was html purifier causing the problem, but after looking at the function, html purifier was not used at all, so it had to be with that function. the only difference i could see was on php.net they mentioned to set the charset if you use utf-8 encoding. was worth a try anyway and solved the problem in our case.
GIJOE  Posted on 2008/7/3 13:37
hi vaughan.

Quote:

we had trouble with some russian languages using utf-8 with the textsanitizer. and we had to add the charset to the end of function htmlSpecialChars() in order to solve it. this could have been caused through various server configs though, because some of us couldn't replicate the issue ourselves.
hmmm...
Certainly, I don't know russian encodings at all.

But, _CHARSET is just 'UTF-8' for russian utf8.
I cannot understand what is wrong about that.
vaughan  Posted on 2008/7/2 17:52
we had trouble with some russian languages using utf-8 with the textsanitizer. and we had to add the charset to the end of function htmlSpecialChars() in order to solve it. this could have been caused through various server configs though, because some of us couldn't replicate the issue ourselves.
GIJOE  Posted on 2008/7/2 12:17
hi vaughan.

Quote:

on a side note: we also discovered when using UTF-8 charsets fully, we ran into a few small issues with icms, which will affect xoops as well.

in certain conditions, it is not enough to just use htmlspecialchars. you also need to set which character set you are using.
Which conditions?

As far as I can tell from reading the PHP source code in ext/standard/html.c, the third parameter (charset) looks almost meaningless.

And I've never had any trouble with the charset for htmlspecialchars().
vaughan  Posted on 2008/7/1 18:27
ok thanks :)

on a side note: we also discovered when using UTF-8 charsets fully, we ran into a few small issues with icms, which will affect xoops as well.

in certain conditions, it is not enough to just use htmlspecialchars. you also need to set which character set you are using.

by default htmlspecialchars encodes with the ISO-8859-1 charset; for utf-8 you need to specify utf-8

example $text = htmlspecialchars($text, ENT_QUOTES, _CHARSET);

especially with function htmlSpecialChars() in xoops textsanitizer
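
To make the suggestion above concrete, here is a small sketch of passing the charset explicitly. It assumes _CHARSET is a constant the application defines elsewhere (it is defined here only so the snippet runs on its own) and that the file is saved as UTF-8; the sample string is made up. On PHP versions before 5.4 the default charset for htmlspecialchars() was ISO-8859-1, which is the case this comment is about.

```php
<?php
// Assumption: the application normally defines _CHARSET; this fallback
// exists only so the sketch is runnable by itself.
if (!defined('_CHARSET')) {
    define('_CHARSET', 'UTF-8');
}

// Made-up sample text mixing Cyrillic with all five special characters.
$text = 'Привет <b>"мир"</b> & \'all\'';

// Third argument tells htmlspecialchars() how the input bytes are encoded.
$escaped = htmlspecialchars($text, ENT_QUOTES, _CHARSET);

echo $escaped, "\n";
```

Only the five special characters are replaced; the Cyrillic text should pass through untouched.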
GIJOE  Posted on 2008/7/1 12:40 | Last modified
hi vaughan.

htmlspecialchars_decode() needs PHP 5.

We have to write code that works with both PHP 4 and PHP 5.
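
A common pattern for code that has to run on both versions is to prefer the native function when it exists and fall back to a hand-written map otherwise. This is only a sketch of that idea; the wrapper name compat_htmlspecialchars_decode() is hypothetical and is not part of XOOPS.

```php
<?php
// Hypothetical wrapper (the name is an assumption, not an existing API).
function compat_htmlspecialchars_decode($text, $quotes = ENT_QUOTES)
{
    if (function_exists('htmlspecialchars_decode')) {
        // Native implementation, available from PHP 5.1.
        return htmlspecialchars_decode($text, $quotes);
    }

    // Fallback for older PHP: a hand-written map that does not depend on
    // get_html_translation_table() at all.
    $map = array('&lt;' => '<', '&gt;' => '>', '&amp;' => '&');
    if ($quotes & ENT_COMPAT) {
        // ENT_COMPAT and ENT_QUOTES both cover double quotes.
        $map['&quot;'] = '"';
    }
    if (($quotes & ENT_QUOTES) === ENT_QUOTES) {
        // Accept both spellings of the single-quote entity.
        $map['&#039;'] = "'";
        $map['&#39;'] = "'";
    }
    return strtr($text, $map);
}

echo compat_htmlspecialchars_decode('&lt;a href=&quot;x&quot;&gt;'), "\n";
```

Checking with function_exists() at call time keeps a single code path for callers, at the cost of maintaining the fallback map by hand.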
vaughan  Posted on 2008/6/16 8:58
what's the difference between using this method and the native php function htmlspecialchars_decode()?

or is this just for php 4 backward compatibility?