Each time I see someone use strlen(), I cringe. It will break.
Despite its name, strlen() doesn’t count characters. It counts bytes. In UTF-8, a single character can take up to four bytes.
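Here is a small sketch you can run to see those byte widths, comparing strlen() and mb_strlen() on one character of each UTF-8 width. It assumes PHP 7.2 or later (for mb_ord()) with the mbstring extension loaded. The characters are written as \u{...} escapes, so the result does not depend on how the file is saved.

```php
<?php
// One character of each UTF-8 width, written as "\u{...}" escapes (PHP 7.0+).
$chars = [ "a", "\u{E4}", "\u{20AC}", "\u{1F600}" ]; // a, ä, €, 😀

foreach ( $chars as $char ) {
    // strlen() reports bytes; mb_strlen() reports characters.
    printf(
        "U+%04X: strlen() = %d, mb_strlen() = %d\n",
        mb_ord( $char, 'UTF-8' ),
        strlen( $char ),
        mb_strlen( $char, 'UTF-8' )
    );
}
// U+0061: strlen() = 1, mb_strlen() = 1
// U+00E4: strlen() = 2, mb_strlen() = 1
// U+20AC: strlen() = 3, mb_strlen() = 1
// U+1F600: strlen() = 4, mb_strlen() = 1
```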
So what happens if we use strlen() and its companion substr() to shorten the title of a post?
```php
<?php # -*- coding: utf-8 -*-
// Save this file as UTF-8, so 'ä' is stored as two bytes (0xC3 0xA4).
header( 'Content-Type: text/plain;charset=utf-8' );

$string = 'Doppelgänger';

// strlen() counts bytes, mb_strlen() counts characters.
print 'strlen(): '    . strlen( $string ) . "\n";
print 'mb_strlen(): ' . mb_strlen( $string, 'UTF-8' ) . "\n\n";

// substr() cuts after 8 bytes, mb_substr() after 8 characters.
print 'substr(): '    . substr( $string, 0, 8 ) . "\n";
print 'mb_substr(): ' . mb_substr( $string, 0, 8, 'UTF-8' );
```
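Output (described here rather than printed, because the raw plain-text output would break our newsfeed):

strlen(): 13
mb_strlen(): 12

substr(): Doppelg, followed by the lone byte 0xC3, the first half of the two-byte “ä”
mb_substr(): Doppelgä

And that’s what happens each time you use strlen() and substr() on strings encoded in UTF-8: you end up with partial characters and invalid UTF-8.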
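Back to the title example: a minimal sketch of shortening a title safely, counting and cutting in characters with mb_strlen() and mb_substr(). The function name shorten_title() and the "…" suffix are illustrative assumptions, not an existing PHP or WordPress API.

```php
<?php
/**
 * Hypothetical helper: shortens a UTF-8 title to at most $max characters,
 * appending $more when something was cut off.
 */
function shorten_title( string $title, int $max, string $more = "\u{2026}" ): string {
    // Count characters, not bytes.
    if ( mb_strlen( $title, 'UTF-8' ) <= $max ) {
        return $title;
    }
    // Leave room for the suffix, then cut on a character boundary.
    $keep = max( 0, $max - mb_strlen( $more, 'UTF-8' ) );
    return mb_substr( $title, 0, $keep, 'UTF-8' ) . $more;
}

print shorten_title( 'Doppelgänger', 9 ) . "\n"; // Doppelgä…
```

Because both the length check and the cut work in characters, the result is always valid UTF-8 as long as the input is.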
Alternatives for mb_strlen()
There are other ways to get the real string length, that is, the number of characters.
$length = preg_match_all( '(.)su', $string, $matches );
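In that pattern, the parentheses act as delimiters, the s modifier lets the dot match newlines, and the u modifier makes PCRE match whole UTF-8 characters. One thing to watch: with the u modifier, preg_match_all() returns false when the string is not valid UTF-8. A minimal sketch of a wrapper that makes that case explicit; the name utf8_length_pcre() is hypothetical.

```php
<?php
/**
 * Hypothetical helper: counts UTF-8 characters with PCRE.
 * Returns null for invalid UTF-8 instead of a misleading number.
 */
function utf8_length_pcre( string $string ): ?int {
    // "(.)su": "(" and ")" are the delimiters, "s" lets "." match
    // newlines, "u" treats the subject as UTF-8.
    $count = preg_match_all( '(.)su', $string );
    // preg_match_all() returns false when the subject is not valid UTF-8.
    return $count === false ? null : $count;
}

var_dump( utf8_length_pcre( 'Doppelgänger' ) ); // int(12)
var_dump( utf8_length_pcre( "Doppelg\xC3" ) );  // NULL (the cut-off "ä")
```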
See also Hakre: PHP UTF-8 string Length.
Or just use the following (note that utf8_decode() is deprecated as of PHP 8.2):
$length = strlen( utf8_decode( $string ) );
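The utf8_decode() trick works because it turns every character into exactly one byte: characters that ISO-8859-1 cannot represent become "?". The same idea can be expressed without utf8_decode() or mbstring by deleting the UTF-8 continuation bytes (0x80 to 0xBF), which leaves one lead byte per character. A sketch, assuming the input is valid UTF-8; the function name utf8_length_bytes() is hypothetical.

```php
<?php
/**
 * Hypothetical helper: counts characters in a valid UTF-8 string.
 * Every character has exactly one lead byte; continuation bytes are
 * always in the range 0x80..0xBF.
 */
function utf8_length_bytes( string $string ): int {
    // No "u" modifier: the regex works on raw bytes here.
    return strlen( preg_replace( '/[\x80-\xBF]/', '', $string ) );
}

print utf8_length_bytes( 'Doppelgänger' ) . "\n"; // 12
```

On broken input this counts a stray lead byte as a character, so it is no substitute for validation.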
There is also a nice php-utf8 library on GitHub by Frank Smit.
Comments
16 responses to “Don’t use strlen()”
Thanks for the tip. It helped me here.
So, how do you determine the character count in PHP?
@Sergey Use mb_strlen().
WordPress core sometimes uses strlen(). Should they use only mb_strlen()?
@Nicolas: It depends. If you really know you have only single-byte characters, it is okay. Unfortunately, WordPress sometimes uses strlen() on data where this is not the case (for example, the plugin description length in WP_Plugin_Install_List_Table or image captions in wp_read_image_metadata()).
There are rarely critical side effects unless substr() is used to write something into the database or into the output.
substr( $_SERVER['HTTP_USER_AGENT'], 0, 254 ), for example, is written to the database and may be invalid UTF-8.
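When the limit is in bytes, as with a database column, mb_strcut() fits better than mb_substr(): it cuts by bytes like substr(), but never inside a character. A minimal sketch, assuming the mbstring extension is loaded; the 254-byte limit comes from the comment above, and the helper name truncate_bytes() is hypothetical.

```php
<?php
/**
 * Hypothetical helper: truncates a UTF-8 string to at most $max_bytes
 * bytes without leaving a partial character at the end.
 */
function truncate_bytes( string $string, int $max_bytes ): string {
    // mb_strcut() counts bytes, but moves the cut back to the start of
    // a character if it would otherwise land inside one.
    return mb_strcut( $string, 0, $max_bytes, 'UTF-8' );
}

// The user agent may be missing, e.g. on the command line.
$agent  = $_SERVER['HTTP_USER_AGENT'] ?? '';
$stored = truncate_bytes( $agent, 254 );
```

For example, truncate_bytes( 'Doppelgänger', 8 ) returns "Doppelg" instead of "Doppelg" plus half an "ä".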
mb_strlen is not always available …
@Marcel wp-includes/compat.php defines the function if it is missing.
Wow, thanks a lot Thomas. Didn’t know this yet.
To me, this sounds more like a problem with the PHP function not doing what its name suggests.
There is no mention in the documentation for either strlen() or mb_strlen() that this is the case… it’s just shoddy work on behalf of the PHP development team.
I think strlen() should give you the number of characters in a string, and there should be a dedicated function for the number of bytes, perhaps strbytes()?
wp-includes/compat.php defines mb_substr(), but not mb_strlen().
@GaryJ, you are right, I stand corrected. 🙂
I have added some alternatives and links to show other ways.
[…] Link information: Submitted by: Aurélien Garroux Link: https://wpengineer.com/2410/dont-use-strlen/ […]
You saved my Sunday 🙂