Don’t use strlen()
March 29th, 2012 by Thomas • WPengineer Misc • 16 Comments
Each time I see someone use strlen() I cringe. It will break.
Despite its name, strlen() doesn’t count characters. It counts bytes. In UTF-8 a character may be up to four bytes long.
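To make the byte/character difference concrete, here is a small sketch that prints both counts for characters of increasing width. It assumes PHP 7 or later (for the "\u{...}" escapes) and the mbstring extension; the sample characters and the $samples array are illustrative choices, not part of the original post.

```php
<?php
// Sketch only: assumes PHP 7+ and the mbstring extension.
// The sample characters are illustrative, chosen to cover 1 to 4 bytes.
$samples = [
    'U+0061 (a)'         => 'a',
    'U+00E4 (a-umlaut)'  => "\u{E4}",
    'U+20AC (euro sign)' => "\u{20AC}",
    'U+1F600 (emoji)'    => "\u{1F600}",
];

foreach ($samples as $label => $char) {
    // strlen() reports bytes, mb_strlen() reports characters (code points).
    printf(
        "%-20s bytes: %d, characters: %d\n",
        $label,
        strlen($char),
        mb_strlen($char, 'UTF-8')
    );
}
```

Each sample is a single character, so mb_strlen() reports 1 every time while strlen() reports 1, 2, 3 and 4.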
So what happens if we use strlen() and its companion substr() to shorten the title of a post?
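<?php # -*- coding: utf-8 -*-
// Note: declare( encoding ) only takes effect when zend.multibyte is enabled.
declare( encoding = 'UTF-8' );
header('Content-Type: text/plain;charset=utf-8');
$string = 'Doppelgänger'; // 12 characters; the "ä" takes two bytes in UTF-8
// Length in bytes vs. length in characters.
print 'strlen(): ' . strlen( $string ) . "\n"; // 13
print 'mb_strlen(): ' . mb_strlen( $string, 'UTF-8' ) . "\n\n"; // 12
// Cutting after 8 bytes splits the "ä"; cutting after 8 characters does not.
print 'substr(): ' . substr( $string, 0, 8 ) . "\n"; // "Doppelg" plus a stray byte
print 'mb_substr(): ' . mb_substr( $string, 0, 8, 'UTF-8' ); // "Doppelgä"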
Output:
[Image: output of the script above]
I have to use an image here. If I had used the plain text output, our newsfeed would break. And that’s what happens each time you use strlen() and substr() on strings encoded in UTF-8: you end up with partial characters and invalid UTF-8.
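One way to see the damage without printing the broken bytes is to validate the result and dump it as hex. The following is a minimal sketch, assuming PHP 7+ and the mbstring extension; the variable names $byteCut and $charCut are made up for this illustration.

```php
<?php
// Sketch only: assumes PHP 7+ and the mbstring extension.
$string = "Doppelg\u{E4}nger"; // "Doppelgänger", written with an escape

$byteCut = substr($string, 0, 8);             // 8 bytes: cuts through the "ä"
$charCut = mb_substr($string, 0, 8, 'UTF-8'); // 8 characters: keeps the "ä" whole

var_dump(mb_check_encoding($byteCut, 'UTF-8')); // bool(false)
var_dump(mb_check_encoding($charCut, 'UTF-8')); // bool(true)

// Hex dump of the byte-wise cut: it ends in c3, the first half of "ä" (c3 a4).
print bin2hex($byteCut) . "\n"; // 446f7070656c67c3
```

The trailing c3 is a lead byte with no continuation byte after it, which is exactly the kind of partial character that makes a feed or a database reject the string.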
Alternatives to mb_strlen()
If mb_strlen() is not available, there are other ways to count characters instead of bytes.
$length = preg_match_all( '(.)su', $string, $matches );
See also Hakre: PHP UTF-8 string Length.
Or just use …
$length = strlen( utf8_decode( $string ) );
There is also a nice php-utf8 library on GitHub from Frank Smit.
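These alternatives could be combined into one helper that prefers mb_strlen() and falls back to the regular expression. This is only a sketch under assumptions: the function names utf8_char_count() and utf8_char_count_pcre() are hypothetical, PHP 7+ is assumed for the type declarations, and the utf8_decode() variant is left out because that function is deprecated as of PHP 8.2.

```php
<?php
// Sketch only: both function names are hypothetical, not from any library.

// Counts characters with PCRE. The "u" modifier treats the subject as UTF-8,
// so "." matches a whole code point; "s" lets "." match newlines as well.
function utf8_char_count_pcre(string $string): int
{
    $count = preg_match_all('/./su', $string);
    if ($count === false) {
        // With the "u" modifier, preg_match_all() fails on invalid UTF-8.
        throw new InvalidArgumentException('Input is not valid UTF-8.');
    }
    return $count;
}

// Prefers mbstring when it is loaded, otherwise uses the PCRE fallback.
function utf8_char_count(string $string): int
{
    if (function_exists('mb_strlen')) {
        return mb_strlen($string, 'UTF-8');
    }
    return utf8_char_count_pcre($string);
}

print utf8_char_count("Doppelg\u{E4}nger") . "\n"; // 12
```

Note that the two branches treat invalid input differently: the PCRE version refuses it, while mb_strlen() still returns a number.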


Thanks for the tip. It helped me here.
So, how do you determine the character count in PHP?
@Sergey Use mb_strlen().
WordPress core sometimes uses strlen(). Should it use only mb_strlen()?
@Nicolas: It depends. If you really know you have only single-byte characters, it is okay. Unfortunately, WordPress sometimes uses strlen() on data where this is not the case (plugin description length in WP_Plugin_Install_List_Table or image captions in wp_read_image_metadata(), for example).
There are rarely critical side effects unless substr() is used to write something into the database or into the output.
substr($_SERVER['HTTP_USER_AGENT'], 0, 254); for example, is written to the database and may be invalid UTF-8.
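If the limit really is a byte limit, mb_strcut() is one possible replacement: it cuts by bytes but backs off to the previous character boundary. The sketch below assumes PHP 7+, the mbstring extension, and input that is valid UTF-8 to begin with; $userAgent is a made-up stand-in for $_SERVER['HTTP_USER_AGENT']. If the database column counts characters rather than bytes, mb_substr() would be the better fit.

```php
<?php
// Sketch only: $userAgent is a hypothetical stand-in for the real header value.
// It is built so that a two-byte "ä" straddles the 254-byte limit.
$userAgent = str_repeat('x', 253) . "\u{E4}" . 'tail';

$byteCut = substr($userAgent, 0, 254);             // ends with half of the "ä"
$safeCut = mb_strcut($userAgent, 0, 254, 'UTF-8'); // stops before the "ä"

var_dump(mb_check_encoding($byteCut, 'UTF-8')); // bool(false)
var_dump(mb_check_encoding($safeCut, 'UTF-8')); // bool(true)
print strlen($safeCut) . "\n";                  // 253
```

The safe cut is one byte shorter than the limit, which is the price of not storing a partial character.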
mb_strlen is not always available ...
@Marcel wp-includes/compat.php defines the function if it is missing.
Wow, thanks a lot Thomas. Didn't know this yet.
To me, this sounds more like a problem with the PHP function not doing what its name suggests it does.
There is no mention in the documentation for either strlen() or mb_strlen() that this is the case... it's just shoddy work on the part of the PHP development team.
I think strlen() should give you the number of characters in a string, and there should be a dedicated function for the number of bytes, perhaps strbytes()?
Nice work. Really great stuff. Keep it up. Thank you.
wp-includes/compat.php defines mb_substr(), but not mb_strlen().
@GaryJ, you are right, I stand corrected. :)
I have added some alternatives and links to show other ways.
Nice article. It's really nice work. Thank you.
You saved my sunday :)