Measuring string length seems easy, but it’s a surprisingly subtle trap.
Here’s a string. Can you guess its length?
"My 👨👩👦👦 is fun"
It’s a trick question. There isn’t a single answer.
In PHP, it’s 38. strlen($string) counts bytes in memory.
In JavaScript, it’s 22. string.length counts UTF-16 code units.
In Swift, it’s 11. string.count counts grapheme clusters.
In Python 3, it’s 18. len(string) counts Unicode codepoints.
In Python 2, it can be 38 or 76, depending on how the interpreter was built. [1]
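These units can be compared side by side from a single language. The sketch below uses Python 3 and a smaller string than the one above: just the family emoji, spelled out with explicit escapes (four emoji joined by three zero-width joiners) so the invisible characters are unambiguous. That spelling is an assumption of this example, and the helper name length_report is made up for illustration.

```python
# Family emoji spelled out explicitly: man, ZWJ, woman, ZWJ, boy, ZWJ, boy.
FAMILY = "\U0001F468\u200D\U0001F469\u200D\U0001F466\u200D\U0001F466"


def length_report(text):
    """Return the length of `text` measured in three different units."""
    return {
        # Python 3's len() counts Unicode codepoints.
        "codepoints": len(text),
        # Bytes in the UTF-8 encoding, which is what a byte counter sees for UTF-8 data.
        "utf8_bytes": len(text.encode("utf-8")),
        # UTF-16 code units: two bytes each; the "-le" codec adds no byte-order mark.
        "utf16_units": len(text.encode("utf-16-le")) // 2,
    }


print(length_report(FAMILY))
# {'codepoints': 7, 'utf8_bytes': 25, 'utf16_units': 11}
```

Python's standard library has no grapheme-cluster counter, so the Swift-style answer (1 for this string) is not shown here; getting it would take a third-party segmentation library.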
This also matters for computing substrings, accessing string indices, and regular expressions.
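Here is a short Python 3 sketch of how this plays out, using the family emoji written with explicit escapes (an assumption of this example, not necessarily the exact string above). Indexing, slicing, and the standard re module all work on codepoints, so each of them can cut through what a reader sees as one character.

```python
import re

# Family emoji spelled out explicitly: man, ZWJ, woman, ZWJ, boy, ZWJ, boy.
FAMILY = "\U0001F468\u200D\U0001F469\u200D\U0001F466\u200D\U0001F466"
text = "My " + FAMILY + " is fun"

# Index 3 is only the first codepoint of the emoji sequence: the man alone.
first = text[3]
print(first == "\U0001F468")  # True

# Slicing by codepoint can stop in the middle of the sequence,
# leaving a dangling zero-width joiner at the end of the prefix.
prefix = text[:5]
print(prefix == "My \U0001F468\u200D")  # True

# "." in a regular expression matches one codepoint, not one visible character.
pieces = re.findall(".", FAMILY)
print(len(pieces))  # 7
```

Other languages split at different boundaries (bytes or UTF-16 code units), but the failure mode is the same: the cut lands inside something the user perceives as a single character.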
Remember to check how your language treats strings.
[1] In Python < 3.3, the internal string representation depends on the --enable-unicode build flag, which can be set to ucs2 or ucs4.