String Length Differs Between Programming Languages

Measuring string length seems easy, but it’s a surprisingly subtle trap.

Here’s a string. Can you guess its length?

"My 👨‍👩‍👦‍👦‍ is fun"

It’s a trick question. There isn’t a single answer.

In PHP, it’s 38. strlen($string) counts bytes, and this string takes 38 bytes in UTF-8.

In JavaScript, it’s 22. string.length counts UTF-16 code units.

In Swift, it’s 11. string.count counts grapheme clusters.

In Python 3, it’s 18. len(string) counts Unicode code points.

In Python 2, it depends on the string type and on how the interpreter was built: len(string) is 38 for a UTF-8 byte string, and 22 or 18 for a unicode string. [1]
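Most of these numbers can be reproduced from a single Python 3 session. The sketch below spells the string out with escapes and assumes a trailing zero-width joiner (U+200D) after the last emoji, since that is what the counts above imply. The grapheme counter is a rough approximation written for this post's example only: approx_grapheme_count and is_pictograph are hypothetical helpers, not the full Unicode segmentation rules that Swift implements.

```python
import unicodedata

# The example string, written with escapes so every code point is visible.
# Assumption: a trailing U+200D follows the last emoji, as the counts imply.
text = "My \U0001F468\u200D\U0001F469\u200D\U0001F466\u200D\U0001F466\u200D is fun"

utf8_bytes = len(text.encode("utf-8"))            # bytes, like PHP's strlen
utf16_units = len(text.encode("utf-16-le")) // 2  # UTF-16 code units, like JavaScript's .length
code_points = len(text)                           # code points, like Python 3's len

ZWJ = "\u200d"


def is_pictograph(ch):
    # Hypothetical stand-in for the Extended_Pictographic property:
    # a single block range that happens to cover the emoji used here.
    return 0x1F300 <= ord(ch) <= 0x1FAFF


def approx_grapheme_count(s):
    # Simplified clustering: combining marks and ZWJ extend the previous
    # cluster, and an emoji right after a ZWJ joins it. Not full UAX #29.
    count = 0
    prev = ""
    for ch in s:
        extends = ch == ZWJ or unicodedata.category(ch) in ("Mn", "Me")
        joins = prev == ZWJ and is_pictograph(ch)
        if not (extends or joins):
            count += 1
        prev = ch
    return count


print(utf8_bytes, utf16_units, code_points, approx_grapheme_count(text))
# prints: 38 22 18 11
```

For real grapheme counting, use a library that implements the Unicode text segmentation rules rather than a hand-rolled range check like this one.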

This also matters for computing substrings, accessing string indices, and regular expressions.
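Here is a small Python 3 sketch of how that plays out. It rebuilds the example string with escapes (again assuming a trailing zero-width joiner after the last emoji) and shows slicing, regex matching, and byte truncation each cutting the family emoji at a different boundary. The variable names are illustrative only.

```python
import re

text = "My \U0001F468\u200D\U0001F469\u200D\U0001F466\u200D\U0001F466\u200D is fun"

# Slicing works on code points, so it can cut the family emoji apart:
# the first four code points are "My " plus the man alone.
first_four = text[:4]
print(first_four == "My \U0001F468")  # prints: True

# In a regular expression, "." matches one code point, not one visible character.
matches = re.findall(r".", text)
print(len(matches))  # prints: 18

# Truncating by bytes can cut a single code point in half.
truncated = text.encode("utf-8")[:5]  # "My " plus 2 of the man's 4 bytes
try:
    truncated.decode("utf-8")
    cut_is_valid = True
except UnicodeDecodeError:
    cut_is_valid = False
print(cut_is_valid)  # prints: False
```

The same kind of split happens in any language. Only the unit differs: bytes, UTF-16 code units, or code points.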

Remember to check how your language treats strings.

  1. In Python < 3.3, the internal string representation is chosen when the interpreter is built. In Python 2, this is the --enable-unicode configure flag, which can be set to ucs2 or ucs4.
