seems_utf8

Definition:
function seems_utf8($str) {}

Checks whether a string appears to be UTF-8 encoded.
NOTE: This function also accepts 5- and 6-byte sequences (lead bytes 111110bb and 1111110b), although UTF-8 byte sequences have a maximum length of 4.

Parameters

  • string $str: The string to be checked

Return values

returns: true if $str fits a UTF-8 model, false otherwise.

Source code

function seems_utf8($str) {
	$length = strlen($str);
	for ($i=0; $i < $length; $i++) {
		$c = ord($str[$i]);
		if ($c < 0x80) $n = 0; # 0bbbbbbb
		elseif (($c & 0xE0) == 0xC0) $n=1; # 110bbbbb
		elseif (($c & 0xF0) == 0xE0) $n=2; # 1110bbbb
		elseif (($c & 0xF8) == 0xF0) $n=3; # 11110bbb
		elseif (($c & 0xFC) == 0xF8) $n=4; # 111110bb
		elseif (($c & 0xFE) == 0xFC) $n=5; # 1111110b
		else return false; # Does not match any model
		for ($j=0; $j<$n; $j++) { # n bytes matching 10bbbbbb follow ?
			if ((++$i == $length) || ((ord($str[$i]) & 0xC0) != 0x80))
				return false;
		}
	}
	return true;
}
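
Because the loop above only matches lead-byte patterns and counts continuation bytes, it accepts some byte strings that are not valid UTF-8. Where a stricter answer is needed, one possible approach is to let PCRE validate the string. The sketch below is an illustration only: is_strict_utf8 is a hypothetical helper name, not part of WordPress, and it assumes the PCRE extension bundled with PHP, whose /u modifier makes preg_match() fail on subjects that are not valid UTF-8.

```php
<?php
// Hypothetical helper (not a WordPress function): strict UTF-8 validation.
// With the /u modifier, preg_match() returns false for a subject that is
// not valid UTF-8, and 1 when the empty pattern matches a valid subject.
function is_strict_utf8(string $str): bool {
	return preg_match('//u', $str) === 1;
}

// Sample inputs written as byte escapes so the file's own encoding is irrelevant.
$samples = array(
	'ascii'     => 'hello',
	'two-byte'  => "h\xC3\xA9llo",         // U+00E9 encoded as 2 bytes
	'five-byte' => "\xF8\x88\x80\x80\x80", // lead byte 111110bb plus 4 continuation bytes
	'truncated' => "\xE2\x82",             // 3-byte lead with only 1 continuation byte
);

foreach ($samples as $label => $bytes) {
	echo $label, ': ', is_strict_utf8($bytes) ? 'valid' : 'invalid', "\n";
}
```

The 'five-byte' sample is the kind of input the NOTE refers to: it fits the 111110bb model followed by four 10bbbbbb bytes, so seems_utf8 returns true for it, while a strict validator should reject it.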
