Detecting a UTF-8 BOM using PHP

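A UTF-8 byte order mark (BOM) is the three-byte sequence EF BB BF at the very start of a file. To deal with it, check whether the file contents begin with those three bytes and strip them if they do. Below is a simplified example of how to detect and remove them.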

$str = file_get_contents('yourfile.utf8.csv');
// The UTF-8 BOM is the byte sequence EF BB BF.
$bom = pack("CCC", 0xef, 0xbb, 0xbf);
// Compare only the first three bytes of the file contents.
if (0 === strncmp($str, $bom, 3)) {
    echo "BOM detected - file is UTF-8\n";
    $str = substr($str, 3);
}
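
If you need the same check in several places, it can be wrapped in a small helper. This is only a minimal sketch: the function name strip_utf8_bom is made up for illustration and is not part of PHP, and it assumes PHP 7 or later for the type declarations.

```php
<?php
// Hypothetical helper (the name is an assumption, not a PHP built-in):
// returns $str without a leading UTF-8 BOM, or $str unchanged if it has none.
function strip_utf8_bom(string $str): string
{
    $bom = "\xEF\xBB\xBF"; // same three bytes as pack("CCC", 0xef, 0xbb, 0xbf)

    // Only a BOM at the very start of the string is removed.
    if (0 === strncmp($str, $bom, 3)) {
        return substr($str, 3);
    }

    return $str;
}
```

You would then call it as $str = strip_utf8_bom(file_get_contents('yourfile.utf8.csv')); and carry on with the cleaned string.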

Here is a script that recursively checks PHP files for a BOM (byte order mark), starting from the current working directory:

$check_extensions = array('php');

define('STR_BOM', "\xEF\xBB\xBF");
$file = null;
$directory = getcwd();
$rit = new RecursiveIteratorIterator(new RecursiveDirectoryIterator($directory), RecursiveIteratorIterator::CHILD_FIRST);
echo '<h1>BOM Check</h1>';

try{
  foreach ($rit as $file){
    if ($file->isFile()){
      $path_parts = pathinfo($file->getRealPath());
      if (isset($path_parts['extension']) && in_array($path_parts['extension'],$check_extensions)){
        $object = new SplFileObject($file->getRealPath());
        // A BOM only counts at the very start of the file, so require offset 0.
        if (0 === strpos($object->getCurrentLine(), STR_BOM)) {
          echo $file->getRealPath().'<br />';
        }
      }
    }
  }
} catch (Exception $e) {
  die ('Exception caught: '. $e->getMessage());
}
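
The script above only needs the beginning of each file to decide whether a BOM is present. As a variation, the check can read just the first three bytes instead of a whole line. The helper below is a rough sketch under that assumption: the name has_utf8_bom is hypothetical, and treating unreadable files as "no BOM" is a choice made here for simplicity.

```php
<?php
// Hypothetical helper (the name is an assumption, not a PHP built-in):
// reports whether the file at $path starts with the UTF-8 BOM bytes EF BB BF.
function has_utf8_bom(string $path): bool
{
    $handle = @fopen($path, 'rb');
    if (false === $handle) {
        return false; // unreadable files are treated as "no BOM" in this sketch
    }

    // Read at most three bytes; shorter or empty files simply will not match.
    $head = fread($handle, 3);
    fclose($handle);

    return "\xEF\xBB\xBF" === $head;
}
```

Inside the loop you could then test has_utf8_bom($file->getRealPath()) in place of the SplFileObject check.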