PHP Tricks: Eliminate any unwanted characters from a string

Hi all.

I’ve been looking for a handy/graceful way to do this for a while and I thought I’d share this little php trick with you in the hopes that it’ll save you some time if you ever need it. If anyone reading this can quickly convert this example into other languages such as perl, coldfusion, ruby, c, c++ please be my guest and post it as a comment or as a trackback to this blog.

What I wanted to do

I wanted to scrub any characters out of a string that were not alphanumeric. That is, not “a” to “z” and “0” to “9”. Thankfully, PHP has extensive regular expressions support so that’s what we’ll use.

Why

I needed to store files uploaded through PHP on my Linux machine using a filename chosen by the user. A web site I’m working on requires users to upload media (images, video, music, et. al) and in order to save the files on the hard disk. This functionality prevents code failure by making the filename valid in UNIX filesystems.

The Code

/**
 * Converts a string to a valid UNIX filename.
 * @param $string The filename to be converted
 * @return $string The filename converted
 */
function convert_to_filename ($string) {

  // Replace spaces with underscores and makes the string lowercase
  $string = str_replace (" ", "_", $string);
  $string = str_replace ("..", ".", $string);
  $string = strtolower ($string);

  // Match any character that is not in our whitelist
  preg_match_all ("/[^0-9^a-z^_^.]/", $string, $matches);

  // Loop through the matches with foreach
  foreach ($matches[0] as $value) {
    $string = str_replace($value, "", $string);
  }
  return $string;
}

Usage

Simply call the convert_to_filename function and pass the filename/string along. For example:

$valid_filename = convert_to_filename ($original_filename);

How it works

Almost the entire function is self-explanatory as long as you have a bit of experience with PHP. The part that does the actual deed may need some explanation, however. Especially if you’re relatively new to regular expressions. The part that strips the bad characters from the filename string is started first by this line, which uses a regular expression to make an array called $matches of all of the characters that are NOT a-z, 0-9, a period, or an underscore:

preg_match_all ("/[^0-9^a-z^_^.]/", $string, $matches);

Then, we simply walk through the array replacing the offending character with nothing using the PHP function str_replace.

Functions Used

The functions/constructs used in this article are