
Find unbalanced smart quotes in HTML using Perl

So I’m brushing up an HTML document so I can publish it on the Kindle, and I’ve discovered lots of unbalanced smart quotes. Smart quotes are double quotes that curl to the left or the right, rather than standing straight up and down. In HTML, they are written as the entities &ldquo; and &rdquo; (that is, “ldquo” and “rdquo” with an ampersand in front of each and a trailing semicolon). I wrote a little script to check that every line has as many left quotes as right quotes, and to print out the lines where the counts do not match.

Here you go:

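#!/usr/bin/perl
use strict;
use warnings;

# Report every line of file.html whose &ldquo; and &rdquo; counts differ.
my $path = "./file.html";
open(my $fh, '<', $path) or die "Can't open $path: $!\n";

while (my $line = <$fh>) {
   chomp($line);

   # Assigning the match list to () in scalar context yields the match count.
   my $lc = () = $line =~ /&ldquo;/g;
   my $rc = () = $line =~ /&rdquo;/g;

   # Print the line number, both counts, and the offending line.
   if ($lc != $rc) {
      print "Line $.: Lc = $lc  Rc = $rc\n$line\n\n";
   }
}
close($fh);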
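
Comparing counts will not flag a line where the numbers match but the quotes are in the wrong order, such as a closing quote that comes before its opener. Here is a rough sketch of an order-aware check. The helper name quotes_balanced is made up for this example and is not part of the script above, and it assumes double quotes never nest and that each quotation opens and closes on the same line.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of the script above): returns 1 when every
# &ldquo; in the line is closed by a later &rdquo;, and 0 otherwise.
# Assumption: double quotes do not nest and do not span lines.
sub quotes_balanced {
    my ($line) = @_;
    my $open = 0;    # 1 while we are inside an open quotation

    # Visit each entity in the order it appears in the line.
    while ($line =~ /&([lr])dquo;/g) {
        if ($1 eq 'l') {
            return 0 if $open;        # a second opener before a closer
            $open = 1;
        }
        else {
            return 0 unless $open;    # a closer with nothing open
            $open = 0;
        }
    }
    return $open ? 0 : 1;             # 0 if a quote is still open at the end
}

# Made-up sample lines, for illustration only.
for my $sample (
    '&ldquo;Hello,&rdquo; she said.',
    '&rdquo;Hello,&ldquo; she said.',
    '&ldquo;Hello, she said.',
) {
    printf "%-4s %s\n", (quotes_balanced($sample) ? 'ok' : 'BAD'), $sample;
}
```

The first sample line is reported as ok and the other two as BAD. The second one has one quote of each kind, so a count comparison alone would let it through.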

Filed under Grammar/Punctuation, Tools for Writers