Find unbalanced smart quotes in HTML using Perl

So I’m brushing up an HTML document so I can publish it on the Kindle, and I’ve discovered lots of unbalanced smart quotes. Smart quotes are double quotes that curve toward the left or the right, rather than standing straight up and down. In HTML, they are written as the entities “ldquo” and “rdquo”, each with an ampersand in front and a trailing semicolon. I wrote a little script to check that every line has as many left quotes as right quotes, and to print out the lines where the counts do not match.

Here you go:

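#!/usr/bin/perl
use strict;
use warnings;

my $file = "./file.html";
open(my $fh, '<', $file) or die "Can't open $file: $!\n";

while (my $line = <$fh>) {
   chomp($line);

   # Count the left and right smart-quote entities on this line.
   # Matching the full entity avoids hits on a bare "ldquo" or "rdquo".
   my $lc = () = $line =~ /&ldquo;/g;
   my $rc = () = $line =~ /&rdquo;/g;

   # Report only the lines where the counts differ ($. is the line number).
   if ($lc != $rc) {
      printf "Line %d: Lc = %d  Rc = %d\n", $., $lc, $rc;
      print "$line\n\n";
   }
}
close($fh);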
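
The script only looks for the named entities. Some HTML files use the numeric forms or the literal curly characters instead, so here is a minimal sketch of a counter that accepts those spellings too. The helper name count_smart_quotes is made up for this example, and whether your file needs it is an assumption; check what your editor actually writes out.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of the script above): returns the number of
# left and right double smart quotes in a string, accepting the named entity,
# the decimal and hex numeric entities, and the literal Unicode character.
# The literal characters only match if the text was decoded on input, e.g.
# by opening the file with '<:encoding(UTF-8)'.
sub count_smart_quotes {
    my ($text) = @_;
    my $left  = () = $text =~ /&ldquo;|&#8220;|(?i:&#x201C;)|\x{201C}/g;
    my $right = () = $text =~ /&rdquo;|&#8221;|(?i:&#x201D;)|\x{201D}/g;
    return ($left, $right);
}

my ($l, $r) = count_smart_quotes('&ldquo;Hello,&rdquo; she said. &#8220;Bye.');
print "left = $l, right = $r\n";   # prints "left = 2, right = 1"
```

To use it in the loop, call count_smart_quotes($line) in place of the two counting lines and compare the two numbers it returns.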