Category Archives: Grammar/Punctuation

Find unbalanced smart quotes in HTML using Perl

So I’m brushing up an HTML document so I can publish it on the Kindle, and I’ve discovered lots of unbalanced smart quotes. Smart quotes are double quotes that curl to the left or the right, rather than standing straight up and down. In HTML, they are written as “ldquo” and “rdquo”, each with an ampersand in front and a trailing semicolon. I wrote a little script to check that every line has a right quote for each left quote, and to print out the lines where the counts do not match.

Here you go:


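use strict;
use warnings;

open(my $fh, '<', './file.html') or die "Can't open file.html: $!\n";

my $l = "ldquo";   # left smart quote entity name
my $r = "rdquo";   # right smart quote entity name

while (my $line = <$fh>) {
   # Assigning the match list to () and then to a scalar gives the match count.
   my $lc = () = $line =~ /$l/g;
   my $rc = () = $line =~ /$r/g;

   print "Lc = $lc  Rc = $rc\n";

   # Print any line whose left and right counts differ.
   if ($lc != $rc) {
      print "\n$line\n";
   }
}

close($fh);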
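If you'd rather see only the problem lines, here's a minimal sketch of a stricter variation. It matches the full entities (ampersand and semicolon included) and reports the line number of each mismatch. The sub name `unbalanced_lines` and the two sample lines are made up for illustration; they aren't part of the script above.

```perl
use strict;
use warnings;

# Hypothetical helper: takes a list of lines and returns one
# [line number, left count, right count] entry for each line whose
# &ldquo; and &rdquo; counts differ.
sub unbalanced_lines {
    my @lines = @_;
    my @bad;
    my $n = 0;
    for my $line (@lines) {
        $n++;
        my $lc = () = $line =~ /&ldquo;/g;
        my $rc = () = $line =~ /&rdquo;/g;
        push @bad, [$n, $lc, $rc] if $lc != $rc;
    }
    return @bad;
}

# Made-up sample input; the second line is missing its closing quote.
my @sample = (
    '<p>&ldquo;Hello,&rdquo; she said.</p>',
    '<p>&ldquo;Goodbye, he said.</p>',
);

for my $hit (unbalanced_lines(@sample)) {
    my ($n, $lc, $rc) = @$hit;
    print "line $n: $lc left, $rc right\n";   # prints "line 2: 1 left, 0 right"
}
```

To run it on a real file, you could read the file's lines into an array and pass that in instead of the sample. Keep in mind that dialogue running over several paragraphs conventionally leaves the closing quote off all but the last paragraph, so not every mismatch is a mistake.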


Filed under Grammar/Punctuation, Tools for Writers

Added another link to my Editing Blogs collection

Terribly Write

Filed under Grammar/Punctuation

Preliminary homophone finder written in perl

I wrote this little Perl script to find homophones in text documents, so if you save your Word doc as a text file, in theory you can find all the homophones with it. I’m using a list of 943 homophones and running the first part of a Winston Churchill speech through it 🙂

To run the script, you’d need to know a little Perl and how to run it. So for most people, this isn’t particularly user-friendly. It’s more for fun, as well as a proof of concept for a hypothetical tool writers could use to keep silly mistakes out of their writing. While the script runs, you hit ‘enter’ to move on to the next line with one or more homophones in it.

The script needs:

  • words.txt – a list of homophones, one per line (it makes sense to edit out of this list any words you’d never mess up, for example, “I” vs. “eye” or “were” vs. “whirr”)
  • ms.txt –  your manuscript saved as a text file

First, the code:


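use strict;
use warnings;

# Load the homophone list (one word per line) and strip the newlines.
open(my $words_fh, '<', 'words.txt') or die "Can't open words.txt: $!\n";
chomp(my @words = <$words_fh>);
close($words_fh);

open(my $ms_fh, '<', 'ms.txt') or die "Can't open ms.txt: $!\n";

while (my $aline = <$ms_fh>) {
   my $match = 0;
   foreach my $i (@words) {
      next if $i eq '';   # skip blank lines in words.txt
      # Mark each occurrence that has whitespace on both sides as *WORD*.
      if ($aline =~ /\s+\Q$i\E\s+/) {
         $match = 1;
         my $uppercase = uc($i);
         $aline =~ s/\s+\Q$i\E\s+/ \*$uppercase\* /g;
      }
   }

   # Pause on every line that contains at least one homophone.
   if ($match == 1) {
      print "$aline\n";
      print "[ hit enter to continue ]\n";
      my $ans = <STDIN>;
   }
}

close($ms_fh);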

Here’s what happens to the first part of this famous speech:

I spoke the other day of the colossal military disaster *WHICH* occurred when the French High Command failed *TO* withdraw the northern Armies from Belgium at the moment when they *KNEW* that the French front was decisively broken at Sedan and on the Meuse. This delay entailed the loss of fifteen *OR* sixteen French divisions and *THREW* out of action *FOR* the critical period the whole of the British Expeditionary Force. Our Army and 120,000 French troops *WERE* indeed rescued *BY* the British Navy from Dunkirk *BUT* only with the loss of *THEIR* cannon, vehicles and modern equipment. This loss inevitably took *SOME* weeks *TO* repair, and *IN* the first *TWO* of those weeks the battle *IN* France has *BEEN* lost. When *WE* consider the heroic resistance *MADE* *BY* the French Army against heavy odds *IN* this battle, the enormous losses inflicted upon the enemy and the evident exhaustion of the enemy, it may well *BE* the thought that these 25 divisions of the best-trained and best-equipped troops *MIGHT* have turned the scale. However, General Weygand had *TO* fight without them. Only three British divisions *OR* *THEIR* equivalent *WERE* able *TO* stand *IN* the line with *THEIR* French comrades. They have suffered severely, *BUT* they have *FOUGHT* well. We *SENT* every man *WE* could *TO* France as fast as *WE* could re-equip and transport *THEIR* formations.
[ hit enter to continue ]

I am *NOT* reciting these facts *FOR* the purpose of recrimination. That *I* judge *TO* *BE* utterly futile and even harmful. We cannot afford it. *I* recite them *IN* order *TO* explain why it was *WE* did *NOT* have, as *WE* could have had, between twelve and fourteen British divisions fighting *IN* the line *IN* this *GREAT* battle instead of only three. Now *I* put *ALL* this aside. *I* put it on the shelf, from *WHICH* the historians, when they have time, will select *THEIR* documents *TO* tell *THEIR* stories. We have *TO* think of the future and *NOT* of the past. This also applies *IN* a small *WAY* *TO* *OUR* own affairs at home. There are many who *WOULD* hold an inquest *IN* the House of Commons on the conduct of the Governments-and of Parliaments, *FOR* they are *IN* it, too-during the years *WHICH* *LED* up *TO* this catastrophe. They seek *TO* *INDICT* those who *WERE* responsible *FOR* the guidance of *OUR* affairs. This also *WOULD* *BE* a foolish and pernicious process. There are *TOO* many *IN* it. Let each man search his conscience and search his speeches. *I* frequently search mine.
[ hit enter to continue ]
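One thing the run above shows: because the script looks for whitespace on both sides of a word, a homophone at the very start of a line or right next to punctuation slips through (the “I” that opens each paragraph, or the “too” in “too-during”). Here's a rough sketch of a variation that uses word boundaries (`\b`) instead. It is a swapped-in technique, not the script that produced the output above, and the sub name `mark_homophones` and the three-word list are placeholders.

```perl
use strict;
use warnings;

# Hypothetical helper: wraps every whole-word occurrence of a listed
# homophone in asterisks and uppercases it. Returns the marked line
# and the number of hits.
sub mark_homophones {
    my ($line, @words) = @_;
    return ($line, 0) unless @words;   # nothing to look for

    # Build one alternation; quotemeta guards against regex metacharacters.
    my $alt = join '|', map { quotemeta($_) } @words;

    # s///ge returns the number of substitutions, or false if there were none.
    my $hits = ($line =~ s/\b($alt)\b/'*' . uc($1) . '*'/ge);
    return ($line, $hits || 0);
}

my @words = qw(to too two);   # placeholder list, not the real words.txt
my ($marked, $hits) = mark_homophones("It was too late, too.", @words);
print "$marked\n";   # prints "It was *TOO* late, *TOO*."
```

Like the original, this stays case-sensitive, so a capitalized word at the start of a sentence is only caught if the capitalized form is in the list.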

Filed under Grammar/Punctuation, Tools for Writers

Looking for some good proofreading checklists

I found one with good tips on using search/replace:

Here’s a list of homophones:

From my blogroll, here’s an amazing list of crap that can be snipped from your writing.

She has a lot to say about numbers, too:

I’m working on a style sheet of my own, tailored to the mistakes I make.  When it’s done, I’ll post it.

Filed under Grammar/Punctuation

Elements of Style online

This is the book referenced by Stephen King in his incredible book for writers, “On Writing.” I have the hard copy, but it’s nice to be able to click through it online. See it here:

Filed under Grammar/Punctuation