Finding doubled words using perl

I recently switched to Scrivener for writing my documents. It has a much more enjoyable interface than Word, with lots of nifty features for writers. One big issue: Scrivener’s spellchecker, which I’m still getting used to. Microsoft Word finds doubled words right out of the box, but Scrivener does not.

The script below is written in Perl, which comes pre-installed on Macs. Paste it into a text file called rep.pl, make the file executable (chmod +x rep.pl), and run it (./rep.pl) in the same directory as a file called “infile.txt” (a cut/paste from Word into that file will do nicely). It will report your doubled words, each followed by the line it appears on.

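*update* – the script won’t catch things like: “bang bang” because the quotation marks count as part of each word. The opening quote sticks to the first bang and the closing quote to the second, so the two never compare as equal.  Working on it 🙂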
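
One possible way to close the gap described in the update is to trim punctuation from both ends of each word before comparing. The following is only a sketch, not the finished fix: the helper names normalize_word and doubled_words are made up for illustration, and trimming every non-alphanumeric character from the word edges is an assumption about what should count as "the same word".

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: lowercase a word and trim punctuation from both ends,
# so that "bang and bang" both become bang.
sub normalize_word {
    my ($word) = @_;
    $word = lc $word;
    $word =~ s/^[^[:alnum:]]+//;   # leading quotes, parentheses, ...
    $word =~ s/[^[:alnum:]]+$//;   # trailing quotes, periods, commas, ...
    return $word;
}

# Hypothetical helper: return each doubled word found in one line of text.
sub doubled_words {
    my ($line) = @_;
    my @found;
    my $prev = '';
    foreach my $raw (split ' ', $line) {
        my $word = normalize_word($raw);
        # Words that were pure punctuation (such as *) are now empty: skip them.
        push @found, $word if $word ne '' && $word eq $prev;
        $prev = $word;
    }
    return @found;
}

print join(', ', doubled_words('He said "bang bang" and left.')), "\n";
```

The trade-off is that words separated only by punctuation, as in "yes. Yes", get reported too, so expect a few false alarms at sentence boundaries.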

Example input (infile.txt):

This is a line
This is another line
And yet another line
wow I sure do a lot of lines, "Don't I?" he said (in a funny voice)...
Wow it sure is is fune typing all this
I like dogs and cats and stuff.
Big big is funner than small people.
how are are the dodgers doing this year? Nobody knows.
more lines and stuff...
etc. etc.
good things come to those who write scripts in perl and post them on the internet

Example Output:

is is ---->  Wow it sure is is fune typing all this
big big ---->  Big big is funner than small people.
are are ---->  how are are the dodgers doing this year? Nobody knows.
etc. etc. ---->  etc. etc.

And now the script: rep.pl

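#!/usr/bin/perl
use strict;
use warnings;

# Read the text to check from infile.txt in the current directory.
open(my $fh, '<', 'infile.txt') or die "Can't open infile.txt: $!";
my $section_break = '*';  # I have * * * as section breaks. The script sees them as words and should ignore them.
while (my $a_line = <$fh>) {
   chomp($a_line);
   my $prev = '';   # start empty, so a line that begins with "0" isn't flagged
   # split ' ' breaks on runs of whitespace, so extra spaces don't create empty "words"
   foreach my $raw (split(' ', $a_line)) {
      my $word = lc($raw);   # compare case-insensitively ("Big big")
      if ($word eq $prev && $word ne $section_break) {
         print "$prev $word ---->  $a_line\n";
      }
      $prev = $word;
   }
}
close($fh);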
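
If copying every document into infile.txt gets tedious, one variation is to check whichever files are named on the command line and include the line number in each report. This is a rough sketch rather than part of the original script: the helper name find_doubles and the label:line report format are assumptions made for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: scan an open filehandle and return one report string
# per doubled word, tagged with a label (e.g. the file name) and line number.
sub find_doubles {
    my ($fh, $label) = @_;
    my @reports;
    while (my $line = <$fh>) {
        chomp $line;
        my $prev = '';
        foreach my $raw (split ' ', $line) {
            my $word = lc($raw);
            if ($word eq $prev && $word ne '*') {   # still ignore * * * breaks
                # $. holds the current line number of the handle being read
                push @reports, sprintf('%s:%d: %s %s ---->  %s',
                                       $label, $., $prev, $word, $line);
            }
            $prev = $word;
        }
    }
    return @reports;
}

# Check every file named on the command line; does nothing if none are given.
foreach my $path (@ARGV) {
    open(my $fh, '<', $path) or die "Can't open $path: $!";
    print "$_\n" for find_doubles($fh, $path);
    close($fh);
}
```

Saved as rep.pl, it could be run as ./rep.pl chapter1.txt chapter2.txt (the file names here are placeholders).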