Finding doubled words using perl

I recently switched to Scrivener for writing my documents. It has a much more enjoyable interface than Word, with lots of nifty features for writers. One big issue: I’m still getting used to Scrivener’s spellchecker. Microsoft Word flags doubled words right out of the box, but Scrivener only finds them when you run the spell check manually.

The script below is written in perl, which comes pre-installed on Macs. Paste it into a text file, make the file executable, and run it from a directory that contains a file called “infile.txt” (a cut/paste from Word into that file will do nicely). It will report your doubled words, each followed by the line it was found on.

*update* – the script won’t catch things like: “bang bang”. It compares the space-separated words exactly, and the quotes stay attached to them, so it thinks it’s looking at 2 different words. Working on it 🙂

Example input (infile.txt):

This is a line
This is another line
And yet another line
wow I sure do a lot of lines, "Don't I?" he said (in a funny voice)...
Wow it sure is is fune typing all this
I like dogs and cats and stuff.
Big big is funner than small people.
how are are the dodgers doing this year? Nobody knows.
more lines and stuff...
etc. etc.
good things come to those who write scripts in perl and post them on the internet

Example output:

is is ---->  Wow it sure is is fune typing all this
big big ---->  Big big is funner than small people.
are are ---->  how are are the dodgers doing this year? Nobody knows.
etc. etc. ---->  etc. etc.

And now the script: rep.pl

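#!/usr/bin/perl
use strict;
use warnings;

# I have * * * as section breaks. The script sees them as words and should ignore them.
my $section_breaks = "*";

open(my $fh, '<', "infile.txt") or die "Can't open infile.txt: $!";
while (my $a_line = <$fh>) {
   chomp($a_line);
   # split(' ') splits on runs of whitespace, so extra spaces never produce empty "words"
   my @line = split(' ', $a_line);
   my $prev = '';
   foreach my $word (@line) {
      my $i = lc($word);   # compare case-insensitively: "Big big" counts as a double
      if ($i eq $prev && $i ne $section_breaks) {
         print "$prev $i ---->  $a_line\n";
      }
      $prev = $i;
   }
}
close($fh);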
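
About the quoted-words problem from the update note: one possible way around it (a rough sketch, not the finished fix) is to trim punctuation off both ends of each word before comparing. The helper names normalize_word and doubled_words are made up for this example, and it assumes plain straight quotes in the text.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of rep.pl): lowercase a word and trim
# non-word characters from both ends, so "bang and bang" both become bang.
sub normalize_word {
    my ($word) = @_;
    $word = lc $word;
    $word =~ s/^\W+//;   # leading quotes, brackets, asterisks...
    $word =~ s/\W+$//;   # trailing quotes, commas, periods...
    return $word;
}

# Hypothetical helper: return every doubled word found in one line of text.
sub doubled_words {
    my ($line) = @_;
    my @found;
    my $prev = '';
    foreach my $raw (split(' ', $line)) {
        my $word = normalize_word($raw);
        # Pure punctuation (such as the * * * section breaks) trims down
        # to an empty string and is never reported.
        push @found, $word if $word ne '' && $word eq $prev;
        $prev = $word;
    }
    return @found;
}

# Try it on the kind of line the original script misses.
foreach my $line ('He heard a "bang bang" outside.', 'Big big is funner.') {
    print "$_ $_ ---->  $line\n" for doubled_words($line);
}
```

Both sample lines get reported, the first as “bang bang” and the second as “big big”. The trade-off is that trimming punctuation also makes a repeat across a sentence boundary (“Nobody knows. Knows what?”) look like a double, so expect a few extra hits to check by eye.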

Filed under Grammar/Punctuation, Tools for Writers, Writing in general

10 responses to “Finding doubled words using perl”

  1. Bah!

    perl -n -e 'while (m{\b(\S+)\b(\s+\1\b)+}migs) { print "dup on line $.\n"; }' infile.txt

  2. I’ve tried using Scrivener. I like it, but I always go back to Word.

    • I’m seriously considering going back to Word for 2 reasons:
      1) a more intuitive search/replace system.
      2) a more reliable spellchecker. In Scrivener, if you type “don’k” (for example), it doesn’t flag it as an error.

      • It may not flag it, but if you run a check does it show up? I vaguely remember reading that it doesn’t underline misspellings so that you don’t get caught up in correcting and instead focus on writing.

      • Casslogan: very interesting! I’ll have to try. Thanks for the tip. That’d help a lot 🙂

      • Casslogan: you’re right. If you manually run the check, it DOES find the doubles. I’ve amended my post so I’m not bashing Scrivener 🙂
