I recently switched to Scrivener for writing my documents. It has a much more enjoyable interface than Word, with lots of nifty features for writers. One big issue: I’m still getting used to Scrivener’s spellchecker. Microsoft Word finds doubled words right out of the box, but Scrivener does not.
The script below is written in Perl, which comes pre-installed on Macs. If you paste it into a text file, make the file executable, and then run it in the same directory as a file called “infile.txt” (a cut/paste from Word to the file will do nicely), it will report your doubled words.
*Update* – the script won’t catch things like: “bang bang” because the quotation marks stay attached to the words, so the two halves don’t compare as equal. Working on it 🙂
Example input (infile.txt):
This is a line
This is another line
And yet another line
wow I sure do a lot of lines, "Don't I?" he said (in a funny voice)...
Wow it sure is is fune typing all this
I like dogs and cats and stuff.
Big big is funner than small people.
how are are the dodgers doing this year? Nobody knows.
more lines and stuff...
etc. etc.
good things come to those who write scripts in perl and post them on the internet
Example Output:
is is ----> Wow it sure is is fune typing all this
big big ----> Big big is funner than small people.
are are ----> how are are the dodgers doing this year? Nobody knows.
etc. etc. ----> etc. etc.
And now the script: rep.pl
#!/usr/bin/perl
open(FILE,"infile.txt") or die "Can't open infile.txt: $!";
$section_breaks = "*"; # I have * * * as section breaks. The script sees them as words and should ignore them.
while(<FILE>) {
    chomp();
    $a_line = $_;
    @line = split(/ /, $_);
    $prev = 0;
    foreach $i (@line) {
        $i = lc($i);
        if ($i eq $prev && $i ne $section_breaks) {
            print "$prev $i ----> $a_line\n";
        }
        $prev = $i;
    }
}
close(FILE);
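One possible way around the quoting problem mentioned in the update, as a rough sketch rather than a finished fix: strip punctuation from both ends of each word before comparing. The helper name doubled_words is made up for this example and is not part of rep.pl, and the sketch assumes plain ASCII text (curly quotes and accented letters would need the input decoded as UTF-8 first). It runs on a few built-in sample lines instead of infile.txt so it can be tried on its own.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of rep.pl): returns the doubled words found
# in one line, compared case-insensitively and ignoring edge punctuation.
sub doubled_words {
    my ($line) = @_;
    my @found;
    my $prev = '';
    foreach my $token (split ' ', $line) {
        my $word = lc $token;
        # Strip quotes, brackets, and other punctuation from both ends.
        $word =~ s/^[^[:alnum:]]+//;
        $word =~ s/[^[:alnum:]]+$//;
        # A token that was pure punctuation (such as "*") resets the comparison.
        if ($word eq '') {
            $prev = '';
            next;
        }
        push @found, $word if $word eq $prev;
        $prev = $word;
    }
    return @found;
}

# Sample lines for demonstration only; a real run would loop over a file.
my @samples = (
    'He heard a "bang bang" outside.',
    'Big big is funner than small people.',
    '* * *',
);

foreach my $line (@samples) {
    print "$_ $_ ----> $line\n" for doubled_words($line);
}
```

With these samples it reports "bang bang" and "big big" and prints nothing for the section break, since a token made only of punctuation is reduced to an empty string and skipped. One side effect of this design is that "etc. etc." would be reported as "etc etc".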
Bah!
perl -n -e 'while (m{\b(\S+)\b(\s+\1\b)+}migs) { print "dup on line $.\n"; }' infile.txt
hah, I knew you’d do something like that.
Nice work 🙂
Yours doesn’t account for section breaks! ( * * * )
There are no section breaks in my test file (or yours!)
I guess if you’re looking to ignore consecutive asterisks you can simply do:
perl -n -e 'while (m{\b(\S+)\b(\s+\1\b)+}migs) { print "dup on line $.\n" unless $1 eq "*"; }' infile.txt
I’ve tried using Scrivener. I like it, but I always go back to Word.
I’m seriously considering going back to Word for 2 reasons:
1) a more intuitive search/replace system.
2) a more reliable spellchecker. In Scrivener, if you type “don’k” (for example) it doesn’t flag it as an error.
It may not flag it, but if you run a check does it show up? I vaguely remember reading that it doesn’t underline misspellings so that you don’t get caught up in correcting and instead focus on writing.
Casslogan: very interesting! I’ll have to try. Thanks for the tip. That’d help a lot 🙂
Casslogan: you’re right. If you manually run the check, it DOES find the doubles. I’ve amended my post so I’m not bashing Scrivener 🙂