Sunday, October 20, 2013

Letter Frequency in Text and Rails

The other day a coworker and I were discussing keyboard layouts, which led to the question of whether the character frequency (how often each letter is used) in a Rails application would match that of normal text (though obviously the code will have many more, and more varied, special characters). I thought it would, but he didn't. Since I had a bit of free time this weekend, I decided to write a bit of code to find out. For the text, I used Project Gutenberg's copy of Moby Dick, and for the code, I used a smallish Rails project of mine.

Here's the code and results ...



As you can see, the histograms are pretty similar. For this particular Rails project I was using HAML, so I'm not sure whether the results would differ with ERB. Also, as I noted, this was a fairly small project.

The code runs against a single file that's passed in on the command line. So run it like this ...

ruby letter_freqency.rb moby10b.txt > moby_results.txt

for example. For a Rails project, I cat'd all the files together and then ran the code like this ...

cat `find . -iname \*.rb -or -iname \*.haml` > rails_files.txt
ruby letter_freqency.rb rails_files.txt > rails_results.txt
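
If you'd rather roll your own counter, here's a minimal sketch of the general idea. It is not necessarily how the script above works: the method names `letter_counts` and `histogram_lines` are made up for illustration, and it assumes you only care about a-z (case-insensitive) and want a simple text histogram sorted by frequency.

```ruby
# Count how often each letter a-z appears, ignoring case.
# Digits, punctuation, whitespace and non-ASCII bytes are skipped.
def letter_counts(text)
  counts = Hash.new(0)
  text.downcase.scan(/[a-z]/) { |ch| counts[ch] += 1 }
  counts
end

# Build one histogram line per letter, most frequent first.
# The longest bar is `width` characters; the others are scaled to it.
def histogram_lines(counts, width = 50)
  total = counts.values.sum
  return [] if total.zero?

  max = counts.values.max
  counts.sort_by { |letter, n| [-n, letter] }.map do |letter, n|
    bar = '#' * (n * width / max)
    format('%s %6.2f%% %s', letter, 100.0 * n / total, bar)
  end
end

# Run against a single file passed on the command line.
# binread avoids encoding errors on files that are not valid UTF-8.
if __FILE__ == $PROGRAM_NAME && !ARGV.empty?
  puts histogram_lines(letter_counts(File.binread(ARGV[0])))
end
```

Reading the file as raw bytes is a deliberate shortcut: accented letters are simply ignored, which is fine for comparing English text to code but would undercount in other languages.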

If you try this on one of your projects, let me know and post the results in the comments.