Katipo Project Survival Kit

Katipo Developers Blog

Quick Ruby script to write file types report

April 27th, 2011 by walter

Today I had to evaluate whether it was worth it to use Kete’s bulk import facility to migrate an existing site’s content to Kete or whether to just have someone drag the content over page by page.

To figure this out, I wanted to know roughly how many pages, along with other files, were on the site. I knew that it wasn’t going to be absolutely massive, so I started by grabbing all of the site’s public content with wget (the details coming from http://linuxreviews.org/quicktips/wget/):

wget -p -r --wait=20 --limit-rate=20K -U Mozilla http://the_site/

I let that run in the background while I did other work.

When it finished, I wrote up a little Ruby script (ugly, unDRY, but it took < 5 minutes) to give me a report by file type and called it file_report.rb, based on a skeleton grabbed from http://blogs.sourceallies.com/2009/12/word-counts-example-in-ruby-and-scala/:

require 'yaml'

# Change rootDir to the location of downloaded site
rootDir = "/path/to/rootDir/for/entire/downloaded/site"

raise rootDir + " does not exist" unless File.directory? rootDir

# recursively add files and directories to report_hash based on their type
def files(rootDir, report_hash)
  report_hash['directories'] = report_hash['directories'] || Array.new

  Dir.foreach(rootDir) do |dir|
    if dir != "." && dir != ".."
      dir_path = rootDir + "/" + dir
      if File.directory?(dir_path)
        puts "Processing " + dir
        report_hash['directories'] = report_hash['directories'] << dir_path
        Dir.foreach(dir_path) do |file|
          if file != "." && file != ".."
            file_path = rootDir + "/" + dir + "/" + file
            if File.directory?(file_path)
              report_hash['directories'] = report_hash['directories'] << file_path
              files(file_path, report_hash)
            else
              # add path to report_hash's entry for the file extension
              extension = File.extname(file).sub('.', '')
              report_hash[extension] = report_hash[extension] || Array.new
              report_hash[extension] = report_hash[extension] << file_path
            end
          end
        end
      else
        # add path to report_hash's entry for the file extension
        extension = File.extname(dir).sub('.', '')
        report_hash[extension] = report_hash[extension] || Array.new
        report_hash[extension] = report_hash[extension] << dir_path
      end
    end
  end
end

t1 = Time.now

report_hash = Hash.new
files(rootDir, report_hash)

puts "File type counts: "
report_hash.each do |k, v|
  puts "#{k} : #{v.size}"
end

puts "Writing complete report"
File.open('report.yml', "w") do |f|
  f.write(report_hash.to_yaml)
end

t2 = Time.now
puts "Finished in " + (t2 - t1).to_s + " seconds"

Finally, I called it with:

ruby file_report.rb

Nothing flash, but handy to have when you need it. I saved myself a lot of clicking around on their website.
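
For comparison, a DRYer take on the same report might lean on Ruby's standard Find module, which walks the whole tree so the nested loops are not needed. This is only a sketch, not the script used above: the method name file_type_report is an assumption made up for illustration, and it returns the hash rather than printing or writing report.yml.

```ruby
require 'find'

# Hypothetical helper (not part of the original script): group every path
# under root_dir by file extension. Directories are collected under the
# 'directories' key, and files without an extension under the '' key.
def file_type_report(root_dir)
  report = { 'directories' => [] }

  Find.find(root_dir) do |path|
    next if path == root_dir # skip the starting directory itself

    if File.directory?(path)
      report['directories'] << path
    else
      extension = File.extname(path).sub('.', '')
      (report[extension] ||= []) << path
    end
  end

  report
end

# Usage, assuming the site was downloaded to the placeholder path below:
#   report = file_type_report("/path/to/downloaded/site")
#   report.each { |type, paths| puts "#{type} : #{paths.size}" }
```

The returned hash has the same shape as report_hash, so it could be passed to to_yaml in the same way.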

Multiple migration_template calls in Rails (2.3x) generator manifest

April 26th, 2011 by walter

If you have created a Rails generator that needs to include more than one migration_template in its record block, I’ve found a trick so you don’t run into a “Multiple migrations have the version number” error when running db:migrate.

You need to tell the generator to take a one second snooze, so that the next_migration_string method returns a timestamp that is one second later.

You would think a simple call to the sleep method would do the trick, but because the generator manifest’s record has a special syntax (that relies on method_missing to define recorded actions), you need to make a small tweak by calling the sleep method on the block’s object, i.e. “m.sleep(1)”.

Here’s what it looks like in practice:

class TrolliedMigrationsGenerator < Rails::Generator::Base
  def manifest
    record do |m|
      m.migration_template 'trolleys_migration.rb', 'db/migrate', { :migration_file_name => "create_trolleys" }
      m.sleep(1)
      m.migration_template 'purchase_orders_migration.rb', 'db/migrate', { :migration_file_name => "create_purchase_orders" }
      m.sleep(1)
      m.migration_template 'line_items_migration.rb', 'db/migrate', { :migration_file_name => "create_line_items" }
    end
  end
end
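
To see why the call has to go through m, here is a toy model of a recorder built on method_missing. It is a sketch under assumptions, not Rails' actual code: ToyManifest and ToyCommand are made-up classes, and the timestamp format only imitates a migration version. In this model, calls made on the block's object are stored and run later in order, so a recorded sleep falls between the two templates when they are replayed.

```ruby
# Toy stand-in for a generator manifest (not Rails' actual class):
# calls made on it inside record are stored, then run later by replay.
class ToyManifest
  attr_reader :actions

  def initialize
    @actions = []
  end

  def record
    yield self
    self
  end

  # Any call the manifest does not define lands here, including private
  # Kernel methods such as sleep when called with an explicit receiver.
  def method_missing(name, *args, &block)
    @actions << [name, args, block]
  end

  # Run the stored calls, in order, against the object doing the work.
  def replay(target)
    @actions.each { |name, args, block| target.send(name, *args, &block) }
  end
end

# Toy stand-in for the command that writes migration files.
class ToyCommand
  attr_reader :stamps

  def initialize
    @stamps = []
  end

  def migration_template(name)
    @stamps << [name, Time.now.utc.strftime('%Y%m%d%H%M%S')]
  end
end

manifest = ToyManifest.new.record do |m|
  m.migration_template 'create_trolleys'
  m.sleep(1) # recorded now, slept during replay
  m.migration_template 'create_purchase_orders'
end

command = ToyCommand.new
manifest.replay(command)
command.stamps.each { |name, stamp| puts "#{stamp}_#{name}.rb" }
```

A bare sleep(1) inside the block would instead run straight away, while the actions are still being recorded, and so would do nothing to separate the timestamps generated at replay time.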

mongo_translatable Rails engine Ruby gem released

March 11th, 2011 by walter

Version 0.1 of the mongo_translatable gem is out.

We’ve been using mongo_translatable for a while now as part of a content translation add-on for Kete (an open source Rails app, http://kete.net.nz). Thought it was about time to share.

mongo_translatable is a Rails-specific I18n model localization mechanism meant to tie in to existing ActiveRecord models, à la Globalize2, but backed by MongoDB rather than an RDBMS.

The gem has only been developed with Rails 2.3.5 up to this point, as that is what my needs are right now, but it would be great if others contributed compatibility with later versions of Rails.

Project information can be found on github:


Thanks to Te Reo o Taranaki, the Chinese Association of New Zealand Auckland Branch, and Auckland City Libraries for funding the work.

oembed_provider Rails engine Ruby gem released

February 16th, 2011 by walter

Version 0.1 of the oembed_provider gem is out.

oembed_provider is a Rails engine to answer oEmbed requests for application media asset models. In other words, this gem allows your application, after configuring the gem and the relevant models, to act as an oEmbed Provider by providing a controller that returns JSON or XML for a given oEmbed consumer request for the specified media asset. This gem does not offer oEmbed consumer functionality.
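
To give a feel for what a provider hands back, here is a rough sketch of an oEmbed 1.0 "photo" response built with nothing but Ruby's json library. It does not use or imitate the gem's own API: the Photo struct, the oembed_photo_response method, and the example values are all assumptions for illustration. Only the field names (version, type, url, width, height, plus the optional title and provider_name) come from the oEmbed specification.

```ruby
require 'json'

# Hypothetical media asset; a real application would use its own model.
Photo = Struct.new(:title, :url, :width, :height)

# Build the hash for an oEmbed 1.0 "photo" response.
# version and type are required for every response; url, width and
# height are required for the photo type; the rest are optional.
def oembed_photo_response(photo, provider_name)
  {
    'version' => '1.0',
    'type' => 'photo',
    'title' => photo.title,
    'provider_name' => provider_name,
    'url' => photo.url,
    'width' => photo.width,
    'height' => photo.height
  }
end

# Example values are made up.
photo = Photo.new('Harbour at dusk', 'http://example.com/images/1.jpg', 640, 480)
puts JSON.pretty_generate(oembed_photo_response(photo, 'Example Site'))
```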

The gem has only been developed with Rails 2.3.5 up to this point, as that is what my needs are right now, but it would be great if others contributed compatibility with later versions of Rails.

More information is available at https://github.com/kete/oembed_provider.

Issues can be reported at http://kete.lighthouseapp.com/projects/69994-oembed_provider.

This gem was developed for the Kete open source application (http://kete.net.nz) and was funded by a pledge campaign to improve media selection from within the rich text editor (i.e. the TinyMCE plugin; look for the TinyMCE media selector plugin soon). The campaign was supported by Horowhenua Library Trust, Wellington City Libraries, Te Reo o Taranaki, Environmental Earth Sciences, CALYX, and many individual contributors. Thanks to all contributors.

Sharl is your friend

October 11th, 2010 by bob

Sometimes your Linux server partitions fill up faster than you expect, and suddenly you are getting warning emails screaming at you.

This is when Sharl comes in handy.

ls -Sharl

It is a quick way to list the contents of a dir by file size, with the biggest files last on the list, right there above the command prompt. The flags are -S (sort by size), -h (human-readable sizes), -a (include dotfiles), -r (reverse the sort, so the biggest come last) and -l (long listing). That lets you easily banish, with a flourish, those tar, sql, and zip files that are no longer needed, freeing up as much space as you can with the fewest keystrokes.
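
If you would rather script the same check, a rough Ruby equivalent might look like the sketch below. It is not a replacement for ls: the method name files_by_size is made up for illustration, and it only covers regular files directly inside one directory (ls -Sharl also lists subdirectories and other entries).

```ruby
# Hypothetical helper: regular files in dir, smallest first, so the
# biggest ones end up at the bottom of the output like ls -Sharl.
def files_by_size(dir)
  Dir.children(dir)
     .map { |name| File.join(dir, name) }
     .select { |path| File.file?(path) }
     .sort_by { |path| File.size(path) }
end

# List the current directory, size in bytes first.
files_by_size(Dir.pwd).each do |path|
  puts format('%12d  %s', File.size(path), path)
end
```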

Kakama Technical Overview

June 4th, 2010 by kieran

I’ve just posted a technical overview of how Kakama functions. You can view it at Kakama.org

A more concise way to call single test file in ruby

April 25th, 2010 by walter

I’ve been working with Rails and Ruby since 2006 and I’m surprised I hadn’t put this together for myself:

$ cd test # from your rails app root
$ ruby unit/a_model_test.rb

As compared to:

$ ruby -I"lib:test" "/usr/local/Cellar/ruby-enterprise-edition/1.8.7-20090928/lib/ruby/gems/1.8/gems/rake-0.8.7/lib/rake/rake_test_loader.rb" "test/unit/a_model_test.rb"

Definitely a hand to forehead moment when I read that! Found it in a comment here:


*Assumes your ruby command is set up correctly in your shell’s environment, of course.

Using git feature branches to make your master branch commits list concise

March 18th, 2010 by kieran

When you’re starting off, it’s fairly easy to commit to the master branch. But once your application is released, you probably want to keep things stable on the master branch. So use feature branches.

Installing MongoDB on Mac OS X using Homebrew

March 16th, 2010 by walter

I’ve moved from MacPorts to Homebrew which includes a recipe for installing MongoDB. After installing Homebrew, just run this as your normal user:

brew install mongodb

If you prefer to store your MongoDB data all under your home directory, you might find Mislav’s gist suits your needs instead:


If you prefer installing from source, check out this post:



Three ways to increase New Relic RPM’s usefulness

March 3rd, 2010 by kieran

Here at Katipo, we’re using New Relic RPM to monitor our deployed Kete applications, to help make things as fast as possible. In order to make New Relic as useful as possible, I’ve been trying out three New Relic RPM features, some available in only the latest versions of RPM, on one of those sites. These recent and little-known features aren’t enabled by default, so I’m going to run you through them and how to set them up in this post.

If you don’t yet use New Relic RPM, you can get a Lite account for free by going to newrelic.com, where you can also test drive New Relic RPM on a real application.
