Ruby-Stemmer

Ruby-Stemmer exposes the SnowBall API to Ruby.

This package includes the libstemmer_c library, which is released under a BSD licence and is freely available.

Support for Latin is also included; it was generated with the Snowball compiler from the Schinke contribution.

For more details about libstemmer_c please visit the SnowBall website.

Usage

require 'rubygems'
require 'lingua/stemmer'

stemmer = Lingua::Stemmer.new(:language => "ro")
stemmer.stem("netăgăduit") #=> "netăgădu"
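
When many words share a language, it can be convenient to create the stemmer once and reuse it. The sketch below assumes a hypothetical helper, `stem_text`, which is not part of ruby-stemmer; it accepts any object that responds to `stem(word)`, such as the `Lingua::Stemmer` instance shown above.

```ruby
# Hypothetical helper (not part of ruby-stemmer): splits a text into words
# and stems each one with a single, reused stemmer object.
# `stemmer` is assumed to be anything that responds to #stem(word),
# for example Lingua::Stemmer.new(:language => "en").
def stem_text(text, stemmer)
  # \p{L}+ matches runs of letters, including non-ASCII ones such as "ă".
  text.scan(/\p{L}+/).map { |word| stemmer.stem(word.downcase) }
end

# With the gem installed, usage would look like:
#   require 'lingua/stemmer'
#   stemmer = Lingua::Stemmer.new(:language => "en")
#   stem_text("Installing installations", stemmer)
```

Passing the stemmer in as an argument keeps the helper independent of the language and encoding options, which stay with the caller.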

Alternative

require 'rubygems'
require 'lingua/stemmer'

Lingua.stemmer( %w(incontestabil neîndoielnic), :language => "ro" ) #=> ["incontest", "neîndoieln"]
Lingua.stemmer("installation") #=> "instal"
Lingua.stemmer("installation", :language => "fr", :encoding => "ISO_8859_1") do | word |
  puts "~> #{word}" #=> "instal"
end # => #<Lingua::Stemmer:0x102501e48>
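
A common reason to stem is to treat inflected forms of a word as equivalent, for instance when building a search index. The following is a minimal sketch of that idea, assuming a hypothetical `group_by_stem` helper (not provided by this gem) and any stemmer object that responds to `stem(word)`.

```ruby
# Hypothetical helper (not part of ruby-stemmer): buckets words by their stem.
# `stemmer` is assumed to respond to #stem(word), as Lingua::Stemmer does.
def group_by_stem(words, stemmer)
  words.group_by { |word| stemmer.stem(word) }
end

# With the gem installed, usage would look like:
#   require 'lingua/stemmer'
#   stemmer = Lingua::Stemmer.new(:language => "ro")
#   group_by_stem(%w(incontestabil neîndoielnic), stemmer)
```

The result is a Hash whose keys are stems and whose values are the original words that produced them.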

Rails

# Rails2: -- config/environment.rb:
config.gem 'ruby-stemmer', :version => '>=0.6.2', :lib => 'lingua/stemmer'

# Rails3: -- Gemfile
gem 'ruby-stemmer', '>=0.8.3', :require => 'lingua/stemmer'

More details

Install

Standard install with:

gem install ruby-stemmer

Windows

There is also a Windows build (fat binary) compiled against Ruby 1.9.3 and Ruby 1.8.7.

gem install ruby-stemmer --platform=x86-mingw32

As far as I know, the above should work with RubyInstaller. If it fails, you could try:

gem install ruby-stemmer --platform=x86-mswin32

It’s known to work under Windows XP.

Development version

$ git clone git://github.com/aurelian/ruby-stemmer.git
$ cd ruby-stemmer
$ rake -T #<== see what we've got
$ rake compile #<== builds the extension do'h
$ rake test

NOT A BUG

Stemming is an algorithm for finding the stem of a word, not its root, so the result is not always a dictionary word. For further reference on stem vs. root, please check the Wikipedia articles on the topic.
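
To make the distinction concrete, here is a toy suffix stripper. It is deliberately simplistic and is not the Snowball algorithm used by this gem; the `TOY_SUFFIXES` list and the `toy_stem` method are invented purely for illustration. It shows how several forms of a word can collapse to a shared stem that is not itself a word.

```ruby
# Toy illustration only: NOT the Snowball algorithm and NOT part of ruby-stemmer.
# Suffixes are tried in order; the first match that leaves at least 3 letters wins.
TOY_SUFFIXES = %w[ing ed es e s].freeze

def toy_stem(word)
  suffix = TOY_SUFFIXES.find do |s|
    word.end_with?(s) && word.length - s.length >= 3
  end
  suffix ? word[0...-suffix.length] : word
end

stems = %w[argue argued argues arguing].map { |w| toy_stem(w) }
puts stems.uniq.inspect # prints ["argu"]
```

All four forms reduce to "argu", which is a stem but not a word; the root a dictionary would list is "argue".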

TODO

Note on Patches/Pull Requests

Alternative Stemmers for Ruby

Copyright

Copyright © 2008-2011 Aurelian Oancea. See MIT-LICENSE for details.

Contributors

Real life usage

# encoding: utf-8