Version17

Next, let's define a method top_by_length that outputs the top 30 longest words, as shown in Version07.

class WordDictionary
  # Returns the top nth entries by word length as [word, frequency, length].
  # An optional block filters entries by frequency.
  def top_by_length(nth, &blk)
    list = take_by_key(nth, lambda { |key| -key.length }, &blk)
    list.map { |word, freq| [word, freq, word.length] }
  end

  private

  # Takes nth entries, ordered by sort_opt applied to the value (frequency).
  def take_by_value(nth, sort_opt)
    @freq_dic.select { |key, val| block_given? ? yield(val) : val }.take_by(nth) { |key, val| sort_opt[val] }
  end

  # Takes nth entries, ordered by sort_opt applied to the key (word).
  def take_by_key(nth, sort_opt)
    @freq_dic.select { |key, val| block_given? ? yield(val) : val }.take_by(nth) { |key, val| sort_opt[key] }
  end
end

wdic = WordDictionary.new(ARGF)
p wdic.top_by_length(30) { |val| val > 100 }

Here, with future needs in mind, we define take_by_key in the same way as take_by_value and have top_by_length use it.
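Note that the private helpers declare no block parameter, yet the filter block forwarded with &blk still reaches them. The following is a minimal sketch of that mechanism. The methods filter_values and frequent_only are hypothetical names used only for illustration and are not part of WordDictionary.

```ruby
# Hypothetical helper, for illustration only (not part of WordDictionary).
# It declares no &blk parameter, but block_given? and yield inside the
# inner select block still refer to the block passed to filter_values.
def filter_values(hash)
  hash.select { |_key, val| block_given? ? yield(val) : val }
end

# Forwards its own block with &, the same way top_by_length does.
def frequent_only(hash, &blk)
  filter_values(hash, &blk)
end

freq = { "the" => 120, "cat" => 3 }
p frequent_only(freq) { |val| val > 100 }  # only entries passing the filter
p frequent_only(freq)                      # no block: every truthy value passes
```

When no block is given, blk is nil and &blk passes no block at all, so block_given? is false and the selection falls back to the truthiness of the value itself.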

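top_by_length returns each word's length in addition to the word and its frequency. This uses the map method of the Array class. Like inject, map is a very handy method: it returns a new array in which each element is replaced by the result of the block. The example above could also be written as list.map { |item| item << item[0].length }, although that form appends to each inner array in place.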
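The difference between the two map forms can be seen in a small sketch. The helper names sample_list, with_length, and with_length! are hypothetical and exist only for this illustration; the sample words and counts are taken from the output shown in this post.

```ruby
# Sample data in the same [word, frequency] shape used above.
def sample_list
  [["elizabeth", 636], ["bingley", 306]]
end

# Non-destructive: map builds new three-element arrays.
def with_length(list)
  list.map { |word, freq| [word, freq, word.length] }
end

# Alternative from the text: << appends to each inner array in place,
# so the elements of the original list are modified as well.
def with_length!(list)
  list.map { |item| item << item[0].length }
end

original = sample_list
p with_length(original)   # new arrays with the length appended
p original                # still two-element pairs
p with_length!(original)  # same result...
p original                # ...but the original pairs now carry the length too
```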

The output looks like this.

#> [["illustration", 160, 12], ["therefore", 127, 9], ["catherine", 126, 9], ["jerusalem", 120, 9], ["gutenberg", 285, 9], ["elizabeth", 636, 9], ["prophecy", 322, 8], ["together", 105, 8], ["anything", 117, 8], ["pleasure", 103, 8], ["judgment", 134, 8], ["believe", 110, 7], ["collins", 180, 7], ["between", 114, 7], ["wickham", 194, 7], ["bingley", 306, 7], ["replied", 136, 7], ["history", 189, 7], ["himself", 178, 7], ["against", 164, 7], ["because", 116, 7], ["however", 179, 7], ["through", 185, 7], ["nothing", 235, 7], ["sabbath", 215, 7], ["herself", 312, 7], ["another", 144, 7], ["project", 262, 7], ["without", 263, 7], ["thought", 215, 7]]

Version18

Another problem! This is a DRY violation!

def take_by_value(nth, sort_opt)
  @freq_dic.select { |key, val| block_given? ? yield(val) : val }.take_by(nth) { |key, val| sort_opt[val] }
end

def take_by_key(nth, sort_opt)
  @freq_dic.select { |key, val| block_given? ? yield(val) : val }.take_by(nth) { |key, val| sort_opt[key] }
end

We avoid this by defining a take_by_key_or_val method.

def take_by_value(nth, sort_opt, &blk)
  by_val = lambda { |key, val| val }  # selects the value (frequency)
  take_by_key_or_val(nth, sort_opt, by_val, &blk)
end

def take_by_key(nth, sort_opt, &blk)
  by_key = lambda { |key, val| key }  # selects the key (word)
  take_by_key_or_val(nth, sort_opt, by_key, &blk)
end

# Shared implementation: by picks the key or the value to sort on.
def take_by_key_or_val(nth, sort_opt, by)
  @freq_dic.select { |key, val| block_given? ? yield(val) : val }.take_by(nth) { |key, val| sort_opt[by[key, val]] }
end
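
The idea of passing a selector lambda can be tried in isolation. The sketch below is standalone and makes some assumptions: take_by comes from an earlier installment, so sort_by followed by first stands in for it here, the dictionary is passed as an argument instead of being read from @freq_dic, and BY_KEY, BY_VAL, and demo_dic are hypothetical names.

```ruby
# Stand-in for take_by from the earlier installments (an assumption):
# sort by the block's result and keep the first nth pairs.
def take_by_key_or_val(freq_dic, nth, sort_opt, by)
  freq_dic.sort_by { |key, val| sort_opt[by[key, val]] }.first(nth)
end

# Selector lambdas: each picks the part of the pair to sort on.
BY_KEY = lambda { |key, _val| key }
BY_VAL = lambda { |_key, val| val }

# Small made-up dictionary for the demonstration.
def demo_dic
  { "a" => 5, "pride" => 2, "prejudice" => 9 }
end

# Longest words first: select the key, sort by negative length.
p take_by_key_or_val(demo_dic, 2, lambda { |key| -key.length }, BY_KEY)
# Most frequent first: select the value, sort by negative count.
p take_by_key_or_val(demo_dic, 2, lambda { |val| -val }, BY_VAL)
```

Only the selector differs between the two calls, which is exactly the part the duplicated methods disagreed on.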

Phew.

Version19

So, what's next? Let's see...

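We went to the trouble of creating a class, so it is a pity that it can only take command-line arguments. Next, let's make the WordDictionary class accept a file name or a string directly.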

To do so, we define an input_to_string method and have the initialize method convert the input appropriately.

class WordDictionary
  def initialize(input)
    input = input_to_string(input)
    @words = input.downcase.scan(/[a-z]+/)
    @freq_dic = @words.inject(Hash.new(0)) { |dic, word| dic[word] += 1 ; dic }
  end

  private

  # Converts ARGF, a file name, or a plain string into the text to analyze.
  def input_to_string(input)
    case input
    when String
      begin
        # First try to treat the string as a file name.
        File.open(input, "r") { |f| return f.read }
      rescue
        # Opening failed, so fall back to using the string itself as text.
        puts "Argument is assumed to be a text string"
        input
      end
    when ARGF.class
      input.read
    else
      raise ArgumentError, "Wrong argument. ARGF, a file name, or a string is acceptable."
    end
  end
end
 wdic1 = WordDictionary.new(ARGF)
 wdic2 = WordDictionary.new('11.txt')
 wdic3 = WordDictionary.new(<<-EOS)
 It was all very well to say 'Drink me,' but the wise little Alice was not going to do THAT in a hurry. 'No, I'll look first,' she said, 'and see whether it's marked "poison" or not'; for she had read several nice little histories about children who had got burnt, and eaten up by wild beasts and other unpleasant things, all because they WOULD not remember
 the simple rules their friends had taught them: such as, that a red-hot poker will burn you if you hold it too long; and that if you cut your finger VERY deeply with a knife, it usually bleeds; and she had never forgotten that, if you drink much from a bottle marked 'poison,' it is almost certain to disagree with you, sooner or later.
EOS
 p wdic1.top_by_frequency(10)
 p wdic2.top_by_frequency(10)
 p wdic3.top_by_frequency(10)
 
 #> [["the", 4507], ["to", 4243], ["of", 3728], ["and", 3658], ["her", 2225], ["i", 2069], ["a", 2012], ["in", 1936], ["was", 1848], ["she", 1710]]
 [["the", 1818], ["and", 940], ["to", 809], ["a", 690], ["of", 631], ["it", 610], ["she", 553], ["i", 545], ["you", 481], ["said", 462]]
 [["it", 5], ["you", 5], ["and", 5], ["that", 4], ["had", 4], ["a", 4], ["if", 3], ["she", 3], ["to", 3], ["not", 3]]

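In input_to_string, a case expression distinguishes the kinds of input. For when String, the method first tries to handle the argument as a file name and, if that fails, handles it as a text string. It seems to be working.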
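A case expression compares each when clause against the subject with the === operator, which for a class tests whether the subject is an instance of it. Here is a minimal sketch of that dispatch, using a hypothetical classify method that only reports which branch matched.

```ruby
# Hypothetical classifier, for illustration only.
def classify(input)
  case input
  when String     then :string       # evaluated as String === input
  when ARGF.class then :argf         # ARGF is an instance of ARGF.class
  else                 :unsupported
  end
end

p classify("11.txt")
p classify(ARGF)
p classify(42)
```

Because the match is by class, a file name and a literal text both land in the String branch, which is why input_to_string needs the extra attempt to open the file.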

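WordDictionary.new(<<-EOS)... uses a notation called a here document. The lines between <<-EOS and the closing EOS, where EOS is an arbitrary identifier, are interpreted as a string.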
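As a small sketch of how the here document behaves, the two hypothetical methods below compare the <<- form used above with the <<~ form available in Ruby 2.3 and later.

```ruby
# <<-EOS lets the closing EOS be indented, but keeps each line's
# leading whitespace in the resulting string.
def dash_heredoc
  <<-EOS
  Drink me
  EOS
end

# <<~EOS (Ruby 2.3 and later) also strips the common indentation.
def squiggly_heredoc
  <<~EOS
    Drink me
  EOS
end

p dash_heredoc      # => "  Drink me\n"
p squiggly_heredoc  # => "Drink me\n"
```

For word counting the leading whitespace makes no difference, since scan(/[a-z]+/) only picks up runs of letters.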

(To be continued)


