Searching for modified files with Ruby and MurmurHash

I recently worked on a project where I had to constantly watch the filesystem for modified files and then upload data about the modified versions to a database. I ended up using MurmurHash to generate a hash of each file, and I stored that hash in the database every time I updated the record.

Then, when I wanted to see if anything had changed, I’d scan the folder for the relevant files and run each one through the import:

  def self.import_folder_contents(folder, opts={})
    files = Dir.glob(folder+'/**/*_full.xlsm')

    files.each do |file|
      import_from_file(file, opts)
    end
  end

Then, I’d check each file to make sure it wasn’t a duplicate of what was already stored. (This was far faster than I expected, considering it goes through tens of thousands of files on every run.)

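  # Returns true when a record with this filename already exists and its
  # stored MurmurHash matches the file's current contents.
  def self.is_duplicate_file(filename)
    basename = File.basename(filename)
    prior_file = Model.find_by(filename: basename)
    unless prior_file.nil?
      # Hash the file on disk and compare it with the hash saved at the last import.
      current_murmurhash = Digest::MurmurHash2.file(filename).hexdigest
      return true if prior_file.file_murmurhash == current_murmurhash
    end
    false
  end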
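
To show the whole loop in one place, here is a minimal, self-contained sketch of the same idea. It is not the project's actual code: the STORED_HASHES hash is a hypothetical in-memory stand-in for the database table, the method names are made up for illustration, and Digest::MD5 from the standard library is swapped in for MurmurHash so the example runs without extra gems.

```ruby
require 'digest'
require 'tmpdir'

# Hypothetical stand-in for the database table: basename => stored hash.
STORED_HASHES = {}

# Digest::MD5 is used here only so the sketch needs no gems;
# the original project used MurmurHash for this step.
def file_fingerprint(path)
  Digest::MD5.file(path).hexdigest
end

# True when the stored fingerprint matches the file's current contents.
def unchanged?(path)
  STORED_HASHES[File.basename(path)] == file_fingerprint(path)
end

# Finds new or modified files, records their fingerprints,
# and returns the list of paths that were (re)imported.
def import_changed(folder)
  candidates = Dir.glob(File.join(folder, '**', '*_full.xlsm')).sort
  changed = candidates.reject { |path| unchanged?(path) }
  changed.each do |path|
    STORED_HASHES[File.basename(path)] = file_fingerprint(path)
  end
  changed
end
```

Calling import_changed on the same folder twice in a row should return an empty list the second time, since nothing has changed in between. Like the original, this keys records by basename, so two files with the same name in different subfolders would share one record.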