I recently worked on a project where I had to constantly watch for modified files on the filesystem, then upload data about the modified versions to a database. I ended up using MurmurHash to generate a hash of each file, and I stored the file’s hash in the database every time I updated the record.
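The schema itself isn’t shown in this post, but the duplicate check further down expects each record to carry a `filename` and a `file_murmurhash` column. A rough migration along those lines might look like this (the model name, table name, and Rails version are placeholders, not the real project’s):

```ruby
# Rough sketch of the schema the duplicate check relies on.
# `models` and the Rails migration version are placeholders.
class AddFileMurmurhashToModels < ActiveRecord::Migration[6.0]
  def change
    add_column :models, :filename, :string
    add_column :models, :file_murmurhash, :string

    # Every duplicate check looks the record up by filename, so index it.
    add_index :models, :filename
  end
end
```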

Then, when I wanted to see if anything new had happened, I’d scan through the relevant files in the folder and process each one:

```ruby
def self.import_folder_contents(folder, opts={})
  files = Dir.glob(folder + '/*/_full.xlsm')

  files.each do |file|
    import_from_file(file, opts)
  end
end
```

Then, I’d check each file to make sure it wasn’t a duplicate. (This was far faster than I expected, considering I’m going through tens of thousands of files each time it’s run.)

```ruby
require 'digest/murmurhash' # from the digest-murmurhash gem

def self.is_duplicate_file(filename)
  basename = File.basename(filename)
  prior_file = Model.find_by(filename: basename)

  # If we've seen this filename before, compare the stored hash against the
  # file's current hash; identical hashes mean nothing has changed.
  unless prior_file.nil?
    current_murmurhash = Digest::MurmurHash2.file(filename).hexdigest
    return true if prior_file.file_murmurhash == current_murmurhash
  end

  return false
end
```
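`import_from_file` isn’t shown here, but the general idea is that it skips files whose hash hasn’t changed and refreshes the stored hash otherwise. A rough sketch of that flow (the parsing step and the use of `find_or_initialize_by` are my assumptions, not the actual implementation):

```ruby
# Hypothetical sketch of import_from_file: skip unchanged files,
# otherwise update the record and store the file's current hash.
def self.import_from_file(filename, opts={})
  return if is_duplicate_file(filename)

  record = Model.find_or_initialize_by(filename: File.basename(filename))

  # ... parse the spreadsheet and assign the record's attributes here ...

  record.file_murmurhash = Digest::MurmurHash2.file(filename).hexdigest
  record.save!
end
```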