Convert YAML column data to JSONB in Rails and Postgres

yaml-to-json

Introduction

My team had wanted to take advantage of the incredibly useful JSON features provided by Postgres 9.4+ in working with data that our rails application was storing as YAML in a text column. We had installed a very useful gem called audited that was responsible for keeping track of “when” & “what” changed in our ActiveRecord records. The “what” data was, unfortunately, being stored as YAML in a text column and makes querying this data more difficult than it needs to be. Here is a sample of that “what” data:

---
pilot:
- false
- true

And this is what I want it to be like:

{
  "pilot": [false, true]
}

Notably, the authors of the audited gem included the option to install it such that this data is stored in JSONB (default is YAML).

Lately, we have been interested in building more advanced queries to search by the content of this “what” data. As a result, we aimed to create a migration that can transform the data safely back and forth from YAML to JSONB (reversible for good measure).

The Result

Here is a migration file that we ended up generating that converts the audited_changes column (that “what” data) on the audits table from YAML to JSONB. You can utilize this snippet of code to transform your YAML column to JSONB.

Disclaimer - Migrate at your own risk!

class ChangeAuditedChangesFromYamlToJson < ActiveRecord::Migration
  def up
    updated_records = fetch_audit_records.map do |r|
      {
        id: r['id'],
        audited_changes: escape_sql(JSON.dump(YAML.safe_load(r['audited_changes'])))
      }
    end

    remove_column :audits, :audited_changes
    add_column :audits, :audited_changes, :jsonb

    Transaction.transaction do
      updated_records.each do |ur|
        ActiveRecord::Base.connection.execute(
          """
            UPDATE audits 
            SET audited_changes = '#{ur[:audited_changes]}'::jsonb 
            WHERE id = #{ur[:id]}
          """
        )
      end
    end
  end

  def down
    updated_records = fetch_audit_records.map do |r|
      {
        id: r['id'],
        audited_changes: escape_sql(YAML.dump(JSON.parse(r['audited_changes'])))
      }
    end

    remove_column :audits, :audited_changes
    add_column :audits, :audited_changes, :text

    Transaction.transaction do
      updated_records.each do |ur|
        ActiveRecord::Base.connection.execute(
          """
            UPDATE audits 
            SET audited_changes = '#{ur[:audited_changes]}'::text 
            WHERE id = #{ur[:id]}
          """
        )
      end
    end
  end

  private

  #
  # Returns the id & audit_changes column of the audits table.
  # @return [PG::Result]
  def fetch_audit_records
    ActiveRecord::Base.connection.execute('SELECT id, audited_changes FROM audits')
  end

  #
  # Returns the SQL escaped version of the string provided 
  # For example, a "'" needs to be escaped by doubling it.
  # @param [String] the string to be converted
  # @return [String] the escaped version of the string provided
  def escape_sql(string)
    return string.gsub("'","''")
  end

end

Conclusion

It took some trial and error and navigating through some ‘gotchas’ to arrive at our resultant migration. For instance, we had to escape single-quote characters by adding another single-quote next to every instance (See the escape_sql method in the code snippet). Nonetheless, we are happy with the results and excited to start building queries with those super awesome JSON features given to us by Postgres.

Hope this helps someone else =). Happy coding!