When storing large JSON documents, how efficiently does ActiveRecord’s jsonb field store and retrieve data compared to a regular string field?
This is an experiment I performed to help me solve a problem with long response times. Full disclosure: I experienced some strange results here that I can’t explain. I’d love to understand these better (help, please!), but for now I’m simply posing the problem. All code is included.
If you decide to duplicate this, ensure you’re using at least Postgres v9.4, the first version to support jsonb.
There have been a few great posts about how speedy ActiveRecord is at storing json and jsonb data. In particular, Nando Vieira shows that inserting 30,000 records into json versus jsonb columns “didn’t have any real difference”, yielding response times of 12.76s vs 12.57s respectively (only about a 1.5% difference). He does, however, note that querying the data is much faster with jsonb.
I want to extend Nando’s study to see how large the performance difference is when the JSON payloads themselves are larger (such as an array of 2000 objects). Is it larger than the 1.5% difference that Nando measured? If so, how much larger?
Setting up Rails & Postgres
I launched a new Rails application with `rails new storing_large_json_test_app --database=postgresql`, ran `bundle install`, and generated three models, one per column type, using:
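The exact generator commands aren’t preserved here, but based on the column types benchmarked below they would have looked something like this (StringTest, JsonTest, and JsonbTest, each with an `entry` column, are assumed names):

```shell
# Assumed model and column names -- the originals aren't shown in the post.
rails g model StringTest entry:string
rails g model JsonTest entry:json
rails g model JsonbTest entry:jsonb
```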
In your migration files you should see, respectively:
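The original migration files aren’t reproduced here; they would look roughly like this (StringTest/JsonTest/JsonbTest with an `entry` column are assumed names, and the `[5.0]` migration version depends on your Rails release):

```ruby
# Sketch of the three generated migrations -- model and column names
# are assumptions, since the originals aren't shown.
class CreateStringTests < ActiveRecord::Migration[5.0]
  def change
    create_table :string_tests do |t|
      t.string :entry    # plain varchar/text column
      t.timestamps
    end
  end
end

class CreateJsonTests < ActiveRecord::Migration[5.0]
  def change
    create_table :json_tests do |t|
      t.json :entry      # Postgres json column
      t.timestamps
    end
  end
end

class CreateJsonbTests < ActiveRecord::Migration[5.0]
  def change
    create_table :jsonb_tests do |t|
      t.jsonb :entry     # Postgres jsonb column (9.4+)
      t.timestamps
    end
  end
end
```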
Getting data
I will be comparing an array of 2000 objects, where each object is a set of four randomly generated, two-character strings. Note that all of the objects in the array will be identical.
Benchmarking & Results
Let’s generate four random strings using this handy code snippet from StackOverflow and store them within an array of objects. Next, we’ll convert the array to a string to mimic our server, and then run some benchmarks:
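The StackOverflow snippet itself isn’t shown here, so this sketch uses `SecureRandom.alphanumeric` as a stand-in for the random-string generation; the StringTest/JsonTest/JsonbTest model names in the commented benchmark are also assumptions:

```ruby
require "securerandom"
require "benchmark"

# Stand-in for the referenced StackOverflow snippet:
# a randomly generated, two-character string.
def random_string
  SecureRandom.alphanumeric(2)
end

# One object: a set of four random two-character strings.
obj = { a: random_string, b: random_string, c: random_string, d: random_string }

# An array of 2000 identical objects.
entry = Array.new(2000) { obj }

# Coerce to a String to mimic our server.
entry_string = entry.to_s

# Inside the Rails app, the benchmark itself would then look something
# like this (model names are assumed):
#
#   Benchmark.bm do |x|
#     x.report("String test:") { StringTest.create(entry: entry_string) }
#     x.report("json test:")   { JsonTest.create(entry: entry_string) }
#     x.report("jsonb test:")  { JsonbTest.create(entry: entry_string) }
#   end
```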
Results:
- String test: 0.748323 seconds
- json test: 0.020513 seconds
- jsonb test: 0.021677 seconds (only marginally slower than json)
As you can see above, storing a string of JSON in a string field is significantly slower than storing it in a json or jsonb field. I wasn’t able to determine exactly why; to me this seems a bit counterintuitive. Note to self: use json fields when storing a string of JSON.
However, what about the opposite situation? Namely, what about when we’re storing an object that has not been coerced to a string with `to_s`? I ran the tests two more times, first leaving the object as a Ruby Array, and then coercing the entry object to JSON before storing it (using `entry = entry.to_json`).
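To make the distinction between the three forms concrete, here is what each coercion produces (a sketch with illustrative values):

```ruby
require "json"

entry = [{ "a" => "xK", "b" => "9b" }]

as_ruby   = entry          # the raw Ruby Array, handed straight to ActiveRecord
as_string = entry.to_s     # Ruby inspect format, e.g. '[{"a"=>"xK", "b"=>"9b"}]'
as_json   = entry.to_json  # proper JSON: '[{"a":"xK","b":"9b"}]'
```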
Results with String (`to_s`, from above):
- String test: 0.748323 seconds
- json test: 0.020513 seconds
- jsonb test: 0.021677 seconds
Results with Ruby Array:
- String test: 0.851061 seconds
- json test: 4.765974 seconds
- jsonb test: 6.193291 seconds
Results with JSON (`to_json`):
- String test: 0.846517 seconds
- json test: 4.310821 seconds
- jsonb test: 6.614038 seconds
Summary
It’s important to know what type of object you’re storing if you have a large amount of JSON to store.
- If you have a string of JSON, store it in a json field
- If you have a Ruby object, coerce it to a string and store it in a json or string field
- If you have a string coerced with `to_json`, consider storing it in a string field
Interestingly, if we have a Ruby object and coerce it to a string (`to_s`), it will store nice and quickly in a json field. Similarly, if we have an escaped JSON string and coerce it to a non-escaped JSON string, we can bring storage times down. For example, suppose we have an escaped JSON string (example #3):
We get some baffling results here:
Results coercing to string:
- String test: 0.751831 seconds
- json test: 0.136969 seconds
- jsonb test: 0.05749 seconds
Now jsonb is faster than storing a string. Moral of the story: run benchmarks when you’re storing JSON! :/