The other day I was listening to this Bikeshed podcast episode, where the hosts were discussing when it is a good idea to memoize values using the ||= Ruby idiom. Since this is a common question even among seasoned developers, I decided to write up my take on it. The short answer is: never.
Let’s take a look at this example. We query the database to find the user by id, then use their email to make an API call to download a profile and grab the name. While this example is indeed contrived, it’s fairly common to see variations on this theme in the wild.
@name ||= @api.fetch_profile(User.find(@id).email).name
Now, just to get it out of the way, there are various problems with this code. However, in this post, let’s just view it from the angle of memoization. So, what are the 3 reasons not to memoize like this?
Reason 1: Caller is misled about the real impact of making this call.
Typically, this sort of memoization goes hand-in-hand with naming the method with a noun. Since the method is named so inconspicuously (name), we're signalling that the caller doesn't have to worry about what happens under the hood. We perpetuate the practice of calling this method mindlessly, with no regard for the fragile sequence of interdependent network operations it takes to fulfill the request. I get it, we want to encapsulate the plumbing, but couldn't we do it without misleading the caller?
Reason 2: Caller has no say in cache invalidation.
This memoization style assumes that the caller will never want a fresh value. In web apps, it probably stems from another assumption: that we are always living within a single web request and never want to fetch any data twice. Unfortunately, each such memoization slowly erodes our understanding of how data flows through the application, making it much harder to debug problems or to build anything else on top of the same codebase.
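Here is a minimal, self-contained sketch of the cache-invalidation problem. User, FakeApi, and ProfileCard are hypothetical in-memory stand-ins for the database model, the profile API client, and whatever class hosts the memoized name method; none of them is real application code.

```ruby
# Hypothetical stand-ins: a profile value object, an in-memory "users table",
# and a fake API whose data we can change to simulate a remote update.
Profile = Struct.new(:name)

class User
  Record = Struct.new(:id, :email)
  ROWS = { 1 => Record.new(1, "ada@example.com") }

  def self.find(id)
    ROWS.fetch(id)
  end
end

class FakeApi
  attr_reader :names

  def initialize(names)
    @names = names # email => display name
  end

  def fetch_profile(email)
    Profile.new(names.fetch(email))
  end
end

# Hypothetical host class for the memoized method from the example.
class ProfileCard
  def initialize(id, api)
    @id = id
    @api = api
  end

  def name
    @name ||= @api.fetch_profile(User.find(@id).email).name
  end
end

api = FakeApi.new({ "ada@example.com" => "Ada" })
card = ProfileCard.new(1, api)
first = card.name

# The remote profile changes...
api.names["ada@example.com"] = "Ada Lovelace"

# ...but calling name again just returns the cached value, and the public
# interface offers no way to ask for a fresh one.
second = card.name
puts first  # Ada
puts second # Ada
```

The only ways to see the new name are to build a new ProfileCard or to reach into its instance variables, and neither is a decision the caller should have to make.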
Reason 3: Caller has no way of stopping redundant work.
In our example, if a caller already has a user available, the method will fetch it again anyway. In a well-architected system we should be able to inject that dependency, especially when obtaining it took something as error-prone as a network or database roundtrip.
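To make the redundant work visible, the sketch below counts lookups on a hypothetical in-memory User class (standing in for the database) and uses a fake API client. ProfileCard is again an invented host for the memoized name method.

```ruby
Profile = Struct.new(:name)

# Hypothetical in-memory User model that counts lookups (stands in for the DB).
class User
  Record = Struct.new(:id, :email)
  ROWS = { 1 => Record.new(1, "ada@example.com") }
  @finds = 0

  class << self
    attr_reader :finds

    def find(id)
      @finds += 1
      ROWS.fetch(id)
    end
  end
end

# Hypothetical API client that derives a display name from the email.
class FakeApi
  def fetch_profile(email)
    Profile.new(email.split("@").first.capitalize)
  end
end

# Hypothetical host class for the memoized method from the example.
class ProfileCard
  def initialize(id, api)
    @id = id
    @api = api
  end

  def name
    @name ||= @api.fetch_profile(User.find(@id).email).name
  end
end

user = User.find(1)                          # the caller already has the user: 1 lookup
card = ProfileCard.new(user.id, FakeApi.new)
card.name                                    # ...but name looks it up again: 2 lookups
puts User.finds                              # 2
```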
How would we avoid all 3 of the above problems? It's not that difficult, with the caveat that you haven't already overcommitted to bigger architectural mistakes. Still, it's never too late to stop making things worse. So without further ado, here's the code, free of all of the above problems.
def retrieve_name email: User.find(@id).email, api: @api
  api.fetch_profile(email).name
end
You might’ve just done a double-take: wait, how is this the solution? We just removed caching and added some useless arguments. Bear with me, let’s talk through this real quick.
Note that the arguments are optional, so the method can still be called without passing anything, and Ruby evaluates each default only when the caller omits that argument. Let's go back and see if we've addressed the problems with the original code.
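Here is a runnable sketch of retrieve_name placed inside a hypothetical ProfileCard class, with invented in-memory User and FakeApi stand-ins. The lookup counter confirms that supplying email: skips User.find entirely, while a bare call still works.

```ruby
Profile = Struct.new(:name)

# Hypothetical in-memory User model that counts lookups (stands in for the DB).
class User
  Record = Struct.new(:id, :email)
  ROWS = { 1 => Record.new(1, "ada@example.com") }
  @finds = 0

  class << self
    attr_reader :finds

    def find(id)
      @finds += 1
      ROWS.fetch(id)
    end
  end
end

# Hypothetical API client that derives a display name from the email.
class FakeApi
  def fetch_profile(email)
    Profile.new(email.split("@").first.capitalize)
  end
end

# Hypothetical host class for retrieve_name.
class ProfileCard
  def initialize(id, api)
    @id = id
    @api = api
  end

  # Defaults are evaluated on each call, in the instance's context, and only
  # for arguments the caller leaves out.
  def retrieve_name(email: User.find(@id).email, api: @api)
    api.fetch_profile(email).name
  end
end

card = ProfileCard.new(1, FakeApi.new)
default_name = card.retrieve_name                              # looks up the user, then calls the API
injected_name = card.retrieve_name(email: "grace@example.com") # no user lookup
puts default_name  # Ada
puts injected_name # Grace
puts User.finds    # 1
```

The api: argument works the same way, so a test or a batch job could pass in a different client without touching the class.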
1. Is caller still misled about the real impact of calling this?
No. The method name is now a verb, retrieve_name, which makes it clear that when you call it, it will do things. That's all it takes to send the correct signal.
2. Can the caller control cache invalidation?
name = retrieve_name
# Name is now cached, feel free to reuse it.
# Get a fresh name whenever you want.
fresh_name = retrieve_name
3. Can the caller stop redundant work from happening?
my_user = User.find(123)
name = retrieve_name(email: my_user.email) # Saves a database call.
In case it's not obvious, we couldn't accept arguments this way in the original version: it caches only one value, so even if we passed a different user, we would still get back the first cached value.
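A quick sketch of that failure: the memoized method below is naively given an email: argument. ProfileCard and FakeApi are hypothetical, and email is made a required argument only to keep the example short.

```ruby
Profile = Struct.new(:name)

# Hypothetical API client that derives a display name from the email.
class FakeApi
  def fetch_profile(email)
    Profile.new(email.split("@").first.capitalize)
  end
end

# Hypothetical: the original memoized method with an argument bolted on.
class ProfileCard
  def initialize(api)
    @api = api
  end

  def name(email:)
    # There is only one slot to cache into, so whichever email comes first wins.
    @name ||= @api.fetch_profile(email).name
  end
end

card = ProfileCard.new(FakeApi.new)
ada = card.name(email: "ada@example.com")
grace = card.name(email: "grace@example.com")
puts ada   # Ada
puts grace # Ada (the first cached value, not Grace)
```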
Ultimately, with very little effort, we just gained 3 significant advantages in maintainability, reusability, and performance of our code.
What if I need to call this method from different places, so I don’t have a variable to reuse?
I feel your pain. Unfortunately, if you must depend on this caching technique because you cannot assign a variable once and pass it around, I have some bad news for you: your abstractions need rethinking. There should be a top-level routine in your code that tells the story of a particular transaction. Values that are reused need to be floated up into that context and passed into whatever needs them. In a vanilla Rails app, that place would be your controller actions. If doing this makes your actions too long, you're missing intermediary objects that give you a clean abstraction for writing your routine. That said, this is a pretty big topic best left for future blog posts.
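A minimal sketch of that shape in plain Ruby, so it runs without Rails: show_profile plays the role of a controller action, and retrieve_name, greeting, signature, and FakeApi are hypothetical helpers invented for illustration. The routine obtains the name once and passes it to everything that needs it, so the API is called a single time without any ||= caching.

```ruby
Profile = Struct.new(:name)
User = Struct.new(:id, :email) # hypothetical stand-in for an already-loaded model

# Hypothetical API client that counts how often it is called.
class FakeApi
  attr_reader :calls

  def initialize
    @calls = 0
  end

  def fetch_profile(email)
    @calls += 1
    Profile.new(email.split("@").first.capitalize)
  end
end

# Helpers receive the values they need instead of fetching them themselves.
def retrieve_name(email:, api:)
  api.fetch_profile(email).name
end

def greeting(name)
  "Hello, #{name}!"
end

def signature(name, email)
  "#{name} <#{email}>"
end

# The top-level routine: each value is obtained once, then passed down.
def show_profile(user, api)
  name = retrieve_name(email: user.email, api: api)
  { greeting: greeting(name), signature: signature(name, user.email) }
end

api = FakeApi.new
result = show_profile(User.new(1, "ada@example.com"), api)
puts result[:greeting]  # Hello, Ada!
puts result[:signature] # Ada <ada@example.com>
puts api.calls          # 1
```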