Iteration 1: Objects and Operations That Span Requests
Learn how objects and operations can span requests.
While the bulk of the state that persists across requests belongs in the database and is accessed via Active Record, some other bits of state have different life spans and need to be managed differently. In our Depot application, the Cart itself was stored in the database, but knowledge of which cart is current was managed by sessions. Flash notices were used to communicate messages such as “Can’t delete the last user” to the next request after a redirect, and callbacks were used to extract locale data from the URLs themselves.
In this lesson, we’ll explore each of these mechanisms in turn.
Rails sessions
A Rails session is a hash-like structure that persists across requests. Unlike raw cookies, sessions can hold any marshallable objects, which makes them ideal for holding state information in web applications. For example, in our store application, we used a session to hold the shopping cart object between requests. The Cart object could be used in our application just like any other object, but Rails arranged things so that the cart was saved at the end of handling each request. More importantly, it ensured that the correct cart for an incoming request was restored when Rails started to handle that request. Using sessions, we can pretend that our application stays around between requests.
That leads us to an interesting question: Where exactly does this data stay between requests? One possibility is for the server to send it down to the client as a cookie, which is the default for Rails. This limits the size of the session and increases bandwidth, but it means there is less for the server to manage and clean up. Note that the contents are encrypted by default, which means that users can neither see nor tamper with them.
The other option is to store the data on the server. It requires more work to set up and is rarely necessary.
- First, Rails has to keep track of sessions. It does this by creating a default 32-hex character key (which means there are 16^32 possible combinations). This key is called the session ID, and it’s effectively random. Rails arranges to store this session ID as a cookie with the key _session_id on the user’s browser. Because subsequent requests come into the application from this browser, Rails can recover the session ID.
- Second, Rails keeps a persistent store of session data on the server, indexed by the session ID. When a request comes in, Rails looks up the data store using the session ID. The data that it finds there is a serialized Ruby object. It deserializes this and stores the result in the controller’s session attribute, where the data is available to our application code. The application can add to and modify this data to its heart’s content. When it finishes processing each request, Rails writes the session data back into the data store, where it sits until the next request from this browser comes along.
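The two steps above can be sketched in plain Ruby. This is only an illustration under stated assumptions, not how Rails implements it: the TinySessionStore class and its method names are hypothetical, and an in-memory Hash stands in for the persistent store.

```ruby
require "securerandom"

# Hypothetical in-memory stand-in for a server-side session store.
class TinySessionStore
  def initialize
    @data = {} # session ID => marshaled session hash
  end

  # Create a new, effectively random 32-hex-character session ID.
  def generate_id
    SecureRandom.hex(16) # 16 random bytes => 32 hex characters
  end

  # Look up and deserialize the session for this ID (empty if unknown).
  def load(session_id)
    blob = @data[session_id]
    blob ? Marshal.load(blob) : {}
  end

  # Serialize the session and write it back at the end of the request.
  def save(session_id, session)
    @data[session_id] = Marshal.dump(session)
  end
end

store = TinySessionStore.new
sid = store.generate_id # would be sent to the browser as a cookie

# First request: start with an empty session and remember the cart's ID.
session = store.load(sid)
session[:cart_id] = 42
store.save(sid, session)

# Next request from the same browser: the cookie gives us the ID back.
restored = store.load(sid)
puts restored[:cart_id]
```

An unknown session ID simply yields an empty session here, which mirrors the idea that a missing server-side entry means there is effectively no session.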
What should we store in a session? We can store anything we want, subject to a few restrictions and caveats:
- There are some restrictions on what kinds of objects we can store in a session. The details depend on the storage mechanism, which we’ll look at shortly. In general, objects in a session must be serializable using Ruby’s Marshal functions. This means, for example, that we cannot store an I/O object in a session.
- If we store any Rails model objects in a session, we’ll have to add model declarations for them. This causes Rails to preload the model class so that its definition is available when Ruby comes to deserialize it from the session store. If the use of the session is restricted to just one controller, this declaration can go at the top of that controller.
class BlogController < ApplicationController
  model :user_preferences
  # ...
end
However, if the session might get read by another controller (which is likely in any application with multiple controllers), we’ll probably want to add the declaration to application_controller.rb in app/controllers.
- We probably don’t want to store massive objects in session data. Instead, put them in the database, and reference them from the session. This is particularly true for cookie-based sessions, where the overall limit is 4 KB.
- We also probably don’t want to store volatile objects in session data. For example, we might want to keep a tally of the number of articles in a blog and store that in the session for performance reasons. But if we do that, the count won’t get updated if some other user adds an article. It’s tempting to store objects representing the currently logged-in user in session data. This might not be wise if our application needs to be able to invalidate users. Even if a user is disabled in the database, their session data will still reflect a valid status. It’s better to store volatile data in the database and reference it from the session instead.
- We don’t want to store critical information solely in session data either. If our application generates an order confirmation number in one request and stores it in session data so that it can be saved to the database when the next request is handled, for example, we risk losing that number if the user deletes the cookie from their browser. Critical information needs to be in the database.
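The Marshal restriction and the size caveat from the list above can be checked in plain Ruby. This is a minimal sketch: the marshalable? helper and the sample prefs hash are hypothetical names introduced for illustration.

```ruby
# Plain data marshals cleanly, so it is safe to keep in a session.
prefs = { locale: "es", cart_id: 42 }
copy = Marshal.load(Marshal.dump(prefs))
puts copy == prefs

# Hypothetical helper: can Ruby's Marshal serialize this object?
def marshalable?(object)
  Marshal.dump(object)
  true
rescue TypeError
  false # Marshal raises TypeError for objects such as I/O streams
end

puts marshalable?(prefs)
puts marshalable?($stdout)

# Cookie-based sessions have about 4 KB in total, so a size check can help.
puts Marshal.dump(prefs).bytesize < 4096
```

Storing a small value such as cart_id, rather than the full object, keeps the session both serializable and well under the cookie limit.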
There’s one more caveat, and it’s a big one. If we store an object in session data, then the next time we come back to that browser, our application will end up retrieving that object. However, if in the meantime we’ve updated our application, the object in session data may not agree with the definition of that object’s class in our application, and the application will fail while processing the request. We have three solutions here. One is to store the object in the database using conventional models and keep just the ID of the row in the session. Model objects are far more forgiving of schema changes than the Ruby marshaling library. The second option is to manually delete all the session data stored on our server whenever we change the definition of a class stored in that data.
The third option is slightly more complex. If we add a version number to our session keys and change that number whenever we update the stored data, we’ll only ever load data that corresponds with the current version of the application. We can potentially version the classes whose objects are stored in the session and use the appropriate classes depending on the session keys associated with each request. This last idea can be a lot of work, so we’ll need to decide whether it’s worth the effort.
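One way to picture versioned session keys is the following sketch. It rests on assumptions: SESSION_VERSION and versioned_key are hypothetical names, and a plain Hash stands in for the Rails session.

```ruby
# Hypothetical version tag; bump it whenever the shape of the stored data changes.
SESSION_VERSION = 2

# Build a session key that embeds the current version number.
def versioned_key(name)
  :"#{name}_v#{SESSION_VERSION}"
end

# A plain Hash stands in for the Rails session here.
session = { prefs_v1: { "colour" => "blue" } } # written by an older release

# Current code only reads the current version's key, so stale data is ignored.
prefs = session[versioned_key(:prefs)] || { theme: "default" }
session[versioned_key(:prefs)] = prefs

puts prefs.inspect
```

The old entry is never loaded by the new code, although it still takes up space until the session is cleaned up.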
Since the session store is hash-like, we can save multiple objects in it, each with its own key.
There’s also no need to disable sessions for particular actions. Sessions are lazily loaded, so we simply don’t reference the session in any action that doesn’t need one.
Session storage
Rails has a number of options when it comes to storing our session data. Each has good and bad points. We’ll start by listing the options and then compare them at the end.
The session_store attribute of ActionController::Base determines the session storage mechanism. Set this attribute to a class that implements the storage strategy. This class must be defined in the ActiveSupport::Cache::Store module. We use symbols to name the session storage strategy; each symbol is converted into a CamelCase class name.
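The symbol-to-class-name conversion can be approximated in plain Ruby. This is a rough sketch only: store_class_name is a hypothetical helper, and Rails itself relies on Active Support's inflector rather than this code.

```ruby
# Rough plain-Ruby approximation of turning a store symbol into a class name.
def store_class_name(symbol)
  symbol.to_s.split("_").map(&:capitalize).join
end

puts store_class_name(:cookie_store)        # CookieStore
puts store_class_name(:active_record_store) # ActiveRecordStore
```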
- session_store = :cookie_store
This is the default session storage mechanism used by Rails, starting with version 2.0. This format represents objects in their marshaled form, which allows any serializable data to be stored in sessions but is limited to 4 KB total. This is the option we used in the Depot application.
- session_store = :active_record_store
We can use the activerecord-session_store gem to store our session data in our application’s database using ActiveRecordStore.
- session_store = :drb_store
DRb is a protocol that allows Ruby processes to share objects over a network connection. Using the DRbStore database manager, Rails stores session data on a DRb server, which we manage outside the web application. Multiple instances of our application, potentially running on distributed servers, can access the same DRb store. DRb uses Marshal to serialize objects.
- session_store = :mem_cache_store
Memcached is a freely available, distributed object caching system maintained by Dormando. It is more complex to use than the other alternatives and is probably interesting only if we are already using it for other reasons at our site.
- session_store = :memory_store
This option stores the session data locally in the application’s memory. Because no serialization is involved, any object can be stored in an in-memory session. As we’ll see in a minute, this generally is not a good idea for Rails applications.
- session_store = :file_store
Session data is stored in flat files. It’s pretty much useless for Rails applications, because the contents must be strings. This mechanism supports the additional configuration options :prefix, :suffix, and :tmpdir.
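In a current Rails application, the store is typically chosen in an initializer. The fragment below is a sketch, not taken from the Depot code: the file location follows the usual Rails convention, and the cookie key name "_depot_session" is an assumption made for illustration.

```ruby
# config/initializers/session_store.rb
# The key name "_depot_session" is an assumption chosen for illustration.
Rails.application.config.session_store :cookie_store, key: "_depot_session"

# To keep sessions in the database instead, add the activerecord-session_store
# gem to the Gemfile and switch the symbol:
# Rails.application.config.session_store :active_record_store, key: "_depot_session"
```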
Comparing session storage options
With all these session options to choose from, which should we use in our application? As always, the answer is “it depends.”
When it comes to performance, there are few absolutes, and everyone’s context is different. Our hardware, network latencies, database choices, and possibly even the weather will impact how all the components of session storage interact. Our best advice is to start with the simplest workable solution and then monitor it. If it starts to slow us down, find out why before jumping to another solution.
If we have a high-volume site, keeping the size of the session data small and going with cookie_store is the way to go.
If we rule out memory store as being too simplistic, file store as too restrictive, and memcached as overkill, the choices boil down to CookieStore, Active Record store, and DRb-based storage. Should we need to store more in a session than we can with cookies, we recommend starting with an Active Record solution. If, as our application grows, we find this becoming a bottleneck, we can migrate to a DRb-based solution.
Session expiry and cleanup
One problem with all the server-side session storage solutions is that each new session adds something to the session store. This means we’ll eventually need to do some housekeeping or we’ll run out of server resources.
There’s another reason to tidy up sessions. Many applications don’t want a session to last forever. Once a user has logged in from a particular browser, the application might want to enforce a rule that the user stays logged in only as long as they are active. When they log out, or some fixed time after they last used the application, their session should be terminated.
We can sometimes achieve this effect by expiring the cookie holding the session ID. However, this is open to end-user abuse. Worse, it’s hard to synchronize the expiry of a cookie on the browser with the tidying up of the session data on the server.
Therefore, we generally expire sessions by simply removing their server-side session data. Should a browser request subsequently arrive that contains a session ID for data that has been deleted, the application will receive no session data. The session will effectively not be there.
Implementing this expiration depends on the storage mechanism being used.
For Active Record–based session storage, use the updated_at column in the sessions table. We can delete all sessions that have not been modified in the last hour (ignoring daylight saving time changes) by having our sweeper task issue SQL such as this:
-- PostgreSQL interval syntax; MySQL writes the interval as: interval 1 hour
delete from sessions
where updated_at < now() - interval '1 hour';
For DRb-based solutions, expiration takes place within the DRb server process. We’ll probably want to record timestamps alongside the entries in the session data hash. We can run a separate thread (or even a separate process) that periodically deletes the entries in this hash.
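The timestamp-and-sweep idea for a DRb-style store can be sketched as follows. Everything here is an assumption made for illustration: ExpiringSessionHash is a hypothetical class, and the DRb wiring itself is left out so the example stays self-contained.

```ruby
# Hypothetical session hash that a DRb server process could expose.
class ExpiringSessionHash
  def initialize(ttl_seconds)
    @ttl = ttl_seconds
    @entries = {}     # session ID => [data, last-touched time]
    @lock = Mutex.new # the sweeper may run in another thread
  end

  # Store session data and record when it was last touched.
  def []=(session_id, data)
    @lock.synchronize { @entries[session_id] = [data, Time.now] }
  end

  # Return the stored data, or nil if the session is unknown or was swept.
  def [](session_id)
    @lock.synchronize do
      entry = @entries[session_id]
      entry && entry.first
    end
  end

  # Delete every entry not touched within the TTL; returns how many were removed.
  def sweep(now = Time.now)
    @lock.synchronize do
      before = @entries.size
      @entries.delete_if { |_id, (_data, touched)| now - touched > @ttl }
      before - @entries.size
    end
  end
end

sessions = ExpiringSessionHash.new(3600)
sessions["abc123"] = { cart_id: 42 }

puts sessions.sweep                  # nothing is an hour old yet
puts sessions.sweep(Time.now + 7200) # pretend two hours have passed
puts sessions["abc123"].inspect
```

In a real deployment, a background thread or a separate process would call sweep periodically. Passing the current time as an argument is a design choice that makes the expiry rule easy to exercise without waiting.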
In all cases, our application can help this process by calling reset_session() to delete sessions when they are no longer needed, like when a user logs out.