Most of SoundCloud's products are now written in Scala, Clojure, or JRuby. It wasn't always like that. Like many other start-ups, SoundCloud began as a monolithic Ruby on Rails application running on MRI, Ruby's official interpreter, and backed by memcached and MySQL.
We affectionately call this system the Mothership. Its architecture was a good fit for a new product used by hundreds of thousands of artists to share their work, collaborate on tracks, and be discovered by industry professionals.
The Mothership contained both our Public API, which was used by thousands of third-party applications, and the user-facing web application. With the launch of the Next SoundCloud in 2012, we built all of our client applications on top of the same API that partners and developers used.
These days, about 12 hours of music and sound are uploaded every minute, and hundreds of millions of people use our platform every day. SoundCloud combines the challenges of scaling a very large social network with those of a media distribution platform.
To scale our Rails application to this level, we developed, contributed to, and published several components and tools that help run database migrations at scale, make Rails smarter about how it accesses databases, process a large number of messages, and more. In the end, we decided to fundamentally change how we build products, because we were always patching the system rather than solving the underlying scalability problem.
The first thing to change was our architecture. We moved towards what is now known as a microservices architecture. In this style, engineers separate domain logic into very small components. Each component exposes a well-defined API and implements a Bounded Context, including its persistence layer and any other infrastructure it needs.
Big-bang refactoring had bitten us in the past, so instead of splitting the Mothership immediately, the team chose to stop adding new functionality to it. We built all new features as microservices, and whenever a feature in the Mothership needed to be refactored, we extracted its code as part of that work.
Initially, everything went well, but soon we discovered a problem. Since so much of our logic was still in the Rails monolith, all of our microservices had to interface with it somehow.
One option for solving this problem was to let the microservices access the Mothership database directly. This is a common practice in some corporate settings, but because the database is a Public, but not Published, Interface, it causes problems whenever the structure of shared tables needs to change.
Instead, we chose the only Published Interface SoundCloud had: the Public API. Our internal microservices would therefore behave exactly like applications developed by third parties to integrate with the SoundCloud platform.
We soon realized there was a big problem with this model: our microservices needed to react to user activity. For example, the push-notification system needed to know when a track had received a new comment so that it could inform the artist. At our scale, polling the API for such changes was not an option, so we had to create a better model.
We were already using AMQP in general, and RabbitMQ in particular: in a Rails application you often need a way to dispatch slow jobs to a worker process to avoid hogging the concurrency-weak Ruby interpreter. Our colleagues Sebastian Ohm and Tomás Senart have discussed how we use AMQP. Over time we developed a new model called Semantic Events, in which changes to domain objects result in a message being dispatched to a broker and consumed by whichever microservices find it interesting.
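The Semantic Events flow can be sketched in a few lines. This is only an illustration in Python, not our production code: an in-memory InMemoryBroker stands in for RabbitMQ, and the names SemanticEvent, Comment, push_notification_service, and the topic "comment.created" are all assumptions made up for the example.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class SemanticEvent:
    """A domain-level fact, e.g. 'a comment was created on track 42'."""
    topic: str     # hypothetical naming scheme, e.g. "comment.created"
    payload: dict


Handler = Callable[[SemanticEvent], None]


class InMemoryBroker:
    """Toy stand-in for a real AMQP broker such as RabbitMQ."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, event: SemanticEvent) -> None:
        # Every handler subscribed to the topic is called with the event.
        for handler in self._subscribers.get(event.topic, []):
            handler(event)


class Comment:
    """Domain object in the monolith; saving it emits a semantic event."""

    def __init__(self, broker: InMemoryBroker, track_id: int, author: str, body: str) -> None:
        self.broker = broker
        self.track_id = track_id
        self.author = author
        self.body = body

    def save(self) -> None:
        # Persistence is omitted; only the event dispatch is shown.
        self.broker.publish(SemanticEvent(
            topic="comment.created",
            payload={"track_id": self.track_id, "author": self.author},
        ))


sent_notifications: List[str] = []


def push_notification_service(event: SemanticEvent) -> None:
    """A consumer that reacts to the event instead of polling the API."""
    sent_notifications.append(
        f"New comment on track {event.payload['track_id']} by {event.payload['author']}"
    )


broker = InMemoryBroker()
broker.subscribe("comment.created", push_notification_service)
Comment(broker, track_id=42, author="alice", body="Nice drop!").save()
```

The publisher does not know which services consume the event, which is what lets new microservices subscribe without any change to the Mothership.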
This architecture enabled Event Sourcing, which is how most of our microservices manage shared data, but it did not remove the need to query the Public API. For example, to notify all fans of an artist about a new track, a microservice might need the list of those fans and their email addresses.
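As a rough sketch of what Event Sourcing means for a single microservice, the Python example below rebuilds a local view of who follows whom purely by replaying events. The event types "user.followed" and "user.unfollowed", their fields, and the FollowerProjection class are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Set


@dataclass
class FollowerProjection:
    """Local read model owned by one microservice, derived only from events."""
    followers: Dict[str, Set[str]] = field(default_factory=dict)

    def apply(self, event: dict) -> None:
        kind = event.get("type")
        if kind == "user.followed":
            self.followers.setdefault(event["artist"], set()).add(event["fan"])
        elif kind == "user.unfollowed":
            self.followers.get(event["artist"], set()).discard(event["fan"])
        # Events this service does not care about are ignored.


def replay(events: Iterable[dict]) -> FollowerProjection:
    """Rebuild the projection from scratch by applying events in order."""
    projection = FollowerProjection()
    for event in events:
        projection.apply(event)
    return projection


event_log = [
    {"type": "user.followed", "artist": "artist-1", "fan": "fan-a"},
    {"type": "user.followed", "artist": "artist-1", "fan": "fan-b"},
    {"type": "user.unfollowed", "artist": "artist-1", "fan": "fan-a"},
    {"type": "track.played", "track_id": 9},
]

projection = replay(event_log)
```

Note that the projection only knows what the events carry. Anything else, such as the fans' email addresses, would still have to be fetched from an API, which is the limitation described above.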
Although most of the data was available through the Public API, our microservices were bound by the same rules we imposed on third-party applications. A microservice could only access public information, so it could not, for example, notify users about activity on private tracks.
The problem could be solved in several ways. A popular alternative was to extract all the ActiveRecord model classes from the Mothership and package them as a Ruby gem, effectively turning them into Published Interfaces and shared components. This approach had two drawbacks: the overhead of versioning a component shared across so many microservices, and the fact that some microservices would be written in languages other than Ruby. A different solution was needed.
Ultimately, the team decided to use Rails engines (or plugins, depending on the Rails version) to create an internal API that can only be accessed from within the company's network. When an application acts on behalf of a user, we use OAuth 2.0, with different authorisation scopes depending on which microservice needs the data.
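The idea of scope-based access to the internal API can be illustrated as follows. This Python sketch covers only the authorisation check, not token issuance or validation, and it is not how the Rails engine is actually implemented. The scope names "tracks:read" and "tracks:read-private", the client names, and the get_track function are assumptions for the example.

```python
from dataclasses import dataclass
from typing import FrozenSet


class InsufficientScope(Exception):
    """Raised when a token lacks a scope that the endpoint requires."""


@dataclass(frozen=True)
class AccessToken:
    client: str              # microservice the token was issued to
    user_id: int             # user the client is acting on behalf of
    scopes: FrozenSet[str]   # what the client may do for that user


# Toy data standing in for the monolith's database.
TRACKS = {
    1: {"title": "Public demo", "private": False},
    2: {"title": "Unreleased mix", "private": True},
}


def get_track(token: AccessToken, track_id: int) -> dict:
    """Hypothetical internal endpoint: private tracks need an extra scope."""
    track = TRACKS[track_id]
    required = {"tracks:read"}
    if track["private"]:
        required.add("tracks:read-private")
    missing = required - token.scopes
    if missing:
        raise InsufficientScope(f"{token.client} is missing scopes: {sorted(missing)}")
    return track


notifier = AccessToken(
    client="push-notifications",
    user_id=7,
    scopes=frozenset({"tracks:read", "tracks:read-private"}),
)
indexer = AccessToken(
    client="search-indexer",
    user_id=7,
    scopes=frozenset({"tracks:read"}),
)
```

With these tokens, get_track(notifier, 2) returns the private track, while get_track(indexer, 2) raises InsufficientScope. Each microservice sees only the data its scopes allow.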
Although we are constantly removing features from the Mothership, having both a push and a pull interface to it ensures that our new microservices are not bound to the old architecture. The microservice architecture lets us develop production-ready features with much shorter feedback cycles. Externally visible examples include visual sounds and the new stats system.