Recommendation System Using Spark, ML Akka, and Cassandra

Image title

Reccomendation System


Building a recommendation system with Spark is a simple task. Spark’s machine learning library already does all the hard work for us.

In this study, I will show you how to build a scalable application for Big Data using the following technologies:

  • Scala Language
  • Spark with Machine Learning
  • Akka with Actors
  • Cassandra

You may also like:  Introduction to Recommender Systems

A recommendation system is an information filtering mechanism that attempts to predict the rating a user would give a particular product. There are some algorithms to create a Recommendation System.

Apache Spark ML implements alternating least squares (ALS) for collaborative filtering, a very popular algorithm for making recommendations.

ALS recommender is a matrix factorization algorithm that uses Alternating Least Squares with Weighted-Lamda-Regularization (ALS-WR). It factors the user to item matrix A into the user-to-feature matrix U and the item-to-feature matrix M:

It runs the ALS algorithm in a parallel fashion. The ALS algorithm should uncover the latent factors that explain the observed user to item ratings and tries to find optimal factor weights to minimize the least squares between predicted and actual ratings.


We also know that not all users rate the products (movies), or we don’t already know all the entries in the matrix. With collaborative filtering, the idea is to approximate the ratings matrix by factorizing it as the product of two matrices: one that describes properties of each user (shown in green), and one that describes properties of each movie (shown in blue).


1. Project Architecture

The architecture used in the project:

2. Dataset

The datasets with the movie information and user rating were taken from site Movie Lens. Then the data was customized and loaded into Apache Cassandra. A docker was also used for Cassandra.

The keyspace is called movies. The data in Cassandra is modeled as follows:

3. The Code

The code is available in:

4. Organization and End-Points


Collection Comments
movies.uitem Contains available movies, total dataset used is 1682.
movies.udata Contains movies rated by each user, total dataset used is 100000.
movies.uresult Where the data calculated by the model is saved, by default it is empty.

The end-points:

Method End-Point Comments
POST /movie-model-train Do the training of the model.
GET /movie-get-recommendation/{ID} Lists user recommended movies.

5. Hands-on Docking and Configuring Cassandra

Run the commands below to upload and configure Cassandra:

$ docker pull cassandra:3.11.4
$ docker run --name cassandra-movie-rec -p -p -d cassandra:3.11.4

In the project directory (movie-rec), there are the datasets already prepared to put in Cassandra.

$ cd movie-rec
$ cat dataset/ml-100k.tar.gz | docker exec -i cassandra-movie-rec tar zxvf - -C /tmp
$ docker exec -it cassandra-movie-rec cqlsh -f /tmp/ml-100k/schema.cql

6. Hands-on Running and testing

Enter the project root folder and run the commands. If this is the first time, SBT will download the necessary dependencies.

$ sbt run

In another terminal, run the below command to train the model:

$ curl -XPOST http://localhost:8080/movie-model-train

This will start the model training. You can then run the command to see results with recommendations. Example:

$ curl -XGET http://localhost:8080/movie-get-recommendation/1

The answer should be:

{ "items": [ { "datetime": "Thu Oct 03 15:37:34 BRT 2019", "movieId": 613, "name": "My Man Godfrey (1936)", "rating": 6.485164882121823, "userId": 1 }, { "datetime": "Thu Oct 03 15:37:34 BRT 2019", "movieId": 718, "name": "In the Bleak Midwinter (1995)", "rating": 5.728434247420009, "userId": 1 }, ...

That’s the icing on the cake! Remember that the setting is set to show 10 movie recommendations per user.

You can also check the results in the uresult collection:

7. Model Predictions

The model and application training settings are in: (src/main/resources/application.conf)

model { rank = 10 iterations = 10 lambda = 0.01

This setting controls forecasts and is linked with how much and what kind of data we have. For more detailed project information, please access the below link:

8. References

Books that were used:

  • 6.1. Scala Machine Learning Projects
  • 6.2. Reactive Programming with Scala and Akka

Spark ML Documentation:


Further Reading

Building a Recommendation System Using Deep Learning Models

How to Develop a Simple Recommendations Engine Using Redis

This UrIoTNews article is syndicated fromDzone