Web analytics systems

Big-data systems series - Part 2

Setting up big data systems is a knowledge gap for most developers. They know how to write the code that analyzes the data--maybe how to use Spark, Hadoop, Hive or Pig to process it--but setting up a system so that they can do that is a bridge too far.

In this series, we'll review three common big data architectures in order from least complex to most complex.

  • Base analytics system -- link
  • Web analytics -- link
  • Integrated recommendation system -- link

In this post, we're going to look at a standard web analytics setup. This system is more complex than the standalone data processing system we looked at in the last article in this series. That said, it has a lot of the same components at its core. The largest difference between the two is that the web analytics system is loosely connected to an active web application.

If you look at the AWS reference architecture for web analytics, you'll find that the first two components create the data for our analytics processes--they don't necessarily play a direct role in the analysis itself. If we abstract this well, we can decouple the two systems entirely: the only connection between them is the log files that the application produces and the analytics system analyzes.
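To make the "log files as the only interface" idea concrete, here is a minimal sketch of the analytics side reading what the web application wrote. It assumes the application logs in Common Log Format, which is not stated in the reference architecture; the names `LOG_PATTERN`, `parse_line`, `count_page_views` and the sample lines are all hypothetical.

```python
import re
from collections import Counter

# Assumption: the web app writes Common Log Format lines.
# Adjust the pattern if your server logs in a different layout.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+)'
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if it doesn't match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["status"] = int(record["status"])
    return record


def count_page_views(lines):
    """Count successful (2xx) requests per path, skipping malformed lines."""
    views = Counter()
    for line in lines:
        record = parse_line(line)
        if record is not None and 200 <= record["status"] < 300:
            views[record["path"]] += 1
    return views


# Made-up sample lines standing in for a log file pulled from object storage.
SAMPLE_LOG = [
    '203.0.113.5 - - [10/Oct/2019:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326',
    '203.0.113.9 - - [10/Oct/2019:13:56:01 +0000] "GET /pricing HTTP/1.1" 200 512',
    '203.0.113.5 - - [10/Oct/2019:13:57:12 +0000] "GET /index.html HTTP/1.1" 200 2326',
    '203.0.113.7 - - [10/Oct/2019:13:58:40 +0000] "GET /missing HTTP/1.1" 404 128',
    'this line is not a valid log entry',
]

page_views = count_page_views(SAMPLE_LOG)
```

Nothing in this code knows how the web application is built or deployed. As long as the log format stays stable, either side can change without touching the other.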

From there, our analytics system revolves around the two core components we looked at for our standalone analytics system: object storage (S3, Azure Blob, etc.) and cloud compute or a managed cluster (Elastic MapReduce or HDInsight). Before, we wanted to place the results of our analysis into an object store. With web analytics, we'll typically want to make those results available to business analysts in a data lake or data warehouse system. In this example, AWS provides a stand-in for that with an "analytics database".
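As a rough illustration of that last step, the sketch below loads aggregated results into a SQL table that an analyst could query. It uses Python's built-in `sqlite3` purely as a local stand-in for the analytics database; a real warehouse would use its own driver and bulk-load path. The table name `page_views`, the helper functions and the counts are hypothetical.

```python
import sqlite3


def load_page_views(conn, day, views):
    """Write one day's per-path view counts into the page_views table.

    INSERT OR REPLACE keyed on (day, path) means re-running a day's job
    overwrites that day's rows instead of double counting them.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS page_views ("
        "day TEXT NOT NULL, path TEXT NOT NULL, views INTEGER NOT NULL, "
        "PRIMARY KEY (day, path))"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO page_views (day, path, views) VALUES (?, ?, ?)",
        [(day, path, count) for path, count in views.items()],
    )
    conn.commit()


def top_pages(conn, limit=3):
    """The kind of query an analyst might run: most-viewed paths overall."""
    return conn.execute(
        "SELECT path, SUM(views) AS total FROM page_views "
        "GROUP BY path ORDER BY total DESC, path ASC LIMIT ?",
        (limit,),
    ).fetchall()


# In-memory database as a stand-in; the counts are made-up sample data.
conn = sqlite3.connect(":memory:")
load_page_views(conn, "2019-10-10", {"/index.html": 2, "/pricing": 1})
load_page_views(conn, "2019-10-11", {"/index.html": 1, "/pricing": 3})
# Re-loading the same day replaces its rows rather than adding to them.
load_page_views(conn, "2019-10-11", {"/index.html": 1, "/pricing": 3})

ranking = top_pages(conn)
```

The point of the extra hop is that analysts work in SQL against a table, without needing to know that the numbers came from log files processed on a cluster.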

Ultimately, though, the web analytics system boils down to having a web application feed our base analytics system.

In my book, Mastering Large Datasets with Python, I detail how you can set up a base analytics system in chapters 11 and 12. You can buy that book at Manning.com.
