In this post, we're going to look at a common, complex analytics setup: the integrated recommendation system. This type of system is found on leading e-commerce platforms and most social-media platforms, such as Amazon, Walmart, Target, Facebook, Yelp, and Netflix. The goal of these systems is to analyze customer behavior and serve customers the content (products or other media) that is most likely to engage them.
The reference architecture provided by AWS imagines an e-commerce site that sends marketing emails to its customers based on users' behavior on the site. If you take a look at the reference architecture, the first thing you'll notice is that there are three web applications: the e-commerce platform, the marketing email application, and the recommendation application.
The linchpin of our data analytics process, as has been the case with both of our simpler setups, is going to be a managed cluster, like Amazon EMR (Elastic MapReduce) or Azure HDInsight. This cluster will analyze three sources of data: web logs, which we can use to look at how users are behaving on the site; a database of actual orders, which we can use to determine which products users will likely buy; and user profiles, which we use to form recommendations and inform content personalization.
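To make that analysis step concrete, here is a minimal sketch in plain Python of the kind of work the cluster does: it combines page views from web logs with actual orders and condenses them into one small profile per user. The record layout, the sample data, and the `build_profiles` function are all hypothetical, and the "recommend what was viewed but not bought" rule is only a stand-in for a real recommendation model. On a real cluster the same logic would run as a distributed job over the data lake.

```python
from collections import Counter, defaultdict

# Hypothetical sample records; a real job would read these from the data lake.
WEB_LOGS = [
    {"user": "u1", "product": "tent"},
    {"user": "u1", "product": "tent"},
    {"user": "u1", "product": "stove"},
    {"user": "u2", "product": "boots"},
]
ORDERS = [
    {"user": "u1", "product": "stove"},
    {"user": "u2", "product": "boots"},
]


def build_profiles(web_logs, orders):
    """Condense raw view and order events into one small profile per user."""
    views = defaultdict(Counter)   # user -> how often each product was viewed
    purchased = defaultdict(set)   # user -> products actually ordered
    for event in web_logs:
        views[event["user"]][event["product"]] += 1
    for order in orders:
        purchased[order["user"]].add(order["product"])

    profiles = {}
    for user in views.keys() | purchased.keys():
        # Stand-in rule: suggest viewed-but-not-bought products, most viewed first.
        candidates = [
            product
            for product, _count in views[user].most_common()
            if product not in purchased[user]
        ]
        profiles[user] = {
            "purchased": sorted(purchased[user]),
            "recommend": candidates,
        }
    return profiles


if __name__ == "__main__":
    for user, profile in sorted(build_profiles(WEB_LOGS, ORDERS).items()):
        print(user, profile)
```

The output of a job like this is the condensed user profile: a few fields per user rather than the full history of clicks and orders.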
You'll also notice that none of the three systems interacts directly with our cluster. The cluster consumes the data output by these systems (mainly the e-commerce platform) and then produces data that these systems can use, especially the marketing email application and the recommendation application. When we loosely couple our designs in this way, we can modify our systems and remain confident that we won't break our analytics system. Similarly, we don't have to embed all of our data in any of our applications. That data can stay in a data lake that our managed cluster works on, and the marketing and recommendation applications can deal with the condensed user profiles.
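The loose coupling described here can be sketched with a temporary directory standing in for the data lake. In this rough illustration, every function name, file name, and record field is an assumption made for the example, not part of the AWS reference architecture. The point is only that the three pieces share files in the lake and never call each other directly.

```python
import json
import tempfile
from pathlib import Path


def ecommerce_write_events(lake: Path, events):
    """The e-commerce platform only appends raw events to the lake."""
    with open(lake / "events.jsonl", "a", encoding="utf-8") as f:
        for event in events:
            f.write(json.dumps(event) + "\n")


def cluster_job(lake: Path):
    """The analytics job reads raw events and writes condensed profiles."""
    viewed = {}
    with open(lake / "events.jsonl", encoding="utf-8") as f:
        for line in f:
            event = json.loads(line)
            viewed.setdefault(event["user"], set()).add(event["product"])
    profiles = {user: sorted(products) for user, products in viewed.items()}
    (lake / "profiles.json").write_text(json.dumps(profiles), encoding="utf-8")


def recommender_read_profile(lake: Path, user: str):
    """The recommendation application sees only the condensed profiles."""
    profiles = json.loads((lake / "profiles.json").read_text(encoding="utf-8"))
    return profiles.get(user, [])


def demo():
    """Run all three pieces against a throwaway directory acting as the lake."""
    with tempfile.TemporaryDirectory() as tmp:
        lake = Path(tmp)
        ecommerce_write_events(lake, [
            {"user": "u1", "product": "tent"},
            {"user": "u1", "product": "stove"},
            {"user": "u2", "product": "boots"},
        ])
        cluster_job(lake)
        return recommender_read_profile(lake, "u1")


if __name__ == "__main__":
    print(demo())
```

Because the only contract between the pieces is the shape of the files in the lake, the e-commerce platform could be rewritten without touching `cluster_job`, as long as it keeps writing events in the same format.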
Notice that, once again, while there is a lot going on around our analytics system, the analytics system at its core is a data lake and a managed cluster. In my book, Mastering Large Datasets with Python, I detail how you can set up a base analytics system in chapters 11 and 12. You can buy that book at Manning.com.
July 19, 2019