Content Distribution System (aka 'H')
Platform: Web - NodeJS, Express.js, Handlebars
‘H’ is a collection of Node.js microservices oriented to the distribution of content using different channels. The content was stored in a MySQL database using a proprietary CMS format. Then one of those microservices retrieve each piece of content and normalize it to a well-structured & easy to use JSON abstraction. Finally, each service renders that abstraction to specific channel’s format, adapting the content to follow each distribution channel policies & restrictions (i.e. Google AMP, Facebook Instant Articles, MSN, Google News). This microservices net support ~25 requests/sec using only one service with 5 instances of them running, coordinated by PM2 process manager with an NGINX as reverse proxy supporting ~1000 req/sec.
Having an article base of several tens of thousands is a positive thing for any publisher and a source of pride for its editorial team. One characteristic of this type of collection is that it has usually been produced over the course of many years and for this reason, its HTML has suffered the passage of time. Over the years recommendations have changed, teams and tools have matured, but those changes rarely impact on historical archives. So, many experiments with different types of content, embeds of products that no longer exist, others that have changed their format and a long and so on, are part of that list of elements that make up the entire information base.
The real pain arises when, due to the publishing business, such content must be sent through different distribution channels. For example, Facebook Instant Articles, Google Accelerated Pages (AMP), Google News, Apple News, smart TVs and native apps on mobile devices. At that moment you suffer "translating" the old, outdated and bad HTML, trying to remove everything that is not compatible with each channel, having to make costly and not reusable adaptations.
The solution is to generate an abstraction of the HTML of each article and after having it in an intermediate format, build rules to convert it to each distribution format. So, in the future adding more distribution channels only implies the definition of a new "conversor" that map each element of the intermediate format with the desired representation.
These are the core capilities of "H":
- HTML to json-article conversion.
- json-article conversion to each distribution channel format through an easy to implement "translator" with a recursive-asynchronous algorithm under the hood.
- Custom logger with convenient format and traceability indicators to follow request's path.