
12 April 2024

Integrating a Product Referential System: the REST API Proxy Pattern

Integrating an off-the-shelf data management system can be more challenging than paying a SaaS license and wiring up a few REST APIs. This post discusses a practical use case: integrating the Syndigo (formerly Riversand) Product Information Management (PIM) system with our global information system. We finally got through, but it was not as straightforward as the slideware suggested.

Although the tone of this article might sometimes seem a bit critical of the solution, we are convinced that the Syndigo product was a good choice overall for L’Occitane in terms of features and adaptability. Another disclaimer is that we report observations made during implementation over the past three years; they might not all hold true today.

Moreover, the pattern presented here can be – and actually is – adapted to other systems.

This post will discuss the context and implementation of this proxy pattern. In a second post, we will walk through the various levels of our testing strategy, which is a cornerstone of the overall platform’s stability and sustainability.

The Context

As a cosmetics manufacturer and retailer, we provide products worldwide to end customers (B2C) and distributors (B2B). Each product has information relevant to those customers’ choices and must be made available to a wide range of tools, such as eCommerce websites, cashier points of sale, CRM applications, or data scientist magic. The information is fed from different systems and manually enriched by marketing and digital teams. Distributing worldwide implies content translation and adaptation (a product sale claim is different in the UK than in the US), custom fields, and, of course, specific local products.

“We want a single source of truth system for product ‘cold’ information exposed to customers.”

The Product Referential Scope

The referential is meant to hold only a limited set of product information; other data is filtered out either because it does not face the end customer or because it evolves continuously. Typically, we do not handle R&D workflows (we wait for a product to exit the pipeline), prices (they are edited by country and channel), stock levels at the different supply stages, or gross packaging and logistics.

Yet, even if limited, a lot of data must be made available, such as product description, size and volume, classifications, commercial status, availability date and successors, digital and marketing-specific claims and disclaimers, ingredient composition and formula description, and many more.

Moreover, products can be grouped into a kit (a packaged set of products) and have siblings and other links to various entities. In the end, a product description contains more than 150 fields.

Where is all this Information Coming From?

Information is primarily fed from other sources, such as SAP, R&D, or SharePoint tables. For this, the product referential acts as a data cleansing cache – many defects can pile up in decades-old systems. Secondly, half of the information is manually managed by various teams, such as digital, marketing, or local markets, following internal validation workflows.

Why a Master System?

The answer is in the architect’s textbook: have a single source of truth for all downstream systems. The alternative is to handle a myriad of applications, flows, and manual operations. This pattern hardly scales, causes quality issues, and requires much tedious effort. Any new requirement opens the door to Byzantine architectural discussions, ugly shortcuts, a long time to market… and much frustration.

Time for Change: Buying a PIM off the Shelf

A natural solution was to converge on a PIM, which would provide five main features. First, a data store customizable to our internal needs. Second, the ability to provide programmatic controls, either to populate fields based on others or to validate some consistency – we will call these business rules. Third, taxonomy handling. Fourth, a tool for internal users to enter information, with various privileges and validation workflows. Finally, this PIM should communicate smoothly with publisher and subscriber systems. These systems are often proprietary and have little ability to customize integration; many vendors have not yet widely adopted REST APIs or pub/sub patterns.

Build versus Buy

L’Occitane is not a tech company. Even if we develop our own products when it strategically makes sense, our goal is to find the most pragmatic approach and opt for vendor solutions whenever possible. An RFP was launched, and we selected the Syndigo – formerly Riversand – solution.

Reality Kicked In: the Walls We Hit

“Integration will be a piece of cake.”

The vendor slideshows were terrific. They assigned senior tech folks to assess and meet our needs. The REST API checkbox was ticked for integration when they provided Postman collections. Everything looked perfect. An integrator was selected. The deal was signed.

Intrinsic PIM (and Integrator’s) Limitations

Soon, we discovered the first roadblocks, and several others popped up along the journey. We will expose the most significant challenges we met, while the implementation section will describe how we tackled them.

REST API

Almost all organizations claim to be agile, and most software vendors are convinced they offer a REST API. They prove the latter by providing a Postman collection—with, alleluia! JSON payloads—which reassures everyone at the discovery phase. We do not expect everyone to stand at Richardson’s level 3; however, a closer look made a few alarms blink.

The first alarm was maturity. We quickly discovered that we stood far from level 2: verbs such as DELETE, GET, or POST were not used in the appropriate context. We were not even at level 1, with standard resource access. We were deep in the level 0 swamp…

Secondly, the integration was intricate. For example, creating a new product should be a simple POST command. On the slide, it said: “Post the entity payload, wait for a 200 response status, and celebrate!” In practice, the 200 status comes back in any case; it simply states that some request has been received. To learn whether it succeeded or failed, you must retrieve a requestId, then query the request – multiple times, until completion –, parse its intricate response message, and finally celebrate. Even if the overall creation process lasts only a few seconds, one must follow all these steps. The same goes for deleting, updating, etc. We will not even mention retry or pagination schemes here.
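
To make this concrete, here is a minimal sketch of the client-side dance this implies, in Kotlin. The endpoint paths, JSON fields, and status values are illustrative assumptions for the sake of the example, not Syndigo’s documented API:

```kotlin
import java.net.URI
import java.net.http.HttpClient
import java.net.http.HttpRequest
import java.net.http.HttpResponse

const val MAX_POLLS = 30
const val POLL_INTERVAL_MS = 1_000L

fun createEntity(client: HttpClient, baseUrl: String, payloadJson: String): String {
    // Step 1: POST the entity. The 200 only means "a request has been received".
    val post = HttpRequest.newBuilder(URI.create("$baseUrl/entities"))
        .header("Content-Type", "application/json")
        .POST(HttpRequest.BodyPublishers.ofString(payloadJson))
        .build()
    val ack = client.send(post, HttpResponse.BodyHandlers.ofString())
    check(ack.statusCode() == 200) { "Submission rejected: HTTP ${ack.statusCode()}" }

    // Step 2: retrieve the requestId from the acknowledgement (naive parsing).
    val requestId = Regex("\"requestId\"\\s*:\\s*\"([^\"]+)\"")
        .find(ack.body())?.groupValues?.get(1)
        ?: error("No requestId in acknowledgement")

    // Step 3: query the request, multiple times, until it reaches a terminal state.
    repeat(MAX_POLLS) {
        val poll = client.send(
            HttpRequest.newBuilder(URI.create("$baseUrl/requests/$requestId")).GET().build(),
            HttpResponse.BodyHandlers.ofString(),
        )
        when {
            "\"status\":\"Completed\"" in poll.body() -> return requestId // finally celebrate
            "\"status\":\"Failed\"" in poll.body() -> error("Request $requestId failed")
            else -> Thread.sleep(POLL_INTERVAL_MS) // still in flight; wait and retry
        }
    }
    error("Timed out waiting for request $requestId")
}
```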

Thirdly, the internal model is exposed with all its intrinsic complexity, which makes sense for the tool itself but not for an outsider. With all the implied constraints, learning and maintaining such complexity is unrealistic for developers who simply want a product description.

Finally, we will be charitable enough not to discuss the authentication/authorization schema.

Processing rules

Syndigo’s platform allows one to write custom code to validate some fields or generate others. These are called “business rules,” expressed in a proprietary language similar to Microsoft macros. The desired logic soon goes beyond triviality, for example when we want to answer “Does this product contain allergens, based on its ingredient list and product line?” Beyond the language and the programming interface themselves, there is no way to test the result automatically, which generates long development times and regression issues. Moreover, even in simple cases, performance becomes a severe issue.

Data coherence

Entities can have relations with each other: a kit contains several products, or a product is related to its variations. This functionality is available. For example, kit k consists of product p1 once and product p2 twice. Leave aside that, at creation time, the internal IDs of both products must first be gathered, as one cannot simply rely on the external unique identifier. Worse, if for some reason product p2 is removed from the catalog, no cascading error message will be triggered.
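
A consumer therefore has to detect dangling references itself. A minimal sketch of such a check, with an illustrative data model of our own (not the PIM’s schema):

```kotlin
// Illustrative model: the field names are ours for the example, not the PIM's schema.
data class KitLine(val productId: String, val quantity: Int)
data class Kit(val kitId: String, val lines: List<KitLine>)

// The platform raises no cascading error when a kit component disappears,
// so the check must be done explicitly against the current catalog.
fun findDanglingComponents(kit: Kit, catalogIds: Set<String>): List<String> =
    kit.lines.map { it.productId }.filterNot { it in catalogIds }

fun main() {
    // Kit k contains p1 once and p2 twice; p2 has been removed from the catalog.
    val k = Kit("k", listOf(KitLine("p1", 1), KitLine("p2", 2)))
    println(findDanglingComponents(k, setOf("p1", "p3"))) // -> [p2]
}
```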

The platform evolution

Every quarter, a new version is released. The – exposed – internal data model can rightfully be enriched. But more treacherous “improvements” can also be unleashed, such as asynchronicity in a previously straightforward operation. Of course, release notes – if ever read – do not always mention those intricacies.

Handling environments

As a standard, we have three environments: production, user acceptance test (UAT), and development. Let’s ignore the fact that all developers share the latter, with the concurrency problems that inevitably result. Configuration changes made on the dev instance must be cascaded to UAT and finally to prod. This is not achieved through continuous deployment – though theoretically possible – but manually, via downloading and uploading various Excel files. The process would be bearable if there were tooling to assess the differences between environments and to limit, for example, the recurrent risk posed by hotfixes configured directly in prod but not cascaded down.

How to move forward?

In this section, we have exposed some practical integration challenges. What should we do with them? The first remark would be, “Why did you select this tool?” The first answer is that the tool meets most of our PIM business needs and is provided by a major vendor. The second answer is that we see similar patterns with many other applications. The last answer is that we discovered most of these problems during the project implementation phase.

So, we decided to follow the motto “trust the team” and to find solutions.

The Integration Response: Implementing a Proxy

Historically, the default pattern would have been to embed the answers to all the previous challenges in our WebMethods ESB. We did follow this trail at first, but it soon turned out to be too tedious.

POC: is it possible to circumvent the level 0 API flaws?

The first question was: “Is it possible to proxy Syndigo’s REST API and expose it through Richardson’s level 2, with verbs, error handling, and resource access?” We launched a three-day Proof Of Concept.

A TypeScript module was written, proving reliable enough to hide Syndigo’s complications in a handful of selected basic and riskier scenarios.

More to the POC: a command line interface

POC code should not be turned into production code. Hence, we had chosen a pragmatic language that would not be our default choice for a longer-term development effort. However, we have pursued the development of this TypeScript library and written a series of command-line tools. These scripts are used daily to list entities, delete some, and report statistics. Beyond manipulating Syndigo entities, scripts were also written to report configuration divergences between the different deployment environments or to reformat hardly readable business rules.

The REST API Proxy Core Application

The core proxy application was developed in Kotlin – think of it as a modern Java – on a SpringBoot stack, a technology commonly used in the company.

The proxy’s primary purpose is to meet most of the challenges described in the section “Reality Kicked In.” It exposes a more maneuverable REST API that hides Syndigo’s complications, instabilities, and evolutions.
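
To give an idea, here is a sketch of the kind of resource-oriented surface the proxy exposes, in Kotlin with SpringBoot. Paths, the DTO, and the service interface are illustrative, not our production contract:

```kotlin
import org.springframework.http.ResponseEntity
import org.springframework.web.bind.annotation.*

// Illustrative DTO and service facade, not the real product model.
data class ProductDto(val sku: String, val name: String)

interface ProductService {
    fun find(sku: String): ProductDto?
    fun upsert(product: ProductDto): ProductDto
    fun delete(sku: String): Boolean
}

@RestController
@RequestMapping("/products")
class ProductController(private val service: ProductService) {

    @GetMapping("/{sku}")
    fun get(@PathVariable sku: String): ResponseEntity<ProductDto> =
        service.find(sku)?.let { ResponseEntity.ok(it) }
            ?: ResponseEntity.notFound().build() // a real 404, not a generic 200

    @PutMapping("/{sku}")
    fun put(@PathVariable sku: String, @RequestBody body: ProductDto): ProductDto =
        service.upsert(body.copy(sku = sku)) // the asynchronous Syndigo dance is hidden here

    @DeleteMapping("/{sku}")
    fun delete(@PathVariable sku: String): ResponseEntity<Void> =
        if (service.delete(sku)) ResponseEntity.noContent().build()
        else ResponseEntity.notFound().build()
}
```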

Feeding data into the PIM

As shown in the figure above, data flows in from various systems through our ESB, which reformats it. The ESB delivers the data to the proxy API and waits for a 200 HTTP status.

The proxy first orchestrates Syndigo’s workflows. Second, it corrects data quality issues, catching flaws piled up by decades of history and the myriad of tools and humans involved in upstream production. Third, business rules that are too complex or time-consuming to implement in the backend are also handled at this level. Finally, it enforces some relation consistency.
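
As an example of a rule moved to the proxy, here is a hedged sketch of the allergen question mentioned earlier. The allergen list is an illustrative subset and the real rule also considers the product line; the point is that such a rule becomes plain Kotlin code that can be unit-tested:

```kotlin
// Illustrative only: the allergen list is a placeholder subset, and the real
// rule takes more inputs than the ingredient list alone.
data class Ingredient(val inciName: String)

val KNOWN_ALLERGENS = setOf("LINALOOL", "LIMONENE", "CITRAL")

// The rule becomes a pure function: trivial to unit-test, fast to execute,
// and deployed with the proxy rather than configured in the PIM.
fun containsAllergens(ingredients: List<Ingredient>): Boolean =
    ingredients.any { it.inciName.uppercase() in KNOWN_ALLERGENS }
```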

A classic software development stack and practices make it far more efficient to continuously test and deploy all these features (a follow-up article will focus on testing this proxy).

Exposing and feeding data to consumer applications

Of course, the preferred way to provide product information is through the proxy REST API itself. All business actors understand the exposed format, and the query syntax for accessing resources or filtering lists remains simple.

As a worldwide retail company, we ship our products… worldwide. This implies, of course, translation, but also adapting information to the region or country context. Syndigo offers this feature, but the exposed data is… not trivial to comprehend. The proxy instead accepts simple arguments such as country=JP or locale=fr-CA, hiding all the complexity from consumers.
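
A sketch of what this looks like on the proxy side; the parameter names mirror the examples just quoted, but the locale resolution and the lookup itself are simplified placeholders:

```kotlin
import org.springframework.web.bind.annotation.*

@RestController
@RequestMapping("/products")
class LocalizedProductController {

    // e.g. GET /products/ABC123/description?country=JP
    //      GET /products/ABC123/description?locale=fr-CA
    @GetMapping("/{sku}/description")
    fun description(
        @PathVariable sku: String,
        @RequestParam(required = false) country: String?,
        @RequestParam(required = false) locale: String?,
    ): String {
        // Translate the simple arguments into a concrete locale, with a default.
        val resolved = locale ?: country?.let(::defaultLocaleFor) ?: "en-US"
        return fetchDescription(sku, resolved)
    }

    private fun defaultLocaleFor(country: String): String = when (country.uppercase()) {
        "JP" -> "ja-JP"
        "FR" -> "fr-FR"
        else -> "en-US"
    }

    private fun fetchDescription(sku: String, locale: String): String =
        "description of $sku in $locale" // placeholder for the real PIM lookup
}
```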

Unfortunately, many tools in our information system cannot query and process a REST API. They are often closed proprietary systems that expect rigid protocols, usually through flat files. One way is to push product updates through our ESB. The ESB will read and publish the new information to the concerned systems when notified of a product change.

Finally, another pattern has proven convenient: scheduled jobs filter a list of products, transform them to the target system format, and push the result. This latter pattern has the advantage of allowing extensive autonomy to consumer teams.
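
A minimal sketch of such a job with SpringBoot’s scheduler; the filter, the flat-file format, and the drop location are placeholders that a consumer team would own:

```kotlin
import org.springframework.scheduling.annotation.Scheduled
import org.springframework.stereotype.Component
import java.nio.file.Files
import java.nio.file.Paths

// Illustrative source facade; @EnableScheduling must be set on the application.
interface CatalogSource {
    fun findActiveProducts(): List<Pair<String, String>> // (sku, name) pairs
}

@Component
class NightlyCatalogExport(private val catalog: CatalogSource) {

    @Scheduled(cron = "0 0 2 * * *") // every night at 02:00
    fun exportActiveProducts() {
        val lines = catalog.findActiveProducts()               // 1. filter a product list
            .map { (sku, name) -> "$sku;$name" }               // 2. transform to the target format
        Files.write(Paths.get("/exports/catalog.csv"), lines)  // 3. push the flat file
    }
}
```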

Caching

The underlying Syndigo platform cannot meet our SLOs, typically for large downloads. To address the issue, we integrated the SpringBoot caching mechanism, at the slight price of refresh delay.
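
In practice, this boils down to the standard Spring cache abstraction. A minimal sketch, with illustrative cache and method names:

```kotlin
import org.springframework.cache.annotation.CacheEvict
import org.springframework.cache.annotation.Cacheable
import org.springframework.cache.annotation.EnableCaching
import org.springframework.context.annotation.Configuration
import org.springframework.stereotype.Service

@Configuration
@EnableCaching
class CacheConfig // enables Spring's cache abstraction (in-memory by default)

@Service
class CachedCatalogService {

    // First call hits the PIM; subsequent calls are served from memory.
    @Cacheable("fullCatalog")
    fun fullCatalog(): List<String> = downloadFullCatalogFromPim()

    // The "slight price": data is stale until the cache is refreshed or evicted.
    @CacheEvict("fullCatalog", allEntries = true)
    fun refresh() { /* scheduled, or triggered on product-change notifications */ }

    private fun downloadFullCatalogFromPim(): List<String> =
        emptyList() // placeholder for the slow upstream call
}
```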

Access management

Information security has yet to become a concern throughout the whole industry. To avoid exposing Syndigo’s – weak – authentication process and to control access, our preferred route is to go through our API manager.

Deployment

The Proxy API exposes product information to internal and external systems. Our natural choice was to host the application on a public cloud (Azure) and build, test, and deploy it through Azure DevOps tooling.

Lessons Learned

Ultimately, we could benefit from the vendor’s platform while hiding many of its intricacies from the external world.

We have learned many lessons during this product development. The first was that, with a good team and – a lot of – optimism, it was possible to disentangle many dismaying situations. Another lesson was that even though we moved early to the proxy pattern, some logic had already been implemented inside the ESB. But the central lesson – for me, at least – was to realize how far away an entire ecosystem of vendors and integrators is from well-established software development methodology. “Agility,” “automated testing,” “continuous deployment,” “peer review,” “software factory,” “git” – and many, many more – are words that are blatantly unknown to a large part of our industry.

Finally, we came to a disturbing finding. Even though Syndigo is a significant player in the field and people there have responded to many of our requests, we felt alone addressing the seemingly fundamental integration challenges presented in this article. How do other people meet these challenges? This is a mystery we would love to unveil. Do not hesitate to contact us.

Coming next: How do we test all that?
