Pattern for Transactional Full Stack Application
I was recently involved in a service that performs subscription transactions for a SaaS platform. Customers can buy a number of seats of base products in different tiers online, paying periodically on a per-seat basis. There are also add-on products that are per-seat recurring, flat-cost recurring, or a flat one-time charge. Customers can also choose among different billing periods, and they may trial certain products.
There are also various business rules. Customers get a specific discount when the seat count is above a specific number, trial customers can’t purchase more than 3 seats, only customers of certain tiers can change billing period, and IP addresses from the USA can’t purchase specific products. We can also send promotional links that give a customer a discount when adding a specific add-on.
Similar problems exist in other applications. A stock broker might allow a trader to enter a buy or sell order, limit or market, on a specific ticker. A restaurant might allow a customer to order a number of pizzas with up to 5 different toppings in 2 different sizes.
Here are some problems that we have already found.
- Some transactions that the front end lets the customer choose fail in the back end once they are submitted.
- Some customers’ transactions are corrupted. For example, some are charged the monthly amount although they pay annually. It is not clear how they got there (hacked?).
- We have complex code that we have to keep in sync between two different modules. Sometimes those modules are even in different repos (for example, replicated in both the front-end repo and the back-end repo).
- There is an explosion of cases: sign up, upgrade, buy seats, billing period change, cancellation, reactivation, month to annual. Each of these transactions can be entered directly online or through a promotional link that only allows a customized variation of the transaction. At least 14 cases are present.
- When an external SaaS service has to be reconfigured without any actual change to our own functionality, we have to make a lot of changes to our code too. For example, we use the Zuora billing service, and there are cases where we need to change a license product defined in Zuora for accounting purposes. To do that, we have to rewrite a bunch of our code, including front-end code that references the old product by Zuora Product ID.
- There are a lot of postmortems, about one every month.
Deep down, the real problems are:
- Validation logic is duplicated in the front end and the back end.
- Code is not reused across transactions, resulting in more edge cases.
- Transaction consistency is not inherently guaranteed.
- There is no regression suite to build confidence.
- Zuora configuration is coupled to our code base.
We made some choices along the way to solve this type of problem. One is to abstract every transaction interface into three schemas: Context, Proposal, and Request. The ideas behind these three schemas are:
- The schemas are mostly transaction independent, so the code is sharable.
- The same request is validated against the proposal in both the front end and the back end.
- The same validation logic (the same “Turing machine”) is used to validate the request against the proposal.
- Nothing except the proposal can determine whether a request is valid, which means there is no hidden contract between front end and back end.
- The proposal has a “version” built in, and the request must contain the same version to avoid race conditions.
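As a minimal sketch of the three schemas and the shared validation step, assuming hypothetical field names (`min_seats`, `max_seats`, `billing_periods`, `add_ons`) that are illustrative rather than taken from the actual service:

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Context:
    """Who is asking: inputs that alter which proposal is produced."""
    user_id: str
    ip_country: str
    coupon_code: str = ""


@dataclass(frozen=True)
class Proposal:
    """Everything the user is allowed to submit (field names are assumed)."""
    version: str
    min_seats: int
    max_seats: int
    billing_periods: Tuple[str, ...]
    add_ons: Tuple[str, ...]


@dataclass(frozen=True)
class Request:
    """What the user actually submitted."""
    version: str
    seats: int
    billing_period: str
    add_ons: Tuple[str, ...] = ()


def validate(proposal: Proposal, request: Request) -> List[str]:
    """Return the list of violations; an empty list means the request is allowed.

    Only the proposal is consulted, so the same function can run on both sides.
    """
    errors = []
    if request.version != proposal.version:
        errors.append("stale proposal version")
    if not proposal.min_seats <= request.seats <= proposal.max_seats:
        errors.append("seat count not allowed")
    if request.billing_period not in proposal.billing_periods:
        errors.append("billing period not allowed")
    for add_on in request.add_ons:
        if add_on not in proposal.add_ons:
            errors.append(f"add-on not allowed: {add_on}")
    return errors
```

Because `validate` takes nothing but the proposal and the request, any rule that is not expressed in the proposal simply cannot affect the outcome, which is the property the list above asks for.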
Below is a block diagram showing what produces the context, proposal, and request. The context is usually identifying information, a discount code, and other inputs that alter the proposal. The proposal is used to render the form for the user. The request must be allowed by the proposal and is sent to the back end together with the context.
The executor obtains the proposal directly from the back end. If the contract between front end and back end is 100% encoded in the proposal, the executor can validate that the request is allowed by the contract. This validation includes ensuring that the state being modified in the back end hasn’t been mutated since the proposal was calculated.
The context contains who the user is, the IP address, the coupon code, and anything else that determines which proposal the back end presents. The proposal encodes exactly what the user is allowed to submit in a request. When only one request is allowed and the user has no flexibility, the proposal can be identical to the request object.
Here are example objects for Context, Proposal, and Request:
The key idea is to serialize the proposal, which eradicates implicit contracts and enforces code reuse.
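The flow from context to proposal to executor can be sketched as follows. This is an illustration under stated assumptions, not the service's actual code: the dictionary keys, the `pro` tier, the `premium_support` add-on, the seat cap of 1000, and the choice to derive the version from a hash of the account state are all hypothetical.

```python
import hashlib
import json


def build_proposal(context: dict, account: dict) -> dict:
    """Derive what the user may submit from the context and current account state."""
    # Assumption: the version is a digest of the state the proposal was computed from.
    state = json.dumps(account, sort_keys=True).encode()
    return {
        "version": hashlib.sha256(state).hexdigest()[:12],
        # Trial customers can't purchase more than 3 seats (1000 is an assumed cap).
        "max_seats": 3 if account["trial"] else 1000,
        # Only a certain tier (hypothetically "pro") may change billing period.
        "billing_periods": ["monthly", "annual"] if account["tier"] == "pro"
        else [account["billing_period"]],
        # A hypothetical add-on that is not sold to US IP addresses.
        "add_ons": [] if context["ip_country"] == "US" else ["premium_support"],
    }


def check(proposal: dict, request: dict) -> bool:
    """Shared validation: consults the proposal and nothing else."""
    return (
        request["version"] == proposal["version"]
        and 1 <= request["seats"] <= proposal["max_seats"]
        and request["billing_period"] in proposal["billing_periods"]
        and all(a in proposal["add_ons"] for a in request["add_ons"])
    )


def execute(context: dict, request: dict, account: dict) -> str:
    """Executor: re-derive the proposal in the back end, then validate and apply."""
    proposal = build_proposal(context, account)
    if not check(proposal, request):
        return "rejected"
    account["seats"] = request["seats"]
    account["billing_period"] = request["billing_period"]
    return "accepted"


account = {"trial": True, "tier": "basic", "billing_period": "monthly", "seats": 1}
context = {"user_id": "u1", "ip_country": "CA"}

# The proposal crosses the wire as JSON; the front end validates against the copy.
proposal = json.loads(json.dumps(build_proposal(context, account)))
request = {"version": proposal["version"], "seats": 2,
           "billing_period": "monthly", "add_ons": []}

print(check(proposal, request))            # front-end check
print(execute(context, request, account))  # back-end check and apply
print(execute(context, request, account))  # replay: state changed, version is stale
```

In this sketch the replayed request is rejected because the account state, and therefore the version, changed after the first execution. That is one way the version check can guard against a request built from an outdated proposal.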
End to End Regression
One of the key realizations is that the previous generation of engineers did not set out to avoid reusing code. They had no end-to-end regression suite, so they felt it was risky to alter or refactor an existing flow in order to reuse it in a new one. Every time a new flow was created, the default decision was to copy all the code into the new flow. Over time, the flows diverged, and each accumulated its own tribal knowledge to the point of no return.
In the new iteration, we built a massive regression suite over time. See a snapshot of it. It runs every few hours and builds strong confidence in the team. Each green dot below represents about 4 to 5 tests. As of today we have 92 tests from 17 contributors over the course of 9 months.
When you have a massive Cypress suite, you will encounter stability issues. Here are some solutions:
- Run against dev and staging at the same time, and consider the test as passing if it passed in one of the environments.
- Write a wrapper that captures the Cypress output and checks whether the failure is due to one of the known stability reasons. If it is, rerun in a different Docker container. If it is not, such as an assertion mismatch, don’t rerun.
- Rerun the test once if it failed.
- Mark the test as unstable rather than failed when specific output is seen (such as Chrome crashing), to minimize false alarms.
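The wrapper and rerun policy described above might look like the sketch below. The marker strings are assumptions and would have to be collected from real CI logs, and `run` is a hypothetical callable that starts Cypress against one environment (in practice, in a fresh container) and returns its exit code and output.

```python
from typing import Callable, Sequence, Tuple

# Assumed markers of infrastructure problems; real ones depend on the setup.
INFRA_MARKERS = ("chrome crashed", "renderer process just crashed")


def classify(output: str, exit_code: int) -> str:
    """Map one run to 'passed', 'unstable' (infrastructure), or 'failed'."""
    if exit_code == 0:
        return "passed"
    lowered = output.lower()
    if any(marker in lowered for marker in INFRA_MARKERS):
        return "unstable"
    return "failed"  # e.g. an assertion mismatch: a real failure


def run_with_policy(
    run: Callable[[str], Tuple[int, str]],
    environments: Sequence[str],
    max_attempts: int = 2,
) -> str:
    """Run in every environment, rerunning only on infrastructure problems."""
    verdicts = []
    for env in environments:
        verdict = "unstable"
        for _ in range(max_attempts):
            exit_code, output = run(env)
            verdict = classify(output, exit_code)
            if verdict != "unstable":
                break  # passed or genuinely failed: no rerun
        verdicts.append(verdict)
    if "passed" in verdicts:
        return "passed"  # passing in one environment is enough
    if "failed" in verdicts:
        return "failed"
    return "unstable"  # reported separately to avoid false alarms
```

Injecting `run` keeps the policy testable without Cypress: a fake that returns canned exit codes and output is enough to exercise every branch.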
We also have the following learnings:
- Capture the output of the back end and store it in an “event” system that can be queried from Cypress to validate that the app does the right thing.
- Make a call between pre-merge and post-merge testing. Pre-merge testing prevents a failure from one developer from propagating to other developers, but it also makes it hard to test the back end.
Expanded Transaction Model
We also had to clarify our transaction model along the way because of the complexity, particularly the number of internal micro-services, databases, and external services we integrate with.
The transaction model is typically considered a database concept. Most of the discussion is about ACID, CAP, sharding, locking and optimistic locking, and 2PC. The Expanded Consistency Model refers to retaining consistency across databases, micro-services, and even corporate boundaries.
The common theme in those situations is that the database-centric ACID model is not always realistic. Even if it could be made realistic, it is not clear how it would scale or how the parts could evolve separately. However, I’d still use the existing database transaction model to build the expanded consistency model.
In the expanded consistency model, we still try to build consistency from units that are themselves transactional. We call those units “operations”. Operations fall into the following types:
- Non-mutation operation
- Transactional operation (those that can be rolled back together)
- External operation (non-revertible; only one is allowed, and it must be all or nothing)
- Best-effort operation
- Idempotent operation
Below is the order of execution for the above types of operations. One key thing is where the Transaction Scope is. Non-mutation operations must be done before the Transaction Scope, and best-effort operations should be done after it (their failure should not cause the whole request to fail). Only one non-revertible operation is allowed, and it must be placed at the end of the Transaction Scope. Watch out for post-commit hooks: if they perform non-transactional operations, they are effectively best-effort operations.
Idempotent operations should be queued in a task queue and executed asynchronously after the request completes.
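The ordering above can be sketched as a single function. It assumes a hypothetical `db` object exposing `begin`, `commit`, and `rollback`, and a plain list standing in for the task queue. None of these names come from the original system.

```python
from typing import Callable, List, Optional

Operation = Callable[[], None]


def run_request(
    reads: List[Operation],
    transactional: List[Operation],
    external: Optional[Operation],
    best_effort: List[Operation],
    idempotent: List[Operation],
    db,
    task_queue: List[Operation],
) -> bool:
    """Execute one request in the order described above; return True on success."""
    # 1. Non-mutation operations run before the transaction scope.
    for op in reads:
        op()

    # 2. Transaction scope: everything here is rolled back together.
    db.begin()
    try:
        for op in transactional:
            op()
        # 3. The single non-revertible operation goes last in the scope,
        #    so a failure before or during it can still roll back our state.
        if external is not None:
            external()
        db.commit()
    except Exception:
        db.rollback()
        return False

    # 4. Best-effort operations: a failure must not fail the request.
    for op in best_effort:
        try:
            op()
        except Exception:
            pass  # a real system would log this

    # 5. Idempotent operations are queued and executed asynchronously later.
    task_queue.extend(idempotent)
    return True
```

One gap this sketch does not close: if `db.commit()` itself fails after the external operation has already succeeded, the two sides disagree, so that case still needs separate handling such as reconciliation.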