Definition

The fundamental facility provided by Feature Toggles is the ability to ship alternative codepaths within one deployable unit and choose between them at runtime.

By Pete Hodgson

Concepts / Terminology

| Terminology | Description |
| --- | --- |
| Toggle Router | Function that controls which code branch to take |
| Toggle Point | The location in the code where logic branches based on a flag |
| Toggle Configuration | The location where flags live. Examples include config files, an in-memory store, a DB, or a distributed system with a fancy UI |
| Toggle Context | Additional information, such as the user making the request, used to decide which code path to take. This can take the form of cookies or HTTP headers that identify the user making the request |
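To show how the four terms relate, here is a minimal sketch. The names `ToggleRouter`, `ToggleContext`, `TOGGLE_CONFIGURATION`, `is_enabled` and the `beta-checkout` flag are all hypothetical, and an in-memory dict stands in for the toggle configuration. The cookie-based opt-in is an assumption made only for illustration.

```python
from dataclasses import dataclass, field

# Toggle Configuration: where the flags live (an in-memory dict in this sketch).
TOGGLE_CONFIGURATION: dict[str, bool] = {"beta-checkout": False}


@dataclass
class ToggleContext:
    """Toggle Context: extra request information used to make the decision."""
    user_id: str
    cookies: dict[str, str] = field(default_factory=dict)


class ToggleRouter:
    """Toggle Router: decides which code branch to take."""

    def __init__(self, configuration: dict[str, bool]):
        self._configuration = configuration

    def is_enabled(self, flag: str, context: ToggleContext) -> bool:
        # Assumption for this sketch: a cookie named after the flag opts a user in.
        if context.cookies.get(flag) == "on":
            return True
        # Unknown flags are treated as off.
        return self._configuration.get(flag, False)


def checkout(router: ToggleRouter, context: ToggleContext) -> str:
    # Toggle Point: the place where the code branches on the flag.
    if router.is_enabled("beta-checkout", context):
        return "new checkout flow"
    return "existing checkout flow"
```

Swapping the dict for a config file or a database would change only what is passed to `ToggleRouter`, not the toggle point.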

Types of Feature Flags

| Type | Description | Owner | Context | Longevity | Location |
| --- | --- | --- | --- | --- | --- |
| Release flags | Decouple feature release from code deployment. Allow developers to merge non-production-ready code, minimizing the risk of merge conflicts from long-running branches. Release flags should be the last option you reach for; prefer smaller releases first. | Developers | Build time / run time | One or two weeks | Core |
| Ops flags | Used by system operators to control the application’s operational behaviour. Usually short-lived and retired once operational confidence is developed. That said, a small number can remain as long-lived Kill Switches. Allow less important but memory-hungry features to be switched off during periods of high load. | Operations | Run time | Months to years | Core |
| Experiment flags | Allow for A/B testing. The routing is inherently dynamic (by user/request), but the configuration itself is relatively static. | Product | Run time | Weeks to months | Edge |
| Permission flags | Allow different users to experience the application differently. Users can be separated into internal/alpha/beta/premium groups. This mechanism allows for dogfooding. | Product | Run time | Months to years | Edge |
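An experiment flag usually needs a given user to land in the same cohort on every request. One common way to get that is to hash the user ID into a bucket. The sketch below assumes that approach; the function name `in_experiment_cohort` and its parameters are hypothetical and not taken from the article.

```python
import hashlib


def in_experiment_cohort(experiment: str, user_id: str, rollout_percent: int) -> bool:
    """Return True if the user falls inside the experiment's treatment cohort."""
    # Hash experiment + user so that each experiment splits users independently.
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
    # Map the hash onto a bucket from 0 to 99.
    bucket = int(digest, 16) % 100
    return bucket < rollout_percent
```

The decision varies per user (dynamic routing), while the only configuration is the static `rollout_percent`.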

Dynamic routing vs dynamic configuration

NOTE

The longevity of the flag affects how it should be implemented

Implementation Techniques

Decoupling decision points from decision logic

Consider the following example:

features: dict[str, bool] = fetch_feature_flags()

def generate_invoice_email():
    base_email = build_email_for_invoice()
    # The toggle point reads the raw flag directly
    if features['next-gen-ecomm']:
        return add_order_cancellation_content_to_email(base_email)
    return base_email

This is undesirable because the decision point (generate_invoice_email) has to be aware of the decision logic (the lookup of features['next-gen-ecomm']).

A better solution looks like this:

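import pydantic

features: dict[str, bool] = fetch_feature_flags()
# create_feature_decisions wraps the raw flags in an object that exposes named decisions
config: pydantic.BaseModel = create_feature_decisions(features)
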
def generate_invoice_email():
    base_email = build_email_for_invoice()
    if config.include_order_cancellation_email():
        return add_order_cancellation_content_to_email(base_email)
    return base_email

This way, the decision logic can evolve independently of the decision point. For example, the following can change without touching generate_invoice_email:

  • the feature flag controlling the decision
  • the reason behind the decision (for example, moving from a static config file to A/B testing)
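The snippets above call `create_feature_decisions` without showing it. A minimal sketch of what it might look like follows. It uses a standard-library dataclass instead of the pydantic model the snippet annotates, purely to stay dependency-free, and the class name `FeatureDecisions` is an assumption.

```python
from dataclasses import dataclass


@dataclass
class FeatureDecisions:
    """Collects decision logic so that toggle points never read raw flags."""
    features: dict[str, bool]

    def include_order_cancellation_email(self) -> bool:
        # Today a single flag drives this decision. The body can change later
        # (a different flag, an A/B cohort check) without touching any caller.
        return self.features.get("next-gen-ecomm", False)


def create_feature_decisions(features: dict[str, bool]) -> FeatureDecisions:
    return FeatureDecisions(features=features)
```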

Inversion of control

Pass the decisions object in as a parameter to make the code more configurable and testable:

features: dict[str, bool] = fetch_feature_flags()
config: pydantic.BaseModel = create_feature_decisions(features)
 
def generate_invoice_email(config: pydantic.BaseModel):
    base_email = build_email_for_invoice()
    if config.include_order_cancellation_email():
        return add_order_cancellation_content_to_email(base_email)
    return base_email
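The testability gain can be sketched as follows: because the decisions object is a parameter, a test can inject a stub and exercise both code paths without touching real flag storage. `StubDecisions` and the two email helpers below are hypothetical stand-ins, defined only so that the example runs on its own.

```python
class StubDecisions:
    """Test double that answers the decision with a fixed value."""

    def __init__(self, include_cancellation: bool):
        self._include_cancellation = include_cancellation

    def include_order_cancellation_email(self) -> bool:
        return self._include_cancellation


# Hypothetical stand-ins for the email helpers used in the snippets above.
def build_email_for_invoice() -> str:
    return "invoice"


def add_order_cancellation_content_to_email(email: str) -> str:
    return email + " + cancellation"


def generate_invoice_email(config) -> str:
    base_email = build_email_for_invoice()
    if config.include_order_cancellation_email():
        return add_order_cancellation_content_to_email(base_email)
    return base_email


def test_both_code_paths() -> None:
    assert generate_invoice_email(StubDecisions(False)) == "invoice"
    assert generate_invoice_email(StubDecisions(True)) == "invoice + cancellation"
```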

Use the Strategy Pattern

Removes if statements and simplifies the callsite.

def generate_invoice_email(content_enhancer):
    base_email = build_email_for_invoice()
    return content_enhancer(base_email)

where content_enhancer is either a strategy function or an instance of a concrete strategy class.
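A minimal sketch of the function-based variant is shown below. The strategy names, `choose_content_enhancer`, and the email text are all hypothetical. The single remaining conditional moves to the place where the strategy is selected.

```python
from typing import Callable

Email = str
ContentEnhancer = Callable[[Email], Email]


def identity_enhancer(email: Email) -> Email:
    """Strategy used when the feature is off: leave the email untouched."""
    return email


def order_cancellation_enhancer(email: Email) -> Email:
    """Strategy used when the feature is on (placeholder content)."""
    return email + "\nTo cancel your order, reply to this email."


def choose_content_enhancer(include_cancellation: bool) -> ContentEnhancer:
    # The only conditional lives here, at wiring time, not at the call site.
    return order_cancellation_enhancer if include_cancellation else identity_enhancer


def build_email_for_invoice() -> Email:
    return "Your invoice is attached."  # hypothetical stand-in


def generate_invoice_email(content_enhancer: ContentEnhancer) -> Email:
    base_email = build_email_for_invoice()
    return content_enhancer(base_email)
```

A caller would wire this up once, for example `generate_invoice_email(choose_content_enhancer(True))`.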

Tips for Working with Feature Flags

How to minimize the complexity of feature-flagged systems?

  • Prefer static configuration
    • Allows for source control, observability and reproducibility
  • Expose the application’s current configuration state
    • via a file
    • or some metadata API endpoint
  • Use a structured (schema-ful) configuration file, with metadata for the user, including:
    • Description and recommendation of when to use a flag
    • Creation date
    • Primary developer contact
    • Expiration date for short-lived flags
  • Have a process and strategy for managing different types of flags
  • How to test feature flags?
  • How to prevent feature flag sprawl?
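The metadata fields listed above could be captured in a schema like the following sketch, which also serialises the current state so it can be written to a file or returned from a metadata endpoint. `FlagDefinition`, `FLAGS`, `current_configuration_state`, and the sample values (dates, contact address) are illustrative assumptions.

```python
import json
from dataclasses import asdict, dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class FlagDefinition:
    name: str
    enabled: bool
    description: str                # what the flag does and when to use it
    created: date
    owner: str                      # primary developer contact
    expires: Optional[date] = None  # set for short-lived flags only


FLAGS = [
    FlagDefinition(
        name="next-gen-ecomm",
        enabled=False,
        description="Adds order cancellation content to invoice emails.",
        created=date(2024, 1, 15),  # illustrative date
        owner="dev@example.com",    # illustrative contact
        expires=date(2024, 2, 1),
    ),
]


def current_configuration_state(flags: list[FlagDefinition]) -> str:
    """Serialise the flags, e.g. for a file or a metadata endpoint."""
    # default=str turns the date fields into ISO-style strings.
    return json.dumps([asdict(flag) for flag in flags], default=str, indent=2)
```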

How to test feature flags?

  • Rather than testing every combination of flags, it’s usually sufficient to test:
    • The flags that are expected to be on in the next release
    • All flags on (assuming the default state is “off”)
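With n boolean flags there are 2^n possible combinations, whereas the approach above needs only two configurations. A small sketch of building them, using a hypothetical helper named `configurations_to_test`:

```python
def configurations_to_test(
    all_flags: list[str], next_release_on: set[str]
) -> dict[str, dict[str, bool]]:
    """Build the two flag configurations worth testing, not every combination."""
    return {
        # Flags expected to be on in the next release; everything else stays off.
        "next-release": {flag: flag in next_release_on for flag in all_flags},
        # Every flag flipped on, assuming "off" is the default state.
        "all-on": {flag: True for flag in all_flags},
    }
```

Each resulting dict can then be fed to the test suite as its flag configuration.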

How to prevent feature flag sprawl?

  • Put expiration dates on flags; after that date, a flag defaults to one state or the other
  • A more extreme option is a time bomb that fails the tests or prevents the application from starting once a flag has expired
  • Limit the number of flags. Once a limit is reached, some flags have to be deprecated before new ones are added
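The time bomb and the flag limit could be enforced by a check like the sketch below, run either as a test or at application startup. The function names and the `MAX_FLAGS` value are assumptions chosen for illustration.

```python
from datetime import date

MAX_FLAGS = 10  # hypothetical limit


def expired_flags(expirations: dict[str, date], today: date) -> list[str]:
    """Return the names of flags whose expiration date has passed."""
    return sorted(name for name, expires in expirations.items() if expires < today)


def check_flag_hygiene(expirations: dict[str, date], today: date) -> None:
    """Raise if any flag has expired or the flag limit is exceeded."""
    overdue = expired_flags(expirations, today)
    if overdue:
        raise RuntimeError(f"Expired feature flags must be removed: {overdue}")
    if len(expirations) > MAX_FLAGS:
        raise RuntimeError(f"Too many feature flags: {len(expirations)} > {MAX_FLAGS}")
```

Calling `check_flag_hygiene(expirations, date.today())` from a test fails the build; calling it at startup prevents the application from starting.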

Flag Configuration Management

| Approach | Description | Pros | Cons |
| --- | --- | --- | --- |
| Hardcoded | Hardcode the flag or comment out parts of the code | Super simple | Requires redeployment; commented-out code introduces ambiguity and affects readability |
| Parametrized configuration | Use environment variables or CLI flags on program startup | Allows reconfiguration without re-building an app | Requires re-deployment or an application restart; hard to scale with many applications |
| Configuration files | Application reads configuration from a configuration file | Configuration state of the application is more easily visible; is version-controlled | May require a re-deploy if the file is in version control |
| App DB | Application reads configuration from a DB | Values configurable from the DB; there will be some UI to toggle values | |
| Distributed configuration | For distributed systems where there are multiple node replicas within a cluster | | |
| Overriding configuration | Some default values are overridden based on some context like the environment | Provides flexibility | Overriding configs introduces debugging complexity; it also runs counter to Continuous Delivery ideals, as the override config may not have been tested in CI/CD |
| Per-request overrides | Based on the Toggle Context | Very dynamic, with a granular level of control; allows for A/B testing and canary testing | More complex; security risks, as unreleased (and probably less tested) features are accessible by the public [1] |
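As an illustration of parametrized and overriding configuration combined, the sketch below starts from default values and lets environment variables override them at startup. The flag names, the `FEATURE_` prefix, and the set of accepted truthy strings are assumptions, not conventions from the article.

```python
import os
from typing import Mapping

DEFAULT_FLAGS = {"next-gen-ecomm": False, "verbose-logging": False}  # hypothetical flags
ENV_PREFIX = "FEATURE_"  # hypothetical naming convention


def load_flags(environ: Mapping[str, str] = os.environ) -> dict[str, bool]:
    """Start from the defaults, then apply overrides from environment variables."""
    flags = dict(DEFAULT_FLAGS)
    for name in flags:
        # "next-gen-ecomm" is read from FEATURE_NEXT_GEN_ECOMM
        env_name = ENV_PREFIX + name.upper().replace("-", "_")
        if env_name in environ:
            flags[name] = environ[env_name].strip().lower() in {"1", "true", "on"}
    return flags
```

Because the environment is read once at startup, changing a value still requires a restart, which matches the con listed for parametrized configuration.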

Footnotes

  1. This can be mitigated by cryptographically signing the override configurations.
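One possible way to sign override configurations is a keyed HMAC over the serialised overrides, verified before any override is honoured. The sketch below assumes that scheme; the token format, function names, and `SECRET_KEY` placeholder are hypothetical, and a real key should come from a secret store.

```python
import hashlib
import hmac
import json
from typing import Optional

SECRET_KEY = b"replace-with-a-real-secret"  # placeholder only


def sign_overrides(overrides: dict[str, bool], key: bytes = SECRET_KEY) -> str:
    """Serialise the overrides and append an HMAC-SHA256 signature."""
    payload = json.dumps(overrides, sort_keys=True, separators=(",", ":"))
    signature = hmac.new(key, payload.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"{payload}.{signature}"


def verify_overrides(token: str, key: bytes = SECRET_KEY) -> Optional[dict[str, bool]]:
    """Return the overrides if the signature is valid, otherwise None."""
    payload, separator, signature = token.rpartition(".")
    if not separator:
        return None
    expected = hmac.new(key, payload.encode("utf-8"), hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking signature bytes through timing.
    if not hmac.compare_digest(signature.encode("utf-8"), expected.encode("utf-8")):
        return None
    return json.loads(payload)
```

A request carrying a tampered or unsigned override token is then treated as if it carried no override at all.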