Internal SDKs: When to Build Them and When They Become Liabilities
Most product teams pass through a phase where they build an internal SDK for their own API. The intentions are good: to encapsulate authentication, to provide typed bindings, to give developers a faster path to a working integration. The execution often goes wrong in predictable ways: the SDK lags the API by months, it papers over genuine API mistakes by absorbing them into client logic, and it accumulates abstractions that nobody wants to remove because they are load-bearing for some forgotten use case.
This post covers when an internal SDK is the right call, when it is the wrong call disguised as the right call, the patterns that keep an SDK from becoming a liability, the specific anti-patterns that produce most failures, and the alternative of investing in the raw API surface instead.
When the SDK is the right call
Internal SDKs are the right call when the API is stable enough that yesterday's bindings still work today, when the language ecosystem the SDK targets is large enough that the time saved across all integrators exceeds the time spent building and maintaining the SDK, and when the API has enough surface complexity that the typed bindings are a substantive value add over reading the docs. All three conditions need to hold.
Stability is the easiest one to assess and the easiest one to be wrong about. An API that has been in production for less than six months is almost certainly going to need breaking changes, and an SDK built against a pre-stable API will need rewrites at the same cadence. The discipline of refusing to build the SDK until the API has been stable for at least one release cycle saves more engineering time than any other rule.
Ecosystem size is harder to assess because the temptation is to imagine the SDK serving every integrator equally. The honest accounting, in most integrator populations, is that JavaScript and Python users make up something like 80% of the total, Go and Ruby users another 15%, and everything else is the long tail. An SDK strategy should start with JavaScript and Python and stop there until demand for other languages is proven; going further early over-invests in long-tail languages that almost nobody will use.
Surface complexity is the hardest to assess. A simple REST API with three endpoints and JSON request bodies does not need an SDK; the developer can write the integration faster than they can read the SDK docs. A complex API with stateful workflows, multipart uploads, signed URLs, and pagination conventions is where the SDK earns its keep. A rough threshold is the point where the integration needs more than about a hundred lines of glue code.
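To make the threshold concrete, here is one piece of the glue an integrator writes by hand against a paginated endpoint. This is a minimal sketch: the `fetch_page` callable, the `data` and `next_cursor` field names, and the cursor convention are all hypothetical, not any particular API's contract. Multiply this by uploads, signed URLs, and multi-step workflows and the hundred-line mark arrives quickly.

```python
from typing import Any, Callable, Iterator, Optional

# Hypothetical page shape: {"data": [...], "next_cursor": "abc" or None}
Page = dict[str, Any]


def iter_items(fetch_page: Callable[[Optional[str]], Page]) -> Iterator[Any]:
    """Yield every item across all pages of a cursor-paginated endpoint.

    `fetch_page(cursor)` is a hypothetical callable that performs one HTTP
    request and returns the decoded JSON page; cursor is None for the first page.
    """
    cursor: Optional[str] = None
    while True:
        page = fetch_page(cursor)
        yield from page.get("data", [])
        cursor = page.get("next_cursor")
        if not cursor:  # None or empty string: the last page has been read
            return
```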
When the SDK is the wrong call disguised as the right call
The most common mistake is building an SDK to compensate for an API that has design problems. The SDK absorbs the design problems into client code: unintuitive URL patterns get wrapped in nicer method names; inconsistent response formats get normalized in the SDK; missing fields get filled in with defaults. The result is an SDK that is much better than the underlying API, and an underlying API that nobody is incentivized to fix because the SDK papers over the problems.
The honest move is to fix the API instead. Every line of SDK code that exists to compensate for an API design problem is a line of code that hides the problem from the integrators who will write their own clients in languages the SDK does not support. Those integrators will hit the same problems the SDK papers over, and they will form a worse opinion of the API than they would have if the SDK did not exist to mask the issues.
The second common mistake is building an SDK before the API surface is articulated as a developer-facing product. An API that exists to serve the company's own front end is not a developer-facing product; it is an internal interface that happens to be exposed over HTTP. The SDK for an internal interface is just code reuse, which is fine but should not be marketed as a developer SDK.
Patterns that keep the SDK honest
The SDK should be a thin wrapper. The benchmark is that the curl-equivalent command for any operation should be obvious from the SDK source. If the SDK is doing transformations, retries, caching, or workflow orchestration that the raw API does not do, those features should be promoted into the API or factored out into a higher-level library that is clearly distinct from the SDK.
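A minimal sketch of what "thin" can mean in practice, assuming a hypothetical `/v1/documents` resource and base URL: each method builds exactly one request, and the curl equivalent sits in a comment next to it because the mapping is one-to-one. Requests are returned unsent so the mapping is easy to inspect, and the separate `send` helper does nothing beyond a single HTTP call.

```python
import json
import urllib.request
from typing import Optional
from urllib.parse import quote


class ThinClient:
    """Hypothetical thin client: one method maps to exactly one HTTP request."""

    def __init__(self, base_url: str, token: str) -> None:
        self.base_url = base_url.rstrip("/")
        self.token = token

    def _request(self, method: str, path: str, body: Optional[dict] = None) -> urllib.request.Request:
        # Build a fresh, unsent request per call: no session, pool, or cache.
        data = json.dumps(body).encode("utf-8") if body is not None else None
        req = urllib.request.Request(self.base_url + path, data=data, method=method)
        req.add_header("Authorization", f"Bearer {self.token}")
        if data is not None:
            req.add_header("Content-Type", "application/json")
        return req

    def get_document(self, doc_id: str) -> urllib.request.Request:
        # curl -H "Authorization: Bearer $TOKEN" "$BASE/v1/documents/$ID"
        return self._request("GET", f"/v1/documents/{quote(doc_id, safe='')}")

    def create_document(self, title: str) -> urllib.request.Request:
        # curl -X POST -H "Authorization: Bearer $TOKEN" \
        #   -H "Content-Type: application/json" -d '{"title": "..."}' "$BASE/v1/documents"
        return self._request("POST", "/v1/documents", {"title": title})


def send(req: urllib.request.Request) -> dict:
    """Send the request exactly once: no retries, caching, or batching."""
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

The client also holds no connection state between calls, which keeps it on the right side of the stateful-session problem: anything it does can be reproduced with a single curl command.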
The SDK should be auto-generated from the OpenAPI spec or whatever schema the API publishes. Hand-written SDKs drift from the API at every release; generated SDKs match the spec by construction, which makes them exactly as correct as the spec. That is why investment in spec quality pays back across every SDK and every documentation surface.
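Generation does not have to be elaborate to keep the SDK tied to the spec. This minimal sketch assumes the spec has already been loaded into a dict (the tiny inline spec in the usage and its `operationId` values are made up) and derives the SDK's operation table directly from the OpenAPI `paths` object, so an endpoint that is not in the spec cannot appear in the SDK. In practice an established OpenAPI code generator would do this job; the point is only that the spec is the single source.

```python
from typing import Any

# Operation keys allowed on an OpenAPI 3 Path Item Object.
HTTP_METHODS = {"get", "put", "post", "delete", "patch", "head", "options", "trace"}


def operations(spec: dict[str, Any]) -> dict[str, tuple[str, str]]:
    """Map each operationId to (HTTP method, path template) from an OpenAPI 3 spec."""
    ops: dict[str, tuple[str, str]] = {}
    for path, item in spec.get("paths", {}).items():
        for method, op in item.items():
            if method not in HTTP_METHODS:
                continue  # skip path-level keys such as "parameters" or "summary"
            op_id = op.get("operationId")
            if op_id is None:
                # Refuse to guess a method name: fix the spec instead.
                raise ValueError(f"{method.upper()} {path} has no operationId")
            if op_id in ops:
                raise ValueError(f"duplicate operationId {op_id!r}")
            ops[op_id] = (method.upper(), path)
    return ops
```

Failing loudly on a missing or duplicate `operationId` pushes the fix into the spec, where it benefits every language and the docs at once.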
The SDK should follow language idioms rather than imposing a uniform style across languages. The Python SDK should look like Python; the Go SDK should look like Go. The "consistent across languages" SDK is one that looks awkward in every language, because every language has different conventions for error handling, pagination, and async operations.
The SDK should be versioned independently from the API. The API can be at v3 while the SDK is at v1.4.7; the SDK version reflects SDK changes, not API changes. The SDK should support multiple API versions concurrently when API versioning is required, with explicit version selection rather than implicit defaults.
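One way to make API version selection explicit, sketched under assumptions: the version strings, the SDK name, and the `API-Version` header are all hypothetical (some APIs carry the version in the URL path instead). The SDK's own release number lives in `__version__`, separate from the API versions it can speak, and the constructor has no default, so upgrading the SDK never silently moves an integrator to a new API version.

```python
__version__ = "1.4.7"  # SDK release version: changes with SDK releases only

SUPPORTED_API_VERSIONS = ("2", "3")  # hypothetical API versions this SDK release speaks


class VersionedClient:
    def __init__(self, token: str, *, api_version: str) -> None:
        # Keyword-only with no default: callers must choose the API version explicitly.
        if api_version not in SUPPORTED_API_VERSIONS:
            raise ValueError(
                f"API version {api_version!r} is not supported by SDK {__version__}; "
                f"choose one of {SUPPORTED_API_VERSIONS}"
            )
        self.token = token
        self.api_version = api_version

    def default_headers(self) -> dict[str, str]:
        return {
            "Authorization": f"Bearer {self.token}",
            "API-Version": self.api_version,  # hypothetical header name
            "User-Agent": f"example-sdk-python/{__version__}",  # hypothetical SDK name
        }
```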
The SDK should have a deprecation policy that matches the API's deprecation policy. SDK methods get deprecated when the underlying API endpoints are deprecated, with the same sunset windows. The SDK does not get to invent its own deprecation timeline because that fragments the deprecation story for integrators who use both the SDK and the raw API.
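A minimal sketch of an SDK method whose deprecation mirrors the API's. The decorator name, the `list_v1_events` method, its replacement, and the sunset date are all hypothetical; what matters is that the warning repeats the API's own sunset date rather than inventing an SDK-specific one.

```python
import functools
import warnings


def deprecated_with_api(endpoint: str, sunset: str, replacement: str):
    """Mark an SDK method as deprecated on the same schedule as its API endpoint."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            warnings.warn(
                f"{func.__name__} calls {endpoint}, which the API deprecates with a "
                f"sunset date of {sunset}; use {replacement} instead.",
                DeprecationWarning,
                stacklevel=2,  # attribute the warning to the integrator's call site
            )
            return func(*args, **kwargs)
        return wrapper
    return decorator


@deprecated_with_api("GET /v1/events", sunset="2025-12-31", replacement="list_events")
def list_v1_events() -> list:
    return []  # placeholder body; a real method would issue the request
```

If the API already sends the `Sunset` response header defined in RFC 8594, the SDK can surface that value instead of hard-coding a date, which keeps the two timelines identical by construction.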
The anti-patterns that produce most failures
The SDK that silently wraps each call in retry logic and exponential backoff is the most common pattern that looks helpful and is actually harmful. The integrator does not control when the SDK retries, the retry budget is hidden, and the integrator's overall request budget is silently consumed by retries that may not be appropriate for their use case. Retries belong in the integrator's code where the integrator can reason about them.
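Here is a sketch of retries living in the integrator's code instead. The attempt budget, the backoff base, and the set of retryable statuses are all visible and chosen by the caller. `call` is a hypothetical zero-argument function returning an HTTP status and body, and the defaults shown are illustrative, not recommendations from any particular API. The backoff uses "full jitter" (a random delay up to the exponential cap).

```python
import random
import time
from typing import Callable, Tuple

RETRYABLE = {429, 502, 503, 504}  # illustrative; choose per API and per use case


def call_with_retries(
    call: Callable[[], Tuple[int, bytes]],
    max_attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, bytes]:
    """Retry `call` on retryable statuses with full-jitter exponential backoff.

    The caller picks max_attempts, so the worst-case request count is explicit
    at the call site. The last response is returned, retryable or not.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = call()
        if status not in RETRYABLE or attempt == max_attempts - 1:
            return status, body
        # Sleep a random time in [0, base_delay * 2**attempt].
        sleep(random.uniform(0, base_delay * 2 ** attempt))
    raise AssertionError("unreachable: the loop always returns")
```

Injecting `sleep` keeps the policy testable without real delays.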
The SDK that maintains stateful sessions or connection pools is the next most common harmful pattern. The integrator now has to reason about SDK state, lifecycle, and thread safety. A stateless SDK that constructs a fresh request per method call is dramatically simpler and rarely meaningfully slower for typical SaaS API call rates.
The SDK that batches calls together for performance is harmful when the batching is opaque. The integrator does not see the batching boundary and cannot reason about which calls succeed and which fail. Batching that needs to be visible to the integrator should be exposed as a batch endpoint in the API, not hidden in the SDK.
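When batching is an API endpoint, the response can report an outcome per item, so the integrator sees exactly which operations succeeded. This sketch assumes a hypothetical batch response in which each entry in `results` echoes its request `id` and carries either a success payload or an `error`; it is not the format of any specific API.

```python
from typing import Any


def split_batch_results(
    request_ids: list[str], response: dict[str, Any]
) -> tuple[dict[str, Any], dict[str, Any]]:
    """Partition a batch response into successes and failures, keyed by request id.

    Failure values are the error payloads. A request id missing from the
    response counts as a failure, so no operation can disappear silently.
    """
    by_id = {r["id"]: r for r in response.get("results", [])}
    ok: dict[str, Any] = {}
    failed: dict[str, Any] = {}
    for rid in request_ids:
        result = by_id.get(rid)
        if result is None:
            failed[rid] = "missing from batch response"
        elif "error" in result:
            failed[rid] = result["error"]
        else:
            ok[rid] = result
    return ok, failed
```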
The SDK that includes a full mock or stub harness for testing is harmful because it diverges from the API in subtle ways. The integrator who tests against the SDK mock and not against the real API ships bugs that the mock did not catch. Mocking is the integrator's responsibility, not the SDK's.
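Keeping mocks on the integrator's side can be as simple as making the HTTP call an injectable dependency in the integrator's own code. In this sketch `monitor_is_healthy`, the `/v1/monitors/...` path, and the `state` field are hypothetical. The fake lives with the integrator's tests and encodes the integrator's own assumptions about the API, which still need checking against the real API in an integration test.

```python
import json
from typing import Callable

# (method, url) -> (status, body bytes); production code passes a real HTTP sender.
Sender = Callable[[str, str], tuple[int, bytes]]


def monitor_is_healthy(send: Sender, base_url: str, monitor_id: str) -> bool:
    """Integrator code under test: GET a monitor and report whether it is up."""
    status, body = send("GET", f"{base_url}/v1/monitors/{monitor_id}")
    if status == 404:
        return False
    if status != 200:
        raise RuntimeError(f"unexpected status {status}")
    return json.loads(body).get("state") == "up"


def make_fake_sender(routes: dict[tuple[str, str], tuple[int, bytes]]) -> Sender:
    """Test double owned by the integrator; unknown routes fail loudly."""
    def send(method: str, url: str) -> tuple[int, bytes]:
        try:
            return routes[(method, url)]
        except KeyError:
            raise AssertionError(f"unexpected request {method} {url}") from None
    return send
```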
The alternative of investing in the raw API
For the four Anethoth products (DocuMint, CronPing, FlagBit, and WebhookVault), the choice was to invest in the raw API rather than to build SDKs. The investments were: clear OpenAPI specs published at /docs and /redoc on every product; consistent authentication via Authorization: Bearer headers across all four; consistent error response shapes; consistent pagination patterns; concrete curl examples in every doc page; multiple-language code samples for the most common operations.
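Consistent error shapes mean an integrator in any language writes error handling once and reuses it across all four products. The sketch below assumes an error body of the form `{"error": {"code": ..., "message": ...}}`; that shape, and the `APIError` and `parse_error` names, are placeholders for illustration, not the products' documented format, so check each product's /docs page for the real one.

```python
import json


class APIError(Exception):
    """One exception type for every product, carrying the shared error fields."""

    def __init__(self, status: int, code: str, message: str) -> None:
        super().__init__(f"{status} {code}: {message}")
        self.status = status
        self.code = code
        self.message = message


def parse_error(status: int, body: bytes) -> APIError:
    """Build an APIError from an error response body.

    Assumes the hypothetical shape {"error": {"code": ..., "message": ...}} and
    falls back for bodies that do not match it (e.g. a proxy's HTML error page).
    """
    try:
        payload = json.loads(body)
    except ValueError:  # JSONDecodeError and UnicodeDecodeError are both ValueErrors
        payload = None
    err = payload.get("error") if isinstance(payload, dict) else None
    if not isinstance(err, dict):
        err = {}
    code = str(err.get("code", "unknown"))
    message = str(err.get("message", body.decode("utf-8", errors="replace")))
    return APIError(status, code, message)
```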
The result is that integrators in any language can build against the API in roughly the same time it would take them to read the docs for an SDK. The per-language maintenance cost is close to zero because there are no language-specific bindings to maintain, only code samples. The deprecation story is simple because there is one surface to deprecate. The honest accounting is that this approach scales further than building SDKs would have, at lower total engineering cost, with broader language coverage.
The deeper observation is that internal SDKs are tempting because they look like leverage and often are at first, but the leverage decays as the API evolves and the SDK falls behind. Investing in the raw API surface — clear specs, consistent conventions, good docs, code samples — is leverage that does not decay because every improvement applies uniformly to every integrator regardless of language. The discipline of refusing to build the SDK until the demand is proven and the maintenance cost is accepted is one of the more counterintuitive but durable disciplines in API design.