Integration testing is hard

Image generated using DALL·E

If we think about the testing pyramid, at the very bottom we have unit testing, the most basic testing mechanism. In theory, we can think of our code in terms of units, where a unit is the smallest piece of code we can test: a function, a class, or a module. The idea is to test that piece of code and only that. However, there are cases where one function calls another, or queries a third-party service or a database, and things get a bit more complex. What if the database is unavailable? What if the third-party service is out there, somewhere in the cloud, and I don't have an Internet connection? My tests will fail because they can't connect, even if the logic is still correct. This is when the concept of mocking comes in handy. Mocking is a technique used in unit testing to replace a function call or an object that interacts with a third-party service or a database, so that the call returns something we know and the test runs in isolation without any third-party call. We can set up the mock to return success or error results depending on what we are testing for, or simply assert that it was called.
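As a minimal sketch of that idea in Python (the client object and its fetch_profile method are hypothetical, not any real social API), the real API client is replaced by a mock that returns a payload we control, and the test also asserts the call was made:

```python
from unittest.mock import MagicMock
import unittest


def get_display_name(client, user_id):
    # Function under test: formats a name from a profile API response.
    profile = client.fetch_profile(user_id)
    return f"{profile['first_name']} {profile['last_name']}"


class GetDisplayNameTest(unittest.TestCase):
    def test_formats_full_name(self):
        # Replace the real API client with a mock that returns a known payload,
        # so the test runs in isolation without any network call.
        client = MagicMock()
        client.fetch_profile.return_value = {"first_name": "Ada", "last_name": "Lovelace"}

        self.assertEqual(get_display_name(client, "123"), "Ada Lovelace")
        # We can also simply assert the third-party call happened as expected.
        client.fetch_profile.assert_called_once_with("123")


if __name__ == "__main__":
    unittest.main()
```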

Integration testing, on the other hand, is a completely different story. When we run integration tests, we want our service to call all those external third-party services and connect to the databases and other storage layers it relies on, because that is exactly what we are testing: how our system interacts with the other components. These components could be a database, other services built and maintained by a different team, or completely external third-party APIs we need to consume from an outside provider.

While databases are part of your infrastructure and, more often than not, can be replicated locally just by installing the DBMS in question or by running containers, third-party services can be harder to reach. Depending on the setup, some might be available only via VPN or from a certain IP range, some might be accessible only on-prem, and some might simply be too complex to set up locally for development.

I'm working on a few personal projects that interact with different social APIs (Twitter (X), Facebook, Google, Instagram...) for several use cases.

Different response types, different flows

When I'm interacting with a third-party service, I can get successful or error responses. Depending on which one I get, I'll take one flow to show the user the result or to process the data I need from the REST call, for example, or a different flow to mitigate the error or return an error response to the user.

If I get an error response, there could also be different types of errors, each requiring a different flow in my application, and testing these can be challenging too. I can force 4xx errors easily, but server-side errors, such as a request timing out or requests being throttled, are hard to force when I want to verify that my system degrades gracefully and handles the error by taking the correct flow.
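To show what I mean, here is a minimal sketch in Python (the fetch_posts function, the endpoint URL and the error payloads are made up for illustration): the HTTP session is mocked so a timeout and a 429 throttling response can be forced on demand, and each test checks that the corresponding flow is taken.

```python
from unittest.mock import MagicMock
import requests


def fetch_posts(session, url):
    # Function under test: takes a different flow for timeouts, throttling and success.
    try:
        response = session.get(url, timeout=5)
    except requests.exceptions.Timeout:
        return {"error": "timeout", "posts": []}
    if response.status_code == 429:
        return {"error": "throttled", "posts": []}
    response.raise_for_status()
    return {"error": None, "posts": response.json()}


def test_timeout_takes_the_degraded_flow():
    session = MagicMock()
    session.get.side_effect = requests.exceptions.Timeout  # force a timeout
    assert fetch_posts(session, "https://api.example.com/posts")["error"] == "timeout"


def test_throttling_takes_the_degraded_flow():
    session = MagicMock()
    session.get.return_value = MagicMock(status_code=429)  # force a 429
    assert fetch_posts(session, "https://api.example.com/posts")["error"] == "throttled"
```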

Complex scenarios for simple tests

Sometimes, to use certain third-party services, I have to go through complex authentication flows (OAuth, anyone?) just to get access to the endpoint I want to test the integration with, only to force a client error and check that my system under test handles it correctly. That is fine when I'm doing my acceptance testing rounds or running the integration tests to validate the software I'm about to release to my production environments, but for simple development tests on a development or local environment it feels like overkill and a waste of time.

Ideally, at least on local development environments, I would like a way to bypass all that authentication logic and only trigger the error I want to force, to verify how my system handles it. Unfortunately, there's no way I can control that because, well, it's a third-party service. Even in sandbox environments those steps are enforced.
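The closest workaround I've found for local development is to stand up a tiny stub that pretends to be the provider. The sketch below (the port and the FORCE_STATUS environment variable are my own made-up conventions, not anything the real APIs offer) skips authentication entirely and returns whatever status code I ask for, so the application just needs to point at the stub's URL instead of the real one.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
import json
import os

# Which status code to force on every request, e.g. 200, 429, 500, 503.
FORCE_STATUS = int(os.environ.get("FORCE_STATUS", "200"))


class StubHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # No OAuth, no tokens: just return the forced status and a canned payload.
        self.send_response(FORCE_STATUS)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(json.dumps({"status": FORCE_STATUS, "data": []}).encode())


if __name__ == "__main__":
    # Point the application at http://localhost:8008 instead of the real provider.
    HTTPServer(("localhost", 8008), StubHandler).serve_forever()
```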

Failure doesn't mean it's not working

An integration test case can fail because the scenario is not handled correctly by the system under test. This is expected; tests should detect bugs before the application is released to production. However, there are times when the code is correct and the test still fails for other reasons.

When we are testing integration, we do want the system under test to interact with all the other services it's supposed to talk to. This adds the risk that tests fail because, for example, the other system is having a bad deployment and returns a 500 (Internal Server Error) when it should return a 404 or a 201, or because it is unavailable for some reason and the requests time out. These scenarios should be covered by the test suite, and I should have ways to easily simulate and control them in my development environment so that I can build and test the graceful degradation for them. But when they happen in tests that aren't checking for them, it's a problem: the code to handle them seamlessly might be in place, yet the conditions of the scenario make the test fail.
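As a rough illustration of the kind of graceful degradation I mean (the function and its defaults are assumptions, not a recommended recipe), the outbound call can be wrapped so transient 5xx responses and timeouts are retried a few times before the caller falls back to something safe:

```python
import time
import requests


def get_with_retries(url, retries=3, backoff=0.5):
    # Retry transient failures (5xx, timeouts, connection errors) a few times,
    # then give up and return None so the caller can fall back gracefully.
    for attempt in range(retries):
        try:
            response = requests.get(url, timeout=5)
            if response.status_code < 500:
                return response
        except (requests.exceptions.Timeout, requests.exceptions.ConnectionError):
            pass
        time.sleep(backoff * (2 ** attempt))  # exponential backoff between attempts
    return None  # caller degrades gracefully: cached data, empty list, error page...
```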

In my opinion, this is fine for my CI/CD pipeline, before deploying to production, but if I'm testing locally or building something on top of an integration with another service, I shouldn't have to wait for that service to be available in order to work. I often rely on unit tests that pass a hardcoded payload to the functions I'm writing and check that they produce the expected output; that way, when I test the integration, there's a higher chance it will work. However, there are cases where I cannot test locally at all because the service I integrate with is not accessible from where I am, so I cannot run the application and test the feature from the UI.
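That hardcoded-payload approach looks roughly like this sketch (the payload shape and function name are invented for the example): a response recorded or hand-written once is fed straight into the parsing function, so the logic can be checked even when the real service is unreachable.

```python
def extract_post_ids(payload):
    # Function under test: pulls the ids out of a feed-style response.
    return [item["id"] for item in payload.get("data", [])]


def test_extract_post_ids_from_recorded_payload():
    # Hand-written payload mimicking a response captured from the real API;
    # the parsing logic is verified without the service being reachable at all.
    recorded_payload = {
        "data": [{"id": "101", "text": "hello"}, {"id": "102", "text": "world"}],
        "meta": {"result_count": 2},
    }
    assert extract_post_ids(recorded_payload) == ["101", "102"]
```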

Integration testing is not easy. It isn't easy on pre-production environments, it's not easy on staging environments and, for sure, it's not easy on local environments. It requires a lot of setup to get right at scale, a lot of monitoring to catch false negatives and false positives, and a lot of patience when it doesn't work.