Right now, you can find plenty of success stories for telco businesses implementing DevOps programs to improve their delivery models. But when it comes to the related concepts of continuous integration and continuous delivery (CICD), detailed success stories for telco operators are fewer and farther between. This doesn’t mean that CICD is impossible for network operators, mobile carriers, and others within the telco space, but it does bespeak the high degree of complexity involved in defining and adopting the right tools and software solutions for honest-to-goodness continuous delivery in telecommunications.
What this means for testers and other stakeholders within the telco sphere is that, while CICD isn’t exactly uncharted territory, it does present some potential challenges that are particular to the telecom industry. Here are a few of the biggest ones.
1. Distributed Infrastructure
CICD operates on the principle that your product will be more stable in the long run if, rather than committing new changes to the code base from different teams all at once, teams and users commit at the end of each development cycle (e.g. sprint) and automatically build and test the changes immediately. If we imagine that the average telco operator has 1,000 new configurations per year (including new firmware versions, devices, system upgrades, etc.), it’s easy to see how a bunch of small, immediately validated and deployed changes would be less risky than making a few big updates all at once. That said, the distributed nature of telco infrastructure makes actual deployment a much more complex task than it is for traditional software developers. Even in a virtualized, cloud-based RAN environment you still need to push out updates to hundreds of thousands of cell towers, and be able to roll those updates back if needed.
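To make the "push out in small steps, roll back if needed" idea concrete, here is a minimal sketch of a batched rollout in Python. Everything in it is an assumption made for illustration: the `staged_rollout` function, the `deploy` and `healthy` callbacks, and the tower and firmware names are hypothetical stand-ins for whatever deployment and health-check tooling an operator actually uses.

```python
from typing import Callable, Dict, List


def staged_rollout(
    sites: List[str],
    new_version: str,
    old_version: str,
    deploy: Callable[[str, str], None],
    healthy: Callable[[str], bool],
    batch_size: int = 2,
) -> bool:
    """Deploy new_version batch by batch; roll back every updated site
    as soon as one batch fails its health checks.

    Returns True if all sites end up on new_version, False if rolled back.
    """
    updated: List[str] = []
    for start in range(0, len(sites), batch_size):
        batch = sites[start:start + batch_size]
        for site in batch:
            deploy(site, new_version)
            updated.append(site)
        # Validate this batch before touching any further sites.
        if not all(healthy(site) for site in batch):
            for site in reversed(updated):
                deploy(site, old_version)
            return False
    return True


# Toy in-memory stand-ins (hypothetical) for real tooling.
versions: Dict[str, str] = {}


def fake_deploy(site: str, version: str) -> None:
    versions[site] = version


def fake_healthy(site: str) -> bool:
    # Pretend "tower-3" fails its checks on the new firmware.
    return not (site == "tower-3" and versions[site] == "fw-2.0")


towers = ["tower-1", "tower-2", "tower-3", "tower-4"]
ok = staged_rollout(towers, "fw-2.0", "fw-1.9", fake_deploy, fake_healthy)
print(ok, versions)
```

In this toy run the second batch fails, so all four towers are returned to the old firmware. A real rollout would also need to handle sites that cannot be reached during the rollback itself, which this sketch ignores.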
2. Telco-specific Needs
Not only does the architecture of a telco network add complexity that doesn’t exist in the same way for software developers, things like phones and various network elements fundamentally change the way the testing environment has to be set up. While some tests can certainly be done virtually, at some point before deploying new code, those changes will have to be verified against real, out-of-the-box mobile phones (virtual or simulated phones won’t offer the same level of accuracy, owing to minute differences in the simulation versus the real thing). This is where things get tricky from an architecture standpoint. A traditional CICD pipeline might have:
- A continuous integration server
- Version control tools
- Build tools
- An automation framework.
To adapt this workflow for telco operations, you’d need to be able to build a staging environment with an automation framework that could actually orchestrate tests outside the bounds of traditional software testing, i.e. run tests on smartphones and other network elements. This way you can be sure that when you make updates to your BSS workflows, for instance, the automation framework actually records test phone usage (generated on a real phone, for both data and voice) and checks it against what’s in the billing system; then, it automatically creates invoices based on that usage and verifies that they’re correct. Without this kind of functionality to verify telco-specific backend systems, to say nothing of provisioning, dialing and calling, data usage, and fallbacks, the practical scope of your CICD process will be fairly limited.
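The billing check described above might look something like the following sketch. It assumes the automation framework can hand you the usage recorded on the test phone, the usage the billing system believes it saw, and the invoice total; the `Usage` type, the `reconcile` function, and the tariff rates are all hypothetical and chosen only to keep the example small.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import List


@dataclass(frozen=True)
class Usage:
    voice_minutes: int
    data_mb: int


# Hypothetical tariff, for illustration only.
RATE_PER_MINUTE = Decimal("0.10")
RATE_PER_MB = Decimal("0.02")


def expected_invoice(usage: Usage) -> Decimal:
    """Price the usage with the illustrative tariff above."""
    return usage.voice_minutes * RATE_PER_MINUTE + usage.data_mb * RATE_PER_MB


def reconcile(device: Usage, billed: Usage, invoice_total: Decimal) -> List[str]:
    """Compare what the test phone did with what billing recorded and charged.

    Returns a list of human-readable problems; an empty list means a pass.
    """
    problems: List[str] = []
    if device.voice_minutes != billed.voice_minutes:
        problems.append(
            f"voice minutes: device={device.voice_minutes} billed={billed.voice_minutes}"
        )
    if device.data_mb != billed.data_mb:
        problems.append(f"data MB: device={device.data_mb} billed={billed.data_mb}")
    # The invoice is checked against the device-side record, which this
    # sketch treats as the ground truth.
    expected = expected_invoice(device)
    if invoice_total != expected:
        problems.append(f"invoice: expected={expected} actual={invoice_total}")
    return problems


device_usage = Usage(voice_minutes=12, data_mb=250)

clean = reconcile(device_usage, Usage(12, 250), Decimal("6.20"))
broken = reconcile(device_usage, Usage(12, 240), Decimal("6.00"))
print(clean)
print(broken)
```

Using `Decimal` rather than floats keeps the money comparison exact. In a pipeline, a non-empty problem list would fail the stage and block the deployment.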
3. High Cost of Failure
Okay, let’s change gears a little bit. For your average mobile application or software suite, the cost of a few bugs or some occasional downtime isn’t necessarily a huge problem: sure, if AWS goes down Amazon is likely to hemorrhage cash, but if the Neopets site crashes because they were trying to push out a new feature, it’s hard to imagine that they’ll take a huge loss. For telco operators, things look a little bit different. If an error in a new deployment makes it impossible for users to make 911 and other emergency calls on your network, you run the risk of huge fines mounting by the hour. Other failure scenarios can prove almost as costly. While CICD is designed to decrease the likelihood that something like that could happen by increasing stability overall, your integration and deployment systems have to be built with these incredibly high stakes in mind. The “move fast and break things” model is categorically not going to work here. This might mean paying particular attention to how easily you can revert to previous builds, as well as prioritizing readable documentation that can make it easier to perform root cause analysis as needed.
4. Network Functions
For most continuous integration systems within the telco sphere, a certain amount of network functions virtualization (NFV) is going to be a necessity. At the same time, NFV can present challenges in terms of isolating the systems or devices under test—sometimes leading to what is effectively black box testing. Telco testers need to find a way to balance the need for virtualization with the need for readable, accessible end-to-end tests that can be automated within the context of a larger CICD deployment. This is another concern that may come down to your choice of automation framework, but it will also be a matter of choosing the right tools to create visibility and maintain a workable staging environment.
5. Regulatory Compliance
Related to the challenges presented by the high cost of failure is the issue of regulatory compliance. This is another area where the “move fast and break things” approach that some development teams promote has the potential for huge consequences. Traditionally, ensuring regulatory compliance has added time to testing flows and created something of an added burden for operators in general, and CICD can’t necessarily change that. This may be of particular concern to operators and equipment manufacturers who are looking ahead to the influx of widespread 5G deployments expected in the next few years. For operators who might essentially be building new 5G networks from scratch, you’ll need a test suite that can ensure that your upload speeds, download speeds, latency, and other metrics fall within the 3GPP’s guidelines. Not only that, you’ll need robust documentation that can demonstrate that you’ve achieved the desired results, or that can point you towards areas for improvement if you haven’t.
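As a rough illustration of a test step that both checks performance figures and leaves a documentation trail, here is a small Python sketch. The metric names and target values in `TARGETS` are placeholders invented for this example; they are not taken from 3GPP or any regulator, and a real suite would load its limits from the applicable specifications.

```python
import json
from typing import Dict, Optional, Tuple

# Hypothetical targets: ("min", x) means the value must be at least x,
# ("max", x) means it must be at most x. Not real regulatory figures.
TARGETS: Dict[str, Tuple[str, float]] = {
    "download_mbps": ("min", 100.0),
    "upload_mbps": ("min", 50.0),
    "latency_ms": ("max", 10.0),
}


def evaluate(measured: Dict[str, float]) -> dict:
    """Check each measured metric against its target and build a report."""
    checks = {}
    for name, (kind, limit) in TARGETS.items():
        value: Optional[float] = measured.get(name)
        if value is None:
            passed = False  # a missing measurement counts as a failure
        elif kind == "min":
            passed = value >= limit
        else:
            passed = value <= limit
        checks[name] = {
            "measured": value,
            "kind": kind,
            "limit": limit,
            "passed": passed,
        }
    return {
        "passed": all(c["passed"] for c in checks.values()),
        "checks": checks,
    }


report = evaluate({"download_mbps": 180.0, "upload_mbps": 42.5, "latency_ms": 8.0})
# The JSON output can be archived with the build as evidence of the run.
print(json.dumps(report, indent=2))
```

Here the upload figure misses its placeholder target, so the overall result is a failure, and the per-metric entries show exactly where to look.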