Hello anons, Celt here to talk about common mistakes that network engineers make when setting up networks in the cloud. These mistakes can be very costly to your organization, and a little bit of planning up front can save you a lot of pain.
Overlapping IPv4 CIDR Ranges
Some of you are already cringing. This applies to your private IPv4 CIDRs. Remember, IPv4 has private address ranges because there are not enough public IPv4 addresses to go around. CIDR stands for Classless Inter-Domain Routing, and a CIDR range is the block of address space (think IP addresses) allocated to a network. The issue with overlapping CIDR ranges is that systems in the two networks will not be able to reliably send and receive traffic between each other, because the same address can exist on both sides and routing becomes ambiguous. Your on-prem infrastructure is going to have its own CIDR ranges, and you do not want your cloud CIDR ranges to overlap with them. Additionally, you do not want any CIDR ranges to overlap between cloud providers. Planning becomes very important here, and I recommend planning extensively before this turns into a mistake later down the road. When you are planning, make sure to reserve space for a second cloud provider. You will thank me later. It seems crazy to plan a CIDR range for a second cloud provider, but the future is multi-cloud.
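To make the overlap check concrete, here is a minimal sketch using Python's standard ipaddress module. The allocation plan and its names (on-prem, aws-prod, and so on) are made up for illustration; swap in your own ranges.

```python
import ipaddress
from itertools import combinations

def find_overlaps(ranges):
    """Return pairs of network names whose CIDR ranges overlap."""
    networks = {name: ipaddress.ip_network(cidr) for name, cidr in ranges.items()}
    return [
        (a, b)
        for (a, net_a), (b, net_b) in combinations(networks.items(), 2)
        if net_a.overlaps(net_b)
    ]

# Hypothetical allocation plan, including space reserved for a second cloud.
plan = {
    "on-prem": "10.0.0.0/16",
    "aws-prod": "10.0.128.0/20",   # sits inside the on-prem /16
    "aws-dev": "10.1.0.0/16",
    "second-cloud-reserved": "10.2.0.0/16",
}

print(find_overlaps(plan))  # [('on-prem', 'aws-prod')]
```

Running a check like this against your documented allocations before anything is provisioned is a lot cheaper than finding the overlap after both sides are in production.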
Recently, I was working to connect a newly built application in AWS to a legacy on-prem application. We discovered the connection was failing because the CIDR ranges overlapped. In this particular situation the cost of fixing the issue was greater than the value of integrating the two apps, so they will not be connected.
Fixing overlapping CIDR ranges is going to be painful. I would recommend following a solution from this AWS post. Turbo Note: If you are building a large microservices architecture you may be forced to implement something like that. It is one of the tradeoffs of microservices.
Not Planning for Growth
When you dedicate a CIDR range to a network you will need to come up with a rough estimate of how many IPs are needed. No matter what the application team tells you, you need to build in extra IP space. The multiplier varies: some people prefer building in 2x the estimate, and I have also seen 1.5x. If you go lower than 1.25x, I would begin to question your judgement. The amount of space you build in can also depend on the application. If the application is new and growing fast, maybe 2x is good; if the application is very mature, maybe 1.5x. The important thing is that you build in some extra space and document your decision with the rationale (in case you need to prove you did due diligence). That way, if someone is angry that there is no IP space left, you can at least say you followed best practices, did your due diligence, and built in X extra space.
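Here is a rough sketch of turning an estimate plus a growth multiplier into a CIDR prefix length. The function name prefix_for and the example numbers are hypothetical. It treats the network as a single block and sets aside 5 addresses, which is what AWS reserves in each subnet; a real VPC split into several subnets loses 5 per subnet, so treat the result as a lower bound on the space you need.

```python
import math

def prefix_for(estimated_ips, growth_factor=2.0, reserved=5):
    """Smallest IPv4 prefix length that fits the estimate times the growth factor.

    reserved is an assumption: addresses set aside for the provider
    (AWS reserves 5 per subnet). Adjust for your provider and subnet count.
    """
    needed = math.ceil(estimated_ips * growth_factor) + reserved
    host_bits = (needed - 1).bit_length()  # bits required to address `needed` IPs
    return 32 - host_bits

# Hypothetical: the application team says they need about 400 IPs.
for factor in (1.1, 1.5, 2.0):
    prefix = prefix_for(400, factor)
    print(f"{factor}x -> /{prefix} ({2 ** (32 - prefix)} addresses)")
```

With these numbers, 1.1x lands on a /23 while 1.5x and 2x both land on a /22. Because CIDR blocks come in powers of two, it is worth checking how close your estimate sits to the next boundary and writing that down in your rationale.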
Turbo Note: I like asking how much extra space someone plans to build into a network in a systems design interview. It is interesting to see the reasoning and thought process.
This happened to a team I was supporting. There was an AWS account that was only supposed to be used for one application, so the network team did not build in much extra IP space (like 1.1x). Of course, the business requirements changed and we needed to spin up a new app in that same AWS account because of its integration with the existing application. There was not enough IP space to spin up the new application and we could not attach an additional IPv4 CIDR range to the VPC, so the VPC had to be terminated and rebuilt with a larger CIDR and extra space. That took time, and management had to push out timelines.
Not Leveraging Redundant Fiber Lines
This one might seem like a stretch to some of you, but let me explain. I am going to use AWS in my example, but the lesson is applicable to other providers that offer similar services. When you set up a connection with a cloud provider you will have the option of a dedicated fiber optic line (in AWS it's called Direct Connect (DX)) or a VPN connection to the cloud provider. The VPN connection will work for some use cases, but many companies opt for a DX line to meet network latency and reliability requirements. A common setup is to have a DX line with a VPN connection as backup in case the DX line fails (fiber optic gets cut, issue at the partner location, hardware failure, etc.).
The issue with this setup is that the VPN connection most likely will not be able to handle the full traffic load. AWS DX lines can be 1, 10, or 100 Gbps, and AWS Site-to-Site VPN tunnels support up to 1.25 Gbps. If you have a 10 Gbps line and it gets cut, and you have to send all that traffic through the VPN, you are going to have performance degradation. Consider having a second DX/fiber line instead of a VPN as backup. The money and business lost from the potential performance and availability issues could be greater than the cost of a second DX/fiber line. You need to perform a risk analysis and determine the potential impact to the business if you had to send all traffic through the VPN. Usually it's not good. Hopefully from reading Bull you understand risk management and hedging and can figure out the best thing to do here. Do not just say ‘it will never happen’, because it does.
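A back-of-the-envelope sketch of that risk analysis is below. The bandwidth figures come from the example above; the outage hours, loss per hour, and cost of a second line are placeholders I made up, so plug in your own numbers. It also assumes the primary line is running near its full 10 Gbps at peak.

```python
def failover_shortfall(peak_gbps, backup_gbps):
    """Return (shortfall in Gbps, fraction of peak traffic the backup can carry)."""
    shortfall = max(0.0, peak_gbps - backup_gbps)
    coverage = min(1.0, backup_gbps / peak_gbps)
    return shortfall, coverage

def expected_annual_loss(degraded_hours_per_year, loss_per_degraded_hour):
    """Very rough expected loss: hours on the backup times loss per hour."""
    return degraded_hours_per_year * loss_per_degraded_hour

# 10 Gbps DX line at peak, falling back to a 1.25 Gbps VPN.
shortfall, coverage = failover_shortfall(peak_gbps=10, backup_gbps=1.25)
print(f"Shortfall: {shortfall} Gbps, backup carries {coverage:.1%} of peak")

# Placeholder figures, not real pricing or outage data.
loss = expected_annual_loss(degraded_hours_per_year=8, loss_per_degraded_hour=50_000)
second_line_cost = 120_000
print("Second line pays for itself" if loss > second_line_cost else "VPN backup may be acceptable")
```

The point is not the exact numbers but having the comparison written down, so the decision to skip (or buy) the redundant line is a documented one.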
A friend at a financial services company had this happen. Their fiber optic line (10 Gbps) was cut during construction and the company had to fall back to a VPN connection for all of its traffic. The resulting performance degradation hurt the customer experience, which led to lost money.
Wrap Up
Thanks for reading anons, please consider subscribing. It was interesting to work more closely with my firm’s network team. I previously did not have much insight into the challenges around hybrid and multi-cloud network engineering. Hopefully these insights provide some value. These are all mistakes I have either heard about firsthand or experienced myself.