DevOps is quickly becoming one of the most sought-after specialties in the software development world. DevOps professionals help organizations automate many processes they currently perform manually or semi-manually, which saves companies a lot of money in labor hours and makes their processes less prone to human error. DevOps engineers are also responsible for setting up the monitoring of the resources and services a company relies on. Companies desperately need professionals who can set up, configure, and maintain alerting mechanisms, because every minute a service is down costs the company money. Recovering as quickly as possible is one of the primary focuses of DevOps engineers.
In this post, I will go over 5 main areas of focus that DevOps engineers must know very well. Enough of the introduction, though. I bet you want to see which areas to focus your attention on, huh? Let's jump in!
Monitoring
Monitoring is one of the biggest and most important parts of maintaining software services. Computers are far more predictable and repeatable than humans, but errors still crop up for varying reasons. You can build services robust enough to recover from the most common errors on their own. However, there will ALWAYS be errors that come up that you don't anticipate. Monitoring lets you detect immediately when things are not functioning properly and notify the correct team to fix the issue.
There are many popular monitoring platforms available to choose from. Some of the most popular are:
- New Relic
- DataDog
- Sentry
- Firebase
- AWS CloudWatch
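To make that concrete, here is a minimal sketch of wiring up an alert in one of the platforms above, AWS CloudWatch, using the boto3 SDK. The alarm name, load balancer dimension, and SNS topic ARN are placeholders you would replace with your own values:

```python
import boto3

# Sketch: alarm when a load balancer's 5XX error count stays elevated.
# The namespace and metric are real CloudWatch metrics for an ALB; the
# alarm name, dimension value, and SNS topic ARN are placeholders.
cloudwatch = boto3.client("cloudwatch")

cloudwatch.put_metric_alarm(
    AlarmName="api-5xx-errors",                    # placeholder name
    Namespace="AWS/ApplicationELB",
    MetricName="HTTPCode_Target_5XX_Count",
    Dimensions=[
        {"Name": "LoadBalancer", "Value": "app/my-alb/1234567890abcdef"},
    ],
    Statistic="Sum",
    Period=60,                                     # 1-minute buckets...
    EvaluationPeriods=5,                           # ...for 5 consecutive minutes
    Threshold=10,                                  # more than 10 errors/minute
    ComparisonOperator="GreaterThanThreshold",
    TreatMissingData="notBreaching",               # no traffic is not an outage
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:oncall-alerts"],
)
```

The same idea applies in the other platforms: define what "unhealthy" means as a metric, then attach a notification action to it.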
There are many ways to monitor different types of services, and there is no one-size-fits-all approach that covers everything. You need to dig in and understand what your service(s) are used for and the best way to monitor them. There are, however, different avenues of monitoring that, when combined, offer the best coverage for your services.
Internal Monitoring
Monitoring your services from within your network lets you notice errors before end-users see them, so you can react and remediate quickly. However, monitoring from within the same stack/network has a major downside: if your network goes down or somehow loses connectivity, your internal monitors won't be able to detect that kind of issue. So let's look at how we can get a more comprehensive monitoring solution in place.
Internal monitors normally run more frequent tests than external monitors and alert on issues that don't affect the whole service but can affect some users. Their alerts also include much more error detail.
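As a rough illustration, an internal monitor might look something like the sketch below, run on a frequent schedule from inside the network. The internal hostnames and endpoints are hypothetical stand-ins for whatever your stack actually exposes:

```python
import requests

# Hypothetical internal endpoints; internal monitors can reach hosts
# (databases, queues, admin APIs) that are not exposed to the internet.
CHECKS = {
    "api":    "http://api.internal:8080/healthz",
    "worker": "http://worker.internal:9090/healthz",
}

def run_internal_checks() -> list[str]:
    """Probe each internal service and collect detailed failure info."""
    failures = []
    for name, url in CHECKS.items():
        try:
            resp = requests.get(url, timeout=2)
            if resp.status_code != 200:
                # Internal alerts can afford to carry rich error detail.
                failures.append(f"{name}: HTTP {resp.status_code} - {resp.text[:200]}")
        except requests.RequestException as exc:
            failures.append(f"{name}: unreachable ({exc})")
    return failures

if __name__ == "__main__":
    for failure in run_internal_checks():
        # Placeholder: forward to your paging/alerting system instead.
        print("ALERT:", failure)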
External Monitoring
Internal monitoring is great and a necessary part of watching your resources and services, but it isn't enough on its own. You also need to configure external monitors hosted by a third-party service, or in a separate geographical region and/or network. That way, if your whole network goes down, your external monitors can detect it and alert you while your internal monitors are unable to function.
External monitors are normally not as intensive or frequent as internal monitors. Instead, they focus on making sure the overall service is functional and delivers a good user experience (reasonable speed, no consistent errors on good requests, etc.). They are a lifesaver when internal monitors cannot detect issues.
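A comparable external check might look like this sketch, run from a separate region or a hosted service. The public URL and the two-second latency budget are assumptions you would tune for your own service:

```python
import time
import requests

# Hypothetical public endpoint; run this from a separate region/network
# (or a hosted service) so it keeps working when your network doesn't.
PUBLIC_URL = "https://example.com/health"
LATENCY_BUDGET_SECONDS = 2.0  # assumed user-experience budget

def external_check() -> str | None:
    """Return a problem description, or None if the service looks healthy."""
    start = time.monotonic()
    try:
        resp = requests.get(PUBLIC_URL, timeout=10)
    except requests.RequestException as exc:
        return f"site unreachable: {exc}"
    elapsed = time.monotonic() - start
    if resp.status_code != 200:
        return f"bad status: HTTP {resp.status_code}"
    if elapsed > LATENCY_BUDGET_SECONDS:
        # External checks care about user experience, not internals.
        return f"slow response: {elapsed:.2f}s"
    return None

if __name__ == "__main__":
    problem = external_check()
    if problem:
        print("ALERT:", problem)  # placeholder for your paging hook
    else:
        print("OK")
```

Note how this probe knows nothing about the internals of the service; it only measures what a user would experience.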
Learning what to monitor, how to set up the monitors, and when to alert is part art, part science. Setting up monitors and alerts, encountering errors that weren't being monitored, adding monitors for those errors, and so on is a nonstop loop of learning and tweaking until everything works just the way you want.
Continuous Integration and Continuous Delivery (CI/CD)
One of THE most important things a DevOps engineer should know well is Continuous Integration and Continuous Delivery (CI/CD). Git source code management allows multiple people to work on different features or bugs in a project at the same time. Then, once changes are ready for production, each person can "merge" their code into the production branch. This differs from source code management platforms that "lock" files when someone checks one out: no one else can check out or edit the file until the original editor checks it back in, which can slow down development when multiple people are trying to update the same file.
Some of the most popular CI/CD providers as of this writing are:
- GitHub Actions
- GitLab CI/CD
- Jenkins
- CircleCI
- Travis CI
You can check out some of my posts comparing the above providers here.
Continuous Integration
Continuous Integration allows you to make small, frequent updates to your codebase instead of many changes all at once. This is especially powerful when working with Git as your source code repository tool of choice. Small, frequent changes produce far fewer conflicts when merging, and easier conflict resolution when conflicts do occur. Unit tests, and sometimes acceptance tests, are automatically triggered to run when changes are pushed to a repository. This helps ensure new changes don't break existing business logic. If changes do break tests, the changes are small enough that narrowing down the cause of the failure is much easier.
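For example, a CI pipeline might run a pytest suite like this toy one on every push; the discount function and its tests are made up purely for illustration:

```python
# test_pricing.py - a toy unit test a CI service would run on every push.
# The pricing logic here is invented purely for illustration.
import pytest

def apply_discount(price: float, percent: float) -> float:
    """Return the price reduced by the given percentage."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return round(price * (1 - percent / 100), 2)

def test_apply_discount():
    assert apply_discount(100.0, 25) == 75.0

def test_apply_discount_rejects_bad_percent():
    with pytest.raises(ValueError):
        apply_discount(100.0, 150)
```

If a pushed change breaks either assertion, the pipeline fails right away, and because the change was small, finding the culprit is quick.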
Most software projects historically followed what is known as the waterfall method, in which teams worked in isolation for long stretches and then tried to integrate all of their changes at once. Of course, this process RARELY ever worked efficiently. Developers would go back and forth multiple times to fix all the errors and then try to combine their changes again, which added a lot of time to the software development lifecycle.
DevOps is focused on a continuous flow instead of teams working in silos for extended periods of time, and continuous integration is part of that philosophy. It allows teams to test very small changes quickly and deploy them, rather than batching up fewer, larger changes.
Continuous Delivery
Once you have automated testing of changes in place, you want to enable automatic deployments to your different environments. This allows you to manually test your changes in your development and staging environments before merging them into your production git branch. If you have adequate monitoring of your development and staging environments, it should tell you whether your changes broke anything, so you can verify everything works as expected before pushing to production.
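As a sketch of that flow, a simple promotion gate in a pipeline might look like the following. The staging URL and deploy command are hypothetical placeholders for your own tooling:

```python
import subprocess
import sys
import requests

# Hypothetical values - substitute your own staging URL and deploy tooling.
STAGING_URL = "https://staging.example.com/health"
DEPLOY_COMMAND = ["./deploy.sh", "production"]  # placeholder script

def staging_is_healthy() -> bool:
    """Smoke-test staging before promoting the build to production."""
    try:
        resp = requests.get(STAGING_URL, timeout=5)
        return resp.status_code == 200
    except requests.RequestException:
        return False

if __name__ == "__main__":
    if not staging_is_healthy():
        print("Staging checks failed; aborting the production deploy.")
        sys.exit(1)  # a nonzero exit fails this pipeline stage
    subprocess.run(DEPLOY_COMMAND, check=True)
```

In practice your CI/CD provider handles the orchestration; the point is that promotion to production is gated on the lower environments looking healthy.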
Automated Testing
You need very thorough automated tests in place when using CI/CD so that you can be confident new changes did not break any functionality. Without thorough automated tests, you cannot fully realize the depth of what DevOps can provide over conventional software development lifecycles.
Unit tests need to cover a very high percentage of your code base; you cannot be confident in automated test results alone unless your tests cover a majority of the lines in your code. You can use different code-coverage frameworks and report formats when measuring coverage, and many providers, including Coveralls, Code Climate, and Codecov, let you upload those reports to track how coverage changes throughout the history of your project.
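As one example of enforcing that bar, coverage.py exposes a Python API you can use to run your suite and fail the build when coverage drops too low. This is a minimal sketch assuming a pytest-based suite, and the 90% threshold is an arbitrary placeholder:

```python
import sys
import coverage
import pytest

MINIMUM_COVERAGE = 90.0  # arbitrary bar - pick what fits your project

# Measure coverage while the test suite runs in-process.
cov = coverage.Coverage()
cov.start()
exit_code = pytest.main([])  # run the whole suite
cov.stop()
cov.save()

percent = cov.report()       # prints a report and returns total coverage
if exit_code != 0 or percent < MINIMUM_COVERAGE:
    sys.exit(1)              # fail the CI stage
```

Run as a pipeline step, this fails the build on either a broken test or a coverage regression.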
With a very good suite of automated tests and a high test coverage percentage, you should be comfortable using continuous deployment to production without manual reviews. Many newer technology companies do exactly this, deploying to production numerous times a day. Yes, this means a deployment can occasionally cause issues in production. However, when the development lifecycle is agile and quick from start to finish and changes are small, a bug deployed to production can be just as quickly remediated and redeployed.