Don’t get yourself into a situation where you spend most of your time doing damage control after problems have become unavoidable. Running around putting out fires rarely produces anything beyond temporary patches and band-aids. Use your system to keep watch on issues that may arise from resource limitations, application errors, and the health of the system itself.
A good approach is to develop a procedure for daily system checks, and a good place to start is with log files. Logs can be collected from multiple locations within your system. You should have a process to review the logs and analyze what clues they provide about the health and security of your system. Some of this information you may want to retain for further review or documentation.
Logs can provide insight into configuration issues, buggy software, and your system’s security. They can also provide you with hardware and resource status.
Even if you have automated your system checks and log alerting, or use centralized logging via a dedicated monitoring application, it is still useful to understand how to manually retrieve and analyze some of this information.
If you want to see what tools and commands are available on a Linux system, check the /usr/bin/ directory. This will show you the executables installed there; administrative tools often live in /usr/sbin/ as well.
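Before relying on any particular command, it can help to confirm it is actually installed. The following is a minimal sketch, not a standard utility: the names `have_tool` and `report_tools` are made up for illustration, and `command -v` searches your whole PATH rather than only /usr/bin/.

```shell
#!/bin/sh
# have_tool NAME: succeed if NAME resolves to something runnable on PATH.
# (have_tool is a hypothetical helper name, not a standard command.)
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

# report_tools NAME...: print "NAME: found" or "NAME: missing" for each name.
report_tools() {
    for tool in "$@"; do
        if have_tool "$tool"; then
            echo "$tool: found"
        else
            echo "$tool: missing"
        fi
    done
}

# Example:
#   report_tools top free lsof journalctl
```

Running `report_tools` with the tools you plan to use gives a quick picture of what a given machine has available.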
For a quick error check through the log files located in the /var/log/ directory use:
sudo grep error /var/log/*.log | less
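This will show every line in those files that contains the word “error”, one screen at a time. Note that grep is case-sensitive by default; add -i if you also want to catch “Error” and “ERROR”.
You can also replace “error” with “fail” and drop the pipe to “less” to print everything at once: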
$ sudo grep fail /var/log/*.log
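When there are many log files, a per-file count can show you where to look first. The sketch below is one way to do that, assuming your logs are plain text; `count_matches` is a name invented for this example. It uses grep's -c (count matching lines), -i (ignore case), and -E (extended patterns, so “error|fail” works as one search).

```shell
#!/bin/sh
# count_matches PATTERN FILE...: print "count<TAB>file" for every readable
# file with at least one case-insensitive match, busiest file first.
# (count_matches is a hypothetical helper name.)
count_matches() {
    pattern=$1
    shift
    for f in "$@"; do
        # Skip anything that is not a readable regular file.
        [ -f "$f" ] && [ -r "$f" ] || continue
        # grep -c prints 0 and exits non-zero when nothing matches,
        # so "|| true" keeps that from being treated as a failure.
        n=$(grep -ciE -- "$pattern" "$f" || true)
        if [ "$n" -gt 0 ]; then
            printf '%s\t%s\n' "$n" "$f"
        fi
    done | sort -rn
}

# Example (from a root shell, because sudo cannot call a shell function):
#   count_matches 'error|fail' /var/log/*.log
```

Files you cannot read are silently skipped, so run it with enough privileges to see the protected logs.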
If your system is running systemd, the “journalctl” command will print the messages logged to the systemd journal:
sudo journalctl | grep error
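Grepping for “error” only matches the text of a message. The journal also records a priority for each entry, and journalctl can filter on it directly with -p. Here is a small sketch wrapping that; `journal_errors` is a made-up helper name, while --no-pager, -p, -b (current boot), and --since are regular journalctl options.

```shell
#!/bin/sh
# journal_errors [SINCE]: show journal entries of priority "err" or worse.
# With no argument it covers the current boot; with one argument it covers
# the period since that time (for example "1 hour ago" or "today").
# (journal_errors is a hypothetical helper name.)
journal_errors() {
    if ! command -v journalctl >/dev/null 2>&1; then
        echo "journalctl not found; is this a systemd system?" >&2
        return 1
    fi
    if [ -n "${1:-}" ]; then
        journalctl --no-pager -p err --since "$1"
    else
        journalctl --no-pager -p err -b
    fi
}

# Example:
#   journal_errors "1 hour ago"
```

Depending on how your system is set up, you may need root or membership in a privileged group to see entries from all services.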
Don’t limit yourself to only looking for obvious errors noted in the text messages.
syslog also contains a lot of useful clues that you can filter by keywords associated with severity levels:
emerg – alert – crit – err – warning – notice – info – debug.
Below, simply filter for the keyword “warning” to look for warning messages:
/var/log$ cat syslog | grep warning
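To get a rough feel for the whole file at once, you could tally every severity keyword in one pass. This is only a sketch: `keyword_tally` is an invented name, and it counts lines that mention the word, which is not necessarily the same as the severity the message was logged with (many syslog file formats do not record that). The -w option matches whole words, so “err” will not count lines that only say “error”.

```shell
#!/bin/sh
# keyword_tally FILE: for each severity keyword, print "keyword<TAB>count",
# where count is the number of lines in FILE mentioning that whole word
# (case-insensitive). (keyword_tally is a hypothetical helper name.)
keyword_tally() {
    file=$1
    if [ ! -r "$file" ]; then
        echo "cannot read $file" >&2
        return 1
    fi
    for kw in emerg alert crit err warning notice info debug; do
        # grep -c exits non-zero on zero matches; "|| true" tolerates that.
        n=$(grep -ciw -- "$kw" "$file" || true)
        printf '%s\t%s\n' "$kw" "$n"
    done
}

# Example:
#   keyword_tally /var/log/syslog
```

A sudden jump in one of these counts from one day to the next is often a better signal than any single line.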
Your system most likely has a lot of logs that you can review, and usually there is a graphical program or programs to check logs and system resources.
Combine checking your logs with checking real-time resources. Tools such as “top”, “du”, “free”, “netstat”, “ping”, “ifconfig”, “lsof”, and the usually very interesting “uptime” and “who” complement your log file checks; you can run them manually to gauge how your system is functioning.
Memory, CPU, network, and application logs can be useful for a proactive approach to maintaining the health of your system and application functions. If you have a system for monitoring everything, then make sure you monitor the monitoring application.
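If you run the same handful of checks every day, you might gather them into one snapshot you can save alongside your log notes. The following is a sketch under a few assumptions: `run_section` and `snapshot` are names made up here, the choice of commands is just an example drawn from the tools above, and every command in it is read-only.

```shell
#!/bin/sh
# run_section CMD [ARGS...]: print a header, then the command's output.
# If the command is not installed, say so instead of failing.
run_section() {
    echo "== $* =="
    if command -v "$1" >/dev/null 2>&1; then
        "$@" 2>&1 || echo "(command exited with an error)"
    else
        echo "($1 is not installed)"
    fi
    echo
}

# snapshot: a dated summary of load, logged-in users, memory, and log size.
snapshot() {
    date
    run_section uptime
    run_section who
    run_section free -h
    run_section du -sh /var/log
}

# Example: keep one file per day
#   snapshot > "snapshot-$(date +%F).txt"
```

Without root, “du” may report permission errors for some subdirectories of /var/log; the snapshot records that rather than stopping.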
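One simple way to watch the watcher, assuming your monitoring application writes to a log or heartbeat file at regular intervals, is to check how recently that file was modified. This is a sketch only: `is_stale` is an invented helper, the path in the example is hypothetical, and the -mmin test is a find extension available in GNU and BSD find rather than part of POSIX.

```shell
#!/bin/sh
# is_stale FILE MINUTES: succeed if FILE is missing, or if it was last
# modified more than MINUTES minutes ago.
# (is_stale is a hypothetical helper name.)
is_stale() {
    file=$1
    minutes=$2
    # A missing file counts as stale: the monitor never wrote it.
    [ -e "$file" ] || return 0
    # find prints the file name only when it is older than the threshold.
    [ -n "$(find "$file" -maxdepth 0 -mmin +"$minutes")" ]
}

# Example, with a hypothetical log path for the monitoring tool:
#   if is_stale /var/log/monitor/agent.log 10; then
#       echo "monitoring agent has been silent for over 10 minutes" >&2
#   fi
```

Run from cron on a different machine if you can, so that a host that is down entirely does not also take the check down with it.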
There are a lot of tools and resources to help you stay ahead of issues that could, at some point, cause serious problems or even system failures.