
Top 8 IT Alerting and Incident Management Tools

PagerDuty, xMatters IT Management, Everbridge IT Alerting, BigPanda, Opsgenie, VictorOps, Send Word Now, FortiMonitor
  1. It reduces the amount of white noise. If something comes through, then it will alert somebody. However, if it's a bit of white noise that comes through at night, then it gets dealt with the next day. Everything is visible to everybody. It's not just a single person getting an SMS, then going, "Oh, I'm not going to worry about that." The visibility to everybody on the team is one of the great things about it because it reduces the white noise.
  2. Workflows and messaging are most valuable. Workflows are very useful. They are important for consolidating information or stopping duplication from happening. We put all the information into xMatters and then the workflow will push the same information in the correct format directly through to other applications that our end users frequently use, such as Slack, email, and Workplace.
  3. Find out what your peers are saying about PagerDuty, Everbridge, BigPanda and others in IT Alerting and Incident Management. Updated: November 2021.
    552,695 professionals have used our research since 2012.
  4. The post mortem reports are descriptive, indicating who joined the call and when. A robust solution with multiple modules that can be leveraged.
  5. BigPanda integrates well with other solutions, such as WatchGuard. The main thing that we like about BigPanda is the user interface.
  6. The integration feature is the most valuable. It provides a lot of customization for the integrations we use. OpsGenie has many features, such as email notification, SMS notification, rosters, and ticket tracking. Automation, such as scripting, is also possible. It also maintains the history of tickets and how each one was resolved. If a similar ticket comes in and the person working on it is unsure how to proceed, they can look back at the earlier ticket and solve it the same way.
  7. The Transmogrifier and the automatic solution report give me a report with the solution and the way to solve issues when an error occurs.
  9. The placeholder dropdowns for message templates are useful. It allows for a systematic and uniform method of alerting personnel in every location.

Advice From The Community

Read answers to top IT Alerting and Incident Management questions. 552,695 professionals have gotten help from our community of experts.
Rony_Sklar
Hi dear community, Can you explain what an incident response playbook is and the role it plays in SOAR? How do you build an incident response playbook?  Do SOAR solutions come with a pre-defined playbook as a starting point?
Maged Magdy
Real User

Hi,


What is an incident response playbook?


An incident response playbook is the set of guidelines, processes, policies, plans, and procedures, along with appropriate oversight of response activities, that an organization should follow to respond proactively, contain quickly, and remediate effectively when a given cyber incident occurs. It also includes an action plan for "what if" scenarios.




How do you build an incident response playbook?


According to NIST, building an incident response playbook means designing a process with four main phases:


1- Prepare.


2- Detect and Analyze.


3- Contain, Eradicate and Recover.


4- Post-Incident Activity.


*reference, NIST Computer Security Incident Handling Guide:


https://nvlpubs.nist.gov/nistp...


*reference, SANS Incident Handler's Handbook:


https://www.sans.org/reading-r...





Do SOAR solutions come with a pre-defined playbook as a starting point?


- Sure, most SOAR solutions today come with predefined templates. However, they are a double-edged sword, depending on the cybersecurity awareness and maturity level of the organization. If a SOAR is implemented at a low maturity level, it may harm the organization's production environment and use resources improperly.



 

David Swift
Real User

Incident response playbooks detail how to act when a threat or incident occurs. PICERL (from SANS) stands for Preparation, Identification, Containment, Eradication, Recovery, Lessons Learned. The playbook outlines what to do at each stage.


Typical SOAR playbooks automate the response to detected threats.


- Create a Ticket to Track the Incident


- Identify the source and target


- Confirm the attack is suspicious (SOC analyst lookup: on a known blacklist? other events?)


- Contain or Clean the Host (EDR, Patch, Update AV...)


- Block the Known Attacker (on a Firewall, IDS, etc...)


- Disable a Compromised Account


- Notify anyone necessary 


SOAR actions include scripts to set or fire off actions on devices.


A playbook usually has a series of actions when a threat/incident is detected.


Most SOARs include playbooks, but they have to be tailored to the specific devices in your environment (Palo Alto firewall vs. Check Point, Cylance vs. McAfee ePO...), as well as to your ticketing system integration, SIEM/UEBA threat detection integration, and so on.
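The playbook steps listed above can be sketched as an ordered series of actions, with the confirmation step gating the containment steps. This is only an illustration under stated assumptions: the `Incident` fields, the `KNOWN_BAD_IPS` blocklist, and every step function are hypothetical stand-ins that just record what a real integration (ticketing system, EDR, firewall) would do.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical blocklist; 203.0.113.7 is a documentation-only address.
KNOWN_BAD_IPS = {"203.0.113.7"}


@dataclass
class Incident:
    source_ip: str
    target_host: str
    account: str
    confirmed: bool = False
    log: List[str] = field(default_factory=list)


def create_ticket(inc: Incident) -> None:
    inc.log.append(f"ticket opened for {inc.target_host}")


def confirm_suspicious(inc: Incident) -> None:
    # Stand-in for the analyst lookup: is the source on a known blocklist?
    inc.confirmed = inc.source_ip in KNOWN_BAD_IPS
    inc.log.append(f"confirmed={inc.confirmed}")


def contain_host(inc: Incident) -> None:
    inc.log.append(f"isolate {inc.target_host} via EDR")


def block_attacker(inc: Incident) -> None:
    inc.log.append(f"block {inc.source_ip} on firewall")


def disable_account(inc: Incident) -> None:
    inc.log.append(f"disable account {inc.account}")


def notify(inc: Incident) -> None:
    inc.log.append("notify SOC lead")


def run_playbook(inc: Incident) -> Incident:
    """Run the steps in order; response actions fire only if confirmed."""
    create_ticket(inc)
    confirm_suspicious(inc)
    if inc.confirmed:
        for step in (contain_host, block_attacker, disable_account):
            step(inc)
    notify(inc)
    return inc
```

Calling `run_playbook(Incident("203.0.113.7", "ws-042", "jdoe"))` records all six steps, while an incident from an address that is not on the blocklist gets only a ticket, the confirmation result, and a notification. Tailoring a playbook to your environment would mean replacing each stand-in with a call to the device or system you actually run.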

Robert Cheruiyot
Real User

Hi Rony, 


A playbook automates the gathering of threat intelligence from a myriad of sources. Playbooks ingest alerts from tools like a SIEM and check them against threat intelligence sources such as VirusTotal to get information related to the alert. For example, a playbook can scan suspicious domains or IPs against VirusTotal and provide a reputation score for each domain or IP.


Depending on the workflow, the playbook may be configured to close a case if it's a false positive or to pass the case, together with the threat intelligence gathered, to a SOC analyst for further investigation. This way the playbook reduces the time spent on false-positive alerts. It also saves analysts time by gathering threat intelligence automatically instead of leaving them to do it manually.


Be careful with cases where you set alerts to be closed automatically, though. You can try this on the community editions of some SOAR platforms, such as Splunk Phantom or Siemplify.
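As a rough sketch of the enrich-then-decide flow described above, the snippet below looks up a reputation score for an alert's indicator and either closes the case or escalates it. The `REPUTATION` table is a hypothetical local stand-in for a threat intelligence source (it does not call VirusTotal or any real API), and the 0 to 100 score range and the `close_below` threshold are assumptions made for illustration.

```python
from typing import Dict, Optional

# Hypothetical stand-in for a threat intelligence lookup; scores run 0-100.
REPUTATION: Dict[str, int] = {
    "malicious.example": 95,
    "intranet.example": 0,
}


def reputation_score(indicator: str) -> Optional[int]:
    """Return a score for a domain or IP, or None if nothing is known."""
    return REPUTATION.get(indicator)


def triage(alert: Dict[str, str], close_below: int = 10) -> Dict[str, object]:
    """Attach the gathered intelligence and decide what to do with the case."""
    score = reputation_score(alert["indicator"])
    enriched: Dict[str, object] = {**alert, "reputation": score}
    # Auto-close only when a score exists and is clearly benign; indicators
    # with no data are escalated rather than silently closed.
    if score is not None and score < close_below:
        enriched["status"] = "closed_false_positive"
    else:
        enriched["status"] = "escalated_to_analyst"
    return enriched
```

Escalating unknown indicators instead of closing them is one cautious way to handle the auto-close risk mentioned above; a real workflow may draw that line differently.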


Building a playbook


Magdy has provided perfect industry standards for building playbooks. I would just add that a playbook mainly consists of actions and decisions. An action is taken against an alert (such as scanning), and based on the results the playbook decides what to do next: close the case, scan further using other tools, or pass it to the SOC analyst. This really depends on your workflow.


I am a junior but I love this SOAR thing.

Simon Thornton
Real User

For a given incident type, it describes a series of actions that can be a mixture of automated and manual steps. When you start, the steps are often manual. As the playbook and confidence in the steps improve, you can start automating.


For example, a playbook for a “suspicious email” might read as:

1) check if a case is already open for this user and/or asset; if yes, go to step 3


2) open case and record details


3) extract suspicious attachment


4) generate MD5 and SHA256 hashes


5) submit hashes to Virustotal and record results


6) if 50% (pick your threshold) of AV engines detect the sample, skip to step 10


7) forward email attachment to sandbox


8) does the sandbox report indicate suspicious behavior? If yes, escalate to T3


9) inform the user


10) open a ticket to IT to re-template PC or fix


11) when IT responds to the ticket, close the SOC ticket with relevant closure details


This is a quick illustration; which steps you include depends on your environment and how far you go.


Each step could be related to different teams.
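Steps 4 to 6 of the example playbook are easy to automate early. The sketch below computes the MD5 and SHA256 hashes with Python's standard `hashlib` module and applies the detection threshold. The function names and the returned step labels are hypothetical, and the detection counts are assumed to come from whatever scanning service you submit the hashes to; no real service is called here.

```python
import hashlib
from typing import Tuple


def attachment_hashes(data: bytes) -> Tuple[str, str]:
    """Step 4: return the (MD5, SHA256) hex digests of the attachment."""
    return hashlib.md5(data).hexdigest(), hashlib.sha256(data).hexdigest()


def next_step(detections: int, engines: int, threshold: float = 0.5) -> str:
    """Step 6: skip to step 10 when enough AV engines flag the sample."""
    if engines <= 0:
        raise ValueError("engines must be positive")
    if detections / engines >= threshold:
        return "open IT ticket (step 10)"
    return "forward to sandbox (step 7)"
```

With the default threshold, a sample flagged by 40 of 70 engines goes straight to the IT ticket, while one flagged by 3 of 70 is forwarded to the sandbox. The threshold is a parameter precisely because, as noted in step 6, you pick your own.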


IT Alerting and Incident Management Articles

Evgeny Belenky
IT Central Station
Nov 19 2021

Hi community members,

Spotlight #2 is our fresh bi-weekly community digest for you. It covers cybersecurity, IT and DevOps topics. Check it out and comment below with your feedback!


Trending

Questions

Share your experience with other peers by answering the questions below!

IT

Security

    DevOps


    Articles

    Community members share their knowledge in the articles below.

    Also, you're welcome to check our previous community digest here.

    Community Team,

    IT Central Station (soon to be PeerSpot)

    CristianoLima
    Senior IT Infrastructure Engineer at Tecnoage
    Nov 05 2021

    Keeping up with the evolution of cybersecurity and the threats haunting organizations across all industries, this text pays special attention to ransomware, a practice on the rise in the world of cybercrime. Let's focus specifically on the healthcare sector. The discussion is based on Sophos' annual report on cyber threats, which covers the continuity of ransomware attacks, the impact on those who fell victim to them, and the methods and costs of remediating an attack.

    During the Covid-19 pandemic, the significant increase in remote access and, in many cases, the use of BYOD devices with access to corporate environments became possible gateways for cyberattacks, mainly ransomware. There had been no preparation and no structured form of containment and prevention against malware on unmanaged devices. Urgent isolation measures forced organizations to make remote access to corporate environments available immediately, in many cases without IT administrators even knowing whether a device gaining access to their networks had countermeasures against malware, such as active and reliable endpoint protection.

    The data showed that 34% of survey participants reported having already suffered some type of ransomware attack. That is good news compared to the 46% of victims in the retail sector, and it leaves the healthcare sector below the average of those who participated in the survey.

    The attention this problem receives stems from the obligation on health organizations to disclose information security data, given the importance of the sector. Globally, the volume of attacks fell compared to 2020, when 51% of respondents admitted to having been attacked, with the criminals succeeding in the attack.

    Comparing these points with data from other industries, we see that attackers have a much higher success rate in encrypting healthcare data (65%) than the global average (54%). Healthcare organizations are also less successful in preventing attacks than the global average: 28% versus 39%. This poor performance may be related to a lack of interest in investing in information security, an overloaded IT team, or the absence of a specialized employee or partner. Another factor is legacy equipment with little or no ability to prevent and combat cyberattacks, which becomes an easy access point for potential intruders.

    Healthcare organizations are, unfortunately, among the sectors most likely to pay for data recovery: around 34% of respondents admitted that they would pay to recover their data, against an average of 32% of respondents from other sectors. The sector most likely to pay a ransom is utilities, energy and fuel, where about 42% of respondents admitted that they would pay. Another point worth noting is the inability of healthcare organizations to recover their data through backups. Globally, 57% of organizations that had their data encrypted were able to recover it, while only 44% of healthcare organizations were able to recover data using backups, the second-lowest rate among all surveyed industries.

    What the attackers omit is that even if the ransom is paid, there is no guarantee that the data will be returned or decrypted. Only 65% of those who paid to recover their data were successful, about 29% of the organizations that paid the ransom got back only 50% of their data, and just 8% received all of the hijacked data.

    While the discussion centers on the number of attacks, the percentage of data recovered, and the recovery method, the data is worthy of concern because it reflects directly on the technical competence of the teams and the tools available to them. However, when we shift the focus to the costs that each hijacking can incur for an individual organization, or apply the volume globally, we see the true scale of the damage caused by ransomware attacks. The average ransom paid globally is $170,000; for healthcare organizations the average is lower, at $131,300. Values vary widely and depend on the size of the organization and its financial resources.

    When we move to the overall cost of recovering from a ransomware attack, considering operational downtime, lost hours, device costs, lost opportunities and the ransom itself, the average among the evaluated sectors is 1.85 million dollars; for health organizations, it was 1.27 million dollars. There are several likely factors behind the lower healthcare costs. First, healthcare organizations tend to have smaller budgets than other sectors, limiting the amount available to spend on remediation. At the same time, in many parts of the world, health is a public service.

    Looking to the future of prevention and remediation in healthcare organizations, the expectation of an attack, even a successful one, is an alarming reality: 63% of respondents expect to suffer some type of attack and only 37% do not foresee one. This variation reflects differences in attitudes and confidence in dealing with a ransomware attack. Healthcare organizations that have not been hit but anticipate that they could be say that, as a result of the pandemic, more units of health organizations are being targeted by attackers, increasing the percentage compared to pre-pandemic years.

    Assessing all the data, it is clear that despite the growth of attacks and their success rate, healthcare organizations are among those most attentive to the increasing sophistication of ransomware. Even those that have not suffered attacks were certainly influenced by the experiences of others in their decisions about prevention and recovery of hijacked data.

    It is therefore good practice to take preventive and corrective actions, keeping in mind above all that no sector, country or organization is immune to ransomware attacks, given how prevalent this attack format is. The recommendations are as follows:

    – Plan, invest and specialize not only in backups but in the prompt restoration of your data; multiple backup copies become costly when the restore time or format is ineffective;
    – Always keep redundant backup copies in the 3-2-1 format: keep 3 copies on different media, preferably with one of them offline in case recovery from the other two is not possible;
    – Use layered protection. Combine technology and people to contain or repair losses after a ransomware attack: invest in your information security team or bring in specialized third parties who can collaborate with your team;
    – Understand and leverage AI that has the potential to provide immediate detection and prevention when an attack is about to happen or in the wake of one;
    – Avoid paying the ransom. Although it may seem like the first option, studies show that it is ineffective and costly while your organization is at a standstill;
    – Always have a malware recovery plan. Prevention is always the best option, but malware infection or data hijacking through ransomware can have external as well as internal sources. Beware!
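The 3-2-1 recommendation in the list above can be checked mechanically against a backup inventory. The sketch below is illustrative only: the `BackupCopy` record and `check_3_2_1` function are hypothetical names, and the "2" is interpreted under the common reading of the rule (at least two different media types), which is an assumption beyond the wording of the recommendation; the offline requirement follows the recommendation itself.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class BackupCopy:
    name: str
    medium: str    # e.g. "disk", "tape", "cloud"
    offline: bool  # True if the copy is disconnected from the network


def check_3_2_1(copies: Iterable[BackupCopy]) -> List[str]:
    """Return the list of rule violations; an empty list means compliant."""
    inventory = list(copies)
    problems: List[str] = []
    if len(inventory) < 3:
        problems.append("fewer than 3 copies")
    if len({c.medium for c in inventory}) < 2:
        problems.append("fewer than 2 media types")
    if not any(c.offline for c in inventory):
        problems.append("no offline copy")
    return problems
```

An inventory of two disk copies plus one offline tape copy passes, while the two disk copies alone fail all three checks. A check like this covers only the inventory; it says nothing about whether a restore actually completes in time, which is the first recommendation's point.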

    Source: Sophos (State of Ransomware in Healthcare 2021)

    Netanya Carmi
    Content Manager
    IT Central Station
    Oct 14 2021

    We receive alerts all day long: alerts about emails, incoming WhatsApp messages and SMSes, posts on social media, etc. At some point we become desensitized to these alerts and stop noticing them, a phenomenon known as “alert fatigue.” Seventy percent of a SOC analyst’s workday is spent dealing with alerts, so SOC analysts are more at risk for alert fatigue than pretty much anyone else.

    SOC analyst and IT Central Station user Geofrey M. says that he receives more than 20,000 alerts a week - and 60% of these are deemed critical. With numbers like these, alerts can easily start piling up and don’t always get dealt with in a timely manner - or sometimes at all - leaving what may be important issues to fall through the cracks.

    Alert fatigue can be harmful to your business for a number of reasons. These include:

    1. Ignored alerts - Obviously, when alerts get missed due to alert fatigue, this can lead to damaged customer relationships and overall devastation to your business.
    2. Wasted time - The more time your team spends responding to alerts that are not necessarily critical, the less time they spend doing the other critical tasks they are being paid to do.
    3. Employee burnout - Your staff may, in fact, manage to resolve most of the significant alerts and therefore your customers may not be directly impacted. But the fact remains that the more alerts your employees receive and have to deal with, the less productive they will be.
    4. Psychological effects - The more alerts SOC analysts receive, the more reason they have for concern. Fear that they may have missed something can slow down releases, ultimately impacting customers.

    Some of these factors, like the number of alerts that get missed or the amount of wasted time, can be measured. In his article, “Alert Fatigue – A Practical Guide to Managing Alerts,” Itiel Schwartz writes that psychological ramifications, such as how burned out your staff gets, cannot. But in response to his article, Reddit user SU1PHR disagrees, stating that just because there is no quantitative way to measure employee burnout does not mean it can’t be measured. He suggests monitoring employee retention rates and one-to-one meetings at which managers can routinely receive feedback on how their employees are managing. He warns that not doing so can cause “a deadly spiral that will lead to more fatigue, more errors and more missed alerts.”

    Several other Reddit users on the same thread mentioned that at their places of employment, alerts get saved and fed into business analytics tools but that, other than being able to say “we are collecting the data,” nothing else ever gets done with them.

    So what can be done to minimize alert fatigue and help SOC analysts stay on top of everything that needs to get done?

    First of all, it’s important to minimize human error whenever possible. Sometimes engineers inadvertently create a code malfunction or an alert isn’t calibrated properly. Putting better organizational processes in place can help ensure that the people involved in setting the alerts do so appropriately.

    But it’s best, when feasible, to remove the human element altogether. This is why IT Central Station user Tshepiso M. points out that it is important to automate wherever possible. Using technology to sort alerts by importance can help ensure peace of mind in your staff by taking some of the burden off of them. The less your employees feel responsible for keeping track of and dealing with alerts, the less you’ll have to worry about psychological effects such as burnout or fear of failure.

    One way to increase efficiency, as Geofrey M. points out, is by implementing a SIEM solution. Security information and event management (SIEM) solutions can help prevent alert fatigue by streamlining security. Part of the problem with alerts is the number of sources from which they originate. Organizations are constantly adding more tools, which makes IT environments increasingly complex. A SIEM solution can become your primary security monitoring tool by consolidating the data streams and integrating unique data sources. It can also take security data from a variety of systems and analyze it, putting it all into context and gleaning new insights from one centralized location.
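To make the idea of consolidating data streams concrete, here is a minimal sketch that maps events from two tools into one common shape and drops repeats. Everything in it is assumed for illustration: the input field names (`dst`, `action`, `hostname`, `detection`), the common schema, and the crude rule that two events are duplicates when they share a host and an event name. A real SIEM does this with its own parsers and correlation rules.

```python
from typing import Dict, Iterable, List, Set, Tuple

Event = Dict[str, object]


def normalize_firewall(raw: Dict[str, object]) -> Event:
    """Map a hypothetical firewall record into the common schema."""
    return {"source": "firewall", "host": raw["dst"],
            "event": raw["action"], "ts": raw["time"]}


def normalize_edr(raw: Dict[str, object]) -> Event:
    """Map a hypothetical EDR record into the common schema."""
    return {"source": "edr", "host": raw["hostname"],
            "event": raw["detection"], "ts": raw["timestamp"]}


def consolidate(events: Iterable[Event]) -> List[Event]:
    """Order events by time and keep the first one per (host, event)."""
    seen: Set[Tuple[object, object]] = set()
    merged: List[Event] = []
    for ev in sorted(events, key=lambda e: e["ts"]):
        key = (ev["host"], ev["event"])
        if key not in seen:
            seen.add(key)
            merged.append(ev)
    return merged
```

If a firewall and an EDR agent both report the same event on the same host, the analyst sees one entry instead of two, which is the kind of reduction that helps with alert fatigue.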

    Every organization is different, and a SIEM allows you to adapt and build your own nuances into the security alert process for your business. When you are considering deploying a SIEM solution, keep the following things in mind in order to ensure that you are only receiving the notifications you actually need.

    • Consider context - Rather than setting the same alerts for each new asset, take the time to think through each asset’s function and role within the wider context of the environment and adjust the defaults and settings accordingly. This allows for proper prioritization and allows the number of notifications to be reduced significantly.
    • Limit who will receive what alerts - Without a SIEM, every single alert may be sent out to every single admin. But this is rarely necessary. A SIEM will allow you to have different staff members alerted depending on the event or the operating system affected. This reduces redundancy and prevents a buildup of excess alerts over time.
    • Revisit and readjust - Make changes as you go. If your initial configuration leaves you getting alerts you don’t need, you can always lower the priority or filter it out altogether. SIEMs allow you the flexibility to change your settings as needed, maximizing the capabilities of your security tools and freeing up your security team to be available when they are really needed.
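The three considerations above (context, limiting recipients, and readjusting over time) can be pictured as a small routing table. This sketch is not any SIEM product's configuration format; the `Alert` fields, the `ROUTES` and `ROLE_ADJUSTMENT` tables, the team names, and the 1 to 5 severity scale are all hypothetical values chosen for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Alert:
    event_type: str
    os: str
    asset_role: str
    severity: int  # 1 (low) to 5 (critical)


# Limit who receives what: recipients depend on the event and the OS.
ROUTES: Dict[Tuple[str, str], List[str]] = {
    ("auth_failure", "linux"): ["linux-admins"],
    ("auth_failure", "windows"): ["windows-admins"],
    ("malware", "windows"): ["windows-admins", "soc-oncall"],
}

# Consider context: adjust severity by the asset's role in the environment.
ROLE_ADJUSTMENT: Dict[str, int] = {"domain_controller": 2, "test_vm": -2}

# Revisit and readjust: raise or lower this cutoff as you tune.
MIN_SEVERITY = 3


def effective_severity(alert: Alert) -> int:
    adjusted = alert.severity + ROLE_ADJUSTMENT.get(alert.asset_role, 0)
    return max(1, min(5, adjusted))


def route(alert: Alert) -> List[str]:
    """Return the teams to notify; an empty list means filtered out."""
    if effective_severity(alert) < MIN_SEVERITY:
        return []
    return ROUTES.get((alert.event_type, alert.os), ["soc-oncall"])
```

Under these tables, a low-severity authentication failure on a domain controller is raised and sent to the Linux admins only, while a higher-severity one on a test VM is filtered out entirely. Tuning then becomes a matter of editing the tables and the cutoff rather than asking every admin to ignore what does not concern them.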

    Conclusion

    Putting better organizational processes in place in order to minimize human error can help reduce alert fatigue for SOC analysts. But an even better strategy is to try to automate and remove the human element altogether. One great way to do this is to implement a SIEM solution.

    Learn about SOC Analyst Appreciation Day here.
