In May 2018, Twitter disclosed a bug that had unintentionally written user credentials to its internal logging system. A year later, Facebook urged users to change their passwords after discovering passwords stored in readable form on internal data storage.

Although neither incident resulted in a confirmed breach, the data was still logged, and that alone put personal information at risk: anyone with access to the logging system could have read sensitive customer data.

Sensitive data is anything that can identify a person, often called PII (Personally Identifiable Information). Names, email addresses, birthdays, Social Security numbers, IP addresses, ethnicity, gender, usernames and passwords, credit card details, and any other personally identifying information all qualify.

Logging is critical for troubleshooting, incident management, and debugging. Recording sensitive data, however, creates a slew of problems: stakeholder privacy, legal restrictions on collecting personal information, and the possibility of data disclosure by insiders.

Categorization of data

Businesses must determine what data needs to be secured and develop a Data Classification Policy that categorizes data according to its sensitivity. At a minimum, companies need three levels of data classification:

  • Restricted: This category houses the most sensitive information and could pose a severe threat if exposed. Access should be limited to those with a strict need to know.
  • Confidential or Private: This information is moderately sensitive. If breached, it poses a moderate risk to the firm. The firm or department that owns the data controls access.
  • Public: This category contains non-sensitive information that, if obtained, would pose little or no risk to the firm.
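One way to make such a policy actionable is to encode it in a machine-readable form that logging and redaction tooling can consult. The sketch below is purely illustrative; the field names and category labels are assumptions, not a standard.

```python
# Hypothetical, minimal classification map: field names and the category a
# Data Classification Policy assigns to them. Log filters and redaction rules
# can consult this map to decide how to treat each field.
DATA_CLASSIFICATION = {
    "ssn": "restricted",
    "credit_card": "restricted",
    "password": "restricted",
    "email": "confidential",
    "ip_address": "confidential",
    "page_title": "public",
}

def requires_redaction(field_name: str) -> bool:
    """Treat anything not explicitly classified as public as sensitive."""
    return DATA_CLASSIFICATION.get(field_name, "confidential") != "public"
```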

Quality assurance and automated testing

A QA team may not have the necessary access or even know which systems to examine, and if they have been doing black-box testing, it will take some time to bring them up to speed.

QA teams should ensure that all of the system's various flows function correctly, but they don't have to stop there. If tests are automated and use pattern matching, a test can automatically confirm that sensitive data never winds up in the logs.
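As an illustration, such a test could exercise a flow and then scan the captured log output for patterns that look like sensitive data. The sketch below uses pytest's built-in `caplog` fixture; the regexes are a minimal example set, and `handle_login` is a hypothetical function under test.

```python
import logging
import re

# Illustrative patterns; a real suite would cover more PII types.
SENSITIVE_PATTERNS = [
    re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),      # email addresses
    re.compile(r"\b(?:\d[ -]*?){13,16}\b"),      # credit-card-like digit runs
    re.compile(r"password\s*[=:]\s*\S+", re.I),  # password=... fragments
]

def test_logs_contain_no_sensitive_data(caplog):
    # caplog is pytest's built-in fixture that captures log records.
    with caplog.at_level(logging.DEBUG):
        handle_login("alice@example.com", "s3cret")  # hypothetical code under test

    for record in caplog.records:
        message = record.getMessage()
        for pattern in SENSITIVE_PATTERNS:
            assert not pattern.search(message), f"Sensitive data leaked: {message!r}"
```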

Encrypt logs

Businesses must adopt and manage encryption properly. A successful encryption strategy means using strong algorithms and handling keys carefully. Encrypt important material before sending it across untrusted networks, and transmit log data to central storage over encrypted channels only. Logs are often sent unencrypted for performance reasons, which poses a real threat.
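As a minimal sketch, Python's standard library can ship log records to a collector over HTTPS rather than plain HTTP. The host name and endpoint below are placeholders; a real deployment would typically use a dedicated log shipper with certificate verification and mutual TLS where supported.

```python
import logging
import logging.handlers
import ssl

# Verify the collector's certificate; never disable verification in production.
context = ssl.create_default_context()

# Hypothetical collector endpoint; replace with your log aggregator.
handler = logging.handlers.HTTPHandler(
    host="logs.example.com:443",
    url="/ingest",
    method="POST",
    secure=True,       # send over HTTPS instead of plain HTTP
    context=context,
)

logger = logging.getLogger("app")
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.info("user login succeeded")  # transmitted over an encrypted channel
```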

Filter sensitive information

While writing PII or sensitive data to your logs is never a good idea, it is not always avoidable and can happen accidentally or through error. For many businesses, the ability to search for and filter out, redact, or obfuscate sensitive data in their logs is therefore critical: it lets them remove or mask sensitive values before they leave the network, supporting data protection, security obligations, and compliance.
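One common approach is a logging filter that redacts known patterns before a record reaches any handler, so sensitive values never leave the application. The sketch below uses Python's standard logging module; the redaction rules are a minimal, illustrative set.

```python
import logging
import re

# Minimal, illustrative redaction rules; extend to match your classification policy.
REDACTIONS = [
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"), "<redacted-email>"),
    (re.compile(r"\b(?:\d[ -]*?){13,16}\b"), "<redacted-card>"),
]

class RedactingFilter(logging.Filter):
    def filter(self, record: logging.LogRecord) -> bool:
        message = record.getMessage()
        for pattern, replacement in REDACTIONS:
            message = pattern.sub(replacement, message)
        # Overwrite the record so downstream handlers only see the redacted text.
        record.msg = message
        record.args = None
        return True

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("app")
logger.addFilter(RedactingFilter())
logger.info("payment from %s", "alice@example.com")  # logged as <redacted-email>
```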

Organize logs

Structured logs make it simpler to mask or anonymize sensitive data, and you can arrange unstructured logs using parser rules in a log shipper's configuration. Wherever feasible, though, applications should log directly in a structured format such as JSON; doing so reduces the human and CPU time spent building parser rules.
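As a minimal sketch using only the Python standard library, a custom formatter can emit each record as a JSON object, so downstream shippers can pick out fields without bespoke parser rules. The field names are illustrative.

```python
import json
import logging

class JsonFormatter(logging.Formatter):
    def format(self, record: logging.LogRecord) -> str:
        # Illustrative field set; add or remove fields to match your pipeline.
        return json.dumps({
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        })

handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logging.basicConfig(level=logging.INFO, handlers=[handler])

logging.getLogger("app").info("order created")
# {"timestamp": "...", "level": "INFO", "logger": "app", "message": "order created"}
```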

Anonymize log fields containing sensitive data

Before sending data to remote storage, identify and anonymize sensitive data fields. Data teams can employ several strategies, such as hashing, encrypting, or removing sensitive fields. An observability data pipeline solution can obfuscate and rewrite data in motion so that no sensitive information makes its way into production systems, storage, or anywhere it isn't needed.
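For instance, a shipping step can replace sensitive field values with a keyed hash so they remain correlatable without being readable. The field names and key handling below are assumptions for illustration; in practice the key would come from a secrets manager.

```python
import hashlib
import hmac
import os

# In practice, load this key from a secrets manager, not an environment default.
HASH_KEY = os.environ.get("LOG_HASH_KEY", "dev-only-key").encode()

SENSITIVE_FIELDS = {"email", "ssn", "ip_address"}  # illustrative field list

def anonymize(event: dict) -> dict:
    """Replace sensitive field values with a keyed hash before shipping."""
    redacted = dict(event)
    for field in SENSITIVE_FIELDS & event.keys():
        digest = hmac.new(HASH_KEY, str(event[field]).encode(), hashlib.sha256)
        redacted[field] = digest.hexdigest()[:16]  # truncated for readability
    return redacted

print(anonymize({"email": "alice@example.com", "action": "login"}))
# The email value is replaced by a keyed hash, so it is no longer readable.
```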

Conclusion

Keeping track of sensitive information and ensuring it doesn't make its way into your logs is crucial for protecting end-user data and keeping your business compliant. Centralizing your logs is still the right thing to do, but it does give attackers targeting log data a single location to focus their attack on. Following the recommendations in this article can help you keep sensitive information out of your logs. And if you manage your observability data pipelines and data storage with Apica, you gain complete control over your machine and observability data pipelines and unlock the ability to enhance data value while it's in motion.

With Apica, you can build data pipelines that connect data from the right sources to the appropriate targets, enforce role-based access to all data, and identify, obfuscate, and redact PII in log data using built-in rules. You can also write data to highly secure, highly available storage, prevent data loss whenever a downstream target is down or under back pressure, and replay data on demand once the target system is back online.