As more organizations adopt cloud technology, they generate massive amounts of data every day. This data can include sensitive customer information, such as Personally Identifiable Information (PII). Identifying this sensitive data is essential for complying with data privacy and security regulations, but the volume and complexity of the data make it a challenge. Organizations that operate on a global scale must also follow a range of standards and regulatory requirements. Security and data governance teams are kept busy ensuring compliance with global regulations, such as GDPR, and regional regulations, such as the California Consumer Privacy Act (CCPA). One of the primary challenges is identifying all the sensitive data so that appropriate actions can be taken to protect it. Another obstacle arises when customers move their data from on-premises to cloud environments. In that case, identifying sensitive data is challenging, whether they are starting a new data lake or migrating existing workloads. Addressing these problems is one of the critical objectives of Amazon Macie. The service is designed to help organizations identify and protect sensitive data, making it easier to manage data security and compliance requirements during the transition to cloud-based environments.
Amazon Macie is a data security service offered by Amazon that focuses on discovering sensitive data. It underwent a relaunch in 2020. Macie uses advanced technologies, such as machine learning and pattern matching, to identify sensitive data and highlight the different types of sensitive information. This service has two main features. Firstly, it's natively integrated with Amazon S3, a storage service from Amazon. This means it can provide information on all the different files you have stored in your Amazon S3 account and highlight this information in an interactive dashboard. The dashboard shows you a high-level view of all the objects you have stored in your buckets. The second key feature is that it helps you manage security by identifying sensitive information. The service uses both built-in managed data identifiers and custom data identifiers to look for sensitive information within your stored files. These identifiers are tools that help the service to match and identify specific types of sensitive data.
Macie is an account-based service, so you enable it at the account level. It can also be enabled at the organization level, which is recommended as a best practice. If data governance teams use a delegated administrator account, they can enable Macie from that account, and it can then be easily enabled across all member accounts in the organization. After you enable Macie at the account level, it starts to evaluate and gather an inventory of all S3 buckets in that account. Additionally, Macie continuously monitors control plane changes made to bucket policies, which means it can identify changes in encryption or access control policies in real time. Macie also automatically discovers sensitive data using a technique called intelligent sampling. This happens on a daily basis, and the discovered sensitive data is presented to customers in an interactive data map. This map makes it easy to identify high-priority and highly sensitive buckets or objects that require attention.
Macie reports two types of findings, policy findings and sensitive data findings, and both are available in the console. These findings can also be accessed through EventBridge and routed to AWS Security Hub, a centralized monitoring and cloud posture assessment service provided by AWS. Alternatively, customers can use EventBridge to route the findings to their own third-party centralized monitoring tools.
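To make the EventBridge routing more concrete, here is a minimal sketch of an event pattern that selects Macie findings. The `source` and `detail-type` values are the ones Macie uses for finding events. The `matches` helper and the trimmed `sample_event` are hypothetical: they only imitate exact-value matching locally and are not part of any AWS SDK.

```python
import json

# Event pattern for an EventBridge rule that selects Macie findings.
MACIE_FINDING_PATTERN = {
    "source": ["aws.macie"],
    "detail-type": ["Macie Finding"],
}


def matches(pattern, event):
    """Hypothetical local stand-in for EventBridge matching: every pattern
    key must appear in the event with one of the allowed values."""
    return all(event.get(key) in allowed for key, allowed in pattern.items())


# Hypothetical, heavily trimmed event used only for illustration.
sample_event = {
    "source": "aws.macie",
    "detail-type": "Macie Finding",
    "detail": {"category": "CLASSIFICATION"},
}

# The JSON form is what would be supplied as the rule's event pattern.
print(json.dumps(MACIE_FINDING_PATTERN))
print(matches(MACIE_FINDING_PATTERN, sample_event))
```

A rule with this pattern could target Security Hub integrations, a Lambda function, or a third-party tool; the choice of target is independent of the pattern itself.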
Amazon Macie offers several key benefits, including automatic sensitive data discovery at scale. This feature has recently been enhanced: once Macie is activated on a new account, it automatically imports the inventory from S3 and performs regular evaluations of sensitive data. Customers can also use Macie to select target buckets for sensitive data discovery jobs.
If customers need to scan for sensitive data in buckets where they lack visibility and need to take action, they can configure jobs in Macie. These jobs offer a lot of configuration options and flexibility. Customers can choose how often they want to scan the data by defining a recurrence pattern, scheduling the job daily, weekly, or monthly. They can also set object criteria to determine how much of the data in the bucket they want to scan.
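As an illustration of these job options, the sketch below builds the request for a scheduled job as a plain dictionary. The field names follow my understanding of the Macie `CreateClassificationJob` request and should be checked against the current API reference. The `build_job_request` helper, the job name, the account ID, and the bucket name are hypothetical.

```python
import uuid


def build_job_request(name, account_id, buckets, frequency="weekly",
                      sampling_percentage=100):
    """Build keyword arguments for a scheduled sensitive data discovery job.

    Hypothetical helper: it only assembles a dict and calls no AWS API.
    """
    schedules = {
        "daily": {"dailySchedule": {}},
        "weekly": {"weeklySchedule": {"dayOfWeek": "MONDAY"}},
        "monthly": {"monthlySchedule": {"dayOfMonth": 1}},
    }
    if frequency not in schedules:
        raise ValueError(f"unsupported frequency: {frequency}")
    if not 1 <= sampling_percentage <= 100:
        raise ValueError("sampling_percentage must be between 1 and 100")
    return {
        "clientToken": str(uuid.uuid4()),  # idempotency token
        "jobType": "SCHEDULED",
        "name": name,
        "scheduleFrequency": schedules[frequency],
        "samplingPercentage": sampling_percentage,
        "s3JobDefinition": {
            "bucketDefinitions": [
                {"accountId": account_id, "buckets": list(buckets)}
            ]
        },
    }


# Hypothetical account ID and bucket name.
request = build_job_request(
    "weekly-pii-scan", "111122223333", ["example-bucket"],
    frequency="weekly", sampling_percentage=25,
)
print(request["scheduleFrequency"], request["samplingPercentage"])
```

With boto3 installed and credentials configured, a dictionary like this could be passed as `boto3.client("macie2").create_classification_job(**request)`.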
In some situations, customers know they need to scan for sensitive data, but scanning all of it is not always feasible as the organization expands, with more AWS accounts and buckets and data accumulating into petabytes. To address this issue, Amazon Macie has introduced automatic sensitive data discovery at scale, which is now available in the service.
As a managed service, Macie offers customers a broad range of managed data identifiers that cover various financial data types such as credit card numbers, bank account numbers, healthcare data types including insurance IDs, and personally identifiable information such as first name, last name, national IDs, and passport numbers. These identifiers are already built into the service, so customers only need to include or exclude them based on their needs. Additionally, Macie provides the regional and global expansion capabilities that customers require across multiple regions.
Furthermore, the service lets customers create custom data identifiers based on what they consider sensitive to their organization. For instance, if an application collects a specific data type, such as a subscriber ID, alongside social security numbers, customers can write a regular expression (PCRE) and save it as a custom data identifier. This identifier can then be included in Macie to find the matching data.
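The idea behind a custom data identifier can be sketched locally with Python's `re` module. Everything specific here is an assumption for illustration: the `SUB-` plus eight digits format, the keyword list, and the 50-character proximity window. The proximity check is a simplified approximation of keyword matching, and Python's regex dialect is close to, but not the same as, PCRE.

```python
import re

# Hypothetical subscriber ID format: "SUB-" followed by eight digits.
SUBSCRIBER_ID = re.compile(r"\bSUB-\d{8}\b")

# Hypothetical keywords that must appear shortly before a match.
KEYWORDS = ("subscriber id", "subscriber")
MAX_DISTANCE = 50  # characters of preceding text to inspect (assumed value)


def find_subscriber_ids(text):
    """Return regex matches that have a keyword in the preceding window."""
    lowered = text.lower()
    hits = []
    for match in SUBSCRIBER_ID.finditer(text):
        window = lowered[max(0, match.start() - MAX_DISTANCE):match.start()]
        if any(keyword in window for keyword in KEYWORDS):
            hits.append(match.group())
    return hits


print(find_subscriber_ids("Subscriber ID: SUB-12345678"))
print(find_subscriber_ids("order SUB-12345678 shipped"))
```

Requiring a nearby keyword is one way to cut false positives: the second string matches the pattern but is ignored because no keyword precedes it.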
Customers can access all the findings generated by Macie and automate actions based on them. This includes creating custom dashboards to display the results and routing event-based actions to respond to and remediate any issues found. Furthermore, Macie provides APIs for most of the management operations. This allows customers to easily automate the sensitive data discovery process and take necessary actions based on the findings in various environments such as data lakes and streaming data pipelines.
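As a rough sketch of this kind of automation, the snippet below counts findings per bucket and severity, which is the sort of aggregate a custom dashboard might show. The `sample_findings` records are fabricated placeholders trimmed to a few fields. The field paths reflect my understanding of the Macie finding format and should be verified against the finding schema. `summarize_by_bucket` is a hypothetical helper.

```python
from collections import Counter, defaultdict

# Placeholder findings, reduced to the fields used below.
sample_findings = [
    {"category": "CLASSIFICATION",
     "type": "SensitiveData:S3Object/Personal",
     "severity": {"description": "High"},
     "resourcesAffected": {"s3Bucket": {"name": "example-bucket-a"}}},
    {"category": "CLASSIFICATION",
     "type": "SensitiveData:S3Object/Financial",
     "severity": {"description": "High"},
     "resourcesAffected": {"s3Bucket": {"name": "example-bucket-a"}}},
    {"category": "POLICY",
     "type": "Policy:IAMUser/S3BucketPublic",
     "severity": {"description": "Medium"},
     "resourcesAffected": {"s3Bucket": {"name": "example-bucket-b"}}},
]


def summarize_by_bucket(findings):
    """Count findings per bucket, broken down by severity label."""
    summary = defaultdict(Counter)
    for finding in findings:
        bucket = finding["resourcesAffected"]["s3Bucket"]["name"]
        severity = finding["severity"]["description"]
        summary[bucket][severity] += 1
    return {bucket: dict(counts) for bucket, counts in summary.items()}


print(summarize_by_bucket(sample_findings))
```

The same grouping could drive event-based remediation, for example by acting only on buckets with at least one "High" finding.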
Before Amazon Macie introduced the automated sensitive data discovery feature, customers had to manually create scan jobs for specific buckets to identify sensitive data. This process was challenging as customers had to guess which buckets to prioritize for scanning, especially when dealing with many buckets or multiple AWS accounts. However, with the launch of this feature, customers can enable it to discover sensitive data more comprehensively and continuously across all their AWS accounts and S3 buckets. This feature provides the necessary visibility to detect sensitive data across all their accounts.
It supports continuous scanning of your AWS accounts for sensitive data cost-effectively. It uses intelligent sampling techniques to identify the data that needs to be discovered, reducing the amount of data scanning required and cutting the cost of data discovery by over 99%. This feature also helps you prioritize the buckets that need attention by providing information about findings in specific buckets, allowing you to focus on the investigation and remediation of actual sensitive data rather than wasting time on false positives.
To get started, designate an Amazon Macie administrator account. This account will have access to, and will monitor, the overall data security of your entire S3 data store. Let's proceed and explore how to enable this feature.
You can go to the Automated Discovery setting page and enable this feature. Note that this feature is available for a 30-day free trial. Once enabled, it will conduct broad scans across all your S3 data stores to help you prioritize sensitive data investigation and remediation.
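The same setting can, as far as I know, also be changed through the Macie API rather than the console. The sketch below assumes the Macie client exposes `update_automated_discovery_configuration` and `get_automated_discovery_configuration` with a `status` field, which should be confirmed in the SDK reference. `FakeMacieClient` is a hypothetical stand-in so the example runs without AWS credentials.

```python
def enable_automated_discovery(macie_client):
    """Turn on automated sensitive data discovery and return the new status."""
    macie_client.update_automated_discovery_configuration(status="ENABLED")
    config = macie_client.get_automated_discovery_configuration()
    return config["status"]


class FakeMacieClient:
    """Hypothetical stand-in for a real Macie client, for offline runs."""

    def __init__(self):
        self._status = "DISABLED"

    def update_automated_discovery_configuration(self, status):
        self._status = status

    def get_automated_discovery_configuration(self):
        return {"status": self._status}


print(enable_automated_discovery(FakeMacieClient()))
```

In a real environment the fake would be replaced by `boto3.client("macie2")`, called from the Macie administrator account.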
After enabling the feature, the next step is to review the results. To do so, navigate to the Summary page and assess the results. In this example, the designated Macie administrator account has been linked to six accounts and 107 buckets, as shown on the screen. However, depending on the organization, there may be hundreds of AWS accounts and thousands of S3 buckets. Additionally, Macie has detected 26 buckets that contain sensitive data while also identifying buckets that do not have sensitive data. It provides a broad view of the overall data security posture.
As organizations grow, discovering sensitive data and determining its location becomes increasingly challenging. This uncertainty often leaves customers unsure about whether their data is secure. With this new feature, customers now have comprehensive visibility into all the buckets across their various accounts.
After reviewing the general data security posture, the next step is to gain a deeper understanding and investigate. To do this, navigate to the S3 buckets data page and review the heat map for your S3 buckets.
The heat map on the S3 buckets data page visually represents hundreds of S3 buckets across multiple accounts. Each square on the heat map represents an S3 bucket and is colour-coded to indicate the level of prioritization needed for the investigation. The heat map provides information such as account ID, account name, and the number of buckets with sensitive data found in each account. For example, the first account has 14 buckets with sensitive data, 56 buckets with no sensitive data discovered, and three buckets pending scans for sensitive data. This view allows a quick assessment of how many sensitive data buckets exist in a particular account.
The colour of each bucket provides important information. To understand it, you can select a specific account ID and drill down further. Red indicates that Macie has discovered sensitive data within that particular bucket. The darker the red, the greater the amount of sensitive data found. Blue indicates that Macie has scanned objects in the bucket and did not detect any sensitive data. The darker the blue, the higher the proportion of objects scanned without finding sensitive data. This colour-coded view helps prioritize investigation efforts.
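The colour rules described above can be expressed as a small function. This is only an illustration of the logic, not how Macie computes its heat map: the `bucket_colour` name, the "pending" label for unscanned buckets, and the `red_cap` scaling constant are all assumptions.

```python
def bucket_colour(sensitive_occurrences, objects_scanned, objects_total,
                  red_cap=100):
    """Map bucket statistics to a (colour, intensity) pair.

    Intensity runs from 0.0 (lightest) to 1.0 (darkest). The red_cap value
    is a made-up scaling constant for this sketch.
    """
    if objects_scanned == 0:
        return ("pending", 0.0)  # nothing scanned yet
    if sensitive_occurrences > 0:
        # Darker red as more sensitive data is found.
        return ("red", min(1.0, sensitive_occurrences / red_cap))
    # Darker blue as a larger share of objects is scanned with no detections.
    return ("blue", objects_scanned / objects_total)


print(bucket_colour(50, 5, 10))
print(bucket_colour(0, 5, 10))
print(bucket_colour(0, 0, 10))
```

Sorting buckets by colour and then by intensity gives a simple priority order for investigation.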
Next, you can analyze a particular bucket and examine the insights derived from Macie's findings. To do this, select a specific bucket and review the details that have been identified by Macie.
If you are a current Macie user, you need to enable this feature manually, whereas new accounts have it enabled automatically with a 30-day free trial. The feature is cost-effective and eliminates the need for expensive full scans, making it easier for customers to identify sensitive data. It is optimized and has standalone pricing.
To ensure the effectiveness of the feature, organizations should define what data is considered sensitive before enabling it. This can be achieved by revising and updating the scope of sensitive data using managed data identifiers. Custom identifiers can also be defined to specify what is considered sensitive data.
Although the detection feature reveals a high-risk posture by identifying sensitive data, it is essential to take action to remediate or respond to those findings. All the findings can be automatically sent to Security Hub, which organizes and aggregates them with other control evaluations. Security Hub provides standard evaluations of the posture of your AWS infrastructure and its controls, and you can combine them with Macie's findings to take action.
AWS re:Invent 2022