Monday, September 26, 2022

Build a modern data architecture and data mesh pattern at scale using AWS Lake Formation tag-based access control

Customers are exploring building a data mesh on their AWS platform using AWS Lake Formation and sharing their data lakes across the organization. A data mesh architecture empowers business units (organized into domains) to have high ownership and autonomy for the technologies they use, while providing technology that enforces data security policies both within and between domains through data sharing. Data consumers request access to these data products, and producer owners approve the requests within a framework that provides decentralized governance but centralized monitoring and auditing of the data sharing process. As the number of tables and users increases, data stewards and administrators are looking for ways to manage permissions on data lakes easily at scale. Customers are struggling with “role explosion” and need to manage hundreds or even thousands of user permissions to control data access. For example, for an account with 1,000 resources and 100 principals, the data steward would have to create and manage up to 100,000 policy statements (1,000 × 100). As new principals and resources are added or deleted, these policies have to be updated to keep the permissions current.

Lake Formation tag-based access control (TBAC) solves this problem by allowing data stewards to create LF-tags (based on their business needs) that are attached to resources. You can create policies on a smaller number of logical tags instead of specifying policies on named resources. LF-tags enable you to categorize and explore data based on taxonomies, which reduces policy complexity and scales permissions management. You can create and manage policies with tens of logical tags instead of the thousands of resources. Lake Formation TBAC decouples policy creation from resource creation, which helps data stewards manage permissions on many databases, tables, and columns by removing the need to update policies every time a new resource is added to the data lake. Finally, TBAC allows you to create policies even before the resources come into existence. All you have to do is tag the resource with the right LF-tag to make sure existing policies manage it.
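
The scaling argument above can be made concrete with a little arithmetic. The following is a minimal sketch, not a sizing tool: the figure of 10 tag expressions is an assumption chosen only to stand in for "tens of logical tags", and the function names are hypothetical.

```python
def named_resource_grants(resources: int, principals: int) -> int:
    """Upper bound on policy statements when each principal is granted each resource by name."""
    return resources * principals


def tag_based_grants(tag_expressions: int, principals: int) -> int:
    """Upper bound when principals are granted tag expressions instead of named resources."""
    return tag_expressions * principals


# Figures from the example above: 1,000 resources and 100 principals.
named = named_resource_grants(1_000, 100)

# Assumption for illustration: the same resources are covered by 10 tag expressions.
tagged = tag_based_grants(10, 100)

print(f"named-resource grants: {named}")
print(f"tag-based grants:      {tagged}")
```

Under these assumptions, adding a new table changes neither number for the tag-based case, because the table only needs to be tagged.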

This post focuses on managing permissions on data lakes at scale using LF-tags in Lake Formation across accounts. To manage data lake catalog tables from AWS Glue and administer permissions in Lake Formation, data stewards within the producing accounts have functional ownership based on the functions they support, and can grant access to various consumers, external organizations, and accounts. You can define LF-tags; associate them at the database, table, or column level; and then share controlled access across analytic, machine learning (ML), and extract, transform, and load (ETL) services for consumption. LF-tags let governance scale easily by replacing the policy definitions of thousands of resources with a few logical tags.

Solution overview

LF-tag access has three key components:

  • Tag ontology and classification – Data stewards can define an LF-tag ontology based on their business needs and grant access based on LF-tags to AWS Identity and Access Management (IAM) principals and SAML principals or groups
  • Tagging resources – Data engineers can easily create, automate, implement, and track all LF-tags and permissions against AWS Glue catalogs through the Lake Formation API
  • Policy evaluation – Lake Formation evaluates the effective permissions based on LF-tags at query time and allows access to data through consuming services such as Amazon Athena, AWS Glue, Amazon Redshift Spectrum, Amazon SageMaker Data Wrangler, and Amazon EMR Studio, based on the effective permissions granted across multiple accounts or organization-level data shares
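
To illustrate the policy evaluation component, here is a toy model of how an LF-tag expression might be matched against a resource. It is a simplified sketch, not Lake Formation's actual implementation: it assumes that child resources inherit LF-tags from their parents and can override the values, and that a grant matches when every key in its expression has one of the allowed values on the resource. The column name card_number is hypothetical; issuing_bank appears in the query later in this post.

```python
from typing import Dict, Set


def effective_tags(*levels: Dict[str, str]) -> Dict[str, str]:
    """Merge LF-tags from parent to child; values set at a later level override earlier ones."""
    merged: Dict[str, str] = {}
    for tags in levels:
        merged.update(tags)
    return merged


def matches(expression: Dict[str, Set[str]], tags: Dict[str, str]) -> bool:
    """True when every key in the expression has an allowed value on the resource."""
    return all(tags.get(key) in allowed for key, allowed in expression.items())


# Tags on the cards database and on two of its columns (column names are illustrative).
database_tags = {"LOB": "Cards"}
columns = {
    "card_number": {"Classification": "Sensitive"},
    "issuing_bank": {"Classification": "Non-Sensitive"},
}

# A grant on the expression LOB=Cards AND Classification=Sensitive.
grant_expression = {"LOB": {"Cards"}, "Classification": {"Sensitive"}}

visible = [
    name
    for name, column_tags in columns.items()
    if matches(grant_expression, effective_tags(database_tags, column_tags))
]
print(visible)
```

In this model, values within one key are alternatives, while separate keys must all match.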

The following diagram illustrates the relationship between the data producer, data consumer, and central governance accounts.

In the above diagram, the central governance account box shows the tagging ontology that will be used with the associated tag colors. These will be shared with both the producers and consumers, to be used to tag resources.

In this post, we consider two databases, as shown in the following figure, and show how you can set up Lake Formation tables and create Lake Formation tag-based policies.

The solution includes the following high-level steps:

  1. The data mesh owner defines the central tag ontology with LF-tags:
    1. LOB – Classified at the line of business (LOB) level (database)
    2. LOB:Function – Classified at the business function level (table)
    3. Classification – Classification of the functional data level (columns)
  2. The data mesh owner assigns respective permission levels to the product data steward to use centrally defined tags and associates permission to their database and tables with different LF-tags.
  3. The producer steward in the central account owns two databases: LOB = Cards and LOB = Retail.
  4. The producer steward switches to the data producer account to add table metadata using an AWS Glue crawler.
  5. The producer steward associates column-level classifications Classification = Sensitive or Classification = Non-Sensitive to tables under the cards database in the central account.
  6. The producer steward associates table-level tags LOB:Retail = Customer and LOB:Retail = Reviews to tables under the retail database in the central account.
  7. The consumer admin grants fine-grained access control to different data analysts.

With this configuration, the consumer analyst can focus on performing analysis with the right data.

Set up resources with AWS CloudFormation

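We provide three AWS CloudFormation templates in this post: for the central account, producer account, and consumer account. Deploy the CloudFormation templates in the order of central, producer, and consumer, as in the following sections, because there are dependencies between the templates (for example, the producer stack takes the LFRegisterLocationServiceRole output of the central stack as a parameter).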

The CloudFormation template for the central account generates the following resources:

  • Two IAM users:
    • DataMeshOwner
    • ProducerSteward
  • DataMeshOwner registered as the Lake Formation administrator
  • One IAM role:
    • LFRegisterLocationServiceRole
  • Two IAM policies:
    • ProducerStewardPolicy
    • S3DataLakePolicy
  • Create databases “retail” and “cards” for ProducerSteward to manage Data Catalog
  • Share the data location permission for producer account to manage Data Catalog

The CloudFormation template for the producer account generates the following resources:

  • Two Amazon Simple Storage Service (Amazon S3) buckets:
    • RetailBucket, which holds two tables:
      • Customer_Info
      • Customer_Review
    • CardsBucket, which holds one table:
  • Allow Amazon S3 bucket access for the central account Lake Formation service role.
  • Two AWS Glue crawlers
  • One AWS Glue crawler service role
  • Grant permissions on the S3 bucket locations tbac-cards-<ProducerAccountID>-<aws-region> and tbac-retail-<ProducerAccountID>-<aws-region> to the AWS Glue crawler role
  • One producer steward IAM user

The CloudFormation template for the consumer account generates the following resources:

  • One S3 bucket:
    • <AWS Account ID>-<aws-region>-athena-logs
  • One Athena workgroup:
    • consumer-workgroup
  • Three IAM users:
    • ConsumerAdmin
    • ConsumerAnalyst1
    • ConsumerAnalyst2

Launch the CloudFormation stack in the central account

To create resources in the central account, complete the following steps:

  1. Sign in to the central account’s AWS CloudFormation console in the target Region.
  2. Choose Launch Stack:
  3. Choose Next.
  4. For Stack name, enter stack-central.
  5. For DataMeshOwnerUserPassword, enter the password you want for the data lake admin IAM user in the central account.
  6. For ProducerStewardUserPassword, enter the password you want for the producer steward IAM user in the central account.
  7. For ProducerAWSAccount, enter the AWS account ID of the producer account (<ProducerAccountID>).
  8. Choose Next.
  9. On the next page, choose Next.
  10. Review the details on the final page and select I acknowledge that AWS CloudFormation might create IAM resources.
  11. Choose Create stack.
  12. Collect the value for LFRegisterLocationServiceRole on the stack’s Outputs tab.

Launch the CloudFormation stack in the producer account

To set up resources in the producer account, complete the following steps:

  1. Sign in to the producer account’s AWS CloudFormation console in the target Region.
  2. Choose Launch Stack:
  3. Choose Next.
  4. For Stack name, enter stack-producer.
  5. For CentralAccountID, enter the AWS account ID of the central account (<CentralAccountID>).
  6. For CentralAccountLFServiceRole, copy and paste the value of the LFRegisterLocationServiceRole collected from the stack-central.
  7. For LFDatabaseName, keep the default value of the tbac database name.
  8. For ProducerStewardUserPassword, enter the password you want for the data lake admin IAM user on the producer account.
  9. Choose Next.
  10. On the next page, choose Next.
  11. Review the details on the final page and select I acknowledge that AWS CloudFormation might create IAM resources.
  12. Choose Create stack.

Launch the CloudFormation stack in the consumer account

To create resources in the consumer account, complete the following steps:

  1. Sign in to the consumer account’s AWS CloudFormation console in the target Region.
  2. Choose Launch Stack:
  3. Choose Next.
  4. For Stack name, enter stack-consumer.
  5. For ConsumerAdminUserName and ConsumerAdminUserPassword, enter the user name and password you want for the data lake admin IAM user.
  6. For ConsumerAnalyst1UserName and ConsumerAnalyst1UserPassword, enter the user name and password you want for the consumeranalyst1 IAM user.
  7. For ConsumerAnalyst2UserName and ConsumerAnalyst2UserPassword, enter the user name and password you want for the consumeranalyst2 IAM user.
  8. Choose Next.
  9. On the next page, choose Next.
  10. Review the details on the final page and select I acknowledge that AWS CloudFormation might create IAM resources.
  11. Choose Create stack.

Configure Lake Formation cross-account sharing

After you create your resources with AWS CloudFormation, perform the following steps in the producer and central accounts to set up Lake Formation cross-account sharing.

Central governance account

In the central account, complete the following steps:

  1. Sign in to the Lake Formation console as admin.
  2. In the navigation pane, choose Permissions, then choose Administrative roles and tasks.

The CloudFormation template added the data mesh owner as the data lake administrator.

Next, we update the Data Catalog settings to use Lake Formation permissions to control catalog resources instead of IAM-based access control.

  1. In the navigation pane, under Data catalog, choose Settings.
  2. Uncheck Use only IAM access control for new databases.
  3. Uncheck Use only IAM access control for new tables in new databases.
  4. Choose Save.

Next, we need to set up the AWS Glue Data Catalog resource policy to grant cross-account access to Data Catalog resources.

  1. Use the following policy, replacing the account IDs and Region with your own values.

As described in Lake Formation Tag-Based Access Control Cross-Account Prerequisites, before you can use the tag-based access control method to grant cross-account access to resources, you must add the following JSON permissions object to the AWS Glue Data Catalog resource policy in the account that shares the resources (in this walkthrough, the central account). This gives the producer and consumer accounts permission to access the Data Catalog when glue:EvaluatedByLakeFormationTags is true. This condition becomes true for resources on which you granted permissions to those accounts using LF-tags. The policy is required for every AWS account that you’re granting permissions to. We discuss the full IAM policy later in this post.
    {
       "PolicyInJson": "{\"Version\" : \"2012-10-17\",\"Statement\" : [ {\"Effect\" : \"Allow\",\"Principal\" : {\"AWS\" : [\"arn:aws:iam::<ProducerAccountID>:root\",\"arn:aws:iam::<ConsumerAccountID>:root\"]},\"Action\" : \"glue:*\",\"Resource\" : [ \"arn:aws:glue:<aws-region>:<CentralAccountID>:table/*\", \"arn:aws:glue:<aws-region>:<CentralAccountID>:database/*\", \"arn:aws:glue:<aws-region>:<CentralAccountID>:catalog\" ],\"Condition\" : {\"Bool\" : {\"glue:EvaluatedByLakeFormationTags\" : \"true\"}}}, {\"Effect\" : \"Allow\",\"Principal\" : {\"Service\" : \"ram.amazonaws.com\"},\"Action\" : \"glue:ShareResource\",\"Resource\" : [ \"arn:aws:glue:<aws-region>:<CentralAccountID>:table/*\", \"arn:aws:glue:<aws-region>:<CentralAccountID>:database/*\", \"arn:aws:glue:<aws-region>:<CentralAccountID>:catalog\" ]} ]}",
       "EnableHybrid": "TRUE"
    }

Replace the <aws-region>, <ProducerAccountID>, <ConsumerAccountID> and <CentralAccountID> values in the above policy as appropriate and save it in a file called policy.json.

  1. Next, run the following AWS Command Line Interface (AWS CLI) command on AWS CloudShell.
aws glue put-resource-policy --region <aws-region> --cli-input-json file://policy.json

For more information about this policy, see put-resource-policy.
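
Because the policy must be embedded as an escaped JSON string, it can be less error-prone to generate the file contents than to edit them by hand. The following sketch builds the same structure as the policy shown above; the function name and the 12-digit account IDs are placeholders, and the Region is an assumption you would replace with your own.

```python
import json


def build_cli_input(region: str, central_id: str, producer_id: str, consumer_id: str) -> dict:
    """Build the --cli-input-json document for `aws glue put-resource-policy`."""
    resources = [
        f"arn:aws:glue:{region}:{central_id}:table/*",
        f"arn:aws:glue:{region}:{central_id}:database/*",
        f"arn:aws:glue:{region}:{central_id}:catalog",
    ]
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Cross-account access, allowed only when LF-tags evaluate the request.
                "Effect": "Allow",
                "Principal": {
                    "AWS": [
                        f"arn:aws:iam::{producer_id}:root",
                        f"arn:aws:iam::{consumer_id}:root",
                    ]
                },
                "Action": "glue:*",
                "Resource": resources,
                "Condition": {"Bool": {"glue:EvaluatedByLakeFormationTags": "true"}},
            },
            {
                # Lets AWS RAM share the catalog resources.
                "Effect": "Allow",
                "Principal": {"Service": "ram.amazonaws.com"},
                "Action": "glue:ShareResource",
                "Resource": resources,
            },
        ],
    }
    # PolicyInJson carries the policy as a string, so it is serialized twice overall.
    return {"PolicyInJson": json.dumps(policy), "EnableHybrid": "TRUE"}


# Placeholder values for illustration only.
cli_input = build_cli_input("us-east-1", "111111111111", "222222222222", "333333333333")
print(json.dumps(cli_input, indent=2))
```

The printed document can be saved as policy.json and passed to the command above.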

  1. Next, we verify the two source data S3 buckets are registered as data lake locations in the central account. This is completed by the CloudFormation template.
  2. Under Register and ingest in the navigation pane, choose Data lake locations.

You should see the two S3 buckets registered under the data lake locations.

Configure Lake Formation Data Catalog settings in the central account

After we complete all the prerequisites, we start the data mesh configuration. We log in as DataMeshOwner in the central account.

Define LF-tags

DataMeshOwner creates the tag ontology by defining LF-tags. Complete the following steps:

  1. On the Lake Formation console, under Permissions in the navigation pane, under Administrative roles and tasks, choose LF-Tags.
  2. Choose Add LF-tags.
  3. For Key, enter LOB and for Values, choose Retail and Cards.
  4. Choose Add LF-tag.
  5. Repeat these steps to add the key LOB:Retail and values Customer and Reviews, and the key Classification with values Sensitive and Non-Sensitive.

This completes the configuration of the tag ontology.
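
If you prefer to script the ontology rather than use the console, the same three LF-tags can be expressed as data. This is a sketch under the assumption that you would pass each request to the create_lf_tag operation of the Lake Formation API (for example, as keyword arguments to the boto3 lakeformation client); the code below only builds the requests and makes no AWS calls. The names TAG_ONTOLOGY and create_lf_tag_requests are hypothetical.

```python
from typing import Dict, List

# The tag ontology defined in the steps above.
TAG_ONTOLOGY: Dict[str, List[str]] = {
    "LOB": ["Retail", "Cards"],
    "LOB:Retail": ["Customer", "Reviews"],
    "Classification": ["Sensitive", "Non-Sensitive"],
}


def create_lf_tag_requests(ontology: Dict[str, List[str]]) -> List[dict]:
    """Build one request per LF-tag key, using the TagKey/TagValues parameter names."""
    return [{"TagKey": key, "TagValues": list(values)} for key, values in ontology.items()]


requests = create_lf_tag_requests(TAG_ONTOLOGY)
for request in requests:
    # With boto3 this might be: lakeformation_client.create_lf_tag(**request)
    print(request)
```

Keeping the ontology in one structure also makes it easy to review changes to tags before they are applied.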

Grant permissions

We grant ProducerSteward in the central account Describe and Associate permissions on the preceding tag ontology. This enables ProducerSteward to view the LF-tags and assign them to Data Catalog resources (databases, tables, and columns). ProducerSteward in the central account can further grant the permission to ProducerSteward in the producer account. For more information, see Granting, Revoking, and Listing LF-Tag Permissions. When you have multiple producers, grant the relevant tags to each steward.

  1. Under Permissions in the navigation pane, under Administrative roles and tasks, choose LF-tag permissions.
  2. Choose Grant.
  3. For IAM users and roles, choose the ProducerSteward user.
  4. In the LF-Tags section, add all three key-values:
    1. Key LOB with values Retail and Cards.
    2. Key LOB:Retail with values Customer and Reviews.
    3. Key Classification with values Sensitive and Non-Sensitive.
  5. For Permissions, select Describe and Associate for both LF-tag permissions and Grantable permissions.
  6. Choose Grant.

Next, we grant ProducerSteward tag-based data lake permissions. This enables ProducerSteward to create, alter, and drop tables in the databases with corresponding tags. ProducerSteward in the producer account can further grant the permission across accounts.

  1. In the navigation pane, under Permissions, Data lake permissions, choose Grant.
  2. For Principals, choose IAM users and roles, and choose ProducerSteward.
  3. For LF-tags or catalog resources, select Resources matched by LF-Tags (recommended).
  4. Choose Add LF-Tag.
  5. For Key, choose LOB and for Values, choose Cards.
  6. For Database permissions, select the Super permission because ProducerSteward owns the producer databases.

This permission allows a principal to perform every supported Lake Formation operation on the database. Use this admin permission when a principal is trusted with all operations.

  1. Select Super under Grantable permissions so the ProducerSteward user can grant database-level permissions to the producer and consumer accounts.
  2. For Table permissions, select Super.
  3. Select Super permission under Grantable permissions.
  4. Choose Grant.
  5. Repeat these steps for key LOB and value Retail.
  6. In the navigation pane, under Permissions, Data lake permissions, choose Grant.
  7. For Principals, choose IAM users and roles, and choose ProducerSteward.
  8. For LF-tags or catalog resources, select Resources matched by LF-Tags (recommended).
  9. Add the key LOB with value Cards, and the key Classification with values Sensitive and Non-Sensitive.
  10. For Database permissions, select Super.
  11. Select Super permission under Grantable permissions.
  12. For Table permissions, select Super.
  13. Select Super under Grantable permissions.
  14. Choose Grant.

This gives ProducerSteward fine-grained permission expression on columns with either Sensitive or Non-Sensitive tags.

  1. Repeat these steps for key LOB and value Retail, and key LOB:Retail and value Reviews or Customer.

This gives ProducerSteward fine-grained permission expression on tables with either Reviews or Customer tags.
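
The console grants above correspond to the grant_permissions operation of the Lake Formation API with an LF-tag policy resource. The following sketch builds such a request without calling AWS. It assumes that the console's Super permission maps to ALL in the API, and the helper name and the account ID in the principal ARN are placeholders.

```python
from typing import Dict, List


def lf_tag_policy_grant(
    principal: str,
    resource_type: str,
    expression: Dict[str, List[str]],
    permissions: List[str],
    grantable: List[str],
) -> dict:
    """Build a grant_permissions request for resources matched by an LF-tag expression."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal},
        "Resource": {
            "LFTagPolicy": {
                "ResourceType": resource_type,  # "DATABASE" or "TABLE"
                "Expression": [
                    {"TagKey": key, "TagValues": values}
                    for key, values in expression.items()
                ],
            }
        },
        "Permissions": permissions,
        "PermissionsWithGrantOption": grantable,
    }


# Placeholder ARN for the ProducerSteward user in the central account.
steward = "arn:aws:iam::111111111111:user/ProducerSteward"

# Database-level and table-level Super (ALL) on everything tagged LOB=Cards.
database_grant = lf_tag_policy_grant(steward, "DATABASE", {"LOB": ["Cards"]}, ["ALL"], ["ALL"])
table_grant = lf_tag_policy_grant(steward, "TABLE", {"LOB": ["Cards"]}, ["ALL"], ["ALL"])

print(database_grant)
print(table_grant)
```

Database and table permissions are separate requests here because each LF-tag policy resource targets a single resource type.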

Producer data steward actions in the central account

Next, we log in as the ProducerSteward user in the central account and create skeleton databases.

  1. Sign in to the Lake Formation console as ProducerSteward.
  2. In the navigation pane, under Data catalog, select Databases.
  3. Choose the cards database.
  4. On the Actions menu, choose Edit LF-tags.
  5. Choose Assign new LF-tag.
  6. For Assigned Keys, enter LOB and for Values, choose Cards.
  7. Choose Save.

This assigns the LOB=Cards tag to the cards database.

  1. Repeat these steps for the retail database and assign it the LOB=Retail tag.

Next, we share the LF-tags and data lake permissions with the producer account so that ProducerSteward in the producer account can run AWS Glue crawlers and generate tables in the preceding skeleton databases.

  1. Under Permissions in the navigation pane, under Administrative roles and tasks, choose LF-tag permissions.
  2. Choose Grant.
  3. For Principals, select External accounts.
  4. For AWS account or AWS organization, enter the account ID for the producer account.
  5. In the LF-Tags section, we only need to add database-level tags.
  6. For Key, enter LOB and for Values, choose Retail and Cards.
  7. For Permissions, choose Describe and Associate for both LF-tag permissions and Grantable permissions.
  8. Choose Grant.
  9. In the navigation pane, under Permissions, Data lake permissions, choose Grant.
  10. For Principals, select External accounts.
  11. For AWS account or AWS organization, enter the account ID for the producer account.
  12. For LF-tags or catalog resources, select Resources matched by LF-Tags (recommended).
  13. Choose Add LF-Tag.
  14. Choose the key LOB and value Cards.
  15. For Database permissions, select Create table and Describe because the ProducerSteward user in the producer account will add tables in the database.
  16. Select Create table and Describe under Grantable permissions so the ProducerSteward user can further grant the permission to the AWS Glue crawler.
  17. For Table permissions, select all the permissions.
  18. Select all the permissions under Grantable permissions.
  19. Choose Grant.
  20. Repeat these steps for LOB=Retail.

Now the Lake Formation administrator on the producer account side has the right permissions to add tables.

Crawl source tables in the producer account

Next, we log in as the ProducerSteward user in the producer account to crawl the source tables for the Cards and Retail databases.

  1. Sign in to the Lake Formation console as ProducerSteward.
  2. In the navigation pane, under Administrative roles and tasks, verify that ProducerSteward is configured as the data lake administrator.
  3. In the navigation pane, under Permissions, choose Administrative roles and tasks, then choose LF-Tags.

You can verify the root-level LOB tags that were shared with the producer account.

  1. In the navigation pane, under Data catalog, select Databases.

You can verify the two databases cards and retail that were shared with the producer account from the previous step.

Now, we create resource links in the producer account for these two databases. These links point at the shared databases and are used by AWS Glue crawler to create the tables. First, we create a resource link for the cards database.

  1. Select the cards database and on the Actions menu, choose Create resource link.
  2. For Resource link name, enter rl_cards.
  3. Choose Create.
  4. Repeat these steps to create a resource link for the retail database.
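
A resource link is a Data Catalog database entry that points at a database in another account. As a sketch of what the console steps above create, the following builds the input for the AWS Glue create_database operation with a TargetDatabase reference; it makes no AWS calls. The helper name is hypothetical, the account ID is a placeholder, and the name rl_retail follows the naming used later in this post.

```python
def resource_link_input(link_name: str, owner_account_id: str, shared_database: str) -> dict:
    """Build a Glue create_database request that links to a database shared from another account."""
    return {
        "DatabaseInput": {
            "Name": link_name,
            "TargetDatabase": {
                "CatalogId": owner_account_id,  # account that owns the shared database
                "DatabaseName": shared_database,
            },
        }
    }


# Placeholder ID for the central account, which owns the cards and retail databases.
central_account_id = "111111111111"

links = [
    resource_link_input("rl_cards", central_account_id, "cards"),
    resource_link_input("rl_retail", central_account_id, "retail"),
]
for link in links:
    # With boto3 this might be: glue_client.create_database(**link)
    print(link)
```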

After the resource link creation, you should see both the resource link databases as shown in the following screenshot.

Next, we need to grant permissions to the AWS Glue crawler role so that the crawler can crawl the source bucket and create the tables.

  1. Select the rl_cards database and on the Actions menu, choose Grant.
  2. In the Grant data permissions section, select IAM users and roles, and choose the AWS Glue crawler role that was created by the CloudFormation template (for example, stack-producer-AWSGlueServiceRoleDefault-xxxxxx).
  3. For Databases, choose rl_cards.
  4. For Resource link permissions, select Describe.
  5. Choose Grant.
  6. Repeat these steps for rl_retail.
  7. Next, in the navigation pane, choose Data lake Permissions and choose Grant.
  8. For IAM users and roles, choose the role stack-producer-AWSGlueServiceRoleDefault-XXXX.
  9. For LF-Tags or catalog resources, select Resources matched by LF-Tags.
  10. Enter the key LOB and values Retail and Cards.
  11. For Database permissions, select Create table and Describe.
  12. For Table permissions, choose Select, Describe, and Alter.
  13. Choose Grant.

Next, we verify that the AWS Glue crawler role has been granted permissions on the S3 bucket locations corresponding to the cards and retail producers. This is completed by the CloudFormation template.

In the navigation pane, under Permissions, choose Data locations. You should see the locations.

Now we’re ready to run the crawlers. We configure the crawlers that the CloudFormation template created, to point to these resource link databases.

  1. On the AWS Glue console, under Data catalog in the navigation pane, choose Crawlers.

The two crawlers created by the CloudFormation template should be listed.

  1. Select the crawler for the cards database CardsCrawler-xxxxxxxxxxxx and on the Action menu, choose Edit crawler.
  2. For the input data store, choose the S3 bucket for the cards producer.
  3. For IAM role, choose the AWS Glue service role created by the CloudFormation template.
  4. For Schedule, choose Run on demand.
  5. For the output database, choose the resource link database rl_cards corresponding to the cards database.
  6. Verify all the information and choose Save.
  7. Repeat these steps for the crawler corresponding to the retail producer.
  8. Select both crawlers and choose Run crawler.

When the crawlers finish, they create tables corresponding to each producer in their respective resource link databases. The table schemas are present in the shared database in the central account.

Configure Lake Formation tags in the central account

Next, we perform fine-grained access control for the tables that the crawlers created to support different consumption use cases using Lake Formation tags.

Tag columns

First, we tag the columns in the cards table of the cards database using the Classification tag that we created earlier.

  1. Log in to central account as IAM user ProducerSteward.
  2. On the Lake Formation console, in the navigation pane, choose Data catalog and then choose Tables.

You should see three tables: the cards table corresponding to cards database, and the reviews and customers tables corresponding to the retail database.

  1. Choose the cards table.
  2. Navigate to the Schema section and choose Edit schema.
  3. Select all the columns and choose Edit tags.
  4. Choose Assign new LF-Tag.
  5. For Assigned keys, enter Classification and for Values, choose Non-Sensitive.
  6. Choose Save.

Next, we selectively tag the sensitive columns.

  1. In the Edit schema section, select columns card number, card holder’s name, cvv/cvv2, and card pin.
  2. Choose Edit tags.
  3. For Assigned keys, enter Classification and for Values, choose Sensitive.
  4. Choose Save.
  5. Choose Save as new version to save the schema.
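
The two-pass approach above (tag every column Non-Sensitive, then override the sensitive ones) can also be expressed as two add_lf_tags_to_resource requests. This is a sketch that only builds the requests. The column names are assumptions derived from the column labels in the steps above, except issuing_bank, which appears in the query later in this post; check them against your crawled schema.

```python
from typing import List


def tag_columns_request(database: str, table: str, columns: List[str], key: str, value: str) -> dict:
    """Build an add_lf_tags_to_resource request that tags specific columns of a table."""
    return {
        "Resource": {
            "TableWithColumns": {
                "DatabaseName": database,
                "Name": table,
                "ColumnNames": columns,
            }
        },
        "LFTags": [{"TagKey": key, "TagValues": [value]}],
    }


# Hypothetical column names; the real ones come from the crawler.
all_columns = ["card_number", "card_holder_name", "cvv", "card_pin", "issuing_bank"]
sensitive_columns = ["card_number", "card_holder_name", "cvv", "card_pin"]

# Pass 1: default every column to Non-Sensitive. Pass 2: override the sensitive ones.
tagging_requests = [
    tag_columns_request("cards", "cards", all_columns, "Classification", "Non-Sensitive"),
    tag_columns_request("cards", "cards", sensitive_columns, "Classification", "Sensitive"),
]
for request in tagging_requests:
    # With boto3 this might be: lakeformation_client.add_lf_tags_to_resource(**request)
    print(request)
```

The order matters: the second request replaces the Classification value on the sensitive columns only.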

Tag tables

Next, we tag the reviews and customers tables under the retail database using the LOB:Retail tag that we created earlier.

  1. On the Tables page, select the reviews table and on the Actions menu, choose Edit LF-tags.
  2. Choose Assign new LF-Tag.
  3. For Assigned keys, choose LOB:Retail and for Values, choose Reviews.
  4. Choose Save.
  5. Repeat the steps for the customers table. Choose LOB:Retail for the key and Customer for the value.

Grant tag permissions

Next, grant LF-tag permissions to the external consumer account.

  1. On the Lake Formation console, in the navigation pane, choose Permissions, then choose Administrative roles and tasks and choose LF-tag permissions.
  2. Choose Grant.
  3. For Principals, select External accounts.
  4. For AWS account or AWS organization, enter the AWS account number corresponding to the consumer account.
  5. For LF-Tags, choose Add LF-Tag.
  6. For Key, choose LOB and for Values, choose Retail and Cards.
  7. Repeat these steps for key Classification with values Non-Sensitive and Sensitive, and key LOB:Retail with values Reviews and Customer.
  8. For Permissions, choose Describe.
  9. For Grantable permissions, choose Describe.
  10. Choose Grant.
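
Sharing the LF-tags themselves with another account uses the same grant_permissions operation, but with an LFTag resource and the consumer's account ID as the principal. The sketch below builds one request per tag key; the helper name and the account ID are placeholders, and no AWS calls are made.

```python
from typing import Dict, List


def lf_tag_grant(account_id: str, key: str, values: List[str], permissions: List[str]) -> dict:
    """Build a grant_permissions request that shares an LF-tag key and values with an account."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": account_id},
        "Resource": {"LFTag": {"TagKey": key, "TagValues": values}},
        "Permissions": permissions,
        "PermissionsWithGrantOption": permissions,
    }


# Placeholder ID for the consumer account.
consumer_account_id = "333333333333"

shared_tags: Dict[str, List[str]] = {
    "LOB": ["Retail", "Cards"],
    "Classification": ["Non-Sensitive", "Sensitive"],
    "LOB:Retail": ["Reviews", "Customer"],
}

tag_grants = [
    lf_tag_grant(consumer_account_id, key, values, ["DESCRIBE"])
    for key, values in shared_tags.items()
]
for grant in tag_grants:
    print(grant)
```

Granting Describe with the grant option lets the consumer admin pass the tags on to analysts in the consumer account.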

Next, we grant Lake Formation policy tag expression permissions to the external consumer account.

  1. In the navigation pane, choose Data lake permissions and choose Grant.
  2. In the Principals section, select External accounts.
  3. For AWS account or AWS organization, enter the AWS account number corresponding to the consumer account.
  4. For LF-Tags or catalog resources, select Resources matched by LF-Tags.
  5. Choose Add LF-Tag.
  6. For Key, choose LOB and for Values, choose Retail.
  7. For Database permissions, select Describe.
  8. For Grantable permissions, select Describe.
  9. Choose Grant.
  10. Repeat these steps to grant permissions on the policy tag expression LOB=Cards.

Next, we grant table permissions.

  1. In the navigation pane, choose Data lake permissions and choose Grant.
  2. For Principals, select External accounts.
  3. For AWS account or AWS organization, enter the AWS account number corresponding to the consumer account.
  4. For LF-Tags or catalog resources, select Resources matched by LF-Tags.
  5. Add key LOB with value Retail, and key LOB:Retail with values Reviews and Customer.
  6. For Table Permissions, select Select and Describe.
  7. For Grantable permissions, select Select and Describe.
  8. Choose Grant.
  9. Repeat these steps to grant permissions on the policy tag expressions LOB=Cards and Classification = (Non-Sensitive or Sensitive).

Share and consume tables in the consumer account

When you sign in to the Lake Formation console in the consumer account as ConsumerAdmin, you can see all the tags and the corresponding values that were shared by the producer.

In these next steps, we share and consume tables in the consumer account.

Create a resource link to the shared database

On the Databases page on the Lake Formation console, you can see all the databases that were shared to the consumer account. To create a resource link, complete the following steps:

  1. On the Databases page, select the cards database and on the Actions menu, choose Create resource link.
  2. Enter the resource link name as rl_cards.
  3. Leave the shared database and shared database’s owner ID as default.
  4. Choose Create.
  5. Follow the same process to create the rl_retail resource link.

Grant Describe permission to ConsumerAnalyst1

To grant Describe permissions on resource link databases to ConsumerAnalyst1, complete the following steps:

  1. On the Databases page, select the resource database rl_retail and on the Actions menu, choose Grant.
  2. In the Grant data permissions section, select IAM users and roles.
  3. Choose the user ConsumerAnalyst1.
  4. In the Resource link permissions section, select Describe.
  5. Choose Grant.
  6. Follow the same steps to grant rl_cards access to ConsumerAnalyst2.

Grant Tag permissions to ConsumerAnalyst1

To grant tag permissions on the LOB:Retail = Customer tag to ConsumerAnalyst1 so that this user can access the customers table, complete the following steps:

  1. On the Lake Formation console, on the Data lake permissions page, choose Grant.
  2. In the Grant data permissions section, select IAM users and roles.
  3. Choose the user ConsumerAnalyst1.
  4. For LF-Tags or catalog resources, select Resources matched by LF-Tags.
  5. Add the key LOB with value Retail, and the key LOB:Retail with value Customer.
  6. For Table permissions, select Select and Describe.
  7. Choose Grant.

Access to the customers table inside the rl_retail database is granted to ConsumerAnalyst1.

Grant Tag permissions to ConsumerAnalyst2

To grant tag permissions on the Classification = Sensitive tag to ConsumerAnalyst2 so that this user can access attributes tagged as Sensitive in the cards table, complete the following steps:

  1. On the Lake Formation console, on the Data lake permissions page, choose Grant.
  2. In the Grant data permissions section, select IAM users and roles.
  3. Choose the user ConsumerAnalyst2.
  4. For LF-Tags or catalog resources, select Resources matched by LF-Tags.
  5. Add the key LOB with value Cards, and the key Classification with value Sensitive.
  6. For Table permissions, select Select and Describe.
  7. Choose Grant.

Access to attributes tagged as sensitive in the cards table inside the rl_cards database is granted to ConsumerAnalyst2.

Validate the access to ConsumerAnalyst1

To confirm ConsumerAnalyst1 access, complete the following steps:

  1. On the Athena console, for Workgroup, choose consumer-workgroup.
  2. Choose Acknowledge.
  3. Choose the database rl_retail.

You should be able to see the customers table and be able to query.

Validate the access to ConsumerAnalyst2

To confirm ConsumerAnalyst2 access, complete the following steps:

  1. On the Athena console, for Workgroup, choose consumer-workgroup.
  2. Choose Acknowledge.
  3. Choose the database rl_cards.

You should be able to see only the sensitive attributes from the cards table.

You can also check how the Lake Formation tag-based access policy behaves on columns for which the user has no policy grant.

When a column outside the granted tag expression is selected from the table rl_cards.cards, Athena returns an error. For example, you can run the following query as ConsumerAnalyst2 to select the issuing_bank column, which is tagged Non-Sensitive and is therefore not covered by the grant:

SELECT issuing_bank FROM "rl_cards"."cards" limit 10;

Conclusion

In this post, we explained how to create tag-based access control policies in Lake Formation using an AWS public dataset. In addition, we explained how to query tables, databases, and columns that have Lake Formation tag-based access policies associated with them.

You can generalize these steps to share resources across accounts. You can also use these steps to grant permissions to SAML identities.

A data mesh approach provides a method by which organizations can share data across business units. Each domain is responsible for the ingestion, processing, and serving of their data. They are data owners and domain experts, and are responsible for data quality and accuracy. This is similar to how microservices turn a set of technical capabilities into a product that can be consumed by other microservices. Implementing a data mesh on AWS is made simple by using managed and serverless services such as AWS Glue, Lake Formation, Athena, and Redshift Spectrum to provide a well-understood, performant, scalable, and cost-effective solution to integrate, prepare, and serve data.


About the Authors

Nivas Shankar is a Principal Data Architect at Amazon Web Services. He helps and works closely with enterprise customers building data lakes and analytical applications on the AWS platform. He holds a master’s degree in physics and is highly passionate about theoretical physics concepts.

Dylan Qu is an AWS solutions architect responsible for providing architectural guidance across the full AWS stack with a focus on Data Analytics, AI/ML and DevOps.

Pavan Emani is a Data Lake Architect at AWS, specialized in big data and analytics solutions. He helps customers modernize their data platforms on the cloud. Outside of work, he likes reading about space and watching sports.

Prasanna Sridharan is a Senior Data & Analytics Architect with AWS. He is passionate about building the right big data solution for the AWS customers. He is specialized in the design and implementation of Analytics, Data Management and Big Data systems, mainly for Enterprise and FSI customers.
