{"id":97,"date":"2022-09-28T19:25:09","date_gmt":"2022-09-28T19:25:09","guid":{"rendered":"https:\/\/sites.psu.edu\/jaredmcuevas\/?p=97"},"modified":"2022-09-28T19:25:09","modified_gmt":"2022-09-28T19:25:09","slug":"topic-3-1-master-data-management","status":"publish","type":"post","link":"https:\/\/jaredmcuevas.com\/?p=97","title":{"rendered":"Topic 3.2 \u2013 Data Warehouses &#038; Data Lakes"},"content":{"rendered":"<p><strong>Personal Lessons Learned<\/strong>. Primarily working as an end-user and in knowledge management roles within large organizations, learning about the dynamics of modern data management architectures was useful if only to gain initial exposure to the terms and current trends.\u00a0 I plan on revisiting these topics after interviewing some industry experts regarding the practical use of these architectures.<\/p>\n<p><strong>Introduction<\/strong>. There&#8217;s a plethora of information published on data warehouses and data lakes, including two great sources listed below (Amazon and Gartner). Rather than restate the definitions here, we draw out takeaways and highlight contrasts between the two terms, both of which are critical within the modern data management landscape.<\/p>\n<p>Both data warehouses and data lakes are used to store large amounts of client data. 
Gartner published a great article in 2020 (cited below) identifying similarities, differences, uses, trends, and recommendations for these architectural patterns.\u00a0 We will highlight some of the key findings here:<\/p>\n<p><strong>Similarities (Data Warehouses &amp; Data Lakes)<\/strong>:\u00a0 Both patterns enable data analysis by providing repositories in which large amounts of detailed data can be collected and analyzed.\u00a0 According to Friedman and Heudecker, &#8220;both provide an endpoint for collection of transactional, detailed data&#8230; <em>specifically to support the execution of analytic workloads<\/em>.&#8221;<\/p>\n<p><strong>Differences (Data Warehouses &amp; Data Lakes):\u00a0 <\/strong>The primary contrast between these two systems lies in their method and focus.\u00a0 Data warehouses contain &#8220;curated&#8221; data, that is, data which has been formatted and standardized to some extent.\u00a0 Data lakes, in contrast, contain data in a generally rawer form, which may differ depending on the source.\u00a0 Data lakes are premised on storing information &#8220;as-is&#8221;.<\/p>\n<p><strong>I. 
Data Lakes<\/strong> collect and store &#8220;unrefined data&#8230;with limited transformation and quality assurance&#8230;and events captured from a diverse array of source systems&#8221; (<em>Gartner, citation below<\/em>).\u00a0 Enterprises with use cases in &#8220;exploratory analysis and data science activities&#8221; across multiple types and sources of data (e.g., mobile phones, internal networks, external sources) will likely find the data lake format more useful in enabling their analysis activities.\u00a0 This means creative solutions and previously undefined links and trends &#8211; i.e., non-intuitive insights &#8211; may be more readily found by running analytics processes against data lakes.<\/p>\n<p><strong>II.\u00a0 Data Warehouses<\/strong> are &#8220;database[s] optimized to analyze relational data coming from transactional systems and line of business applications&#8230;.data is cleaned, enriched, and transformed so it can act as the &#8216;single source of truth&#8217; that users can trust.&#8221; (<em>Amazon, citation below<\/em>)<\/p>\n<p>With Amazon&#8217;s definition of data warehousing in mind, there is a close connection between MDM (discussed in blog #3.1) and data warehousing.\u00a0 By standardizing data and eliminating redundancy in entities, MDM (<em>architecture<\/em>) and data warehousing (<em>method<\/em>) can work extremely well in tandem.<\/p>\n<p><strong>Key Strategic Findings<\/strong>:<\/p>\n<p>A.\u00a0 One of the important highlights of these two data architectures is that enterprises are not required to choose one or the other.\u00a0 Because analysis functions are growing increasingly complex and demanding, both architectural systems can be used to provide results either (a) independently or (b) in sequence.\u00a0 Friedman and Heudecker note that &#8220;it is equally important to recognize that these architectural patterns can bring more value to the enterprise when used in combination&#8230;the data warehouse, data 
lake and data hub can be combined to work together in an effective architecture.&#8221;\u00a0 For example, results from a more diverse, less-structured data lake can be transferred to the data warehouse and analyzed further there.<\/p>\n<p>B.\u00a0 It&#8217;s important to note that these repositories store data long-term (in contrast to data hubs, which generally pass data on and do not store it).\u00a0 In many cases, stored data may not be current. &#8220;Because the data warehouse is used for analytical processing, it contains data reflecting a specific point in time rather than the most current values.&#8221; &#8211; Amazon.\u00a0 Analysts must account for the time-value of data and understand that a data warehouse may not reflect real-time values.<\/p>\n<p>Sources:<br \/>\n(1) &#8220;What Is A Data Lake&#8221;, Amazon, 2022. https:\/\/aws.amazon.com\/big-data\/datalakes-and-analytics\/what-is-a-data-lake\/<br \/>\n(2) &#8220;Data Hubs, Data Lakes and Data Warehouses: How They Are Different and Why They Are Better Together&#8221;, Gartner, Refreshed 2 June 2021, Published 13 February 2020. https:\/\/www.gartner.com\/document\/3980938?ref=d-linkShare<br \/>\n(3) &#8220;Ten Steps to Build an Agile Information Architecture&#8221;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Personal Lessons Learned. 
Primarily working as an end-user and in knowledge management roles within large organizations, learning about the dynamics of modern data management architectures was useful if only to gain initial exposure to the terms and current trends.\u00a0 I plan on revisiting these topics after interviewing some industry experts regarding the practical use of [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-97","post","type-post","status-publish","format-standard","hentry","category-uncategorized"],"_links":{"self":[{"href":"https:\/\/jaredmcuevas.com\/index.php?rest_route=\/wp\/v2\/posts\/97","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/jaredmcuevas.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/jaredmcuevas.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/jaredmcuevas.com\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/jaredmcuevas.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=97"}],"version-history":[{"count":0,"href":"https:\/\/jaredmcuevas.com\/index.php?rest_route=\/wp\/v2\/posts\/97\/revisions"}],"wp:attachment":[{"href":"https:\/\/jaredmcuevas.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=97"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/jaredmcuevas.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=97"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/jaredmcuevas.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=97"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}