040206 Quality Assuring your Data Pipeline

2023-04-10

The badge above was awarded for completing the course.

Get Data into the EMS

02. Refine your Data Pipeline

Quality Assuring your Data Pipeline

This is a practical takeaway checklist you can use to run quality assurance checks before your data pipeline goes live. The checklist is based on project best practices and is used in current implementations.

If you have worked through the "Get Data into the EMS" track, most of these points will already be familiar. Brief explanations are provided for the items that are not self-explanatory.

The checklist follows. Read through it quickly now, and download the list from the course resources later.

Data Connection

| Check | Explanation | Status |
| --- | --- | --- |
| Connections in place and working, with no errors/warnings? | | Not Started |

Extractions

| Check | Explanation | Status |
| --- | --- | --- |
| Should we be using Replication Cockpit? | This is a consideration if the existing pipeline is too slow or needs to support an operational / high-speed use case. | Not Started |
| Full extraction loads run in less than 12 hours? | 12 hours is a benchmark that typically should not be crossed for extractions. The points below can help reduce this run time. | Not Started |
| Replication Cockpit not used during extractions? | If you use both Data Jobs and the Replication Cockpit, make sure they do not run at the same time. You can use the Replication Cockpit Calendar function along with Data Job schedules to set this up. | Not Started |
| No "unused" / "disabled" tables present in extractions? | | Not Started |
| Columns extracted limited to only those necessary? | | Not Started |
| Filters applied to large tables? | | Not Started |
| All extractions placed in a single data pool, and data connections exported to process-specific data pools? | This is a best practice to avoid extracting the same data more than once. | Not Started |
| Dynamic Parameters utilized in the Delta Filter section (Last Loads, Change Number, etc.)? | This applies if you use delta extractions with Data Jobs. See the sketch after this table. | Not Started |
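For the dynamic-parameter item, a minimal sketch of a delta filter expression, assuming Celonis's `<%=...%>` placeholder syntax for Data Job parameters; the column name (`AEDAT`) and parameter name (`LAST_DELTA_LOAD`) are hypothetical:

```sql
-- Extract only rows changed since the last successful load.
-- LAST_DELTA_LOAD is a hypothetical dynamic parameter configured to
-- resolve to the timestamp of the last successful extraction.
AEDAT >= <%=LAST_DELTA_LOAD%>
```

This keeps each delta run proportional to the volume of changed rows rather than to the full table size.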
Transformation scripts

Review each step in each transformation script for:

| Check | Explanation | Status |
| --- | --- | --- |
| a. Ensure that changes to any Marketplace Connector are commented with initials, date, and commentary | Commented changes with dates allow for easier Connector updates in the future. The sketch after this table shows items (a) and (c) together. | Not Started |
| b. Ensure each block of code has an unambiguous explanation of its purpose | | Not Started |
| c. `ANALYZE_STATISTICS('XXXX');` used on all temporary tables | | Not Started |
| d. No SELECT DISTINCTs (unless a comment is present explaining why one is needed) | | Not Started |
| e. Appropriate naming convention utilized: Cases table: `«Process Name»_«Table Name»` (e.g. `CLAIMS_CASES`); Activities table: `_CEL_«Process Name»_ACTIVITIES` (e.g. `_CEL_CLAIMS_ACTIVITIES`) | | Not Started |
| f. Intuitive variable naming | | Not Started |
| g. No "unused" transformations (e.g. "Testing", "Sandbox") present in the Data Job | | Not Started |
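For items (a) and (c), a minimal sketch of a commented change to a connector transformation; the initials, table, and column names are hypothetical:

```sql
-- 2023-04-10 JS: Restricted the standard Marketplace Connector logic to
-- company code '1000' per customer requirement (item a: initials, date,
-- and commentary on the change).
DROP TABLE IF EXISTS TMP_PO_ITEMS;
CREATE TABLE TMP_PO_ITEMS AS
SELECT EBELN, EBELP, MATNR, WERKS
FROM EKPO
WHERE BUKRS = '1000';

-- Item (c): refresh optimizer statistics on every temporary table.
SELECT ANALYZE_STATISTICS('TMP_PO_ITEMS');
```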
Transformations - Additional

| Check | Explanation | Status |
| --- | --- | --- |
| Temporary tables utilized? | Use temporary tables if you run similar joins across multiple transformations. | Not Started |
| Ensure that there are no cartesian (many-to-many) joins present | | Not Started |
| Use WHERE EXISTS rather than joins where applicable | See the sketch after this table. | Not Started |
| Can transformation jobs be run in parallel? | If transformations are independent of one another, you can consider splitting them into separate Data Jobs and running them in parallel on a schedule. | Not Started |
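For the cartesian-join and WHERE EXISTS items, a minimal sketch using hypothetical SAP-style tables (`EKKO` purchase-order headers, `EKPO` line items). A join used only as a filter multiplies header rows whenever several line items match, which in turn tempts a SELECT DISTINCT; WHERE EXISTS filters without changing the row count:

```sql
-- Join used purely as a filter: duplicates EKKO rows when EKPO has
-- several line items per order, forcing a DISTINCT to clean up.
SELECT DISTINCT h.EBELN
FROM EKKO AS h
JOIN EKPO AS p
  ON p.EBELN = h.EBELN;

-- WHERE EXISTS returns one row per order with no DISTINCT needed.
SELECT h.EBELN
FROM EKKO AS h
WHERE EXISTS (
    SELECT 1
    FROM EKPO AS p
    WHERE p.EBELN = h.EBELN
);
```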
Data Model Loads

| Check | Explanation | Status |
| --- | --- | --- |
| No error messages on Data Model load (including warnings)? | | Not Started |
| Using tables instead of views to load into the Data Model? | See the sketch after this table. | Not Started |
| Using a Data Model with the minimal number of tables and columns for a high-speed use case? | | Not Started |
| Subscribed to all Data Models? | | Not Started |
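For the tables-instead-of-views item, a minimal sketch, assuming a hypothetical activity view `V_CEL_CLAIMS_ACTIVITIES`: materializing it as a table means the Data Model load reads precomputed rows instead of re-executing the view's query on every load.

```sql
-- Materialize the (hypothetical) activity view as a table, following the
-- _CEL_ + «Process Name» + _ACTIVITIES naming convention, and point the
-- Data Model at the table rather than the view.
DROP TABLE IF EXISTS _CEL_CLAIMS_ACTIVITIES;
CREATE TABLE _CEL_CLAIMS_ACTIVITIES AS
SELECT * FROM V_CEL_CLAIMS_ACTIVITIES;

-- Refresh statistics on the newly created table.
SELECT ANALYZE_STATISTICS('_CEL_CLAIMS_ACTIVITIES');
```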
Replication Cockpit

| Check | Explanation | Status |
| --- | --- | --- |
| Replication Cockpit replicating without errors? | | Not Started |

Scheduling

| Check | Explanation | Status |
| --- | --- | --- |
| Full / delta loads scheduled, enabled, and running? | | Not Started |

Execution History

| Check | Explanation | Status |
| --- | --- | --- |
| Processing time for the delta ETL run (Extraction > Transformation > Data Model) less than 1 hour (unless other circumstances override this)? | | Not Started |
| Schedules have no errors in recent history? | | Not Started |

Data Validation

| Check | Explanation | Status |
| --- | --- | --- |
| Confirm that the customer has approved the accuracy of the raw data and activity steps | | Not Started |
