Tuesday, December 23, 2014

CDC task in SSIS 2012

For SSIS 2012 added new task in CDC part
  1. CDC Control Task - The CDC Control task is used to control the life cycle of change data capture (CDC) packages. It handles CDC package synchronization with the initial load package, the management of Log Sequence Number (LSN) ranges that are processed in a run of a CDC package. In addition, the CDC Control task deals with error scenarios and recovery.
  2. CDC Source - The CDC source reads a range of change data from CDC change tables and delivers the changes downstream to other SSIS components.
  3. CDC Splitter - The CDC splitter splits a single flow of change rows from a CDC Source component into different data flows for Insert, Update and Delete operations. It is essentially a “smart” Conditional Split transform that automatically handles the standard values of the __$operation column.

For my scenario I have created 3 database (CDC_SRC, CDC_Stg and CDC_DEST). I will explain step by step it below:

Step1: Enable CDC for Source Database (CDC_SRC)

Step2: Check the system table area (cdc.captured_columns, cdc.change_tables, cdc.ddl_history, cdc.index_columns, cdc.lsn_time_mapping) tables available in the enabled CDC source database. Also check with SQL Server agent part, it will create two jobs (cdc.CDC_SRC_capture and cdc.CDC_SRC_cleanup)

Step3: Enable CDC for table (dbo.Employee it is source)

Step4: Check the CDC table (cdc.dbo_Employee_CT) available in system table area

Step5: Initial Load package
First create CDC_State string variable, Then I designed the package and CDC Control task configuration it below



For source connection I used CIT-BI\SQL2012.CDC_SRC it is pointed to CDC_SRC database, Destination connection I used CIT-BI\SQL2012.CDC_SRC_ODS is it pointed to the CDC_DEST database.
Here I used ADO.NET Source and ADO.NET Destination task, there is no manipulation of the columns data everything is direct mapping.

Step6: Incremental Load package, as it is same CDC_State variable created in SSIS package, here I used Staging database, and it is used to manipulating the data (insert, update and delete) operation.

Step7: Create temporary tables (insert, update and delete) in staging database.

Step8: Configured CDC Control task for below format



Step9: Load the DFT task for the below format


Step10: Comparing the staging data from destination (CIT-BI\SQL2012.CDC_SRC_ODS) table
I used Execute SQL Task 
Delete opertion
DELETE FROM EmpDest
  WHERE [name] IN (    SELECT [name]    FROM CDC_Stg.dbo.DEST_Delete )
Update Operation
UPDATE dest SET dest.sal = stg.sal   
FROM    EmpDest dest, CDC_Stg.dbo.Dest_Update stg
WHERE stg.Name = dest.name
Insert Operation
--SET IDENTITY_INSERT [dbo].[EmpDest] ON
INSERT INTO [dbo].[EmpDest]
(
       [name], [sal]
)
SELECT [name], [sal]
FROM CDC_Stg.dbo.DEST_Insert
--SET IDENTITY_INSERT [dbo].[EmpDest]  OFF
GO
Step11: CDC Control task the below format setting


Step12: Finally all the temp tables truncate (DEST_insert, update and delete) from the staging database.