The Dell Business Intelligence Project Using USPTO Data: Episode 9
Author:MonikaSahni
Links:EnterpriseSolutions, BusinessIntelligenceandApplianceSolutions
EPISODE9:RemovingDuplicateRecordsfromUSPTOdata
Overview
In episode 9, we will focus on removing duplicate records from the staging table.
In episode 8, we covered inserting USPTO data from Dell Boomi into a SQL table. However, the table contains duplicate data. In order to remove the duplicates, you first need to create a staging table. The staging table allows you to remove duplicate data by running a stored procedure before you load the contents of the staging table into destination tables.
Create Stored Procedure to Remove Duplicates
Create Stored Procedure
Create a stored procedure by using the create database script. Using common table expression (CTE), duplicates can be removed from a table. CTE is a feature available in SQL 2005 and later versions of SQL server. This stored procedure removed duplicates from the stage table.
Link to code: SQL data deduplication script
Moving Data from the Staging Table to the Destination Table
This stored procedure has code to move data from the staging table to the destination table.
Calling Stored Procedure from Dell Boomi
Please refer to the final process in episode 8 for importing csv to database. Building on that process, you will need to add a program command.
Program Command
Drag a program command and enter the name of the stored procedure.
Then run the test for importing the csv into the database as shown in episode 8. Then check the database in the staging table for any duplicates in the table.