Dell Business Intelligence Project Using USPTO Data: Episode 8
Overview
Episode 8 focuses on loading a week’s patent data in XML format directly into a staging table. In effect, the processes from episode 5 and episode 7 are combined into a single process.
Create Staging Tables
Before loading the data, we must first create the staging table in the database. This is a temporary table and contains all of the fields that we want to import from the patent.
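As a rough illustration of what creating a staging table involves, the sketch below uses Python's built-in sqlite3 module against an in-memory database. The table name patent_staging and the column names are hypothetical; the episode does not list the 16 fields it imports, and the actual project database may use a different engine and different column types.

```python
import sqlite3

# Hypothetical table and column names, chosen for illustration only.
STAGING_DDL = """
CREATE TABLE IF NOT EXISTS patent_staging (
    doc_number       TEXT,
    kind             TEXT,
    publication_date TEXT,
    invention_title  TEXT,
    applicant_name   TEXT
)
"""


def create_staging_table(conn):
    """Create the staging table if it does not already exist."""
    conn.execute(STAGING_DDL)
    conn.commit()


def staging_columns(conn):
    """Return the staging table's column names in declaration order."""
    rows = conn.execute("PRAGMA table_info(patent_staging)").fetchall()
    return [row[1] for row in rows]  # index 1 of each row is the column name


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    create_staging_table(connection)
    print(staging_columns(connection))
```

A staging table of this kind typically carries no keys or constraints, so raw rows can be loaded first and cleaned up afterwards.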
Process for Importing XML Patent Data to the Database Staging Table
Episode 5 explains the processes of extracting data from USPTO and mapping the XML profile to a CSV file. Episode 7 describes getting data from the CSV file and inserting the data into a database table. We will combine the two processes to get the data into a staging table.
For more details on steps 1, 2, 3, copying, combining, and executing processes, refer to episode 5. For details on database connectors (step 5), refer to episode 7.
- Modify Map XML to database
- Create database profile for staging table
- Map XML and database profile
- Modify database operation for database connector
Modify Map XML to the Database
In this section, you will modify the map function (XML to CSV) that was copied from the process in episode 5. The new mapping goes directly from XML to the database table. You will first need to create a new database profile for the staging table.
Create a Database Profile for the Staging Table
Episode 7 describes creating a database profile for a sample table. Follow the same steps to create a database profile for your staging table. This profile will then be used in the map function to map XML to a database. Below are the steps for creating a new component profile for the staging table:
Map the XML and Database Profile
Open the Map shape in the process copied from an earlier project. The Map Properties window opens automatically.
On the right side of the map (that is, the side that holds the database profile), select the map symbol. Next to Profile Type, select Database, and then select the database profile for the staging table created in the previous step. Click OK.
To map, you can click Boomi Suggest, and Boomi will automatically suggest mappings between the two profiles. You can also drag elements on the left to the corresponding elements on the right to connect them.
Click Save and Close.
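Conceptually, the map pairs each selected XML element with one database column. The sketch below shows the same idea outside Boomi, using Python's standard xml.etree.ElementTree module. The element paths, the column names, and the sample record are assumptions made for illustration; check them against your own XML profile before relying on them.

```python
import xml.etree.ElementTree as ET

# Assumed element paths, relative to the root of one patent record.
BIB = "us-bibliographic-data-grant"
DOC_ID = f"{BIB}/publication-reference/document-id"

# Hypothetical mapping: staging column name -> XML element path.
FIELD_MAP = {
    "doc_number": f"{DOC_ID}/doc-number",
    "kind": f"{DOC_ID}/kind",
    "publication_date": f"{DOC_ID}/date",
    "invention_title": f"{BIB}/invention-title",
}


def map_patent(xml_text):
    """Map one patent XML record to a dict keyed by staging column name."""
    root = ET.fromstring(xml_text)
    row = {}
    for column, path in FIELD_MAP.items():
        value = root.findtext(path)  # None when the element is absent
        row[column] = value.strip() if value and value.strip() else None
    return row


# Made-up sample record, not real USPTO data.
SAMPLE_XML = """<us-patent-grant>
  <us-bibliographic-data-grant>
    <publication-reference>
      <document-id>
        <country>US</country>
        <doc-number>00000001</doc-number>
        <kind>B2</kind>
        <date>20140107</date>
      </document-id>
    </publication-reference>
    <invention-title>Example widget</invention-title>
  </us-bibliographic-data-grant>
</us-patent-grant>"""

if __name__ == "__main__":
    print(map_patent(SAMPLE_XML))
```

Elements that are missing or empty map to None here, which a database would store as NULL.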
Note: Currently we are importing 16 fields into the database table. The process takes about 42 minutes to complete with 30 GB of memory assigned to the atom. (For steps to assign memory to the atom, refer to the “Modify Atom” section of episode 5.) Adding more fields from the patent XML profile to the database will increase the time the process takes to complete. Some fields, such as “claim-text”, hold variable-length data (from a single line to multiple paragraphs) and caused errors during import. Importing more fields (for example, 100) caused the process to run for multiple days, and it had to be killed. So for now, we will import only the fields required for our reporting.
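The episode does not say what exactly went wrong with variable-length fields. One possible mitigation, offered here only as an assumption and not as the approach used in this project, is to flatten nested, multi-paragraph text into a single line (and optionally cap its length) before it reaches the database. The function name flatten_text and the max_chars parameter are hypothetical.

```python
import xml.etree.ElementTree as ET


def flatten_text(element, max_chars=None):
    """Collapse all text under an element into one whitespace-normalized line."""
    # itertext() walks the element and its descendants in document order.
    raw = "".join(element.itertext())
    # split() with no arguments drops newlines, tabs, and repeated spaces.
    text = " ".join(raw.split())
    if max_chars is not None:
        text = text[:max_chars]  # hypothetical cap matching a column width
    return text


if __name__ == "__main__":
    claim = ET.fromstring(
        "<claim-text>1. A device comprising:\n"
        "  <claim-text>a first part; and</claim-text>\n"
        "  <claim-text>a second part.</claim-text>\n"
        "</claim-text>"
    )
    print(flatten_text(claim))
```

Truncating text loses data, so a length cap only makes sense if the reports do not need the full field.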
Modify Database Operation for the Database Connector
Since we want to connect to the same database using the same user credentials, use the same database connector that was copied from the process in episode 7. However, we must modify the operation. The operation in the earlier process (episode 7) refers to the sample table, whereas now we want to use the database profile for the staging table.
In the process window, click on the Operation for the database profile.
The Database Operation page opens.
Click on the magnifying glass icon and select the database profile that you created for the staging table in the previous step. Now the operation is pointing to the new database profile for the staging table.
Select Save and Close.
The process is now complete and, when executed, will insert the selected patent fields into the staging table.
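For readers who want to see what the final insert amounts to, here is a minimal sketch of a parameterized bulk insert using Python's sqlite3 module. It is not how Boomi implements the database operation. The table name, the three columns, and the sample rows are hypothetical and kept short for readability.

```python
import sqlite3

# Hypothetical subset of staging columns.
COLUMNS = ("doc_number", "kind", "invention_title")


def create_table(conn):
    """Create a small illustrative staging table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS patent_staging "
        "(doc_number TEXT, kind TEXT, invention_title TEXT)"
    )


def insert_rows(conn, rows):
    """Insert mapped rows (dicts keyed by column name); return how many were sent."""
    placeholders = ", ".join("?" for _ in COLUMNS)
    sql = (
        f"INSERT INTO patent_staging ({', '.join(COLUMNS)}) "
        f"VALUES ({placeholders})"
    )
    # Missing keys become None, which sqlite3 stores as NULL.
    params = [tuple(row.get(column) for column in COLUMNS) for row in rows]
    conn.executemany(sql, params)
    conn.commit()
    return len(params)


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    create_table(connection)
    sample = [
        {"doc_number": "00000001", "kind": "B2", "invention_title": "Example widget"},
        {"doc_number": "00000002", "kind": "B1"},
    ]
    print(insert_rows(connection, sample))
```

Using placeholders rather than string concatenation keeps quotes and line breaks inside patent text from breaking the statement.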
The next steps are removing the duplicate rows and inserting the rows into the respective database tables. These steps will be covered in the next episode.