Quantcast
Channel: TechCenter
Viewing all articles
Browse latest Browse all 2350

The Dell Business Intelligence Project Using USPTO Data: Episode 1

$
0
0
Revision 15 posted to Business Intelligence and Appliance Solutions by Robert_Pound on 5/16/2013 5:01:18 PM

The Dell Business Intelligence Project Using USPTO Data: Episode 1

Dell, database, quest, Boomi, data warehouse, Toad, Business Intelligence, Quickstart

Author: Robert Pound, Roger Lopez, Umesh Sunnapu

Links:Enterprise SolutionsBusiness Intelligence and Appliance Solutions

Episode 1 | Project Overview

Welcome to the first episode of the Dell Business Intelligence Project Using US Patent Office (USPTO) Project. You might wonder, what is the Dell Business Intelligence Project Using USPTO Data project? This project was created by the Database Solutions Engineering & Business Intelligence Product Group teams. The project goal is to show the step-by-step process of building a data warehouse and business intelligence solution. In this first episode, we describe what we want to achieve and examine the dataset we want to use for the project. Google and the USPTO have worked together to make the USPTO data publicly available. Our goal is to show you how to extract meaningful information from this data.

Project Objective

This project shows a Dell end-to-end business intelligence solution. The main objective of this solution is to download, capture, analyze, and find correlations with the publically available USPTO data. We accomplish this goal using an implementation of the Dell Quickstart Data Warehouse to import and load the data into an SQL database. We then implemented Dell Toad products to analyze the data. The project starts off by looking at the following information:  

  1. Top 10 patent producing companies for a given year.
  2. Top 10 patent producing cities for a given year.
  3. Top patent categories for a given company.
  4. Maximum time before a patent is awarded.
  5. The histogram for patents inflow for a given year.
  6. Top 10 patent authors.

Episode overview

This project is the collaboration of multiple Dell teams, over several months. The results of this collaboration are shown in the following episodes:

      2.    Sample XML 
      3.    Boomi: HTTP to Disk 
      4.    Boomi: XML to CSV 
      5.    Boomi: HTTP to CSV 
      6.    Quickstart Data Warehouse Appliance 
      7.    SQL creation/Boomi: CSV to SQL 
      8.    Boomi: HTTP to SQL 
      9.    SQL: De-duplication 
      10.  Boomi: one year after 
      11.  Analyzing Data using Toad BI Suite 

Episode 2 | Sample XML

Episode 2 covers the XML data format from the US patent office. The patent office must conform to several data standards with their XML documentation. This XML conformation makes the data more complicated than regular XML data. The episode explores how you can view sample patent data using Microsoft Excel to get a better idea of the collected data. Link

Episode 3 | Boomi: HTTP to Disk

Episode 3 is the introduction to Dell Boomi. The content covers downloading a single zip file, uncompressing in the cloud and storing the results on a local drive. Link

Episode 4 | Boomi: XML to CSV

Building on the process created in Episode 3, Episode 4 covers the details of mapping an XML file to a Comma Separated Value (CSV) file. The file downloaded in Episode 3 is used in this process. This episode includes detailed instructions on how to manually modify the file that still needs to be processed. Link

Episode 5 | Boomi: HTTP to CSV

Episode 5 combines the information covered in Episodes 3 and 4 by showing how to download a zip file and translate the contents of the uncompressed XML file to a CSV file without manual intervention. Link

Episode 6 | Dell Quickstart Data Warehouse Appliance

This episode describes what steps are taken to set up the data warehouse appliance to store and analyze data. [Coming16May2013]Link

Episode 7 | SQL creation/Boomi: CSV to SQL

Running in parallel to Episodes 4 and 5, Episode 7 uses the CSV output from the previous episodes as input to the process. The episode covers the Microsoft SQL database schema residing on the Dell Quickstart Data Warehouse Appliance. In addition, it covers the Boomi process created for connecting to and importing data into the database. [Coming 21 May 2013]

Episode 8 | Boomi: HTTP to SQL

Episode 8 combines the full Boomi process from end-to -end for one data file from the US patent office. This episode combines the knowledge of episodes 3 through 7. [Coming 23 May 2013]

Episode 9 | SQL: De-duplication

When data is translated from XML, there are many duplicate rows produced. Episode 9 covers how this duplication was overcome in SQL, once the data was loaded into the database. [Coming 28 May 2013]

Episode 10 | Boomi:  One year

Episode 10 combines all information gathered from previous episodes to collect one year of US patent data and store it into the database. [Coming 30 May 2013]

Episode 11 | Analyzing Data using Toad BI Suite

Episode 11 details the steps of how the Toad BI products are set up and then dives into the details of how analysis was performed on the US patent office data and what insights were gained. [Coming 4 June 2013]


Viewing all articles
Browse latest Browse all 2350

Trending Articles



<script src="https://jsc.adskeeper.com/r/s/rssing.com.1596347.js" async> </script>