Analytics are vital to the success of a business, but anyone who has taken the visual or advanced analytics journey will tell you that the actual analysis is only 20% of the work. In this article, we discuss how to make our data analytically ready, using a simple example to illustrate the other 80%.
This article is a step-by-step guide for connecting, exploring, integrating, and publishing data. To follow these steps, you must have Statistica Enterprise installed with its sample databases, as well as Toad Data Point and Toad Intelligence Central (TIC).
1. Connectivity
Before we can profile, merge, cleanse, transform, or aggregate our data, we first need to connect to it.
New Toad Intelligence Central connection
First, we create a connection to Toad Intelligence Central, where we will share all of our data.
- Open Toad Data Point.
- Click the Connect icon on the ribbon bar.
- Select Intelligence Central from the drop-down list of connectors.
- Enter the connection information for your Toad Intelligence Central instance.
- For this demonstration, I am creating a connection to a Toad Intelligence Central server running on the same system as Toad Data Point. Your host information may be different.
- Register as a new user.
- Click Connect.
- A new Toad Intelligence Central connection is displayed in the Navigation Manager.
- The green arrow indicates a live connection.
New Statistica example database connection
Now that we have a connection to our collaboration platform, Toad Intelligence Central, we can pull in a simple data set and begin to investigate the data.
- Click the Connect icon on the ribbon bar.
- Select Microsoft Access from the drop-down list.
- Click the ellipsis next to the Database file field.
- A file selection dialog box is displayed.
- Navigate to the example database, typically found where Statistica is installed: C:\Program Files\Dell\Examples\Database
- Click Open.
- Click Connect.
- Once again, a new connection is displayed in the Navigation Manager.
We have now created the connections we need to access and then publish our data. In the next section, we look at how to explore that data.
2. Exploring ProcessData
Now that we have our sample data connected in Toad Data Point, we want to see what we have in that database. First, we will view table details, and then look at a statistical breakdown of the data in those tables.
Data Viewer
To understand what we are connecting to, we first use the Viewer tab to give us an idea of what the tables contain.
- In the Object Explorer at the bottom:
- Expand ProcessData > Tables.
- Right-click the table RAWMAT.
- Select View Details.
- A new window is displayed containing details on the table.
- To view details on the other tables, simply select them in the Object Explorer; the information in the Viewer window changes to reflect the current selection.
- In the Viewer window, we can view several attributes including:
- Columns – details the data types associated with the tables
- Data – gives us a sample of the data
- Script – shows how the table can be rebuilt
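The three Viewer attributes have rough analogues in most SQL databases. The following is a minimal sketch only, using Python's built-in sqlite3 module rather than Toad Data Point or Access. The RAWMAT table here is a hypothetical stand-in with assumed columns and made-up rows; the real sample table may look different.

```python
import sqlite3

# Hypothetical stand-in for RAWMAT; column names, types, and rows are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE RAWMAT (ID INTEGER PRIMARY KEY, PARTNUM TEXT, LENGTH REAL, WIDTH REAL)"
)
conn.executemany(
    "INSERT INTO RAWMAT VALUES (?, ?, ?, ?)",
    [(1, "A100", 10.2, 5.0), (2, "A100", 10.1, 5.1), (3, "B200", 20.4, 7.5)],
)

# Columns: the name and declared type of each column.
# PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(RAWMAT)")]

# Data: a small sample of rows.
sample = conn.execute("SELECT * FROM RAWMAT ORDER BY ID LIMIT 2").fetchall()

# Script: the DDL statement that would rebuild the table.
script = conn.execute(
    "SELECT sql FROM sqlite_master WHERE type = 'table' AND name = 'RAWMAT'"
).fetchone()[0]

print(columns)
print(sample)
print(script)
```

Looking at the declared types, a few sample rows, and the table definition together is a quick way to spot mismatches between what a table was designed to hold and what it actually holds.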
Data Profiler
Now that we have some context on what was intended to be in the tables, we will look at the actual contents of the data. For an overview of data profiling in Toad Data Point, view this video: http://www.toadworld.com/products/toad-data-point/m/media-library/1458
- Ensure that ProcessData > Tables is expanded in the Object Explorer.
- Right-click the table RAWMAT.
- Select Data Profiling.
- In the new window, we are presented with a summary page.
- We can see, for instance, that ID is 100% unique, which is what we would expect.
- We can see individual variable details for Statistics, Frequency, and Patterns on those respective tabs.
- On the Duplicates tab, we can view a combination of variables to see what duplicate rows arise.
- In this example, I selected both LENGTH and WIDTH and clicked Check Duplicates.
It is important to understand what our data contains before we begin the process of cleansing and joining our data. Now we have a better idea of what to expect when we create our analytically ready data set.
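To make the uniqueness and duplicate checks concrete, here is a minimal sketch using Python's built-in sqlite3 module. It does not reproduce how Toad Data Point computes its profile. The helper names `percent_unique` and `duplicate_groups` are hypothetical, the sample rows are made up, and "percent unique" is assumed to mean distinct values divided by row count.

```python
import sqlite3

# Hypothetical stand-in for RAWMAT with made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE RAWMAT (ID INTEGER, PARTNUM TEXT, LENGTH REAL, WIDTH REAL)")
conn.executemany(
    "INSERT INTO RAWMAT VALUES (?, ?, ?, ?)",
    [
        (1, "A100", 10.2, 5.0),
        (2, "A100", 10.2, 5.0),
        (3, "B200", 20.4, 7.5),
        (4, "C300", 10.2, 6.0),
    ],
)


def percent_unique(conn, table, column):
    """Distinct values in `column` as a percentage of all rows (assumed definition)."""
    # Identifiers are interpolated directly, so pass only trusted names.
    distinct, total = conn.execute(
        f"SELECT COUNT(DISTINCT {column}), COUNT(*) FROM {table}"
    ).fetchone()
    return 100.0 * distinct / total if total else 0.0


def duplicate_groups(conn, table, columns):
    """Return each combination of `columns` that occurs more than once, with its count."""
    cols = ", ".join(columns)
    return conn.execute(
        f"SELECT {cols}, COUNT(*) FROM {table} "
        f"GROUP BY {cols} HAVING COUNT(*) > 1 ORDER BY {cols}"
    ).fetchall()


id_unique = percent_unique(conn, "RAWMAT", "ID")
dupes = duplicate_groups(conn, "RAWMAT", ["LENGTH", "WIDTH"])

print(id_unique)  # 100.0
print(dupes)      # [(10.2, 5.0, 2)]
```

Grouping on the selected columns and keeping only groups with more than one row is the general idea behind a duplicate check on a combination of variables.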
3. Integration
Now we will blend different tables from the same database to form a more analytically ready data set. In real life, this would entail more than one type of data and may involve significant transformation to get the data ready to analyze. Since we want to use the example data from Statistica, however, we will simply join two tables together and create a calculated column. For an idea of what other transformations can be performed in Toad Data Point, follow this link: http://www.toadworld.com/products/toad-data-point/b/weblog/archive/2015/03/20/toad-data-point-transformation-and-cleanse-part-1
- Ensure that the ProcessData Access database is displayed in the Navigation Manager.
- Click the Build icon on the ribbon bar.
- Double-click both the RAWMAT and LSPECS tables.
- The two tables are now displayed in the canvas of the Query Builder tab.
- Select the (Add All Columns) check box for each table.
- Click and drag the variable RAWMAT > PARTNUM to LSPECS > PARTNUM.
- An inner join is now created between these two tables.
- Right-click anywhere in the canvas.
- Select Calculated Fields.
- Enter L_DELTA as the New field name and click the (+).
- This will add the new field to the Defined Fields area.
- Click the down arrow in Field Definition and enter the following:
- RAWMAT.LENGTH - LSPECS.TARGET
- Click OK.
- For the field Attach to table, select LSPECS.
- Click OK.
- Now our newly created field is displayed in the LSPECS table.
- At this point, we could select the Query tab at the bottom of the window to view the query that we are about to run.
- Finally, click Execute SQL at the bottom to query the data.
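The join and calculated field built above correspond to a query along the following lines. This is a sketch under stated assumptions, not the SQL that the Query Builder generates: it runs against an in-memory SQLite database through Python's built-in sqlite3 module, the rows are made up, only a few assumed columns are included, and the second table is named LSPECS as in the field definition.

```python
import sqlite3

# Hypothetical stand-ins for the two tables; columns and rows are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE RAWMAT (ID INTEGER, PARTNUM TEXT, LENGTH REAL);
    CREATE TABLE LSPECS (PARTNUM TEXT, TARGET REAL);
    INSERT INTO RAWMAT VALUES (1, 'A100', 10.5), (2, 'B200', 19.0), (3, 'Z999', 7.0);
    INSERT INTO LSPECS VALUES ('A100', 10.0), ('B200', 20.0);
    """
)

# Inner join on PARTNUM, plus the calculated field L_DELTA.
query = """
SELECT RAWMAT.ID,
       RAWMAT.PARTNUM,
       RAWMAT.LENGTH,
       LSPECS.TARGET,
       RAWMAT.LENGTH - LSPECS.TARGET AS L_DELTA
FROM RAWMAT
INNER JOIN LSPECS ON RAWMAT.PARTNUM = LSPECS.PARTNUM
ORDER BY RAWMAT.ID
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
# (1, 'A100', 10.5, 10.0, 0.5)
# (2, 'B200', 19.0, 20.0, -1.0)
```

Note that the made-up part Z999 has no matching specification, so the inner join drops it. If unmatched parts need to stay in the result, a left outer join would be the usual alternative.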
We finally have our data in a format that is suitable for ingestion into our analytical platform. In the next section, we see how we can collaborate around this data.
4. Publish
Now that we have a data set ready to be consumed for analytics, we can share this data on our collaboration platform, the Toad Intelligence Central server.
- The Result tab in the Query Builder window displays the data obtained in the previous steps in a grid.
- Right-click anywhere in the data grid.
- Select Send To > Publish Data.
- A new window is displayed with publishing options.
- For this demo, simply change the Name to ProcessData1 and click Publish.
- For more information about publishing options, view this video: http://www.toadworld.com/products/toad-data-point/m/media-library/1374
Our analytical data is now available and ready to use from Statistica Enterprise Manager.