Recently I decided that I wanted to explore some certification topics and utilize the sample schemas (HR, OE, PM, IX) that you often find referenced in the Oracle documentation. During database creation with the DBCA utility, there is an option to include the sample schemas as part of the initial configuration. If you’re like me, however, you might often leave this option unchecked. This is a fairly straightforward task designed to allow DBA’s to manually install the Oracle 11g sample schemas after the database has already been created.
Difficulty: Easy
Downloading Oracle Examples Files
If you left the oracle samples option unchecked during DBCA and you want to install the HR schema, you will only see hr_code.sql in $ORACLE_HOME/demo/schema/human_resources.
[oracle@centosvm01]$ ls $ORACLE_HOME/demo/schema/human_resources hr_code.sql
We need to go out to Oracle’s website and download the necessary files.
Oracle Database Software Downloads
In my case I’m running 11gR2 on a Linux x86_64 architecture. Click the “See All” link next to your appropriate architecture to see the full list of Oracle downloads for your product.
Accept the license agreement at the top of the page and scroll down to find “Oracle Database 11g Release 2 Examples”. Enter your OTN credentials when prompted to download the files.
Note: The Database Examples is often referred to as “Companion CD” – something you’ll find floating around if you look on older message boards for this task.
Extracting and Installing the Examples
I have downloaded the examples to the /tmp directory on my Linux host. Now let’s make sure the oracle user owns the contents and enable the DISPLAY for the installer for the oracle user.
[root@centosvm01]$ xhost + access control disabled, clients can connect from any host [root@centosvm01]$ cd /tmp [root@centosvm01 tmp]$ chown oracle:oinstall linux.x64_11gR2_examples.zip [root@centosvm01 tmp]$ su - oracle [oracle@centosvm01 ~]$ cd /tmp && unzip linux.x64_11gR2_examples.zip [oracle@centosvm01]$ xhost + access control disabled, clients can connect from any host
Installing the Examples files to $ORACLE_HOME
[oracle@centosmv01 tmp]$ cd /tmp/examples] [oracle@centosvm01 examples]$ ./runInstaller
Assuming the $ORACLE_BASE and $ORACLE_HOME variables are already defined (i.e., in ~/.bash_profile), these will populate automatically for you in the wizard.
After running a few system checks, you will be presented with the summary screen.
Installing the HR schema
When executing the script to build the HR schema, it will prompt you for 5 variables (password for HR schema, default tablespace, temp tablespace, SYS password, default log path).
[oracle@centosvm01]$ cd $ORACLE_HOME [oracle@centosvm01]$ sqlplus "/ as sysdba" SQL> @?/demo/schema/human_resources/hr_main.sql
When prompted, I passed the following (hr, users, temp, foobar, $ORACLE_HOME/demo/schema/log/). Once the script completes, log in as the HR user with the password you supplied in the prior step and check to see the tables were created.
[oracle@centosvm01 db_1]$ sqlplus hr/hr SQL*Plus: Release 11.2.0.3.0 Production on Tue Apr 24 10:22:54 2012 Copyright (c) 1982, 2011, Oracle. All rights reserved. Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production With the Partitioning, OLAP, Data Mining and Real Application Testing options SQL> select table_name from user_tables; TABLE_NAME ------------------------------ LOCATIONS DEPARTMENTS JOBS EMPLOYEES JOB_HISTORY REGIONS COUNTRIES 7 rows selected.
For more detailed information on the guidelines and installing other sample schemas, consult the official Oracle documentation:
http://docs.oracle.com/cd/B28359_01/install.111/e10402/toc.htm#CEGDAIDC
http://docs.oracle.com/cd/B28359_01/server.111/b28328/installation.htm#I4367
One of my passions is Data Warehousing – particularly on the ETL side of the house. Having the opportunity to meet Ralph Kimball while attending his class focused on Dimensional Data Modeling is really what sparked an interest me to improve the performance of our nightly ETL batch processes. While that was a few years ago and I’ve since moved on into the realm of full-time database administration, I try to find time every now and again to build “mini” warehouses just for the fun of reporting – which is the motivation behind this article.
Enough about that, on to the geeky stuff.
Assumptions
For the purpose of this demonstration, let’s assume the following:
DB Platform: Oracle
ETL Process: Manual, code-driven (no Informatica, SSIS, or other ETL GUI)
General PL/SQL knowledge
Natural keys on the source system never change
Assuming warehouse with thousands -> millions+ of records.
Establishing Conventions
A key responsibility of any ETL developer is not only having a firm technical grasp on the tools in front of you, but also a genuine understanding of the business processes (Accounts Receivable, Service, General Ledger, Sales) that drive the reporting needs. This is a very iterative process as you work with your business analysts and end users.
In order to have some consistency in the development process, the following standards and naming conventions are often helpful.
Facts: F_<business process> - F_SALES, F_SRV_REQ Dimensions: D_<entity> - D_CUST, D_DATE, D_PERS, D_EMP, D_SRV_REQ
I prefix all surrogate keys on my dimension and fact tables with “W_” along with the suffix “_ID” – indicative of a “Warehouse ID”. W_CUST_ID, W_PERS_ID, W_SRV_REQ_ID, W_DT_ID are all good examples. Additionally, it is beneficial to know when each record was added/updated on my dimensions and add the columns W_ADD_DTTM and W_UPD_DTTM to reflect this information. As for the primary keys from the transactional system, I track them on my dimension using the prefix “S<#_column name>“. For example, I might have the following data sources feeding into the data warehouse:
Source Source Identifier ------ -------------------- Siebel S1 Excel S2 SAP S3 PeopleSoft S4
This will allow me to easily identify what source(s) are behind a dimension.
The Fun Stuff
Consider the following – We want to build a person dimension in our warehouse – a table that I have constructed as follows:
SQL> desc edw.d_pers; Name Null? Type ----------------------------------------- -------- ---------------------------- W_PERS_ID NUMBER(38) S1_ROW_ID VARCHAR2(20) FNAME VARCHAR2(30) LNAME VARCHAR2(30) EMAIL_ADDR VARCHAR2(50) TITLE VARCHAR2(75) HASHKEY NUMBER(38) W_ADD_DTTM DATE W_UPD_DTTM DATE
The denormalized person dimension is designed based on those columns which are pre-defined as required for reporting purposes. Consider one of the potential source tables – which could have hundreds of columns – from the source transactional system – in this instance, from Siebel:
SQL> desc s_contact; Name Null? Type ----------------------------------------- -------- ---------------------------- ROW_ID NOT NULL VARCHAR2(15 CHAR) CREATED NOT NULL DATE CREATED_BY NOT NULL VARCHAR2(15 CHAR) LAST_UPD NOT NULL DATE LAST_UPD_BY NOT NULL VARCHAR2(15 CHAR) DCKING_NUM NUMBER(22,7) MODIFICATION_NUM NOT NULL NUMBER(10) CONFLICT_ID NOT NULL VARCHAR2(15 CHAR) PAR_ROW_ID NOT NULL VARCHAR2(15 CHAR) ACTIVE_FLG NOT NULL CHAR(1 CHAR) BU_ID NOT NULL VARCHAR2(15 CHAR) COURT_PAY_FLG NOT NULL CHAR(1 CHAR) DISA_CLEANSE_FLG NOT NULL CHAR(1 CHAR) DISP_IMG_AUTH_FLG NOT NULL CHAR(1 CHAR) EMAIL_SR_UPD_FLG NOT NULL CHAR(1 CHAR) EMP_FLG NOT NULL CHAR(1 CHAR) FST_NAME NOT NULL VARCHAR2(50 CHAR) INVSTGTR_FLG NOT NULL CHAR(1 CHAR) LAST_NAME NOT NULL VARCHAR2(50 CHAR) ...
You get the idea.
Type 1 Slowly Changing Dimensions (SCD)
Arguably the easiest of the SCDs to maintain is that of Type 1. In a Type 1 scenario, existing data in the dimension is overwritten with the new information. Simply put: there is no history for tracking change over time. Returning to my person dimension (D_PERS), let’s say 2 weeks from now a record flows from the source transactional system with a different email address than what already exists for that contact in my dimension. Since we’re managing a Type 1 SCD, we simply overwrite the existing email address. Type 1′s are easy to deploy, but prevents the end user from answering the question of “Why did the data change?”. For more on the various types of Slowly Changing Dimensions, check out wikipedia.
Change Data Capture
At this point, I have created my person dimension named D_PERS. In order to perform change data capture on the incoming data, I’m going to make use of Oracle’s built-in ORA_HASH() function. Among many things, the ORA_HASH() function returns a hash value for a given string w/out sacrificing much in way of performance. This will be the value stored as D_PERS.HASHKEY. The idea is you pass all columns that require an update through ORA_HASH() and save the returned hash on the dimensional table for later use.
To illustrate the hashing functionality:
SQL> SELECT ORA_HASH('FOOBAR') HASH1, ORA_HASH('FOO' || 'BAR') HASH2, ORA_HASH('foobar') HASH3
2 FROM DUAL;
HASH1 HASH2 HASH3
---------- ---------- ----------
3338140869 3338140869 2559319480
The following code utilizes the Oracle MERGE statement to process my Type 1 SCD.
MERGE INTO D_PERS D
USING STG.S1_S_CONTACT C
ON (D.S1_ROW_ID = C.ROW_ID)
WHEN MATCHED
THEN
UPDATE SET
D.FNAME = C.FST_NAME,
D.LNAME = C.LAST_NAME,
D.EMAIL_ADDR = C.EMAIL_ADDR,
D.TITLE = C.JOB_TITLE,
D.HASHKEY =
ORA_HASH( ROW_ID
|| C.FST_NAME
|| C.LAST_NAME
|| C.EMAIL_ADDR
|| C.JOB_TITLE),
D.W_UPD_DTTM = SYSDATE
WHERE D.HASHKEY <>
ORA_HASH( ROW_ID
|| C.FST_NAME
|| C.LAST_NAME
|| C.EMAIL_ADDR
|| C.JOB_TITLE)
WHEN NOT MATCHED
THEN
INSERT (S1_ROW_ID,
FNAME,
LNAME,
EMAIL_ADDR,
TITLE,
HASHKEY,
W_ADD_DTTM,
W_UPD_DTTM)
VALUES (
C.ROW_ID,
C.FST_NAME,
C.LAST_NAME,
C.EMAIL_ADDR,
C.JOB_TITLE,
ORA_HASH( ROW_ID
|| C.FST_NAME
|| C.LAST_NAME
|| C.EMAIL_ADDR
|| C.JOB_TITLE),
SYSDATE,
SYSDATE
);
COMMIT;
Let’s put this code into Layman’s terms. The above code can be broken down into 3 easy pieces. The first three lines of the MERGE establish the join condition between the source table (S_CONTACT) and the target table (D_PERS) that we are MERGing into based on the natural key (S_CONTACT.ROW_ID = D_PERS.S1_ROW_ID).
MERGE INTO D_PERS D
USING STG.S1_S_CONTACT C
ON (D.S1_ROW_ID = C.ROW_ID)
The next section defines the first “WHEN” condition – which instructs Oracle what to perform on data that has been matched (based on S_CONTACT.ROW_ID = D_PERS.S1_ROW_ID). In this scenario if first name, last name, email address or job title has changed, we will UPDATE (i.e., overwrite) those columns in addition to the HASHKEY and W_UPD_DTTM (to indicate the record has been updated).
WHEN MATCHED
THEN
UPDATE SET
D.FNAME = C.FST_NAME,
D.LNAME = C.LAST_NAME,
D.EMAIL_ADDR = C.EMAIL_ADDR,
D.TITLE = C.JOB_TITLE,
D.HASHKEY =
ORA_HASH( ROW_ID
|| C.FST_NAME
|| C.LAST_NAME
|| C.EMAIL_ADDR
|| C.JOB_TITLE),
D.W_UPD_DTTM = SYSDATE
WHERE D.HASHKEY <>
ORA_HASH( ROW_ID
|| C.FST_NAME
|| C.LAST_NAME
|| C.EMAIL_ADDR
|| C.JOB_TITLE)
We could avoid using ORA_HASH and save on storage at the cost of performance by re-writing this SQL:
WHEN MATCHED
THEN
UPDATE SET
D.FNAME = C.FST_NAME,
D.LNAME = C.LAST_NAME,
D.EMAIL_ADDR = C.EMAIL_ADDR,
D.TITLE = C.JOB_TITLE,
D.W_UPD_DTTM = SYSDATE
WHERE D.S1_ROW_ID = C.ROW_ID
AND (D.FNAME <> C.FST_NAME)
OR (D.LNAME <> C.LAST_NAME)
OR (D.EMAIL_ADDR <> C.EMAIL_ADDR)
OR (D.TITLE <> C.JOB_TITLE )
By writing the ORA_HASH() value to D_PERS.HASHKEY in advance, we can optimize our SQL by cutting out redundant OR statements for each column we want to monitor for change. The new hash value will differ from the stored value if any of the hashed column’s data has changed – thus triggering our update/insert. My milage with the ORA_HASH method has shown significant performance boosts with thousands – millions of updates. For one, you can index the HASHKEY column – where as indexing the character based columns (name, email, job) won’t provide the same benefit.
The third and final piece of code is the longest but also the most straightforward. It applies to all new records in the source system that do no already exist in the dimension.
WHEN NOT MATCHED
THEN
INSERT (S1_ROW_ID,
FNAME,
LNAME,
EMAIL_ADDR,
TITLE,
HASHKEY,
W_ADD_DTTM,
W_UPD_DTTM)
VALUES (
C.ROW_ID,
C.FST_NAME,
C.LAST_NAME,
C.EMAIL_ADDR,
C.JOB_TITLE,
ORA_HASH( ROW_ID
|| C.FST_NAME
|| C.LAST_NAME
|| C.EMAIL_ADDR
|| C.JOB_TITLE),
SYSDATE,
SYSDATE
);
That’s all there is to it. If first, last, email, or title ever change, it will be captured and overwrite the existing record. I can see this method of ETL being advantageous to smaller IT shops with relative budgets who are looking to roll a data warehouse. More advanced “transformations” are certainly possible, with more advanced solutions. I’ll save those for another day.
I have written in the past about tunneling HTTP traffic over SSH, and even spent some time talking about SSHFS. Needless to say, there is a never-ending list of cool things to do with SSH. There is no sexy way to articulate why you would do this – and for that matter really no practical explanation on why you should do this, but if you ever get the sudden temptation to tunnel smb over SSH – this article is for you. For me the motivation to try this stemmed from no cheap/free alternative to WinSCP available on Mac (to my knowledge). Sure I can scp from the command line or configure SSHFS – but I was determined to see a remote SMB mount show up in my Finder on OS X. I tested this method against Mac OS X Lion and Ubuntu 10.10 Maverick Meerkat and felt compelled to share my experience.
The “Dilemma”
You want to access a samba share on a remote server but do not want to use scp and/or lack a client to mount remote shares on your machine.
The Solution – Mac OS X Lion
This assumes the remote server hosting the samba share is running the SSH daemon. By default Samba is configured to listen on TCP ports 139 and 445. I will begin by establishing an SSH connection to the target server:
$ ssh cj@my-remote-server -L 6445:localhost:445
The “-L” instructs SSH to forward traffic listening on my-remote-server port 445 to my localhost (in this case, the machine running OS X) on port 6445. Once connected, fire up the Finder app and hit Command-K.

Enter “smb://127.0.0.1:6445″ (sans the quotes) in the Server Address box. If all worked correctly, you should be prompted for the samba share’s user/password/workgroup credentials.
Supply your information and you’ll be prompted to select a smb share (in my case, it is named “share”). Give it a second your share will be mounted in the left-handed pane under Shared in the Finder app as “127.0.0.1″.
The Solution – Ubuntu 10.10 Maverick
Same basic principles apply here – Samba is listening on 139 & 445. In order to get this to work on Ubuntu in the same fashion using Ubuntu’s nautilus “Connect to server…” feature – I had to tunnel 139 (or 445) from the remote server running ssh and samba to 139 on my local client. Note on the tip above regarding the Mac, I forwarded 445 on the remote machine to 6445 on the local client. I have yet to find a way to specify a port # in the nautilus interface under the “Windows Share” service type.
Create the tunnel to the remote ssh server running your samba share.
$ sudo ssh cj@my-remote-server -L 139:localhost:139
*Note – In order to forward any port below 1024 (privileged ports) – you must do so as root (hence the “sudo”).
Once connected, go to the Places menu and select “Connect to Server…”. Change “Service type” to “Windows Share” and enter “localhost” into the “Sever” box. Since we’re forwarding samba from port 139 on the remote server to port 139 on our local machine (where Samba listens by default), Ubuntu will react as if we are in fact opening a samba share on the local machine. A new nautilus window will pop up and you should see your “share” directory. Double click on the share and input your user/pass/workgroup details like the previous example.
That’s all there is to it – Still trying to determine how much I will do this in reality – but kind of geeky and cool regardless.
What about you? Why would you do this?






