
Creating Corporate Database – Managing Spreadsheet Series: 5 of 5


The last step in creating corporate data from spreadsheets is building the corporate database itself. The corporate database is created once the following steps are complete:

  •   The spreadsheet has been selected
  •   The spreadsheet has been logged in
  •   The spreadsheet has been run through spreadsheet disambiguation technology

At this point, the data and the metadata have been stripped from the spreadsheet. Fig 1 shows this progression.

The data that is put into the corporate database consists of:

  •   Column name
  •   Row id name
  •   Value
  •   Spreadsheet system name
  •   Date of processing

Fig 2 shows the contents of the corporate data after processing.

The column name and the row id name serve to identify the value. The spreadsheet system name identifies the source of the data, and the date of processing records the day on which the spreadsheet was processed. Spreadsheet system name and date of processing are needed to satisfy lineage requirements, while column name and row id name provide the metadata that is associated with the value.
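As a rough illustration of the five elements listed above, the following Python sketch models one corporate data entry and flattens a small grid into such entries. The names `CorporateRecord` and `flatten`, the sample grid, and the file name are hypothetical and are not taken from any particular disambiguation product.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class CorporateRecord:
    """One value lifted from a spreadsheet, with its metadata and lineage."""
    column_name: str         # metadata: the column heading
    row_id_name: str         # metadata: the row label
    value: object            # the cell content itself
    spreadsheet_system: str  # lineage: which spreadsheet the value came from
    processed_on: date       # lineage: the day the spreadsheet was processed


def flatten(grid, spreadsheet_system, processed_on):
    """Turn a {row_id: {column: value}} grid into a list of records."""
    return [
        CorporateRecord(column, row_id, value, spreadsheet_system, processed_on)
        for row_id, cells in grid.items()
        for column, value in cells.items()
    ]


# Hypothetical spreadsheet content, for illustration only.
grid = {
    "Store 12": {"Q1 Revenue": 1200, "Q2 Revenue": 1350},
    "Store 17": {"Q1 Revenue": 980, "Q2 Revenue": 1010},
}
records = flatten(grid, "regional_sales.xlsx", date(2024, 1, 15))
for record in records:
    print(record)
```

Each cell becomes its own entry, so the value always travels with the column name and row id that identify it and with the lineage fields that say where and when it was captured.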

On occasion, it may be necessary to delete one or more entries from the corporate data input, because spreadsheets sometimes contain extraneous or spurious data. The spreadsheet disambiguation program picks up ALL elements of data on the spreadsheet, so it is possible that unwanted data arrives in the spreadsheet corporate data input. When that happens, the unwanted elements of data are “weeded out”.

Fig 3 shows the weeding out of unwanted data before it is placed into corporate data.
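The weeding-out step could be sketched as a simple filter. The rules used here (drop entries with an empty row id, drop a "Notes" scratch column) are assumptions chosen for illustration; in practice the rules depend on what counts as spurious for a given spreadsheet.

```python
# Hypothetical entries as they might arrive from disambiguation:
# (column name, row id name, value)
entries = [
    ("Q1 Revenue", "Store 12", 1200),
    ("Q1 Revenue", "Store 17", 980),
    ("Notes", "Store 12", "call Bob re: lunch"),  # scratch text, not corporate data
    ("Q1 Revenue", "", 55),                       # stray cell with no row id
]

# Assumed rule: these columns never belong in corporate data.
UNWANTED_COLUMNS = {"Notes"}


def is_wanted(entry):
    """Keep an entry only if it has a row id and is not in an unwanted column."""
    column_name, row_id_name, _value = entry
    return bool(row_id_name.strip()) and column_name not in UNWANTED_COLUMNS


kept = [entry for entry in entries if is_wanted(entry)]
weeded_out = [entry for entry in entries if not is_wanted(entry)]

print("kept:", kept)
print("weeded out:", weeded_out)
```

Keeping the rejected entries in a separate list, rather than discarding them silently, makes it possible to review what was removed before the corporate database is finalized.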

One of the features of corporate data is that it is usually arranged by subject area. But when data comes out of spreadsheet disambiguation, it comes out as it was laid out on the spreadsheet. For this reason, it is often convenient to divide the corporate data by subject area before finalizing the corporate database.

Fig 4 shows this activity.
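One way to picture the division by subject area is a lookup from column name to subject area. The mapping, the sample entries, and the "Unclassified" fallback below are all hypothetical; a real mapping would come from the organization's own data model.

```python
from collections import defaultdict

# Assumed mapping from column name to subject area.
SUBJECT_AREA_BY_COLUMN = {
    "Q1 Revenue": "Sales",
    "Headcount": "Human Resources",
    "Units Shipped": "Logistics",
}

# Hypothetical entries in spreadsheet order: (column name, row id name, value)
entries = [
    ("Q1 Revenue", "Store 12", 1200),
    ("Headcount", "Store 12", 14),
    ("Units Shipped", "Store 12", 310),
    ("Q1 Revenue", "Store 17", 980),
    ("Parking Spaces", "Store 17", 40),
]


def divide_by_subject_area(entries, mapping, default="Unclassified"):
    """Group entries by the subject area their column name maps to."""
    by_area = defaultdict(list)
    for entry in entries:
        column_name = entry[0]
        by_area[mapping.get(column_name, default)].append(entry)
    return dict(by_area)


by_area = divide_by_subject_area(entries, SUBJECT_AREA_BY_COLUMN)
for area, area_entries in sorted(by_area.items()):
    print(area, area_entries)
```

Entries whose column name is not in the mapping land in a fallback group, so nothing is lost and the unmapped columns can be assigned to a subject area later.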

A final consideration is that once spreadsheet data has been cast into the form of corporate data, it can be freely mixed with corporate data coming from other sources.

Fig 5 shows this capability.

The ability to integrate spreadsheet data with other corporate data easily is one of the major advantages of moving spreadsheet data to the corporate data environment. The analyst finds this capability to be very useful.

The integration is achieved by using the column name/row id metadata and comparing it to the metadata that is already found in the corporate data environment.

Fig 6 shows this interaction.
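The metadata comparison might look something like the following sketch, which matches spreadsheet entries against existing corporate data on a normalized (column name, row id) key. The normalization rule, the sample data, and the "billing system" source are assumptions; real matching often needs richer rules such as synonym lists.

```python
def normalize(name):
    """Lower-case a name and collapse underscores and whitespace."""
    return " ".join(name.replace("_", " ").lower().split())


# Hypothetical data already in the corporate environment,
# keyed by normalized (column name, row id name).
existing = {
    ("q1 revenue", "store 12"): {"source": "billing system", "value": 1195},
}

# Hypothetical entries from the spreadsheet: (column name, row id name, value)
spreadsheet_entries = [
    ("Q1_Revenue", "Store 12", 1200),
    ("Q1 Revenue", "Store 17", 980),
]

matched, unmatched = [], []
for column_name, row_id_name, value in spreadsheet_entries:
    key = (normalize(column_name), normalize(row_id_name))
    if key in existing:
        # Same metadata on both sides: the two values can be compared or combined.
        matched.append((key, value, existing[key]["value"]))
    else:
        # No counterpart yet: the spreadsheet entry is new to the corporate data.
        unmatched.append((key, value))

print("matched:", matched)
print("unmatched:", unmatched)
```

Matched entries give the analyst a spreadsheet value side by side with the value from another source, while unmatched entries extend the corporate data with information it did not hold before.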


Bill Inmon, the “father of the data warehouse”, has written 57 books published in nine languages. Bill was named by ComputerWorld as one of the ten most influential people in the history of the computer profession. Bill lives in Castle Rock, Colorado.

Bill’s latest book is TURNING TEXT INTO GOLD, Technics Publications, a book that shows how text can be turned into business value. TURNING TEXT INTO GOLD is available on Amazon.com.


William Inmon

William H. Inmon (born 1945) is an American computer scientist, recognized by many as the father of the data warehouse. Bill Inmon wrote the first book, held the first conference (with Arnie Barnett), wrote the first column in a magazine and was the first to offer classes in data warehousing. Bill Inmon created the accepted definition of a data warehouse: a subject-oriented, nonvolatile, integrated, time-variant collection of data in support of management's decisions.