Databases can be any collection of information; a simple example is a book for recording names, addresses and telephone numbers. One the other hand, banks and government departments maintain databases containing millions of records covering their customers and businesses.
Database design has a number of objectives including: -
Flexibility in the representation of data. For instance a decision may be made that there should be no limit on the number of phone numbers recorded for each customer. Allowing for two or three phone numbers is no longer enough, so rather than attempting to pick how many, the design allows for an unspecified number .
Fast storage and retrieval of data to meet required response times. This may mean the data holdings are duplicated over several servers in order to spread the load in peak periods.
Each item of data should be stored only once. A customer name would not appear in a telephone file or an address file when it is already present in a customer file. Instead the telephone and address files link to the customer file via keys. The data must be accurate; sophisticated databases allow for exact specification of what is allowed. Obviously an address book has little in the way of validation and it is up to the author to ensure accuracy, whereas large databases are subject to significant testing and validation of their content.
The following discussion uses a simple example of customer data to explain some basic concepts. It is based on a type of database known as “relational”. Relational database management systems (RDBMS) are the most common type at this time and are the backbone of most significant web sites. Brand names include Oracle, MySQL, Microsoft SQL Server and SQLite. There are too many more to mention. Their common visible feature is the use of SQL – Structured Query Language as the means for data access and manipulation. “SQL” can be pronounced as “S-Q-L” or “Sequel”. Take your pick but be aware the discussion about pronunciation gets pedantic. “S-Q-L” is the safest.
Further information can be found here: Wikipedia - Database Models
Database design starts by identifying the data items to be held in each file or table. Files may be regarded as a storage method outside of formal database management systems (DBMS) while “table” is a similar container for data such as “customer” or “address” or “postcode” within an RDBMS.
The design is validated to ensure that each data item is represented correctly (integer, text, date/time) and exception testing against any existing data is built in to ensure that values are within correct ranges.
The process continues by analysing each data entity e.g. customer, address and defines separate files for each item likely to be duplicated, with only the keys to these files stored on the master record.
Consider a simple example of breaking down a street address into its individual items, e.g.
Shop 24
Arcade Shopping Centre
500 Elizabeth St
Melbourne Vic 3000
Each part of the address (unit type (shop, factory or unit for example), unit number, street number, street name, city, state and postcode) would be in its own field and values can be controlled by reference to a table of allowed values by means of a key known as a “foreign key”.
Since the last three items “city”, “state” and “postcode” have a limited range of values and occur on every record, they would be moved to a separate table with a foreign key which would be stored in the address record.
If an item doesn’t exist (e.g. unit type) its key field on the address file would be null, or (preferably) would link to a record with a null value.
|
Customer Master File |
CustomerAddressFile |
UnitTypeFile |
||
|
CustomerID |
CustomerAddrID |
UnitTypeID |
||
|
Customer ID |
UnitTypeValue |
|||
|
UnitTypeID |
||||
|
|
UnitNumber |
|||
|
StreetNumber |
||||
|
StreetNumberSuffix |
PostCodeFile |
|||
|
StreetName |
PostCodeID |
|||
|
PostCodeID |
PostCodeValue |
|||
|
(Other Values & Keys) |
CityName |
|||
|
StateName |
||||
In the above diagram, the colour-coded fields are keys into look-up files to illustrate how data might be structured. Please note that each of these files might contain other variables such as creation and expiration date stamps which are not included here.
The level of detail e.g. “unit type”, will depend on what is required and for many simple systems this may be no more than inclusion in a field “address1” with no control over its content. It may become specific for utility companies supplying gas, electricity or computer networks.
It falls on the skills of the analyst to identify what level of detail is required in any given situation.
Large databases can contain hundreds, if not thousands of tables. Whatever the size, careful upfront design will pay dividends in ease of access to data and the ongoing usefulness of that data over time.
Xansoft has many years of experience in database design and management available to our clients.