Databases

                                                                                                           

Possible references:

 

Topic objective: Students should understand the function and usage of databases, including relational databases, and the related relational operations. 

 

  Database – a collection of ordered data, and the software that maintains it. 

 

In a database, the data contained is independent from the program which maintains it.

 

 

Types of data:

 

data operations:

            statistical data –

      comparison

      mathematical operations

nominal, ordinal – comparison only (equality – nominal & ordinal; position (higher/lower) – ordinal)

complex data – special operations depending on data e.g. for DNA: length, percentage of pyrimidines, etc

 

data domain – set of allowed values

 

 

maintenance of data collection

 

        provide bin for each data category

        provide slot in every bin for each data

        label each slot and bin

        catalogue bins and their slots

        add new bin and slot when needed, and update description catalogue

 

 

Data model Entity Relationship (ER) – can be used to lessen data redundancy

                        Data relationships are reduced to three components

-         entities (data elements)

-         attributes (possessed by entities)

-         relations (between entities)

 

 

Normalization

Some relations can be broken up to smaller relations without loss of information

normal form

 

                                                                       

 

Database Models

 

Relational database(Codd model, 1969)

 

-         correct and complete

-         based on relations

 

 relation – set of domains which together descibe an entity/event

 

made up of   – head – domain name (, attribute)

                        – body – data

cf. table (but table – columns in specific order; relation – not in specific order)

 

e,g. relation called MATCH

its head

<PTR : PLBS>  <PPL : PLBS>  <GOLTR : INT>  <GOLPL : INT>  <TP : DATE>

(domain PTR with attribute PLBS, domain PPL with attribute PLBS, …)

body:

   Terengganu    Kelantan    5    0    01/01/01

   Kelantan        Johor         3    4    05/01/01

   Kelantan        Melaka      2    1    07/01/01

- rows of tuples which are indexed by a key (certain attribute or attributes)

 

can refer to [relation name].[attribute] for example MATCH.PTR

 

Relational algebra:

            unionH1 H2 – relations with same structure (same sets of attributes) – combine bodies [example]

intersectionnH1 H2 – relations with same structure – retain common tuples only [example]

            differenceH1 H2 – relations with same structure – tuples in one which are not in the other [example]

            Cartesian productH1 × H2 – not necessarily same structure – gives all possibilities [example]

            limit / choose operation – σattrib=‘…’ (H) – give new relation with body containing selected tuples from original relation [example]

            project operationπl1,l2,…(H) – give new relation with selected attributes from original relation [example]

            combine / join operation – (Cartesian product + project + limit) give new relation by combining two relations – repeated attributes discarded, tuples chosen according to key [example] –

                        equal join H1 atrib1=atrib1' H2 – combined according to one common column (also greater join H1 atrib1>atrib1' H2 dsb)

                                    example:            

 

relation A

 

relation B

 

 <V>

<W>

 

<X>

 <Y>

<Z>

 

r

t

p

2

4

6

 

5

4

4

2

g

d

t

m

p

e

f

q

                                                if C JOIN A and B where A.W=B.X       [C= A A.W=B.X B]

                                                then C is:               

 

relation C

 

<A.V>

<A.W>=<B.X>

<B.Y>

<B.Z>

 

r

t

t

2

4

4

m

d

t

q

e

f

                                                for C JOIN A and B where A.W<B.X    [C= A  A.W<B.X B]

                                                C is:                        

 

relation C

 

<A.V>

<A.W>

<B.X>

<B.Y>

<B.Z>

 

r

r

r

t

2

2

2

4

5

4

4

5

g

d

t

g

p

e

f

p

natural join H1  H2 combine all common columns but keep one of each column only.

external join:

H1  H2          left       Natural join complete in H1 though not fitting to common column

                                                e.g. C= A  B, with A.W and B.X as common columns; C is:                                

 

relation C

 

<A.V>

<A.W>=<B.X>

<B.Y>

<B.Z>

 

r

t

t

p

2

4

4

6

m

d

t

null

q

e

f

null

null placed in columns which do not correspond

H1  H2          right Natural join complete in H2 though not fitting to common column

H1  H2        full Natural join complete in H1 and H2 though nt fitting to common column

partial join H1 P H2 = πA(H1 P H2)           where P is a certain predicate and A represents all attributes in H1

            divide operation H1 ÷ H2 (limit + project) find all values in non-key attributes of first relation which share tuples with every value in second relation [example]

            comparison operations – (comparison of relations!)

                        =          equal

                        !=         not equal

                        <=        subset

                        <          proper subset

                        >=        superset

                        >          proper superset

            additional operations (additional to Codd)

                        - extend – add attribute (opposite to project)

                        - insert – copy all tuples from a relation to the body of another relation (of same structure)

                        - update – modify the value of an attribute

                        - delete – delete tuple depending on value of certain attribute

                        - rename attribute – ρH'(atrib1',atrib2',…)(H)

                        dll.

 

 

Relational calculus – non-operational description of relations – (has same capability as relational algebra)

            e.g. MATCH.PTR WHERE (MATCH.TP=05/01/01 AND MATCH.GOLTR > MATCH.GOLPL)

                        WHERE

                        AND

                        OR

                        EXISTS (true if true for one case)

                        FORALL (true if true for all)

                        IF … THEN

                        IFF – if and only if

 

Integrity

 

-         addition – have to be certain that external refers to an existing tuple

e.g. adding tuple Melaka Johor … to PERLAWANAN requires existence of Melaka Stadium Kubu in STADIUM

 

-         deletion – avoid ‘orphan’ tuples

e.g. delete tuple Terengganu … from PERLAWANAN results in orphan tuple Terengganu… in STADIUM

 

-         update – like deletion and addition

 

domain integrity: cases for example where –

            if A happens, B cannot happen

                        e.g. a brown-eyed person cannot have both parents blue-eyed

            if A happens, B must happen

                        e.g. if someone scores a goal, his team must have at least 1 goal in its favour

            the value of A constrains the value of B

                        e.g. the date of death of a person must be bigger than his date of birth

 

 

Commit/rollback protocol: A data ‘transaction’ may involve several processing steps. As a defence against damage, a copy (log) of the transaction is done, and when the copy is complete – commit point – if there is damage the transaction can be rebuilt. If a problem occurs before the commit point, the log can be used to rollback the steps that have been carried out. Problem: other transactions may have used the data updated by the halfway steps, and so on – cascading rollback.

 

Lock: To prevent problems from transactions executing when other transactions are executing (problem of wrong summary, problem of lost updates) – locking mechanism for locking data – shared lock (other parties can read but not write), exclusive lock (other parties cannot read nor write). To prevent deadlocks : bleed-wait protocol – ‘young’ processes give way (wait, or unlock and rollback) to ‘old’ processes.

 

 

 

Hierarchical database

 

Data organized in a hierarchical form

 

Networked database

 

Data organized in a network form

 

Object-oriented database

 

Data organisation uses the concept of the object-oriented approach

 

etc

 

 

Perisian pangkalan data

 

SQL (Structured Query Language) – suatu bahasa utk pengolahan pangkalan data

misal suruhan, operator, dsb:

SELECT l1, l2, …, ln FROM j1, j2, …, jm WHERE P        yg bermaksud π l1, l2, …, ln (σP(j1× j2×× jm))

ORDER BY

GROUP BY

HAVING

AND

CROSS JOIN

JOIN

LEFT JOIN

CREATE TABLE

UPDATE

INSERT

dsb

 

Contoh perisian alatan:

ACCESS

ADABAS

dBASE III

ORACLE

dsb

 

 

Perlombongan data

(KDD - Penemuan Pengetahuan dalam Pangkalan Data)

 

[Data yang banyak (dan kompleks) dalam pangkalan-pangkalan data →]

pengeluaran tak remeh maklumat tersirat, yang belum diketahui, dan berkemungkinan berguna, dari data

 

Teknik statistik dan pembelajaran mesin digunakan termasuk: pengelompokan, perumusan data, pembelajaran hukum pengelasan, pencarian rangkaian kegantungan, analisis perubahan, pengesanan anomali untuk penemuan, dan teknik gambaran untuk persembahan maklumat dalam bentuk yang mudah difahami.

 

 

 

back to synopsis of SMES3103