No account yet?
WARNING: It has come to my attention that this page, that contains my personal notes for my Databases module during my First Year of Computer Science at university, was indexed by Google and is being shown to students. This page is a really good resource for Databases, but be aware that this is alot more info than you need for the exam. I apologize for the confusion.
One possible way of storing information would be to store each set of data needed on a seperate text file and then implement a program to do specific queries on the text files.
This may seem like a perfectly good solution however there are many limitations to it.
Database: A collection of data related in a logical manner which is designed to provide all the necessary info for an organisation
This is a system which allows users to have complete control over the DB. Giving functions to define/create/maintain the DB
A DBMS has two features:
Data Definition Language: The schema. It allows users to define how the database should be. Including the data types the structures and the constraints of the data.
Data Manipulation Language: The queries. It's the language you use to manipulate the database. A common one is SQL.
DBMS also offers:
Whilst there are lots of very clear advantages, to be balanced lets go through the disadvantages too
Compared to a file system a DB is/has:
A noSQL DB solves these problems by using a hybrid of a file system and a DB.
We want for the DBMS to be able to be trusted and for all operations to be completed. It should be reliable and always be in a consistent state.
It needs
Transaction: An action carried out which reads/updates database.
DDL: specifies entities/attributes/relationships/constraints. However it is too low level to understand for most people.
Data model: Intuitive concepts describing data
The three way of charecterizing the data is:
This is a graphical description of the DB
Notations for the ER model:
The aim of the schema is to allow users to all have access to the same DB instance with customized views depending on what part they need to see themselves.
Upper levels in the DB schema should not be affected by changes in the lower level
Logical data independence: External schemas called views don't change if change the logical structure of the data.
Physical data independence: Conceptual schema doesn't change if we change the internal schema.
A relational model represents a DB as a collection of relations and constraints.
This is basically the same as the entity relationship model which we'll talk about next so I'll just gloss through this section and use it as an overview.
This is a graphical description of the DB to allow people perhaps without such advanced knowledge to understand the db. Aka engineers.
This is the set of all common characteristics that are shared by entity occurrences of a particular type.
This is a bottom up strategy, basically we start with all the data and attributes stored in the tables. Then from that we figure out optimal relationships such to have the best designed relational database
it should have No redundancy: Every data item is stored in one place.
Modification anomaly: Don't change all instances of a specific value, after modifying one.
Deletion anomaly: Losing other values because you delete one items data
Insertion anomaly: Where when new data items are added, we need to add information about other entities.
To fix the data anomaly problems, remove the dependencies and redundancy by splitting data into multiple tables. Where data which must stay consistent between values of same attribute value, is stored one time in a different table
This describes the relationships between attributes in the same relation.
Let A and B be two sets of attributes. Then B is Functionally dependent on A if each value of A is associated with exactly one value of B
The closure denoted F+ is the set of all functional dependencies that are implied by dependencies in F
To compute the closure F+ of F we need some interference rules
Armstrong's axioms:
This is a multi-stage process, where the result of each stage is called a normal form. At each stage we do a check to see if the specific criteria is satisfied and if it isn't we have to reoranise.
repeating group: Attribute values that repeat. Attributes should only have one value for each cell.
Is in 1NF and there are no partial functional dependencies. This means every non-key attributes depends on the whole primary key. E.g you can't work out a non-key attribute without knowing the primary key.
Whole primary key: the set of Keys which allow you to uniquely identify the set of attributes
So, perhaps a subset is dependent on one primary key and another subset dependent on the whole primary key. If this is the case then there is a partial dependency.
To convert to 2NF we do:
This gets rid of transitive dependencies. This means getting rid of cells that are dependent on another non-candidate key.
With this we describe the data by the operations applied to get it.
There are 6 basic operations:
Derived operations:
σpredicate(R)
πcol-1,...,col-n(R)
These are all the same as their set counterparts (except they require compatibility) so lets go quick through them.
R∪SR \cup S
R-SR - S
R∩S
Inorder for union, set difference or intersection to work the schemas must match.
So same number of attributes and corresponding attributes must have same domain. Note: not same name, just same domain.
R×S
ϱX(R)
Joins are cartesian products plus a selection.
equijoin: this is a type of join where the selection involves the equality operator.
$$R \bowtie S = \Prod_{C_1, \dots, C_2} (\sigma_{R.A_1=S.A_1, \dots, R.A_k =S.A_k)(R \times S)) $$
This is an equijoin between the relations R and S, over all their attributes that have the same name, without duplicates.
Steps to do a natural join:
This computes the natural join of R and S, and then adds tuples which did not match between R and S.
left outer join: Adds tuples from R which did not match those in S.
Right outer join: Adds tuples from S which did not match those in R.
Full outer join: Adds tuples from either which didn't match.
the natural join between R and S. However with only the attributes of R
This is literally how the internet browser works.
This has network of computers each with part of the database, which mirrors the organisational structure.
Aims to:
Distributed DBMS is the software system that manages the distributed database and makes the distribution transparent to the users.
Distributed processing is a centralised database that is accessed over a computer network. But Distributed DBMS has multiple DB fragments distributed across sites.
How to do this depends on Quantitative information and Qualitive information