Computer Science / Software Engineering Notes Network

Data Management

Matthew Barnes

Contents

Unix fundamentals        5

Philosophy        5

Definition of an OS        5

OS fundamentals        5

Multi-user        5

Multi-processing        5

Multi-tasking        5

Philosophy        5

Open-source software        6

Unix workings        7

Unix file systems        7

Navigating the Unix file system        7

Types of paths        7

Absolute paths        7

Relative paths        7

Path shortcuts        8

Hidden files        8

Useful commands        8

ls        8

cd        9

file        9

man, info        9

touch        9

mkdir        9

rmdir        10

rm, cp, mv        10

Create non-empty file        10

Display file        10

Wildcards        11

Permissions        11

Permission strings        11

Permission numbers        12

Changing permissions        13

Unix pipes, processes and filters        14

Useful commands        14

Running a process when logged off        14

Piping        15

Filter        15

Redirection        16

More useful commands        16

Scripting        17

grep        17

sed        17

awk        17

Regular expression        17

Data exchange        19

Network protocol stack        19

HTTP web protocol        19

Requests        19

Elements of an HTTP response        20

Stateless        20

Data exchange formats        20

XML        20

XPath        20

JSON        21

CSV        21

Remote access        21

Client / Server        22

Thin clients        22

Fat clients        22

Relational model        23

The relational data model        23

Independence        23

DBMS        23

Data model        23

Data sublanguages        23

Relational data model        23

Keys and functional dependencies        24

Key        24

Functional dependency        24

Splitting / Combining rule        25

Trivial dependencies        25

Implication        25

Equivalence        25

A small problem        26

More keys        26

Superkey        26

Candidate key        27

Closure        27

Closure algorithm        27

Bad relations and anomalies        29

Relational algebra        29

Union        29

Difference        29

Cartesian product        30

Projection        30

Selection        31

Renaming        32

Θ-Join        32

Natural join        33

Aggregate functions with unary operator        34

Database systems        34

Normalisation        34

Zeroth normal form        35

First normal form        35

Second normal form        36

Third normal form        37

Boyce-Codd normal form        38

Modelling and SQL basics        40

Types of data modelling        40

Conceptual (ideas)        40

Logical (high level)        40

Physical (low level)        40

Entities        41

Strong        41

Weak        41

Associative        41

Attributes        41

Attributes        41

Multivalued attributes        41

Derived attributes        41

Relationships        41

Relationship        41

Weak relationship        41

Cardinality        42

One-to-one        42

One-to-many        42

Many-to-many        42

SQL - Structured Query Language        42

Data definition language (DDL)        42

Data manipulation language (DML)        42

Data definition        42

Create database        42

Create table        42

Define primary keys        43

Define foreign keys        43

Joins        43

Inner join        44

Left join        44

Right join        45

Full outer join        45

Data languages        47

Relational Algebra vs SQL        47

Counterparts        47

Self-joins        47

Aliases in SQL        47

Multisets        47

Union        47

Difference        48

Intersection        48

Cartesian product        48

Multisets in SQL        48

Advanced SQL        49

SQL Aggregate functions        49

Grouping        49

Views        49

Indexes        49

Data Protection and the GDPR        49

Data Protection        49

GDPR        49

Key principles        50

Consent        50


Unix fundamentals

Philosophy

Definition of an OS

OS fundamentals

Multi-user

Multi-processing

Multi-tasking

Philosophy

Open-source software


Unix workings

Unix file systems

Navigating the Unix file system

Types of paths

Absolute paths

Relative paths

Path shortcuts

Hidden files

Useful commands

ls

cd

file

man, info

touch

mkdir

rmdir

rm, cp, mv

Create non-empty file

Display file

Wildcards

Permissions

Permission strings

Permission numbers

Owner

Group

Other

r

w

x

r

w

x

r

w

x

Owner

Group

Other

r

w

x

1

1

1

r

w

x

1

0

0

r

w

x

1

0

0

Binary

Decimal

Meaning

000

0

No permissions

001

1

Execute only

010

2

Write only

011

3

Write + Execute

100

4

Read only

101

5

Read + Execute

110

6

Read + Write

111

7

All permissions

Changing permissions

Person accessing

Operation

Type of permission

‘u’ - user

‘+’ - grant

‘r’ = read

‘g’ - group

‘-’ - revoke

‘w’ = write

‘o’ - others

‘=’ - set

‘x’ = execute

‘a’ - all

‘-’ = none

Unix pipes, processes and filters

Useful commands

Running a process when logged off

Command

Meaning

screen

Create a new screen

screen -d

Detach from existing screen

screen -list

List available screens

screen -r <id>

Resume screen given ID

CTRL-D

Kill screen

Piping

Filter

Filter

Function of filter

head

Takes first 10 lines

tail

Takes last 10 lines

sort

Organises data

wc

Prints number of newlines

uniq

Remove duplicate lines

du

Estimate file space usage

xargs

Build + execute command lines

Redirection

More useful commands

Command

Function of command

find

Search for files in directory hierarchy

tar

Create file archive / extract

gzip/gunzip & zip/unzip

Compress files

nohup

Run command in background

parallel

Run jobs in parallel

basename

Removes parent folders from pathname

e.g. basename /home/jsmith = jsmith

cut

Extract sections from each line of input

w3m

Text-based WWW browser


Scripting

grep

sed

awk

Regular expression

More regex syntax:


Data exchange

Network protocol stack

Layer

Explanation

Examples

Application

Protocols that software uses to communicate over networks

HTTP (documents), FTP (files), SMTP (sending mail), POP (receiving mail)

Transport

Protocols that manage how the data is sent to the recipient

TCP (establishing a connection, make sure that each system receives messages), UDP (fire and forget)

Internet

Protocols that manage how the packets are structured and distributed

IP, ICMP, IGMP

Link

Protocols concerning the physical hardware used for sending data

DSL, Ethernet, ARP

HTTP web protocol

Requests

Method

Meaning

GET

Request for a web page or object from a server

POST

Send data/information about client to server

PUT

Send document to server

DELETE

Delete object on server

HEAD

Request information about page/document

TRACE

Trace proxies or tunnels in path from client to server

OPTION

Determines server capabilities

Elements of an HTTP response

Stateless

Data exchange formats

XML

<menu id="file" value="File">
 <
popup>
   <
menuitem value="New" onclick="CreateNewDoc()" />
   <
menuitem value="Open" onclick="OpenDoc()" />
   <
menuitem value="Close" onclick="CloseDoc()" />
 </
popup>
</
menu>

XPath

JSON

{"menu": {
 "id":
"file",
 "value":
"File",
 "popup": {
   "menuitem": [
     {"value":
"New", "onclick": "CreateNewDoc()"},
     {"value":
"Open", "onclick": "OpenDoc()"},
     {"value":
"Close", "onclick": "CloseDoc()"}
   ]
 }
}}

CSV

x,y
0,6.77
1,7.23
2,8.97
3,9.26
4,10.24
5,11.25

Remote access

Client / Server

Thin clients

Fat clients


Relational model

The relational data model

Independence

DBMS

Data model

Data sublanguages

Relational data model

Keys and functional dependencies

Key

Functional dependency

First name

Last name

Age

Sex

Matthew

Barnes

19

M

Elon

Musk

46

M

Melinda

Gates

53

F

hPdmrLZk

Barnes

99

F

Splitting / Combining rule

Trivial dependencies

Implication

Equivalence

A small problem

First name

Last name

Age

Sex

Matthew

Barnes

19

M

Elon

Musk

46

M

Melinda

Gates

53

F

hPdmrLZk

Barnes

99

F

First name

Last name

Age

Sex

Matthew

Barnes

19

M

Elon

Musk

46

M

Melinda

Gates

53

F

hPdmrLZk

Barnes

99

F

Matthew

Patrick

31

M

More keys

Superkey

Candidate key

Closure

Closure algorithm

  1. You start with the set X. You make that X0.
  2. Look along F, and check if there are any Y → Z where Y is a subset of X.
  3. If there is, then union Z with X0 and make that X1. Now, we use X1 instead of X0.
  4. Keep repeating and looping around F until Xn just remains the same.
  5. You should now have X+.

Xn

Value

Explanation

X0

{A, C}

In the beginning, we start with the values in X, that is, the set we want to determine if it is a superkey or not.

X1

{A, C} ∪ {E, H}

In F, there is a C → EH, and we have a ‘C’ in X0. Therefore, we can union the result of that functional dependency with our X and increment its index.

X1

{A, C, E, H}

It should now look like this.

X2

{A, C, E, H} ∪ {H}

In F, there is an AC → H, and we have an ‘A’ and a ‘C’ in X1, so we can union ‘H’ with our X.

X2

{A, C, E, H}

Hmm, that didn’t seem to do anything. Additionally, there are no more functional dependencies of interest for us to look at. Therefore, this will be our output: this is {A, C}+.

Xn

Value

Explanation

X0

{A, B}

In the beginning, we start with the values in X, that is, the set we want to determine if it is a superkey or not.

X1

{A, B} ∪ {C, D}

In F, there is an AB → CD, and we have an ‘A’ and a ‘B’ in X0. Therefore, we can union the result of that functional dependency with our X and increment its index.

X1

{A, B, C, D}

It should now look like this.

X2

{A, B, C, D} ∪ {E, H}

In F, there is a C → EH, and we now have a ‘C’ in X1 so we can union ‘H’ and ‘E’ with our X.

X2

{A, B, C, D, E, H}

Now we have this.

X3

{A, B, C, D, E, H} ∪ {G}

In F, there is a D → G. Since we have ‘D’ in X3, we can union ‘G’ with our X.

X3

{A, B, C, D, E, H, G}

We’re done here now; there are no more functional dependencies of interest. This is {A, B}+.

Bad relations and anomalies

Relational algebra

Union

ID

Name

1

Jonathan

2

Joseph

ID

Name

3

Jotaro

4

Josuke

=

ID

Name

1

Jonathan

2

Joseph

3

Jotaro

4

Josuke

Difference

ID

Name

1

Jonathan

2

Joseph

3

Jotaro

4

Josuke

ID

Name

2

Joseph

4

Josuke

=

ID

Name

1

Jonathan

3

Jotaro

Cartesian product

Joestar

ID

Name

1

Jonathan

2

Joseph

Fruit

Name

Colour

Banana

Yellow

Apple

Red

=

Joestar x Fruit

ID

Joestar.

Name

Fruit.

Name

Colour

1

Jonathan

Banana

Yellow

1

Jonathan

Apple

Red

2

Joseph

Banana

Yellow

2

Joseph

Apple

Red

Projection

Joestar x Fruit

ID

Joestar.

Name

Fruit.

Name

Colour

1

Jonathan

Banana

Yellow

1

Jonathan

Apple

Red

2

Joseph

Banana

Yellow

2

Joseph

Apple

Red

(Joestar x Fruit)

Joestar.

Name

Colour

Jonathan

Yellow

Jonathan

Red

Joseph

Yellow

Joseph

Red

Selection

Joestar

ID

Name

1

Jonathan

2

Joseph

3

Jotaro

4

Josuke

(Joestar)

ID

Name

3

Jotaro

4

Josuke

(Joestar)

ID

Name

1

Jonathan

Renaming

Joestar

ID

Name

1

Jonathan

2

Joseph

3

Jotaro

4

Josuke

(Joestar)

ID

Surname

1

Jonathan

2

Joseph

3

Jotaro

4

Josuke

Θ-Join

Joestar

ID

Name

1

Jonathan

2

Joseph

3

Jotaro

4

Josuke

Birthyears

ID

Birthyear

1

1868

2

1920

3

1970

4

1983

(Joestar x Birthyears)

Joestar.

ID

Birthyears.

ID

Name

Birthyear

1

1

Jonathan

1868

2

2

Joseph

1920

3

3

Jotaro

1970

4

4

Josuke

1983

Natural join

Joestar

ID

Name

1

Jonathan

2

Joseph

3

Jotaro

4

Josuke

Birthyears

ID

Birthyear

1

1868

2

1920

3

1970

4

1983

Joestar ⋈ Birthyears

ID

Name

Birthyear

1

Jonathan

1868

2

Joseph

1920

3

Jotaro

1970

4

Josuke

1983

Aggregate functions with unary operator

STUDENT_ID

AVERAGE_SCORE

1

5.78

2

4.89

3

3.68

4

8.45

Database systems

Normalisation

Zeroth normal form

Stand users

Stand user

Stand name

Birthyear

Part

Stand Type

Joseph Joestar

Hermit Purple

1920

Part 2, Part 3, Part 4

Tool Stand

Joseph Joestar

Hermit Purple

1920

Part 2, Part 3, Part 4

Integrated Stand

Jotaro Kujo

Star Platinum

1970

Part 3, Part 4, Part 5, Part 6

Close-range Stand

Jotaro Kujo

Star Platinum

1970

Part 3, Part 4, Part 5, Part 6

Range Irrelevant

Josuke Higashikata

Crazy Diamond

1983

Part 4

Close-range Stand

Yoshikage Kira

Killer Queen

1966

Part 4

Close-range Stand

First normal form

Stand users

Stand user (PK)

Stand name

Birthyear

Stand type (PK)

Joseph Joestar

Hermit Purple

1920

Tool Stand

Joseph Joestar

Hermit Purple

1920

Integrated Stand

Jotaro Kujo

Star Platinum

1970

Close-range Stand

Jotaro Kujo

Star Platinum

1970

Range Irrelevant

Josuke Higashikata

Crazy Diamond

1983

Close-range Stand

Yoshikage Kira

Killer Queen

1966

Close-range Stand

Stand user parts

Stand user (PK)

Part (PK)

Joseph Joestar

Part 2

Joseph Joestar

Part 3

Joseph Joestar

Part 4

Jotaro Kujo

Part 3

Jotaro Kujo

Part 4

Jotaro Kujo

Part 5

Jotaro Kujo

Part 6

Second normal form

Stand users

Stand user (PK)

Stand name

Birthyear

Joseph Joestar

Hermit Purple

1920

Jotaro Kujo

Star Platinum

1970

Josuke Higashikata

Crazy Diamond

1983

Yoshikage Kira

Killer Queen

1966

Stand user parts

Stand user (PK)

Part (PK)

Joseph Joestar

Part 2

Joseph Joestar

Part 3

Joseph Joestar

Part 4

Jotaro Kujo

Part 3

Jotaro Kujo

Part 4

Jotaro Kujo

Part 5

Jotaro Kujo

Part 6

Stand types

Stand name (PK)

Stand type (PK)

Hermit Purple

Tool Stand

Hermit Purple

Integrated Stand

Star Platinum

Close-range Stand

Star Platinum

Range Irrelevant

Crazy Diamond

Close-range Stand

Killer Queen

Close-range Stand

Third normal form

Protagonists

Part (PK)

Protagonist

Stand name

Part 2

Joseph Joestar

Hermit Purple

Part 3

Jotaro Kujo

Star Platinum

Part 4

Josuke Higashikata

Crazy Diamond

Protagonists

Part (PK)

Protagonist

Part 2

Joseph Joestar

Part 3

Jotaro Kujo

Part 4

Josuke Higashikata

Stands

Protagonist (PK)

Stand name

Joseph Joestar

Hermit Purple

Jotaro Kujo

Star Platinum

Josuke Higashikata

Crazy Diamond

Boyce-Codd normal form

Enrolment

Student ID (PK)

Subject (PK)

Professor

101

Java

P.Java

101

C++

P.Cpp

102

Java

P.Java2

103

C#

P.Csharp

104

Java

P.Java

Students-Professors

Student ID (PK)

Professor (PK, FK)

101

P.Java

101

P.Cpp

102

P.Java2

103

P.Csharp

104

P.Java

Professors-Subjects

Professor (PK)

Subjects

P.Java

Java

P.Cpp

C++

P.Java2

Java

P.Csharp

C#

Modelling and SQL basics

Types of data modelling

Conceptual (ideas)

Logical (high level)

Physical (low level)

Entities

Strong

Weak

Associative

Attributes

Attributes

Multivalued attributes

Derived attributes

Relationships

Relationship

Weak relationship

Cardinality

One-to-one
One-to-many
Many-to-many

SQL - Structured Query Language

Data definition language (DDL)

Data manipulation language (DML)

Data definition

Create database

CREATE DATABASE databasename;

Create table

DROP TABLE IF EXISTS Student;
CREATE TABLE Student
(
 
ID integer,
 email
varchar(20),
 lastName
varchar(20),
 firstName
varchar(20),
 DOB
date
);

Define primary keys

CREATE TABLE Student
(
 
ID integer PRIMARY KEY,
 email
varchar(20) UNIQUE NOT NULL,
 lastName
varchar(20),
 firstName
varchar(20),
 DOB
date
);

Define foreign keys

CREATE TABLE Student
(
 
ID integer PRIMARY KEY,

  class integer,
 email
varchar(20) UNIQUE NOT NULL,
 lastName
varchar(20),
 firstName
varchar(20),
 DOB
date,

  FOREIGN KEY (class) REFERENCES Class(ID)
);

Joins

Inner join

Characters

Character

Stand ID

Jotaro Kujo

1

Polnareff

2

Jonathan Joestar

null

Stands

Stand ID

Stand

1

Star Platinum

2

Silver Chariot

3

The World

Inner Join

Character

Stand ID

Stand

Jotaro Kujo

1

Star Platinum

Polnareff

2

Silver Chariot

Left join

Characters

Character

Stand ID

Jotaro Kujo

1

Polnareff

2

Jonathan Joestar

null

Stands

Stand ID

Stand

1

Star Platinum

2

Silver Chariot

3

The World

Left Join

Character

Stand ID

Stand

Jotaro Kujo

1

Star Platinum

Polnareff

2

Silver Chariot

Jonathan Joestar

null

null

Right join

Characters

Character

Stand ID

Jotaro Kujo

1

Polnareff

2

Jonathan Joestar

null

Stands

Stand ID

Stand

1

Star Platinum

2

Silver Chariot

3

The World

Right Join

Character

Stand ID

Stand

Jotaro Kujo

1

Star Platinum

Polnareff

2

Silver Chariot

null

3

The World

Full outer join

Characters

Character

Stand ID

Jotaro Kujo

1

Polnareff

2

Jonathan Joestar

null

Stands

Stand ID

Stand

1

Star Platinum

2

Silver Chariot

3

The World

Full Outer Join

Character

Stand ID

Stand

Jotaro Kujo

1

Star Platinum

Polnareff

2

Silver Chariot

Jonathan Joestar

null

null

null

3

The World


Data languages

Relational Algebra vs SQL

Counterparts

SQL

Relational Algebra

SELECT

Projection

FROM

Cartesian product

WHERE

Selection

Self-joins

Aliases in SQL

Multisets

Union

Difference

Intersection

Cartesian product

Multisets in SQL

Advanced SQL

SQL Aggregate functions

Grouping

Views

Indexes

Data Protection and the GDPR

Data Protection

GDPR

Key principles

Consent