# Exported with the Erfurt API - http://aksw.org/Projects/Erfurt @base . @prefix dct: . @prefix ns0: . @prefix ns1: . @prefix ns2: . @prefix ns3: . @prefix ontology1288790966584: . @prefix ontowiki_wikipage: . @prefix owl2xml: . @prefix owl: . @prefix rdf: . @prefix rdfs: . @prefix sadocontology1: . @prefix sioc: . @prefix sioct: . @prefix sysconf: . @prefix uva-sa-ontology: . @prefix view: . @prefix xsd: . ontowiki_wikipage:content """

Relational database system

 

Explanation

In a relational database system, data is stored in tables that are related to each other through primary and foreign keys. Combining these tables (joining them, in database terminology) to retrieve certain data can often be an expensive task. To prevent duplication, the relational database system has to spread the data over multiple tables. Part of such a database could look like this, for example:

Table: communication

communication_id  first_person_id  second_person_id  project_id
1                 172              383               9292
2                 383              482               3038
3                 383              593               492


Table: persons

person_id  full_name           profession
172        David Smith         Junior Programmer
383        Jeremy Bakersfield  Senior Programmer
482        Lucas Morgan        Medior Programmer
593        Hank Copperfield    Intern

 

Table: subjects

project_id  project_title
9292        Public Transport Information System
3038        Bank Transaction system
492         NFC research


From this data, it can be concluded that Jeremy Bakersfield acts as a central bridge of communication: he is in contact with three other people (two programmers and an intern) across three projects, namely an information system for public transport, a bank transaction system and research about NFC. Reaching this conclusion requires joining three tables, which costs more and more performance as the data set grows. In addition, every occurrence of communication gets its own database record, so the data set can easily grow quite large. A list of disadvantages and advantages can be found in the appendix under section: “Relational Database system”.
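The join described above can be made concrete with a small sketch. It loads the three example tables into an in-memory SQLite database (SQLite and Python are assumptions chosen for illustration; the text does not prescribe a particular relational product) and resolves every communication record to readable names. Note that the persons table has to be joined twice, once for each participant.

```python
import sqlite3

# In-memory database holding the three example tables from the text.
conn = sqlite3.connect(':memory:')
conn.executescript('''
    CREATE TABLE persons (
        person_id INTEGER PRIMARY KEY, full_name TEXT, profession TEXT);
    CREATE TABLE subjects (
        project_id INTEGER PRIMARY KEY, project_title TEXT);
    CREATE TABLE communication (
        communication_id INTEGER PRIMARY KEY,
        first_person_id INTEGER REFERENCES persons(person_id),
        second_person_id INTEGER REFERENCES persons(person_id),
        project_id INTEGER REFERENCES subjects(project_id));

    INSERT INTO persons VALUES
        (172, 'David Smith', 'Junior Programmer'),
        (383, 'Jeremy Bakersfield', 'Senior Programmer'),
        (482, 'Lucas Morgan', 'Medior Programmer'),
        (593, 'Hank Copperfield', 'Intern');
    INSERT INTO subjects VALUES
        (9292, 'Public Transport Information System'),
        (3038, 'Bank Transaction system'),
        (492, 'NFC research');
    INSERT INTO communication VALUES
        (1, 172, 383, 9292),
        (2, 383, 482, 3038),
        (3, 383, 593, 492);
''')

# Resolve both person ids and the project id to readable names.
# persons is joined twice (once per participant), subjects once.
JOIN_QUERY = '''
    SELECT p1.full_name, p2.full_name, s.project_title
    FROM communication AS c
    JOIN persons AS p1 ON p1.person_id = c.first_person_id
    JOIN persons AS p2 ON p2.person_id = c.second_person_id
    JOIN subjects AS s ON s.project_id = c.project_id
    ORDER BY c.communication_id
'''
rows = conn.execute(JOIN_QUERY).fetchall()
for first, second, project in rows:
    print(first, '<->', second, 'about', project)
```

Every row of the result names Jeremy Bakersfield as one of the two participants, which is how the conclusion above is reached.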

 

Graph database system

 

Explanation

 

Though relational databases have been widely used for decades, they do have their limitations. As stated in the previous paragraphs, joining tables together costs a lot of processing power. A graph database instead consists of entities and relationships (nodes and edges), and these relationships are a central part of the data model.
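To illustrate the node-and-edge model, the sketch below expresses the example data from the relational section as a graph in plain Python. This is not how any particular graph database stores its data; the adjacency structure and the helper name contacts are assumptions made for illustration. The point is that a person's contacts are found by following the edges attached to that node, without a join.

```python
from collections import defaultdict

# Nodes: people keyed by id (data taken from the example tables).
people = {
    172: 'David Smith',
    383: 'Jeremy Bakersfield',
    482: 'Lucas Morgan',
    593: 'Hank Copperfield',
}

# Edges: one per communication record, labelled with the project title.
edges = [
    (172, 383, 'Public Transport Information System'),
    (383, 482, 'Bank Transaction system'),
    (383, 593, 'NFC research'),
]

# Adjacency list: every node keeps direct references to its neighbours.
# Communication is treated as undirected, so each edge is stored twice.
adjacency = defaultdict(list)
for a, b, project in edges:
    adjacency[a].append((b, project))
    adjacency[b].append((a, project))


def contacts(person_id):
    # Return (name, project) pairs for everyone this person communicated with.
    return [(people[other], project) for other, project in adjacency[person_id]]


for name, project in contacts(383):
    print('Jeremy Bakersfield <->', name, 'about', project)
```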

 

Graph databases can provide a large performance advantage over relational databases when querying for information that would otherwise require one or more joins.

In a benchmark performed by researchers from the University of Mississippi [1], the performance of a popular relational database (MySQL) is compared to the performance of a popular graph database (Neo4j).

 

They say that “Graphs were created to contain approximately 1000, 5000, 10000, and 100000 nodes to aid in assessing scalability. The type of payload data stored in each node also varied. The payloads consisted of random integers, random 8KB strings, and random 32KB strings. Thus, twelve MySQL and twelve Neo4j databases were constructed from the random graphs.”

 

They also state that the graph database is usually about 1.5 to 2 times larger on disk than the relational one. This is certainly something to consider, but it doesn’t outweigh the gain in performance, since data storage is quite cheap nowadays.

The queries they used to test the databases are divided into two categories: structural queries and data queries. They define three structural queries:

 

- Find all orphan nodes. That is, find all nodes in the graph that are singletons, with no incoming edges and no outgoing edges

- Traverse the graph to a depth of 4 and count the number of nodes reachable

- Traverse the graph to a depth of 128 and count the number of nodes reachable

 

As well as three data queries:

- Count the number of nodes whose payload data is equal to some value

- Count the number of nodes whose payload data is less than some value

- Count the number of nodes whose payload data contains some search string (length ranges from 4 to 8)
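The six query types listed above can be sketched on a toy graph in plain Python. This is only an illustration of what each query computes, not the benchmark's MySQL or Neo4j implementation. The graph, the payload values and all function names are hypothetical, and whether the benchmark counts the start node as reachable is not stated here, so the sketch assumes it is excluded.

```python
from collections import deque

# Hypothetical directed toy graph: node -> list of successor nodes.
graph = {
    'a': ['b', 'c'],
    'b': ['d'],
    'c': ['d'],
    'd': ['e'],
    'e': [],
    'f': [],  # orphan: no incoming and no outgoing edges
}
# Hypothetical payloads (the benchmark used random integers and strings).
int_payload = {'a': 5, 'b': 12, 'c': 5, 'd': 40, 'e': 7, 'f': 5}
text_payload = {'a': 'neo4j graph', 'b': 'mysql table', 'c': 'graph edge'}


def orphan_nodes(graph):
    # Structural query: nodes with neither incoming nor outgoing edges.
    has_incoming = {t for targets in graph.values() for t in targets}
    return sorted(n for n, targets in graph.items()
                  if not targets and n not in has_incoming)


def count_reachable(graph, start, max_depth):
    # Structural query: breadth-first traversal limited to max_depth hops.
    # Assumption: the start node itself is not counted.
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, depth + 1))
    return len(seen) - 1


def count_equal(payload, value):
    # Data query: payload equal to some value.
    return sum(1 for v in payload.values() if v == value)


def count_less_than(payload, value):
    # Data query: payload less than some value.
    return sum(1 for v in payload.values() if v < value)


def count_contains(payload, needle):
    # Data query: text payload containing a search string.
    return sum(1 for text in payload.values() if needle in text)


print(orphan_nodes(graph))
print(count_reachable(graph, 'a', 4), count_reachable(graph, 'a', 128))
print(count_equal(int_payload, 5), count_less_than(int_payload, 10))
print(count_contains(text_payload, 'graph'))
```

The structural queries only follow edges, while the data queries have to inspect the payload of every node, which is one way to see why the two categories can behave so differently across database types.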
 

They conclude that for the first type of queries, the structural ones, the Neo4j database performed a lot better than the MySQL database, sometimes by as much as an order of magnitude. However, the opposite is true for the three data queries. A list of disadvantages and advantages can be found in the appendix under section: “Graph Database system”.

 

Conclusion

 

After explaining the two database choices as well as their advantages and disadvantages for this specific situation, a choice needs to be made. In this chapter we weigh the pros and cons and advise the stakeholders on which type of database system to implement.

 

One of the key qualities the stakeholders desire for the application is performance. They do not wish to wait a long time for an analysis to finish. To deal with this problem, we decided to store analysed data in a database rather than retrieve it anew every time the user makes a request.

 

This alone, however, does not guarantee that the system will perform quickly, since a lot of data still needs to be coupled in order to view communication between developers, trace isolated developers, etc. Because of the join bottleneck of a relational database, a graph database seems a fitting solution for these types of queries.

 

As can be read in the explanation section of the graph database, it can perform up to an order of magnitude faster than a relational database for these types of queries. Queries that take field values and compute something from them (e.g. sum them) run faster in a relational database, but these are far less relevant here, as the first type will be queried much more often.

 

The question is then whether the disadvantages of using a graph database can be overcome: less expertise is available, and it needs up to twice as much space to store its data. The latter is certainly no showstopper, as disk storage has become cheaper and cheaper in the past few years.

 

The fact that this type of database has not been around for long certainly puts it behind relational databases in maturity and in the availability of high-quality tools. However, the concept has been gaining attention in the past few years, and big companies such as Google, Facebook and LinkedIn now use this technology, so this will most likely improve further over time.

We therefore conclude that the graph database is the best fit for this problem, provided that the developers who will ultimately create this application study its ins and outs thoroughly. The performance gain and the fact that large companies use this sort of database system to achieve similar goals give us enough confidence to recommend this solution.

 

""" ; sadocontology1:contains_knowledge_about sadocontology1:Analsyis_data_in_graph_database, sadocontology1:Analsyis_database_is_relational, sadocontology1:Complexity, sadocontology1:Performance, sadocontology1:Scheduler, sadocontology1:developer ; a sadocontology1:WikiPage, owl:NamedIndividual ; rdfs:label "12 logical and implementation view - background - rationale - relational database system" .