Gradep

History

GradeP is a cluster built by GPPD, started in 2012 for scientific use and focused on postgraduate and undergraduate projects. Since it was built, GradeP has provided an environment for processing BigData workloads, helping researchers develop new solutions that have led to articles, publications, and products. Professor Dr. Cláudio Geyer coordinates the BigData division of GPPD and is responsible for maintaining GradeP.


Specifications

GradeP is a heterogeneous cluster composed of the following groups of machines:

1. Big Data Cluster of Machines (for Hadoop/Flink applications)

10 Dell Optiplex GX270 desktops (Pentium IV (HT) 2.8GHz, 3GBytes of RAM, 160GBytes of SATA disk)

These machines use a 3Com TOR switch that provides a dedicated 100Mbps link to each machine.
SUM of Resources: 20 logical cores, 30GBytes of RAM, 1600GBytes of disk.

2. Distributed Cluster of Machines (for Sockets/MPI applications)

7 Dell PowerEdge 1650 1U servers (dual Pentium III 1.4GHz, 2GBytes of RAM, 55GBytes of 10,000 RPM SCSI disk).

[1] Mellanox InfiniBand - 10Gbps each (6 links with 3 machines).
[2] Fibre Channel - 10Gbps each (2 links with 4 machines).
[3] Ethernet - 100Mbps each (all machines using the 3Com TOR switch).
SUM of Resources: 14 physical cores, 14GBytes of RAM, 328GBytes of disk.

3. Parallel Cluster of Machines (for PThreads/OpenMP applications)

2 Sun SPARC T5220 2U servers (UltraSPARC Quad T2 1.2GHz, 8GBytes of RAM, 146GBytes of 10,000 RPM SAS disk)

These 2 machines are connected by 4 dedicated 1Gbps Ethernet links.
SUM of Resources: 64 logical cores, 16GBytes of RAM, 292GBytes of disk.

The base system in GradeP is Linux (Rocks 6.1 32-bit SandBoa), which provides the same commands for creating folders and files and for navigating directories as any other Linux distribution. The Distributed Cluster runs Ubuntu Server Linux with an internal shared file system (SMB). Likewise, the Parallel Cluster runs Debian Server for SPARC, with a proprietary shared file system and system tools.


Offered Services

The services available on GradeP are related to parallel processing, including BigData frameworks, as listed below:

1. BigData Engines

  • Apache Spark 1.5.2
  • Apache Hadoop 1.2.1
  • Apache Hadoop 2.7.1
  • Apache Storm 0.10.0

2. Parallel Environment and Programming Tools

  • OpenMP
  • PThreads

3. Distributed Environment and Programming Tools

  • Sockets
  • MPI

How to Obtain Gradep Access

Access to GradeP requires a previously registered user name and password, so credentials must be requested before the first login. Send the request by email to Raffael Schemmer or Julio Anjos, stating the reasons for use. The base system in GradeP is Linux (Rocks 6.1 SandBoa), which provides the same commands for creating folders and files and for navigating directories as any other Linux distribution.

  • ssh -X user@gradep.inf.ufrgs.br

This command opens the connection; a password is then required to proceed.
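
For repeated logins, the command above can be wrapped in a few shell functions. This is only a minimal sketch, not an official GradeP procedure: the function names `gradep_target`, `gradep_login` and `gradep_upload`, the `GRADEP_USER` variable, and the placeholder account `user` are assumptions made for illustration, and it is also assumed that `scp` is permitted on the login node. Only the host name and the `-X` flag come from the command above.

```shell
#!/bin/sh
# Hypothetical helpers: only the host name and the -X flag come from this page.
GRADEP_HOST="gradep.inf.ufrgs.br"
GRADEP_USER="${GRADEP_USER:-user}"   # placeholder; use the account you were granted

# Print the login target (user@host) so it can be checked before connecting.
gradep_target() {
    printf '%s@%s\n' "$GRADEP_USER" "$GRADEP_HOST"
}

# Open an interactive session with X11 forwarding; ssh asks for the password.
gradep_login() {
    ssh -X "$(gradep_target)"
}

# Copy one local file to the remote home directory (assumes scp is allowed).
gradep_upload() {
    scp "$1" "$(gradep_target):"
}
```

After sourcing the file with `GRADEP_USER` unset, `gradep_target` prints `user@gradep.inf.ufrgs.br`, and `gradep_login` behaves like the `ssh -X` command shown earlier.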


Gradep Tutorials and User Operation Procedure

Documentation and tutorials are needed to ensure that GradeP and its installed programs are set up and used correctly to run data-intensive applications with Hadoop or Cassandra. Below you will find tutorials for using GradeP properly with each kind of framework.


1. BigData Cluster  

Private Loop

Shared Loop

  • Spark 1.5.2
  • Storm 0.10.0

2. Distributed Cluster - Sockets/MPI

3. Parallel Cluster - OpenMP/PThreads

4. Other Tutorials - Cassandra/CUDA


Projects

  • MAREMOTO: MapReduce Applied to Heterogeneous Computing.
  • Big Data Applied to Mapping Genetic Diseases (In cooperation with Clinical Hospital of Porto Alegre - HCPA).


Publications

  • SCHEMMER, R. B. ; ANJOS, J. C. S. ; TIBOLA, A. L. ; BARROS, J. F. ; GEYER, C. F. R. Framework Hadoop em Plataformas de Cloud e Cluster Computing. In: XV Escola Regional de Alto Desempenho (ERAD), v. 1, p. 71-88, 2015. http://www.lbd.dcc.ufmg.br/colecoes/erad-rs/2015/073.pdf
  • BARROS, J. F. ; SCHEMMER, R. B. ; ANJOS, J. C. S. ; GEYER, C. F. R. Estudo experimental do compressor BZIP2 em arquiteturas paralelas e distribuídas. In: XV Escola Regional de Alto Desempenho (ERAD), v. 1, p. 213-216, 2015. http://www.lbd.dcc.ufmg.br/colecoes/erad-rs/2015/027.pdf
  • SCHEMMER, R. B. ; GEYER, C. F. R. ; RECKZIEGEL FILHO, B. ; BARROS, J. F. ; ANJOS, J. C. S. Map Reduce Aplicado ao Sequenciamento de DNA Humano. Comparativo entre Implementações das Linguagens Java e C++. In: XII Workshop de Processamento Paralelo e Distribuído, 2014, Porto Alegre. http://inf.ufrgs.br/gppd/wsppd/2014/proceedings.php
  • ANJOS, JULIO C. S. ; BORDIN, M. ; TIBOLA, A. L. ; KOLBERG, WAGNER ; GEYER, C. F. R. Computação Intensiva em Dados, Implementações e Experimentos na GridRS. In: Escola Regional de Alto Desempenho (ERAD), v. 1, p. 4-27, 2014. http://www.lbd.dcc.ufmg.br/bdbcomp/servlet/Evento?id=721


