2021
@article{HOFFMANN:REIC:21,
title = {Geração de Código OpenMP para o Paralelismo de Stream},
author = {Renato Barreto Hoffmann and Dalvan Griebler and Luiz Gustavo Fernandes},
url = {https://sol.sbc.org.br/journals/index.php/reic/article/view/2082},
year = {2021},
date = {2021-06-01},
journal = {Revista Eletrônica de Iniciação Científica em Computação},
volume = {19},
number = {2},
pages = {2082},
publisher = {Sociedade Brasileira de Computação (SBC)},
address = {Porto Alegre},
abstract = {OpenMP é uma interface para a programação paralela padrão e amplamente usada na indústria e academia, porém, torna-se complexa quando usada para desenvolver aplicações paralelas de fluxo de dados ou stream. Para resolver esse problema, foi proposto usar uma interface de programação paralela de alto nível (chamada SPar) e seu compilador para a geração de código estruturado de mais baixo nível com OpenMP em aplicações de fluxo de dados. O objetivo é diminuir a complexidade e verbosidade introduzida pelo OpenMP nas aplicações de stream. Nos experimentos em 4 aplicações, notou-se uma redução no tempo de execução de até 25,42%. Além do mais, requer-se um número de linhas de código fonte menor para expressar o paralelismo.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
@article{LOFF:REIC:21,
title = {Melhorando a Geração Automática de Código Paralelo para o Paradigma de Processamento de Stream em Multi-cores},
author = {Júnior Löff and Dalvan Griebler and Luiz Gustavo Fernandes},
url = {https://sol.sbc.org.br/journals/index.php/reic/article/view/2083},
year = {2021},
date = {2021-06-01},
journal = {Revista Eletrônica de Iniciação Científica em Computação},
volume = {19},
number = {2},
pages = {2083},
publisher = {Sociedade Brasileira de Computação (SBC)},
address = {Porto Alegre},
abstract = {A programação paralela ainda é um desafio para desenvolvedores, pois exibe demasiados detalhes de baixo nível e de sistemas operacionais. Programadores precisam lidar com detalhes como escalonamento, balanceamento de carga e sincronizações. Esse trabalho contribui com otimizações para uma abstração de programação paralela para expressar paralelismo de stream em multi-cores. O trabalho estendeu a SPar adicionando dois novos atributos na sua linguagem, e implementou melhorias no seu compilador a fim de proporcionar melhor desempenho ao código paralelo gerado automaticamente. Os experimentos revelaram que a nova versão da SPar consegue abstrair detalhes do paralelismo com desempenho similar às versões paralelizadas manualmente.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
@incollection{larcc:minicurso:ERAD:21,
title = {Ambiente de Nuvem Computacional Privada para Teste e Desenvolvimento de Programas Paralelos},
author = {Anderson M. Maliszewski},
editor = {Andrea Charão and Matheus Serpa},
url = {https://doi.org/10.5753/sbc.6150.4},
doi = {10.5753/sbc.6150.4},
year = {2021},
date = {2021-06-01},
booktitle = {Minicursos da XXI Escola Regional de Alto Desempenho da Região Sul},
pages = {104-128},
publisher = {Sociedade Brasileira de Computação (SBC)},
address = {Porto Alegre},
chapter = {5},
abstract = {A computação de alto desempenho costuma utilizar agregados de computadores para a execução de aplicações paralelas. Alternativamente, a computação em nuvem oferece recursos computacionais distribuídos para processamento com um nível de abstração além do tradicional, dinâmico e sob demanda. Este capítulo tem como objetivo introduzir conceitos básicos, apresentar noções básicas para implantar uma nuvem privada e demonstrar os benefícios para o desenvolvimento e teste de programas paralelos em nuvem.},
keywords = {},
pubstate = {published},
tppubtype = {incollection}
}
@article{VOGEL:Computing:23,
title = {Online and Transparent Self-adaptation of Stream Parallel Patterns},
author = {Adriano Vogel and Gabriele Mencagli and Dalvan Griebler and Marco Danelutto and Luiz Gustavo Fernandes},
url = {https://doi.org/10.1007/s00607-021-00998-8},
doi = {10.1007/s00607-021-00998-8},
year = {2021},
date = {2021-05-01},
journal = {Computing},
volume = {105},
number = {5},
pages = {1039-1057},
publisher = {Springer},
abstract = {Several real-world parallel applications are becoming more dynamic and long-running, demanding online (at run-time) adaptations. Stream processing is a representative scenario that computes data items arriving in real-time and where parallel executions are necessary. However, it is challenging for humans to monitor and manually self-optimize complex and long-running parallel executions continuously. Moreover, although high-level and structured parallel programming aims to facilitate parallelism, several issues still need to be addressed for improving the existing abstractions. In this paper, we extend self-adaptiveness for supporting autonomous and online changes of the parallel pattern compositions. Online self-adaptation is achieved with an online profiler that characterizes the applications, which is combined with a new self-adaptive strategy and a model for smooth transitions on reconfigurations. The solution provides a new abstraction layer that enables application programmers to define non-functional requirements instead of hand-tuning complex configurations. Hence, we contribute with additional abstractions and flexible self-adaptation for responsiveness at run-time. The proposed solution is evaluated with applications having different processing characteristics, workloads, and configurations. The results show that it is possible to provide additional abstractions, flexibility, and responsiveness while achieving performance comparable to the best static configuration executions.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
@inproceedings{larcc:cloud_DNA_databases:ERAD:21,
title = {Avaliação de Desempenho para Banco de Dados com Genoma em Nuvem Privada},
author = {Luan Dopke and Dinei André Rockenbach and Dalvan Griebler},
url = {https://doi.org/10.5753/eradrs.2021.14771},
doi = {10.5753/eradrs.2021.14771},
year = {2021},
date = {2021-04-01},
booktitle = {21st Escola Regional de Alto Desempenho da Região Sul (ERAD-RS)},
pages = {45-48},
publisher = {Sociedade Brasileira de Computação},
address = {Joinville, RS, Brazil},
abstract = {Os bancos de dados são ferramentas particularmente interessantes para a manipulação de dados gerados através do sequenciamento de DNA. Este artigo tem como objetivo avaliar o desempenho de três bancos de dados com cargas relacionadas ao sequenciamento de DNA: PostgreSQL e MySQL como bancos de dados relacionais e MongoDB como banco de dados NoSQL. Os resultados demonstram que o PostgreSQL se sobressai aos demais.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
@inproceedings{larcc:DL_Classificaiton_MPI:ERAD:21,
title = {Aceleração da Classificação de Lavouras de Milho com MPI e Estratégias de Paralelismo},
author = {Anthony Vanzan and Gabriel Fim and Greice Welter and Dalvan Griebler},
url = {https://doi.org/10.5753/eradrs.2021.14772},
doi = {10.5753/eradrs.2021.14772},
year = {2021},
date = {2021-04-01},
booktitle = {21st Escola Regional de Alto Desempenho da Região Sul (ERAD-RS)},
pages = {49-52},
publisher = {Sociedade Brasileira de Computação},
address = {Joinville, RS, Brazil},
abstract = {Este trabalho visou acelerar a execução de um algoritmo de classificação de lavouras em imagens aéreas. Para isso, foram implementadas diferentes versões paralelas usando a biblioteca MPI na linguagem Python. A avaliação foi conduzida em dois ambientes computacionais. Conclui-se que é possível reduzir o tempo de execução à medida que mais recursos paralelos são usados e que a estratégia de distribuição de trabalho dinâmica é mais eficiente.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
@inproceedings{LOFF:ERAD:21,
title = {Melhorando a Geração Automática de Código Paralelo em Arquiteturas Multi-core na SPar},
author = {Júnior Löff and Dalvan Griebler and Luiz Gustavo Fernandes},
url = {https://doi.org/10.5753/eradrs.2021.14776},
doi = {10.5753/eradrs.2021.14776},
year = {2021},
date = {2021-04-01},
booktitle = {21st Escola Regional de Alto Desempenho da Região Sul (ERAD-RS)},
pages = {65-68},
publisher = {Sociedade Brasileira de Computação},
address = {Joinville, Brazil},
abstract = {Neste trabalho, a fim de melhorar a eficiência do código paralelo gerado em arquiteturas multi-core, foi estendida a linguagem e o compilador da SPar para permitir a geração automática de padrões paralelos pertencentes aos dois principais domínios de paralelismo, o de stream e de dados. Experimentos mostram que a nova versão da SPar obteve resultados similares, ou até mesmo melhores, que as versões implementadas manualmente.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
@inproceedings{HOFFMANN:ERAD:21,
title = {Abstraindo o OpenMP no Desenvolvimento de Aplicações de Fluxo de Dados Contínuo},
author = {Renato Barreto Hoffmann and Dalvan Griebler and Luiz Gustavo Fernandes},
url = {https://doi.org/10.5753/eradrs.2021.14777},
doi = {10.5753/eradrs.2021.14777},
year = {2021},
date = {2021-04-01},
booktitle = {21st Escola Regional de Alto Desempenho da Região Sul (ERAD-RS)},
pages = {69-72},
publisher = {Sociedade Brasileira de Computação},
address = {Joinville, Brazil},
abstract = {OpenMP é complexo quando usado para desenvolver aplicações de fluxo de dados. Com o objetivo de mitigar essa dificuldade, foi utilizada uma metodologia existente, chamada SPar, para aumentar o nível de abstração. Portanto, foram utilizadas anotações de mais alto nível da SPar para gerar código de mais baixo nível de fluxo de dados com OpenMP. Os experimentos revelaram que a SPar teve desempenho 0,86% inferior no caso mais extremo.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
@inproceedings{MELLO:ERAD:21,
title = {Compressão de Dados em Multicores com Flink ou SPar?},
author = {Fernanda Mello and Dalvan Griebler and Isabel Manssour and Luiz Gustavo Fernandes},
url = {https://doi.org/10.5753/eradrs.2021.14779},
doi = {10.5753/eradrs.2021.14779},
year = {2021},
date = {2021-04-01},
booktitle = {21st Escola Regional de Alto Desempenho da Região Sul (ERAD-RS)},
pages = {77-80},
publisher = {Sociedade Brasileira de Computação},
address = {Joinville, Brazil},
abstract = {Neste trabalho, foi implementada uma versão do algoritmo de compressão de dados Bzip2 com o framework para processamento de stream Apache Flink, a fim de avaliar seu desempenho em comparação com a versão do Bzip2 já existente na linguagem de domínio específica SPar. Os experimentos revelaram que a versão com SPar possui um desempenho muito superior ao Flink.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
@inproceedings{larcc:NPB_HPX_MPI:ERAD:21,
title = {Implementação MPIC++ e HPX dos Kernels NPB},
author = {Ricardo Leonarczyk and Dalvan Griebler},
url = {https://doi.org/10.5753/eradrs.2021.14780},
doi = {10.5753/eradrs.2021.14780},
year = {2021},
date = {2021-04-01},
booktitle = {21st Escola Regional de Alto Desempenho da Região Sul (ERAD-RS)},
pages = {81-84},
publisher = {Sociedade Brasileira de Computação},
address = {Joinville, RS, Brazil},
abstract = {Este artigo apresenta a implementação paralela dos cinco kernels pertencentes ao NAS Parallel Benchmarks (NPB) com MPIC++ e HPX para execução em arquiteturas de cluster. Os resultados demonstraram que o modelo de programação HPX pode ser mais eficiente do que MPIC++ em algoritmos tais como transformada rápida de Fourier, ordenação e Gradiente Conjugado.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
@inproceedings{ANDRADE:ERAD:21,
title = {Uso de Métricas de Codificação para Avaliar a Programação Paralela nas Aplicações de Stream em Sistemas Multi-core},
author = {Gabriella Andrade and Dalvan Griebler and Rodrigo Santos and Luiz Gustavo Fernandes},
url = {https://doi.org/10.5753/eradrs.2021.14785},
doi = {10.5753/eradrs.2021.14785},
year = {2021},
date = {2021-04-01},
booktitle = {21st Escola Regional de Alto Desempenho da Região Sul (ERAD-RS)},
pages = {93-94},
publisher = {Sociedade Brasileira de Computação},
address = {Joinville, Brazil},
abstract = {Neste trabalho, sete métricas de codificação são avaliadas considerando quatro aplicações do mundo real implementadas com FastFlow, Pthreads, SPar e TBB. Nossos resultados mostram que SPar apresenta os melhores indicadores de acordo com as métricas utilizadas.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
@inproceedings{VOGEL:ERAD:21,
title = {Proposta de Adaptação Dinâmica de Padrões Paralelos},
author = {Adriano Vogel and Dalvan Griebler and Luiz Gustavo Fernandes},
url = {https://doi.org/10.5753/eradrs.2021.14789},
doi = {10.5753/eradrs.2021.14789},
year = {2021},
date = {2021-04-01},
booktitle = {21st Escola Regional de Alto Desempenho da Região Sul (ERAD-RS)},
pages = {101-102},
publisher = {Sociedade Brasileira de Computação},
address = {Joinville, Brazil},
abstract = {Este trabalho apresenta uma perspectiva para adaptar dinamicamente os padrões paralelos em tempo de execução, objetivando abstrair dos programadores a definição de qual padrão paralelo usar e aumentar a flexibilidade. Os resultados preliminares demonstram a eficácia da solução proposta.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
@inproceedings{ARAUJO:ERAD:21,
title = {Proposta de Suporte à Parametrização no NPB com CUDA},
author = {Gabriell Araujo and Dalvan Griebler and Luiz Gustavo Fernandes},
url = {https://doi.org/10.5753/eradrs.2021.14790},
doi = {10.5753/eradrs.2021.14790},
year = {2021},
date = {2021-04-01},
booktitle = {21st Escola Regional de Alto Desempenho da Região Sul (ERAD-RS)},
pages = {103-104},
publisher = {Sociedade Brasileira de Computação},
address = {Joinville, Brazil},
abstract = {Este trabalho propõe a introdução de parâmetros configuráveis para GPUs no NPB. A etapa inicial do estudo contemplou a parametrização do número de threads por bloco e seu impacto no desempenho de GPUs.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
@inproceedings{ROCKENBACH:ERAD:21,
title = {Provendo Abstrações de Alto Nível para GPUs na SPar},
author = {Dinei André Rockenbach and Dalvan Griebler and Luiz Gustavo Fernandes},
url = {https://doi.org/10.5753/eradrs.2021.14793},
doi = {10.5753/eradrs.2021.14793},
year = {2021},
date = {2021-04-01},
booktitle = {21st Escola Regional de Alto Desempenho da Região Sul (ERAD-RS)},
pages = {109-110},
publisher = {Sociedade Brasileira de Computação},
address = {Joinville, Brazil},
abstract = {O presente trabalho apresenta uma extensão à linguagem SPar para suportar o paralelismo heterogêneo combinado de CPU e GPU através de anotações C++11 em aplicações de processamento de stream. Os testes sugerem melhoras significativas de desempenho com poucas modificações no código.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
@inproceedings{GARCIA:ERAD:21,
title = {Proposta de um Framework para Avaliar Interfaces de Programação Paralela em Aplicações de Stream},
author = {Adriano Marques Garcia and Dalvan Griebler and Claudio Schepke and Luiz Gustavo Fernandes},
url = {https://doi.org/10.5753/eradrs.2021.14798},
doi = {10.5753/eradrs.2021.14798},
year = {2021},
date = {2021-04-01},
booktitle = {21st Escola Regional de Alto Desempenho da Região Sul (ERAD-RS)},
pages = {119-120},
publisher = {Sociedade Brasileira de Computação},
address = {Joinville, Brazil},
abstract = {Este trabalho propõe um framework que auxilia no desenvolvimento de benchmarks para avaliar Interfaces de Programação Paralela no domínio de paralelismo de stream em C++.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
@inproceedings{SCHEER:ERAD:21,
title = {Proposta de Otimização do Tamanho de Batch em Aplicações de Stream para Multicores usando Aprendizado de Máquina},
author = {Claudio Scheer and Dalvan Griebler and Luiz Gustavo Fernandes},
url = {https://doi.org/10.5753/eradrs.2021.14802},
doi = {10.5753/eradrs.2021.14802},
year = {2021},
date = {2021-04-01},
booktitle = {21st Escola Regional de Alto Desempenho da Região Sul (ERAD-RS)},
pages = {127-128},
publisher = {Sociedade Brasileira de Computação},
address = {Joinville, Brazil},
abstract = {Este trabalho apresenta uma proposta de estudo e avaliação de features e algoritmos de aprendizado de máquina visando melhorar o desempenho através do ajuste/regulagem do tamanho do batch em aplicações paralelas de stream para arquiteturas multicore.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
@article{VOGEL:Survey:CCPE:2021,
title = {Self-adaptation on Parallel Stream Processing: A Systematic Review},
author = {Adriano Vogel and Dalvan Griebler and Marco Danelutto and Luiz Gustavo Fernandes},
url = {https://doi.org/10.1002/cpe.6759},
doi = {10.1002/cpe.6759},
year = {2021},
date = {2021-03-01},
journal = {Concurrency and Computation: Practice and Experience},
volume = {34},
number = {6},
pages = {e6759},
publisher = {Wiley},
abstract = {A recurrent challenge in real-world applications is autonomous management of the executions at run-time. In this vein, stream processing is a class of applications that compute data flowing in the form of streams (e.g., video feeds, images, and data analytics), where parallel computing can help accelerate the executions. On the one hand, stream processing applications are becoming more complex, dynamic, and long-running. On the other hand, it is unfeasible for humans to monitor and manually change the executions continuously. Hence, self-adaptation can reduce costs and human efforts by providing a higher-level abstraction with an autonomic/seamless management of executions. In this work, we aim at providing a literature review regarding self-adaptation applied to the parallel stream processing domain. We present a comprehensive revision using a systematic literature review method. Moreover, we propose a taxonomy to categorize and classify the existing self-adaptive approaches. Finally, applying the taxonomy made it possible to characterize the state-of-the-art, identify trends, and discuss open research challenges and future opportunities.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
@inproceedings{VOGEL:PDP:21,
title = {Towards On-the-fly Self-Adaptation of Stream Parallel Patterns},
author = {Adriano Vogel and Gabriele Mencagli and Dalvan Griebler and Marco Danelutto and Luiz Gustavo Fernandes},
url = {https://doi.org/10.1109/PDP52278.2021.00022},
doi = {10.1109/PDP52278.2021.00022},
year = {2021},
date = {2021-03-01},
booktitle = {29th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP)},
pages = {889-93},
publisher = {IEEE},
address = {Valladolid, Spain},
series = {PDP'21},
abstract = {Stream processing applications compute streams of data and provide insightful results in a timely manner, where parallel computing is necessary for accelerating the application executions. Considering that these applications are becoming increasingly dynamic and long-running, a potential solution is to apply dynamic runtime changes. However, it is challenging for humans to continuously monitor and manually self-optimize the executions. In this paper, we propose self-adaptiveness of the parallel patterns used, enabling flexible on-the-fly adaptations. The proposed solution is evaluated with an existing programming framework by running experiments with a synthetic and a real-world application. The results show that the proposed solution is able to dynamically self-adapt to the most suitable parallel pattern configuration and achieve performance competitive with the best static cases. The feasibility of the proposed solution encourages future optimizations and further applications.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
@inproceedings{GARCIA:PDP:21,
title = {Introducing a Stream Processing Framework for Assessing Parallel Programming Interfaces},
author = {Adriano Marques Garcia and Dalvan Griebler and Claudio Schepke and Luiz Gustavo Fernandes},
url = {https://doi.org/10.1109/PDP52278.2021.00021},
doi = {10.1109/PDP52278.2021.00021},
year = {2021},
date = {2021-03-01},
booktitle = {29th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP)},
pages = {84-88},
publisher = {IEEE},
address = {Valladolid, Spain},
series = {PDP'21},
abstract = {Stream Processing applications are spread across different sectors of industry and people's daily lives. The increasing volume of data we produce, such as audio, video, images, and text, demands quick and efficient computation. This can be achieved through Stream Parallelism, which is still a challenging task mostly reserved for experts. We introduce a Stream Processing framework for assessing Parallel Programming Interfaces (PPIs). Our framework targets multi-core architectures and C++ stream processing applications, providing an API that abstracts the details of the stream operators of these applications. Therefore, users can easily identify all the basic operators and implement parallelism through different PPIs. In this paper, we present the proposed framework, implement three applications using its API, and show how it works by using it to parallelize and evaluate the applications with the PPIs Intel TBB, FastFlow, and SPar. The performance results were consistent with the literature.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
@article{VOGEL:SPE:21,
title = {Providing High‐Level Self‐Adaptive Abstractions for Stream Parallelism on Multicores},
author = {Adriano Vogel and Dalvan Griebler and Luiz G. Fernandes},
url = {https://doi.org/10.1002/spe.2948},
doi = {10.1002/spe.2948},
year = {2021},
date = {2021-01-01},
journal = {Software: Practice and Experience},
volume = {51},
number = {6},
pages = {1194-1217},
publisher = {Wiley},
abstract = {Stream processing applications are common computing workloads that demand parallelism to increase their performance. As in the past, parallel programming remains a difficult task for application programmers. The complexity increases when application programmers must set non-intuitive parallelism parameters, i.e. the degree of parallelism. The main problem is that state-of-the-art libraries use a static degree of parallelism and are not sufficiently abstracted for developing stream processing applications. In this paper, we propose a self-adaptive regulation of the degree of parallelism to provide higher-level abstractions. Flexibility is provided to programmers with two new self-adaptive strategies, one is for performance experts, and the other abstracts the need to set a performance goal. We evaluated our solution using compiler transformation rules to generate parallel code with the SPar domain-specific language. The experimental results with real-world applications highlighted higher abstraction levels without significant performance degradation in comparison to static executions. The strategy for performance experts achieved slightly higher performance than the one that works without user-defined performance goals.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
@article{larcc:DL_solos:REABTIC:21,
title = {Simplificando a Interpretação de Laudos de Análise de Solo com Deep Learning em Nuvem},
author = {Alisson Allebrandt and Diego Henrique Schmidt and Dalvan Griebler},
url = {https://revistas.setrem.com.br/index.php/reabtic/article/view/387},
doi = {10.5281/zenodo.4445204},
year = {2021},
date = {2021-01-01},
journal = {Revista Eletrônica Argentina-Brasil de Tecnologias da Informação e da Comunicação (REABTIC)},
volume = {1},
number = {13},
publisher = {SETREM},
address = {Três de Maio, RS, Brazil},
abstract = {One of the factors that affects good agricultural productivity is the soil; consequently, its conservation through the correct application of nutrients and fertilizer is of utmost importance. In this article, we propose a software architecture and a mobile application capable of assisting farmers and agronomists in interpreting soil analysis reports produced by laboratories. The software architecture was designed to operate in a cloud environment, and the mobile application is the interface for capturing and presenting the data. Initially, it was necessary to build a database with different types and configurations of images. The dataset was processed to eliminate noise (such as luminosity, shadows, and distortions) and used to evaluate two Deep Learning solutions (Google Vision and Tesseract OCR), with Tesseract OCR proving more accurate on the same images. Besides offering the mobile application, which is a first step, the research reveals several technological gaps and opportunities for innovation in the field of soil science.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
2020
@article{BORDIN:IEEEAccess:20,
title = {DSPBench: a Suite of Benchmark Applications for Distributed Data Stream Processing Systems},
author = {Maycon Viana Bordin and Dalvan Griebler and Gabriele Mencagli and Claudio F. R. Geyer and Luiz Gustavo Fernandes},
url = {https://doi.org/10.1109/ACCESS.2020.3043948},
doi = {10.1109/ACCESS.2020.3043948},
year = {2020},
date = {2020-12-01},
journal = {IEEE Access},
volume = {8},
number = {na},
pages = {222900-222917},
publisher = {IEEE},
abstract = {Systems enabling the continuous processing of large data streams have recently attracted the attention of the scientific community and industrial stakeholders. Data Stream Processing Systems (DSPSs) are complex and powerful frameworks that ease the development of streaming applications in distributed computing environments like clusters and clouds. Several systems of this kind have been released and are currently maintained as open-source projects, like Apache Storm and Spark Streaming. Some benchmark applications have often been used by the scientific community to test and evaluate new techniques to improve the performance and usability of DSPSs. However, the existing benchmark suites lack representative workloads coming from the wide set of application domains that can leverage the benefits offered by the stream processing paradigm in terms of near real-time performance. The goal of this paper is to present a new benchmark suite composed of 15 applications coming from areas like Finance, Telecommunications, Sensor Networks, Social Networks and others. This paper describes in detail the nature of these applications and their full workload characterization in terms of selectivity, processing cost, input size and overall memory occupation. In addition, it exemplifies the usefulness of our benchmark suite to compare real DSPSs by selecting Apache Storm and Spark Streaming for this analysis.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
@inproceedings{HOFFMANN:SBLP:20,
title = {Stream Parallelism Annotations for Multi-Core Frameworks},
author = {Renato B. Hoffmann and Dalvan Griebler and Marco Danelutto and Luiz G. Fernandes},
url = {https://doi.org/10.1145/3427081.3427088},
doi = {10.1145/3427081.3427088},
year = {2020},
date = {2020-10-01},
booktitle = {XXIV Brazilian Symposium on Programming Languages (SBLP)},
pages = {48-55},
publisher = {ACM},
address = {Natal, Brazil},
series = {SBLP'20},
abstract = {Data generation, collection, and processing is an important workload of modern computer architectures. Stream or high-intensity data flow applications are commonly employed in extracting and interpreting the information contained in this data. Due to the computational complexity of these applications, high performance ought to be achieved using parallel computing. Indeed, the efficient exploitation of the parallel resources available in the architecture remains a challenging task for programmers. Techniques and methodologies are required to help shift the efforts from the complexity of parallelism exploitation to specific algorithmic solutions. To tackle this problem, we propose a methodology that provides the developer with a suitable abstraction layer, offering a clean and effective parallel programming interface that targets different multi-core parallel programming frameworks. We used standard C++ code annotations that may be inserted in the source code by the programmer. Then, a compiler parses the C++ code with the annotations and generates calls to the desired parallel runtime API. Our experiments demonstrate the feasibility of our methodology and the performance of the abstraction layer, where the difference is negligible in four applications with respect to the state-of-the-art C++ parallel programming frameworks. Additionally, our methodology allows improving the application performance, since developers can choose the runtime that performs best on their system.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
@inproceedings{GARCIA:ICCSA:20,
title = {The Impact of CPU Frequency Scaling on Power Consumption of Computing Infrastructures},
author = {Adriano M. Garcia and Matheus Serpa and Dalvan Griebler and Claudio Schepke and Luiz G. L. Fernandes and Philippe O. A. Navaux},
url = {https://doi.org/10.1007/978-3-030-58817-5_12},
doi = {10.1007/978-3-030-58817-5_12},
year = {2020},
date = {2020-07-01},
booktitle = {International Conference on Computational Science and its Applications (ICCSA)},
volume = {12254},
pages = {142-157},
publisher = {Springer},
address = {Cagliari, Italy},
series = {ICCSA'20},
abstract = {As the demand for computing power increases, new architectures have emerged to obtain better performance. Reducing the power and energy consumption of these architectures is one of the main challenges to achieving high-performance computing. Current research trends aim at developing new software and hardware techniques to achieve the best performance and energy trade-offs. In this work, we investigate the impact of different CPU frequency scaling techniques such as ondemand, performance, and powersave on the power and energy consumption of multi-core-based computing infrastructures. We apply these techniques in PAMPAR, a parallel benchmark suite implemented in PThreads, OpenMP, MPI-1, and MPI-2 (spawn). We measure the energy and execution time of 10 benchmarks, varying the number of threads. Our results show that although powersave consumes up to 43.1% less power than the performance and ondemand governors, it consumes three times the energy due to its high execution time. Our experiments also show that the performance governor consumes up to 9.8% more energy than ondemand for CPU-bound benchmarks. Finally, our results show that PThreads has the lowest power consumption, consuming less than the sequential version for memory-bound benchmarks. Regarding performance, the performance governor achieved 3% higher performance than ondemand.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
@inproceedings{larcc:ieee802.3ad_containers:ICCSA:20,
title = {Performance Impact of IEEE 802.3ad in Container-based Clouds for HPC Applications},
author = {Anderson M. Maliszewski and Eduardo Roloff and Dalvan Griebler and Luciano P. Gaspary and Philippe O. A. Navaux},
url = {https://doi.org/10.1007/978-3-030-58817-5_13},
doi = {10.1007/978-3-030-58817-5_13},
year = {2020},
date = {2020-07-01},
booktitle = {International Conference on Computational Science and its Applications (ICCSA)},
pages = {158-167},
publisher = {Springer},
address = {Cagliari, Italy},
series = {ICCSA'20},
abstract = {Historically, large computational clusters have supported the hardware requirements for executing High-Performance Computing (HPC) applications. This model has become outdated due to the high costs of maintaining and updating these infrastructures. Currently, computing resources are delivered as a service because of the cloud computing paradigm. In this way, we have witnessed consistent efforts to migrate HPC applications to the cloud. However, if on the one hand cloud computing offers an attractive environment for HPC, benefiting from the pay-per-use model and on-demand resource allocation, on the other, there are still significant performance challenges to be addressed, such as the well-known network bottleneck. In this article, we evaluate the use of a Network Interface Card (NIC) aggregation approach, using the IEEE 802.3ad standard, to improve the performance of representative HPC applications executed in an LXD container-based cloud. We assessed the aggregation impact using two and four NICs with three distinct transmission hash policies. Our results demonstrate that, if the correct hash policy is selected, NIC aggregation can significantly improve the performance of network-intensive HPC applications by up to 40%.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
@inproceedings{larcc:network_azure_cost_perf:ISCC:20,
title = {Performance and Cost-Aware in Clouds: A Network Interconnection Assessment},
author = {Anderson M. Maliszewski and Eduardo Roloff and Emmanuell D. Carreño and Dalvan Griebler and Luciano P. Gaspary and Philippe O. A. Navaux},
url = {https://doi.org/10.1109/ISCC50000.2020.9219554},
doi = {10.1109/ISCC50000.2020.9219554},
year = {2020},
date = {2020-07-01},
booktitle = {IEEE Symposium on Computers and Communications (ISCC)},
pages = {1-6},
publisher = {IEEE},
address = {Rennes, France},
series = {ISCC'20},
abstract = {The availability of computing resources has significantly changed due to the growing adoption of the cloud computing paradigm. Aiming at potential advantages such as cost savings through the pay-per-use method and resource allocation in a scalable/elastic way, we have witnessed consistent efforts to execute high-performance computing (HPC) applications in the cloud. Performance in this environment depends heavily upon two main system components: processing power and network interconnection. If, on the one hand, allocating more powerful hardware theoretically boosts performance, on the other hand, it increases the allocation cost. In this paper, we evaluated how the network interconnection impacts performance and cost efficiency. Our experiments were carried out using the NAS Parallel Benchmarks and the Alya HPC application on the Microsoft Azure public cloud provider, with three different cloud instances/network interconnections. The results revealed that through the use of the accelerated networking approach, which allows an instance to have a high-performance interconnect without additional charges, the performance of HPC applications can be significantly improved with better cost efficiency.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
@article{STEIN:CCPE:20,
title = {Latency‐aware adaptive micro‐batching techniques for streamed data compression on graphics processing units},
author = {Charles M. Stein and Dinei A. Rockenbach and Dalvan Griebler and Massimo Torquati and Gabriele Mencagli and Marco Danelutto and Luiz G. Fernandes},
url = {https://doi.org/10.1002/cpe.5786},
doi = {10.1002/cpe.5786},
year = {2020},
date = {2020-05-01},
journal = {Concurrency and Computation: Practice and Experience},
volume = {na},
number = {na},
pages = {e5786},
publisher = {Wiley Online Library},
abstract = {Stream processing is a parallel paradigm used in many application domains. With the advance of graphics processing units (GPUs), their usage in stream processing applications has increased as well. The efficient utilization of GPU accelerators in streaming scenarios requires batching input elements into microbatches, whose computation is offloaded on the GPU, leveraging data parallelism within the same batch of data. Since data elements are continuously received based on the input speed, the bigger the microbatch size, the higher the latency to completely buffer it and to start the processing on the device. Unfortunately, stream processing applications often have strict latency requirements, which demands finding the best size of the microbatches and adapting it dynamically based on the workload conditions as well as the characteristics of the underlying device and network. In this work, we aim at implementing latency‐aware adaptive microbatching techniques and algorithms for streaming compression applications targeting GPUs. The evaluation is conducted using the Lempel‐Ziv‐Storer‐Szymanski compression application considering different input workloads. As a general result of our work, we noticed that algorithms with elastic adaptation factors respond better for stable workloads, while algorithms with narrower targets respond better for highly unbalanced workloads.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
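The latency/batch-size trade-off this entry describes lends itself to a simple feedback controller: grow the micro-batch while observed latency has headroom against the target, shrink it on overshoot. The sketch below is only an illustration of that general idea; the class name, the 0.8 headroom threshold, and the multiplicative grow/shrink factors are hypothetical and do not reproduce the paper's actual adaptation algorithms.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical sketch of a latency-aware micro-batch controller.
// Grow the batch while latency is comfortably under target (more
// data parallelism per GPU offload); shrink it when the latency
// budget is exceeded.
class MicroBatchController {
public:
    MicroBatchController(double target_latency_ms,
                         std::size_t min_batch, std::size_t max_batch)
        : target_(target_latency_ms), min_(min_batch), max_(max_batch),
          batch_(min_batch) {}

    // Called after each micro-batch completes with its observed latency;
    // returns the size to use for the next micro-batch.
    std::size_t update(double observed_latency_ms) {
        if (observed_latency_ms < 0.8 * target_)   // headroom: batch more
            batch_ = std::min(max_, batch_ * 2);
        else if (observed_latency_ms > target_)    // overshoot: batch less
            batch_ = std::max(min_, batch_ / 2);
        return batch_;
    }

    std::size_t current() const { return batch_; }

private:
    double target_;
    std::size_t min_, max_, batch_;
};
```

A host loop would call `update()` after each offloaded batch and use the returned value to size the next buffer before copying it to the device.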
| Maliszewski, Anderson M.; Roloff, Eduardo; Griebler, Dalvan; Navaux, Philippe O. A. Avaliando o Impacto da Rede no Desempenho e Custo de Execução de Aplicações HPC Inproceedings doi In: 20th Escola Regional de Alto Desempenho da Região Sul (ERAD-RS), pp. 159-160, Sociedade Brasileira de Computação, Santa Maria, RS, Brazil, 2020. @inproceedings{larcc:network_impact:ERAD:20,
title = {Avaliando o Impacto da Rede no Desempenho e Custo de Execução de Aplicações HPC},
author = {Anderson M. Maliszewski and Eduardo Roloff and Dalvan Griebler and Philippe O. A. Navaux},
url = {https://doi.org/10.5753/eradrs.2020.10786},
doi = {10.5753/eradrs.2020.10786},
year = {2020},
date = {2020-04-01},
booktitle = {20th Escola Regional de Alto Desempenho da Região Sul (ERAD-RS)},
pages = {159-160},
publisher = {Sociedade Brasileira de Computação},
address = {Santa Maria, RS, Brazil},
abstract = {O desempenho das aplicações HPC depende de dois componentes principais; poder de processamento e interconexão de rede. Este artigo avalia o impacto que a interconexão de rede exerce em programas paralelos usando um cluster homogêneo, em relação a desempenho e custo de execução estimado.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
| Andrade, Gabriella; Griebler, Dalvan; Fernandes, Luiz G. L. Avaliação da Usabilidade de Interfaces de Programação Paralela para Sistemas Multi-Core em Aplicação de Vídeo Inproceedings doi In: XX Escola Regional de Alto Desempenho da Região Sul (ERAD-RS), pp. 149-150, Sociedade Brasileira de Computação (SBC), Santa Maria, BR, 2020. @inproceedings{ANDRADE:ERAD:20,
title = {Avaliação da Usabilidade de Interfaces de Programação Paralela para Sistemas Multi-Core em Aplicação de Vídeo},
author = {Gabriella Andrade and Dalvan Griebler and Luiz G. L. Fernandes},
url = {https://doi.org/10.5753/eradrs.2020.10781},
doi = {10.5753/eradrs.2020.10781},
year = {2020},
date = {2020-04-01},
booktitle = {XX Escola Regional de Alto Desempenho da Região Sul (ERAD-RS)},
pages = {149-150},
publisher = {Sociedade Brasileira de Computação (SBC)},
address = {Santa Maria, BR},
abstract = {Com a ampla variedade de interfaces para a programação paralela em ambientes multi-core é difícil determinar quais destas oferecem a melhor usabilidade. Esse trabalho realiza um experimento comparando a paralelização de uma aplicação de vídeo com as ferramentas FastFlow, SPar e TBB. Os resultados revelaram que a SPar requer menos esforço na paralelização de uma aplicação de vídeo do que as demais interfaces de programação paralela.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
| Garcia, Adriano Marques; Griebler, Dalvan; Fernandes, Luiz G. L. Proposta de uma Suíte de Benchmarks para Processamento de Stream em Sistemas Multi-Core Inproceedings doi In: XX Escola Regional de Alto Desempenho da Região Sul (ERAD-RS), pp. 167-168, Sociedade Brasileira de Computação (SBC), Santa Maria, BR, 2020. @inproceedings{GARCIA:ERAD:20,
title = {Proposta de uma Suíte de Benchmarks para Processamento de Stream em Sistemas Multi-Core},
author = {Adriano Marques Garcia and Dalvan Griebler and Luiz G. L. Fernandes},
url = {https://doi.org/10.5753/eradrs.2020.10790},
doi = {10.5753/eradrs.2020.10790},
year = {2020},
date = {2020-04-01},
booktitle = {XX Escola Regional de Alto Desempenho da Região Sul (ERAD-RS)},
pages = {167-168},
publisher = {Sociedade Brasileira de Computação (SBC)},
address = {Santa Maria, BR},
abstract = {O aumento no volume de dados gerados por sistemas computacionais e a necessidade por processamento rápido desses dados vem alavancando a área de processamento de stream. Entretanto, ainda não existe um benchmark para auxiliar desenvolvedores e pesquisadores. Este trabalho visa propor uma suíte de benchmarks para processamento de stream em arquiteturas multi-core e discute as características necessárias no desenvolvimento dessa suíte.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
| Araújo, Gabriell Alves; Griebler, Dalvan; Fernandes, Luiz G. L. Implementação CUDA dos Kernels NPB Inproceedings doi In: XX Escola Regional de Alto Desempenho da Região Sul (ERAD-RS), pp. 85-88, Sociedade Brasileira de Computação (SBC), Santa Maria, BR, 2020. @inproceedings{ARAUJO:ERAD:20,
title = {Implementação CUDA dos Kernels NPB},
author = {Gabriell Alves Araújo and Dalvan Griebler and Luiz G. L. Fernandes},
url = {https://doi.org/10.5753/eradrs.2020.10762},
doi = {10.5753/eradrs.2020.10762},
year = {2020},
date = {2020-04-01},
booktitle = {XX Escola Regional de Alto Desempenho da Região Sul (ERAD-RS)},
pages = {85-88},
publisher = {Sociedade Brasileira de Computação (SBC)},
address = {Santa Maria, BR},
abstract = {NAS Parallel Benchmarks (NPB) é um conjunto de benchmarks utilizado para avaliar hardware e software, que ao longo dos anos foi portado para diferentes frameworks. Concernente a GPUs, atualmente existem apenas versões OpenCL e OpenACC. Este trabalho contribui com a literatura provendo a primeira implementação CUDA completa dos kernels do NPB, realizando experimentos com carga de trabalho inédita e revelando novos fatos sobre o NPB.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
| Hoffmann, Renato Barreto; Griebler, Dalvan; Fernandes, Luis G. L. Geração Automática de Código TBB na SPar Inproceedings doi In: XX Escola Regional de Alto Desempenho da Região Sul (ERAD-RS), pp. 97-100, Sociedade Brasileira de Computação (SBC), Santa Maria, BR, 2020. @inproceedings{HOFFMANN:ERAD:20,
title = {Geração Automática de Código TBB na SPar},
author = {Renato Barreto Hoffmann and Dalvan Griebler and Luis G. L. Fernandes},
url = {https://doi.org/10.5753/eradrs.2020.10765},
doi = {10.5753/eradrs.2020.10765},
year = {2020},
date = {2020-04-01},
booktitle = {XX Escola Regional de Alto Desempenho da Região Sul (ERAD-RS)},
pages = {97-100},
publisher = {Sociedade Brasileira de Computação (SBC)},
address = {Santa Maria, BR},
abstract = {Técnicas de programação paralela são necessárias para extrair todo o potencial dos processadores de múltiplos núcleos. Para isso, foi criada a SPar, uma linguagem para abstração do paralelismo de stream. Esse trabalho descreve a implementação da geração de código automática para a biblioteca TBB na SPar, uma vez que gerava-se código para FastFlow. Os testes com aplicações resultaram em tempos de execução até 12,76 vezes mais rápidos.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
| Leonarczyk, Ricardo; Griebler, Dalvan Implementação MPIC++ dos kernels NPB EP, IS e CG Inproceedings doi In: 20th Escola Regional de Alto Desempenho da Região Sul (ERAD-RS), pp. 101-104, Sociedade Brasileira de Computação, Santa Maria, RS, Brazil, 2020. @inproceedings{larcc:NPB_MPI:ERAD:20,
title = {Implementação MPIC++ dos kernels NPB EP, IS e CG},
author = {Ricardo Leonarczyk and Dalvan Griebler},
url = {https://doi.org/10.5753/eradrs.2020.10766},
doi = {10.5753/eradrs.2020.10766},
year = {2020},
date = {2020-04-01},
booktitle = {20th Escola Regional de Alto Desempenho da Região Sul (ERAD-RS)},
pages = {101-104},
publisher = {Sociedade Brasileira de Computação},
address = {Santa Maria, RS, Brazil},
abstract = {Este trabalho busca contribuir com prévios esforços para disponibilizar os NAS Parallel benchmarks na linguagem C++, focando-se no aspecto memória distribuída com MPI. São apresentadas implementações do CG, EP e IS portadas da versão MPI original do NPB. Os experimentos realizados demonstram que a versão proposta dos benchmarks obteve um desempenho próximo da original.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
| Löff, Junior; Griebler, Dalvan; Fernandes, Luiz G. L. Implementação Paralela do LU no NPB C++ Utilizando um Pipeline Implícito Inproceedings doi In: XX Escola Regional de Alto Desempenho da Região Sul (ERAD-RS), pp. 37-40, Sociedade Brasileira de Computação (SBC), Santa Maria, BR, 2020. @inproceedings{LOFF:ERAD:20,
title = {Implementação Paralela do LU no NPB C++ Utilizando um Pipeline Implícito},
author = {Junior Löff and Dalvan Griebler and Luiz G. L. Fernandes},
url = {https://doi.org/10.5753/eradrs.2020.10750},
doi = {10.5753/eradrs.2020.10750},
year = {2020},
date = {2020-04-01},
booktitle = {XX Escola Regional de Alto Desempenho da Região Sul (ERAD-RS)},
pages = {37-40},
publisher = {Sociedade Brasileira de Computação (SBC)},
address = {Santa Maria, BR},
abstract = {Neste trabalho, um pipeline implícito com o padrão map foi implementado na aplicação LU do NAS Parallel Benchmarks em C++. O LU possui dependência de dados no tempo, o que dificulta a exploração do paralelismo. Ele foi convertido de Fortran para C++, a fim de ser paralelizado com diferentes bibliotecas de sistemas multi-core. O uso desta estratégia com as bibliotecas permitiu ganhos de desempenho de até 10.6% em relação a versão original.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
| Justo, Gabriel; Hoffmann, Renato Barreto; Vogel, Adriano; Griebler, Dalvan; Fernandes, Luiz G. L. Acelerando uma Aplicação de Detecção de Pistas com MPI Inproceedings doi In: XX Escola Regional de Alto Desempenho da Região Sul (ERAD-RS), pp. 117-120, Sociedade Brasileira de Computação (SBC), Santa Maria, BR, 2020. @inproceedings{JUSTO:ERAD:20,
title = {Acelerando uma Aplicação de Detecção de Pistas com MPI},
author = {Gabriel Justo and Renato Barreto Hoffmann and Adriano Vogel and Dalvan Griebler and Luiz G. L. Fernandes},
url = {https://doi.org/10.5753/eradrs.2020.10770},
doi = {10.5753/eradrs.2020.10770},
year = {2020},
date = {2020-04-01},
booktitle = {XX Escola Regional de Alto Desempenho da Região Sul (ERAD-RS)},
pages = {117-120},
publisher = {Sociedade Brasileira de Computação (SBC)},
address = {Santa Maria, BR},
abstract = {Aplicações de stream de vídeo demandam processamento de alto desempenho para atender requisitos de tempo real. Nesse cenário, a programação paralela distribuída é uma alternativa para acelerar e escalar o desempenho. Neste trabalho, o objetivo é paralelizar uma aplicação de detecção de pistas com a biblioteca MPI usando o padrão Farm e implementando duas estratégias de distribuição de tarefas. Os resultados evidenciam os ganhos de desempenho.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
| Araujo, Gabriell Alves; Griebler, Dalvan; Danelutto, Marco; Fernandes, Luiz Gustavo Efficient NAS Parallel Benchmark Kernels with CUDA Inproceedings doi In: 28th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP), pp. 9-16, IEEE, Västerås, Sweden, 2020. @inproceedings{ARAUJO:PDP:20,
title = {Efficient NAS Parallel Benchmark Kernels with CUDA},
author = {Gabriell Alves Araujo and Dalvan Griebler and Marco Danelutto and Luiz Gustavo Fernandes},
url = {https://doi.org/10.1109/PDP50117.2020.00009},
doi = {10.1109/PDP50117.2020.00009},
year = {2020},
date = {2020-03-01},
booktitle = {28th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP)},
pages = {9-16},
publisher = {IEEE},
address = {Västerås, Sweden},
series = {PDP'20},
abstract = {NAS Parallel Benchmarks (NPB) are one of the standard benchmark suites used to evaluate parallel hardware and software. There are many research efforts trying to provide different parallel versions apart from the original OpenMP and MPI. Concerning GPU accelerators, there are only the OpenCL and OpenACC available as consolidated versions. Our goal is to provide an efficient parallel implementation of the five NPB kernels with CUDA. Our contribution covers different aspects. First, best parallel programming practices were followed to implement NPB kernels using CUDA. Second, the support of larger workloads (class B and C) allow to stress and investigate the memory of robust GPUs. Third, we show that it is possible to make NPB efficient and suitable for GPUs although the benchmarks were designed for CPUs in the past. We succeed in achieving double performance with respect to the state-of-the-art in some cases as well as implementing efficient memory usage. Fourth, we discuss new experiments comparing performance and memory usage against OpenACC and OpenCL state-of-the-art versions using a relative new GPU architecture. The experimental results also revealed that our version is the best one for all the NPB kernels compared to OpenACC and OpenCL. The greatest differences were observed for the FT and EP kernels.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
| Vogel, Adriano; Rista, Cassiano; Justo, Gabriel; Ewald, Endrius; Griebler, Dalvan; Mencagli, Gabriele; Fernandes, Luiz Gustavo Parallel Stream Processing with MPI for Video Analytics and Data Visualization Inproceedings doi In: High Performance Computing Systems, pp. 102-116, Springer, Cham, 2020. @inproceedings{VOGEL:CCIS:20,
title = {Parallel Stream Processing with MPI for Video Analytics and Data Visualization},
author = {Adriano Vogel and Cassiano Rista and Gabriel Justo and Endrius Ewald and Dalvan Griebler and Gabriele Mencagli and Luiz Gustavo Fernandes},
url = {https://doi.org/10.1007/978-3-030-41050-6_7},
doi = {10.1007/978-3-030-41050-6_7},
year = {2020},
date = {2020-02-01},
booktitle = {High Performance Computing Systems},
volume = {1171},
pages = {102-116},
publisher = {Springer},
address = {Cham},
series = {Communications in Computer and Information Science (CCIS)},
abstract = {The amount of data generated is increasing exponentially. However, processing data and producing fast results is a technological challenge. Parallel stream processing can be implemented for handling high frequency and big data flows. The MPI parallel programming model offers low-level and flexible mechanisms for dealing with distributed architectures such as clusters. This paper aims to use it to accelerate video analytics and data visualization applications so that insight can be obtained as soon as the data arrives. Experiments were conducted with a Domain-Specific Language for Geospatial Data Visualization and a Person Recognizer video application. We applied the same stream parallelism strategy and two task distribution strategies. The dynamic task distribution achieved better performance than the static distribution in the HPC cluster. The data visualization achieved lower throughput with respect to the video analytics due to the I/O intensive operations. Also, the MPI programming model shows promising performance outcomes for stream processing applications.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
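The static versus dynamic task-distribution strategies this entry compares can be illustrated with a small sequential simulation. This is a hedged sketch, not the paper's MPI implementation: `makespan_static` models pre-assigning task i to rank i % nworkers, while `makespan_dynamic` models workers pulling the next task as soon as they become free, which is why it tolerates heterogeneous workers better. Function names and the per-worker cost model are invented for the example.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// cost[w] is the time worker w needs to process one task; both functions
// return the makespan (finish time of the slowest worker).

// Static distribution: task i is pre-assigned to worker i % nworkers,
// regardless of how loaded that worker already is.
double makespan_static(std::size_t ntasks, const std::vector<double>& cost) {
    std::vector<double> busy(cost.size(), 0.0);
    for (std::size_t i = 0; i < ntasks; ++i)
        busy[i % cost.size()] += cost[i % cost.size()];
    return *std::max_element(busy.begin(), busy.end());
}

// Dynamic distribution: each task goes to the worker that becomes free
// first, as if idle workers pulled tasks on demand from an emitter.
double makespan_dynamic(std::size_t ntasks, const std::vector<double>& cost) {
    std::vector<double> busy(cost.size(), 0.0);
    for (std::size_t i = 0; i < ntasks; ++i) {
        std::size_t w = static_cast<std::size_t>(
            std::min_element(busy.begin(), busy.end()) - busy.begin());
        busy[w] += cost[w];  // the earliest-free worker takes the task
    }
    return *std::max_element(busy.begin(), busy.end());
}
```

With two workers of per-task cost 1.0 and 3.0 and eight tasks, the static split finishes at time 12 while the dynamic pull finishes at time 6, mirroring why the paper's dynamic distribution performed better on the HPC cluster.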
2019
|
| Pieper, Ricardo; Griebler, Dalvan; Fernandes, Luiz G. Structured Stream Parallelism for Rust Inproceedings doi In: XXIII Brazilian Symposium on Programming Languages (SBLP), pp. 54-61, ACM, Salvador, Brazil, 2019. @inproceedings{PIEPER:SBLP:19,
title = {Structured Stream Parallelism for Rust},
author = {Ricardo Pieper and Dalvan Griebler and Luiz G. Fernandes},
url = {https://doi.org/10.1145/3355378.3355384},
doi = {10.1145/3355378.3355384},
year = {2019},
date = {2019-10-01},
booktitle = {XXIII Brazilian Symposium on Programming Languages (SBLP)},
pages = {54-61},
publisher = {ACM},
address = {Salvador, Brazil},
series = {SBLP'19},
abstract = {Structured parallel programming has been studied and applied in several programming languages. This approach has proven to be suitable for abstracting low-level and architecture-dependent parallelism implementations. Our goal is to provide a structured and high-level library for the Rust language, targeting parallel stream processing applications for multi-core servers. Rust is an emerging programming language that has been developed by Mozilla Research group, focusing on performance, memory safety, and thread-safety. However, it lacks parallel programming abstractions, especially for stream processing applications. This paper contributes to a new API based on the structured parallel programming approach to simplify parallel software developing. Our experiments highlight that our solution provides higher-level parallel programming abstractions for stream processing applications in Rust. We also show that the throughput and speedup are comparable to the state-of-the-art for certain workloads.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
| Maliszewski, Anderson; Roloff, Eduardo; Griebler, Dalvan; Navaux, Philippe O Impacto da Interconexão de Rede no Desempenho de Programas Paralelos Inproceedings doi In: Anais do XX Simpósio em Sistemas Computacionais de Alto Desempenho, pp. 73-84, Sociedade Brasileira de Computação, Campo Grande, Brazil, 2019. @inproceedings{larcc:impacto_interconexao_HPC:WSCAD:19,
title = {O Impacto da Interconexão de Rede no Desempenho de Programas Paralelos},
author = {Anderson Maliszewski and Eduardo Roloff and Dalvan Griebler and Philippe Navaux},
url = {https://doi.org/10.5753/wscad.2019.8658},
doi = {10.5753/wscad.2019.8658},
year = {2019},
date = {2019-10-01},
booktitle = {Anais do XX Simpósio em Sistemas Computacionais de Alto Desempenho},
pages = {73-84},
publisher = {Sociedade Brasileira de Computação},
address = {Campo Grande, Brazil},
series = {WSCAD'19},
abstract = {O desempenho de aplicações paralelas depende de dois componentes principais do ambiente; o poder de processamento e a interconexão de rede. Neste trabalho, foi avaliado o impacto de uma interconexão de alto desempenho em programas paralelos em um cluster homogêneo de servidores interconectados por Gigabit Ethernet 1 Gbps e InfiniBand FDR 56 Gbps. Foi realizada uma caracterização do NAS Parallel Benchmarks em relação à computação, comunicação e custo de execução em instâncias da Microsoft Azure. Os resultados mostraram que, em aplicações altamente dependentes de rede, o desempenho pode ser significativamente melhorado ao utilizar InfiniBand a um custo de execução melhor, mesmo com o preço superior da instância.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
| Mencagli, Gabriele; Torquati, Massimo; Griebler, Dalvan; Danelutto, Marco; Fernandes, Luiz Gustavo L. Raising the Parallel Abstraction Level for Streaming Analytics Applications Journal Article doi In: IEEE Access, vol. 7, pp. 131944 - 131961, 2019. @article{MENCAGLI:IEEEAccess:19,
title = {Raising the Parallel Abstraction Level for Streaming Analytics Applications},
author = {Gabriele Mencagli and Massimo Torquati and Dalvan Griebler and Marco Danelutto and Luiz Gustavo L. Fernandes},
url = {https://doi.org/10.1109/ACCESS.2019.2941183},
doi = {10.1109/ACCESS.2019.2941183},
year = {2019},
date = {2019-09-01},
journal = {IEEE Access},
volume = {7},
pages = {131944 - 131961},
publisher = {IEEE},
abstract = {In the stream processing domain, applications are represented by graphs of operators arbitrarily connected and filled with their business logic code. The APIs of existing Stream Processing Systems (SPSs) ease the development of transformations that recur in the streaming practice (e.g., filtering, aggregation and joins). In contrast, their parallelism abstractions are quite limited since they provide support to stateless operators only, or when the state is organized in a set of key-value pairs. This paper presents how the parallel patterns methodology can be revisited for sliding-window streaming analytics. Our vision fosters a design process of the application as composition and nesting of ready-to-use patterns provided through a C++17 fluent interface. Our prototype implements the run-time system of the patterns in the FastFlow parallel library expressing thread-based parallelism. The experimental analysis shows interesting outcomes. First, our pattern-based approach allows easy prototyping of different versions of the application, and the programmer can leverage nesting of patterns to increase performance (up to 37% in one of the two considered test-bed cases). Second, our FastFlow implementation outperforms (three times faster) the handmade porting of our patterns in popular JVM-based SPSs. Finally, in the concluding part of this paper, we explore the use of a task-based run-time system, by deriving interesting insights into how to make our patterns library suitable for multi backends.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
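The sliding-window operators that the pattern methodology in this entry composes can be illustrated with a minimal count-based window (window size W, slide S). The sketch below is an assumption-laden illustration: the class name and the sum aggregation are invented for the example, and the paper's C++17 fluent interface and FastFlow thread-based runtime are not reproduced.

```cpp
#include <cstddef>
#include <deque>
#include <vector>

// Count-based sliding window: buffer incoming tuples; when the buffer
// reaches `size` elements, emit the window aggregate (here, a sum) and
// evict the oldest `slide` elements so the window advances.
template <typename T>
class CountSlidingWindow {
public:
    CountSlidingWindow(std::size_t size, std::size_t slide)
        : size_(size), slide_(slide) {}

    // Push one tuple; returns zero or one completed-window results.
    std::vector<T> push(T value) {
        buf_.push_back(value);
        std::vector<T> out;
        if (buf_.size() == size_) {
            T sum = T{};
            for (const T& v : buf_) sum += v;   // window aggregate
            out.push_back(sum);
            for (std::size_t i = 0; i < slide_ && !buf_.empty(); ++i)
                buf_.pop_front();               // advance by the slide
        }
        return out;
    }

private:
    std::size_t size_, slide_;
    std::deque<T> buf_;
};
```

With size 3 and slide 1, pushing 1, 2, 3, 4 emits the sums 6 and then 9: each new tuple completes a window that overlaps the previous one by two elements, which is the overlapping-state case that makes parallelizing such operators harder than stateless ones.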
| Fischer, Gabriel Souto; Righi, Rodrigo Rosa; Costa, Cristiano André; Galante, Guilherme; Griebler, Dalvan Towards Evaluating Proactive and Reactive Approaches on Reorganizing Human Resources in IoT-Based Smart Hospitals Journal Article doi In: Sensors, vol. 19, no. 17, pp. 3800, 2019. @article{FISHER:Elasticity-Hospital:SENSORS:19,
title = {Towards Evaluating Proactive and Reactive Approaches on Reorganizing Human Resources in IoT-Based Smart Hospitals},
author = {Gabriel Souto Fischer and Rodrigo Rosa Righi and Cristiano André Costa and Guilherme Galante and Dalvan Griebler},
url = {https://doi.org/10.3390/s19173800},
doi = {10.3390/s19173800},
year = {2019},
date = {2019-09-01},
journal = {Sensors},
volume = {19},
number = {17},
pages = {3800},
publisher = {MDPI},
abstract = {Hospitals play an important role in ensuring proper treatment of human health. One of the problems to be faced is increasingly overcrowded patient care queues, in which patients end up waiting longer without proper treatment for their health problems. The allocation of health professionals in hospital environments is not able to adapt to patient demand: there are times when underused rooms have idle professionals, while overused rooms have fewer professionals than necessary. Previous works have not solved this problem, since they focus on understanding the evolution of doctor supply and patient demand so as to better adjust one to the other, but have not proposed concrete techniques for better allocating the available human resources. Moreover, elasticity is one of the most important features of cloud computing, referring to the ability to add or remove resources according to the needs of the application or service. Based on this background, we introduce Elastic allocation of human resources in Healthcare environments (ElHealth), an IoT-focused model able to monitor patient usage of hospital rooms and adapt these rooms to patient demand. Using reactive and proactive elasticity approaches, ElHealth identifies when a room will have a demand that exceeds its capacity of care and proposes actions to move human resources to adapt to patient demand. Our main contribution is the definition of Human Resources IoT-based Elasticity (i.e., an extension of the concept of resource elasticity in Cloud Computing to manage the use of human resources in a healthcare environment, where health professionals are allocated and deallocated according to patient demand). Another contribution is a cost–benefit analysis of the use of reactive and predictive strategies for human resource reorganization. ElHealth was simulated in a hospital environment using data from a Brazilian polyclinic and obtained promising results, decreasing waiting time by up to 96.4% and 96.73% with the reactive and proactive approaches, respectively.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
| Rockenbach, Dinei A.; Griebler, Dalvan; Danelutto, Marco; Fernandes, Luiz Gustavo High-Level Stream Parallelism Abstractions with SPar Targeting GPUs Inproceedings doi In: Parallel Computing is Everywhere, Proceedings of the International Conference on Parallel Computing (ParCo), pp. 543-552, IOS Press, Prague, Czech Republic, 2019. @inproceedings{ROCKENBACH:PARCO:19,
title = {High-Level Stream Parallelism Abstractions with SPar Targeting GPUs},
author = {Dinei A. Rockenbach and Dalvan Griebler and Marco Danelutto and Luiz Gustavo Fernandes},
url = {https://doi.org/10.3233/APC200083},
doi = {10.3233/APC200083},
year = {2019},
date = {2019-09-01},
booktitle = {Parallel Computing is Everywhere, Proceedings of the International Conference on Parallel Computing (ParCo)},
volume = {36},
pages = {543-552},
publisher = {IOS Press},
address = {Prague, Czech Republic},
series = {ParCo'19},
abstract = {The combined exploitation of stream and data parallelism has demonstrated encouraging performance results in the literature for heterogeneous architectures, which are present in every computer system today. However, providing parallel software that efficiently targets those architectures requires significant programming effort and expertise. The SPar domain-specific language already addresses this problem by providing proven high-level programming abstractions for multi-core architectures. In this paper, we enrich the SPar language with support for GPUs. New transformation rules are designed for generating parallel code using stream and data parallel patterns. Our experiments revealed that these transformation rules are able to improve performance while the high-level programming abstractions are maintained.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
| Vogel, Adriano; Griebler, Dalvan; Danelutto, Marco; Fernandes, Luiz Gustavo Seamless Parallelism Management for Multi-core Stream Processing Inproceedings doi In: Advances in Parallel Computing, Proceedings of the International Conference on Parallel Computing (ParCo), pp. 533-542, IOS Press, Prague, Czech Republic, 2019. @inproceedings{VOGEL:PARCO:19,
title = {Seamless Parallelism Management for Multi-core Stream Processing},
author = {Adriano Vogel and Dalvan Griebler and Marco Danelutto and Luiz Gustavo Fernandes},
url = {https://doi.org/10.3233/APC200082},
doi = {10.3233/APC200082},
year = {2019},
date = {2019-09-01},
booktitle = {Advances in Parallel Computing, Proceedings of the International Conference on Parallel Computing (ParCo)},
volume = {36},
pages = {533-542},
publisher = {IOS Press},
address = {Prague, Czech Republic},
series = {ParCo'19},
abstract = {Video streaming applications have critical performance requirements for dealing with fluctuating workloads and providing results in real time. As a consequence, the majority of these applications demand parallelism for delivering quality of service to users. Although high-level and structured parallel programming aims at facilitating parallelism exploitation, there are still several issues to be addressed to improve existing parallel programming abstractions. In this paper, we employ self-adaptivity for stream processing in order to seamlessly manage the application parallelism configurations at run-time, with a new strategy that relieves application programmers from setting time-consuming and error-prone parallelism parameters. The new strategy was implemented and validated in SPar. The results have shown that the proposed solution increases the level of abstraction and achieves competitive performance.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
| Teixeira, Djalma; Vogel, Adriano; Griebler, Dalvan Proposta de Monitoramento e Gerenciamento Inteligente de Temperatura em Datacenters Inproceedings In: 16th Escola Regional de Redes de Computadores (ERRC), pp. 1-8, Sociedade Brasileira de Computação, Alegrete, Brazil, 2019. @inproceedings{larcc:smart_datacenter_temperatura:ERRC:19,
title = {Proposta de Monitoramento e Gerenciamento Inteligente de Temperatura em Datacenters},
author = {Djalma Teixeira and Adriano Vogel and Dalvan Griebler},
url = {https://sol.sbc.org.br/index.php/errc/article/view/9209/9112},
year = {2019},
date = {2019-09-01},
booktitle = {16th Escola Regional de Redes de Computadores (ERRC)},
pages = {1-8},
publisher = {Sociedade Brasileira de Computação},
address = {Alegrete, Brazil},
series = {ERRC'19},
abstract = {O aumento constante do crescimento e desenvolvimento das infraestruturas computacionais vem impulsionando uma demanda cada vez maior por monitoramento e gerenciamento inteligente de datacenters. Em um ambiente gerenciado autonomicamente, os equipamentos são controlados por meio de ações autonômicas, que são executadas sob determinadas condições sem a necessidade de intervenção humana. O objetivo deste trabalho é propor um modelo conceitual de monitoramento e gerenciamento inteligente de temperatura, que pode ser aplicado tanto em estruturas básicas quanto complexas e adaptado à heterogeneidade dos datacenters atuais.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
| Vogel, Adriano; Griebler, Dalvan; Danelutto, Marco; Fernandes, Luiz Gustavo Minimizing Self-Adaptation Overhead in Parallel Stream Processing for Multi-Cores Inproceedings doi In: Euro-Par 2019: Parallel Processing Workshops, pp. 12, Springer, Göttingen, Germany, 2019. @inproceedings{VOGEL:adaptive-overhead:AutoDaSP:19,
title = {Minimizing Self-Adaptation Overhead in Parallel Stream Processing for Multi-Cores},
author = {Adriano Vogel and Dalvan Griebler and Marco Danelutto and Luiz Gustavo Fernandes},
url = {https://doi.org/10.1007/978-3-030-48340-1_3},
doi = {10.1007/978-3-030-48340-1_3},
year = {2019},
date = {2019-08-01},
booktitle = {Euro-Par 2019: Parallel Processing Workshops},
volume = {11997},
pages = {12},
publisher = {Springer},
address = {Göttingen, Germany},
series = {Lecture Notes in Computer Science},
abstract = {The stream processing paradigm is present in several applications that apply computations over continuous data flowing in the form of streams (e.g., video feeds, image, and data analytics). Employing self-adaptivity in stream processing applications can provide higher-level programming abstractions and autonomic resource management. However, there are cases where the performance is suboptimal. In this paper, the goal is to optimize parallelism adaptations in terms of stability and accuracy, which can improve the performance of parallel stream processing applications. Therefore, we present a new optimized self-adaptive strategy that is experimentally evaluated. The proposed solution provides high-level programming abstractions, reduces the adaptation overhead, and achieves performance competitive with the best static executions.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
| Maliszewski, Anderson M.; Vogel, Adriano; Griebler, Dalvan; Roloff, Eduardo; Fernandes, Luiz G.; Navaux, Philippe O. A. Minimizing Communication Overheads in Container-based Clouds for HPC Applications Inproceedings doi In: IEEE Symposium on Computers and Communications (ISCC), pp. 1-6, IEEE, Barcelona, Spain, 2019. @inproceedings{larcc:communication_overhead_lxd:ISCC:19,
title = {Minimizing Communication Overheads in Container-based Clouds for HPC Applications},
author = {Anderson M. Maliszewski and Adriano Vogel and Dalvan Griebler and Eduardo Roloff and Luiz G. Fernandes and Philippe O. A. Navaux},
url = {https://doi.org/10.1109/ISCC47284.2019.8969716},
doi = {10.1109/ISCC47284.2019.8969716},
year = {2019},
date = {2019-07-01},
booktitle = {IEEE Symposium on Computers and Communications (ISCC)},
pages = {1-6},
publisher = {IEEE},
address = {Barcelona, Spain},
series = {ISCC'19},
abstract = {Although the industry has embraced the cloud computing model, there are still significant challenges to be addressed concerning the quality of cloud services. Network-intensive applications may not scale in the cloud due to the sharing of the network infrastructure. Performance evaluation studies in the literature show that the network tends to limit the scalability and performance of HPC applications. Therefore, we propose the aggregation of Network Interface Cards (NICs) in a ready-to-use integration with the OpenNebula cloud manager using Linux containers. We perform a set of experiments using a network microbenchmark to obtain specific network performance metrics and the NAS parallel benchmarks to analyze the performance impact on HPC applications. Our results highlight that NIC aggregation improves network performance in terms of throughput and latency. Moreover, HPC applications behave differently with our approach, depending on their communication patterns and the amount of data transferred. While network-intensive applications improved performance by up to 38%, other applications with aggregated NICs maintained the same or slightly worse performance.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
| Griebler, Dalvan; Vogel, Adriano; Sensi, Daniele De; Danelutto, Marco; Fernandes, Luiz Gustavo Simplifying and implementing service level objectives for stream parallelism Journal Article doi In: Journal of Supercomputing, vol. 76, pp. 4603-4628, 2019, ISSN: 0920-8542. @article{GRIEBLER:JS:19,
title = {Simplifying and implementing service level objectives for stream parallelism},
author = {Dalvan Griebler and Adriano Vogel and Daniele De Sensi and Marco Danelutto and Luiz Gustavo Fernandes},
url = {https://doi.org/10.1007/s11227-019-02914-6},
doi = {10.1007/s11227-019-02914-6},
issn = {0920-8542},
year = {2019},
date = {2019-06-01},
journal = {Journal of Supercomputing},
volume = {76},
pages = {4603-4628},
publisher = {Springer},
abstract = {Increasing attention has been given to providing service level objectives (SLOs) in stream processing applications due to performance and energy requirements, and because of the need to impose limits on resource usage while improving system utilization. Since current and next-generation computing systems intrinsically offer parallel architectures, software has to naturally exploit this parallelism. Implementing and meeting SLOs in existing applications is not a trivial task for application programmers, since the software development process, besides parallelism exploitation, requires the implementation of autonomic algorithms or strategies. This is a system-oriented programming approach and requires the management of multiple knobs and sensors (e.g., the number of threads to use, the clock frequency of the cores, etc.) so that the system can self-adapt at runtime. In this work, we introduce a new and simpler way to define SLOs in the application’s source code, abstracting from the programmer all the details of the self-adaptive system implementation. The application programmer specifies which parts of the code to parallelize and the related SLOs that should be enforced. To reach this goal, source-to-source code transformation rules are implemented in our compiler, which automatically generates self-adaptive strategies to enforce the user-expressed objectives at runtime. The experiments highlighted promising results, with simpler, effective, and efficient SLO implementations for real-world applications.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
| Rockenbach, Dinei A.; Stein, Charles Michael; Griebler, Dalvan; Mencagli, Gabriele; Torquati, Massimo; Danelutto, Marco; Fernandes, Luiz Gustavo Stream Processing on Multi-cores with GPUs: Parallel Programming Models' Challenges Inproceedings doi In: International Parallel and Distributed Processing Symposium Workshops (IPDPSW), pp. 834-841, IEEE, Rio de Janeiro, Brazil, 2019. @inproceedings{ROCKENBACH:stream-multigpus:IPDPSW:19,
title = {Stream Processing on Multi-cores with GPUs: Parallel Programming Models' Challenges},
author = {Dinei A. Rockenbach and Charles Michael Stein and Dalvan Griebler and Gabriele Mencagli and Massimo Torquati and Marco Danelutto and Luiz Gustavo Fernandes},
url = {https://doi.org/10.1109/IPDPSW.2019.00137},
doi = {10.1109/IPDPSW.2019.00137},
year = {2019},
date = {2019-05-01},
booktitle = {International Parallel and Distributed Processing Symposium Workshops (IPDPSW)},
pages = {834-841},
publisher = {IEEE},
address = {Rio de Janeiro, Brazil},
series = {IPDPSW'19},
abstract = {The stream processing paradigm is used in several scientific and enterprise applications in order to continuously compute results out of data items coming from data sources such as sensors. The full exploitation of the potential parallelism offered by current heterogeneous multi-cores equipped with one or more GPUs is still a challenge in the context of stream processing applications. In this work, our main goal is to present the parallel programming challenges that the programmer has to face when exploiting CPU and GPU parallelism at the same time using traditional programming models. We highlight the parallelization methodology in two use cases (the Mandelbrot Streaming benchmark and PARSEC's Dedup application) to demonstrate the issues and benefits of using heterogeneous parallel hardware. The experiments conducted demonstrate how a high-level parallel programming model targeting stream processing, like the one offered by SPar, can be used to reduce the programming effort while still offering a good level of performance compared with state-of-the-art programming models.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
| Rockenbach, Dinei A.; Griebler, Dalvan; Fernandes, Luiz G. Proposta de Suporte ao Paralelismo de GPU na SPar Inproceedings In: Escola Regional de Alto Desempenho (ERAD-RS), pp. 4, Sociedade Brasileira de Computação (SBC), Três de Maio, BR, 2019. @inproceedings{ROCKENBACH:ERAD:19,
title = {Proposta de Suporte ao Paralelismo de GPU na SPar},
author = {Dinei A. Rockenbach and Dalvan Griebler and Luiz G. Fernandes},
url = {https://gmap.pucrs.br/dalvan/papers/2019/CR_ERAD_PG_Dinei_2019.pdf},
year = {2019},
date = {2019-04-01},
booktitle = {Escola Regional de Alto Desempenho (ERAD-RS)},
pages = {4},
publisher = {Sociedade Brasileira de Computação (SBC)},
address = {Três de Maio, BR},
abstract = {As GPUs (Graphics Processing Units) têm se destacado devido a seu alto poder de processamento paralelo e sua presença crescente nos dispositivos computacionais. Porém, a sua exploração ainda requer conhecimento e esforço consideráveis do desenvolvedor. O presente trabalho propõe o suporte ao paralelismo de GPU na SPar, que fornece um alto nível de abstração através de uma linguagem baseada em anotações do C++.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
| Vogel, Adriano; Griebler, Dalvan; Fernandes, Luiz G. Adaptando o Paralelismo em Aplicações de Stream Conforme Objetivos de Throughput Inproceedings In: Escola Regional de Alto Desempenho (ERAD-RS), pp. 4, Sociedade Brasileira de Computação (SBC), Três de Maio, BR, 2019. @inproceedings{VOGEL:ERAD:19,
title = {Adaptando o Paralelismo em Aplicações de Stream Conforme Objetivos de Throughput},
author = {Adriano Vogel and Dalvan Griebler and Luiz G. Fernandes},
url = {https://gmap.pucrs.br/dalvan/papers/2019/CR_ERAD_PG_Vogel_2019.pdf},
year = {2019},
date = {2019-04-01},
booktitle = {Escola Regional de Alto Desempenho (ERAD-RS)},
pages = {4},
publisher = {Sociedade Brasileira de Computação (SBC)},
address = {Três de Maio, BR},
abstract = {As aplicações de processamento de streams possuem características de execuções dinâmicas com variações na carga e na demanda por recursos. Adaptar o grau de paralelismo é uma alternativa para responder à variação durante a execução. Nesse trabalho é apresentada uma abstração de paralelismo para a DSL SPar através de uma estratégia que autonomicamente adapta o grau de paralelismo de acordo com objetivos de desempenho.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}