2023
@inproceedings{MALLMANN:ERAD:23,
title = {Impacto da biblioteca padrão do C++ nos Kernels do NAS Parallel Benchmarks},
author = {Leonardo Mallmann and Arthur Bianchessi and Dalvan Griebler},
url = {https://doi.org/10.5753/eradrs.2023.229236},
doi = {10.5753/eradrs.2023.229236},
year = {2023},
date = {2023-05-01},
booktitle = {Anais da XXIII Escola Regional de Alto Desempenho da Região Sul},
pages = {89-92},
publisher = {Sociedade Brasileira de Computação},
address = {Porto Alegre, Brazil},
abstract = {Native parallel programming in C++ has gained traction with the std algorithms and their parallel execution policies. Applying these features, however, requires incorporating into the code the data structures on which such functions can operate. Even with the additional abstraction layer introduced by these structures, we observed an execution time similar to that of the C version.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
@inproceedings{CUNHA:ERAD:23,
title = {Avaliação do Paralelismo dos Kernels EP e CG em Sistemas Embarcados},
author = {Lucas Cunha and Renato Hoffmann and Dalvan Griebler and Isabel Manssour},
url = {https://doi.org/10.5753/eradrs.2023.229264},
doi = {10.5753/eradrs.2023.229264},
year = {2023},
date = {2023-05-01},
booktitle = {Anais da XXIII Escola Regional de Alto Desempenho da Região Sul},
pages = {57-60},
publisher = {Sociedade Brasileira de Computação},
address = {Porto Alegre, Brazil},
abstract = {In this paper, we test the performance gains obtained by implementing parallel processing codes on generic embedded systems. To analyze performance against the ideal speedup, two parallel algorithms (EP and CG) were tested on two different embedded systems. The results show a discrepancy between the best (3.98x) and worst (1.38x) performance obtained, indicating the breadth of the performance spectrum.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
@inproceedings{MULLER:ERAD:23,
title = {Um estudo sobre uso do MPI para uma aplicação de detecção de picos em data streams},
author = {Caetano Müller and Dalvan Griebler},
url = {https://doi.org/10.5753/eradrs.2023.229253},
doi = {10.5753/eradrs.2023.229253},
year = {2023},
date = {2023-05-01},
booktitle = {Anais da XXIII Escola Regional de Alto Desempenho da Região Sul},
pages = {97-100},
publisher = {Sociedade Brasileira de Computação},
address = {Porto Alegre, Brazil},
abstract = {Data stream applications can be implemented with different parallel programming interfaces. In this paper, we study and implement the Spike Detection application with MPI and compare it with versions using Flink, Storm, and WindFlow. We evaluated the throughput and concluded that the WindFlow implementation delivers the best performance, while the MPI versions achieved lower throughput than the other solutions.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
@inproceedings{BIANCHESSI:ERAD:23,
title = {Avaliação do paralelismo nos kernels NAS Parallel Benchmarks usando estruturas de dados da biblioteca C++},
author = {Arthur Bianchessi and Leonardo Mallmann and Dalvan Griebler},
url = {https://doi.org/10.5753/eradrs.2023.229266},
doi = {10.5753/eradrs.2023.229266},
year = {2023},
date = {2023-05-01},
booktitle = {Anais da XXIII Escola Regional de Alto Desempenho da Região Sul},
pages = {61-64},
publisher = {Sociedade Brasileira de Computação},
address = {Porto Alegre, Brazil},
abstract = {The NAS Parallel Benchmarks (NPB) suite is designed to evaluate the efficiency of parallelization in computing systems. In this study, the NPB-CPP version was adapted to use the C++ Standard Library and its performance was evaluated. The results showed good performance for the EP, FT, and CG kernels, but a performance degradation for the MG and IS kernels.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
@inproceedings{ZOMER:ERAD:23,
title = {Implementação e Avaliação de Desempenho da Linguagem Rust no NAS Embarassingly Parallel Benchmark},
author = {Bernardo Zomer and Renato Hoffmann and Dalvan Griebler},
url = {https://doi.org/10.5753/eradrs.2023.229261},
doi = {10.5753/eradrs.2023.229261},
year = {2023},
date = {2023-05-01},
booktitle = {Anais da XXIII Escola Regional de Alto Desempenho da Região Sul},
pages = {53-56},
publisher = {Sociedade Brasileira de Computação},
address = {Porto Alegre, Brazil},
abstract = {Rust is a high-performance, multi-paradigm language that guarantees memory safety. The NAS Parallel Benchmarks comprise parallel computational fluid dynamics applications, with versions in Fortran and C++. In this work, the EP application was ported to Rust and parallelized with the Rayon and Rust SSP libraries. In the performance evaluation, Rust showed the best parallel scalability when Rust SSP was used.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
@inproceedings{EICHNER:ERAD:23,
title = {Análise de Correlação no Esforço de Desenvolvimento de Aplicações Paralelas},
author = {Eduardo Eichner and Gabriella Andrade and Dalvan Griebler and Luiz Gustavo Fernandes},
url = {https://doi.org/10.5753/eradrs.2023.229265},
doi = {10.5753/eradrs.2023.229265},
year = {2023},
date = {2023-05-01},
booktitle = {Anais da XXIII Escola Regional de Alto Desempenho da Região Sul},
pages = {49-52},
publisher = {Sociedade Brasileira de Computação},
address = {Porto Alegre, Brazil},
abstract = {In this work, we analyzed several metrics for a video processing application using the FastFlow, TBB, and SPar interfaces. The results reveal that with SPar and FastFlow it is possible to develop an efficient parallel application with less effort, unlike with TBB. In future work, we plan to include more applications in the dataset to confirm these results.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
@inproceedings{GASPARY:ERAD:23,
title = {Avaliação do paralelismo em classificadores taxonômicos de sequências de rRNA usando Qiime2},
author = {Pedro Gaspary and Caetano Müller and Dalvan Griebler and Eduardo Eizirik},
url = {https://doi.org/10.5753/eradrs.2023.229241},
doi = {10.5753/eradrs.2023.229241},
year = {2023},
date = {2023-05-01},
booktitle = {Anais da XXIII Escola Regional de Alto Desempenho da Região Sul},
pages = {13-16},
publisher = {Sociedade Brasileira de Computação},
address = {Porto Alegre, Brazil},
abstract = {Classifying rRNA sequences is of paramount importance for microbiome analysis. Therefore, this work evaluated the performance and parallel efficiency of three taxonomic classification algorithms in Qiime2. Among them, VSearch showed the best parallelization efficiency, but also the longest execution times. The other two, Naive-Bayes and Hybrid, performed similarly to each other, with Hybrid being faster up to the fifth degree of parallelism and consuming slightly less memory than Naive-Bayes.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
@inproceedings{HOFFMANN:ERAD:23,
title = {Avaliando Paralelismo em Dispositivos com Recursos Limitados},
author = {Renato Barreto Hoffmann and Dalvan Griebler},
url = {https://doi.org/10.5753/eradrs.2023.229269},
doi = {10.5753/eradrs.2023.229269},
year = {2023},
date = {2023-05-01},
booktitle = {Anais da XXIII Escola Regional de Alto Desempenho da Região Sul},
pages = {105-106},
publisher = {Sociedade Brasileira de Computação},
address = {Porto Alegre, Brazil},
abstract = {Resource-limited computing systems increasingly provide more computational resources. Therefore, to meet performance requirements and achieve low energy consumption, parallelism must be exploited. This work evaluated 4 applications on 3 different devices, comparing 5 parallelism interfaces.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
@inproceedings{LEONARCZYK:ERAD:23,
title = {Avaliação da Auto-Adaptação de Micro-Lote para aplicação de Processamento de Streaming em GPUs},
author = {Ricardo Leonarczyk and Dalvan Griebler},
url = {https://doi.org/10.5753/eradrs.2023.229267},
doi = {10.5753/eradrs.2023.229267},
year = {2023},
date = {2023-05-01},
booktitle = {Anais da XXIII Escola Regional de Alto Desempenho da Região Sul},
pages = {123-124},
publisher = {Sociedade Brasileira de Computação},
address = {Porto Alegre, Brazil},
abstract = {This paper presents an evaluation of algorithms that regulate latency through micro-batch self-adaptation in GPU-accelerated stream processing systems. The results showed that the algorithm with a fixed adaptation factor stayed longer within the latency region specified for the application.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
@inproceedings{larcc:FIM:ERAD:23,
title = {Implementação e Avaliação do Paralelismo de Flink nas Aplicações de Processamento de Log e Análise de Cliques},
author = {Gabriel Rustick Fim and Dalvan Griebler},
url = {https://doi.org/10.5753/eradrs.2023.229290},
doi = {10.5753/eradrs.2023.229290},
year = {2023},
date = {2023-05-01},
booktitle = {Anais da XXIII Escola Regional de Alto Desempenho da Região Sul},
pages = {69-72},
publisher = {Sociedade Brasileira de Computação},
address = {Porto Alegre, Brazil},
abstract = {This work aimed to implement and evaluate the performance of the Log Processing and Click Analysis applications on Apache Flink, comparing their performance with Apache Storm in a distributed computing environment. The results show that the Flink execution consumes relatively fewer resources than the Storm execution, but exhibits a high standard deviation, exposing load imbalance in executions where some application component is replicated.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
@inproceedings{larcc:DOPKE:ERAD:23,
title = {Estudo Sobre Spark nas Aplicações de Processamento de Log e Análise de Cliques},
author = {Luan Dopke and Dalvan Griebler},
url = {https://doi.org/10.5753/eradrs.2023.229298},
doi = {10.5753/eradrs.2023.229298},
year = {2023},
date = {2023-05-01},
booktitle = {Anais da XXIII Escola Regional de Alto Desempenho da Região Sul},
pages = {85-88},
publisher = {Sociedade Brasileira de Computação},
address = {Porto Alegre, Brazil},
abstract = {The use of continuous data stream processing applications keeps growing. Given this fact, the present study measures the performance of the Apache Spark Structured Streaming framework against the Apache Storm framework on two stream processing applications: log processing and click analysis. The results show better performance for Apache Storm in both applications.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
@article{ANDRADE:CSI:2023,
title = {A parallel programming assessment for stream processing applications on multi-core systems},
author = {Gabriella Andrade and Dalvan Griebler and Rodrigo Santos and Luiz Gustavo Fernandes},
url = {https://doi.org/10.1016/j.csi.2022.103691},
doi = {10.1016/j.csi.2022.103691},
year = {2023},
date = {2023-03-01},
journal = {Computer Standards & Interfaces},
volume = {84},
pages = {103691},
publisher = {Elsevier},
abstract = {Multi-core systems are present in any computing device nowadays, and stream processing applications are becoming recurrent workloads, demanding parallelism to achieve the desired quality of service. As soon as data, tasks, or requests arrive, they must be computed, analyzed, or processed. Since building such applications is not a trivial task, the software industry must adopt parallel APIs (Application Programming Interfaces) that simplify the exploitation of parallelism in hardware to accelerate time-to-market. In recent years, research efforts in academia and industry have provided a set of parallel APIs, increasing the productivity of software developers. However, few studies have sought to assess the usability of these interfaces. In this work, we present a parallel programming assessment regarding the usability of parallel APIs for expressing parallelism in the stream processing application domain on multi-core systems. To this end, we conducted an empirical study with beginners in parallel application development. The study covered three parallel APIs, reporting several quantitative and qualitative indicators involving developers. Our contribution also comprises a parallel programming assessment methodology, which can be replicated in future assessments. This study revealed important insights, such as recurrent compile-time and programming logic errors made by beginners in parallel programming, as well as the programming effort, challenges, and learning curve. Moreover, we collected the participants' opinions about their experience in this study to understand the results achieved more deeply.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
@inproceedings{VOGEL:PDP:23,
title = {Revisiting self-adaptation for efficient decision-making at run-time in parallel executions},
author = {Adriano Vogel and Marco Danelutto and Dalvan Griebler and Luiz Gustavo Fernandes},
url = {https://doi.org/10.1109/PDP59025.2023.00015},
doi = {10.1109/PDP59025.2023.00015},
year = {2023},
date = {2023-03-01},
booktitle = {31st Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP)},
pages = {43-50},
publisher = {IEEE},
address = {Naples, Italy},
series = {PDP'23},
abstract = {Self-adaptation is a potential alternative to provide a higher level of autonomic abstractions and run-time responsiveness in parallel executions. However, the recurrent problem is that self-adaptation is still limited in flexibility and efficiency. For instance, there is a lack of mechanisms to apply adaptation actions and efficient decision-making strategies to decide which configurations should be conveniently enforced at run-time. In this work, we are interested in providing and evaluating potential abstractions achievable with self-adaptation transparently managing parallel executions. Therefore, we provide a new mechanism to support self-adaptation in applications with multiple parallel stages executed in multi-cores. Moreover, we reproduce, reimplement, and evaluate an existing decision-making strategy in our scenario. The observations from the results show that the proposed mechanism for self-adaptation can provide new parallelism abstractions and autonomous responsiveness at run-time. On the other hand, there is a need for more accurate decision-making strategies to enable efficient executions of applications in resource-constrained scenarios like multi-cores.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
@inproceedings{GARCIA:PDP:23,
title = {A Latency, Throughput, and Programmability Perspective of GrPPI for Streaming on Multi-cores},
author = {Adriano Marques Garcia and Dalvan Griebler and Claudio Schepke and José Daniel García and Javier Fernández Muñoz and Luiz Gustavo Fernandes},
url = {https://doi.org/10.1109/PDP59025.2023.00033},
doi = {10.1109/PDP59025.2023.00033},
year = {2023},
date = {2023-03-01},
booktitle = {31st Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP)},
pages = {164-168},
publisher = {IEEE},
address = {Naples, Italy},
series = {PDP'23},
abstract = {Several solutions aim to simplify the burdensome task of parallel programming. The GrPPI library is one of them. It allows users to implement parallel code for multiple backends through a unified, abstract, and generic layer while promising minimal overhead on performance. A broad evaluation of GrPPI regarding stream parallelism with representative metrics for this domain, such as throughput and latency, had not yet been done. In this work, we evaluate GrPPI focused on stream processing. We evaluate performance, memory usage, and programming effort, and compare them against handwritten parallel code. For this, we use the benchmarking framework SPBench to build custom GrPPI benchmarks. The benchmarks are based on real applications, such as Lane Detection, Bzip2, Face Recognizer, and Ferret. Experiments show that while performance is competitive with handwritten code in some cases, in other cases the infeasibility of fine-tuning GrPPI is a crucial drawback. Despite this, programmability experiments estimate that GrPPI has the potential to reduce the development time of parallel applications by about three times.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
@article{GARCIA:Computing:23,
title = {SPBench: a framework for creating benchmarks of stream processing applications},
author = {Adriano Marques Garcia and Dalvan Griebler and Claudio Schepke and Luiz Gustavo Fernandes},
url = {https://doi.org/10.1007/s00607-021-01025-6},
doi = {10.1007/s00607-021-01025-6},
year = {2023},
date = {2023-01-01},
urldate = {2023-01-01},
journal = {Computing},
volume = {105},
number = {5},
pages = {1077-1099},
publisher = {Springer},
abstract = {In a fast-changing, data-driven world, real-time data processing systems are becoming ubiquitous in everyday applications. The increasing amounts of data we produce, such as audio, video, image, and text, demand fast and efficient computation. Stream parallelism allows accelerating this computation for real-time processing. However, it is still a challenging task, mostly reserved for experts. In this paper, we present SPBench, a framework for benchmarking stream processing applications. It aims to support users with a set of real-world stream processing applications, which are made accessible through an Application Programming Interface (API) and executable via a Command-Line Interface (CLI) to create custom benchmarks. We tested SPBench by implementing parallel benchmarks with Intel Threading Building Blocks (TBB), FastFlow, and SPar. This evaluation provided useful insights and revealed the feasibility of the proposed framework in terms of usage, customization, and performance analysis. SPBench proved to be a high-level, reusable, extensible, and easy-to-use abstraction for building parallel stream processing benchmarks on multi-core architectures.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
@article{ARAUJO:SPE:23,
title = {NAS Parallel Benchmarks with CUDA and Beyond},
author = {Gabriell Araujo and Dalvan Griebler and Dinei A. Rockenbach and Marco Danelutto and Luiz Gustavo Fernandes},
url = {https://doi.org/10.1002/spe.3056},
doi = {10.1002/spe.3056},
year = {2023},
date = {2023-01-01},
urldate = {2023-01-01},
journal = {Software: Practice and Experience},
volume = {53},
number = {1},
pages = {53-80},
publisher = {Wiley},
abstract = {NAS Parallel Benchmarks (NPB) is a standard benchmark suite used in the evaluation of parallel hardware and software. Several research efforts from academia have made these benchmarks available with different parallel programming models beyond the original versions with OpenMP and MPI. This work joins these research efforts by providing a new CUDA implementation for NPB. Our contribution covers different aspects beyond the implementation. First, we define design principles based on the best programming practices for GPUs and apply them to each benchmark using CUDA. Second, we provide easy-to-use parametrization support for configuring the number of threads per block in our version. Third, we conduct a broad study on the impact of the number of threads per block in the benchmarks. Fourth, we propose and evaluate five strategies to help find a better number-of-threads-per-block configuration. The results have revealed relevant performance improvements solely from changing the number of threads per block, ranging from 8% up to 717% among the benchmarks. Fifth, we conduct a comparative analysis with the literature, evaluating performance, memory consumption, required code refactoring, and parallelism implementations. The performance results have shown up to 267% improvement over the best benchmark versions available. We also observe the best and worst design choices concerning the trade-off between code size and performance. Lastly, we highlight the challenges of implementing parallel CFD applications for GPUs and how the computations impact the GPU's behavior.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
NAS Parallel Benchmarks (NPB) is a standard benchmark suite used in the evaluation of parallel hardware and software. Several research efforts from academia have made these benchmarks available with different parallel programming models beyond the original versions with OpenMP and MPI. This work joins these research efforts by providing a new CUDA implementation for NPB. Our contribution covers different aspects beyond the implementation. First, we define design principles based on the best programming practices for GPUs and apply them to each benchmark using CUDA. Second, we provide easy-to-use parametrization support for configuring the number of threads per block in our version. Third, we conduct a broad study on the impact of the number of threads per block in the benchmarks. Fourth, we propose and evaluate five strategies to help find a better number-of-threads-per-block configuration. The results have revealed relevant performance improvements solely from changing the number of threads per block, ranging from 8% up to 717% among the benchmarks. Fifth, we conduct a comparative analysis with the literature, evaluating performance, memory consumption, required code refactoring, and parallelism implementations. The performance results have shown up to 267% improvement over the best benchmark versions available. We also observe the best and worst design choices concerning the trade-off between code size and performance. Lastly, we highlight the challenges of implementing parallel CFD applications for GPUs and how the computations impact the GPU's behavior. |
 | Garcia, Adriano Marques; Griebler, Dalvan; Schepke, Claudio; Fernandes, Luiz Gustavo Micro-batch and data frequency for stream processing on multi-cores Journal Article doi In: The Journal of Supercomputing, vol. 79, no. 8, pp. 9206-9244, 2023. @article{GARCIA:JS:23,
title = {Micro-batch and data frequency for stream processing on multi-cores},
author = {Adriano Marques Garcia and Dalvan Griebler and Claudio Schepke and Luiz Gustavo Fernandes},
url = {https://doi.org/10.1007/s11227-022-05024-y},
doi = {10.1007/s11227-022-05024-y},
year = {2023},
date = {2023-01-01},
journal = {The Journal of Supercomputing},
volume = {79},
number = {8},
pages = {9206-9244},
publisher = {Springer},
abstract = {Latency and throughput are often critical performance metrics in stream processing. Applications’ performance can fluctuate depending on the input stream. This unpredictability is due to variations in data arrival frequency and size, complexity, and other factors. Researchers are constantly investigating new ways to mitigate the impact of these variations on performance with self-adaptive techniques involving elasticity or micro-batching. However, there is a lack of benchmarks capable of creating test scenarios to further evaluate these techniques. This work extends and improves the SPBench benchmarking framework to support dynamic micro-batching and data stream frequency management. We also propose a set of algorithms that generate the frequency patterns most commonly used for benchmarking stream processing in related work, allowing the creation of a wide variety of test scenarios. To validate our solution, we use SPBench to create custom benchmarks and evaluate the impact of micro-batching and data stream frequency on the performance of Intel TBB and FastFlow, two libraries that leverage stream parallelism on multi-core architectures. Our results demonstrated that our test cases did not benefit from micro-batches on multi-cores. For different data stream frequency configurations, TBB ensured the lowest latency, while FastFlow assured higher throughput in shorter pipelines.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Latency and throughput are often critical performance metrics in stream processing. Applications’ performance can fluctuate depending on the input stream. This unpredictability is due to variations in data arrival frequency and size, complexity, and other factors. Researchers are constantly investigating new ways to mitigate the impact of these variations on performance with self-adaptive techniques involving elasticity or micro-batching. However, there is a lack of benchmarks capable of creating test scenarios to further evaluate these techniques. This work extends and improves the SPBench benchmarking framework to support dynamic micro-batching and data stream frequency management. We also propose a set of algorithms that generate the frequency patterns most commonly used for benchmarking stream processing in related work, allowing the creation of a wide variety of test scenarios. To validate our solution, we use SPBench to create custom benchmarks and evaluate the impact of micro-batching and data stream frequency on the performance of Intel TBB and FastFlow, two libraries that leverage stream parallelism on multi-core architectures. Our results demonstrated that our test cases did not benefit from micro-batches on multi-cores. For different data stream frequency configurations, TBB ensured the lowest latency, while FastFlow assured higher throughput in shorter pipelines. |
2022
|
 | Löff, Júnior; Hoffmann, Renato Barreto; Griebler, Dalvan; Fernandes, Luiz Gustavo Combining stream with data parallelism abstractions for multi-cores Journal Article doi In: Journal of Computer Languages, vol. 73, pp. 101160, 2022. @article{LOFF:COLA:22,
title = {Combining stream with data parallelism abstractions for multi-cores},
author = {Júnior Löff and Renato Barreto Hoffmann and Dalvan Griebler and Luiz Gustavo Fernandes},
url = {https://doi.org/10.1016/j.cola.2022.101160},
doi = {10.1016/j.cola.2022.101160},
year = {2022},
date = {2022-12-01},
urldate = {2022-12-01},
journal = {Journal of Computer Languages},
volume = {73},
pages = {101160},
publisher = {Elsevier},
abstract = {Stream processing applications have seen increasing demand with the growing availability of sensors, IoT devices, and user data. Modern systems can generate millions of data items per day that must be processed in a timely manner. To deal with this demand, application programmers must consider parallelism to exploit the maximum performance of the underlying hardware resources. In this work, we introduce improvements to stream processing applications by exploiting fine-grained data parallelism (via Map and MapReduce) inside coarse-grained stream parallelism stages. The improvements include techniques for identifying data parallelism in sequential code, a new language, semantic analysis, and a set of definition and transformation rules to perform source-to-source parallel code generation. Moreover, we investigate the feasibility of employing higher-level programming abstractions to support the proposed optimizations. To that end, we elect the SPar programming model as a use case and extend it by adding two new attributes to its language and implementing our optimizations as a new algorithm in the SPar compiler. We conduct a set of experiments on representative stream processing and data-parallel applications. The results showed that our new compiler algorithm is efficient and that performance improved by up to 108.4x in data-parallel applications. Furthermore, experiments evaluating stream processing applications towards the composition of stream and data parallelism revealed new insights. The results showed that such composition may improve latencies by up to an order of magnitude. Also, it enables programmers to exploit different degrees of stream and data parallelism to strike a balance between throughput and latency according to their needs.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Stream processing applications have seen increasing demand with the growing availability of sensors, IoT devices, and user data. Modern systems can generate millions of data items per day that must be processed in a timely manner. To deal with this demand, application programmers must consider parallelism to exploit the maximum performance of the underlying hardware resources. In this work, we introduce improvements to stream processing applications by exploiting fine-grained data parallelism (via Map and MapReduce) inside coarse-grained stream parallelism stages. The improvements include techniques for identifying data parallelism in sequential code, a new language, semantic analysis, and a set of definition and transformation rules to perform source-to-source parallel code generation. Moreover, we investigate the feasibility of employing higher-level programming abstractions to support the proposed optimizations. To that end, we elect the SPar programming model as a use case and extend it by adding two new attributes to its language and implementing our optimizations as a new algorithm in the SPar compiler. We conduct a set of experiments on representative stream processing and data-parallel applications. The results showed that our new compiler algorithm is efficient and that performance improved by up to 108.4x in data-parallel applications. Furthermore, experiments evaluating stream processing applications towards the composition of stream and data parallelism revealed new insights. The results showed that such composition may improve latencies by up to an order of magnitude. Also, it enables programmers to exploit different degrees of stream and data parallelism to strike a balance between throughput and latency according to their needs. |
 | Ernstsson, August; Griebler, Dalvan; Kessler, Christoph Assessing Application Efficiency and Performance Portability in Single-Source Programming for Heterogeneous Parallel Systems Journal Article doi In: International Journal of Parallel Programming, vol. 51, no. 5, pp. 61-82, 2022. @article{Ernstsson:IJPP:22,
title = {Assessing Application Efficiency and Performance Portability in Single-Source Programming for Heterogeneous Parallel Systems},
author = {August Ernstsson and Dalvan Griebler and Christoph Kessler},
url = {https://doi.org/10.1007/s10766-022-00746-1},
doi = {10.1007/s10766-022-00746-1},
year = {2022},
date = {2022-12-01},
urldate = {2022-12-01},
journal = {International Journal of Parallel Programming},
volume = {51},
number = {5},
pages = {61-82},
publisher = {Springer},
abstract = {We analyze the performance portability of the skeleton-based, single-source multi-backend high-level programming framework SkePU across multiple different CPU–GPU heterogeneous systems. Thereby, we provide a systematic application efficiency characterization of SkePU-generated code in comparison to equivalent hand-written code in more low-level parallel programming models such as OpenMP and CUDA. For this purpose, we contribute ports of the STREAM benchmark suite and of a part of the NAS Parallel Benchmark suite to SkePU. We show that for STREAM and the EP benchmark, SkePU regularly scores efficiency values above 80% and in particular for CPU systems, SkePU can outperform hand-written code.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
We analyze the performance portability of the skeleton-based, single-source multi-backend high-level programming framework SkePU across multiple different CPU–GPU heterogeneous systems. Thereby, we provide a systematic application efficiency characterization of SkePU-generated code in comparison to equivalent hand-written code in more low-level parallel programming models such as OpenMP and CUDA. For this purpose, we contribute ports of the STREAM benchmark suite and of a part of the NAS Parallel Benchmark suite to SkePU. We show that for STREAM and the EP benchmark, SkePU regularly scores efficiency values above 80% and in particular for CPU systems, SkePU can outperform hand-written code. |
| Andrade, Gabriella; Griebler, Dalvan; Santos, Rodrigo; Fernandes, Luiz Gustavo Opinião de Brasileiros Sobre a Produtividade no Desenvolvimento de Aplicações Paralelas Inproceedings doi In: Anais do XXII Simpósio em Sistemas Computacionais de Alto Desempenho (WSCAD), pp. 276-287, SBC, Florianópolis, Brasil, 2022. @inproceedings{ANDRADE:WSCAD:22,
title = {Opinião de Brasileiros Sobre a Produtividade no Desenvolvimento de Aplicações Paralelas},
author = {Gabriella Andrade and Dalvan Griebler and Rodrigo Santos and Luiz Gustavo Fernandes},
url = {https://doi.org/10.5753/wscad.2022.226392},
doi = {10.5753/wscad.2022.226392},
year = {2022},
date = {2022-10-01},
booktitle = {Anais do XXII Simpósio em Sistemas Computacionais de Alto Desempenho (WSCAD)},
pages = {276-287},
publisher = {SBC},
address = {Florianópolis, Brasil},
abstract = {With the popularization of parallel architectures, several programming interfaces have emerged to ease the exploitation of such architectures and to increase developer productivity. However, developing parallel applications is still a complex task for developers with little experience. In this work, we conducted a survey to gather the opinion of parallel application developers on the factors that hinder productivity. Our results showed that developer experience is one of the main reasons for low productivity. Furthermore, the results indicated ways to mitigate this problem, such as improving and encouraging the teaching of parallel programming in undergraduate courses.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
With the popularization of parallel architectures, several programming interfaces have emerged to ease the exploitation of such architectures and to increase developer productivity. However, developing parallel applications is still a complex task for developers with little experience. In this work, we conducted a survey to gather the opinion of parallel application developers on the factors that hinder productivity. Our results showed that developer experience is one of the main reasons for low productivity. Furthermore, the results indicated ways to mitigate this problem, such as improving and encouraging the teaching of parallel programming in undergraduate courses. |
| Rockenbach, Dinei A.; Löff, Júnior; Araujo, Gabriell; Griebler, Dalvan; Fernandes, Luiz G. High-Level Stream and Data Parallelism in C++ for GPUs Inproceedings doi In: XXVI Brazilian Symposium on Programming Languages (SBLP), pp. 41-49, ACM, Uberlândia, Brazil, 2022. @inproceedings{ROCKENBACH:SBLP:22,
title = {High-Level Stream and Data Parallelism in C++ for GPUs},
author = {Dinei A. Rockenbach and Júnior Löff and Gabriell Araujo and Dalvan Griebler and Luiz G. Fernandes},
url = {https://doi.org/10.1145/3561320.3561327},
doi = {10.1145/3561320.3561327},
year = {2022},
date = {2022-10-01},
booktitle = {XXVI Brazilian Symposium on Programming Languages (SBLP)},
pages = {41-49},
publisher = {ACM},
address = {Uberlândia, Brazil},
series = {SBLP'22},
abstract = {GPUs are massively parallel processors that allow solving problems that are not viable on traditional processors like CPUs. However, implementing applications for GPUs is challenging because it requires parallel programming to efficiently exploit the GPU resources. In this sense, parallel programming abstractions, notably domain-specific languages, are fundamental for improving programmability. SPar is a high-level Domain-Specific Language (DSL) that allows expressing stream and data parallelism in serial code through annotations using C++ attributes. This work elaborates on a methodology and tool for GPU code generation by introducing new attributes to the SPar language and transformation rules to the SPar compiler. These new contributions, besides gains in simplicity and code reduction compared to CUDA and OpenCL, enabled SPar to achieve higher throughput when exploiting combined CPU and GPU parallelism and when using batching.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
GPUs are massively parallel processors that allow solving problems that are not viable on traditional processors like CPUs. However, implementing applications for GPUs is challenging because it requires parallel programming to efficiently exploit the GPU resources. In this sense, parallel programming abstractions, notably domain-specific languages, are fundamental for improving programmability. SPar is a high-level Domain-Specific Language (DSL) that allows expressing stream and data parallelism in serial code through annotations using C++ attributes. This work elaborates on a methodology and tool for GPU code generation by introducing new attributes to the SPar language and transformation rules to the SPar compiler. These new contributions, besides gains in simplicity and code reduction compared to CUDA and OpenCL, enabled SPar to achieve higher throughput when exploiting combined CPU and GPU parallelism and when using batching. |
| Andrade, Gabriella; Griebler, Dalvan; Santos, Rodrigo; Kessler, Christoph; Ernstsson, August; Fernandes, Luiz Gustavo Analyzing Programming Effort Model Accuracy of High-Level Parallel Programs for Stream Processing Inproceedings doi In: 48th Euromicro Conference on Software Engineering and Advanced Applications (SEAA 2022), pp. 229-232, IEEE, Gran Canaria, Spain, 2022. @inproceedings{ANDRADE:SEAA:22,
title = {Analyzing Programming Effort Model Accuracy of High-Level Parallel Programs for Stream Processing},
author = {Gabriella Andrade and Dalvan Griebler and Rodrigo Santos and Christoph Kessler and August Ernstsson and Luiz Gustavo Fernandes},
url = {https://doi.org/10.1109/SEAA56994.2022.00043},
doi = {10.1109/SEAA56994.2022.00043},
year = {2022},
date = {2022-09-01},
booktitle = {48th Euromicro Conference on Software Engineering and Advanced Applications (SEAA 2022)},
pages = {229-232},
publisher = {IEEE},
address = {Gran Canaria, Spain},
series = {SEAA'22},
abstract = {Over the years, several Parallel Programming Models (PPMs) have supported the abstraction of programming complexity for parallel computer systems. However, few studies aim to evaluate the productivity reached by such abstractions since this is a complex task that involves human beings. There are several studies to develop predictive methods to estimate the effort required to program applications in software engineering. In order to evaluate the reliability of such metrics, it is necessary to assess the accuracy in different programming domains. In this work, we used the data of an experiment conducted with beginners in parallel programming to determine the effort required for implementing stream parallelism using FastFlow, SPar, and TBB. Our results show that some traditional software effort estimation models, such as COCOMO II, fall short, while Putnam's model could be an alternative for high-level PPMs evaluation. To overcome the limitations of existing models, we plan to create a parallelism-aware model to evaluate applications in this domain in future work.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Over the years, several Parallel Programming Models (PPMs) have supported the abstraction of programming complexity for parallel computer systems. However, few studies aim to evaluate the productivity reached by such abstractions since this is a complex task that involves human beings. There are several studies to develop predictive methods to estimate the effort required to program applications in software engineering. In order to evaluate the reliability of such metrics, it is necessary to assess the accuracy in different programming domains. In this work, we used the data of an experiment conducted with beginners in parallel programming to determine the effort required for implementing stream parallelism using FastFlow, SPar, and TBB. Our results show that some traditional software effort estimation models, such as COCOMO II, fall short, while Putnam's model could be an alternative for high-level PPMs evaluation. To overcome the limitations of existing models, we plan to create a parallelism-aware model to evaluate applications in this domain in future work. |
| Garcia, Adriano Marques; Griebler, Dalvan; Schepke, Claudio; Fernandes, Luiz Gustavo Evaluating Micro-batch and Data Frequency for Stream Processing Applications on Multi-cores Inproceedings doi In: 30th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP), pp. 10-17, IEEE, Valladolid, Spain, 2022. @inproceedings{GARCIA:PDP:22,
title = {Evaluating Micro-batch and Data Frequency for Stream Processing Applications on Multi-cores},
author = {Adriano Marques Garcia and Dalvan Griebler and Claudio Schepke and Luiz Gustavo Fernandes},
url = {https://doi.org/10.1109/PDP55904.2022.00011},
doi = {10.1109/PDP55904.2022.00011},
year = {2022},
date = {2022-04-01},
booktitle = {30th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP)},
pages = {10-17},
publisher = {IEEE},
address = {Valladolid, Spain},
series = {PDP'22},
abstract = {In stream processing, data arrives constantly and is often unpredictable. It can show large fluctuations in arrival frequency, size, complexity, and other factors. These fluctuations can strongly impact application latency and throughput, which are critical factors in this domain. Therefore, there is a significant amount of research on self-adaptive techniques involving elasticity or micro-batching as a way to mitigate this impact. However, there is a lack of benchmarks and tools for helping researchers to investigate micro-batching and data stream frequency implications. In this paper, we extend a benchmarking framework to support dynamic micro-batching and data stream frequency management. We used it to create custom benchmarks and compare latency and throughput aspects from two different parallel libraries. We validate our solution through an extensive analysis of the impact of micro-batching and data stream frequency on stream processing applications using Intel TBB and FastFlow, which are two libraries that leverage stream parallelism on multi-core architectures. Our results demonstrated up to 33% throughput gain over latency using micro-batches. Additionally, while TBB ensures lower latency, FastFlow ensures higher throughput in the parallel applications for different data stream frequency configurations.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
In stream processing, data arrives constantly and is often unpredictable. It can show large fluctuations in arrival frequency, size, complexity, and other factors. These fluctuations can strongly impact application latency and throughput, which are critical factors in this domain. Therefore, there is a significant amount of research on self-adaptive techniques involving elasticity or micro-batching as a way to mitigate this impact. However, there is a lack of benchmarks and tools for helping researchers to investigate micro-batching and data stream frequency implications. In this paper, we extend a benchmarking framework to support dynamic micro-batching and data stream frequency management. We used it to create custom benchmarks and compare latency and throughput aspects from two different parallel libraries. We validate our solution through an extensive analysis of the impact of micro-batching and data stream frequency on stream processing applications using Intel TBB and FastFlow, which are two libraries that leverage stream parallelism on multi-core architectures. Our results demonstrated up to 33% throughput gain over latency using micro-batches. Additionally, while TBB ensures lower latency, FastFlow ensures higher throughput in the parallel applications for different data stream frequency configurations. |
| Mencagli, Gabriele; Griebler, Dalvan; Danelutto, Marco Towards Parallel Data Stream Processing on System-on-Chip CPU+GPU Devices Inproceedings doi In: 30th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP), pp. 34-38, IEEE, Valladolid, Spain, 2022. @inproceedings{MENCAGLI:PDP:22,
title = {Towards Parallel Data Stream Processing on System-on-Chip CPU+GPU Devices},
author = {Gabriele Mencagli and Dalvan Griebler and Marco Danelutto},
url = {https://doi.org/10.1109/PDP55904.2022.00014},
doi = {10.1109/PDP55904.2022.00014},
year = {2022},
date = {2022-04-01},
booktitle = {30th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP)},
pages = {34-38},
publisher = {IEEE},
address = {Valladolid, Spain},
series = {PDP'22},
abstract = {Data Stream Processing is a pervasive computing paradigm with a wide spectrum of applications. Traditional streaming systems exploit the processing capabilities provided by homogeneous Clusters and Clouds. With the transition to streaming systems suitable for IoT/Edge environments, there has been an urgent need for new streaming frameworks and tools tailored to embedded platforms, often available as System-on-Chips composed of a small multicore CPU and an integrated on-chip GPU. Exploiting this hybrid hardware requires special care in the runtime system design. In this paper, we discuss the support provided by the WindFlow library, showing its design principles and its effectiveness on the NVIDIA Jetson Nano board.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Data Stream Processing is a pervasive computing paradigm with a wide spectrum of applications. Traditional streaming systems exploit the processing capabilities provided by homogeneous Clusters and Clouds. With the transition to streaming systems suitable for IoT/Edge environments, there has been an urgent need for new streaming frameworks and tools tailored to embedded platforms, often available as System-on-Chips composed of a small multicore CPU and an integrated on-chip GPU. Exploiting this hybrid hardware requires special care in the runtime system design. In this paper, we discuss the support provided by the WindFlow library, showing its design principles and its effectiveness on the NVIDIA Jetson Nano board. |
| Andrade, Gabriella; Griebler, Dalvan; Fernandes, Luiz Gustavo Avaliação do Esforço de Programação em GPU: Estudo Piloto Inproceedings doi In: Anais da XXII Escola Regional de Alto Desempenho da Região Sul, pp. 95-96, Sociedade Brasileira de Computação, Curitiba, Brazil, 2022. @inproceedings{ANDRADE:ERAD:22,
title = {Avaliação do Esforço de Programação em GPU: Estudo Piloto},
author = {Gabriella Andrade and Dalvan Griebler and Luiz Gustavo Fernandes},
url = {https://doi.org/10.5753/eradrs.2022.19179},
doi = {10.5753/eradrs.2022.19179},
year = {2022},
date = {2022-04-01},
booktitle = {Anais da XXII Escola Regional de Alto Desempenho da Região Sul},
pages = {95-96},
publisher = {Sociedade Brasileira de Computação},
address = {Curitiba, Brazil},
abstract = {Developing applications for GPUs is not an easy task, as it requires deeper knowledge of the architecture. In this work, we conducted a pilot study to evaluate the effort of non-expert programmers when developing GPU applications. The results revealed that GSParLib requires less effort than the other parallel programming interfaces. However, further investigation is needed to complement the study.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Developing applications for GPUs is not an easy task, as it requires deeper knowledge of the architecture. In this work, we conducted a pilot study to evaluate the effort of non-expert programmers when developing GPU applications. The results revealed that GSParLib requires less effort than the other parallel programming interfaces. However, further investigation is needed to complement the study. |
| Garcia, Adriano Marques; Griebler, Dalvan; Schepke, Claudio; Fernandes, Luiz Gustavo Um Framework para Criar Benchmarks de Aplicações Paralelas de Stream Inproceedings doi In: Anais da XXII Escola Regional de Alto Desempenho da Região Sul, pp. 97-98, Sociedade Brasileira de Computação, Curitiba, Brazil, 2022. @inproceedings{GARCIA:ERAD:22,
title = {Um Framework para Criar Benchmarks de Aplicações Paralelas de Stream},
author = {Adriano Marques Garcia and Dalvan Griebler and Claudio Schepke and Luiz Gustavo Fernandes},
url = {https://doi.org/10.5753/eradrs.2022.19180},
doi = {10.5753/eradrs.2022.19180},
year = {2022},
date = {2022-04-01},
booktitle = {Anais da XXII Escola Regional de Alto Desempenho da Região Sul},
pages = {97-98},
publisher = {Sociedade Brasileira de Computação},
address = {Curitiba, Brazil},
abstract = {This work presents SPBench, a framework for developing stream processing benchmarks in C++. SPBench provides a set of realistic applications through high-level abstractions and allows customization of the input data and performance metrics.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
This work presents SPBench, a framework for developing stream processing benchmarks in C++. SPBench provides a set of realistic applications through high-level abstractions and allows customization of the input data and performance metrics. |
| Faé, Leonardo; Griebler, Dalvan; Manssour, Isabel Aplicação de Vídeo com Flink, Storm e SPar em Multicores Inproceedings doi In: Anais da XXII Escola Regional de Alto Desempenho da Região Sul, pp. 13-16, Sociedade Brasileira de Computação, Curitiba, Brazil, 2022. @inproceedings{FAE:ERAD:22,
title = {Aplicação de Vídeo com Flink, Storm e SPar em Multicores},
author = {Leonardo Faé and Dalvan Griebler and Isabel Manssour},
url = {https://doi.org/10.5753/eradrs.2022.19149},
doi = {10.5753/eradrs.2022.19149},
year = {2022},
date = {2022-04-01},
booktitle = {Anais da XXII Escola Regional de Alto Desempenho da Região Sul},
pages = {13-16},
publisher = {Sociedade Brasileira de Computação},
address = {Curitiba, Brazil},
abstract = {This work presents performance comparisons among the SPar, Apache Flink, and Apache Storm programming interfaces for the execution of a video processing application. The results reveal that the SPar versions achieve superior performance, while Apache Storm showed the worst performance.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
This work presents performance comparisons among the SPar, Apache Flink, and Apache Storm programming interfaces for the execution of a video processing application. The results reveal that the SPar versions achieve superior performance, while Apache Storm showed the worst performance. |
| Müller, Caetano; Löff, Junior; Griebler, Dalvan; Eizirik, Eduardo Avaliação da aplicação de paralelismo em classificadores taxonômicos usando Qiime2 Inproceedings doi In: Anais da XXII Escola Regional de Alto Desempenho da Região Sul, pp. 25-28, Sociedade Brasileira de Computação, Curitiba, Brazil, 2022. @inproceedings{MULLER:ERAD:22,
title = {Avaliação da aplicação de paralelismo em classificadores taxonômicos usando Qiime2},
author = {Caetano Müller and Junior Löff and Dalvan Griebler and Eduardo Eizirik},
url = {https://doi.org/10.5753/eradrs.2022.19152},
doi = {10.5753/eradrs.2022.19152},
year = {2022},
date = {2022-04-01},
booktitle = {Anais da XXII Escola Regional de Alto Desempenho da Região Sul},
pages = {25-28},
publisher = {Sociedade Brasileira de Computação},
address = {Curitiba, Brazil},
abstract = {The classification of DNA sequences using machine learning algorithms still has room to improve, both in the quality of the results and in the computational efficiency of the algorithms. In this work, we carried out a performance evaluation of two machine learning algorithms of the Qiime2 tool for DNA sequence classification. The results show that performance improved by up to 9.65 times when using 9 threads.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
The classification of DNA sequences using machine learning algorithms still has room to improve, both in the quality of the results and in the computational efficiency of the algorithms. In this work, we carried out a performance evaluation of two machine learning algorithms of the Qiime2 tool for DNA sequence classification. The results show that performance improved by up to 9.65 times when using 9 threads. |
| Löff, Júnior; Griebler, Dalvan; Fernandes, Luiz Gustavo Proposta de Framework para Processamento de Stream Distribuído em C++ utilizando o MPI Inproceedings doi In: Anais da XXII Escola Regional de Alto Desempenho da Região Sul, pp. 91-92, Sociedade Brasileira de Computação, Curitiba, Brazil, 2022. @inproceedings{LOFF:ERAD:22,
title = {Proposta de Framework para Processamento de Stream Distribuído em C++ utilizando o MPI},
author = {Júnior Löff and Dalvan Griebler and Luiz Gustavo Fernandes},
url = {https://doi.org/10.5753/eradrs.2022.19177},
doi = {10.5753/eradrs.2022.19177},
year = {2022},
date = {2022-04-01},
booktitle = {Anais da XXII Escola Regional de Alto Desempenho da Região Sul},
pages = {91-92},
publisher = {Sociedade Brasileira de Computação},
address = {Curitiba, Brazil},
abstract = {This work presents a proposal for a distributed stream processing framework in C++ using MPI. The initial stage of the study covers the research problem and the design of the framework's architecture.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
This work presents a proposal for a distributed stream processing framework in C++ using MPI. The initial stage of the study covers the research problem and the design of the framework's architecture. |
| Hoffmann, Renato Barreto; Griebler, Dalvan; Fernandes, Luiz Gustavo Towards Efficient Stream Parallelism for Embedded Devices Inproceedings doi In: Anais da XXII Escola Regional de Alto Desempenho da Região Sul, pp. 62-64, Sociedade Brasileira de Computação, Curitiba, Brazil, 2022. @inproceedings{HOFFMANN:ERAD:22,
title = {Towards Efficient Stream Parallelism for Embedded Devices},
author = {Renato Barreto Hoffmann and Dalvan Griebler and Luiz Gustavo Fernandes},
url = {https://doi.org/10.5753/eradrs.2022.19163},
doi = {10.5753/eradrs.2022.19163},
year = {2022},
date = {2022-04-01},
booktitle = {Anais da XXII Escola Regional de Alto Desempenho da Região Sul},
pages = {62-64},
publisher = {Sociedade Brasileira de Computação},
address = {Curitiba, Brazil},
abstract = {Stream processing applications process raw data-flows to reveal insightful information. Efficiently coordinating the requirements of these applications is a challenge. We propose investigating high-level software solutions for these applications to achieve efficiency and high performance for embedded devices.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Stream processing applications process raw data-flows to reveal insightful information. Efficiently coordinating the requirements of these applications is a challenge. We propose investigating high-level software solutions for these applications to achieve efficiency and high performance for embedded devices. |
| Araujo, Gabriell; Griebler, Dalvan; Fernandes, Luiz Gustavo Provendo melhorias na GSParLib Inproceedings doi In: Anais da XXII Escola Regional de Alto Desempenho da Região Sul, pp. 113-114, Sociedade Brasileira de Computação, Curitiba, Brazil, 2022. @inproceedings{ARAUJO:ERAD:22,
title = {Provendo melhorias na GSParLib},
author = {Gabriell Araujo and Dalvan Griebler and Luiz Gustavo Fernandes},
url = {https://doi.org/10.5753/eradrs.2022.19188},
doi = {10.5753/eradrs.2022.19188},
year = {2022},
date = {2022-04-01},
booktitle = {Anais da XXII Escola Regional de Alto Desempenho da Região Sul},
pages = {113-114},
publisher = {Sociedade Brasileira de Computação},
address = {Curitiba, Brazil},
abstract = {This work presents partial results of an ongoing study to improve the programmability and performance of GSParLib, a parallel programming framework for GPUs.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
This work presents partial results of an ongoing study to improve the programmability and performance of GSParLib, a parallel programming framework for GPUs. |
| Scheer, Claudio; Araujo, Gabriell; Griebler, Dalvan; Meneguzzi, Felipe; Fernandes, Luiz Gustavo Encontrando a Configuração de Threads por Bloco para os Kernels NPB-CUDA com Q-Learning Inproceedings doi In: Anais da XXII Escola Regional de Alto Desempenho da Região Sul, pp. 119-120, Sociedade Brasileira de Computação, Curitiba, Brazil, 2022. @inproceedings{SCHEER:ERAD:22,
title = {Encontrando a Configuração de Threads por Bloco para os Kernels NPB-CUDA com Q-Learning},
author = {Claudio Scheer and Gabriell Araujo and Dalvan Griebler and Felipe Meneguzzi and Luiz Gustavo Fernandes},
url = {https://doi.org/10.5753/eradrs.2022.19191},
doi = {10.5753/eradrs.2022.19191},
year = {2022},
date = {2022-04-01},
booktitle = {Anais da XXII Escola Regional de Alto Desempenho da Região Sul},
pages = {119-120},
publisher = {Sociedade Brasileira de Computação},
address = {Curitiba, Brazil},
abstract = {This work presents a new method that uses machine learning to predict the best threads-per-block configuration for GPU applications. The results were similar to those of manual strategies.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
This work presents a new method that uses machine learning to predict the best threads-per-block configuration for GPU applications. The results were similar to those of manual strategies. |
| Fim, Gabriel; Welter, Greice; Löff, Júnior; Griebler, Dalvan Compressão de Dados em Clusters HPC com Flink, MPI e SPar Inproceedings doi In: Anais da XXII Escola Regional de Alto Desempenho da Região Sul, pp. 29-32, Sociedade Brasileira de Computação, Curitiba, Brazil, 2022. @inproceedings{larcc:FIM:ERAD:22,
title = {Compressão de Dados em Clusters HPC com Flink, MPI e SPar},
author = {Gabriel Fim and Greice Welter and Júnior Löff and Dalvan Griebler},
url = {https://doi.org/10.5753/eradrs.2022.19153},
doi = {10.5753/eradrs.2022.19153},
year = {2022},
date = {2022-04-01},
booktitle = {Anais da XXII Escola Regional de Alto Desempenho da Região Sul},
pages = {29-32},
publisher = {Sociedade Brasileira de Computação},
address = {Curitiba, Brazil},
abstract = {This work evaluates the performance of the Bzip2 data compression algorithm with the stream processing tools Apache Flink, MPI, and SPar on a Beowulf cluster. The results show that the best-performing versions relative to the sequential time are MPI and SPar, with speedups of 7.6 and 7.2, respectively.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
This work evaluates the performance of the Bzip2 data compression algorithm with the stream processing tools Apache Flink, MPI, and SPar on a Beowulf cluster. The results show that the best-performing versions relative to the sequential time are MPI and SPar, with speedups of 7.6 and 7.2, respectively. |
 | Gomes, Márcio Miguel; Righi, Rodrigo Rosa; Costa, Cristiano André; Griebler, Dalvan Steam++: An Extensible End-to-end Framework for Developing IoT Data Processing Applications in the Fog Journal Article doi In: International Journal of Computer Science & Information Technology, vol. 14, no. 1, pp. 31-51, 2022. @article{GOMES:IJCSIT:22,
title = {Steam++: An Extensible End-to-end Framework for Developing IoT Data Processing Applications in the Fog},
author = {Márcio Miguel Gomes and Rodrigo Rosa Righi and Cristiano André Costa and Dalvan Griebler},
url = {http://dx.doi.org/10.5121/ijcsit.2022.14103},
doi = {10.5121/ijcsit.2022.14103},
year = {2022},
date = {2022-02-01},
urldate = {2022-02-01},
journal = {International Journal of Computer Science & Information Technology},
volume = {14},
number = {1},
pages = {31-51},
publisher = {AIRCC},
abstract = {IoT applications usually rely on cloud computing services to perform data analysis such as filtering, aggregation, classification, pattern detection, and prediction. When applied to specific domains, the IoT needs to deal with unique constraints. Besides hostile environments with vibration and electromagnetic interference, which result in malfunction, noise, and data loss, industrial plants often have restricted or unavailable Internet access, forcing us to design stand-alone fog and edge computing solutions. In this context, we present STEAM++, a lightweight and extensible framework for real-time data stream processing and decision-making at the network edge, targeting hardware-limited devices, and we propose a micro-benchmark methodology for assessing embedded IoT applications. In real-case experiments in the semiconductor industry, we processed an entire data flow: sensing values, processing and analysing data, detecting relevant events, and finally publishing results to a dashboard. On average, the application consumed less than 500 KB of RAM and 1.0% of CPU, processing up to 239 data packets per second and reducing the output data size to 14% of the raw input size when notifying events.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
IoT applications usually rely on cloud computing services to perform data analysis such as filtering, aggregation, classification, pattern detection, and prediction. When applied to specific domains, the IoT needs to deal with unique constraints. Besides hostile environments with vibration and electromagnetic interference, which result in malfunction, noise, and data loss, industrial plants often have restricted or unavailable Internet access, forcing us to design stand-alone fog and edge computing solutions. In this context, we present STEAM++, a lightweight and extensible framework for real-time data stream processing and decision-making at the network edge, targeting hardware-limited devices, and we propose a micro-benchmark methodology for assessing embedded IoT applications. In real-case experiments in the semiconductor industry, we processed an entire data flow: sensing values, processing and analysing data, detecting relevant events, and finally publishing results to a dashboard. On average, the application consumed less than 500 KB of RAM and 1.0% of CPU, processing up to 239 data packets per second and reducing the output data size to 14% of the raw input size when notifying events. |
 | Hoffmann, Renato Barreto; Löff, Júnior; Griebler, Dalvan; Fernandes, Luiz Gustavo OpenMP as runtime for providing high-level stream parallelism on multi-cores Journal Article doi In: The Journal of Supercomputing, vol. 78, no. 1, pp. 7655-7676, 2022. @article{HOFFMANN:Jsuper:2022,
title = {OpenMP as runtime for providing high-level stream parallelism on multi-cores},
author = {Renato Barreto Hoffmann and Júnior Löff and Dalvan Griebler and Luiz Gustavo Fernandes},
url = {https://doi.org/10.1007/s11227-021-04182-9},
doi = {10.1007/s11227-021-04182-9},
year = {2022},
date = {2022-01-01},
journal = {The Journal of Supercomputing},
volume = {78},
number = {1},
pages = {7655-7676},
publisher = {Springer},
address = {New York, United States},
abstract = {OpenMP is an industry and academic standard for parallel programming. However, using it for developing parallel stream processing applications is complex and challenging. OpenMP lacks key programming mechanisms and abstractions for this particular domain. To tackle this problem, we used a high-level parallel programming framework (named SPar) for automatically generating parallel OpenMP code. We achieved this by leveraging SPar’s language and its domain-specific code annotations for simplifying the complexity and verbosity added by OpenMP in this application domain. Consequently, we implemented a new compiler algorithm in SPar for automatically generating parallel code targeting the OpenMP runtime using source-to-source code transformations. The experiments in four different stream processing applications demonstrated that the execution time of SPar was improved up to 25.42% when using the OpenMP runtime. Additionally, our abstraction over OpenMP introduced at most 1.72% execution time overhead when compared to handwritten parallel codes. Furthermore, SPar significantly reduces the total source lines of code required to express parallelism with respect to plain OpenMP parallel codes.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
OpenMP is an industry and academic standard for parallel programming. However, using it for developing parallel stream processing applications is complex and challenging. OpenMP lacks key programming mechanisms and abstractions for this particular domain. To tackle this problem, we used a high-level parallel programming framework (named SPar) for automatically generating parallel OpenMP code. We achieved this by leveraging SPar’s language and its domain-specific code annotations for simplifying the complexity and verbosity added by OpenMP in this application domain. Consequently, we implemented a new compiler algorithm in SPar for automatically generating parallel code targeting the OpenMP runtime using source-to-source code transformations. The experiments in four different stream processing applications demonstrated that the execution time of SPar was improved up to 25.42% when using the OpenMP runtime. Additionally, our abstraction over OpenMP introduced at most 1.72% execution time overhead when compared to handwritten parallel codes. Furthermore, SPar significantly reduces the total source lines of code required to express parallelism with respect to plain OpenMP parallel codes. |
 | Löff, Júnior; Hoffmann, Renato Barreto; Pieper, Ricardo; Griebler, Dalvan; Fernandes, Luiz Gustavo DSParLib: A C++ Template Library for Distributed Stream Parallelism Journal Article doi In: International Journal of Parallel Programming, vol. 50, no. 5, pp. 454-485, 2022. @article{LOFF:IJPP:22,
title = {DSParLib: A C++ Template Library for Distributed Stream Parallelism},
author = {Júnior Löff and Renato Barreto Hoffmann and Ricardo Pieper and Dalvan Griebler and Luiz Gustavo Fernandes},
url = {https://doi.org/10.1007/s10766-022-00737-2},
doi = {10.1007/s10766-022-00737-2},
year = {2022},
date = {2022-01-01},
journal = {International Journal of Parallel Programming},
volume = {50},
number = {5},
pages = {454-485},
publisher = {Springer},
abstract = {Stream processing applications deal with millions of data items continuously generated over time. Often, they must be processed in real-time and scale performance, which requires the use of distributed parallel computing resources. In C/C++, the current state-of-the-art for distributed architectures and High-Performance Computing is Message Passing Interface (MPI). However, exploiting stream parallelism using MPI is complex and error-prone because it exposes many low-level details to the programmer. In this work, we introduce a new parallel programming abstraction for implementing distributed stream parallelism named DSParLib. Our abstraction of MPI simplifies parallel programming by providing a pattern-based and building block-oriented development to inter-connect, model, and parallelize data streams found in modern applications. Experiments conducted with five different stream processing applications and the representative PARSEC Ferret benchmark revealed that DSParLib is efficient and flexible. Also, DSParLib achieved similar or better performance, required less coding, and provided simpler abstractions to express parallelism with respect to handwritten MPI programs.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Stream processing applications deal with millions of data items continuously generated over time. Often, they must be processed in real-time and scale performance, which requires the use of distributed parallel computing resources. In C/C++, the current state-of-the-art for distributed architectures and High-Performance Computing is Message Passing Interface (MPI). However, exploiting stream parallelism using MPI is complex and error-prone because it exposes many low-level details to the programmer. In this work, we introduce a new parallel programming abstraction for implementing distributed stream parallelism named DSParLib. Our abstraction of MPI simplifies parallel programming by providing a pattern-based and building block-oriented development to inter-connect, model, and parallelize data streams found in modern applications. Experiments conducted with five different stream processing applications and the representative PARSEC Ferret benchmark revealed that DSParLib is efficient and flexible. Also, DSParLib achieved similar or better performance, required less coding, and provided simpler abstractions to express parallelism with respect to handwritten MPI programs. |
2021
|
| Löff, Júnior; Hoffmann, Renato Barreto; Griebler, Dalvan; Fernandes, Luiz G. High-Level Stream and Data Parallelism in C++ for Multi-Cores Inproceedings doi In: XXV Brazilian Symposium on Programming Languages (SBLP), pp. 41-48, ACM, Joinville, Brazil, 2021. @inproceedings{LOFF:SBLP:21,
title = {High-Level Stream and Data Parallelism in C++ for Multi-Cores},
author = {Júnior Löff and Renato Barreto Hoffmann and Dalvan Griebler and Luiz G. Fernandes},
url = {https://doi.org/10.1145/3475061.3475078},
doi = {10.1145/3475061.3475078},
year = {2021},
date = {2021-10-01},
booktitle = {XXV Brazilian Symposium on Programming Languages (SBLP)},
pages = {41-48},
publisher = {ACM},
address = {Joinville, Brazil},
series = {SBLP'21},
abstract = {Stream processing applications have seen increasing demand with the growing availability of sensors, IoT devices, and user data. Modern systems can generate millions of data items per day that must be processed in a timely manner. To deal with this demand, application programmers must consider parallelism to exploit the maximum performance of the underlying hardware resources. However, parallel programming is often difficult and error-prone, because programmers must deal with low-level system and architecture details. In this work, we introduce a new strategy for automatic data-parallel code generation in C++ targeting multi-core architectures. This strategy was integrated with an annotation-based parallel programming abstraction named SPar. We have increased SPar’s expressiveness for supporting stream and data parallelism, and their arbitrary composition. Therefore, we added two new attributes to its language and improved the compiler parallel code generation. We conducted a set of experiments on different stream and data-parallel applications to assess the efficiency of our solution. The results showed that the new SPar version obtained similar performance with respect to handwritten parallelizations. Moreover, the new SPar version is able to achieve up to 74.9x better performance with respect to the original ones due to this work.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Stream processing applications have seen increasing demand with the growing availability of sensors, IoT devices, and user data. Modern systems can generate millions of data items per day that must be processed in a timely manner. To deal with this demand, application programmers must consider parallelism to exploit the maximum performance of the underlying hardware resources. However, parallel programming is often difficult and error-prone, because programmers must deal with low-level system and architecture details. In this work, we introduce a new strategy for automatic data-parallel code generation in C++ targeting multi-core architectures. This strategy was integrated with an annotation-based parallel programming abstraction named SPar. We have increased SPar’s expressiveness for supporting stream and data parallelism, and their arbitrary composition. Therefore, we added two new attributes to its language and improved the compiler parallel code generation. We conducted a set of experiments on different stream and data-parallel applications to assess the efficiency of our solution. The results showed that the new SPar version obtained similar performance with respect to handwritten parallelizations. Moreover, the new SPar version is able to achieve up to 74.9x better performance with respect to the original ones due to this work. |
| Andrade, Gabriella; Griebler, Dalvan; Santos, Rodrigo; Danelutto, Marco; Fernandes, Luiz Gustavo Assessing Coding Metrics for Parallel Programming of Stream Processing Programs on Multi-cores Inproceedings doi In: 47th Euromicro Conference on Software Engineering and Advanced Applications (SEAA 2021), pp. 291-295, IEEE, Pavia, Italy, 2021. @inproceedings{ANDRADE:SEAA:21,
title = {Assessing Coding Metrics for Parallel Programming of Stream Processing Programs on Multi-cores},
author = {Gabriella Andrade and Dalvan Griebler and Rodrigo Santos and Marco Danelutto and Luiz Gustavo Fernandes},
url = {https://doi.org/10.1109/SEAA53835.2021.00044},
doi = {10.1109/SEAA53835.2021.00044},
year = {2021},
date = {2021-09-01},
booktitle = {47th Euromicro Conference on Software Engineering and Advanced Applications (SEAA 2021)},
pages = {291-295},
publisher = {IEEE},
address = {Pavia, Italy},
series = {SEAA'21},
abstract = {From the popularization of multi-core architectures, several parallel APIs have emerged, helping to abstract the programming complexity and increasing productivity in application development. Unfortunately, only a few research efforts in this direction managed to show the usability pay-back of the programming abstraction created, because it is not easy and poses many challenges for conducting empirical software engineering. We believe that coding metrics commonly used in software engineering code measurements can give useful indicators on the programming effort of parallel applications and APIs. These metrics were designed for general purposes without considering the evaluation of applications from a specific domain. In this study, we aim to evaluate the feasibility of seven coding metrics to be used in the parallel programming domain. To do so, five stream processing applications implemented with different parallel APIs for multi-cores were considered. Our experiments have shown COCOMO II is a suitable model for evaluating the productivity of different parallel APIs targeting multi-cores on stream processing applications while other metrics are restricted to the code size.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
From the popularization of multi-core architectures, several parallel APIs have emerged, helping to abstract the programming complexity and increasing productivity in application development. Unfortunately, only a few research efforts in this direction managed to show the usability pay-back of the programming abstraction created, because it is not easy and poses many challenges for conducting empirical software engineering. We believe that coding metrics commonly used in software engineering code measurements can give useful indicators on the programming effort of parallel applications and APIs. These metrics were designed for general purposes without considering the evaluation of applications from a specific domain. In this study, we aim to evaluate the feasibility of seven coding metrics to be used in the parallel programming domain. To do so, five stream processing applications implemented with different parallel APIs for multi-cores were considered. Our experiments have shown COCOMO II is a suitable model for evaluating the productivity of different parallel APIs targeting multi-cores on stream processing applications while other metrics are restricted to the code size. |
 | Löff, Júnior; Griebler, Dalvan; Mencagli, Gabriele; Araujo, Gabriell; Torquati, Massimo; Danelutto, Marco; Fernandes, Luiz Gustavo The NAS parallel benchmarks for evaluating C++ parallel programming frameworks on shared-memory architectures Journal Article doi In: Future Generation Computer Systems, vol. 125, pp. 743-757, 2021. @article{LOFF:FGCS:21,
title = {The NAS parallel benchmarks for evaluating C++ parallel programming frameworks on shared-memory architectures},
author = {Júnior Löff and Dalvan Griebler and Gabriele Mencagli and Gabriell Araujo and Massimo Torquati and Marco Danelutto and Luiz Gustavo Fernandes},
url = {https://doi.org/10.1016/j.future.2021.07.021},
doi = {10.1016/j.future.2021.07.021},
year = {2021},
date = {2021-07-01},
journal = {Future Generation Computer Systems},
volume = {125},
pages = {743-757},
publisher = {Elsevier},
abstract = {The NAS Parallel Benchmarks (NPB), originally implemented mostly in Fortran, is a consolidated suite containing several benchmarks extracted from Computational Fluid Dynamics (CFD) models. The benchmark suite has important characteristics such as intensive memory communications, complex data dependencies, different memory access patterns, and hardware components/sub-systems overload. Parallel programming APIs, libraries, and frameworks that are written in C++ as well as new optimizations and parallel processing techniques can benefit if NPB is made fully available in this programming language. In this paper we present NPB-CPP, a fully C++ translated version of NPB consisting of all the NPB kernels and pseudo-applications developed using OpenMP, Intel TBB, and FastFlow parallel frameworks for multicores. The design of NPB-CPP leverages the Structured Parallel Programming methodology (essentially based on parallel design patterns). We show the structure of each benchmark application in terms of composition of few patterns (notably Map and MapReduce constructs) provided by the selected C++ frameworks. The experimental evaluation shows the accuracy of NPB-CPP with respect to the original NPB source code. Furthermore, we carefully evaluate the parallel performance on three multi-core systems (Intel, IBM Power and AMD) with different C++ compilers (gcc, icc and clang) by discussing the performance differences in order to give to the researchers useful insights to choose the best parallel programming framework for a given type of problem.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
The NAS Parallel Benchmarks (NPB), originally implemented mostly in Fortran, is a consolidated suite containing several benchmarks extracted from Computational Fluid Dynamics (CFD) models. The benchmark suite has important characteristics such as intensive memory communications, complex data dependencies, different memory access patterns, and hardware components/sub-systems overload. Parallel programming APIs, libraries, and frameworks that are written in C++ as well as new optimizations and parallel processing techniques can benefit if NPB is made fully available in this programming language. In this paper we present NPB-CPP, a fully C++ translated version of NPB consisting of all the NPB kernels and pseudo-applications developed using OpenMP, Intel TBB, and FastFlow parallel frameworks for multicores. The design of NPB-CPP leverages the Structured Parallel Programming methodology (essentially based on parallel design patterns). We show the structure of each benchmark application in terms of composition of few patterns (notably Map and MapReduce constructs) provided by the selected C++ frameworks. The experimental evaluation shows the accuracy of NPB-CPP with respect to the original NPB source code. Furthermore, we carefully evaluate the parallel performance on three multi-core systems (Intel, IBM Power and AMD) with different C++ compilers (gcc, icc and clang) by discussing the performance differences in order to give to the researchers useful insights to choose the best parallel programming framework for a given type of problem. |
 | Pieper, Ricardo; Löff, Júnior; Hoffmann, Renato Berreto; Griebler, Dalvan; Fernandes, Luiz Gustavo High-level and Efficient Structured Stream Parallelism for Rust on Multi-cores Journal Article doi In: Journal of Computer Languages, vol. 65, pp. 101054, 2021. @article{PIEPER:COLA:21,
title = {High-level and Efficient Structured Stream Parallelism for Rust on Multi-cores},
author = {Ricardo Pieper and Júnior Löff and Renato Berreto Hoffmann and Dalvan Griebler and Luiz Gustavo Fernandes},
url = {https://doi.org/10.1016/j.cola.2021.101054},
doi = {10.1016/j.cola.2021.101054},
year = {2021},
date = {2021-07-01},
journal = {Journal of Computer Languages},
volume = {65},
pages = {101054},
publisher = {Elsevier},
abstract = {This work contributes a structured parallel programming abstraction for Rust that provides ready-to-use parallel patterns abstracting low-level and architecture-dependent details from application programmers. We focus on stream processing applications running on shared-memory multi-core architectures (e.g., video processing, compression, and others). Therefore, we provide a new high-level and efficient parallel programming abstraction for expressing stream parallelism, named Rust-SSP. We also created a new stream benchmark suite for Rust that represents real-world scenarios and has different application characteristics and workloads. Our benchmark suite is an initiative to assess existing parallelism abstractions for this domain, as parallel implementations using these abstractions were provided. The results revealed that Rust-SSP achieved up to 41.1% better performance than other solutions. In terms of programmability, the results revealed that Rust-SSP requires the smallest number of extra lines of code to enable stream parallelism.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
 | Gomes, Márcio Miguel; Righi, Rodrigo Rosa; Costa, Cristiano André; Griebler, Dalvan Simplifying IoT data stream enrichment and analytics in the edge Journal Article doi In: Computers & Electrical Engineering, vol. 92, pp. 107110, 2021. @article{GOMES:CEE:21,
title = {Simplifying IoT data stream enrichment and analytics in the edge},
author = {Márcio Miguel Gomes and Rodrigo Rosa Righi and Cristiano André Costa and Dalvan Griebler},
url = {https://doi.org/10.1016/j.compeleceng.2021.107110},
doi = {10.1016/j.compeleceng.2021.107110},
year = {2021},
date = {2021-06-01},
urldate = {2021-06-01},
journal = {Computers & Electrical Engineering},
volume = {92},
pages = {107110},
publisher = {Elsevier},
abstract = {Edge devices are usually limited in resources. They often send data to the cloud, where techniques such as filtering, aggregation, classification, pattern detection, and prediction are performed. This process results in critical issues such as data loss, high response time, and overhead. On the other hand, processing data in the edge is not a simple task due to devices’ heterogeneity, resource limitations, and a variety of programming languages and standards. In this context, this work proposes STEAM, a framework for developing data stream processing applications in the edge targeting hardware-limited devices. As the main contribution, STEAM enables the development of applications for different platforms, with standardized functions and class structures that use consolidated IoT data formats and communication protocols. Moreover, the experiments revealed the viability of stream processing in the edge, resulting in a reduction of response time without compromising the quality of results.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
| Hoffmann, Renato Barreto; Griebler, Dalvan; Fernandes, Luiz Gustavo Geração de Código OpenMP para o Paralelismo de Stream Journal Article In: Revista Eletrônica de Iniciação Científica em Computação, vol. 19, no. 2, pp. 2082, 2021. @article{HOFFMANN:REIC:21,
title = {Geração de Código OpenMP para o Paralelismo de Stream},
author = {Renato Barreto Hoffmann and Dalvan Griebler and Luiz Gustavo Fernandes},
url = {https://sol.sbc.org.br/journals/index.php/reic/article/view/2082},
year = {2021},
date = {2021-06-01},
journal = {Revista Eletrônica de Iniciação Científica em Computação},
volume = {19},
number = {2},
pages = {2082},
publisher = {Sociedade Brasileira de Computação (SBC)},
address = {Porto Alegre},
abstract = {OpenMP é uma interface para a programação paralela padrão e amplamente usada na indústria e academia, porém, torna-se complexa quando usada para desenvolver aplicações paralelas de fluxo de dados ou stream. Para resolver esse problema, foi proposto usar uma interface de programação paralela de alto nível (chamada SPar) e seu compilador para a geração de código estruturado de mais baixo nível com OpenMP em aplicações de fluxo de dados. O objetivo é diminuir a complexidade e verbosidade introduzida pelo OpenMP nas aplicações de stream. Nos experimentos em 4 aplicações, notou-se uma redução no tempo de execução de até 25,42%. Além do mais, requer-se um número de linhas de código fonte menor para expressar o paralelismo.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
| Löff, Júnior; Griebler, Dalvan; Fernandes, Luiz Gustavo Melhorando a Geração Automática de Código Paralelo para o Paradigma de Processamento de Stream em Multi-cores Journal Article In: Revista Eletrônica de Iniciação Científica em Computação, vol. 19, no. 2, pp. 2083, 2021. @article{LOFF:REIC:21,
title = {Melhorando a Geração Automática de Código Paralelo para o Paradigma de Processamento de Stream em Multi-cores},
author = {Júnior Löff and Dalvan Griebler and Luiz Gustavo Fernandes},
url = {https://sol.sbc.org.br/journals/index.php/reic/article/view/2083},
year = {2021},
date = {2021-06-01},
journal = {Revista Eletrônica de Iniciação Científica em Computação},
volume = {19},
number = {2},
pages = {2083},
publisher = {Sociedade Brasileira de Computação (SBC)},
address = {Porto Alegre},
abstract = {A programação paralela ainda é um desafio para desenvolvedores, pois exibe demasiados detalhes de baixo nível e de sistemas operacionais. Programadores precisam lidar com detalhes como escalonamento, balanceamento de carga e sincronizações. Esse trabalho contribui com otimizações para uma abstração de programação paralela para expressar paralelismo de stream em multi-cores. O trabalho estendeu a SPar adicionando dois novos atributos na sua linguagem, e implementou melhorias no seu compilador a fim de proporcionar melhor desempenho ao código paralelo gerado automaticamente. Os experimentos revelaram que a nova versão da SPar consegue abstrair detalhes do paralelismo com desempenho similar às versões paralelizadas manualmente.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
| Maliszewski, Anderson M. Ambiente de Nuvem Computacional Privada para Teste e Desenvolvimento de Programas Paralelos Incollection doi In: Charão, Andrea; Serpa, Matheus (Ed.): Minicursos da XXI Escola Regional de Alto Desempenho da Região Sul, pp. 104-128, Sociedade Brasileira de Computação (SBC), Porto Alegre, 2021. @incollection{larcc:minicurso:ERAD:21,
title = {Ambiente de Nuvem Computacional Privada para Teste e Desenvolvimento de Programas Paralelos},
author = {Anderson M. Maliszewski},
editor = {Andrea Charão and Matheus Serpa},
url = {https://doi.org/10.5753/sbc.6150.4},
doi = {10.5753/sbc.6150.4},
year = {2021},
date = {2021-06-01},
booktitle = {Minicursos da XXI Escola Regional de Alto Desempenho da Região Sul},
pages = {104-128},
publisher = {Sociedade Brasileira de Computação (SBC)},
address = {Porto Alegre},
chapter = {5},
abstract = {A computação de alto desempenho costuma utilizar agregados de computadores para a execução de aplicações paralelas. Alternativamente, a computação em nuvem oferece recursos computacionais distribuídos para processamento com um nível de abstração além do tradicional, dinâmico e sob demanda. Este capítulo tem como objetivo introduzir conceitos básicos, apresentar noções básicas para implantar uma nuvem privada e demonstrar os benefícios para o desenvolvimento e teste de programas paralelos em nuvem.},
keywords = {},
pubstate = {published},
tppubtype = {incollection}
}
 | Vogel, Adriano; Mencagli, Gabriele; Griebler, Dalvan; Danelutto, Marco; Fernandes, Luiz Gustavo Online and Transparent Self-adaptation of Stream Parallel Patterns Journal Article doi In: Computing, vol. 105, no. 5, pp. 1039-1057, 2021. @article{VOGEL:Computing:23,
title = {Online and Transparent Self-adaptation of Stream Parallel Patterns},
author = {Adriano Vogel and Gabriele Mencagli and Dalvan Griebler and Marco Danelutto and Luiz Gustavo Fernandes},
url = {https://doi.org/10.1007/s00607-021-00998-8},
doi = {10.1007/s00607-021-00998-8},
year = {2021},
date = {2021-05-01},
journal = {Computing},
volume = {105},
number = {5},
pages = {1039-1057},
publisher = {Springer},
abstract = {Several real-world parallel applications are becoming more dynamic and long-running, demanding online (at run-time) adaptations. Stream processing is a representative scenario that computes data items arriving in real-time and where parallel executions are necessary. However, it is challenging for humans to monitor and manually self-optimize complex and long-running parallel executions continuously. Moreover, although high-level and structured parallel programming aims to facilitate parallelism, several issues still need to be addressed for improving the existing abstractions. In this paper, we extend self-adaptiveness for supporting autonomous and online changes of the parallel pattern compositions. Online self-adaptation is achieved with an online profiler that characterizes the applications, which is combined with a new self-adaptive strategy and a model for smooth transitions on reconfigurations. The solution provides a new abstraction layer that enables application programmers to define non-functional requirements instead of hand-tuning complex configurations. Hence, we contribute with additional abstractions and flexible self-adaptation for responsiveness at run-time. The proposed solution is evaluated with applications having different processing characteristics, workloads, and configurations. The results show that it is possible to provide additional abstractions, flexibility, and responsiveness while achieving performance comparable to the best static configuration executions.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
| Dopke, Luan; Rockenbach, Dinei André; Griebler, Dalvan Avaliação de Desempenho para Banco de Dados com Genoma em Nuvem Privada Inproceedings doi In: 21st Escola Regional de Alto Desempenho da Região Sul (ERAD-RS), pp. 45-48, Sociedade Brasileira de Computação, Joinville, RS, Brazil, 2021. @inproceedings{larcc:cloud_DNA_databases:ERAD:21,
title = {Avaliação de Desempenho para Banco de Dados com Genoma em Nuvem Privada},
author = {Luan Dopke and Dinei André Rockenbach and Dalvan Griebler},
url = {https://doi.org/10.5753/eradrs.2021.14771},
doi = {10.5753/eradrs.2021.14771},
year = {2021},
date = {2021-04-01},
booktitle = {21st Escola Regional de Alto Desempenho da Região Sul (ERAD-RS)},
pages = {45-48},
publisher = {Sociedade Brasileira de Computação},
address = {Joinville, RS, Brazil},
abstract = {Os bancos de dados são ferramentas particularmente interessantes para a manipulação de dados gerados através do sequenciamento de DNA. Este artigo tem como objetivo avaliar o desempenho de três bancos de dados com cargas relacionadas ao sequenciamento de DNA: PostgreSQL e MySQL como bancos de dados relacionais e MongoDB como banco de dados NoSQL. Os resultados demonstram que o PostgreSQL se sobressai aos demais.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
| Vanzan, Anthony; Fim, Gabriel; Welter, Greice; Griebler, Dalvan Aceleração da Classificação de Lavouras de Milho com MPI e Estratégias de Paralelismo Inproceedings doi In: 21st Escola Regional de Alto Desempenho da Região Sul (ERAD-RS), pp. 49-52, Sociedade Brasileira de Computação, Joinville, RS, Brazil, 2021. @inproceedings{larcc:DL_Classificaiton_MPI:ERAD:21,
title = {Aceleração da Classificação de Lavouras de Milho com MPI e Estratégias de Paralelismo},
author = {Anthony Vanzan and Gabriel Fim and Greice Welter and Dalvan Griebler},
url = {https://doi.org/10.5753/eradrs.2021.14772},
doi = {10.5753/eradrs.2021.14772},
year = {2021},
date = {2021-04-01},
booktitle = {21st Escola Regional de Alto Desempenho da Região Sul (ERAD-RS)},
pages = {49-52},
publisher = {Sociedade Brasileira de Computação},
address = {Joinville, RS, Brazil},
abstract = {Este trabalho visou acelerar a execução de um algoritmo de classificação de lavouras em imagens aéreas. Para isso, foram implementadas diferentes versões paralelas usando a biblioteca MPI na linguagem Python. A avaliação foi conduzida em dois ambientes computacionais. Conclui-se que é possível reduzir o tempo de execução à medida que mais recursos paralelos são usados e a estratégia de distribuição de trabalho dinâmica é mais eficiente.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
| Löff, Júnior; Griebler, Dalvan; Fernandes, Luiz Gustavo Melhorando a Geração Automática de Código Paralelo em Arquiteturas Multi-core na SPar Inproceedings doi In: 21st Escola Regional de Alto Desempenho da Região Sul (ERAD-RS), pp. 65-68, Sociedade Brasileira de Computação, Joinville, Brazil, 2021. @inproceedings{LOFF:ERAD:21,
title = {Melhorando a Geração Automática de Código Paralelo em Arquiteturas Multi-core na SPar},
author = {Júnior Löff and Dalvan Griebler and Luiz Gustavo Fernandes},
url = {https://doi.org/10.5753/eradrs.2021.14776},
doi = {10.5753/eradrs.2021.14776},
year = {2021},
date = {2021-04-01},
booktitle = {21st Escola Regional de Alto Desempenho da Região Sul (ERAD-RS)},
pages = {65-68},
publisher = {Sociedade Brasileira de Computação},
address = {Joinville, Brazil},
abstract = {Neste trabalho, a fim de melhorar a eficiência do código paralelo gerado em arquiteturas multi-core, foi estendida a linguagem e o compilador da SPar para permitir a geração automática de padrões paralelos pertencentes aos dois principais domínios de paralelismo, o de stream e de dados. Experimentos mostram que a nova versão da SPar obteve resultados similares, ou até mesmo melhores, que as versões implementadas manualmente.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
| Hoffmann, Renato Barreto; Griebler, Dalvan; Fernandes, Luiz Gustavo Abstraindo o OpenMP no Desenvolvimento de Aplicações de Fluxo de Dados Contínuo Inproceedings doi In: 21st Escola Regional de Alto Desempenho da Região Sul (ERAD-RS), pp. 69-72, Sociedade Brasileira de Computação, Joinville, Brazil, 2021. @inproceedings{HOFFMANN:ERAD:21,
title = {Abstraindo o OpenMP no Desenvolvimento de Aplicações de Fluxo de Dados Contínuo},
author = {Renato Barreto Hoffmann and Dalvan Griebler and Luiz Gustavo Fernandes},
url = {https://doi.org/10.5753/eradrs.2021.14777},
doi = {10.5753/eradrs.2021.14777},
year = {2021},
date = {2021-04-01},
booktitle = {21st Escola Regional de Alto Desempenho da Região Sul (ERAD-RS)},
pages = {69-72},
publisher = {Sociedade Brasileira de Computação},
address = {Joinville, Brazil},
abstract = {OpenMP é complexo quando usado para desenvolver aplicações de fluxo de dados. Com o objetivo de mitigar essa dificuldade, foi utilizada uma metodologia existente, chamada SPar, para aumentar o nível de abstração. Portanto, foram utilizadas anotações mais alto-nível da SPar para gerar código mais baixo-nível de fluxo de dados com OpenMP. Os experimentos revelaram que a SPar teve desempenho 0,86% inferior no caso mais extremo.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
| Mello, Fernanda; Griebler, Dalvan; Manssour, Isabel; Fernandes, Luiz Gustavo Compressão de Dados em Multicores com Flink ou SPar? Inproceedings doi In: 21st Escola Regional de Alto Desempenho da Região Sul (ERAD-RS), pp. 77-80, Sociedade Brasileira de Computação, Joinville, Brazil, 2021. @inproceedings{MELLO:ERAD:21,
title = {Compressão de Dados em Multicores com Flink ou SPar?},
author = {Fernanda Mello and Dalvan Griebler and Isabel Manssour and Luiz Gustavo Fernandes},
url = {https://doi.org/10.5753/eradrs.2021.14779},
doi = {10.5753/eradrs.2021.14779},
year = {2021},
date = {2021-04-01},
booktitle = {21st Escola Regional de Alto Desempenho da Região Sul (ERAD-RS)},
pages = {77-80},
publisher = {Sociedade Brasileira de Computação},
address = {Joinville, Brazil},
abstract = {Neste trabalho, foi implementada uma versão do algoritmo de compressão de dados Bzip2 com o framework para processamento de stream Apache Flink, a fim de avaliar seu desempenho em comparação com a versão do Bzip2 já existente na linguagem de domínio específica SPar. Os experimentos revelaram que a versão com SPar possui um desempenho muito superior ao Flink.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}