Selected Publications

Below is a list of selected publications, each followed by its BibTeX record.

2019
Pieper, Ricardo; Griebler, Dalvan; Fernandes, Luiz G. Structured Stream Parallelism for Rust. In: XXIII Brazilian Symposium on Programming Languages (SBLP), pp. 54-61, ACM, Salvador, Brazil, 2019. https://doi.org/10.1145/3355378.3355384

@inproceedings{PIEPER:SBLP:19,
  title = {Structured Stream Parallelism for Rust},
  author = {Ricardo Pieper and Dalvan Griebler and Luiz G. Fernandes},
  url = {https://doi.org/10.1145/3355378.3355384},
  doi = {10.1145/3355378.3355384},
  year = {2019},
  date = {2019-10-01},
  booktitle = {XXIII Brazilian Symposium on Programming Languages (SBLP)},
  pages = {54-61},
  publisher = {ACM},
  address = {Salvador, Brazil},
  series = {SBLP'19},
  abstract = {Structured parallel programming has been studied and applied in several programming languages. This approach has proven to be suitable for abstracting low-level and architecture-dependent parallelism implementations. Our goal is to provide a structured and high-level library for the Rust language, targeting parallel stream processing applications for multi-core servers. Rust is an emerging programming language that has been developed by Mozilla Research group, focusing on performance, memory safety, and thread-safety. However, it lacks parallel programming abstractions, especially for stream processing applications. This paper contributes to a new API based on the structured parallel programming approach to simplify parallel software developing. Our experiments highlight that our solution provides higher-level parallel programming abstractions for stream processing applications in Rust. We also show that the throughput and speedup are comparable to the state-of-the-art for certain workloads.},
  keywords = {},
  pubstate = {published},
  tppubtype = {inproceedings}
}

Mencagli, Gabriele; Torquati, Massimo; Griebler, Dalvan; Danelutto, Marco; Fernandes, Luiz Gustavo L. Raising the Parallel Abstraction Level for Streaming Analytics Applications. In: IEEE Access, vol. 7, pp. 131944 - 131961, 2019. https://doi.org/10.1109/ACCESS.2019.2941183

@article{MENCAGLI:IEEEAccess:19,
  title = {Raising the Parallel Abstraction Level for Streaming Analytics Applications},
  author = {Gabriele Mencagli and Massimo Torquati and Dalvan Griebler and Marco Danelutto and Luiz Gustavo L. Fernandes},
  url = {https://doi.org/10.1109/ACCESS.2019.2941183},
  doi = {10.1109/ACCESS.2019.2941183},
  year = {2019},
  date = {2019-09-01},
  journal = {IEEE Access},
  volume = {7},
  pages = {131944 - 131961},
  publisher = {IEEE},
  abstract = {In the stream processing domain, applications are represented by graphs of operators arbitrarily connected and filled with their business logic code. The APIs of existing Stream Processing Systems (SPSs) ease the development of transformations that recur in the streaming practice (e.g., filtering, aggregation and joins). In contrast, their parallelism abstractions are quite limited since they provide support to stateless operators only, or when the state is organized in a set of key-value pairs. This paper presents how the parallel patterns methodology can be revisited for sliding-window streaming analytics. Our vision fosters a design process of the application as composition and nesting of ready-to-use patterns provided through a C++17 fluent interface. Our prototype implements the run-time system of the patterns in the FastFlow parallel library expressing thread-based parallelism. The experimental analysis shows interesting outcomes. First, our pattern-based approach allows easy prototyping of different versions of the application, and the programmer can leverage nesting of patterns to increase performance (up to 37% in one of the two considered test-bed cases). Second, our FastFlow implementation outperforms (three times faster) the handmade porting of our patterns in popular JVM-based SPSs. Finally, in the concluding part of this paper, we explore the use of a task-based run-time system, by deriving interesting insights into how to make our patterns library suitable for multi backends.},
  keywords = {},
  pubstate = {published},
  tppubtype = {article}
}

Fischer, Gabriel Souto; Righi, Rodrigo Rosa; Costa, Cristiano André; Galante, Guilherme; Griebler, Dalvan. Towards Evaluating Proactive and Reactive Approaches on Reorganizing Human Resources in IoT-Based Smart Hospitals. In: Sensors, vol. 19, no. 17, pp. 3800, 2019. https://doi.org/10.3390/s19173800

@article{FISHER:Elasticity-Hospital:SENSORS:19,
  title = {Towards Evaluating Proactive and Reactive Approaches on Reorganizing Human Resources in IoT-Based Smart Hospitals},
  author = {Gabriel Souto Fischer and Rodrigo Rosa Righi and Cristiano André Costa and Guilherme Galante and Dalvan Griebler},
  url = {https://doi.org/10.3390/s19173800},
  doi = {10.3390/s19173800},
  year = {2019},
  date = {2019-09-01},
  urldate = {2019-09-01},
  journal = {Sensors},
  volume = {19},
  number = {17},
  pages = {3800},
  publisher = {MDPI},
  abstract = {Hospitals play an important role on ensuring a proper treatment of human health. One of the problems to be faced is the increasingly overcrowded patients care queues, who end up waiting for longer times without proper treatment to their health problems. The allocation of health professionals in hospital environments is not able to adapt to the demands of patients. There are times when underused rooms have idle professionals, and overused rooms have fewer professionals than necessary. Previous works have not solved this problem since they focus on understanding the evolution of doctor supply and patient demand, as to better adjust one to the other. However, they have not proposed concrete solutions for that regarding techniques for better allocating available human resources. Moreover, elasticity is one of the most important features of cloud computing, referring to the ability to add or remove resources according to the needs of the application or service. Based on this background, we introduce Elastic allocation of human resources in Healthcare environments (ElHealth) an IoT-focused model able to monitor patient usage of hospital rooms and adapt these rooms for patients demand. Using reactive and proactive elasticity approaches, ElHealth identifies when a room will have a demand that exceeds the capacity of care, and proposes actions to move human resources to adapt to patient demand. Our main contribution is the definition of Human Resources IoT-based Elasticity (i.e., an extension of the concept of resource elasticity in Cloud Computing to manage the use of human resources in a healthcare environment, where health professionals are allocated and deallocated according to patient demand). Another contribution is a cost–benefit analysis for the use of reactive and predictive strategies on human resources reorganization. ElHealth was simulated on a hospital environment using data from a Brazilian polyclinic, and obtained promising results, decreasing the waiting time by up to 96.4% and 96.73% in reactive and proactive approaches, respectively.},
  keywords = {},
  pubstate = {published},
  tppubtype = {article}
}

Rockenbach, Dinei A.; Griebler, Dalvan; Danelutto, Marco; Fernandes, Luiz Gustavo. High-Level Stream Parallelism Abstractions with SPar Targeting GPUs. In: Parallel Computing is Everywhere, Proceedings of the International Conference on Parallel Computing (ParCo), pp. 543-552, IOS Press, Prague, Czech Republic, 2019. https://doi.org/10.3233/APC200083

@inproceedings{ROCKENBACH:PARCO:19,
  title = {High-Level Stream Parallelism Abstractions with SPar Targeting GPUs},
  author = {Dinei A. Rockenbach and Dalvan Griebler and Marco Danelutto and Luiz Gustavo Fernandes},
  url = {https://doi.org/10.3233/APC200083},
  doi = {10.3233/APC200083},
  year = {2019},
  date = {2019-09-01},
  booktitle = {Parallel Computing is Everywhere, Proceedings of the International Conference on Parallel Computing (ParCo)},
  volume = {36},
  pages = {543-552},
  publisher = {IOS Press},
  address = {Prague, Czech Republic},
  series = {ParCo'19},
  abstract = {The combined exploitation of stream and data parallelism is demonstrating encouraging performance results in the literature for heterogeneous architectures, which are present on every computer systems today. However, provide parallel software efficiently targeting those architectures requires significant programming effort and expertise. The SPar domain-specific language already represents a solution to this problem providing proven high-level programming abstractions for multi-core architectures. In this paper, we enrich the SPar language adding support for GPUs. New transformation rules are designed for generating parallel code using stream and data parallel patterns. Our experiments revealed that these transformations rules are able to improve performance while the high-level programming abstractions are maintained.},
  keywords = {},
  pubstate = {published},
  tppubtype = {inproceedings}
}

Vogel, Adriano; Griebler, Dalvan; Danelutto, Marco; Fernandes, Luiz Gustavo. Seamless Parallelism Management for Multi-core Stream Processing. In: Advances in Parallel Computing, Proceedings of the International Conference on Parallel Computing (ParCo), pp. 533-542, IOS Press, Prague, Czech Republic, 2019. https://doi.org/10.3233/APC200082

@inproceedings{VOGEL:PARCO:19,
  title = {Seamless Parallelism Management for Multi-core Stream Processing},
  author = {Adriano Vogel and Dalvan Griebler and Marco Danelutto and Luiz Gustavo Fernandes},
  url = {https://doi.org/10.3233/APC200082},
  doi = {10.3233/APC200082},
  year = {2019},
  date = {2019-09-01},
  booktitle = {Advances in Parallel Computing, Proceedings of the International Conference on Parallel Computing (ParCo)},
  volume = {36},
  pages = {533-542},
  publisher = {IOS Press},
  address = {Prague, Czech Republic},
  series = {ParCo'19},
  abstract = {Video streaming applications have critical performance requirements for dealing with fluctuating workloads and providing results in real-time. As a consequence, the majority of these applications demand parallelism for delivering quality of service to users. Although high-level and structured parallel programming aims at facilitating parallelism exploitation, there are still several issues to be addressed for increasing/improving existing parallel programming abstractions. In this paper, we aim at employing self-adaptivity for stream processing in order to seamlessly manage the application parallelism configurations at run-time, where a new strategy alleviates from application programmers the need to set time-consuming and error-prone parallelism parameters. The new strategy was implemented and validated on SPar. The results have shown that the proposed solution increases the level of abstraction and achieved a competitive performance.},
  keywords = {},
  pubstate = {published},
  tppubtype = {inproceedings}
}

Vogel, Adriano; Griebler, Dalvan; Danelutto, Marco; Fernandes, Luiz Gustavo. Minimizing Self-Adaptation Overhead in Parallel Stream Processing for Multi-Cores. In: Euro-Par 2019: Parallel Processing Workshops, pp. 12, Springer, Göttingen, Germany, 2019. https://doi.org/10.1007/978-3-030-48340-1_3

@inproceedings{VOGEL:adaptive-overhead:AutoDaSP:19,
  title = {Minimizing Self-Adaptation Overhead in Parallel Stream Processing for Multi-Cores},
  author = {Adriano Vogel and Dalvan Griebler and Marco Danelutto and Luiz Gustavo Fernandes},
  url = {https://doi.org/10.1007/978-3-030-48340-1_3},
  doi = {10.1007/978-3-030-48340-1_3},
  year = {2019},
  date = {2019-08-01},
  booktitle = {Euro-Par 2019: Parallel Processing Workshops},
  volume = {11997},
  pages = {12},
  publisher = {Springer},
  address = {Göttingen, Germany},
  series = {Lecture Notes in Computer Science},
  abstract = {Stream processing paradigm is present in several applications that apply computations over continuous data flowing in the form of streams (e.g., video feeds, image, and data analytics). Employing self-adaptivity to stream processing applications can provide higher-level programming abstractions and autonomic resource management. However, there are cases where the performance is suboptimal. In this paper, the goal is to optimize parallelism adaptations in terms of stability and accuracy, which can improve the performance of parallel stream processing applications. Therefore, we present a new optimized self-adaptive strategy that is experimentally evaluated. The proposed solution provided high-level programming abstractions, reduced the adaptation overhead, and achieved a competitive performance with the best static executions.},
  keywords = {},
  pubstate = {published},
  tppubtype = {inproceedings}
}

Maliszewski, Anderson M.; Vogel, Adriano; Griebler, Dalvan; Roloff, Eduardo; Fernandes, Luiz G.; Navaux, Philippe O. A. Minimizing Communication Overheads in Container-based Clouds for HPC Applications. In: IEEE Symposium on Computers and Communications (ISCC), pp. 1-6, IEEE, Barcelona, Spain, 2019. https://doi.org/10.1109/ISCC47284.2019.8969716

@inproceedings{larcc:communication_overhead_lxd:ISCC:19,
  title = {Minimizing Communication Overheads in Container-based Clouds for HPC Applications},
  author = {Anderson M. Maliszewski and Adriano Vogel and Dalvan Griebler and Eduardo Roloff and Luiz G. Fernandes and Philippe O. A. Navaux},
  url = {https://doi.org/10.1109/ISCC47284.2019.8969716},
  doi = {10.1109/ISCC47284.2019.8969716},
  year = {2019},
  date = {2019-07-01},
  booktitle = {IEEE Symposium on Computers and Communications (ISCC)},
  pages = {1-6},
  publisher = {IEEE},
  address = {Barcelona, Spain},
  series = {ISCC'19},
  abstract = {Although the industry has embraced the cloud computing model, there are still significant challenges to be addressed concerning the quality of cloud services. Network-intensive applications may not scale in the cloud due to the sharing of the network infrastructure. In the literature, performance evaluation studies are showing that the network tends to limit the scalability and performance of HPC applications. Therefore, we proposed the aggregation of Network Interface Cards (NICs) in a ready-to-use integration with the OpenNebula cloud manager using Linux containers. We perform a set of experiments using a network microbenchmark to get specific network performance metrics and NAS parallel benchmarks to analyze the performance impact on HPC applications. Our results highlight that the implementation of NIC aggregation improves network performance in terms of throughput and latency. Moreover, HPC applications have different patterns of behavior when using our approach, which depends on communication and the amount of data transferring. While network-intensive applications increased the performance up to 38%, other applications with aggregated NICs maintained the same performance or presented slightly worse performance.},
  keywords = {},
  pubstate = {published},
  tppubtype = {inproceedings}
}

Griebler, Dalvan; Vogel, Adriano; Sensi, Daniele De; Danelutto, Marco; Fernandes, Luiz Gustavo. Simplifying and implementing service level objectives for stream parallelism. In: The Journal of Supercomputing, vol. 76, pp. 4603-4628, 2019, ISSN: 0920-8542. https://doi.org/10.1007/s11227-019-02914-6

@article{GRIEBLER:JS:19,
  title = {Simplifying and implementing service level objectives for stream parallelism},
  author = {Dalvan Griebler and Adriano Vogel and Daniele De Sensi and Marco Danelutto and Luiz Gustavo Fernandes},
  url = {https://doi.org/10.1007/s11227-019-02914-6},
  doi = {10.1007/s11227-019-02914-6},
  issn = {0920-8542},
  year = {2019},
  date = {2019-06-01},
  urldate = {2019-06-01},
  journal = {The Journal of Supercomputing},
  volume = {76},
  pages = {4603-4628},
  publisher = {Springer},
  abstract = {An increasing attention has been given to provide service level objectives (SLOs) in stream processing applications due to the performance and energy requirements, and because of the need to impose limits in terms of resource usage while improving the system utilization. Since the current and next-generation computing systems are intrinsically offering parallel architectures, the software has to naturally exploit the architecture parallelism. Implement and meet SLOs on existing applications is not a trivial task for application programmers, since the software development process, besides the parallelism exploitation, requires the implementation of autonomic algorithms or strategies. This is a system-oriented programming approach and requires the management of multiple knobs and sensors (e.g., the number of threads to use, the clock frequency of the cores, etc.) so that the system can self-adapt at runtime. In this work, we introduce a new and simpler way to define SLO in the application’s source code, by abstracting from the programmer all the details relative to self-adaptive system implementation. The application programmer specifies which parts of the code to parallelize and the related SLOs that should be enforced. To reach this goal, source-to-source code transformation rules are implemented in our compiler, which automatically generates self-adaptive strategies to enforce, at runtime, the user-expressed objectives. The experiments highlighted promising results with simpler, effective, and efficient SLO implementations for real-world applications.},
  keywords = {},
  pubstate = {published},
  tppubtype = {article}
}

Rockenbach, Dinei A.; Stein, Charles Michael; Griebler, Dalvan; Mencagli, Gabriele; Torquati, Massimo; Danelutto, Marco; Fernandes, Luiz Gustavo. Stream Processing on Multi-cores with GPUs: Parallel Programming Models' Challenges. In: International Parallel and Distributed Processing Symposium Workshops (IPDPSW), pp. 834-841, IEEE, Rio de Janeiro, Brazil, 2019. https://doi.org/10.1109/IPDPSW.2019.00137

@inproceedings{ROCKENBACH:stream-multigpus:IPDPSW:19,
  title = {Stream Processing on Multi-cores with GPUs: Parallel Programming Models' Challenges},
  author = {Dinei A. Rockenbach and Charles Michael Stein and Dalvan Griebler and Gabriele Mencagli and Massimo Torquati and Marco Danelutto and Luiz Gustavo Fernandes},
  url = {https://doi.org/10.1109/IPDPSW.2019.00137},
  doi = {10.1109/IPDPSW.2019.00137},
  year = {2019},
  date = {2019-05-01},
  booktitle = {International Parallel and Distributed Processing Symposium Workshops (IPDPSW)},
  pages = {834-841},
  publisher = {IEEE},
  address = {Rio de Janeiro, Brazil},
  series = {IPDPSW'19},
  abstract = {The stream processing paradigm is used in several scientific and enterprise applications in order to continuously compute results out of data items coming from data sources such as sensors. The full exploitation of the potential parallelism offered by current heterogeneous multi-cores equipped with one or more GPUs is still a challenge in the context of stream processing applications. In this work, our main goal is to present the parallel programming challenges that the programmer has to face when exploiting CPUs and GPUs' parallelism at the same time using traditional programming models. We highlight the parallelization methodology in two use-cases (the Mandelbrot Streaming benchmark and the PARSEC's Dedup application) to demonstrate the issues and benefits of using heterogeneous parallel hardware. The experiments conducted demonstrate how a high-level parallel programming model targeting stream processing like the one offered by SPar can be used to reduce the programming effort still offering a good level of performance if compared with state-of-the-art programming models.},
  keywords = {},
  pubstate = {published},
  tppubtype = {inproceedings}
}

Stein, Charles Michael; Griebler, Dalvan; Danelutto, Marco; Fernandes, Luiz Gustavo. Stream Parallelism on the LZSS Data Compression Application for Multi-Cores with GPUs. In: 27th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP), pp. 247-251, IEEE, Pavia, Italy, 2019. https://doi.org/10.1109/EMPDP.2019.8671624

@inproceedings{STEIN:LZSS-multigpu:PDP:19,
  title = {Stream Parallelism on the LZSS Data Compression Application for Multi-Cores with GPUs},
  author = {Charles Michael Stein and Dalvan Griebler and Marco Danelutto and Luiz Gustavo Fernandes},
  url = {https://doi.org/10.1109/EMPDP.2019.8671624},
  doi = {10.1109/EMPDP.2019.8671624},
  year = {2019},
  date = {2019-02-01},
  booktitle = {27th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP)},
  pages = {247-251},
  publisher = {IEEE},
  address = {Pavia, Italy},
  series = {PDP'19},
  abstract = {GPUs have been used to accelerate different data parallel applications. The challenge consists in using GPUs to accelerate stream processing applications. Our goal is to investigate and evaluate whether stream parallel applications may benefit from parallel execution on both CPU and GPU cores. In this paper, we introduce new parallel algorithms for the Lempel-Ziv-Storer-Szymanski (LZSS) data compression application. We implemented the algorithms targeting both CPUs and GPUs. GPUs have been used with CUDA and OpenCL to exploit inner algorithm data parallelism. Outer stream parallelism has been exploited using CPU cores through SPar. The parallel implementation of LZSS achieved 135 fold speedup using a multi-core CPU and two GPUs. We also observed speedups in applications where we were not expecting to get it using the same combine data-stream parallel exploitation techniques.},
  keywords = {},
  pubstate = {published},
  tppubtype = {inproceedings}
}

	Maron, Carlos A. F.; Vogel, Adriano; Griebler, Dalvan; Fernandes, Luiz Gustavo. Should PARSEC Benchmarks be More Parametric? A Case Study with Dedup. Inproceedings. In: 27th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP), pp. 217-221, IEEE, Pavia, Italy, 2019.
@inproceedings{MARON:parametric-parsec:PDP:19,
  title = {Should PARSEC Benchmarks be More Parametric? A Case Study with Dedup},
  author = {Carlos A. F. Maron and Adriano Vogel and Dalvan Griebler and Luiz Gustavo Fernandes},
  url = {https://doi.org/10.1109/EMPDP.2019.8671592},
  doi = {10.1109/EMPDP.2019.8671592},
  year = {2019},
  date = {2019-02-01},
  booktitle = {27th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP)},
  pages = {217-221},
  publisher = {IEEE},
  address = {Pavia, Italy},
  series = {PDP'19},
  abstract = {Parallel applications of the same domain can present similar patterns of behavior and characteristics. Characterizing common application behaviors can help for understanding performance aspects in the real-world scenario. One way to better understand and evaluate applications' characteristics is by using customizable/parametric benchmarks that enable users to represent important characteristics at run-time. We observed that parameterization techniques should be better exploited in the available benchmarks, especially on stream processing domain. For instance, although widely used, the stream processing benchmarks available in PARSEC do not support the simulation and evaluation of relevant and modern characteristics. Therefore, our goal is to identify the stream parallelism characteristics present in PARSEC. We also implemented a ready to use parameterization support and evaluated the application behaviors considering relevant performance metrics for stream parallelism (service time, throughput, latency). We choose Dedup to be our case study. The experimental results have shown performance improvements in our parameterization support for Dedup. Moreover, this support increased the customization space for benchmark users, which is simple to use. In the future, our solution can be potentially explored on different parallel architectures and parallel programming frameworks.},
  keywords = {},
  pubstate = {published},
  tppubtype = {inproceedings}
}
https://doi.org/10.1109/EMPDP.2019.8671592 doi:10.1109/EMPDP.2019.8671592
	Serpa, Matheus S.; Moreira, Francis B.; Navaux, Philippe O. A.; Cruz, Eduardo H. M.; Diener, Matthias; Griebler, Dalvan; Fernandes, Luiz Gustavo. Memory Performance and Bottlenecks in Multicore and GPU Architectures. Inproceedings. In: 27th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP), pp. 233-236, IEEE, Pavia, Italy, 2019.
@inproceedings{SERPA:memory-gpu-multicore:PDP:19,
  title = {Memory Performance and Bottlenecks in Multicore and GPU Architectures},
  author = {Matheus S. Serpa and Francis B. Moreira and Philippe O. A. Navaux and Eduardo H. M. Cruz and Matthias Diener and Dalvan Griebler and Luiz Gustavo Fernandes},
  url = {https://doi.org/10.1109/EMPDP.2019.8671628},
  doi = {10.1109/EMPDP.2019.8671628},
  year = {2019},
  date = {2019-02-01},
  booktitle = {27th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP)},
  pages = {233-236},
  publisher = {IEEE},
  address = {Pavia, Italy},
  series = {PDP'19},
  abstract = {Nowadays, there are several different architectures available not only for the industry, but also for normal consumers. Traditional multicore processors, GPUs, accelerators such as the Sunway SW26010, or even energy efficiency-driven processors such as the ARM family, present very different architectural characteristics. This wide range of characteristics presents a challenge for the developers of applications. Developers must deal with different instruction sets, memory hierarchies, or even different programming paradigms when programming for these architectures. Therefore, the same application can perform well when executing on one architecture, but poorly on another architecture. To optimize an application, it is important to have a deep understanding of how it behaves on different architectures. The related work in this area mostly focuses on a limited analysis encompassing execution time and energy. In this paper, we perform a detailed investigation on the impact of the memory subsystem of different architectures, which is one of the most important aspects to be considered. For this study, we performed experiments in the Broadwell CPU and Pascal GPU, using applications from the Rodinia benchmark suite. In this way, we were able to understand why an application performs well on one architecture and poorly on others.},
  keywords = {},
  pubstate = {published},
  tppubtype = {inproceedings}
}
https://doi.org/10.1109/EMPDP.2019.8671628 doi:10.1109/EMPDP.2019.8671628
2018
	Maliszewski, Anderson M; Griebler, Dalvan; Vogel, Adriano; Schepke, Claudio. On the Performance of Multithreading Applications under Private Cloud Conditions. Inproceedings. In: Symposium on High Performance Computing Systems (WSCAD), pp. 273-273, IEEE, São Paulo, Brazil, 2018.
@inproceedings{larcc:multithreading_cloud:WSCAD:18,
  title = {On the Performance of Multithreading Applications under Private Cloud Conditions},
  author = {Anderson M Maliszewski and Dalvan Griebler and Adriano Vogel and Claudio Schepke},
  url = {https://doi.org/10.1109/WSCAD.2018.00055},
  doi = {10.1109/WSCAD.2018.00055},
  year = {2018},
  date = {2018-10-01},
  booktitle = {Symposium on High Performance Computing Systems (WSCAD)},
  pages = {273-273},
  publisher = {IEEE},
  address = {São Paulo, Brazil},
  abstract = {IaaS private clouds provide an attractive environment for scientific applications. However, the performance is a challenge, as additional abstraction layers imposed by the virtualization can cause overheads and bottlenecks. This paper contributes to a performance analysis of applications with dedicated and shared resources environments under private cloud conditions, deployed with container (LXC) or kernel-based (KVM) instances. We selected five benchmarks from PARSEC suite. In the experimental results, identify a performance pattern of behavior among the applications was hard. For a set of multi-threading applications, the KVM-based cloud instances achieved better performance, however, in the other set of applications, the LXC-based cloud instances performed better.},
  keywords = {},
  pubstate = {published},
  tppubtype = {inproceedings}
}
https://doi.org/10.1109/WSCAD.2018.00055 doi:10.1109/WSCAD.2018.00055
	Ewald, Endrius; Vogel, Adriano; Rista, Cassiano; Griebler, Dalvan; Manssour, Isabel; Fernandes, Luiz G. Parallel and Distributed Processing Support for a Geospatial Data Visualization DSL. Inproceedings. In: Symposium on High Performance Computing Systems (WSCAD), pp. 221-228, IEEE, São Paulo, Brazil, 2018.
@inproceedings{EWALD:WSCAD:18,
  title = {Parallel and Distributed Processing Support for a Geospatial Data Visualization DSL},
  author = {Endrius Ewald and Adriano Vogel and Cassiano Rista and Dalvan Griebler and Isabel Manssour and Luiz G. Fernandes},
  url = {https://doi.org/10.1109/WSCAD.2018.00042},
  doi = {10.1109/WSCAD.2018.00042},
  year = {2018},
  date = {2018-10-01},
  booktitle = {Symposium on High Performance Computing Systems (WSCAD)},
  pages = {221-228},
  publisher = {IEEE},
  address = {São Paulo, Brazil},
  abstract = {The amount of data generated worldwide related to geolocalization has exponentially increased. However, the fast processing of this amount of data is a challenge from the programming perspective, and many available solutions require learning a variety of tools and programming languages. This paper introduces the support for parallel and distributed processing in a DSL for Geospatial Data Visualization to speed up the data pre-processing phase. The results have shown the MPI version with dynamic data distribution performing better under medium and large data set files, while MPI-I/O version achieved the best performance with small data set files.},
  keywords = {},
  pubstate = {published},
  tppubtype = {inproceedings}
}
https://doi.org/10.1109/WSCAD.2018.00042 doi:10.1109/WSCAD.2018.00042
	Vogel, Adriano; Griebler, Dalvan; Sensi, Daniele De; Danelutto, Marco; Fernandes, Luiz Gustavo. Autonomic and Latency-Aware Degree of Parallelism Management in SPar. Inproceedings. In: Euro-Par 2018: Parallel Processing Workshops, pp. 28-39, Springer, Turin, Italy, 2018.
@inproceedings{VOGEL:Adaptive-Latency-SPar:AutoDaSP:18,
  title = {Autonomic and Latency-Aware Degree of Parallelism Management in SPar},
  author = {Adriano Vogel and Dalvan Griebler and Daniele De Sensi and Marco Danelutto and Luiz Gustavo Fernandes},
  url = {http://dx.doi.org/10.1007/978-3-030-10549-5_3},
  doi = {10.1007/978-3-030-10549-5_3},
  year = {2018},
  date = {2018-08-01},
  booktitle = {Euro-Par 2018: Parallel Processing Workshops},
  pages = {28-39},
  publisher = {Springer},
  address = {Turin, Italy},
  series = {Lecture Notes in Computer Science},
  abstract = {Stream processing applications became a representative workload in current computing systems. A significant part of these applications demands parallelism to increase performance. However, programmers are often facing a trade-off between coding productivity and performance when introducing parallelism. SPar was created for balancing this trade-off to the application programmers by using the C++11 attributes’ annotation mechanism. In SPar and other programming frameworks for stream processing applications, the manual definition of the number of replicas to be used for the stream operators is a challenge. In addition to that, low latency is required by several stream processing applications. We noted that explicit latency requirements are poorly considered on the state-of-the-art parallel programming frameworks. Since there is a direct relationship between the number of replicas and the latency of the application, in this work we propose an autonomic and adaptive strategy to choose the proper number of replicas in SPar to address latency constraints. We experimentally evaluated our implemented strategy and demonstrated its effectiveness on a real-world application, demonstrating that our adaptive strategy can provide higher abstraction levels while automatically managing the latency.},
  keywords = {},
  pubstate = {published},
  tppubtype = {inproceedings}
}
http://dx.doi.org/10.1007/978-3-030-10549-5_3 doi:10.1007/978-3-030-10549-5_3
	Griebler, Dalvan; Sensi, Daniele De; Vogel, Adriano; Danelutto, Marco; Fernandes, Luiz Gustavo. Service Level Objectives via C++11 Attributes. Inproceedings. In: Euro-Par 2018: Parallel Processing Workshops, pp. 745-756, Springer, Turin, Italy, 2018.
@inproceedings{GRIEBLER:SLO-SPar-Nornir:REPARA:18,
  title = {Service Level Objectives via C++11 Attributes},
  author = {Dalvan Griebler and Daniele De Sensi and Adriano Vogel and Marco Danelutto and Luiz Gustavo Fernandes},
  url = {http://dx.doi.org/10.1007/978-3-030-10549-5_58},
  doi = {10.1007/978-3-030-10549-5_58},
  year = {2018},
  date = {2018-08-01},
  booktitle = {Euro-Par 2018: Parallel Processing Workshops},
  pages = {745-756},
  publisher = {Springer},
  address = {Turin, Italy},
  series = {Lecture Notes in Computer Science},
  abstract = {In recent years, increasing attention has been given to the possibility of guaranteeing Service Level Objectives (SLOs) to users about their applications, either regarding performance or power consumption. SLO can be implemented for parallel applications since they can provide many control knobs (e.g., the number of threads to use, the clock frequency of the cores, etc.) to tune the performance and power consumption of the application. Different from most of the existing approaches, we target sequential stream processing applications by proposing a solution based on C++ annotations. The user specifies which parts of the code to parallelize and what type of requirements should be enforced on that part of the code. Our solution first automatically parallelizes the annotated code and then applies self-adaptation approaches at run-time to enforce the user-expressed objectives. We ran experiments on different real-world applications, showing its simplicity and effectiveness.},
  keywords = {},
  pubstate = {published},
  tppubtype = {inproceedings}
}
http://dx.doi.org/10.1007/978-3-030-10549-5_58 doi:10.1007/978-3-030-10549-5_58
	Maliszewski, Anderson M; Griebler, Dalvan; Schepke, Claudio; Ditter, Alexander; Fey, Dietmar; Fernandes, Luiz Gustavo. The NAS Benchmark Kernels for Single and Multi-Tenant Cloud Instances with LXC/KVM. Inproceedings. In: International Conference on High Performance Computing & Simulation (HPCS), pp. 359-366, IEEE, Orleans, France, 2018.
@inproceedings{larcc:NAS_cloud_LXC_KVM:HPCS:2018,
  title = {The NAS Benchmark Kernels for Single and Multi-Tenant Cloud Instances with LXC/KVM},
  author = {Anderson M Maliszewski and Dalvan Griebler and Claudio Schepke and Alexander Ditter and Dietmar Fey and Luiz Gustavo Fernandes},
  url = {https://doi.org/10.1109/HPCS.2018.00066},
  doi = {10.1109/HPCS.2018.00066},
  year = {2018},
  date = {2018-07-01},
  booktitle = {International Conference on High Performance Computing \& Simulation (HPCS)},
  pages = {359-366},
  publisher = {IEEE},
  address = {Orleans, France},
  series = {HPCS'18},
  abstract = {Private IaaS clouds are an attractive environment for scientific workloads and applications. It provides advantages such as almost instantaneous availability of high-performance computing in a single node as well as compute clusters, easy access for researchers, and users that do not have access to conventional supercomputers. Furthermore, a cloud infrastructure provides elasticity and scalability to ensure and manage any software dependency on the system with no third-party dependency for researchers. However, one of the biggest challenges is to avoid significant performance degradation when migrating these applications from physical nodes to a cloud environment. Also, we lack more research investigations for multi-tenant cloud instances. In this paper, our goal is to perform a comparative performance evaluation of scientific applications with single and multi-tenancy cloud instances using KVM and LXC virtualization technologies under private cloud conditions. All analyses and evaluations were carried out based on NAS Benchmark kernels to simulate different types of workloads. We applied statistic significance tests to highlight the differences. The results have shown that applications running on LXC-based cloud instances outperform KVM-based cloud instances in 93.75% of the experiments w.r.t single tenant. Regarding multi-tenant, LXC instances outperform KVM instances in 45% of the results, where the performance differences were not as significant as expected.},
  keywords = {},
  pubstate = {published},
  tppubtype = {inproceedings}
}
https://doi.org/10.1109/HPCS.2018.00066 doi:10.1109/HPCS.2018.00066
	Griebler, Dalvan; Hoffmann, Renato B.; Danelutto, Marco; Fernandes, Luiz Gustavo. Stream Parallelism with Ordered Data Constraints on Multi-Core Systems. Journal Article. In: The Journal of Supercomputing, vol. 75, no. 8, pp. 4042-4061, 2018, ISSN: 0920-8542.
@article{GRIEBLER:JS:18,
  title = {Stream Parallelism with Ordered Data Constraints on Multi-Core Systems},
  author = {Dalvan Griebler and Renato B. Hoffmann and Marco Danelutto and Luiz Gustavo Fernandes},
  url = {https://doi.org/10.1007/s11227-018-2482-7},
  doi = {10.1007/s11227-018-2482-7},
  issn = {0920-8542},
  year = {2018},
  date = {2018-07-01},
  urldate = {2018-07-01},
  journal = {The Journal of Supercomputing},
  volume = {75},
  number = {8},
  pages = {4042-4061},
  publisher = {Springer},
  abstract = {It is often a challenge to keep input/output tasks/results in order for parallel computations over data streams, particularly when stateless task operators are replicated to increase parallelism when there are irregular tasks. Maintaining input/output order requires additional coding effort and may significantly impact the application's actual throughput. Thus, we propose a new implementation technique designed to be easily integrated with any of the existing C++ parallel programming frameworks that support stream parallelism. In this paper, it is first implemented and studied using SPar, our high-level domain-specific language for stream parallelism. We discuss the results of a set of experiments with real-world applications revealing how significant performance improvements may be achieved when our proposed solution is integrated within SPar, especially for data compression applications. Also, we show the results of experiments performed after integrating our solution within FastFlow and TBB, revealing no significant overheads.},
  keywords = {},
  pubstate = {published},
  tppubtype = {article}
}
https://doi.org/10.1007/s11227-018-2482-7 doi:10.1007/s11227-018-2482-7
	Griebler, Dalvan; Vogel, Adriano; Maron, Carlos A F; Maliszewski, Anderson M; Schepke, Claudio; Fernandes, Luiz Gustavo. Performance of Data Mining, Media, and Financial Applications under Private Cloud Conditions. Inproceedings. In: IEEE Symposium on Computers and Communications (ISCC), pp. 1530-1346, IEEE, Natal, Brazil, 2018.
@inproceedings{larcc:parsec_cloudstack_lxc_kvm:ISCC:2018,
  title = {Performance of Data Mining, Media, and Financial Applications under Private Cloud Conditions},
  author = {Dalvan Griebler and Adriano Vogel and Carlos A F Maron and Anderson M Maliszewski and Claudio Schepke and Luiz Gustavo Fernandes},
  url = {https://dx.doi.org/10.1109/ISCC.2018.8538759},
  doi = {10.1109/ISCC.2018.8538759},
  year = {2018},
  date = {2018-06-01},
  booktitle = {IEEE Symposium on Computers and Communications (ISCC)},
  pages = {1530-1346},
  publisher = {IEEE},
  address = {Natal, Brazil},
  series = {ISCC'18},
  abstract = {This paper contributes to a performance analysis of real-world workloads under private cloud conditions. We selected six benchmarks from PARSEC related to three mainstream application domains (financial, data mining, and media processing). Our goal was to evaluate these application domains in different cloud instances and deployment environments, concerning container or kernel-based instances and using dedicated or shared machine resources. Experiments have shown that performance varies according to the application characteristics, virtualization technology, and cloud environment. Results highlighted that financial, data mining, and media processing applications running in the LXC instances tend to outperform KVM when there is a dedicated machine resource environment. However, when two instances are sharing the same machine resources, these applications tend to achieve better performance in the KVM instances. Finally, financial applications achieved better performance in the cloud than media and data mining.},
  keywords = {},
  pubstate = {published},
  tppubtype = {inproceedings}
}
https://dx.doi.org/10.1109/ISCC.2018.8538759
	Rista, Cassiano; Teixeira, Marcelo; Griebler, Dalvan; Fernandes, Luiz Gustavo. Evaluating, Estimating, and Improving Network Performance in Container-based Clouds. Inproceedings. In: IEEE Symposium on Computers and Communications (ISCC), pp. 1530-1346, IEEE, Natal, Brazil, 2018.
@inproceedings{larcc:network_performance_container:ISCC:2018,
  title = {Evaluating, Estimating, and Improving Network Performance in Container-based Clouds},
  author = {Cassiano Rista and Marcelo Teixeira and Dalvan Griebler and Luiz Gustavo Fernandes},
  url = {https://doi.org/10.1109/ISCC.2018.8538558},
  doi = {10.1109/ISCC.2018.8538558},
  year = {2018},
  date = {2018-06-01},
  booktitle = {IEEE Symposium on Computers and Communications (ISCC)},
  pages = {1530-1346},
  publisher = {IEEE},
  address = {Natal, Brazil},
  series = {ISCC'18},
  abstract = {Cloud computing has recently attracted a great deal of interest from both industry and academia, emerging as an important paradigm to improve resource utilization, efficiency, flexibility, and pay-per-use. However, cloud platforms inherently include a virtualization layer that imposes performance degradation on network-intensive applications. Thus, it is crucial to anticipate possible performance degradation to resolve system bottlenecks. This paper uses the Petri Nets approach to create different models for evaluating, estimating, and improving network performance in container-based cloud environments. Based on model estimations, we assessed the network bandwidth utilization of the system under different setups. Then, by identifying possible bottlenecks, we show how the system could be modified to improve performance. We then tested how the model would behave through real-world experiments. When the model indicates probable bandwidth saturation, we propose a link aggregation approach to increase bandwidth, using lightweight virtualization to reduce virtualization overhead. Results reveal that our model anticipates the structural and behavioral characteristics of the network in the cloud environment. Therefore, it systematically improves network efficiency, which saves effort, time, and money.},
  keywords = {},
  pubstate = {published},
  tppubtype = {inproceedings}
}
https://doi.org/10.1109/ISCC.2018.8538558 doi:10.1109/ISCC.2018.8538558
	Griebler, Dalvan; Loff, Junior; Mencagli, Gabriele; Danelutto, Marco; Fernandes, Luiz Gustavo Efficient NAS Benchmark Kernels with C++ Parallel Programming Inproceedings doi In: 26th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP), pp. 733-740, IEEE, Cambridge, UK, 2018. @inproceedings{GRIEBLER:NAS-CPP:PDP:18, title = {Efficient NAS Benchmark Kernels with C++ Parallel Programming}, author = {Dalvan Griebler and Junior Loff and Gabriele Mencagli and Marco Danelutto and Luiz Gustavo Fernandes}, url = {https://doi.org/10.1109/PDP2018.2018.00120}, doi = {10.1109/PDP2018.2018.00120}, year = {2018}, date = {2018-03-01}, booktitle = {26th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP)}, pages = {733-740}, publisher = {IEEE}, address = {Cambridge, UK}, series = {PDP'18}, abstract = {Benchmarking is a way to study the performance of new architectures and parallel programming frameworks. Well-established benchmark suites such as the NAS Parallel Benchmarks (NPB) comprise legacy codes that still lack portability to C++ language. As a consequence, a set of high-level and easy-to-use C++ parallel programming frameworks cannot be tested in NPB. Our goal is to describe a C++ porting of the NPB kernels and to analyze the performance achieved by different parallel implementations written using the Intel TBB, OpenMP and FastFlow frameworks for Multi-Cores. The experiments show an efficient code porting from Fortran to C++ and an efficient parallelization on average.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} }
	Griebler, Dalvan; Hoffmann, Renato B.; Danelutto, Marco; Fernandes, Luiz Gustavo High-Level and Productive Stream Parallelism for Dedup, Ferret, and Bzip2 Journal Article doi In: International Journal of Parallel Programming, vol. 47, no. 1, pp. 253-271, 2018, ISSN: 1573-7640. @article{GRIEBLER:IJPP:18, title = {High-Level and Productive Stream Parallelism for Dedup, Ferret, and Bzip2}, author = {Dalvan Griebler and Renato B. Hoffmann and Marco Danelutto and Luiz Gustavo Fernandes}, url = {https://doi.org/10.1007/s10766-018-0558-x}, doi = {10.1007/s10766-018-0558-x}, issn = {1573-7640}, year = {2018}, date = {2018-02-01}, journal = {International Journal of Parallel Programming}, volume = {47}, number = {1}, pages = {253-271}, publisher = {Springer}, abstract = {Parallel programming has been a challenging task for application programmers. Stream processing is an application domain present in several scientific, enterprise, and financial areas that lack suitable abstractions to exploit parallelism. Our goal is to assess the feasibility of state-of-the-art frameworks/libraries (Pthreads, TBB, and FastFlow) and the SPar domain-specific language for real-world streaming applications (Dedup, Ferret, and Bzip2) targeting multi-core architectures. SPar was specially designed to provide high-level and productive stream parallelism abstractions, supporting programmers with standard C++-11 annotations. For the experiments, we implemented three streaming applications. We discussed SPar’s programmability advantages compared to the frameworks in terms of productivity and structured parallel programming. The results demonstrate that SPar improves productivity and provides the necessary features to achieve similar performances compared to the state-of-the-art.}, keywords = {}, pubstate = {published}, tppubtype = {article} }
2017
	Griebler, Dalvan; Hoffmann, Renato B.; Loff, Junior; Danelutto, Marco; Fernandes, Luiz G. High-Level and Efficient Stream Parallelism on Multi-core Systems with SPar for Data Compression Applications Inproceedings In: XVIII Simpósio em Sistemas Computacionais de Alto Desempenho, pp. 16-27, SBC, Campinas, SP, Brasil, 2017. @inproceedings{GRIEBLER:WSCAD:17, title = {High-Level and Efficient Stream Parallelism on Multi-core Systems with SPar for Data Compression Applications}, author = {Dalvan Griebler and Renato B. Hoffmann and Junior Loff and Marco Danelutto and Luiz G. Fernandes}, url = {https://gmap.pucrs.br/dalvan/papers/2017/CR_WSCAD_2017.pdf}, year = {2017}, date = {2017-10-01}, booktitle = {XVIII Simpósio em Sistemas Computacionais de Alto Desempenho}, pages = {16-27}, publisher = {SBC}, address = {Campinas, SP, Brasil}, abstract = {The stream processing domain is present in several real-world applications that are running on multi-core systems. In this paper, we focus on data compression applications that are an important sub-set of this domain. Our main goal is to assess the programmability and efficiency of domain-specific language called SPar. It was specially designed for expressing stream parallelism and it promises higher-level parallelism abstractions without significant performance losses. Therefore, we parallelized Lzip and Bzip2 compressors with SPar and compared with state-of-the-art frameworks. The results revealed that SPar is able to efficiently exploit stream parallelism as well as provide suitable abstractions with less code intrusion and code re-factoring.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} }
	Griebler, Dalvan; Hoffmann, Renato B.; Danelutto, Marco; Fernandes, Luiz Gustavo Higher-Level Parallelism Abstractions for Video Applications with SPar Inproceedings doi In: Parallel Computing is Everywhere, Proceedings of the International Conference on Parallel Computing, pp. 698-707, IOS Press, Bologna, Italy, 2017. @inproceedings{GRIEBLER:REPARA:17, title = {Higher-Level Parallelism Abstractions for Video Applications with SPar}, author = {Dalvan Griebler and Renato B. Hoffmann and Marco Danelutto and Luiz Gustavo Fernandes}, url = {https://doi.org/10.3233/978-1-61499-843-3-698}, doi = {10.3233/978-1-61499-843-3-698}, year = {2017}, date = {2017-09-01}, booktitle = {Parallel Computing is Everywhere, Proceedings of the International Conference on Parallel Computing}, pages = {698-707}, publisher = {IOS Press}, address = {Bologna, Italy}, series = {ParCo'17}, abstract = {SPar is a Domain-Specific Language (DSL) designed to provide high-level parallel programming abstractions for streaming applications. Video processing application domain requires parallel processing to extract and analyze information quickly. When using state-of-the-art frameworks such as FastFlow and TBB, the application programmer has to manage source code re-factoring and performance optimization to implement parallelism efficiently. Our goal is to make this process easier for programmers through SPar. Thus we assess SPar's programming language and its performance in traditional video applications. We also discuss different implementations compared to the ones of SPar. Results demonstrate that SPar maintains the sequential code structure, is less code intrusive, and provides higher-level programming abstractions without introducing notable performance losses. Therefore, it represents a good choice for application programmers from the video processing domain.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} }
	Griebler, Dalvan; Fernandes, Luiz Gustavo Towards Distributed Parallel Programming Support for the SPar DSL Inproceedings doi In: Parallel Computing is Everywhere, Proceedings of the International Conference on Parallel Computing, pp. 563-572, IOS Press, Bologna, Italy, 2017. @inproceedings{GRIEBLER:PARCO:17, title = {Towards Distributed Parallel Programming Support for the SPar DSL}, author = {Dalvan Griebler and Luiz Gustavo Fernandes}, url = {https://doi.org/10.3233/978-1-61499-843-3-563}, doi = {10.3233/978-1-61499-843-3-563}, year = {2017}, date = {2017-09-01}, booktitle = {Parallel Computing is Everywhere, Proceedings of the International Conference on Parallel Computing}, pages = {563-572}, publisher = {IOS Press}, address = {Bologna, Italy}, series = {ParCo'17}, abstract = {SPar was originally designed to provide high-level abstractions for stream parallelism in C++ programs targeting multi-core systems. This work proposes distributed parallel programming support for SPar targeting cluster environments. The goal is to preserve the original semantics while source-to-source code transformations will be turned into MPI (Message Passing Interface) parallel code. The results of the experiments presented in the paper demonstrate improved programmability without significant performance losses.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} }
	Rista, Cassiano; Griebler, Dalvan; Maron, Carlos A. F.; Fernandes, Luiz Gustavo Improving the Network Performance of a Container-Based Cloud Environment for Hadoop Systems Inproceedings doi In: International Conference on High Performance Computing & Simulation (HPCS), pp. 619-626, IEEE, Genoa, Italy, 2017. @inproceedings{larcc:link_aggregation:HPCS:2017, title = {Improving the Network Performance of a Container-Based Cloud Environment for Hadoop Systems}, author = {Cassiano Rista and Dalvan Griebler and Carlos A. F. Maron and Luiz Gustavo Fernandes}, url = {http://ieeexplore.ieee.org/document/8035136/}, doi = {10.1109/HPCS.2017.97}, year = {2017}, date = {2017-07-01}, booktitle = {International Conference on High Performance Computing \& Simulation (HPCS)}, pages = {619-626}, publisher = {IEEE}, address = {Genoa, Italy}, series = {HPCS'17}, abstract = {Cloud computing has emerged as an important paradigm to improve resource utilization, efficiency, flexibility, and the pay-per-use billing structure. However, cloud platforms cause performance degradations due to their virtualization layer and may not be appropriate for the requirements of high-performance applications, such as big data. This paper tackles the problem of improving network performance in container-based cloud instances to create a viable alternative to run network intensive Hadoop applications. Our approach consists of deploying link aggregation via the IEEE 802.3ad standard to increase the available bandwidth and using LXC (Linux Container) cloud instances to create a Hadoop cluster. In order to evaluate the efficiency of our approach and the overhead added by the container-based cloud environment, we ran a set of experiments to measure throughput, latency, bandwidth utilization, and completion times. The results prove that our approach adds minimal overhead in cloud environment as well as increases throughput and reduces latency. Moreover, our approach demonstrates a suitable alternative for running Hadoop applications, reducing completion times up to 33.73\%.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} }
	Ledur, Cleverson; Griebler, Dalvan; Manssour, Isabel; Fernandes, Luiz Gustavo A High-Level DSL for Geospatial Visualizations with Multi-core Parallelism Support Inproceedings doi In: 41th IEEE Computer Society Signature Conference on Computers, Software and Applications, pp. 298-304, IEEE, Torino, Italy, 2017. @inproceedings{LEDUR:COMPSAC:17, title = {A High-Level DSL for Geospatial Visualizations with Multi-core Parallelism Support}, author = {Cleverson Ledur and Dalvan Griebler and Isabel Manssour and Luiz Gustavo Fernandes}, url = {https://doi.org/10.1109/COMPSAC.2017.18}, doi = {10.1109/COMPSAC.2017.18}, year = {2017}, date = {2017-07-01}, booktitle = {41th IEEE Computer Society Signature Conference on Computers, Software and Applications}, pages = {298-304}, publisher = {IEEE}, address = {Torino, Italy}, series = {COMPSAC'17}, abstract = {The amount of data generated worldwide associated with geolocalization has exponentially increased over the last decade due to social networks, population demographics, and the popularization of Global Positioning Systems. Several methods for geovisualization have already been developed, but many of them are focused on a specific application or require learning a variety of tools and programming languages. It becomes even more difficult when users have to manage a large amount of data because state-of-the-art alternatives require the use of third-party pre-processing tools. We present a novel Domain-Specific Language (DSL), which focuses on large data geovisualizations. Through a compiler, we support automatic visualization generations and data pre-processing. The system takes advantage of multi-core parallelism to speed-up data pre-processing abstractly. Our experiments were designated to highlight the programming effort and performance of our DSL. The results have shown a considerable programming effort reduction and efficient parallelism support with respect to the sequential version.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} }
	Vogel, Adriano; Griebler, Dalvan; Schepke, Claudio; Fernandes, Luiz Gustavo An Intra-Cloud Networking Performance Evaluation on CloudStack Environment Inproceedings doi In: 25th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP), pp. 5, IEEE, St. Petersburg, Russia, 2017. @inproceedings{larcc:intra-cloud_networking_cloudstack:PDP:17, title = {An Intra-Cloud Networking Performance Evaluation on CloudStack Environment}, author = {Adriano Vogel and Dalvan Griebler and Claudio Schepke and Luiz Gustavo Fernandes}, url = {http://ieeexplore.ieee.org/document/7912689/}, doi = {10.1109/PDP.2017.40}, year = {2017}, date = {2017-03-01}, booktitle = {25th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP)}, pages = {5}, publisher = {IEEE}, address = {St. Petersburg, Russia}, series = {PDP'17}, abstract = {Infrastructure-as-a-Service (IaaS) is a cloud on-demand commodity built on top of virtualization technologies and managed by IaaS tools. In this scenario, performance is a relevant matter because a set of aspects may impact and increase the system overhead. Specific on the network, the use of virtualized capabilities may cause performance degradation (e.g., latency, throughput). The goal of this paper is to contribute to networking performance evaluation, providing new insights for private IaaS clouds. To achieve our goal, we deploy CloudStack environments and conduct experiments with different configurations and techniques. The research findings demonstrate that KVM-based cloud instances have small network performance degradation regarding throughput (about 0.2\% for coarse-grained and 6.8\% for fine-grained messages) while container-based instances have even better results. On the other hand, the KVM instances present worse latency (about 12.4\% on coarse-grained and two times more on fine-grained messages w.r.t. native environment) and better in container-based instances, where the performance results are close to the native environment. Furthermore, we demonstrate a performance optimization of applications running on KVM.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} }
	Griebler, Dalvan; Danelutto, Marco; Torquati, Massimo; Fernandes, Luiz Gustavo SPar: A DSL for High-Level and Productive Stream Parallelism Journal Article doi In: Parallel Processing Letters, vol. 27, no. 01, pp. 1740005, 2017. @article{GRIEBLER:PPL:17, title = {SPar: A DSL for High-Level and Productive Stream Parallelism}, author = {Dalvan Griebler and Marco Danelutto and Massimo Torquati and Luiz Gustavo Fernandes}, url = {http://dx.doi.org/10.1142/S0129626417400059}, doi = {10.1142/S0129626417400059}, year = {2017}, date = {2017-03-01}, urldate = {2017-03-01}, journal = {Parallel Processing Letters}, volume = {27}, number = {01}, pages = {1740005}, publisher = {World Scientific}, abstract = {This paper introduces SPar, an internal C++ Domain-Specific Language (DSL) that supports the development of classic stream parallel applications. The DSL uses standard C++ attributes to introduce annotations tagging the notable components of stream parallel applications: stream sources and stream processing stages. A set of tools process SPar code (C++ annotated code using the SPar attributes) to generate FastFlow C++ code that exploits the stream parallelism denoted by SPar annotations while targeting shared memory multi-core architectures. We outline the main SPar features along with the main implementation techniques and tools. Also, we show the results of experiments assessing the feasibility of the entire approach as well as SPar’s performance and expressiveness.}, keywords = {}, pubstate = {published}, tppubtype = {article} }
2016
	Pieper, Ricardo; Griebler, Dalvan; Lovato, Adalberto Towards a Software as a Service for Biodigestor Analytics Journal Article doi In: Revista Eletrônica Argentina-Brasil de Tecnologias da Informação e da Comunicação (REABTIC), vol. 1, no. 5, pp. 15, 2016. @article{larcc:saas_analytics:REABTIC:16, title = {Towards a Software as a Service for Biodigestor Analytics}, author = {Ricardo Pieper and Dalvan Griebler and Adalberto Lovato}, url = {http://larcc.setrem.com.br/wp-content/uploads/2017/04/PIEPER_REABTIC_2016.pdf}, doi = {10.5281/zenodo.345587}, year = {2016}, date = {2016-08-01}, journal = {Revista Eletrônica Argentina-Brasil de Tecnologias da Informação e da Comunicação (REABTIC)}, volume = {1}, number = {5}, pages = {15}, publisher = {SETREM}, address = {Três de Maio, Brazil}, abstract = {The field of machine learning is becoming even more important in the last years. The ever-increasing amount of data and complexity of computational problems challenges the currently available technology. Meanwhile, anaerobic digesters represent a good alternative for renewable energy production in Brazil. However, performing efficient and accurate predictions/analytics while completely abstracting machine learning details from end-users might not be a simple task to achieve. Usually, such tools are made for a specific scenario and may not fit with particular and general needs. Our goal was to create a SaaS for biogas data analytics by using a neural network. Therefore, an open source, cloud-enabled SaaS (Software as a Service) was developed and deployed in LARCC (Laboratory of Advanced Researches on Cloud Computing) at SETREM. The results have shown the SaaS application is able to perform predictions. The neural network's accuracy is not significantly worse than a state-of-the-art implementation, and its training speed is faster. The user interface demonstrates to be intuitive, and the predictions were accurate when providing the training algorithm with sufficient data. In addition, the file processing and network training time were good enough under traditional workload conditions.}, keywords = {}, pubstate = {published}, tppubtype = {article} }
	Griebler, Dalvan. Domain-Specific Language & Support Tool for High-Level Stream Parallelism. PhD Thesis, Faculdade de Informática - PPGCC - PUCRS, 2016.
@phdthesis{GRIEBLER:PHD:16,
  title = {Domain-Specific Language \& Support Tool for High-Level Stream Parallelism},
  author = {Dalvan Griebler},
  url = {http://tede2.pucrs.br/tede2/handle/tede/6776},
  year = {2016},
  date = {2016-06-01},
  address = {Porto Alegre, Brazil},
  school = {Faculdade de Informática - PPGCC - PUCRS},
  abstract = {Stream-based systems are representative of several application domains including video, audio, networking, graphic processing, etc. Stream programs may run on different kinds of parallel architectures (desktop, servers, cell phones, and supercomputers) and represent significant workloads on our current computing systems. Nevertheless, most of them are still not parallelized. Moreover, when new software has to be developed, programmers often face a trade-off between coding productivity, code portability, and performance. To solve this problem, we provide a new Domain-Specific Language (DSL) that naturally/on-the-fly captures and represents parallelism for stream-based applications. The aim is to offer a set of attributes (through annotations) that preserves the program's source code and is not architecture-dependent for annotating parallelism. We used the C++ attribute mechanism to design a ``\textit{de-facto}'' standard C++ embedded DSL named SPar. However, the implementation of DSLs using compiler-based tools is difficult, complicated, and usually requires a significant learning curve. This is even harder for those who are not familiar with compiler technology. Therefore, our motivation is to simplify this path for other researchers (experts in their domain) with support tools (our tool is CINCLE) to create high-level and productive DSLs through powerful and aggressive source-to-source transformations. In fact, parallel programmers can use their expertise without having to design and implement low-level code. The main goal of this thesis was to create a DSL and support tools for high-level stream parallelism in the context of a programming framework that is compiler-based and domain-oriented. Thus, we implemented SPar using CINCLE. SPar supports the software developer with productivity, performance, and code portability while CINCLE provides sufficient support to generate new DSLs. Also, SPar targets source-to-source transformation producing parallel pattern code built on top of FastFlow and MPI. Finally, we provide a full set of experiments showing that SPar provides better coding productivity without significant performance degradation in multi-core systems as well as transformation rules that are able to achieve code portability (for cluster architectures) through its generalized attributes.},
  keywords = {},
  pubstate = {published},
  tppubtype = {phdthesis}
}
	http://tede2.pucrs.br/tede2/handle/tede/6776
	Vogel, Adriano; Griebler, Dalvan; Maron, Carlos A. F.; Schepke, Claudio; Fernandes, Luiz Gustavo. Private IaaS Clouds: A Comparative Analysis of OpenNebula, CloudStack and OpenStack. Inproceedings. In: 24th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP), pp. 672-679, IEEE, Heraklion Crete, Greece, 2016.
@inproceedings{larcc:IaaS_private:PDP:16,
  title = {Private IaaS Clouds: A Comparative Analysis of OpenNebula, CloudStack and OpenStack},
  author = {Adriano Vogel and Dalvan Griebler and Carlos A. F. Maron and Claudio Schepke and Luiz Gustavo Fernandes},
  url = {http://ieeexplore.ieee.org/document/7445407/},
  doi = {10.1109/PDP.2016.75},
  year = {2016},
  date = {2016-02-01},
  booktitle = {24th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP)},
  pages = {672-679},
  publisher = {IEEE},
  address = {Heraklion Crete, Greece},
  series = {PDP'16},
  abstract = {Despite the evolution of cloud computing in recent years, the performance and comprehensive understanding of the available private cloud tools are still under research. This paper contributes to an analysis of the Infrastructure as a Service (IaaS) domain by mapping new insights and discussing the challenges for improving cloud services. The goal is to make a comparative analysis of OpenNebula, OpenStack and CloudStack tools, evaluating their differences on support for flexibility and resiliency. Also, we aim at evaluating these three cloud tools when they are deployed using a mutual hypervisor (KVM) for discovering new empirical insights. Our research results demonstrated that OpenStack is the most resilient and CloudStack is the most flexible for deploying an IaaS private cloud. Moreover, the performance experiments indicated some contrasts among the private IaaS cloud instances when running intensive workloads and scientific applications.},
  keywords = {},
  pubstate = {published},
  tppubtype = {inproceedings}
}
	http://ieeexplore.ieee.org/document/7445407/ doi:10.1109/PDP.2016.75
2015
	Adornes, Daniel; Griebler, Dalvan; Ledur, Cleverson; Fernandes, Luiz G. Coding Productivity in MapReduce Applications for Distributed and Shared Memory Architectures. Journal Article. In: International Journal of Software Engineering and Knowledge Engineering, vol. 25, no. 10, pp. 1739-1741, 2015.
@article{ADORNES:IJSEKE:15,
  title = {Coding Productivity in MapReduce Applications for Distributed and Shared Memory Architectures},
  author = {Daniel Adornes and Dalvan Griebler and Cleverson Ledur and Luiz G. Fernandes},
  url = {http://dx.doi.org/10.1142/S0218194015710096},
  doi = {10.1142/S0218194015710096},
  year = {2015},
  date = {2015-12-01},
  urldate = {2015-12-01},
  journal = {International Journal of Software Engineering and Knowledge Engineering},
  volume = {25},
  number = {10},
  pages = {1739-1741},
  publisher = {World Scientific},
  abstract = {MapReduce was originally proposed as a suitable and efficient approach for analyzing and processing large amounts of data. Since then, many researches contributed with MapReduce implementations for distributed and shared memory architectures. Nevertheless, different architectural levels require different optimization strategies in order to achieve high-performance computing. Such strategies in turn have caused very different MapReduce programming interfaces among these researches. This paper presents some research notes on coding productivity when developing MapReduce applications for distributed and shared memory architectures. As a case study, we introduce our current research on a unified MapReduce domain-specific language with code generation for Hadoop and Phoenix++, which has achieved a coding productivity increase from 41.84% and up to 94.71% without significant performance losses (below 3%) compared to those frameworks.},
  keywords = {},
  pubstate = {published},
  tppubtype = {article}
}
	http://dx.doi.org/10.1142/S0218194015710096 doi:10.1142/S0218194015710096
	Ledur, Cleverson; Griebler, Dalvan; Manssour, Isabel; Fernandes, Luiz G. Towards a Domain-Specific Language for Geospatial Data Visualization Maps with Big Data Sets. Inproceedings. In: ACS/IEEE International Conference on Computer Systems and Applications, pp. 8, IEEE, Marrakech, Morocco, 2015.
@inproceedings{LEDUR:AICCSA:15,
  title = {Towards a Domain-Specific Language for Geospatial Data Visualization Maps with Big Data Sets},
  author = {Cleverson Ledur and Dalvan Griebler and Isabel Manssour and Luiz G. Fernandes},
  url = {http://dx.doi.org/10.1109/AICCSA.2015.7507178},
  doi = {10.1109/AICCSA.2015.7507178},
  year = {2015},
  date = {2015-11-01},
  booktitle = {ACS/IEEE International Conference on Computer Systems and Applications},
  pages = {8},
  publisher = {IEEE},
  address = {Marrakech, Morocco},
  series = {AICCSA'15},
  abstract = {Data visualization is an alternative for representing information and helping people gain faster insights. However, the programming/creating of a visualization for large data sets is still a challenging task for users with low-level of software development knowledge. Our goal is to increase the productivity of experts who are familiar with the application domain. Therefore, we proposed an external Domain-Specific Language (DSL) that allows massive input of raw data and provides a small dictionary with suitable data visualization keywords. Also, we implemented it to support efficient data filtering operations and generate HTML or Javascript output code files (using Google Maps API). To measure the potential of our DSL, we evaluated four types of geospatial data visualization maps with four different technologies. The experiment results demonstrated a productivity gain when compared to the traditional way of implementing (e.g., Google Maps API, OpenLayers, and Leaflet), and efficient algorithm implementation.},
  keywords = {},
  pubstate = {published},
  tppubtype = {inproceedings}
}
	http://dx.doi.org/10.1109/AICCSA.2015.7507178 doi:10.1109/AICCSA.2015.7507178
	Griebler, Dalvan; Danelutto, Marco; Torquati, Massimo; Fernandes, Luiz G. An Embedded C++ Domain-Specific Language for Stream Parallelism. Inproceedings. In: Parallel Computing: On the Road to Exascale, Proceedings of the International Conference on Parallel Computing, pp. 317-326, IOS Press, Edinburgh, Scotland, UK, 2015.
@inproceedings{GRIEBLER:PARCO:15,
  title = {An Embedded C++ Domain-Specific Language for Stream Parallelism},
  author = {Dalvan Griebler and Marco Danelutto and Massimo Torquati and Luiz G. Fernandes},
  url = {http://dx.doi.org/10.3233/978-1-61499-621-7-317},
  doi = {10.3233/978-1-61499-621-7-317},
  year = {2015},
  date = {2015-09-01},
  booktitle = {Parallel Computing: On the Road to Exascale, Proceedings of the International Conference on Parallel Computing},
  pages = {317-326},
  publisher = {IOS Press},
  address = {Edinburgh, Scotland, UK},
  series = {ParCo'15},
  abstract = {This paper proposes a new C++ embedded Domain-Specific Language (DSL) for expressing stream parallelism by using standard C++11 attributes annotations. The main goal is to introduce high-level parallel abstractions for developing stream based parallel programs as well as reducing sequential source code rewriting. We demonstrated that by using a small set of attributes it is possible to produce different parallel versions depending on the way the source code is annotated. The performances of the parallel code produced are comparable with those obtained by manual parallelization.},
  keywords = {},
  pubstate = {published},
  tppubtype = {inproceedings}
}
	http://dx.doi.org/10.3233/978-1-61499-621-7-317 doi:10.3233/978-1-61499-621-7-317
	Adornes, Daniel; Griebler, Dalvan; Ledur, Cleverson; Fernandes, Luiz G. A Unified MapReduce Domain-Specific Language for Distributed and Shared Memory Architectures. Inproceedings. In: The 27th International Conference on Software Engineering & Knowledge Engineering, pp. 6, Knowledge Systems Institute Graduate School, Pittsburgh, USA, 2015.
@inproceedings{ADORNES:SEKE:15,
  title = {A Unified MapReduce Domain-Specific Language for Distributed and Shared Memory Architectures},
  author = {Daniel Adornes and Dalvan Griebler and Cleverson Ledur and Luiz G. Fernandes},
  url = {http://dx.doi.org/10.18293/SEKE2015-204},
  doi = {10.18293/SEKE2015-204},
  year = {2015},
  date = {2015-07-01},
  booktitle = {The 27th International Conference on Software Engineering \& Knowledge Engineering},
  pages = {6},
  publisher = {Knowledge Systems Institute Graduate School},
  address = {Pittsburgh, USA},
  abstract = {MapReduce is a suitable and efficient parallel programming pattern for processing big data analysis. In recent years, many frameworks/languages have implemented this pattern to achieve high performance in data mining applications, particularly for distributed memory architectures (e.g., clusters). Nevertheless, the industry of processors is now able to offer powerful processing on single machines (e.g., multi-core). Thus, these applications may address the parallelism in another architectural level. The target problems of this paper are code reuse and programming effort reduction since current solutions do not provide a single interface to deal with these two architectural levels. Therefore, we propose a unified domain-specific language in conjunction with transformation rules for code generation for Hadoop and Phoenix++. We selected these frameworks as state-of-the-art MapReduce implementations for distributed and shared memory architectures, respectively. Our solution achieves a programming effort reduction from 41.84% and up to 95.43% without significant performance losses (below the threshold of 3%) compared to Hadoop and Phoenix++.},
  keywords = {},
  pubstate = {published},
  tppubtype = {inproceedings}
}
	http://dx.doi.org/10.18293/SEKE2015-204 doi:10.18293/SEKE2015-204
2014
	Griebler, Dalvan; Adornes, Daniel; Fernandes, Luiz G. Performance and Usability Evaluation of a Pattern-Oriented Parallel Programming Interface for Multi-Core Architectures. Inproceedings. In: The 26th International Conference on Software Engineering & Knowledge Engineering, pp. 25-30, Knowledge Systems Institute Graduate School, Vancouver, Canada, 2014.
@inproceedings{GRIEBLER:SEKE:14,
  title = {Performance and Usability Evaluation of a Pattern-Oriented Parallel Programming Interface for Multi-Core Architectures},
  author = {Dalvan Griebler and Daniel Adornes and Luiz G. Fernandes},
  url = {https://gmap.pucrs.br/dalvan/papers/2014/CR_SEKE_2014.pdf},
  year = {2014},
  date = {2014-07-01},
  booktitle = {The 26th International Conference on Software Engineering \& Knowledge Engineering},
  pages = {25-30},
  publisher = {Knowledge Systems Institute Graduate School},
  address = {Vancouver, Canada},
  abstract = {Multi-core architectures have increased the power of parallelism by coupling many cores in a single chip. This becomes even more complex for developers to exploit the available parallelism in order to provide high performance scalable programs. To address these challenges, we propose the DSL-POPP (Domain-Specific Language for Pattern-Oriented Parallel Programming), which links the pattern-based approach in the programming interface as an alternative to reduce the effort of parallel software development, and achieve good performance in some applications. In this paper, the objective is to evaluate the usability and performance of the master/slave pattern and compare it to the Pthreads library. Moreover, experiments have shown that the master/slave interface of the DSL-POPP reduces up to 50% of the programming effort, without significantly affecting the performance.},
  keywords = {},
  pubstate = {published},
  tppubtype = {inproceedings}
}
	https://gmap.pucrs.br/dalvan/papers/2014/CR_SEKE_2014.pdf
2013
	Griebler, Dalvan; Fernandes, Luiz G. Towards a Domain-Specific Language for Patterns-Oriented Parallel Programming. Inproceedings. In: Programming Languages - 17th Brazilian Symposium - SBLP, pp. 105-119, Springer Berlin Heidelberg, Brasilia, Brazil, 2013.
@inproceedings{GRIEBLER:SBLP:13,
  title = {Towards a Domain-Specific Language for Patterns-Oriented Parallel Programming},
  author = {Dalvan Griebler and Luiz G. Fernandes},
  url = {http://dx.doi.org/10.1007/978-3-642-40922-6_8},
  doi = {10.1007/978-3-642-40922-6_8},
  year = {2013},
  date = {2013-10-01},
  booktitle = {Programming Languages - 17th Brazilian Symposium - SBLP},
  volume = {8129},
  pages = {105-119},
  publisher = {Springer Berlin Heidelberg},
  address = {Brasilia, Brazil},
  series = {Lecture Notes in Computer Science},
  abstract = {Pattern-oriented programming has been used in parallel code development for many years now. During this time, several tools (mainly frameworks and libraries) proposed the use of patterns based on programming primitives or templates. The implementation of patterns using those tools usually requires human expertise to correctly set up communication/synchronization among processes. In this work, we propose the use of a Domain Specific Language to create pattern-oriented parallel programs (DSL-POPP). This approach has the advantage of offering a higher programming abstraction level in which communication/synchronization among processes is hidden from programmers. We compensate the reduction in programming flexibility offering the possibility to use combined and/or nested parallel patterns (i.e., parallelism in levels), allowing the design of more complex parallel applications. We conclude this work presenting an experiment in which we develop a parallel application exploiting combined and nested parallel patterns in order to demonstrate the main properties of DSL-POPP.},
  keywords = {},
  pubstate = {published},
  tppubtype = {inproceedings}
}
	http://dx.doi.org/10.1007/978-3-642-40922-6_8 doi:10.1007/978-3-642-40922-6_8


2019
	Pieper, Ricardo; Griebler, Dalvan; Fernandes, Luiz G. Structured Stream Parallelism for Rust Inproceedings doi In: XXIII Brazilian Symposium on Programming Languages (SBLP), pp. 54-61, ACM, Salvador, Brazil, 2019. (Abstract \| Links \| BibTeX \| Tags: ) @inproceedings{PIEPER:SBLP:19, title = {Structured Stream Parallelism for Rust}, author = {Ricardo Pieper and Dalvan Griebler and Luiz G. Fernandes}, url = {https://doi.org/10.1145/3355378.3355384}, doi = {10.1145/3355378.3355384}, year = {2019}, date = {2019-10-01}, booktitle = {XXIII Brazilian Symposium on Programming Languages (SBLP)}, pages = {54-61}, publisher = {ACM}, address = {Salvador, Brazil}, series = {SBLP'19}, abstract = {Structured parallel programming has been studied and applied in several programming languages. This approach has proven to be suitable for abstracting low-level and architecture-dependent parallelism implementations. Our goal is to provide a structured and high-level library for the Rust language, targeting parallel stream processing applications for multi-core servers. Rust is an emerging programming language that has been developed by Mozilla Research group, focusing on performance, memory safety, and thread-safety. However, it lacks parallel programming abstractions, especially for stream processing applications. This paper contributes to a new API based on the structured parallel programming approach to simplify parallel software developing. Our experiments highlight that our solution provides higher-level parallel programming abstractions for stream processing applications in Rust. We also show that the throughput and speedup are comparable to the state-of-the-art for certain workloads.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } Close Structured parallel programming has been studied and applied in several programming languages. This approach has proven to be suitable for abstracting low-level and architecture-dependent parallelism implementations. 
Our goal is to provide a structured and high-level library for the Rust language, targeting parallel stream processing applications for multi-core servers. Rust is an emerging programming language that has been developed by Mozilla Research group, focusing on performance, memory safety, and thread-safety. However, it lacks parallel programming abstractions, especially for stream processing applications. This paper contributes to a new API based on the structured parallel programming approach to simplify parallel software developing. Our experiments highlight that our solution provides higher-level parallel programming abstractions for stream processing applications in Rust. We also show that the throughput and speedup are comparable to the state-of-the-art for certain workloads. Close https://doi.org/10.1145/3355378.3355384 doi:10.1145/3355378.3355384 Close
	Mencagli, Gabriele; Torquati, Massimo; Griebler, Dalvan; Danelutto, Marco; Fernandes, Luiz Gustavo L. Raising the Parallel Abstraction Level for Streaming Analytics Applications Journal Article doi In: IEEE Access, vol. 7, pp. 131944 - 131961, 2019. (Abstract \| Links \| BibTeX \| Tags: ) @article{MENCAGLI:IEEEAccess:19, title = {Raising the Parallel Abstraction Level for Streaming Analytics Applications}, author = {Gabriele Mencagli and Massimo Torquati and Dalvan Griebler and Marco Danelutto and Luiz Gustavo L. Fernandes}, url = {https://doi.org/10.1109/ACCESS.2019.2941183}, doi = {10.1109/ACCESS.2019.2941183}, year = {2019}, date = {2019-09-01}, journal = {IEEE Access}, volume = {7}, pages = {131944 - 131961}, publisher = {IEEE}, abstract = {In the stream processing domain, applications are represented by graphs of operators arbitrarily connected and filled with their business logic code. The APIs of existing Stream Processing Systems (SPSs) ease the development of transformations that recur in the streaming practice (e.g., filtering, aggregation and joins). In contrast, their parallelism abstractions are quite limited since they provide support to stateless operators only, or when the state is organized in a set of key-value pairs. This paper presents how the parallel patterns methodology can be revisited for sliding-window streaming analytics. Our vision fosters a design process of the application as composition and nesting of ready-to-use patterns provided through a C++17 fluent interface. Our prototype implements the run-time system of the patterns in the FastFlow parallel library expressing thread-based parallelism. The experimental analysis shows interesting outcomes. First, our pattern-based approach allows easy prototyping of different versions of the application, and the programmer can leverage nesting of patterns to increase performance (up to 37% in one of the two considered test-bed cases). 
Second, our FastFlow implementation outperforms (three times faster) the handmade porting of our patterns in popular JVM-based SPSs. Finally, in the concluding part of this paper, we explore the use of a task-based run-time system, by deriving interesting insights into how to make our patterns library suitable for multi backends.}, keywords = {}, pubstate = {published}, tppubtype = {article} } Close In the stream processing domain, applications are represented by graphs of operators arbitrarily connected and filled with their business logic code. The APIs of existing Stream Processing Systems (SPSs) ease the development of transformations that recur in the streaming practice (e.g., filtering, aggregation and joins). In contrast, their parallelism abstractions are quite limited since they provide support to stateless operators only, or when the state is organized in a set of key-value pairs. This paper presents how the parallel patterns methodology can be revisited for sliding-window streaming analytics. Our vision fosters a design process of the application as composition and nesting of ready-to-use patterns provided through a C++17 fluent interface. Our prototype implements the run-time system of the patterns in the FastFlow parallel library expressing thread-based parallelism. The experimental analysis shows interesting outcomes. First, our pattern-based approach allows easy prototyping of different versions of the application, and the programmer can leverage nesting of patterns to increase performance (up to 37% in one of the two considered test-bed cases). Second, our FastFlow implementation outperforms (three times faster) the handmade porting of our patterns in popular JVM-based SPSs. Finally, in the concluding part of this paper, we explore the use of a task-based run-time system, by deriving interesting insights into how to make our patterns library suitable for multi backends. 
Close https://doi.org/10.1109/ACCESS.2019.2941183 doi:10.1109/ACCESS.2019.2941183 Close
	Fischer, Gabriel Souto; Righi, Rodrigo Rosa; Costa, Cristiano André; Galante, Guilherme; Griebler, Dalvan Towards Evaluating Proactive and Reactive Approaches on Reorganizing Human Resources in IoT-Based Smart Hospitals Journal Article doi In: Sensors, vol. 19, no. 17, pp. 3800, 2019. (Abstract \| Links \| BibTeX \| Tags: ) @article{FISHER:Elasticity-Hospital:SENSORS:19, title = {Towards Evaluating Proactive and Reactive Approaches on Reorganizing Human Resources in IoT-Based Smart Hospitals}, author = {Gabriel Souto Fischer and Rodrigo Rosa Righi and Cristiano André Costa and Guilherme Galante and Dalvan Griebler}, url = {https://doi.org/10.3390/s19173800}, doi = {10.3390/s19173800}, year = {2019}, date = {2019-09-01}, urldate = {2019-09-01}, journal = {Sensors}, volume = {19}, number = {17}, pages = {3800}, publisher = {MDPI}, abstract = {Hospitals play an important role on ensuring a proper treatment of human health. One of the problems to be faced is the increasingly overcrowded patients care queues, who end up waiting for longer times without proper treatment to their health problems. The allocation of health professionals in hospital environments is not able to adapt to the demands of patients. There are times when underused rooms have idle professionals, and overused rooms have fewer professionals than necessary. Previous works have not solved this problem since they focus on understanding the evolution of doctor supply and patient demand, as to better adjust one to the other. However, they have not proposed concrete solutions for that regarding techniques for better allocating available human resources. Moreover, elasticity is one of the most important features of cloud computing, referring to the ability to add or remove resources according to the needs of the application or service. 
Based on this background, we introduce Elastic allocation of human resources in Healthcare environments (ElHealth) an IoT-focused model able to monitor patient usage of hospital rooms and adapt these rooms for patients demand. Using reactive and proactive elasticity approaches, ElHealth identifies when a room will have a demand that exceeds the capacity of care, and proposes actions to move human resources to adapt to patient demand. Our main contribution is the definition of Human Resources IoT-based Elasticity (i.e., an extension of the concept of resource elasticity in Cloud Computing to manage the use of human resources in a healthcare environment, where health professionals are allocated and deallocated according to patient demand). Another contribution is a cost–benefit analysis for the use of reactive and predictive strategies on human resources reorganization. ElHealth was simulated on a hospital environment using data from a Brazilian polyclinic, and obtained promising results, decreasing the waiting time by up to 96.4% and 96.73% in reactive and proactive approaches, respectively.}, keywords = {}, pubstate = {published}, tppubtype = {article} }
	Rockenbach, Dinei A.; Griebler, Dalvan; Danelutto, Marco; Fernandes, Luiz Gustavo High-Level Stream Parallelism Abstractions with SPar Targeting GPUs Inproceedings doi In: Parallel Computing is Everywhere, Proceedings of the International Conference on Parallel Computing (ParCo), pp. 543-552, IOS Press, Prague, Czech Republic, 2019. @inproceedings{ROCKENBACH:PARCO:19, title = {High-Level Stream Parallelism Abstractions with SPar Targeting GPUs}, author = {Dinei A. Rockenbach and Dalvan Griebler and Marco Danelutto and Luiz Gustavo Fernandes}, url = {https://doi.org/10.3233/APC200083}, doi = {10.3233/APC200083}, year = {2019}, date = {2019-09-01}, booktitle = {Parallel Computing is Everywhere, Proceedings of the International Conference on Parallel Computing (ParCo)}, volume = {36}, pages = {543-552}, publisher = {IOS Press}, address = {Prague, Czech Republic}, series = {ParCo'19}, abstract = {The combined exploitation of stream and data parallelism is demonstrating encouraging performance results in the literature for heterogeneous architectures, which are present on every computer systems today. However, provide parallel software efficiently targeting those architectures requires significant programming effort and expertise. The SPar domain-specific language already represents a solution to this problem providing proven high-level programming abstractions for multi-core architectures. In this paper, we enrich the SPar language adding support for GPUs. New transformation rules are designed for generating parallel code using stream and data parallel patterns. 
Our experiments revealed that these transformations rules are able to improve performance while the high-level programming abstractions are maintained.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} }
	Vogel, Adriano; Griebler, Dalvan; Danelutto, Marco; Fernandes, Luiz Gustavo Seamless Parallelism Management for Multi-core Stream Processing Inproceedings doi In: Advances in Parallel Computing, Proceedings of the International Conference on Parallel Computing (ParCo), pp. 533-542, IOS Press, Prague, Czech Republic, 2019. @inproceedings{VOGEL:PARCO:19, title = {Seamless Parallelism Management for Multi-core Stream Processing}, author = {Adriano Vogel and Dalvan Griebler and Marco Danelutto and Luiz Gustavo Fernandes}, url = {https://doi.org/10.3233/APC200082}, doi = {10.3233/APC200082}, year = {2019}, date = {2019-09-01}, booktitle = {Advances in Parallel Computing, Proceedings of the International Conference on Parallel Computing (ParCo)}, volume = {36}, pages = {533-542}, publisher = {IOS Press}, address = {Prague, Czech Republic}, series = {ParCo'19}, abstract = {Video streaming applications have critical performance requirements for dealing with fluctuating workloads and providing results in real-time. As a consequence, the majority of these applications demand parallelism for delivering quality of service to users. Although high-level and structured parallel programming aims at facilitating parallelism exploitation, there are still several issues to be addressed for increasing/improving existing parallel programming abstractions. In this paper, we aim at employing self-adaptivity for stream processing in order to seamlessly manage the application parallelism configurations at run-time, where a new strategy alleviates from application programmers the need to set time-consuming and error-prone parallelism parameters. The new strategy was implemented and validated on SPar. 
The results have shown that the proposed solution increases the level of abstraction and achieved a competitive performance.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} }
	Vogel, Adriano; Griebler, Dalvan; Danelutto, Marco; Fernandes, Luiz Gustavo Minimizing Self-Adaptation Overhead in Parallel Stream Processing for Multi-Cores Inproceedings doi In: Euro-Par 2019: Parallel Processing Workshops, pp. 12, Springer, Göttingen, Germany, 2019. @inproceedings{VOGEL:adaptive-overhead:AutoDaSP:19, title = {Minimizing Self-Adaptation Overhead in Parallel Stream Processing for Multi-Cores}, author = {Adriano Vogel and Dalvan Griebler and Marco Danelutto and Luiz Gustavo Fernandes}, url = {https://doi.org/10.1007/978-3-030-48340-1_3}, doi = {10.1007/978-3-030-48340-1_3}, year = {2019}, date = {2019-08-01}, booktitle = {Euro-Par 2019: Parallel Processing Workshops}, volume = {11997}, pages = {12}, publisher = {Springer}, address = {Göttingen, Germany}, series = {Lecture Notes in Computer Science}, abstract = {Stream processing paradigm is present in several applications that apply computations over continuous data flowing in the form of streams (e.g., video feeds, image, and data analytics). Employing self-adaptivity to stream processing applications can provide higher-level programming abstractions and autonomic resource management. However, there are cases where the performance is suboptimal. In this paper, the goal is to optimize parallelism adaptations in terms of stability and accuracy, which can improve the performance of parallel stream processing applications. Therefore, we present a new optimized self-adaptive strategy that is experimentally evaluated. The proposed solution provided high-level programming abstractions, reduced the adaptation overhead, and achieved a competitive performance with the best static executions.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} }
	Maliszewski, Anderson M.; Vogel, Adriano; Griebler, Dalvan; Roloff, Eduardo; Fernandes, Luiz G.; Navaux, Philippe O. A. Minimizing Communication Overheads in Container-based Clouds for HPC Applications Inproceedings doi In: IEEE Symposium on Computers and Communications (ISCC), pp. 1-6, IEEE, Barcelona, Spain, 2019. @inproceedings{larcc:communication_overhead_lxd:ISCC:19, title = {Minimizing Communication Overheads in Container-based Clouds for HPC Applications}, author = {Anderson M. Maliszewski and Adriano Vogel and Dalvan Griebler and Eduardo Roloff and Luiz G. Fernandes and Philippe O. A. Navaux}, url = {https://doi.org/10.1109/ISCC47284.2019.8969716}, doi = {10.1109/ISCC47284.2019.8969716}, year = {2019}, date = {2019-07-01}, booktitle = {IEEE Symposium on Computers and Communications (ISCC)}, pages = {1-6}, publisher = {IEEE}, address = {Barcelona, Spain}, series = {ISCC'19}, abstract = {Although the industry has embraced the cloud computing model, there are still significant challenges to be addressed concerning the quality of cloud services. Network-intensive applications may not scale in the cloud due to the sharing of the network infrastructure. In the literature, performance evaluation studies are showing that the network tends to limit the scalability and performance of HPC applications. Therefore, we proposed the aggregation of Network Interface Cards (NICs) in a ready-to-use integration with the OpenNebula cloud manager using Linux containers. We perform a set of experiments using a network microbenchmark to get specific network performance metrics and NAS parallel benchmarks to analyze the performance impact on HPC applications. Our results highlight that the implementation of NIC aggregation improves network performance in terms of throughput and latency. Moreover, HPC applications have different patterns of behavior when using our approach, which depends on communication and the amount of data transferring. 
While network-intensive applications increased the performance up to 38%, other applications with aggregated NICs maintained the same performance or presented slightly worse performance.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} }
	Griebler, Dalvan; Vogel, Adriano; Sensi, Daniele De; Danelutto, Marco; Fernandes, Luiz Gustavo Simplifying and implementing service level objectives for stream parallelism Journal Article doi In: The Journal of Supercomputing, vol. 76, pp. 4603-4628, 2019, ISSN: 0920-8542. @article{GRIEBLER:JS:19, title = {Simplifying and implementing service level objectives for stream parallelism}, author = {Dalvan Griebler and Adriano Vogel and Daniele De Sensi and Marco Danelutto and Luiz Gustavo Fernandes}, url = {https://doi.org/10.1007/s11227-019-02914-6}, doi = {10.1007/s11227-019-02914-6}, issn = {0920-8542}, year = {2019}, date = {2019-06-01}, urldate = {2019-06-01}, journal = {The Journal of Supercomputing}, volume = {76}, pages = {4603-4628}, publisher = {Springer}, abstract = {An increasing attention has been given to provide service level objectives (SLOs) in stream processing applications due to the performance and energy requirements, and because of the need to impose limits in terms of resource usage while improving the system utilization. Since the current and next-generation computing systems are intrinsically offering parallel architectures, the software has to naturally exploit the architecture parallelism. Implement and meet SLOs on existing applications is not a trivial task for application programmers, since the software development process, besides the parallelism exploitation, requires the implementation of autonomic algorithms or strategies. This is a system-oriented programming approach and requires the management of multiple knobs and sensors (e.g., the number of threads to use, the clock frequency of the cores, etc.) so that the system can self-adapt at runtime. In this work, we introduce a new and simpler way to define SLO in the application’s source code, by abstracting from the programmer all the details relative to self-adaptive system implementation. 
The application programmer specifies which parts of the code to parallelize and the related SLOs that should be enforced. To reach this goal, source-to-source code transformation rules are implemented in our compiler, which automatically generates self-adaptive strategies to enforce, at runtime, the user-expressed objectives. The experiments highlighted promising results with simpler, effective, and efficient SLO implementations for real-world applications.}, keywords = {}, pubstate = {published}, tppubtype = {article} }
	Rockenbach, Dinei A.; Stein, Charles Michael; Griebler, Dalvan; Mencagli, Gabriele; Torquati, Massimo; Danelutto, Marco; Fernandes, Luiz Gustavo Stream Processing on Multi-cores with GPUs: Parallel Programming Models' Challenges Inproceedings doi In: International Parallel and Distributed Processing Symposium Workshops (IPDPSW), pp. 834-841, IEEE, Rio de Janeiro, Brazil, 2019. @inproceedings{ROCKENBACH:stream-multigpus:IPDPSW:19, title = {Stream Processing on Multi-cores with GPUs: Parallel Programming Models' Challenges}, author = {Dinei A. Rockenbach and Charles Michael Stein and Dalvan Griebler and Gabriele Mencagli and Massimo Torquati and Marco Danelutto and Luiz Gustavo Fernandes}, url = {https://doi.org/10.1109/IPDPSW.2019.00137}, doi = {10.1109/IPDPSW.2019.00137}, year = {2019}, date = {2019-05-01}, booktitle = {International Parallel and Distributed Processing Symposium Workshops (IPDPSW)}, pages = {834-841}, publisher = {IEEE}, address = {Rio de Janeiro, Brazil}, series = {IPDPSW'19}, abstract = {The stream processing paradigm is used in several scientific and enterprise applications in order to continuously compute results out of data items coming from data sources such as sensors. The full exploitation of the potential parallelism offered by current heterogeneous multi-cores equipped with one or more GPUs is still a challenge in the context of stream processing applications. In this work, our main goal is to present the parallel programming challenges that the programmer has to face when exploiting CPUs and GPUs' parallelism at the same time using traditional programming models. We highlight the parallelization methodology in two use-cases (the Mandelbrot Streaming benchmark and the PARSEC's Dedup application) to demonstrate the issues and benefits of using heterogeneous parallel hardware. 
The experiments conducted demonstrate how a high-level parallel programming model targeting stream processing like the one offered by SPar can be used to reduce the programming effort still offering a good level of performance if compared with state-of-the-art programming models.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} }
	Stein, Charles Michael; Griebler, Dalvan; Danelutto, Marco; Fernandes, Luiz Gustavo Stream Parallelism on the LZSS Data Compression Application for Multi-Cores with GPUs Inproceedings doi In: 27th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP), pp. 247-251, IEEE, Pavia, Italy, 2019. @inproceedings{STEIN:LZSS-multigpu:PDP:19, title = {Stream Parallelism on the LZSS Data Compression Application for Multi-Cores with GPUs}, author = {Charles Michael Stein and Dalvan Griebler and Marco Danelutto and Luiz Gustavo Fernandes}, url = {https://doi.org/10.1109/EMPDP.2019.8671624}, doi = {10.1109/EMPDP.2019.8671624}, year = {2019}, date = {2019-02-01}, booktitle = {27th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP)}, pages = {247-251}, publisher = {IEEE}, address = {Pavia, Italy}, series = {PDP'19}, abstract = {GPUs have been used to accelerate different data parallel applications. The challenge consists in using GPUs to accelerate stream processing applications. Our goal is to investigate and evaluate whether stream parallel applications may benefit from parallel execution on both CPU and GPU cores. In this paper, we introduce new parallel algorithms for the Lempel-Ziv-Storer-Szymanski (LZSS) data compression application. We implemented the algorithms targeting both CPUs and GPUs. GPUs have been used with CUDA and OpenCL to exploit inner algorithm data parallelism. Outer stream parallelism has been exploited using CPU cores through SPar. The parallel implementation of LZSS achieved 135 fold speedup using a multi-core CPU and two GPUs. We also observed speedups in applications where we were not expecting to get it using the same combine data-stream parallel exploitation techniques.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} }
	Maron, Carlos A. F.; Vogel, Adriano; Griebler, Dalvan; Fernandes, Luiz Gustavo Should PARSEC Benchmarks be More Parametric? A Case Study with Dedup Inproceedings doi In: 27th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP), pp. 217-221, IEEE, Pavia, Italy, 2019. @inproceedings{MARON:parametric-parsec:PDP:19, title = {Should PARSEC Benchmarks be More Parametric? A Case Study with Dedup}, author = {Carlos A. F. Maron and Adriano Vogel and Dalvan Griebler and Luiz Gustavo Fernandes}, url = {https://doi.org/10.1109/EMPDP.2019.8671592}, doi = {10.1109/EMPDP.2019.8671592}, year = {2019}, date = {2019-02-01}, booktitle = {27th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP)}, pages = {217-221}, publisher = {IEEE}, address = {Pavia, Italy}, series = {PDP'19}, abstract = {Parallel applications of the same domain can present similar patterns of behavior and characteristics. Characterizing common application behaviors can help for understanding performance aspects in the real-world scenario. One way to better understand and evaluate applications' characteristics is by using customizable/parametric benchmarks that enable users to represent important characteristics at run-time. We observed that parameterization techniques should be better exploited in the available benchmarks, especially on stream processing domain. For instance, although widely used, the stream processing benchmarks available in PARSEC do not support the simulation and evaluation of relevant and modern characteristics. Therefore, our goal is to identify the stream parallelism characteristics present in PARSEC. We also implemented a ready to use parameterization support and evaluated the application behaviors considering relevant performance metrics for stream parallelism (service time, throughput, latency). We choose Dedup to be our case study. 
The experimental results have shown performance improvements in our parameterization support for Dedup. Moreover, this support increased the customization space for benchmark users, which is simple to use. In the future, our solution can be potentially explored on different parallel architectures and parallel programming frameworks.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} }
	Serpa, Matheus S.; Moreira, Francis B.; Navaux, Philippe O. A.; Cruz, Eduardo H. M.; Diener, Matthias; Griebler, Dalvan; Fernandes, Luiz Gustavo Memory Performance and Bottlenecks in Multicore and GPU Architectures Inproceedings doi In: 27th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP), pp. 233-236, IEEE, Pavia, Italy, 2019. @inproceedings{SERPA:memory-gpu-multicore:PDP:19, title = {Memory Performance and Bottlenecks in Multicore and GPU Architectures}, author = {Matheus S. Serpa and Francis B. Moreira and Philippe O. A. Navaux and Eduardo H. M. Cruz and Matthias Diener and Dalvan Griebler and Luiz Gustavo Fernandes}, url = {https://doi.org/10.1109/EMPDP.2019.8671628}, doi = {10.1109/EMPDP.2019.8671628}, year = {2019}, date = {2019-02-01}, booktitle = {27th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP)}, pages = {233-236}, publisher = {IEEE}, address = {Pavia, Italy}, series = {PDP'19}, abstract = {Nowadays, there are several different architectures available not only for the industry, but also for normal consumers. Traditional multicore processors, GPUs, accelerators such as the Sunway SW26010, or even energy efficiency-driven processors such as the ARM family, present very different architectural characteristics. This wide range of characteristics presents a challenge for the developers of applications. Developers must deal with different instruction sets, memory hierarchies, or even different programming paradigms when programming for these architectures. Therefore, the same application can perform well when executing on one architecture, but poorly on another architecture. To optimize an application, it is important to have a deep understanding of how it behaves on different architectures. The related work in this area mostly focuses on a limited analysis encompassing execution time and energy. 
In this paper, we perform a detailed investigation on the impact of the memory subsystem of different architectures, which is one of the most important aspects to be considered. For this study, we performed experiments in the Broadwell CPU and Pascal GPU, using applications from the Rodinia benchmark suite. In this way, we were able to understand why an application performs well on one architecture and poorly on others.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} }
2018
	Maliszewski, Anderson M; Griebler, Dalvan; Vogel, Adriano; Schepke, Claudio On the Performance of Multithreading Applications under Private Cloud Conditions Inproceedings doi In: Symposium on High Performance Computing Systems (WSCAD), pp. 273-273, IEEE, São Paulo, Brazil, 2018.
	@inproceedings{larcc:multithreading_cloud:WSCAD:18, title = {On the Performance of Multithreading Applications under Private Cloud Conditions}, author = {Anderson M Maliszewski and Dalvan Griebler and Adriano Vogel and Claudio Schepke}, url = {https://doi.org/10.1109/WSCAD.2018.00055}, doi = {10.1109/WSCAD.2018.00055}, year = {2018}, date = {2018-10-01}, booktitle = {Symposium on High Performance Computing Systems (WSCAD)}, pages = {273-273}, publisher = {IEEE}, address = {São Paulo, Brazil}, abstract = {IaaS private clouds provide an attractive environment for scientific applications. However, the performance is a challenge, as additional abstraction layers imposed by the virtualization can cause overheads and bottlenecks. This paper contributes to a performance analysis of applications with dedicated and shared resources environments under private cloud conditions, deployed with container (LXC) or kernel-based (KVM) instances. We selected five benchmarks from the PARSEC suite. In the experimental results, identifying a performance pattern of behavior among the applications was hard. For a set of multi-threading applications, the KVM-based cloud instances achieved better performance; however, in the other set of applications, the LXC-based cloud instances performed better.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} }
	Ewald, Endrius; Vogel, Adriano; Rista, Cassiano; Griebler, Dalvan; Manssour, Isabel; Fernandes, Luiz G. Parallel and Distributed Processing Support for a Geospatial Data Visualization DSL Inproceedings doi In: Symposium on High Performance Computing Systems (WSCAD), pp. 221-228, IEEE, São Paulo, Brazil, 2018.
	@inproceedings{EWALD:WSCAD:18, title = {Parallel and Distributed Processing Support for a Geospatial Data Visualization DSL}, author = {Endrius Ewald and Adriano Vogel and Cassiano Rista and Dalvan Griebler and Isabel Manssour and Luiz G. Fernandes}, url = {https://doi.org/10.1109/WSCAD.2018.00042}, doi = {10.1109/WSCAD.2018.00042}, year = {2018}, date = {2018-10-01}, booktitle = {Symposium on High Performance Computing Systems (WSCAD)}, pages = {221-228}, publisher = {IEEE}, address = {São Paulo, Brazil}, abstract = {The amount of data generated worldwide related to geolocalization has exponentially increased. However, the fast processing of this amount of data is a challenge from the programming perspective, and many available solutions require learning a variety of tools and programming languages. This paper introduces the support for parallel and distributed processing in a DSL for Geospatial Data Visualization to speed up the data pre-processing phase. The results have shown the MPI version with dynamic data distribution performing better under medium and large data set files, while MPI-I/O version achieved the best performance with small data set files.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} }
	Vogel, Adriano; Griebler, Dalvan; Sensi, Daniele De; Danelutto, Marco; Fernandes, Luiz Gustavo Autonomic and Latency-Aware Degree of Parallelism Management in SPar Inproceedings doi In: Euro-Par 2018: Parallel Processing Workshops, pp. 28-39, Springer, Turin, Italy, 2018.
	@inproceedings{VOGEL:Adaptive-Latency-SPar:AutoDaSP:18, title = {Autonomic and Latency-Aware Degree of Parallelism Management in SPar}, author = {Adriano Vogel and Dalvan Griebler and Daniele De Sensi and Marco Danelutto and Luiz Gustavo Fernandes}, url = {http://dx.doi.org/10.1007/978-3-030-10549-5_3}, doi = {10.1007/978-3-030-10549-5_3}, year = {2018}, date = {2018-08-01}, booktitle = {Euro-Par 2018: Parallel Processing Workshops}, pages = {28-39}, publisher = {Springer}, address = {Turin, Italy}, series = {Lecture Notes in Computer Science}, abstract = {Stream processing applications became a representative workload in current computing systems. A significant part of these applications demands parallelism to increase performance. However, programmers are often facing a trade-off between coding productivity and performance when introducing parallelism. SPar was created for balancing this trade-off to the application programmers by using the C++11 attributes’ annotation mechanism. In SPar and other programming frameworks for stream processing applications, the manual definition of the number of replicas to be used for the stream operators is a challenge. In addition to that, low latency is required by several stream processing applications. We noted that explicit latency requirements are poorly considered on the state-of-the-art parallel programming frameworks. Since there is a direct relationship between the number of replicas and the latency of the application, in this work we propose an autonomic and adaptive strategy to choose the proper number of replicas in SPar to address latency constraints. We experimentally evaluated our implemented strategy and demonstrated its effectiveness on a real-world application, demonstrating that our adaptive strategy can provide higher abstraction levels while automatically managing the latency.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} }
	Griebler, Dalvan; Sensi, Daniele De; Vogel, Adriano; Danelutto, Marco; Fernandes, Luiz Gustavo Service Level Objectives via C++11 Attributes Inproceedings doi In: Euro-Par 2018: Parallel Processing Workshops, pp. 745-756, Springer, Turin, Italy, 2018.
	@inproceedings{GRIEBLER:SLO-SPar-Nornir:REPARA:18, title = {Service Level Objectives via C++11 Attributes}, author = {Dalvan Griebler and Daniele De Sensi and Adriano Vogel and Marco Danelutto and Luiz Gustavo Fernandes}, url = {http://dx.doi.org/10.1007/978-3-030-10549-5_58}, doi = {10.1007/978-3-030-10549-5_58}, year = {2018}, date = {2018-08-01}, booktitle = {Euro-Par 2018: Parallel Processing Workshops}, pages = {745-756}, publisher = {Springer}, address = {Turin, Italy}, series = {Lecture Notes in Computer Science}, abstract = {In recent years, increasing attention has been given to the possibility of guaranteeing Service Level Objectives (SLOs) to users about their applications, either regarding performance or power consumption. SLO can be implemented for parallel applications since they can provide many control knobs (e.g., the number of threads to use, the clock frequency of the cores, etc.) to tune the performance and power consumption of the application. Different from most of the existing approaches, we target sequential stream processing applications by proposing a solution based on C++ annotations. The user specifies which parts of the code to parallelize and what type of requirements should be enforced on that part of the code. Our solution first automatically parallelizes the annotated code and then applies self-adaptation approaches at run-time to enforce the user-expressed objectives. We ran experiments on different real-world applications, showing its simplicity and effectiveness.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} }
	Maliszewski, Anderson M; Griebler, Dalvan; Schepke, Claudio; Ditter, Alexander; Fey, Dietmar; Fernandes, Luiz Gustavo The NAS Benchmark Kernels for Single and Multi-Tenant Cloud Instances with LXC/KVM Inproceedings doi In: International Conference on High Performance Computing & Simulation (HPCS), pp. 359-366, IEEE, Orleans, France, 2018.
	@inproceedings{larcc:NAS_cloud_LXC_KVM:HPCS:2018, title = {The NAS Benchmark Kernels for Single and Multi-Tenant Cloud Instances with LXC/KVM}, author = {Anderson M Maliszewski and Dalvan Griebler and Claudio Schepke and Alexander Ditter and Dietmar Fey and Luiz Gustavo Fernandes}, url = {https://doi.org/10.1109/HPCS.2018.00066}, doi = {10.1109/HPCS.2018.00066}, year = {2018}, date = {2018-07-01}, booktitle = {International Conference on High Performance Computing \& Simulation (HPCS)}, pages = {359-366}, publisher = {IEEE}, address = {Orleans, France}, series = {HPCS'18}, abstract = {Private IaaS clouds are an attractive environment for scientific workloads and applications. It provides advantages such as almost instantaneous availability of high-performance computing in a single node as well as compute clusters, easy access for researchers, and users that do not have access to conventional supercomputers. Furthermore, a cloud infrastructure provides elasticity and scalability to ensure and manage any software dependency on the system with no third-party dependency for researchers. However, one of the biggest challenges is to avoid significant performance degradation when migrating these applications from physical nodes to a cloud environment. Also, we lack more research investigations for multi-tenant cloud instances. In this paper, our goal is to perform a comparative performance evaluation of scientific applications with single and multi-tenancy cloud instances using KVM and LXC virtualization technologies under private cloud conditions. All analyses and evaluations were carried out based on NAS Benchmark kernels to simulate different types of workloads. We applied statistical significance tests to highlight the differences. The results have shown that applications running on LXC-based cloud instances outperform KVM-based cloud instances in 93.75% of the experiments w.r.t single tenant. Regarding multi-tenant, LXC instances outperform KVM instances in 45% of the results, where the performance differences were not as significant as expected.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} }
	Griebler, Dalvan; Hoffmann, Renato B.; Danelutto, Marco; Fernandes, Luiz Gustavo Stream Parallelism with Ordered Data Constraints on Multi-Core Systems Journal Article doi In: The Journal of Supercomputing, vol. 75, no. 8, pp. 4042-4061, 2018, ISSN: 0920-8542.
	@article{GRIEBLER:JS:18, title = {Stream Parallelism with Ordered Data Constraints on Multi-Core Systems}, author = {Dalvan Griebler and Renato B. Hoffmann and Marco Danelutto and Luiz Gustavo Fernandes}, url = {https://doi.org/10.1007/s11227-018-2482-7}, doi = {10.1007/s11227-018-2482-7}, issn = {0920-8542}, year = {2018}, date = {2018-07-01}, urldate = {2018-07-01}, journal = {The Journal of Supercomputing}, volume = {75}, number = {8}, pages = {4042-4061}, publisher = {Springer}, abstract = {It is often a challenge to keep input/output tasks/results in order for parallel computations over data streams, particularly when stateless task operators are replicated to increase parallelism when there are irregular tasks. Maintaining input/output order requires additional coding effort and may significantly impact the application's actual throughput. Thus, we propose a new implementation technique designed to be easily integrated with any of the existing C++ parallel programming frameworks that support stream parallelism. In this paper, it is first implemented and studied using SPar, our high-level domain-specific language for stream parallelism. We discuss the results of a set of experiments with real-world applications revealing how significant performance improvements may be achieved when our proposed solution is integrated within SPar, especially for data compression applications. Also, we show the results of experiments performed after integrating our solution within FastFlow and TBB, revealing no significant overheads.}, keywords = {}, pubstate = {published}, tppubtype = {article} }
	Griebler, Dalvan; Vogel, Adriano; Maron, Carlos A F; Maliszewski, Anderson M; Schepke, Claudio; Fernandes, Luiz Gustavo Performance of Data Mining, Media, and Financial Applications under Private Cloud Conditions Inproceedings doi In: IEEE Symposium on Computers and Communications (ISCC), pp. 1530-1346, IEEE, Natal, Brazil, 2018.
	@inproceedings{larcc:parsec_cloudstack_lxc_kvm:ISCC:2018, title = {Performance of Data Mining, Media, and Financial Applications under Private Cloud Conditions}, author = {Dalvan Griebler and Adriano Vogel and Carlos A F Maron and Anderson M Maliszewski and Claudio Schepke and Luiz Gustavo Fernandes}, url = {https://dx.doi.org/10.1109/ISCC.2018.8538759}, doi = {10.1109/ISCC.2018.8538759}, year = {2018}, date = {2018-06-01}, booktitle = {IEEE Symposium on Computers and Communications (ISCC)}, pages = {1530-1346}, publisher = {IEEE}, address = {Natal, Brazil}, series = {ISCC'18}, abstract = {This paper contributes to a performance analysis of real-world workloads under private cloud conditions. We selected six benchmarks from PARSEC related to three mainstream application domains (financial, data mining, and media processing). Our goal was to evaluate these application domains in different cloud instances and deployment environments, concerning container or kernel-based instances and using dedicated or shared machine resources. Experiments have shown that performance varies according to the application characteristics, virtualization technology, and cloud environment. Results highlighted that financial, data mining, and media processing applications running in the LXC instances tend to outperform KVM when there is a dedicated machine resource environment. However, when two instances are sharing the same machine resources, these applications tend to achieve better performance in the KVM instances. Finally, financial applications achieved better performance in the cloud than media and data mining.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} }
	Rista, Cassiano; Teixeira, Marcelo; Griebler, Dalvan; Fernandes, Luiz Gustavo Evaluating, Estimating, and Improving Network Performance in Container-based Clouds Inproceedings doi In: IEEE Symposium on Computers and Communications (ISCC), pp. 1530-1346, IEEE, Natal, Brazil, 2018.
	@inproceedings{larcc:network_performance_container:ISCC:2018, title = {Evaluating, Estimating, and Improving Network Performance in Container-based Clouds}, author = {Cassiano Rista and Marcelo Teixeira and Dalvan Griebler and Luiz Gustavo Fernandes}, url = {https://doi.org/10.1109/ISCC.2018.8538558}, doi = {10.1109/ISCC.2018.8538558}, year = {2018}, date = {2018-06-01}, booktitle = {IEEE Symposium on Computers and Communications (ISCC)}, pages = {1530-1346}, publisher = {IEEE}, address = {Natal, Brazil}, series = {ISCC'18}, abstract = {Cloud computing has recently attracted a great deal of interest from both industry and academia, emerging as an important paradigm to improve resource utilization, efficiency, flexibility, and pay-per-use. However, cloud platforms inherently include a virtualization layer that imposes performance degradation on network-intensive applications. Thus, it is crucial to anticipate possible performance degradation to resolve system bottlenecks. This paper uses the Petri Nets approach to create different models for evaluating, estimating, and improving network performance in container-based cloud environments. Based on model estimations, we assessed the network bandwidth utilization of the system under different setups. Then, by identifying possible bottlenecks, we show how the system could be modified to improve performance. We then tested how the model would behave through real-world experiments. When the model indicates probable bandwidth saturation, we propose a link aggregation approach to increase bandwidth, using lightweight virtualization to reduce virtualization overhead. Results reveal that our model anticipates the structural and behavioral characteristics of the network in the cloud environment. Therefore, it systematically improves network efficiency, which saves effort, time, and money.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} }
	Griebler, Dalvan; Loff, Junior; Mencagli, Gabriele; Danelutto, Marco; Fernandes, Luiz Gustavo Efficient NAS Benchmark Kernels with C++ Parallel Programming Inproceedings doi In: 26th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP), pp. 733-740, IEEE, Cambridge, UK, 2018.
	@inproceedings{GRIEBLER:NAS-CPP:PDP:18, title = {Efficient NAS Benchmark Kernels with C++ Parallel Programming}, author = {Dalvan Griebler and Junior Loff and Gabriele Mencagli and Marco Danelutto and Luiz Gustavo Fernandes}, url = {https://doi.org/10.1109/PDP2018.2018.00120}, doi = {10.1109/PDP2018.2018.00120}, year = {2018}, date = {2018-03-01}, booktitle = {26th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP)}, pages = {733-740}, publisher = {IEEE}, address = {Cambridge, UK}, series = {PDP'18}, abstract = {Benchmarking is a way to study the performance of new architectures and parallel programming frameworks. Well-established benchmark suites such as the NAS Parallel Benchmarks (NPB) comprise legacy codes that still lack portability to C++ language. As a consequence, a set of high-level and easy-to-use C++ parallel programming frameworks cannot be tested in NPB. Our goal is to describe a C++ porting of the NPB kernels and to analyze the performance achieved by different parallel implementations written using the Intel TBB, OpenMP and FastFlow frameworks for Multi-Cores. The experiments show an efficient code porting from Fortran to C++ and an efficient parallelization on average.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} }
	Griebler, Dalvan; Hoffmann, Renato B.; Danelutto, Marco; Fernandes, Luiz Gustavo High-Level and Productive Stream Parallelism for Dedup, Ferret, and Bzip2 Journal Article doi In: International Journal of Parallel Programming, vol. 47, no. 1, pp. 253-271, 2018, ISSN: 1573-7640.
	@article{GRIEBLER:IJPP:18, title = {High-Level and Productive Stream Parallelism for Dedup, Ferret, and Bzip2}, author = {Dalvan Griebler and Renato B. Hoffmann and Marco Danelutto and Luiz Gustavo Fernandes}, url = {https://doi.org/10.1007/s10766-018-0558-x}, doi = {10.1007/s10766-018-0558-x}, issn = {1573-7640}, year = {2018}, date = {2018-02-01}, journal = {International Journal of Parallel Programming}, volume = {47}, number = {1}, pages = {253-271}, publisher = {Springer}, abstract = {Parallel programming has been a challenging task for application programmers. Stream processing is an application domain present in several scientific, enterprise, and financial areas that lack suitable abstractions to exploit parallelism. Our goal is to assess the feasibility of state-of-the-art frameworks/libraries (Pthreads, TBB, and FastFlow) and the SPar domain-specific language for real-world streaming applications (Dedup, Ferret, and Bzip2) targeting multi-core architectures. SPar was specially designed to provide high-level and productive stream parallelism abstractions, supporting programmers with standard C++11 annotations. For the experiments, we implemented three streaming applications. We discussed SPar’s programmability advantages compared to the frameworks in terms of productivity and structured parallel programming. The results demonstrate that SPar improves productivity and provides the necessary features to achieve similar performances compared to the state-of-the-art.}, keywords = {}, pubstate = {published}, tppubtype = {article} }
2017
	Griebler, Dalvan; Hoffmann, Renato B.; Loff, Junior; Danelutto, Marco; Fernandes, Luiz G. High-Level and Efficient Stream Parallelism on Multi-core Systems with SPar for Data Compression Applications Inproceedings In: XVIII Simpósio em Sistemas Computacionais de Alto Desempenho, pp. 16-27, SBC, Campinas, SP, Brasil, 2017.
	@inproceedings{GRIEBLER:WSCAD:17, title = {High-Level and Efficient Stream Parallelism on Multi-core Systems with SPar for Data Compression Applications}, author = {Dalvan Griebler and Renato B. Hoffmann and Junior Loff and Marco Danelutto and Luiz G. Fernandes}, url = {https://gmap.pucrs.br/dalvan/papers/2017/CR_WSCAD_2017.pdf}, year = {2017}, date = {2017-10-01}, booktitle = {XVIII Simpósio em Sistemas Computacionais de Alto Desempenho}, pages = {16-27}, publisher = {SBC}, address = {Campinas, SP, Brasil}, abstract = {The stream processing domain is present in several real-world applications that are running on multi-core systems. In this paper, we focus on data compression applications that are an important sub-set of this domain. Our main goal is to assess the programmability and efficiency of a domain-specific language called SPar. It was specially designed for expressing stream parallelism and it promises higher-level parallelism abstractions without significant performance losses. Therefore, we parallelized Lzip and Bzip2 compressors with SPar and compared with state-of-the-art frameworks. The results revealed that SPar is able to efficiently exploit stream parallelism as well as provide suitable abstractions with less code intrusion and code re-factoring.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} }
	Griebler, Dalvan; Hoffmann, Renato B.; Danelutto, Marco; Fernandes, Luiz Gustavo. Higher-Level Parallelism Abstractions for Video Applications with SPar. Inproceedings. In: Parallel Computing is Everywhere, Proceedings of the International Conference on Parallel Computing, pp. 698-707, IOS Press, Bologna, Italy, 2017.
@inproceedings{GRIEBLER:REPARA:17,
  title = {Higher-Level Parallelism Abstractions for Video Applications with SPar},
  author = {Dalvan Griebler and Renato B. Hoffmann and Marco Danelutto and Luiz Gustavo Fernandes},
  url = {https://doi.org/10.3233/978-1-61499-843-3-698},
  doi = {10.3233/978-1-61499-843-3-698},
  year = {2017},
  date = {2017-09-01},
  booktitle = {Parallel Computing is Everywhere, Proceedings of the International Conference on Parallel Computing},
  pages = {698--707},
  publisher = {IOS Press},
  address = {Bologna, Italy},
  series = {ParCo'17},
  abstract = {SPar is a Domain-Specific Language (DSL) designed to provide high-level parallel programming abstractions for streaming applications. The video processing application domain requires parallel processing to extract and analyze information quickly. When using state-of-the-art frameworks such as FastFlow and TBB, the application programmer has to manage source code re-factoring and performance optimization to implement parallelism efficiently. Our goal is to make this process easier for programmers through SPar. Thus we assess SPar's programming language and its performance in traditional video applications. We also discuss different implementations compared to the ones of SPar. Results demonstrate that SPar maintains the sequential code structure, is less code intrusive, and provides higher-level programming abstractions without introducing notable performance losses. Therefore, it represents a good choice for application programmers from the video processing domain.},
  keywords = {},
  pubstate = {published},
  tppubtype = {inproceedings}
}
	Griebler, Dalvan; Fernandes, Luiz Gustavo. Towards Distributed Parallel Programming Support for the SPar DSL. Inproceedings. In: Parallel Computing is Everywhere, Proceedings of the International Conference on Parallel Computing, pp. 563-572, IOS Press, Bologna, Italy, 2017.
@inproceedings{GRIEBLER:PARCO:17,
  title = {Towards Distributed Parallel Programming Support for the SPar DSL},
  author = {Dalvan Griebler and Luiz Gustavo Fernandes},
  url = {https://doi.org/10.3233/978-1-61499-843-3-563},
  doi = {10.3233/978-1-61499-843-3-563},
  year = {2017},
  date = {2017-09-01},
  booktitle = {Parallel Computing is Everywhere, Proceedings of the International Conference on Parallel Computing},
  pages = {563--572},
  publisher = {IOS Press},
  address = {Bologna, Italy},
  series = {ParCo'17},
  abstract = {SPar was originally designed to provide high-level abstractions for stream parallelism in C++ programs targeting multi-core systems. This work proposes distributed parallel programming support for SPar targeting cluster environments. The goal is to preserve the original semantics while source-to-source code transformations will be turned into MPI (Message Passing Interface) parallel code. The results of the experiments presented in the paper demonstrate improved programmability without significant performance losses.},
  keywords = {},
  pubstate = {published},
  tppubtype = {inproceedings}
}
	Rista, Cassiano; Griebler, Dalvan; Maron, Carlos A. F.; Fernandes, Luiz Gustavo. Improving the Network Performance of a Container-Based Cloud Environment for Hadoop Systems. Inproceedings. In: International Conference on High Performance Computing & Simulation (HPCS), pp. 619-626, IEEE, Genoa, Italy, 2017.
@inproceedings{larcc:link_aggregation:HPCS:2017,
  title = {Improving the Network Performance of a Container-Based Cloud Environment for Hadoop Systems},
  author = {Cassiano Rista and Dalvan Griebler and Carlos A. F. Maron and Luiz Gustavo Fernandes},
  url = {http://ieeexplore.ieee.org/document/8035136/},
  doi = {10.1109/HPCS.2017.97},
  year = {2017},
  date = {2017-07-01},
  booktitle = {International Conference on High Performance Computing \& Simulation (HPCS)},
  pages = {619--626},
  publisher = {IEEE},
  address = {Genoa, Italy},
  series = {HPCS'17},
  abstract = {Cloud computing has emerged as an important paradigm to improve resource utilization, efficiency, flexibility, and the pay-per-use billing structure. However, cloud platforms cause performance degradations due to their virtualization layer and may not be appropriate for the requirements of high-performance applications, such as big data. This paper tackles the problem of improving network performance in container-based cloud instances to create a viable alternative to run network intensive Hadoop applications. Our approach consists of deploying link aggregation via the IEEE 802.3ad standard to increase the available bandwidth and using LXC (Linux Container) cloud instances to create a Hadoop cluster. In order to evaluate the efficiency of our approach and the overhead added by the container-based cloud environment, we ran a set of experiments to measure throughput, latency, bandwidth utilization, and completion times. The results prove that our approach adds minimal overhead in the cloud environment as well as increases throughput and reduces latency. Moreover, our approach demonstrates a suitable alternative for running Hadoop applications, reducing completion times by up to 33.73\%.},
  keywords = {},
  pubstate = {published},
  tppubtype = {inproceedings}
}
	Ledur, Cleverson; Griebler, Dalvan; Manssour, Isabel; Fernandes, Luiz Gustavo. A High-Level DSL for Geospatial Visualizations with Multi-core Parallelism Support. Inproceedings. In: 41st IEEE Computer Society Signature Conference on Computers, Software and Applications, pp. 298-304, IEEE, Torino, Italy, 2017.
@inproceedings{LEDUR:COMPSAC:17,
  title = {A High-Level DSL for Geospatial Visualizations with Multi-core Parallelism Support},
  author = {Cleverson Ledur and Dalvan Griebler and Isabel Manssour and Luiz Gustavo Fernandes},
  url = {https://doi.org/10.1109/COMPSAC.2017.18},
  doi = {10.1109/COMPSAC.2017.18},
  year = {2017},
  date = {2017-07-01},
  booktitle = {41st IEEE Computer Society Signature Conference on Computers, Software and Applications},
  pages = {298--304},
  publisher = {IEEE},
  address = {Torino, Italy},
  series = {COMPSAC'17},
  abstract = {The amount of data generated worldwide associated with geolocalization has exponentially increased over the last decade due to social networks, population demographics, and the popularization of Global Positioning Systems. Several methods for geovisualization have already been developed, but many of them are focused on a specific application or require learning a variety of tools and programming languages. It becomes even more difficult when users have to manage a large amount of data because state-of-the-art alternatives require the use of third-party pre-processing tools. We present a novel Domain-Specific Language (DSL), which focuses on large data geovisualizations. Through a compiler, we support automatic visualization generation and data pre-processing. The system takes advantage of multi-core parallelism to speed up data pre-processing abstractly. Our experiments were designed to highlight the programming effort and performance of our DSL. The results have shown a considerable programming effort reduction and efficient parallelism support with respect to the sequential version.},
  keywords = {},
  pubstate = {published},
  tppubtype = {inproceedings}
}
	Vogel, Adriano; Griebler, Dalvan; Schepke, Claudio; Fernandes, Luiz Gustavo. An Intra-Cloud Networking Performance Evaluation on CloudStack Environment. Inproceedings. In: 25th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP), pp. 5, IEEE, St. Petersburg, Russia, 2017.
@inproceedings{larcc:intra-cloud_networking_cloudstack:PDP:17,
  title = {An Intra-Cloud Networking Performance Evaluation on CloudStack Environment},
  author = {Adriano Vogel and Dalvan Griebler and Claudio Schepke and Luiz Gustavo Fernandes},
  url = {http://ieeexplore.ieee.org/document/7912689/},
  doi = {10.1109/PDP.2017.40},
  year = {2017},
  date = {2017-03-01},
  booktitle = {25th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP)},
  pages = {5},
  publisher = {IEEE},
  address = {St. Petersburg, Russia},
  series = {PDP'17},
  abstract = {Infrastructure-as-a-Service (IaaS) is a cloud on-demand commodity built on top of virtualization technologies and managed by IaaS tools. In this scenario, performance is a relevant matter because a set of aspects may impact and increase the system overhead. Specifically on the network, the use of virtualized capabilities may cause performance degradation (e.g., latency, throughput). The goal of this paper is to contribute to networking performance evaluation, providing new insights for private IaaS clouds. To achieve our goal, we deploy CloudStack environments and conduct experiments with different configurations and techniques. The research findings demonstrate that KVM-based cloud instances have small network performance degradation regarding throughput (about 0.2\% for coarse-grained and 6.8\% for fine-grained messages) while container-based instances have even better results. On the other hand, the KVM instances present worse latency (about 12.4\% on coarse-grained and two times more on fine-grained messages w.r.t. the native environment), while container-based instances perform better, with results close to the native environment. Furthermore, we demonstrate a performance optimization of applications running on KVM.},
  keywords = {},
  pubstate = {published},
  tppubtype = {inproceedings}
}
	Griebler, Dalvan; Danelutto, Marco; Torquati, Massimo; Fernandes, Luiz Gustavo. SPar: A DSL for High-Level and Productive Stream Parallelism. Journal Article. In: Parallel Processing Letters, vol. 27, no. 01, pp. 1740005, 2017.
@article{GRIEBLER:PPL:17,
  title = {SPar: A DSL for High-Level and Productive Stream Parallelism},
  author = {Dalvan Griebler and Marco Danelutto and Massimo Torquati and Luiz Gustavo Fernandes},
  url = {http://dx.doi.org/10.1142/S0129626417400059},
  doi = {10.1142/S0129626417400059},
  year = {2017},
  date = {2017-03-01},
  urldate = {2017-03-01},
  journal = {Parallel Processing Letters},
  volume = {27},
  number = {01},
  pages = {1740005},
  publisher = {World Scientific},
  abstract = {This paper introduces SPar, an internal C++ Domain-Specific Language (DSL) that supports the development of classic stream parallel applications. The DSL uses standard C++ attributes to introduce annotations tagging the notable components of stream parallel applications: stream sources and stream processing stages. A set of tools process SPar code (C++ annotated code using the SPar attributes) to generate FastFlow C++ code that exploits the stream parallelism denoted by SPar annotations while targeting shared memory multi-core architectures. We outline the main SPar features along with the main implementation techniques and tools. Also, we show the results of experiments assessing the feasibility of the entire approach as well as SPar’s performance and expressiveness.},
  keywords = {},
  pubstate = {published},
  tppubtype = {article}
}
2016
	Pieper, Ricardo; Griebler, Dalvan; Lovato, Adalberto. Towards a Software as a Service for Biodigestor Analytics. Journal Article. In: Revista Eletrônica Argentina-Brasil de Tecnologias da Informação e da Comunicação (REABTIC), vol. 1, no. 5, pp. 15, 2016.
@article{larcc:saas_analytics:REABTIC:16,
  title = {Towards a Software as a Service for Biodigestor Analytics},
  author = {Ricardo Pieper and Dalvan Griebler and Adalberto Lovato},
  url = {http://larcc.setrem.com.br/wp-content/uploads/2017/04/PIEPER_REABTIC_2016.pdf},
  doi = {10.5281/zenodo.345587},
  year = {2016},
  date = {2016-08-01},
  journal = {Revista Eletrônica Argentina-Brasil de Tecnologias da Informação e da Comunicação (REABTIC)},
  volume = {1},
  number = {5},
  pages = {15},
  publisher = {SETREM},
  address = {Três de Maio, Brazil},
  abstract = {The field of machine learning has become increasingly important in recent years. The ever-increasing amount of data and complexity of computational problems challenge the currently available technology. Meanwhile, anaerobic digesters represent a good alternative for renewable energy production in Brazil. However, performing efficient and accurate predictions/analytics while completely abstracting machine learning details from end-users might not be a simple task to achieve. Usually, such tools are made for a specific scenario and may not fit with particular and general needs. Our goal was to create a SaaS for biogas data analytics by using a neural network. Therefore, an open source, cloud-enabled SaaS (Software as a Service) was developed and deployed in LARCC (Laboratory of Advanced Researches on Cloud Computing) at SETREM. The results have shown the SaaS application is able to perform predictions. The neural network's accuracy is not significantly worse than a state-of-the-art implementation, and its training speed is faster. The user interface proved to be intuitive, and the predictions were accurate when providing the training algorithm with sufficient data. In addition, the file processing and network training time were good enough under traditional workload conditions.},
  keywords = {},
  pubstate = {published},
  tppubtype = {article}
}
	Griebler, Dalvan. Domain-Specific Language & Support Tool for High-Level Stream Parallelism. PhD Thesis. Faculdade de Informática - PPGCC - PUCRS, 2016.
@phdthesis{GRIEBLER:PHD:16,
  title = {Domain-Specific Language \& Support Tool for High-Level Stream Parallelism},
  author = {Dalvan Griebler},
  url = {http://tede2.pucrs.br/tede2/handle/tede/6776},
  year = {2016},
  date = {2016-06-01},
  address = {Porto Alegre, Brazil},
  school = {Faculdade de Informática - PPGCC - PUCRS},
  abstract = {Stream-based systems are representative of several application domains including video, audio, networking, graphic processing, etc. Stream programs may run on different kinds of parallel architectures (desktop, servers, cell phones, and supercomputers) and represent significant workloads on our current computing systems. Nevertheless, most of them are still not parallelized. Moreover, when new software has to be developed, programmers often face a trade-off between coding productivity, code portability, and performance. To solve this problem, we provide a new Domain-Specific Language (DSL) that naturally/on-the-fly captures and represents parallelism for stream-based applications. The aim is to offer a set of attributes (through annotations) that preserves the program's source code and is not architecture-dependent for annotating parallelism. We used the C++ attribute mechanism to design a ``\textit{de-facto}'' standard C++ embedded DSL named SPar. However, the implementation of DSLs using compiler-based tools is difficult, complicated, and usually requires a significant learning curve. This is even harder for those who are not familiar with compiler technology. Therefore, our motivation is to simplify this path for other researchers (experts in their domain) with support tools (our tool is CINCLE) to create high-level and productive DSLs through powerful and aggressive source-to-source transformations. In fact, parallel programmers can use their expertise without having to design and implement low-level code. The main goal of this thesis was to create a DSL and support tools for high-level stream parallelism in the context of a programming framework that is compiler-based and domain-oriented. Thus, we implemented SPar using CINCLE. SPar supports the software developer with productivity, performance, and code portability while CINCLE provides sufficient support to generate new DSLs. Also, SPar targets source-to-source transformation producing parallel pattern code built on top of FastFlow and MPI. Finally, we provide a full set of experiments showing that SPar provides better coding productivity without significant performance degradation in multi-core systems as well as transformation rules that are able to achieve code portability (for cluster architectures) through its generalized attributes.},
  keywords = {},
  pubstate = {published},
  tppubtype = {phdthesis}
}
	Vogel, Adriano; Griebler, Dalvan; Maron, Carlos A. F.; Schepke, Claudio; Fernandes, Luiz Gustavo. Private IaaS Clouds: A Comparative Analysis of OpenNebula, CloudStack and OpenStack. Inproceedings. In: 24th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP), pp. 672-679, IEEE, Heraklion, Crete, Greece, 2016.
@inproceedings{larcc:IaaS_private:PDP:16,
  title = {Private IaaS Clouds: A Comparative Analysis of OpenNebula, CloudStack and OpenStack},
  author = {Adriano Vogel and Dalvan Griebler and Carlos A. F. Maron and Claudio Schepke and Luiz Gustavo Fernandes},
  url = {http://ieeexplore.ieee.org/document/7445407/},
  doi = {10.1109/PDP.2016.75},
  year = {2016},
  date = {2016-02-01},
  booktitle = {24th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP)},
  pages = {672--679},
  publisher = {IEEE},
  address = {Heraklion, Crete, Greece},
  series = {PDP'16},
  abstract = {Despite the evolution of cloud computing in recent years, the performance and comprehensive understanding of the available private cloud tools are still under research. This paper contributes to an analysis of the Infrastructure as a Service (IaaS) domain by mapping new insights and discussing the challenges for improving cloud services. The goal is to make a comparative analysis of OpenNebula, OpenStack and CloudStack tools, evaluating their differences on support for flexibility and resiliency. Also, we aim at evaluating these three cloud tools when they are deployed using a mutual hypervisor (KVM) for discovering new empirical insights. Our research results demonstrated that OpenStack is the most resilient and CloudStack is the most flexible for deploying an IaaS private cloud. Moreover, the performance experiments indicated some contrasts among the private IaaS cloud instances when running intensive workloads and scientific applications.},
  keywords = {},
  pubstate = {published},
  tppubtype = {inproceedings}
}
2015
	Adornes, Daniel; Griebler, Dalvan; Ledur, Cleverson; Fernandes, Luiz G. Coding Productivity in MapReduce Applications for Distributed and Shared Memory Architectures. Journal Article. In: International Journal of Software Engineering and Knowledge Engineering, vol. 25, no. 10, pp. 1739-1741, 2015.
@article{ADORNES:IJSEKE:15,
  title = {Coding Productivity in MapReduce Applications for Distributed and Shared Memory Architectures},
  author = {Daniel Adornes and Dalvan Griebler and Cleverson Ledur and Luiz G. Fernandes},
  url = {http://dx.doi.org/10.1142/S0218194015710096},
  doi = {10.1142/S0218194015710096},
  year = {2015},
  date = {2015-12-01},
  urldate = {2015-12-01},
  journal = {International Journal of Software Engineering and Knowledge Engineering},
  volume = {25},
  number = {10},
  pages = {1739--1741},
  publisher = {World Scientific},
  abstract = {MapReduce was originally proposed as a suitable and efficient approach for analyzing and processing large amounts of data. Since then, many research efforts have contributed MapReduce implementations for distributed and shared memory architectures. Nevertheless, different architectural levels require different optimization strategies in order to achieve high-performance computing. Such strategies in turn have caused very different MapReduce programming interfaces among these efforts. This paper presents some research notes on coding productivity when developing MapReduce applications for distributed and shared memory architectures. As a case study, we introduce our current research on a unified MapReduce domain-specific language with code generation for Hadoop and Phoenix++, which has achieved a coding productivity increase from 41.84\% up to 94.71\% without significant performance losses (below 3\%) compared to those frameworks.},
  keywords = {},
  pubstate = {published},
  tppubtype = {article}
}
	Ledur, Cleverson; Griebler, Dalvan; Manssour, Isabel; Fernandes, Luiz G. Towards a Domain-Specific Language for Geospatial Data Visualization Maps with Big Data Sets Inproceedings In: ACS/IEEE International Conference on Computer Systems and Applications, pp. 8, IEEE, Marrakech, Morocco, 2015.
@inproceedings{LEDUR:AICCSA:15,
  title = {Towards a Domain-Specific Language for Geospatial Data Visualization Maps with Big Data Sets},
  author = {Cleverson Ledur and Dalvan Griebler and Isabel Manssour and Luiz G. Fernandes},
  url = {http://dx.doi.org/10.1109/AICCSA.2015.7507178},
  doi = {10.1109/AICCSA.2015.7507178},
  year = {2015},
  date = {2015-11-01},
  booktitle = {ACS/IEEE International Conference on Computer Systems and Applications},
  pages = {8},
  publisher = {IEEE},
  address = {Marrakech, Morocco},
  series = {AICCSA'15},
  abstract = {Data visualization is an alternative for representing information and helping people gain faster insights. However, the programming/creating of a visualization for large data sets is still a challenging task for users with low-level of software development knowledge. Our goal is to increase the productivity of experts who are familiar with the application domain. Therefore, we proposed an external Domain-Specific Language (DSL) that allows massive input of raw data and provides a small dictionary with suitable data visualization keywords. Also, we implemented it to support efficient data filtering operations and generate HTML or Javascript output code files (using Google Maps API). To measure the potential of our DSL, we evaluated four types of geospatial data visualization maps with four different technologies. The experiment results demonstrated a productivity gain when compared to the traditional way of implementing (e.g., Google Maps API, OpenLayers, and Leaflet), and efficient algorithm implementation.},
  keywords = {},
  pubstate = {published},
  tppubtype = {inproceedings}
}
	Griebler, Dalvan; Danelutto, Marco; Torquati, Massimo; Fernandes, Luiz G. An Embedded C++ Domain-Specific Language for Stream Parallelism Inproceedings In: Parallel Computing: On the Road to Exascale, Proceedings of the International Conference on Parallel Computing, pp. 317-326, IOS Press, Edinburgh, Scotland, UK, 2015.
@inproceedings{GRIEBLER:PARCO:15,
  title = {An Embedded C++ Domain-Specific Language for Stream Parallelism},
  author = {Dalvan Griebler and Marco Danelutto and Massimo Torquati and Luiz G. Fernandes},
  url = {http://dx.doi.org/10.3233/978-1-61499-621-7-317},
  doi = {10.3233/978-1-61499-621-7-317},
  year = {2015},
  date = {2015-09-01},
  booktitle = {Parallel Computing: On the Road to Exascale, Proceedings of the International Conference on Parallel Computing},
  pages = {317-326},
  publisher = {IOS Press},
  address = {Edinburgh, Scotland, UK},
  series = {ParCo'15},
  abstract = {This paper proposes a new C++ embedded Domain-Specific Language (DSL) for expressing stream parallelism by using standard C++11 attributes annotations. The main goal is to introduce high-level parallel abstractions for developing stream based parallel programs as well as reducing sequential source code rewriting. We demonstrated that by using a small set of attributes it is possible to produce different parallel versions depending on the way the source code is annotated. The performances of the parallel code produced are comparable with those obtained by manual parallelization.},
  keywords = {},
  pubstate = {published},
  tppubtype = {inproceedings}
}
	Adornes, Daniel; Griebler, Dalvan; Ledur, Cleverson; Fernandes, Luiz G. A Unified MapReduce Domain-Specific Language for Distributed and Shared Memory Architectures Inproceedings In: The 27th International Conference on Software Engineering & Knowledge Engineering, pp. 6, Knowledge Systems Institute Graduate School, Pittsburgh, USA, 2015.
@inproceedings{ADORNES:SEKE:15,
  title = {A Unified MapReduce Domain-Specific Language for Distributed and Shared Memory Architectures},
  author = {Daniel Adornes and Dalvan Griebler and Cleverson Ledur and Luiz G. Fernandes},
  url = {http://dx.doi.org/10.18293/SEKE2015-204},
  doi = {10.18293/SEKE2015-204},
  year = {2015},
  date = {2015-07-01},
  booktitle = {The 27th International Conference on Software Engineering \& Knowledge Engineering},
  pages = {6},
  publisher = {Knowledge Systems Institute Graduate School},
  address = {Pittsburgh, USA},
  abstract = {MapReduce is a suitable and efficient parallel programming pattern for processing big data analysis. In recent years, many frameworks/languages have implemented this pattern to achieve high performance in data mining applications, particularly for distributed memory architectures (e.g., clusters). Nevertheless, the industry of processors is now able to offer powerful processing on single machines (e.g., multi-core). Thus, these applications may address the parallelism in another architectural level. The target problems of this paper are code reuse and programming effort reduction since current solutions do not provide a single interface to deal with these two architectural levels. Therefore, we propose a unified domain-specific language in conjunction with transformation rules for code generation for Hadoop and Phoenix++. We selected these frameworks as state-of-the-art MapReduce implementations for distributed and shared memory architectures, respectively. Our solution achieves a programming effort reduction from 41.84% and up to 95.43% without significant performance losses (below the threshold of 3%) compared to Hadoop and Phoenix++.},
  keywords = {},
  pubstate = {published},
  tppubtype = {inproceedings}
}
2014
	Griebler, Dalvan; Adornes, Daniel; Fernandes, Luiz G. Performance and Usability Evaluation of a Pattern-Oriented Parallel Programming Interface for Multi-Core Architectures Inproceedings In: The 26th International Conference on Software Engineering & Knowledge Engineering, pp. 25-30, Knowledge Systems Institute Graduate School, Vancouver, Canada, 2014.
@inproceedings{GRIEBLER:SEKE:14,
  title = {Performance and Usability Evaluation of a Pattern-Oriented Parallel Programming Interface for Multi-Core Architectures},
  author = {Dalvan Griebler and Daniel Adornes and Luiz G. Fernandes},
  url = {https://gmap.pucrs.br/dalvan/papers/2014/CR_SEKE_2014.pdf},
  year = {2014},
  date = {2014-07-01},
  booktitle = {The 26th International Conference on Software Engineering \& Knowledge Engineering},
  pages = {25-30},
  publisher = {Knowledge Systems Institute Graduate School},
  address = {Vancouver, Canada},
  abstract = {Multi-core architectures have increased the power of parallelism by coupling many cores in a single chip. This becomes even more complex for developers to exploit the available parallelism in order to provide high performance scalable programs. To address these challenges, we propose the DSL-POPP (Domain-Specific Language for Pattern-Oriented Parallel Programming), which links the pattern-based approach in the programming interface as an alternative to reduce the effort of parallel software development, and achieve good performance in some applications. In this paper, the objective is to evaluate the usability and performance of the master/slave pattern and compare it to the Pthreads library. Moreover, experiments have shown that the master/slave interface of the DSL-POPP reduces up to 50% of the programming effort, without significantly affecting the performance.},
  keywords = {},
  pubstate = {published},
  tppubtype = {inproceedings}
}
2013
	Griebler, Dalvan; Fernandes, Luiz G. Towards a Domain-Specific Language for Patterns-Oriented Parallel Programming Inproceedings In: Programming Languages - 17th Brazilian Symposium - SBLP, pp. 105-119, Springer Berlin Heidelberg, Brasilia, Brazil, 2013.
@inproceedings{GRIEBLER:SBLP:13,
  title = {Towards a Domain-Specific Language for Patterns-Oriented Parallel Programming},
  author = {Dalvan Griebler and Luiz G. Fernandes},
  url = {http://dx.doi.org/10.1007/978-3-642-40922-6_8},
  doi = {10.1007/978-3-642-40922-6_8},
  year = {2013},
  date = {2013-10-01},
  booktitle = {Programming Languages - 17th Brazilian Symposium - SBLP},
  volume = {8129},
  pages = {105-119},
  publisher = {Springer Berlin Heidelberg},
  address = {Brasilia, Brazil},
  series = {Lecture Notes in Computer Science},
  abstract = {Pattern-oriented programming has been used in parallel code development for many years now. During this time, several tools (mainly frameworks and libraries) proposed the use of patterns based on programming primitives or templates. The implementation of patterns using those tools usually requires human expertise to correctly set up communication/synchronization among processes. In this work, we propose the use of a Domain Specific Language to create pattern-oriented parallel programs (DSL-POPP). This approach has the advantage of offering a higher programming abstraction level in which communication/synchronization among processes is hidden from programmers. We compensate the reduction in programming flexibility offering the possibility to use combined and/or nested parallel patterns (i.e., parallelism in levels), allowing the design of more complex parallel applications. We conclude this work presenting an experiment in which we develop a parallel application exploiting combined and nested parallel patterns in order to demonstrate the main properties of DSL-POPP.},
  keywords = {},
  pubstate = {published},
  tppubtype = {inproceedings}
}
