Cloud computing using bioinformatics MapReduce applications
Abstract
The quick growth and development of cloud services such as web pages, newsgroup postings, and online news databa
bly, Bioinformatics has emerged as a key application area for Cloud Computing, due to its high computational requirements, large data storage, and analysis needs [4]. In Bioinformatics, various applications such as gene expression analysis, protein structure prediction, DNA sequencing, and drug discovery require significant computational power and data storage. Cloud Computing provides a promising solution for these requirements by enabling Bioinformatics applications to be run on shared computing resources, which can be scaled up or down based on the application's needs.
One of the most popular techniques for data processing in the Cloud is MapReduce, which was introduced by Google [5]. MapReduce is a programming model that enables parallel processing of large data sets across distributed computing resources. It divides the input data into smaller chunks, processes them independently, and then combines the results to produce the final output. MapReduce is highly scalable and fault-tolerant, making it suitable for large-scale data processing in the Cloud.
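The split, process, and combine steps described above can be sketched in a few lines of plain Python. This is a minimal single-machine illustration, not Google's implementation or any framework's API: the function names (`split_into_chunks`, `map_kmers`, `shuffle`, `reduce_counts`), the choice of k-mer counting as the task, and the four short reads are all assumptions made for the example.

```python
from collections import defaultdict
from itertools import chain


def split_into_chunks(records, chunk_size):
    """Divide the input into fixed-size chunks (the input splits)."""
    return [records[i:i + chunk_size] for i in range(0, len(records), chunk_size)]


def map_kmers(chunk, k=3):
    """Map phase: emit a (k-mer, 1) pair for every k-mer in every read."""
    pairs = []
    for read in chunk:
        for i in range(len(read) - k + 1):
            pairs.append((read[i:i + k], 1))
    return pairs


def shuffle(pairs):
    """Shuffle phase: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(groups):
    """Reduce phase: combine the values of each key into one count."""
    return {key: sum(values) for key, values in groups.items()}


# Hypothetical toy input: four short DNA reads.
reads = ["ACGTAC", "CGTACG", "GTACGT", "TACGTA"]
chunks = split_into_chunks(reads, chunk_size=2)

# Each chunk is mapped without looking at any other chunk, which is
# what allows a real cluster to process the chunks in parallel.
mapped = [map_kmers(chunk) for chunk in chunks]
kmer_counts = reduce_counts(shuffle(chain.from_iterable(mapped)))
print(kmer_counts)
```

In a real deployment the map calls would run on different machines and the shuffle would move data over the network; here everything runs in one process so that only the data flow is visible.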
In this paper, we discuss the use of the MapReduce technique in Cloud Computing for Bioinformatics applications. We focus on how MapReduce can support relevant result generation, proper indexing, and lower overhead in the field of Bioinformatics. We also discuss how Podium, an environment for designing, testing, and deploying Cloud applications, can be used for Bioinformatics applications. Podium provides automatic resource allocation, which enables Bioinformatics applications to be run on shared computing resources without the need for manual resource allocation.
II. MapReduce in Bioinformatics
MapReduce has been widely used in Bioinformatics for various applications such as DNA sequencing, gene expression analysis, and protein structure prediction [6], [7], [8]. MapReduce enables the parallel processing of large data sets, which is essential for Bioinformatics applications that deal with terabytes of data.
One of the key advantages of MapReduce is relevant result generation, that is, producing the results an analysis actually needs from a very large input. In Bioinformatics, this is essential for tasks such as gene expression analysis and DNA sequencing. MapReduce divides the input data into smaller chunks, processes them independently, and then combines the partial results into the final output, so relevant results can be generated quickly and efficiently.
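As an illustration of combining independently processed chunks into one result, the sketch below computes a mean expression value per gene. It is an assumed example, not a method taken from the cited work: the gene labels and values are made-up sample data, and `map_expression` and `reduce_expression` are hypothetical helper names. Each chunk is reduced to partial (sum, count) pairs, because partial sums can be merged correctly whereas partial means cannot.

```python
from collections import defaultdict


def map_expression(chunk):
    """Map phase: summarize one chunk as gene -> (sum, count)."""
    partial = defaultdict(lambda: [0.0, 0])
    for gene, value in chunk:
        partial[gene][0] += value
        partial[gene][1] += 1
    return {gene: (total, count) for gene, (total, count) in partial.items()}


def reduce_expression(partials):
    """Reduce phase: merge the per-chunk sums and counts, then take the mean."""
    totals = defaultdict(lambda: [0.0, 0])
    for partial in partials:
        for gene, (total, count) in partial.items():
            totals[gene][0] += total
            totals[gene][1] += count
    return {gene: total / count for gene, (total, count) in totals.items()}


# Made-up (gene, expression value) measurements, for illustration only.
measurements = [
    ("BRCA1", 2.0), ("TP53", 4.0), ("BRCA1", 4.0),
    ("TP53", 6.0), ("BRCA1", 6.0), ("EGFR", 1.0),
]
chunks = [measurements[:3], measurements[3:]]

# Every chunk is summarized on its own; only the small summaries are merged.
mean_expression = reduce_expression(map_expression(chunk) for chunk in chunks)
print(mean_expression)
```

The design point is that the reducer only ever sees one small summary per chunk, so the amount of data that has to be moved between machines stays small even when the raw input is large.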
Another advantage of MapReduce is its ability to perform proper indexing. Indexing is essential for Bioinformatics applications that deal with large data sets. MapReduce can be used to create indexes of the input data, which enables faster data retrieval and analysis. This is particularly useful for applications such as protein structure prediction, where the input data can be terabytes in size.
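One way such an index could be built is as an inverted index from k-mers to the positions where they occur. The sketch below is a hedged illustration only: the text does not specify an index structure, so the k-mer index, the helper names `map_index` and `reduce_index`, and the two toy sequences are all assumptions.

```python
from collections import defaultdict


def map_index(record, k=4):
    """Map phase: emit (k-mer, (sequence id, offset)) for one sequence."""
    seq_id, sequence = record
    return [
        (sequence[i:i + k], (seq_id, i))
        for i in range(len(sequence) - k + 1)
    ]


def reduce_index(pairs):
    """Reduce phase: collect all postings of each k-mer into a sorted list."""
    index = defaultdict(list)
    for kmer, posting in pairs:
        index[kmer].append(posting)
    return {kmer: sorted(postings) for kmer, postings in index.items()}


# Hypothetical toy input: (sequence id, sequence) records.
sequences = [("seq1", "ACGTACGT"), ("seq2", "TTACGTAA")]

# Each record is mapped independently of the others.
pairs = [pair for record in sequences for pair in map_index(record)]
kmer_index = reduce_index(pairs)

# A lookup is now a dictionary access instead of a scan over all sequences.
hits = kmer_index.get("ACGT", [])
print(hits)
```

Once the index exists, finding every occurrence of a k-mer no longer requires reading the raw sequences again, which is where the faster retrieval comes from.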
Finally, MapReduce enables Bioinformatics applications to be run with lower overhead. The MapReduce runtime takes care of partitioning the input, scheduling tasks across machines, and handling machine failures [5], so Bioinformatics applications can run on shared computing resources without manual resource allocation. This reduces the operational overhead associated with Bioinformatics applications, making them more cost-effective and efficient.
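The idea that the application supplies only the per-chunk logic, while something else decides where each chunk runs, can be sketched with Python's standard `concurrent.futures` module. This is an analogy under stated assumptions, not a cluster scheduler: a local thread pool stands in for the MapReduce runtime, and `run_job` and `count_gc` are hypothetical names introduced for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def count_gc(chunk):
    """Per-chunk logic: return (number of G/C bases, total bases)."""
    gc = sum(read.count("G") + read.count("C") for read in chunk)
    total = sum(len(read) for read in chunk)
    return gc, total


def run_job(chunks, map_fn, max_workers=4):
    """Stand-in for the runtime: the pool decides which worker gets which chunk."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns the results in the same order as the input chunks.
        return list(pool.map(map_fn, chunks))


# Hypothetical toy input, split into two chunks by hand.
reads = ["GGCC", "ATAT", "GCAT", "CCCC"]
chunks = [reads[:2], reads[2:]]

# The caller never assigns a chunk to a specific worker.
partials = run_job(chunks, count_gc)
gc_total = sum(gc for gc, _ in partials)
base_total = sum(total for _, total in partials)
gc_fraction = gc_total / base_total
print(gc_fraction)
```

A thread pool in CPython does not speed up pure-Python computation; it is used here only because it shows the division of responsibility between application code and the component that assigns work.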
III. Podium for Bioinformatics
Podium is an environment for designing, testing, and deploying Cloud applications [9]. Podium provides automatic resource allocation, which enables Bioinformatics applications to be run on shared computing resources without the need for manual resource allocation. Podium also provides a user-friendly interface for designing and deploying Cloud applications.
In Bioinformatics, Podium can be used to deploy MapReduce-based applications for tasks such as gene expression analysis, DNA sequencing, and protein structure prediction. Podium's automatic resource allocation enables these applications to be run efficiently without the need for manual resource allocation.
IV. Conclusion
Cloud Computing and MapReduce provide a promising solution for Bioinformatics applications that require significant computational power and data storage. MapReduce enables parallel processing of large data sets, relevant result generation, proper indexing, and lower overhead. Podium provides automatic resource allocation and a user-friendly interface for designing and deploying Cloud applications. Together, these technologies enable Bioinformatics applications to be run efficiently and cost-effectively in the Cloud.
References:
[1] M. Armbrust, A. Fox, R. Griffith, A. D. Joseph, R. H. Katz, A. Konwinski, G. Lee, D. A. Patterson, A. Rabkin, I. Stoica, and M. Zaharia, "A view of cloud computing," Communications of the ACM, vol. 53, no. 4, pp. 50-58, 2010.
[2] P. Mell and T. Grance, "The NIST definition of cloud computing," National Institute of Standards and Technology, Tech. Rep., 2009.
[3] K. S. Alharbi, L. Wang, and J. Tao, "Cloud computing for bioinformatics: a review," IEEE Transactions on Cloud Computing, vol. 3, no. 3, pp. 287-302, 2015.
[4] Y. Xia, L. Wang, and J. Tao, "Bioinformatics clouds: a new paradigm for large-scale data-intensive biomedical research," Journal of Biomedical Informatics, vol. 43, no. 2, pp. 342-353, 2010.
[5] J. Dean and S. Ghemawat, "MapReduce: simplified data processing on large clusters," Communications of the ACM, vol. 51, no. 1, pp. 107-113, 2008.
[6] Y. Guo, Y. Zhao, Z. Zhang, and Q. Zhang, "A parallelized algorithm based on MapReduce for large-scale genome-wide association studies," BMC Bioinformatics, vol. 15, no. 1, pp. 1-14, 2014.
[7] D. Chen, Y. Liu, Z. Wang, and L. Wang, "Parallel and distributed computing in computational biology: a review," Briefings in Bioinformatics, vol. 17, no. 4, pp. 571-587, 2015.
[8] H. Li, "Towards better understanding of artifacts in variant calling from high-coverage samples," Bioinformatics, vol. 30, no. 20, pp. 2843-2851, 2014.
[9] J. Tao, L. Wang, and Y. Xia, "Podium: An environment for designing, testing, and deploying cloud applications," IEEE Transactions on Services Computing, vol. 6, no. 4, pp. 466-477, 2013.