Monitoring system for distributed data processing and analysis in a heterogeneous computing environment for high energy physics applications (Tatiana Korchuganova, ISPRASOPEN-2019)
- Tatiana Korchuganova
BigPanDA monitoring is a web application that processes and presents, in various forms, the states of objects in the Production and Distributed Analysis (PanDA) system.
Analysing hundreds of millions of computation entities, such as events or jobs, BigPanDA monitoring builds reports at different scales and levels of abstraction in real time.
This information lets users drill down into the cause of a specific event failure or observe the broad picture, such as tracking the performance of the computation nucleus and satellites or the progress of a whole production campaign.
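The two views described above, aggregated reports and drill-down to a single failure, can be illustrated with a minimal Python sketch. It is not taken from BigPanDA: the record fields (`job_id`, `task_id`, `site`, `status`, `error`), the sample values, and the helper names `summarize` and `drill_down` are all hypothetical and stand in for whatever schema and queries the real system uses.

```python
from collections import Counter, defaultdict

# Hypothetical job records; field names and values are illustrative only,
# not the actual PanDA schema.
JOBS = [
    {"job_id": 1, "task_id": 10, "site": "SITE_A", "status": "finished", "error": None},
    {"job_id": 2, "task_id": 10, "site": "SITE_A", "status": "failed", "error": "stage-out timeout"},
    {"job_id": 3, "task_id": 10, "site": "SITE_B", "status": "finished", "error": None},
    {"job_id": 4, "task_id": 11, "site": "SITE_B", "status": "failed", "error": "payload crashed"},
    {"job_id": 5, "task_id": 11, "site": "SITE_B", "status": "running", "error": None},
]


def summarize(jobs, level):
    """Count job statuses grouped by the given field (e.g. 'site' or 'task_id')."""
    summary = defaultdict(Counter)
    for job in jobs:
        summary[job[level]][job["status"]] += 1
    return {key: dict(counts) for key, counts in summary.items()}


def drill_down(jobs, **filters):
    """Return (job_id, error) for each failed job matching all the filters."""
    return [
        (job["job_id"], job["error"])
        for job in jobs
        if job["status"] == "failed"
        and all(job[field] == value for field, value in filters.items())
    ]


# Broad picture: the same records summarized at two levels of abstraction.
by_site = summarize(JOBS, "site")
by_task = summarize(JOBS, "task_id")

# Drill-down: the individual failures behind one cell of the site summary.
failures_at_b = drill_down(JOBS, site="SITE_B")
```

The same records feed both views; only the grouping key changes. A production system handling hundreds of millions of entities would presumably push this aggregation into the database rather than do it in application memory.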
The PanDA system was originally developed for the ATLAS experiment. It currently manages the execution of more than 2 million jobs per day, distributed over 170 computing centers worldwide. BigPanDA is its core component, commissioned in the middle of 2014, and is now the primary source of information for ATLAS users about the state of their computations and a source of decision-support information for shifters, operators and managers.
In this work, we describe the evolution of the architecture, the current status, and the development plans for BigPanDA monitoring.