What is iRODS?
iRODS is the integrated Rule-Oriented Data-management System, a community-driven, open source, data grid software solution.
What is iRODS used for?
Fundamentally, iRODS helps researchers, archivists and others manage (organize, share, protect, and preserve) large sets of computer files. Collections can range in size from moderate to a hundred million files or more totaling petabytes of data.
Managing large collections of data requires both a set of generic capabilities and diverse features that depend on the details of each application. iRODS has been designed around a core that provides a comprehensive set of these generic features; many applications require only a subset of them.
Beyond these generic capabilities, iRODS is also highly configurable and easily extensible for a very wide range of use cases through user-defined Micro-services, without having to modify core code.
How many files can iRODS manage?
iRODS can manage many tens to hundreds of millions of files. There are also situations where it can be worthwhile to use it with smaller collections of just a few thousand files.
iRODS has performed well in recent large-scale testing with ~50 million files and ~250 million annotations (metadata items). Other sites have even larger collections. Small instances on commodity PCs perform well too.
Who uses iRODS?
iRODS is used by many projects and teams, small and large, national and international, by computer technologists and non-technologists alike.
Major national projects using iRODS have included the National Archives and Records Administration (NARA) Transcontinental Persistent Archives Prototype (TPAP), the Ocean Observatories Initiative (OOI), the National Optical Astronomy Observatory (NOAO), the Southern California Earthquake Center (SCEC), the Chronopolis Digital Preservation Program, the iPlant Collaborative, and many others.
Major international projects using iRODS include the French national high-performance computing center CCIN2P3, the French National Library, the United Kingdom e-Science program, the European Union Sustaining Heritage Access through Multivalent ArchiviNg (SHAMAN) project, the University of Liverpool, the Australian Research Collaboration Service (ARCS), the Centre for e-Research at King’s College London (CeRch) and the High Energy Accelerator Research Organization in Japan (KEK).
At the University of North Carolina at Chapel Hill (UNC) projects and groups that have been using iRODS include the Renaissance Computing Institute (RENCI); the Carolina Digital Repository, with a preservation environment based on iRODS; the Triangle Universities Center for Advanced Studies Inc. (TUCASI) project that is creating a data grid for sharing classroom video between Duke, NCSU, and UNC/RENCI; the UNC School of Information and Library Science (SILS) Life-Long Learning Digital Library for students; UNC-CH Information Technology Services (ITS); the Odum Institute for Research in Social Science, and others.
Groups at UCSD that have been using iRODS/SRB include the Biomedical Informatics Research Network (BIRN), the National Center for Microscopy and Imaging Research (NCMIR), researchers in the Southern California Earthquake Center (SCEC), the San Diego Supercomputer Center (SDSC), UCSD Libraries Digital Asset Management System (DAMS), the Ocean Observatories Initiative (OOI), the Laboratory for Earth and Environmental Science, CineGrid, the Temporal Dynamics of Learning Center (TDLC), and more.
The Download Statistics page summarizes the 3.1 downloads and gives some sense of the interest in iRODS and its distribution but, of course, provides only a general overview. Since iRODS is fully open source, we do not know everyone who is using it. We also have a list of Science and Engineering Domains known to be using iRODS.
What does iRODS do?
iRODS includes a set of features that blend together well and augment each other to form a comprehensive whole. Major iRODS features include:
- High-performance network data transfer. iRODS transfers data across the network in an integrated manner (get/put, read/write; parallel threads for large files), efficiently using up to 70% of available bandwidth.
- A unified view of disparate data. iRODS uses unique logical names that are separate from the names as stored physically, providing a global ‘logical name-space’. The system (via the iCAT Metadata Catalog in a DBMS) keeps track of the names and locations of files so users don’t have to.
- Support for a wide range of physical storage. iRODS accesses files stored in various systems, including Unix and Windows file systems and archival storage systems (HPSS, tapes), and does so in the same manner for each (i.e., to users they all look the same).
- Easy backup and replication. iRODS provides easy, automated replication and backup to multiple storage devices/locations at the physical level. Users access the files via their logical names, and the system finds and retrieves the physical files.
- Metadata management (data about data). iRODS metadata is both system (automatic) and user-defined, and is stored in the iCAT Metadata Catalog running in a DBMS (PostgreSQL, Oracle or MySQL). Users can query the system to find, use, verify, etc. files with particular attributes (metadata).
- Controlled access. iRODS provides fine-grained controlled access, by user or group. Users are authenticated using an iRODS secure password mechanism or other standards including Grid Security Infrastructure (GSI), Kerberos, Shibboleth, etc.
- Policies, Rules and Micro-services. The iRODS Rule Engine applies local and community Policies, expressed as Rules and executed via server-side Micro-services. Rules can invoke other Rules and/or Micro-services, which makes the system highly configurable for site-specific needs and allows automated, cost-effective administration of today’s mushrooming data collections.
- Workflows. A workflow is a series of steps to be done to process data. These can be executed as part of normal operation (e.g. a Rule can be run as a file is initially stored to automatically make an offsite replica) or as delayed or periodic Rules.
- Management of large collections. Various features support this, including irsync (to check and synchronize between iRODS collections and local storage), audit trails (to record activity, verify authenticity, show compliance with human subjects access controls, etc.), metadata (to help organize and find data), bulk ingestion, etc.
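The logical name-space and replication features listed above can be illustrated with a small sketch. This is a toy model in plain Python, not the iRODS API: the `Catalog` and `Replica` classes, the resource names, and all paths are hypothetical, chosen only to show how a catalog can map one logical name to several physical copies and pick an available one.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    resource: str       # hypothetical storage resource name
    physical_path: str  # where the bytes live on that resource


@dataclass
class Catalog:
    """Toy stand-in for a metadata catalog: logical name -> physical replicas."""
    entries: dict = field(default_factory=dict)

    def register(self, logical_path, resource, physical_path):
        # Record one more physical copy under the same logical name.
        replicas = self.entries.setdefault(logical_path, [])
        replicas.append(Replica(resource, physical_path))

    def resolve(self, logical_path, offline=()):
        # Return the first replica whose resource is not marked offline.
        for replica in self.entries.get(logical_path, []):
            if replica.resource not in offline:
                return replica
        raise LookupError(f"no available replica for {logical_path}")


catalog = Catalog()
logical = "/demoZone/home/alice/survey.csv"  # hypothetical logical name
catalog.register(logical, "siteA_disk", "/vault/a/0001/survey.csv")
catalog.register(logical, "siteB_tape", "/archive/b/77/survey.csv")

primary = catalog.resolve(logical)
fallback = catalog.resolve(logical, offline={"siteA_disk"})
print(primary.resource, fallback.resource)
```

The user only ever refers to the logical name; which physical copy is used is decided by the lookup, which is the idea behind the "unified view" and "easy backup and replication" items.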
How did iRODS come about?
iRODS, the integrated Rule-Oriented Data-management System, has been developed since 2006 with support from the National Science Foundation, the National Archives and Records Administration, and other agencies. The Data Intensive Cyber Environments (DICE) team has been developing related software since 1997, first with the Storage Resource Broker (SRB) and now with its follow-on, iRODS. The approach has been user-driven, developing software that meets the needs of particular immediate projects, with features that can also be used by other projects.
Who is developing iRODS?
iRODS is developed and supported by the Data Intensive Cyber Environments (DICE) group of the University of North Carolina at Chapel Hill and the University of California San Diego. We are a team of about a dozen researchers and software engineers. The UCSD side, DICE-UCSD, is part of the Institute for Neural Computation organized research unit.
What support is available?
The DICE team and the iRODS community are happy to provide assistance in using iRODS through the irods-chat email list (see the Main Page), email, phone, etc. On larger projects, additional support and project-specific features can be added via collaborative arrangements. Your use of iRODS benefits the ongoing open source project by broadening the community’s experience and use cases of iRODS. User feedback and suggestions on features and documentation also help improve the software, benefiting the user community. Some user projects also contribute code to the iRODS open source development effort. We are also open to teaming with projects on proposals in which iRODS can provide added value as data-management/data-grid infrastructure.
How easily can iRODS be modified/configured?
iRODS is designed to provide a core of powerful generic data management capabilities combined with a highly configurable layer, built around a Rule Engine, for site-specific tailoring. The default Rule set is sufficient for most needs and can be modified and extended to meet the specific requirements of your particular application.
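As a rough illustration of this kind of rule-driven configuration, the sketch below models Rules that fire on an event and call Micro-services. It is written in plain Python and is not iRODS rule syntax: the `RuleEngine` class, the `post_put` event name, and both micro-service functions are hypothetical, introduced only to show the pattern of condition plus actions.

```python
from collections import defaultdict


# Micro-services: small server-side functions (names are hypothetical).
def compute_checksum(ctx):
    ctx["log"].append(f"checksum {ctx['path']}")


def replicate_offsite(ctx):
    ctx["log"].append(f"replicate {ctx['path']} to offsite")


class RuleEngine:
    """Toy rule engine: each rule is a (condition, actions) pair per event."""

    def __init__(self):
        self.rules = defaultdict(list)

    def add_rule(self, event, condition, actions):
        self.rules[event].append((condition, actions))

    def fire(self, event, ctx):
        # Run every rule registered for the event whose condition holds.
        for condition, actions in self.rules[event]:
            if condition(ctx):
                for action in actions:
                    action(ctx)
        return ctx


engine = RuleEngine()
# Site policy 1: checksum every newly stored file.
engine.add_rule("post_put", lambda ctx: True, [compute_checksum])
# Site policy 2: replicate only .dat files offsite.
engine.add_rule(
    "post_put", lambda ctx: ctx["path"].endswith(".dat"), [replicate_offsite]
)

result = engine.fire("post_put", {"path": "/demoZone/home/alice/run42.dat", "log": []})
other = engine.fire("post_put", {"path": "/demoZone/home/alice/notes.txt", "log": []})
print(result["log"])
print(other["log"])
```

Changing site policy here means adding or editing a rule, not touching the engine, which mirrors the claim that tailoring happens in the Rule set rather than in core code.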
iRODS can operate as a complete stand-alone system (utilizing storage systems, database systems, and networks underneath) and also as ‘middleware’ where higher-level and application-specific software makes use of iRODS as part of its infrastructure.
What types of computers does iRODS run on?
iRODS runs on Linux, Unix, and Macintosh hosts ranging from small commodity PCs to large-scale high performance clusters. Both the client and server components also run on Windows; however, the iCAT Metadata Catalog is not currently supported on Windows.
A simple example use case would be to set up two commodity PCs at geographically distributed locations and install iRODS on both as a single iRODS instance (two iRODS Servers, one of which includes the iCAT Metadata Catalog). You can then easily and automatically back up all the data on one PC to the other, providing a low-cost, reasonably secure and reliable storage system. The Ubuntu Linux OS, for example, works well for this.
What is Metadata?
Metadata is data that describes your data. In iRODS, there is both user-defined and system-defined (automatic) metadata about files, collections, users, and more, used to help you annotate, locate, and manage your collections. Metadata are maintained as rows in a Database Management System (DBMS). One reason iRODS runs well at larger sizes (scales well) is that it uses a DBMS to manage metadata, and DBMSs such as Oracle, PostgreSQL, and MySQL scale well. We call this iRODS DBMS the iRODS Metadata Catalog, or iCAT.
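To make "metadata as rows in a DBMS" concrete, here is a minimal sketch using Python's built-in `sqlite3` module. SQLite is used only so the example is self-contained; it is not one of the databases named above, and the table layout, column names, and sample values are assumptions for illustration, not the actual iCAT schema.

```python
import sqlite3

# In-memory database standing in for the catalog DBMS (illustration only).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE metadata ("
    "logical_path TEXT, attribute TEXT, value TEXT, unit TEXT)"
)

# Each annotation is one row; a file can carry any number of them.
rows = [
    ("/demoZone/home/alice/run41.dat", "instrument", "sonar", ""),
    ("/demoZone/home/alice/run41.dat", "depth", "120", "m"),
    ("/demoZone/home/alice/run42.dat", "instrument", "sonar", ""),
    ("/demoZone/home/alice/run42.dat", "depth", "950", "m"),
    ("/demoZone/home/alice/notes.txt", "author", "alice", ""),
]
conn.executemany("INSERT INTO metadata VALUES (?, ?, ?, ?)", rows)


def find_files(attribute, min_value):
    """Return logical paths whose numeric attribute is at least min_value."""
    cursor = conn.execute(
        "SELECT logical_path FROM metadata "
        "WHERE attribute = ? AND CAST(value AS REAL) >= ? "
        "ORDER BY logical_path",
        (attribute, min_value),
    )
    return [path for (path,) in cursor]


deep = find_files("depth", 500)
print(deep)
```

Because finding files by attribute is an ordinary database query, the lookup cost is handled by the DBMS and its indexes rather than by scanning the files themselves.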
How do I get started using iRODS?
iRODS is freely available under a BSD open source license via the Downloads page on the main iRODS web site irods.org. A full install can easily be done using the automated install script on Unix and Linux host computers in about half an hour. Installing and going through a self-paced online Tutorial with a small iRODS instance is a good way to start exploring its features.
Where do I get more information?
For more information, see the iRODS Main Page, which contains links to various introductions, publications, tutorials, and the email list (irods-chat), etc. You can also email the iRODS team at email@example.com.
For more general information and use cases, see the Data Intensive Cyberinfrastructure Foundation website at http://diceresearch.org.
There's a lot in iRODS for large-scale applications, and it is also effectively utilized in many simpler applications using a subset of the full capabilities.