Archiving Critical Research Data at The University of Texas at Austin

The Texas Advanced Computing Center (TACC) at The University of Texas at Austin provides powerful supercomputing resources that enable breakthrough research. TACC needed to replace an aging archive environment with a reliable, scalable solution that could help enforce strict usage policies.

From astrophysics and biomedicine to materials research and earth sciences, researchers in a broad range of disciplines capitalize on TACC’s resources to answer the world’s most complex questions. To preserve data and enable long-term access to that data, TACC offers researchers a large-scale archive environment, called Ranch.

TACC was using a hierarchical storage manager (HSM) solution for the archive environment, but it was reaching the end of its life. “The whole environment was beginning to atrophy,” says Frank Douma, senior systems administrator of large-scale systems at TACC. “We needed an environment that could deliver the data reliability and restorability promised in our service-level agreement.”

Douma also wanted to change the way people used the archive. “Researchers were using the environment more like nearline storage, where they might store their frequently accessed files and current data sets. But it’s not designed for that,” says Douma. “We wanted to implement policies that would make sure researchers would use the environment for long-term archiving rather than as another place to throw any kind of data.”

Building a New Archive with Quantum

After evaluating several possible archive solutions, TACC decided to move forward with a Quantum solution powered by the StorNext® platform. “We recognized that Quantum could deliver the reliability and restorability we needed over the long term,” says Douma.

The newly designed environment uses 30 PB of user-facing disk storage from another vendor, which is connected by InfiniBand to six Quantum Xcellis® workflow storage nodes. Four of those nodes serve as distributed data movers (DDMs), which help preserve system performance during archive operations. The nodes are in turn connected to a Quantum Scalar® i6000 tape library.

TACC uses a Quantum QXS™ hybrid disk environment with 4.7 PB of capacity to support the HSM. “By integrating the QXS solution, I can continue to support the legacy environment until everyone has migrated their data from it,” says Douma.

Enforcing User Quotas with StorNext®

StorNext gives Douma control over the allotment of storage to researchers. “The StorNext platform enables us to enforce policies that prevent researchers from using the environment like nearline storage,” says Douma. “We’ve put rock-solid quotas in place for directories—each researcher gets 2 TB and 100,000 files. If someone needs more than that, they can request a project space environment. With control over quotas, we can make sure we are optimizing utilization of the archive.”
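The per-researcher limits Douma describes (2 TB and 100,000 files per directory) can be illustrated with a short, generic check. This is only a sketch of the policy logic, not StorNext's quota mechanism: the names Quota, Usage, measure_usage, and violations are hypothetical, and treating 1 TB as 10^12 bytes is an assumption.

```python
import os
from dataclasses import dataclass

TB = 10**12  # assumption: decimal terabytes; the article does not say which unit is meant


@dataclass(frozen=True)
class Quota:
    """Per-directory limits quoted in the article (hypothetical container type)."""
    max_bytes: int = 2 * TB
    max_files: int = 100_000


@dataclass(frozen=True)
class Usage:
    """Measured consumption of one researcher directory."""
    total_bytes: int
    file_count: int


def measure_usage(directory: str) -> Usage:
    """Walk a directory tree and total the size and count of regular files."""
    total_bytes = 0
    file_count = 0
    for root, _dirs, files in os.walk(directory):
        for name in files:
            path = os.path.join(root, name)
            if os.path.islink(path):
                continue  # skip symlinks so targets are not counted twice
            total_bytes += os.path.getsize(path)
            file_count += 1
    return Usage(total_bytes, file_count)


def violations(usage: Usage, quota: Quota = Quota()) -> list[str]:
    """Return a human-readable message for each limit that is exceeded."""
    problems = []
    if usage.total_bytes > quota.max_bytes:
        problems.append(
            f"size {usage.total_bytes} bytes exceeds limit of {quota.max_bytes} bytes"
        )
    if usage.file_count > quota.max_files:
        problems.append(
            f"file count {usage.file_count} exceeds limit of {quota.max_files}"
        )
    return problems
```

In this sketch a directory sitting exactly at either limit is still within quota; only usage above a limit is reported. A researcher who trips either check would, per the policy above, request a project space instead.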

Accelerating Archiving Performance with Workflow Storage

“Performance wasn’t our highest priority, but the Quantum solution is delivering very strong throughput from disk to tape,” says Douma. “Quantum workflow storage with DDMs provides parallelization of I/O. We concurrently run multiple tape drives across multiple backplanes to get much higher performance than what we had in the past.”
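The parallel I/O that Douma describes can be pictured with a small, generic sketch in which archive jobs are spread across several data movers that run concurrently. This is an illustration of the general idea only, not how StorNext DDMs schedule work: assign_round_robin, write_batch, and archive are hypothetical names, the round-robin assignment is an assumption, and write_batch is a placeholder that performs no real tape I/O.

```python
from concurrent.futures import ThreadPoolExecutor

NUM_DATA_MOVERS = 4  # the article mentions four nodes serving as DDMs


def assign_round_robin(files, num_movers=NUM_DATA_MOVERS):
    """Split a list of files into one batch per data mover, in round-robin order."""
    batches = [[] for _ in range(num_movers)]
    for index, name in enumerate(files):
        batches[index % num_movers].append(name)
    return batches


def write_batch(mover_id, batch):
    """Placeholder for a disk-to-tape copy; it only records which mover got which file."""
    return [(mover_id, name) for name in batch]


def archive(files, num_movers=NUM_DATA_MOVERS):
    """Run one worker per data mover so the batches are processed concurrently."""
    batches = assign_round_robin(files, num_movers)
    with ThreadPoolExecutor(max_workers=num_movers) as pool:
        results = list(pool.map(write_batch, range(num_movers), batches))
    # Flatten the per-mover results into a single list of (mover_id, file) pairs.
    return [item for batch_result in results for item in batch_result]
```

With real I/O in place of the placeholder, total throughput in a design like this would scale with the number of movers and drives kept busy at once, which is the effect the quote attributes to running multiple tape drives concurrently.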

Planning to Scale for Growing Data Volumes

The new environment has a licensed capacity of 5 billion user files, and TACC plans to expand the Quantum environment in the future to handle even more research data. With that scalability, TACC can continue to support a growing number of researchers running more sophisticated workloads to solve increasingly complex problems.

Moving Beyond Fighting Fires

The Quantum environment enables Douma to concentrate on new initiatives and future planning rather than constantly worrying about keeping the lights on. “With Quantum, I don’t have to try to make things work,” says Douma. “Instead, I can focus on ways to better support cutting-edge research.”