As Charles Bowden astutely put it, “summertime is always the best of what might be,” and the Center for High Throughput
+Computing (CHTC) couldn’t agree more. Enter the Fellows Program: a new 12-week summer
+initiative where participants collaborate with mentors to each deliver a project that will contribute to high throughput
+computing in support of the nation’s scientific community.
+
+
+
+
Aimed at providing extraordinary opportunities for undergraduate and graduate students, this program offers a chance
+to collaboratively develop software for high throughput computing and cyberinfrastructure, operate complex service
+environments, and facilitate the utilization of large-scale computational services. Coupled with hands-on experience
+and training, the fellows will gain technical skills, as well as research and collaboration skills. It offers these
+students insight into how scientists employ research computing as a tool to advance their studies.
+
+
The summer program kicked off on June 3rd with 8 fellows, 10 mentors, CHTC leaders and the camaraderie of coffee and
+doughnuts. The team was inaugurated by program director Brian Bockelman’s
+welcoming address, shortly followed by mentor meetings and digging into the procedures, schedule reviews, HR policies,
+and breakout sessions for mentor/fellow onboarding.
+
+
+
+
Three days later, the fellows gave the CHTC team the first of their three presentations, detailing their
+projects for the upcoming 12 weeks.
+
+
In addition to the initial presentation during the first week of the program, the fellows will deliver two more talks:
+the first at High Throughput Computing 2024 (HTC),
+where they will give lightning talks about their projects and the challenges they are addressing, and a final
+presentation at the end of the program to share the results of their work and their learnings.
+
+
Out of a deep pool of over 80 applicants, only eight fellows were selected. Among them are Ben Staehle, Kristina Zhao,
+Neha Talluri, Patrick Brophy, Pratham Patel, Ryan Boone, Thinh Nguyen, and Wil Cram. You can read more about their
+projects here.
+
+
+
+
+
+
+
+
+
+
Fellows at their first presentation, introducing themselves and their projects.
+
+
+
+
Through mentorship and support, the CHTC Fellows program aims to develop the Fellows’ potential and contribute to
+research computing. Whether in research, creativity, or social impact, this fellowship strives to foster the next
+generation of budding engineers and scientists.
+ The American Museum of Natural History Ramps Up Education on Research Computing
+
+
With a multi-day workshop, the museum strives to expand the scale of its educational and training services by bringing additional computing capacity resources to
+New York-area researchers and tapping into the power of high throughput computing (HTC).
+
+
+
+
After “falling in love with the system” during the 2023 OSG School, American Museum of Natural History (AMNH)
+bioinformatics specialist Dean Bobo wondered if he could jump on an offer to bring New York institutions’ and
+researchers’ attention to the OSPool, a pool of computing capacity freely available to U.S.-affiliated institution
+researchers. Research Facilitation Lead Christina Koch mentioned the capacity of the National Science Foundation (NSF)-funded Partnership to Advance Throughput Computing (PATh)
+project to help institutions put on local trainings. So he reached out to Koch — and indeed the offer did stand!
+
+
The PATh project is committed to advancing the state of the art and adoption of high throughput computing (HTC). As part of this commitment, the project annually offers the OSG
+School at UW–Madison, which is open to participants who want to transform their research and scale out utilizing HTC. AMNH wanted to host a shortened version of the OSG School
+for their researchers with the help of the PATh team.
+
+
A Successful Workshop
+
+
Through Koch, Bobo connected with Research Computing Facilitator Rachel Lombardi who helped him plan the OSPool workshop on the second day of the museum’s multi-day workshop.
+“It was for our own museum community, but for other outside institutions as well,” Bobo says. So, Bobo arranged a computational skills training on November 3 and 6 at the AMNH in
+New York, New York. This was the first time the museum arranged a multi-day workshop with one day centered around OSPool resources.
+
+
The first day of the two-day training included a workshop teaching basic computational skills to an audience of students from the museum’s graduate program and graduate students,
+as well as researchers from various institutions around New York City. About 20 people chose to attend the second day, which involved training on OSPool resources. That day, Lombardi
+led a workshop likened to an OSG School crash course, with lectures covering the topics of software and container basics, principles of job submission, troubleshooting, learning about
+the jobs a user is running, and information for the next steps researchers could take.
+
+
+
+
The workshop garnered great success, which Bobo measured through the number of eyes it opened, including “folks who are completely new to HTC but also people who are more experienced
+with high performance computing on our local HPCs. They realized the utility and the capabilities of the OSPool and the resources therein. Some folks after the workshop said that
+they would give it a shot, which is great for me to hear. I feel like all this work was worth it because there are going to be attempts to get their software and pipelines lifted
+over to the OSPool.”
+
+
Empowering the HTC Community
+
+
The AMNH is looking to start hosting more OSPool events, bringing an event inspired by the OSG School locally to New York, and this workshop was the first step toward future OSPool
+workshops. From leading a section of the workshop, Lombardi learned “what resources [the AMNH] would need from PATh facilitators to run its own OSPool trainings.” The goal is to
+“empower them to do these things [conduct training] without necessarily waiting for the annual OSG School,” notes Lombardi. Bobo picked up a few valuable lessons too. He gained
+insights about community outreach and a better understanding of instructing on HTC and utilizing OSPool capacity.
+
+
In this sense, the workshops the AMNH hosted — with support from PATh — reflected the ideal of “training the trainers” to scale out the facilitation effort and share computing
+capacity. “It won’t be sustainable to come in person and support a training for everyone who asks, so we’re thinking about how to develop and publish easy-to-use training materials
+that people could use on their own, a formal process of (remote) coaching and support, and even a ‘train the trainers’ program where we could build community among people who want
+to run an OSPool training,” Koch explains.
+
+
A Continuing Partnership
+
+
Even before arranging the two-day workshop, the AMNH already had a strong partnership with PATh and the OSG Consortium, which provides distributed HTC
+services to the research community, Bobo says. The museum contributes its spare CPU power to the OSPool, and museum staff as well as PATh system administrators and facilitators
+communicate regularly. So far the museum has contributed over 15.5 million core hours to the OSPool.
+
+
One way the museum wants to utilize the OSPool capacity is for a genomic surveillance tool that surveys the population dynamics of diseases like COVID-19, RSV, influenza, or other
+emerging diseases. “We’ve been using this method of diversity called K Hill. We’re looking to port that software into the OSPool because it’s computationally expensive to do this
+every day, but that becomes feasible with the OSPool. We would like to make this tool a public resource, but we would have to work with the PATh facilitators to figure out if this
+is logistically possible. We want to make our tools ported to the OSPool so that you don’t need your own dedicated cluster to run an analysis,” Bobo explains.
+
+
Future Directions
+
+
When asked what’s in store for the future of this partnership, Bobo says he wants it to grow by putting on workshops that mirror the OSG School as a means of generating proximity and
+convenience for investigators in New York for whom the school may be out of reach. “We are so enthusiastic about building and continuing our relationship with the PATh project. I’m
+looking forward to developing a workshop that we run here at the museum. In our first year, getting help from the facilitators whom I’m familiar with would be really helpful, and
+this is something that I’m looking forward to doing subsequent to our first workshop to get there. There’s definitely more coming from our collaboration,” Bobo elaborates.
+
+
The PATh facilitators aim to give community members the resources they need to learn about the OSPool and control workload placement at the Access Points, Lombardi explains.
+Attending and arranging trainings at this workshop with the AMNH was one of the ways they upheld this goal. “I feel like we hit the nail on the head with this event set up in that
+we provided OSPool as a resource and they provided a lot of valuable input and feedback; it’s like a two-way street.”
+ Distributed Computing at the African School of Physics 2022 Workshop
+
+
Over 50 students chose to participate in a distributed computing workshop at the 7th biennial African School of Physics (ASP) 2022 at Nelson Mandela University in Gqeberha, South Africa.
+
+
+
+
+
+
Almost 200 students from 41 countries were selected to participate in the 7th ASP 2022 at Nelson Mandela University in Gqeberha, South Africa. With the school being shortened to two weeks, a parallel learning system was implemented, where participants could choose which lectures to attend to further their educational growth. Dr. Horst Severini is a Research Scientist and Adjunct Professor in High Energy Physics and Information Technology at the University of Oklahoma (OU) and a co-leader of the high-performance computing workshop. He anticipated maybe 25 students attending his track: “…we had about that many laptops,” he remarked, “and then we ended up with over 50 students!”
+
+
Severini was first introduced to distributed computing during his postdoc at OU. Then in the spring of 2012, Severini was introduced to Kétévi Assamagan, one of the founders of the ASP. Assamagan met with Severini and invited him and his colleagues to participate, leading to a scramble to create a curriculum for this new lecture series. They were eager to show students how distributed computing could help with their work.
+
+
After a few years of fine-tuning the high throughput classes, Severini has the workshop ironed out. After receiving an introduction to basic commands in Linux, the students started with a basic overview of high-energy physics, why computing is important to high-energy physics, and then some HTCondor basics. “The goal, really, is to teach students the basics of HTCondor, and then let them go off and see what they can do with it,” Severini explained. The workshop was so successful that students worked through coffee breaks and even stuck around at the end to obtain OSG accounts to continue their work.
+
+
A significant improvement for the 2022 high-performance computing workshop was the move from using OSG Connect for training sessions to Jupyter Notebooks. The switch to Jupyter Notebooks for training developed during the middle of 2022. “Jupyter allows people to ‘test drive’ submitting jobs on an HTCondor system without needing to create a full OSPool account,” OSG Research Computing Facilitator Christina Koch clarified. “Moving forward, we hope people can keep using the Jupyter Notebook interface once they get a full OSPool account so that they can move seamlessly from the training experience to all of the OSPool.”
+
+
+
+
“[Jupyter Notebooks] worked quite well,” Severini said, noting that the only issue was that a few people lost their home directories overnight. However, these “beginning glitches” didn’t slow participants down whatsoever. “People enjoyed [the workshop] and showed it by not wanting to leave during breaks; they just wanted to keep working!”
+
+
Severini’s main goal for the high-performance computing workshop is to migrate the material into Jupyter Notebooks. “I’ve always been most familiar with shell scripts, so I always do anything I can in there because I know it’s repeatable…but I’ll adapt, so we’ll work on that for the next one,” he explains.
+
+
Overall, “everything’s been working well, and the students enjoy it; we’ll keep adjusting and going with the times!”
+
+
…
+
+
More information about scheduling and materials from the 7th ASP 2022. The 8th ASP 2024 will take place in Morocco, Africa. Check this site for more information as it comes out.
+
+
For more information or questions about the switch to Jupyter Notebooks, please email chtc@cs.wisc.edu.
+ Antimatter: Using HTC to study very rare processes
+
+
+
+
The final speaker at the OSG User School Showcase was Anirvan Shukla, a graduate student at the University of Hawai’i Mānoa, and this wasn’t his first school event. In 2016, Anirvan attended as a participant, but today he assumed the role of presenter and had the opportunity to explain how high throughput computing (HTC) has transformed his research in the last five years.
+
+
Anirvan studies antimatter and the extremely rare processes that produce it. Hypothetical dark matter decays into different matter and antimatter particles, like protons, antiprotons, deuterons, and anti-deuterons. When these particles are detected, they suggest that there may be dark matter inside or outside our galaxy. However, these matter and antimatter particles are also produced by the regular collisions of cosmic rays with the particles that make up the interstellar medium.
+
+
Given their rarity, such events can only really be studied with simulations, where they’re still extremely rare. In order to determine whether antimatter particles can be attributed to the decay of dark matter –– or if they’re merely a product of regular cosmic interactions –– Anirvan would need to simulate trillions of collisions.
+
+
Leveraging what he learned at the OSG School, Anirvan knew he would only be able to tackle these computations using the capacity of the Open Science Pool (OSPool). Capturing the impact of the OSG’s computing resources, Anirvan attests, “this project definitely would not have been possible on any other cluster that I have access to.”
+
+
For instance, to observe antihelium particles, a researcher must simulate approximately 100 trillion events, in this case proton-proton collisions. One million such events typically require about one CPU hour of computation. Therefore, a researcher needs roughly 100 million CPU hours in order to see a few antihelium particles –– that’s equal to about 12,000 years on a single CPU. So, Anirvan divided his work into 10-hour jobs, each containing 10 million simulations. Within each job, the final output file was also analyzed, and all the relevant data was extracted and placed in a histogram. This reduces the total size of the output files, which are then transferred to the server at the University of Hawai’i by an automated workflow that Anirvan created with HTCondor’s DAGMan feature.
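
The arithmetic behind these figures can be retraced in a few lines. The following is a minimal sketch in Python, not Anirvan's actual tooling: the inputs are the rounded values quoted above, and the 365-day year is an assumption made here for the conversion.

```python
# Rounded figures quoted in the article.
EVENTS_NEEDED = 100 * 10**12      # ~100 trillion proton-proton collisions
EVENTS_PER_CPU_HOUR = 10**6       # ~1 million events per CPU hour
EVENTS_PER_JOB = 10 * 10**6       # 10 million simulations per job

# Assumption for this sketch: a 365-day year.
HOURS_PER_YEAR = 24 * 365

# Total CPU time needed to simulate every event.
cpu_hours = EVENTS_NEEDED // EVENTS_PER_CPU_HOUR

# Wall-clock time if a single CPU did all of the work.
years_single_cpu = cpu_hours / HOURS_PER_YEAR

# Size of each chunk and the number of chunks needed.
hours_per_job = EVENTS_PER_JOB // EVENTS_PER_CPU_HOUR
jobs_needed = EVENTS_NEEDED // EVENTS_PER_JOB

print(f"CPU hours needed:   {cpu_hours:,}")
print(f"Years on one CPU:   {years_single_cpu:,.0f}")
print(f"CPU hours per job:  {hours_per_job}")
print(f"Jobs needed:        {jobs_needed:,}")
```

With these rounded inputs, the single-CPU estimate comes out at a little over 11,400 years, the same order as the roughly 12,000 years quoted above, and the 10-hour chunking implies on the order of 10 million jobs.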
+
+
In his presentation at the OSG School, Anirvan noted that over the last two years, he submitted more than 8 million jobs to the OSPool and used nearly 50 million core hours. The results from his simulations generated spectra that had never been produced before, shown below.
+
+
+
+
If Anirvan had tried to run these simulations on his own laptop, he would still be searching for dark matter in the year 14,021. Even the available computing resources at CERN and the University of Hawai’i weren’t enough for this colossal project –– the OSPool was necessary.
+
+
…
+
+
This article is part of a series of articles from the 2021 OSG Virtual School Showcase. OSG School is an annual education event for researchers who want to learn how to use distributed high throughput computing methods and tools. The Showcase, which features researchers sharing how HTC has impacted their work, is a highlight of the school each year.
+ Centuries of newspapers are now easily searchable thanks to HTCSS
+
+
+
+
The Bibliothèque et Archives nationales du Québec (BAnQ) has been using the HTCondor Software Suite (HTCSS) to help digitize their vast collections of documents
+since 2013. Just this year, they built a powerful computing cluster out of staff workstations, using HTCSS’s cycle scavenging capabilities to tackle their largest
+computational endeavor yet.
+
+
Anything published in Québec –– books, magazines, newspapers, and more –– is all housed within BAnQ, an institution uniting Québec’s National Library, the
+province’s National Archives, and Montreal’s vast public library. “You can imagine the result is a colossal amount of materials,” attests Senior Computer
+Technician David Lamarche, “ranging from the discovery of the Americas and the very beginning of the colony, to whatever’s being written in newspapers this week.”
+Ultimately, these archives and collections reflect important historical moments, rich cultural heritage, and a tremendous amount of data.
+
+
To tackle this archival mountain, the digital collections team at BAnQ enlists the help of the HTCondor Software Suite (HTCSS)
+to transform images of pages into text, which can be analyzed in-house and made available to the public. This was the goal of their largest computational project yet ––
+completing text recognition on decades of articles from 114 archived newspapers in order to make them available for full-text search. This feat took them several years,
+but on July 12 of this year, the digital collections team finished text recognition on the very last newspaper.
+
+
Now, with full-text search available, users of the BAnQ Digital Archives and Collections have nearly 260 years of cultural and
+historical moments at their fingertips. Information that used to be buried in the ink of these newspapers, accessible only through time-consuming searches and
+tedious record-keeping, can now be unearthed with mere strokes of a keyboard. This saves users immense amounts of time and elevates the cultural value of the
+documents themselves.
+
+
The end result wouldn’t have happened quite as fast without the ability of HTCSS to automate the work across BAnQ’s staff workstations. File analyses, conversions,
+and text recognitions that typically took weeks or even months to complete are now completed in the same week, or perhaps even overnight.
+
+
“HTCondor has become nothing less than a central pillar of our team,” attests David Lamarche, the HTCondor administrator for the digital collections team.
+“We want to give credit to HTCondor for its role in this project’s success, as we would not have reached that milestone quite so quickly without it!”
+
+
But accelerating digitization was only half the battle. David reflects that the project’s main challenge “was not only to process this backlog of 114 newspapers,
+but to do so while minimizing the impact on our daily workflows for newly-digitized titles.” Continuing, he explains two HTCondor features that were vital to the
+project’s completion: “The first is HTCondor’s scalability, which allowed us to easily add more workstations to our resource pool. The second is HTCondor’s
+resource distribution mechanisms, which we were able to configure to control how many resources could be allocated to processing older titles.”
+
+
Over the course of the project, the team used HTCSS to process over 5 million files. Many of the newspapers span decades, and some centuries, with new issues
+published monthly, weekly, or even daily. For every issue, each page is manually scanned before the team uses HTCondor to analyze the file, convert it into a
+high-quality version, prepare it for text recognition, conduct text recognition, and finally convert the file into a smaller, lower-quality version that can be
+disseminated on a web platform. Throughout the workflow, the team integrated a variety of software tools into their jobs, which ran by cycle scavenging on 50
+workstations when they were not being used by in-office staff.
+
+
+
+
The La Patrie newspaper, which circulated as one of the main news sources in Québec from 1879 to 1978, was one of the larger publications that the team digitized.
+Accounts of the Great Depression, both world wars, and a plethora of other important historical events are buried in its –– now digital –– ink. La Patrie consists of
+more than 600,000 files, and text recognition on it would take an estimated 18 years on a single workstation. With HTCondor, this publication was successfully
+processed in merely 8 months.
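
As a rough check on these numbers, the sketch below (Python, using only figures quoted in this article) computes the implied speedup. Treating the 50-workstation pool as the one used for La Patrie is an assumption; the article does not say how many machines worked on this title at any given time.

```python
# Figures quoted in the article.
SINGLE_WORKSTATION_MONTHS = 18 * 12   # estimated 18 years on one workstation
POOL_MONTHS = 8                       # actual processing time with HTCondor

# Assumption for this sketch: the full pool of 50 workstations was available.
POOL_WORKSTATIONS = 50

# How many times faster the pool finished than one workstation would have.
speedup = SINGLE_WORKSTATION_MONTHS / POOL_MONTHS

# Share of the pool's theoretical capacity that this speedup corresponds to.
equivalent_fraction = speedup / POOL_WORKSTATIONS

print(f"Speedup over one workstation: {speedup:.0f}x")
print(f"Equivalent share of a 50-machine pool: {equivalent_fraction:.0%}")
```

A speedup well below the machine count would be consistent with cycle scavenging, where workstations contribute only while idle and capacity is shared with newly digitized titles.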
+
+
Digitization –– enabled by the HTCondor Software Suite –– offers a solution to the tradeoff between the preservation of these cultural documents and their
+accessibility, and even adds value back into the documents themselves by enabling full-text searches. In the future, BAnQ’s digitization team hopes to expand their
+use of HTCSS to text recognition on handwritten documents and perhaps even object recognition in photographs.
+ Construction Commences on CHTC's Future Home in New CDIS Building
+
+
Breaking ground is as symbolic as it is exciting – a metaphorical act of consecrating a new location and the start of something new. On April 25th, UW-Madison broke ground on 1240 W. Johnson St., Madison, WI, a location that will become the new building site for the School of Computer, Data & Information Sciences and the new home for the Center for High Throughput Computing (CHTC) in 2025.
+
+
“The new CDIS building is the latest crest in a wave of expansion and renewal enhancing the campus landscape to meet the needs of current and future Badgers,” the university reports. This building, expected to be nearly 350,000 square feet, will be the most sustainable facility on campus and will create a new center of activity for UW, enabling important connections and establishing a tech corridor from Physics and Chemistry to the Discovery Building to the College of Engineering.
+
+
CHTC Technical Lead Todd Tannenbaum wryly remarks that “while the 1960’s charm of our current old building is endearing at times (isn’t cinder block making a comeback?), I am inspired by the opportunity to work every day in a new and modern building. I am also especially excited by how this will open up new possibilities for collaboration across not only Comp Sci, but also the community of faculty and researchers in the Information School, Statistics, and Data Sciences.”
+
+
Read more about the extensive construction plans ahead, the budget, and how the project is being funded here. Launch a virtual tour of the building here.
+ CHTC Facilitation Innovations for Research Computing
+
+
After adding Research Computing Facilitators in 2013-2014, CHTC has expanded its reach to support researchers in all disciplines interested in using large-scale computing to support their research through the shared computing capacity offered by the CHTC.
+
+
+
+
As the core research computing center at the University of Wisconsin-Madison and the leading high throughput computing (HTC) force nationally, the Center for High Throughput Computing (CHTC) has always had one simple goal: to help researchers in all fields use HTC to advance their work.
+
+
Soon after its founding, CHTC learned that computing capacity alone was not enough; there needed to be more communication between researchers who used computing and the computer scientists who wanted to help them. To address this gap, the CHTC needed a new, two-way communication model that better understood and advocated for the needs of researchers and helped them understand how to apply computing to transform their research. In 2013, CHTC hired its first Research Computing Facilitator (RCF), Lauren Michael, to implement this new model and provide staff experience in domain research, research computing, and communication/teaching skills. Since then, the team has expanded to include additional facilitators, who today include Christina Koch, now leading the team, Rachel Lombardi, and an additional team member CHTC is actively hiring.
+
+
What is an RCF?
+
An RCF’s job is to understand a new user’s research goals and provide computing options that fit their needs. “As a Research Computing Facilitator, we want to facilitate the researcher’s use of computing,” explains Koch. “They can come to us with problems with their research, and we can advise them on different computing possibilities.”
+
+
Computing facilitators know how to work with researchers and understand research enough to guide the customizations researchers need. More importantly, RCFs are passionate about helping people and solving problems.
+
+
In the early days of CHTC, it was a relatively new idea to hire people with communication and problem-solving skills and apply those talents to computational research. Having facilitators with these skills bridge the gap between research computing organizations and researchers was what was unique to CHTC; in fact, the term “Research Computing Facilitator” was coined at UW-Madison.
+
+
RCF as a part of the CHTC model
+
Research computing facilitators have become an integral part of the CHTC and are a unique part of the model for this center. Koch elaborates that “…what’s unique at the CHTC is having a dedicated role – that we’re not just ‘user support’ responding to people’s questions, but we’re taking this more proactive, collaborative stance with researchers.” Research Computing Facilitators strengthen the CHTC and allow a more diverse range of computing dimensions to be supported. This support gives these researchers a competitive edge that others may not necessarily have.
+
+
The uniqueness of the RCF role allows for customized solutions for researchers and their projects. RCFs meet with every researcher who requests an account to use CHTC computing resources. These individual meetings allow RCFs to have strategic conversations to provide personal recommendations and discuss long-term goals.
+
+
Meetings between the facilitators and researchers also get researchers thinking about what they could do if they could do things faster, at a grander scale, and with less time and effort investment for each project. “We want to understand what their research project is, the goals of that project, and the limitations they’re concerned with to see if using CHTC resources could aid them,” Lombardi explains. “We’re always willing to push the boundaries of our services to try to accommodate to researchers’ needs.” The RCFs must know enough about the researchers’ work to talk to the researchers about the dimensions of their computing requirements in terms they understand.
+
+
Although RCFs are integral to CHTC’s model, that doesn’t mean the role comes without challenges. One hurdle is that they are facilitators, which means they’re ultimately not the ones to make choices for the researchers they support. They present solutions given each researcher’s unique circumstances, and it’s up to researchers to decide what to do. Koch explains that “it’s about finding the balance between helping them make those decisions while still having them do the actual work, even if it’s sometimes hard, because they understand that it will pay off in the long run.”
+
+
Supporting research computing across domains is also a significant CHTC facilitation accomplishment. Researchers used to need a programming background to apply computing to their analyses, which meant the physical sciences typically dominated large-scale computational analyses. Over the years, computing has become a lot more accessible. More researchers in the life sciences, social sciences, and humanities have access to community software tools they can apply to their research problems. “It’s not about a user’s level of technical skill or what kind of science they do,” Koch says. It’s about asking, “are you using computing, and do you need help expanding?” CHTC’s ability to pull in researchers across new disciplines has been rewarding and beneficial. “When new disciplines start using computing to tackle their problems, they can do some new, interesting research to contribute to their fields,” Koch notes.
+
+
Democratizing Access
+
CHTC’s success can inspire other campuses to rethink their research computing operations to support their researchers better and innovate. CHTC is recognized nationally and internationally as an expert in HTC and facilitation, and its approach has started to make its way into other campus computing centers.
+
+
CHTC efforts aim to bring broader access to HTC systems. “CHTC has enabled access to computing to a broad spectrum of researchers on campus,” Lombardi explains, “and we strive to help researchers and organizations implement throughput computing capacity.” CHTC is part of national and international efforts to bring that level of computing to other communities through partnerships with organizations, such as the Campus Cyberinfrastructure (CC*) NSF program.
+
+
The CC* program supports campuses across the country that wish to contribute computing capacity to the Open Science Pool (OSPool). These institutions are awarded a grant, and in turn, they agree to donate resources to the OSPool, a mutually beneficial system to democratize computing and make it more accessible to researchers who might not have access to such capacity otherwise.
+
+
The RCF team meets weekly with researchers from around the world (including Africa, Europe, and Asia). They hold OSG Office Hours twice a week for one-on-one support and provide training at least twice a month for new users and on special topics.
+
+
Other campuses that want to follow in CHTC’s footsteps can start by implementing facilitation first, even before they have any computing systems. In some cases, such as on smaller campuses, they might not even have or need to have a computing center. Having facilitators is crucial to providing researchers with individualized support for their projects.
+
+
The next step would be for campuses to look at how they currently support their researchers, including examining what they’re currently doing and if there’s anything they’d want to do differently to communicate this ethic of supporting researchers.
+
+
Apart from the impact that research computing facilitators have had on the research community, Koch notes what this job means to her: “[w]orking for a more mission-driven organization where I feel like I’m enabling other people’s research success is so motivating.” Now, almost ten years later, the CHTC has gone from having roughly one hundred research groups using the capacity it provides to having several hundred research groups and thousands of users per year. “Facilitation will continue to advise and support these projects to advance the big picture,” Lombardi notes, “we’ll always be available to researchers who want to talk to someone about how CHTC resources can advance their work!”
+ The CHTC Philosophy of High Throughput Computing – A Talk by Greg Thain
+
+
HTCondor Core Developer Greg Thain spoke to UW faculty and researchers about research computing and the missions and goals of the Center for High Throughput Computing (CHTC).
+
+
+
+
The Center for High Throughput Computing (CHTC) is proud to be home to a breadth of research on campus, with over 300 projects and 20 million core hours used by departments on the University of Wisconsin-Madison campus, ranging from the College of Agriculture and Life Sciences (CALS) to the School of Education, School of Pharmacy, and many more. “The CHTC is known best for being a place to run lots of fast jobs for free, to which we hope to continue democratizing computing across the campus,” Greg Thain began in his talks to UW-Madison researchers and staff on March 9 and 17, organized by UW-Madison Chief Technology Officer Todd Shechter.
+
+
“We like to think of the CHTC like the UW Hospital,” Thain explained, “like the hospital’s main purpose is to train the next generation of health professionals and conduct medical research. In the same way, the CHTC is our research laboratory and is where others can come and conduct their research; we do both research and provide a service.”
+
+
The main asset leveraged by the CHTC is research computing. “Research computing consists of research that happens to use computing and research about computing,” Thain explained, “both of which start and end with people.” Thain then described the two phases researchers go through when they approach the CHTC for help: “first, they seek assistance and guidance on a problem they’re currently facing. Second, they realize they can do something revolutionary with high throughput computing (HTC).”
+
+
One feature of research computing at the CHTC that is tailored to scientists and researchers is that they don’t have to spend time supervising their running programs. Users can configure an HTCondor Access Point to manage all their work, allowing them to essentially “submit it and forget it.” Like other compute systems, it aims to be understandable and reliable for any user, “except ours has the extra touch of being a ‘submit it and forget it’ system,” Thain clarified.
+
+
Similarly, the CHTC also created software for where the work runs, called an HTCondor Execution Point (EP). These Execution Points may be machines owned by other resource providers and may have different policies.
+
+
Both researchers and resource providers may have constraints; the goal of HTCondor, then, is to “manage and maintain these restraints; there are many users and researcher providers in the real world, and the CHTC is currently working on optimizing these individuals’ wants and needs.”
+
+
“This is a distributed problem,” Thain continued, “not because of the machines; it’s distributed because of the people.” Having distributed authority as opposed to distributed machines means that tools and policies are distributed.
+
+
The implicit assumption is that all work can be divided into smaller, mostly independent jobs. In this way, “the goal is to optimize the time to finish running these jobs instead of the time to run a single one; to do this, we want to break up the jobs as much as possible so they can run in parallel,” Thain explained. The implication is that there are many different kinds of jobs, and how difficult they are to break up varies.
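+
+
The trade-off Thain describes, finishing the whole batch sooner rather than any single job, can be illustrated with a small sketch. This is not how HTCondor schedules work; it is a simplified model that assumes identical slots, a greedy "next free slot takes the next job" rule, and a hypothetical workload of 1,000 items at 0.1 hours each. The function names and numbers are invented for illustration.

```python
import heapq


def split_into_jobs(items, items_per_job):
    """Split a list of work items into independent jobs of at most items_per_job items."""
    return [items[i:i + items_per_job] for i in range(0, len(items), items_per_job)]


def time_to_finish(job_durations, slots):
    """Wall-clock time to finish every job when each free slot takes the next job in line."""
    # Each heap entry is the time at which one slot becomes free again.
    free_at = [0.0] * slots
    heapq.heapify(free_at)
    for duration in job_durations:
        start = heapq.heappop(free_at)          # earliest available slot
        heapq.heappush(free_at, start + duration)
    return max(free_at)


# Hypothetical workload (assumption): 1,000 items, 0.1 hours each, 50 slots.
items = list(range(1000))
HOURS_PER_ITEM = 0.1
SLOTS = 50

one_big_job = split_into_jobs(items, 1000)
many_small_jobs = split_into_jobs(items, 10)

for jobs in (one_big_job, many_small_jobs):
    durations = [len(job) * HOURS_PER_ITEM for job in jobs]
    print(len(jobs), "job(s) finish after", time_to_finish(durations, SLOTS), "hours")
```

In this toy model the total amount of computation is identical in both cases; only the time until the last job finishes changes, which is the quantity a high throughput approach tries to minimize.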
+
+
+
+
To mitigate this, research computing facilitators (RCFs) work with users and researchers to overcome their specific problems. RCFs are different from a traditional “help desk”; their role is to interface with graduate students, PIs, and other researchers and guide them to find the best-fit solution for their projects. RCFs must have a broad understanding of the basic sciences to communicate with the researchers, understand their work, and give them useful and reasonable recommendations and other technological approaches.
+
+
“The CHTC’s top priority is always reliability, but with all this work going on, the dream for us is scalability,” Thain described. Ideally, adding more load would keep increasing performance; in reality, it boosts performance a little, and then performance plateaus. To compensate for this, the CHTC goes out of its way to make access points more reliable. “Adding access points helps to scale and allows submission near the user.” Thain notes the mantra: “submit locally, run globally.”
+
+
As the CHTC is our on-campus laboratory for experimenting with distributed computing, the Open Science Pool (OSPool) is a bolder experiment expanding these ideas onto a national scale of interconnected campuses.
+
+
+
+
The OSG and subsequent OSPool provide computing access on a national level in the same way that someone can access an available machine locally. For example, if the machines on campus are unavailable or all being used, users can access machines in the greater OSG Consortium. “But at the end of the day, all this computing, storage and networking research is in service to the needs of people who rely on high throughput computing to accomplish their research,” Thain maintains. “We hope the OSPool will be an accelerator for a broad swath of researchers in all kinds of disciplines, from all over the United States.”
+
+
…
+
+
The full slideshow can be found here. Please click here for more information about research computing within the CHTC, or visit this page to contact our RCFs with any questions.
+ Over 240,000 CHTC Jobs Hit Record Daily Capacity Consumption
+
+
Center for High Throughput Computing (CHTC) users continue to be hard at work smashing records with high throughput computational workloads. On October 20th, more than 240,000 jobs completed, reporting a total consumption of more than 710,000 core hours. This is roughly equivalent to the capacity of 30,000 cores running non-stop for 24 hours.
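+
+
The core-hour comparison above is simple arithmetic, and a short sketch makes the conversion explicit. The figures are the rounded lower bounds quoted in the article ("more than" 240,000 jobs and 710,000 core hours), so the results are approximations rather than exact accounting; the function name is invented for illustration.

```python
def equivalent_cores(core_hours, wall_hours=24):
    """Number of cores that, kept busy for wall_hours, would deliver core_hours."""
    return core_hours / wall_hours


# Rounded figures quoted in the article (both are stated as "more than").
CORE_HOURS = 710_000
JOBS = 240_000

cores = equivalent_cores(CORE_HOURS)
hours_per_job = CORE_HOURS / JOBS

print(f"About {cores:,.0f} cores busy for 24 hours")
print(f"About {hours_per_job:.2f} core hours per job on average")
```

The first figure lands just under 30,000, which is where the article's rounded comparison comes from.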
+
+
What is contributing to these records? One likely factor is UW’s investment in new hardware.
+UW-Madison’s research computing hardware recently underwent a substantial hardware refresh,
+adding 207 new servers representing over 40,000 “batch slots” of computing capacity.
+
+
However, additional capacity requires researchers who are ready and able to use it.
+The efforts of the CHTC facilitation team, led by Christina Koch, contributed to
+this readiness. Since September 1, CHTC’s Research Computing Facilitators have met
+with 70 new users for an introductory consultation, and there have been over 80
+visits to the twice-weekly drop-in office hours hosted by the facilitation team.
+Koch notes that “using large-scale computing can require skills and concepts that
+are new to most researchers - we are here to help bridge that gap.”
+
+
Finally, the hard work of the researchers themselves is another linchpin to these records.
+Over 80 users that span many fields of science contributed to this success, including
+these users with substantial usage:
+
+
+
IceCube Neutrino Observatory: an observatory operated by the University of Wisconsin-Madison, designed to observe the cosmos from deep within the South Pole ice.
+
ECE_miguel: In the Department of Electrical and Computer Engineering, Joshua San Miguel’s group explores new paradigms in computer architecture.
+
MSE_Szlufarska: Isabel Szlufarska’s lab focuses on computational materials science, mechanical behavior at the nanoscale using atomic scale modeling to understand and design new materials.
+
Genetics_Payseur: Genetics professor Bret Payseur’s lab uses genetics and genomics to understand mechanisms of evolution.
+
Pharmacy_Jiang: Pharmacy professor Jiaoyang Jiang’s interests span the gap between biology and chemistry by focusing on identifying the roles of protein post-translational modifications in regulating human physiological and pathological processes.
+
EngrPhys_Franck: Jennifer Franck’s group specializes in the development of new experimental techniques at the micro and nano scales with the goal of providing unprecedented full-field 3D access to real-time imaging and deformation measurements in complex soft matter and cellular systems.
+
BMI_Gitter: In Biostatistics and Computer Sciences, Anthony Gitter’s lab conducts computational biology research that brings together machine learning techniques and problems in biology.
+
DairyScience_Dorea: Joao Dorea’s Animal and Dairy Science group focuses on the development of high-throughput phenotyping technologies.
+
+
+
Any UW student or researcher who wants to apply high throughput computing resources
+to a given problem can harness the capacity of the CHTC Pool.
+ Expanding, uniting, and enhancing CLAS12 computing with OSG’s fabric of services
+
+
A mutually beneficial partnership between Jefferson Lab and the OSG Consortium at both the organizational and individual levels has delivered a prolific impact for the CLAS12 Experiment.
+
+
+
+
Twenty-five feet underground within the U.S. Department of Energy’s Thomas Jefferson National Accelerator Facility in Newport News, Virginia, electrons circulating at nearly the speed of light form a beam that’s as narrow as a single strand of human hair. Traveling around a racetrack-shaped accelerator five times in about 22 millionths of a second, electrons in this beam are directed into a target material, where they collide with protons and neutrons that reside inside the nuclei of the target atoms. These collisions produce an array of new particles, which ricochet out of the target material and into a unique detector that measures each particle’s momentum and speed to determine its mass and identity.
+
+
+
+
At first, these quantum interactions may seem incomprehensible in human dimensions, but these marvels of physics –– and the computational approaches
+required to study them –– have brought together people, groups, and institutions across nations and scientific disciplines. The racetrack-shaped
+accelerator at Jefferson Lab, officially known as the Continuous Electron Beam Accelerator Facility (CEBAF), attracts approximately 1,500 scientists
+from around the world, all visiting Jefferson Lab to conduct experiments. The one-of-a-kind detector known as the CEBAF Large Acceptance Spectrometer,
+or the CLAS detector, is the namesake of the CLAS Collaboration, a group of over 200 collaborators from more than 40 institutions that span a total of
+8 countries. To manage their ever-growing amounts of data, geographically-distributed collaboration, and complex workflows, the CLAS Collaboration
+partners with the OSG Consortium in expanding, uniting, and enhancing their experiment.
+
+
Researchers within this collaboration all strive to understand atomic structure, yet their individual topics of study
+are diverse, ranging from the multi-dimensional distribution of quarks and gluons inside a proton to the binding interactions within complex nuclei.
+In pursuit of this research, scientists in the Collaboration have used 42 million core hours through OSG services in the past year. This number is
+impressive in itself, yet the amount of communication and coordination required to achieve this level of computational throughput is far more
+extraordinary. These collaborative endeavors have a long history, dating all the way back to the inception of the OSG Consortium.
+
+
The foundations of a partnership
+
+
After ten years of construction, Jefferson Lab began operations in 1997. This marked the beginnings not only of the CLAS experiment, but also the
+collection of other physics experiments that call Jefferson Lab home. Soon after their launch, Jefferson Lab contributed as a founding institution for
+the OSG Consortium. They participated in the formation of OSG’s bylaws but didn’t leverage OSG’s services because it wasn’t an appropriate fit for their
+experiments at the time. In April of 2018, however, Jefferson Lab rejoined the OSG Consortium in full force to pursue opportunities for the GlueX
+experiment, and eventually also for the CLAS Collaboration’s new and upgraded experiment called CLAS12.
+
+
This resurgence on the organizational level all stems from the actions of individual people. Before Jefferson Lab rejoined the OSG Consortium,
+Richard Jones, a principal investigator (PI) at the University of Connecticut who is involved in the GlueX experiment, began exploring OSG’s services.
+Jones not only introduced the benefits of OSG to GlueX, but also to Jefferson Lab more broadly. After OSG’s workflow and infrastructure proved to be
+scalable for GlueX, members of the CLAS Collaboration became interested in OSG’s fabric of services too. Frank Würthwein, OSG Executive Director,
+interprets this process as a “flow of engagement that followed the social structures that the relevant parties were embedded in. Basically, it’s a campus
+word-of-mouth.”
+
+
+
+
This partnership was cemented when Würthwein visited Jefferson Lab to discuss opportunities for both the GlueX and CLAS12 experiments. The resulting
+partnership that exists today has proven to be notably symbiotic. In fact, Würthwein professes that the partnership with Jefferson Lab has been absolutely
+central to OSG’s mission: “Jefferson Lab and the CLAS Collaboration have helped us multiply our message, improve our tools, and ultimately advance open
+science itself. They have played an important role in making us a better organization.” Likewise, the CLAS Collaboration has been able to expand their
+computing capacity, unite their computing resources, and enhance their science as a result of working with OSG.
+
+
Expanding computing resources
+
+
On a fundamental level, OSG’s fabric of services provides the CLAS Collaboration with additional computing power through the Open Science Pool (OSPool) ––
+an asset that was vital after transitioning to a new, upgraded version of the experiment in 2018. Compared to the original experiment, the electrons
+blasting into the target material in the new experiment carry twice the energy –– 12 billion electron volts to be exact. This new experiment, coined
+‘CLAS12’ to signify this energy increase, also engendered a tenfold increase in computing demand. While Jefferson Lab’s in-house computing resources are
+extensive, the sheer amount of data produced in the CLAS12 experiment is substantial. Today, the experiment generates about 1 petabyte of data each year.
+
+
To put this number into perspective, 1 petabyte is equivalent to twenty million four-drawer filing cabinets completely filled with text, or 13.3 years of
+HD-TV video. That’s a lot of data to manage.
+
+
+
+
Nathan Baltzell, a Jefferson Lab Staff Scientist who organizes software efforts for CLAS12, describes how staff at Jefferson Lab responded to this data
+dilemma: “When this newer era of experiments started four years ago, projections were that we would absorb all our local computing resources crunching
+the real, experimental data. It was critical to be able to run simulations somewhere else.”
+
+
That somewhere else became the capacity offered by the OSG. Each job submitted by CLAS12 researchers contains about 10,000 different Monte Carlo
+simulations and runs for roughly 4-6 hours on a single core. Once submitted to an OSG Access Point, CLAS12 jobs run on either opportunistic or dedicated
+resources. Opportunistic resources, or resources contributed to the common good of all open science via the OSPool, have provided the CLAS12 experiment
+with roughly 33 million core hours in the past year. On the other hand, dedicated resources –– those exclusively reserved for the CLAS12 experiment ––
+supply the Collaboration with about 9 million core hours annually. These dedicated resources have undoubtedly played a role in expanding computing
+capacity, but they also have proven instrumental in uniting computing resources of the CLAS Collaboration.
+
+
Uniting computing resources
+
+
Beyond expanding the computing resources available to the CLAS12 experiment, OSG services have also played a role in uniting the CLAS Collaboration’s
+existing computing resources scattered around the globe. Hundreds of collaborators belonging to many different institutions in a collection of countries
+translates to more total computing resources at the Collaboration’s disposal. However, accessing this swath of distributed resources, installing the
+necessary software, and ensuring everything runs smoothly proved to be a logistical headache that worsened as the CLAS Collaboration’s software evolved
+and became more sophisticated.
+
+
+
+
Thankfully, OSG’s services could serve as a unified pool that would unite the CLAS Collaboration’s computing resources and bypass the logistical
+bottlenecks. Raffaella De Vita, Software Coordinator and former Chair of the CLAS Collaboration, comments on the value of this approach: “The idea of
+using OSG services to basically collect resources that our institutions could provide and make them in a unified pool that could be used more efficiently,
+became very appealing to us.”
+
+
Today, 6 CLAS Collaborators with their own computing centers have joined the OSPool to provide dedicated resources to the experiment in a more efficient
+manner. These institutions include Massachusetts Institute of Technology, Glasgow University, Grille au service de la Recherche en Ile de France (GRIF),
+Lamar University, Compute Canada, and Istituto Nazionale di Fisica Nucleare (INFN). De Vita, a Staff Scientist at INFN, was personally involved in
+coordinating the addition of INFN’s computing resources to the OSPool. She considers the process to be quite successful from her perspective: “People at
+OSG took care of creating the connection and working with our computing center staff, and I basically just had to send some emails.” Zooming out on
+impacts to the CLAS Collaboration more broadly, De Vita adds, “it’s been an excellent way to get members of the collaboration to contribute not only with
+manpower, but also with computing resources.”
+
+
Enhancing science and improving workflows
+
+
Finally, collaboration among OSG and Jefferson Lab staff has resulted in improved workflows, streamlined submissions, and enhanced science. The HTCondor Software Suite
+(HTCSS), which was developed at UW-Madison and is used to automate and manage workloads, coordinates the submission of CLAS12 jobs. Containers, which
+function naturally on the OSPool, are used to create custom software environments for CLAS12 jobs.
+
+
+
+
When asked about workflows and job submissions, Maurizio Ungaro, a Jefferson Lab Staff Scientist who helps
+coordinate CLAS12’s Monte Carlo simulations, expresses: “This is actually where OSG services are really useful. Containers allow us to encapsulate the
+software that we run, and HTCondor coordinates the submission of our jobs. Because of this, we’re able to solve two problems: one being CPU usage, and
+the other being simulation organization.”
+
+
Before they began using OSG Access Points, CLAS Collaborators used to write their own submission scripts, a challenging task that involved many moving
+parts and was prone to errors. Now, through coordination with OSG staff, Ungaro and his team have been able to package the array of tools in a user-friendly
+web portal. Describing the impacts of this new interface, Ungaro explains: “Now, collaborators are able to submit jobs using the web portal, even from
+their phone! They can choose from several experiment configuration options, click the submit button, and within a few hours the results will be here at
+Jefferson Lab on their user disk space.” In essence, this web portal streamlines the process of job submission, all so that CLAS Collaborators can grow
+and improve their physics.
+
+
A legacy of multi-scale collaboration
+
+
The partnership between Jefferson Lab and the OSG Consortium is a story of many dimensions. Projects of this scale are rarely a seamless production system
+in which all components are automated. They require hard work and close coordination, at both the organizational and individual levels.
+
+
On the individual scale, consistent, day-to-day interactions accumulate to instill a lasting impact. OSG staff participate in Jefferson Lab’s weekly
+meetings, engage in one-on-one calls, and organize meetings to resolve issues and support the CLAS12 experiment. Reflecting on the culmination of these
+interactions, Ungaro would characterize his experience as “nothing short of incredible.” He adds: “I can see not just their technical expertise, but also
+how they’re really willing to help, happy to contribute, and grateful to help our science.”
+
+
+
+
Pascal Paschos, the OSG Area Coordinator for Collaboration support who works closely with the CLAS12 Collaboration,
+sees the experience as an opportunity for growth: “OSG doesn’t merely provide a service to these individual labs; it’s also an opportunity for us to grow
+as an organization by identifying what we have done well in our partnership with Jefferson Lab to enable such a prolific production from one of their
+experiments.”
+
+
Ultimately, the CLAS experiment as it exists today is a product of cross-coordination between Collaboration members, executive teams, and technical staff
+on both sides of the partnership, all working together to make something happen. As Paschos phrases it: “At the end of the day, you’re looking at
+partnerships –– not between institutional entities –– but between people.”
+ Solving for the future: Investment, new coalition levels up research computing infrastructure at UW–Madison
+
+
Original article posted by Corissa Runde on September 21, 2022, on UW-Madison’s Department of Information Technology website.
+
+
+
+
UW-Madison’s research computing hardware recently underwent a substantial refresh, adding 207 new servers representing over 40,000 “batch slots” of computing capacity. This refresh, together with a planned annual campus-level commitment to sustain it, will allow UW researchers to push the limits of their research on an all-new, sustained shared infrastructure. The funding was made possible by a $4.3 million investment from the Wisconsin Alumni Research Foundation (WARF), which will relieve individual PIs of some of the worry of having to stand up their own facilities. Now, researchers will have a sustainable computational infrastructure to harness more computing capacity and produce computationally heavy research results more efficiently.
+
+
The research computing investments, equipment upgrades, and services to support researchers were made possible by the growing collaboration between:
+ High-throughput computing as an enabler of black hole science
+
+
+
+
On June 25, 2021, Arizona astrophysicist Feryal Ozel posted an item on Twitter that must have fired up scientific imaginations. She noted that the Open Science Pool (OSPool) just set a single-day record of capacity delivered — churning through more than 1.1 million core hours. Her team’s project was leading the surge.
+
+
Almost a year later, the secret is out. The Event Horizon Telescope (EHT) Project, a collaboration of more than 300 astronomers around the world, announced on May 12 it had produced an image of a supermassive black hole at the center of the Milky Way, only the second image of its kind in history.
+
+
EHT made that initial history in 2019 when it shared a dramatic image of a black hole at the center of the M87 galaxy, 55 million light-years from Earth, thereby taking black holes from a theoretical concept to an observable phenomenon.
+
+
+
+
For this newest image, EHT harnessed the power of the OSPool that is operated by the OSG Consortium to help with the computational challenge behind this work. This required the execution of more than 5 million computational tasks that consumed more than 20 million core hours. Most of the computations took place over a 3-month period in 2021.
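+
+
To give a feel for the scale of these figures, the following back-of-the-envelope sketch derives two averages from them. It rests on loud assumptions: the quoted totals are rounded lower bounds, the "3-month period" is taken as 90 days, and all of the work is treated as if it ran inside that window, although the article says only that most of it did. The constant names are invented for illustration.

```python
# Rounded figures quoted in the article (both stated as "more than").
TASKS = 5_000_000
CORE_HOURS = 20_000_000

# Assumption: the "3-month period" is treated as 90 days of continuous operation.
ASSUMED_DAYS = 90

core_hours_per_task = CORE_HOURS / TASKS
average_busy_cores = CORE_HOURS / (ASSUMED_DAYS * 24)

print(f"Average task size: {core_hours_per_task:.1f} core hours")
print(f"Average cores kept busy: about {average_busy_cores:,.0f}")
```

Under these assumptions the workload resembles millions of short, few-hour tasks spread across thousands of cores, rather than a handful of long-running computations.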
+
+
+
+
The OSG fabric of services has become the computational backbone for science pursuits of all sizes – from single investigators to international collaborations like EHT. Based on the high-throughput computing (HTC) principles pioneered by UW-Madison computer scientist and Morgridge Institute for Research investigator Miron Livny, the OSG services address the need of research projects to manage workloads that consist of ever-growing ensembles of computational tasks. Researchers can place these workloads at OSG Access Points and harness the capacity of the OSPool that is provided by contributions of more than 50 institutions across the country.
+
+
Over the decades, large international collaborations have been leveraging the OSG services to chase cosmic neutrinos at the South Pole, identify gravitational waves generated billions of miles away in space, and discover the last puzzle piece of particle physics, the Higgs boson.
+
+
Chi-Kwan “CK” Chan, a University of Arizona astronomer who coordinates the EHT simulation work, says the project uses data from 8 telescopes around the world. He says that since getting plugged into the OSG services in 2020, it has become a “critical resource” in producing the millions of simulations that help validate physical properties not directly “seen” by these telescopes — like temperature, density and plasma parameters.
+
+
“And once we pull together these many computed images across many parameters, we’re able to compare our simulations with our observations and develop a truer picture of the actual physics of a black hole,” Chan says.
+
+
“Simulation is especially important in astronomy, because our astrophysical system is so complicated,” he adds. “Using the OSG services allows us to discard hundreds of thousands of parameters and find the configurations that work the best.”
+
+
Chan adds that the OSG Consortium also provides the storage the EHT simulation work needs, which allows data to exist in one place and makes it easier to manage. The bottom line is that OSG greatly improves the effectiveness of the EHT simulation work. Chan estimates that the partnership enabled the EHT scientists to accomplish in three months what might take three years with conventional methods.
+
+
“It improved our science an order of magnitude,” Chan adds. “There are so many more parameters of space that we can explore.”
+
+
The EHT collaboration was triggered through contacts at the National Science Foundation (NSF) Office for Advanced Cyberinfrastructure (OAC). “Following our commitment to leverage NSF investments in cyberinfrastructure, we reached out to CK and it turned out to be a perfect match,” Livny says.
+
+
NSF has been a vital supporter of the OSG Consortium since its origin in 2005, and this is a perfect example of a collaboration between two NSF-funded activities, Livny says. In 2020, NSF launched the $22.5 million Partnership to Advance Throughput Computing (PATh), with a significant presence at the UW-Madison Computer Sciences Department and the Morgridge Institute for Research. That partnership is helping to expand the adoption of HTC and advance the HTC technologies that power the OSG services.
+
+
Livny, who serves as principal investigator of PATh, says the EHT computational workload is the equivalent of having several million individual tasks on your to-do list. The HTC principles that underpin the OSG services provide effective means to manage such a long, and sometimes interdependent, to-do list. “Otherwise, it’s like trying to fill up a swimming pool one teaspoon at a time,” he says.
+
+
Chan and his team of researchers at Arizona, Illinois, and Harvard worked closely with the OSG team of research facilitators to optimize the impact of OSG services on their high throughput workloads. Led by UW-Madison facilitator Lauren Michael, the team provided the EHT group with the necessary storage, advised their workload automation policies, and helped them with moving results back to the Arizona campus.
+
+
Livny emphasizes that the OSG services are founded on the principles of sharing and mutual trust. Any U.S. researcher can bring their computational workload to an OSG Access Point and any U.S. institution can contribute computing capacity to the OSPool.
+
+
“I like to say that you don’t have to be a super person to do super high-throughput computing,” says Livny.
+
+
…
+
+
This article is courtesy of the Morgridge Institute for Research. Find the original article on the Morgridge Institute’s news page.
+
+
To read more about this discovery you can find other articles covering this event below:
+ Retirements and New Beginnings: The Transition to Tokens
+
+
May 1, 2022 officially marked the retirement of OSG 3.5, GridFTP, and GSI dependencies. OSG 3.6, up and running since February of 2021, has taken its place, relying on WebDAV and bearer tokens.
+
+
In December of 2019, OSG announced its plan to transition towards bearer tokens and WebDAV-based file transfer, which would ultimately culminate in the retirement of OSG 3.5. Nearly two and a half years later, after significant development and work with collaborators on the transition, OSG marked the end of support for OSG 3.5.
+
+
OSG celebrated the successful and long-planned OSG 3.5 retirement and transition to OSG 3.6, the first version of the OSG Software Stack without any Globus dependencies. Instead, it relies on WebDAV (an extension to HTTP/S allowing for distributed authoring and versioning of files) and bearer tokens.
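+
+
For readers unfamiliar with the combination, the sketch below shows what a token-authorized WebDAV upload looks like at the HTTP level: an ordinary `PUT` request carrying an `Authorization: Bearer …` header. It uses only Python's standard library and builds the request without sending it. The endpoint URL, token value, and function name are hypothetical placeholders, not actual OSG endpoints or tooling.

```python
import urllib.request


def build_webdav_upload(url, token, payload):
    """Build (but do not send) an HTTP PUT request authorized with a bearer token."""
    return urllib.request.Request(
        url,
        data=payload,
        method="PUT",  # WebDAV uploads a file with a plain HTTP PUT
        headers={"Authorization": f"Bearer {token}"},
    )


# Hypothetical endpoint and token, for illustration only.
request = build_webdav_upload(
    "https://storage.example.org/webdav/results/output.dat",
    token="EXAMPLE-TOKEN",
    payload=b"simulation output",
)

print(request.get_method(), request.full_url)
print("Authorization:", request.get_header("Authorization"))
```

The point of the design is that the token itself carries the authorization, so the client does not need to present an X.509 certificate or proxy to the server.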
+
+
Jeff Dost, OSG Coordinator of Operations, reports that the transition “was a big success!” Ultimately, OSG made the May 1st deadline without having to backtrack and put out new fires. Dost notes, however, that “the transition was one of the most difficult ones I can remember in the ten plus years of working with OSG, due to all the coordination needed.”
+
+
Looking back, for nearly fifteen years, communications in OSG were secured with X.509 certificates and proxies via Globus Security Infrastructure (GSI) as an Authentication and Authorization Infrastructure (AAI).
+
+
Then, in June of 2017, Globus announced the end of support for its open-source Toolkit that the OSG depended on. In October, the Grid Community Forum (GCF) was established to continue supporting the Toolkit so that research could continue uninterrupted.
+
+
While the OSG continued contributing to the Grid Community Toolkit (GCT), the long-term goal was to move the research community away from these approaches, replacing X.509 proxy authentication with token-based pilot job authentication.
+
+
A more detailed description of the OSG-LHC GridFTP and GSI migration plans can be found in this document. Please visit the GridFTP and GSI Migration FAQ page if you have any questions. For more information and news about OSG 3.6, please visit the OSG 3.6 News release documentation page.
+
+
…
+
+
If you have any questions about the retirement of OSG 3.5 or the implementation of OSG 3.6, please contact help@osg-htc.org.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
diff --git a/preview-calendar/European-HTCondor-Week.html b/preview-calendar/European-HTCondor-Week.html
new file mode 100644
index 000000000..463a0890b
--- /dev/null
+++ b/preview-calendar/European-HTCondor-Week.html
@@ -0,0 +1,361 @@
+
+
+
+
+
+
+Save The Date for the European HTCondor Workshop, September 24-27
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ Save The Date for the European HTCondor Workshop, September 24-27
+
+
This year’s European HTCondor Workshop will be held from September 24 to 27, hosted by NIKHEF-Amsterdam, the Dutch
+National Institute for Subatomic Physics, in the beautiful Dutch capital city of Amsterdam.
+
+
The workshop will be an excellent occasion for learning about HTCondor from the source (the developers!), exchanging
+experiences and plans with your colleagues, and providing your feedback to the experts. The HTCondor Compute Entrypoint
+(CE) will be covered as well. Participation is open to all organizations (including companies) and persons interested
+in HTCondor (and is by no means restricted to particle physics and/or academia!). If you know potentially interested persons,
+don’t hesitate to make them aware of this opportunity.
+
+
The workshop will cover both using and administering HTCondor; topics will be chosen to best match participants’ interests.
+We would very much like to know about your use of HTCondor in your project, your experience, and your plans. You are warmly
+encouraged to propose a short presentation.
+
+
There will also be time and space for short, maybe spontaneous, interactive participation (“show us your toolbox” sessions),
+which proved to be very popular in previous meetings.
+
+
Registration and abstract submission will be opened in due course.
+
+
To ease travel, the workshop will begin Tuesday morning and end around Friday lunchtime.
+ Using high throughput computing to investigate the role of neural oscillations in visual working memory
+
+
Jacqueline M. Fulvio, lab manager and research scientist for the Postle Lab at the University of Wisconsin-Madison, explains how she used the HTCondor Software Suite to investigate neural oscillations in visual working memory.
+
+
+
+
+
+
If you could use a method of analysis that results in better insights into your research, you’d want to use that option. The catch? It can take months to analyze one set of data.
+
+
Jacqueline M. Fulvio, a research scientist for the Postle Lab at the University of Wisconsin-Madison, explained at HTCondor Week 2022 how she overcame this problem using high throughput computing (HTC) in her analysis of neural oscillations’ role in visual working memory.
+
+
The Postle Lab analyzed the patterns of brain waves recorded from participants as they performed working memory tasks using an HTC workflow. Visual working memory is the brain process that temporarily allows us to maintain and manipulate visual information to solve a task. First, participants were given a sample with two images to memorize for two seconds. Then the image disappeared, and, following a five-second delay, participants were given a cue that indicated which item in memory would later be tested. The experimenter then delivered a single pulse of transcranial magnetic stimulation (TMS) to the participants’ scalp on half of the trials. TMS alters brain function, so Fulvio and her collaborators looked for corresponding impacts on participants’ brain waves recorded in an electroencephalogram (EEG). Finally, the participants indicated whether the image shown on the screen matched the original sample item.
+
+
+
+
After collecting and processing the data from the EEG, they can analyze the neural oscillations (or brain waves) to understand how they change throughout the task. Previous results have shown that the frequency of neural oscillations is associated with working memory processes.
+
+
“In our current work, we wanted to more deeply investigate the role of these neural oscillations in working memory,” Fulvio states, “we chose to leverage an analysis called spatially distributed phase coupling extraction with a frequency-specific phases model (SPACE-FSP).” This analysis is a multi-way decomposition of the EEG data.
+
+
The number of decomposable networks can’t be determined analytically, so the group estimates it using decomposition. Finding the optimal decomposition is an iterative process that starts with a statistical criterion and a set number of oscillating networks, which incrementally increase until they can no longer achieve the criterion. As a result, a single decomposition can take up to several months to complete.
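
The stopping logic of such an iterative search can be sketched roughly as follows. This is only an illustration under stated assumptions: `find_network_count`, `toy_fit`, and `toy_criterion` are hypothetical names with made-up numbers, not part of SPACE-FSP or of the Postle Lab's code, and in the real analysis each fit is the step that can take days.

```python
from typing import Callable, Optional


def find_network_count(
    fit: Callable[[int], float],
    meets_criterion: Callable[[float], bool],
    start: int = 1,
    max_networks: int = 100,
) -> Optional[int]:
    """Search upward from `start` and return the largest network count whose
    fit still satisfies the criterion, or None if even `start` fails."""
    best = None
    for n in range(start, max_networks + 1):
        score = fit(n)  # stand-in for one expensive decomposition
        if not meets_criterion(score):
            break  # the criterion can no longer be achieved: stop growing
        best = n
    return best


# Toy stand-ins (hypothetical): the score drops as more networks are requested.
def toy_fit(n: int) -> float:
    return 1.0 / n


def toy_criterion(score: float) -> bool:
    return score >= 0.19


result = find_network_count(toy_fit, toy_criterion)
```

Because every data set needs its own search of this kind, the searches are independent of one another, which is what makes the workload a natural fit for running many jobs side by side.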
+
+
Although this method provides better insight into what Fulvio and her group want to analyze, “this remains a largely unused approach in the field.” Fulvio speculates that other scientists in the field often don’t use this kind of analysis because it’s very computationally demanding. “This is where high throughput [computing] came in for us.”
+
+
Fulvio and her team planned to analyze at least 186 data sets, which, at the time, “seemed insurmountable.” The HTC capabilities of HTCondor offered them a solution to this problem by running the decompositions in parallel using the capacity of a campus-wide shared facility. They were also able to use MATLAB’s parallel pool, which helped scale out the processing.
+
+
The group started following the HTC paradigm because their lab had already used services provided by the UW-Madison Center for High Throughput Computing (CHTC) for some time. Fulvio’s supervisor, Dr. Bradley Postle, suggested setting up a meeting and seeing if what they needed could be achieved using the capacity offered by CHTC.
+
+
Fulvio has an extensive coding history, but when she did run into compiling problems, she found the office hours offered by the CHTC Research Computing Facilitators extremely helpful, “I got useful tips from the staff in figuring out what was going wrong and what I needed to fix!”
+
+
The group ran 42 jobs, each job taking anywhere from two days to two weeks to run. The initial results of the analyses were promising, but the two data analysis pipelines the group tried were insufficient to address some of the critical questions.
+
+
After re-running the analyses using new data Fulvio collected, she overcame some limitations from the prior dataset to address the original questions. For this dataset, the group ran almost twice the number of jobs – 72 – with each one again taking anywhere from two days to two weeks to run.
+
+
The group updated the analysis once more to increase the data size from 500-millisecond to 1-second chunks. They also combined the data into a single pipeline instead of splitting it into different chunks for two separate analyses.
+
+
The goal of this update was to increase the amount of data they were sending, which in turn increased the amount of time it took to do these decompositions. More data resulted in a more robust and interpretable statistical result.
+
+
“All versions of the analyses were ultimately successful,” Fulvio comments. “We’ve benefited significantly from this process.” Their final analysis obtained 1,690 components – a “fantastic number” for their data analyses.
+
+
“We had such good support along the way so we could get this going,” Fulvio notes. In addition, what could have been years of computing on their lab machines was condensed into mere months for each analysis iteration.
+
+
The group also conducted one more analysis, as “[this] experience helped us think about a special control analysis,” Fulvio remarks. The group carried out hundreds of jobs within a day using this separate analysis, giving them rapid confirmation through the control analysis results.
+
+
Fulvio reflects, “from our research group’s broad perspective, OSPool capacity accessible via the CHTC have significantly expanded our computational capabilities.” Although computationally demanding, these resources helped the group apply this better-suited analysis method to address their original questions.
+
+
From a more personal perspective, Fulvio notes that learning how to take advantage of this OSPool capacity has improved her skills, including coding. These resources allowed her to work with additional languages and sharpened her ability to optimize code.
+
+
Fulvio concludes that “this has allowed us to help advance our field’s understanding, address key questions in the grant funding the research, and it provides the opportunity to reconsider other established findings and fill gaps in understanding of those studies.”
+
+
…
+
+
Watch a video recording of Jacqueline M. Fulvio’s talk at HTCondor Week 2022, and browse her slides.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
diff --git a/preview-calendar/GLUE-lab.html b/preview-calendar/GLUE-lab.html
new file mode 100644
index 000000000..7f9b91a6f
--- /dev/null
+++ b/preview-calendar/GLUE-lab.html
@@ -0,0 +1,407 @@
+
+
+
+
+
+
+How the GLUE Lab is bringing the potential of HTC to track the movement of cattle and land use change
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
The GLUE Lab studies how land across the world is being used for agriculture and the systems responsible for land use change. Christie — who researches land use in Brazil with a
+focus on how the Amazon and Cerrado biomes are changing as natural vegetation recedes — takes data describing the cattle supply chain in Brazil and integrates it into a single
+database the GLUE Lab can use for research. With this data, the lab also aims to inform policy decisions by the Brazilian government and international companies.
+
+
In the Amazon, Christie says, one of the main systems causing land use change is in the cattle sector, or the production of cattle. “One of the motivating facts of our research
+is that 80% of forest cleared in the Amazon is cleared in order to raise cattle. And so we’re interested in understanding the cattle supply chain, how it operates, and what it
+looks like.” The lab gets its data from the Rural Environmental Registry (CAR), a public registry of property boundaries in Brazil, and the Guide to Animal Transport
+(GTA), which records animal movement and sales in Brazil.
+
+
The possibilities of utilizing high throughput computing (HTC) for the lab’s research intrigued Christie, who had some awareness of HTC from the research bazaar and had even started refactoring some of the lab’s
+data pipeline, but he wanted to learn more than he had gained from watching introductory tutorials. Christie was accepted to and attended the OSG School
+in the summer of 2023. He and other lab members believed their work, which involves big data sets and a large number of jobs, could benefit from the School’s training with HTCondor,
+the workload management application developed by the CHTC for HTC.
+
+
Upon realizing the lab’s work could greatly benefit from the OSG School, Christie used a “test case” project that resembled a standard research project to model a task with many
+independent trials, finding how, for the first time, HTC could prove useful for GLUE Lab research. The specific project Christie worked
+on during the School using HTC was to compute simulated journeys of cows through properties in Brazil’s cattle supply chain. By the end of the week-long School, Christie says
+using HTC scaled up the modeling project by a factor of 10. In this sense, HTC is the “grease that makes our research run more smoothly.”
+
+
Since attending the School, witnessing the test case’s success with HTC, and discovering ways its other research projects could benefit, the GLUE Lab has begun shifting to applying
+HTC. However, this process requires pipeline changes lab members are currently working through. “We have been in the process of working through some of our big projects that we
+think really could benefit from these resources, but that in itself has a cost. Currently, we’re still in the process of writing or refactoring our pipelines to use HTC,” Christie
+elaborates.
+
+
For a current project, Christie mentions he and other GLUE Lab members are looking at how to adapt their code to HTC without having to rewrite all of it. With the parallelism that
+HTC offers compared to the single computing environment the lab used before to run its data pipeline, each job now has its own environment. But it’s complex “leveraging the
+parallelism in our database build pipeline. Working on that is an exercise, but with handling data, there are many dependencies, and you have to figure out how to model them.”
+Christie says lab members are working on adjusting the workflow to ensure each job has the data it needs before it can run. While this can sometimes be straightforward,
+“sometimes a step in the pipeline has special inputs that are unique to it. With many steps in the pipeline, properly tracking and preparing all this data has been the main source
+of work to get the pipeline to run fully using HTC.”
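
One way to model such dependencies is as a graph in which each step lists the steps whose outputs it needs. The sketch below uses Python's standard `graphlib` module (Python 3.9 or later) to group steps into waves that could run side by side. The step names are hypothetical and are not taken from the GLUE Lab's actual pipeline.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline steps, each mapped to the steps it depends on.
dependencies = {
    "clean_car": set(),
    "clean_gta": set(),
    "link_properties": {"clean_car", "clean_gta"},
    "build_database": {"link_properties"},
}

sorter = TopologicalSorter(dependencies)
sorter.prepare()  # also raises CycleError if the steps depend on each other in a loop

waves = []
while sorter.is_active():
    ready = sorted(sorter.get_ready())  # steps whose inputs are all available
    waves.append(ready)
    sorter.done(*ready)  # mark them finished so dependent steps become ready
```

Each inner list in `waves` holds steps with no unmet inputs, so those steps could be submitted as independent jobs at the same time.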
+
+
For now, Christie says cutting down the two-day run time of their database build pipeline to just a matter of hours with HTC “would be a wonderful improvement that would accelerate
+deployment and testing of this database. It would let us introduce new features and catch bugs faster.”
+
+
+
+
Christie recognizes that the strength of the CHTC comes not only from its computing power but also from the people running it behind the screen and from the fact that it’s free for
+researchers at UW–Madison, distinguishing it from other platforms and drastically lowering the entry barrier for researchers who want to scale up their research projects —
+“Instead of waiting months or years to receive funding for cloud resources, they can request an account and get
+started in a matter of weeks,” Christie says.
+
+
Christie values the unique opportunity to attend office hours and meet with facilitators, which makes his experience special. “I would definitely recommend that people look at this
+invaluable resource that we have on campus. Whether your work is with high throughput or high performance computing, there are offerings for both that researchers should consider,”
+Christie says.
+ Using HTC and HPC Applications to Track the Dispersal of Spruce Budworm Moths
+
+
Matthew Garcia, a Postdoctoral Research Associate in the Department of Forest & Wildlife Ecology at the University of Wisconsin–Madison, discusses how he used the HTCondor Software Suite to combine HTC and HPC capacity to perform simulations that modeled the dispersal of budworm moths.
+
+
+
+
Spruce budworms are small, caterpillar-like insects that enjoy munching on balsam fir and spruce trees. What the budworms lack in size, they make up for in total forest devastation; within five to six years, the budworm kills the tree entirely. An example of this can be seen in the image above from eastern Canada, with the brown trees being “pretty much dead.”
+
+
Matthew Garcia, a Postdoctoral Research Associate in the Department of Forest & Wildlife Ecology at the University of Wisconsin–Madison, examined the flight behavior of these budworm moths. He aims to determine where the budworms disperse so that they can be stopped from causing these mass tree deaths. His research combines high throughput computing (HTC) and high-performance computing (HPC) applications.
+
+
+
+
Garcia’s project takes a closer look at the biological process of the species. He’s looking at the dispersal of adult spruce budworm moths in the summertime, as this is a process least understood by researchers in the field.
+
+
Working with collaborators at the U.S. and Canadian Forest Services, Garcia’s study of budworm dispersal tracks the budworm’s movement from where they grew up defoliating the fir and spruce trees to where they mate and drop their eggs. This biological process is driven mainly by weather and lasts about a year, though the adult phase is the period Garcia has focused on for his work thus far.
+
+
In January 2022, Garcia published “Modeling weather-driven long-distance dispersal of spruce budworm moths. Part 1: Model Description.” This individual-based model of moth behavior was developed in Python and is heavily dependent on weather model outputs. Garcia is currently working on “Part 2: Parameter calibration and feedback” that will supplement the early model results and compare them with radar observations of moth flight events.
+
+
Garcia uses two modeling workflows to obtain the results of his study. He uses a combination of HTC and HPC for the weather modeling workflow, with the main weather model running on the HPC system and numerous pre- and post-processing tasks running on HTC. For the second workflow, he developed a Markov chain Monte Carlo (MCMC) modeling process for the flight simulation currently running at the CHTC.
+
+
For the weather modeling workflow, Garcia runs the pre-processing using HTC, which takes in one month of historical weather data at a time and takes just about a day to complete. The pre-processing provides the initial and boundary conditions to the weather simulations. He then runs the Weather Research & Forecasting (WRF) model as an HPC application, feeding the output from the pre-processing as input to the WRF model, which takes a little over six hours to generate one day of high-resolution output. Finally, the WRF model output returns to HTC for post-processing, reducing the data to just the variables needed for the budworm flight model.
+
+
For the flight modeling workflow, Garcia runs a pre-processing step using HTC to determine the pool of available moths for the flight simulations; each simulation randomly selects a thousand moths out of the pool. He then uses the post-processed temperature and wind fields from the WRF model output to tell the moths when to fly and where to go in the flight model. Garcia runs ensembles of flight simulations to obtain a good sample of the moth population available on a given night. These simulations then run sequentially over the nights in the seasons when moths are emerging, flying, and laying eggs just about everywhere they land.
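
The sampling step can be illustrated with a short sketch. It assumes that moths are drawn without replacement within a simulation and that each ensemble member draws independently. The function name, the pool of integer IDs, and the ensemble size are hypothetical and are not taken from Garcia's model.

```python
import random


def draw_ensemble(pool, n_members, moths_per_sim=1000, seed=0):
    """Draw `n_members` independent samples of `moths_per_sim` moths each."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    return [rng.sample(pool, moths_per_sim) for _ in range(n_members)]


pool = list(range(25_000))  # hypothetical moth IDs available on one night
ensemble = draw_ensemble(pool, n_members=10)
```

Since the members do not depend on one another, each could be run as its own job.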
+
+
+
+
“HTCondor developers have been immensely helpful in making sure that I can fit the HPC component into the middle of this larger DAGMan process,” Garcia notes. He uses DAGMan workflow scripts from HTCondor to organize his workflows with mixed submission protocols.
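
For readers unfamiliar with DAGMan, a workflow is described in a text file that declares nodes with `JOB` lines and ordering with `PARENT ... CHILD` lines. The sketch below generates such a description for a three-step chain like the weather workflow above. The node names and submit file names are hypothetical, and how the HPC step is actually submitted in Garcia's setup is not shown here.

```python
def dag_description(steps):
    """Build DAGMan input text for a linear chain of (node_name, submit_file) steps."""
    lines = [f"JOB {name} {submit_file}" for name, submit_file in steps]
    # Chain each node to the next one so that they run in order.
    for (parent, _), (child, _) in zip(steps, steps[1:]):
        lines.append(f"PARENT {parent} CHILD {child}")
    return "\n".join(lines) + "\n"


# Hypothetical node and file names for pre-processing, WRF, and post-processing.
steps = [
    ("preprocess", "preprocess.sub"),
    ("wrf", "wrf.sub"),
    ("postprocess", "postprocess.sub"),
]
dag_text = dag_description(steps)
```

Written to a file, text of this form is what `condor_submit_dag` takes as input.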
+
+
Garcia combines all the collected information and calculates the moths’ survival likelihood. He has demonstrated that adult dispersal is almost entirely weather-driven and occurs almost nightly during summer and that males and females have different flight capabilities.
+
+
“I love this because I can easily take that pre-processing part of the DAG and make it its node to build in more biological processes for the daytime part of the model,” Garcia remarks. “I can then relatively easily expand the scope of the whole DAG to cover more of the seasonal or annual biological cycle model.”
+
+
Garcia concludes, “everything’s going great – there are no pain points, everything is looking good, and my colleagues and I are very excited about the modeling results we’re seeing.”
For the first time, UW Statistics undergraduates could participate in a course teaching high throughput computing (HTC). John Gillett, lecturer of Statistics at the University of Wisconsin-Madison, designed and taught the course with the support of the Center for High Throughput Computing (CHTC).
+
+
+
+
+
+
This past spring, HTC was introduced to a new realm – the inside of an undergraduate statistics course. John Gillett, a lecturer in the Statistics department at the University of Wisconsin-Madison, unveiled a new special topics course, Statistics 479, to undergraduate students in the spring of 2022. The course introduced students with little programming experience to a robust and easy-to-learn approach that they could use to tackle significant computational problems. “The basics of distributed computing are easy to learn and very powerful,” Gillett explained. “[That’s why] it fit with the CHTC – I knew they could give the students and me the computing capabilities and support.”
+
+
This class was created as an undergraduate counterpart to the graduate-level course, Statistics 605, which Gillett has taught since the Spring of 2017. The course includes learning basic distributed computing to analyze data sets too large for a laptop.
+
+
Gillett reached out to research computing facilitator Lauren Michael in 2016. He hoped to learn how he could teach his students easy parallel computing. He settled on HTC, as it was easiest for helping students do large computations. “This was an easy path for me,” the teacher remarked, “and everyone at the CHTC made it easy.”
+
+
+
+
Research Facilitator Christina Koch guest lectured in 2017 when the graduate class was first offered, and every semester since. She talks to the students about the CHTC and high throughput computing and has them run a few jobs. Koch notes that this partnership between the CHTC and Gillett’s class has been “a win-win; we get to share about our system and how people run things, and he gets to have this interesting, hands-on assignment for his class.”
+
+
Gillett created an assignment that involves using HTC on a real data set with the help of Christy Tremonti, a UW-Madison Astronomy professor. Tremonti had a research problem that required searching through many astronomical spectra (of photos of galaxies) for a particular type corresponding to a gravitationally lensed Lyman-break galaxy. “In the beginning, she gave a lot of good, critical feedback for the research element of this,” Gillett explained. She guided the students through large-scale computations during the first few semesters. As he reflects on this partnership, Gillett beams, “this was exciting too – we were doing unknown statistics on a real research problem. We didn’t know what the right answer was!”
+
+
Gillett remarked that his students enjoy working with the CHTC; “[the students] now understand how to work a parallel computing environment,” he noted. “They get excited about the power they now have to extract solutions from big piles of data.” This course offers students simple, powerful tools to do just that.
+
+
Gillett appreciated the help and support he received from the CHTC in developing this course: “I needed a little more knowledge and their willingness to help support the students and me.” The technologies and services that the CHTC develops for HTC gave Gillett an easy and accessible way to teach his students programming and computational thinking skills that they’ll be able to carry with them.
+
+
“Students go from being weak programmers to not being intimidated by big data sets and computations that they wouldn’t have been able to consider otherwise. I’m proud about that.” These individuals come out of these classes with a different kind of confidence about data problems – and that is priceless.
+
+
…
+
+
John Gillett is currently looking for new researchers with whom his students could collaborate. If you are a researcher who can provide a reasonably large and accessible dataset, a question, and guidance, please reach out to jgillett@wisc.edu.
Google’s launch of a Quantum Virtual Machine emulates the experience and results of programming one of Google’s quantum computers, managed by an HTCondor system running in Google Cloud.
+
+
+
+
The CEO of Google and Alphabet, Sundar Pichai, tweeted out some thrilling news:
+
+
“Excited to launch a Quantum Virtual Machine (QVM) which emulates the experience and results of programming one of our quantum computers. It will make it easier for researchers to prototype new algorithms and help students learn how to program a quantum computer.” – Tweet.
+
+
Today’s “classical” computing systems, from laptops to large supercomputers, are built using circuit behavior defined by classical physics. Quantum computer circuitry, still in the early phases of development, harnesses the laws of quantum mechanics to solve computing problems in new ways. Quantum computers offer exponential speedups – over 100 million times faster for specific problems – to produce groundbreaking results. However, quantum computing will require scientists and engineers to revisit many classical algorithms and develop new ones tailored to exploit the benefits of quantum processors. Therefore, the QVM is a helpful tool for quantum algorithms research.
+
+
“The QVM is, in essence, a realistic simulation of a grid on our quantum hardware using classical computers,” Tom Downes, a consultant for High-Performance Computing (HPC) at Google Cloud, explains. Simulating a grid of qubits, the basic unit of quantum information, on a quantum processor requires many trajectory simulations of quantum noise. Downes explains, “quantum computers are noisy, so it is important to test and adjust your quantum circuits in realistic conditions so they can perform well and output the data you are looking for in your research problem. To virtualize a processor, the QVM uses the noise data and topology of Google’s real hardware.” This grid size determines whether a researcher can use their laptop or require a setup utilizing many classical computers to power the simulation. Essentially, research on the QVM is “proof of concept” research.
+
+
To enable researchers to test their algorithms on a larger grid of qubits, Google utilized the HTCondor Software Suite (HTCSS) to organize the capacity of many classical computers to run multiple simulations of a quantum circuit simultaneously. The HTCondor Software Suite enables researchers to easily harness the collective computing power of many classical computers and submit and manage large numbers of computing jobs. Today, HTCSS is used at universities, government labs, and commercial organizations worldwide, including within Google’s own Google Cloud Platform, to power QVM. Downes details, “this ability to test on a 32-qubit grid can extrapolate its performance to a non-simulatable grid more feasible.”
+
+
The new Google Quantum AI tutorial shows users how to use the Cloud HPC Toolkit, which makes it easy for new users to deploy HTCondor pools in Google Cloud. Downes describes that the tutorial “provides the basic elements of an HTCondor pool: a central manager, an access point, and a pool of execute points that scale in size to work through the job queue.”
+
+
The tutorial by Google describes how to:
+
+
Use Terraform to deploy an HTCondor cluster in Google Cloud
+
Run a multi-node quantum computing simulation using HTCondor
+
Query cluster information and monitor running jobs in HTCondor
Don’t miss this opportunity to reconnect with colleagues and learn more about HTC.
+
+
Join us for the second annual integrated Throughput Computing event, which combines HTCondor’s former annual event “HTCondor Week” and the OSG’s “All-Hands Meeting,” to be held July 8-12 at the University of Wisconsin-Madison’s Fluno Center. HTC24 is sponsored by the OSG Consortium, the HTCondor team and the UW-Madison Center for High Throughput Computing.
+
+
Registration will open in March. This will primarily be an in-person event, but remote participation (via Zoom) for the many plenary events will also be offered.
+If you register for the in-person event at the University of Wisconsin–Madison, you can attend plenary and non-plenary sessions, mingle with colleagues, and have planned or ad hoc meetings. Evening events are also planned throughout the week.
+
+
The Agenda
+
+
All the topics typically covered by HTCondor Week and the OSG All-Hands Meeting will be included:
+
+
+
Science Enabled by the OSPool and the HTCondor Software Suite (HTCSS)
+
OSG Technology
+
HTCondor Technology
+
HTCondor and OSG Tutorials
+
State of the OSG
+
Campus Services and Perspectives
+
+
+
Questions and Resources
+
+
For questions about attending, speaking, accommodations, and other concerns, please contact us at htc@path-cc.io.
+
+
To learn about this event in more detail, view last year’s schedules for HTC23:
+ A Long-Awaited Reunion: HTCondor Week 2022 in Photos
+
+
+
+
HTCondor Week 2022 featured over 40 exciting talks, tutorials, and research spotlights focused on the HTCondor Software Suite (HTCSS). Sixty-three attendees reunited in Madison, Wisconsin for the long-awaited in-person meeting, and 111 followed the action virtually on Zoom. Continue scrolling for a visual recap of the exciting week.
+
+
+
+
To kick off the day, staff and attendees gather in the Fluno Lobby –– where there’s no shortage of coffee, snacks, or conversation.
+
+
+
+
Miron Livny welcomes participants to HTCondor Week. In-person participants traveled from Illinois, Nebraska, and even Amsterdam. Those who tuned in virtually represented seven different countries.
+
+
+
+
Eric Wilcots, Dean of the College of Letters & Science and the Mary C. Jacoby Professor of Astronomy at UW-Madison, delivered an inspiring keynote talk on the impact that high-throughput computing will have on future discoveries about our universe.
+
+
+
+
To wrap up the first day of HTCondor Week, staff and attendees embarked on a bike ride around Madison.
+
+
+
+
Justin Hiemstra, a Machine Learning Application Specialist for CHTC’s GPU Lab, describes the testing suite he developed to test for compatibility across ML frameworks and various GPU models in CHTC’s local HTC pool.
+
+
+
+
Emile Turatsinze, a systems administrator at the Morgridge Institute for Research, thoughtfully listens to a talk from Saqib Haleem about the CMS project’s transition to token-based authentication.
+
+
+
+
HTCondor Week staff and participants enjoy cold pitchers and tasty food on the Wisconsin Union Terrace during an evening sponsored by Google Cloud.
+
+
+
+
Yudhajit Pal, a member of the Schmidt research group in UW-Madison’s Department of Chemistry, briefly pauses while explaining how he used HTCSS-enabled machine learning to probe photoexcitation of iridium complexes.
+
+
+
+
Brian Bockelman poses a question during the Q&A period following Sam Gelman’s presentation on using HTCSS for high-throughput molecular simulations of the protein sequence-function relationship.
+
+
+
+
Lively discussions filled the Fluno Auditorium between sessions. Pictured above are CHTC Research Computing Facilitator Lauren Michael and Ph.D. Candidate Rafael Ferreira of UW-Madison’s Department of Animal and Dairy Sciences.
+
+
+
+
Todd Tannenbaum, Mary Hester, Brian Bockelman, and Miron Livny get some fresh air between talks.
+
+
+
+
Miron Livny delivers closing remarks as the week comes to an end. Thank you to all who participated in HTCondor Week 2022. We hope to see you next year!
+ Using HTC for a simulation study on cross-validation for model evaluation in psychological science
+
+
+
+
During the OSG School Showcase, Hannah Moshontz, a postdoctoral fellow at UW-Madison’s Department of Psychology, described her experience of using high throughput computing (HTC) for the very first time, when taking on an entirely new project within the field of psychology. While Hannah’s research generally focuses on understanding goal pursuit in everyday life, she and her colleagues had noticed that there seemed to be a lack of “best practices” for evaluating the quality of results from the field’s recent integration of machine learning approaches.
+
+
Describing the motivation behind the project, Hannah explains: “We were seeing a lot of published papers in top outlets that were incorrectly understanding and interpreting cross-validated model performance estimates. These models were described as usable for making diagnoses and clinical decisions.” This project, a simulation study, aimed to understand cross-validated performance estimates in psychology, and give guidance on how future psychological science researchers should use cross validation in their data.
+
+
While a typical machine learning study entails running tens of thousands of models, Hannah’s study required 144,000 times this number in order to evaluate results from numerous studies. With the total estimated compute time for the project being over one million hours, Hannah understood from the beginning that “high throughput computing was going to be essential.”
+
+
The Center for High Throughput Computing at UW-Madison worked with Hannah to help get her team’s simulations distributed on the Open Science Pool. Hannah used the programming language R to simulate data and train, select, and evaluate machine learning models. The output from each simulation batch came in the form of a zipped file that included a summary of the best model performance along with information about the model. Throughout the process, Hannah and her team tracked jobs in a spreadsheet to stay organized.
+
+
Reflecting on the impact of HTC on the study as a whole, she reasons, “without HTC, we couldn’t have conducted this study in my lifetime.” While this project was Hannah’s first taste of HTC, today she’s integrated it into many different facets of her work.
+
+
…
+
+
This article is part of a series of articles from the 2021 OSG Virtual School Showcase. OSG School is an annual education event for researchers who want to learn how to use distributed high throughput computing methods and tools. The Showcase, which features researchers sharing how HTC has impacted their work, is a highlight of the school each year.
Justin Hiemstra, a Machine Learning Application Specialist for CHTC’s GPU Lab, discusses the testing suite developed to test CHTC’s support for GPU and ML framework compatibility.
+
+
+
+
Researchers at UW–Madison have increasingly required graphics processing units (GPUs) for their work. GPUs are specialized computing hardware that drives different data science technologies, including machine learning (ML). But what actually goes into running an ML job on the UW-Madison Center for High Throughput Computing (CHTC) using GPU capacity?
+
+
Justin Hiemstra, a graduate student in the Department of Electrical and Computer Engineering at the University of Wisconsin-Madison and currently working as an ML Application Specialist for CHTC’s GPU Lab, outlined the steps for running an ML job on CHTC using GPU capacity during HTCondor Week 2022.
+
+
Whenever a researcher has an ML job that they want to run on CHTC with a GPU, they need three things:
+
+
First, the researcher must write their ML code using a deep learning framework, such as PyTorch or TensorFlow.
+
+
Second, the researcher needs to pick a GPU type. “You can run ML jobs on a normal server without GPUs, but certain machine learning processes (e.g., neural networks) run much faster if you use one,” notes Christina Koch, one of Hiemstra’s supervisors for his work. When using the HTCondor Software Suite, the researcher can choose a specific GPU type by specifying a CUDA compute capability in the HTCondor job submit file.
+
+
Third, the researcher has to pick a CUDA runtime library. This library handles communication between the GPU and the application space, allowing the ML code to run its computations.
+
+
For an ML job to complete successfully, these components (ML framework, GPU type, CUDA runtime) must be compatible.
+
+
Some issues come into play with this setup. The first issue is a lack of documentation. “There’s no central resource we can go to to look at and see if different versions of deep learning frameworks, GPUs, and capabilities are compatible with each other,” Hiemstra notes.
+
+
The second issue is that as these frameworks and GPU hardware evolve, Hiemstra and his team have noticed they’ve started to drop support for older frameworks and compute capabilities.
+
+
The third issue is “whenever you have computing resources made up of discrete servers in a computing pool, you run into the issue of heterogeneous server configurations.” This issue adds to the problem and confusion of trying to pick compatible versions.
+
+
Hiemstra has put together a suite of test jobs to explore this compatibility issue. Each job tests whether a single tuple of CUDA runtime, framework, and compute capability versions is compatible on a CHTC resource. He looks at three things to do so:
+
+
First, did the job match, meaning, was it able to find the resources that it requested? Second, did the Conda environment resolve, meaning, was it able to match all of the versions without finding any conflicting dependencies? Finally, was the framework able to communicate with the GPUs? The job should run as expected if all three of these things happen. The test job will print out an error file if any of these fail.
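The three checks can be pictured as a short-circuiting sequence: the first one that fails tells you where to look. The sketch below is only an illustration of that idea. The `Stage` labels and the `first_failure` helper are hypothetical names, not part of the actual CHTC test suite.

```python
from enum import Enum


class Stage(Enum):
    """Hypothetical labels for the three checks described above."""
    MATCH = "job did not match any machine"
    CONDA = "Conda environment did not resolve"
    GPU = "framework could not communicate with the GPU"


def first_failure(matched, env_resolved, gpu_visible):
    """Return the first failed Stage, or None if all three checks passed."""
    checks = [
        (Stage.MATCH, matched),
        (Stage.CONDA, env_resolved),
        (Stage.GPU, gpu_visible),
    ]
    for stage, passed in checks:
        if not passed:
            return stage
    return None
```

Later checks only make sense if earlier ones passed, so the function reports the earliest failure rather than all of them.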
+
+
When a job fails, it’ll give some indication as to where the error occurred. These messages get recorded and later reviewed by Hiemstra so he can try to understand better what’s happening on the GPU Servers in the CHTC GPU Lab.
+
+
The goal now for Hiemstra and his team is to “look at all of the different versions we might be interested in combining to see which ranges of compute capabilities, frameworks, and CUDA runtime libraries might work with each other.”
+
+
Issues arise when trying to analyze the entire version space. First, the version space to test grows combinatorially. “On an active system where research is being done, that’ll start gobbling up capacity and taking it away from the researchers,” Hiemstra remarks.
+
+
To prune the version space, so they’re not testing tens of thousands of different versions, they know that the CHTC has certain compute capabilities available. Knowing this, Hiemstra and his team can limit the number of versions they test only to include those that the CHTC has available. In addition, they assume that researchers use tools, such as Conda, to install their software, so they focus on framework and CUDA runtime versions that are available through Conda.
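A minimal sketch of this kind of pruning, assuming made-up version lists: the full space is the Cartesian product of the three axes, and each filter removes tuples that cannot run on the pool or cannot be installed through Conda. All version strings and set names below are placeholders, not CHTC's actual inventory.

```python
from itertools import product

# Placeholder version lists, for illustration only.
cuda_runtimes = ["10.2", "11.3", "11.6"]
frameworks = ["pytorch=1.10", "pytorch=1.11", "tensorflow=2.8"]
compute_capabilities = ["3.5", "6.1", "7.5", "8.0", "8.6"]

# Assumed: capabilities actually present in the pool.
pool_capabilities = {"7.5", "8.0"}
# Assumed: framework versions installable through Conda.
conda_frameworks = {"pytorch=1.11", "tensorflow=2.8"}


def full_space():
    """Every (runtime, framework, capability) combination."""
    return list(product(cuda_runtimes, frameworks, compute_capabilities))


def pruned_space():
    """Keep only tuples the pool can run and Conda can install."""
    return [
        (runtime, framework, capability)
        for runtime, framework, capability in full_space()
        if capability in pool_capabilities and framework in conda_frameworks
    ]
```

With these placeholder lists the full space has 45 tuples and the pruned space has 12, which shows how quickly two simple filters cut the number of test jobs.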
+
+
The second issue is that the team needs some way of automatically collecting the different test parameters. Essentially, the goal is to have it so that someone in CHTC doesn’t have to update this by hand continuously. Each job needs several files to run, so the team uses Python string formatting to generate these files dynamically.
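As a rough idea of what generating a job file with Python string formatting can look like, here is a sketch that fills a submit-file template for one version tuple. The template contents are assumptions: the script name `run_test.sh`, the file naming scheme, and the use of `require_gpus` with the `Capability` attribute (which depends on the HTCondor version) are illustrative, not the team's actual files.

```python
# Hypothetical submit-file template; every value is illustrative.
SUBMIT_TEMPLATE = """\
# Generated test job
executable = run_test.sh
arguments = {framework} {cuda_runtime}
request_gpus = 1
require_gpus = (Capability == {capability})
log = test_{job_id}.log
output = test_{job_id}.out
error = test_{job_id}.err
queue
"""


def render_submit_file(job_id, framework, cuda_runtime, capability):
    """Fill the template for a single (framework, runtime, capability) tuple."""
    return SUBMIT_TEMPLATE.format(
        job_id=job_id,
        framework=framework,
        cuda_runtime=cuda_runtime,
        capability=capability,
    )
```

Calling `render_submit_file(7, "pytorch=1.11", "11.3", "8.0")` returns the text of one submit file, which a driver script could write to disk before submitting.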
+
+
Finally, they’d like to find a way to manage all the jobs since they will continue to “fire off” hundreds of jobs during this process. To do this, they decided on a “timeout” period of 24 hours so that they don’t have scripts running on the CHTC Access Point indefinitely.
+
+
Hiemstra and his team use DAGMan, a tool for Directed Acyclic Graph (DAG) workflows, to first spawn a parent process.
+
+
That parent process will do all the version space pruning and file generation. It’ll then submit all the jobs for testing, wait 24 hours for that timeout, and “run a postscript to interpret the output of those jobs.”
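One way such a workflow could be written down is a DAG description with a single node and a post script attached to it. The sketch below only builds that text. The node name, submit file name, and script name are hypothetical, and the 24-hour timeout is assumed to live inside the parent job itself rather than in the DAG file.

```python
def build_dag(node_name, submit_file, post_script):
    """Return DAG description text: one node plus a POST script for it."""
    lines = [
        f"JOB {node_name} {submit_file}",
        f"SCRIPT POST {node_name} {post_script}",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical names, for illustration only.
dag_text = build_dag("version_sweep", "sweep.sub", "interpret_results.sh")
```

DAGMan runs the POST script after the node's job finishes, which fits the "interpret the output" step described above.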
+
+
Next, they process all the output files to gain further insight into how the system works.
+
+
Currently, they’re running it quarterly, looking at the output table, and seeing if anything unexpected pops up. Hiemstra explains that going through this process “should give us some tools to debug the system if something suddenly crashes or if different versions a researcher is using are not compatible with each other.”
+
+
Looking ahead, Hiemstra is curious to examine how the versions a researcher picks affect the runtime of their ML model, and whether they affect the outcome or performance of that model.
+
+
“Everything about machine learning approaches is diverse and changing quickly,” Koch remarks, “having information about compatible frameworks and GPUs allows us to be more responsive and helpful to researchers who may be new to the field.”
+
+
The implementation of this tool that Hiemstra and his team have developed can be found on the CHTC GitHub Repository.
+
+
…
+
+
Watch a video recording of Justin Hiemstra’s talk at HTCondor Week 2022, and browse his slides.
As an infrastructure administrator, I operate the various computers that provide the services required by the CHTC. The CHTC Infrastructure Services team handles the behind-the-scenes technology so that researchers can focus on what matters: their research! And, of course, leveraging various technologies to meet their research computing needs. For high throughput computing (HTC), we run HTCondor, developed right here at UW-Madison. For high-performance computing (HPC) needs, we offer a Slurm cluster. It is a great privilege to work down the hall from the development team of HTCondor.
+
+
Can you talk about the new hardware refresh that just occurred?
+
As you might imagine, part of being responsible for running an HTCondor pool is providing a place for the research computing to happen – we call such computers “execute points.” Our newest and most powerful execute points came from the recent “Technology Refresh,” an effort made possible through the generous support of the Office of the Vice Chancellor for Research and Graduate Education with funding from the Wisconsin Alumni Research Foundation. These 207 new computers provide substantially more capacity for researchers across campus to do science with the CHTC. Recently, much of my time and effort has gone into taking these devices from new-in-box machines and turning them into fully functioning execute points. It has been quite a challenge, but it also has been very rewarding.
+
+
What’s been your favorite part about working at the CHTC?
+
I really like the people I work with! Everyone is very friendly and helpful; one can cry for help in the hallway, and team members will almost certainly stop by to lend a hand. Don’t get me wrong – the hardware, the technology, and supporting research are all highlights of being a part of the CHTC, but it is the people around me that I appreciate the most.
+
+
What challenges do you face in your position, and how do you overcome them?
+
Research computing, despite its name, lends itself to a fast-paced environment. It is engaging (sometimes even fun!) but also quite the challenge. Priorities change rapidly, and it takes a good deal of flexibility to keep up. Most often, my days do not go as I plan – and that’s okay! Keeping an eye on the big picture, going with the “flow” of each new day, and working closely with my colleagues is how I overcome the many challenges of being a SysAdmin in the research computing world.
+
+
What’s been one of the most exciting changes that have happened recently at the CHTC?
+
I don’t mean to bang on the Tech Refresh drum, but then, I absolutely do – the tech refresh is an exciting and “refreshing” change. It’s a huge deal to us. The quantity and quality of the new hardware really make a massive difference from my perspective, and I hope that the researchers leveraging CHTC will notice it too. Even more exciting is the hope that the CHTC and research computing are becoming more well-known on campus. For me, the Tech Refresh is evidence that we are moving in the right direction toward that goal.
+
+
What’s your favorite flavor of Babcock ice cream?
+
Blue Moon is always my go-to flavor. Nostalgia may influence my choice, as that’s the flavor we would have while visiting the beach when I was very young.
+
+
What’s your favorite fall activity to do in Madison?
+
My favorite fall activity is going apple picking; the sheer number of apple varieties always impresses me. There are a few local orchards that I particularly enjoy.
+
+
You famously came up with “Caturday,” where people post pictures of their cats every Saturday in our CHTC social chat; can you tell us a little about yours?
+
I’m not sure about “famously,” but who doesn’t like cat pictures? CHTC, as it turns out, is made possible by the many cats that allow their humans to work here. I have two cats named Lilac and Peony. They’re both female orange tabbies, which is interesting because most orange tabbies are males. I adopted them upon moving to Madison. They are a bonded pair, meaning they had to be adopted together, and I am so glad to have two! They keep each other company, play together, and cause trouble together. I wouldn’t have it any other way! I often joke that I work to put food on their plates.
+ The role of HTC in advancing population genetics research
+
+
Postdoctoral researcher Parul Johri uses OSG services, the HTCondor Software Suite, and the population genetics simulation program SLiM to investigate historical patterns of genetic variation.
+
+
+
+
Running hundreds of thousands of simulations is no easy task for just any researcher. When Parul Johri was faced with this particular problem, she knew she needed more computational power, which is where the OSG came into play.
+
+
+
+
Johri is a postdoctoral researcher with the Jensen Lab at Arizona State University who recently spoke about using high throughput computing (HTC) in her population genetics work at the recent OSG All-Hands Meeting 2022. Running hundreds of thousands of jobs that harnessed more than nine million computing hours on OSG’s Open Science Pool (OSPool), she shared that OSG services and the HTCondor Software Suite (HTCSS) were essential capabilities: “Without these HTC services and technologies, it would not have been possible to complete any of this work.”
+
+
Population genetics research focuses on understanding the impact of processes like selection and mutation that affect genetic variation within natural populations. However, there are no mathematical expressions to describe patterns of genetic variation in populations with complex histories and selection. Instead, hundreds of thousands of simulations are required to model these complicated evolutionary scenario trajectories, with HTCSS playing a critical role.
+
+
Some HTCSS features and HTC services and technologies were helpful for Johri’s work. First, high-throughput simulations are easy to communicate and execute via an HTCSS Access Point operated as part of the OSG Connect service. Beginning with population parameters that describe the entire population, Johri can create a single HTCSS submit file to simulate hundreds of thousands of gene samples across the genomes for each of these parameters. She then creates hundreds of thousands of evolutionary replicates for each simulation to make inferences about the parameters from a natural population. Each simulation is managed as a single job by HTCSS.
+
+
Additionally, because the OSPool supports the execution of user software within containers, Johri can easily run this work using SLiM, a population-genetic simulator. She and other population genetics researchers use these parameters to create simulations that imitate realistic data, making SLiM a beneficial and convenient program. Christina Koch, a Research Computing Facilitator at the CHTC, helped Johri create a SLiM container, making it easy to run on the OSPool.
+
+
The SLiM software doesn’t require input files, just the parameters Johri passes as commands to SLiM in the HTCSS submit file. HTCSS capabilities are available via the Access Points operated by OSG as part of the OSG Connect service for US-based research projects. After she submits the jobs through an HTCSS Access Point, SLiM performs simulations for each input parameter. It sends back an output file – anything from a simple summary statistic to entire genome samples of individuals from the simulated population.
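A sketch of how a parameter sweep like this might be laid out, under assumptions: the parameter names `mu`, `N`, and `rep`, the script name `model.slim`, and the file name `params.txt` are all hypothetical. SLiM accepts constants on the command line through `-d name=value`, and an HTCondor submit file can read one row per job with `queue ... from`.

```python
from itertools import product

# Hypothetical submit-file lines that would consume the generated rows.
SUBMIT_LINES = (
    "arguments = -d mu=$(mu) -d N=$(N) -d rep=$(rep) model.slim\n"
    "queue mu, N, rep from params.txt\n"
)


def parameter_rows(mutation_rates, population_sizes, replicates):
    """One 'mu, N, rep' row per job, for every parameter pair and replicate."""
    return [
        f"{mu}, {n}, {rep}"
        for mu, n in product(mutation_rates, population_sizes)
        for rep in range(replicates)
    ]


rows = parameter_rows(["1e-8", "2e-8"], [1000, 5000], 3)
```

Here two mutation rates, two population sizes, and three replicates give 12 rows, so a single submit file would queue 12 independent jobs.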
+
+
Through an HTCSS Access Point, Johri ran three million jobs for examining genetic variation in Drosophila (fruit flies commonly used in genetics research), 50,000 jobs for influenza, and one and a half million jobs for humans. Using over nine and a half million wall hours in the last three years, Johri has published three manuscripts rich with genetic patterns and findings.
+
+
Looking towards the horizon, Johri views HTC services as a vital resource: “I’m hoping that HTC services and technologies will continue to play a central role in performing evolutionary inferences in the future.” This hope doesn’t only apply to Johri’s research; it’s reflective of the entire field of population genetics. With dHTC services and technologies like the OSPool and HTCSS at their fingertips, population genetics researchers everywhere can push the field’s boundaries.
+ OSG User School 2022 Researchers Present Inspirational Lightning Talks
+
+
The OSG User School student lightning talks showcased their research, inspiring all the event participants.
+
+
+
+
Each summer, the OSG Consortium offers a week-long summer school for researchers who want to learn how to use high-throughput computing (HTC) methods and services to handle large-scale computing applications at the heart of today’s cutting-edge science. This past summer the school was back in-person on the University of Wisconsin–Madison campus, attended by 57 students and over a dozen staff.
+
+
Participants from Mali and Uganda, Africa, to campuses across the United States learned through lectures, discussions, and hands-on activities how to apply HTC approaches to handle large ensembles of jobs and large datasets in support of their research work.
+“It’s truly humbling to see how much cool work is being done with computing on @CHTC_UW and @opensciencegrid!!” research facilitator Christina Koch tweeted regarding the School.
+
+
One highlight of the School is the closing participants’ lightning talks, where the researchers present their work and plans to integrate HTC, expanding the scope and goals of their research.
+The lightning talks given at this year’s OSG User School illustrate the diversity of students’ research and its expanding scope enabled by the power of HTC and the School.
+
+
Note: Applications to attend the School typically open in March. Check the OSG website for this announcement.
+
+
+
+
Devin Bayly, a data and visualization consultant at the University of Arizona’s Research Technologies department, presented “OSG for Vulkan StarForge Renders.” Devin has been working on a multimedia project called Stellarscape, which combines astronomy data with the fine arts. The project aims to pair the human’s journey with a star’s journey from birth to death.
+
+
His goal has been to find a way to support connections with the fine arts, a rarity in the HTC community. After attending the User School, Devin intends to use the techniques he learned to break up his data and entire simulation into tiles and use a low-level graphics API called Vulkan to target and render the data on CPU/GPU capacity. He then intends to combine the tiles into individual frames and assemble them into a video.
+
+
+
+
Starforge Anvil of Creation: Grudić, Michael Y. et al. “STARFORGE: Toward a comprehensive numerical model of star cluster formation and feedback.” arXiv: Instrumentation and Methods for Astrophysics (2020): n. pag. https://arxiv.org/abs/2010.11254
+
+
+
+
Mike Nsubuga, a Bioinformatics Research fellow at the African Center of Excellence in Bioinformatics and Data-Intensive Sciences (ACE) within the Infectious Disease Institute (IDI) at Makerere University in Uganda, presented “End-to-End AI data systems for targeted surveillance and management of COVID-19 and future pandemics affecting Uganda.”
+
+
Nsubuga noted that in the United States, there are two physicians for every 1000 people; in Uganda, there is only one physician per 25,000 people. Research shows that AI, automation, and data science can support overburdened health systems and health workers when deployed responsibly.
+Nsubuga and a team of Researchers at ACE are working on creating AI chatbots for automated and personalized symptom assessments in English and Luganda, one of the major languages of Uganda. He’s training the AI models using data from the public and healthcare workers to communicate with COVID-19 patients and the general public.
+
+
While at the School, Nsubuga learned how to containerize his data into a Docker image, and from that, he built an Apptainer (formerly Singularity) container image. He then deployed this to the Open Science Pool (OSPool) to determine how to mimic the traditional conversation assistant workflow model in the context of COVID-19. The capacity offered by the OSPool reduced the time it takes to train the AI model by a factor of eight.
+
+
+
+
Jem Guhit, a Physics Ph.D. candidate from the University of Michigan, presented “Search for Di-Higgs production in the LHC with the ATLAS Experiment in the bbtautau Final State.” The Higgs boson was discovered in 2012 and is known for the Electroweak Symmetry Breaking (EWSB) phenomenon, which explains how other particles get mass. Since then, the focus of the LHC has been to investigate the properties of the Higgs boson, and one can get more insight into how the EWSB Mechanism works by searching for two Higgs bosons using the ATLAS Detector. The particle detectors capture the resultant particles from proton-proton collisions and use this as data to look for two Higgs bosons.
+
+
DiHiggs searches pose a challenge because the rate at which a particle process occurs for two Higgs bosons is 30x smaller than for a single Higgs boson. Furthermore, the particles the Higgs can decay to have similar particle trajectories to other particles produced in the collisions unrelated to the Higgs boson. Her strategy is to use a machine learning (ML) method powerful enough to handle complex patterns to determine whether the decay products come from a Higgs boson. She plans to use what she’s learned at the User School to show improvements in her machine-learning techniques and optimizations. With these new skills, she has been running jobs on the University of Michigan’s HTCondor system, using GPUs and CPUs to run ML jobs efficiently, and plans to use the OSPool computing cluster to run complex jobs.
+
+
+
+
Peder Engelstad, a spatial ecologist and research associate in the Natural Resource Ecology Laboratory at Colorado State University (and 2006 University of Wisconsin-Madison alumnus), presented a talk on “Spatial Ecology & Invasive Species.” Engelstad’s work focuses on the ecological importance of natural spatial patterns of invasive species.
+
+
He uses modeling and mapping techniques to explore the spatial distribution of suitable habitats for invasive species. The models he uses combine locations of species with remotely-sensed data, using ML and spatial libraries in R. Recently, he’s taken on the massive task of creating thousands of suitability maps. To do this sequentially would take over three years, but he anticipates HTC methods can help drastically reduce this timeframe to a matter of days.
+
+
Engelstad said it’s been exciting to see the approaches he can use to tackle this problem using what he’s learned about HTC, including determining how to structure his data and break it into smaller chunks. He notes that the nice thing about using geospatial data is that they are often in a 2-D grid system, making it easy to index them spatially and designate georeferenced tiles to work on. Engelstad says that an additional benefit of incorporating HTC methods will be to free up time to work on other scientific questions.
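To make the tiling idea concrete, here is a small sketch that splits a 2-D grid into index ranges, one per tile, so that each tile could become its own job. The function name and the row/column index convention are assumptions for illustration. Real geospatial workflows would also carry georeferencing information with each tile.

```python
def tile_bounds(n_rows, n_cols, tile_size):
    """Yield (row_start, row_stop, col_start, col_stop) for each tile.

    Stops are exclusive; edge tiles are clipped to the grid size.
    """
    for row_start in range(0, n_rows, tile_size):
        for col_start in range(0, n_cols, tile_size):
            yield (
                row_start,
                min(row_start + tile_size, n_rows),
                col_start,
                min(col_start + tile_size, n_cols),
            )


tiles = list(tile_bounds(5, 7, 3))
```

Because the tiles do not overlap and together cover the whole grid, each one can be processed independently and the results stitched back together afterwards.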
+
+
+
+
Zachary Baldwin, a Ph.D. candidate in Nuclear and Particle Physics at Carnegie Mellon University, works for the GlueX Collaboration, a particle physics experiment at the Thomas Jefferson National Lab that searches for and studies exotic hybrid mesons. Baldwin presented a talk on “Analyzing hadronic systems in the search for exotic hybrid mesons at GlueX.”
+
+
His thesis looks at data collected from the GlueX experiment to possibly discover forbidden quantum numbers found within subatomic particle systems to determine if they exist within our universe. Baldwin’s experiment takes a beam of electrons, speeds them up to high energies, and then collides them with a thin diamond wafer. These electrons then slow down, producing linearly polarized photons. These photons will then collide with a container of liquid hydrogen (protons) within the center of his experiment. Baldwin studies the resulting systems produced within these photon-proton collisions.
+
+
The collision creates billions of particles, leaving Baldwin with many petabytes of data. Baldwin remarks that too much time gets wasted looping through all the data points, and massive processes run out of memory before he can compute results, which is one aspect where HTC comes into play. Another major area he has been working on through the User School is simulating Monte Carlo particle reactions in containers, which he pushes into the OSPool using HTCondor to simulate events that he believes would happen in the real world.
+
+
+
+
Olaitan Awe, a systems analyst in the Information Technology department at the Jackson Laboratory (JAX), presented “Newborn Screening (NBS) of Inborn Errors of Metabolism (IEM).” The goal of newborn screening is to detect early, when a baby is born, what diseases they might have.
+
+
Genomic Newborn Screenings (gNBS) are generally cheap, detect many diseases, and have a quick turnaround time. The gNBS takes a child’s genome and compares it to a reference genome to check for variations. The computing challenge lies in looking for all variations, determining which are pathogenic, and seeing which diseases they align with.
+
+
After attending the User School, Awe intends to tackle this problem by writing DAGMan scripts to implement parent-child relations in a pipeline he created. He then plans to build custom containers to run the pipeline on the OSPool and stage big data shared across parent-child processes. The long-term goal is to develop a validated, reproducible gNBS pipeline for routine clinical practice and apply it to African populations.
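Parent-child relations in DAGMan are declared with `PARENT ... CHILD ...` lines alongside one `JOB` line per stage. The sketch below generates those lines from a list of dependencies. The stage names and submit file names are invented for illustration and are not taken from Awe's pipeline.

```python
def dag_lines(stages, dependencies):
    """Build DAG description lines.

    stages: dict mapping node name -> submit file name.
    dependencies: list of (parent, child) node-name pairs.
    """
    lines = [f"JOB {name} {submit}" for name, submit in stages.items()]
    for parent, child in dependencies:
        if parent not in stages or child not in stages:
            raise ValueError(f"unknown node in dependency: {parent} -> {child}")
        lines.append(f"PARENT {parent} CHILD {child}")
    return lines


# Hypothetical three-stage pipeline.
example = dag_lines(
    {"align": "align.sub", "call": "call.sub", "annotate": "annotate.sub"},
    [("align", "call"), ("call", "annotate")],
)
```

Validating node names before writing the file catches typos early, since a dependency on an undeclared node would otherwise only fail at submission time.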
+
+
+
+
Max Bareiss, a Ph.D. candidate at the Virginia Tech Center for Injury Biomechanics, presented “Detection of Camera Movement in Virginia Traffic Camera Video on OSG.” Bareiss used a data set of 1263 traffic cameras in Virginia for his project. His goal was to determine how to document the crashes, near-crashes, and normal driving recorded by traffic cameras using his video analysis pipeline. This work would ultimately allow him to detect vehicles and pedestrians and determine their trajectories.
+
+
The three areas he wanted to tackle and obtain help with at the User School were data movement, code movement, and using GPUs for other tasks. For data movement, he used MinIO, a high-performance object store, so that the execution points could directly copy the videos from Virginia Tech. For code movement, Bareiss used Alpine Linux and multi-stage builds, which he learned to implement throughout the week. He learned about using GPUs at the Center for High Throughput Computing (CHTC) and in the OSPool.
+
+
Additionally, he learned about DAGMan, which he noted was “very exciting” since his pipeline was already a directed acyclic graph (DAG).
+
+
+
+
Matthew Dorsey, a Ph.D. candidate in the Chemical and Biomolecular Engineering Department at North Carolina State University, presented on “Computational Studies of the Structural Properties of Dipolar Square Colloids.”
+
+
Dorsey is studying a colloidal particle developed in a research lab at NC State University in the Biomolecular Engineering Department. His research focuses on using computer models to discover what these particles can do. The computer models he has developed explore how different parameters (like the system’s temperature, particle density, and the strength of an applied external field) affect the particle’s self-assembly.
+
+
Dorsey recently discovered how the magnetic dipoles embedded in the squares lead to structures with different material properties. He intends to use the HTCondor Software Suite (HTCSS) to investigate the applied external fields that change with respect to time. “The HTCondor system allows me to rapidly investigate how different combinations of many different parameters affect the colloids’ self-assembly,” Dorsey says.
+
+
+
+
Ananya Bandopadhyay, a graduate student from the Physics Department at Syracuse University, presented “Using HTCondor to Study Gravitational Waves from Binary Neutron Star Mergers.”
+
+
Gravitational waves are created when black holes or neutron stars crash into each other. Analyzing these waves helps us to learn about the objects that created them and their properties.
+
+
Bandopadhyay’s project focuses on LIGO’s ability to detect gravitational wave signals coming from binary neutron star mergers involving sub-solar mass component stars, which she determines from a graph which shows the detectability of the signals as a function of the component masses comprising the binary system.
+
+
The fitting factors for the signals would have initially taken her laptop a little less than a year to run. She learned how to use OSPool capacity from the School, where it takes her jobs only 2-3 days to run. Other lessons that Bandopadhyay hopes to apply are data organization and management as she scales up the number of jobs. Additionally, she intends to implement containers to help collaborate with and build upon the work of researchers in related areas.
+
+
+
+
Meng Luo, a Ph.D. student from the Department of Forest and Wildlife Ecology at the University of Wisconsin–Madison, presented “Harnessing OSG to project the impact of future forest productivity change on land use change.” Luo is interested in learning how forest productivity increases or decreases over time.
+
+
Luo built a single forest productivity model using three sets of remote sensing data to predict this productivity, coupling it with a global change analysis model to project possible futures.
+
+
Finishing this work on her own computer would take her two years. During the User School, Luo learned she could use Apptainer to run her model and multiple events simultaneously. She also learned to use the DAGMan workflow to organize the process better. With all this knowledge, she ran a scenario that used to take a week to complete but took only a couple of hours with the help of OSPool capacity.
+
+
Tinghua Chen from Wichita State University presented a talk on “Applying HTC to Higgs Boson Production Simulations.” Ten years ago, the ATLAS and CMS experiments at CERN announced the discovery of the Higgs boson. CERN is a research center that operates the world’s largest particle physics laboratory. The ATLAS and CMS experiments are general-purpose detectors at the Large Hadron Collider (LHC) that both study the Higgs boson.
+
+
For his work, Chen uses a Monte Carlo event generator, Herwig 7, to simulate the production of the Higgs boson in vector boson fusion (VBF). He uses the event generator to predict hadronic cross sections, which could be useful for the experimentalist to study the Standard Model Higgs boson. Based on the central limit theorem, the more events Chen can generate, the more accurate the prediction.
+
+
Chen can run ten thousand events on his laptop, but the predictions could be more accurate. Ideally, he’d like to run five billion events for more precision. Running all these events would be impossible on his laptop; his solution is to run the event generators using the HTC services provided by the OSG consortium.
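As a rough illustration of why the event count matters, assume independent events with finite variance $\sigma^2$ (the standard setting for the central limit theorem, not a detail given in the talk). The statistical uncertainty on a Monte Carlo estimate then shrinks with the square root of the number of events $N$:

$$
\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{N}}, \qquad
\frac{\sigma_{\bar{x}}(N = 10^{4})}{\sigma_{\bar{x}}(N = 5 \times 10^{9})}
= \sqrt{\frac{5 \times 10^{9}}{10^{4}}}
= \sqrt{5 \times 10^{5}} \approx 707
$$

Under that assumption, going from ten thousand to five billion events would shrink the statistical error by roughly a factor of 700, ignoring any systematic effects.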
+
+
Using a workflow he built, he can set up the event generator using parallel integration steps and event generation. He can then use the Herwig 7 event generator to build, integrate, and run the events.
+
+
…
+
+
Thank you to all the researchers who presented their work in the Student Lightning Talks portion of the OSG User School 2022!
+ CHTC Hosts Machine Learning Demo and Q+A session
+
+
Over 60 students and researchers attended the Center for High Throughput Computing (CHTC) machine learning and GPU demonstration on November 16th. UW–Madison Associate Professor of Biostatistics and Medical Informatics Anthony Gitter and CHTC Lead Research Computing Facilitator Christina Koch led the demonstration and fielded many questions from the engaged audience.
+
+
+
+
CHTC services include a free large-scale computing systems solution for campus researchers who have encountered computing issues and outgrown their resources, often a laptop, Koch began. One of the services CHTC provides is the GPU Lab, a resource within the HTC system of CHTC.
+
+
The GPU Lab supports up to dozens of concurrent jobs per user, a variety of GPU types including 40GB and 80GB A100s, runtimes from a few hours up to seven days, significant RAM needs, and space for large data sets.
+
+
Researchers are not waiting to take advantage of these CHTC GPU resources. Over the past two months, 52 researchers ran over 17,000 jobs on GPU hardware. Additionally, the UW-Madison IceCube project alone ran over 70,000 jobs.
There are two main ways to know what GPUs are available and the number of GPUs users may request per job:
+
+
+
The first is through the CHTC website, which offers up-to-date information. To access this information, go to the CHTC website and enter ‘gpu’ in the search bar. The first result will be the ‘Jobs that Use GPU Overview’, which is the main guide on using GPUs in CHTC. At the very top of this guide is a table that contains information about the kinds of GPUs, the number of servers, and the number of GPUs per server, which limits how many GPUs can be requested per job. Also listed is the GPU memory, which shows the amount of GPU memory and the attribute you would use in the ‘require_gpus’ statement when submitting a job.
+
+
+
A second way is to use the ‘condor_status’ command. To use this command, make sure to set a constraint of ‘Gpus > 0’ to prevent printing out information on every single server we have in the system: condor_status -constraint 'Gpus > 0'. This gives the names of servers in the pool and their availability status, idle or busy. Users may also add the autoformat flag ‘-af’ to print out any desired attribute of the machine. For instance, to access attributes like those listed in the table of the CHTC guide, users must include the ‘GPUs’ prefix followed by an underscore and then the name of the column to access.
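For scripting, the same query can be assembled as an argument list and handed to `subprocess.run` on a machine where HTCondor is installed. The sketch below only builds the command; it does not run it. The attribute name `GPUs_DeviceName` is an assumed example of the prefix-plus-column pattern described above and may differ between HTCondor versions.

```python
import shlex


def gpu_status_command(attributes=()):
    """Build a condor_status command that lists only machines with GPUs.

    attributes: optional machine attribute names to print with -af.
    """
    command = ["condor_status", "-constraint", "Gpus > 0"]
    if attributes:
        command += ["-af", *attributes]
    return command


# Assumed attribute name, following the 'GPUs_' prefix pattern.
printable = shlex.join(gpu_status_command(["Name", "GPUs_DeviceName"]))
```

Passing the list form to `subprocess.run` avoids shell quoting problems with the constraint expression, while `shlex.join` gives a copy-pasteable version for the terminal.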
+
+
+
+
The GPU Lab, due to its expansive potential, can be used in many scenarios. Koch explained this using real-world examples. Researchers might want to seek the CHTC GPU Lab when:
+Running into the time limit of an existing GPU while trying to develop and run a machine learning algorithm.
+Working with models that require more memory than what is available with a current GPU in use.
+Trying to benchmark the performance of a new machine learning algorithm and realizing that the available computing resources are slow and not equipped for multitasking.
+
+
While GPU Lab users routinely submit many jobs that need a single GPU without issue, users may need to work collaboratively with the CHTC team on extra testing and configuration when handling larger data sets and models or benchmarking precise timing. Koch presented a slide outlining what is easy and what is more challenging on CHTC GPU resources, stressing that users should contact CHTC when in doubt about what is feasible:
+
+
+
+
Work that is done in CHTC is run through a job submission. Koch presented a flowchart demonstration on how this works:
+
+
+
She demonstrated the three-step process of
+
+
login and file upload
+
submission to queue, and
+
job-run execution by HTCondor job scheduler.
+This process, she showed, involves writing up a submit file and using command-line syntax to submit it to the queue. Below are some commands that can be used to submit a file:
+
+
+
+
The next part of the demo was led by Gitter. To demonstrate what commands would be needed for specific kinds of job submissions, he explained what a job submit file should look like, some necessary commands, and the importance of listing out commands sequentially.
+
+
+
Gitter also demonstrated how to run jobs using the example GitHub repository with the following steps:
+
+
Connecting a personal user account to a submit server in CHTC
+
Utilizing the ‘ls’ command to inspect the home directory
+
Cloning the pre-existing template repository with runnable GPU examples
+
Running the ‘condor_submit insert-file-name.sub’ command with the submit file that defines the job the user wants to run
+
Applying the ‘condor_q’ command to monitor the job that has been submitted
+
+
+
Users are able to choose GPU-related submit file options. Gitter demonstrated the different options that are needed in the HTCondor submit file in order to access the GPUs in the CHTC GPU Lab and beyond. These include:
+
+
‘Request_gpus’ to enable GPU use
+
‘+WantGPULab’ to indicate whether or not to use CHTC’s shared use GPUs
+
‘+GPUJobLength’ to indicate which job type the user would like to submit
+
‘Require_gpus’ to request specific GPU attributes or CUDA functionality
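As a rough illustration of how these four options fit together, the Python sketch below assembles the text of a submit file. Everything beyond the four option names is an assumption made for the example: the executable name `run_gpu_job.sh`, the job length value "short", the `GlobalMemoryMb` expression, and the CPU, memory and disk requests are hypothetical placeholders, not values from the demo.

```python
def gpu_submit_description(executable, gpus=1, job_length="short",
                           require_gpus=None):
    """Return the text of an HTCondor submit file that asks for GPUs."""
    lines = [
        f"executable = {executable}",
        f"request_gpus = {gpus}",               # enable GPU use
        "+WantGPULab = true",                   # opt in to the shared GPU Lab
        f'+GPUJobLength = "{job_length}"',      # assumed value for the job type
    ]
    if require_gpus:
        # Optional: restrict the job to GPUs with specific attributes
        lines.append(f"require_gpus = {require_gpus}")
    lines += [
        "request_cpus = 1",       # placeholder resource requests
        "request_memory = 8GB",
        "request_disk = 10GB",
        "log = job.log",
        "queue",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical example: one GPU with at least roughly 40 GB of memory.
submit_text = gpu_submit_description(
    "run_gpu_job.sh", require_gpus="(GlobalMemoryMb >= 40000)"
)
print(submit_text)
```

Writing `submit_text` to a file such as `gpu_job.sub` would give something that can be passed to `condor_submit`, once the placeholders are replaced with values that match the actual job.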
+
+
+
He outlined some other commands for running PyTorch jobs and for exploring available GPUs. All commands from the demo can be accessed here.
+
+
The event concluded with a Q&A session for audience members. Some of these questions prompted a discussion on the availability of default repositories and tools that are able to track the resources a job is using. In addition to interactive monitoring, HTCondor has a log file that provides information about when a job was started, a summary of what was requested – disk, memory, GPUs and CPUs as well as what was allocated and estimated to be used.
+
+
Currently, there is a template GitHub repository with PyTorch and TensorFlow examples that can be cloned and used as a starting point. However, nearly every user is using a slightly different combination of packages for their work. For this reason, users will most likely need to make some manual modifications to adjust versions, change scripts, rename data files, etc.
+
+
These resources will be helpful when getting started:
+ Machine Learning and Image Analyses for Livestock Data
+
+
+
+
The vision of the Digital Livestock Lab is to create state-of-the-art
+computer vision systems and the largest public database for livestock.
+
+
In this presentation from HTCondor Week 2021, Joao Dorea from the
+Digital Livestock Lab explains how
+high-throughput computing is used in the field of animal and dairy
+sciences. Computer vision systems and sensors collect
+animal-level phenotypic data on cows to make more optimized decisions
+about what to do with each animal in terms of health, nutrition,
+reproduction, and genetics. One challenge of doing this has to do
+with the sheer size of data that is collected. Processing and
+storing tens of thousands of images of cows requires significant
+computational resources.
+
+
By utilizing HTCondor through a collaboration with the Center for High Throughput Computing, the Digital
+Livestock Lab has been able to focus their time and money on the livestock.
+Image analysis aligns well with the ideal HTCSS workload: computational work that can be split into many pieces
+and run in parallel.
+HTCondor allows them to run many jobs
+and experiments concurrently and finish them faster, opening the door to larger and larger data sets.
+Being able to process numerous data sets in parallel has allowed the Digital Livestock Lab
+to gain significant insight into livestock systems, all thanks to HTCondor
+and collaborations with the faculty at the CHTC!
+
+
Read more about Joao Dorea and his research on the development of
+high-throughput phenotyping technologies on his
+homepage.
+ LIGO's Search for Gravitational Waves Signals Using HTCondor
+
+
Cody Messick, a Postdoc at the Massachusetts Institute of Technology (MIT) working for the LIGO lab, describes LIGO’s use of HTCondor to search for new gravitational wave sources.
+
+
+
+
High-throughput computing (HTC) is critical to astronomy, from black hole research to radio astronomy and beyond. At the 2022 HTCondor Week, another area of astronomy was put in the spotlight by Cody Messick, a researcher working for the LIGO lab and a Postdoc at the Massachusetts Institute of Technology (MIT). His work focuses on a gravitational-wave analysis that he’s been running with the help of HTCondor to search for new gravitational wave signals.
+
+
Starting with general relativity and why it’s crucial to his work, Messick explains that “it tells us two things; first, space and time are not separate entities but are instead part of a four-dimensional object called space-time. Second, space-time is warped by mass and energy, and it’s these changes to the geometry of space-time that we experience as gravity.”
+
+
Messick notes that general relativity is important to his work because it predicts the existence of gravitational waves. These waves are tiny ripples in the curvature of space-time that travel at the speed of light and stretch and compress space. Accelerating non-spherically symmetric masses generate these waves.
+
+
Generating ripples in the curvature of space-time large enough to be detectable using modern ground-based gravitational-wave observatories takes an enormous amount of energy; the observations made thus far have come from the mergers of compact binaries, pairs of extraordinarily dense yet relatively small astronomical objects that spiral into each other at speeds approaching the speed of light. Black holes and neutron stars are examples of these so-called compact objects, both of which are perfectly or almost perfectly spherical.
+
+
Messick and his team first detected two black holes going two-thirds the speed of light right before they collided. “It’s these fantastic amounts of energy in a collision that moves our detectors by less than the radius of a proton, so we need extremely energetic explosions of collisions to detect these things.”
+
+
Messick looks for specific gravitational waveforms during the data analysis. “We don’t know which ones we’re going to look for or see in advance, so we look for about a million different ones.” They then use matched filtering to find the probability that the random noise in the detectors would generate something that looks like a gravitational wave; the first gravitational-wave observation had less than a 1 in 3.5 billion chance of coming from noise and matched theoretical predictions from general relativity extremely well.
+
+
Messick’s work with collaborators outside the LIGO-Virgo-KAGRA collaboration looks for systems their normal analyses are not sensitive to. Scientists use the parameter kappa to characterize the ability of a nearly spherical object to distort when spinning rapidly or, in simple terms, how squished a sphere will become when spinning quickly.
+
+
LIGO searches are insensitive to any signal with a kappa greater than approximately ten. “There could be [signals] hiding in the data that we can’t see because we’re not looking with the right waveforms,” Messick explains. His analysis has been working on this problem.
+
+
Messick uses HTCondor DAGs to model his workflows, which he modified to make integration with OSG easier. The first job checks the frequency spectrum of the noise. The workflow then moves through an aggregation of the frequency spectrum, a decomposition step (labeled by color by type of detector), and finally the filtering process.
+
+
+
+
Although Messick’s work is more physics-heavy than computationally driven, he remarks that “HTCondor is extremely useful to us… it can fit the work we’ve been doing very, very naturally.”
+
+
…
+
+
Watch a video recording of Cody Messick’s talk at HTCondor Week 2022, and browse his slides.
+ NIAID/ACE students attend this year’s OSG User School 2022
+
+
+
+
This past July, the OSG User School 2022 welcomed students from across the globe to learn how to use high-throughput computing (HTC) in their scientific research.
+
+
The OSG User School has been an annual week-long event hosted at the University of Wisconsin-Madison for over a decade. The program uses lectures and hands-on exercises to introduce and inform students about HTC systems.
+
+
Five students from Makerere University in Uganda and the University Of Sciences, Techniques, and Technologies of Bamako in Mali, Africa, participated as a part of the U.S. National Institute of Allergy and Infectious Diseases (NIAID) and the African Centers for Excellence in Bioinformatics and Data-Intensive Science (ACE) partnership program.
+
+
This event was not the first time NIAID, ACE, and OSG partnered. Back in February, students and faculty in the ACE program engaged in a customized HTC training session over Zoom led by Christina Koch, a research computing facilitator with UW-Madison’s Center for High Throughput Computing.
+
+
HTC makes it easier for researchers with data-intensive or computationally heavy research to manage their work better and more efficiently. Using OSG high throughput computing services, researchers can tackle numerous tasks (like analyzing large amounts of data) that are too resource-intensive to run on just a laptop.
+
+
HTC uses parallel computing; so, when a researcher has a large data set they want to analyze, OSG high throughput computing services allow them to submit jobs in parallel and produce results more quickly.
+
+
+
+
One OSG User School 2022 attendee, Mike Nsubuga, came from Makerere University in Uganda as an MS student in Bioinformatics. Nsubuga also participated in the virtual training session back in February, which he says was a good start for him to have some experience using HTC and to be able to see how he can apply it to his research. To gain more experience, he applied for the continuation of the OSG School this summer.
+
+
In addition to conducting his research on antimicrobial resistance, Nsubuga is a software developer responsible for creating a Covid-19 AI chatbot based in Uganda. And although Nsubuga came to the User School almost certain the application of HTC wouldn’t work within the scope of his research, he admits he was pleasantly proved wrong.
+
+
“I would definitely recommend the OSG User School to others, without a doubt, at least to try,” he says. “It’s just a process of understanding what someone is trying to solve, what challenges they are facing, how they want to be helped—and trying to fit that into the OSG and seeing what it has to offer and what it can’t.”
+
+
Aoua Coulibaly was another participant who had taken the February training. Coulibaly is a Bioinformatics consultant at ACE from the University Of Sciences, Techniques, and Technologies of Bamako in Mali and a Ph.D. student in the same subdiscipline. Her research of interest lies in studying the malaria parasite, Plasmodium.
+
+
Coulibaly had previous working experience with High-Performance Computing (HPC) to evaluate systems. Through the User School, she found the benefits of incorporating HTC with research.
+
+
“The fact that we can submit multiple jobs at once, I think that was really interesting,” she says. “I can apply that to my research so the analysis can go faster.”
+
+
Also continuing training was Modibo Goita, an MS student in Bioinformatics with studies focused on Malian genetic neurological disorders. His thesis is on the concept of genetics with an emphasis on early breast cancer detection screening via germline mutations.
+
+
In genomics, the challenge is that the data size is often immense. Goita learned that with the help of OSG high throughput computing services, he could explore the possibility of scaling up and going beyond the limitations that a single computer cluster could provide.
+
+
+
+
Other trainees in attendance included Sitapha Coulibaly and Kangaye Amadou Diallo, both ACE students who journeyed to the Midwest from Mali. Diallo is a Ph.D. student in Bioinformatics whose research surrounds the potential for rice microbiomes to block damage to pesticide-free plants. Coulibaly is an MS student in Bioinformatics who concentrates on the genetics of crop-damaging soil bacteria.
+
+
As for the ACE students as a collective, they credit the OSG staff’s willingness to help as a large part of why integrating HTC into their research was more effective and why their experience was worthwhile. Their consensus is that they would recommend the OSG User School to other researchers dealing with computing-intensive science, while noting that spreading the word and hosting more collaborations are essential means to do so.
+
+
The OSG and ACE/NIAID teams are looking forward to continued collaboration. In September 2022, the OSG’s Director, Frank Wuerthwein, and Research Computing Facilitator, Rachel Lombardi, will be traveling to Makerere University in Kampala, Uganda to lead a workshop on using OSG resources at the 2022 ACE Global Consortium Meeting.
+
+
Through this continued partnership, the NIAID/ACE, Morgridge, CHTC, and OSG hope to spread the word of HTC and advance basic research through HTC, with continued support for local and global collaborators, ultimately helping bring computing resources to all.
+ NIAID/ACE - OSG collaboration leads to a successful virtual training session
+
+
The U.S. National Institute of Allergy and Infectious Diseases (NIAID) and the African Centers for Excellence in Bioinformatics and Data-Intensive Science (ACE) partnered with the OSG Consortium to host a virtual high throughput computing training session for graduate students from Makerere University and the University Of Sciences, Techniques, and Technologies of Bamako (USTTB).
+
+
+
+
Five thousand miles and seven time zones were no obstacle for forty-one dedicated researchers from Uganda and Mali participating in their first high throughput computing training session using the OSG high throughput computing services. On February 15, bioinformatics graduate students and faculty members from Makerere University in Uganda and the University Of Sciences, Techniques, and Technologies of Bamako in Mali engaged in a customized training session over Zoom led by Christina Koch, an OSG Research Computing Facilitator.
+
+
Dr. Mariam Quiñones, Dr. Darrell E. Hurt, Mr. Chris Whalen, and the ACE Global Operations Team within NIAID’s Office of Cyber Infrastructure and Computational Biology (OCICB) spearheaded this cross-continent collaboration between the OSG Consortium, the NIAID, and ACE, which supports bioinformatics training for graduate students and other researchers at Makerere University and USTTB. The ACE Global Operations Team works closely with the ACE Center Directors and instructors to identify gaps and provide supplemental hands-on training to the students. The NIAID ACE Global Operations Team recognized a need for additional computing resources to train graduate students and knew precisely where to turn.
+
+
Envisioning the power of a partnership between the OSG Consortium and the ACE community, Quiñones approached OSG Research Facilitation Lead Lauren Michael with the idea of a high throughput computing training session for the students and faculty within the ACE program.
+
+
NIAID’s previous success with running computational work on the Open Science Pool (OSPool) led Quiñones to think the impact might even reach beyond students trained by the ACE program. Predicting the spread of this adoption of OSG services, Quiñones remarks, “[w]e hope some of the faculty and associated staff actively generating data from data-intense research projects will begin to use the OSG services.”
+
+
In preparation for the training, OSG’s Research Facilitation Team planned to go beyond the usual introduction to the OSPool. This time around, the team designed a new tutorial that incorporated the BWA software, a tool commonly used in bioinformatics and familiar to the students. Koch, who led the training session, notes that the “goal of using the tutorial was to give the students hands-on experience using software that would be relevant to the kind of work they are already doing for their research.”
+
+
Building off Koch’s thoughts, Michael explains: “Given the shared bioinformatics needs of the students, we wanted to make sure the content went beyond our general New User Training format by encouraging conversation among training participants and using examples they’d connect with.” Reflecting, she adds: “It seemed to pay off, given the level of engagement.”
+
+
Through numerous public-private partnerships with the NIAID, African institutions, governments, and private-sector companies, ACE aims to enhance access to computational capabilities and infrastructure and provide training in data science and bioinformatics. This access will empower researchers and students to accelerate biomedical research and drive discoveries that could impact the treatment, prevention, and diagnosis of diseases in Africa and across the globe.
+
+
And while high throughput computing and the OSPool can play an essential role in advancing the bioinformatics behind some of these efforts, Michael emphasizes that the benefits are undoubtedly mutual for the OSG Consortium:
+
+
“By working with ACE, engaging with participants, and adding documented bioinformatics examples to our resources –– we are better poised to support other researchers doing similar work and flexibly customize our training materials for other domains. We’re deeply grateful for this partnership.”
+ Learning and adapting with OSG: Investigating the strong nuclear force
+
+
+
+
Connor Natzke’s journey with the OSG Consortium began in 2019 as a student of the OSG User School. Today, nearly three years later, Natzke has executed
+600,000 simulations with the help of OSG staff and prior OSG programming. These simulations, each of them submitted as a job, logged over 135,000 core
+hours provided by the Open Science Pool (OSPool). Natzke’s history with the OSG Consortium reflects a pattern of learning, adapting, and improving that
+translates to the acceleration and expansion of scientific discovery. During the March OSG All-Hands Meeting 2022,
+Natzke was presented the David Swanson Memorial Award, which recognized him for his dedication and tenacity since joining the OSG Community.
+
+
+
+
Natzke is a Ph.D. student at the Colorado School of Mines and is currently located at TRIUMF, a particle
+physics laboratory in Vancouver, British Columbia. Natzke’s research focuses on the strong nuclear force, a fundamental force in nature that keeps protons and neutrons bound together in a cohesive unit at
+the center of an atom. This force exists at subatomic scales. Therefore, Natzke and his team require something quite large to observe it –– the GRIFFIN
+spectrometer. Standing at over ten feet tall, GRIFFIN can measure the angle between photons emitted from an unstable atomic nucleus located at the center
+of the instrument. This angle reveals important information about nuclear structure, but Natzke relies on numerous simulations to unveil the whole picture.
+
+
Because the gamma-ray detectors that make up GRIFFIN have limits to their measurement capabilities, Natzke and his team use a Monte Carlo simulation
+package called GEANT4 to reconstruct the angle between the emitted photons more precisely. This simulation involves mapping a large parameter space ––
+an energy surface –– of individual photon energies. Forty-one combinations of photon energies are needed to make one of these maps and three simulations
+are run for each of these combinations, with each requiring one billion simulated events. The resulting time required to make just one energy surface map is fifty thousand core hours, or roughly five years and nine months if Natzke were relying simply on his laptop’s computational power.
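The scale of one energy surface map can be checked with a few lines of arithmetic. The sketch below uses only the figures quoted above, plus one labeled assumption: that the laptop behaves like a single core running around the clock.

```python
# Figures quoted in the article
COMBINATIONS = 41                 # photon-energy combinations per map
SIMS_PER_COMBINATION = 3          # simulations per combination
EVENTS_PER_SIM = 1_000_000_000    # simulated events per simulation
CORE_HOURS_PER_MAP = 50_000       # total compute for one map

simulations = COMBINATIONS * SIMS_PER_COMBINATION
events = simulations * EVENTS_PER_SIM

# Assumption: a laptop acts as one core running 24 hours a day.
HOURS_PER_YEAR = 24 * 365.25
years_on_one_core = CORE_HOURS_PER_MAP / HOURS_PER_YEAR
core_hours_per_simulation = CORE_HOURS_PER_MAP / simulations

print(f"{simulations} simulations, {events:,} events per map")
print(f"about {years_on_one_core:.1f} years on a single core")
print(f"about {core_hours_per_simulation:.0f} core hours per simulation")
```

Under that single-core assumption the result comes out at a little under six years, in line with the rounded figure in the article, and each individual simulation is itself a job of several hundred core hours.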
+
+
“With standard computation, this quickly becomes an intractable project,” Natzke explains. “Luckily, I attended the OSG User School in 2019 and learned that
+Monte Carlo simulations are essentially the poster child for distributed high-throughput computing.”
+
+
With Monte Carlo simulations, one simulation of one billion events produces results equivalent to one million simulations of one thousand events.
+This unique quality transforms what would otherwise be a single lengthy and time-consuming job into many short and quick jobs that can be scaled out to
+run in a high-throughput environment. As Natzke sums it up, “It’s frankly beautiful how easily this works.”
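The splitting idea can be illustrated with a toy Monte Carlo calculation. This is not Natzke's GEANT4 workflow: it is a minimal sketch that estimates pi by random sampling, with the function names and the per-job seeding scheme chosen only for illustration. Each "job" handles a small number of events with its own seed, and the partial counts are merged at the end, just as many short jobs can stand in for one long one.

```python
import random


def run_job(seed, n_events):
    """One small job: count random points that land inside the unit quarter circle."""
    rng = random.Random(seed)  # a distinct seed per job gives each job its own sample
    hits = 0
    for _ in range(n_events):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits


def merged_estimate(n_jobs, events_per_job):
    """Merge the tallies of many small jobs into a single estimate of pi."""
    total_hits = sum(run_job(seed, events_per_job) for seed in range(n_jobs))
    total_events = n_jobs * events_per_job
    return 4.0 * total_hits / total_events


# 200 jobs of 1,000 events stand in for one run of 200,000 events.
estimate = merged_estimate(n_jobs=200, events_per_job=1_000)
print(f"pi is approximately {estimate:.3f}")
```

Here the jobs run one after another in a loop; in a high-throughput setting each call to `run_job` would be a separate job on a different machine, and only the small tallies would need to be collected and summed. The merged result is statistically equivalent to a single long run, not bit-for-bit identical.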
+
+
With the help of OSG Research Computing Facilitation Lead Lauren Michael, Natzke used a personal meta-scheduler for HTCondor called
+DAGMan (Directed Acyclic Graph Manager) to automate his workflow. He wrote Python scripts that created and
+submitted the DAG file to automate the process further. In total, this workflow took roughly 24 hours to produce one of 41 points on the energy surface
+map. Before using DAGMan, each point took one week.
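A script that writes a DAG file might look something like the sketch below. It is not Natzke's code: the node names, the submit file names `simulate.sub` and `merge.sub`, and the `chunk` variable are hypothetical, and the workflow shape (several simulation jobs followed by one merge job) is an assumption made for illustration. Only the `JOB`, `VARS` and `PARENT ... CHILD` keywords are standard DAGMan syntax.

```python
def build_dag(n_chunks, simulate_sub="simulate.sub", merge_sub="merge.sub"):
    """Return DAG file text: n_chunks simulation nodes feeding one merge node."""
    lines = []
    chunk_names = []
    for i in range(n_chunks):
        name = f"sim{i}"
        chunk_names.append(name)
        lines.append(f"JOB {name} {simulate_sub}")   # node and its submit file
        lines.append(f'VARS {name} chunk="{i}"')     # per-node variable for the submit file
    lines.append(f"JOB merge {merge_sub}")
    # The merge node runs only after every simulation node has finished.
    lines.append(f"PARENT {' '.join(chunk_names)} CHILD merge")
    return "\n".join(lines) + "\n"


dag_text = build_dag(3)
print(dag_text)
```

Saving `dag_text` to a file such as `workflow.dag` and running `condor_submit_dag workflow.dag` would hand the whole workflow to DAGMan, which then submits each node in dependency order.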
+
+
But Natzke didn’t stop there. In 2021, he attended the OSG All-Hands Meeting and learned about Pegasus, an HTCondor-integrated
+workflow system that is offered by OSG’s Access Points. With support from OSG Facilitator and Pegasus developer, Mats Rynge, Natzke remodeled his workflow
+using Pegasus to improve file management, transfers, and error handling. The additional automation that Natzke had written around his DAGMan workflow was
+already provided by Pegasus, and it was enhanced. Natzke humbly jokes, “It’s written by computer scientists, rather than physicists masquerading as
+computer scientists.” His resulting workflow only takes three commands and finishes in merely four hours, a roughly forty-fold speedup compared to Natzke’s
+capabilities before OSG services.
+
+
With this new workflow, Natzke can expand upon what’s possible in terms of his research: “Every time I run this, I’m amazed at how much time and effort
+I’ve saved, and just the pure automation and capacity that I have access to with OSG. It’s just mind-blowing to me.”
+
+
…
+
+
+
+
The OSG David Swanson Award was established to honor our late colleague and chair of the OSG Consortium, David Swanson. David contributed to campus
+research across the country by advancing distributed high-throughput computing (dHTC) and the OSG. Learn more about David’s legacy and past recipients of his namesake award.
+ OSG School mission: Don’t let computing be a barrier to research
+
+
+
+
Most applicants to the annual OSG School share a common challenge: obstacles within their research that they would like to overcome. Answering this need, the OSG Consortium holds an annual weeklong School each summer for researchers and facilitators to expand their adoption of high throughput computing (HTC) methodologies. Instructors teach students through a combination of lectures and hands-on activities, starting out with the basics to accommodate all experience levels.
+
+
This year the 11th OSG School took place in August, with over 50 participants from across the nation as well as 5 attendees from Uganda and Mali, representing over 30 campuses or institutions and 35 research domains.
+
+
Online applications to attend the School open in March. Applicants are considered based on how large-scale computing could benefit their research. Over 100 applications are submitted each year, with around 60 being admitted. All of the participants’ travel and accommodation expenses are covered with funding from the Partnership to Advance Throughput Computing (PATh) NSF award.
+
+
The OSG School Director Tim Cartwright believes this year’s participants had computing experiences as diverse as their backgrounds. “Some had never heard about large-scale computing until they saw the School announcements,” he said, “and others had been using it and recognized they were not getting as much out of it as they could.”
+
+
The obstacles researchers encountered that motivated their application to the School varied. Political Methodology Ph.D. candidate at the University of Wisconsin–Madison Saloni Bhogale attended this year’s School after applying HTC methods to her research for almost a year. Her research — which analyzes factors affecting access to justice in India — requires computation over millions of court cases and complaints. Bhogale found that her jobs kept abruptly halting throughout the year, and she was puzzled about how to resolve the problem and how the HTC services were operating. “There were too many hiccups I was constantly running into,” Bhogale said, “I felt like I was more confused than I should be.” When she saw a flier for the OSG School, she decided some extra help was in order.
+
+
Assistant Professor Xiaoyuan (Sue) Suo works in the Department of Math and Computer Science at Webster University and decided to attend the OSG School because she wanted to know more about HTC and its applications. “I never had systematic training,” she explained, “I felt training would be beneficial to me.”
+
+
Another participant at this year’s user school was Paulina Grekov, a doctoral student in Educational Psychology at the University of Wisconsin–Madison. She works in the quantitative methods program and runs complex statistical models of educational studies. Grekov originally tried to run computations without HTC, but it was taking a toll on her personal computer. “Some of the modeling I was doing, specifically statistical modeling, was just frying my computer. The battery was slowly breaking — it was a disaster — my computer was constantly on overdrive,” Grekov recalled.
+
+
During the School, participants were taught the basics of HTC. They were guided through step-by-step instructions and lectures, discussing everything from HTCondor job execution to troubleshooting. Each topic was accompanied by hands-on exercises that allowed attendees to experience the power of HTC. The School also delved into extra topics that could be useful to students, like workflows with DAGMan and GPUs.
+
+
Bhogale recalls that she appreciated the time participants were given to work on their own science applications and the ease of finding an expert to answer her questions. “I was running a pilot of the processes that I would want to do during the School — everyone was right there. So if I ran into an issue, I could just talk to someone,” she said.
+
+
On the last day of the School, the students had an opportunity to showcase what they learned during the week by presenting lightning talks on how they plan to apply HTC in their research. From tracing the evolution of binary black holes to estimating the effect of macroeconomic policies on the economy, ten participants presented ways in which their work could benefit from HTC.
+
+
Postdoctoral Ecologist Researcher Kristin Davis from New Mexico State University gave a lightning talk on how she would utilize HTC to run her large environmental datasets concerning the American Kestrel faster. Yujie Wan from the astronomy department at the University of Illinois Urbana-Champaign talked about how HTC could help her create astronomical maps using a submit file for each observation. Wan said she could then make a DAG file that combines her submit files and have all her maps in just two hours. Cyril Versoza, a graduate research assistant for the Pfeifer Lab at Arizona State University, discussed how the OSG would be a suitable system to implement a mutational spectrum pipeline for his work in evolutionary biology.
+
+
Lightning presentations like these open the door for researchers to hear from those outside of their fields. Participants also had the opportunity to hear from researchers who have already made progress in their research applying HTC. “I remember coming back almost every day and talking to my friends and saying there’s fascinating research happening,” Bhogale said.
After the school ended, some of this year’s attendees provided advice for prospective OSG School students. Grekov recommended that those who attend come in with a goal and a research question in mind. She believes it would lead students to ask the right questions and focus on particular aspects. “Come with an idea you want to solve,” she said. Bhogale recommended any potential student who is concerned about the difficulty of the School to simply “go all in.” She hopes to see more of the social science crowd, like herself, incorporating HTC into their research.
+
+
The 2023 OSG School was one event among a variety of activities that have furthered the spread of large-scale computing in the research world. Tim Cartwright says the goal of the School goes beyond selective expansion, however. “The big picture is always focused on the democratization of access to computing for research,” he said. “We’re trying to make it available to everyone in higher education, regardless of the scale of their computational needs.”
Thank you to all those who attended the OSG User School 2022. Throughout the week, students
+learned how to use HTC systems to run large-scale computing applications through lectures,
+discussions, and hands-on activities.
+
+
All materials and lesson plans can be found on the
+School’s website.
+
+
+
+
Thinking about applying next year?
+
+ We will begin taking applications near the beginning of 2023;
+ please check back then on the OSG Website for more details!
+
+ OSPool's Growing Number of Cores Reaching New Levels
+
+
Campuses contributing to the capacity of the OSPool led to a record-breaking number of cores in December 2022. On December 9th, the OSPool, which provides computing resources to researchers across the country, crossed the 70,000 cores line –– for the very first time.
+
+
+
+
+
It is no small feat to top 70,000 cores in a single day. Over 50 campuses and organizations freely contributed their resources to the OSPool in support of Open Science. These campuses and organizations are dedicated to their mission to support research computing on their own campus and across the country.
+
+
Each year additional campuses and organizations add their contributions to the OSPool. Campuses newly adding computing capacity to the OSPool this year come in all sizes and include Cardiff University, Kansas State, New Mexico State University, University of South Dakota, University of Maine and more.
+
+
The contributions to the OSPool this year supported the research of 180 science projects and over 75 million computing jobs.
+ Expediting Nuclear Forensics and Security Using High Throughput Computing
+
+
Arrielle C. Opotowsky, a 2021 Ph.D. graduate from the University of Wisconsin-Madison’s Department of Engineering Physics, describes how she utilized high throughput computing to expedite nuclear forensics investigations.
+
+
+
+
+
+
“Each year, there can be from two to twenty incidents related to the malicious use of nuclear materials,” including theft, sabotage, illegal transfer, and even terrorism, Arrielle C. Opotowsky direly warned. Opotowsky, a 2021 Ph.D. graduate from the University of Wisconsin-Madison’s Department of Engineering Physics, immediately grabbed the audience’s attention at HTCondor Week 2022.
+
+
Opotowsky’s work focuses on nuclear forensics. Preventing nuclear terrorism is the primary concern of nuclear security, and nuclear forensics is “the response side to a nuclear event occurring,” Opotowsky explains. Typically in a nuclear forensics investigation, specific measurements need to be processed; unfortunately, some of these measurements can take months to process. Opotowsky calls this “slow measure” general mass spectrometry. Although this measurement can help point investigators in the right direction, it cannot do so until long after the incident has occurred.
+
+
In trying to learn how she could expedite a nuclear forensics investigation, Opotowsky wanted to see if Gamma Spectroscopy, a “fast measurement”, could be the solution. This measure can potentially point investigators in the right direction, but in days rather than months.
+
+
To test whether this “fast measurement” could expedite a nuclear forensics investigation compared to a “slow measurement”, Opotowsky created a workflow and compared the two measurements.
+
+
While Opotowsky was a graduate student working on this problem, the workflow she created was running on her personal computer and suddenly stopped working. In a panic, she went to her advisor, Paul Wilson, for help, and he pointed her to the UW-Madison Center for High Throughput Computing (CHTC).
+
+
CHTC Research Computing Facilitators came to her aid, and “the support was phenomenal – there was a one-on-one introduction and a tutorial and incredible help via emails and office hours…I had a ton of help along the way.”
+
+
She needed capacity from the CHTC because she used a machine-learning workflow and tens of case variations. She had a relatively large training database because she used several algorithms and hyperparameter variations and wanted to predict several labels. The sheer magnitude of these training databases is the leading reason why Opotowsky needed the services of the CHTC.
+
+
She used two computation categories, the second of which required a specific capability offered by the CHTC - the ability to scale out a large problem into an ensemble of smaller jobs running in parallel. With 500,000 total entries in the databases and a limit of 10,000 jobs per case submission, Opotowsky split the computations into fifty calculations per job. This method resulted in lower memory needs per job, each taking only a few minutes to run.
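
The arithmetic behind this split can be sketched in a few lines of Python. This is an illustration only, not Opotowsky's actual workflow: the helper name `split_into_jobs` is hypothetical, and the only figures taken from the article are the 500,000 entries and the 10,000-job limit.

```python
import math


def split_into_jobs(total_entries: int, entries_per_job: int) -> list[range]:
    """Return one range of entry indices per job (hypothetical helper)."""
    n_jobs = math.ceil(total_entries / entries_per_job)
    return [
        range(i * entries_per_job, min((i + 1) * entries_per_job, total_entries))
        for i in range(n_jobs)
    ]


TOTAL_ENTRIES = 500_000  # database entries, from the article
MAX_JOBS = 10_000        # per-submission job limit, from the article

# Smallest chunk size that keeps the job count within the limit.
entries_per_job = math.ceil(TOTAL_ENTRIES / MAX_JOBS)
jobs = split_into_jobs(TOTAL_ENTRIES, entries_per_job)

print(entries_per_job, len(jobs))
```

Each range would correspond to one small, independent job, which is the property that lets a scheduler run them in parallel with modest memory per job.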
+
+
“I don’t think my research would have been possible” without High Throughput Computing (HTC), Opotowsky noted as she reflected on how the CHTC impacted her research. “The main component of my research driving my need [for the CHTC] was the size of my database. It would’ve had to be smaller, have fewer parameter variations, and that ‘fast’ measurement was like a ‘real-world’ scenario; I wouldn’t have been able to have that.”
+
+
Little did Opotowsky know that her experience using HTC would also benefit her professionally. Having HTC experience has helped Opotowsky in job interviews and securing her current position in nuclear security. As a nuclear methods software engineer, “knowledge of designing code and interacting with job submission systems is something I use all the time,” she comments, “[learning HTC] was a wonderful experience to gain” from both a researcher and professional point of view.
+
+
…
+
+
Watch a video recording of Arrielle C. Opotowsky’s talk at HTCondor Week 2022, and browse her slides.
+ Introducing the PATh Facility: A Unique Distributed High Throughput Computing Service
+
+
Researchers can now request credits on the PATh Facility, the PATh project’s new service intended for distributed high throughput computing workflows supporting NSF science.
+
+
+
+
With the launch of the new PATh Facility, the PATh project will soon begin providing the partnership’s first dedicated High Throughput Computing (HTC) capacity directly to researchers with NSF-funded projects. This milestone opens the door to longer runtimes, larger jobs, and greater customization for researchers. PATh is a partnership between the OSG Consortium and the University of Wisconsin-Madison’s Center for High Throughput Computing (CHTC). Jointly, the two entities have provided distributed high-throughput computing services and technologies to the S&E community for several decades.
+
+
+
+
The National Science Foundation (NSF) awards credits to access the PATh Facility, making it well-integrated in the nation’s cyberinfrastructure. Researchers can request computing credits associated with their NSF award, which they ‘cash in’ when they run HTC workloads using the PATh Facility’s services. There are currently two mechanisms to request such credit: researchers can request PATh credits within new proposals,
+or principal investigators (PIs) with existing awards can email their program officer to add to their award. In both cases, researchers outline the kind of HTC capacity they need; PATh’s experts are available to help researchers estimate the different requirements of their HTC workloads.
+
+
Just like the partnership, the PATh Facility is distributed and will eventually include computational resources distributed over six different sites across the nation: the Center for High Throughput Computing at the University of Wisconsin-Madison, the Holland Computing Center at the University of Nebraska-Lincoln, Syracuse University’s Research Computing group, the San Diego Supercomputer Center at the University of California San Diego, the Texas Advanced Computing Center at the University of Texas at Austin, and Florida International University’s AMPATH network in Miami. This uniquely distributed resource is intended to handle HTC workloads, all for the support and advancement of NSF-funded open science. With access to the PATh Facility, researchers will have approximately 35,000 modern cores and up to 44 A100 GPUs at their fingertips.
+
+
While the PATh credit ecosystem is still growing, any PATh Facility capacity not used for credit will be available to the Open Science Pool (OSPool) to benefit all open science under a Fair-Share allocation policy. In fact, for researchers familiar with the OSPool, running HTC workloads on the PATh Facility should feel second-nature. Like the OSPool, the PATh Facility is nationally-spanning, geographically distributed, and ideal for HTC workloads. But while resources on the OSPool belong to a diverse range of campuses and organizations that have generously donated their resources to open science, the allocation of capacity in the PATh Facility is managed by the PATh Project itself.
+
+
This distinction enables longer runtimes and larger jobs otherwise infeasible on the OSPool opportunistic resources. This higher degree of control also empowers the PATh team to provide researchers with a more customized level of support. Brian Bockelman, Co-PI of the PATh Project, notes: “With the PATh Facility, we can work with researchers to come up with more bespoke solutions. Whether it’s the configuration of the hardware, the runtime, IPv6 connectivity, or whatever it is that’s not working out –– we have far more ability to change it.”
+
+
Initial facility hardware is ready for immediate use by researchers, and the remainder of the hardware is en route to its future home. Wisconsin serves as a central hub for testing and development, and PATh Facility resources are tested there before being shipped off to their final destinations. For example, Nebraska’s share of the PATh Facility has already been shipped and is running opportunistic backfill jobs. The lights are beginning to turn on, and as Bockelman likes to say, “we’re turning electrons into science.”
+
+
However, the effort required to make the PATh Facility possible goes beyond shipping hardware and plugging in cables. To truly turn electrons into science,
+creativity and problem-solving will be instrumental. While the NSF is trying out new, innovative ways to award credits, PATh is responsible for credit
+management and tracking. This task has blossomed into an internal service development project –– the PATh development team is working on ensuring that
+the HTCondor Software Suite (HTCSS) can effectively track credit usage across the facility. Additionally, containers are being used as an enabling technology to provide uniform software environments across PATh Facility resources. Kubernetes, an open-source system for automating management of containerized applications, will allow PATh staff to maintain containers not just individually, but site-wide.
+
+
Marking a monumental moment for the PATh Project, the PATh Facility provides dedicated resources directly to researchers for the first time ever. The project has always been focused on advancing and democratizing access to HTC at all scales, and the launch of the PATh Facility makes this goal more attainable than ever. Perhaps Bockelman characterizes the facility’s impact best: “I think the unique part is the distributed aspect and the focus on high throughput computing. It extends that vision of HTC as a mechanism that can make an outsized impact on how researchers leverage computing capacity to advance their science.”
+
+
To hear more about the PATh Facility, listen to Brian Bockelman’s talk from the 2022 OSG All-Hands Meeting in March:
+
+
+
+
…
+
+
Request credits for the PATh Facility by contacting NSF. PATh Research Computing Facilitators are here to help –– please reach out to credit-accounts@path-cc.io with questions about PATh resources, using HTC, or estimating credit needs.
+Learn more about the PATh Facility, credit accounts, and view the 2022 Charge Listing.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
diff --git a/preview-calendar/README.md b/preview-calendar/README.md
new file mode 100644
index 000000000..7cef552f7
--- /dev/null
+++ b/preview-calendar/README.md
@@ -0,0 +1,159 @@
+# CHTC Website
+
+Source repository for CHTC website
+
+![Build Status](https://github.com/CHTC/chtc-website-source/workflows/Build%2Fdeploy%20web%20pages/badge.svg)
+
+## Research Computing Guides Guide
+
+[View Research Computing Guides Guide Here](./_uw-research-computing/README.md)
+
+## How to Edit
+
+### Setup (one time, or anytime you want to start fresh)
+
+1. "Fork" the Github source repository (look for the fork button at the
+top right of this page: https://github.com/CHTC/chtc-website-source).
+1. Clone the source repository to your own computer.
+
+ git clone https://github.com/CHTC/chtc-website-source
+1. `cd` into the `chtc-website-source` folder and add your Github fork to the list of
+remotes:
+
+ git remote add mycopy https://github.com/myusername/chtc-website-source
+
+### Submit a Pull Request (each major change)
+
+1. Create a branch for new work and switch to it:
+
+ git branch feature-name
+ git checkout feature-name
+ Your changes will now be saved in this branch.
+1. Make changes to files and add/commit them, following the usual git add/commit workflow. You
+can test your changes at any time by following the [instructions below](#testing-changes-locally).
+1. Once you're satisfied with your changes and have committed them, push the branch
+to **your fork**:
+
+ git push mycopy feature-name
+1. On Github, go to your fork of the repo. There will likely be a message prompting you
+to open and submit a pull request.
+
+If you need to update the pull request, make the necessary changes on your computer,
+commit them, and then push the same branch to your fork.
+
+### Update your copy
+
+To update your local copy of the source repository, make sure that you're on the `master`
+branch; then pull from the original CHTC Github repository:
+
+ git checkout master
+ git pull origin master
+
+## Testing Changes on Remote
+
+:exclamation: This is a new feature!
+
+To test changes on a publicly viewable development location do the following steps.
+
+- Populate a branch with the changes you would like to preview and prepend the name of the branch with 'preview-'
+ - For this example we will call our branch 'preview-helloworld'
+- Push the branch to the remote repository at 'https://github.com/CHTC/chtc-website-source.git'
+- View the changes at:
+ - https://chtc.github.io/web-preview/<branch-name>/
+ - In this demo we would look in https://chtc.github.io/web-preview/preview-helloworld/
+
+**You can continue to push commits to this branch and have them populate on the preview at this point!**
+
+- When you are satisfied with these changes you can create a PR to merge into master
+- Delete the preview branch and Github will take care of the garbage collection!
+
+## Testing Changes Locally
+
+### Quickstart (Unix Only)
+
+1. Install Docker if you don't already have it on your computer.
+2. Open a terminal and `cd` to your local copy of the `chtc-website-source` repository
+3. Run the `./edit.sh` script.
+4. The website should appear at [http://localhost:8080](http://localhost:8080). Note that this system is missing the secret sauce of our setup that converts
+the pages to an `.shtml` file ending, so links won't work but just typing in the name of a page into the address bar (with no
+extension) will.
+
+### Run via Ruby
+
+```shell
+bundle install
+bundle exec jekyll serve --watch -p
+```
+
+### Run Docker Manually
+
+At the website root:
+
+```
+docker run -it -p 8001:8000 -v $PWD:/app -w /app ruby:2.7 /bin/bash
+```
+
+This will utilize the latest Jekyll version and map the container's port `8000` to port `8001` on your host. Within the container, a small HTTP server can be started with the following command:
+
+```
+bundle install
+bundle exec jekyll serve --watch --config _config.yml -H 0.0.0.0 -P 8000
+```
+
+## Formatting
+
+### Markdown Reference and Style
+
+This is a useful reference for most common markdown features: https://daringfireball.net/projects/markdown/
+
+To format code blocks, we have the following special formatting tags:
+
+ ```
+ Pre-formatted text / code goes here
+ ```
+ {:.sub}
+
+`.sub` will generate a "submit file" styled block; `.term` will create a terminal style, and `.file` can
+be used for any generic text file.
+
+We will be using the pound sign for headers, not the `==` or `--` notation.
+
+For internal links (to a header inside the document), use this syntax:
+* header is written as
+ ```
+ ## A. Sample Header
+ ```
+* the internal link will look like this:
+ ```
+ [link to header A](#a-sample-header)
+ ```
+
+### Converting HTML to Markdown
+
+Right now, most of our pages are written in html and have a `.shtml` extension. We are
+gradually converting them to be formatted with markdown. To easily convert a page, you
+can install and use the `pandoc` converter:
+
+ pandoc hello.shtml --from html --to markdown > hello.md
+
+You'll still want to go through and double check / clean up the text, but that's a good starting point. Once the
+document is converted from html to markdown, the file extension should be `.md` instead. If you use the
+command above, this means you can just delete the `.shtml` version of the file and commit the new `.md` one.
+
+
+### Adding "Copy Code" Button to code blocks in guides
+
+Add .copy to the class and you will have a small button in the top right corner of your code blocks that
+when clicked, will copy all of the code inside of the block.
+
+### Adding Software Overview Guide
+
+When creating a new Software Guide format the frontmatter like this:
+
+software_icon: /uw-research-computing/guide-icons/miniconda-icon.png
+software: Miniconda
+excerpt_separator: <!--more-->
+
+Software Icon and software are how the guides are connected to the Software Overview page. The
+`excerpt_separator` must be `<!--more-->`; this marker can be placed anywhere in a document, and all text
+above it will be put in the excerpt.
\ No newline at end of file
diff --git a/preview-calendar/Record.html b/preview-calendar/Record.html
new file mode 100644
index 000000000..d6a5d97bc
--- /dev/null
+++ b/preview-calendar/Record.html
@@ -0,0 +1,357 @@
+
+
+
+
+
+
+OSPool Hits Record Number of Jobs
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
The OSPool processed over 2.6 million jobs during the week of April 14th - 17th this year and ran over half a million jobs on two separate days that week.
+
+
OSPool users and collaborators are smashing records. In April, researchers submitted a record-breaking number of jobs during the week of April 14th – 2.6 million, to be exact. The OSPool also processed over 500k jobs on two separate days during that same week, another record!
+
+
Nearly 60 projects from different fields contributed to the number of jobs processed during this record-breaking week, including these with substantial usage:
+
+
BioMedInfo: University of Pittsburgh PI Erik Wright of the Wright Lab develops and applies software tools to perform large-scale biomedical informatics on microbial genome sequence data.
+
Michigan_Riles: University of Michigan PI Keith Riles leads the Michigan Gravitational Wave Group, researching continuous gravitational waves.
+
chemml: PI Olexandr Isayev from Carnegie Mellon University, whose group develops machine learning (ML) models for molecular simulations.
+
CompBinFormMod: PI Geoffrey Hutchison from the University of Pittsburgh, looking at data-driven ML as surrogates for quantum chemical methods to improve existing processes and next-generation atomistic force fields.
+
+
+
Any researcher tackling a problem that can run as many self-contained jobs can harness the capacity of the OSPool. If you have any questions about the Open Science Pool or how to create an account, please visit the FAQ page on the OSG Help Desk website. Descriptions of active OSG projects can be found here.
+ Resilience: How COVID-19 challenged the scientific world
+
+
In the face of the pandemic, scientists needed to adapt.
+The article below by the Morgridge Institute for Research provides a thoughtful look into how researchers have pivoted in these challenging times to come together and contribute meaningfully in the global fight against COVID-19.
+One of these pivots occurred in the spring of 2020, when Morgridge and the CHTC issued a call for projects investigating COVID-19, resulting in five major collaborations that leveraged the power of HTC.
+
+
For a closer look into how the CHTC and researchers have learned, grown, and adapted during the pandemic, read the full Morgridge article:
+ OSG fuels a student-developed computing platform to advance RNA nanomachines
+
+
How undergraduates at the University of Nebraska-Lincoln developed a science gateway that enables researchers to build RNA nanomachines for therapeutic, engineering, and basic science applications.
+
+
+
+
The UNL students involved in the capstone project, on graduation day. Order from left to right: Evan, Josh, Dan, Daniel, and Conner.
+
+
When a science gateway built by a group of undergraduate students is deployed this fall, it will open the door for researchers to leverage the capabilities of advanced software and the capacity of the Open Science Pool (OSPool). Working under the guidance of researcher Joe Yesselman and longtime OSG contributor Derek Weitzel, the students united advanced simulation technology and a national, open source of high throughput computing capacity –– all within an intuitive, web-accessible science gateway.
+
+
Joe, a biochemist, has been fascinated by computers and mathematical languages for as long as he can remember. Reminiscing about when he first adopted computer programming and coding as a hobby back in high school, he reflects: “English was difficult for me to learn, but for some reason mathematical languages make a lot of sense to me.”
+
+
Today, he is an Assistant Professor of Chemistry at the University of Nebraska-Lincoln (UNL), and his affinity for computer science hasn’t waned. Leading the Yesselman Lab, he relies on the interplay between computation and experimentation to study the unique structural properties of RNA.
+
+
In September of 2020, Joe began collaborating with UNL’s Holland Computing Center (HCC) and the OSG to accelerate RNA nanostructure research everywhere by making his lab’s RNAMake software suite accessible to other scientists through a web portal. RNAMake enables researchers to build nanomachines for therapeutic, engineering, and basic science applications by simulating the 3D design of RNA structures.
+
+
Five UNL undergraduate students undertook this project as part of a year-long computer science capstone experience. By the end of the academic year, the students developed a science gateway –– an intuitive web-accessible interface that makes RNAMake easier and faster to use. Once it’s deployed this fall, the science gateway will put the Yesselman Lab’s advanced software and the shared computing resources of the OSPool into the hands of researchers, all through a mouse and keyboard.
+
+
The gateway’s workflow is efficient and simple. Researchers upload their input files, set a few parameters, and click the submit button –– no command lines necessary. Short simulations will take merely a few seconds, while complex simulations can last up to an hour. Once the job is completed, an email appears in their inbox, prompting them to analyze and download the resulting RNA nanostructures through the gateway.
+
+
This was no small feat. Collaboration among several organizations brought this seemingly simple final product to fruition.
+
+
To begin the process, the students received a number of startup allocations from the Extreme Science and Engineering Discovery Environment (XSEDE). When it was time to build the application, they used Apache Airavata to power the science gateway and they extended this underlying software in some notable ways. In order to provide researchers with more intuitive results, they implemented a table viewer and a 3D molecule visualization tool. Additionally, they added the ability for Airavata to submit directly to HTCondor, making it possible for simulations to be distributed across the resources offered by the OSPool.
+
+
The simulations themselves are small, short, and can be run independently. Furthermore, many of these simulations are needed in order to discover the right RNA nanostructures for each researcher’s purpose. Combined, these qualities make the jobs a perfect candidate for the OSPool’s distributed high throughput computing capabilities, enabled by computing capacity from campuses across the country.
+
+
Commenting on the incorporation of OSG resources, project sponsor Derek Weitzel explains how the gateway “not only makes it easier to use RNAMake, but it also distributes the work on the OSPool so that researchers can run more RNAMake simulations at the same time.” If the scientific process is like a long road trip, using high throughput computing isn’t even like taking the highway –– it’s like skipping the road entirely and taking to the skies in a high-speed jet.
+
+
The science gateway has immense potential to transform the way in which RNA nanostructure research is conducted, and the collaboration required to build it has already made lasting impacts on those involved. The group of undergraduate students are, in fact, no longer undergraduates. The team’s student development manager, Daniel Shchur, is now a software design engineer at Communication System Solutions in Lincoln, Nebraska. Reflecting on the capstone project, he remarks, “I think the most useful thing that my teammates and I learned was just being able to collaborate with outside people. It was definitely something that wasn’t taught in any of our classes and I think that was the most invaluable thing we learned.”
+
+
But learning isn’t just exclusive to students. Joe notes that he gained some unexpected knowledge from the students and Derek. “I learned a ton about software development, which I’m actually using in my lab,” he explains. “It’s very interesting how people can be so siloed. Something that’s so obvious, almost trivial for Derek is something that I don’t even know about because I don’t have that expertise. I loved that collaboration and I loved hearing his advice.”
+
+
In the end, this collaboration vastly improved the accessibility of RNAMake, Joe’s software suite and the focus of the science gateway. Perhaps he explains it best with an analogy: “RNAMake is basically a set of 500 different LEGO® pieces.” Using enthusiastic gestures, Joe continues by offering an example: “Suppose you want to build something from this palm to this palm, in three-dimensional space. It [RNAMake] will find a set of LEGO® pieces that will fit there.”
+
+
+
+
A demonstration of how RNAMake’s design algorithm works. Credit: Yesselman, J.D., Eiler, D., Carlson, E.D. et al. Computational design of three-dimensional RNA structure and function. Nat. Nanotechnol. 14, 866–873 (2019). https://doi.org/10.1038/s41565-019-0517-8
+
+
Since the possible combinations of these LEGO® pieces of RNA are endless, this tool saves users the painstaking work of predicting the structures manually. However, the installation and use of RNAMake requires researchers to have a large amount of command line knowledge –– something that the average biochemist might not have.
+
+
Ultimately, the science gateway makes this previously complicated software suddenly more accessible, allowing researchers to easily, quickly, and accurately design RNA nanostructures.
+
+
These structures are the basis for RNA nanomachines, which have a vast range of applications in society. Whether it be silencing RNAs that are used in clinical trials to cut cancer genes, or RNA biosensors that effectively bind to small molecules in order to detect contaminants even at low concentrations –– the RNAMake science gateway can help researchers design and build these structures.
+
+
Perhaps the most relevant and pressing applications are RNA-based vaccines like those from Moderna and Pfizer. These vaccines continue to be shipped across cities, countries, and continents to reach people in need, and it’s crucial that they remain in a stable form throughout their journey. Insight from RNA nanostructures can help ensure that these long strands of mRNA maintain stability so that they can eventually make their way into our cells.
+
+
Looking to the future, a second science gateway capstone project is already being planned for next year at UNL. Although it’s currently unclear what field of research it will serve, there’s no doubt that this project will foster collaboration, empower students and researchers, and impact society –– all through a few strokes on a keyboard.
+ Transforming research with high throughput computing
+
+
During the OSG Virtual School Showcase, three different researchers shared how high throughput computing has made lasting impacts on their work.
+
+
+
+
Over 40 researchers and campus research computing staff were selected to attend this year’s OSG Virtual School, all united by a shared desire to learn how high throughput computing can advance their work. During the first two weeks of August, school participants were busy attending lectures, watching demonstrations, and completing hands-on exercises; but on Wednesday, August 11, participants had the chance to hear from researchers who have successfully used high throughput computing (HTC) to transform their work. Year after year, this event –– the HTC Showcase –– is one highlight of the experience for many User School participants. This year, three different researchers in the fields of structural biology, psychology, and particle physics shared how HTC impacted their work. Read the articles below to learn about their stories.
Collectively, these testimonies demonstrate how high throughput computing can transform research. In a few years, the students of this year’s User School might be the next Spencer, Hannah, and Anirvan, representing the new generation of researchers empowered by high throughput computing.
+
+
…
+
+
Visit the materials page to browse slide decks, exercises, and recordings of public lectures from OSG Virtual School 2021.
+
+
Established in 2010, OSG School, typically held each summer at the University of Wisconsin–Madison, is an annual education event for researchers who want to learn how to use distributed high throughput computing methods and tools. We hope to return to an in-person User School in 2022.
+ Scaling virtual screening to ultra-large virtual chemical libraries
+
+
+
+
Kicking off the OSG User School Showcase, Spencer Ericksen, a researcher at the University of Wisconsin-Madison’s Carbone Cancer Center, described how high throughput computing (HTC) has made his work in early-stage drug discovery infinitely more scalable. Spencer works within the Small Molecule Screening Facility, where he partners with researchers across campus to search for small molecules that might bind to and affect the behavior of proteins they study. By using a computational approach, Spencer can help a researcher inexpensively screen many more candidates than possible through traditional laboratory approaches. With as many as 10^33 possible molecules, the best binders from computational ‘docking’ might even be investigated as potential drug candidates.
+
+
With traditional laboratory approaches, researchers might test just 100,000 individual compounds using liquid handlers like the one pictured above. However, this approach is expensive, imposing limits both on the number of molecules tested and the number of researchers able to pursue potential binders of the proteins they study.
+
+
Spencer’s use of HTC allows him to take a different approach with virtual screening. By using computational models and machine learning techniques, he can inexpensively filter the masses of molecules and predict which ones will have the highest potential to interfere with a certain biological process. This reduces the time and money spent in the lab by selecting a subset of binding candidates that would be best to study experimentally.
+
+
“HTC is a fabulous resource for virtual screening,” Spencer attests. “We can now effectively validate, develop, and test virtual screening models, and scale to ever-increasing ultra-large virtual chemical libraries.” Today, Spencer is able to screen approximately 3.5 million molecules each day thanks to HTC.
+
+
There are a variety of virtual screening programs, but none of them are all that reliable individually. Instead of opting for a single program, Spencer runs several programs on the Open Science Pool (OSPool) and calculates a consensus score for each potential binder. “It’s a pretty old idea, basically like garnering wisdom from a council of fools,” Spencer explains. “Each program is a weak discriminator, but they do it in different ways. When we combine them, we get a positive effect that’s much better than the individual programs. Since we have the throughput, why not run them all?”
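
The consensus idea can be illustrated with a short Python sketch. The article does not say how the consensus score is computed, so this assumes one common choice, the mean of each compound's rank across programs, and it assumes that a lower docking score is better. The program names, compound names, and scores are invented for the example.

```python
from statistics import mean


def consensus_ranks(scores_by_program: dict[str, dict[str, float]]) -> dict[str, float]:
    """Average each compound's rank across programs (assumes lower score = better)."""
    ranks: dict[str, list[int]] = {}
    for scores in scores_by_program.values():
        # Rank 1 goes to the best (lowest) score within one program.
        ordered = sorted(scores, key=scores.get)
        for rank, compound in enumerate(ordered, start=1):
            ranks.setdefault(compound, []).append(rank)
    return {compound: mean(r) for compound, r in ranks.items()}


# Hypothetical docking scores from three made-up programs.
scores_by_program = {
    "program_a": {"mol1": -9.1, "mol2": -7.4, "mol3": -8.0},
    "program_b": {"mol1": -6.5, "mol2": -6.9, "mol3": -6.7},
    "program_c": {"mol1": -8.8, "mol2": -7.0, "mol3": -8.1},
}

consensus = consensus_ranks(scores_by_program)
best_first = sorted(consensus, key=consensus.get)
print(best_first)
```

Ranking before averaging keeps programs with different score scales comparable, which matters when each one is a weak discriminator in its own way.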
+
+
And there’s nothing stopping the Small Molecule Screening Facility from doing just that. Spencer’s jobs are independent from each other, making them “pleasantly parallelizable” on the OSPool’s distributed resources. To maximize throughput, Spencer splits the compound libraries that he’s analyzing into small increments that will run in approximately 2 hours, reducing the chances of a job being evicted and using the OSPool more efficiently.
+
+
…
+
+
This article is part of a series of articles from the 2021 OSG Virtual School Showcase. OSG School is an annual education event for researchers who want to learn how to use distributed high throughput computing methods and tools. The Showcase, which features researchers sharing how HTC has impacted their work, is a highlight of the school each year.
Thanks to the generous support of the Office of the Vice Chancellor for Research and Graduate Education with funding from the Wisconsin Alumni Research Foundation, CHTC has been able to execute a major refresh of hardware. This provided 207 new servers for our systems, representing over 40,000 batch slots of computing capacity. Most of this hardware arrived over the summer and we have started adding it to CHTC systems.
+
+
Continue reading to learn more about the types of servers we are adding and how to access them.
+
+
HTC System
+
+
On the HTC system, we are adding 167 servers of new capacity, representing 36,352 job slots and 40 high-end GPU cards.
+
+
The new servers will be running CentOS Stream 8. CHTC users should see our website page about how to test their jobs and
+take advantage of servers running CentOS Stream 8. Details on user actions needed for this change can be found on the
+OS transition page.
+
+
New Server specs
+
+
PowerEdge R6525
+
+
+
157 servers with 128 cores / 256 job slots using the AMD Epyc 7763 processor
+
512 GB RAM per server
+
+
+
PowerEdge XE8545
+
+
+
10 servers, each with four A100 SXM4 80GB GPU cards
+
128 cores per server
+
512GB RAM per server
+
+
+
HPC Cluster
+
+
For the HPC cluster, we are adding 40 servers representing 5,120 cores. These servers have arrived but have not yet been added to the HPC cluster. In most cases, when we add them, they will form a new partition and displace some of our oldest servers, currently in the “univ2” partition.
+
+
New server specs:
+
+
Dell PowerEdge R6525
+
+
+
128 cores using the AMD Epyc 7763 processor
+
512GB of memory
+
+
+
Users interested in early access to AMD processors before all 40 servers are installed should contact CHTC at chtc@cs.wisc.edu.
+
+
We have also obtained hardware and network infrastructure to completely replace the HPC cluster’s underlying file system and InfiniBand network fabric. We will be sending more updates to the chtc-users mailing list as we schedule specific transition dates for these major cluster components.
+ Save the dates for Throughput Computing 2023 - a joint HTCondor/OSG event
+
+
Don't miss these in-person learning opportunities in beautiful Madison, Wisconsin!
+
+
Save the dates for Throughput Computing 2023! For the first time, HTCondor Week and the OSG All-Hands Meeting will join together as a single, integrated event from July 10–14 to be held at the University of Wisconsin–Madison’s Fluno Center. Throughput Computing 2023 is sponsored by the OSG Consortium, the HTCondor team, and the UW-Madison Center for High Throughput Computing.
+
+
This will primarily be an in-person event, but remote participation (via Zoom) for the many plenary events will also be offered. Required registration for both components will open in March 2023.
+
+
If you register for the in-person event at the University of Wisconsin–Madison, you can attend plenary and non-plenary sessions, mingle with colleagues, and have planned or ad hoc meetings. Evening events are also planned throughout the week.
+
+
All the topics typically covered by HTCondor Week and the OSG All-Hands Meeting will be included:
+
+
+
Science Enabled by the OSPool and the HTCondor Software Suite (HTCSS)
+
OSG Technology
+
HTCSS Technology
+
HTCSS and OSG Tutorials
+
State of the OSG
+
Campus Services and Perspectives
+
+
+
The U.S. ATLAS and U.S. CMS high-energy physics projects are also planning parallel OSG-related topics during the event on Wednesday, July 12. (For other attendees, Wednesday’s schedule will also include parallel HTCondor and OSG tutorials and OSG Collaborations sessions.)
IN2P3 made the data available via HTTPS, but the number of files and their total size made managing the transfer an engineering challenge. There were two kinds of files to transfer: 3.5 million larger files with a median size of roughly 100 megabytes, and another 3.5 million smaller files with a median size of about 10 megabytes. The total transfer size is roughly 460 terabytes.
+
+
The Requirements
+
+
The requirement for this transfer was to reliably transfer all the files in a reasonably performant way, minimizing the human time to set up, run, and manage the transfer. Note the non-goal of optimizing for the fastest possible transfer time: reliability and minimal human effort take priority here. Reliability, in this context, implies:
+
+
Failed transfers are identified and re-run (with millions of files, a failed transfer is almost inevitable)
+Every file will get transferred
+The operation will not overload the sender, the receiver, or any network in between
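The first two reliability requirements amount to a retry loop: keep track of what failed and run it again until nothing is left. The sketch below illustrates that idea only. It is not Daues's implementation (the article describes using HTCondor for this), and `transfer_all` and the stand-in `flaky_transfer` function are hypothetical.

```python
def transfer_all(filenames, transfer, max_rounds=5):
    """Retry failed transfers until every file succeeds or max_rounds is reached.

    transfer is a caller-supplied function that returns True on success.
    Returns the set of files that still failed after the last round.
    """
    pending = set(filenames)
    for _ in range(max_rounds):
        if not pending:
            break
        # Keep only the files whose transfer failed in this round.
        pending = {name for name in pending if not transfer(name)}
    return pending

# Stand-in transfer that fails on the first attempt for every third file.
attempts = {}

def flaky_transfer(name):
    attempts[name] = attempts.get(name, 0) + 1
    index = int(name.split("_")[1])
    return not (index % 3 == 0 and attempts[name] == 1)

files = [f"file_{i}" for i in range(9)]
still_failed = transfer_all(files, flaky_transfer)
```

Returning the set of files that never succeeded makes "every file will get transferred" checkable rather than assumed.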
+
+
The Inspiration
+
+
Daues presented unrelated work at the 2017 HTCondor Week workshop. There, he heard about the work of Philip Papadopoulos at UCSD and his international Data Placement Lab (iDPL). iDPL used HTCondor jobs solely for transferring data between international sites. Daues re-used and adapted some of these ideas for NCSA’s needs.
+
+
The Solution
+
First, Daues installed a “mini-condor”, an HTCondor pool entirely on one machine, with an access point and eight execution slots on that same machine. Then, given a single large file containing the names of all the files to transfer, he ran the Unix split command to create separate list files, each naming either 50 of the larger files or 200 of the smaller files. Finally, using the HTCondor submit file command
+
+
Queue filename matching files *.txt
+
+
the condor_submit command creates one job per split file; each job runs the wget2 command and passes it the list of filenames. The HTCondor access point can handle tens of thousands of idle jobs, and will schedule these jobs on the eight execution slots. While more slots would yield more overlapped I/O, eight slots were chosen to throttle the total network bandwidth used. Over the course of days, this machine with eight slots sustained roughly 600 MB/s.
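The splitting step can be sketched in Python as a rough equivalent of the Unix split command. It is followed by back-of-the-envelope checks on the job count and the transfer time implied by the figures above. The function name, file prefix, and example URLs are hypothetical, and the arithmetic assumes decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes).

```python
import math
import tempfile
from pathlib import Path

def split_file_list(names, names_per_file, out_dir, prefix):
    """Write names into numbered .txt files, names_per_file names per file."""
    out_dir = Path(out_dir)
    paths = []
    for start in range(0, len(names), names_per_file):
        path = out_dir / f"{prefix}_{start // names_per_file:06d}.txt"
        path.write_text("\n".join(names[start:start + names_per_file]) + "\n")
        paths.append(path)
    return paths

# Small demonstration with made-up URLs: 120 names, 50 per file.
with tempfile.TemporaryDirectory() as tmp:
    urls = [f"https://example.org/data/large_{i}.dat" for i in range(120)]
    written = split_file_list(urls, 50, tmp, "large")
    file_count = len(written)
    last_len = len(written[-1].read_text().splitlines())

# Job counts implied by the article's figures (one job per split file).
large_jobs = math.ceil(3_500_000 / 50)
small_jobs = math.ceil(3_500_000 / 200)

# Rough duration of moving 460 TB at a sustained 600 MB/s.
days = 460e12 / 600e6 / 86_400
```

Under these assumptions the transfer comes to tens of thousands of jobs and a little under nine days, which is consistent with the "tens of thousands of idle jobs" and "over the course of days" described above.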
+
+
(Note that the machine running HTCondor did not crash during this run. If it had, all submitted jobs were stored reliably on the local disk; once the machine restarted and the init program restarted the HTCondor system, all interrupted jobs would have been restarted, and the process would have continued without human intervention.)
+
+
diff --git a/preview-calendar/Vagrantfile b/preview-calendar/Vagrantfile
new file mode 100644
index 000000000..308f54f89
--- /dev/null
+++ b/preview-calendar/Vagrantfile
@@ -0,0 +1,102 @@
+# -*- mode: ruby -*-
+# vi: set ft=ruby :
+
+# Comment this out if not using a host-only network
+class VagrantPlugins::ProviderVirtualBox::Action::Network
+ def dhcp_server_matches_config?(dhcp_server, config)
+ true
+ end
+end
+
+# All Vagrant configuration is done below. The "2" in Vagrant.configure
+# configures the configuration version (we support older styles for
+# backwards compatibility). Please don't change it unless you know what
+# you're doing.
+Vagrant.configure("2") do |config|
+ config.vm.define "chtcsite"
+
+ config.vm.hostname = "chtcsite.vm"
+ # The most common configuration options are documented and commented below.
+ # For a complete reference, please see the online documentation at
+ # https://docs.vagrantup.com.
+
+ # Every Vagrant development environment requires a box. You can search for
+ # boxes at https://vagrantcloud.com/search.
+ config.vm.box = "ubuntu/bionic64"
+
+ # Disable automatic box update checking. If you disable this, then
+ # boxes will only be checked for updates when the user runs
+ # `vagrant box outdated`. This is not recommended.
+ # config.vm.box_check_update = false
+
+ # Create a forwarded port mapping which allows access to a specific port
+ # within the machine from a port on the host machine. In the example below,
+ # accessing "localhost:8080" will access port 80 on the guest machine.
+ # NOTE: This will enable public access to the opened port
+ # config.vm.network "forwarded_port", guest: 80, host: 8080
+
+ # Create a forwarded port mapping which allows access to a specific port
+ # within the machine from a port on the host machine and only allow access
+ # via 127.0.0.1 to disable public access
+ # config.vm.network "forwarded_port", guest: 80, host: 8080, host_ip: "127.0.0.1"
+
+ # Create a private network, which allows host-only access to the machine
+ # using a specific IP.
+ # config.vm.network "private_network", ip: "192.168.33.10"
+
+ # Create a public network, which generally matched to bridged network.
+ # Bridged networks make the machine appear as another physical device on
+ # your network.
+ # config.vm.network "public_network"
+
+ # Share an additional folder to the guest VM. The first argument is
+ # the path on the host to the actual folder. The second argument is
+ # the path on the guest to mount the folder. And the optional third
+ # argument is a set of non-required options.
+ config.vm.synced_folder ".", "/chtc-website-source", type: "rsync",
+ rsync__args: ["--verbose", "--archive", "--delete"]
+
+
+ # Provider-specific configuration so you can fine-tune various
+ # backing providers for Vagrant. These expose provider-specific options.
+ # Example for VirtualBox:
+ #
+ # config.vm.provider "virtualbox" do |vb|
+ # # Display the VirtualBox GUI when booting the machine
+ # vb.gui = true
+ #
+ # # Customize the amount of memory on the VM:
+ # vb.memory = "1024"
+ # end
+ #
+ # View the documentation for the provider you are using for more
+ # information on available options.
+
+ config.vm.provision "shell", inline: <<-SHELL
+ apt-get update
+ apt-get install -y make gcc g++
+ echo 'export NO_PUSH=1' > /etc/profile.d/NO_PUSH.sh
+ snap install --classic ruby
+ gem install bundle
+ cd /chtc-website-source
+ runuser -u vagrant -- bundle install
+ runuser -u vagrant -- git config --global user.name "Vagrant"
+ runuser -u vagrant -- git config --global user.email "vagrant@chtcsite.vm"
+ echo
+ echo
+ echo ===============================================================================
+ echo "Setup complete!"
+ echo
+ echo "The repo checkout is in /chtc-website-source."
+ echo
+ echo "Run 'script/cibuild' to build the pages, and 'script/cideploy' to deploy them."
+ echo "(cideploy will run all deploy steps except for the actual push.)"
+ echo ""
+ echo "Set BRANCH and TARGET_REPO to test deploying to a different branch"
+ echo "or GitHub repo."
+ echo
+ echo "If you make changes to files outside of the image, run 'vagrant reload'"
+ echo "to restart the VM with these new changes."
+ SHELL
+
+end
diff --git a/preview-calendar/Wilcots.html b/preview-calendar/Wilcots.html
new file mode 100644
index 000000000..9aabd9c00
--- /dev/null
+++ b/preview-calendar/Wilcots.html
@@ -0,0 +1,439 @@
+
+
+
+
+
+
+The Future of Radio Astronomy Using High Throughput Computing
+
+
Eric Wilcots, UW-Madison dean of the College of Letters & Science and the Mary C. Jacoby Professor of Astronomy, dazzles the HTCondor Week 2022 audience.
+
+
+
+
+
+
“My job here is to…inspire you all with a sense of the discoveries to come that will need to be enabled by” high throughput computing (HTC), Eric Wilcots opened his keynote for HTCondor Week 2022. Wilcots is the UW-Madison dean of the College of Letters & Science and the Mary C. Jacoby Professor of Astronomy.
+
+
Wilcots points out that the black hole image (shown above) is a remarkable feat in the world of astronomy. “Only the third such black hole imaged in this way by the Event Horizon Telescope,” and it was made possible with the help of the HTCondor Software Suite (HTCSS).
+
+
Beginning to build the future
+
+
Wilcots described how in the 1940s, a group of universities recognized that no single university could build a radio telescope necessary to advance science. To access these kinds of telescopes, the universities would need to have the national government involved, as it was the only one with this capability at that time. In 1946, these universities created Associated Universities Incorporated (AUI), which eventually became the management agency for the National Radio Astronomy Observatory (NRAO).
+
+
Advances in radio astronomy rely on current technology available to experts in this field. Wilcots explained that “the science demands more sensitivity, more resolution, and the ability to map large chunks of the sky simultaneously.” New and emerging technologies must continue pushing forward to discover the next big thing in radio astronomy.
+
+
This next generation of science requires more sensitive technology with higher spectral resolution than the Karl G. Jansky Very Large Array (JVLA) can provide. It also requires sensitivity in a particular chunk of the spectrum that neither the JVLA nor Atacama Large Millimeter/submillimeter Array (ALMA) can achieve. Wilcots described just what piece of technology astronomers and engineers need to create to reach this level of sensitivity. “We’re looking to build the Next Generation Very Large Array (ngVLA)…an instrument that will cover a huge chunk of spectrum from 1 GHz to 116 GHz.”
+
+
The fundamentals of the ngVLA
+
+
“The unique and wonderful thing about interferometry, or the basis of radio astronomy,” Wilcots discussed, “is the ability to have many individual detectors or dishes to form a telescope.” Each dish collects signals, creating an image or spectrum of the sky when combined. Because of this capability, engineers working on these detectors can begin to collect signals right away, and as more dishes get added, the telescope grows larger and larger.
+
+
Many individual detectors also mean lots of flexibility in the telescope arrays built, Wilcots explained. Here, the idea is to do several different arrays to make up one telescope. A particular scientific case drives each of these arrays:
+
+
Main Array: a dish that you can control and point accurately but is also robust; it’ll be the workhorse of the ngVLA, simultaneously capable of high sensitivity and high-resolution observations.
+
Short Baseline Array: dishes that are very close together, which allows you to have a large field of view of the sky.
+
Long Baseline Array: spread out across the continental United States. The idea here is the longer the baseline, the higher the resolution. Dishes that are well separated allow the user to get spectacular spatial resolution of the sky. For example, the Event Horizon Telescope that took the image of the black hole is a telescope that spans the globe, which is the longest baseline we can get without putting it into orbit.
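The link between baseline and resolution follows from the standard diffraction estimate: angular resolution is roughly the observing wavelength divided by the baseline length. The sketch below applies that estimate. The numbers are illustrative assumptions rather than figures from the talk: a 1.3 mm observing wavelength and a baseline equal to Earth's diameter (about 12,742 km).

```python
import math

def angular_resolution_microarcsec(wavelength_m, baseline_m):
    """Approximate diffraction-limited resolution, theta ~ wavelength / baseline."""
    theta_rad = wavelength_m / baseline_m
    # Convert radians -> degrees -> arcseconds -> microarcseconds.
    return math.degrees(theta_rad) * 3600 * 1e6

# Assumed Earth-sized baseline at an assumed 1.3 mm wavelength.
earth_sized = angular_resolution_microarcsec(1.3e-3, 1.2742e7)

# Same wavelength on a 1,000 km baseline: a larger (coarser) angle.
regional = angular_resolution_microarcsec(1.3e-3, 1.0e6)
```

A smaller angle means finer detail, so stretching the baseline from 1,000 km to the size of the Earth sharpens the resolution by the same factor.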
+
+
+
+
+
A consensus study report called Pathways to Discovery in Astronomy and Astrophysics for the 2020s (Astro2020) identified the ngVLA as a high priority. The construction of this telescope should begin this decade and be completed by the middle of the 2030s.
+
+
Future of radio astronomy: planet formation
+
+
An area of research that radio astronomers are interested in examining in the future is imaging the formation of planets, Wilcots notes. Right now, astronomers can detect a planet’s presence and deduce specific characteristics, but being able to detect a planet directly is the next huge priority.
+
+
+
+
One place astronomers might be able to do this with something like the ngVLA is in the early phases of planet formation within a planetary system. The thermal emissions from this process are bright enough to be detected by a telescope like the ngVLA. So the idea is to use this telescope to map an image of nearby planetary systems and begin to image the early stages of planet formation directly. A catalog of these planets forming will allow astronomers to understand what happens when planetary systems, like our own, form.
+
+
Future of radio astronomy: molecular systems
+
+
Wilcots explains that radio astronomers have discovered the spectral signature of innumerable molecules within the past fifty years. The ngVLA is being designed to probe, detect, catalog, and understand the origin of complex molecules and what they might tell us about star and planet formation. Wilcots comments in his talk that “this type of work is spawning a new type of science…a remarkable new discipline of astrobiology is emerging from our ability to identify and trace complex organic molecules.”
+
+
Future of radio astronomy: galaxy completion
+
+
Next, Wilcots discusses how radio astronomers want to understand how stars form in the first place and the processes that drive the collapse of clouds of gas into regions of star formation.
+
+
+
+
The gas in a galaxy tends to extend well beyond the visible part of the galaxy, and this enormous gas reservoir is how the galaxy can make stars.
+
+
Astronomers like Wilcots want to know where the gas is, what drives that process of converting the gas into stars, what role the environment might play, and finally, what makes a galaxy stop creating stars.
+
+
ngVLA will be able to answer these questions as it combines the sensitivity and spatial resolution needed to take images of gas clouds in nearby galaxies while also capturing the full extent of that gas.
+
+
Future of radio astronomy: black holes
+
+
Wilcots’ look into the future of radio astronomy finishes with the idea and understanding of black holes.
+
+
Multi-messenger astrophysics helps experts recognize that information about the universe is not simply electromagnetic, as it is known best; there is more than one way astronomers can look at the universe.
+
+
More recently, astronomers have been looking at gravitational waves. In particular, they’ve been looking at how they can find a way to detect the gravitational waves produced by two black holes orbiting around one another to determine each black hole’s mass and learn something about them. As the recent EHT images show, we need radio telescopes’ high resolution and sensitivity to understand the nature of black holes fully.
+
+
A look toward the future
+
+
The next step is for the NRAO to create a prototype of the dishes they want to install for the telescope. Then, it’s just a question of whether or not they can build and install enough dishes to deliver this instrument to its full capacity. Wilcots elaborates, “we hope to transition to full scientific operations by the middle of next decade (the 2030s).”
+
+
The distinguished administrator expressed that “something that’s haunted radio astronomy for a while is that to do the imaging, you have to ‘be in the club,’ ” meaning that not just anyone can access the science coming out of these telescopes. The goal of the NRAO moving forward is to create science-ready data products so that this information can be more widely available to anyone, not just those with intimate knowledge of the subject.
+
+
This effort to make this science more accessible has been part of a budding collaboration between UW-Madison, the NRAO, and a consortium of Historically Black Colleges and Universities and other Minority Serving Institutions in what is called Project RADIAL.
+
+
“The idea behind RADIAL is to broaden the community; not just of individuals engaged in radio astronomy, but also of individuals engaged in the computing that goes into doing the great kind of science we have,” Wilcots explains.
+
+
In the summer of 2022, half a dozen undergraduate students from the RADIAL consortium will be on the UW-Madison campus doing summer research. The goal is to broaden awareness and increase the participation of communities not typically involved in these discussions in the kind of research done in the radio astronomy field.
+
+
“We laid the groundwork for a partnership with a number of these institutions, and that partnership is alive and well,” Wilcots remarks, “so stay tuned for more of that, and we will be advancing that in the upcoming years.”
+
+
…
+
+
Watch a video recording of Eric Wilcots’ talk at HTCondor Week 2022.
+ Using HTC expanded scale of research using noninvasive measurements of tendons and ligaments
+
+
With this technique and the computing power of high throughput computing (HTC) combined, researchers can obtain thousands of simulations to study the pathology of tendons
+and ligaments.
+
+
A recent paper published in the Journal of the Mechanical Behavior of Biomedical Materials by former Ph.D.
+student in the Department of Mechanical Engineering (and current post-doctoral researcher at the University of Pennsylvania)
+Jonathon Blank and John Bollinger Chair of Mechanical Engineering
+Darryl Thelen used the Center for High Throughput Computing (CHTC) to obtain their results,
+results that, Blank says, would not have been obtained at the same scale without HTC. “[This project], and a number of other projects, would have had a very small snapshot of the
+problem at hand, which would not have allowed me to obtain the understanding of shear waves that I did. Throughout my time at UW, I ran tens of thousands of simulations — probably
+even hundreds of thousands.”
+
+
+
+
Using noninvasive sensors called shear wave tensiometers, researchers on this project applied HTC to study tendon structure and function. Currently, research in this field is hard
+to translate because most assessments of tendon and ligament structure-function relationships are performed on the benchtop in a lab, Blank explains. To translate the benchtop
+experiments into studying tendons in humans, the researchers use tensiometers as a measurement tool, and this study developed from trying to better understand these measurements
+and how they can be applied to humans. “Tendons are very complex materials from an engineering perspective. When stretched, they can bear loads far exceeding your body weight, and
+interestingly, even though they serve their roles in transmitting force from muscle to bone really well, the mechanisms that give rise to injury and pathology in these tissues aren’t
+well understood.”
+
+
+
+
In living organisms, researchers have used tensiometers to study the loading of muscles and tendons, including the triceps surae, which connects to the Achilles tendon, Blank notes.
+Since humans are variable regarding the size, stiffness, composition, and length of their tendons or ligaments, it’s “challenging to use a model to accurately represent a parameter
+space of human biomechanics in the real world. High throughput computing is particularly useful for our field just because we can readily express that variability at a large scale”
+through HTC. With Thelen and Orthopedics and Rehabilitation assistant professor Josh Roth, Blank developed a pipeline for
+simulating shear wave propagation in tendons and ligaments with HTC, which Blank and Thelen used in the paper.
+
+
With HTC, the researchers of this paper were able to further explore the mechanistic causes of changes in wave speed. “The advantage of this technique is being able to fully explore
+an input space of different stiffnesses, geometries, microstructures, and applied forces. The advantage of the capabilities offered by the CHTC is that we can fill the entire input
+space, not just between two data points, and thereby study changes in shear wave speed due to physiological factors and the mechanical underpinning driving those changes,” Blank
+elaborates.
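Filling an input space rather than sampling a few points can be sketched as a full grid over the parameters, with each combination becoming one independent simulation job. The parameter names and values below are hypothetical placeholders, not those used in the paper.

```python
from itertools import product

def build_parameter_grid(**axes):
    """Return one dict per combination of the supplied parameter values."""
    names = list(axes)
    return [dict(zip(names, combo)) for combo in product(*axes.values())]

# Hypothetical axes: 5 stiffnesses x 4 thicknesses x 5 applied forces.
grid = build_parameter_grid(
    stiffness_mpa=[200, 400, 600, 800, 1000],
    thickness_mm=[2, 4, 6, 8],
    applied_force_n=[100, 300, 500, 700, 900],
)
```

The grid grows multiplicatively with each added axis, which is why this style of study benefits from running many jobs at once.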
+
+
It wasn’t challenging to implement, Blank states, since facilitators were readily available to help and meet with him. When he first started using HTC, Blank attended the CHTC
+office hours to get answers to his questions, even during COVID-19; during this time, there were also numerous one-on-one meetings. Having this backbone of support from the CHTC
+research facilitators propelled Blank’s research and made it much easier. “For a lot of modeling studies, you’ll have this sparse input space where you change a couple of parameters
+and investigate the sensitivity of your model that way. But it’s hard to interpret what goes on in between, so the CHTC quite literally saved me a lot of time. There were some
+1,000 simulations in the paper, and HTC by scaling out the workload turned a couple thousand hours of simulation time into two or three hours of wall clock time. It’s a unique tool
+for this kind of research.”
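The arithmetic behind Blank's estimate can be sketched with an idealized model in which equal-length jobs run in waves across the available slots. The two-hour job length and the slot counts are assumptions chosen to match the rough figures in the quote. Real runs also include queue wait and data transfer time, which this ignores.

```python
import math

def ideal_wall_clock_hours(n_jobs, hours_per_job, slots):
    """Wall clock time if equal-length jobs run in waves across the slots."""
    waves = math.ceil(n_jobs / slots)
    return waves * hours_per_job

# 1,000 simulations at an assumed 2 hours each.
serial = ideal_wall_clock_hours(1000, 2.0, slots=1)
parallel = ideal_wall_clock_hours(1000, 2.0, slots=1000)
half = ideal_wall_clock_hours(1000, 2.0, slots=500)
```

In this model, wall clock time is set by the number of waves, so it drops to the length of a single job once there is a slot for every simulation.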
+
+
The next step from this paper’s findings, Blank describes, is providing subject-specific measurements of wave speeds. This involves “understanding if when we use a tensiometer on
+someone’s Achilles tendon, for example, can we account for the tendon’s shape, size, injury status, etcetera — all of these variables matter when measuring shear wave speeds.”
+Researchers from the lab can then use wearable tensiometers to measure tension in the Achilles and other tendons to study human movement in the real world.
+
+
From his CHTC-supported studies, Blank learned how to design computational research, diagnose different parameter spaces, and manage data. “For my field, it [HTC] is very important
+because people are extremely variable — so our models should be too. The automation and capacity enabled by HTC makes it easy to understand whether our models are useful, and if
+they are, how best to tune them to inform human biomechanics,” Blank says.
CHTC’s specialty is High
+Throughput Computing (HTC), which involves breaking up a single large
+computational task into many smaller tasks for the fastest overall
+turnaround. Most of our users find HTC to be invaluable
+in accelerating their computational work and thus their research.
+We support thousands of multi-core computers and use the task
+scheduling software called HTCondor, developed right here in Madison, to
+run thousands of independent jobs on as many total processors as
+possible. These computers, or “machines”, are distributed across several
+collections that we call pools (similar to “clusters”). Because machines are
+assigned to individual jobs, many users can be running jobs on a pool at any
+given time, all managed by HTCondor.
+
+
The diagram below shows some of the largest pools on campus and also
+shows our connection to the US-wide OSPool where UW computing
+work can “backfill” available computers all over the country. The number
+under each resource name shows an approximate number of computing hours
+available to campus researchers for a typical week in Fall 2013. As
+demonstrated in the diagram, we help users to submit their work not only
+to our CHTC-owned machines, but to improve their throughput even further
+by seamlessly accessing as many available computers as possible, all
+over campus AND all over the country.
+
+
The vast majority of the computational work that campus researchers have
+is HTC, though we are happy to support researchers with a variety of
+beyond-the-desktop needs, including tightly-coupled computations (e.g.
+MPI), high-memory work (e.g. metagenomics), and specialized
+hardware like GPUs.
+
+
+
+
What kinds of applications run best in the CHTC?
+
+
“Pleasantly parallel” tasks, where many jobs can run independently,
+are what work best in the CHTC, and are what we can offer the greatest
+computational capacity for.
+Analyzing thousands of images, inferring statistical significance of hundreds of
+thousands of samples, optimizing an electric motor design with millions
+of constraints, aligning genomes, and performing deep linguistic search
+on a 30 TB sample of the internet are a few of the applications that
+campus researchers run every day in the CHTC. If you are not sure if
+your application is a good fit for CHTC resources, get in
+touch and we will be happy to help you figure it out.
+
+
Within a single compute system, we also support GPUs, high-memory
+servers, and specialized hardware owned by individual research groups.
+For tightly-coupled computations (e.g. MPI and similar programmed
+parallelization), our resources include an HPC Cluster, with faster
+inter-node networking.
+
+
How to Get Access
+
+
While you may be excited at the prospect of harnessing 100,000 compute
+hours a day for your research, the most valuable thing we offer is,
+well, us. We have a small, yet dedicated team of professionals who eat,
+breathe and sleep distributed computing. If you are a UW-Madison Researcher, you can request an
+account, and one of our dedicated Research Computing
+Facilitators will follow up to provide specific recommendations to
+accelerate YOUR science.
` all receive top and bottom margins. We nuke the top\n// margin for easier control within type scales as it avoids margin collapsing.\n\n%heading {\n margin-top: 0; // 1\n margin-bottom: $headings-margin-bottom;\n font-family: $headings-font-family;\n font-style: $headings-font-style;\n font-weight: $headings-font-weight;\n line-height: $headings-line-height;\n color: $headings-color;\n}\n\nh1 {\n @extend %heading;\n @include font-size($h1-font-size);\n}\n\nh2 {\n @extend %heading;\n @include font-size($h2-font-size);\n}\n\nh3 {\n @extend %heading;\n @include font-size($h3-font-size);\n}\n\nh4 {\n @extend %heading;\n @include font-size($h4-font-size);\n}\n\nh5 {\n @extend %heading;\n @include font-size($h5-font-size);\n}\n\nh6 {\n @extend %heading;\n @include font-size($h6-font-size);\n}\n\n\n// Reset margins on paragraphs\n//\n// Similarly, the top margin on `
`s get reset. However, we also reset the\n// bottom margin to use `rem` units instead of `em`.\n\np {\n margin-top: 0;\n margin-bottom: $paragraph-margin-bottom;\n}\n\n\n// Abbreviations\n//\n// 1. Duplicate behavior to the data-bs-* attribute for our tooltip plugin\n// 2. Add the correct text decoration in Chrome, Edge, Opera, and Safari.\n// 3. Add explicit cursor to indicate changed behavior.\n// 4. Prevent the text-decoration to be skipped.\n\nabbr[title],\nabbr[data-bs-original-title] { // 1\n text-decoration: underline dotted; // 2\n cursor: help; // 3\n text-decoration-skip-ink: none; // 4\n}\n\n\n// Address\n\naddress {\n margin-bottom: 1rem;\n font-style: normal;\n line-height: inherit;\n}\n\n\n// Lists\n\nol,\nul {\n padding-left: 2rem;\n}\n\nol,\nul,\ndl {\n margin-top: 0;\n margin-bottom: 1rem;\n}\n\nol ol,\nul ul,\nol ul,\nul ol {\n margin-bottom: 0;\n}\n\ndt {\n font-weight: $dt-font-weight;\n}\n\n// 1. Undo browser default\n\ndd {\n margin-bottom: .5rem;\n margin-left: 0; // 1\n}\n\n\n// Blockquote\n\nblockquote {\n margin: 0 0 1rem;\n}\n\n\n// Strong\n//\n// Add the correct font weight in Chrome, Edge, and Safari\n\nb,\nstrong {\n font-weight: $font-weight-bolder;\n}\n\n\n// Small\n//\n// Add the correct font size in all browsers\n\nsmall {\n @include font-size($small-font-size);\n}\n\n\n// Mark\n\nmark {\n padding: $mark-padding;\n background-color: $mark-bg;\n}\n\n\n// Sub and Sup\n//\n// Prevent `sub` and `sup` elements from affecting the line height in\n// all browsers.\n\nsub,\nsup {\n position: relative;\n @include font-size($sub-sup-font-size);\n line-height: 0;\n vertical-align: baseline;\n}\n\nsub { bottom: -.25em; }\nsup { top: -.5em; }\n\n\n// Links\n\na {\n color: $link-color;\n text-decoration: $link-decoration;\n\n &:hover {\n color: $link-hover-color;\n text-decoration: $link-hover-decoration;\n }\n}\n\n// And undo these styles for placeholder links/named anchors (without href).\n// It would be more straightforward to just use a[href] 
in previous block, but that\n// causes specificity issues in many other styles that are too complex to fix.\n// See https://github.com/twbs/bootstrap/issues/19402\n\na:not([href]):not([class]) {\n &,\n &:hover {\n color: inherit;\n text-decoration: none;\n }\n}\n\n\n// Code\n\npre,\ncode,\nkbd,\nsamp {\n font-family: $font-family-code;\n @include font-size(1em); // Correct the odd `em` font sizing in all browsers.\n direction: ltr #{\"/* rtl:ignore */\"};\n unicode-bidi: bidi-override;\n}\n\n// 1. Remove browser default top margin\n// 2. Reset browser default of `1em` to use `rem`s\n// 3. Don't allow content to break outside\n\npre {\n display: block;\n margin-top: 0; // 1\n margin-bottom: 1rem; // 2\n overflow: auto; // 3\n @include font-size($code-font-size);\n color: $pre-color;\n\n // Account for some code outputs that place code tags in pre tags\n code {\n @include font-size(inherit);\n color: inherit;\n word-break: normal;\n }\n}\n\ncode {\n @include font-size($code-font-size);\n color: $code-color;\n word-wrap: break-word;\n\n // Streamline the style when inside anchors to avoid broken underline and more\n a > & {\n color: inherit;\n }\n}\n\nkbd {\n padding: $kbd-padding-y $kbd-padding-x;\n @include font-size($kbd-font-size);\n color: $kbd-color;\n background-color: $kbd-bg;\n @include border-radius($border-radius-sm);\n\n kbd {\n padding: 0;\n @include font-size(1em);\n font-weight: $nested-kbd-font-weight;\n }\n}\n\n\n// Figures\n//\n// Apply a consistent margin strategy (matches our type styles).\n\nfigure {\n margin: 0 0 1rem;\n}\n\n\n// Images and content\n\nimg,\nsvg {\n vertical-align: middle;\n}\n\n\n// Tables\n//\n// Prevent double borders\n\ntable {\n caption-side: bottom;\n border-collapse: collapse;\n}\n\ncaption {\n padding-top: $table-cell-padding-y;\n padding-bottom: $table-cell-padding-y;\n color: $table-caption-color;\n text-align: left;\n}\n\n// 1. Removes font-weight bold by inheriting\n// 2. Matches default `
` alignment by inheriting `text-align`.\n// 3. Fix alignment for Safari\n\nth {\n font-weight: $table-th-font-weight; // 1\n text-align: inherit; // 2\n text-align: -webkit-match-parent; // 3\n}\n\nthead,\ntbody,\ntfoot,\ntr,\ntd,\nth {\n border-color: inherit;\n border-style: solid;\n border-width: 0;\n}\n\n\n// Forms\n//\n// 1. Allow labels to use `margin` for spacing.\n\nlabel {\n display: inline-block; // 1\n}\n\n// Remove the default `border-radius` that macOS Chrome adds.\n// See https://github.com/twbs/bootstrap/issues/24093\n\nbutton {\n // stylelint-disable-next-line property-disallowed-list\n border-radius: 0;\n}\n\n// Explicitly remove focus outline in Chromium when it shouldn't be\n// visible (e.g. as result of mouse click or touch tap). It already\n// should be doing this automatically, but seems to currently be\n// confused and applies its very visible two-tone outline anyway.\n\nbutton:focus:not(:focus-visible) {\n outline: 0;\n}\n\n// 1. Remove the margin in Firefox and Safari\n\ninput,\nbutton,\nselect,\noptgroup,\ntextarea {\n margin: 0; // 1\n font-family: inherit;\n @include font-size(inherit);\n line-height: inherit;\n}\n\n// Remove the inheritance of text transform in Firefox\nbutton,\nselect {\n text-transform: none;\n}\n// Set the cursor for non-`
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ Feedback or content questions:
+ send email to "condor-admin" at the cs.wisc.edu server
+
+ Technical or accessibility issues:
+ chtc@cs.wisc.edu
+
+
+
+
+
+
+
+
diff --git a/preview-calendar/includes/chtc_on_campus.png b/preview-calendar/includes/chtc_on_campus.png
new file mode 100644
index 000000000..b77707bb1
Binary files /dev/null and b/preview-calendar/includes/chtc_on_campus.png differ
diff --git a/preview-calendar/includes/chtcusers.jpg b/preview-calendar/includes/chtcusers.jpg
new file mode 100644
index 000000000..f647dbcaa
Binary files /dev/null and b/preview-calendar/includes/chtcusers.jpg differ
diff --git a/preview-calendar/includes/chtcusers_400.jpg b/preview-calendar/includes/chtcusers_400.jpg
new file mode 100644
index 000000000..efa7e6675
Binary files /dev/null and b/preview-calendar/includes/chtcusers_400.jpg differ
diff --git a/preview-calendar/includes/chtcusers_L.jpg b/preview-calendar/includes/chtcusers_L.jpg
new file mode 100644
index 000000000..fac6f5084
Binary files /dev/null and b/preview-calendar/includes/chtcusers_L.jpg differ
diff --git a/preview-calendar/includes/cron-generated/dynamic-resources-noedit.html b/preview-calendar/includes/cron-generated/dynamic-resources-noedit.html
new file mode 100644
index 000000000..f4161e936
--- /dev/null
+++ b/preview-calendar/includes/cron-generated/dynamic-resources-noedit.html
@@ -0,0 +1,39 @@
+
Pool/Mem
≥1GB
≥2GB
≥4GB
≥8GB
≥16GB
≥32GB
≥64GB
+
cm.chtc.wisc.edu
+
0
+
0
+
0
+
0
+
0
+
0
+
0
+
+
condor.cs.wisc.edu
+
0
+
0
+
0
+
0
+
0
+
0
+
0
+
+
condor.cae.wisc.edu
+
0
+
0
+
0
+
0
+
0
+
0
+
0
+
+
Totals
+
0
+
0
+
0
+
0
+
0
+
0
+
0
+
+
+
As of Mon Jun 12 07:30:02 CDT 2017
diff --git a/preview-calendar/includes/jahns/Bundle Proximity Losses Paragraph.doc b/preview-calendar/includes/jahns/Bundle Proximity Losses Paragraph.doc
new file mode 100644
index 000000000..2695061cf
Binary files /dev/null and b/preview-calendar/includes/jahns/Bundle Proximity Losses Paragraph.doc differ
diff --git a/preview-calendar/includes/jahns/Thumbs.db b/preview-calendar/includes/jahns/Thumbs.db
new file mode 100644
index 000000000..dd3f84614
Binary files /dev/null and b/preview-calendar/includes/jahns/Thumbs.db differ
diff --git a/preview-calendar/includes/old.intro.html b/preview-calendar/includes/old.intro.html
new file mode 100755
index 000000000..7b02b4a01
--- /dev/null
+++ b/preview-calendar/includes/old.intro.html
@@ -0,0 +1,26 @@
+
Research is computationally expensive and places heavy demands on any available computing resources. Quite often, a researcher needs resources for short bursts of computation, leaving the computer idle the rest of the time. This results in wasted potential computation time. The issue can be addressed by means of high-throughput computing.
+
High-throughput computing allows for many computational tasks to be done over a long period of time. It is concerned largely with the number of compute resources that are available to people who wish to use the system. It is very useful for researchers, who are more concerned with the number of computations they can do over long spans of time than they are with short-burst computations. Because of its value to research computations, the University of Wisconsin set up the Center for High-Throughput Computing to bring researchers and compute resources together.
+
The Center for High-Throughput Computing (CHTC), approved in August 2006, has numerous resources at its disposal to keep up with the computational needs of UW Madison. These resources are funded by the National Institutes of Health (NIH), the Department of Energy (DOE), the National Science Foundation (NSF), and various grants from the University itself. Email us at chtc@cs.wisc.edu to see what we can do to help automate your research project. The CHTC aims to pull four different resources together into one operation:
+
+
HTC Technologies: The CHTC leans heavily on the HTCondor project to provide a framework where high-throughput computing can take place. The HTCondor project aims to make grid and high-throughput computing a reality in any number of environments.
+
Dedicated Resources: CHTC HTCondor pool
+ The CHTC cluster is now composed of 1900 cores for use by researchers across our campus. These rack-mounted blade systems run Linux. Each core is 2.8 GHz with 1.5 GB RAM or better. CHTC provided 10 million CPU hours of research computation between 05/17/2008 and 02/23/2010, prior to the additional 960 cores. With the recent server purchase, CHTC provides in excess of 37,000 CPU hours per day.
+
Middleware: The GRIDS branch at UW Madison will be essential to keeping the CHTC running efficiently. GRIDS is funded by the NSF Middleware Initiative (NMI). At the University of Wisconsin, the HTCondor project makes heavy use of this system with its NMI Build & Test facility. The NMI Build & Test facility provides a framework to build and test software on a wide variety of platform and hardware combinations.
+
Computing Laboratory: The University of Wisconsin has many compute clusters at its disposal. In 2004 the university won an award to build the Grid Laboratory of Wisconsin (GLOW). GLOW is an interdepartmental pool of HTCondor nodes, containing 3000 CPUs and about 1 PB of storage.
+
+
+The University of Wisconsin-Madison (UW-Madison) campus is an excellent match for meeting the computational needs of your project. Existing UW technology infrastructure that can be leveraged includes CPU capacity, network connectivity, storage availability, and middleware connectivity. But perhaps most important, the UW has significant staff experience and core competency in deploying, managing, and using computational technology.
+
+
+To reiterate: The UW launched and funded the Center for High Throughput Computing (CHTC), a campus-wide organization dedicated to supercharging research on campus by working side-by-side with you, the domain scientists, on infusing high throughput computing and grid computing techniques into your routine. Between the CHTC and the aforementioned HTCondor Project, the UW is home to over 20 full-time staff with a proven track record of making compute middleware work for scientists. Far beyond just being familiar with deployment and use of such software, UW staff have been intimately involved in its design and implementation.
+
+
+Applications: Many researchers are already using these facilities. More information about a sampling of those using the CHTC can be found here.
+Less recent projects are listed under CHTC Older projects.
+
+
+ The Center for High Throughput Computing (CHTC), established in 2006, aims to bring the power
+ of High Throughput Computing to all fields of research, and to allow the future of HTC to be shaped
+ by insight from all fields.
+
+
+
+
+
+
+ Are you a UW-Madison researcher looking to expand your computing beyond your local resources? Request
+ an account now to take advantage of the open computing services offered by the CHTC!
+
+ High Throughput Computing is a collection of principles and techniques which maximize the effective throughput
+ of computing resources towards a given problem. When applied for scientific computing, HTC can result in
+ improved use of a computing resource, improved automation, and help drive the scientific problem forward.
+
+
+ The team at CHTC develops technologies and services for HTC. CHTC is the home of the HTCondor Software
+ Suite which has over 30 years of experience in tackling HTC problems;
+ it manages shared computing resources
+ for researchers on the UW-Madison campus; and it leads the OSG Consortium,
+ a national-scale environment for distributed HTC.
+
The HTCondor Software Suite (HTCSS) provides sites and users with the ability to manage and execute
+HTC workloads. Whether it’s managing a single laptop or 250,000 cores at CERN, HTCondor
+helps solve computational problems through the application of the HTC principles.
CHTC manages over 20,000 cores and dozens of GPUs for the UW-Madison
+campus; this resource, which is free and shared, aims to advance the
+mission of the University of Wisconsin in support of the Wisconsin
+Idea. Researchers can place their workloads on an access point at
+CHTC and utilize the resources at CHTC, across the campus, and across
+the nation.
+
+
Research Facilitation
+
CHTC’s Research Facilitation team empowers researchers to utilize computing to achieve
+their goals. The Research Facilitation approach emphasizes teaching users skills and
+methodologies to manage & automate workloads on resources like those at CHTC, the campus,
+or across the world.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ As part of its many services to UW-Madison and beyond,
+ the CHTC is home to or supports the following Research
+ Projects and Partners.
+
+ The OSG is a consortium of research collaborations, campuses, national laboratories and software
+ providers dedicated to the advancement of all of open science via the practice of distributed High Throughput
+ Computing (dHTC), and the advancement of its state of the art. The OSG operates a fabric of dHTC services
+ for the national science and engineering community and CHTC has been a major force in OSG since its inception
+ in 2005.
+
+ The Partnership to Advance Throughput Computing (PATh) is a partnership between
+ CHTC and OSG to advance throughput computing. Funded through a major investment
+ from NSF, PATh helps advance HTC at a national level through support for
+ HTCSS and provides a fabric of services for the NSF science and engineering community
+ to access resources across the nation.
+
+ The Morgridge Institute for Research is a private, biomedical research institute
+ located on the UW-Madison campus. Morgridge’s Research Computing Theme is a unique
+ partner with CHTC, investing in the vision of HTC and its ability to advance basic
+ research.
+
+ The Path to Internship and Fellowship Opportunities
+
+
+
+ Want to make a difference and see your work directly
+ impact science across the globe?
+
+
+ The Center for High Throughput Computing offers internship and summer fellowship opportunities
+ for undergraduate and graduate students.
+
+
+
The Mission
+
CHTC is a research computing organization located within the University
+ of Wisconsin-Madison CS Department and at the Morgridge Institute for
+ Research. CHTC is an internationally recognized leader in high
+ throughput computing and provides access to free large-scale computing
+ capacity for research. CHTC advances the field of research computing
+ through innovative software and services, leading distributed computing
+ projects across the campus and the nation.
+
+
+
The CHTC Fellows Program
+
Our Fellows Program provides undergraduate and graduate students with
+ learning opportunities in research computing, system administration,
+ and facilitation. Working with engineers and
+ dedicated mentors, fellows will have the opportunity to learn
+ from leaders in their field and access state of the art computing
+ facilities. Fellows can also attend workshops, lectures and social
+ and recreational events with CHTC team members. Learn more about the CHTC Fellows Program.
+
+
+
We Value Diversity
+
CHTC and Morgridge are committed to increasing diversity among
+ interns and staff. We believe that advancing throughput computing
+ and scientific research is enhanced by a wide range of backgrounds
+ and perspectives.
+
+
+
+
+
+
Student Hourly Positions (Undergrad and Grad)
+
+ We're always looking for smart, motivated students to partner with
+ software developer and system administrator mentors. We expect students
+ to work between 10 and 20 hours a week, with 10 of those being during
+ business hours, with some flexibility on remaining hours. During the
+ summer and breaks, it is possible to work up to 29 hours per week.
+
The Morgridge Institute for Research, Research Computing group is looking for a student web developer to contribute to the development of web-based tools and applications that support the research community.
+
+
The position will work 10-20 hours per week, based upon the schedule and availability of the student. Work hours will be flexible, but mostly between 8am – 5pm, Monday – Friday.
The Morgridge Institute for Research, Research Computing group is looking for a student science writer to promote the impacts of the NSF Partnership to Advance Throughput Computing (PATh) project (and other projects) on research and campuses across the country. The position will provide opportunities to develop science writing and communication skills, learn about research across multiple domains, to develop an understanding of computational research methods and computing technologies, and to advance the understanding of these concepts by relevant audiences.
+
+
The position will work 10-20 hours per week, based upon the schedule and availability of the student. Work hours will be flexible, but mostly between 8am – 5pm, Monday – Friday.
+ If advancing the state of the art of distributed computing in an
+ academic environment interests you, the Center for High Throughput
+ Computing (CHTC) at the University of Wisconsin-Madison (UW) offers a
+ unique working environment. Our project’s home is in the UW Department
+ of Computer Sciences, an internationally recognized department
+ consistently ranked in the top ten across the USA.
+
+
+ A position with CHTC
+ will provide you the opportunity to interact with both department
+ faculty and students to translate novel ideas into real-world solutions.
+ The software and infrastructure you will be working on is used by
+ scientists and engineers at hundreds of institutions, from universities
+ to national laboratories and from large high tech corporations to small
+ animation teams.
+
+
+
Internships
+
+ Our internship program provides undergraduate and graduate students with learning opportunities in
+ research computing, system administration, web development, and communication.
+
+ Details about our open full-time positions are typically provided below.
+ Positions pertaining to "HTCondor" and "CHTC" search terms can also
+ be found on the University's
+ Position Vacancy List
+ (PVL).
+
The Center for High Throughput Computing (CHTC) seeks a Research Computing Facilitator to support the goals of a diverse set of researchers who use computing for their research. This is an ideal position for an academic researcher who has used computational approaches in their own work and is strongly motivated to support and empower the work of other researchers through access to large-scale computing resources.
+ We're always looking for smart, motivated students to partner with
+ software developer and system administrator mentors. We expect students
+ to work between 10 and 20 hours a week, with 10 of those being during
+ business hours, with some flexibility on remaining hours. During the
+ summer and breaks, it is possible to work up to 29 hours per week.
+
The Morgridge Institute for Research, Research Computing group is looking for a student web developer to contribute to the development of web-based tools and applications that support the research community.
+
+
The position will work 10-20 hours per week, based upon the schedule and availability of the student. Work hours will be flexible, but mostly between 8am – 5pm, Monday – Friday.
The Morgridge Institute for Research, Research Computing group is looking for a student science writer to promote the impacts of the NSF Partnership to Advance Throughput Computing (PATh) project (and other projects) on research and campuses across the country. The position will provide opportunities to develop science writing and communication skills, learn about research across multiple domains, to develop an understanding of computational research methods and computing technologies, and to advance the understanding of these concepts by relevant audiences.
+
+
The position will work 10-20 hours per week, based upon the schedule and availability of the student. Work hours will be flexible, but mostly between 8am – 5pm, Monday – Friday.
+ The University of Wisconsin-Madison is a great place to work. You can
+ read about the benefits in detail
+ elsewhere. In short, we have
+ five weeks of vacation/personal time per year, very good health
+ insurance (and cost effective for entire families), and a good
+ retirement plan. Please note that the minimum salaries in our job listings
+ are just that - the minimum. Compensation will increase with experience.
+
+
+
In addition to the official benefits, there are many side benefits:
+
+
+
You will work with the CHTC team. We are world leaders in solving
+ interesting distributed computing problems!
+
+
You can attend interesting talks in the department.
+
Relatively flexible working hours — we value work-life balance.
We're in a lively neighborhood with great restaurants in easy
+ walking distance.
+
+
+
+
+
+
+ If you are interested in a position with CHTC, explore the job listings
+ below! If you would like to apply, send your resume and cover letter to
+ chtc-jobs@g-groups.wisc.edu, and indicate
+ which job you would like to apply for.
+
+
+
Please note:
+
+
+
A criminal background check will be conducted prior to hiring.
+
A period of evaluation will be required.
+
UW-Madison is an equal opportunity/affirmative action employer. We
+ promote excellence through diversity and encourage all qualified
+ individuals to apply.
+
+ Machine learning insights into molecular science using the Open Science Pool
+
+
Machine learning insights into molecular science using the Open Science Pool
+Computation has extended what researchers can investigate in chemistry, biology, and material science. Studying complex systems like proteins or nanocomposites can use similar techniques for common challenges. For example, computational power is expanding the horizons of protein research and opening up vast new possibilities for drug discovery and disease treatment.
+
+
Olexandr Isayev is an assistant professor at the School of Pharmacy, University of North Carolina (UNC) at Chapel Hill. Isayev is part of a group at UNC using machine learning for chemical problems and material science.
+
+
“Specifically, we apply machine learning to chemical and material science data to understand the data, find patterns in it, and make predictive models,” says Isayev. “We focus on three areas: computer-aided design of novel materials, computational drug discovery, and acceleration of quantum mechanical methods with GPUs (graphic processing units) and machine learning.”
+
+
For drug discovery, where a small organic molecule binds to a protein receptor, Isayev uses machine learning to build predictive models based on a historical collection of experimental data. “We want to challenge models and find a new molecule with better binding properties,” says Isayev.
+
+
+
+
+
+
+
Protein Model
+
Example of a protein model that Isayev and his group study. Courtesy image.
+
+
+
+
Similar to the Human Genome Project, five years ago President Obama created a new Materials Genome Initiative to accelerate the design of new materials. Using machine learning methods based on the crystal structure of the material he is studying, Isayev can predict its physical properties.
+
+
“Looking at a molecule or material based on geometry and topology, we can get the energy, and predict critical physical properties,” says Isayev. “This machine learning allows us to avoid many expensive uses of numeric simulation to understand the material.”
+
+
The challenge for Isayev’s group is that the initial data accumulation is computationally very time-consuming. So, they use the Open Science Pool to run simulations. Based on the data, they train their machine learning model, so the next time, instead of a time-consuming simulation model, they can use the machine learning model on a desktop PC.
+
+
“Using machine learning to do the preliminary screening saves a lot of computing time,” says Isayev. “Since we performed the hard work, scientists can save a lot of time by prioritizing a few promising candidate materials instead of running everything.”
+
+
For studying something like a photovoltaic semiconductor, Isayev selects a candidate after running about a thousand quantum mechanical calculations. He then uses machine learning to screen 50,000 materials. “You can do this on a laptop,” says Isayev. “We prioritize a few, like ten to fifty. We can predict what to run next instead of running all of them. This saves a lot of computing time and gives us a powerful tool for screening and prioritization.”
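The screening workflow described above can be sketched roughly as follows. This is an illustration only: `expensiveSimulation` is a hypothetical stand-in for a real quantum mechanical calculation, the two-number descriptors are invented, and a simple nearest-neighbor average plays the role of the machine learning model, since the article does not describe the group's actual models. The sizes are scaled down from the article's figures so the sketch runs quickly.

```javascript
// Hypothetical stand-in for an expensive quantum mechanical calculation.
// In practice this step would be a batch of jobs run on the Open Science Pool.
function expensiveSimulation(descriptor) {
  const [a, b] = descriptor;
  return -((a - 0.3) ** 2) - ((b - 0.7) ** 2); // higher means "better material"
}

// Small deterministic pseudo-random generator so the sketch is reproducible.
function makeRng(seed) {
  let state = seed >>> 0;
  return () => {
    state = (Math.imul(state, 1664525) + 1013904223) >>> 0;
    return state / 4294967296;
  };
}

// Surrogate model (an assumption): predict a property as the mean value of
// the k nearest training points in descriptor space.
function makeKnnSurrogate(trainingSet, k) {
  return (descriptor) => {
    const nearest = trainingSet
      .map(({ descriptor: d, value }) => ({
        dist: Math.hypot(d[0] - descriptor[0], d[1] - descriptor[1]),
        value,
      }))
      .sort((p, q) => p.dist - q.dist)
      .slice(0, k);
    return nearest.reduce((sum, p) => sum + p.value, 0) / nearest.length;
  };
}

const rng = makeRng(42);
const randomDescriptor = () => [rng(), rng()];

// Step 1: a limited number of expensive calculations provide training data.
const trainingSet = Array.from({ length: 200 }, () => {
  const descriptor = randomDescriptor();
  return { descriptor, value: expensiveSimulation(descriptor) };
});

// Step 2: the cheap surrogate scores a much larger pool of candidates.
const predict = makeKnnSurrogate(trainingSet, 5);
const candidates = Array.from({ length: 5000 }, randomDescriptor);
const scored = candidates.map((descriptor) => ({
  descriptor,
  predicted: predict(descriptor),
}));

// Step 3: keep only a short list to follow up with real simulations.
const shortlist = scored
  .sort((p, q) => q.predicted - p.predicted)
  .slice(0, 20);

console.log(`Shortlisted ${shortlist.length} of ${candidates.length} candidates`);
```

Only the shortlisted candidates would then go back to the expensive simulation step, which is where the saving in computing time comes from.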
+
+
On the OSG, they run “small density functional theory (DFT) calculations. We are interested in molecular properties,” says Isayev. “We run a program package called ORCA (Quantum Chemistry Program), a free chemistry package. It implements lots of QM methods for molecules and crystals. We use it and then we have our own scripts, run them on the OSG, collect the data, and then analyze the data.”
+
+
“I am privileged to work with extremely talented people like Roman Zubatyuk,” says Isayev. Zubatyuk works with Isayev on many different projects. “Roman has developed our software ecosystem container using Docker. These simulations run locally on our machines through the Docker virtual environment and eliminate many issues. With a central database and set of scripts, we could seamlessly run hundreds of thousands of simulations without any problems.”
+
+
Finding new materials and molecules is a hard scientific problem. “There is no one answer when looking for a new molecule,” says Isayev. “We cannot just use brute force. We have to be creative because it is like looking for a needle in a haystack.”
+
+
For something like a solar cell device, researchers might find a drawback in the performance of the material. “We are looking to improve current materials, improve their performance, or make them cheaper, so we can move them to mass production so everyone benefits,” says Isayev.
+
+
“For us, the OSG is a fantastic resource for which we are very grateful,” says Isayev. “It gives us access to computation that enables our simulations that we could not do otherwise. To run all our simulations requires lots of computing resources that we cannot run on a local cluster. To do our simulation screening, we have to perform lots of calculations. We can easily distribute these calculations because they don’t need to communicate to each other. The OSG is a perfect fit.”
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
diff --git a/preview-calendar/map/images/Thumbs.db b/preview-calendar/map/images/Thumbs.db
new file mode 100644
index 000000000..479db4c6c
Binary files /dev/null and b/preview-calendar/map/images/Thumbs.db differ
diff --git a/preview-calendar/map/images/add dept.xlsx b/preview-calendar/map/images/add dept.xlsx
new file mode 100644
index 000000000..f573423c5
Binary files /dev/null and b/preview-calendar/map/images/add dept.xlsx differ
diff --git a/preview-calendar/map/images/map-empty.jpg b/preview-calendar/map/images/map-empty.jpg
new file mode 100644
index 000000000..d6558c927
Binary files /dev/null and b/preview-calendar/map/images/map-empty.jpg differ
diff --git a/preview-calendar/map/images/map.jpg b/preview-calendar/map/images/map.jpg
new file mode 100644
index 000000000..b73fac42b
Binary files /dev/null and b/preview-calendar/map/images/map.jpg differ
diff --git a/preview-calendar/map/images/map.psd b/preview-calendar/map/images/map.psd
new file mode 100644
index 000000000..0fca71212
Binary files /dev/null and b/preview-calendar/map/images/map.psd differ
diff --git a/preview-calendar/map/images/map1.jpg b/preview-calendar/map/images/map1.jpg
new file mode 100644
index 000000000..5364d0512
Binary files /dev/null and b/preview-calendar/map/images/map1.jpg differ
diff --git a/preview-calendar/map/images/map2.jpg b/preview-calendar/map/images/map2.jpg
new file mode 100644
index 000000000..9389fdb41
Binary files /dev/null and b/preview-calendar/map/images/map2.jpg differ
diff --git a/preview-calendar/map/images/map2.psd b/preview-calendar/map/images/map2.psd
new file mode 100644
index 000000000..7a457c7e4
Binary files /dev/null and b/preview-calendar/map/images/map2.psd differ
diff --git a/preview-calendar/map/images/map3.jpg b/preview-calendar/map/images/map3.jpg
new file mode 100644
index 000000000..65bae21f5
Binary files /dev/null and b/preview-calendar/map/images/map3.jpg differ
diff --git a/preview-calendar/map/images/map4.jpg b/preview-calendar/map/images/map4.jpg
new file mode 100644
index 000000000..dec68e6dc
Binary files /dev/null and b/preview-calendar/map/images/map4.jpg differ
diff --git a/preview-calendar/map/images/map5.jpg b/preview-calendar/map/images/map5.jpg
new file mode 100644
index 000000000..9d3927126
Binary files /dev/null and b/preview-calendar/map/images/map5.jpg differ
diff --git a/preview-calendar/map/images/map6.jpg b/preview-calendar/map/images/map6.jpg
new file mode 100644
index 000000000..29f1a1aff
Binary files /dev/null and b/preview-calendar/map/images/map6.jpg differ
diff --git a/preview-calendar/map/index.html b/preview-calendar/map/index.html
new file mode 100644
index 000000000..037bd6b53
--- /dev/null
+++ b/preview-calendar/map/index.html
@@ -0,0 +1,649 @@
+
+
+
+
+
+
+CHTC User Map
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
Researchers who use the CHTC are located throughout campus in the red buildings below. Hover over one to learn more.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
diff --git a/preview-calendar/map/scripts/jquery.imagemapster.js b/preview-calendar/map/scripts/jquery.imagemapster.js
new file mode 100644
index 000000000..fce098724
--- /dev/null
+++ b/preview-calendar/map/scripts/jquery.imagemapster.js
@@ -0,0 +1,4559 @@
+/* ImageMapster
+ Version: 1.2.8 (12/30/2012)
+
+Copyright 2011-2012 James Treworgy
+
+http://www.outsharked.com/imagemapster
+https://github.com/jamietre/ImageMapster
+
+A jQuery plugin to enhance image maps.
+
+*/
+
+;
+
+/// LICENSE (MIT License)
+///
+/// Permission is hereby granted, free of charge, to any person obtaining
+/// a copy of this software and associated documentation files (the
+/// "Software"), to deal in the Software without restriction, including
+/// without limitation the rights to use, copy, modify, merge, publish,
+/// distribute, sublicense, and/or sell copies of the Software, and to
+/// permit persons to whom the Software is furnished to do so, subject to
+/// the following conditions:
+///
+/// The above copyright notice and this permission notice shall be
+/// included in all copies or substantial portions of the Software.
+///
+/// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+/// EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+/// MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+/// NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
+/// LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
+/// OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
+/// WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+///
+/// January 19, 2011
+
+/** @license MIT License (c) copyright B Cavalier & J Hann */
+
+/**
+* when
+* A lightweight CommonJS Promises/A and when() implementation
+*
+* when is part of the cujo.js family of libraries (http://cujojs.com/)
+*
+* Licensed under the MIT License at:
+* http://www.opensource.org/licenses/mit-license.php
+*
+* @version 1.2.0
+*/
+
+/*lint-ignore-start*/
+
+(function (define) {
+ define(function () {
+ var freeze, reduceArray, slice, undef;
+
+ //
+ // Public API
+ //
+
+ when.defer = defer;
+ when.reject = reject;
+ when.isPromise = isPromise;
+
+ when.all = all;
+ when.some = some;
+ when.any = any;
+
+ when.map = map;
+ when.reduce = reduce;
+
+ when.chain = chain;
+
+ /** Object.freeze */
+ freeze = Object.freeze || function (o) { return o; };
+
+ /**
+ * Trusted Promise constructor. A Promise created from this constructor is
+ * a trusted when.js promise. Any other duck-typed promise is considered
+ * untrusted.
+ *
+ * @constructor
+ */
+ function Promise() { }
+
+ Promise.prototype = freeze({
+ always: function (alwaysback, progback) {
+ return this.then(alwaysback, alwaysback, progback);
+ },
+
+ otherwise: function (errback) {
+ return this.then(undef, errback);
+ }
+ });
+
+ /**
+ * Create an already-resolved promise for the supplied value
+ * @private
+ *
+ * @param value anything
+ * @return {Promise}
+ */
+ function resolved(value) {
+
+ var p = new Promise();
+
+ p.then = function (callback) {
+ var nextValue;
+ try {
+ if (callback) nextValue = callback(value);
+ return promise(nextValue === undef ? value : nextValue);
+ } catch (e) {
+ return rejected(e);
+ }
+ };
+
+ return freeze(p);
+ }
+
+ /**
+ * Create an already-rejected {@link Promise} with the supplied
+ * rejection reason.
+ * @private
+ *
+ * @param reason rejection reason
+ * @return {Promise}
+ */
+ function rejected(reason) {
+
+ var p = new Promise();
+
+ p.then = function (callback, errback) {
+ var nextValue;
+ try {
+ if (errback) {
+ nextValue = errback(reason);
+ return promise(nextValue === undef ? reason : nextValue)
+ }
+
+ return rejected(reason);
+
+ } catch (e) {
+ return rejected(e);
+ }
+ };
+
+ return freeze(p);
+ }
+
+ /**
+ * Returns a rejected promise for the supplied promiseOrValue. If
+ * promiseOrValue is a value, it will be the rejection value of the
+ * returned promise. If promiseOrValue is a promise, its
+ * completion value will be the rejected value of the returned promise
+ *
+ * @param promiseOrValue {*} the rejected value of the returned {@link Promise}
+ *
+ * @return {Promise} rejected {@link Promise}
+ */
+ function reject(promiseOrValue) {
+ return when(promiseOrValue, function (value) {
+ return rejected(value);
+ });
+ }
+
+ /**
+ * Creates a new, CommonJS compliant, Deferred with fully isolated
+ * resolver and promise parts, either or both of which may be given out
+ * safely to consumers.
+ * The Deferred itself has the full API: resolve, reject, progress, and
+ * then. The resolver has resolve, reject, and progress. The promise
+ * only has then.
+ *
+ * @memberOf when
+ * @function
+ *
+ * @returns {Deferred}
+ */
+ function defer() {
+ var deferred, promise, listeners, progressHandlers, _then, _progress, complete;
+
+ listeners = [];
+ progressHandlers = [];
+
+ /**
+ * Pre-resolution then() that adds the supplied callback, errback, and progback
+ * functions to the registered listeners
+ *
+ * @private
+ *
+ * @param [callback] {Function} resolution handler
+ * @param [errback] {Function} rejection handler
+ * @param [progback] {Function} progress handler
+ *
+ * @throws {Error} if any argument is not null, undefined, or a Function
+ */
+ _then = function unresolvedThen(callback, errback, progback) {
+ var deferred = defer();
+
+ listeners.push(function (promise) {
+ promise.then(callback, errback)
+ .then(deferred.resolve, deferred.reject, deferred.progress);
+ });
+
+ progback && progressHandlers.push(progback);
+
+ return deferred.promise;
+ };
+
+ /**
+ * Registers a handler for this {@link Deferred}'s {@link Promise}. Even though all arguments
+ * are optional, each argument that *is* supplied must be null, undefined, or a Function.
+ * Any other value will cause an Error to be thrown.
+ *
+ * @memberOf Promise
+ *
+ * @param [callback] {Function} resolution handler
+ * @param [errback] {Function} rejection handler
+ * @param [progback] {Function} progress handler
+ *
+ * @throws {Error} if any argument is not null, undefined, or a Function
+ */
+ function then(callback, errback, progback) {
+ return _then(callback, errback, progback);
+ }
+
+ /**
+ * Resolves this {@link Deferred}'s {@link Promise} with val as the
+ * resolution value.
+ *
+ * @memberOf Resolver
+ *
+ * @param val anything
+ */
+ function resolve(val) {
+ complete(resolved(val));
+ }
+
+ /**
+ * Rejects this {@link Deferred}'s {@link Promise} with err as the
+ * reason.
+ *
+ * @memberOf Resolver
+ *
+ * @param err anything
+ */
+ function reject(err) {
+ complete(rejected(err));
+ }
+
+ /**
+ * @private
+ * @param update
+ */
+ _progress = function (update) {
+ var progress, i = 0;
+ while (progress = progressHandlers[i++]) progress(update);
+ };
+
+ /**
+ * Emits a progress update to all progress observers registered with
+ * this {@link Deferred}'s {@link Promise}
+ *
+ * @memberOf Resolver
+ *
+ * @param update anything
+ */
+ function progress(update) {
+ _progress(update);
+ }
+
+ /**
+ * Transition from pre-resolution state to post-resolution state, notifying
+ * all listeners of the resolution or rejection
+ *
+ * @private
+ *
+ * @param completed {Promise} the completed value of this deferred
+ */
+ complete = function (completed) {
+ var listener, i = 0;
+
+ // Replace _then with one that directly notifies with the result.
+ _then = completed.then;
+
+ // Replace complete so that this Deferred can only be completed
+ // once. Also Replace _progress, so that subsequent attempts to issue
+ // progress throw.
+ complete = _progress = function alreadyCompleted() {
+ // TODO: Consider silently returning here so that parties who
+ // have a reference to the resolver cannot tell that the promise
+ // has been resolved using try/catch
+ throw new Error("already completed");
+ };
+
+ // Free progressHandlers array since we'll never issue progress events
+ // for this promise again now that it's completed
+ progressHandlers = undef;
+
+ // Notify listeners
+ // Traverse all listeners registered directly with this Deferred
+
+ while (listener = listeners[i++]) {
+ listener(completed);
+ }
+
+ listeners = [];
+ };
+
+ /**
+ * The full Deferred object, with both {@link Promise} and {@link Resolver}
+ * parts
+ * @class Deferred
+ * @name Deferred
+ */
+ deferred = {};
+
+ // Promise and Resolver parts
+ // Freeze Promise and Resolver APIs
+
+ promise = new Promise();
+ promise.then = deferred.then = then;
+
+ /**
+ * The {@link Promise} for this {@link Deferred}
+ * @memberOf Deferred
+ * @name promise
+ * @type {Promise}
+ */
+ deferred.promise = freeze(promise);
+
+ /**
+ * The {@link Resolver} for this {@link Deferred}
+ * @memberOf Deferred
+ * @name resolver
+ * @class Resolver
+ */
+ deferred.resolver = freeze({
+ resolve: (deferred.resolve = resolve),
+ reject: (deferred.reject = reject),
+ progress: (deferred.progress = progress)
+ });
+
+ return deferred;
+ }
+
+ /**
+ * Determines if promiseOrValue is a promise or not. Uses the feature
+ * test from http://wiki.commonjs.org/wiki/Promises/A to determine if
+ * promiseOrValue is a promise.
+ *
+ * @param promiseOrValue anything
+ *
+ * @returns {Boolean} true if promiseOrValue is a {@link Promise}
+ */
+ function isPromise(promiseOrValue) {
+ return promiseOrValue && typeof promiseOrValue.then === 'function';
+ }
+
+ /**
+ * Register an observer for a promise or immediate value.
+ *
+ * @function
+ * @name when
+ * @namespace
+ *
+ * @param promiseOrValue anything
+ * @param {Function} [callback] callback to be called when promiseOrValue is
+ * successfully resolved. If promiseOrValue is an immediate value, callback
+ * will be invoked immediately.
+ * @param {Function} [errback] callback to be called when promiseOrValue is
+ * rejected.
+ * @param {Function} [progressHandler] callback to be called when progress updates
+ * are issued for promiseOrValue.
+ *
+ * @returns {Promise} a new {@link Promise} that will complete with the return
+ * value of callback or errback or the completion value of promiseOrValue if
+ * callback and/or errback is not supplied.
+ */
+ function when(promiseOrValue, callback, errback, progressHandler) {
+ // Get a promise for the input promiseOrValue
+ // See promise()
+ var trustedPromise = promise(promiseOrValue);
+
+ // Register promise handlers
+ return trustedPromise.then(callback, errback, progressHandler);
+ }
+
+ /**
+ * Returns promiseOrValue if promiseOrValue is a {@link Promise}, a new Promise if
+ * promiseOrValue is a foreign promise, or a new, already-resolved {@link Promise}
+ * whose resolution value is promiseOrValue if promiseOrValue is an immediate value.
+ *
+ * Note that this function is not safe to export since it will return its
+ * input when promiseOrValue is a {@link Promise}
+ *
+ * @private
+ *
+ * @param promiseOrValue anything
+ *
+ * @returns Guaranteed to return a trusted Promise. If promiseOrValue is a when.js {@link Promise}
+ * returns promiseOrValue, otherwise, returns a new, already-resolved, when.js {@link Promise}
+ * whose resolution value is:
+ * * the resolution value of promiseOrValue if it's a foreign promise, or
+ * * promiseOrValue if it's a value
+ */
+ function promise(promiseOrValue) {
+ var promise, deferred;
+
+ if (promiseOrValue instanceof Promise) {
+ // It's a when.js promise, so we trust it
+ promise = promiseOrValue;
+
+ } else {
+ // It's not a when.js promise. Check to see if it's a foreign promise
+ // or a value.
+
+ deferred = defer();
+ if (isPromise(promiseOrValue)) {
+ // It's a compliant promise, but we don't know where it came from,
+ // so we don't trust its implementation entirely. Introduce a trusted
+ // middleman when.js promise
+
+ // IMPORTANT: This is the only place when.js should ever call .then() on
+ // an untrusted promise.
+ promiseOrValue.then(deferred.resolve, deferred.reject, deferred.progress);
+ promise = deferred.promise;
+
+ } else {
+ // It's a value, not a promise. Create an already-resolved promise
+ // for it.
+ deferred.resolve(promiseOrValue);
+ promise = deferred.promise;
+ }
+ }
+
+ return promise;
+ }
+
+ /**
+ * Return a promise that will resolve when howMany of the supplied promisesOrValues
+ * have resolved. The resolution value of the returned promise will be an array of
+ * length howMany containing the resolution values of the triggering promisesOrValues.
+ *
+ * @memberOf when
+ *
+ * @param promisesOrValues {Array} array of anything, may contain a mix
+ * of {@link Promise}s and values
+ * @param howMany
+ * @param [callback]
+ * @param [errback]
+ * @param [progressHandler]
+ *
+ * @returns {Promise}
+ */
+ function some(promisesOrValues, howMany, callback, errback, progressHandler) {
+
+ checkCallbacks(2, arguments);
+
+ return when(promisesOrValues, function (promisesOrValues) {
+
+ var toResolve, results, ret, deferred, resolver, rejecter, handleProgress, len, i;
+
+ len = promisesOrValues.length >>> 0;
+
+ toResolve = Math.max(0, Math.min(howMany, len));
+ results = [];
+ deferred = defer();
+ ret = when(deferred, callback, errback, progressHandler);
+
+ // Wrapper so that resolver can be replaced
+ function resolve(val) {
+ resolver(val);
+ }
+
+ // Wrapper so that rejecter can be replaced
+ function reject(err) {
+ rejecter(err);
+ }
+
+ // Wrapper so that progress can be replaced
+ function progress(update) {
+ handleProgress(update);
+ }
+
+ function complete() {
+ resolver = rejecter = handleProgress = noop;
+ }
+
+ // No items in the input, resolve immediately
+ if (!toResolve) {
+ deferred.resolve(results);
+
+ } else {
+ // Resolver for promises. Captures the value and resolves
+ // the returned promise when toResolve reaches zero.
+ // Overwrites resolver var with a noop once promise has
+ // been resolved to cover case where n < promises.length
+ resolver = function (val) {
+ // This orders the values based on promise resolution order
+ // Another strategy would be to use the original position of
+ // the corresponding promise.
+ results.push(val);
+
+ if (! --toResolve) {
+ complete();
+ deferred.resolve(results);
+ }
+ };
+
+ // Rejecter for promises. Rejects returned promise
+ // immediately, and overwrites rejecter var with a noop
+ // once promise has been rejected to cover case where n < promises.length.
+ // TODO: Consider rejecting only when N (or promises.length - N?)
+ // promises have been rejected instead of only one?
+ rejecter = function (err) {
+ complete();
+ deferred.reject(err);
+ };
+
+ handleProgress = deferred.progress;
+
+ // TODO: Replace for loop with forEach
+ for (i = 0; i < len; ++i) {
+ if (i in promisesOrValues) {
+ when(promisesOrValues[i], resolve, reject, progress);
+ }
+ }
+ }
+
+ return ret;
+ });
+ }
+
+ /**
+ * Return a promise that will resolve only once all the supplied promisesOrValues
+ * have resolved. The resolution value of the returned promise will be an array
+ * containing the resolution values of each of the promisesOrValues.
+ *
+ * @memberOf when
+ *
+ * @param promisesOrValues {Array|Promise} array of anything, may contain a mix
+ * of {@link Promise}s and values
+ * @param [callback] {Function}
+ * @param [errback] {Function}
+ * @param [progressHandler] {Function}
+ *
+ * @returns {Promise}
+ */
+ function all(promisesOrValues, callback, errback, progressHandler) {
+
+ checkCallbacks(1, arguments);
+
+ return when(promisesOrValues, function (promisesOrValues) {
+ return _reduce(promisesOrValues, reduceIntoArray, []);
+ }).then(callback, errback, progressHandler);
+ }
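+
+ // Usage sketch for all() (illustrative only; assumes the function is
+ // exported as when.all, as the @memberOf tag above suggests):
+ // var d = when.defer();
+ // when.all([1, d.promise], function (values) {
+ //     // values is [1, 'two']: each result keeps the position of its input
+ // });
+ // d.resolver.resolve('two');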
+
+ function reduceIntoArray(current, val, i) {
+ current[i] = val;
+ return current;
+ }
+
+ /**
+ * Return a promise that will resolve when any one of the supplied promisesOrValues
+ * has resolved. The resolution value of the returned promise will be the resolution
+ * value of the triggering promiseOrValue.
+ *
+ * @memberOf when
+ *
+ * @param promisesOrValues {Array|Promise} array of anything, may contain a mix
+ * of {@link Promise}s and values
+ * @param [callback] {Function}
+ * @param [errback] {Function}
+ * @param [progressHandler] {Function}
+ *
+ * @returns {Promise}
+ */
+ function any(promisesOrValues, callback, errback, progressHandler) {
+
+ function unwrapSingleResult(val) {
+ return callback ? callback(val[0]) : val[0];
+ }
+
+ return some(promisesOrValues, 1, unwrapSingleResult, errback, progressHandler);
+ }
+
+ /**
+ * Traditional map function, similar to `Array.prototype.map()`, but allows
+ * input to contain {@link Promise}s and/or values, and mapFunc may return
+ * either a value or a {@link Promise}
+ *
+ * @memberOf when
+ *
+ * @param promise {Array|Promise} array of anything, may contain a mix
+ * of {@link Promise}s and values
+ * @param mapFunc {Function} mapping function mapFunc(value) which may return
+ * either a {@link Promise} or value
+ *
+ * @returns {Promise} a {@link Promise} that will resolve to an array containing
+ * the mapped output values.
+ */
+ function map(promise, mapFunc) {
+ return when(promise, function (array) {
+ return _map(array, mapFunc);
+ });
+ }
+
+ /**
+ * Private map helper to map an array of promises
+ * @private
+ *
+ * @param promisesOrValues {Array}
+ * @param mapFunc {Function}
+ * @return {Promise}
+ */
+ function _map(promisesOrValues, mapFunc) {
+
+ var results, len, i;
+
+ // Since we know the resulting length, we can preallocate the results
+ // array to avoid array expansions.
+ len = promisesOrValues.length >>> 0;
+ results = new Array(len);
+
+ // Since mapFunc may be async, get all invocations of it into flight
+ // asap, and then use reduce() to collect all the results
+ for (i = 0; i < len; i++) {
+ if (i in promisesOrValues)
+ results[i] = when(promisesOrValues[i], mapFunc);
+ }
+
+ // Could use all() here, but that would result in another array
+ // being allocated, i.e. map() would end up allocating 2 arrays
+ // of size len instead of just 1. Since all() uses reduce()
+ // anyway, avoid the additional allocation by calling reduce
+ // directly.
+ return _reduce(results, reduceIntoArray, results);
+ }
+
+ /**
+ * Traditional reduce function, similar to `Array.prototype.reduce()`, but
+ * input may contain {@link Promise}s and/or values, and reduceFunc
+ * may return either a value or a {@link Promise}, *and* initialValue may
+ * be a {@link Promise} for the starting value.
+ *
+ * @memberOf when
+ *
+ * @param promise {Array|Promise} array of anything, may contain a mix
+ * of {@link Promise}s and values. May also be a {@link Promise} for
+ * an array.
+ * @param reduceFunc {Function} reduce function reduce(currentValue, nextValue, index, total),
+ * where total is the total number of items being reduced, and will be the same
+ * in each call to reduceFunc.
+ * @param initialValue starting value, or a {@link Promise} for the starting value
+ *
+ * @returns {Promise} that will resolve to the final reduced value
+ */
+ function reduce(promise, reduceFunc, initialValue) {
+ var args = slice.call(arguments, 1);
+ return when(promise, function (array) {
+ return _reduce.apply(undef, [array].concat(args));
+ });
+ }
+
+ /**
+ * Private reduce to reduce an array of promises
+ * @private
+ *
+ * @param promisesOrValues {Array}
+ * @param reduceFunc {Function}
+ * @param initialValue {*}
+ * @return {Promise}
+ */
+ function _reduce(promisesOrValues, reduceFunc, initialValue) {
+
+ var total, args;
+
+ total = promisesOrValues.length;
+
+ // Skip promisesOrValues, since it will be used as 'this' in the call
+ // to the actual reduce engine below.
+
+ // Wrap the supplied reduceFunc with one that handles promises and then
+ // delegates to the supplied.
+
+ args = [
+ function (current, val, i) {
+ return when(current, function (c) {
+ return when(val, function (value) {
+ return reduceFunc(c, value, i, total);
+ });
+ });
+ }
+ ];
+
+ if (arguments.length > 2) args.push(initialValue);
+
+ return reduceArray.apply(promisesOrValues, args);
+ }
+
+ /**
+ * Ensure that resolution of promiseOrValue will complete resolver with the completion
+ * value of promiseOrValue, or instead with resolveValue if it is provided.
+ *
+ * @memberOf when
+ *
+ * @param promiseOrValue
+ * @param resolver {Resolver}
+ * @param [resolveValue] anything
+ *
+ * @returns {Promise}
+ */
+ function chain(promiseOrValue, resolver, resolveValue) {
+ var useResolveValue = arguments.length > 2;
+
+ return when(promiseOrValue,
+ function (val) {
+ if (useResolveValue) val = resolveValue;
+ resolver.resolve(val);
+ return val;
+ },
+ function (e) {
+ resolver.reject(e);
+ return rejected(e);
+ },
+ resolver.progress
+ );
+ }
+
+ //
+ // Utility functions
+ //
+
+ /**
+ * Helper that checks arrayOfCallbacks to ensure that each element at or
+ * after index start is either a function, or null or undefined.
+ *
+ * @private
+ *
+ * @param start {Number} index of the first element to check
+ * @param arrayOfCallbacks {Array} array to check
+ * @throws {Error} if any checked element of arrayOfCallbacks is something
+ * other than a Function, null, or undefined.
+ */
+ function checkCallbacks(start, arrayOfCallbacks) {
+ var arg, i = arrayOfCallbacks.length;
+ while (i > start) {
+ arg = arrayOfCallbacks[--i];
+ if (arg != null && typeof arg != 'function') throw new Error('callback is not a function');
+ }
+ }
+
+ /**
+ * No-Op function used in method replacement
+ * @private
+ */
+ function noop() { }
+
+ slice = [].slice;
+
+ // ES5 reduce implementation if native not available
+ // See: http://es5.github.com/#x15.4.4.21 as there are many
+ // specifics and edge cases.
+ reduceArray = [].reduce ||
+ function (reduceFunc /*, initialValue */) {
+ // ES5 dictates that reduce.length === 1
+
+ // This implementation deviates from ES5 spec in the following ways:
+ // 1. It does not check if reduceFunc is a Callable
+
+ var arr, args, reduced, len, i;
+
+ i = 0;
+ arr = Object(this);
+ len = arr.length >>> 0;
+ args = arguments;
+
+ // If no initialValue, use first item of array (we know length !== 0 here)
+ // and adjust i to start at second item
+ if (args.length <= 1) {
+ // Skip to the first real element in the array
+ for (; ; ) {
+ if (i in arr) {
+ reduced = arr[i++];
+ break;
+ }
+
+ // If we reached the end of the array without finding any real
+ // elements, it's a TypeError
+ if (++i >= len) {
+ throw new TypeError();
+ }
+ }
+ } else {
+ // If initialValue provided, use it
+ reduced = args[1];
+ }
+
+ // Do the actual reduce
+ for (; i < len; ++i) {
+ // Skip holes
+ if (i in arr)
+ reduced = reduceFunc(reduced, arr[i], i, arr);
+ }
+
+ return reduced;
+ };
+
+ return when;
+ });
+})(typeof define == 'function'
+ ? define
+ : function (factory) {
+ typeof module != 'undefined'
+ ? (module.exports = factory())
+ : (jQuery.mapster_when = factory());
+ }
+// Boilerplate for AMD, Node, and browser global
+);
+/*lint-ignore-end*/
+/* ImageMapster core */
+
+/*jslint laxbreak: true, evil: true, unparam: true */
+
+/*global jQuery: true, Zepto: true */
+
+
+(function ($) {
+ // all public functions in $.mapster.impl are methods
+ $.fn.mapster = function (method) {
+ var m = $.mapster.impl;
+ if ($.isFunction(m[method])) {
+ return m[method].apply(this, Array.prototype.slice.call(arguments, 1));
+ } else if (typeof method === 'object' || !method) {
+ return m.bind.apply(this, arguments);
+ } else {
+ $.error('Method ' + method + ' does not exist on jQuery.mapster');
+ }
+ };
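+
+ // Usage sketch for the dispatcher above (illustrative only; 'img.map' and
+ // 'area_key' are hypothetical names, not taken from this file):
+ // $('img.map').mapster({ fillColor: 'ff0000' }); // object or no argument: forwarded to impl.bind
+ // $('img.map').mapster('highlight', 'area_key'); // method name: calls impl.highlight('area_key')
+ // $('img.map').mapster('no_such_method'); // unknown name: reported through $.error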
+
+ $.mapster = {
+ version: "1.2.8",
+ render_defaults: {
+ isSelectable: true,
+ isDeselectable: true,
+ fade: false,
+ fadeDuration: 150,
+ fill: true,
+ fillColor: '000000',
+ fillColorMask: 'FFFFFF',
+ fillOpacity: 0.7,
+ highlight: true,
+ stroke: false,
+ strokeColor: 'ff0000',
+ strokeOpacity: 1,
+ strokeWidth: 1,
+ includeKeys: '',
+ altImage: null,
+ altImageId: null, // used internally
+ altImages: {}
+ },
+ defaults: {
+ clickNavigate: false,
+ wrapClass: null,
+ wrapCss: null,
+ onGetList: null,
+ sortList: false,
+ listenToList: false,
+ mapKey: '',
+ mapValue: '',
+ singleSelect: false,
+ listKey: 'value',
+ listSelectedAttribute: 'selected',
+ listSelectedClass: null,
+ onClick: null,
+ onMouseover: null,
+ onMouseout: null,
+ mouseoutDelay: 0,
+ onStateChange: null,
+ boundList: null,
+ onConfigured: null,
+ configTimeout: 30000,
+ noHrefIsMask: true,
+ scaleMap: true,
+ safeLoad: false,
+ areas: []
+ },
+ shared_defaults: {
+ render_highlight: { fade: true },
+ render_select: { fade: false },
+ staticState: null,
+ selected: null
+ },
+ area_defaults:
+ {
+ includeKeys: '',
+ isMask: false
+ },
+ canvas_style: {
+ position: 'absolute',
+ left: 0,
+ top: 0,
+ padding: 0,
+ border: 0
+ },
+ hasCanvas: null,
+ isTouch: null,
+ windowLoaded: false,
+ map_cache: [],
+ hooks: {},
+ addHook: function(name,callback) {
+ // push() returns the new length, so assign the array first and push afterwards
+ (this.hooks[name]=this.hooks[name]||[]).push(callback);
+ },
+ callHooks: function(name,context) {
+ $.each(this.hooks[name]||[],function(i,e) {
+ e.apply(context);
+ });
+ },
+ utils: {
+ when: $.mapster_when,
+ defer: $.mapster_when.defer,
+
+ // extends the constructor, returns a new object prototype. Does not refer to the
+ // original constructor so is protected if the original object is altered. This way you
+ // can "extend" an object by replacing it with its subclass.
+ subclass: function(BaseClass, constr) {
+ var Subclass=function() {
+ var me=this,
+ args=Array.prototype.slice.call(arguments,0);
+ me.base = BaseClass.prototype;
+ me.base.init = function() {
+ BaseClass.prototype.constructor.apply(me,args);
+ };
+ constr.apply(me,args);
+ };
+ Subclass.prototype = new BaseClass();
+ Subclass.prototype.constructor=Subclass;
+ return Subclass;
+ },
+ asArray: function (obj) {
+ return obj.constructor === Array ?
+ obj : this.split(obj);
+ },
+ // clean split: no padding or empty elements
+ split: function (text,cb) {
+ var i,el, arr = text.split(',');
+ for (i = 0; i < arr.length; i++) {
+ el = $.trim(arr[i]);
+ if (el==='') {
+ // step back so the element that shifts into slot i is not skipped
+ arr.splice(i--,1);
+ } else {
+ arr[i] = cb ? cb(el):el;
+ }
+ }
+ return arr;
+ },
+ // similar to $.extend but does not add properties (only updates), unless the
+ // first argument is an empty object, then all properties will be copied
+ updateProps: function (_target, _template) {
+ var onlyProps,
+ target = _target || {},
+ template = $.isEmptyObject(target) ? _template : _target;
+
+ //if (template) {
+ onlyProps = [];
+ $.each(template, function (prop) {
+ onlyProps.push(prop);
+ });
+ //}
+
+ $.each(Array.prototype.slice.call(arguments, 1), function (i, src) {
+ $.each(src || {}, function (prop) {
+ if (!onlyProps || $.inArray(prop, onlyProps) >= 0) {
+ var p = src[prop];
+
+ if ($.isPlainObject(p)) {
+ // not recursive - only copies 1 level of subobjects, and always merges
+ target[prop] = $.extend(target[prop] || {}, p);
+ } else if (p && p.constructor === Array) {
+ target[prop] = p.slice(0);
+ } else if (typeof p !== 'undefined') {
+ target[prop] = src[prop];
+ }
+
+ }
+ });
+ });
+ return target;
+ },
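+ // Behaviour sketch for updateProps (illustrative values, derived from the code above):
+ // updateProps({}, {a: 1, b: 2}, {b: 3, c: 4}) -> {a: 1, b: 3} (empty target: template keys are copied, 'c' is ignored)
+ // updateProps({a: 1}, {a: 2, z: 9}) -> {a: 2} (non-empty target: only its own keys are updated)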
+ isElement: function (o) {
+ return (typeof HTMLElement === "object" ? o instanceof HTMLElement :
+ o && typeof o === "object" && o.nodeType === 1 && typeof o.nodeName === "string");
+ },
+ // finds element of array or object with a property "prop" having value "val"
+ // if prop is not defined, then just looks for property with value "val"
+ indexOfProp: function (obj, prop, val) {
+ var result = obj.constructor === Array ? -1 : null;
+ $.each(obj, function (i, e) {
+ if (e && (prop ? e[prop] : e) === val) {
+ result = i;
+ return false;
+ }
+ });
+ return result;
+ },
+ // returns "obj" if true or false, or "def" if not true/false
+ boolOrDefault: function (obj, def) {
+ return this.isBool(obj) ?
+ obj : def || false;
+ },
+ isBool: function (obj) {
+ return typeof obj === "boolean";
+ },
+ isUndef: function(obj) {
+ return typeof obj === "undefined";
+ },
+ // evaluates "obj", if function, calls it with args
+ // (todo - update this to handle variable length/more than one arg)
+ ifFunction: function (obj, that, args) {
+ if ($.isFunction(obj)) {
+ obj.call(that, args);
+ }
+ },
+ size: function(image, raw) {
+ var u=$.mapster.utils;
+ return {
+ width: raw ? (image.width || image.naturalWidth) : u.imgWidth(image,true) ,
+ height: raw ? (image.height || image.naturalHeight) : u.imgHeight(image,true),
+ complete: function() { return !!this.height && !!this.width;}
+ };
+ },
+
+ // basic function to set the opacity of an element.
+ // this gets monkey patched by the graphics module when running in IE6-8
+
+ setOpacity: function (el, opacity) {
+ el.style.opacity = opacity;
+ },
+
+ // fade "el" from opacity "op" to "endOp" over a period of time "duration"
+
+ fader: (function () {
+ var elements = {},
+ lastKey = 0,
+ fade_func = function (el, op, endOp, duration) {
+ var index,
+ cbIntervals = duration/15,
+ obj, u = $.mapster.utils;
+
+ if (typeof el === 'number') {
+ obj = elements[el];
+ if (!obj) {
+ return;
+ }
+ } else {
+ index = u.indexOfProp(elements, null, el);
+ if (index) {
+ delete elements[index];
+ }
+ elements[++lastKey] = obj = el;
+ el = lastKey;
+ }
+
+ endOp = endOp || 1;
+
+ op = (op + (endOp / cbIntervals) > endOp - 0.01) ? endOp : op + (endOp / cbIntervals);
+
+ u.setOpacity(obj, op);
+ if (op < endOp) {
+ setTimeout(function () {
+ fade_func(el, op, endOp, duration);
+ }, 15);
+ }
+ };
+ return fade_func;
+ } ())
+ },
+ getBoundList: function (opts, key_list) {
+ if (!opts.boundList) {
+ return null;
+ }
+ var index, key, result = $(), list = $.mapster.utils.split(key_list);
+ opts.boundList.each(function (i,e) {
+ for (index = 0; index < list.length; index++) {
+ key = list[index];
+ if ($(e).is('[' + opts.listKey + '="' + key + '"]')) {
+ result = result.add(e);
+ }
+ }
+ });
+ return result;
+ },
+ // Causes changes to the bound list based on the user action (select or deselect)
+ // opts: the map options (listSelectedClass and listSelectedAttribute are used)
+ // target: the jQuery object of bound list elements to update
+ // selected: true to mark the elements as selected, false to mark them as deselected
+ setBoundListProperties: function (opts, target, selected) {
+ target.each(function (i,e) {
+ if (opts.listSelectedClass) {
+ if (selected) {
+ $(e).addClass(opts.listSelectedClass);
+ } else {
+ $(e).removeClass(opts.listSelectedClass);
+ }
+ }
+ if (opts.listSelectedAttribute) {
+ $(e).attr(opts.listSelectedAttribute, selected);
+ }
+ });
+ },
+ getMapDataIndex: function (obj) {
+ var img, id;
+ switch (obj.tagName && obj.tagName.toLowerCase()) {
+ case 'area':
+ id = $(obj).parent().attr('name');
+ img = $("img[usemap='#" + id + "']")[0];
+ break;
+ case 'img':
+ img = obj;
+ break;
+ }
+ return img ?
+ this.utils.indexOfProp(this.map_cache, 'image', img) : -1;
+ },
+ getMapData: function (obj) {
+ var index = this.getMapDataIndex(obj.length ? obj[0]:obj);
+ if (index >= 0) {
+ return this.map_cache[index];
+ }
+ },
+ queueCommand: function (map_data, that, command, args) {
+ if (!map_data) {
+ return false;
+ }
+ if (!map_data.complete || map_data.currentAction) {
+ map_data.commands.push(
+ {
+ that: that,
+ command: command,
+ args: args
+ });
+ return true;
+ }
+ return false;
+ },
+ unload: function () {
+ this.impl.unload();
+ this.utils = null;
+ this.impl = null;
+ $.fn.mapster = null;
+ $.mapster = null;
+ $('*').unbind();
+ }
+ };
+
+ // Config for object prototypes
+ // first: use only first object (for things that should not apply to lists)
+ // calls back one of two functions, depending on whether an area was obtained.
+ // opts: {
+ // name: 'method name',
+ // key: 'key',
+ // args: 'args'
+ //
+ //}
+ // name: name of method (required)
+ // args: arguments to re-call with
+ // Iterates through all the objects passed, and determines whether it's an area or an image, and calls the appropriate
+ // callback for each. If anything is returned from that callback, the process is stopped and that data is returned. Otherwise,
+ // the object itself is returned.
+
+ var m = $.mapster,
+ u = m.utils,
+ ap = Array.prototype;
+
+
+ // jQuery's width() and height() are broken on IE9 in some situations. This tries everything.
+ $.each(["width","height"],function(i,e) {
+ var capProp = e.substr(0,1).toUpperCase() + e.substr(1);
+ // when the jqwidth param is passed, it also checks the jQuery width()/height() property
+ // the issue is that jQuery width() can report a valid size before the image is loaded in some browsers
+ // without it, we can read zero even when image is loaded in other browsers if it's not visible
+ // we must still check because stuff like adblock can temporarily block it
+ // what a goddamn headache
+ u["img"+capProp]=function(img,jqwidth) {
+ return (jqwidth ? $(img)[e]() : 0) ||
+ img[e] || img["natural"+capProp] || img["client"+capProp] || img["offset"+capProp];
+ };
+
+ });
+
+ m.Method = function (that, func_map, func_area, opts) {
+ var me = this;
+ me.name = opts.name;
+ me.output = that;
+ me.input = that;
+ me.first = opts.first || false;
+ me.args = opts.args ? ap.slice.call(opts.args, 0) : [];
+ me.key = opts.key;
+ me.func_map = func_map;
+ me.func_area = func_area;
+ //$.extend(me, opts);
+ me.allowAsync = opts.allowAsync || false;
+ };
+ m.Method.prototype.go = function () {
+ var i, data, ar, len, result, src = this.input,
+ area_list = [],
+ me = this;
+
+ len = src.length;
+ for (i = 0; i < len; i++) {
+ data = $.mapster.getMapData(src[i]);
+ if (data) {
+ if (!me.allowAsync && m.queueCommand(data, me.input, me.name, me.args)) {
+ if (this.first) {
+ result = '';
+ }
+ continue;
+ }
+
+ ar = data.getData(src[i].nodeName === 'AREA' ? src[i] : this.key);
+ if (ar) {
+ if ($.inArray(ar, area_list) < 0) {
+ area_list.push(ar);
+ }
+ } else {
+ result = this.func_map.apply(data, me.args);
+ }
+ if (this.first || typeof result !== 'undefined') {
+ break;
+ }
+ }
+ }
+ // if there were areas, call the area function for each unique group
+ $(area_list).each(function (i,e) {
+ result = me.func_area.apply(e, me.args);
+ });
+
+ if (typeof result !== 'undefined') {
+ return result;
+ } else {
+ return this.output;
+ }
+ };
+
+
+ $.mapster.impl = (function () {
+ var me = {},
+ removeMap, addMap;
+
+ addMap = function (map_data) {
+ return m.map_cache.push(map_data) - 1;
+ };
+ removeMap = function (map_data) {
+ m.map_cache.splice(map_data.index, 1);
+ for (var i = m.map_cache.length - 1; i >= map_data.index; i--) {
+ m.map_cache[i].index--;
+ }
+ };
+ /// return current map_data for an image or area
+
+ // merge new area data into existing area options. used for rebinding.
+ function merge_areas(map_data, areas) {
+ var ar, index,
+ map_areas = map_data.options.areas;
+ if (areas) {
+ $.each(areas, function (i, e) {
+
+ // Issue #68 - ignore invalid data in areas array
+
+ if (!e || !e.key) {
+ return;
+ }
+
+ index = u.indexOfProp(map_areas, "key", e.key);
+ if (index >= 0) {
+ $.extend(map_areas[index], e);
+ }
+ else {
+ map_areas.push(e);
+ }
+ ar = map_data.getDataForKey(e.key);
+ if (ar) {
+ $.extend(ar.options, e);
+ }
+ });
+ }
+ }
+ function merge_options(map_data, options) {
+ var temp_opts = u.updateProps({}, options);
+ delete temp_opts.areas;
+
+ u.updateProps(map_data.options, temp_opts);
+
+ merge_areas(map_data, options.areas);
+ // refresh the area_option template
+ u.updateProps(map_data.area_options, map_data.options);
+ }
+ // Most methods use the "Method" object which handles figuring out whether it's an image or area called and
+ // parsing key parameters. The constructor wants:
+ // this, the jQuery object
+ // a function that is called when an image was passed (with a this context of the MapData)
+ // a function that is called when an area was passed (with a this context of the AreaData)
+ // options: first = true means only the first member of a jQuery object is handled
+ // key = the key parameters passed
+ // defaultReturn: a value to return other than the jQuery object (if it's not chainable)
+ // args: the arguments
+ // Returns a comma-separated list of user-selected areas. "staticState" areas are not considered selected for the purposes of this method.
+ me.get = function (key) {
+ var md = m.getMapData(this);
+ if (!(md && md.complete)) {
+ throw("Can't access data until binding complete.");
+ }
+
+ return (new m.Method(this,
+ function () {
+ // map_data return
+ return this.getSelected();
+ },
+ function () {
+ return this.isSelected();
+ },
+ { name: 'get',
+ args: arguments,
+ key: key,
+ first: true,
+ allowAsync: true,
+ defaultReturn: ''
+ }
+ )).go();
+ };
+ me.data = function (key) {
+ return (new m.Method(this,
+ null,
+ function () {
+ return this;
+ },
+ { name: 'data',
+ args: arguments,
+ key: key
+ }
+ )).go();
+ };
+
+
+ // Set or return highlight state.
+ // $(img).mapster('highlight') -- return highlighted area key, or null if none
+ // $(area).mapster('highlight') -- highlight an area
+ // $(img).mapster('highlight','area_key') -- highlight an area
+ // $(img).mapster('highlight',false) -- remove highlight
+ me.highlight = function (key) {
+ return (new m.Method(this,
+ function () {
+ if (key === false) {
+ this.ensureNoHighlight();
+ } else {
+ var id = this.highlightId;
+ return id >= 0 ? this.data[id].key : null;
+ }
+ },
+ function () {
+ this.highlight();
+ },
+ { name: 'highlight',
+ args: arguments,
+ key: key,
+ first: true
+ }
+ )).go();
+ };
+ // Return the primary keys for an area or group key.
+ // $(area).mapster('keys')
+ // $(img).mapster('keys','group-key')
+ // Pass true as the last argument to include all keys (not just primary keys):
+ // $(area).mapster('keys',true)
+ // $(img).mapster('keys','group-key', true)
+ me.keys = function(key,all) {
+ var keyList=[],
+ md = m.getMapData(this);
+
+ if (!(md && md.complete)) {
+ throw("Can't access data until binding complete.");
+ }
+
+
+ function addUniqueKeys(ad) {
+ var areas,keys=[];
+ if (!all) {
+ keys.push(ad.key);
+ } else {
+ areas=ad.areas();
+ $.each(areas,function(i,e) {
+ keys=keys.concat(e.keys);
+ });
+ }
+ $.each(keys,function(i,e) {
+ if ($.inArray(e,keyList)<0) {
+ keyList.push(e);
+ }
+ });
+ }
+
+ if (!(md && md.complete)) {
+ return '';
+ }
+ if (typeof key === 'string') {
+ if (all) {
+ addUniqueKeys(md.getDataForKey(key));
+ } else {
+ keyList=[md.getKeysForGroup(key)];
+ }
+ } else {
+ all = key;
+ this.each(function(i,e) {
+ if (e.nodeName==='AREA') {
+ addUniqueKeys(md.getDataForArea(e));
+ }
+ });
+ }
+ return keyList.join(',');
+
+
+ };
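+ // select/deselect delegate to set(); return its result so the calls stay chainable
+ me.select = function () {
+ return me.set.call(this, true);
+ };
+ me.deselect = function () {
+ return me.set.call(this, false);
+ };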
+
+ /**
+ * Select or unselect areas. Areas can be identified by a single string key, a comma-separated list of keys,
+ * or an array of strings.
+ *
+ *
+ * @param {boolean} selected Determines whether areas are selected or deselected
+ * @param {string|string[]} key A string, comma-separated string, or array of strings indicating
+ * the areas to select or deselect
+ * @param {object} options Rendering options to apply when selecting an area
+ */
+
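+ // Usage sketch (illustrative only; the keys 'AK' and 'HI' and the option values are hypothetical):
+ // $(img).mapster('set', true, 'AK,HI') -- select two areas by key
+ // $(img).mapster('set', false, ['AK','HI']) -- deselect the same areas, passing an array
+ // $(img).mapster('set', true, 'AK', { fillColor: 'ff0000' }) -- select with rendering options
+ // $(area).mapster('set', true, { fillColor: 'ff0000' }) -- for an area, the 2nd argument is the options
+ // $(area).mapster('set') -- with no boolean "selected" value, the area is toggled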
+ me.set = function (selected, key, options) {
+ var lastMap, map_data, opts=options,
+ key_list, area_list; // array of unique areas passed
+
+ function setSelection(ar) {
+ if (ar) {
+ switch (selected) {
+ case true:
+ ar.select(opts); break;
+ case false:
+ ar.deselect(true); break;
+ default:
+ ar.toggle(opts); break;
+ }
+ }
+ }
+ function addArea(ar) {
+ if (ar && $.inArray(ar, area_list) < 0) {
+ area_list.push(ar);
+ key_list+=(key_list===''?'':',')+ar.key;
+ }
+ }
+ // Clean up after a group that applied to the same map
+ function finishSetForMap(map_data) {
+ $.each(area_list, function (i, el) {
+ setSelection(el);
+ });
+ if (!selected) {
+ map_data.removeSelectionFinish();
+ }
+ if (map_data.options.boundList) {
+ m.setBoundListProperties(map_data.options, m.getBoundList(map_data.options, key_list), selected);
+ }
+ }
+
+ this.filter('img,area').each(function (i,e) {
+ var keys;
+ map_data = m.getMapData(e);
+
+ if (map_data !== lastMap) {
+ if (lastMap) {
+ finishSetForMap(lastMap);
+ }
+
+ area_list = [];
+ key_list='';
+ }
+
+ if (map_data) {
+
+ keys = '';
+ if (e.nodeName.toUpperCase()==='IMG') {
+ if (!m.queueCommand(map_data, $(e), 'set', [selected, key, opts])) {
+ if (key instanceof Array) {
+ if (key.length) {
+ keys = key.join(",");
+ }
+ }
+ else {
+ keys = key;
+ }
+
+ if (keys) {
+ $.each(u.split(keys), function (i,key) {
+ addArea(map_data.getDataForKey(key.toString()));
+ lastMap = map_data;
+ });
+ }
+ }
+ } else {
+ opts=key;
+ if (!m.queueCommand(map_data, $(e), 'set', [selected, opts])) {
+ addArea(map_data.getDataForArea(e));
+ lastMap = map_data;
+ }
+
+ }
+ }
+ });
+
+ if (map_data) {
+ finishSetForMap(map_data);
+ }
+
+
+ return this;
+ };
+ me.unbind = function (preserveState) {
+ return (new m.Method(this,
+ function () {
+ this.clearEvents();
+ this.clearMapData(preserveState);
+ removeMap(this);
+ },
+ null,
+ { name: 'unbind',
+ args: arguments
+ }
+ )).go();
+ };
+
+
+ // refresh options and update selection information.
+ me.rebind = function (options) {
+ return (new m.Method(this,
+ function () {
+ var me=this;
+
+ me.complete=false;
+ me.configureOptions(options);
+ me.bindImages().then(function() {
+ me.buildDataset(true);
+ me.complete=true;
+ });
+ //this.redrawSelections();
+ },
+ null,
+ {
+ name: 'rebind',
+ args: arguments
+ }
+ )).go();
+ };
+ // get options. nothing or false to get, or "true" to get effective options (versus passed options)
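+ // Usage sketch (illustrative only; 'some-key' is a hypothetical area key):
+ // $(img).mapster('get_options') -- the map's options
+ // $(img).mapster('get_options', true) -- the map's options with render_select/render_highlight defaults filled in
+ // $(img).mapster('get_options', 'some-key', true) -- effective options for one area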
+ me.get_options = function (key, effective) {
+ var eff = u.isBool(key) ? key : effective; // allow 2nd parm as "effective" when no key
+ return (new m.Method(this,
+ function () {
+ var opts = $.extend({}, this.options);
+ if (eff) {
+ opts.render_select = u.updateProps(
+ {},
+ m.render_defaults,
+ opts,
+ opts.render_select);
+
+ opts.render_highlight = u.updateProps(
+ {},
+ m.render_defaults,
+ opts,
+ opts.render_highlight);
+ }
+ return opts;
+ },
+ function () {
+ return eff ? this.effectiveOptions() : this.options;
+ },
+ {
+ name: 'get_options',
+ args: arguments,
+ first: true,
+ allowAsync: true,
+ key: key
+ }
+ )).go();
+ };
+
+ // set options - pass an object with the options to set
+ me.set_options = function (options) {
+ return (new m.Method(this,
+ function () {
+ merge_options(this, options);
+ },
+ null,
+ {
+ name: 'set_options',
+ args: arguments
+ }
+ )).go();
+ };
+ me.unload = function () {
+ var i;
+ for (i = m.map_cache.length - 1; i >= 0; i--) {
+ if (m.map_cache[i]) {
+ me.unbind.call($(m.map_cache[i].image));
+ }
+ }
+ me.graphics = null;
+ };
+
+ me.snapshot = function () {
+ return (new m.Method(this,
+ function () {
+ $.each(this.data, function (i, e) {
+ e.selected = false;
+ });
+
+ this.base_canvas = this.graphics.createVisibleCanvas(this);
+ $(this.image).before(this.base_canvas);
+ },
+ null,
+ { name: 'snapshot' }
+ )).go();
+ };
+
+ // do not queue this function
+
+ me.state = function () {
+ var md, result = null;
+ $(this).each(function (i,e) {
+ if (e.nodeName === 'IMG') {
+ md = m.getMapData(e);
+ if (md) {
+ result = md.state();
+ }
+ return false;
+ }
+ });
+ return result;
+ };
+
+ me.bind = function (options) {
+
+ return this.each(function (i,e) {
+ var img, map, usemap, md;
+
+ // save ref to this image even if we can't access it yet. commands will be queued
+ img = $(e);
+
+ md = m.getMapData(e);
+
+ // if already bound completely, do a total rebind
+
+ if (md) {
+ me.unbind.apply(img);
+ if (!md.complete) {
+ // will be queued
+ img.bind();
+ return true;
+ }
+ md = null;
+ }
+
+ // ensure it's a valid image
+ // jQuery bug with Opera, results in full-url#usemap being returned from jQuery's attr.
+ // So use raw getAttribute instead.
+
+ usemap = this.getAttribute('usemap');
+ map = usemap && $('map[name="' + usemap.substr(1) + '"]');
+ if (!(img.is('img') && usemap && map.length > 0)) {
+ return true;
+ }
+
+ // sorry - your image must have border:0, things are too unpredictable otherwise.
+ img.css('border', 0);
+
+ if (!md) {
+ md = new m.MapData(this, options);
+
+ md.index = addMap(md);
+ md.map = map;
+ md.bindImages().then(function() {
+ md.initialize();
+ });
+ }
+ });
+ };
+
+ me.init = function (useCanvas) {
+ var style, shapes;
+
+
+ // check for excanvas explicitly - don't be fooled
+ m.hasCanvas = (document.namespaces && document.namespaces.g_vml_) ? false :
+ $('<canvas></canvas>')[0].getContext ? true : false;
+
+ m.isTouch = 'ontouchstart' in document.documentElement;
+
+ if (!(m.hasCanvas || document.namespaces)) {
+ $.fn.mapster = function () {
+ return this;
+ };
+ return;
+ }
+
+ $.extend(m.defaults, m.render_defaults,m.shared_defaults);
+ $.extend(m.area_defaults, m.render_defaults,m.shared_defaults);
+
+ // for testing/debugging, use of canvas can be forced by initializing manually with "true" or "false"
+ if (u.isBool(useCanvas)) {
+ m.hasCanvas = useCanvas;
+ }
+ if ($.browser.msie && !m.hasCanvas && !document.namespaces.v) {
+ document.namespaces.add("v", "urn:schemas-microsoft-com:vml");
+ style = document.createStyleSheet();
+ shapes = ['shape', 'rect', 'oval', 'circ', 'fill', 'stroke', 'imagedata', 'group', 'textbox'];
+ $.each(shapes,
+ function (i, el) {
+ style.addRule('v\\:' + el, "behavior: url(#default#VML); antialias:true");
+ });
+ }
+ };
+ me.test = function (obj) {
+ return eval(obj);
+ };
+ return me;
+ } ());
+
+ $.mapster.impl.init();
+} (jQuery));
+/* graphics.js
+ Graphics object handles all rendering.
+*/
+(function ($) {
+ var p, m=$.mapster,
+ u=m.utils;
+
+ /**
+ * Implementation to add each area in an AreaData object to the canvas
+ * @param {Graphics} graphics The target graphics object
+ * @param {AreaData} areaData The AreaData object (a collection of area elements and metadata)
+ * @param {object} options Rendering options to apply when rendering this group of areas
+ */
+ function addShapeGroupImpl(graphics, areaData, options) {
+ var me = graphics,
+ md = me.map_data,
+ isMask = options.isMask;
+
+ // first get area options. Then override fade for selecting, and finally merge in the
+ // "select" effect options.
+
+ $.each(areaData.areas(), function (i,e) {
+ options.isMask = isMask || (e.nohref && md.options.noHrefIsMask);
+ me.addShape(e, options);
+ });
+
+ // it's faster just to manipulate the passed options isMask property and restore it, than to
+ // copy the object each time
+
+ options.isMask=isMask;
+
+ }
+
+
+ /**
+ * An object associated with a particular map_data instance to manage rendering.
+ * @param {MapData} map_data The MapData object bound to this instance
+ */
+
+ m.Graphics = function (map_data) {
+ //$(window).unload($.mapster.unload);
+ // create graphics functions for canvas and vml browsers. usage:
+ // 1) init with map_data, 2) call begin with the canvas to be used (these are separate b/c init may not require a canvas to be specified),
+ // 3) call addShape for each shape or mask, 4) call render() to finish
+
+ var me = this;
+ me.active = false;
+ me.canvas = null;
+ me.width = 0;
+ me.height = 0;
+ me.shapes = [];
+ me.masks = [];
+ me.map_data = map_data;
+ };
+
+ p = m.Graphics.prototype= {
+ constructor: m.Graphics,
+
+ /**
+ * Initiate a graphics request for a canvas
+ * @param {Element} canvas The canvas element that is the target of this operation
+ * @param {string} [elementName] The name to assign to the element (VML only)
+ */
+
+ begin: function(canvas, elementName) {
+ var c = $(canvas);
+
+ this.elementName = elementName;
+ this.canvas = canvas;
+
+ this.width = c.width();
+ this.height = c.height();
+ this.shapes = [];
+ this.masks = [];
+ this.active = true;
+
+ },
+
+ /**
+ * Add an area to be rendered to this canvas.
+ * @param {MapArea} mapArea The MapArea object to render
+ * @param {object} options An object containing any rendering options that should override the
+ * defaults for the area
+ */
+
+ addShape: function(mapArea, options) {
+ var addto = options.isMask ? this.masks : this.shapes;
+ addto.push({ mapArea: mapArea, options: options });
+ },
+
+ /**
+ * Create a canvas that is sized and styled for the MapData object
+ * @param {MapData} mapData The MapData object that will receive this new canvas
+ * @return {Element} A canvas element
+ */
+
+ createVisibleCanvas: function (mapData) {
+ return $(this.createCanvasFor(mapData))
+ .addClass('mapster_el')
+ .css(m.canvas_style)[0];
+ },
+
+ /**
+ * Add a group of shapes from an AreaData object to the canvas
+ *
+ * @param {AreaData} areaData An AreaData object (a set of area elements)
+ * @param {string} mode The rendering mode, "select" or "highlight". This determines the target
+ * canvas and which default options to use.
+ * @param {object} options Rendering options
+ */
+
+ addShapeGroup: function (areaData, mode,options) {
+ // render includeKeys first - because they could be masks
+ var me = this,
+ list, name, canvas,
+ map_data = this.map_data,
+ opts = areaData.effectiveRenderOptions(mode);
+
+ if (options) {
+ $.extend(opts,options);
+ }
+
+ if (mode === 'select') {
+ name = "static_" + areaData.areaId.toString();
+ canvas = map_data.base_canvas;
+ } else {
+ canvas = map_data.overlay_canvas;
+ }
+
+ me.begin(canvas, name);
+
+ if (opts.includeKeys) {
+ list = u.split(opts.includeKeys);
+ $.each(list, function (i,e) {
+ var areaData = map_data.getDataForKey(e.toString());
+ addShapeGroupImpl(me,areaData, areaData.effectiveRenderOptions(mode));
+ });
+ }
+
+ addShapeGroupImpl(me,areaData, opts);
+ me.render();
+ if (opts.fade) {
+
+ // fading requires special handling for IE. We must access the fill elements directly. The fader also has to deal with
+ // the "opacity" attribute (not css)
+
+ u.fader(m.hasCanvas ?
+ canvas :
+ $(canvas).find('._fill').not('.mapster_mask'),
+ 0,
+ m.hasCanvas ?
+ 1 :
+ opts.fillOpacity,
+ opts.fadeDuration);
+
+ }
+
+ }
+ };
+
+ // configure remaining prototype methods for ie or canvas-supporting browser
+
+ if (m.hasCanvas) {
+
+ /**
+ * Convert a hex value to decimal
+ * @param {string} hex A hexadecimal string
+ * @return {int} Integer representation of the hex string, clamped to the range 0-255
+ */
+
+ p.hex_to_decimal = function (hex) {
+ return Math.max(0, Math.min(parseInt(hex, 16), 255));
+ };
+
+ p.css3color = function (color, opacity) {
+ return 'rgba(' + this.hex_to_decimal(color.substr(0, 2)) + ','
+ + this.hex_to_decimal(color.substr(2, 2)) + ','
+ + this.hex_to_decimal(color.substr(4, 2)) + ',' + opacity + ')';
+ };
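+ // Example (a worked instance of the conversion above; color strings are 6 hex digits without '#'):
+ // css3color('ff8000', 0.5) returns 'rgba(255,128,0,0.5)'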
+
+ p.renderShape = function (context, mapArea, offset) {
+ var i,
+ c = mapArea.coords(null,offset);
+
+ switch (mapArea.shape) {
+ case 'rect':
+ context.rect(c[0], c[1], c[2] - c[0], c[3] - c[1]);
+ break;
+ case 'poly':
+ context.moveTo(c[0], c[1]);
+
+ for (i = 2; i < mapArea.length; i += 2) {
+ context.lineTo(c[i], c[i + 1]);
+ }
+ context.lineTo(c[0], c[1]);
+ break;
+ case 'circ':
+ case 'circle':
+ context.arc(c[0], c[1], c[2], 0, Math.PI * 2, false);
+ break;
+ }
+ };
+
+ p.addAltImage = function (context, image, mapArea, options) {
+ context.beginPath();
+
+ this.renderShape(context, mapArea);
+ context.closePath();
+ context.clip();
+
+ context.globalAlpha = options.altImageOpacity || options.fillOpacity;
+
+ context.drawImage(image, 0, 0, mapArea.owner.scaleInfo.width, mapArea.owner.scaleInfo.height);
+ };
+
+ p.render = function () {
+ // firefox 6.0 context.save() seems to be broken. to work around, we have to draw the contents on one temp canvas,
+ // the mask on another, and merge everything. ugh. fixed in 1.2.2. unfortunately this is a lot more code for masks,
+ // but no other way around it that i can see.
+
+ var maskCanvas, maskContext,
+ me = this,
+ md = me.map_data,
+ hasMasks = me.masks.length,
+ shapeCanvas = me.createCanvasFor(md),
+ shapeContext = shapeCanvas.getContext('2d'),
+ context = me.canvas.getContext('2d');
+
+ if (hasMasks) {
+ maskCanvas = me.createCanvasFor(md);
+ maskContext = maskCanvas.getContext('2d');
+ maskContext.clearRect(0, 0, maskCanvas.width, maskCanvas.height);
+
+ $.each(me.masks, function (i,e) {
+ maskContext.save();
+ maskContext.beginPath();
+ me.renderShape(maskContext, e.mapArea);
+ maskContext.closePath();
+ maskContext.clip();
+ maskContext.lineWidth = 0;
+ maskContext.fillStyle = '#000';
+ maskContext.fill();
+ maskContext.restore();
+ });
+
+ }
+
+ $.each(me.shapes, function (i,s) {
+ shapeContext.save();
+ if (s.options.fill) {
+ if (s.options.altImageId) {
+ me.addAltImage(shapeContext, md.images[s.options.altImageId], s.mapArea, s.options);
+ } else {
+ shapeContext.beginPath();
+ me.renderShape(shapeContext, s.mapArea);
+ shapeContext.closePath();
+ //shapeContext.clip();
+ shapeContext.fillStyle = me.css3color(s.options.fillColor, s.options.fillOpacity);
+ shapeContext.fill();
+ }
+ }
+ shapeContext.restore();
+ });
+
+
+ // render strokes at end since masks get stroked too
+
+ $.each(me.shapes.concat(me.masks), function (i,s) {
+ var offset = s.options.strokeWidth === 1 ? 0.5 : 0;
+ // offset applies only when stroke width is 1 and stroke would render between pixels.
+
+ if (s.options.stroke) {
+ shapeContext.save();
+ shapeContext.strokeStyle = me.css3color(s.options.strokeColor, s.options.strokeOpacity);
+ shapeContext.lineWidth = s.options.strokeWidth;
+
+ shapeContext.beginPath();
+
+ me.renderShape(shapeContext, s.mapArea, offset);
+ shapeContext.closePath();
+ shapeContext.stroke();
+ shapeContext.restore();
+ }
+ });
+
+ if (hasMasks) {
+ // render the new shapes against the mask
+
+ maskContext.globalCompositeOperation = "source-out";
+ maskContext.drawImage(shapeCanvas, 0, 0);
+
+ // flatten into the main canvas
+ context.drawImage(maskCanvas, 0, 0);
+ } else {
+ context.drawImage(shapeCanvas, 0, 0);
+ }
+
+ me.active = false;
+ return me.canvas;
+ };
+
+ // create a canvas mimicking dimensions of an existing element
+ p.createCanvasFor = function (md) {
+ return $('<canvas width="' + md.scaleInfo.width + '" height="' + md.scaleInfo.height + '"></canvas>')[0];
+ };
+ p.clearHighlight = function () {
+ var c = this.map_data.overlay_canvas;
+ c.getContext('2d').clearRect(0, 0, c.width, c.height);
+ };
+ p.removeSelections = function () {
+
+ };
+ // Draw all items from selected_list to a new canvas, then swap with the old one. This is used to delete items when using canvases.
+ p.refreshSelections = function () {
+ var canvas_temp, map_data = this.map_data;
+ // draw new base canvas, then swap with the old one to avoid flickering
+ canvas_temp = map_data.base_canvas;
+
+ map_data.base_canvas = this.createVisibleCanvas(map_data);
+ $(map_data.base_canvas).hide();
+ $(canvas_temp).before(map_data.base_canvas);
+
+ map_data.redrawSelections();
+
+ $(map_data.base_canvas).show();
+ $(canvas_temp).remove();
+ };
+
+ } else {
+
+ /**
+ * Set the opacity of the element. This is an IE<8 specific function for handling VML.
+ * When using VML we must override the "setOpacity" utility function (monkey patch ourselves).
+ * jQuery does not deal with opacity correctly for VML elements. This deals with that.
+ *
+ * @param {Element} el The DOM element
+ * @param {double} opacity A value between 0 and 1 inclusive.
+ */
+
+ u.setOpacity = function(el,opacity) {
+ $(el).each(function(i,e) {
+ if (typeof e.opacity !=='undefined') {
+ e.opacity=opacity;
+ } else {
+ $(e).css("opacity",opacity);
+ }
+ });
+ };
+
+ p.renderShape = function (mapArea, options, cssclass) {
+ var me = this, fill,stroke, e, t_fill, el_name, el_class, template, c = mapArea.coords();
+ el_name = me.elementName ? 'name="' + me.elementName + '" ' : '';
+ el_class = cssclass ? 'class="' + cssclass + '" ' : '';
+
+ t_fill = '';
+
+
+ stroke = options.stroke ?
+ ' strokeweight=' + options.strokeWidth + ' stroked="t" strokecolor="#' +
+ options.strokeColor + '"' :
+ ' stroked="f"';
+
+ fill = options.fill ?
+ ' filled="t"' :
+ ' filled="f"';
+
+ switch (mapArea.shape) {
+ case 'rect':
+ template = '' + t_fill + '';
+ break;
+ case 'poly':
+ template = '' + t_fill + '';
+ break;
+ case 'circ':
+ case 'circle':
+ template = '' + t_fill + '';
+ break;
+ }
+ e = $(template);
+ $(me.canvas).append(e);
+
+ return e;
+ };
+ p.render = function () {
+ var opts, me = this;
+
+ $.each(this.shapes, function (i,e) {
+ me.renderShape(e.mapArea, e.options);
+ });
+
+ if (this.masks.length) {
+ $.each(this.masks, function (i,e) {
+ opts = u.updateProps({},
+ e.options, {
+ fillOpacity: 1,
+ fillColor: e.options.fillColorMask
+ });
+ me.renderShape(e.mapArea, opts, 'mapster_mask');
+ });
+ }
+
+ this.active = false;
+ return this.canvas;
+ };
+
+ p.createCanvasFor = function (md) {
+ var w = md.scaleInfo.width,
+ h = md.scaleInfo.height;
+ return $('')[0];
+ };
+
+ p.clearHighlight = function () {
+ $(this.map_data.overlay_canvas).children().remove();
+ };
+ // remove single or all selections
+ p.removeSelections = function (area_id) {
+ if (area_id >= 0) {
+ $(this.map_data.base_canvas).find('[name="static_' + area_id.toString() + '"]').remove();
+ }
+ else {
+ $(this.map_data.base_canvas).children().remove();
+ }
+ };
+ p.refreshSelections = function () {
+ return null;
+ };
+
+ }
+
+} (jQuery));
+/* mapimage.js
+ the MapImages object; manages the images used by a single bound imagemap
+*/
+
+(function ($) {
+
+ var m = $.mapster,
+ u = m.utils,
+ ap=[];
+ /**
+ * An object encapsulating all the images used by a MapData.
+ */
+
+ m.MapImages = function(owner) {
+ this.owner = owner;
+ this.clear();
+ };
+
+
+ m.MapImages.prototype = {
+ constructor: m.MapImages,
+
+ /* interface to make this array-like */
+
+ slice: function() {
+ return ap.slice.apply(this,arguments);
+ },
+ splice: function() {
+ // use Array.prototype.splice (not slice) so the entries are actually removed
+ ap.splice.apply(this.status,arguments);
+ var result= ap.splice.apply(this,arguments);
+ return result;
+ },
+
+ /**
+ * Indicates whether all images are done loading
+ * @return {bool} true when all are done
+ */
+ complete: function() {
+ return $.inArray(false, this.status) < 0;
+ },
+
+ /**
+ * Save an image in the images array and return its index
+ * @param {Image} image An Image object
+ * @return {int} the index of the image
+ */
+
+ _add: function(image) {
+ var index = ap.push.call(this,image)-1;
+ this.status[index] = false;
+ return index;
+ },
+
+ /**
+ * Return the index of an Image within the images array
+ * @param {Image} image An Image
+ * @return {int} the index within the array, or -1 if it was not found
+ */
+
+ indexOf: function(image) {
+ return $.inArray(image, this);
+ },
+
+ /**
+ * Clear this object and reset it to its initial state after binding.
+ */
+
+ clear: function() {
+ var me=this;
+
+ if (me.ids && me.ids.length>0) {
+ $.each(me.ids,function(i,e) {
+ delete me[e];
+ });
+ }
+
+ /**
+ * A list of the cross-reference IDs bound to this object
+ * @type {string[]}
+ */
+
+ me.ids=[];
+
+ /**
+ * Length property for array-like behavior, set to zero when initializing. Array prototype
+ * methods will update it after that.
+ *
+ * @type {int}
+ */
+
+ me.length=0;
+
+ /**
+ * the loaded status of the corresponding image
+ * @type {boolean[]}
+ */
+
+ me.status=[];
+
+
+ // actually erase the images
+
+ me.splice(0);
+
+ },
+
+ /**
+ * Bind an image to the map and add it to the queue to be loaded; return an ID that
+ * can be used to reference the image.
+ *
+ * @param {Image|string} image An Image object or a URL to an image
+ * @param {string} [id] An id to refer to this image
+ * @returns {int} an ID referencing the index of the image object in
+ * map_data.images
+ */
+
+ add: function(image,id) {
+ var index,src,me = this;
+
+ if (!image) { return; }
+
+ if (typeof image === 'string') {
+ src = image;
+ image = me[src];
+ if (typeof image==='object') {
+ return me.indexOf(image);
+ }
+
+ image = $('<img />')
+ .addClass('mapster_el')
+ .hide();
+
+ index=me._add(image[0]);
+
+ image
+ .bind('load',function(e) {
+ me.imageLoaded.call(me,e);
+ })
+ .bind('error',function(e) {
+ me.imageLoadError.call(me,e);
+ });
+
+ image.attr('src', src);
+ } else {
+
+ // use attr because we want the actual source, not the resolved path the browser will return directly calling image.src
+
+ index=me._add($(image)[0]);
+ }
+ if (id) {
+ if (this[id]) {
+ throw(id+" is already used or is not available as an altImage alias.");
+ }
+ me.ids.push(id);
+ me[id]=me[index];
+ }
+ return index;
+ },
+
+ /**
+ * Bind the images in this object.
+ * @param {boolean} retry when true, indicates that the function is calling itself after failure
+ * @return {Promise} a promise that resolves when the images have finished loading
+ */
+
+ bind: function(retry) {
+ var me = this,
+ promise,
+ triesLeft = me.owner.options.configTimeout / 200,
+
+ /* A recursive function to continue checking that the images have been
+ loaded until a timeout has elapsed */
+
+ check=function() {
+ var i;
+
+ // refresh status of images
+
+ i=me.length;
+
+ while (i-->0) {
+ if (!me.isLoaded(i)) {
+ break;
+ }
+ }
+
+ // check to see if every image has already been loaded
+
+ if (me.complete()) {
+ me.resolve();
+ } else {
+ // to account for failure of onLoad to fire in rare situations
+ if (triesLeft-- > 0) {
+ me.imgTimeout=window.setTimeout(function() {
+ check.call(me,true);
+ }, 50);
+ } else {
+ me.imageLoadError.call(me);
+ }
+ }
+
+ };
+
+ promise = me.deferred=u.defer();
+
+ check();
+ return promise;
+ },
+
+ resolve: function() {
+ var me=this,
+ resolver=me.deferred;
+
+ if (resolver) {
+ // Make a copy of the resolver before calling & removing it to ensure
+ // it is not called twice
+ me.deferred=null;
+ resolver.resolve();
+ }
+ },
+
+ /**
+ * Event handler for image onload
+ * @param {object} e jQuery event data
+ */
+
+ imageLoaded: function(e) {
+ var me=this,
+ index = me.indexOf(e.target);
+
+ if (index>=0) {
+
+ me.status[index] = true;
+ if ($.inArray(false, me.status) < 0) {
+ me.resolve();
+ }
+ }
+ },
+
+ /**
+ * Event handler for onload error
+ * @param {object} e jQuery event data
+ */
+
+ imageLoadError: function(e) {
+ clearTimeout(this.imgTimeout);
+ this.triesLeft=0;
+ var err = e ? 'The image ' + e.target.src + ' failed to load.' :
+ 'The images never seemed to finish loading. You may just need to increase the configTimeout if images could take a long time to load.';
+ throw err;
+ },
+ /**
+ * Test if the image at the specified index has finished loading
+ * @param {int} index The image index
+ * @return {boolean} true if loaded, false if not
+ */
+
+ isLoaded: function(index) {
+ var img,
+ me=this,
+ status=me.status;
+
+ if (status[index]) { return true; }
+ img = me[index];
+
+ if (typeof img.complete !== 'undefined') {
+ status[index]=img.complete;
+ } else {
+ status[index]=!!u.imgWidth(img);
+ }
+ // if complete passes, the image is loaded, but may STILL not be available because of stuff like adblock.
+ // make sure it is.
+
+ return status[index];
+ }
+ };
+ } (jQuery));
+/* mapdata.js
+ the MapData object, represents an instance of a single bound imagemap
+*/
+
+
+(function ($) {
+
+ var m = $.mapster,
+ u = m.utils;
+
+ /**
+ * Set default values for MapData object properties
+ * @param {MapData} me The MapData object
+ */
+
+ function initializeDefaults(me) {
+ $.extend(me,{
+ complete: false, // (bool) when configuration is complete
+ map: null, // ($) the image map
+ base_canvas: null, // (canvas|var) where selections are rendered
+ overlay_canvas: null, // (canvas|var) where highlights are rendered
+ commands: [], // {} commands that were run before configuration was completed (b/c images weren't loaded)
+ data: [], // AreaData[] area groups
+ mapAreas: [], // MapArea[] list. AreaData entities contain refs to this array, so options are stored with each.
+ _xref: {}, // (int) xref of mapKeys to data[]
+ highlightId: -1, // (int) the currently highlighted element.
+ currentAreaId: -1,
+ _tooltip_events: [], // {} info on events we bound to a tooltip container, so we can properly unbind them
+ scaleInfo: null, // {} info about the image size, scaling, defaults
+ index: -1, // index of this in map_cache - so we have an ID to use for wrapper div
+ activeAreaEvent: null
+ });
+ }
+
+ /**
+ * Return an array of all image-containing options from an options object;
+ * that is, containers that may have an "altImage" property
+ *
+ * @param {object} obj An options object
+ * @return {object[]} An array of objects
+ */
+ function getOptionImages(obj) {
+ return [obj, obj.render_highlight, obj.render_select];
+ }
+
+ /**
+ * Parse all the altImage references, adding them to the library so they can be preloaded
+ * and aliased.
+ *
+ * @param {MapData} me The MapData object on which to operate
+ */
+ function configureAltImages(me)
+ {
+ var opts = me.options,
+ mi = me.images;
+
+ // add alt images
+
+ if ($.mapster.hasCanvas) {
+ // map altImage library first
+
+ $.each(opts.altImages || {}, function(i,e) {
+ mi.add(e,i);
+ });
+
+ // now find everything else
+
+ $.each([opts].concat(opts.areas),function(i,e) {
+ $.each(getOptionImages(e),function(i2,e2) {
+ if (e2 && e2.altImage) {
+ e2.altImageId=mi.add(e2.altImage);
+ }
+ });
+ });
+ }
+
+ // set area_options
+ me.area_options = u.updateProps({}, // default options for any MapArea
+ m.area_defaults,
+ opts);
+ }
+
+ /**
+ * Queue a mouse move action based on current delay settings
+ * (helper for mouseover/mouseout handlers)
+ *
+ * @param {MapData} me The MapData context
+ * @param {number} delay The number of milliseconds to delay the action
+ * @param {AreaData} area AreaData affected
+ * @param {Deferred} deferred A deferred object to return (instead of a new one)
+ * @return {Promise} A promise that resolves when the action is completed
+ */
+ function queueMouseEvent(me,delay,area, deferred) {
+
+ deferred = deferred || u.when.defer();
+
+ function cbFinal(areaId) {
+ if (me.currentAreaId!==areaId && me.highlightId>=0) {
+ deferred.resolve();
+ }
+ }
+ if (me.activeAreaEvent) {
+ window.clearTimeout(me.activeAreaEvent);
+ me.activeAreaEvent=0;
+ }
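+ if (delay<0) {
+ // a negative delay means the action never fires; return the unresolved deferred so callers can still chain on it
+ return deferred;
+ }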
+
+ if (area.owner.currentAction || delay) {
+ me.activeAreaEvent = window.setTimeout((function() {
+ return function() {
+ queueMouseEvent(me,0,area,deferred);
+ };
+ }(area)),
+ delay || 100);
+ } else {
+ cbFinal(area.areaId);
+ }
+ return deferred;
+ }
+
+ /**
+ * Mousedown event. This is captured only to prevent browser from drawing an outline around an
+ * area when it's clicked.
+ *
+ * @param {EventData} e jQuery event data
+ */
+
+ function mousedown(e) {
+ if (!$.mapster.hasCanvas) {
+ this.blur();
+ }
+ e.preventDefault();
+ }
+
+ /**
+ * Mouseover event. Handle highlight rendering and client callback on mouseover
+ *
+ * @param {MapData} me The MapData context
+ * @param {EventData} e jQuery event data
+ */
+
+ function mouseover(me,e) {
+ var arData = me.getAllDataForArea(this),
+ ar=arData.length ? arData[0] : null;
+
+ // mouseover events are ignored entirely while resizing, though we do care about mouseout events
+ // and must queue the action to keep things clean.
+
+ if (!ar || ar.isNotRendered() || ar.owner.currentAction) {
+ return;
+ }
+
+ if (me.currentAreaId === ar.areaId) {
+ return;
+ }
+ if (me.highlightId !== ar.areaId) {
+ me.clearEffects();
+
+ ar.highlight();
+
+ if (me.options.showToolTip) {
+ $.each(arData,function(i,e) {
+ if (e.effectiveOptions().toolTip) {
+ e.showToolTip();
+ }
+ });
+ }
+ }
+
+ me.currentAreaId = ar.areaId;
+
+ if ($.isFunction(me.options.onMouseover)) {
+ me.options.onMouseover.call(this,
+ {
+ e: e,
+ options:ar.effectiveOptions(),
+ key: ar.key,
+ selected: ar.isSelected()
+ });
+ }
+ }
+
+ /**
+ * Mouseout event.
+ *
+ * @param {MapData} me The MapData context
+ * @param {EventData} e jQuery event data
+ */
+
+ function mouseout(me,e) {
+ var newArea,
+ ar = me.getDataForArea(this),
+ opts = me.options;
+
+
+ if (me.currentAreaId<0 || !ar) {
+ return;
+ }
+
+ newArea=me.getDataForArea(e.relatedTarget);
+
+ if (newArea === ar) {
+ return;
+ }
+
+ me.currentAreaId = -1;
+ ar.area=null;
+
+ queueMouseEvent(me,opts.mouseoutDelay,ar)
+ .then(me.clearEffects);
+
+ if ($.isFunction(opts.onMouseout)) {
+ opts.onMouseout.call(this,
+ {
+ e: e,
+ options: opts,
+ key: ar.key,
+ selected: ar.isSelected()
+ });
+ }
+
+ }
+
+ /**
+ * Clear any active tooltip or highlight
+ *
+ * @param {MapData} me The MapData context
+ */
+
+ function clearEffects(me) {
+ var opts = me.options;
+
+ me.ensureNoHighlight();
+
+ if (opts.toolTipClose
+ && $.inArray('area-mouseout', opts.toolTipClose) >= 0
+ && me.activeToolTip)
+ {
+ me.clearToolTip();
+ }
+ }
+
+ /**
+ * Mouse click event handler
+ *
+ * @param {MapData} me The MapData context
+ * @param {EventData} e jQuery event data
+ */
+
+ function click(me,e) {
+ var selected, list, list_target, newSelectionState, canChangeState, cbResult,
+ that = this,
+ ar = me.getDataForArea(this),
+ opts = me.options;
+
+ function clickArea(ar) {
+ var areaOpts,target;
+ canChangeState = (ar.isSelectable() &&
+ (ar.isDeselectable() || !ar.isSelected()));
+
+ if (canChangeState) {
+ newSelectionState = !ar.isSelected();
+ } else {
+ newSelectionState = ar.isSelected();
+ }
+
+ list_target = m.getBoundList(opts, ar.key);
+
+ if ($.isFunction(opts.onClick))
+ {
+ cbResult= opts.onClick.call(that,
+ {
+ e: e,
+ listTarget: list_target,
+ key: ar.key,
+ selected: newSelectionState
+ });
+
+ if (u.isBool(cbResult)) {
+ if (!cbResult) {
+ return false;
+ }
+ target = $(ar.area).attr('href');
+ if (target!=='#') {
+ window.location.href=target;
+ return false;
+ }
+ }
+ }
+
+ if (canChangeState) {
+ selected = ar.toggle();
+ }
+
+ if (opts.boundList && opts.boundList.length > 0) {
+ m.setBoundListProperties(opts, list_target, ar.isSelected());
+ }
+
+ areaOpts = ar.effectiveOptions();
+ if (areaOpts.includeKeys) {
+ list = u.split(areaOpts.includeKeys);
+ $.each(list, function (i, e) {
+ var ar = me.getDataForKey(e.toString());
+ if (!ar.options.isMask) {
+ clickArea(ar);
+ }
+ });
+ }
+ }
+
+ mousedown.call(this,e);
+
+ if (ar && opts.clickNavigate && ar.href) {
+ window.location.href=ar.href;
+ return;
+ }
+
+ if (ar && !ar.owner.currentAction) {
+ opts = me.options;
+ clickArea(ar);
+ }
+ }
+
+ /**
+ * Prototype for a MapData object, representing an ImageMapster bound object
+ * @param {Element} image an IMG element
+ * @param {object} options ImageMapster binding options
+ */
+ m.MapData = function (image, options)
+ {
+ var me = this;
+
+ // (Image) main map image
+
+ me.image = image;
+
+ me.images = new m.MapImages(me);
+ me.graphics = new m.Graphics(me);
+
+ // save the initial style of the image for unbinding. This is problematic, chrome
+ // duplicates styles when assigning, and cssText is apparently not universally supported.
+ // Need to do something more robust to make unbinding work universally.
+
+ me.imgCssText = image.style.cssText || null;
+
+ initializeDefaults(me);
+
+ me.configureOptions(options);
+
+ // create context-bound event handlers from our private functions
+
+ me.mouseover = function(e) { mouseover.call(this,me,e); };
+ me.mouseout = function(e) { mouseout.call(this,me,e); };
+ me.click = function(e) { click.call(this,me,e); };
+ me.clearEffects = function(e) { clearEffects.call(this,me,e); };
+ };
+
+ m.MapData.prototype = {
+ constructor: m.MapData,
+
+ /**
+ * Set this.options from the defaults merged with the options passed
+ * @param {object} options The options to merge over the defaults
+ */
+
+ configureOptions: function(options) {
+ this.options= u.updateProps({}, m.defaults, options);
+ },
+
+ /**
+ * Ensure all images are loaded
+ * @return {Promise} A promise that resolves when the images have finished loading (or fail)
+ */
+
+ bindImages: function() {
+ var me=this,
+ mi = me.images;
+
+ // reset the images if this is a rebind
+
+ if (mi.length>2) {
+ mi.splice(2);
+ } else if (mi.length===0) {
+
+ // add the actual main image
+ mi.add(me.image);
+ // will create a duplicate of the main image, we need this to get raw size info
+ mi.add(me.image.src);
+ }
+
+ configureAltImages(me);
+
+ return me.images.bind();
+ },
+
+ /**
+ * Test whether an async action is currently in progress
+ * @return {Boolean} true or false indicating state
+ */
+
+ isActive: function() {
+ return !this.complete || this.currentAction;
+ },
+
+ /**
+ * Return an object indicating the various states. This isn't really used by
+ * production code.
+ *
+ * @return {object} An object with properties for various states
+ */
+
+ state: function () {
+ return {
+ complete: this.complete,
+ resizing: this.currentAction==='resizing',
+ zoomed: this.zoomed,
+ zoomedArea: this.zoomedArea,
+ scaleInfo: this.scaleInfo
+ };
+ },
+
+ /**
+ * Get a unique ID for the wrapper of this imagemapster
+ * @return {string} A string that is unique to this image
+ */
+
+ wrapId: function () {
+ return 'mapster_wrap_' + this.index;
+ },
+ _idFromKey: function (key) {
+ return typeof key === "string" && this._xref.hasOwnProperty(key) ?
+ this._xref[key] : -1;
+ },
+
+ /**
+ * Return a comma-separated string of all selected keys
+ * @return {string} CSV of all keys that are currently selected
+ */
+
+ getSelected: function () {
+ var result = '';
+ $.each(this.data, function (i,e) {
+ if (e.isSelected()) {
+ result += (result ? ',' : '') + this.key;
+ }
+ });
+ return result;
+ },
+
+ /**
+ * Get an array of MapAreas associated with a specific AREA based on the keys for that area
+ * @param {Element} area An HTML AREA
+ * @param {number} atMost A number limiting the number of areas to be returned (typically 1 or 0 for no limit)
+ * @return {MapArea[]} Array of MapArea objects
+ */
+
+ getAllDataForArea:function (area,atMost) {
+ var i,ar, result,
+ me=this,
+ key = $(area).filter('area').attr(me.options.mapKey);
+
+ if (key) {
+ result=[];
+ key = u.split(key);
+
+ for (i=0;i<(atMost || key.length);i++) {
+ ar = me.data[me._idFromKey(key[i])];
+ ar.area=area.length ? area[0]:area;
+ // set the actual area moused over/selected
+ // TODO: this is a brittle model for capturing which specific area - if this method was not used,
+ // ar.area could have old data. fix this.
+ result.push(ar);
+ }
+ }
+
+ return result;
+ },
+ getDataForArea: function(area) {
+ var ar=this.getAllDataForArea(area,1);
+ return ar ? ar[0] || null : null;
+ },
+ getDataForKey: function (key) {
+ return this.data[this._idFromKey(key)];
+ },
+
+ /**
+ * Get the primary keys associated with an area group.
+ * If this is a primary key, it will be returned.
+ *
+ * @param {string} key An area key
+ * @return {string} A CSV of area keys
+ */
+
+ getKeysForGroup: function(key) {
+ var ar=this.getDataForKey(key);
+
+ return !ar ? '':
+ ar.isPrimary ?
+ ar.key :
+ this.getPrimaryKeysForMapAreas(ar.areas()).join(',');
+ },
+
+ /**
+ * Given an array of MapArea objects, return an array of their unique primary keys
+ * @param {MapArea[]} areas The areas to analyze
+ * @return {string[]} An array of unique primary keys
+ */
+
+ getPrimaryKeysForMapAreas: function(areas)
+ {
+ var keys=[];
+ $.each(areas,function(i,e) {
+ if ($.inArray(e.keys[0],keys)<0) {
+ keys.push(e.keys[0]);
+ }
+ });
+ return keys;
+ },
+ getData: function (obj) {
+ if (typeof obj === 'string') {
+ return this.getDataForKey(obj);
+ } else if (obj && obj.mapster || u.isElement(obj)) {
+ return this.getDataForArea(obj);
+ } else {
+ return null;
+ }
+ },
+ // remove highlight if present, raise event
+ ensureNoHighlight: function () {
+ var ar;
+ if (this.highlightId >= 0) {
+ this.graphics.clearHighlight();
+ ar = this.data[this.highlightId];
+ ar.changeState('highlight', false);
+ this.setHighlightId(-1);
+ }
+ },
+ setHighlightId: function(id) {
+ this.highlightId = id;
+ },
+
+ /**
+ * Clear all active selections on this map
+ */
+
+ clearSelections: function () {
+ $.each(this.data, function (i,e) {
+ if (e.selected) {
+ e.deselect(true);
+ }
+ });
+ this.removeSelectionFinish();
+
+ },
+
+ /**
+ * Set area options from an array of option data.
+ *
+ * @param {object[]} areas An array of objects containing area-specific options
+ */
+
+ setAreaOptions: function (areas) {
+ var i, area_options, ar;
+ areas = areas || [];
+
+ // refer by: map_data.options[map_data.data[x].area_option_id]
+
+ for (i = areas.length - 1; i >= 0; i--) {
+ area_options = areas[i];
+ if (area_options) {
+ ar = this.getDataForKey(area_options.key);
+ if (ar) {
+ u.updateProps(ar.options, area_options);
+
+ // TODO: will not deselect areas that were previously selected, so this only works
+ // for an initial bind.
+
+ if (u.isBool(area_options.selected)) {
+ ar.selected = area_options.selected;
+ }
+ }
+ }
+ }
+ },
+ // keys: a comma-separated list
+ drawSelections: function (keys) {
+ var i, key_arr = u.asArray(keys);
+
+ for (i = key_arr.length - 1; i >= 0; i--) {
+ this.data[key_arr[i]].drawSelection();
+ }
+ },
+ redrawSelections: function () {
+ $.each(this.data, function (i, e) {
+ if (e.isSelectedOrStatic()) {
+ e.drawSelection();
+ }
+ });
+
+ },
+ ///called when images are done loading
+ initialize: function () {
+ var imgCopy, base_canvas, overlay_canvas, wrap, parentId, css, i,size,
+ img,sort_func, sorted_list, scale,
+ me = this,
+ opts = me.options;
+
+ if (me.complete) {
+ return;
+ }
+
+ img = $(me.image);
+
+ parentId = img.parent().attr('id');
+
+ // create a div wrapper only if there's not already a wrapper, otherwise, own it
+
+ if (parentId && parentId.length >= 12 && parentId.substring(0, 12) === "mapster_wrap") {
+ wrap = img.parent();
+ wrap.attr('id', me.wrapId());
+ } else {
+ wrap = $('<div id="' + me.wrapId() + '"></div>');
+
+ if (opts.wrapClass) {
+ if (opts.wrapClass === true) {
+ wrap.addClass(img[0].className);
+ }
+ else {
+ wrap.addClass(opts.wrapClass);
+ }
+ }
+ }
+ me.wrapper = wrap;
+
+ // me.images[1] is the copy of the original image. It should be loaded & at its native size now so we can obtain the true
+ // width & height. This is needed to scale the imagemap if not being shown at its native size. It is also needed purely
+ // to finish binding in case the original image was not visible. It can be impossible in some browsers to obtain the
+ // native size of a hidden image.
+
+ me.scaleInfo = scale = u.scaleMap(me.images[0],me.images[1], opts.scaleMap);
+
+ me.base_canvas = base_canvas = me.graphics.createVisibleCanvas(me);
+ me.overlay_canvas = overlay_canvas = me.graphics.createVisibleCanvas(me);
+
+ // Now we got what we needed from the copy -clone from the original image again to make sure any other attributes are copied
+ imgCopy = $(me.images[1])
+ .addClass('mapster_el '+ me.images[0].className)
+ .attr({id:null, usemap: null});
+
+ size=u.size(me.images[0]);
+
+ if (size.complete()) {
+ imgCopy.css({
+ width: size.width,
+ height: size.height
+ });
+ }
+
+ me.buildDataset();
+
+ // now that we have processed all the areas, set css for wrapper, scale map if needed
+
+ css = {
+ display: 'block',
+ position: 'relative',
+ padding: 0,
+ width: scale.width,
+ height: scale.height
+ };
+
+ if (opts.wrapCss) {
+ $.extend(css, opts.wrapCss);
+ }
+ // if we were rebinding with an existing wrapper, the image will already be in it
+ if (img.parent()[0] !== me.wrapper[0]) {
+
+ img.before(me.wrapper);
+ }
+
+ wrap.css(css);
+
+ // move all generated images into the wrapper for easy removal later
+
+ $(me.images.slice(2)).hide();
+ for (i = 1; i < me.images.length; i++) {
+ wrap.append(me.images[i]);
+ }
+
+ //me.images[1].style.cssText = me.image.style.cssText;
+
+ wrap.append(base_canvas)
+ .append(overlay_canvas)
+ .append(img.css(m.canvas_style));
+
+ // images[0] is the original image with map, images[1] is the copy/background that is visible
+
+ u.setOpacity(me.images[0], 0);
+ $(me.images[1]).show();
+
+ u.setOpacity(me.images[1],1);
+
+ if (opts.isSelectable && opts.onGetList) {
+ sorted_list = me.data.slice(0);
+ if (opts.sortList) {
+ if (opts.sortList === "desc") {
+ sort_func = function (a, b) {
+ return a === b ? 0 : (a > b ? -1 : 1);
+ };
+ }
+ else {
+ sort_func = function (a, b) {
+ return a === b ? 0 : (a < b ? -1 : 1);
+ };
+ }
+
+ sorted_list.sort(function (a, b) {
+ a = a.value;
+ b = b.value;
+ return sort_func(a, b);
+ });
+ }
+
+ me.options.boundList = opts.onGetList.call(me.image, sorted_list);
+ }
+
+ me.complete=true;
+ me.processCommandQueue();
+
+ if (opts.onConfigured && typeof opts.onConfigured === 'function') {
+ opts.onConfigured.call(img, true);
+ }
+ },
+
+ // when rebind is true, the MapArea data will not be rebuilt.
+ buildDataset: function(rebind) {
+ var sel,areas,j,area_id,$area,area,curKey,mapArea,key,keys,mapAreaId,group_value,dataItem,href,
+ me=this,
+ opts=me.options,
+ default_group;
+
+ function addAreaData(key, value) {
+ var dataItem = new m.AreaData(me, key, value);
+ dataItem.areaId = me._xref[key] = me.data.push(dataItem) - 1;
+ return dataItem.areaId;
+ }
+
+ me._xref = {};
+ me.data = [];
+ if (!rebind) {
+ me.mapAreas=[];
+ }
+
+ default_group = !opts.mapKey;
+ if (default_group) {
+ opts.mapKey = 'data-mapster-key';
+ }
+ sel = ($.browser.msie && $.browser.version <= 7) ? 'area' :
+ (default_group ? 'area[coords]' : 'area[' + opts.mapKey + ']');
+ areas = $(me.map).find(sel).unbind('.mapster');
+
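+ for (mapAreaId = 0; mapAreaId < areas.length; mapAreaId++) {
+ area_id = 0;
+ area = areas[mapAreaId];
+ $area = $(area);
+
+ // skip areas with no coords - selector broken for older ie
+ if (!area.coords) {
+ continue;
+ }
+
+ // create a key if none was assigned by the user
+ if (default_group) {
+ curKey = String(mapAreaId);
+ $area.attr('data-mapster-key', curKey);
+ } else {
+ curKey = area.getAttribute(opts.mapKey);
+ }
+
+ // on a rebind, reuse the existing MapArea (looked up by the id stored with the area)
+ if (rebind) {
+ mapArea = me.mapAreas[$area.data('mapster') - 1];
+ mapArea.configure(curKey);
+ } else {
+ mapArea = new m.MapArea(me, area, curKey);
+ me.mapAreas.push(mapArea);
+ }
+
+ keys = mapArea.keys; // converted to an array by mapArea
+
+ // iterate through each mapKey assigned to this area
+ for (j = keys.length - 1; j >= 0; j--) {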
+ key = keys[j];
+
+ if (opts.mapValue) {
+ group_value = $area.attr(opts.mapValue);
+ }
+ if (default_group) {
+ // set an attribute so we can refer to the area by index from the DOM object if no key
+ area_id = addAreaData(me.data.length, group_value);
+ dataItem = me.data[area_id];
+ dataItem.key = key = area_id.toString();
+ }
+ else {
+ area_id = me._xref[key];
+ if (area_id >= 0) {
+ dataItem = me.data[area_id];
+ if (group_value && !me.data[area_id].value) {
+ dataItem.value = group_value;
+ }
+ }
+ else {
+ area_id = addAreaData(key, group_value);
+ dataItem = me.data[area_id];
+ dataItem.isPrimary=j===0;
+ }
+ }
+ mapArea.areaDataXref.push(area_id);
+ dataItem.areasXref.push(mapAreaId);
+ }
+
+ href=$area.attr('href');
+ if (href && href!=='#' && !dataItem.href)
+ {
+ dataItem.href=href;
+ }
+
+ if (!mapArea.nohref) {
+ $area.bind('click.mapster', me.click);
+
+ if (!m.isTouch) {
+ $area.bind('mouseover.mapster', me.mouseover)
+ .bind('mouseout.mapster', me.mouseout)
+ .bind('mousedown.mapster', me.mousedown);
+
+ }
+
+ }
+
+ // store an ID with each area.
+ $area.data("mapster", mapAreaId+1);
+ }
+
+ // TODO listenToList
+ // if (opts.listenToList && opts.nitG) {
+ // opts.nitG.bind('click.mapster', event_hooks[map_data.hooks_index].listclick_hook);
+ // }
+
+ // populate areas from config options
+ me.setAreaOptions(opts.areas);
+ me.redrawSelections();
+
+ },
+ processCommandQueue: function() {
+
+ var cur,me=this;
+ while (!me.currentAction && me.commands.length) {
+ cur = me.commands[0];
+ me.commands.splice(0,1);
+ m.impl[cur.command].apply(cur.that, cur.args);
+ }
+ },
+ clearEvents: function () {
+ $(this.map).find('area')
+ .unbind('.mapster');
+ $(this.images)
+ .unbind('.mapster');
+ },
+ _clearCanvases: function (preserveState) {
+ // remove the canvas elements created
+ if (!preserveState) {
+ $(this.base_canvas).remove();
+ }
+ $(this.overlay_canvas).remove();
+ },
+ clearMapData: function (preserveState) {
+ var me = this;
+ this._clearCanvases(preserveState);
+
+ // release refs to DOM elements
+ $.each(this.data, function (i, e) {
+ e.reset();
+ });
+ this.data = null;
+ if (!preserveState) {
+ // get rid of everything except the original image
+ this.image.style.cssText = this.imgCssText;
+ $(this.wrapper).before(this.image).remove();
+ }
+
+ me.images.clear();
+
+ this.image = null;
+ u.ifFunction(this.clearToolTip, this);
+ },
+
+ // Complete cleanup process for deselecting items. Called after a batch operation, or by AreaData for single
+ // operations not flagged as "partial"
+
+ removeSelectionFinish: function () {
+ var g = this.graphics;
+
+ g.refreshSelections();
+ // do not call ensureNoHighlight - we don't really want to unhighlight it, just remove the effect
+ g.clearHighlight();
+ }
+ };
+} (jQuery));
+/* areadata.js
+ AreaData and MapArea prototypes
+*/
+
+(function ($) {
+ var m = $.mapster, u = m.utils;
+
+ /**
+ * Select this area
+ *
+ * @this {AreaData} The AreaData being selected
+ * @param {object} options Options for rendering the selection
+ */
+ function select(options) {
+ // need to add the new one first so that the double-opacity effect leaves the current one highlighted for singleSelect
+
+ var me=this, o = me.owner;
+ if (o.options.singleSelect) {
+ o.clearSelections();
+ }
+
+ // because areas can overlap - we can't depend on the selection state to tell us anything about the inner areas.
+ // don't check if it's already selected
+ if (!me.isSelected()) {
+ if (options) {
+
+ // cache the current options, and map the altImageId if an altimage
+ // was passed
+
+ me.optsCache = $.extend(me.effectiveRenderOptions('select'),
+ options,
+ {
+ altImageId: o.images.add(options.altImage)
+ });
+ }
+
+ me.drawSelection();
+
+ me.selected = true;
+ me.changeState('select', true);
+ }
+
+ if (o.options.singleSelect) {
+ o.graphics.refreshSelections();
+ }
+ }
+
+ /**
+ * Deselect this area, optionally deferring finalization so additional areas can be deselected
+ * in a single operation
+ *
+ * @param {boolean} partial when true, the caller must invoke "removeSelectionFinish" to render
+ */
+
+ function deselect(partial) {
+ var me=this;
+ me.selected = false;
+ me.changeState('select', false);
+
+ // release information about last area options when deselecting.
+
+ me.optsCache=null;
+ me.owner.graphics.removeSelections(me.areaId);
+
+ // Complete selection removal process. This is separated because it's very inefficient to perform the whole
+ // process for multiple removals, as the canvas must be totally redrawn at the end of the process.
+
+ if (!partial) {
+ me.owner.removeSelectionFinish();
+ }
+ }
+
+ /**
+ * Toggle the selection state of this area
+ * @param {object} options Rendering options, if toggling on
+ * @return {bool} The new selection state
+ */
+ function toggle(options) {
+ var me=this;
+ if (!me.isSelected()) {
+ me.select(options);
+ }
+ else {
+ me.deselect();
+ }
+ return me.isSelected();
+ }
+
+ /**
+ * An AreaData object; represents a conceptual area that can be composed of
+ * one or more MapArea objects
+ *
+ * @param {MapData} owner The MapData object to which this belongs
+ * @param {string} key The key for this area
+ * @param {string} value The mapValue string for this area
+ */
+
+ m.AreaData = function (owner, key, value) {
+ $.extend(this,{
+ owner: owner,
+ key: key || '',
+ // means this represents the first key in a list of keys (it's the area group that gets highlighted on mouseover)
+ isPrimary: true,
+ areaId: -1,
+ href: '',
+ value: value || '',
+ options:{},
+ // "null" means unchanged. Use "isSelected" method to just test true/false
+ selected: null,
+ // xref to MapArea objects
+ areasXref: [],
+ // (temporary storage) - the actual area moused over
+ area: null,
+ // the last options used to render this. Cache so when re-drawing after a remove, changes in options won't
+ // break already selected things.
+ optsCache: null
+ });
+ };
+
+ /**
+ * The public API for AreaData object
+ */
+
+ m.AreaData.prototype = {
+ constructor: m.AreaData,
+ select: select,
+ deselect: deselect,
+ toggle: toggle,
+ areas: function() {
+ var i,result=[];
+ for (i=0;i= 0; j -= 2) {
+ curX = coords[j];
+ curY = coords[j + 1];
+
+ if (curX < minX) {
+ minX = curX;
+ bestMaxY = curY;
+ }
+ if (curX > maxX) {
+ maxX = curX;
+ bestMinY = curY;
+ }
+ if (curY < minY) {
+ minY = curY;
+ bestMaxX = curX;
+ }
+ if (curY > maxY) {
+ maxY = curY;
+ bestMinX = curX;
+ }
+
+ }
+
+ // try to figure out the best place for the tooltip
+
+ if (width && height) {
+ found=false;
+ $.each([[bestMaxX - width, minY - height], [bestMinX, minY - height],
+ [minX - width, bestMaxY - height], [minX - width, bestMinY],
+ [maxX,bestMaxY - height], [ maxX,bestMinY],
+ [bestMaxX - width, maxY], [bestMinX, maxY]
+ ],function (i, e) {
+ if (!found && (e[0] > rootx && e[1] > rooty)) {
+ nest = e;
+ found=true;
+ return false;
+ }
+ });
+
+ // default to lower-right corner if nothing fit inside the boundaries of the image
+
+ if (!found) {
+ nest=[maxX,maxY];
+ }
+ }
+ return nest;
+ };
+} (jQuery));
+/* scale.js: resize and zoom functionality
+ requires areacorners.js, when.js
+*/
+
+
+(function ($) {
+ var m = $.mapster, u = m.utils, p = m.MapArea.prototype;
+
+ m.utils.getScaleInfo = function (eff, actual) {
+ var pct;
+ if (!actual) {
+ pct = 1;
+ actual=eff;
+ } else {
+ pct = eff.width / actual.width || eff.height / actual.height;
+ // make sure a float error doesn't muck us up
+ if (pct > 0.98 && pct < 1.02) { pct = 1; }
+ }
+ return {
+ scale: (pct !== 1),
+ scalePct: pct,
+ realWidth: actual.width,
+ realHeight: actual.height,
+ width: eff.width,
+ height: eff.height,
+ ratio: eff.width / eff.height
+ };
+ };
+ // Scale a set of AREAs, return old data as an array of objects
+ m.utils.scaleMap = function (image, imageRaw, scale) {
+
+ // stunningly, jQuery width can return zero even as width does not, seems to happen only
+ // with adBlock or maybe other plugins. These must interfere with onload events somehow.
+
+
+ var vis=u.size(image),
+ raw=u.size(imageRaw,true);
+
+ if (!raw.complete()) {
+ throw("Another script, such as an extension, appears to be interfering with image loading. Please let us know about this.");
+ }
+ if (!vis.complete()) {
+ vis=raw;
+ }
+ return this.getScaleInfo(vis, scale ? raw : null);
+ };
+
+ /**
+ * Resize the image map. Pass only one of width and height to preserve the aspect ratio.
+ *
+ * @param {int} width The new width OR an object containing named parameters matching this function sig
+ * @param {int} height The new height
+ * @param {int} duration Time in ms for the resize animation, or zero for no animation
+ * @param {function} callback A function to invoke when the operation finishes
+ * @return {promise} NOT YET IMPLEMENTED
+ */
+
+ m.MapData.prototype.resize = function (width, height, duration, callback) {
+ var p,promises,newsize,els, highlightId, ratio,
+ me = this;
+
+ // allow omitting duration
+ callback = callback || duration;
+
+ function sizeCanvas(canvas, w, h) {
+ if ($.mapster.hasCanvas) {
+ canvas.width = w;
+ canvas.height = h;
+ } else {
+ $(canvas).width(w);
+ $(canvas).height(h);
+ }
+ }
+
+ // Finalize resize action, do callback, pass control to command queue
+
+ function cleanupAndNotify() {
+
+ me.currentAction = '';
+
+ if ($.isFunction(callback)) {
+ callback();
+ }
+
+ me.processCommandQueue();
+ }
+
+ // handle cleanup after the inner elements are resized
+
+ function finishResize() {
+ sizeCanvas(me.overlay_canvas, width, height);
+
+ // restore highlight state if it was highlighted before
+ if (highlightId >= 0) {
+ var areaData = me.data[highlightId];
+ areaData.tempOptions = { fade: false };
+ me.getDataForKey(areaData.key).highlight();
+ areaData.tempOptions = null;
+ }
+ sizeCanvas(me.base_canvas, width, height);
+ me.redrawSelections();
+ cleanupAndNotify();
+ }
+
+ function resizeMapData() {
+ $(me.image).css(newsize);
+ // start calculation at the same time as effect
+ me.scaleInfo = u.getScaleInfo({
+ width: width,
+ height: height
+ },
+ {
+ width: me.scaleInfo.realWidth,
+ height: me.scaleInfo.realHeight
+ });
+ $.each(me.data, function (i, e) {
+ $.each(e.areas(), function (i, e) {
+ e.resize();
+ });
+ });
+ }
+
+ if (me.scaleInfo.width === width && me.scaleInfo.height === height) {
+ return;
+ }
+
+ highlightId = me.highlightId;
+
+
+ if (!width) {
+ ratio = height / me.scaleInfo.realHeight;
+ width = Math.round(me.scaleInfo.realWidth * ratio);
+ }
+ if (!height) {
+ ratio = width / me.scaleInfo.realWidth;
+ height = Math.round(me.scaleInfo.realHeight * ratio);
+ }
+
+ newsize = { 'width': String(width) + 'px', 'height': String(height) + 'px' };
+ if (!$.mapster.hasCanvas) {
+ $(me.base_canvas).children().remove();
+ }
+
+ // resize all the elements that are part of the map except the image itself (which is not visible)
+ // but including the div wrapper
+ els = $(me.wrapper).find('.mapster_el').add(me.wrapper);
+
+ if (duration) {
+ promises = [];
+ me.currentAction = 'resizing';
+ els.each(function (i, e) {
+ p = u.defer();
+ promises.push(p);
+
+ $(e).animate(newsize, {
+ duration: duration,
+ complete: p.resolve,
+ easing: "linear"
+ });
+ });
+
+ p = u.defer();
+ promises.push(p);
+
+ // though resizeMapData is not async, it needs to be finished just the same as the animations,
+ // so add it to the "to do" list.
+
+ u.when.all(promises).then(finishResize);
+ resizeMapData();
+ p.resolve();
+ } else {
+ els.css(newsize);
+ resizeMapData();
+ finishResize();
+
+ }
+ };
+
+
+ m.MapArea = u.subclass(m.MapArea, function () {
+ //change the area tag data if needed
+ this.base.init();
+ if (this.owner.scaleInfo.scale) {
+ this.resize();
+ }
+ });
+
+ p.coords = function (percent, coordOffset) {
+ var j, newCoords = [],
+ pct = percent || this.owner.scaleInfo.scalePct,
+ offset = coordOffset || 0;
+
+ if (pct === 1 && offset === 0) {
+ return this.originalCoords;
+ }
+
+ for (j = 0; j < this.length; j++) {
+ //amount = j % 2 === 0 ? xPct : yPct;
+ newCoords.push(Math.round(this.originalCoords[j] * pct) + offset);
+ }
+ return newCoords;
+ };
+ p.resize = function () {
+ this.area.coords = this.coords().join(',');
+ };
+
+ p.reset = function () {
+ this.area.coords = this.coords(1).join(',');
+ };
+
+ m.impl.resize = function (width, height, duration, callback) {
+ if (!width && !height) {
+ return false;
+ }
+ var x= (new m.Method(this,
+ function () {
+ this.resize(width, height, duration, callback);
+ },
+ null,
+ {
+ name: 'resize',
+ args: arguments
+ }
+ )).go();
+ return x;
+ };
+
+/*
+ m.impl.zoom = function (key, opts) {
+ var options = opts || {};
+
+ function zoom(areaData) {
+ // this will be MapData object returned by Method
+
+ var scroll, corners, height, width, ratio,
+ diffX, diffY, ratioX, ratioY, offsetX, offsetY, newWidth, newHeight, scrollLeft, scrollTop,
+ padding = options.padding || 0,
+ scrollBarSize = areaData ? 20 : 0,
+ me = this,
+ zoomOut = false;
+
+ if (areaData) {
+ // save original state on first zoom operation
+ if (!me.zoomed) {
+ me.zoomed = true;
+ me.preZoomWidth = me.scaleInfo.width;
+ me.preZoomHeight = me.scaleInfo.height;
+ me.zoomedArea = areaData;
+ if (options.scroll) {
+ me.wrapper.css({ overflow: 'auto' });
+ }
+ }
+ corners = $.mapster.utils.areaCorners(areaData.coords(1, 0));
+ width = me.wrapper.innerWidth() - scrollBarSize - padding * 2;
+ height = me.wrapper.innerHeight() - scrollBarSize - padding * 2;
+ diffX = corners.maxX - corners.minX;
+ diffY = corners.maxY - corners.minY;
+ ratioX = width / diffX;
+ ratioY = height / diffY;
+ ratio = Math.min(ratioX, ratioY);
+ offsetX = (width - diffX * ratio) / 2;
+ offsetY = (height - diffY * ratio) / 2;
+
+ newWidth = me.scaleInfo.realWidth * ratio;
+ newHeight = me.scaleInfo.realHeight * ratio;
+ scrollLeft = (corners.minX) * ratio - padding - offsetX;
+ scrollTop = (corners.minY) * ratio - padding - offsetY;
+ } else {
+ if (!me.zoomed) {
+ return;
+ }
+ zoomOut = true;
+ newWidth = me.preZoomWidth;
+ newHeight = me.preZoomHeight;
+ scrollLeft = null;
+ scrollTop = null;
+ }
+
+ this.resize({
+ width: newWidth,
+ height: newHeight,
+ duration: options.duration,
+ scroll: scroll,
+ scrollLeft: scrollLeft,
+ scrollTop: scrollTop,
+ // closure so we can be sure values are correct
+ callback: (function () {
+ var isZoomOut = zoomOut,
+ scroll = options.scroll,
+ areaD = areaData;
+ return function () {
+ if (isZoomOut) {
+ me.preZoomWidth = null;
+ me.preZoomHeight = null;
+ me.zoomed = false;
+ me.zoomedArea = false;
+ if (scroll) {
+ me.wrapper.css({ overflow: 'inherit' });
+ }
+ } else {
+ // just to be sure it wasn't canceled & restarted
+ me.zoomedArea = areaD;
+ }
+ };
+ } ())
+ });
+ }
+ return (new m.Method(this,
+ function (opts) {
+ zoom.call(this);
+ },
+ function () {
+ zoom.call(this.owner, this);
+ },
+ {
+ name: 'zoom',
+ args: arguments,
+ first: true,
+ key: key
+ }
+ )).go();
+
+
+ };
+ */
+} (jQuery));
+/* tooltip.js - tooltip functionality
+ requires areacorners.js
+*/
+
+(function ($) {
+
+ var m = $.mapster, u = m.utils;
+
+ $.extend(m.defaults, {
+ toolTipContainer: '
',
+ showToolTip: false,
+ toolTipFade: true,
+ toolTipClose: ['area-mouseout','image-mouseout'],
+ onShowToolTip: null,
+ onHideToolTip: null
+ });
+
+ $.extend(m.area_defaults, {
+ toolTip: null,
+ toolTipClose: null
+ });
+
+
+ /**
+ * Create a tooltip element and add it to the DOM, shown but fully transparent so it can be measured.
+ *
+ * @param {string|jquery} html A string of html or a jQuery object containing the tooltip content.
+ * @param {string|jquery} [template] The html template in which to wrap the content
+ * @param {string|object} [css] CSS to apply to the outermost element of the tooltip
+ * @return {jquery} The tooltip that was created
+ */
+
+ function createToolTip(html, template, css) {
+ var tooltip;
+
+ // wrap the template in a jQuery object, or clone the template if it's already one.
+ // This assumes that anything other than a string is a jQuery object; if it's not jQuery will
+ // probably throw an error.
+
+ if (template) {
+ tooltip = typeof template === 'string' ?
+ $(template) :
+ $(template).clone();
+
+ tooltip.append(html);
+ } else {
+ tooltip=$(html);
+ }
+
+ // always set display to block, or the positioning css won't work if the end user happened to
+ // use a non-block type element.
+
+ tooltip.css($.extend((css || {}),{
+ display:"block",
+ position:"absolute"
+ })).hide();
+
+ $('body').append(tooltip);
+
+ // we must actually add the tooltip to the DOM and "show" it in order to figure out how much space it
+ // consumes, and then reposition it with that knowledge.
+ // We also cache the actual opacity setting to restore finally.
+
+ tooltip.attr("data-opacity",tooltip.css("opacity"))
+ .css("opacity",0);
+
+ // doesn't really show it because opacity=0
+
+ return tooltip.show();
+ }
+
+
+ /**
+ * Position a tooltip that has already been created and make it visible.
+ *
+ * @param {jquery} tooltip The tooltip
+ * @param {object} [options] options for displaying the tooltip.
+ * @config {int} [left] The 0-based absolute x position for the tooltip
+ * @config {int} [top] The 0-based absolute y position for the tooltip
+ * @config {string|object} [css] CSS to apply to the outermost element of the tooltip
+ * @config {int} [fadeDuration] When non-zero, the duration in milliseconds of a fade-in effect for the tooltip.
+ */
+
+ function showToolTipImpl(tooltip,options)
+ {
+ var tooltipCss = {
+ "left": options.left + "px",
+ "top": options.top + "px"
+ },
+ actualOpacity=tooltip.attr("data-opacity") || 0,
+ zindex = tooltip.css("z-index");
+
+ if (parseInt(zindex,10)===0
+ || zindex === "auto") {
+ tooltipCss["z-index"] = 9999;
+ }
+
+ tooltip.css(tooltipCss)
+ .addClass('mapster_tooltip');
+
+
+ if (options.fadeDuration && options.fadeDuration>0) {
+ u.fader(tooltip[0], 0, actualOpacity, options.fadeDuration);
+ } else {
+ u.setOpacity(tooltip[0], actualOpacity);
+ }
+ }
+
+ /**
+ * Hide and remove active tooltips
+ *
+ * @param {MapData} this The mapdata object to which the tooltips belong
+ */
+
+ m.MapData.prototype.clearToolTip = function() {
+ if (this.activeToolTip) {
+ this.activeToolTip.stop().remove();
+ this.activeToolTip = null;
+ this.activeToolTipID = null;
+ u.ifFunction(this.options.onHideToolTip, this);
+ }
+ };
+
+ /**
+ * Configure the binding between a named tooltip closing option, and a mouse event.
+ *
+ * If a beforeClose callback is passed, it will be called when the activating event occurs, and the tooltip will
+ * only be closed if it returns true.
+ *
+ * @param {String[]} options The tooltip closing options that are enabled
+ * @param {String} bindOption The name of the tooltip closing option
+ * @param {String} event UI event to bind to this option
+ * @param {jQuery} target A jQuery object wrapping the element(s) that are the target of the event
+ * @param {Function} [beforeClose] Callback invoked before closing; the tooltip closes only if it returns true
+ * @param {Function} [onClose] Callback when the tooltip is closed
+ */
+ function bindToolTipClose(options, bindOption, event, target, beforeClose, onClose) {
+ var event_name = event + '.mapster-tooltip';
+
+ if ($.inArray(bindOption, options) >= 0) {
+ target.unbind(event_name)
+ .bind(event_name, function (e) {
+ if (!beforeClose || beforeClose.call(this,e)) {
+ target.unbind('.mapster-tooltip');
+ if (onClose) {
+ onClose.call(this);
+ }
+ }
+ });
+
+ return {
+ object: target,
+ event: event_name
+ };
+ }
+ }
+
+ /**
+ * Show a tooltip.
+ *
+ * @param {string|jquery} [tooltip] A string of html or a jQuery object containing the tooltip content.
+ *
+ * @param {string|jquery} [target] The target of the tooltip, to be used to determine positioning. If null,
+ * absolute position values must be passed with left and top.
+ *
+ * @param {string|jquery} [image] If target is an [area] the image that owns it
+ *
+ * @param {string|jquery} [container] An element within which the tooltip must be bounded
+ *
+ *
+ *
+ * @param {object|string|jQuery} [options] options to apply when creating this tooltip - OR -
+ * The markup, or a jquery object, containing the data for the tooltip
+ *
+ * @config {string} [closeEvents] A string with one or more comma-separated values that determine when the tooltip
+ * closes: 'area-click','tooltip-click','image-mouseout' are valid values
+ * @config {int} [offsetx] the horizontal amount to offset the tooltip
+ * @config {int} [offsety] the vertical amount to offset the tooltip
+ * @config {string|object} [css] CSS to apply to the outermost element of the tooltip
+ */
+
+ function showToolTip(tooltip,target,image,container,options) {
+ var corners,
+ ttopts = {};
+
+ options = options || {};
+
+
+ if (target) {
+
+ corners = u.areaCorners(target,image,container,
+ tooltip.outerWidth(true),
+ tooltip.outerHeight(true));
+
+ // Try to upper-left align it first, if that doesn't work, change the parameters
+
+ ttopts.left = corners[0];
+ ttopts.top = corners[1];
+
+ } else {
+
+ ttopts.left = options.left;
+ ttopts.top = options.top;
+ }
+
+ ttopts.left += (options.offsetx || 0);
+ ttopts.top +=(options.offsety || 0);
+
+ ttopts.css= options.css;
+ ttopts.fadeDuration = options.fadeDuration;
+
+ showToolTipImpl(tooltip,ttopts);
+
+ return tooltip;
+ }
+
+ /**
+ * Show a tooltip positioned near this area.
+ *
+ * @param {string|jquery} [content] A string of html or a jQuery object containing the tooltip content.
+
+ * @param {object|string|jQuery} [options] options to apply when creating this tooltip - OR -
+ * The markup, or a jquery object, containing the data for the tooltip
+ * @config {string|jquery} [container] An element within which the tooltip must be bounded
+ * @config {string} [template] a template to use instead of the default. If this property exists and is null,
+ * then no template will be used.
+ * @config {string} [closeEvents] A string with one or more comma-separated values that determine when the tooltip
+ * closes: 'area-click','tooltip-click','image-mouseout' are valid values
+ * @config {int} [offsetx] the horizontal amount to offset the tooltip
+ * @config {int} [offsety] the vertical amount to offset the tooltip
+ * @config {string|object} [css] CSS to apply to the outermost element of the tooltip
+ */
+ m.AreaData.prototype.showToolTip= function(content,options) {
+ var tooltip, closeOpts, target, tipClosed, template,
+ ttopts = {},
+ ad=this,
+ md=ad.owner,
+ areaOpts = ad.effectiveOptions();
+
+ // copy the options object so we can update it
+ options = options ? $.extend({},options) : {};
+
+ content = content || areaOpts.toolTip;
+ closeOpts = options.closeEvents || areaOpts.toolTipClose || md.options.toolTipClose || 'tooltip-click';
+
+ template = typeof options.template !== 'undefined' ?
+ options.template :
+ md.options.toolTipContainer;
+
+ options.closeEvents = typeof closeOpts === 'string' ?
+ closeOpts = u.split(closeOpts) :
+ closeOpts;
+
+ options.fadeDuration = options.fadeDuration ||
+ (md.options.toolTipFade ?
+ (md.options.fadeDuration || areaOpts.fadeDuration) : 0);
+
+ target = ad.area ?
+ ad.area :
+ $.map(ad.areas(),
+ function(e) {
+ return e.area;
+ });
+
+ if (md.activeToolTipID===ad.areaId) {
+ return;
+ }
+
+ md.clearToolTip();
+
+ md.activeToolTip = tooltip = createToolTip(content,
+ template,
+ options.css);
+
+ md.activeToolTipID = ad.areaId;
+
+ tipClosed = function() {
+ md.clearToolTip();
+ };
+
+ bindToolTipClose(closeOpts,'area-click', 'click', $(md.map), null, tipClosed);
+ bindToolTipClose(closeOpts,'tooltip-click', 'click', tooltip,null, tipClosed);
+ bindToolTipClose(closeOpts,'image-mouseout', 'mouseout', $(md.image), function(e) {
+ return (e.relatedTarget && e.relatedTarget.nodeName!=='AREA' && e.relatedTarget!==ad.area);
+ }, tipClosed);
+
+
+ showToolTip(tooltip,
+ target,
+ md.image,
+ options.container,
+ options);
+
+ u.ifFunction(md.options.onShowToolTip, ad.area,
+ {
+ toolTip: tooltip,
+ options: ttopts,
+ areaOptions: areaOpts,
+ key: ad.key,
+ selected: ad.isSelected()
+ });
+
+ return tooltip;
+ };
+
+
+ /**
+ * Parse an object that could be a string, a jquery object, or an object with a "contents" property
+ * containing html or a jQuery object.
+ *
+ * @param {object|string|jQuery} options The parameter to parse
+ * @return {string|jquery} A string or jquery object
+ */
+ function getHtmlFromOptions(options) {
+
+ // see if any html was passed as either the options object itself, or the content property
+
+ return (options ?
+ ((typeof options === 'string' || options.jquery) ?
+ options :
+ options.content) :
+ null);
+ }
+
+ /**
+ * Activate or remove a tooltip for an area. When this method is called on an area, the
+ * key parameter doesn't apply and "options" is the first parameter.
+ *
+ * When called with no parameters, or "key" is a falsy value, any active tooltip is cleared.
+ *
+ * When only a key is provided, the default tooltip for the area is used.
+ *
+ * When html is provided, this is used instead of the default tooltip.
+ *
+ * When "noTemplate" is true, the default tooltip template will not be used either, meaning only
+ * the actual html passed will be used.
+ *
+ * @param {string|AreaElement} key The area for which to activate a tooltip, or a DOM element.
+ *
+ * @param {object|string|jquery} [options] options to apply when creating this tooltip - OR -
+ * The markup, or a jquery object, containing the data for the tooltip
+ * @config {string|jQuery} [content] the inner content of the tooltip; the tooltip text or HTML
+ * @config {Element|jQuery} [container] An element within which the tooltip must be bounded
+ * @config {string} [template] a template to use instead of the default. If this property exists and is null,
+ * then no template will be used.
+ * @config {int} [offsetx] the horizontal amount to offset the tooltip.
+ * @config {int} [offsety] the vertical amount to offset the tooltip.
+ * @config {string|object} [css] CSS to apply to the outermost element of the tooltip
+ * @config {int} [fadeDuration] When non-zero, the duration in milliseconds of a fade-in effect for the tooltip.
+ * @return {jQuery} The jQuery object
+ */
+
+ m.impl.tooltip = function (key,options) {
+ return (new m.Method(this,
+ function mapData() {
+ var tooltip, target, md=this;
+ if (!key) {
+ md.clearToolTip();
+ } else {
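+ target=$(key);
+ // guard against a missing options argument before reading its properties
+ options = options || {};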
+ if (md.activeToolTipID ===target[0]) {
+ return;
+ }
+ md.clearToolTip();
+
+ md.activeToolTip = tooltip = createToolTip(getHtmlFromOptions(options),
+ options.template || md.options.toolTipContainer,
+ options.css);
+ md.activeToolTipID = target[0];
+
+ bindToolTipClose(['tooltip-click'],'tooltip-click', 'click', tooltip, null, function() {
+ md.clearToolTip();
+ });
+
+ md.activeToolTip = tooltip = showToolTip(tooltip,
+ target,
+ md.image,
+ options.container,
+ options);
+ }
+ },
+ function areaData() {
+ if ($.isPlainObject(key) && !options) {
+ options = key;
+ }
+
+ this.showToolTip(getHtmlFromOptions(options),options);
+ },
+ {
+ name: 'tooltip',
+ args: arguments,
+ key: key
+ }
+ )).go();
+ };
+} (jQuery));
diff --git a/preview-calendar/materials-science.html b/preview-calendar/materials-science.html
new file mode 100644
index 000000000..762c57581
--- /dev/null
+++ b/preview-calendar/materials-science.html
@@ -0,0 +1,362 @@
+
+
+
+
+
+
+Empowering Computational Materials Science Research using HTC
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ Empowering Computational Materials Science Research using HTC
+
+
Ajay Annamareddy, a research scientist at the University of Wisconsin-Madison, describes how he utilizes high-throughput computing in computational materials science.
+
+
+
+
Groundbreaking research is in the works for the Computational Materials Group (CMG) at the University of Wisconsin-Madison (UW-Madison). Ajay Annamareddy, a research scientist within CMG, has been a leading user of GPU hours with the Center for High Throughput Computing (CHTC). He utilizes this capacity to run machine learning (ML) simulations as applied to materials science problems that have gained tremendous interest in the past decade. CHTC resources have allowed him to study heavily data-driven problems that are practically impossible to deal with using regular resources.
+
+
Before coming to UW-Madison, Annamareddy received his Ph.D. in Nuclear Engineering from North Carolina State University. He was introduced to modeling and simulation work there, but he started using high-throughput computing (HTC) and CHTC services when he came to UW-Madison to work as a PostDoc with Prof. Dane Morgan in the Materials Science and Engineering department. He now works for CMG as a Research Scientist, where he’s been racking up GPU hours for over a year.
+
+
Working in the field of computational materials, Annamareddy and his group use computers to determine the properties of materials. So rather than preparing material and measuring it in experiments, they use a computer, which is less expensive and more time efficient. Annamareddy studies metallic glasses. These materials have many valuable properties and applications, but are not easy to make. Instead, he uses computer simulations of these materials to analyze and understand their fundamental properties.
+
+
Annamareddy’s group utilizes HTC and high-performance computing (HPC) for their work, so his project lead asked him to contact CHTC and set up an account. Christina Koch, the lead research computing facilitator, responded. “She helped me set up the account and determine how many resources we needed,” Annamareddy explained. “She was very generous in that whenever I exceeded my limits, she would increase them a bit more!”
+
+
CHTC resources have become critical for Annamareddy’s work. One of the projects involves running ML simulations, which he notes would be “difficult to complete” without the support of CHTC. Annamareddy uses graph neural networks (GNN), a powerful yet slightly inefficient deep learning technique. The upside to using GNN is that as long as there is some physics component in the underlying research problem, this technique can analyze just about anything. “The caveat is you need to provide lots of data for this technique to figure out a solution.”
+
+
To meet this data challenge, Annamareddy places the input data he generates using high-performance computing (HPC) in the HTC staging location, from which it is transferred to a local machine before the ML job starts running. “I use close to twenty gigabytes of data for my simulation, so this would be extremely inefficient to run without staging,” he explains. The CHTC provides Annamareddy with the storage and organization he needs to perform these potentially ground-breaking ML simulations.
+
+
Researchers often study materials in traditional atomistic simulations at different timescales, ranging from picoseconds to microseconds. Annamareddy’s goal with his work is to extend the time scales of these conventional simulations by using ML, which he found is well supported by HTC resources. “We have yet to reach it, but we hope we can use ML to extend the time scale of atomistic simulations by a few orders of magnitude. This would be extremely valuable when modeling systems like glass-forming materials where we should be able to obtain properties, like density and diffusion coefficients, much closer to experiments than currently possible with atomistic simulations,” Annamareddy elaborates. This is something that has never been done before in the field.
+
+
This project can potentially extend the time scales possible for conventional molecular dynamic simulations, allowing researchers in this field to predict how materials will behave over more extended periods of time. “It’s ambitious – but I’ve been working on it for more than a year, and we’ve made a lot of progress…I enjoy the challenge immensely and am happy I’m working on this problem!”
+ For neuroscientist Chris Cox, the OSG helps process mountains of data
+
+
Whether exploring how the brain is fooled by fake news or explaining the decline of knowledge in dementia, cognitive neuroscientists like Chris Cox are relying more on high-throughput computing resources like the Open Science Pool to understand how the brain makes sense of information.
+
+
Cognitive neuroscientist Chris Cox recently defended his dissertation at the University of Wisconsin-Madison (UW-Madison). Unlike molecular or cellular study of neuroscience, cognitive neuroscience seeks a larger view of neural systems, of “how the brain supports cognition,” said Cox.
+
+
Cox and other neuroscience researchers seek to understand which parts of the brain support memory and decision making, and answer more nuanced questions like how objects are represented in the brain. For Cox, this has involved developing new techniques for studying the brain that rely heavily on high-throughput computing.
+
+
“Our research gets into the transformations that take place in the brain. We ask questions like ‘how is information from our senses combined to support abstract knowledge that seems to transcend our senses,’” said Cox. “For example, we can recognize a single object from different perspectives and scales as being the same thing, and when we read a word we can call to mind all kinds of meaning that have little if anything to do with the letters on the page.”
+
+
The brain is highly complex, so neural imaging methods like functional MRI yield thousands of individual data points for every two seconds of imaging. Cox first turned to high performance computing and finally to the Open Science Pool for high-throughput computing (HTC) to deal with the massive amounts of data. Because computing support at UW-Madison is so seamless, when he first started out on HTC, Cox wasn’t even aware that the OSG was powering the vast improvement in his research.
+
+
“The OSG at UW-Madison is like flipping a switch,” said Cox. “It cut my computing time in half and was totally painless. Our research was a good candidate for the OSG and the advantages of HTC. The OSG and the Center for High Throughput Computing at UW-Madison have empowered us to get results quickly that inform our next steps. This would be impossible without the extensive and robust HTC infrastructure provided by the OSG.”
+
+
A 45-minute experiment from many participants would produce enormous amounts of data. “From that, we can make inferences that generalize to humanity at large about how our brains work,” said Cox. “Our previous approach was to only look for activation that is located in the same place in the brain in different people and look for anatomical landmarks that we can line up across people. Then we ask whether they respond the same way (across people).”
+
+
“But now, we have expanded beyond that approach and look at how multiple parts of the brain are working together,” said Cox. “Even in one region of the brain, not every subcomponent might be working the same way, so when we start adding in all this extra diversity of the activation profile, we get very complicated models that have to be tuned to the data set.”
+
+
Cox’s major parameters now are how many data points to include when it’s time to build a model. “For cross-validation, that then increases the need for computing by an order of magnitude,” said Cox.
+
+
Each model can take 30 minutes to an hour to compute. Cox then runs hundreds of thousands of them to narrow in on the appropriate parameter values.
+
+
Further increasing the computational burden, this whole procedure has to be done multiple times, each time holding out a portion of the data for cross-validation. “By cross-validating and running simulations to determine what random performance looks like, we can test whether the models are revealing something meaningful about the brain,” said Cox.
+
+
Cox gains a particular advantage from high-throughput computing on the OSG by creating novel optimization procedures to probe MRI data that is more connected with cognitive theory.
+
+
“Saving a minute or two on each individual job is not important,” said Cox. “Our main priority can focus on the most conceptually sound algorithms and we can get to real work more quickly. We don’t need to optimize for a HPC cluster, we can just use the scale of HTC.”
+
+
Cox’s research is beginning to explore the neural dynamics involved when calling to mind a concept, with millisecond resolution. This requires looking at data collected with other methods like electroencephalography (EEG) and electrocorticography (ECoG). Cox said that it takes about two full seconds for MRI to collect a single sample.
+
+
“The problem is that lots of cognitive activity is going on in those two seconds that is being missed,” said Cox. “When you gain resolution in the time domain you have a chance to notice qualitative shifts that may delimit different neural processes. Identifying when they occur has a lot of theoretical relevance, but also practical relevance in understanding when information is available to the person.”
+
+
“People think of the brain as a library—adding books to the stack and looking in a card catalog,” said Cox. “We are seeing knowledge more like Lego blocks than a library—no single block has meaning, but a collection can express meaning when properly composed. The brain puts those blocks together to give meaning. My research so far supports the Lego perspective over the library perspective.”
+
+
Cognitive neuroscience may offer clues to cognitive decline, which in turn could inform how we think about learning, instruction, and training. How we understand challenges like dementia can lead to better, more correct therapies by understanding the patterns of decline in the brain.
+
+
“Also, having a more accurate understanding of what it means to ‘know’ something can also help us understand how fake news and misinformation take hold in individuals and spread through social networks,” said Cox. “At the core of these issues are fundamental questions about how we process and assimilate information.
+
+
“We know it is hard to get someone to change their mind, so the question is what is happening in the brain. The answers depend on a better understanding of what knowledge is and how we acquire it. Our research is pointed to these higher level questions.”
+
+
“Once we had access to the computational resources of the OSG, we saw a paradigm shift in the way we think about research,” said Cox. “Previously, we might have jobs running for months. With HTC on the OSG, that job length became just a few days. It gave new legs to the whole research program and pushed us forward on new optimization techniques that we never would have tried otherwise.”
+ CHTC’s pioneering work in computing continues to advance science and society in new ways. Located at the heart of
+ UW-Madison’s School of Computer, Data & Information Sciences (CDIS), CHTC offers
+ exceptional computing capabilities and experienced facilitation support to campus researchers and
+ international scientists alike. Working in collaboration with projects across all areas of study,
+ CHTC helps innovate solutions that otherwise might not have been possible, while at
+ the same time evolving the field of distributed computing.
+
+ The CHTC Newsletter is a quarterly email that includes information about upcoming events,
+ training opportunities, and other news from the CHTC. If you would like to receive
+ the CHTC Newsletter, please fill out the form below.
+
+ NOAA funded marine scientist uses OSPool access to high throughput computing to explode her boundaries of research
+
+
Dr. Carrie Wall, a research scientist at the University of Colorado Boulder, shares how access to OSPool resources has allowed her team to expand the scope of their research and to fail, unconstrained by the cost of computing in the cloud and the associated restraints that places on research.
+
+
+
+
A marine scientist faced the daunting challenge of processing sonar data from 65 research cruises spanning 20 years, totaling over 100,000 files. The researcher, Dr. Carrie Wall, braced herself for a grueling 30-week endeavor of single stream, desktop-based processing. However, good fortune intervened at a National Discovery Cloud for Climate (NDC-C) conference in January 2024 when she crossed paths with Brian Bockelman, the principal investigator (PI) of the Pelican Project and a Co-PI of the PATh Project.
+
+
Wall discussed with Bockelman the challenges of converting decades’ worth of sonar datasets into a format suitable for AI analysis, a crucial step for her NSF-funded project through the NDC-C. This initiative aimed to develop the cyberinfrastructure essential for implementing scalable self-supervised machine learning on the extensive water column sonar data accumulated over the years. “We all went around and did five-minute presentations explaining ‘here’s what I do, here’s what I work on,’ almost like speed dating,” recounted Bockelman. “Listening to her talk, it was like, ‘this is a great high throughput computing example.’” Recognizing the volume of Wall’s project, Bockelman introduced her to the OSPool, a shared computing resource freely available to researchers affiliated with US academic institutions. He observed that Wall’s computing style aligned seamlessly with OSPool’s capabilities and would address Wall’s sonar processing bottleneck.
+
+
With Bockelman’s encouragement, Wall and her team’s software developer, Rudy Klucik, easily created accounts and began modifying their computing workflow for high throughput computing. “The process was super easy and very accommodating. Rachel Lombardi, a Research Computing Facilitator for the Center for High Throughput Computing, walked me through all the details, answered my technical questions, and was very welcoming. It was a really nice onboarding,” enthused Klucik. What followed was nothing short of a paradigm shift.
+
+
+
+
Within the walls of the University of Colorado Boulder lies CIRES: The Cooperative Institute for Research in Environmental Sciences, a partnership between the National Oceanic and Atmospheric Administration (NOAA) and the university itself. CIRES employs a workforce of over 800 scientists and staff, actively involved in various aspects of NOAA’s mission-critical endeavors. Among them are Wall and Klucik, both members of NOAA’s team. NOAA’s mission centers on supporting healthy oceans. Dr. Wall has dedicated the past 11 years to leading the development of national archives for water column sonar data, a task undertaken through the National Centers for Environmental Information (NCEI), NOAA’s archival arm.
+
+
Wall and her team have archived over 280TB of water column sonar data at NCEI, which serves not only NOAA’s scientists but also other agencies and academic institutions. However, there was a significant issue: it existed solely in its native, proprietary, and exceedingly complex industry format. Despite being hosted on Amazon Web Services (AWS) for accessibility, as Wall explained, “a lot of expert knowledge is needed to even open and read these files.”
+
+
“NOAA scientists, mostly from the National Marine Fisheries Service (NOAA Fisheries), have collected in all U.S. waters - from the Arctic Ocean to the Caribbean and off the entire U.S. coastline. In just the Gulf of Maine alone, NOAA Fisheries scientists have collected over 20 years of data going back to 1998. All of these data have been archived so not only do we have a very large volume of data, but also a very long time series covering critical habitats,” Wall explained. “The majority of these fascinating data have been used to support fishery stock assessments,” Wall emphasized. “There’s a lot of these data, and in collaboration with experts we want to find out more about them.”
+
+
With the help of the OSPool, Klucik has been able to successfully develop a workflow that he now executes smoothly. This involves reading files from an AWS bucket, processing and converting them into a cloud native Zarr format, and then writing that data out to a publicly accessible bucket, available under the NOAA Open Data Dissemination program. Wall added, “This will now serve as our input for the AI model.”
+
+
Before discovering the OSPool, the original plan was to “fully utilize a cloud native processing pipeline, mainly composed of AWS Lambdas to do all our data conversion,” described Klucik. “One common misconception is that cloud computing is cheaper than traditional computing, if the technical elements are all aligned properly it can be cheap, but in a lot of situations it’s still extremely expensive; in the back of our minds we were afraid that it might even be cost prohibitive to process the archive as a whole.”
+
+
However, it is important to acknowledge that before being set up with the OSPool, there was a bit of a learning curve. “I was completely new to high throughput computing infrastructure and didn’t understand how the processing worked,” recalled Klucik. “So, a lot of my initial time was spent running ‘hello world’ examples to better understand the functionality. I started with one job, then scaled up to 100 and eventually 1,000 to get the concurrency we were looking to achieve. It involved a lot of trial and error to get everything right. It took about a month before I finally managed to run the full catalog of data properly.” Klucik noted that he was aware of the available resources, saying, “The OSPool documentation served as an invaluable resource for getting me oriented in a new computing environment.”
+
+
Although initially tasked with processing 100,000 files, their workflow using the OSPool has since surged beyond 400,000 files—an accomplishment that would have been financially daunting in a traditional cloud environment. Wall emphasized that “what OSPool has allowed us to do is fail, which is really, really good. Before [using the] OSPool, we started by processing a couple of cruises with a very small number of files in the cloud to be cost-effective; we didn’t want to make costly mistakes. Being able to use OSPool to iterate and strengthen our process, allowed us to then scale to the volume of data and number of files that we need to process. I don’t know where we would be without OSPool but it would’ve cost us tens of thousands of dollars. We didn’t have to sacrifice for a lesser workflow, one that we didn’t improve upon because it would have cost us more money. I’m really excited about where OSPool has allowed us to go, and now we can take that next step to say ‘okay, we have our foundation, which is our data and a great format, and we can build our models and additional workflows.’”
+
+
Wall’s testimony underscores OSPool’s role not just as computing capacity but as a catalyst for innovation, enabling teams to push boundaries and realize their full potential in research and model development.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
diff --git a/preview-calendar/nrao.html b/preview-calendar/nrao.html
new file mode 100644
index 000000000..a314114a8
--- /dev/null
+++ b/preview-calendar/nrao.html
@@ -0,0 +1,421 @@
+
+
+
+
+
+
+Through the use of high throughput computing, NRAO delivers one of the deepest radio images of space
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ Through the use of high throughput computing, NRAO delivers one of the deepest radio images of space
+
+
The National Radio Astronomy Observatory’s collaboration with the NSF-funded Partnership to Advance Throughput Computing
+(PATh; NSF grant #2030508) and the Pelican Project (NSF grant #2331480) leads to successfully imaged deep
+space and creates a first-of-its-kind nationally distributed workflow model for data-intensive scientific investigations.
+
+
Ten years ago, the National Radio Astronomy Observatory (NRAO) pointed its
+Very Large Array (VLA) telescopes toward a well-studied portion of the sky, searching for the
+oldest view of the universe. Deeper structures reflect older structures in space, as their light takes longer to travel through space
+and be picked up by telescopes. Radio astronomy can go even further, detecting structures beyond visible light. The VLA
+telescopes generated enough data that a single image of a portion of the sky resulted in two terabytes of data. Without the
+computing capacity to image the complete data set, it sat largely unprocessed — until now.
+
+
Researchers at NRAO knew that attempting to process this entire data set in-house was impractical. A previous computing run in 2016 using only a subset of this data took nearly two weeks of active processing. The high sensitivity of radio images requires a vast amount of computing to reach a final product, noted Felipe Madsen, an
+NRAO software engineer. The VLA telescopes are interferometers, meaning they point two antennas at the same portion of the
+sky; the differences in what these antennas provide eventually result in an image, Madsen explains. NRAO models and re-models
+the data to decrease the noise level until the noise is indistinguishable from structures in space. “This project is a lot
+more data-intensive than most other projects,” Madsen said.
+
+
Curious about how high-throughput computing (HTC) could enhance its capacity to process data from the VLA, NRAO joined
+forces with the Center for High Throughput Computing (CHTC) in 2018. After learning about
+what HTC could accomplish, NRAO began executing trial runs in 2019, experimenting with HTC. “Four years ago, we were
+beginning to use GPU software to process our data,” Madsen explained. “From the beginning, we understood that to be
+compatible with HTC we needed to make changes to our systems.”
+
+
Each team learned from and made improvements based on insights from each other. Greg Thain,
+an HTCondor Core Developer for the CHTC, met with NRAO weekly to discuss HTC and changes both parties
+could make. These weekly meetings resulted in the HTCondor team making changes to the software, eventually improving the
+experience of other users, he said. OSG Software Area Coordinator of CHTC Brian Lin
+helped NRAO manage their distributed infrastructure of resources across the country and transition workflows from CPUs to GPUs
+to make their workflows more compatible with HTC. Through distributed HTC, NRAO was able to run workflows across the country through the
+Open Science Pool (OSPool) and
+PATh Facility.
+
+
At NRAO, Madsen developed the software to interface the scientific software in the LibRA package
+developed by NRAO Algorithms Research & Development Group with the CHTC infrastructure software. This separation of software
+allowed the two teams to solve problems that arose in real-time as the data began to transfer across sites nationwide.
+
+
By December 2023, both parties were ready to tackle the VLA telescope deep sky data using HTC. Transitioning workflows to
+nationwide resources led to data movement issues, as data struggled to move efficiently between distributed resources. The December
+2023 image processing run relied upon resources from the Open Science Data Federation
+(OSDF) and the recently funded Pelican Project to speed up data
+transfers across sites. Brian Bockelman, PI of the
+Pelican Project, and his team helped NRAO improve data movement using the OSDF. “Both teams
+were working to solve problems as they were happening,” Madsen recounted. “That made for a very successful collaboration
+in this process.”
+
+
+
+
Ultimately, the imaging process was 300 times faster than without using HTC, NRAO reported in
+a press release describing
+the project. What had previously taken two weeks now took only two hours to create the final result. The final image turned nine terabytes of data into a single
+product of one gigabyte. By
+the end, the collaboration resulted in one of the deepest radio images of the
+Hubble Ultra Deep Field.
+
+
The collaboration that led to this imaging is even bigger than NRAO and CHTC.
+The OSPool, which provided some of the computing capacity for the project,
+is supported by campuses and institutions across the country that share their excess capacity with the pool
+that NRAO utilized. For this project, 13 campuses contributed computing capacity, from small institutions
+like Emporia State University to larger ones like San Diego State University.
+
+
+
+
+
The December 2023 run and the working relationship between CHTC and NRAO revolutionized information available to astronomers
+and proved that HTC is a viable option for the field. “It’s useful to do this run once. What’s exciting is doing it
+30,000 times for the entire sky,” Bockelman said. Although previous radio astronomy imaging workflows utilized HTC,
+this run was the first to image data on a distributed workflow nationwide from start to finish. Moving forward, NRAO
+and CHTC will continue covering the entire area of the sky seen by the VLA telescopes.
+
+
Madsen is enthusiastic about continuing this project and about how the use of HTC is revolutionizing astronomy. “I’ve always felt
+like, in this project, we are at the cutting edge of the current knowledge for making this kind of imaging.
+On the astronomy side, we can access a lot of new information with this image,” he said. “We have also imaged a data set that was
+previously impractical to image.”
LIGO consists of two observatories within the United States—one in Hanford, Washington and the other in Livingston, Louisiana—separated by 1,865 miles. LIGO’s detectors search for gravitational waves from deep space. With two detectors, researchers can use differences in the wave’s arrival times to constrain the source location in the sky. LIGO’s first data run of its advanced gravitational wave detectors began in September 2015 and ran through January 12, 2016. The first gravitational waves were detected on September 14, 2015 by both detectors.
+
+
The LIGO project employs many concepts that the OSG promotes—resource sharing, aggregating opportunistic use across a variety of resources—and adds two twists: First, this experiment ran across LIGO Data Grid (LDG), OSPool and Extreme Science and Engineering Discovery Environment (XSEDE)-based resources, all managed from a single HTCondor-based system to take advantage of dedicated LDG, opportunistic OSG and NSF eXtreme Digital (XD) allocations. Second, workflows analyzing LIGO detector data proved more data-intensive than many opportunistic OSG workflows. Despite these challenges, LIGO scientists were able to manage workflows with the same tools they use to run on dedicated LDG systems—Pegasus and HTCondor.
+
+
Peter Couvares, data analysis computing manager for the Advanced LIGO project at Caltech, specializes in distributed computing problems. He and colleagues James Clark (Georgia Tech) and Larne Pekowsky (Syracuse University) explained LIGO’s computing needs and environment: The main focus is on optimization of data analysis codes, where optimization is broadly defined to encompass the overall performance and efficiency of their computing. While they use traditional optimization techniques to make things run faster, they also pursue more efficient resource management, and opportunistic resources—if there are computers available, they try to use them—thus the collaboration with OSG.
+
+
+
+
+
+
+ Peter Couvares, courtesy photo
+
+
+
+
+
+ James Clark, courtesy photo
+
+
+
+
+
+ Larne Pekowsky, courtesy photo
+
+
+
+
“When a workflow might consist of 600,000 jobs, we don’t want to rerun them if we make a mistake. So we use DAGMan (Directed Acyclic Graph Manager, a meta-scheduler for HTCondor) and Pegasus workflow manager to optimize changes,” added Couvares. “The combination of Pegasus, Condor, and OSG work great together.” Keeping track of what has run and how the workflow progresses, Pegasus translates the abstract layer of what needs to be done into actual jobs for Condor, which then puts them out on OSG.
+
+
The computing model
+
+
Since this work encompasses four types of computing – volunteer, dedicated, opportunistic (OSG), and allocated (XSEDE XD via OSG) – everything needs to be very efficient. Couvares helps with coordination, Pekowsky with optimization, and Clark with using OSG. In particular, OSG also enabled access to allocation-based resources from XSEDE. Allocations allow LIGO to get fixed amounts of time on dedicated NSF-funded supercomputers Comet and Stampede. While Stampede looks and behaves very much like a traditional supercomputer resource (batch, login node, shared file system), Comet has a new virtualization-based interface that eliminates the need to submit to a batch system. OSG provides this through a virtual machine (VM) image, then LIGO simply uses the OSG environment.
+
+
LIGO consumed 3,956,910 hours on OSG, out of which 628,602 hours were on the Comet and 430,960 on the Stampede XD resources. OSG’s Brian Bockelman (University of Nebraska-Lincoln) and Edgar Fajardo (UC San Diego/San Diego Supercomputer Center) used HTCondor to help LIGO implement their Pegasus workflow transparently across 16 clusters at universities and national labs across the US, including on the NSF-funded Comet and Stampede supercomputers.
+
+
“Normally our computing is done on dedicated clusters on the LIGO Data Grid,” said Couvares, “but we are moving toward also using outside and more elastic resources like OSG. OSG allows more flexibility as we add in systems that aren’t part of our traditional dedicated systems. The combination of OSG for unusual or dynamic workloads, and the LIGO Data Grid for regular workloads keeping up with new observational data is very powerful. In addition, Berkeley Open Infrastructure for Network Computing (BOINC) allows us to use volunteers’ home computers when they are idle, running Pulsar searches around the world in the Einstein@Home project (E@H). The aggregated cycles from E@H are quite large but it is well-suited to only some kinds of searches where a computer must process a smaller amount of data for a longer amount of time. We must rely on traditional HTC resources for our data-intensive analysis codes.”
+
+
LIGO codes cannot all run as-is on OSG. The majority of codes are highly optimized for the LDG environment, so they identified the most compute-intensive and high science priority code to run on OSG. Of about 100 different data analysis codes, only a small handful are running on OSG so far. However, the research team started with the hardest code, their highest priority, which means they are now doing some of LIGO’s most important computing on OSG. Other low latency codes must run on dedicated local resources where they might need to be done in seconds or minutes.
+
+
“It is important that LIGO has a broad set of resources and an increasingly diverse set of resources. OSG is almost like a universal adapter for us,” said Couvares. “It is very powerful, users don’t need to care where a job runs, and it is another step toward that old promise of grid computing.”
+
+
The importance of OSG and NSF support
+
+
Using data analysis run on the OSG, the LIGO team looked for a compact binary coalescence, that is, the merger of binary neutron stars or black holes. Couvares called it a modeled search—they have a signal that they believe is a strong indicator, they know what it’s going to look like, and they have optimal match filters to compare data with the signal they expect. But the search is computationally expensive because it’s not just one signal they are looking for: The parameters of the source may change or the objects may spin differently. The degree of match requires a search on the order of 100,000 different models/waveforms. This makes the OSG very valuable, because it can split up many of the match filters.
+
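The reason a modeled search splits so naturally across OSG can be sketched in a few lines of Python. The example below is a toy, time-domain matched filter written for illustration only; it is not LIGO's analysis code, and the function names (`matched_filter_peak`, `split_template_bank`, `run_job`) and the tiny data set are assumptions. Production searches also account for detector noise, which this sketch ignores. The point is only that each template is scored independently, so a bank on the order of 100,000 templates can be cut into chunks that run as separate jobs.

```python
import math


def matched_filter_peak(data, template):
    """Slide the template along the data; return (best_offset, best_score).

    The score is the dot product of the template with the data window,
    divided by the template's norm so that templates of different
    amplitude are comparable.
    """
    norm = math.sqrt(sum(t * t for t in template))
    best_offset, best_score = 0, float("-inf")
    for offset in range(len(data) - len(template) + 1):
        window = data[offset:offset + len(template)]
        score = sum(d * t for d, t in zip(window, template)) / norm
        if score > best_score:
            best_offset, best_score = offset, score
    return best_offset, best_score


def split_template_bank(templates, chunk_size):
    """Cut the bank into independent chunks, one chunk per job."""
    return [templates[i:i + chunk_size] for i in range(0, len(templates), chunk_size)]


def run_job(data, chunk):
    """What a single job would do: score every template in its chunk."""
    return [matched_filter_peak(data, template) for template in chunk]


# Toy data with the pattern [1, 2, 1] hidden at offset 4 (illustrative values).
data = [0, 0, 0, 0, 1, 2, 1, 0, 0]
bank = [[1, 2, 1], [1, -1, 1], [2, 0, 2], [1, 1, 1]]

for chunk in split_template_bank(bank, chunk_size=2):
    print(run_job(data, chunk))
```

Because no chunk depends on another, the chunks can be sent to whichever resources happen to be free and their results merged afterwards.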
+
“The parallel nature of the OSG is what’s valuable,” said Couvares. “It is well suited to a high throughput environment. We would like to use more OSG resources because we could expand the parameter space of our searches beyond what is possible with dedicated resources. We need two things, really. We obviously need resources, but we also need people who can be a bridge between the data analysts/scientists and the computing resources. Resources alone are not enough. LIGO will always need dedicated in-house computing for low latency searches that need to be done quickly, and for our steady-state offline computing, but now we have the potential elasticity of OSG.”
+
+
“The nature of our collaboration with OSG has been absolutely great for a number of reasons,” said Couvares. “The OSG people have been extremely helpful. They are really unselfish and technical. That’s not always there in the open-source world. The basic idea of OSG has been good for LIGO—their willingness OSG services for LIGO, to reduce the barrier to entry, setting up computing elements, and so on. The barrier otherwise would have been too high. We couldn’t be happier with our partnership.”
+
+
Another big step has been the increase in network speed. The data was cached at the University of Nebraska and streamed to on-demand worker nodes that are able to read from a common location. This project benefited greatly from the NSF’s Campus Cyberinfrastructure – Network Infrastructure and Engineering (CC-NIE) program, which helped provide a hardware upgrade from 10Gbps to 100Gbps WAN connectivity. Receiving NSF support to upgrade to 100Gbps has enabled huge gains in workflow throughput.
+
+
+
+
The LIGO analysis ran across 16 different OSG resources, for a total of 4M CPU hours:
+
+
1M CPU hours (25%) XSEDE contribution
+
5 TB total input data, cached at the Holland Computing Center (HCC) at the University of Nebraska-Lincoln
+
1 PB total data volume distributed to jobs from Nebraska
+
10 Gbps sustained data rates from Nebraska storage to worker nodes
+
+
+
Couvares concluded, “What we are doing is pure science. We are trying to understand the universe, trying to do what people have wanted to do for 100 years. We are extending the reach of human understanding. It’s very exciting and the science is that much easier with the OSG.”
+
+
+
+
– Greg Moore
+
+
– Brian Bockelman (OSG, University of Nebraska at Lincoln) contributed to this story
+ OSPool As a Tool for Advancing Research in Computational Chemistry
+
+
Assistant Professor Eric Jonas uses OSG resources to understand the structure of molecules based on their measurements and derived properties.
+
+
+
+
+
Picture this: You have just developed a model that predicts the properties of some molecules and plan to include this model in a section of a research paper. However, just a few days before the paper is to be published on your professional website, you discover an error in the data generation process, which requires you to compute your work again and quickly!
+This scenario was the case with Assistant Professor Eric Jonas, who works in the Department of Computer Science at the University of Chicago (UChicago).
+While this process is normally tedious, he noted how the OSPool helped streamline the steps needed to regenerate results: “The OSPool made it easy to go back and regenerate the data set with about 70 million new molecules in just a matter of days.”
+
+
Although this was a fairly recent incident for Jonas, he is not new to high throughput computing or the OSPool. With usage reaching as far back as his graduate school days, Jonas has utilized resources ranging from cloud computing infrastructures like Amazon Web Services to the National Supercomputing Center for his work with biological signal acquisition, molecular inverse problems, machine learning, and other ways of exploiting scalable computation.
+
+
He soon realized, though, that although these other resources could run large amounts of data in a relatively short time, they required a long, drawn-out sequence of actions to provide results – creating an application, waiting for it to be accepted, and then waiting in line for long periods for a job to run. Faced with this problem in 2021, Jonas found a solution with the OSG Consortium and its OSPool, OSG’s distributed pool of computing resources for running high-throughput jobs.
+
+
In April of 2021, he enlisted the help of HTCondor and the OSPool to run pre-existing computations that allow for the generation of training data and the development of new machine learning techniques to determine molecular structures in mixtures, chemical structures in new plant species, and other related queries.
+
+
Jonas’ decision to transition to the OSPool boiled down to three simple reasons:
+Less red tape involved in getting started.
+Better communication and assistance from staff.
+Greater flexibility in running other people’s software to generate data, which, in his words, is a much better fit for his specific research, work that would otherwise have been too computationally bulky to handle alone.
+
+
In terms of challenges with OSPool utilization, Jonas’ only point of concern is the amount of time it takes for code that has been uploaded to reach the OSPool. “It takes between 8 and 12 hours for that code to get to OSG. The time-consuming containerization process means that any bug in code that prevents it from running isn’t discovered and resolved as quickly, and takes quite a while, sometimes overnight.”
+
+
He and his research team have since continued to utilize OSPool to generate output and share data with other users. They have even become advocates for the resource: “After we build our models, as a next step, we’re like, let’s run our model on the OSPool to allow the community (which constitutes the entirety of OSPool users) also to generate their datasets. I guess my goal, in a way, is to help OSG grow any way I can, whether that involves sharing my output with others or encouraging people to look into it more.”
+
+
Jonas spoke about how he hopes more people would take advantage of OSPool:
+“We’re already working on expanding our use of it at UChicago, but I want even more people to know that OSPool is out there and to know what kind of jobs it’s a good fit for because if it fits the kind of work you’re doing, it’s like having a superpower!”
+ Advancing computational throughput of NSF funded projects with the PATh Facility
+
+
Since 2022, the Partnership to Advance Throughput Computing (PATh) Facility has provided dedicated high throughput
+computing (HTC)
+capacity to researchers nationwide. Following a year of expansion, here’s a look into the researchers’ work and how it has been enabled by
+the PATh Facility.
+
+
Searching for more computing capacity, Dr. Himadri Chakraborty of
+Northwest Missouri State University first heard of the PATh Facility, a
+purpose-built,
+national-scale distributed high throughput computing (HTC) resource, from his NSF program director. After approaching PATh Research
+Facilitators to
+acquire an account and computing “credits,” Chakraborty’s team was able to advance their work in physics using computing resources from the
+PATh Facility.
+Christina Koch, Lead Research Facilitator at CHTC, guided Chakraborty’s team in
+transitioning workflows to run within HTCondor.
+
+
“As our ambition grew, we were looking out for a larger system, PATh came as a blessing,” Chakraborty reflected. “The ultimate satisfaction is to get
+some new understanding and learning of the science we are working on. We hope that this will be one of our first major achievements using
+the PATh Facility.”
The PATh Facility guarantees users access through credits. Credits operate as a stand-in unit to ensure no
+one user is monopolizing the Facility’s capacity. Users receive 1,000 start-up credits to test if the PATh Facility is a good fit for them, available as Central
+Processing Unit (CPU) or Graphics Processing Unit (GPU) charges. After this initial testing period, they can apply for supplemental credits by contacting PATh
+Facilitators and their NSF officer. If users run through all their credits, they are still able to keep running and facilitators will work with them to request
+additional credits.
+
+
In comparison to the PATh Facility, the Open Science Pool (OSPool) — another distributed HTC resource created
+by the OSG Consortium — acquires available capacity from idle resources across 70 institutions
+nationwide. Projects may be better suited for the PATh Facility than the OSPool if they need additional cores, memory, data or dedicated time. “Since the PATh
+Facility is hardware owned and operated by PATh, we can make more guarantees about how long individual computations can run, that people will be able to get certain
+resources and run computations of a certain size,” Koch explained.
+
+
Following the PATh Facility’s growth, some OSPool users have begun to use the Facility’s dedicated capacity in tandem. One example is North Carolina State University
+Ph.D. candidate Matthew Dorsey, who relied on capacity from the OSPool for two years before expanding his research to the newer PATh Facility.
+In doing so, he was able to run larger jobs without worrying about running out of capacity. “The transition to the PATh Facility was extremely easy,” Dorsey said.
+“I was pleased with how seamless it was.”
+
+
Dorsey became interested in the OSPool after attending OSG School in the summer of 2022. There, he learned the basics of
+HTCondor and got to know Koch and other facilitators. Dorsey’s research specializes in statistical physics. He uses computational models
+to study magnetic materials and how magnetic fields can be used to alter the properties of materials made from different kinds of magnetic nanoparticles. His work benefits from the
+consistent access to computing for runs that accumulate over a long period of time.
+
+
+
+
Dorsey acknowledges that each system has its advantages and disadvantages. For Dorsey, the PATh Facility is better equipped for more complex jobs due to capacity and
+allocated time, while the OSPool is better for testing out comparatively smaller runs. “It was really easy to translate what I was doing on the OSPool to the PATh
+Facility and quickly scale it up,” Dorsey said.
+
+
A testament to the strength of the PATh Facility, the National Radio Astronomy Observatory (NRAO) used its capacity with the OSPool to
+develop one of the oldest radio images of a well-studied area in space. NRAO worked alongside PATh and CHTC staff, and the capacity
+from the PATh Facility was instrumental in planning when certain jobs would run, without the risk of reduced capacity from the OSPool.
+
+
The PATh Facility makes it possible to support projects with larger computational requirements. Dr. Vladan Stevanovic
+of the Colorado School of Mines is studying computational material science and relies heavily on the PATh Facility to plan and run data workflows.
+Stevanovic became familiar with the PATh Facility after receiving a Dear Colleague Letter from the NSF.
+
+
His work requires more cores than what the OSPool alone could offer, and he was drawn to the PATh Facility due to its specialization in HTC and ability to guarantee
+availability. Stevanovic and his team hope to develop computational tools to reliably predict the metastable states of solid matter. He describes this work as very
+computational, and has worked with HTC workflows for over 12 years. “PATh is amazing for my research because its primary purpose is HTC, which I rely on,” he said.
+“I’m grateful because my project critically depends on PATh.”
+
+
Stevanovic also appreciates how easy it was to start using the PATh Facility. Start-up credits are typically granted while or directly after meeting with the Facilitation
+team, and Koch’s team continues to support researchers as they ask the NSF for more credits. “The onboarding process was great, and the support from Christina was amazing.
+We were able to get running and get up to speed quickly.”
+
+
Chakraborty’s team faced some initial challenges in switching workflows from in-house to distributed, but coming to the PATh Facility nonetheless expanded the capacity
+available to his team. He recounted that his previous in-house system provided about 28 CPUs per node, while the PATh Facility offers up to 150 CPUs. Overall, Chakraborty
+is optimistic that the new capacity will improve his findings for the future. “We are happy we took that plunge because we went through some difficult times and got help
+from the PATh folks,” he said. “We’ve made some pretty good progress in making our codes run on PATh. It was mostly to be able to use a larger pool of computer power.”
+
+
His work focuses on the simulation of electronic and phononic coupled ultrafast relaxation of photoexcited large molecules. The PATh Facility’s new capacity allowed
+his team to make new advances “of a polymer functional system that has a lot of applications,” he said. “It’s not the end, it’s still preliminary and intermediate, and
+they are so exciting. We are looking forward to the final results and finding out new physics.”
+
+
Interested PIs can submit an interest form on the PATh website and then meet with a research computing facilitator for a consultation.
+If the researcher is a good fit, PATh Facilitators help the researcher log in and begin using their start-up credits. If they wish to continue, the researcher begins
+drafting a proposal letter to the NSF, requesting credits. Koch notes that the credit proposal is simpler than a typical project proposal, and the Facilitation team provides
+a multitude of resources such as credit calculators and proposal templates. For users who encounter issues, the facilitation team is available through a support email address
+and weekly support hours, and maintains documentation on the website.
+ Harnessing HTC-enabled precision mental health to capture the complexity of smoking cessation
+
+
Collaborating with CHTC research computing facilitation staff, UW-Madison researcher Gaylen Fronk is using HTC to improve cigarette cessation treatments by accounting for the complex differences among patients.
+
+
+
+
Working at the crossroads of mental health and computing is Gaylen Fronk, a graduate student at the University of Wisconsin-Madison’s Addiction Research Center. By examining treatments for substance use disorders with machine learning models that are enabled by High Throughput Computing (HTC), Fronk captures the array of differences among individuals while still ensuring that her models are applicable to new patients. Her work is embedded within the larger context of precision mental health, an emerging field that relies on computational tools to evaluate complex, individual-level data in determining the fastest and most effective treatment plan for a given patient.
+
+
Fronk’s pursuit of precision mental health has recently led her to the world of computing that involves high-throughput workloads. Currently, she’s using HTC to predict treatment responses for people who are quitting cigarette smoking.
+
+
“I feel like [HTC] has been critical for my entire project,” Fronk reasons. “It removes so many constraints from how I have to think about my research. It keeps so many possibilities open because, within reason, I just don’t have to worry about computational time –– it allows me to explore new questions and test out ideas. It allows me to think bigger and add complexity rather than only having to constrain.”
+
+
Embarking on this project in August of 2019, Fronk began by reaching out to the research computing facilitators at UW-Madison’s Center for High Throughput Computing (CHTC). Dedicated to bringing the power of HTC to all fields of research, CHTC staff provided Fronk with the advice and resources she needed to get up and running. Soon, she was able to access hundreds of concurrent cores on CHTC’s HTC system through the HTCondor Software Suite (HTCSS), which was developed at UW-Madison and is used internationally for automating and managing batch HTC workloads. This computing capacity has been undeniably impactful on Fronk’s research, yet when reflecting on the beginnings of her project today, Fronk considers the collaborative relationships she’s developed along the way to be particularly powerful.
+
+
“I am never going to be a computer scientist,” explains Fronk. “I’m doing my best and I’m learning, but that’s not what my focus is and that’s never going to be my area of expertise. I think it’s really wonderful to be able to lean on people for whom that is their area of expertise, and have those collaborative relationships.” This type of collaboration among computing experts and researchers will be vital as computational advances continue to spread throughout the social sciences. Computing staff like CHTC’s research computing facilitators help researchers to transform, expand, and accelerate their work; and specialized researchers like Fronk provide their domain expertise to ensure these computational methods are incorporated in ways that preserve the conceptual and theoretical basis of their discipline.
+
+
+
+
CHTC research computing facilitator Christina Koch has worked closely with Fronk since the beginning of her project, and elaborates on the benefits arising from this synergistic relationship: “Instead of every research group on campus needing to have their own in-house large-scale computing expert, they can meet with our facilitation team and we provide them with the information they need to expand their research computing vision and apply it to their work. But we also learn a lot ourselves from the wide variety of researchers we consult with. Since our experience isn’t siloed to a particular research domain, we take lessons learned from one group and share them with another group, where normally those groups would never have thought to connect with each other.”
+
+
For fellow social scientists who are considering reaching out to people like Christina and incorporating HTC into their work, Fronk urges them to do just that: “There’s a lot you can teach yourself, but you also don’t have to be on your own. Reach out to the people who know more than you. For me, people like Christina and others on the CHTC team have been invaluable.”
+
+
Fronk’s collaborations with Christina have all revolved around the ongoing project that she first began in August of 2019 –– predicting which cigarette cessation treatments will be most effective for a given individual. Data from a Center for Tobacco Research and Intervention (CTRI) 6-month clinical trial serve as a rich and comprehensive foundation for building machine learning models. With the CTRI data in hand, Fronk not only has access to the treatment type and whether it was successful at the end of the trial, but also to approximately 400 characteristics that capture the fine-tuned individual differences among patients. These include demographic information, physical and mental health records, smoking histories, and social pressures, such as the smoking habits of a patient’s friends, roommates, or spouse.
+
+
All these individual-level differences paint valuable complexity onto the picture, and Fronk is able to embrace and dive into that complexity with the help of HTC. Each job she sends over to CHTC’s cores contains a unique model configuration run against a single cross-validation iteration, meaning that part of the CTRI data is used for model fitting while the unused, ‘new’ data is used for model evaluation. For instance, Fronk might start with as many as 200 unique configurations for a given model. If each of these model configurations is fit and evaluated using a cross-validation technique that has 100 unique splits of data, Fronk would then submit the resulting 20,000 jobs to CHTC.
+
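The job count follows directly from pairing every configuration with every split. A minimal sketch, assuming hypothetical integer IDs for configurations and splits (the source does not describe Fronk's actual code or file layout, and the function names here are illustrative):

```python
from itertools import product


def build_job_list(n_configs, n_splits):
    """Pair every model configuration with every cross-validation split.

    Each (config_id, split_id) pair describes one independent job.
    """
    return list(product(range(n_configs), range(n_splits)))


def to_parameter_lines(jobs):
    """Format one 'config_id, split_id' line per job."""
    return [f"{config_id}, {split_id}" for config_id, split_id in jobs]


jobs = build_job_list(n_configs=200, n_splits=100)
print(len(jobs))  # 200 * 100 = 20000
print(to_parameter_lines(jobs)[:3])
```

A list like this could be written to a text file and handed to the scheduler so that each line becomes one job; HTCondor's submit language, for example, has a `queue ... from <file>` form for this purpose.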
+
Before submitting, Fronk alters her code so that each job runs just a single-configuration, single-iteration context, effectively breaking the comprehensive CTRI data down into small, manageable pieces. Ultimately, when delegated to hundreds of CHTC cores in concurrent use, Fronk’s largest submissions finish in mere hours, as opposed to days on a local computer.
+
+
Thousands of small jobs are handled easily by HTCSS and CHTC’s distributed resources, after which Fronk can aggregate this multitude of output files on her own computer to average the performance of the model configuration across the array of cross-validation iterations. This aggregated output represents how accurately the model predicts whether a certain cigarette cessation treatment will work for a specific individual. After receiving the output, Fronk evaluates it, learns from it, and repeats –– but this time with new insight.
+
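The aggregation step amounts to grouping per-job scores by configuration and averaging over splits. A minimal sketch, assuming each job reports a hypothetical `(config_id, split_id, score)` record where a higher score is better (the actual output format and performance metric are not given in the source):

```python
from collections import defaultdict
from statistics import mean


def average_by_config(results):
    """Average the score of each configuration across its splits.

    `results` is an iterable of (config_id, split_id, score) records,
    one per finished job.
    """
    scores = defaultdict(list)
    for config_id, _split_id, score in results:
        scores[config_id].append(score)
    return {config_id: mean(values) for config_id, values in scores.items()}


def best_config(averages):
    """Return the configuration with the highest average score."""
    return max(averages, key=averages.get)


# Illustrative records: two configurations, two splits each.
results = [
    (0, 0, 0.5),
    (0, 1, 0.75),
    (1, 0, 0.75),
    (1, 1, 1.0),
]

averages = average_by_config(results)
print(averages)
print(best_config(averages))
```

Averaging over splits, rather than trusting any single split, is what lets the comparison between configurations reflect performance on data the model did not see during fitting.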
+
After her experience with HTC, Fronk now sees computing as an integral part of her work. In fact, the ideal of precision mental health as a compelling alternative to traditional treatment methods has actually been around for a while –– though scalable computing methods that enable it are just beginning to enter the toolboxes of mental health researchers everywhere. “I feel like high-throughput computing really fills a lot of the holes that are needed to move precision mental health forward,” Fronk expresses. “It makes me really excited to be working at that intersection.”
+
+
And at that intersection, Fronk isn’t alone. As computational resources are becoming more accessible, increasingly more researchers are investigating the frontiers of precision mental health and its potential to improve treatment success. But before this approach moves from the research space and into a clinical setting, careful thought is needed to assess how these experimental models will fare in the real world.
+
+
Approaches that require intensive and expensive data, like neuroimaging or genetic analysis for instance, may not be feasible –– especially for clinics located in low-income communities. Elaborating on this idea, Fronk explains, “It’s really exciting to think that neuroimaging or genetic data might hold a lot of predictive potential –– yet if a person can’t get genotyped or imaged, then they’re not going to be able to be split into treatments. And those problems get compounded in lower income areas, or for people who have been historically excluded and underrepresented both in terms of existing research and access to healthcare.”
+
+
It will take time, research, and ethical forethought before precision mental health approaches can reach local clinics, but when that time comes –– the impact will ripple through the lives of people seeking treatment everywhere. “I think precision mental health can really help people on a much shorter timeline than traditional treatment approaches, and that feels very meaningful to me,” says Fronk. In terms of her focus on cigarette smoking cessation, timing is everything. Cigarette smoking –– as well as other substance use disorders like it –– has extremely high costs of failed treatments at both the personal and societal level. If someone is given the right treatment from the start when they’re most motivated to quit, it mitigates not only their own health and financial risks, but also those of society.
+
+
Ultimately, these impacts stem from the collaborative relationships seen today between researchers like Fronk and computing facilitators like Christina at CHTC. There’s still much to be done before precision mental health approaches can be brought to bear in the clinical world, but high-throughput computing is powering the research to move in that direction in a way that was never possible before. Complexity –– which used to limit Fronk’s research –– now seems to be absolutely central to it.
+ CHTC Partners Using CHTC Technologies and Services
+
+
+
+
+
+
+
+
+
+
+
+
+
+
IceCube and Francis Halzen
+
+ Francis Halzen, principal investigator of IceCube and the Hilldale and Gregory Breit Distinguished Professor of Physics.
IceCube has transformed a cubic kilometer of natural Antarctic ice into a neutrino detector. We have discovered a flux of high-energy neutrinos of cosmic origin, with an energy flux that is comparable to that of high-energy photons. We have also identified its first source: on September 22, 2017, following an alert initiated by a 290-TeV neutrino, observations by other astronomical telescopes pinpointed a flaring active galaxy, powered by a supermassive black hole. We study the neutrinos themselves, some with energies exceeding by one million those produced by accelerators. The IceCube Neutrino Observatory is managed and operated by the Wisconsin IceCube Astroparticle Physics Center (WIPAC) in the Office of the Vice Chancellor of Graduate Education and Research and funded by a cooperative agreement with the National Science Foundation. We have used CHTC and the Open Science Pool for over a decade to perform all large-scale data analysis tasks and generate Monte Carlo simulations of the instrument's performance. Without CHTC and OSP resources we would simply be unable to make any of IceCube's groundbreaking discoveries. Francis Halzen is the Principal Investigator of IceCube. See the IceCube web site for project details.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
David O’Connor
+
+ David H. O’Connor, Ph.D., UW Medical Foundation (UWMF) Professor, Department of Pathology and Laboratory Medicine at the AIDS Vaccine Research Laboratory.
Computational workflows that analyze and process genomics sequencing data have become the standard in virology and genomics research. The resources provided by CHTC allow us to scale up the amount of sequence analysis performed while decreasing sequence processing time. An example of how we use CHTC is generating consensus sequences for COVID-19 samples. Part of this is a step that separates and sorts a multisample sequencing run into individual samples, maps the reads of these individual samples to a reference sequence, and then forms a consensus sequence for each sample. Simultaneously, different metadata and other accessory files are generated on a per-sample basis, and data is copied to and from local machines. This workflow is cumbersome when there are, for example, 4 sequencing runs with each run containing 96 samples. CHTC allows us to cut the processing time from 16 hours to 40 minutes by distributing jobs across different CPUs. Overall, being able to use CHTC resources gives us a major advantage in producing results faster for large-scale and/or time-sensitive projects.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
Small Molecule Screening Facility and Spencer Ericksen
+
+ Spencer Ericksen, Scientist II at the Small Molecule Screening Facility, part of the Drug Development Core – a shared resource in the UW Carbone Cancer Center.
I have been working on computational methods for predicting biomolecular recognition processes. The motivation is to develop reliable models for predicting binding interactions between drug-like small molecules and therapeutic target proteins. Our team at SMSF works with campus investigators on early-stage academic drug discovery projects. Computational models for virtual screening could prioritize candidate molecules for faster, cheaper focused screens on just tens of compounds. To perform a virtual screen, the models evaluate millions to billions of molecules, a computationally daunting task. But CHTC facilitators have been with us through every obstacle, helping us to effectively scale through parallelization over HTC nodes, matching appropriate resources to specific modeling tasks, compiling software, and using Docker containers. Moreover, CHTC provides access to vast and diverse compute resources.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
Natalia de Leon
+
+ Natalia de Leon, Professor of Agronomy, Department of Agronomy.
The goal of her research is to identify efficient mechanisms to better understand the genetic constitution of economically relevant traits and to improve plant breeding efficiency. Her research integrates genomic, phenomic, and environmental information to accelerate translational research for enhanced sustainable crop productivity.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
xDD project and Shanan Peters
+
+ Shanan Peters, project lead for xDD, Dean L. Morgridge Professor of Geology, Department of Geoscience.
Shanan’s primary research thrust involves quantifying the spatial and temporal distribution of rocks in the Earth’s crust in order to constrain the long-term evolution of life and Earth’s surface environment. Compiling data from scientific publications is a key component of this work, and Peters and his collaborators are developing machine reading systems deployed over the xDD digital library and cyberinfrastructure hosted in the CHTC for this purpose.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
Susan Hagness
+
+ In our research we're working on a novel computational tool for THz-frequency characterization of materials with high carrier densities, such as highly-doped semiconductors and metals. The numerical technique tracks carrier-field dynamics by combining the ensemble Monte Carlo simulator of carrier dynamics with the finite-difference time-domain technique for Maxwell's equations and the molecular dynamics technique for close-range Coulomb interactions. This technique is computationally intensive and each test runs long enough (12-20 hours) that our group's cluster isn't enough. This is why we think CHTC can help, to let us run more jobs than we're able to run now.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
Joao Dorea
+
+ Joao Dorea, Assistant Professor in the Department of Animal and Dairy Sciences/Department of Biological Systems Engineering.
The Digital Livestock Lab develops research focused on high-throughput phenotyping strategies to optimize farm management decisions. Our research group is interested in the large-scale development and implementation of computer vision systems, wearable sensors, and infrared spectroscopy (NIR and MIR) to monitor animals in livestock systems. We have a large computer vision system implemented in two UW research farms that generate large datasets. With the help of CHTC, we can train deep learning algorithms with millions of parameters using large image datasets and evaluate their performance in farm settings in a timely manner. We use these algorithms to monitor animal behavior, growth development, social interaction, and to build predictive models for early detection of health issues and productive performance. Without access to the GPU cluster and the facilitation made by CHTC, we would not be able to quickly implement AI technologies in livestock systems.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
Paul Wilson
+
+ Paul Wilson, head of The Computational Nuclear Engineering Research Group (CNERG), the Grainger Professor for Nuclear Engineering, and the current chair of the Department of Engineering Physics.
CNERG’s mission is to foster the development of new generations of nuclear engineers and scientists through the development and deployment of open and reliable software tools for the analysis of complex nuclear energy systems. Our inspiration and motivation come from performing those analyses on large, complex systems. Such simulations require ever-increasing computational resources and CHTC has been our primary home for both HPC and HTC computation for nearly a decade. In addition to producing our results faster and without the burden of administering our computer hardware, we rely on CHTC resources to demonstrate performance improvements that arise from our methods development. The role-defining team of research computing facilitators has ensured a smooth onboarding of each member of our group and helped them find the resources they need to be most effective.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
Barry Van Veen
+
+            The bio-signal processing laboratory develops statistical signal processing methods for biomedical problems. We use CHTC for causal network modeling of brain electrical activity. We develop methods for identifying network models from noninvasive measures of electric/magnetic fields at the scalp, or invasive measures of the electric fields at or in the cortex, such as electrocorticography. Model identification involves high throughput computing applied to large datasets consisting of hundreds of spatial channels each containing thousands of time samples.
+
+
+
+
+
+
+
+
+
+
+
+
+
CMS LHC Compact Muon Solenoid
+
+ The UW team participating in the Compact Muon Solenoid (CMS) experiment analyzes petabytes of data from proton-proton collisions in the Large Hadron Collider (LHC). We use the unprecedented energies of the LHC to study Higgs Boson signatures, Electroweak Physics, and the possibility of exotic particles beyond the Standard Model of Particle Physics. Important calculations are also performed to better tune the experiment's trigger system, which is responsible for making nanosecond-scale decisions about which collisions in the LHC should be recorded for further analysis.
+
+
+
+
+
+
+
+
+
+
+
+
+
Biomagnetic Resonance Data Bank
+
+            The Biomagnetic Resonance Data Bank (BMRB) is headquartered within UW-Madison's National Magnetic Resonance Facility at Madison (NMRFAM) and uses the CHTC for research in connection with the Biological Magnetic Resonance Data Bank (BMRB).
+
+
+
+
+
+
+
+
+
+
+
+
+
+
Phil Townsend
+
+            Professor Phil Townsend of Forestry and Wildlife Ecology says, “Our research (NASA & USDA Forest Service funded) strives to understand the outbreak dynamics of major forest insect pests in North America through simulation modeling. As part of this effort, we map forest species and their abundance using multi-temporal Landsat satellite data. My colleagues have written an automatic variable selection routine in MATLAB to preselect the most important image variables to model and map forest species abundance. However, depending on the number of records and the initial variables, this process can take weeks to run. Hence, we seek resources to speed up this process.”
+
RC facilitators serve as proactive and personalized guides, helping researchers
+identify and implement computational approaches that result in the greatest impact to their projects. Rather
+than possessing a significant depth of expertise in computational technologies, RC facilitators build and
+leverage their team of expert technical staff and translate the details of computational options for individual
+researchers. Through this two-way relationship-building approach, dedicated RC facilitators have enabled
+previously unimagined and significant scholarship outcomes of scale and scope across a variety of research
+domains, especially within the space of campus-supported research computing centers.
+
+
Impact
+
+
Since the hiring of the first Research Computing Facilitator in 2013, usage of computing services by previously underserved researchers
+has increased significantly (see figure below). Importantly, more than 95% of usage from the life sciences and
+social sciences has been on an HTC-optimized compute configuration rather than a traditional HPC
+cluster, emphasizing the applicability of multiple compute configurations to meet needs across domains.
+
+
+
+
Goals
+
+
The following outlines the primary goals (the needs) of successful RC facilitation and identifies the related
+major activities for achieving those goals.
+
+
+
Proactive Engagement
+
Personalized Guidance
+
Teaching Researchers to Fish
+
Building Relationships
+
Advocating for Research Needs
+
Developing Connections among Staff
+
+
+
Skills and Backgrounds
+
+
Three key areas of experience and interest are relevant for successful RC facilitators: individual interests
+and motivation, communication and interpersonal skills, and technical knowledge.
+
+
Interests and Motivation
+
+
+
A desire to enable and support the scholarly work of others
+
Interest in a wide set of research domains beyond their own area of expertise
+
The ability and the desire to work in a team environment
+
A desire to further develop the skills and interests relevant to effective facilitation
+
+
+
Communication and Interpersonal Skills
+
+
+
Excellent written and verbal communication, including active and empathetic listening skills and an
+ability to translate complex and domain-specific information for nonspecialists
+
Demonstrated effectiveness and comfort in teaching and public speaking
+
Success and demonstrated interest in interpersonal networking and liaising
+
The desire to work in a team environment, where staff frequently depend on one another
+
Leadership skills that inspire action and coordinate the activities of shared contributions
+
+
+
Technical Abilities
+
+
+
Prior experience conducting research projects or other significant scholarly work with some
+integration of relevant computational systems and tools
+
A demonstrated ability to understand multiple aspects of a problem and identify appropriate solutions
+
The ability to provide solution-agnostic support by focusing on research requirements and desired
+outcomes
+
A desire for continuous learning of relevant technology topics
+    Douglas Thain, Todd Tannenbaum, and Miron Livny,
+    "Distributed Computing in Practice: The Condor Experience",
+ Concurrency and Computation: Practice and Experience,
+ Vol. 17, No. 2-4, pages 323-356, February-April, 2005.
+ [PDF]
+ [BibTeX Source for Citation]
+
+
+ Douglas Thain and Miron Livny,
+ "Building Reliable Clients and Servers",
+ in Ian Foster and Carl Kesselman, editors,
+ The Grid: Blueprint for a New Computing Infrastructure,
+ Morgan Kaufmann, 2003, 2nd edition. ISBN: 1-55860-933-4.
+ [PDF]
+ [BibTeX Source for Citation]
+
+
+ Douglas Thain, Todd Tannenbaum, and Miron Livny,
+ "Condor and the Grid",
+ in Fran Berman, Anthony J.G. Hey, Geoffrey Fox, editors,
+ Grid Computing: Making The Global Infrastructure a Reality,
+ John Wiley, 2003.
+ ISBN: 0-470-85319-0
+ [PDF]
+
+ [BibTeX Source for Citation]
+
+
+ Todd Tannenbaum, Derek Wright, Karen Miller, and Miron Livny,
+ "Condor - A Distributed Job Scheduler",
+ in Thomas Sterling, editor,
+ Beowulf Cluster Computing with Linux,
+ The MIT Press, 2002.
+ ISBN: 0-262-69274-0
+ [Postscript]
+ [PDF]
+ [BibTeX Source for Citation]
+ [MIT Press' Web Page]
+
+ The MIT Press is pleased to present material from a preliminary draft of
+ Beowulf Cluster Computing with Linux.
+ This material is Copyright 2002 Massachusetts Institute of Technology, and
+ may not be used or distributed for any commercial purpose without the
+ express written consent of The MIT Press. Because
+ this material was a draft chapter, neither
+ The MIT Press nor the authors can be held liable for changes or
+    alterations in the final edition.
+
+
+ Jim Basney and Miron Livny,
+ "Deploying a High Throughput Computing Cluster",
+ High Performance Cluster Computing, Rajkumar Buyya, Editor,
+ Vol. 1, Chapter 5, Prentice Hall PTR, May 1999.
+ [Postscript]
+ [PDF]
+ [BibTeX Source for Citation]
+
+
+ Miron Livny, Jim Basney, Rajesh Raman, and Todd Tannenbaum,
+ "Mechanisms for High Throughput Computing",
+ SPEEDUP Journal, Vol. 11, No. 1, June 1997.
+ [Postscript]
+ [PDF]
+ [BibTeX Source for Citation]
+
+
+ Jim Basney, Miron Livny, and Todd Tannenbaum,
+ "High Throughput Computing with Condor",
+ HPCU news, Volume 1(2), June 1997.
+
+
+
+    D. H. J. Epema, Miron Livny, R. van Dantzig, X. Evers, and Jim Pruyne,
+    "A Worldwide Flock of Condors: Load Sharing among Workstation Clusters",
+ Journal on Future Generations of Computer Systems, Volume 12, 1996
+ [Postscript]
+ [PDF]
+ [BibTeX Source for Citation]
+
+
+ Scott Fields,
+ "Hunting for Wasted Computing Power",
+ 1993 Research Sampler, University of Wisconsin-Madison.
+ [HTML]
+
+
+ Michael Litzkow, Miron Livny, and Matt Mutka,
+ "Condor - A Hunter of Idle Workstations",
+ Proceedings of the 8th International Conference of Distributed Computing Systems,
+ pages 104-111, June, 1988.
+ [PDF]
+ [BibTeX Source for Citation]
+
+
+ Michael Litzkow,
+ "Remote Unix - Turning Idle Workstations into Cycle Servers",
+ Proceedings of Usenix Summer Conference, pages 381-384, 1987.
+ [Postscript]
+ [PDF]
+ [BibTeX Source for Citation]
+
+
+
+
+
Matchmaking and ClassAds
+
+
+ Nicholas Coleman, "Distributed Policy Specification and Interpretation with
+ Classified Advertisements", Practical Aspects of Declarative Languages, Lecture
+ Notes in Computer Science Volume 7149, 2012, pp 198-211, January 2012.
+ [PDF]
+
+
+
+ Rajesh Raman, Miron Livny, and Marvin Solomon,
+ "Policy Driven Heterogeneous Resource Co-Allocation with Gangmatching",
+ Proceedings of the Twelfth IEEE International Symposium on
+ High-Performance Distributed Computing, June, 2003, Seattle, WA
+ [Postscript]
+ [PDF]
+
+
+ Nicholas Coleman, Rajesh Raman, Miron Livny and Marvin Solomon,
+ "Distributed Policy Management and Comprehension with Classified
+ Advertisements",
+ University of Wisconsin-Madison Computer Sciences Technical Report #1481,
+ April 2003.
+ [Postscript]
+ [PDF]
+ [BibTeX Source for Citation]
+
+
+ Nicholas Coleman, "An Implementation of Matchmaking Analysis in Condor",
+    Masters' Project report, University of Wisconsin, Madison, May 2001.
+ [Postscript]
+ [PDF]
+ [BibTeX Source for Citation]
+
+
+ Rajesh Raman, Miron Livny, and Marvin Solomon,
+ "Resource Management through Multilateral Matchmaking",
+ Proceedings of the Ninth IEEE Symposium on High Performance Distributed Computing (HPDC9),
+ Pittsburgh, Pennsylvania, August 2000, pp 290-291.
+ [Postscript]
+ [PDF]
+ [BibTeX Source for Citation]
+
+
+ Rajesh Raman, Miron Livny, and Marvin Solomon,
+ "Matchmaking: Distributed Resource Management for High Throughput Computing",
+ Proceedings of the Seventh IEEE International Symposium on High Performance Distributed Computing, July 28-31, 1998, Chicago, IL.
+ [Postscript]
+ [PDF]
+ [BibTeX Source for Citation]
+
+
+
+
+
Workflow and DAGMan
+
+
Peter Couvares, Tevfik Kosar, Alain Roy, Jeff Weber and Kent
+    Wenger, "Workflow in Condor", in Workflows for e-Science, Editors: I.Taylor, E.Deelman, D.Gannon,
+ M.Shields, Springer Press, January 2007 (ISBN: 1-84628-519-4)
+ [PDF]
+
+
+
+
+
Resource Management
+
+
+ Zhe Zhang, Brian Bockelman, Dale Carder, and Todd Tannenbaum,
+ "Lark: Bringing Network Awareness to High Throughput Computing",
+ Proceedings of the 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid 2015), Shenzhen, Guangdong, China, May 2015.
+ [PDF]
+
+
+ Jim Basney and Miron Livny,
+ "Managing Network Resources in Condor",
+ Proceedings of the Ninth IEEE Symposium on High Performance Distributed Computing (HPDC9),
+ Pittsburgh, Pennsylvania, August 2000, pp 298-299.
+ [Postscript]
+ [PDF]
+ [BibTeX Source for Citation]
+
+
+ Jim Basney and Miron Livny,
+ "Improving Goodput by Co-scheduling CPU and Network Capacity",
+ International Journal of High Performance Computing Applications,
+ Volume 13(3), Fall 1999.
+ [Postscript]
+ [PDF]
+ [BibTeX Source for Citation]
+
+
+ Miron Livny and Rajesh Raman,
+ "High Throughput Resource Management",
+ chapter 13 in The Grid: Blueprint for a New Computing Infrastructure,
+ Morgan Kaufmann, San Francisco, California, 1999.
+ [Postscript]
+ [PDF]
+ [BibTeX Source for Citation]
+
+ Morgan Kaufmann is pleased to present material from a preliminary draft of
+ High Performance Distributed Computing: Building a Computational Grid;
+ the material is Copyright 1997 Morgan Kaufmann Publishers. This
+ material may not be used or distributed for any commercial purpose without the
+ express written consent of Morgan Kaufmann Publishers. Please note that
+ this material is a draft of forthcoming publication, and as such neither
+ Morgan Kaufmann nor the author can be held liable for changes or
+    alterations in the final edition.
+
+
+ Matt Mutka and Miron Livny,
+ "The Available Capacity of a Privately Owned Workstation Environment",
+ Performance Evaluation, vol. 12, no. 4 pp. 269-284, July, 1991.
+ [BibTeX Source for Citation]
+
+
+ Matt Mutka and Miron Livny,
+ "Profiling Workstations' Available Capacity for Remote Execution",
+    Performance '87, 12th IFIP WG 7.3, pp. 529-544, December 1987.
+ [PDF]
+ [BibTeX Source for Citation]
+
+
+
+
+
Checkpointing
+
+
+ Joe Meehean and Miron Livny,
+ "A Service Migration Case Study: Migrating the Condor Schedd",
+ Midwest Instruction and Computing Symposium, April 2005.
+ [Postscript]
+ [PDF]
+ [BibTeX Source for Citation]
+
+
+ Jim Basney, Miron Livny, and Paolo Mazzanti,
+ "Utilizing Widely Distributed Computational Resources Efficiently with Execution Domains",
+ Computer Physics Communications, 2001.
+ (This is an extended version of the CHEP 2000 paper below.)
+ [Postscript]
+ [PDF]
+ [BibTeX Source for Citation]
+
+
+ Jim Basney, Miron Livny, and Paolo Mazzanti,
+ "Harnessing the Capacity of Computational Grids for High Energy Physics",
+ Proceedings of the International Conference on Computing in High Energy and
+ Nuclear Physics (CHEP 2000),
+ February 2000, Padova, Italy.
+ [Postscript]
+ [PDF]
+ [BibTeX Source for Citation]
+
+
+ Michael Litzkow, Todd Tannenbaum, Jim Basney, and Miron Livny,
+ "Checkpoint and Migration of UNIX Processes in the Condor Distributed Processing System",
+ University of Wisconsin-Madison Computer Sciences Technical Report #1346,
+ April 1997.
+ [Postscript]
+ [PDF]
+ [BibTeX Source for Citation]
+
+
+ Jim Pruyne and Miron Livny,
+ "Managing Checkpoints for Parallel Programs",
+ Workshop on Job Scheduling Strategies for Parallel Processing IPPS '96.
+ [Postscript]
+ [PDF]
+ [BibTeX Source for Citation]
+
+
+ Todd Tannenbaum and Michael Litzkow,
+ "Checkpointing and Migration of UNIX Processes in the Condor Distributed Processing System",
+ Dr Dobbs Journal, Feb 1995.
+ [HTML]
+ [Postscript]
+ [BibTeX Source for Citation]
+
+
+ Michael Litzkow and Marvin Solomon,
+ "Supporting Checkpointing and Process Migration Outside the UNIX Kernel",
+ Usenix Conference Proceedings,
+ San Francisco, CA, January 1992, pages 283-290.
+ [Postscript]
+ [PDF]
+ [BibTeX Source for Citation]
+
+
+
+
+
Data Intensive Computing
+
+
+
+ Parag Mhashilkar, Zachary Miller, Rajkumar Kettimuthu, Gabriele Garzoglio, Burt
+ Holzman, Cathrin Weiss, Xi Duan, and Lukasz Lacinski, "End-To-End Solution for
+ Integrated Workload and Data Management using GlideinWMS and Globus Online",
+ Journal of Physics: Conference Series, Volume 396, Issue 3, Year 2012
+ [PDF]
+
+
+
+ Ian T. Foster, Josh Boverhof, Ann Chervenak, Lisa Childers, Annette DeSchoen,
+ Gabriele Garzoglio, Dan Gunter, Burt Holzman, Gopi Kandaswamy, Raj Kettimuthu,
+ Jack Kordas, Miron Livny, Stuart Martin, Parag Mhashilkar, Zachary Miller,
+ Taghrid Samak, Mei-Hui Su, Steven Tuecke, Vanamala Venkataswamy, Craig Ward,
+ Cathrin Weiss,
+ "Reliable high-performance data transfer via Globus Online",
+ in Proc. SciDAC 2011, Denver, CO, July 10-14.
+ [PDF]
+
+
+
+ Ann Chervenak, Ewa Deelman, Miron Livny, Mei-Hui Su, Rob Schuler, Shishir Bharathi, Gaurang Mehta, Karan Vahi,
+ "Data Placement for Scientific Applications in Distributed Environments",
+ In Proceedings of the 8th IEEE/ACM International Conference on Grid
+ Computing (Grid 2007), Austin, TX, September 2007.
+ [PDF]
+ [BibTeX Source for Citation]
+
+
+
+
+ George Kola, Tevfik Kosar, Jaime Frey, Miron Livny, Robert J. Brunner and Michael Remijan,
+ "DISC: A System for Distributed Data Intensive Scientific Computing",
+ In Proceedings of the First Workshop on Real, Large Distributed Systems (WORLDS'04), San Francisco, CA, December 2004, in conjunction with OSDI'04
+ [PostScript]
+ [PDF]
+
+
+
+
+ George Kola, Tevfik Kosar and Miron Livny,
+ "Profiling Grid Data Transfer Protocols and Servers",
+ In Euro-Par 2004, Pisa, Italy, September 2004.
+ [PDF]
+ [BibTeX Source for Citation]
+
+ George Kola, Tevfik Kosar and Miron Livny,
+ "Run-time Adaptation of Grid Data-placement Jobs",
+ In Parallel and Distributed Computing Practices, 2004.
+ [PDF]
+ [BibTeX Source for Citation]
+
+
+
+ Tevfik Kosar and Miron Livny, "Stork: Making Data Placement a First Class Citizen in the Grid",
+ In Proceedings of 24th IEEE Int. Conference on Distributed Computing Systems (ICDCS2004), Tokyo, Japan, March 2004.
+ [PDF]
+
+
+
+ Tevfik Kosar, George Kola and Miron Livny, "A Framework for Self-optimising, Fault-tolerant, High Performance Bulk Data Transfers in a Heterogeneous Grid Environment",
+ Proceedings of 2nd Int. Symposium on Parallel and Distributed Computing (ISPDC2003), Ljubljana, Slovenia, October 2003.
+ [PDF]
+ [BibTeX Source for Citation]
+
+
+
+ George Kola, Tevfik Kosar and Miron Livny,
+ "Run-time Adaptation of Grid Data-placement Jobs",
+ Proceedings of Int. Workshop on Adaptive Grid Middleware (AGridM2003), New Orleans, LA, September 2003.
+ [PDF]
+ [BibTeX Source for Citation]
+
+
+
+ Tevfik Kosar, George Kola and Miron Livny,
+ "Building Data Pipelines for High Performance Bulk Data Transfers in a Heterogeneous Grid Environment",
+ Technical Report CS-TR-2003-1487, University of Wisconsin-Madison Computer Sciences, August 2003.
+ [PDF]
+
+
+
+
Grid Computing
+
+
+    C. Acosta-Silva, A. Delgado Peris, J. Flix, J. Frey, J.M. Hernández, A. Pérez-Calero Yzquierdo, and T. Tannenbaum,
+ "Exploitation of network-segregated CPU resources in CMS",
+ Proceedings of the 25th International Conference on Computing in High Energy and Nuclear Physics (CHEP 2021), May 2021.
+ [PDF]
+
+
+    Brian Bockelman, Miron Livny, Brian Lin, Francesco Prelz,
+ "Principles, technologies, and time: The translational journey of the HTCondor-CE",
+ Journal of Computational Science, 2020
+ [PDF]
+ [BibTeX Source for Citation]
+
+
+
+    B Bockelman, T Cartwright, J Frey, E M Fajardo, B Lin, M Selmeci, T Tannenbaum and M Zvada,
+ "Commissioning the HTCondor-CE for the Open Science Grid",
+ Journal of Physics: Conference Series, Vol. 664, 2015
+ [PDF]
+    [BibTeX Source for Citation]
+
+
+
+ I Sfiligoi, D C Bradley, Z Miller, B Holzman, F Würthwein, J M Dost, K Bloom,
+ and C Grandi, "glideinWMS experience with glexec",
+ Journal of Physics: Conference Series, Volume 396, Issue 3, Year 2012
+ [PDF]
+
+
+
+ W Andrews, B Bockelman, D Bradley, J Dost, D Evans, I Fisk, J Frey, B Holzman, M Livny, T Martin, A McCrea, A Melo, S Metson, H Pi, I Sfiligoi, P Sheldon, T Tannenbaum, A Tiradani, F Würthwein and D Weitzel,
+ "Early experience on using glideinWMS in the cloud",
+ Journal of Physics: Conference Series, Vol. 331, No. 6, 2011
+ [PDF]
+ [BibTeX Source for Citation]
+
+
+
+ Igor Sfiligoi, Greg Quinn, Chris Green, Greg Thain,
+ "Pilot Job Accounting and Auditing in Open Science Grid",
+ The 9th IEEE/ACM International Conference on Grid Computing,
+ Tsukuba, Japan, 2008
+ [PDF]
+
+
+
+ Alexandru Iosup, Dick H.J. Epema, Todd Tannenbaum, Matthew Farrellee, Miron Livny,
+ "Inter-Operating Grids through Delegated MatchMaking",
+ in proceedings of the International Conference for High Performance
+ Computing, Networking, Storage and Analysis (SC07),
+ Reno, Nevada, November 2007.
+ [PDF]
+ [BibTeX Source for Citation]
+
+
+
+ Sander Klous, Jamie Frey, Se-Chang Son, Douglas Thain, Alain Roy,
+ Miron Livny, and Jo van den Brand, "Transparent Access to Grid
+ Resources for User Software", in Concurrency and Computation:
+ Practice and Experience, Volume 18, Issue 7, pages 787-801, 2006.
+
+
+
+ Sechang Son, Matthew Farrellee, and Miron Livny,
+ "A Generic Proxy Mechanism for Secure Middlebox Traversal",
+ CLUSTER 2005,
+ Boston, MA, September 26-30, 2005.
+ [PDF]
+ [BibTeX Source for Citation]
+
+
+
+ Bruce Beckles, Sechang Son, and John Kewley,
+ "Current methods for negotiating firewalls for the Condor system",
+ Proceedings of the 4th UK e-Science All Hands Meeting 2005,
+ Nottingham, UK, September 19-22, 2005.
+ [PDF]
+ [BibTeX Source for Citation]
+
+
+
+ Sechang Son, Bill Allcock and Miron Livny,
+ "CODO: Firewall Traversal by Cooperative On-Demand Opening",
+ Proceedings of the 14th IEEE Symposium on High Performance Distributed Computing (HPDC14),
+ Research Triangle Park, NC, July 24-27, 2005.
+ [PDF]
+ [MS Word]
+ [BibTeX Source for Citation]
+
+
+
+ Clovis Chapman, Paul Wilson, Todd Tannenbaum, Matthew Farrellee, Miron Livny, John Brodholt, and Wolfgang Emmerich,
+ "Condor services for the global grid: Interoperability between Condor and OGSA",
+ Proceedings of the 2004 UK e-Science All Hands Meeting, ISBN 1-904425-21-6, pages 870-877, Nottingham, UK, August 2004.
+ [PDF]
+ [BibTeX Source for Citation]
+
+
+
+ Clovis Chapman, Charaka Goonatilake, Wolfgang Emmerich, Matthew Farrellee, Todd Tannenbaum, Miron Livny, Mark Calleja, and Martin Dove,
+ "Condor BirdBath: Web Service interfaces to Condor",
+ Proceedings of the 2005 UK e-Science All Hands Meeting, ISBN 1-904425-53-4, pages 737-744, Nottingham, UK, September 2005.
+ [PDF]
+ [BibTeX Source for Citation]
+
+
+
+ Sriya Santhanam, Pradheep Elango, Andrea Arpaci-Dusseau, and Miron Livny,
+ "Deploying Virtual Machines as Sandboxes for the Grid",
+ WORLDS 2005, San Francisco, CA, December 2004
+ [PDF]
+ [BibTeX Source for Citation]
+
+
+
+
+ George Kola, Tevfik Kosar and Miron Livny,
+ "Phoenix: Making Data-intensive Grid Applications Fault-tolerant",
+ In Grid 2004, Pittsburgh, PA, November 2004
+ [PostScript]
+ [PDF]
+
+
+
+
+ Andrew Baranovski, Gabriele Garzoglio, Igor Terekhov, Alain Roy and Todd Tannenbaum,
+ "Management of Grid Jobs and Data within SAMGrid",
+ Proceedings of the 2004 IEEE International Conference on Cluster Computing,
+ pages 353-360,
+ San Diego, CA, September 2004.
+ [PDF]
+ [BibTeX Source for Citation]
+
+
+
+ George Kola, Tevfik Kosar and Miron Livny,
+ "A Client-centric Grid Knowledgebase",
+ Proceedings of the 2004 IEEE International Conference on Cluster Computing,
+ pages 431-438,
+ San Diego, CA, September 2004.
+ [PostScript]
+ [PDF]
+ [BibTeX Source for Citation]
+
+
+
+ George Kola, Tevfik Kosar and Miron Livny,
+ "Profiling Grid Data Transfer Protocols and Servers",
+ In Euro-Par 2004, Pisa, Italy, September 2004.
+ [PDF]
+ [BibTeX Source for Citation]
+
+
+
+ John Bent, Douglas Thain, Andrea Arpaci-Dusseau, Remzi Arpaci-Dusseau, and Miron Livny,
+ "Explicit Control in a Batch Aware Distributed File System",
+ Proceedings of the First USENIX/ACM Conference on Networked Systems Design and Implementation,
+ San Francisco, CA, March 2004.
+ [Postscript]
+ [PDF]
+ [BibTeX Source for Citation]
+
+
+
+ Sechang Son and Miron Livny, "Recovering Internet Symmetry in Distributed Computing",
+ Proceedings of the 3rd International Symposium on Cluster Computing and the Grid, Tokyo, Japan, May 2003.
+ [PDF]
+ [MS Word]
+ [BibTeX Source for Citation]
+
+
+
+ Douglas Thain, John Bent, Andrea Arpaci-Dusseau, Remzi Arpaci-Dusseau and Miron Livny,
+ "Pipeline and Batch Sharing in Grid Workloads",
+ in Proceedings of the Twelfth IEEE Symposium on High Performance Distributed Computing,
+ Seattle, WA, 2003.
+ [Postscript]
+ [PDF]
+ [BibTeX Source for Citation]
+
+
+
+ Douglas Thain and Miron Livny,
+ "The Ethernet Approach to Grid Computing",
+ in Proceedings of the Twelfth IEEE Symposium on High Performance Distributed Computing,
+ Seattle, WA, 2003.
+ [Postscript]
+ [PDF]
+ [BibTeX Source for Citation]
+
+
+
+ John Bent, Venkateshwaran Venkataramani, Nick LeRoy,
+ Alain Roy, Joseph Stanley, Andrea Arpaci-Dusseau,
+ Remzi Arpaci-Dusseau, and Miron Livny,
+ "NeST - A Grid Enabled Storage Appliance",
+ in Jan Weglarz and Jarek Nabrzyski and Jennifer Schopf and
+    Maciej Stroinski, editors,
+ Grid Resource Management,
+ Kluwer Academic Publishers, 2003.
+ [Postscript]
+ [PDF]
+ [BibTeX Source for Citation]
+
+
+
+ Douglas Thain, Todd Tannenbaum, and Miron Livny,
+ "Condor and the Grid",
+ in Fran Berman, Anthony J.G. Hey, Geoffrey Fox, editors,
+ Grid Computing: Making The Global Infrastructure a Reality,
+ John Wiley, 2003.
+ ISBN: 0-470-85319-0
+ [PDF]
+ [BibTeX Source for Citation]
+
+
+
+ Francesco Giacomini,
+ Francesco Prelz,
+ Massimo Sgaravatto,
+ Igor Terekhov,
+ Gabriele Garzoglio,
+ and Todd Tannenbaum,
+ "Planning on the Grid: A Status Report [DRAFT]",
+ Technical Report PPDG-20,
+ Particle Physics Data Grid collaboration (http://www.ppdg.net),
+ October 2002.
+ [Postscript]
+ [PDF]
+ [BibTeX Source for Citation]
+
+
+
+ John Bent, Venkateshwaran Venkataramani,
+ Nick LeRoy,
+ Alain Roy,
+ Joseph Stanley,
+ Andrea Arpaci-Dusseau,
+ Remzi H. Arpaci-Dusseau,
+ and Miron Livny,
+ "Flexibility, Manageability, and Performance in a Grid Storage Appliance",
+ Proceedings of the Eleventh IEEE Symposium on High Performance Distributed Computing,
+ Edinburgh, Scotland, July 2002.
+ [Abstract]
+ [Postscript]
+ [PDF]
+ [BibTeX Source for Citation]
+
+
+
+ Douglas Thain and Miron Livny,
+ "Error Scope on a Computational Grid: Theory and Practice",
+ Proceedings of the Eleventh IEEE Symposium on High Performance Distributed Computing (HPDC11),
+ Edinburgh, Scotland, July 2002.
+ [Postscript]
+ [PDF]
+ [BibTeX Source for Citation]
+ (This paper also describes aspects of Condor's Java Universe)
+
+
+
+ Douglas Thain, John Bent, Andrea Arpaci-Dusseau, Remzi Arpaci-Dusseau, and Miron Livny,
+ "Gathering at the Well: Creating Communities for Grid I/O",
+ in Proceedings of Supercomputing 2001,
+ Denver, Colorado, November 2001.
+ [Postscript]
+ [PDF]
+ [BibTeX Source for Citation]
+
+
+
+ James Frey, Todd Tannenbaum, Ian Foster, Miron Livny, and Steven Tuecke,
+ "Condor-G: A Computation Management Agent for Multi-Institutional Grids",
+ Journal of Cluster Computing,
+ volume 5, pages 237-246, 2002.
+ [BibTeX Source for Citation]
+
+
+
+ James Frey, Todd Tannenbaum, Ian Foster, Miron Livny, and Steven Tuecke,
+ "Condor-G: A Computation Management Agent for Multi-Institutional Grids",
+ Proceedings of the Tenth IEEE Symposium on High Performance Distributed Computing (HPDC10),
+ San Francisco, California, August 7-9, 2001.
+ [Postscript]
+ [PDF]
+ [MS Word]
+ [BibTeX Source for Citation]
+
+
+
+ Douglas Thain, Jim Basney, Se-Chang Son, and Miron Livny,
+ "The Kangaroo Approach to Data Movement on the Grid",
+ in Proceedings of the Tenth IEEE Symposium on High Performance Distributed Computing (HPDC10),
+ San Francisco, California, August 7-9, 2001.
+ [Postscript]
+ [PDF]
+ [BibTeX Source for Citation]
+
+
+
+ John Bent, "Building Storage Appliances for the Grid and Beyond",
+ Master's Project report, University of Wisconsin, Madison, May 2001.
+ [Postscript]
+ [PDF]
+ [BibTeX Source for Citation]
+
+
+
+
+
Master-Worker Computing (MW, PVM, MPI, CARMI)
+
+
+
Elisa Heymann, Miquel A. Senar, Emilio Luque, and Miron Livny,
+ "Adaptive Scheduling for Master-Worker Applications on the Computational Grid",
+ in Proceedings of the First IEEE/ACM International Workshop on Grid Computing (GRID 2000), Bangalore, India, December 17, 2000.
+ [Postscript]
+ [PDF]
+ [MS Word]
+ [BibTeX Source for Citation]
+
+
+
Elisa Heymann, Miquel A. Senar, Emilio Luque, and Miron Livny,
+ "Evaluation of an Adaptive Scheduling Strategy for Master-Worker Applications on Clusters of Workstations",
+ in Proceedings of the 7th International Conference on High Performance Computing (HiPC 2000), Bangalore, India, December 17, 2000.
+ [Postscript]
+ [PDF]
+ [MS Word]
+ [BibTeX Source for Citation]
+
+
+
Jeff Linderoth, Sanjeev Kulkarni, Jean-Pierre Goux, and Michael Yoder,
+ "An Enabling Framework for Master-Worker Applications on the Computational Grid",
+ Proceedings of the Ninth IEEE Symposium on High Performance Distributed Computing (HPDC9),
+ Pittsburgh, Pennsylvania, August 2000, pp 43-50.
+ [Postscript]
+ [PDF]
+ [BibTeX Source for Citation]
+
+
+
+ Jeff Linderoth, Jean-Pierre Goux, and Michael Yoder,
+ "Metacomputing and the Master-Worker Paradigm",
+ Preprint ANL/MCS-P792-0200,
+ Mathematics and Computer Science Division, Argonne National Laboratory, February 2000.
+ [Postscript]
+ [PDF]
+ [BibTeX Source for Citation]
+
+
+
+ Jim Pruyne and Miron Livny,
+ "Providing Resource Management Services to Parallel Applications",
+ Proceedings of the Second Workshop on Environments and Tools for Parallel Scientific Computing, May, 1994.
+ [Postscript]
+ [BibTeX Source for Citation]
+
+
+
+
+
Java
+
+
+ Douglas Thain and Miron Livny,
+ "Error Scope on a Computational Grid: Theory and Practice",
+ Proceedings of the Eleventh IEEE Symposium on High Performance Distributed Computing (HPDC11),
+ Edinburgh, Scotland, July 2002.
+ [Postscript]
+ [PDF]
+ [BibTeX Source for Citation]
+ (This paper describes aspects of error handling in Condor's Java Universe)
+
+
+
+ Al Globus, Eric Langhirt, Miron Livny, Ravishankar Ramamurthy, Marvin Solomon, and Steve Traugott,
+ "JavaGenes and Condor: cycle-scavenging genetic algorithms",
+ Proceedings of the ACM Conference on JavaGrande,
+ San Francisco, California, 2000.
+ [PDF]
+ [BibTeX Source for Citation]
+ (This paper describes checkpointing Java applications for opportunistic computing.)
+
+
+
+
+
Remote Execution and Interposition Agents
+
+
+
Douglas Thain and Miron Livny,
+ "Parrot: Transparent User-Level Middleware for Data-Intensive
+ Computing",
+ Scalable Computing: Practice and Experience,
+ Volume 6, Number 3, Pages 9-18, 2005.
+ [PDF]
+
+
+
+
+
+ Douglas Thain and Miron Livny,
+ "Parrot: Transparent User-Level Middleware for Data-Intensive Computing",
+ Workshop on Adaptive Grid Middleware,
+ New Orleans, Louisiana,
+ September 2003.
+ [Postscript]
+ [PDF]
+ [BibTeX Source for Citation]
+
+
+
+ Douglas Thain and Miron Livny,
+ "Error Management in the Pluggable File System",
+ Technical Report 1448,
+ Computer Sciences Department, University of Wisconsin, October 2002.
+ [Postscript]
+ [PDF]
+ [BibTeX Source for Citation]
+
+
+
+ Douglas Thain and Miron Livny,
+ "Multiple Bypass: Interposition Agents for Distributed Computing",
+ The Journal of Cluster Computing,
+ Volume 4, 2001, pp 39-47.
+ [Postscript]
+ [PDF]
+ [BibTeX Source for Citation]
+
+
+
+ Douglas Thain and Miron Livny,
+ "Bypass: A Tool for Building Split Execution Systems",
+ Proceedings of the Ninth IEEE Symposium on High Performance Distributed Computing (HPDC9),
+ Pittsburgh, Pennsylvania, August 2000, pp 79-86.
+ [Postscript]
+ [PDF]
+ [BibTeX Source for Citation]
+
+
+
+ Victor C. Zandy, Barton P. Miller, and Miron Livny,
+ "Process Hijacking",
+ The Eighth IEEE International Symposium on High Performance Distributed Computing (HPDC8),
+ Redondo Beach, California, August 1999, pp. 177-184.
+ [Postscript]
+ [PDF]
+ [BibTeX Source for Citation]
+
+
+
+ Miron Livny and Michael Litzkow,
+ "Making Workstations a Friendly Environment for Batch Jobs",
+ Third IEEE Workshop on Workstation Operating Systems,
+ April 1992, Key Biscayne, Florida.
+ [Postscript]
+ [PDF]
+ [BibTeX Source for Citation]
+
+
+
+
+
Security
+
+
+
+ Zach Miller, Dan Bradley, Todd Tannenbaum, Igor Sfiligoi,
+ "Flexible Session Management in a Distributed Environment",
+ Journal of Physics: Conference Series, Volume 219, Issue 4, Year 2010.
+ [PDF]
+ [BibTeX Source for Citation]
+
+
+
+ Gabriele Garzoglio, Ian Alderman, Mine Altunay, Rachana Ananthakrishnan, Joe
+ Bester, Keith Chadwick, Vincenzo Ciaschini, Yuri Demchenko, Andrea Ferraro,
+ Alberto Forti, David L. Groep, Ted Hesselroth, John Hover, Oscar Koeroo, Chad
+ La Joie, Tanya Levshina, Zach Miller, Jay Packard, Håkon Sagehaug, Valery
+ Sergeev, Igor Sfiligoi, Neha Sharma, Frank Siebenlist, Valerio Venturi, John
+ Weigand,
+ "Definition and Implementation of a SAML-XACML Profile for Authorization
+ Interoperability Across Grid Middleware in OSG and EGEE",
+ Journal of Grid Computing, Volume 7, Issue 3, Year 2009.
+ [PDF]
+
+
+
+ Hao Wang, Somesh Jha, Miron Livny, and Patrick D. McDaniel,
+ "Security Policy Reconciliation in Distributed Computing Environments",
+ IEEE Fifth International Workshop on Policies for Distributed
+ Systems and Networks (POLICY 2004),
+ June 2004, Yorktown Heights, New York.
+ [PDF]
+ [BibTeX Source for Citation]
+
+
+
+
+
Scalability and Performance
+
+
+
+ E M Fajardo, J M Dost, B Holzman, T Tannenbaum, J Letts, A Tiradani, B Bockelman, J Frey and D Mason,
+ "How much higher can HTCondor fly?",
+ Journal of Physics: Conference Series, Vol. 664, 2015
+ [PDF]
+ [BibTeX Source for Citation]
+
+
+
+ Dan Bradley, Timothy St Clair, Matthew Farrellee, Ziliang Guo, Miron Livny, Igor Sfiligoi,
+ and Todd Tannenbaum,
+ "An update on the scalability limits of the Condor batch system",
+ Journal of Physics: Conference Series, Vol. 331, No. 6, 2011.
+ [PDF]
+ [BibTeX Source for Citation]
+
+
+
+ D Bradley, I Sfiligoi, S Padhi, J Frey and T Tannenbaum,
+ "Scalability and interoperability within glideinWMS",
+ Journal of Physics: Conference Series, Vol. 219, No. 6, 2010
+ [PDF]
+ [BibTeX Source for Citation]
+
+
+
+ D Bradley, S Dasu, M Livny, A Mohapatra, T Tannenbaum and G Thain,
+ "Condor enhancements for a rapid-response adaptive computing environment for LHC",
+ Journal of Physics: Conference Series Vol. 219, No. 6, 2010.
+ [PDF]
+ [BibTeX Source for Citation]
+
+
+
+
+
Experience
+
+
+ Michael Litzkow and Miron Livny,
+ "Experience With The Condor Distributed Batch System",
+ IEEE Workshop on Experimental Distributed Systems, Oct 1990, Huntsville, AL.
+ [Postscript]
+ [PDF]
+ [BibTeX Source for Citation]
+
+
+
+
+
Scientific Applications
+
+
+
+ Douglas Thain, John Bent, Andrea Arpaci-Dusseau, Remzi Arpaci-Dusseau and Miron Livny,
+ "Pipeline and Batch Sharing in Grid Workloads",
+ in Proceedings of the Twelfth IEEE Symposium on High Performance Distributed Computing,
+ Seattle, WA, 2003.
+ [Postscript]
+ [PDF]
+ [BibTeX Source for Citation]
+
+
+ Jim Basney, Rajesh Raman, and Miron Livny,
+ "High Throughput Monte Carlo",
+ Proceedings of the Ninth SIAM Conference on Parallel Processing for Scientific Computing,
+ March 22-24, 1999, San Antonio, Texas.
+ [Postscript]
+ [PDF]
+ [BibTeX Source for Citation]
+
+
+ Chungmin Chen, Kenneth Salem, and Miron Livny,
+ "The DBC: Processing Scientific Data Over the Internet",
+ 16th International Conference on Distributed Computing Systems,
+ May 1996.
+ [Postscript]
+ [PDF]
+ [BibTeX Source for Citation]
+
+
+
+
+
Scheduling
+
+
+
+
+ Mark Silberstein, Dan Geiger, Assaf Schuster, and Miron Livny,
+ "Scheduling Mixed Workloads in Multi-grids: The Grid Execution Hierarchy",
+ Proceedings of the 15th IEEE Symposium on High Performance Distributed Computing (HPDC),
+ Paris, France, June 2006.
+ [PDF]
+ [BibTeX Source for Citation]
+
+ P. E. Krueger and Miron Livny,
+ "A Comparison of Preemptive and Non-Preemptive Load Distributing",
+ Proc. of the 8th International Conference on Distributed Computing Systems,
+ pp. 123-130, June 1988.
+ [PDF]
+ [BibTeX Source for Citation]
+
+
+
+ Matt Mutka and Miron Livny,
+ "Scheduling Remote Processing Capacity In A Workstation-Processing Bank Computing System",
+ Proceedings of the 7th International Conference of Distributed Computing Systems,
+ pp. 2-9, September, 1987.
+ [PDF]
+ [BibTeX Source for Citation]
+
+
+
Alain Roy and Miron Livny, "Condor and Preemptive Resume
+ Scheduling", in Grid Resource Management: State of the Art and
+ Future Trends, pages 135-144, Fall 2003, edited by Jarek
+ Nabrzyski, Jennifer M. Schopf, and Jan Weglarz,
+ Kluwer Academic Publishers.
+ [PDF]
+
+
+
+
+
+
NMI Build & Test Laboratory
+
+
+
Andrew Pavlo, Peter Couvares, Rebekah Gietzel, Anatoly Karp, Ian D. Alderman, Miron Livny, and Charles Bacon,
+ "The NMI Build & Test Laboratory: Continuous Integration Framework for Distributed Computing Software",
+ Proceedings of LISA '06: Twentieth Systems Administration Conference,
+ Washington, DC, December 2006, pp. 263 - 273.
+ [Postscript]
+ [PDF]
+ [BibTeX Source for Citation]
+ [PDF of presentation slides]
+
+
A. Iosup, D.H.J. Epema, P. Couvares, A. Karp, and M. Livny,
+ "Build-and-Test Workloads for Grid Middleware: Problem, Analysis, and Applications",
+ Seventh IEEE International Symposium on Cluster Computing and the Grid (CCGRID),
+ IEEE Computer Society, Pages 205-213, May 2007.
+ [PDF]
+ [BibTeX Source for Citation]
+ [PDF of presentation slides]
+
+
+
+
+
Background Work
+
+
+
+ Miron Livny and Myron Melman,
+ "Load Balancing in Homogeneous Broadcast Distributed Systems",
+ Proceedings of Computer Network Performance Symposium,
+ April 13-14, 1982,
+ College Park, Maryland.
+ [PDF]
+ [BibTeX Source for Citation]
+
+
+
+
+
Miscellaneous
+
+
Douglas Thain, Todd Tannenbaum, and Miron Livny, "How to Measure a
+ Large Open Source Distributed System", in Concurrency and
+ Computation: Practice and Experience, to appear in 2006.
+
+
Zach Miller, Todd Tannenbaum, and Ben Liblit, "Enforcing Murphy's
+ Law for Advance Identification of Run-time Failures", Proceedings of
+ USENIX 2012.
+ [PDF]
+
+
+
+
+
PhD Dissertations from HTCondor team members at UW-Madison
+ Plant physiologists used high throughput computing to remedy research “bottleneck”
+
+
HTC resources increased the efficiency of the Spalding group’s data analyses, which enabled an increase in the scope of their research.
+
+
Adopting high throughput computing in 2006 was a pivotal moment for University of Wisconsin–Madison molecular plant physiologist Edgar Spalding and his
+research group. Over the past five years, the research group has used more than 200,000 computing hours, including to facilitate “the development of the measurement algorithm and the automatic processing of tens-of-thousands of images” of maize seedling root growth, Spalding says.
+
+
+
+
Spalding’s research group was studying Arabidopsis plant populations with genetically diverse members and tracking their response to light or gravity due to a mutation — one seedling at a time. Since Arabidopsis seedlings are only a few millimeters tall, Spalding says his research group found that obtaining high-resolution digital images was the best approach to measure the direction of their growth. A computer collected images every few minutes as the seedlings grew. “If we could characterize this whole genetically diverse population, we could use the powerful techniques of statistical genetics to track down the genes affecting the process. That meant we now had thousands and thousands of images to measure,” Spalding explains.
+
+
The thousands of digital images to measure created a bottleneck in Spalding’s research. That was before he led an effort with the Center for High Throughput Computing (CHTC) Director Miron Livny, other plant biologists, and computer scientists to develop a proposal for a competitive National Science Foundation (NSF) grant that would produce cyberinfrastructure to support plant biology research. Though the application wasn’t successful, the connections Spalding made from that meeting were meaningful nonetheless.
+
+
Speaking with Livny at the meeting — from whom he learned about the capabilities of the HTC approach that was pioneered on our campus — helped Spalding realize the inefficiencies of his group in analyzing thousands of seedlings. “[O]ur research up until that point had been focused on one seedling at a time. Faced with large numbers of seedlings to do a broader scale of investigation meant that we had to find computing methodologies that matched our new data type, which was tens of thousands of images instead of a couple of dozen. That drove our need for a different way of computing,” Spalding describes.
+
+
When asked about which accomplishment using HTC was most impactful, Spalding said “The way we measure yield-related features from maize ears and several thousand kernels has had a large impact.” Others from around the world began asking for their help with making similar measurements. “In many cases, we can use our workflow [algorithms] running on CHTC to process their images of maize ears and kernels and return data that helps them answer their scientific or crop breeding questions,” Spalding says.
+
+
Since the goals of the experiments determine the type of data the researchers collect, they did not need to adjust the type of data they collected. Rather, adopting the HTC approach changed the way they created tools to analyze the data. Today, Spalding says his research group continues to use HTC in three ways: “from tool development to extracting the features from the images with the tool that you developed to applying it in the challenge of statistically matching it to elements of the results to elements of the genome.” As his team became more experienced in writing new algorithms to make measurements, they realized that HTC was useful in developing new methodologies; it was more than just more automation and increased computing capacity.
+
+
In other words, HTC is useful as both a development resource and a production resource. Making measurements on seedlings and then matching processes to the genome elements that control those processes involved an ever-growing amount of computing capacity. “We realized that statistical modeling of the measurements from the biology to the genetic information in the population also benefited from high throughput computing.” HTC in all these cases, Spalding elaborates, “was beneficial and changed the way we work. It changed the nature of the questions we asked.” In addition to these uses of HTC, machine learning (ML) continues to become a bigger part of the research group’s tool development stage and of the methods used to train a model to recognize a feature in a seedling.
+
+
Spalding has also shared his HTC experience with the attendees of the annual OSG School. Spalding emphasizes that students “should not hold back on doing something because they think computing will be a bottleneck. There are ways to bring the computing they need to their problem and they should not shy away from a question just because they think it might be difficult to compute. There are people like the CHTC staff that can remove that bottleneck if the person’s willing to learn about it.”
+
+
“Engaged and motivated collaborators like Spalding and his group is what guides CHTC in advancing the state of the art of HTC and drives our commitment to bring these advances to researchers on the UW-Madison campus and around the world,” says Livny.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
diff --git a/preview-calendar/staff-list/README.md b/preview-calendar/staff-list/README.md
new file mode 100644
index 000000000..5fb553667
--- /dev/null
+++ b/preview-calendar/staff-list/README.md
@@ -0,0 +1,93 @@
+
+# Staff List Submodule
+
+Welcome to the `staff-list` submodule. This submodule is designed to manage and display information about the staff members in a structured and consistent manner. It includes details such as names, roles, images, and affiliations. To ensure uniformity and ease of management, please adhere to the guidelines provided below.
+
+## File Naming Conventions
+
+### YML Files
+
+Each staff member should have a corresponding `.yml` file named according to the following convention:
+
+```
+firstName_lastName.yml
+```
+
+This file contains structured data about the staff member, such as their name, image path, title, and more.
+
+### Image Files
+
+Staff member images should be stored in the `images/` directory and named following this convention:
+
+```
+images/firstName_lastName.jpg
+```
+
+or
+
+```
+images/firstName_lastName.png
+```
+
+Please ensure that the image file extension matches the one referenced in the staff member's `.yml` file.
+
+## YML File Format
+
+Each `.yml` file should adhere to the following structure:
+
+```yaml
+name: "John Doe"
+image: "images/john_doe.jpg"
+title: "Lead Software Engineer"
+website: "https://johndoe.com"
+institution: "Morgridge Institute for Research"
+promoted: true
+weight: 3
+description: "John Doe is a brilliant software engineer."
+status: Staff
+organizations:
+ - path
+ - chtc
+ - osg
+ - pelican
+```
+
+### Fields Explanation
+
+- `name`: Full name of the staff member.
+- `image`: Relative path to the staff member's image within the submodule.
+- `title`: The staff member's role or title within the organization.
+- `website`: (Optional) A URL to the staff member's professional or personal webpage.
+- `institution`: The name of the institution to which the staff member belongs.
+- `promoted`: (Optional) A boolean value indicating if the staff member is part of the executive team. Only use if true.
+- `weight`: (Optional) Used to order executive staff members if `promoted` is set to `true`.
+- `description`: (Optional) A brief description or bio of the staff member.
+- `status`: Indicates the current status of the staff member within the organization (e.g., Leadership, Staff, Student, Past).
+- `organizations`: Lists the organizations the staff member is associated with. If the correct values are not provided, the staff member will not be displayed on the respective organization's website.
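+
+As a minimal sketch, assuming only the fields not marked optional above are needed, a hypothetical `jane_roe.yml` (the person, title, and file name are placeholders) could look like this:
+
+```yaml
+name: "Jane Roe"
+image: "images/jane_roe.jpg"
+title: "Research Computing Facilitator"
+institution: "University of Wisconsin–Madison"
+status: Staff
+organizations:
+ - chtc
+```
+
+`promoted` is left out rather than set to `false`, following the "only use if true" note above.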
+
+## Additional Organization-Specific Information
+
+For staff members associated with specific organizations (e.g., `osg`, `chtc`, `pelican`), organization-specific values such as an alternative title can be provided under a top-level key named after that organization (`osg`, `chtc`, `pelican`, or `path`).
+See the example below:
+
+```yaml
+name: "John Doe"
+image: "images/john_doe.jpg"
+title: "Lead Software Engineer"
+osg:
+ title: "Software Engineer"
+status: Staff
+organizations:
+ - path
+ - chtc
+ - osg
+```
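+
+Existing entries in this directory (for example `brian_bockelman.yml` and `cannon_lock.yml`) also place keys such as `website`, `promoted`, and `weight` inside an organization block. The sketch below assumes those keys apply to that organization's site only; this behavior is inferred from the existing files rather than documented here, so check it against the site templates before relying on it:
+
+```yaml
+name: "John Doe"
+image: "images/john_doe.jpg"
+title: "Lead Software Engineer"
+institution: "Morgridge Institute for Research"
+status: Staff
+weight: 5
+pelican:
+ title: "Web Developer"
+ weight: 6
+organizations:
+ - chtc
+ - pelican
+```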
+
+## Contribution Guidelines
+
+- Ensure all information is accurate and up-to-date.
+- Images should be clear and professional, preferably in a uniform size or aspect ratio.
+- Follow the file naming conventions strictly to avoid any inconsistencies.
+- For any updates or changes, please submit a pull request for review.
+
+Thank you for contributing to the `staff-list` submodule and helping maintain a consistent and professional presentation of our staff members.
\ No newline at end of file
diff --git a/preview-calendar/staff-list/aaron_moate.yml b/preview-calendar/staff-list/aaron_moate.yml
new file mode 100644
index 000000000..b770a002d
--- /dev/null
+++ b/preview-calendar/staff-list/aaron_moate.yml
@@ -0,0 +1,14 @@
+name: Aaron Moate
+date: 2020-09-28T19:31:00-05:00
+draft: false
+image: "images/aaron_moate.png"
+title: "Systems Administrator"
+status: "Staff"
+institution: "University of Wisconsin–Madison"
+weight: 5
+chtc:
+ title: Lead Systems Administrator
+organizations:
+ - path
+ - chtc
+ - osg
\ No newline at end of file
diff --git a/preview-calendar/staff-list/aaryan_patel.yml b/preview-calendar/staff-list/aaryan_patel.yml
new file mode 100644
index 000000000..acf944243
--- /dev/null
+++ b/preview-calendar/staff-list/aaryan_patel.yml
@@ -0,0 +1,7 @@
+name: Aaryan Patel
+title: Research Computing Facilitation Assistant
+institution: Morgridge Institute for Research
+status: Staff
+organizations:
+ - chtc
+image: images/aaryan_patel.jpeg
diff --git a/preview-calendar/staff-list/abhinandan_saha.yml b/preview-calendar/staff-list/abhinandan_saha.yml
new file mode 100644
index 000000000..e629bd170
--- /dev/null
+++ b/preview-calendar/staff-list/abhinandan_saha.yml
@@ -0,0 +1,7 @@
+image: images/abhinandan_saha.jpg
+institution: University of Wisconsin-Madison
+title: Systems Administration Intern
+name: Abhinandan Saha
+status: Staff
+organizations:
+ - chtc
\ No newline at end of file
diff --git a/preview-calendar/staff-list/adrian_crenshaw.yml b/preview-calendar/staff-list/adrian_crenshaw.yml
new file mode 100644
index 000000000..4a0ab40df
--- /dev/null
+++ b/preview-calendar/staff-list/adrian_crenshaw.yml
@@ -0,0 +1,8 @@
+name: "Adrian Crenshaw"
+image: "images/adrian_crenshaw.jpeg"
+title: "Security Analyst"
+institution: "Indiana University"
+website: https://cacr.iu.edu/about/people/Adrian-Crenshaw.html
+organizations:
+ - path
+ - osg
\ No newline at end of file
diff --git a/preview-calendar/staff-list/alja_tadel.yml b/preview-calendar/staff-list/alja_tadel.yml
new file mode 100644
index 000000000..83ee25fd6
--- /dev/null
+++ b/preview-calendar/staff-list/alja_tadel.yml
@@ -0,0 +1,10 @@
+image: images/alja_tadel.jpg
+institution: University of California San Diego
+title: Analytic Programmer
+name: Alja Mrak Tadel
+status: Staff
+website: null
+pelican:
+ weight: 9
+organizations:
+ - pelican
\ No newline at end of file
diff --git a/preview-calendar/staff-list/alperen_bakirci.yml b/preview-calendar/staff-list/alperen_bakirci.yml
new file mode 100644
index 000000000..6b03715e4
--- /dev/null
+++ b/preview-calendar/staff-list/alperen_bakirci.yml
@@ -0,0 +1,13 @@
+image: images/alperen_bakirci.jpg
+institution: Morgridge Institute for Research
+title: Student Web Developer
+name: Alperen Bakirci
+status: Past
+website: null
+pelican:
+ weight: 18
+organizations:
+ - path
+ - chtc
+ - osg
+ - pelican
\ No newline at end of file
diff --git a/preview-calendar/staff-list/amber_lim.yml b/preview-calendar/staff-list/amber_lim.yml
new file mode 100644
index 000000000..281834f9d
--- /dev/null
+++ b/preview-calendar/staff-list/amber_lim.yml
@@ -0,0 +1,9 @@
+name: Amber Lim
+title: Research Computing Facilitator
+institution: "University of Wisconsin–Madison"
+status: Staff
+organizations:
+ - chtc
+ - osg
+ - path
+image: images/amber_lim.jpg
diff --git a/preview-calendar/staff-list/andrew_owen.yml b/preview-calendar/staff-list/andrew_owen.yml
new file mode 100644
index 000000000..96622aeb7
--- /dev/null
+++ b/preview-calendar/staff-list/andrew_owen.yml
@@ -0,0 +1,12 @@
+image: images/andrew_owen.jpg
+institution: University of Wisconsin-Madison
+title: Research Computing Facilitator
+is_facilitator: 1
+name: Andrew Owen
+status: Staff
+website: null
+organizations:
+ - path
+ - chtc
+ - osg
+ - pelican
diff --git a/preview-calendar/staff-list/ashton_graves.yml b/preview-calendar/staff-list/ashton_graves.yml
new file mode 100644
index 000000000..030d34286
--- /dev/null
+++ b/preview-calendar/staff-list/ashton_graves.yml
@@ -0,0 +1,9 @@
+image: images/ashton_graves.jpeg
+institution: University of Nebraska-Lincoln
+title: DevOps Engineer
+name: Ashton Graves
+status: Staff
+website: null
+organizations:
+ - path
+ - osg
\ No newline at end of file
diff --git a/preview-calendar/staff-list/ben_staehle.yml b/preview-calendar/staff-list/ben_staehle.yml
new file mode 100644
index 000000000..693c764ca
--- /dev/null
+++ b/preview-calendar/staff-list/ben_staehle.yml
@@ -0,0 +1,20 @@
+name: Ben Staehle
+title: Fellow
+institution: Morgridge Institute for Research
+status: Student
+organizations:
+ - chtc
+image: images/ben_staehle.jpg
+
+fellowship:
+ name: Tracking server inventory and elevation
+ description: |
+ The CHTC maintains over 1,000 servers on the UW–Madison campus and
+ across the country. Keeping track of server elevation (datacenter
+ and rack location), serial numbers, asset tags is a challenge that
+ is always in need of improvement. This project will focus on taking
+ existing data from the CHTC hardware monitoring system and automatically
+ exporting it to other systems such as Google spreadsheets or ITAdvisor.
+ After a successful summer, the student fellow will gain skills in
+ Python and monitoring and Google Docs APIs.
+ mentor: Joe Bartkowiak
diff --git a/preview-calendar/staff-list/bocheng_zou.yaml b/preview-calendar/staff-list/bocheng_zou.yaml
new file mode 100644
index 000000000..130df536e
--- /dev/null
+++ b/preview-calendar/staff-list/bocheng_zou.yaml
@@ -0,0 +1,8 @@
+name: Bocheng Zou
+image: "images/bocheng_zou.png"
+title: "System Administrator Intern"
+status: "Student"
+institution: "University of Wisconsin–Madison"
+weight: 5
+organizations:
+ - chtc
diff --git a/preview-calendar/staff-list/brian_aydemir.yml b/preview-calendar/staff-list/brian_aydemir.yml
new file mode 100644
index 000000000..76f1b3c1e
--- /dev/null
+++ b/preview-calendar/staff-list/brian_aydemir.yml
@@ -0,0 +1,8 @@
+image: images/brian_aydemir.jpeg
+institution: University of Wisconsin-Madison
+title: Systems Integration Developer
+name: Brian Aydemir
+status: Staff
+website: null
+organizations:
+ - chtc
diff --git a/preview-calendar/staff-list/brian_bockelman.yml b/preview-calendar/staff-list/brian_bockelman.yml
new file mode 100644
index 000000000..3d3388a6d
--- /dev/null
+++ b/preview-calendar/staff-list/brian_bockelman.yml
@@ -0,0 +1,23 @@
+name: "Brian Bockelman"
+date: 2018-11-19T10:47:58+10:00
+draft: false
+image: "images/brian_bockelman.jpg"
+title: "FoCaS co-lead"
+institution: "Morgridge Institute for Research"
+promoted: true
+weight: 4
+description: Bockelman is an Investigator at the Morgridge Institute for Research and co-lead of the FoCaS area.
+status: Leadership
+osg:
+ title: OSG Technology Lead
+ website: "https://opensciencegrid.org"
+ promoted: true
+ weight: 3
+pelican:
+ title: Principal Investigator
+ weight: 1
+organizations:
+ - path
+ - chtc
+ - osg
+ - pelican
diff --git a/preview-calendar/staff-list/brian_lin.yml b/preview-calendar/staff-list/brian_lin.yml
new file mode 100644
index 000000000..7035f18d8
--- /dev/null
+++ b/preview-calendar/staff-list/brian_lin.yml
@@ -0,0 +1,22 @@
+name: "Brian Lin"
+date: 2018-11-19T10:47:58+10:00
+draft: false
+image: "images/brian_lin.jpg"
+title: "Infrastructure Services Lead"
+institution: "University of Wisconsin–Madison"
+#website: ""
+linkedinurl: ""
+weight: 5
+status: Staff
+chtc:
+ title: OSG Software Area Coordinator
+osg:
+ title: Software Area Coordinator
+pelican:
+ title: OSG Software Area Coordinator
+ weight: 13
+organizations:
+ - path
+ - chtc
+ - osg
+ - pelican
\ No newline at end of file
diff --git a/preview-calendar/staff-list/bryna_goeking.yml b/preview-calendar/staff-list/bryna_goeking.yml
new file mode 100644
index 000000000..8509a8ca1
--- /dev/null
+++ b/preview-calendar/staff-list/bryna_goeking.yml
@@ -0,0 +1,10 @@
+name: "Bryna Goeking"
+image: "images/bryna_goeking.jpg"
+title: "Student Writer"
+institution: "Morgridge Institute for Research"
+weight: 5
+status: Past
+organizations:
+ - path
+ - chtc
+ - osg
\ No newline at end of file
diff --git a/preview-calendar/staff-list/cameron_abplanalp.yml b/preview-calendar/staff-list/cameron_abplanalp.yml
new file mode 100644
index 000000000..28f2d2d87
--- /dev/null
+++ b/preview-calendar/staff-list/cameron_abplanalp.yml
@@ -0,0 +1,8 @@
+image: images/cameron_abplanalp.png
+institution: University of Wisconsin-Madison
+title: Research Computing Facilitation Assistant
+name: Cameron Abplanalp
+status: Past
+website: null
+organizations:
+ - chtc
diff --git a/preview-calendar/staff-list/cannon_lock.yml b/preview-calendar/staff-list/cannon_lock.yml
new file mode 100644
index 000000000..33692fe6b
--- /dev/null
+++ b/preview-calendar/staff-list/cannon_lock.yml
@@ -0,0 +1,16 @@
+name: "Cannon Lock"
+draft: false
+image: "images/cannon_lock.jpg"
+title: "Web Developer"
+institution: "Morgridge Institute for Research"
+status: Staff
+linkedinurl: ""
+weight: 5
+pelican:
+ title: "Web Developer"
+ weight: 6
+organizations:
+ - path
+ - chtc
+ - osg
+ - pelican
\ No newline at end of file
diff --git a/preview-calendar/staff-list/chris_lauderbaugh.yml b/preview-calendar/staff-list/chris_lauderbaugh.yml
new file mode 100644
index 000000000..d007b42ea
--- /dev/null
+++ b/preview-calendar/staff-list/chris_lauderbaugh.yml
@@ -0,0 +1,9 @@
+image: images/chris_lauderbaugh.jpg
+institution: Indiana University
+title: Security Analyst
+name: Chris Lauderbaugh
+status: Staff
+website: null
+organizations:
+ - path
+ - osg
\ No newline at end of file
diff --git a/preview-calendar/staff-list/christina_koch.yml b/preview-calendar/staff-list/christina_koch.yml
new file mode 100644
index 000000000..080136484
--- /dev/null
+++ b/preview-calendar/staff-list/christina_koch.yml
@@ -0,0 +1,25 @@
+name: "Christina Koch"
+date: 2018-11-19T10:47:58+10:00
+draft: false
+image: "images/christina_koch.jpg"
+title: "Research Facilitator Manager"
+institution: "University of Wisconsin - Madison"
+website: https://wid.wisc.edu/people/christina-koch/
+is_facilitator: 1
+status: Staff
+linkedinurl: ""
+weight: 5
+chtc:
+ title: Lead Research Computing Facilitator
+pelican:
+ title: Lead Research Computing Facilitator
+ weight: 14
+osg:
+ title: OSG Research Facilitation Lead
+ promoted: true
+ weight: 7
+organizations:
+ - path
+ - chtc
+ - osg
+ - pelican
\ No newline at end of file
diff --git a/preview-calendar/staff-list/colby_walsworth.yml b/preview-calendar/staff-list/colby_walsworth.yml
new file mode 100644
index 000000000..8b803727c
--- /dev/null
+++ b/preview-calendar/staff-list/colby_walsworth.yml
@@ -0,0 +1,9 @@
+name: "Colby Walsworth"
+image: "images/colby_walsworth.jpg"
+title: "Software Integration Developer"
+status: Staff
+institution: "University of California - San Diego"
+weight: 5
+organizations:
+ - path
+ - osg
\ No newline at end of file
diff --git a/preview-calendar/staff-list/cole_bollig.yml b/preview-calendar/staff-list/cole_bollig.yml
new file mode 100644
index 000000000..fabeb4f59
--- /dev/null
+++ b/preview-calendar/staff-list/cole_bollig.yml
@@ -0,0 +1,10 @@
+name: "Cole Bollig"
+status: Staff
+image: "images/cole_bollig.jpg"
+title: "Systems Software Developer"
+institution: "University of Wisconsin - Madison"
+chtc:
+ title: HTCondor Core Developer
+organizations:
+ - path
+ - chtc
\ No newline at end of file
diff --git a/preview-calendar/staff-list/cristina_encarnacion.yml b/preview-calendar/staff-list/cristina_encarnacion.yml
new file mode 100644
index 000000000..09d88acdf
--- /dev/null
+++ b/preview-calendar/staff-list/cristina_encarnacion.yml
@@ -0,0 +1,12 @@
+name: "Cristina Encarnacion"
+image: "images/cristina_encarnacion.jpeg"
+title: "Student Science Writer"
+institution: "Morgridge Institute for Research"
+website: null
+weight: 3
+status: Student
+organizations:
+ - path
+ - chtc
+ - osg
+ - pelican
\ No newline at end of file
diff --git a/preview-calendar/staff-list/david_baik.yaml b/preview-calendar/staff-list/david_baik.yaml
new file mode 100644
index 000000000..2a67fb236
--- /dev/null
+++ b/preview-calendar/staff-list/david_baik.yaml
@@ -0,0 +1,8 @@
+image: images/default.jpg
+institution: University of Wisconsin-Madison
+title: System Administrator
+name: David Baik
+status: Staff
+website: null
+organizations:
+ - chtc
diff --git a/preview-calendar/staff-list/david_jordan.yml b/preview-calendar/staff-list/david_jordan.yml
new file mode 100644
index 000000000..48ea8a8aa
--- /dev/null
+++ b/preview-calendar/staff-list/david_jordan.yml
@@ -0,0 +1,7 @@
+name: David Jordan
+image: "images/david_jordan.jpg"
+title: "Systems Administrator"
+status: Staff
+institution: "University of Chicago"
+organizations:
+ - path
\ No newline at end of file
diff --git a/preview-calendar/staff-list/derek_weitzel.yml b/preview-calendar/staff-list/derek_weitzel.yml
new file mode 100644
index 000000000..80a3a329f
--- /dev/null
+++ b/preview-calendar/staff-list/derek_weitzel.yml
@@ -0,0 +1,15 @@
+name: "Derek Weitzel"
+date: 2018-11-19T10:47:58+10:00
+draft: false
+image: "images/derek_weitzel.png"
+title: "Institutional PI"
+institution: "University of Nebraska-Lincoln"
+status: Staff
+website: "https://derekweitzel.com"
+description: Derek Weitzel is an Assistant Research Professor at the University of Nebraska-Lincoln's Computer Science and Engineering Department.
+osg:
+ title: Software Integration Developer
+ website: https://github.com/djw8605
+organizations:
+ - path
+ - osg
\ No newline at end of file
diff --git a/preview-calendar/staff-list/emile_turatsinze.yml b/preview-calendar/staff-list/emile_turatsinze.yml
new file mode 100644
index 000000000..ae8c8fe1d
--- /dev/null
+++ b/preview-calendar/staff-list/emile_turatsinze.yml
@@ -0,0 +1,7 @@
+image: images/emile_turatsinze.jpg
+institution: Morgridge Institute for Research
+title: Systems Administrator
+name: Emile Turatsinze
+status: Staff
+organizations:
+ - chtc
\ No newline at end of file
diff --git a/preview-calendar/staff-list/emily_yao.yml b/preview-calendar/staff-list/emily_yao.yml
new file mode 100644
index 000000000..44d2441d3
--- /dev/null
+++ b/preview-calendar/staff-list/emily_yao.yml
@@ -0,0 +1,7 @@
+image: images/emily_yao.jpg
+institution: University of Wisconsin-Madison
+title: System Administrator Intern
+name: Emily Yao
+status: Past
+organizations:
+ - chtc
\ No newline at end of file
diff --git a/preview-calendar/staff-list/emma_turetsky.yml b/preview-calendar/staff-list/emma_turetsky.yml
new file mode 100644
index 000000000..a46affc3e
--- /dev/null
+++ b/preview-calendar/staff-list/emma_turetsky.yml
@@ -0,0 +1,10 @@
+image: images/emma_turetsky.jpg
+institution: Morgridge Institute for Research
+title: Research Software Engineer
+name: Emma Turetsky
+status: Staff
+pelican:
+ weight: 7
+organizations:
+ - chtc
+ - pelican
diff --git a/preview-calendar/staff-list/ewa_deelman.yml b/preview-calendar/staff-list/ewa_deelman.yml
new file mode 100644
index 000000000..e1619983f
--- /dev/null
+++ b/preview-calendar/staff-list/ewa_deelman.yml
@@ -0,0 +1,6 @@
+name: "Ewa Deelman"
+image: "images/ewa_deelman.jpeg"
+title: "Institutional PI"
+institution: "University of Southern California"
+organizations:
+ - path
\ No newline at end of file
diff --git a/preview-calendar/staff-list/fabio_andrijauska.yml b/preview-calendar/staff-list/fabio_andrijauska.yml
new file mode 100644
index 000000000..9239314f9
--- /dev/null
+++ b/preview-calendar/staff-list/fabio_andrijauska.yml
@@ -0,0 +1,6 @@
+name: Fabio Andrijauskas
+image: "images/fabio_andrijauskas.jpeg"
+title: "Senior Software Developer"
+institution: "University of California San Diego"
+organizations:
+ - path
\ No newline at end of file
diff --git a/preview-calendar/staff-list/farnaz_golnaraghi.yml b/preview-calendar/staff-list/farnaz_golnaraghi.yml
new file mode 100644
index 000000000..1811e84c2
--- /dev/null
+++ b/preview-calendar/staff-list/farnaz_golnaraghi.yml
@@ -0,0 +1,6 @@
+name: "Farnaz Golnaraghi"
+image: "images/farnaz_golnaraghi.jpeg"
+title: "Systems Administrator"
+institution: "University of Chicago"
+organizations:
+ - path
\ No newline at end of file
diff --git a/preview-calendar/staff-list/frank_wuerthwein.yml b/preview-calendar/staff-list/frank_wuerthwein.yml
new file mode 100644
index 000000000..e55dc7a69
--- /dev/null
+++ b/preview-calendar/staff-list/frank_wuerthwein.yml
@@ -0,0 +1,21 @@
+name: "Frank Wuerthwein"
+date: 2018-11-19T10:47:58+10:00
+draft: false
+image: "images/frank_wuerthwein.jpg"
+title: "OSG Executive Director"
+website:
+institution: "University of California San Diego"
+promoted: true
+weight: 2
+description: Wuerthwein is a Professor of Physics at UCSD and the Executive Director of the OSG.
+pelican:
+ title: Co-Principal Investigator
+ weight: 3
+osg:
+ title: OSG Executive Director
+ promoted: true
+ weight: 2
+organizations:
+ - path
+ - osg
+ - pelican
\ No newline at end of file
diff --git a/preview-calendar/staff-list/frank_zhang.yaml b/preview-calendar/staff-list/frank_zhang.yaml
new file mode 100644
index 000000000..26443fda9
--- /dev/null
+++ b/preview-calendar/staff-list/frank_zhang.yaml
@@ -0,0 +1,8 @@
+image: images/default.jpg
+institution: University of Wisconsin-Madison
+title: System Administrator Intern
+name: Frank Zhang
+status: Student
+website: null
+organizations:
+ - chtc
diff --git a/preview-calendar/staff-list/greg_thain.yml b/preview-calendar/staff-list/greg_thain.yml
new file mode 100644
index 000000000..a7635ff30
--- /dev/null
+++ b/preview-calendar/staff-list/greg_thain.yml
@@ -0,0 +1,14 @@
+name: "Greg Thain"
+date: 2018-11-19T10:47:58+10:00
+draft: false
+image: "images/greg_thain.jpg"
+title: "Senior Systems Software Developer"
+#website: ""
+institution: "University of Wisconsin-Madison"
+status: Staff
+weight: 5
+chtc:
+ title: HTCondor Core Developer
+organizations:
+ - path
+ - chtc
\ No newline at end of file
diff --git a/preview-calendar/staff-list/hannah_cheren.yml b/preview-calendar/staff-list/hannah_cheren.yml
new file mode 100644
index 000000000..a94b6206b
--- /dev/null
+++ b/preview-calendar/staff-list/hannah_cheren.yml
@@ -0,0 +1,12 @@
+name: "Hannah Cheren"
+date: 2021-11-17T09:00:00+10:00
+draft: false
+image: "images/hannah_cheren.jpg"
+title: "Communications Specialist"
+institution: "University of Wisconsin–Madison"
+#website: ""
+linkedinurl: ""
+weight: 5
+organizations:
+ - path
+status: Past
\ No newline at end of file
diff --git a/preview-calendar/staff-list/haoming_meng.yml b/preview-calendar/staff-list/haoming_meng.yml
new file mode 100644
index 000000000..4803f26f3
--- /dev/null
+++ b/preview-calendar/staff-list/haoming_meng.yml
@@ -0,0 +1,11 @@
+image: images/haoming_meng.jpg
+institution: Morgridge Institute for Research
+title: Research Software Engineer
+name: Haoming Meng
+status: Past
+website: null
+pelican:
+ weight: 12
+organizations:
+ - chtc
+ - pelican
diff --git a/preview-calendar/staff-list/ian_ross.yml b/preview-calendar/staff-list/ian_ross.yml
new file mode 100644
index 000000000..a0c5ef4df
--- /dev/null
+++ b/preview-calendar/staff-list/ian_ross.yml
@@ -0,0 +1,7 @@
+image: images/ian_ross.jpg
+institution: University of Wisconsin-Madison
+title: Systems Integration Developer
+name: Ian Ross
+status: Staff
+organizations:
+ - chtc
diff --git a/preview-calendar/staff-list/igor_sfiligoi.yml b/preview-calendar/staff-list/igor_sfiligoi.yml
new file mode 100644
index 000000000..2562325cc
--- /dev/null
+++ b/preview-calendar/staff-list/igor_sfiligoi.yml
@@ -0,0 +1,11 @@
+name: "Igor Sfiligoi"
+date: 2020-09-28T05:00:00-05:00
+draft: false
+image: "images/igor_sfiligoi.jpg"
+title: "Lead Scientific Software Developer and Researcher"
+institution: "University of California San Diego"
+#website: ""
+linkedinurl: "https://www.linkedin.com/in/igor-sfiligoi-73982a78/"
+weight: 5
+organizations:
+ - path
\ No newline at end of file
diff --git a/preview-calendar/staff-list/images/aaron_moate.png b/preview-calendar/staff-list/images/aaron_moate.png
new file mode 100644
index 000000000..036d9098c
Binary files /dev/null and b/preview-calendar/staff-list/images/aaron_moate.png differ
diff --git a/preview-calendar/staff-list/images/aaryan_patel.jpeg b/preview-calendar/staff-list/images/aaryan_patel.jpeg
new file mode 100644
index 000000000..223d00609
Binary files /dev/null and b/preview-calendar/staff-list/images/aaryan_patel.jpeg differ
diff --git a/preview-calendar/staff-list/images/abhinandan_saha.jpg b/preview-calendar/staff-list/images/abhinandan_saha.jpg
new file mode 100644
index 000000000..9a9adc10f
Binary files /dev/null and b/preview-calendar/staff-list/images/abhinandan_saha.jpg differ
diff --git a/preview-calendar/staff-list/images/adrian_crenshaw.jpeg b/preview-calendar/staff-list/images/adrian_crenshaw.jpeg
new file mode 100644
index 000000000..a69b9cb4b
Binary files /dev/null and b/preview-calendar/staff-list/images/adrian_crenshaw.jpeg differ
diff --git a/preview-calendar/staff-list/images/alja_tadel.jpg b/preview-calendar/staff-list/images/alja_tadel.jpg
new file mode 100644
index 000000000..2b1db9b86
Binary files /dev/null and b/preview-calendar/staff-list/images/alja_tadel.jpg differ
diff --git a/preview-calendar/staff-list/images/alperen_bakirci.jpg b/preview-calendar/staff-list/images/alperen_bakirci.jpg
new file mode 100644
index 000000000..a9c1186e3
Binary files /dev/null and b/preview-calendar/staff-list/images/alperen_bakirci.jpg differ
diff --git a/preview-calendar/staff-list/images/amber_lim.jpg b/preview-calendar/staff-list/images/amber_lim.jpg
new file mode 100644
index 000000000..4e5f2162d
Binary files /dev/null and b/preview-calendar/staff-list/images/amber_lim.jpg differ
diff --git a/preview-calendar/staff-list/images/andrew_owen.jpg b/preview-calendar/staff-list/images/andrew_owen.jpg
new file mode 100644
index 000000000..4617647df
Binary files /dev/null and b/preview-calendar/staff-list/images/andrew_owen.jpg differ
diff --git a/preview-calendar/staff-list/images/ashton_graves.jpeg b/preview-calendar/staff-list/images/ashton_graves.jpeg
new file mode 100644
index 000000000..06e38c7d9
Binary files /dev/null and b/preview-calendar/staff-list/images/ashton_graves.jpeg differ
diff --git a/preview-calendar/staff-list/images/ben_staehle.jpg b/preview-calendar/staff-list/images/ben_staehle.jpg
new file mode 100644
index 000000000..c3e7c0d85
Binary files /dev/null and b/preview-calendar/staff-list/images/ben_staehle.jpg differ
diff --git a/preview-calendar/staff-list/images/bocheng_zou.png b/preview-calendar/staff-list/images/bocheng_zou.png
new file mode 100644
index 000000000..9d49e5f85
Binary files /dev/null and b/preview-calendar/staff-list/images/bocheng_zou.png differ
diff --git a/preview-calendar/staff-list/images/brian_aydemir.jpeg b/preview-calendar/staff-list/images/brian_aydemir.jpeg
new file mode 100644
index 000000000..7cd690dcb
Binary files /dev/null and b/preview-calendar/staff-list/images/brian_aydemir.jpeg differ
diff --git a/preview-calendar/staff-list/images/brian_bockelman.jpg b/preview-calendar/staff-list/images/brian_bockelman.jpg
new file mode 100644
index 000000000..0ebb6eb0d
Binary files /dev/null and b/preview-calendar/staff-list/images/brian_bockelman.jpg differ
diff --git a/preview-calendar/staff-list/images/brian_lin.jpg b/preview-calendar/staff-list/images/brian_lin.jpg
new file mode 100644
index 000000000..8fa6934ec
Binary files /dev/null and b/preview-calendar/staff-list/images/brian_lin.jpg differ
diff --git a/preview-calendar/staff-list/images/bryna_goeking.jpg b/preview-calendar/staff-list/images/bryna_goeking.jpg
new file mode 100644
index 000000000..69f1d7fbb
Binary files /dev/null and b/preview-calendar/staff-list/images/bryna_goeking.jpg differ
diff --git a/preview-calendar/staff-list/images/cameron_abplanalp.png b/preview-calendar/staff-list/images/cameron_abplanalp.png
new file mode 100644
index 000000000..982b1977e
Binary files /dev/null and b/preview-calendar/staff-list/images/cameron_abplanalp.png differ
diff --git a/preview-calendar/staff-list/images/cannon_lock.jpg b/preview-calendar/staff-list/images/cannon_lock.jpg
new file mode 100644
index 000000000..31afa7caf
Binary files /dev/null and b/preview-calendar/staff-list/images/cannon_lock.jpg differ
diff --git a/preview-calendar/staff-list/images/chris_lauderbaugh.jpg b/preview-calendar/staff-list/images/chris_lauderbaugh.jpg
new file mode 100644
index 000000000..74a235369
Binary files /dev/null and b/preview-calendar/staff-list/images/chris_lauderbaugh.jpg differ
diff --git a/preview-calendar/staff-list/images/christina_koch.jpg b/preview-calendar/staff-list/images/christina_koch.jpg
new file mode 100644
index 000000000..455bad094
Binary files /dev/null and b/preview-calendar/staff-list/images/christina_koch.jpg differ
diff --git a/preview-calendar/staff-list/images/colby_walsworth.jpg b/preview-calendar/staff-list/images/colby_walsworth.jpg
new file mode 100644
index 000000000..c39ac9e3c
Binary files /dev/null and b/preview-calendar/staff-list/images/colby_walsworth.jpg differ
diff --git a/preview-calendar/staff-list/images/cole_bollig.jpg b/preview-calendar/staff-list/images/cole_bollig.jpg
new file mode 100644
index 000000000..f6c1052ca
Binary files /dev/null and b/preview-calendar/staff-list/images/cole_bollig.jpg differ
diff --git a/preview-calendar/staff-list/images/cristina_encarnacion.jpeg b/preview-calendar/staff-list/images/cristina_encarnacion.jpeg
new file mode 100644
index 000000000..63c6af413
Binary files /dev/null and b/preview-calendar/staff-list/images/cristina_encarnacion.jpeg differ
diff --git a/preview-calendar/staff-list/images/david_jordan.jpg b/preview-calendar/staff-list/images/david_jordan.jpg
new file mode 100644
index 000000000..07dd7969c
Binary files /dev/null and b/preview-calendar/staff-list/images/david_jordan.jpg differ
diff --git a/preview-calendar/staff-list/images/default.jpg b/preview-calendar/staff-list/images/default.jpg
new file mode 100644
index 000000000..e42186f37
Binary files /dev/null and b/preview-calendar/staff-list/images/default.jpg differ
diff --git a/preview-calendar/staff-list/images/derek_weitzel.png b/preview-calendar/staff-list/images/derek_weitzel.png
new file mode 100644
index 000000000..e46c6b25f
Binary files /dev/null and b/preview-calendar/staff-list/images/derek_weitzel.png differ
diff --git a/preview-calendar/staff-list/images/emile_turatsinze.jpg b/preview-calendar/staff-list/images/emile_turatsinze.jpg
new file mode 100644
index 000000000..207ba1cce
Binary files /dev/null and b/preview-calendar/staff-list/images/emile_turatsinze.jpg differ
diff --git a/preview-calendar/staff-list/images/emily_yao.jpg b/preview-calendar/staff-list/images/emily_yao.jpg
new file mode 100644
index 000000000..da2a80f13
Binary files /dev/null and b/preview-calendar/staff-list/images/emily_yao.jpg differ
diff --git a/preview-calendar/staff-list/images/emma_turetsky.jpg b/preview-calendar/staff-list/images/emma_turetsky.jpg
new file mode 100644
index 000000000..631ecfce4
Binary files /dev/null and b/preview-calendar/staff-list/images/emma_turetsky.jpg differ
diff --git a/preview-calendar/staff-list/images/ewa_deelman.jpeg b/preview-calendar/staff-list/images/ewa_deelman.jpeg
new file mode 100644
index 000000000..774d860a4
Binary files /dev/null and b/preview-calendar/staff-list/images/ewa_deelman.jpeg differ
diff --git a/preview-calendar/staff-list/images/fabio_andrijauskas.jpeg b/preview-calendar/staff-list/images/fabio_andrijauskas.jpeg
new file mode 100644
index 000000000..c3fb45426
Binary files /dev/null and b/preview-calendar/staff-list/images/fabio_andrijauskas.jpeg differ
diff --git a/preview-calendar/staff-list/images/farnaz_golnaraghi.jpeg b/preview-calendar/staff-list/images/farnaz_golnaraghi.jpeg
new file mode 100644
index 000000000..feb787e63
Binary files /dev/null and b/preview-calendar/staff-list/images/farnaz_golnaraghi.jpeg differ
diff --git a/preview-calendar/staff-list/images/frank_wuerthwein.jpg b/preview-calendar/staff-list/images/frank_wuerthwein.jpg
new file mode 100644
index 000000000..bc5cb071a
Binary files /dev/null and b/preview-calendar/staff-list/images/frank_wuerthwein.jpg differ
diff --git a/preview-calendar/staff-list/images/greg_thain.jpg b/preview-calendar/staff-list/images/greg_thain.jpg
new file mode 100644
index 000000000..10fc4785d
Binary files /dev/null and b/preview-calendar/staff-list/images/greg_thain.jpg differ
diff --git a/preview-calendar/staff-list/images/hannah_cheren.jpg b/preview-calendar/staff-list/images/hannah_cheren.jpg
new file mode 100644
index 000000000..5dd58aed6
Binary files /dev/null and b/preview-calendar/staff-list/images/hannah_cheren.jpg differ
diff --git a/preview-calendar/staff-list/images/haoming_meng.jpg b/preview-calendar/staff-list/images/haoming_meng.jpg
new file mode 100644
index 000000000..487f5322b
Binary files /dev/null and b/preview-calendar/staff-list/images/haoming_meng.jpg differ
diff --git a/preview-calendar/staff-list/images/ian_ross.jpg b/preview-calendar/staff-list/images/ian_ross.jpg
new file mode 100644
index 000000000..f5467bb6f
Binary files /dev/null and b/preview-calendar/staff-list/images/ian_ross.jpg differ
diff --git a/preview-calendar/staff-list/images/igor_sfiligoi.jpg b/preview-calendar/staff-list/images/igor_sfiligoi.jpg
new file mode 100644
index 000000000..6c901b2ff
Binary files /dev/null and b/preview-calendar/staff-list/images/igor_sfiligoi.jpg differ
diff --git a/preview-calendar/staff-list/images/irene_landrum.png b/preview-calendar/staff-list/images/irene_landrum.png
new file mode 100644
index 000000000..6bce28d19
Binary files /dev/null and b/preview-calendar/staff-list/images/irene_landrum.png differ
diff --git a/preview-calendar/staff-list/images/jaime_frey.jpg b/preview-calendar/staff-list/images/jaime_frey.jpg
new file mode 100644
index 000000000..0c96f694a
Binary files /dev/null and b/preview-calendar/staff-list/images/jaime_frey.jpg differ
diff --git a/preview-calendar/staff-list/images/janet_stathas.jpg b/preview-calendar/staff-list/images/janet_stathas.jpg
new file mode 100644
index 000000000..88b938689
Binary files /dev/null and b/preview-calendar/staff-list/images/janet_stathas.jpg differ
diff --git a/preview-calendar/staff-list/images/jason_patton.png b/preview-calendar/staff-list/images/jason_patton.png
new file mode 100644
index 000000000..63b5e2471
Binary files /dev/null and b/preview-calendar/staff-list/images/jason_patton.png differ
diff --git a/preview-calendar/staff-list/images/jeff_dost.jpg b/preview-calendar/staff-list/images/jeff_dost.jpg
new file mode 100644
index 000000000..e5adc8ac1
Binary files /dev/null and b/preview-calendar/staff-list/images/jeff_dost.jpg differ
diff --git a/preview-calendar/staff-list/images/jeff_peterson.jpg b/preview-calendar/staff-list/images/jeff_peterson.jpg
new file mode 100644
index 000000000..03f212b95
Binary files /dev/null and b/preview-calendar/staff-list/images/jeff_peterson.jpg differ
diff --git a/preview-calendar/staff-list/images/jeronimo_bezerra.jpeg b/preview-calendar/staff-list/images/jeronimo_bezerra.jpeg
new file mode 100644
index 000000000..83f194517
Binary files /dev/null and b/preview-calendar/staff-list/images/jeronimo_bezerra.jpeg differ
diff --git a/preview-calendar/staff-list/images/joe_bartkowiak.jpg b/preview-calendar/staff-list/images/joe_bartkowiak.jpg
new file mode 100644
index 000000000..391d81da6
Binary files /dev/null and b/preview-calendar/staff-list/images/joe_bartkowiak.jpg differ
diff --git a/preview-calendar/staff-list/images/joe_reuss.jpeg b/preview-calendar/staff-list/images/joe_reuss.jpeg
new file mode 100644
index 000000000..40e2f80aa
Binary files /dev/null and b/preview-calendar/staff-list/images/joe_reuss.jpeg differ
diff --git a/preview-calendar/staff-list/images/john_knoeller.jpg b/preview-calendar/staff-list/images/john_knoeller.jpg
new file mode 100644
index 000000000..eda38ee91
Binary files /dev/null and b/preview-calendar/staff-list/images/john_knoeller.jpg differ
diff --git a/preview-calendar/staff-list/images/john_parsons.jpeg b/preview-calendar/staff-list/images/john_parsons.jpeg
new file mode 100644
index 000000000..27790e88f
Binary files /dev/null and b/preview-calendar/staff-list/images/john_parsons.jpeg differ
diff --git a/preview-calendar/staff-list/images/john_thiltges.jpg b/preview-calendar/staff-list/images/john_thiltges.jpg
new file mode 100644
index 000000000..cae48ccaa
Binary files /dev/null and b/preview-calendar/staff-list/images/john_thiltges.jpg differ
diff --git a/preview-calendar/staff-list/images/jordan_sklar.jpg b/preview-calendar/staff-list/images/jordan_sklar.jpg
new file mode 100644
index 000000000..5f233cea0
Binary files /dev/null and b/preview-calendar/staff-list/images/jordan_sklar.jpg differ
diff --git a/preview-calendar/staff-list/images/josh_drake.jpg b/preview-calendar/staff-list/images/josh_drake.jpg
new file mode 100644
index 000000000..b1962b9b9
Binary files /dev/null and b/preview-calendar/staff-list/images/josh_drake.jpg differ
diff --git a/preview-calendar/staff-list/images/josh_edwards.jpeg b/preview-calendar/staff-list/images/josh_edwards.jpeg
new file mode 100644
index 000000000..a7a0db417
Binary files /dev/null and b/preview-calendar/staff-list/images/josh_edwards.jpeg differ
diff --git a/preview-calendar/staff-list/images/judith_stephen.jpeg b/preview-calendar/staff-list/images/judith_stephen.jpeg
new file mode 100644
index 000000000..e0f658d59
Binary files /dev/null and b/preview-calendar/staff-list/images/judith_stephen.jpeg differ
diff --git a/preview-calendar/staff-list/images/julio_ibarra.jpg b/preview-calendar/staff-list/images/julio_ibarra.jpg
new file mode 100644
index 000000000..786c26352
Binary files /dev/null and b/preview-calendar/staff-list/images/julio_ibarra.jpg differ
diff --git a/preview-calendar/staff-list/images/justin_hiemstra.jpg b/preview-calendar/staff-list/images/justin_hiemstra.jpg
new file mode 100644
index 000000000..7a819cdc2
Binary files /dev/null and b/preview-calendar/staff-list/images/justin_hiemstra.jpg differ
diff --git a/preview-calendar/staff-list/images/kent_cramer.jpeg b/preview-calendar/staff-list/images/kent_cramer.jpeg
new file mode 100644
index 000000000..ac3fd4e9e
Binary files /dev/null and b/preview-calendar/staff-list/images/kent_cramer.jpeg differ
diff --git a/preview-calendar/staff-list/images/kristina_zhao.jpg b/preview-calendar/staff-list/images/kristina_zhao.jpg
new file mode 100644
index 000000000..1cb6e5ac0
Binary files /dev/null and b/preview-calendar/staff-list/images/kristina_zhao.jpg differ
diff --git a/preview-calendar/staff-list/images/lili_bicoy.jpg b/preview-calendar/staff-list/images/lili_bicoy.jpg
new file mode 100644
index 000000000..bc46a64fb
Binary files /dev/null and b/preview-calendar/staff-list/images/lili_bicoy.jpg differ
diff --git a/preview-calendar/staff-list/images/matevz_tadel.jpg b/preview-calendar/staff-list/images/matevz_tadel.jpg
new file mode 100644
index 000000000..5b89c609f
Binary files /dev/null and b/preview-calendar/staff-list/images/matevz_tadel.jpg differ
diff --git a/preview-calendar/staff-list/images/mats_rynge.jpg b/preview-calendar/staff-list/images/mats_rynge.jpg
new file mode 100644
index 000000000..c2f526df7
Binary files /dev/null and b/preview-calendar/staff-list/images/mats_rynge.jpg differ
diff --git a/preview-calendar/staff-list/images/matt_westphall.jpeg b/preview-calendar/staff-list/images/matt_westphall.jpeg
new file mode 100644
index 000000000..3857eca52
Binary files /dev/null and b/preview-calendar/staff-list/images/matt_westphall.jpeg differ
diff --git a/preview-calendar/staff-list/images/matyas_selmeci.jpg b/preview-calendar/staff-list/images/matyas_selmeci.jpg
new file mode 100644
index 000000000..7fff6f84f
Binary files /dev/null and b/preview-calendar/staff-list/images/matyas_selmeci.jpg differ
diff --git a/preview-calendar/staff-list/images/max_hartke.jpg b/preview-calendar/staff-list/images/max_hartke.jpg
new file mode 100644
index 000000000..d984af3da
Binary files /dev/null and b/preview-calendar/staff-list/images/max_hartke.jpg differ
diff --git a/preview-calendar/staff-list/images/michael_collins.png b/preview-calendar/staff-list/images/michael_collins.png
new file mode 100644
index 000000000..192afd5cf
Binary files /dev/null and b/preview-calendar/staff-list/images/michael_collins.png differ
diff --git a/preview-calendar/staff-list/images/mihir_manna.jpeg b/preview-calendar/staff-list/images/mihir_manna.jpeg
new file mode 100644
index 000000000..5542ed825
Binary files /dev/null and b/preview-calendar/staff-list/images/mihir_manna.jpeg differ
diff --git a/preview-calendar/staff-list/images/miron_livny.png b/preview-calendar/staff-list/images/miron_livny.png
new file mode 100644
index 000000000..a762f7e22
Binary files /dev/null and b/preview-calendar/staff-list/images/miron_livny.png differ
diff --git a/preview-calendar/staff-list/images/molly_mccarthy.jpg b/preview-calendar/staff-list/images/molly_mccarthy.jpg
new file mode 100644
index 000000000..7653df08a
Binary files /dev/null and b/preview-calendar/staff-list/images/molly_mccarthy.jpg differ
diff --git a/preview-calendar/staff-list/images/neha_talluri.jpg b/preview-calendar/staff-list/images/neha_talluri.jpg
new file mode 100644
index 000000000..464b953d4
Binary files /dev/null and b/preview-calendar/staff-list/images/neha_talluri.jpg differ
diff --git a/preview-calendar/staff-list/images/pascal_paschos.png b/preview-calendar/staff-list/images/pascal_paschos.png
new file mode 100644
index 000000000..845c783d1
Binary files /dev/null and b/preview-calendar/staff-list/images/pascal_paschos.png differ
diff --git a/preview-calendar/staff-list/images/patrick_brophy.jpg b/preview-calendar/staff-list/images/patrick_brophy.jpg
new file mode 100644
index 000000000..5cda328f1
Binary files /dev/null and b/preview-calendar/staff-list/images/patrick_brophy.jpg differ
diff --git a/preview-calendar/staff-list/images/pratham_patel.jpg b/preview-calendar/staff-list/images/pratham_patel.jpg
new file mode 100644
index 000000000..5d215fcf4
Binary files /dev/null and b/preview-calendar/staff-list/images/pratham_patel.jpg differ
diff --git a/preview-calendar/staff-list/images/rachel_lombardi.jpg b/preview-calendar/staff-list/images/rachel_lombardi.jpg
new file mode 100644
index 000000000..acac1723d
Binary files /dev/null and b/preview-calendar/staff-list/images/rachel_lombardi.jpg differ
diff --git a/preview-calendar/staff-list/images/rich_wellner.jpg b/preview-calendar/staff-list/images/rich_wellner.jpg
new file mode 100644
index 000000000..015215129
Binary files /dev/null and b/preview-calendar/staff-list/images/rich_wellner.jpg differ
diff --git a/preview-calendar/staff-list/images/rishideep_rallabandi.jpg b/preview-calendar/staff-list/images/rishideep_rallabandi.jpg
new file mode 100644
index 000000000..ba7c415c6
Binary files /dev/null and b/preview-calendar/staff-list/images/rishideep_rallabandi.jpg differ
diff --git a/preview-calendar/staff-list/images/rob_gardner.jpg b/preview-calendar/staff-list/images/rob_gardner.jpg
new file mode 100644
index 000000000..70efb9ed0
Binary files /dev/null and b/preview-calendar/staff-list/images/rob_gardner.jpg differ
diff --git a/preview-calendar/staff-list/images/ryan_boone.jpg b/preview-calendar/staff-list/images/ryan_boone.jpg
new file mode 100644
index 000000000..10de5fcb5
Binary files /dev/null and b/preview-calendar/staff-list/images/ryan_boone.jpg differ
diff --git a/preview-calendar/staff-list/images/ryan_jacob.jpg b/preview-calendar/staff-list/images/ryan_jacob.jpg
new file mode 100644
index 000000000..2545d5821
Binary files /dev/null and b/preview-calendar/staff-list/images/ryan_jacob.jpg differ
diff --git a/preview-calendar/staff-list/images/shawn_mckee.jpg b/preview-calendar/staff-list/images/shawn_mckee.jpg
new file mode 100644
index 000000000..6c388489e
Binary files /dev/null and b/preview-calendar/staff-list/images/shawn_mckee.jpg differ
diff --git a/preview-calendar/staff-list/images/shirley_obih.jpg b/preview-calendar/staff-list/images/shirley_obih.jpg
new file mode 100644
index 000000000..1d629fb62
Binary files /dev/null and b/preview-calendar/staff-list/images/shirley_obih.jpg differ
diff --git a/preview-calendar/staff-list/images/showmic_islam.jpg b/preview-calendar/staff-list/images/showmic_islam.jpg
new file mode 100644
index 000000000..cf2aa79ea
Binary files /dev/null and b/preview-calendar/staff-list/images/showmic_islam.jpg differ
diff --git a/preview-calendar/staff-list/images/susan_sons.jpg b/preview-calendar/staff-list/images/susan_sons.jpg
new file mode 100644
index 000000000..e0a19a647
Binary files /dev/null and b/preview-calendar/staff-list/images/susan_sons.jpg differ
diff --git a/preview-calendar/staff-list/images/tae_kidd.jpg b/preview-calendar/staff-list/images/tae_kidd.jpg
new file mode 100644
index 000000000..21c6f31b9
Binary files /dev/null and b/preview-calendar/staff-list/images/tae_kidd.jpg differ
diff --git a/preview-calendar/staff-list/images/theng_vang.jpg b/preview-calendar/staff-list/images/theng_vang.jpg
new file mode 100644
index 000000000..3f272bb37
Binary files /dev/null and b/preview-calendar/staff-list/images/theng_vang.jpg differ
diff --git a/preview-calendar/staff-list/images/thinh_nguyen.jpg b/preview-calendar/staff-list/images/thinh_nguyen.jpg
new file mode 100644
index 000000000..3a2a04971
Binary files /dev/null and b/preview-calendar/staff-list/images/thinh_nguyen.jpg differ
diff --git a/preview-calendar/staff-list/images/tim_cartwright.jpg b/preview-calendar/staff-list/images/tim_cartwright.jpg
new file mode 100644
index 000000000..5a6ac61b8
Binary files /dev/null and b/preview-calendar/staff-list/images/tim_cartwright.jpg differ
diff --git a/preview-calendar/staff-list/images/tim_theisen.png b/preview-calendar/staff-list/images/tim_theisen.png
new file mode 100644
index 000000000..854b4d2da
Binary files /dev/null and b/preview-calendar/staff-list/images/tim_theisen.png differ
diff --git a/preview-calendar/staff-list/images/todd_miller.png b/preview-calendar/staff-list/images/todd_miller.png
new file mode 100644
index 000000000..3f41dcc17
Binary files /dev/null and b/preview-calendar/staff-list/images/todd_miller.png differ
diff --git a/preview-calendar/staff-list/images/todd_tannenbaum.jpg b/preview-calendar/staff-list/images/todd_tannenbaum.jpg
new file mode 100644
index 000000000..5b3eefd2b
Binary files /dev/null and b/preview-calendar/staff-list/images/todd_tannenbaum.jpg differ
diff --git a/preview-calendar/staff-list/images/wil_cram.jpg b/preview-calendar/staff-list/images/wil_cram.jpg
new file mode 100644
index 000000000..658c07e8a
Binary files /dev/null and b/preview-calendar/staff-list/images/wil_cram.jpg differ
diff --git a/preview-calendar/staff-list/images/william_swanson.jpg b/preview-calendar/staff-list/images/william_swanson.jpg
new file mode 100644
index 000000000..dc92bbf33
Binary files /dev/null and b/preview-calendar/staff-list/images/william_swanson.jpg differ
diff --git a/preview-calendar/staff-list/images/yuxiao.jpg b/preview-calendar/staff-list/images/yuxiao.jpg
new file mode 100644
index 000000000..344e0bf88
Binary files /dev/null and b/preview-calendar/staff-list/images/yuxiao.jpg differ
diff --git a/preview-calendar/staff-list/irene_landrum.yml b/preview-calendar/staff-list/irene_landrum.yml
new file mode 100644
index 000000000..9dbf367f9
--- /dev/null
+++ b/preview-calendar/staff-list/irene_landrum.yml
@@ -0,0 +1,13 @@
+name: "Irene Landrum"
+date: 2020-09-25T10:47:58+10:00
+draft: false
+image: "images/irene_landrum.png"
+title: "Project Manager"
+#website: ""
+institution: "Morgridge Institute for Research"
+weight: 5
+status: Staff
+organizations:
+ - path
+ - chtc
+ - osg
\ No newline at end of file
diff --git a/preview-calendar/staff-list/jaime_frey.yml b/preview-calendar/staff-list/jaime_frey.yml
new file mode 100644
index 000000000..8d04d334d
--- /dev/null
+++ b/preview-calendar/staff-list/jaime_frey.yml
@@ -0,0 +1,14 @@
+name: "Jaime Frey"
+date: 2018-11-19T10:47:58+10:00
+draft: false
+image: "images/jaime_frey.jpg"
+title: "Senior Systems Software Developer"
+#website: ""
+institution: "University of Wisconsin-Madison"
+status: Staff
+weight: 5
+chtc:
+ title: HTCondor Core Developer
+organizations:
+ - path
+ - chtc
\ No newline at end of file
diff --git a/preview-calendar/staff-list/janet_stathas.yml b/preview-calendar/staff-list/janet_stathas.yml
new file mode 100644
index 000000000..ecb521df4
--- /dev/null
+++ b/preview-calendar/staff-list/janet_stathas.yml
@@ -0,0 +1,14 @@
+name: "Janet Stathas"
+date: 2020-10-27T10:47:58+10:00
+draft: false
+image: "images/janet_stathas.jpg"
+title: "Project Manager"
+institution: "Morgridge Institute for Research"
+#website: ""
+linkedinurl: ""
+status: Staff
+weight: 5
+organizations:
+ - path
+ - chtc
+ - osg
\ No newline at end of file
diff --git a/preview-calendar/staff-list/jason_patton.yml b/preview-calendar/staff-list/jason_patton.yml
new file mode 100644
index 000000000..f0f91494c
--- /dev/null
+++ b/preview-calendar/staff-list/jason_patton.yml
@@ -0,0 +1,12 @@
+name: "Jason Patton"
+date: 2018-11-19T10:47:58+10:00
+draft: false
+image: "images/jason_patton.png"
+title: "Software Integration Developer"
+#website: ""
+institution: "University of Wisconsin-Madison"
+weight: 5
+status: Staff
+organizations:
+ - path
+ - chtc
\ No newline at end of file
diff --git a/preview-calendar/staff-list/jeff_dost.yml b/preview-calendar/staff-list/jeff_dost.yml
new file mode 100644
index 000000000..dc905c664
--- /dev/null
+++ b/preview-calendar/staff-list/jeff_dost.yml
@@ -0,0 +1,6 @@
+name: "Jeff Dost"
+image: "images/jeff_dost.jpg"
+title: "Program Analyst"
+institution: "University of California San Diego"
+organizations:
+ - path
\ No newline at end of file
diff --git a/preview-calendar/staff-list/jeff_peterson.yml b/preview-calendar/staff-list/jeff_peterson.yml
new file mode 100644
index 000000000..57b0768d4
--- /dev/null
+++ b/preview-calendar/staff-list/jeff_peterson.yml
@@ -0,0 +1,13 @@
+name: "Jeff Peterson"
+date: 2018-11-19T10:47:58+10:00
+draft: false
+image: "images/jeff_peterson.jpg"
+title: "System Administrator"
+institution: "Morgridge Institute"
+status: Staff
+website: http://opensciencegrid.org
+weight: 5
+organizations:
+ - path
+ - chtc
+ - osg
\ No newline at end of file
diff --git a/preview-calendar/staff-list/jeronimo_bezerra.yml b/preview-calendar/staff-list/jeronimo_bezerra.yml
new file mode 100644
index 000000000..a6e1c28db
--- /dev/null
+++ b/preview-calendar/staff-list/jeronimo_bezerra.yml
@@ -0,0 +1,6 @@
+name: "Jeronimo Bezerra"
+image: "images/jeronimo_bezerra.jpeg"
+title: "Senior Systems Administrator"
+institution: "Florida International University"
+organizations:
+ - path
\ No newline at end of file
diff --git a/preview-calendar/staff-list/joe_bartkowiak.yml b/preview-calendar/staff-list/joe_bartkowiak.yml
new file mode 100644
index 000000000..79ddf5dd7
--- /dev/null
+++ b/preview-calendar/staff-list/joe_bartkowiak.yml
@@ -0,0 +1,11 @@
+image: images/joe_bartkowiak.jpg
+institution: University of Wisconsin Madison
+title: Systems Administrator
+name: Joe Bartkowiak
+shortname: jbartkowiak
+status: Staff
+website: null
+organizations:
+ - path
+ - chtc
+ - osg
\ No newline at end of file
diff --git a/preview-calendar/staff-list/joe_reuss.yml b/preview-calendar/staff-list/joe_reuss.yml
new file mode 100644
index 000000000..62d5305d8
--- /dev/null
+++ b/preview-calendar/staff-list/joe_reuss.yml
@@ -0,0 +1,14 @@
+image: images/joe_reuss.jpeg
+institution: University of Wisconsin-Madison
+title: Software Engineer
+name: Joe Reuss
+status: Past
+website: null
+pelican:
+ title: Software Engineer
+ weight: 8
+organizations:
+ - path
+ - chtc
+ - osg
+ - pelican
\ No newline at end of file
diff --git a/preview-calendar/staff-list/john_knoeller.yml b/preview-calendar/staff-list/john_knoeller.yml
new file mode 100644
index 000000000..efcd15338
--- /dev/null
+++ b/preview-calendar/staff-list/john_knoeller.yml
@@ -0,0 +1,14 @@
+name: "John TJ Knoeller"
+date: 2018-11-19T10:47:58+10:00
+draft: false
+image: "images/john_knoeller.jpg"
+title: "Systems Software Developer"
+status: Staff
+#website: ""
+institution: "University of Wisconsin-Madison"
+weight: 5
+chtc:
+ title: HTCondor Core Developer
+organizations:
+ - path
+ - chtc
\ No newline at end of file
diff --git a/preview-calendar/staff-list/john_parsons.yml b/preview-calendar/staff-list/john_parsons.yml
new file mode 100644
index 000000000..70cece8d2
--- /dev/null
+++ b/preview-calendar/staff-list/john_parsons.yml
@@ -0,0 +1,8 @@
+image: images/john_parsons.jpeg
+institution: University of Wisconsin Madison
+title: System Administrator Intern
+name: John Parsons
+status: Past
+website: null
+organizations:
+ - chtc
\ No newline at end of file
diff --git a/preview-calendar/staff-list/john_thiltges.yml b/preview-calendar/staff-list/john_thiltges.yml
new file mode 100644
index 000000000..88e7afb06
--- /dev/null
+++ b/preview-calendar/staff-list/john_thiltges.yml
@@ -0,0 +1,13 @@
+name: "John Thiltges"
+shortname: jthiltges
+date: 2018-11-19T10:47:58+10:00
+draft: false
+image: "images/john_thiltges.jpg"
+title: "Systems Administrator"
+institution: "University of Nebraska-Lincoln"
+#website: ""
+linkedinurl: ""
+weight: 5
+organizations:
+ - path
+ - osg
\ No newline at end of file
diff --git a/preview-calendar/staff-list/jordan_sklar.yml b/preview-calendar/staff-list/jordan_sklar.yml
new file mode 100644
index 000000000..d9710a24f
--- /dev/null
+++ b/preview-calendar/staff-list/jordan_sklar.yml
@@ -0,0 +1,12 @@
+name: "Jordan Sklar"
+image: "images/jordan_sklar.jpg"
+title: "Student Science Writer"
+institution: "Morgridge Institute for Research"
+website: null
+weight: 3
+status: Student
+organizations:
+ - path
+ - chtc
+ - osg
+ - pelican
diff --git a/preview-calendar/staff-list/josh_drake.yml b/preview-calendar/staff-list/josh_drake.yml
new file mode 100644
index 000000000..95da05fee
--- /dev/null
+++ b/preview-calendar/staff-list/josh_drake.yml
@@ -0,0 +1,14 @@
+name: "Josh Drake"
+date: 2021-07-20T09:00:00+10:00
+draft: false
+image: "images/josh_drake.jpg"
+title: "Institutional PI"
+institution: "Indiana University"
+website: https://cacr.iu.edu/about/people/Josh_Drake.html
+osg:
+ title: OSG Information Security Officer
+ promoted: true
+ weight: 6
+organizations:
+ - path
+ - osg
\ No newline at end of file
diff --git a/preview-calendar/staff-list/josh_edwards.yml b/preview-calendar/staff-list/josh_edwards.yml
new file mode 100644
index 000000000..6b5f23d26
--- /dev/null
+++ b/preview-calendar/staff-list/josh_edwards.yml
@@ -0,0 +1,9 @@
+image: images/josh_edwards.jpeg
+institution: Indiana University
+title: Security Analyst
+name: Josh Edwards
+status: Staff
+website: null
+organizations:
+ - path
+ - osg
\ No newline at end of file
diff --git a/preview-calendar/staff-list/judith_stephen.yml b/preview-calendar/staff-list/judith_stephen.yml
new file mode 100644
index 000000000..f58644ca7
--- /dev/null
+++ b/preview-calendar/staff-list/judith_stephen.yml
@@ -0,0 +1,6 @@
+name: "Judith Stephen"
+image: "images/judith_stephen.jpeg"
+title: "Systems Administrator"
+institution: "University of Chicago"
+organizations:
+ - path
\ No newline at end of file
diff --git a/preview-calendar/staff-list/julio_ibarra.yml b/preview-calendar/staff-list/julio_ibarra.yml
new file mode 100644
index 000000000..f1b716568
--- /dev/null
+++ b/preview-calendar/staff-list/julio_ibarra.yml
@@ -0,0 +1,6 @@
+name: "Julio Ibarra"
+image: "images/julio_ibarra.jpg"
+title: "Institutional PI"
+institution: "Florida International University"
+organizations:
+ - path
\ No newline at end of file
diff --git a/preview-calendar/staff-list/justin_hiemstra.yml b/preview-calendar/staff-list/justin_hiemstra.yml
new file mode 100644
index 000000000..4e3386d48
--- /dev/null
+++ b/preview-calendar/staff-list/justin_hiemstra.yml
@@ -0,0 +1,12 @@
+image: images/justin_hiemstra.jpg
+institution: Morgridge Institute For Research
+title: Research Software Engineer
+name: Justin Hiemstra
+status: Staff
+website: null
+pelican:
+ weight: 5
+organizations:
+ - chtc
+ - osg
+ - pelican
diff --git a/preview-calendar/staff-list/kent_cramer_iii.yml b/preview-calendar/staff-list/kent_cramer_iii.yml
new file mode 100644
index 000000000..ebd3e4c1b
--- /dev/null
+++ b/preview-calendar/staff-list/kent_cramer_iii.yml
@@ -0,0 +1,7 @@
+image: images/kent_cramer.jpeg
+institution: Morgridge Institute For Research
+title: Network Infrastructure Support Specialist
+name: Kent Cramer III
+status: Staff
+organizations:
+ - chtc
diff --git a/preview-calendar/staff-list/kristina_zhao.yml b/preview-calendar/staff-list/kristina_zhao.yml
new file mode 100644
index 000000000..c65e8951d
--- /dev/null
+++ b/preview-calendar/staff-list/kristina_zhao.yml
@@ -0,0 +1,25 @@
+name: Kristina Zhao
+title: Fellow
+institution: Morgridge Institute for Research
+status: Student
+organizations:
+ - chtc
+image: images/kristina_zhao.jpg
+
+
+fellowship:
+ name: Integrating PyTorch and Pelican
+ description: |
+ PyTorch is one of the most popular machine learning frameworks.
+ An important aspect of using it is the data engineering: how
+ is input data fed into the model during training? Going from
+ “tutorial scale” problems to cutting-edge research requires
+ drastically different techniques around data handling.
+
+ For this project, we aim to better integrate Pelican
+ into the PyTorch community, providing both technical
+ mechanisms (implementing the fsspec interface for Pelican)
+ and documentation by providing tutorials and recipes for
+ scaling PyTorch-based training using a combination of HTCondor
+ and Pelican.
+ mentor: Emma Turetsky and Ian Ross
diff --git a/preview-calendar/staff-list/lili_bicoy.yml b/preview-calendar/staff-list/lili_bicoy.yml
new file mode 100644
index 000000000..3afdf47a6
--- /dev/null
+++ b/preview-calendar/staff-list/lili_bicoy.yml
@@ -0,0 +1,12 @@
+image: images/lili_bicoy.jpg
+institution: Morgridge Institute For Research
+title: Student Science Writer
+name: Lili Bicoy
+status: Past
+website: null
+pelican:
+ weight: 17
+organizations:
+ - chtc
+ - osg
+ - pelican
\ No newline at end of file
diff --git a/preview-calendar/staff-list/marissa_zhang.yaml b/preview-calendar/staff-list/marissa_zhang.yaml
new file mode 100644
index 000000000..b2ae31bb6
--- /dev/null
+++ b/preview-calendar/staff-list/marissa_zhang.yaml
@@ -0,0 +1,8 @@
+image: images/default.jpg
+institution: University of Wisconsin-Madison
+title: System Administrator Intern
+name: Marissa (Yujia) Zhang
+status: Student
+website: null
+organizations:
+ - chtc
diff --git a/preview-calendar/staff-list/matevz_tadel.yml b/preview-calendar/staff-list/matevz_tadel.yml
new file mode 100644
index 000000000..5833a4188
--- /dev/null
+++ b/preview-calendar/staff-list/matevz_tadel.yml
@@ -0,0 +1,10 @@
+image: images/matevz_tadel.jpg
+institution: University of California San Diego
+title: Project Scientist
+name: Matevz Tadel
+status: Staff
+website: null
+pelican:
+ weight: 10
+organizations:
+ - pelican
\ No newline at end of file
diff --git a/preview-calendar/staff-list/mats_rynge.yml b/preview-calendar/staff-list/mats_rynge.yml
new file mode 100644
index 000000000..6d7e7b14d
--- /dev/null
+++ b/preview-calendar/staff-list/mats_rynge.yml
@@ -0,0 +1,13 @@
+name: "Mats Rynge"
+shortname: rynge
+date: 2018-11-19T10:47:58+10:00
+draft: false
+image: "images/mats_rynge.jpg"
+title: "Systems Integrator"
+institution: "University of Southern California - Information Sciences Institute"
+#website: ""
+linkedinurl: ""
+weight: 5
+organizations:
+ - path
+ - osg
\ No newline at end of file
diff --git a/preview-calendar/staff-list/matt_westphall.yml b/preview-calendar/staff-list/matt_westphall.yml
new file mode 100644
index 000000000..9905020a0
--- /dev/null
+++ b/preview-calendar/staff-list/matt_westphall.yml
@@ -0,0 +1,10 @@
+image: images/matt_westphall.jpeg
+institution: University of Wisconsin-Madison
+title: Research Cyberinfrastructure Specialist
+name: Matt Westphall
+status: Staff
+website: null
+organizations:
+ - path
+ - chtc
+ - osg
\ No newline at end of file
diff --git a/preview-calendar/staff-list/matyas_selmeci.yml b/preview-calendar/staff-list/matyas_selmeci.yml
new file mode 100644
index 000000000..0f1b5d21a
--- /dev/null
+++ b/preview-calendar/staff-list/matyas_selmeci.yml
@@ -0,0 +1,17 @@
+name: "Mátyás Selmeci"
+shortname: matyasselmeci
+date: 2020-09-18T15:46:09-05:00
+draft: false
+image: "images/matyas_selmeci.jpg"
+title: "Software Integration Developer"
+institution: "University of Wisconsin–Madison"
+status: Staff
+linkedinurl: ""
+weight: 5
+pelican:
+ weight: 15
+organizations:
+ - path
+ - chtc
+ - osg
+ - pelican
\ No newline at end of file
diff --git a/preview-calendar/staff-list/max_hartke.yml b/preview-calendar/staff-list/max_hartke.yml
new file mode 100644
index 000000000..7a7d25096
--- /dev/null
+++ b/preview-calendar/staff-list/max_hartke.yml
@@ -0,0 +1,9 @@
+image: images/max_hartke.jpg
+institution: University of Wisconsin-Madison
+title: Student Programming Intern
+name: Max Hartke
+status: Past
+website: null
+organizations:
+ - path
+ - chtc
\ No newline at end of file
diff --git a/preview-calendar/staff-list/michael_collins.yml b/preview-calendar/staff-list/michael_collins.yml
new file mode 100644
index 000000000..30368c476
--- /dev/null
+++ b/preview-calendar/staff-list/michael_collins.yml
@@ -0,0 +1,11 @@
+name: Michael Collins
+shortname: mcollins
+title: Systems Administrator
+active: green
+institution: Morgridge Institute for Research
+website:
+image: images/michael_collins.png
+status: Past
+organizations:
+ - chtc
+ - osg
\ No newline at end of file
diff --git a/preview-calendar/staff-list/mihir_manna.yml b/preview-calendar/staff-list/mihir_manna.yml
new file mode 100644
index 000000000..486c804a0
--- /dev/null
+++ b/preview-calendar/staff-list/mihir_manna.yml
@@ -0,0 +1,9 @@
+image: images/mihir_manna.jpeg
+institution: University of Wisconsin-Madison
+title: System Administrator Intern
+name: Mihir Manna
+status: Past
+website: null
+organizations:
+ - path
+ - chtc
\ No newline at end of file
diff --git a/preview-calendar/staff-list/miron_livny.yml b/preview-calendar/staff-list/miron_livny.yml
new file mode 100644
index 000000000..f22c183b5
--- /dev/null
+++ b/preview-calendar/staff-list/miron_livny.yml
@@ -0,0 +1,26 @@
+name: "Miron Livny"
+shortname: miron
+date: 2018-11-19T10:47:58+10:00
+draft: false
+image: "images/miron_livny.png"
+title: "PATh PI"
+website: "https://wid.wisc.edu/people/miron-livny/"
+institution: "University of Wisconsin–Madison"
+promoted: true
+weight: 1
+status: Leadership
+description: Livny is a Professor of Computer Science and the lead of the PATh project.
+chtc:
+ title: Director
+osg:
+ title: OSG Technical Director and PI
+ promoted: true
+ weight: 1
+pelican:
+ title: Co-Principal Investigator
+ weight: 2
+organizations:
+ - path
+ - chtc
+ - osg
+ - pelican
\ No newline at end of file
diff --git a/preview-calendar/staff-list/molly_mccarthy.yml b/preview-calendar/staff-list/molly_mccarthy.yml
new file mode 100644
index 000000000..fa99b1f01
--- /dev/null
+++ b/preview-calendar/staff-list/molly_mccarthy.yml
@@ -0,0 +1,12 @@
+name: "Molly McCarthy"
+image: "images/molly_mccarthy.jpg"
+title: "Student Web Developer"
+institution: "Morgridge Institute for Research"
+website: null
+weight: 3
+status: Past
+organizations:
+ - path
+ - chtc
+ - osg
+ - pelican
\ No newline at end of file
diff --git a/preview-calendar/staff-list/neha_talluri.yml b/preview-calendar/staff-list/neha_talluri.yml
new file mode 100644
index 000000000..8c69a14bd
--- /dev/null
+++ b/preview-calendar/staff-list/neha_talluri.yml
@@ -0,0 +1,20 @@
+name: Neha Talluri
+title: Fellow
+institution: Morgridge Institute for Research
+status: Student
+organizations:
+ - chtc
+image: images/neha_talluri.jpg
+
+fellowship:
+ name: Where in the world am I
+ description: |
+ In PATh, an important part of the infrastructure is the “glidein”, a client that
+ starts at a remote location and provides computational cycles for research.
+ In the past, glideins have relied on configuration at remote locations to
+ determine their location but this often results in missing or incorrect
+ information. This project will focus on enhancing glideins so that they
+ can detect and report where they are running in the world, possibly including
+ data like geolocation and institutional owner. After a successful summer,
+ the student fellow will gain skills in Python, bash, and layer 3 networking.
+ mentor: Jason Patton
diff --git a/preview-calendar/staff-list/pascal_paschos.yml b/preview-calendar/staff-list/pascal_paschos.yml
new file mode 100644
index 000000000..d2b72c090
--- /dev/null
+++ b/preview-calendar/staff-list/pascal_paschos.yml
@@ -0,0 +1,10 @@
+name: "Pascal Paschos"
+date: 2020-09-28T05:00:01-05:00
+draft: false
+image: "images/pascal_paschos.png"
+title: "Senior Computational Scientist"
+institution: "University of Chicago"
+#website: ""
+weight: 5
+organizations:
+ - path
\ No newline at end of file
diff --git a/preview-calendar/staff-list/patrick_brophy.yml b/preview-calendar/staff-list/patrick_brophy.yml
new file mode 100644
index 000000000..633b649d1
--- /dev/null
+++ b/preview-calendar/staff-list/patrick_brophy.yml
@@ -0,0 +1,24 @@
+name: Patrick Brophy
+title: Fellow
+institution: Morgridge Institute for Research
+status: Student
+organizations:
+ - chtc
+image: images/patrick_brophy.jpg
+
+fellowship:
+ name: Expanded Pelican Origin Monitoring
+ description: |
+ The Pelican origin service is responsible for exporting objects in the backend
+ storage to the data federation. As it is the “entry point” for the data, understanding
+ the load on the origin and its activities is key to keeping the federation healthy.
+ Pelican takes monitoring data from the web server component and feeds it into the popular
+ Prometheus software to store time series about the activity. This project would focus on:
+ - Implementing new monitoring probes to complement the existing information.
+ - Forwarding the raw, unsummarized data to an Elasticsearch database for further analysis.
+ - Designing visualizations to provide administrators with an overview of the origin’s activities.
+ - Implementing alerts when there are health issues with the origin.
+
+ After a successful summer, the student fellow will gain skills in using the Go
+ language, the Prometheus monitoring system (and other Cloud Native technologies), and web design.
+ mentor: Haoming Meng
diff --git a/preview-calendar/staff-list/pratham_patel.yml b/preview-calendar/staff-list/pratham_patel.yml
new file mode 100644
index 000000000..ec78da9ca
--- /dev/null
+++ b/preview-calendar/staff-list/pratham_patel.yml
@@ -0,0 +1,19 @@
+name: Pratham Patel
+title: Fellow
+institution: Morgridge Institute for Research
+status: Student
+organizations:
+ - chtc
+image: images/pratham_patel.jpg
+
+fellowship:
+ name: Enhancing container image build system
+ description: |
+ Container images are a widely used technology to package and distribute
+ software and services for use in systems such as Docker or Kubernetes.
+ The PATh project builds hundreds of these images on a weekly basis but
+ the build system needs improvement to support more images and additional
+ use cases. This project will focus on taking the existing system and
+ adding configurable, per-image build options. After a successful summer,
+ the student fellow will gain skills in Docker containers, GitHub Actions, and Bash.
+ mentor: Brian Lin
diff --git a/preview-calendar/staff-list/rachel_lombardi.yml b/preview-calendar/staff-list/rachel_lombardi.yml
new file mode 100644
index 000000000..7a85404c6
--- /dev/null
+++ b/preview-calendar/staff-list/rachel_lombardi.yml
@@ -0,0 +1,15 @@
+name: "Rachel Lombardi"
+date: 2021-11-23T19:31:00-05:00
+draft: false
+image: "images/rachel_lombardi.jpg"
+title: "Research Computing Facilitator"
+institution: "University of Wisconsin–Madison"
+status: Staff
+is_facilitator: 1
+#website: ""
+linkedinurl: ""
+weight: 5
+organizations:
+ - path
+ - chtc
+ - osg
\ No newline at end of file
diff --git a/preview-calendar/staff-list/rich_wellner.yml b/preview-calendar/staff-list/rich_wellner.yml
new file mode 100644
index 000000000..fac85a9ae
--- /dev/null
+++ b/preview-calendar/staff-list/rich_wellner.yml
@@ -0,0 +1,10 @@
+image: images/rich_wellner.jpg
+institution: San Diego Supercomputer Center
+title: SDx Director
+name: Rich Wellner
+status: Staff
+website: null
+pelican:
+ weight: 11
+organizations:
+ - pelican
\ No newline at end of file
diff --git a/preview-calendar/staff-list/rishideep_rallabandi.yml b/preview-calendar/staff-list/rishideep_rallabandi.yml
new file mode 100644
index 000000000..6393519bd
--- /dev/null
+++ b/preview-calendar/staff-list/rishideep_rallabandi.yml
@@ -0,0 +1,9 @@
+image: images/rishideep_rallabandi.jpg
+institution: University of Wisconsin-Madison
+title: Student Programming Intern
+name: Rishideep Rallabandi
+status: Past
+website: null
+organizations:
+ - path
+ - chtc
\ No newline at end of file
diff --git a/preview-calendar/staff-list/rob_gardner.yml b/preview-calendar/staff-list/rob_gardner.yml
new file mode 100644
index 000000000..d3ce96ad7
--- /dev/null
+++ b/preview-calendar/staff-list/rob_gardner.yml
@@ -0,0 +1,13 @@
+name: Rob Gardner
+shortname: robrwg
+image: images/rob_gardner.jpg
+institution: University of Chicago
+title: Institutional PI
+website: https://efi.uchicago.edu/people/profile/rob-gardner/
+osg:
+ title: OSG Collaboration Support Lead and OSG Council Chair
+ promoted: true
+ weight: 4
+organizations:
+ - path
+ - osg
\ No newline at end of file
diff --git a/preview-calendar/staff-list/ryan_boone.yml b/preview-calendar/staff-list/ryan_boone.yml
new file mode 100644
index 000000000..f333efd40
--- /dev/null
+++ b/preview-calendar/staff-list/ryan_boone.yml
@@ -0,0 +1,20 @@
+name: Ryan Boone
+title: Fellow
+institution: Morgridge Institute for Research
+status: Student
+organizations:
+ - chtc
+image: images/ryan_boone.jpg
+
+fellowship:
+ name: Grid Exerciser
+ description: |
+ The OSPool is a very large, very dynamic, heterogeneous high throughput system composed of execute
+ points from dozens of campuses all over the United States. Sometimes, something will go wrong
+ at one of these many sites, or one network, or one storage point, and it is difficult to determine
+ where the problem is. This project proposes the design and construction of a “Grid Exerciser”,
+ which consists of intentionally sending sample jobs to targeted locations on the OSPool to verify
+ correct operation and sufficient performance. The project will also have a reporting and
+ visualization component so that the voluminous results can be understood by a human in a
+ concise manner.
+ mentor: Cole Bollig and Rachel Lombardi
diff --git a/preview-calendar/staff-list/ryan_jacobs.yml b/preview-calendar/staff-list/ryan_jacobs.yml
new file mode 100644
index 000000000..292289be2
--- /dev/null
+++ b/preview-calendar/staff-list/ryan_jacobs.yml
@@ -0,0 +1,10 @@
+image: images/ryan_jacob.jpg
+institution: University of Wisconsin-Madison
+title: System Administrator Intern
+name: Ryan Jacob
+status: Past
+website: null
+organizations:
+ - path
+ - chtc
+ - osg
\ No newline at end of file
diff --git a/preview-calendar/staff-list/shawn_mckee.yml b/preview-calendar/staff-list/shawn_mckee.yml
new file mode 100644
index 000000000..db7043a4c
--- /dev/null
+++ b/preview-calendar/staff-list/shawn_mckee.yml
@@ -0,0 +1,9 @@
+name: Shawn McKee
+shortname: smckee
+title: Network Area Coordinator
+active: green
+institution: University of Michigan-Ann Arbor
+website: https://lsa.umich.edu/physics/people/research-scientists/smckee.html
+image: images/shawn_mckee.jpg
+organizations:
+ - osg
\ No newline at end of file
diff --git a/preview-calendar/staff-list/shirley_obih.yml b/preview-calendar/staff-list/shirley_obih.yml
new file mode 100644
index 000000000..6d608aa35
--- /dev/null
+++ b/preview-calendar/staff-list/shirley_obih.yml
@@ -0,0 +1,8 @@
+image: images/shirley_obih.jpg
+institution: Morgridge Institute For Research
+title: Communications Specialist
+name: Shirley Obih
+status: Past
+website: null
+organizations:
+ - chtc
\ No newline at end of file
diff --git a/preview-calendar/staff-list/showmic_islam.yml b/preview-calendar/staff-list/showmic_islam.yml
new file mode 100644
index 000000000..d8c6a7c51
--- /dev/null
+++ b/preview-calendar/staff-list/showmic_islam.yml
@@ -0,0 +1,9 @@
+name: "Showmic Islam"
+image: "images/showmic_islam.jpg"
+title: "Research Facilitator"
+#website: ""
+institution: "University of Nebraska-Lincoln"
+weight: 5
+organizations:
+ - path
+ - osg
\ No newline at end of file
diff --git a/preview-calendar/staff-list/susan_sons.yml b/preview-calendar/staff-list/susan_sons.yml
new file mode 100644
index 000000000..a4a643c92
--- /dev/null
+++ b/preview-calendar/staff-list/susan_sons.yml
@@ -0,0 +1,9 @@
+name: Susan Sons
+shortname: HedgeMage
+title: Security Analyst
+active: green
+institution: Indiana University
+website: https://cacr.iu.edu/about/people/susan-sons.html
+image: images/susan_sons.jpg
+organizations:
+ - osg
\ No newline at end of file
diff --git a/preview-calendar/staff-list/tae_kidd.yml b/preview-calendar/staff-list/tae_kidd.yml
new file mode 100644
index 000000000..110281fba
--- /dev/null
+++ b/preview-calendar/staff-list/tae_kidd.yml
@@ -0,0 +1,13 @@
+image: images/tae_kidd.jpg
+institution: Morgridge Institute For Research
+title: Project Manager
+name: Tae Kidd
+status: Staff
+website: null
+pelican:
+ title: Project Manager
+ weight: 4
+organizations:
+ - path
+ - chtc
+ - pelican
\ No newline at end of file
diff --git a/preview-calendar/staff-list/theng_vang.yml b/preview-calendar/staff-list/theng_vang.yml
new file mode 100644
index 000000000..21a163165
--- /dev/null
+++ b/preview-calendar/staff-list/theng_vang.yml
@@ -0,0 +1,12 @@
+name: Theng Vang
+shortname: theng
+title: System Administrator
+active: green
+institution: University of Wisconsin-Madison
+website:
+image: images/theng_vang.jpg
+status: Staff
+organizations:
+ - path
+ - chtc
+ - osg
\ No newline at end of file
diff --git a/preview-calendar/staff-list/thinh_nguyen.yml b/preview-calendar/staff-list/thinh_nguyen.yml
new file mode 100644
index 000000000..55fb6d830
--- /dev/null
+++ b/preview-calendar/staff-list/thinh_nguyen.yml
@@ -0,0 +1,20 @@
+name: Thinh Nguyen
+title: Fellow
+institution: Morgridge Institute for Research
+status: Student
+organizations:
+ - chtc
+image: images/thinh_nguyen.jpg
+
+fellowship:
+ name: ML for failure classification in the OSPool
+ description: |
+ The OSPool runs hundreds of thousands of jobs every day on dozens of
+ different sites, each unique in their own way. Naturally, there are
+ many hundreds of failures, most of which the system works around, but
+ with added latency to workflow completion. This project would attempt
+ to automatically classify failures from job logs to detect common
+ patterns and highlight places for humans to look to fix common failures
+ with the most payoff. Students working on this project will gain
+ experience applying ML techniques to real world problems.
+ mentor: Justin Hiemstra
diff --git a/preview-calendar/staff-list/tim_cartwright.yml b/preview-calendar/staff-list/tim_cartwright.yml
new file mode 100644
index 000000000..d07f09501
--- /dev/null
+++ b/preview-calendar/staff-list/tim_cartwright.yml
@@ -0,0 +1,20 @@
+name: "Tim Cartwright"
+shortname: osg-cat
+date: 2020-09-21T05:00:01-05:00
+draft: false
+image: "images/tim_cartwright.jpg"
+title: "Research Services Manager"
+institution: "University of Wisconsin–Madison"
+website: http://pages.cs.wisc.edu/~cat/
+status: Staff
+weight: 5
+chtc:
+ title: OSG Deputy Director/XO
+osg:
+ title: CC* Coordinator
+ promoted: true
+ weight: 5
+organizations:
+ - path
+ - chtc
+ - osg
\ No newline at end of file
diff --git a/preview-calendar/staff-list/tim_theisen.yml b/preview-calendar/staff-list/tim_theisen.yml
new file mode 100644
index 000000000..702b0fc68
--- /dev/null
+++ b/preview-calendar/staff-list/tim_theisen.yml
@@ -0,0 +1,16 @@
+name: "Tim Theisen"
+date: 2018-11-19T10:47:58+10:00
+draft: false
+image: "images/tim_theisen.png"
+title: "Senior Systems Software Developer"
+status: Staff
+institution: "University of Wisconsin-Madison"
+weight: 5
+chtc:
+ title: Release Manager
+osg:
+ title: Release Manager
+organizations:
+ - path
+ - chtc
+ - osg
\ No newline at end of file
diff --git a/preview-calendar/staff-list/todd_miller.yml b/preview-calendar/staff-list/todd_miller.yml
new file mode 100644
index 000000000..579c578c3
--- /dev/null
+++ b/preview-calendar/staff-list/todd_miller.yml
@@ -0,0 +1,13 @@
+name: "Todd L Miller"
+date: 2018-11-19T10:47:58+10:00
+draft: false
+image: "images/todd_miller.png"
+title: "Senior Systems Software Developer"
+status: Staff
+institution: "University of Wisconsin-Madison"
+weight: 5
+chtc:
+ title: HTCondor Core Developer
+organizations:
+ - path
+ - chtc
\ No newline at end of file
diff --git a/preview-calendar/staff-list/todd_tannenbaum.yml b/preview-calendar/staff-list/todd_tannenbaum.yml
new file mode 100644
index 000000000..40fd96c48
--- /dev/null
+++ b/preview-calendar/staff-list/todd_tannenbaum.yml
@@ -0,0 +1,16 @@
+name: "Todd Tannenbaum"
+date: 2018-11-19T10:47:58+10:00
+draft: false
+image: "images/todd_tannenbaum.jpg"
+title: "Software Development co-lead"
+#website: ""
+institution: "University of Wisconsin–Madison"
+promoted: true
+weight: 3
+status: Leadership
+description: Tannenbaum is a Researcher and HTCondor Technical Lead at UW-Madison, and co-lead of PATh Software Development.
+chtc:
+ title: HTCondor Software Lead
+organizations:
+ - path
+ - chtc
\ No newline at end of file
diff --git a/preview-calendar/staff-list/wil_cram.yml b/preview-calendar/staff-list/wil_cram.yml
new file mode 100644
index 000000000..dc5278479
--- /dev/null
+++ b/preview-calendar/staff-list/wil_cram.yml
@@ -0,0 +1,19 @@
+name: Wil Cram
+title: Fellow
+institution: Morgridge Institute for Research
+status: Student
+organizations:
+ - chtc
+image: images/wil_cram.jpg
+
+fellowship:
+ name: Schedd performance analysis for humans
+ description: |
+ The condor_schedd is a single-threaded program, and when it is overloaded,
+ it is difficult for administrators to understand why. There are some
+ statistics about what it is doing, but there is no clear way to present
+ this information in a useful way to an administrator. Students working
+ on this project would build visualizations of complex data, and work
+ with end users and facilitators to tune output for real world human
+ consumption.
+ mentor: Greg Thain
diff --git a/preview-calendar/staff-list/william_swanson.yml b/preview-calendar/staff-list/william_swanson.yml
new file mode 100644
index 000000000..2177eeb11
--- /dev/null
+++ b/preview-calendar/staff-list/william_swanson.yml
@@ -0,0 +1,11 @@
+name: William Swanson
+image: images/william_swanson.jpg
+title: Research Cyberinfrastructure Specialist
+institution: "University of Wisconsin\u2013Madison"
+status: Staff
+pelican:
+ weight: 16
+organizations:
+ - chtc
+ - osg
+ - pelican
\ No newline at end of file
diff --git a/preview-calendar/staff-list/yuxiao_qu.yml b/preview-calendar/staff-list/yuxiao_qu.yml
new file mode 100644
index 000000000..f70f73fd5
--- /dev/null
+++ b/preview-calendar/staff-list/yuxiao_qu.yml
@@ -0,0 +1,8 @@
+image: images/yuxiao.jpg
+institution: Morgridge Institute For Research
+title: Research Software Engineer
+name: Yuxiao Qu
+status: Past
+website: null
+organizations:
+ - chtc
diff --git a/preview-calendar/staff/.htaccess b/preview-calendar/staff/.htaccess
new file mode 100644
index 000000000..49643e4f5
--- /dev/null
+++ b/preview-calendar/staff/.htaccess
@@ -0,0 +1,13 @@
+AuthUserFile /p/condor/public/developers/dev-webpage-passwd
+AuthName "Condor Developers"
+AuthType Basic
+
+require valid-user
+
+
+AddHandler cgi-script .pl
+DefaultType text/html
+
+#PerlHandler HTML::Mason::ApacheHandler
+#PerlSetVar MasonCompRoot /s/www/html/condor/developers
+#PerlSetVar MasonDataDir /p/condor/public/mason
diff --git a/preview-calendar/staff/docs/Adding_News_Articles.html b/preview-calendar/staff/docs/Adding_News_Articles.html
new file mode 100644
index 000000000..ef3557df6
--- /dev/null
+++ b/preview-calendar/staff/docs/Adding_News_Articles.html
@@ -0,0 +1,558 @@
+
+
+
+
+
+
+Adding News Articles
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
You will be using Markdown to write all news articles. Markdown is a popular markup
+language that is converted to HTML before being displayed on the website.
+
+
A good cheatsheet can be found here which contains
+the markdown syntax and examples of how it looks when converted to html.
+
+
Adding Article To the Website
+
+
After you have written your article in the text editor of your choice and are ready to have it on the website you will first need to create a preview branch.
+
+
All of our websites have a preview location where you can view changes before adding them to the main website. I will use PATh for the example below, but this is the same process for CHTC, HTCondor and OSG as well.
Go to the GitHub repo and create a preview branch
+
+
Branch name must start with ‘preview-‘ followed by a descriptive term.
+
+
Example: You write an article about HTC and Genes, you name the branch ‘preview-gene-article’.
+
+
+
+
+
+
Add your News article
+
+
Check that you are in your new branch
+
+
The previous step will put you in your new preview branch. You can check by looking at the branch name displayed.
+
+
+
Go into the news article directory
+
Add new file with title ‘YYYY-MM-DD-title.md’
+
+
Example: For the HTC and Genes article -> ‘2021-12-25-htc-and-genes.md’ if you are going to publish the article on Christmas 2021.
+
+
+
Copy and Paste in the template
+
---
+ title: # Article Title
+ date: 9999-12-31 # Article Date - In format YYYY-MM-DD - Article will not show on website until Current Date >= Article Date
+ excerpt: # Article Excerpt - An abstract of the article
+ image_src: # Path to the image to be displayed in article cards
+ image_alt: # A description of this image
+ author: # Article Author
+ published: false # If this article should be on the website, change to true when ready to publish
+ ---
+
+ Content
+
+
+
Fill in all the front matter and replace ‘Content’ with your article.
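
The naming rule and the two publishing switches in the template can be sketched in a few lines of Python. This is only an illustration: the helper names `article_filename` and `is_visible` are hypothetical, and the assumption that `published` and `date` must both be satisfied is inferred from the template comments, not taken from the site's build code.

```python
import re
from datetime import date


def article_filename(publish_date: date, title: str) -> str:
    """Build a 'YYYY-MM-DD-title.md' file name from a date and a title."""
    # Lowercase the title and collapse anything that is not a letter or digit into '-'.
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return f"{publish_date.isoformat()}-{slug}.md"


def is_visible(article_date: date, published: bool, today: date) -> bool:
    """Assumed rule: shown only when published is true and the article date has arrived."""
    return published and article_date <= today


print(article_filename(date(2021, 12, 25), "HTC and Genes"))
print(is_visible(date(9999, 12, 31), True, date(2021, 12, 25)))
```

With the placeholder date of 9999-12-31 left in the template, the second check stays false, which is why the date has to be changed before an article can appear.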
+
+
+
+
Review your Preview
+
+
Look for your article preview at
+
Example for PATh: https://path-cc.io/web-preview/preview-helloworld
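
As a rough sketch, the branch naming rule and the preview location can be expressed together. The URL pattern below is inferred only from the PATh example above, and the function name `preview_url` is hypothetical; other sites may use a different base address.

```python
def preview_url(branch: str, base: str = "https://path-cc.io/web-preview") -> str:
    """Return the assumed preview location for a branch named 'preview-<term>'."""
    prefix = "preview-"
    # The branch must start with 'preview-' and have a descriptive term after it.
    if not branch.startswith(prefix) or len(branch) == len(prefix):
        raise ValueError("branch name must start with 'preview-' followed by a descriptive term")
    return f"{base}/{branch}"


print(preview_url("preview-gene-article"))
```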
+
High-definition images take up space and slow down the website when it loads, so it is important to reduce their footprint before adding them to the website.
+
+
To keep the image size reasonable follow the rules below.
+
+
+
<= 1000 pixels wide
+
+
This is the maximum website width, so any images wider will have unused resolution.
+
+
+
Convert to jpg and reduce to reasonable size
+
+
This is up to you and changes picture to picture. Some pictures look fine when compressed, some don’t.
+
A reasonable target is ~200 KB
+
+
+
+
+
Example
+
+
+
+
We will reduce this 2MB image to demonstrate.
+
+
+
+
Open the image in Photoshop. If you don’t have Photoshop, contact IT, as you have free access.
+
+
+
Go to Export as…
+
+
+
+
Update the Values to reduce size.
+
+
+
+
Use your new compressed image in the article.
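
The two sizing rules above (at most 1000 pixels wide, roughly 200 KB) can be checked before an image is exported. The sketch below only does the arithmetic; it does not read or write image files, and the function names are hypothetical.

```python
def scaled_size(width: int, height: int, max_width: int = 1000) -> tuple[int, int]:
    """Return the dimensions after shrinking to max_width, keeping the aspect ratio."""
    if width <= max_width:
        return width, height  # already narrow enough, leave untouched
    scale = max_width / width
    return max_width, max(1, round(height * scale))


def needs_compression(size_bytes: int, target_kb: int = 200) -> bool:
    """True when the file is larger than the ~200 KB target."""
    return size_bytes > target_kb * 1024


print(scaled_size(4000, 3000))
print(needs_compression(2 * 1024 * 1024))
```

For a 4000x3000 photo this gives 1000x750, and a 2 MB file is flagged as still needing compression.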
+
+
+
+
+
Positioning Images
+
+
To position images on the page you must use classes.
+For markdown this means including {: .<class> } above the image syntax, and
+for html this means adding class="<class>" inside the html tag.
+
+
Potential classes that can be used to position the image come from Bootstrap Utilities.
Don’t forget to also create your new schedule file which holds your schedule meta-data.
+
+
Creating The Schedule File
+
+
This section describes how to create your YAML file. You can find a verbose template file on AFS or use the one below.
+
+
<filename>.yml
+
Name: "name"
+ShortName: "shortname"
+corehours: "corehours"
+DailyEmail: "Yes" # ( or omit line entirely )
+Email: "email"
+Office: "location"
+Phone: "phone" # Office and/or Cell - This is read as a string so format how you want
+calendarurl: "calendarurl"
+default:
+ starttime: "starttime"
+ endtime: "endtime"
+ status: "status"
+
+
File Details
+
+
+
Name: First Last
+
ShortName: (Optional, defaults to First Name if not specified) Should be unique and obviously you. If your name is “George Washington” and George Foreman also works in your group, “GeorgeW” would be a good choice.
+
corehours: A description of your corehours that is displayed. Format is not important. Example is “9:00 AM to 5:00 PM”
+
DailyEmail: If ‘Yes’ then you will receive a daily email with who is out, otherwise should be omitted entirely.
+
Email: Your preferred email address. Defaults to filename@cs.wisc.edu so you will likely want to change this
+
Office: Your office location. Example => “4261 CS”
+
Phone: Your phone number(s). Example => “+1 608 265 5736 (office) +1 608 576 0351 (cell)”
+
DailyEmail: Do you want a daily email with information about who is gone?
+
calendarurl: The url to your outage calendar. Details on obtaining this found below.
+
starttime: Your typical start time, use military format. Example => “09:00”
+
endtime: Your typical end time, use military format. Example => “17:00”
+
status: Your status during these hours. If you are unsure use “Office”.
+
default[Monday, Tuesday, Wednesday, Thursday, Friday]: This overrides the default for that day. Use the same format as default.
+
+
+
Important
+
+
All of these data strings have to be encased in double quotations to be valid yaml. This encasement can be seen in the template file.
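
One way to avoid quoting mistakes is to generate the file rather than type it. The sketch below is an assumption-laden illustration: the helper names are hypothetical, the sample values are made up (apart from the examples quoted above), and `json.dumps` is used only because it produces a double-quoted, escaped string.

```python
import json


def quoted(value: str) -> str:
    """Wrap a value in double quotes, escaping embedded quotes and backslashes."""
    return json.dumps(value)


def schedule_yaml(fields: dict, default: dict) -> str:
    """Render the top-level keys plus the nested 'default' block as YAML text."""
    lines = [f"{key}: {quoted(value)}" for key, value in fields.items()]
    lines.append("default:")
    lines.extend(f"  {key}: {quoted(value)}" for key, value in default.items())
    return "\n".join(lines) + "\n"


example = schedule_yaml(
    {
        "Name": "George Washington",
        "ShortName": "GeorgeW",
        "corehours": "9:00 AM to 5:00 PM",
        "Email": "georgew@example.org",  # placeholder address
        "Office": "4261 CS",
    },
    {"starttime": "09:00", "endtime": "17:00", "status": "Office"},
)
print(example)
```

Every value comes out wrapped in double quotes, matching the template.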
+
+
Creating Your ICAL URL
+
+
To power your outage calendar you need to create a google calendar which is solely used to populate your outages.
+
+
+
Go to https://calendar.google.com/ and sign in with your preferred account. You can use @morgridge.org and @wisc.edu.
+
Create a new calendar
+
+
Name and Description do not matter
+
+
+
+
Go into Calendar settings and retrieve the Secret Address
+
+
Go to calendar Settings
+
+
Get the secret calendar url ( Will warn not to give this out )
+
+
+
+
Post this address into your yaml file as the calendarurl
+
+
+
Populating Your Days Off
+
+
Event Title
+
+
The event title should be one of the statuses bolded below. These statuses are
+used to key the type of outage, so nothing but an approved status should be in the event title.
+
+
+
Travel: Working, but not at the office. Perhaps a conference
+
Vacation: Taking vacation (“vacation” and “personal holiday” on the leave report)
+
Sick: Taking sick leave (“sick leave” on the leave report)
+
Holiday: Taking floating holiday (“legal holiday” on the leave report)
+
Furlough: State- or UW-mandated furlough (as required). Includes both fixed (“mandatory”) and floating time.
+
Off: Days not worked under part-time employment
+
WFH: Work From Home
+
+
+
Event Description
+
+
Any description of the outage you would like to add can be added in the event
+description.
+
+
Marking Event Time
+
+
Marking Full day/days Out
+
+
To mark full day outages you create an event with the “All day” attribute ticked ( This is used in the demo above ). Populate the title and description as expected.
+
+
Do not use the recurring event feature for multiple outage days.
+
+
Marking Partial Outages
+
+
To mark partial time you must do two different things.
+
+
+
Append the number of hours the outage takes to the title, separated by a colon.
+
+
For example, if you have a four-hour doctor’s appointment, you would mark SICK:4
+
For example, if you leave for vacation half a day early, you would mark VACATION:4
+
+
+
Mark the time you are *in* the office on Google
+
+
This is non-intuitive but when you are marking time you mark the time you are in.
+
For Example, if I am normally in 9-5 and am leaving 4 hours early I will mark my event to go from 9:00 AM to 1:00 PM.
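
The two rules for partial outages can be sketched as follows. This is not the calendar's actual parser: the function names are hypothetical, and matching statuses case-insensitively is an assumption based on the examples using SICK:4 while the status list uses Sick.

```python
from datetime import datetime, timedelta
from typing import Optional

STATUSES = ("Travel", "Vacation", "Sick", "Holiday", "Furlough", "Off", "WFH")


def parse_outage_title(title: str) -> tuple[str, Optional[float]]:
    """Split 'STATUS' or 'STATUS:hours' into the status and the hours out (None for a full day)."""
    name, sep, hours = title.strip().partition(":")
    lookup = {status.lower(): status for status in STATUSES}
    status = lookup.get(name.strip().lower())
    if status is None:
        raise ValueError(f"not an approved status: {name!r}")
    return status, (float(hours) if sep else None)


def in_office_event(start: str, end: str, hours_out: float) -> tuple[str, str]:
    """When leaving early, the event to create covers the time you are IN the office."""
    fmt = "%H:%M"
    leave = datetime.strptime(end, fmt) - timedelta(hours=hours_out)
    return start, leave.strftime(fmt)


print(parse_outage_title("SICK:4"))
print(in_office_event("09:00", "17:00", 4))
```

For a normal 09:00 to 17:00 day with four hours out at the end, the event to create runs from 09:00 to 13:00.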
+
+
+
+
+
Example
+
+
If you mark your title in Google as “Sick” and the description as “Wisdom Tooth Surgery and Recovery” the schedule output will be as follows.
+ Tackling Strongly Correlated Quantum Systems on OSPool
+
+
Duke University Associate Professor of Physics Shailesh Chandrasekharan and his graduate student Venkitesh Ayyar are
+using the OSPool to tackle notoriously difficult problems in quantum systems.
+
+
+
+
+
+
+
+
+
+
+
These quantum systems are the physical systems of our universe, being investigated at the fundamental level where
+elemental units carrying energy behave according to the laws of quantum mechanics.
+In many cases, these units might be referred to as particles or more generally as “quantum degrees of freedom.”
+The most exciting physics arises when these units are strongly correlated: the behavior of each one depends on the
+system as a whole; they cannot be taken and studied independently.
+Such systems arise naturally in many areas of fundamental physics, ranging from condensed matter (many materials
+fabricated in laboratories contain electrons that are strongly correlated and show exotic properties) to nuclear and
+particle physics.
+
+
The proton, one of the particles inside an atom’s nucleus, is itself a strongly correlated bound state involving many
+quarks and gluons.
+Understanding its properties is an important research area in nuclear physics.
+The origin of mass and energy in the universe could be the result of strong correlations between fundamental quantum
+degrees of freedom.
+
+
“Often we can write down the microscopic theory that describes a physical system.
+For example, we believe we know how quarks and gluons interact with each other to produce a proton.
+But then to go from there to calculate, for instance, the spin of the proton or its structure is non-trivial,” said
+Chandrasekharan.
+“Similarly, in a given material we have a good grasp of how electrons hop from one atom to another.
+However, from that theory to compute the conductivity of a strongly correlated material is very difficult.
+The final answer—that helps us understand things better—requires a lot of computation.
+Typically the computational cost grows exponentially with the number of interacting quantum degrees of freedom.”
+
+
According to Chandrasekharan, the main challenge is to take this exponentially hard problem and convert it to something
+that scales as a polynomial and can be computed on a classical computer.
+“This step is often impossible for many strongly correlated quantum systems, due to the so-called
+sign problem which arises due to quantum mechanics,” added
+Chandrasekharan.
+“Once the difficult sign problem is solved, we can use Monte Carlo calculations to obtain answers.
+Computing clusters like the OSG can be used at that stage.”
+
+
Chandrasekharan has proposed an idea, called the fermion bag approach,
+that has solved numerous sign problems that seemed unsolvable in systems containing fermions (electrons and quarks are
+examples of fermions).
+In order to understand a new mechanism for the origin of mass in the universe, Ayyar is specifically using the OSG to
+study an interacting theory of fermions using the fermion bag approach.
+
+
+
+
“We compute correlation functions on lattices and look at their behavior as the lattice size increases,” Ayyar explained.
+In the presence of a mass, the correlation functions decay exponentially.
+“Ideally, we would want to perform computations on very large lattices (>100x100x100).
+Each calculation involves computing the inverse of large matrices millions of times.
+The matrix size scales with the lattice size and so the time taken increases very quickly (from days to weeks to months).
+This is what limits the size of the lattice used in our computation and the precision of the quantities calculated.”
+In a recent publication, Ayyar and Chandrasekharan performed computations on lattices of sizes up to 28x28x28, and
+more recently they have been able to push these to lattices of size 40x40x40.
+
+
Since their computation is parallelizable, they can run several calculations at the same time.
+Ayyar says this makes the OSG perfect for their work.
+“Instead of running a job for 100 days sequentially,” he noted, “we can run 100 jobs simultaneously for one day to get
+the same information.
+This not only helps us speed up our calculation several times, but we also get very high precision.”
+
+
Ayyar uses simple scripts to submit a large number of jobs and monitor their progress.
+One challenge he faced was the check-pointing of jobs.
+“Some of our jobs run long, say two to six days, and we found these getting terminated before completion due to the
+queuing system,” Ayyar said.
+To solve this, he developed what he calls ‘manual check-pointing’ to execute jobs in pieces.
+“This extends the completed processes and submits them so that long-running processes can be completed.
+Being able to control the memory and disk-space requirements on the target nodes has proved to be extremely useful.”
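
The article does not describe how Ayyar's manual check-pointing is implemented. As a generic illustration of the idea of running a long job in pieces, the sketch below saves its progress to a file after each piece and resumes from that file on the next run. The function name, the state file layout, and the placeholder work inside the loop are all assumptions.

```python
import json
import os


def run_piece(state_path: str, total_sweeps: int, sweeps_per_piece: int) -> bool:
    """Run one piece of a long job; return True once all sweeps are finished."""
    state = {"done": 0, "accum": 0.0}
    if os.path.exists(state_path):
        with open(state_path) as f:
            state = json.load(f)  # resume from the previous piece

    stop = min(total_sweeps, state["done"] + sweeps_per_piece)
    for sweep in range(state["done"], stop):
        state["accum"] += 1.0 / (sweep + 1)  # placeholder for the real computation
    state["done"] = stop

    # Write to a temporary file first so an interrupted write cannot corrupt the checkpoint.
    tmp_path = state_path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump(state, f)
    os.replace(tmp_path, state_path)
    return state["done"] >= total_sweeps
```

Each piece would be submitted as its own short job. If one piece is terminated by the queuing system, only that piece has to be rerun rather than the whole calculation.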
+
+
Ayyar also noted that many individual research groups cannot afford the luxury of having thousands of computing nodes.
+“This kind of resource sharing on the OSG has helped computational scientists like us attempt calculations that could
+not be done before,” he added.
+“For example, we are now attempting computations on lattices of size 60x60x60.
+One sweep should only take a few hours on each core.”
+
+
Chandrasekharan points out that past technology breakthroughs like the kind that revolutionized processor chip
+manufacturing have largely been based on basic quantum mechanics learned in the 1940s and 1950s.
+“We still need to understand the complexity that can be produced when many quantum degrees of freedom interact with each
+other strongly,” said Chandrasekharan.
+“The physics we learn could be quite rich.”
+
+
He says this next phase of research is already happening in nanoelectronics.
+“If the computational quantum many-body challenge that we face today is solved, it may help revolutionize the next
+generation of technology and give us a better understanding of the physics.”
+
+
Ayyar and Chandrasekharan recently submitted a paper based on their work using the OSG.
+Titled Massive fermions without fermion bilinear condensates,
+it has been published in the journal Physical Review D of the American Physical Society.
The CHTC offers a suite of open-source software tools that manage HTC
+workloads and enable organizations to form distributed HTC pools. The
+HTCondor Software Suite (HTCSS)
+is the product of over three decades of
+research and development at the Computer Sciences Department of the
+University of Wisconsin-Madison. It has been adopted by academic and
+commercial entities around the world in support of their HTC workloads.
+ The Pelican Project: Building a universal plug for scientific data-sharing
+
+
From its founding, the Morgridge Institute for Research has driven the idea that open sharing of research computing resources will be a great enabler of scientific discovery, powering everything from black hole astronomy to stem cell biology.
+
+
Increasingly, the principle of sharing is being applied not only to computing resources, but to the wealth of data those projects are producing. Resources such as high-throughput computing and the OSG Consortium have been incorporating more tools for scientists to share their raw data for further exploration.
+
+
This principle is now getting traction on a national policy scale. The White House Office of Science and Technology Policy (OSTP) established new requirements in 2022 that any research supported by federal funds must be made available to the public without embargoes or paywalls.
+
+
This mandate applies not only to published findings, but to the core data those findings are based upon. Within the scientific community, the approach is referred to as the “FAIR” principles, which means that scientific data should be “findable, accessible, interoperable and reusable.”
+
+
Obviously, applying this new standard to data is as much a technical challenge as it is a cultural one. A new project at the Morgridge, led by research computing investigators Brian Bockelman and Miron Livny, is working toward creating a software platform that can facilitate the sharing of diverse research datasets.
+
+
Nicknamed “Pelican,” the project is supported through a $7 million grant from the National Science Foundation (NSF). The award (OAC-2331489) will strive to make data produced by researchers, from single-investigator labs to international collaborations, more accessible for computing and remote clients for viewing. Pelican supports and extends the work Bockelman and Livny have been doing as part of the OSG Consortium for over a decade.
+
+
Bockelman says that public research data-sharing has been a growing movement the past decade, but the COVID-19 pandemic served as a potent catalyst. The pandemic made the benefits of sharing abundantly clear, including the development of a vaccine at an unprecedented pace — 6 months compared to a typical multi-year process.
+
+
“Our philosophy is that not only should your research paper be public and readable, but your data should be as well,” Bockelman says. “If scientists just say, ‘here are the results in a pretty graph,’ and don’t share the underlying dataset, we lose a lot of value when others can’t access the data, can’t interpret it, or use it for their own research.”
+
+
Bockelman says there are some other core benefits that may come from the open science push. By making data more readily accessible, it should improve the reproducibility of experiments and potentially reduce scientific fraud. It can also narrow the gap between the “haves” and “have-nots” in the research world by providing data access regardless of institutional resources.
+
+
Bockelman likens the Pelican project to developing a “universal adapter plug” that can accommodate all different types of data. Just like homes have standard outlets that work for all different household appliances, that same approach should help individual scientists plug into a sharable data platform regardless of the nature of their data.
+
+
One of the first proving grounds for Pelican will be its participation within the National Discovery Cloud for Climate, an effort to bring together compute, data, and network resources to democratize access and advance climate-related science and engineering. Bockelman says the Pelican project will help optimize this data sharing effort with the climate science community and provide a proof of concept for other research areas.
+
+
But ultimately, the best benefit may be enhancing public trust in high-impact science.
+
+
“Even for people who may not go digging into the data, they want to know that science has been done responsibly, especially for fields where it directly affects their lives,” Bockelman says. “Climate is a great example of where the science can really drive regulations that affect people. Getting data out as open and following the FAIR principles … is part of that relationship between the scientific community and the society at large.”
+
+
Bockelman says making data accessible is more than just downloading from a webserver. Pelican works to establish approaches that help people utilize the data effectively from anywhere in the nation’s computing infrastructure — essential so anyone from a tribal college to the largest university can understand and interpret the climate data.
+
+
The original memo was written in 2022 by then OSTP Director Alondra Nelson, and today the “Nelson memo” is viewed as a watershed document in federal research policy.
+
+
“When research is widely available to other researchers and the public, it can save lives, provide policy makers with the tools to make critical decisions, and drive more equitable outcomes across every sector of society,” Nelson wrote. “The American people fund tens of billions of dollars of cutting-edge research annually. There should be no delay or barrier between the American public and the returns on their investments in research.”
+ Tribal College and CHTC pursue opportunities to expand computing education and infrastructure
+
+
Salish Kootenai College and CHTC take steps toward bringing underrepresented communities to cyberinfrastructure.
+
+
Access to cyberinfrastructure (CI) is the bedrock foundation essential for students and researchers determined to contribute to science.
+That’s why Lee Slater,
+the Cyberinfrastructure Facilitator at Salish Kootenai College (SKC), a tribal community college in northwest Montana, first brought
+up the “missing millions.” The term was coined after the National Science Foundation (NSF) reported
+that users and providers of the CI as a whole do not accurately represent society. Underrepresented racial and gender demographics were largely missing from
+the field. “[The missing millions] just don’t have access to high performance computing platforms and so they’re not contributing greatly to the scientific
+body of knowledge that other privileged students have access to,” Slater explained. “It’s a real serious deficit for these students. One of the goals we’re
+trying to get accomplished is to bring these educational and research platforms to students and faculty to really enrich the experience they have as students.”
+
+
SKC inhabits an indigenous reserve known as the Flathead Reservation, which includes territory in four western states. Established in 1855, the reservation
+is home to the Confederated Salish and Kootenai Tribes. SKC — with just over 600 students — makes up a
+small, but vital portion of the much larger reservation. The college consists largely of tribal descendants or members, making up almost 80 percent of the
+school population.
+
+
+
+
The Center for High Throughput Computing (CHTC) Director Miron Livny traveled
+to Montana this past October to meet with Salish Kootenai College faculty and staff. The four-day trip was coordinated by International Networking
+Coordinator Dale Smith from the University of Oregon, who also works for the American Indian Higher Education Consortium.
+The visit was meant for Livny to experience one of the nation’s tribal colleges and universities (TCUs) and to further the discourse between CHTC and SKC.
+“The main goal was for him to see our infrastructure, meet the faculty and see research opportunities,” Slater recalled.
+
+
SKC’s biggest and most immediate computing goal is to provide the access and training to utilize a web platform for JupyterHub that would be available
+for faculty and student use. The Jupyter Notebook connects with an OSPool Access Point, where students can place their workloads and data and which
+automates the execution of jobs and data movement across associated resources. Slater believes this would be beneficial, as many SKC faculty members do
+computing and data analysis within their specialties. “The fact that we could have a web platform with JupyterHub that students could access and faculty
+could access would really be a great facilitation,” Slater explained.
+
+
Slater would also like to collaborate with other TCUs, train faculty in computing software and overall increase their cyberinfrastructure capabilities.
+SKC Chief Information Officer (CIO) Al Anderson would
+like to leverage storage capacity for a faculty researcher who is examining the novel behavior of elk on the National Bison Range. This work requires taking a
+vast amount of photographs that then must be processed and stored. “We found that we have this storage issue — right now they’re using portable hard drives
+and it’s just a mess,” Anderson said.
+
+
Engagements like this are an early, but important step in bringing underserved communities to cyberinfrastructure and thus to science and research.
+The NSF “Missing Millions” report focused on the need for democratizing access to
+computing and showed a deficiency of engagement with institutions created for marginalized groups. Institutions like historically black colleges and universities (HBCUs)
+and TCUs tend to lack cyberinfrastructure capabilities that can be hard to implement without engagement from outside institutions.
+SKC’s engagement with CHTC is an example of steps both are taking in addressing this deficiency.
+
+
Longer-term goals for the college are largely education-focused. “We’re a small school, traditionally we have more educational needs than really heavy
+research needs,” Slater said. Anderson agreed, stating, “I think a lot of our focus is the educational side of computing and how to get people hooked into
+those things.”
+
+
Anderson and Slater are also focused on relationship-building with faculty and discovering what they need to educate their students.
+They believe hearing from the SKC community should be first and foremost. “We’re still in that formative stage of asking, what do we need to support?”
+Anderson explained, “Through these conversations we’re slowly discovering.”
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
diff --git a/preview-calendar/ucsd-external-release.html b/preview-calendar/ucsd-external-release.html
new file mode 100644
index 000000000..639ca0d84
--- /dev/null
+++ b/preview-calendar/ucsd-external-release.html
@@ -0,0 +1,364 @@
+
+
+
+
+
+
+PATh Extends Access to Diverse Set of High Throughput Computing Research Programs
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ PATh Extends Access to Diverse Set of High Throughput Computing Research Programs
+
+
Finding the right road to research results is easier when there is a clear PATh to follow. The Partnership to Advance Throughput Computing (PATh)—a partnership between the OSG Consortium and the University of Wisconsin-Madison’s Center for High Throughput Computing (CHTC) supported by the National Science Foundation (NSF)—has cleared the way for science and engineering researchers for years with its commitment to advancing distributed high throughput computing (dHTC) technologies and methods.
+
+
HTC involves running a large number of independent computational tasks over long periods of time, from hours and days to weeks or months. dHTC tools leverage automation and build on distributed computing principles to save researchers with large ensembles incredible amounts of time by harnessing the computing capacity of thousands of computers in a network, a feat that with conventional computing could take years to complete.
+
+
Recently PATh launched the PATh Facility, a dHTC service meant to handle HTC workloads in support and advancement of NSF-funded open science. It was announced earlier this year via a Dear Colleague Letter issued by the NSF and identified a diverse set of eligible research programs that range across 14 domain science areas including geoinformatics, computational methods in chemistry, cyberinfrastructure, bioinformatics, astronomy, arctic research and more. Through this 2022-2023 fiscal year pilot project, the NSF awards credits for access to the PATh Facility, and researchers can request computing credits associated with their NSF awards. There are two ways to request credit: 1) within new proposals or 2) with existing awards via an email request for additional credits to participating program officers.
+
+
“It is a remarkable program because it spans almost the entirety of the NSF’s directorates and offices,” said San Diego Supercomputer Center (SDSC) Director Frank Würthwein, who also serves as executive director of the OSG Consortium.
+
+
Access to the PATh Facility offers researchers approximately 35,000 modern cores and up to 44 A100 GPUs. Recently SDSC, located at UC San Diego, added PATh Facility hardware on its Expanse supercomputer for use by researchers with PATh credits. According to SDSC Deputy Director Shawn Strande: “Within the first two weeks of operations, we saw researchers from 10 different institutions, including one minority serving institution, across nearly every field of science. The beauty of the PATh model of system integration is that researchers have access as soon as the resource is available via OSG. PATh democratizes access by lowering barriers to doing research on advanced computing resources.”
+
+
While the PATh credit ecosystem is still growing, any PATh Facility capacity not used for credit will be available to the Open Science Pool (OSPool) to benefit all open science under a Fair-Share allocation policy. “For researchers familiar with the OSPool, running HTC workloads on the PATh Facility should feel like second nature,” said Christina Koch, PATh’s research computing facilitator.
+
+
“Like the OSPool, the PATh Facility is nationally spanning, geographically distributed and ideal for HTC workloads. But while resources on the OSPool belong to a diverse range of campuses and organizations that have generously donated their resources to open science, the allocation of capacity in the PATh Facility is managed by the PATh project itself,” said Koch.
+
+
PATh will eventually span six national sites: SDSC at UC San Diego, CHTC at the University of Wisconsin-Madison, the Holland Computing Center at the University of Nebraska-Lincoln, Syracuse University’s Research Computing group, the Texas Advanced Computing Center at the University of Texas at Austin and Florida International University’s AMPATH network in Miami.
+
+
PIs may contact credit-accounts@path-cc.io with questions about PATh resources, using HTC, or estimating credit needs. More details also are available on the PATh credit accounts web page.
Researchers at the USGS are using HTC to pinpoint potential invasive species for the United States.
+
+
+
+
Benjamin Franklin famously advised that an ounce of prevention is worth a pound of cure, and researcher Richard Erickson has taken this advice to heart in his mission to protect our lakes and wildlife from invasive species. As a research ecologist at the United States Geological Survey’s (USGS) Upper Midwest Environmental Sciences Center, Erickson uses computation to identify invasive species before they pose a threat to U.S. ecosystems.
+
+
Instrumental to his preventative mission is the HTCondor Software Suite (HTCSS) and consulting from UW-Madison’s Center for High Throughput Computing (CHTC), which have been integral to the USGS’s in-house computing infrastructure. Equipped with the management capabilities of HTCSS and guidance from CHTC, Erickson recently completed a high-throughput horizon scan of over 8000 different species in less than two days.
+
+
Explaining how his team was able to accomplish such a feat in merely one weekend, Erickson reasons: “High throughput computing software allows [big problems] to be broken into small jobs. Rather than having to worry about everything, I just have to worry about a small thing, and then high throughput computing does the small thing many times over, to solve big problems through small steps.”
+
+
Erickson’s big problem first began to take shape in 2020 when the U.S. Fish and Wildlife Service (FWS) provided the USGS with a list of over 8000 species currently being bought and sold in the United States, from Egyptian Geese, to Algerian hedgehogs, to Siberian weasels. If these animals proliferate in U.S. environments, they could potentially threaten native species, the ecosystem as a whole, and the societal and economic value associated with it. Erickson’s job? To determine which species are a threat, and to what areas –– a tall order when faced with 8000 unique species and roughly 900 different ecological regions across the United States.
+
+
With HTC, Erickson could approach this task by breaking it down into small, manageable steps. Each species was independent of one another, meaning that the colossal collection of 8000 plants and animals could be organized into 8000 different jobs for HTCSS to run in parallel. Each job contained calculations comparing the US and non-US environments across sixteen different climate metrics. Individually, the jobs took anywhere from under thirty minutes to over two hours to run.
+
+
To analyze this type of data, the team created their own R package, climatchR. The package was released to the public in early September, and the team plans to make their HTCondor code publicly available after it undergoes USGS review.
+
+
But the HTC optimization didn’t end there. Because the project also required several complex GIS software dependencies, the group used Docker to build a container that could hold the various R and GIS dependencies in the context of a preferred operating system. Such containers make the software portable and consistent between diverse users and their computers, and can also be easily distributed by HTCSS to provide a consistent and custom environment for each computed job running across a cluster.
+
+
By the end of their computing run, the 8000 jobs had used roughly a year of computing in less than two days. The output included a climate score between zero and ten for each of the 8000 species, corresponding to how similar a species’ original climate is to the climates of the United States.
+
+
Currently, different panels of experts are reviewing species with climate scores above 6 to determine which of them could jeopardize US ecosystems. This expert insight will inform FWS’s regulation and management of the species traded in the United States, ultimately preventing the arrival of those that are likely to be invasive.
+
+
Invasive species disrupt ecological interactions, contributing to the population decline and extinction of native species. But beyond their environmental consequences, these non-native species impact property values, tourism activities, and agricultural yields. Hopefully, the results of Erickson’s high-throughput horizon scan will prevent these costs before they’re endured –– all by using HTC to solve big problems, through small steps.
+
+
…
+
+
Erickson co-authored an open-access tutorial to help other environmental scientists and biologists who are getting started with HTCondor.
+Erickson’s team hopes to make the results from this project publicly available in 2022.
The following sections detail the processes for requesting a new CHTC account,
+or for continuing to use an existing CHTC account. CHTC services are free
+to use in support of the University of Wisconsin - Madison’s research and teaching mission.
+
+
Current Member of UW - Madison
+
+
If you are a current student, staff, or faculty at UW - Madison, you can request an account
+by completing the Account Request Form. A staff member from CHTC will follow
+up with next steps.
+
+
All accounts require an active NetID and a faculty sponsor (typically the PI who is leading
+your research project).
+
+
Graduating from UW - Madison
+
+
We understand that some users may need to continue carrying out their computational
+analyses after graduation and the subsequent expiration of their NetID.
+
+
Once you are no longer enrolled in or employed by the University, you can continue
+to use your CHTC account as an “External Collaborator”. Follow the instructions in
+the section below to have your faculty advisor sponsor your continued access to
+CHTC.
+
+
We highly recommend reaching out to CHTC staff before your NetID expires, if possible.
+
+
+
Our policy is that CHTC accounts are deactivated and user data is erased after a user
+is no longer actively using their account (~1 year of inactivity). It is your responsibility
+to maintain your data and important files in a location that is not CHTC’s file systems.
+
+
+
External Collaborator
+
+
If you are not a current member of UW - Madison, you can gain access to CHTC provided
+that you are sponsored by a faculty member of UW - Madison. To begin the account
+request process, have your Faculty Sponsor email CHTC (chtc@cs.wisc.edu) and provide:
+
+
+
Your name,
+
The reason you need (continued) access to CHTC resources,
+
The amount of time they would like to sponsor your account,
+
Your city/country of residence, and
+
Your institution.
+
+
+
CHTC staff will then follow up with next steps to create or extend your account.
+
+
+
Your faculty sponsor can sponsor your account for up to one year at a time. If
+you need continued access past one year, your faculty sponsor must contact us and
+re-confirm that you should have continued access.
+
Our policy is that CHTC accounts are deactivated and user data is erased after a user
+is no longer actively using their account (~1 year of inactivity). It is your responsibility
+to maintain your data and important files in a location that is not CHTC’s file systems.
This guide describes the general process for creating an Apptainer container.
+Specifically, we discuss the components of the “definition file” and how that file is used to construct or “build” the container itself.
+
+
For instructions on using and building Apptainer containers
The instructions for how Apptainer should build a container are located in the definition file (typically suffixed with .def).
+The following table summarizes the sections that are routinely used when constructing a container:
+
+
+
+
+
Section
+
Description
+
+
+
+
+
“Header”
+
Choose an existing container to start from.
+
+
+
%files
+
Add existing files (e.g., pre-downloaded source code) to use in the container.
+
+
+
%post
+
Installation commands for adding libraries/programs to the container.
+
+
+
%environment
+
Automatically set environment variables when the container is started to help find installed programs.
+
+
+
%labels
+
Add information or metadata to help identify the container and its contents.
+
+
+
%help
+
Add text to help others use the container and its contents.
+
+
+
+
+
With the exception of the “Header”, sections in the definition file begin with a line starting with %name_of_section and all subsequent lines belong to that section until the end of the file or the next %section line is encountered.
+Typically the contents of a section are indented to help visually distinguish the different parts.
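To make that layout concrete, here is a minimal sketch of a definition file that uses every section from the table above. It is written with a shell here-document so that it can be created from the command line. The file name skeleton.def, the placeholder file my-source.tar.gz, and the label and help text are hypothetical, and the base image ubuntu:22.04 is only an illustrative choice.

```shell
# Write an illustrative definition file; all names inside it are placeholders.
cat > skeleton.def <<'EOF'
Bootstrap: docker
From: ubuntu:22.04

%files
    my-source.tar.gz /opt/my-source.tar.gz

%post
    # Commands that run once, while the container is being built
    apt-get update -y

%environment
    # Variables that are set each time the container starts
    export LC_ALL=C

%labels
    Author Your Name

%help
    Describe how to use the container here.
EOF

# List the section markers to confirm where each section begins
grep '^%' skeleton.def
```

The header has no %name line of its own, which is why it must come first; every other section runs from its marker to the next marker or the end of the file.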
+
+
Additional sections can be specified, though not all may be functional when using the container on CHTC systems.
+For additional information on Apptainer definition files, see the Apptainer documentation.
+The manual page provides a full reference on the different sections of the definition file.
+
+
+
Note that the %runscript section is ignored when the container is executed on the High Throughput system.
+
+
+
Header section
+
+
This must be the first section of the definition file.
+The header specifies the container image that Apptainer should start with.
+Apptainer will load this container image before attempting to execute the build commands.
+
+
Most users will use
+
+
Bootstrap: docker
+From: user/repo:tag
+
+
+
where user/repo:tag is any valid address to a Docker-based container registry.
+For example,
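A header that starts from the rocker/tidyverse:4.1.3 image on DockerHub could be written as in the sketch below. The file name tidyverse.def is a hypothetical choice; any image address from a Docker-based registry can take the place of the one shown.

```shell
# Create a definition file whose header starts from an image on DockerHub
cat > tidyverse.def <<'EOF'
Bootstrap: docker
From: rocker/tidyverse:4.1.3
EOF

# Show the resulting header
cat tidyverse.def
```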
If you just want to convert an existing Docker container into an Apptainer container, you do not need to use a definition file.
+Instead, you can directly run the apptainer build command using the Docker address, as described below.
+
+
+
Files section
+
+
The %files section is used to copy files from the machine that is running Apptainer (the “host”) into the container that Apptainer is building.
+This section is typically used when you have the source code saved on the host and want to extract/compile/install it inside of the container image.
+
+
+
While the container is being built on the host system, by default it does not have direct access to files located on the host system.
+The %files section serves as the bridge between the host system and the container being built.
+
+
+
The syntax for use is
+
+
%files
+ file_on_host file_in_container
+
+
+
where file_on_host is in the same directory as the .def definition file, and where file_in_container will be copied to the container’s root (/) by default.
+You can instead provide absolute paths to the files on the host or in the container, or both.
+For example:
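The sketch below uses absolute paths on both sides. The host path /home/username/my-source-code.tar.gz, the container path under /opt/my-program-build, and the file name files-example.def are hypothetical placeholders, not locations that are known to exist on CHTC systems.

```shell
# Write a definition file that copies one archive from the host into the container
cat > files-example.def <<'EOF'
Bootstrap: docker
From: ubuntu:22.04

%files
    /home/username/my-source-code.tar.gz /opt/my-program-build/my-source-code.tar.gz
EOF

# Print only the %files section
sed -n '/^%files/,$p' files-example.def
```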
If the directories in the path in the container do not already exist, they will be created.
+
+
+
Post section
+
+
The %post section contains any and all commands to be executed when building the container.
+Typically this involves first installing packages using the operating system’s package manager and then compiling/installing your custom programs.
+Environment variables can be set as well, but they will only be active during the build (use the %environment section if you need them active during run time).
+
+
For example, if using an Ubuntu-based container, then you should be able to use the apt package manager to install your program’s dependencies.
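A %post section along the following lines would do so. This is a sketch reconstructed from the description that follows it (the gcc, make, and wget packages, installed with the -y option); the file name units.def and the ubuntu:22.04 base image are assumptions made for illustration.

```shell
# Write a definition file that installs build tools with apt during the build
cat > units.def <<'EOF'
Bootstrap: docker
From: ubuntu:22.04

%post
    apt update -y
    apt install -y gcc make wget
EOF

# Print only the %post section
sed -n '/^%post/,$p' units.def
```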
Note that we have used the -y option for apt to pre-emptively agree to update apt and to install the gcc, make, and wget packages.
+Otherwise, the apt command will prompt you to confirm the executions via the command line.
+But since the Apptainer build process is executed non-interactively, you will be unable to enter a response via the command line, so the commands will eventually time out and the build will fail.
+
+
Once you install the dependencies you need using the operating system’s package manager, you can use those packages to obtain and install your desired program.
+For example, the following commands will install the GNU Units command units.
+
+
mkdir -p /opt/units-source
+ cd /opt/units-source
+ wget https://ftp.gnu.org/gnu/units/units-2.23.tar.gz
+ tar -xzf units-2.23.tar.gz
+ cd units-2.23
+ ./configure
+ make
+ make install
+
+
+
If using the default installation procedure, your program should be installed in and detectable by the operating system.
+If not, you may need to manually set environment variables so that the operating system can find your program.
+
+
Environment section
+
+
The %environment section can be used to automatically set environment variables when the container is actually started.
+
+
For example, if you installed your program in a custom location /opt/my-program and the binaries are in the bin/ folder, you could use this section to add that location to your PATH environment variable:
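As a minimal sketch of that case, the here-document below writes a definition file whose %environment section prepends /opt/my-program/bin to PATH. The file name my-program.def and the ubuntu:22.04 base image are assumptions; the quoted 'EOF' keeps the shell from expanding $PATH while the file is being written.

```shell
# Write a definition file that extends PATH whenever the container starts
cat > my-program.def <<'EOF'
Bootstrap: docker
From: ubuntu:22.04

%environment
    export PATH="/opt/my-program/bin:$PATH"
EOF

# Print only the %environment section
sed -n '/^%environment/,$p' my-program.def
```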
Effectively, this section can be used like a .bashrc or .bash_profile file.
+
+
+
Labels section
+
+
The %labels section can be used to provide custom metadata about the container, which can make it easier for yourself and others to identify the nature and provenance of a container.
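The general form is one label per line, with the name first and the value after it. The sketch below writes such a section to a scratch file (labels-syntax.def is a hypothetical name) so that the layout is visible:

```shell
# Write the bare syntax of a %labels section to a scratch file
cat > labels-syntax.def <<'EOF'
%labels
    LabelName LabelValue
EOF

cat labels-syntax.def
```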
where LabelName is the name of the label, and LabelValue is the corresponding value.
+For example,
+
+
%labels
+ Author Bucky Badger
+ ContactEmail bbadger@wisc.edu
+ Name Bucky's First Container
+
+
+
will generate the metadata in the container showing the Author as Bucky Badger, the ContactEmail as bbadger@wisc.edu, and the container Name as Bucky's First Container.
+
+
For an existing container, you can inspect the metadata with the command apptainer inspect my_container.sif.
+
+
+
For a container with the %labels in the above example, you should see the following output:
The %help section can be used to provide custom help text about how to use the container.
+This can make it easier for yourself and others to interact and use the container.
+
+
For example,
+
+
%help
+ This container is based on Ubuntu 22.04 and has the GNU Units command installed.
+ You can use the command `units` inside this container to convert from one unit of measurement to another.
+ For example,
+ $ units '1 GB' 'MB'
+ returns
+ * 1000
+ / 0.001
+
+
+
For an existing container, you can inspect the help text with the command apptainer run-help my-container.sif.
+
+
The Apptainer Container Image
+
+
The actual container image, which can be executed by Apptainer as a stand-alone operating system, is stored in a .sif file.*
+The instructions for constructing the .sif file are provided by the .def definition file, as described above.
+Basically, the .sif file is a compression of all of the files in the stand-alone operating system that comprises a “container”.
+Apptainer can use this one file to reconstitute the container at runtime.
+
+
* sif stands for “Singularity Image File”; Apptainer originated as an open-source fork of the original program, Singularity.
+
+
Building the container
+
+
To create the .sif file from the .def file, you need to run the command
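A sketch of the build step follows, using the file names my-container.sif and image.def as examples. The guard around the command is an addition for illustration only: it lets the snippet finish cleanly on a machine where Apptainer is not installed or the definition file is missing.

```shell
sif_file="my-container.sif"   # container image to create
def_file="image.def"          # existing definition file

# Build only where Apptainer is available and the definition file exists
if command -v apptainer >/dev/null 2>&1 && [ -f "$def_file" ]; then
    apptainer build "$sif_file" "$def_file"
else
    echo "Would run: apptainer build $sif_file $def_file"
fi
```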
Here the syntax is to provide the name of the .sif file that you want to create and then provide the name of the existing .def definition file.
+
+
+
Don’t run the apptainer build command on the login server!
+Building the container image can be an intensive process and can consume the resources of the login server.
+
+
+
On the High Throughput system, first start an interactive build job as described in our Use Apptainer Containers guide.
+
On the High Performance system, first launch an interactive Slurm session as described here.
+
+
+
+
Converting a Docker image to an Apptainer container image
+
+
You can directly convert an existing Docker container into an Apptainer container image without having to provide a definition file.
+To do so, use the command
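The sketch below shows the form of that command, with the docker:// prefix telling Apptainer to pull from a Docker-based registry. The output name my-container.sif is an example, user/repo:tag is a placeholder that must be replaced with a real address, and the guard is an illustrative addition so that the snippet does nothing harmful where Apptainer is not installed.

```shell
# Replace user/repo:tag with the address of the Docker image to convert
docker_address="user/repo:tag"
sif_file="my-container.sif"

if command -v apptainer >/dev/null 2>&1; then
    apptainer build "$sif_file" "docker://$docker_address"
else
    echo "Would run: apptainer build $sif_file docker://$docker_address"
fi
```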
where user/repo:tag is any valid address to a Docker-based container registry. (For example, rocker/tidyverse:4.1.3 from DockerHub or nvcr.io/nvidia/tensorflow:24.02-tf2-py3 from NVIDIA Container Registry.)
+
+
Testing the container interactively
+
+
After building your container, we strongly recommend that you start it up and check that your program has been installed correctly.
+Assuming that you are in an interactive session (i.e., not on the login server), then you can run
+
+
apptainer shell my-container.sif
+
+
+
This command should log you into a terminal that is backed by the container’s operating system.
+
+
+
On the High Throughput system, you can instead submit an interactive job that uses the .sif file as the container_image.
+In this case, you do not need to run any apptainer commands, as HTCondor has automatically done so before you load into the interactive session.
+
+
+
Then you can check that the files are in the right place, or that your program can be found.
+An easy way to check if your program is at least recognized by the container is to try to print the help text for the program.
+
+
For example,
+
+
[username@hostname ~]$ apptainer shell units.sif
+Apptainer> units --help
+
+Usage: units [options] ['from-unit' 'to-unit']
+
+<additional output truncated>
+
+
+
+
By default, only your current directory will be mounted into the container, meaning the only files you can see from the host system are those in the directory where you ran the command.
+
+
Furthermore, the interactive container session may inherit environment variables from your terminal session on the host system, which may conflict with the container environment.
+In this case, use the -e option to use a “clean” environment for the interactive session: apptainer shell -e my-container.sif.
+
+
+
Special Considerations for Building Your Container
+
+
+
+
+
+
Non-interactive
+
+
Because the container build is a non-interactive process, all commands within the .def file must be able to execute without user intervention.
+
+
+
Be prepared to troubleshoot
+
+
A consequence of the non-interactive build is that when something goes wrong, the build process will fail without creating a .sif file.
+That in turn means that when the build is restarted, it starts completely from scratch.
+
+
It is rare to correctly write your .def file such that the container builds successfully on your first try!
+Do not be discouraged - examine the build messages to determine what went wrong and use the information to correct your .def file, then try again.
+
+
+
Multi-stage build
+
+
It is possible to have a multi-stage build.
+In this scenario, you have two .def files.
+You use the first one to construct an intermediary .sif file, which you can then use as the base for the second .def file.
+In the second .def file, you can specify
+
+
Bootstrap: localimage
+From: path/to/first.sif
+
+
+
+
.sif files can be large
+
+
If you are installing a lot of programs, the final .sif image can be large, on the order of tens of gigabytes.
+Keep that in mind when requesting disk space.
+On the High Throughput system, we encourage you to place your container image on the /staging system.
+
+
+
Files cannot be created or modified after the container has been built
+
+
While you can read and execute any file within the container, you will not be able to create or modify files in the container once it has been built.
+The exception is if the location is “mounted” into the container, which means that there is a corresponding location on the host system where the files will be stored.
+Even then, you will only be allowed to create/modify files in that location if you would be able to normally without a container.
+
+
This behavior is intentional as otherwise it would be possible for users to modify files on the host machine’s operating system, which would be a significant security, operations, and privacy risk.
+
+
+
Manually set a HOME directory
+
+
Some programs create .cache directories and may attempt to do so in the user’s “HOME” directory.
+When executing in a container, however, the user typically does NOT have a “HOME” directory.
+In this case, some programs default to creating the directory in the root / directory.
+This will not work for reasons in the previous item.
+
+
One workaround may be to manually set the HOME environment variable after the container has started.
+On CHTC systems, the following should address this issue:
+
+
export HOME=$(pwd)
+
+
+
If this does not address the issue, examine the error messages and consult the program documentation for how to configure the program to use an alternate location for cache or temporary directories.
Similar to Docker containers, Apptainer environments allow users to prepare portable software and computing environments that can be sent to many jobs.
+This means your jobs will run in a more consistent environment that is easily reproducible by others.
The definition (.def) file contains the instructions for what software to install while building the container.
+CHTC provides example definition files in the software folder of our Recipes GitHub repository. We strongly recommend that you use one of the existing examples as the starting point for creating your own container.
+
+
To create your own container using Apptainer, you will need to create a definition (.def) file.
+We encourage you to read our Building an Apptainer Container guide to learn more about the components of the Apptainer definition file.
+
+
Regarding MPI
+
+
We are still in the process of developing guidance for deploying MPI-based software in containers on the High Performance system.
+The instructions in this guide should work for single-node jobs.
+Multi-node jobs require the MPI installed in the container to integrate with Slurm and/or the cluster installation of MPI, and we are still exploring how to do so.
+
+
Start an Interactive Session
+
+
Building a container can be a computationally intense process.
+As such, we require that you only build containers while in an interactive session.
+On the High Performance system, you can use the following command to start the interactive session:
To build a container, Apptainer uses the instructions in the .def file to create a .sif file. The .sif file is the compressed collection of all the files that comprise the container.
+
+
To build your container, run this command:
+
+
apptainer build my-container.sif image.def
+
+
+
Feel free to rename the .sif file as you desire; for the purposes of this guide we are using my-container.sif.
+
+
As the command runs, a variety of information will be printed to the terminal regarding the container build process.
+Unless something goes wrong, this information can be safely ignored.
+Once the command has finished running, you should see INFO: Build complete: my-container.sif.
+Using the ls command, you should now see the container file my-container.sif.
+
+
If the build command fails, examine the output for error messages that may explain why the build was unsuccessful.
+Typically there is an issue with a package installation, such as a typo or a missing but required dependency.
+Sometimes there will be an error during an earlier package installation that doesn’t immediately cause the container build to fail.
+But, when you test the container, you may notice an issue with the package.
+
+
If you are having trouble finding the error message, edit the definition file and remove (or comment out) the installation commands that come after the package in question.
+Then rebuild the image, and now the relevant error messages should be near the end of the build output.
Once your container builds successfully, it is important to test it to make sure you have all software, packages, and libraries installed correctly.
+
+
To test your container, use the command
+
+
apptainer shell -e my-container.sif
+
+
+
You should see your command prompt change to Apptainer>.
+
+
The shell command logs you into a terminal “inside” the container, with access to the libraries, packages, and programs that were installed in the container following the instructions in your image.def file.
+(The -e option is used to prevent this terminal from trying to use the host system’s programs.)
+
+
While “inside” the container, try to run your program(s) that you installed in the container.
+Typically it is easiest to try to print your program’s “help” text, e.g., my-program --help.
+If using a programming language such as python3 or R, try to start an interactive code session and load the packages that you installed.
+
+
If you installed your program in a custom location, consider using ls to verify the files are in the right location.
+You may need to manually set the PATH environment variable to point to the location of your program’s executable binaries.
+For example,
+
+
export PATH=/opt/my-program/bin:$PATH
+
+
+
Consult the “Special Considerations” section of our Building an Apptainer Container guide for additional information on setting up and testing your container.
+
+
When you are finished running commands inside the container, run the command exit to exit the container.
+Your prompt should change back to something like [username@spark-a006 directory]$.
+If you are satisfied with the container that you built, run the exit command again to exit the interactive Slurm session.
+
+
Use an Apptainer Container in HPC Jobs
+
+
Now that you have the container image saved in the form of the .sif file, you can use it as the environment for running your HPC jobs.
+
+
For execution on a single node, we recommend adding the following commands to your sbatch script:
We are still in the early stages of deploying containers on the High Performance system.
+A complicating factor is the construction of the .def file to deploy MPI on the system to allow for execution across multiple nodes.
+If you are interested in multi-node execution using containers, contact a facilitator for more information.
Sometimes the program you want to use does not have a pre-existing container that you can build on top of.
+Then you will need to install the program and its dependencies inside of the container.
+In this example, we will show how to install the program SUMO in a container, as an illustration of how to build a container more-or-less from scratch.
First, you will need to choose a base image for the container.
+Consult the documentation for the program you want to install to make sure you select a compatible operating system.
+
+
For this example, we will use the most recent LTS version of Ubuntu from Docker.
+The beginning of the image.def file should look like this:
+
+
Bootstrap: docker
+From: ubuntu:22.04
+
+
+
2. Add the Installation Commands
+
+
All of the installation commands that you want Apptainer to execute during the container build step are provided in the %post section of the definition file.
+
+
Setting up non-interactive installation
+
+
First, you may need to instruct programs that you are executing commands in a non-interactive environment.
+There can be issues with installing packages in a container that would not normally occur when installing manually in the terminal.
+
+
On the HTC system in particular, the /tmp directory inside of the container needs to be given global read/write permissions.
+This can be done by adding the following line at the start of the %post section:
+
+
chmod 777 /tmp
+
+
+
Similarly, some packages require that the user answer interactive prompts for selecting various options.
+Since the Apptainer build is non-interactive, this can cause the package installation to hang.
+While this isn’t an issue in the present example, the issue can be avoided by adding the following line near the start of the %post section:
+
+
export DEBIAN_FRONTEND=noninteractive
+
+
+
Note that this particular command only applies to Debian-based container images, such as Ubuntu.
Note that we are using the built-in package manager (apt) of Ubuntu, since that is the base operating system we chose to build on top of.
+If you choose a different operating system, you may need to use a different package manager.
+
+
In this case, the first command is apt-get update which will update the list of available packages.
+This is necessary to get the latest versions of the packages in the following apt-get install command.
+
+
The apt-get install command will install the dependencies required by the SUMO program.
+
+
+
Note that these installation commands do not use sudo, as Apptainer already has permissions to install programs in the container.
While the %post section now contains all of the instructions for installing and compiling your desired program,
+you likely need to add commands for setting up the environment so that the shell recognizes your program.
+This is typically the case if your program compiled successfully but you still get a “command not found” error when you try to execute it.
+
+
To set environment variables automatically when your container runs, you need to add them to the %environment section before you build the container.
+
+
For example, in the %post section there is the command export SUMO_HOME=/sumo, which sets the environment variable SUMO_HOME to the location of the sumo directory.
+This environment variable, however, is only active during the installation phase of the container build, and will not be set when the container is actually run.
+Thus, we need to set SUMO_HOME and update PATH with the location of the SUMO bin folder by using the %environment section.
+
+
We therefore add the following lines to the image.def file:
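A sketch of those lines is shown below, appended to image.def with a here-document. It assumes that the SUMO executables end up in /sumo/bin, that is, the bin folder directly under the SUMO_HOME location used in the %post section; adjust the path if your build places them elsewhere.

```shell
# Append an %environment section to the definition file
cat >> image.def <<'EOF'

%environment
    export SUMO_HOME=/sumo
    export PATH=/sumo/bin:$PATH
EOF

# Show the section that was just added
sed -n '/^%environment/,$p' image.def
```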
We can now build the container using this definition file.
+
+
+
For more information on the components of an Apptainer definition (.def) file and container image file (.sif), see our Building an Apptainer Container guide.
+
For information on building and using the container on the HTC system, see our Use Apptainer Containers guide.
HTCondor supports the use of Apptainer (formerly known as Singularity) environments for jobs on the High Throughput Computing system.
+
+
Similar to Docker containers, Apptainer environments allow users to prepare portable software and computing environments that can be sent to many jobs.
+This means your jobs will run in a more consistent environment that is easily reproducible by others.
+
+
Container jobs are able to take advantage of more of CHTC’s High Throughput resources because the operating system where the job is running does not need to match the operating system where the container was built.
If you or a group member have already created the Apptainer .sif file, or are using a container from reputable sources such as the OSG, follow these steps to use it in an HTCondor job.
+
+
1. Add the container .sif file to your submit file
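As a rough sketch, a submit file that runs a job inside the container could look like the one written below. The container_image line is the part that matters here; the file name my-job.sub, the executable my-executable.sh, the log and output names, and the resource requests are all hypothetical values chosen for illustration, and your version of HTCondor may expect additional settings such as universe = container.

```shell
# Write an illustrative submit file that points HTCondor at the .sif file
cat > my-job.sub <<'EOF'
# my-job.sub (hypothetical example)
container_image = my-container.sif

executable = my-executable.sh

log = job.log
output = job.out
error = job.err

request_cpus = 1
request_memory = 4GB
request_disk = 10GB

queue
EOF

# Confirm which container image the job will use
grep '^container_image' my-job.sub
```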
As always with the High Throughput system, submit a single test job and confirm that your job behaves as expected.
+If there are issues with the job, you may need to modify your executable, or even (re)build your own container.
+
+
Build your own container
+
+
If you need to create your own container for the software you want to use, follow these steps.
+For more information on any particular step, jump to the corresponding section later in this guide.
+
+
+
+
1. Create a definition file
+
+
The definition (.def) file contains the instructions for what software to install while building the container.
+CHTC provides example definition files in the software folder of our Recipes GitHub repository. Choose from one of the existing examples, or create your own using the instructions later in this guide.
Start an interactive build job (an example submit file build.sub is provided below).
+Be sure to include your .def file in the transfer_input_files line, or else create the file once the interactive job starts using a command line editor.
While in an interactive build job, run the command
+
+
apptainer build my-container.sif image.def
+
+
+
If the container build finishes successfully, then the container image (.sif) file is created.
+This file is used for actually executing the container.
Once you’ve built the container, use the instructions above to use the container in your HTCondor job.
+
+
Create a Definition File
+
+
To create your own container using Apptainer, you will need to create a definition (.def) file.
+For the purposes of this guide, we will call the definition file image.def.
+
+
CHTC provides example definition files in the software folder of our Recipes GitHub repository. We strongly recommend that you use one of the existing examples as the starting point for creating your own container.
+
+
If the software you want to use is not in the CHTC Recipes repository, you can create your own container. Here is the general process for creating your own definition file for building your custom container:
+
+
+
+
Consult your software’s documentation
+
+
Determine the requirements for installing the software you want to use.
+In particular you are looking for (a) the operating systems it is compatible with and (b) the prerequisite libraries or packages.
+
+
+
Choose a base container
+
+
The base container should at minimum use an operating system compatible with your software.
+Ideally the container you choose also has many of the prerequisite libraries/programs already installed.
+
+
+
Create your own definition file
+
+
The definition file contains the installation commands needed to set up your software.
+We encourage you to read our Building an Apptainer Container guide to learn more about the components of the Apptainer definition file.
+An advanced example of a definition file is provided in our Advanced Apptainer Example - SUMO guide.
+
+
+
+
A simple definition file
+
+
As a simple example, here is the .def file that uses an existing container with python installed inside (python:3.11, from DockerHub),
+and furthermore installs the desired packages cowsay and tqdm:
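A definition file matching that description might look like the sketch below, written to image.def with a here-document. Installing the packages with python3 -m pip is an assumption about how the packages are added; the base image and package names come from the description above.

```shell
# Write a definition file that starts from python:3.11 and adds two packages
cat > image.def <<'EOF'
Bootstrap: docker
From: python:3.11

%post
    python3 -m pip install cowsay tqdm
EOF

cat image.def
```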
Remember that the .def file contains the instructions for creating your container and is not itself the container.
+To use the software defined within the .def file, you will need to first “build” the container and create the .sif file, as described in the following sections.
Building a container can be a computationally intense process.
+As such, we require that you only build containers while in an interactive build job.
+On the High Throughput system, you can use the following submit file build.sub:
+
+
+
# build.sub
+# For building an Apptainer container
+
+universe = vanilla
+log = build.log
+
+# In the latest version of HTCondor on CHTC, interactive jobs require an executable.
+# If you do not have an existing executable, use a generic linux command like hostname as shown below.
+executable = /usr/bin/hostname
+
+# If you have additional files in your /home directory that are required for your container, add them to the transfer_input_files line as a comma-separated list.
+transfer_input_files = image.def
+
+requirements = (HasCHTCStaging == true)
+
++IsBuildJob = true
+request_cpus = 4
+request_memory = 16GB
+request_disk = 16GB
+
+queue
+
+
+
Note that this submit file assumes you have a definition file named image.def in the same directory as the submit file.
+
+
Once you’ve created the submit file, you can submit an interactive job with the command
+
+
condor_submit -i build.sub
+
+
+
+
Apptainer .sif files can be fairly large, especially if you have a complex software stack.
+If your interactive job abruptly fails during the build step, you may need to increase the value of request_disk in your submit file.
+In this case, the .log file should have a message about the reason the interactive job was interrupted.
Once the interactive build job starts, confirm that your image.def was transferred to the current directory.
+
+
To build a container, Apptainer uses the instructions in the .def file to create a .sif file. The .sif file is the compressed collection of all the files that comprise the container.
+
+
To build your container, run this command:
+
+
apptainer build my-container.sif image.def
+
+
+
Feel free to rename the .sif file as you desire; for the purposes of this guide we are using my-container.sif.
+
+
As the command runs, a variety of information will be printed to the terminal regarding the container build process.
+Unless something goes wrong, this information can be safely ignored.
+Once the command has finished running, you should see INFO: Build complete: my-container.sif.
+Using the ls command, you should now see the container file my-container.sif.
+
+
If the build command fails, examine the output for error messages that may explain why the build was unsuccessful.
+Typically there is an issue with a package installation, such as a typo or a missing but required dependency.
+Sometimes there will be an error during an earlier package installation that doesn’t immediately cause the container build to fail.
+But, when you test the container, you may notice an issue with the package.
+
+
If you are having trouble finding the error message, edit the definition file and remove (or comment out) the installation commands that come after the package in question.
+Then rebuild the image, and now the relevant error messages should be near the end of the build output.
+
+
Once the image is built, it is important to test it to make sure you have all software, packages, and libraries installed correctly.
Once your container builds successfully, we highly encourage you to immediately test the container while still in the interactive build session.
+
+
To test your container, use the command
+
+
apptainer shell -e my-container.sif
+
+
+
You should see your command prompt change to Apptainer>.
+
+
The shell command logs you into a terminal “inside” the container, with access to the libraries, packages, and programs that were installed in the container following the instructions in your image.def file.
+(The -e option is used to prevent this terminal from trying to use the host system’s programs.)
+
+
While “inside” the container, try to run your program(s) that you installed in the container.
+Typically it is easiest to try to print your program’s “help” text, e.g., my-program --help.
+If using a programming language such as python3 or R, try to start an interactive code session and load the packages that you installed.
+
+
If you installed your program in a custom location, consider using ls to verify the files are in the right location.
+You may need to manually set the PATH environment variable to point to the location of your program’s executable binaries.
+For example,
+
+
export PATH=/opt/my-program/bin:$PATH
+
+
+
Consult the “Special Considerations” section of our Building an Apptainer Container guide for additional information on setting up and testing your container.
+
+
When you are finished running commands inside the container, run the command exit to exit the container.
+Your prompt should change back to something like [username@build4000 ~]$.
Since Apptainer .sif files are routinely more than 1GB in size, we recommend that you transfer my-container.sif to your /staging directory.
+It is usually easiest to move the container file directly to staging while still in the interactive build job:
+
+
mv my-container.sif /staging/$USER
+
+
+
If you do not have a /staging directory, you can skip this step and the .sif file will be automatically transferred back to the login server when you exit the interactive job.
+We encourage you to request a /staging directory, especially if you plan on running many jobs using this container.
+See our Managing Large Data in Jobs guide for more information on using staging.
Now that you have the container image saved in the form of the .sif file, you can use it as the environment for running your HTCondor jobs.
+In your submit file, specify the image file using the container_image command.
+HTCondor will automatically transfer the .sif file and automatically execute your executable file inside of the container; you do not need to include any apptainer commands in your executable file.
+
+
If the .sif file is located on the login server, you can use
+
+
container_image = my-container.sif
+
+
+
although we generally don’t recommend this, since .sif files are large and should instead be located in staging.
The full submit file otherwise looks like normal, for example:
+
+
# apptainer.sub
+
+# Provide HTCondor with the name of your .sif file and universe information
+container_image = file:///staging/path/to/my-container.sif
+
+executable = myExecutable.sh
+
+# Include other files that need to be transferred here.
+# transfer_input_files = other_job_files
+
+log = job.log
+error = job.err
+output = job.out
+
+requirements = (HasCHTCStaging == true)
+
+# Make sure you request enough disk for the container image in addition to your other input files
+request_cpus = 1
+request_memory = 4GB
+request_disk = 10GB
+
+queue
+
+
+
Then use condor_submit with the name of your submit file:
+
+
condor_submit apptainer.sub
+
+
+
If you are using +WantFlocking or +WantGliding as described in our Scale Beyond Local HTC Capacity guide, then you should instead use
From the user’s perspective, a container job is practically identical to a regular job.
+The main difference is that instead of running on the execute point’s default operating system, the job is run inside the container.
+
+
When you submit a job to HTCondor using a submit file with container_image set, HTCondor automatically handles the process of obtaining and running the container.
+The process looks roughly like
+
+
+
Claim a machine that satisfies the submit file requirements
+
Pull (or transfer) the container image
+
Transfer input files, executable to working directory
+
Run the executable script inside the container, as the submit user, with key directories mounted inside (such as the working directory, /staging directories, etc.)
+
Transfer output files back to the submit server
+
+
+
For testing purposes, you can replicate the behavior of a container job with the following command.
+First, start an interactive job.
+Then run this command but change my-container.sif and myExecutable.sh to the names of the .sif and .sh files that you are using:
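A minimal sketch of such a command, assuming standard `apptainer exec` options. The particular combination of options below is an approximation chosen for illustration and may differ from what HTCondor actually runs on the execute point; the file names are the ones used in this guide.

```shell
#!/bin/bash
# Approximate by hand how a container job runs the executable.
# Assumption: this option set is illustrative, not a record of the exact
# command HTCondor uses on CHTC.
container=my-container.sif
script=myExecutable.sh

cmd=(
    apptainer exec
    --containall
    --no-home
    --scratch /tmp
    --scratch /var/tmp
    --workdir "$PWD"
    --pwd "$PWD"
    --bind "$PWD"
    "$container"
    /bin/bash "$script"
)

if command -v apptainer >/dev/null 2>&1 && [ -f "$container" ]; then
    # Save stdout and stderr to files, as HTCondor does for output and error.
    "${cmd[@]}" 1> job.out 2> job.err
else
    # Without apptainer or the .sif file, only print the command.
    printf '%s ' "${cmd[@]}"
    echo
fi
```

Here --containall and --no-home keep the host environment and home directory out of the container, --bind and --pwd make the current directory the working directory inside it, and --scratch provides temporary /tmp and /var/tmp space.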
+ HPC System Transition to a New Linux Version (CentOS Stream 9)
+
+
+
Starting in May 2024, CHTC’s high performance computing (HPC) cluster began upgrading
+the Linux distribution and version we use on our servers to CentOS Stream 9 (EL9). This transition is expected to complete in June 2024.
All updates to the HPC Cluster will be reflected on this page; significant changes may
+also include a notification to the chtc-users mailing list.
+
+
Important Dates
+
+
+
May 31: Log in to upgraded cluster login node is available. Worker nodes start transitioning from the existing cluster to upgraded cluster partitions.
+
May 31 - June 17: Users should rebuild their code and test jobs on the upgraded cluster. Users should be running primarily on the upgraded cluster.
+
June 17: Most nodes will have been upgraded and transitioned.
+
June 24: The old cluster partitions are closed.
+
+
+
What is Changing
+
+
As part of this transition, there will be a new login node for
+the HPC cluster: spark-login.chtc.wisc.edu.
+
+
If you log into spark-login, you will have access to a new
+module stack, compiled on CentOS Stream 9, and the partitions available will
+have worker nodes that are running CentOS Stream 9.
+
+
The files in your /home and /scratch directories will be unchanged.
+
+
What You Need to Do
+
+
As soon as possible, do the following:
+
+
+
+
Log into the new login node spark-login.chtc.wisc.edu.
+
+
+
Run a typical job as a test. It is highly likely that certain codes will
+fail on the new worker nodes, as the underlying dependencies of your code, including
+the operating system, and any modules used, have changed.
+
+
+
If your jobs no longer run, archive your previous software installation(s) and
+rebuild your software. The Software Changes section below has
+more specific information about how to do this.
+
+
+
If you recompiled your code, run a few small test jobs to confirm that the
+code is working correctly.
+
+
+
+
If you are having trouble getting your jobs to run successfully on the new operating system,
+please contact the facilitation team at chtc@cs.wisc.edu or come to office hours.
+
+
Software Changes
+
+
Almost all packages and libraries have been upgraded as part of the operating system transition.
+Unless your code is fairly simple, you will likely need to recompile it.
+
+
Remember to always compile your code/programs in an (interactive) Slurm job! How To
+
+
+
Not only does this help avoid stressing the resources of the login server, but the upgraded login server uses a newer CPU architecture than the worker nodes in the cluster.
+Most compilers auto-detect the CPU architecture and adapt the compilation to use that architecture.
+Attempting to use such compiled code on a different/older CPU architecture can lead to “Illegal instruction” errors, among others.
+
+
+
Modules
+
+
Most of the modules on the upgraded cluster have been kept, but with upgraded versions.
+The following table is a comparison of the modules on the old operating system (EL8) versus the new operating system (EL9).
+(Adapted from the output of module avail on the respective servers.)
+
+
You will likely need to recompile your code to use the new module versions.
+Remember to also update any module load commands that specify a particular version of the module,
+otherwise you may encounter “module(s) are unknown” errors.
+
+
Module comparison
+
+
+
+
+
Module name                Old version (EL8)    New version (EL9)
+
+abaqus                     2018-hotfix-1904     TBD
+ansys                      2022r24              2024r1
+aocc                       3.2.0                4.2.0
+cmake                      3.27.7               3.27.9
+comsol                     6.0, 6.1, 6.2        6.2
+gcc                        11.3.0               13.2.0
+hdf5 (intel-oneapi-mpi)    1.12.2               dropped
+hdf5 (openmpi)             1.12.2               1.14.3
+intel-oneapi-compilers     2023.2.1             2024.1.0
+intel-oneapi-mkl           2023.2.0             2024.0.0
+intel-oneapi-mpi           2021.10.0            2021.12.1
+intel-tbb                  2021.9.0             deprecated
+lmstat.comsol              6.0                  TBD
+lumerical-fdtd             2022-r2.4            2024-R1.2
+matlab                     R2021b, R2022b       R2024a
+mvapich2                   2.3.7-1              deprecated
+mvapich                    n/a                  3.0
+netcdf-c                   4.8.1                4.9.2
+netcdf-cxx4                4.3.1                4.3.1
+netcdf-fortran             4.5.4                4.6.1
+openmpi (aocc)             4.1.3                dropped
+openmpi (gcc)              4.1.3                5.0.3
+patchelf (gcc)             0.17.2               0.17.2
+patchelf (intel)           0.18.0               dropped
+patchelf (oneapi)          0.18.0               0.17.2
+petsc                      3.18.1               3.21.1
+pmix                       n/a                  5.0.1
+
+
+
+
+
+
Different versions of module packages, or packages that are “dropped” or “deprecated”, may be manually installed by the user using Spack.
+
+
+
Spack
+
+
Spack is a package manager platform that allows users to install software without admin privileges.
+CHTC also uses Spack to install the software underlying the system-wide modules discussed above.
+
+
+
If you have not used Spack before, you can skip this section and go directly to the Set Up Spack on HPC guide.
+
+
+
Here is the general process for setting up your software on the upgraded EL9 system; detailed instructions are provided after the general process:
+
+
+
+
Identify the environments you currently have and which you want to reproduce on the upgraded system.
+
+
+
Remove your existing Spack folders.
+
+
+
Do a clean installation of Spack.
+
+
+
In an interactive job, create your Spack environment(s) and install the packages as you did previously.
+
+
+
Update your job submission scripts and/or recompile programs as needed to use the new Spack environment(s).
+
+
+
+
The following instructions assume that you previously installed Spack in your home (~/) directory for individual use.
+
+
1. Identify your environments
+
+
You can see your Spack environments with
+
+
spack env list
+
+
+
Activate an environment that you want to replicate with
+
+
spack env activate environment_name
+
+
+
Then list your package “specs” with the command
+
+
spack find
+
+
+
There is a section “==> Root specs” that lists the package specs you explicitly added when you created your environment.
+Save a copy of these specs somewhere safe, so that you can use them to replicate the environment later on.
+You can ignore the “installed packages” section, as that will certainly change on the new system.
+
+
Repeat the above steps for each environment you want to replicate on the upgraded system.
+
+
2. Remove your existing Spack folders
+
+
The easiest way to update Spack for the upgraded system is to remove the current Spack installation and reinstall from scratch.
+
+
+
Before proceeding, you may want to make a backup of each folder using
+
+
tar -czf folder_name.tar.gz ~/folder_name
+
+
+
+
For most users, the following commands should work:
+
+
cd ~/
+rm -rf spack spack_programs spack_modules .spack
+
+
+
The command may take a while to run.
+
+
3. Fresh install of Spack
+
+
Next, follow the instructions in our guide Set Up Spack on HPC to do a fresh installation of Spack.
+The commands in the guide have been updated for setting up Spack on the new operating system.
+
+
4. Recreate your environments
+
+
Follow the instructions in our guide Install Software Using Spack to create your desired environments
+using the “root specs” that you saved earlier.
+
+
NOTE: We’ve made a small but important change to this guide: you should always start an interactive Slurm job before creating or modifying a Spack environment.
+The login server uses different hardware than the execute servers, and the mismatch leads to Spack using the wrong settings for installing packages.
+Of course, as before, you should only install packages while in an interactive Slurm job.
+
+
Behind the scenes, we’ve made a few changes to the configuration that will hopefully make the package installation much smoother.
+
+
5. Update your workflow
+
+
Finally, remember to update your workflow to use the new Spack environments and the packages installed therein.
+
+
+
+
If you explicitly provide paths to packages installed using Spack, be sure to update those paths in your compiler configuration or in your job submission script.
+
+
+
If you used Spack to provide dependencies for manually compiling a program, remember to recompile the program.
+
+
+
If you changed the name of your environment, be sure to update the name in your job submission script.
The Center for High Throughput Computing’s High Performance Cluster is being
+replaced by a new High Performance System! All components of the system (execute
+nodes, network, shared file system) are new and we expect an improved experience for our HPC users.
+
+
ALL USERS OF THE EXISTING HPC SYSTEM WILL NEED TO MIGRATE TO THIS NEW CLUSTER.
+Importantly, access to the existing cluster will be phased out in early 2023.
+CHTC staff are here to assist with your transition.
+
+
Highlights
+
+
+
The existing HPC cluster is being replaced by a new cluster. After February 2023
+ALL users will lose access to the existing cluster, and all user files will be
+deleted.
+
Custom software will need to be reinstalled and jobs will need to be tested on
+the new cluster.
+
The univ2 partition is being renamed, and partition policies have changed.
+
Users should avoid using mpirun and instead should use srun to execute their
+MPI code.
+
+
Note: At this time, interactive jobs on the “Spark” HPC Cluster cannot be
+used to run MPI code.
+
+
+
File system conventions have changed - jobs will now use /scratch/$USER to run,
+and /home/$USER will be mainly used for software installations and reference
+files.
+
+
+
Important Dates
+
+
+
Mid January 2023: New cluster available for general use
+
February 28, 2023: Jobs will no longer run on the old cluster
+
March 15, 2023: Access to the hpclogin1.chtc.wisc.edu login node and old file
+system removed. Data for all users will be deleted on the old HPC system.
+
+
+
What You Need to Do
+
+
Move Data
+
+
Back up files from the old cluster to another system (e.g. your laptop), copy
+files you are actively working with to the new cluster, and delete all data off
+the old HPC system. All files in /home and /software will be deleted off the
+existing system starting March 15, 2023.
+
+
Log In and Quota Check
+
+
Confirm you can access the new cluster by logging into the new login node.
+
+
Prepare and Submit Test Jobs
+
+
After logging in, prepare and submit a few test jobs to confirm that your work
+will run, paying attention to these important changes:
+
+
+
Appropriate usage of /home and /scratch:
+
+
Jobs should be run out of /scratch/$USER. Your scratch directory has a quota of 100GB disk space and 250,000 items.
+
Only use your /home directory for software installations and general job files and templates.
+Your /home directory has a quota of 20GB disk space and 250,000 items.
+
The /software directory is being phased out.
+
+
+
+
Build software with new modules: users will need to reinstall and/or rebuild
+their software on the new HPC cluster. Users may encounter different versions of
+common tools on the new cluster, so it is important to try installing your
+software early to ensure compatibility. If a software package or library that is
+necessary for your installation is not available, contact CHTC staff (see
+our get help page).
+
+
+
Change MPI execution: Our systems administrators now recommend using srun
+with the --mpi=pmix flag instead of mpirun or mpiexec to execute MPI type code. It
+should look like this:
+ srun --mpi=pmix mpi_program
+
+
Change #SBATCH options: The new cluster has different partition names and
+different sized nodes. The main general use partition is now called shared
+instead of univ2 or univ3. We also recommend the following changes because
+most of our nodes now have 128 cores, so requesting multiple nodes is not
+advantageous if your jobs are smaller than 128 cores. We also now recommend requesting
+memory per core instead of memory per node, for similar reasons, using the --mem-per-cpu
+flag with units of MB. Here are our recommendations for different sized jobs:
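As one illustration of how these options fit together, the sketch below writes a batch script for a single-node MPI job and checks it for syntax errors. The task count, memory value, time limit, and the names mpi_job.sh and mpi_program are hypothetical placeholders, not CHTC recommendations.

```shell
# Write an example Slurm batch script for a job smaller than one 128-core node.
# Assumption: all numeric values below are placeholders for illustration.
cat > mpi_job.sh <<'EOF'
#!/bin/bash
#SBATCH --partition=shared      # main general use partition on the new cluster
#SBATCH --nodes=1               # a job under 128 cores fits on a single node
#SBATCH --ntasks=32             # hypothetical number of MPI tasks
#SBATCH --mem-per-cpu=4000      # memory per core, in MB (hypothetical value)
#SBATCH --time=01:00:00         # hypothetical time limit

# Launch the MPI program with srun rather than mpirun or mpiexec.
srun --mpi=pmix mpi_program
EOF

# Check the script for shell syntax errors without running it.
bash -n mpi_job.sh && echo "mpi_job.sh: syntax OK"
```

The script would then be submitted from the login node with `sbatch mpi_job.sh`.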
The new cluster nodes have very fast local disk space on each node. If your code
+is able to use local space for certain parts of its calculations or is able to
+sync data between local spaces, it may be advantageous to use this disk to speed
+up your jobs. It is located at the following path on each node:
+
+
/local/$USER
+
+
+
New Cluster Specifications
+
+
Execute Nodes
+
+
We have 40 general use execute nodes, representing 5,120 cores of capacity.
+Server specs (Dell Poweredge R6525):
+While our High Throughput system has little in the way of pre-installed software,
+we've created resources to help users set up the software they want to use for running their jobs.
+
+
+{% endcapture %}
+{% include /components/directory.html title="Table of Contents" %}
+
+
Quickstart
+
+
+Click the button that corresponds to the language/program/software that you want to use.
+More information is provided in the Recipes repository and Containers sections.
+
+
+
+
+
+
+
+
+
CHTC Recipes Repository
+
+
+CHTC provides examples for software and workflows for use on our systems in our "Recipes" repository on Github:
+https://github.com/CHTC/recipes.
+
+
+
Containers
+
+
+Many of the recipes in our Recipes repository involve building your own container.
+In this section, we provide a brief introduction into how to use containers for setting up your own software to run on the High Throughput system.
+
+
+
\ No newline at end of file
diff --git a/preview-calendar/uw-research-computing/archived/tensorflow-singularity-wait.html b/preview-calendar/uw-research-computing/archived/tensorflow-singularity-wait.html
new file mode 100644
index 000000000..4f3a5c619
--- /dev/null
+++ b/preview-calendar/uw-research-computing/archived/tensorflow-singularity-wait.html
@@ -0,0 +1,483 @@
+
+
+
+
+
+
+Running Tensorflow Jobs
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
This guide describes how to use a pre-built Tensorflow environment
+(implemented as a Singularity container)
+to run Tensorflow jobs in CHTC and on the OS Pool.
+
+
Overview
+
+
Typically, software in CHTC jobs is installed or compiled locally by
+individual users and then brought along to each job, either using the
+default file transfer or our SQUID web server. However, another option
+is to use a container system, where the software is installed in a
+container image. CHTC (and the OS Pool) have capabilities to access and
+start containers and run jobs inside them. One container option
+available in CHTC is Docker; another is
+Singularity.
+
+
In CHTC, our Singularity support consists of running jobs inside a
+pre-made Singularity container with an installation of Tensorflow. This
+Singularity set up is very flexible: it is accessible both in CHTC and
+on the OS Pool, and can be used to run Tensorflow either with
+CPUs or GPUs. This guide starts with a basic CPU example, but then goes
+on to describe how to use the Singularity Tensorflow container for GPUs,
+and also how to run on the OS Pool.
The submit file for jobs that use the Tensorflow singularity container
+will look similar to other CHTC jobs, except for the additional
+Singularity options seen below.
+
+
Submit File
+
+
# Typical submit file options
+universe = vanilla
+log = $(Cluster).$(Process).log
+error = $(Cluster).$(Process).err
+output = $(Cluster).$(Process).out
+
+# Fill in with your own script, arguments and input files
+# Note that you don't need to transfer any software
+executable = run_tensorflow.sh
+arguments =
+transfer_input_files =
+
+# Singularity settings
++SingularityImage = "/cvmfs/singularity.opensciencegrid.org/opensciencegrid/tensorflow:latest"
+Requirements = HAS_SINGULARITY == True
+
+# Resource requirements
+request_cpus = 1
+request_memory = 2GB
+request_disk = 4GB
+
+# Number of jobs
+queue 1
+
+
+
Sample Executable (Wrapper Script)
+
+
Your job will be running inside a container that has Tensorflow
+installed, so there should be no need to set any environment variables.
+
+
#!/bin/bash
+
+# your own code here
+python test.py
+
+
+
+
+
2. CPUs vs GPUs
+
+
The submit file above uses a CPU-enabled version of Tensorflow. In order
+to take advantage of GPUs, make the following changes to the submit file
+above:
+
+
+
+
Request GPUs in addition to CPUs:
+
+
request_gpus = 1
+
+
+
+
Change the Singularity image to tensorflow with GPUs:
For more information about GPUs and how GPU jobs work in CHTC, see our
+GPU Jobs guide.
+
+
+
Limited GPU availability in CHTC
+This Singularity/Tensorflow functionality is not yet available on
+CHTC's newer GPUs with a sufficiently high CUDA Capability.
+Therefore, for now, the best way to use this Singularity/Tensorflow
+environment with GPUs is by running jobs on the OS Pool (see
+below). We are working on having Singularity support on all CHTC GPUs
+soon.
+
+
+
+
+
3. Running on OS Pool
+
+
This Tensorflow environment can also be run on the OS Pool either as the CPU or GPU version.
The following commands will allow you to monitor the amount of disk
+space you are using in your home directory on our (or another) submit node and to determine the
+amount of disk space you have been allotted (your quota).
+
+
If you also have a /staging directory on the HTC system, see our
+staging guide for
+details on how to check your quota and usage.
+
+The default quota allotment on CHTC submit nodes is 20 GB with a hard
+limit of 30 GB (at which point you cannot write more files).
+
+Note: The CHTC submit nodes are not backed up, so you will want to
+copy completed jobs to a secure location as soon as a batch completes,
+and then delete them on the submit node in order to make room for future
+jobs. If you need more disk space to run a single batch or concurrent
+batches of jobs, please contact us (Get Help!). We have multiple ways of dealing with large disk space
+requirements to make things easier for you.
From any directory location within your home directory, type
+quota -vs. See the example below:
+
+
[alice@submit]$ quota -vs
+Disk quotas for user alice (uid 20384):
+ Filesystem space quota limit grace files quota limit grace
+ /dev/sdb1 12690M 20480M 30720M 161k 0 0
+
+
+
The output will list your total data usage under space (labeled blocks
+when quota is run without the -s flag), your soft quota, and your hard
+limit, at which point your jobs will no longer be allowed to save data.
+With the -s flag, as in the example above, values are shown in
+human-readable units such as M for megabytes. Without it, each value is
+given in 1-kilobyte blocks, so you can divide each number by 1024 to get
+megabytes (MB), and again for gigabytes (GB). (It also lists information
+for files, but we don't typically allocate disk space by file count.)
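The conversion from 1-kilobyte blocks can be done with shell arithmetic. A minimal sketch, using an illustrative value (the 20 GB default quota expressed in kilobytes) rather than real quota output:

```shell
# Convert a value reported in 1-kilobyte blocks to MB and GB.
# 20971520 is an illustrative value: the 20 GB default quota expressed in KB.
quota_kb=20971520
quota_mb=$(( quota_kb / 1024 ))
quota_gb=$(( quota_mb / 1024 ))
echo "${quota_kb} KB = ${quota_mb} MB = ${quota_gb} GB"
```

Shell arithmetic uses integer division, so values that are not exact multiples of 1024 are rounded down.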
+
+
2. Checking the Size of Directories and Contents
+
+
Move to the directory you'd like to check and type du . After several
+moments (longer if your directory contents are large), the command
+will add up the sizes of directory contents and output the total size of
+each contained directory in units of kilobytes with the total size of
+that directory listed last. See the example below:
As for quota usage above, you can divide each value by 1024 to get
+megabytes, and again for gigabytes.
+
+
Using du with the -h or --human-readable flag will display the
+same values with only two significant digits and a K, M, or G to denote
+the byte units. The -s or --summarize flag will total up the size
+of the current directory without listing the size of directory contents.
+You can also specify which directory you'd like to query, without
+moving to it, by adding the relative filepath after the flags. See the
+below example from the home directory which contains the directory
+dir:
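To compare the du flags safely, the sketch below builds a small throwaway directory tree and runs du on it three ways. The directory and file names (dir, subdir, data.bin) are hypothetical, and the exact sizes printed depend on the file system.

```shell
# Build a small throwaway directory tree so the du flags can be compared.
# All names here (dir, subdir, data.bin) are hypothetical.
workdir=$(mktemp -d)
cd "$workdir"
mkdir -p dir/subdir
head -c 1048576 /dev/zero > dir/data.bin          # 1 MiB file
head -c 2097152 /dev/zero > dir/subdir/data.bin   # 2 MiB file

du dir        # size of each contained directory in KB, total for dir last
du -h dir     # same listing with K, M, or G units
du -sh dir    # a single summarized total for dir
```

The first two commands print one line per directory (dir/subdir, then dir), while the last prints only the total for dir.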
Checkpointing is a technique that provides fault tolerance for a user’s analysis. It consists of saving snapshots of a job’s progress so the job can be restarted without losing its progress and having to restart from the beginning. We highly encourage checkpointing as a solution for jobs that will exceed the 72 hour maximum default runtime on the HTC system.
+
+
This section is about jobs capable of periodically saving checkpoint information, and how to make HTCondor store that information safely, in case it’s needed to continue the job on another machine or at a later time.
+
+
There are two types of checkpointing: exit driven and eviction driven. In a vast majority of cases, exit driven checkpointing is preferred over eviction driven checkpointing. Therefore, this guide will focus on how to utilize exit driven checkpointing for your analysis.
+
+
+Note that not all software, programs, or code are capable of creating checkpoint files and knowing how to resume from them. Consult the manual for your software or program to determine if it supports checkpointing features. Some manuals will refer to this ability as “checkpoint” features, as the ability to “resume” mid-analysis if a job is interrupted, or as “checkpoint/restart” capabilities. Contact a Research Computing Facilitator if you would like help determining if your software, program, or code is able to checkpoint.
+
+
Why Checkpoint?
+
+
Checkpointing allows a job to automatically resume from approximately where it left off instead of having to start over if interrupted. This behavior is advantageous for jobs limited by a maximum runtime policy (72 hours on the HTC system). It is also advantageous for jobs submitted to backfill resources with no runtime guarantee (e.g. for +WantFlocking or +WantGliding jobs) where the compute resources may also be more prone to hardware or networking failures.
+
+
For example, checkpointing jobs that are limited by a runtime policy can enable HTCondor to exit a job and automatically requeue it to avoid hitting the maximum runtime limit. By using checkpointing, jobs circumvent hitting the maximum runtime limit and can run for extended periods of time until the completion of the analysis. This behavior avoids costly setbacks that may be caused by losing results mid-way through an analysis due to hitting a runtime limit.
+
+
Process of Exit Driven Checkpointing
+
+
Using exit driven checkpointing, a job is specified to time out after a user-specified amount of time with an exit code value of 85 (more on this below). Upon hitting this time limit, HTCondor transfers any checkpoint files listed in the submit file attribute transfer_checkpoint_files to a directory called /spool. This directory acts as a storage location for these files in case the job is interrupted. HTCondor then knows that jobs with exit code 85 should be automatically requeued, and will transfer the checkpoint files in /spool to your job’s working directory prior to restarting your executable.
+
+
+The process of exit driven checkpointing relies heavily on the use of exit codes to determine the next appropriate steps for HTCondor to take with a job. In general, exit codes are used to report how a program ended, such as whether an analysis completed successfully or encountered an error. HTCondor recognizes jobs that exit with code 85 as checkpointing jobs and therefore knows to handle these jobs differently than non-checkpointing jobs.
+
+
Requirements for Exit Driven Checkpointing
+
+
Requirements for your code or software:
+
+
+
Checkpoint: The software, program, or code you are using must be able to generate checkpoint files (i.e. snapshots of the progress made thus far) and know how to resume from them.
+
Resume: This means your code must be able to recognize checkpoint files and know to resume from them instead of the original input data when the code is restarted.
+
Exit: Jobs should exit with an exit code value of 85 after successfully creating checkpoint files. Additionally, jobs need to be able to exit with a non-85 value if they encounter an error or finish writing the final outputs.
+
+
+
In some cases, these requirements can be achieved by using a wrapper script. This means that your executable may be a script, rather than the code that is writing the checkpoint. An example wrapper script that enables some of these behaviors is below.
+
+
Contact a Research Computing Facilitator for help determining if your job is capable of using checkpointing.
+
+
Changes to the Submit File
+
+
Several modifications to the submit file are needed to enable HTCondor’s checkpointing feature.
+
+
+
The line checkpoint_exit_code = 85 must be added. HTCondor recognizes code 85 as a checkpoint job. This means HTCondor knows to end a job with this code but then to requeue it repeatedly until the analysis completes.
+
The value of when_to_transfer_output should be set to ON_EXIT.
+
The name of the checkpoint files or directories to be transferred to /spool should be specified using transfer_checkpoint_files.
+
+
+
Optional
+In some cases, it is necessary to write a wrapper script to tell a job when to time out and exit. In cases such as this, the executable will need to be changed to the name of that wrapper script. An example of a wrapper script that enables a job to checkpoint and exit with the proper exit codes can be found below.
+
+
An example submit file for an exit driven checkpointing job looks like:
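A minimal sketch of such a submit file, written out from the shell, is shown here. The file names (checkpoint_example.sub, checkpoint_wrapper.sh, checkpoint.txt, job.log, job.out, job.err) and the resource requests are placeholders chosen for illustration. The three checkpoint-related settings come from the list above, and should_transfer_files = YES is included as an assumption.

```bash
# Write a sketch of a submit file for an exit driven checkpointing job.
# All file names and resource requests below are hypothetical placeholders.
cat > checkpoint_example.sub <<'EOF'
executable = checkpoint_wrapper.sh

# Exit code 85 marks a checkpoint: save the files below, then requeue the job.
checkpoint_exit_code = 85
transfer_checkpoint_files = checkpoint.txt
should_transfer_files = YES
when_to_transfer_output = ON_EXIT

log = job.log
output = job.out
error = job.err

request_cpus = 1
request_memory = 1GB
request_disk = 1GB

queue
EOF
```

The resulting file would be submitted with condor_submit checkpoint_example.sub after replacing the placeholders with the names used by your own job.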
As previously described, it may be necessary to use a wrapper script to tell your job when and how to exit as it checkpoints. An example of a wrapper script that tells a job to exit every 4 hours looks like:
Let’s take a moment to understand what each section of this wrapper script is doing:
+
+
#!/bin/bash
+
+timeout 4h do_science argument1 argument2
+# The `timeout` command will stop the job after 4 hours (4h).
+# This number can be increased or decreased depending on how frequently your code/software/program
+# is creating checkpoint files and how long it takes to create/resume from these files.
+# Replace `do_science argument1 argument2` with the execution command and arguments for your job.
+
+timeout_exit_status=$?
+# Uses the bash notation of `$?` to call the exit value of the last executed command
+# and to save it in a variable called `timeout_exit_status`.
+
+
+
+if [ $timeout_exit_status -eq 124 ]; then
+ exit 85
+fi
+
+exit $timeout_exit_status
+
+# The `timeout` command returns exit code `124` when it stops a program that is still
+# running at the time limit. The portion above replaces exit code `124` with code `85`.
+# HTCondor recognizes code `85` as a checkpoint. Upon exiting, HTCondor saves the
+# checkpoint files from jobs with exit code `85` in the temporary directory within
+# `/spool`. Once the files have been transferred, HTCondor automatically requeues that
+# job and fetches the files found in `/spool`. If an exit code of `124` is not observed
+# (for example if the program is done running or has encountered an error), HTCondor
+# will end the job and will not automatically requeue it.
+
+
+
+
The ideal timeout frequency for a job is every 1-5 hours with a maximum of 10 hours. For jobs that checkpoint and time out in under an hour, it is possible that a job may spend more time with checkpointing procedures than moving forward with the analysis. Jobs that run for more than 10 hours between checkpoints are less able to take advantage of running outside of CHTC on other campus resources or on the OSPool.
+
+
Checking the Progress of Checkpointing Jobs
+
+
Always test a single checkpointing job before scaling up to identify odd or unintentional behaviors in your analysis.
+
+
To determine if your job is successfully creating and saving checkpoint files, you can investigate checkpoint files once they have been transferred to /spool.
+
+
You can explore the checkpointed files in /spool by navigating to /var/lib/condor/spool. The directories in this folder are the last four digits of a job’s cluster ID with leading zeros removed. Sub folders are labeled with the process ID for each job. For example, to investigate the checkpoint files for 17870068.220, the files in /spool would be found in folder 68 in a subdirectory called 220.
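The directory rule described above can be sketched as a small shell function. The function name spool_dir_for_job is hypothetical, and the path layout is taken directly from the description in this guide.

```bash
# Print the /spool directory for a job ID of the form ClusterId.ProcId,
# following the layout described above.
spool_dir_for_job() {
    local job_id="$1"
    local cluster_id="${job_id%%.*}"   # text before the first "."
    local proc_id="${job_id#*.}"       # text after the first "."
    # Last four digits of the cluster ID; the arithmetic drops leading zeros.
    local cluster_dir=$(( 10#$cluster_id % 10000 ))
    echo "/var/lib/condor/spool/${cluster_dir}/${proc_id}"
}

spool_dir_for_job 17870068.220
# prints /var/lib/condor/spool/68/220
```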
+
+
It is also possible to intentionally evict a running job and have it rematch to an execute server to test if your code is successfully resuming from checkpoint files or not. To test this, use condor_vacate_job <JobID>. This command will evict your job intentionally and have it return to “Idle” state in the queue. This job will begin running once it rematches to an execute server, allowing you to test if your job is correctly resuming from checkpoint files or incorrectly starting over with the analysis.
+
+
More Information
+
+
More information on checkpointing HTCondor jobs can be found in HTCondor’s manual: https://htcondor.readthedocs.io/en/latest/users-manual/self-checkpointing-applications.html This documentation contains additional features available to checkpointing jobs, as well as additional examples such as a python checkpointing job.
This page outlines CHTC services offered to UW - Madison affiliates and general guidelines for their use. Users with existing accounts should also refer to our User Expectations pages for more specific limits and guidelines for using CHTC services. To apply for a CHTC account, fill out this form: Getting Started
+
+
Overview
+
+
CHTC’s Research Computing activities aim to empower the research and teaching mission of the University of Wisconsin - Madison by providing access to scalable computing and data capacity.
+
+
Access to standard CHTC services is free of charge.
+
+
Who can use CHTC services?
+
+
Access to CHTC services is available to:
+
+
+
Current UW Madison affiliates (faculty, students, staff, post-docs)
+
Current UW System affiliates
+
Collaborators of UW Madison affiliates, where collaborator access benefits the work of the UW - Madison affiliate, e.g. recently graduated students, collaborators on multi-institution grants
+
+
+
Computing
+
+
CHTC operates two large-scale computing systems.
+
+
High Throughput Computing
+
+
+
Roughly 15k CPU cores, 100+ GPUs
+
Additional capacity available via campus HTC systems and the national OSPool.
+
Single user: 10s - 1000s of tasks (jobs) running at once
+
Scheduled using HTCondor
+
+
+
High Performance Computing (SPARK)
+
+
About 8k CPU cores
+
Infiniband networking for multi-node capability
+
Single user: up to 10 jobs running or 720 cores in use
+
Jobs are scheduled using SLURM job scheduling software
+
+
+
CHTC computing capacity is allocated via a fair-share scheduling algorithm. For groups that require additional or dedicated computing capacity, there is the option to purchase hardware. See our description of buy-in options here: CHTC Buy-In Overview
+
+
Data
+
+
CHTC provides space for data, software and other files that are being used for active computational work. Our file systems have no backup or other redundancy and should not be used as a storage solution. Researchers are assigned a default space quota when their account is created that can be increased upon request. For needs greater than 2TB or for longer-term projects, contact us to sign a data use memorandum of understanding.
+
+
Software
+
+
CHTC systems support software that runs on Linux. Whether or not licensed software
+can be run on CHTC depends significantly on the type of license.
+
+
Both CHTC computing systems support containers.
+
+
Citing CHTC
+
+
In order to track our scientific impact we ask that users cite a DOI in all publications that have benefited from our services. See Citing CHTC for more details.
(Feel free to modify the below text, use only certain paragraphs, or contact us for more input or customizable letters of support.)
+
+
The University of Wisconsin-Madison (UW-Madison) campus is an excellent match for meeting the computational needs of this project. Existing UW-Madison technology infrastructure supported by the CHTC can be readily leveraged, including CPU capacity, network connectivity, storage availability, and middleware connectivity. The UW-Madison has invested in the CHTC as the primary provider of shared computing resources to campus researchers. All standard CHTC services are provided free-of-charge to UW-Madison researchers, their projects, and collaborators. But perhaps most important, the UW-Madison has significant staff experience and core competency in deploying, managing, and using computational technology.
+
+
The CHTC is home to over 20 full-time staff with a proven track record of making compute middleware work for scientists. Far beyond just being familiar with the deployment and use of such software, UW staff has been intimately involved in its design and implementation. Dedicated Research Computing Facilitators are available to provide training to all CHTC users and are available to consult on computational practices for achieving the best scientific throughput. As always, CHTC will be happy to provide consulting to ensure optimal use of its facilities, and development of robust, reproducible methods for scalable computing.
+
+
The UW-Madison maintains multiple compute clusters (including the largest of these operated by CHTC) across campus that are managed using either HTCondor or SLURM with support from CHTC. These clusters are connected by HTCondor technology to share resources with each other and with other institutions around the world via OSG services. Local computing capacity directly enabled by CHTC includes:
+
+
+
+
High-Throughput Computing (HTC) resources totaling about 30,000 CPU cores in support of research. Temporary file space for large individual files can support up to hundreds of terabytes of total working data. For single computing runs needing significant memory on a single server, the CHTC maintains several multi-core servers with terabytes of memory.
+
+
+
When on-campus resources are fully utilized, CHTC leverages OSG services to provision additional opportunistic resources from multiple external sites.
+
+
+
+
A High-Performance Computing (HPC) cluster consisting of roughly 7,000 tightly coupled cores. Compute nodes have 64 or 128 cores each, and 512 GB RAM, and are networked with 200 Gbps Infiniband, with access to a shared file system and resources managed via Slurm.
+
+
+
An origin server where users can make research data available through the Open Science Data Federation and an on-campus cache server which allows external jobs to cache files locally.
+
+
+
+
In the last year, CHTC made possible the use of more than 40,000 core years of computing work for campus researchers, supporting over 300 projects across a wide range of research domains. Temporary storage space for large files can support up to hundreds of terabytes of total working data. Should these resources not be sufficient for the project, the CHTC can also engage computing resources from across the campus grid and the OSPool, an NSF-supported and expanding alliance of more than 100 universities, national laboratories, scientific collaborations, and software developers.
+
+
The UW–Madison network currently comprises a 200Gbps backbone and WAN connectivity with 160Gbps to the Discovery building. The equipment is located on the “Research Backbone Network”, which allows for friction-free access (e.g., no middlebox devices such as firewalls on the data path) to the nation’s research and education networks. Redundancy is built into the network and its supporting infrastructure. An equitable funding model assures that network resources are kept current. The UW has been fundamental to the establishment of the Broadband Optical Research Education And Science network (BOREAS). This Regional Optical Network (RON) connects to the CIC OmniPoP in Chicago, providing a high-speed gateway to various research networks, including Internet2, ESNet, CERN, and other global research networks. BOREAS, along with our participation in the Northern Tier Network Consortium, provides various options to connect at very high speeds to research partners with shared or dedicated bandwidth.
This approach may be sensitive to the operating system of the execution point.
+We recommend building a container instead, but are keeping these instructions as a backup.
+
+
+
+
+
More information
+
+
The above instructions are intended for cases where you have package(s) that need to be installed using conda install.
+Miniconda can be used to install Python and R and corresponding packages.
+But if you only need to install Python or R, and do not otherwise need to use a conda install command to set up the packages,
+you should see the instructions specifically for setting up Python or R because there is less chance of obscure errors when building your container.
+
+
When building or using a Miniconda container, you do not need to create or activate a conda environment.
+For the build process, you skip directly to the conda install commands you want to run.
+Similarly, when executing a script in a Miniconda container, the packages are loaded when the container starts.
+
+
Executable
+
+
If you are planning to execute a python .py script using your Miniconda container, you can follow the instructions in the Python guide.
+
+
If you are planning to execute a .R script using your Miniconda container, you can follow the instructions in the R guide.
+
+
Otherwise, you can use a bash .sh script as the submit file executable:
+
+
#!/bin/bash
+
+<your commands go here>
+
+
+
where the contents of the file are the commands that you want to execute using your conda environment.
+You do not and should not try to activate the conda environment in the executable if you are using a container.
+
+
Specifying Exact Dependency Versions
+
+
An important part of improving reproducibility and consistency between runs
+is to ensure that you use the correct/expected versions of your dependencies.
+
+
When you run a command like conda install numpy, conda tries to install
+the most recent version of numpy. For example, numpy version 1.18.2
+was released on March 17, 2020. To install exactly this version of numpy, you
+would run conda install numpy=1.18.2
+(the same works for pip, if you replace = with ==). We
+recommend installing with an explicit version to make sure you have exactly
+the version of a package that you want. This is often called
+“pinning” or “locking” the version of the package.
+
+
If you want a record of what is installed in your environment, or want to
+reproduce your environment on another computer, conda can create a file, usually
+called environment.yml, that describes the exact versions of all of the
+packages you have installed in an environment.
+This file can be re-used by a different conda command to recreate that
+exact environment on another computer.
+
+
To create an environment.yml file from your currently-activated environment, run conda env export and redirect its output to a file (e.g., conda env export > environment.yml).
This environment.yml will pin the exact version of every dependency in your
+environment. This can sometimes be problematic if you are moving between
+platforms because a package version may not be available on some other platform,
+causing an “unsatisfiable dependency” or “inconsistent environment” error.
+A much less strict pinning is conda env export --from-history,
which only lists packages that you installed manually, and does not pin their
+versions unless you yourself pinned them during installation.
+If you need an intermediate solution, it is also possible to manually edit
+environment.yml files; see the
+conda environment documentation
+for more details about the format and what is possible.
+In general, exact environment specifications are simply not guaranteed to be
+transferable between platforms (e.g., between Windows and Linux).
+We strongly recommend using the strictest possible pinning available to you.
+
+
To create an environment from an environment.yml file, run conda env create with the -f option (e.g., conda env create -f environment.yml).
By default, the name of the environment will be whatever the name of the source
+environment was; you can change the name by adding a -n <name> option to the
+conda env create command.
+
+
If you use a source control system like git, we recommend checking your
+environment.yml file into source control and making sure to recreate it
+when you make changes to your environment.
+Putting your environment under source control gives you a way to track how it
+changes along with your own code.
+
+
If you are developing software on your local computer for eventual use on
+the CHTC pool, your workflow might look like this:
+
+
Set up a conda environment for local development and install packages as desired
+(e.g., conda create -n science; conda activate science; conda install numpy).
+
Once you are ready to run on the CHTC pool, create an environment.yml file
+from your local environment (e.g., conda env export > environment.yml).
+
Move your environment.yml file from your local computer to the submit machine
+and create an environment from it (e.g., conda env create -f environment.yml),
+then pack it for use in your jobs, as per
+Create Software Package.
+
+
+
More information on conda environments can be found in
+their documentation.
+
+
+
+
Option B: Create your own portable copy
+
+
1. Create a Miniconda installation
+
+
On the submit server,
+download the latest Linux miniconda installer and run it.
+
+
[alice@submit]$ wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
+[alice@submit]$ sh Miniconda3-latest-Linux-x86_64.sh
+
+
+
Accept the license agreement and default options. At the end, you can choose whether or
+not to “initialize Miniconda3 by running conda init?”
+We recommend that you enter “yes”.
+Once you’ve completed the installer, you’ll be prompted to restart your terminal.
+Log out and log back in, and conda will be ready to use to set up your software.
+
+
+
If you choose “no” you’ll want to save the eval command shown by the installer so that you can reactivate the
+Miniconda installation when needed in the future.
+
+
+
2. Create a conda “environment” with your software
+
+
+
(If you are using an environment.yml file as described
+later, you should instead create
+the environment from your environment.yml file. If you don’t have an
+environment.yml file to work with, follow the install instructions in this
+section. We recommend switching to the environment.yml method of creating
+environments once you understand the “manual” method presented here.)
+
+
+
Make sure that you’ve activated the base Miniconda environment if you haven’t
+already. Your prompt should look like this:
+
+
(base)[alice@submit]$
+
+
+
To create an environment, use the conda create command and then activate the
+environment (e.g., conda create -n env-name followed by conda activate env-name).
Then, run the conda install command to install the different packages and
+software you want to include in the installation. How this should look is often
+listed in the installation examples for software
+(e.g. Qiime2,
+Pytorch).
Some Conda packages are only available via specific Conda channels
+which serve as repositories for hosting and managing packages. If Conda is
+unable to locate the requested packages using the example above, you may
+need to have Conda search other channels. More details are available at
+https://docs.conda.io/projects/conda/en/latest/user-guide/concepts/channels.html.
+
+
Packages may also be installed via pip, but you should only do this
+when there is no conda package available.
+
+
Once everything is installed, deactivate the environment to go back to the
+Miniconda “base” environment.
+
+
(env-name)[alice@submit]$ conda deactivate
+
+
+
For example, if you wanted to create an installation with pandas and
+matplotlib and call the environment py-data-sci, you would use this sequence
+of commands:
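Following the steps above, that sequence might look like the sketch below. It is wrapped in a shell function, build_py_data_sci_env (a name chosen here for illustration), and the -y flag is added as an assumption so that conda does not stop to ask for confirmation.

```bash
# Create the py-data-sci environment with pandas and matplotlib,
# then return to the Miniconda "base" environment.
build_py_data_sci_env() {
    conda create -y -n py-data-sci
    conda activate py-data-sci
    conda install -y pandas matplotlib
    conda deactivate
}
```

Run build_py_data_sci_env from a shell where the base Miniconda environment is already active.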
Finally, use conda pack to create a zipped tar.gz file of your environment
+(substitute the name of your conda environment where you see env-name),
+set the proper permissions for this file using chmod, and check the size of
+the final tarball:
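A sketch of this step is shown below, assuming the conda-pack tool is already installed in the base environment. The function name pack_conda_env is hypothetical, and the permission mode 644 is an assumption about what "proper permissions" means here.

```bash
# Pack a conda environment into <env-name>.tar.gz in the current directory,
# make the file readable, and report its size.
pack_conda_env() {
    local env_name="$1"
    conda pack -n "$env_name"
    chmod 644 "${env_name}.tar.gz"
    ls -sh "${env_name}.tar.gz"
}
```

For the placeholder name used in this guide, the call would be pack_conda_env env-name.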
When this step finishes, you should see a file in your current directory named
+env-name.tar.gz
+
+
4. Check Size of Conda Environment Tar Archive
+
+
The tar archive, env-name.tar.gz, created in the previous step will be used as input for
+subsequent job submission. As with all job input files, you should check the size of this
+Conda environment file. If it is larger than 100MB, you should NOT transfer the tarball using
+transfer_input_files. Instead, you should plan to use either CHTC’s web proxy (SQUID) or the
+large data filesystem (Staging). Please contact a Research Computing Facilitator at
+chtc@cs.wisc.edu to determine the best option for your jobs.
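The size check can be sketched as a small shell function. The function name exceeds_transfer_limit is hypothetical, and reading "100MB" as 100 * 1024 * 1024 bytes is an assumption.

```bash
# Succeeds (exit status 0) when the file is larger than 100MB.
# Assumption: "100MB" is taken to mean 100 * 1024 * 1024 bytes.
exceeds_transfer_limit() {
    local file="$1"
    local limit_bytes=$(( 100 * 1024 * 1024 ))
    local size_bytes=$(( $(wc -c < "$file") ))
    [ "$size_bytes" -gt "$limit_bytes" ]
}

# Hypothetical usage with the tarball name from the previous step.
if [ -f env-name.tar.gz ]; then
    if exceeds_transfer_limit env-name.tar.gz; then
        echo "env-name.tar.gz is over 100MB: ask about SQUID or Staging"
    else
        echo "env-name.tar.gz can be listed in transfer_input_files"
    fi
fi
```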
The job will need to go through a few steps to use this “packed” conda environment;
+first, setting the PATH, then unzipping the environment, then activating it,
+and finally running whatever program you like. The script below is an example
+of what is needed (customize as indicated to match your choices above).
+
+
#!/bin/bash
+
+# have job exit if any command returns with non-zero exit status (aka failure)
+set -e
+
+# replace env-name on the right hand side of this line with the name of your conda environment
+ENVNAME=env-name
+# if you need the environment directory to be named something other than the environment name, change this line
+export ENVDIR=$ENVNAME
+
+# these lines handle setting up the environment; you shouldn't have to modify them
+export PATH
+mkdir $ENVDIR
+tar -xzf $ENVNAME.tar.gz -C $ENVDIR
+. $ENVDIR/bin/activate
+
+# modify this line to run your desired Python script and any other work you need to do
+python3 hello.py
+
+
+
6. Submit Jobs
+
+
In your submit file, make sure to have the following:
+
+
+
Your executable should be the bash script you created in step 5.
+
Remember to transfer your Python script and the environment tar.gz file via
+ transfer_input_files.
+Since the tar.gz file will almost certainly be larger than 100MB,
+please email us about different tools for
+delivering the installation to your jobs,
+likely our SQUID web proxy.
The condor_q command can be used for much more than just
+checking on whether your jobs are running or not! Read on to learn how
+you can use condor_q to answer many common questions about running
+jobs.
+
+
Summary
+
+
+
condor_q: Show my jobs that have been submitted on this server.
+Useful options:
+
+
-nobatch: Starting in HTCondor version 8.6.0, installed in July
+2016, data is displayed in a compact mode (one line per
+cluster). With this option output will be displayed in the old
+format (one line per process)
+
-all: Show all the jobs submitted on the submit server.
+
-hold: Show only jobs in the "on hold" state and the reason
+for that. Held jobs are those that got an error so they could
+not finish. An action from the user is expected to solve the
+problem.
+
+-better-analyze JobId: Analyze a specific
+job and show the reason why it is in its current state.
+
-run: Show your running jobs and related info, like how much
+time they have been running, in which machine, etc.
+
-dag: Organize condor_q output by DAG.
+
-long JobId: Show all information related to that job.
+
-af Attr1 Attr2 ...: List specific attributes of jobs, using
+autoformat.
+
+
+
+
+
Examples and Further Explanation
+
+
+
+
1. Default condor_q output
+
+
As of July 19, 2016, the default condor_q output will show a single
+user's jobs, grouped in "batches", as shown below:
HTCondor will automatically group jobs into "batches" for this
+display. However, it's also possible for you to specify groups of jobs
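+as a "batch" yourself. You can either add the line batch_name = "CoolJobs" to your
+submit file, or use the -batch-name CoolJobs option when running condor_submit.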
Either option will create a batch of jobs with the label "CoolJobs".
+
+
+
+
2. View all jobs.
+
+
To display more detailed condor_q output (where each job is listed on a
+separate line), you can use the batch name or any existing grouping
+constraint (ClusterId or other "-constraint" options - see
+below for more on constraints) and the -nobatch flag.
+
+
Looking at a batch of jobs with the same ClusterId would look like
+this:
+
+
[alice@submit]$ condor_q -nobatch 195
+
+ ID OWNER SUBMITTED RUN_TIME ST PRI SIZE CMD
+195.10 alice 6/22 13:00 0+00:00:00 H 0 0.0 job.sh
+195.14 alice 6/22 13:00 0+00:01:44 R 0 0.0 job.sh
+195.16 alice 6/22 13:00 0+00:00:26 R 0 0.0 job.sh
+195.39 alice 6/22 13:00 0+00:00:05 R 0 0.0 job.sh
+195.40 alice 6/22 13:00 0+00:00:00 I 0 0.0 job.sh
+195.41 alice 6/22 13:00 0+00:00:00 I 0 0.0 job.sh
+195.53 alice 6/22 13:00 0+00:00:00 I 0 0.0 job.sh
+195.57 alice 6/22 13:00 0+00:00:00 I 0 0.0 job.sh
+195.58 alice 6/22 13:00 0+00:00:00 I 0 0.0 job.sh
+
+9 jobs; 0 completed, 0 removed, 5 idle, 3 running, 1 held, 0 suspended
+
+
+
This was the default view for condor_q from January 2016 until July
+2016.
+
+
+
+
3. View jobs from all users.
+
+
By default, condor_q will just show you information about your
+jobs. To get information about all jobs in the queue, type:
+
+
[alice@submit]$ condor_q -all
+
+
+
This will show a list of all job batches in the queue. To see a list of
+all jobs (individually, not in batches) for all users, combine the
+-all and -nobatch options with condor_q. This was the default view
+for condor_q before January 2016.
+
+
+
+
4. Determine why jobs are on hold.
+
+
If your jobs have gone on hold, you can see the hold reason by running:
+
+
[alice@submit]$ condor_q -hold
+
+
+
or
+
+
[alice@submit]$ condor_q -hold JobId
+
+
+
The first will show you the hold reasons for all of your jobs that
+are on hold; the second will show you the hold reason for a specific
+job. The hold reason is sometimes cut-off; try the following to see the
+entire hold reason:
+
+
[alice@submit]$ condor_q -hold -af HoldReason
+
+
+
If you aren't sure what your hold reason means, email
+chtc@cs.wisc.edu.
+
+
+
+
5. Find out why jobs are idle
+
+
condor_q has an option to describe why a job hasn't matched and
+started running. Find the JobId of a job that hasn't started running
+yet and use the following command:
+
+
$ condor_q -better-analyze JobId
+
+
+
After a minute or so, this command should print out some information
+about why your job isn't matching and starting. This information is not
+always easy to understand, so please email us with the output of this
+command if you have questions about what it means.
+
+
+
+
6. Find out where jobs are running.
+
+
To see which computers your jobs are running on, use:
+
+
[alice@submit]$ condor_q -nobatch -run
+428.0 alice 6/22 17:27 0+00:07:17 slot1_12@e313.chtc.wisc.edu
+428.1 alice 6/22 17:27 0+00:07:11 slot1_8@e376.chtc.wisc.edu
+428.2 alice 6/22 17:27 0+00:07:16 slot1_15@e451.chtc.wisc.edu
+428.3 alice 6/22 17:27 0+00:07:16 slot1_17@e277.chtc.wisc.edu
+428.5 alice 6/22 17:27 0+00:07:16 slot1_9@e351.chtc.wisc.edu
+428.7 alice 6/22 17:27 0+00:07:16 slot1_1@e373.chtc.wisc.edu
+428.8 alice 6/22 17:27 0+00:07:16 slot1_5@e264.chtc.wisc.edu
+
+
+
+
+
7. View jobs by DAG.
+
+
If you have submitted multiple DAGs to the queue, it can be hard to tell
+which jobs belong to which DAG. The -dag option to condor_q will
+sort your queue output by DAG:
+
+
[alice@submit]$ condor_q -nobatch -dag
+ ID OWNER/NODENAME SUBMITTED RUN_TIME ST PRI SIZE CMD
+460.0 alice 11/18 16:51 0+00:00:17 R 0 0.3 condor_dagman -p 0
+462.0 |-0 11/18 16:51 0+00:00:00 I 0 0.0 print.sh
+463.0 |-1 11/18 16:51 0+00:00:00 I 0 0.0 print.sh
+464.0 |-2 11/18 16:51 0+00:00:00 I 0 0.0 print.sh
+461.0 alice 11/18 16:51 0+00:00:09 R 0 0.3 condor_dagman -p 0
+465.0 |-0 11/18 16:51 0+00:00:00 I 0 0.0 print.sh
+466.0 |-1 11/18 16:51 0+00:00:00 I 0 0.0 print.sh
+467.0 |-2 11/18 16:51 0+00:00:00 I 0 0.0 print.sh
+
+8 jobs; 0 completed, 0 removed, 6 idle, 2 running, 0 held, 0 suspended
+
+
+
+
+
8. View all details about a job.
+
+
Each job you submit has a series of attributes that are tracked by
+HTCondor. You can see the full set of attributes for a single job by
+using the "long" option for condor_q, e.g., condor_q -long JobId.
Attributes that are often useful for checking on jobs are:
+
+
+
Iwd: the job's submission directory on the submit node
+
UserLog: the log file for a job
+
RequestMemory, RequestDisk: how much memory and disk you've
+requested per job
+
MemoryUsage: how much memory the job has used so far
+
JobStatus: numerical code indicating whether a job is idle,
+running, or held
+
HoldReason: why a job is on hold
+
DAGManJobId: for jobs managed by a DAG, this is the JobId of the
+parent DAG
+
+
+
+
+
9. View specific details about a job using auto-format
+
+
If you would like to see specific attributes (see above) for a job or
+group of jobs, you can use the "auto-format" (-af) option to
+condor_q which will print out only the attributes you name for a
+single job or group of jobs.
+
+
For example, if I would like to see the amount of memory and disk I've
+requested for all of my jobs, and how much memory is currently being
+used, I can run:
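A sketch of that command, using the attribute names from the list above and wrapped in a shell function (the function name show_job_resources is hypothetical) so it can be reused:

```bash
# Print requested memory, requested disk, and current memory usage
# for each of your jobs, one job per line.
show_job_resources() {
    condor_q -af RequestMemory RequestDisk MemoryUsage
}
```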
If you would like to find jobs that meet certain conditions, you can use
+condor_q's "constraint" option. For example, suppose you want to
+find all of the jobs associated with the DAGMan Job ID "234567". You
+can search using:
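Such a search might look like the sketch below, which compares the DAGManJobId attribute against the ID of the parent DAG. The function name jobs_in_dag is hypothetical.

```bash
# List the jobs that belong to the DAG whose DAGMan job ID is given.
jobs_in_dag() {
    local dag_id="$1"
    condor_q -constraint "DAGManJobId == ${dag_id}"
}

# Example call for the ID used in the text: jobs_in_dag 234567
```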
One common use of constraints is to find all jobs that are running,
+held, or idle. To do this, use a constraint with the JobStatus
+attribute and the appropriate status number - the status codes can be
+found in Appendix
+A
+of the HTCondor Manual.
+
+
Remember condor_q -hold from before? In the background, the
+-hold option is constraining the list of jobs to jobs that are on hold
+(using the JobStatus attribute) and then printing out the HoldReason
+attribute. Try running:
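One way to write such a query is sketched below. It assumes that a JobStatus value of 5 means "held", and the function name held_job_reasons is hypothetical.

```bash
# Roughly what `condor_q -hold` does: keep only held jobs (JobStatus 5)
# and print each job's cluster ID, process ID, and hold reason.
held_job_reasons() {
    condor_q -constraint "JobStatus == 5" -af ClusterId ProcId HoldReason
}
```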
You should see something very similar to running condor_q -hold!
+
+
+
+
11. Remove a held job from the queue
+
+
To remove a job held in the queue, run:
+
+
[alice@submit]$ condor_rm <JobID>
+
+
+
This will remove the job from the queue. Once you have made changes to allow the job to run successfully, the job can be resubmitted using condor_submit.
how to authenticate with Duo when logging into CHTC’s HTC and HPC systems
+
how to set your login (SSH) configuration to “reuse” a two-factor authenticated
+connection over a certain period of time.
+
terminals and applications that are known to support persistent connections
+
+
+
Authentication with Duo
+
+
As of December 2022, accessing CHTC resources
+now requires two-factor authentication. The first “factor” uses your NetID password
+(or SSH keys) and the second “factor” is authentication using Duo, via either a
+Duo fob or the Duo app.
+
+
See the following video for a demonstration of two-factor authentication with Duo
+when logging into CHTC:
+
+
+
+
Re-Using SSH Connections
+
+
To reduce the number of times it is necessary to enter your credentials, it’s possible
+to customize your SSH configuration in a way that allows you to “reuse” a connection
+for logging in again or moving files. This configuration is optional, and
+most useful if you will connect to
+the same server multiple times in a short window, for example, when uploading or
+downloading files.
+
+
WARNING: This guide describes how to configure your local machine to not require
+reentering your NetID password or Duo authentication each time you log in.
+This should ONLY be used on secure devices that you manage - it should
+not be used on any shared laptop, desktop, or research group resource. Users
+found violating this policy risk having their CHTC account permanently deactivated.
+
+
The instructions below are meant for users who can use a terminal (Mac, Linux, newer Windows operating systems):
+
+
+
+
Open a terminal window.
+
+
Create (or edit) your personal SSH configuration file at ~/.ssh/config to use
+what’s called “ControlMaster”.
+This is the text that should be added to a file called config in the .ssh directory in your computer’s home directory:
+
Host *.chtc.wisc.edu
+ # Turn ControlMaster on
+ ControlMaster auto
+ # ControlMaster connection will persist
+ # for 2 hours of idleness, after which
+ # it will disconnect
+ ControlPersist 2h
+ # Where to store files that represent
+ # the ControlMaster persistent connections
+ ControlPath ~/.ssh/connections/%r@%h:%p
+
+
If you’re not able to find or create the config file, executing the code below from a terminal on your computer
+ will add the right information to the config file
+
# Let's create (or add to) our SSH client configuration file.
+ echo "
+ Host *.chtc.wisc.edu
+ # Turn ControlMaster on
+ ControlMaster auto
+ # ControlMaster connection will persist
+ # for 2 hours of idleness, after which
+ # it will disconnect
+ ControlPersist 2h
+ # Where to store files that represent
+ # the ControlMaster persistent connections
+ ControlPath ~/.ssh/connections/%r@%h:%p" >> ~/.ssh/config
+
+
+
You also need to create a directory that will be used to track connections. In
+the same .ssh directory, make a folder called connections by typing:
+
$ mkdir -p ~/.ssh/connections
+
+
+
Once you log in to a CHTC server, this is where the system will store information
+ about your existing connection so that you do not have to reenter your
+ password or Duo authenticate.
+
+
Now, log into your CHTC submit server or login node as normal. The first time you log in, you will need to use
+two-factor authentication, but subsequent logins to that machine will not require
+authentication as long as they occur within the time value used in
+the ControlPersist configuration option (so in this example, 2 hours).
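The setup steps above can be combined into one short script. The following is a minimal sketch, not an official CHTC tool: it assumes a POSIX shell, uses a scratch directory (the hypothetical variable ssh_dir) in place of ~/.ssh so that nothing real is modified, and adds the Host block only if one for *.chtc.wisc.edu is not already present, so running it twice does not duplicate the entry.

```shell
# Scratch directory standing in for ~/.ssh (an assumption for this demo).
ssh_dir="$(mktemp -d)"
config="$ssh_dir/config"

add_block() {
    # Do nothing if a block for this host pattern already exists.
    if grep -qs '^Host \*\.chtc\.wisc\.edu$' "$config"; then
        return 0
    fi
    cat >> "$config" <<'EOF'
Host *.chtc.wisc.edu
  ControlMaster auto
  ControlPersist 2h
  ControlPath ~/.ssh/connections/%r@%h:%p
EOF
}

add_block
add_block   # the second call is a no-op

# Directory that will hold the connection files, with conventional permissions.
mkdir -p "$ssh_dir/connections"
chmod 700 "$ssh_dir" "$ssh_dir/connections"
chmod 600 "$config"

# Count the Host entries that ended up in the file.
count="$(grep -c '^Host ' "$config")"
echo "Host entries: $count"
```

To apply this to your real configuration, you would set ssh_dir to "$HOME/.ssh" instead of a temporary directory.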
+
+
+
For Windows users who use PuTTY to log in, you need to go to
+the Connection -> SSH section in the “Category” menu on the left side,
+and then check the “Share SSH Connection if possible” box. If you don’t
+see this option, try downloading a newer version of PuTTY.
+
+
Ending “Stuck” Connections
+
+
Sometimes a connection goes stale and you can’t reconnect using it, even if
+it is within the timeout window. In this case, you can avoid using the existing
+connection by removing the relevant file in ~/.ssh/connections. This will probably
+look something like:
+
+
$ ls ~/.ssh/connections/
+alice@submit.chtc.wisc.edu:22
+$ rm ~/.ssh/connections/alice@submit.chtc.wisc.edu:22
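As an alternative to deleting the file by hand, OpenSSH can be asked to close a shared connection with its -O exit control command. The sketch below assumes the ControlPath layout from the configuration above and reuses the example user alice and host submit.chtc.wisc.edu from the listing; if no control socket exists it does nothing, and if the master connection no longer responds it falls back to removing the file.

```shell
# Example values taken from the listing above; replace them with your own.
user="alice"
host="submit.chtc.wisc.edu"
port=22
socket="$HOME/.ssh/connections/${user}@${host}:${port}"

if [ -S "$socket" ]; then
    # Ask the master connection to exit; if it is unresponsive,
    # remove the stale socket file instead.
    ssh -S "$socket" -O exit "${user}@${host}" 2>/dev/null || rm -f "$socket"
    status="closed"
else
    status="no-socket"
fi
echo "$status"
```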
+
+
+
Connection settings
+
+
Note that all port forwarding, including X display forwarding, must be set up by
+the initial connection and cannot be changed. If you forget to use -Y on the initial
+connection, you will not be able to open X programs on subsequent connections.
+
+
File Transfer Tools
+
+
There are a variety of tools that people use for transferring and editing files
+like WinSCP and MobaXTerm. Some of these tools are able to use ssh configuration
+or have options that do not require Duo 2FA every time a file is uploaded or
+downloaded or edited, but some do not.
+
+
Known to support persistent connections
+
+
+
+
Linux, Mac, and Windows Subsystem for Linux (WSL) terminals
+
+
+
WinSCP
+
+
May need to adjust preferences. Within WinSCP:
+
+
+
Go to Options, then Preferences, and click on Background under the Transfer section.
+
Set ‘Maximal number of transfers at the same time:’ to 1.
+
Make sure ‘Use multiple connections for single transfer’ checkbox is checked.
+
+
+
Cyberduck
+
+
Cyberduck does not use SSH configurations, so the following setting
+ can be used to enable connection persistence. Within Cyberduck:
+
+
+
Select Preferences, then the Transfers button, and then the General section.
+
Under “Transfers”, use the “Transfer Files” drop-down to select “Use browser
+connection”.
+
+
+
+
+
Known to NOT support ControlMaster or similar persistent connections
+
+
+
+
Windows PowerShell
+
+
+
File transfer tools from Panic, like Transmit and Nova
+
+
+
+
Other Tools
+
+
For those on spotty wireless connections or who frequently move between networks
+(and are on *nix), the open source shell Mosh (https://mosh.org/) can
+keep sessions open as you change connections. Note that Mosh doesn’t support the
+following SSH features:
This guide assumes
+that you have already gotten a CHTC account for either our high
+throughput or high performance compute systems. If you haven't, see our
+getting started page.
You will need the following information to log into our CHTC submit
+servers or head nodes:
+
+
Username and Password
+
+
+
UW-Madison NetID and password
+
+
+
Hostname
+
+
+
+
+
HTC System
+
+
+
+
+
ap2001.chtc.wisc.edu
+
+
+
ap2002.chtc.wisc.edu
+
+
+
+
+
+
+
+
HPC Cluster
+
+
+
+
+
spark-login.chtc.wisc.edu
+
+
+
+
+
As of December 2022, we also require two-factor authentication with Duo to
+access CHTC resources.
+
+
+
Are you off-campus?
+All of our CHTC submit servers and head nodes are firewalled to block
+log-ins from off-campus. If you are off-campus and want to log in, you
+can either:
+
+
+
Activate the campus Virtual Private Network (VPN) (for details on how to set this up, see
+DoIT’s VPN webpage). This will allow you to join the campus network when working off-campus.
+
Log into another computer that is on campus (typically by SSH-ing into that computer) and then SSH to our submit server.
+
+
+
In either case, it will appear like you are on-campus, and you should
+then be able to log into CHTC as usual.
+
+
+
+
+
2. Logging In
+
+
Using the information described above, you can log in to servers in two
+different ways -- from the command line or using an SSH program:
+
+
+
+
A. On the command line
+
+
On Mac, Linux, and modern Windows (10+) systems, you can use the "Terminal" application to
+log in. Open a terminal window and use the following command to connect
+to the appropriate server:
+
+
$ ssh username@hostname
+
+
+
You will be prompted for your password, and then for Duo
+authentication.
+
+
+
+
B. Using an SSH program (Windows)
+
+
There are multiple programs to connect to remote servers for Windows. We
+recommend "PuTTY", which can be downloaded
+here.
+To log in, click on the PuTTY executable (putty.exe). You should see a
+screen like this:
+
+
+
+
Fill in the hostname as described in part 1. You should use Port 22 and
+connect using "ssh" -- these are usually the defaults. After you
+click "connect" you will be prompted to fill in your username and
+password, and then to authenticate with Duo.
+
+
Note that once you have submitted jobs to the queue, you can leave your
+logged in session (by typing exit). Your jobs will run and return
+output without you needing to be connected.
+
+
+
+
C. Re-Using SSH Connections
+
+
To reduce the number of times it is necessary to enter your credentials, it’s
+possible to customize your SSH configuration in a way that allows you to “reuse”
+a connection for logging in again or moving files. More details are shown
+in this guide: Automating CHTC Log In
+
+
+
+
3. Learning About the Command Line
+
+
Why learn about the command line? If you haven't used the command
+line before, it might seem like a big challenge to get started, and
+easier to use other tools, especially if you have a Windows computer.
+However, we strongly recommend learning more about the command line for
+multiple reasons:
+
+
+
You can do most of what you need to do in CHTC by learning a few
+basic commands.
+
With a little practice, typing on the command line is significantly
+faster and much more powerful than using a point-and-click graphic
+interface.
+
Command line skills are useful for more than just large-scale
+computing.
+
+
+
For a good overview of command line tools, see the Software Carpentry
+Unix Shell lesson. In
+particular, we recommend the sections on:
Dask
+is a Python library for parallel computing.
+Though it is not the
+traditional HTCondor workflow, it is possible to use
+Dask on the CHTC pool through a special adapter package provided by CHTC.
+This guide describes the situations in which you should consider using
+Dask instead of the traditional workflow, and will point you toward the
+documentation for the adapter package (which will guide you through
+actually using it).
+
+
+
This is a new how-to guide on the CHTC website. Recommendations and
+feedback are welcome via email (chtc@cs.wisc.edu) or by creating an
+issue on the CHTC website GitHub repository: Create an issue
+
+
+
What is Dask?
+
+
Dask
+is a Python library that can “scale up” Python code in two ways:
+
+
“Low-level” parallelism, through transparently-parallel calculations on familiar interfaces like numpy arrays.
+
“High-level” parallelism, through an explicit run-functions-in-parallel interface.
+
+
+
Both kinds of parallelism can be useful, depending on your work.
+For example, Dask could be used to perform data analysis on a single multi-TB
+dataframe stored in distributed memory, as if it was all stored locally.
+It could also be used to run thousands of independent simulations across
+a cluster, aggregating their results locally as they finish.
+Dask can also smoothly handle cases between these extremes (perhaps each of your
+independent simulations also needs a large amount of memory?).
+
+
Dask also “scales down”: it runs the same way on your laptop as it does on
+a cluster thereby providing a smooth transition between running on
+local resources and running on something like the CHTC pool.
+
+
When should I use Dask at CHTC?
+
+
Several use cases are described below in which you should consider using Dask for parallelism
+in CHTC instead of the traditional HTCondor workflow
+of creating jobs and DAGs:
+
+
+
You are already using Dask for parallelism and want to smoothly scale
+up your computing resources. Note that many foundational libraries in the
+scientific Python ecosystem, like xarray,
+now use Dask internally.
+
You are already using something like
+multiprocessing or
+joblib
+for high-level parallelism.
+Dask’s high-level parallelism interface is fairly similar to these libraries,
+and switching from them to Dask should not involve too much work.
+
You can make your overall workflow more efficient by adjusting it based
+on intermediate results.
+For example,
+adaptive hyperparameter optimization
+can be significantly more efficient than something like a random grid search,
+but requires a “controller” to guide the process at every step.
+
You want to operate on single arrays or dataframes that are larger
+than can be stored in the memory of a single average CHTC worker
+(more than a few GB). Dask can store this kind of data in “chunks” on workers
+and seamlessly perform calculations on the chunks in parallel.
+
You want your workflow to “scale down” to local resources. Being able to run
+your workflow locally may make developing and testing it easier.
+
You want a more interactive way of using the CHTC pool.
+The adapter package provides tools for running Jupyter Notebooks on the
+CHTC pool, connected to your Dask cluster.
+This can be useful for debugging or inspecting the progress of your workflows.
+
+
+
You may also be interested in Dask’s own
+“Why Dask?” page.
+
+
If you are unsure whether you should use Dask or the traditional workflow,
+please get in touch with a research computing facilitator by emailing
+chtc@cs.wisc.edu to set up a consultation.
+
+
How do I use Dask at CHTC?
+
+
Dask integration with the CHTC pool is provided by the
+Dask-CHTC package.
+See that package’s documentation
+for details on how to get started.
Linux containers are a way to build a self-contained environment that
+includes software, libraries, and other tools. CHTC currently supports
+running jobs inside Docker
+containers. This guide describes how to build a Docker image
+that you can use for running jobs in CHTC. For information on using
+this image for jobs, see our Docker Jobs guide.
+
+
Overview
+
+
Note that all the steps below should be run on your own computer, not
+in CHTC.
+
+
Docker images can be created using a special file format
+called a “Dockerfile”. This file has commands that allow you to:
+
+
+
use a pre-existing Docker image as a base
+
add files to the image
+
run installation commands
+
set environment variables
+
+
+
You can then “build” an image from this
+file, test it locally, and push it to DockerHub, where
+HTCondor can then use the image to build containers to run jobs in.
+Different versions of the image can be labeled with different version
+“tags”.
If you haven’t already, create a DockerHub account and install
+Docker on your computer. You’ll want to look for the Docker Community
+Edition
+for your operating system. It sometimes takes some time for Docker to
+start, especially the first time. Once Docker starts, it won’t open a
+window; you’ll just see a little whale and container icon in one of your
+computer’s toolbars. In order to actually use Docker, you’ll need to
+open a command line program (like Terminal, or Command Prompt) and run
+commands there.
+
+
2. Explore Docker Containers (optional)
+
+
If you have never used Docker before, we recommend exploring a pre-existing container
+and testing out installation steps interactively before creating a Dockerfile. See the
+first half of this guide: Exploring and Testing a Docker Container
+
+
3. Create a Dockerfile
+
+
A Dockerfile is a plain text file with keywords that add elements to a
+Docker image. There are many keywords that can be used in a Dockerfile (documented on
+the Docker website here: Dockerfile
+keywords), but we will use a
+subset of these keywords following this basic outline:
+
+
+
Starting point: Which Docker image do you want to start with?
+
Additions: What needs to be added? Folders? Data? Other software?
+
Environment: What variables (if any) are set as part of the software installation?
+
+
+
Create the file
+
+
Create a blank text file named Dockerfile. If you are planning on making
+multiple images for different parts of your workflow,
+you should create a separate folder for each
+new image with a Dockerfile inside each of them.
+
+
Choose a base image with FROM
+
+
Usually you don’t want to start building your image from scratch.
+Instead you’ll want to choose a “base” image to add things to.
+
+
You can find a base image by searching DockerHub. If you’re
+using a scripting language like Python, R, or Perl, you could start with
+the “official” image for these languages. If you’re not sure what to
+start with, using a basic Linux image (Debian, Ubuntu and CentOS are common
+examples) is often a good place to start.
+
+
Images often have tagged versions. Besides choosing the image
+you want, make sure to choose a version by clicking on the “Tags” tab of
+the image.
+
+
Once you’ve decided on a base image and version, add it as the first
+line of your Dockerfile, like this:
+
+
FROM repository/image:tag
+
+
+
Some images are maintained by DockerHub itself
+(these are the “official” images mentioned above),
+and do not have a repository.
+For example, to start with CentOS 7,
+you could use
The next step is the most challenging. We need to add commands to the
+Dockerfile to install the desired software. There are a few standard ways to
+do this:
+
+
+
Use a Linux package manager. This is usually apt-get for Debian-based
+containers (e.g., Ubuntu) or yum for RedHat Linux containers (e.g., CentOS).
+
Use a software-specific package manager (like pip or conda for Python).
+
Use installation instructions (usually a progression of configure,
+make, make install).
+
+
+
Each of these options will be prefixed by the RUN keyword. You can
+join together linked commands with the && symbol; to break lines, put
+a backslash \ at the end of the line. RUN can execute any command inside the
+image during construction, but keep in mind that the only things kept in the final
+image are changes to the filesystem (new and modified files, directories, etc.).
+
+
For example, suppose that your job’s executable ends up running Python and
+needs access to the packages numpy and scipy, as well as the Unix tool wget.
+Below is an example of a Dockerfile that uses RUN to install these packages
+using the system package manager and Python’s built-in package manager.
+
+
# Build the image based on the official Python version 3.8 image
+FROM python:3.8
+
+# Our base image happens to be Debian-based, so it uses apt-get as its system package manager
+# Use apt-get to install wget
+RUN apt-get update \
+ && apt-get install -y wget
+
+# Use RUN to install Python packages (numpy and scipy) via pip, Python's package manager
+RUN pip3 install numpy scipy
+
+
+
If you need to copy specific files (like source code) from your computer into the
+image, place the files in the same folder as the
+Dockerfile and use the COPY keyword. You could also download files
+within the image by using the RUN keyword and commands like wget
+or git clone.
+
+
For example, suppose that you need to use
+JAGS
+and the
+rjags package for R.
+If you have the
+JAGS source code
+downloaded next to the Dockerfile, you could compile and
+install it inside the image like so:
+
+
FROM rocker/r-ver:3.4.0
+
+# COPY the JAGS source code into the image under /tmp
+COPY JAGS-4.3.0.tar.gz /tmp
+
+# RUN a series of commands to unpack the JAGS source, compile it, and install it
+RUN cd /tmp \
+ && tar -xzf JAGS-4.3.0.tar.gz \
+ && cd JAGS-4.3.0 \
+ && ./configure \
+ && make \
+ && make install
+
+# install the R package rjags
+RUN install2.r --error rjags
+
+
+
Set up the environment with ENV
+
+
Your software might rely on certain environment variables being set correctly.
+
+
One common situation is that if you’re installing a program to a custom location
+(like a home directory), you may need to add that directory to the image’s system
+PATH. For example, if you installed some scripts to /home/software/bin, you
+could use
+
+
ENV PATH="/home/software/bin:${PATH}"
+
+
+
to add them to your PATH.
+
+
You can set multiple environment variables at once:
So far we haven’t actually created the image – we’ve just been
+listing instructions for how to build the image in the Dockerfile.
+Now we are ready to build the image!
+
+
First, decide on a name for the image, as well as a tag. Tags are
+important for tracking which version of the image you’ve created (and
+are using). A simple tag scheme would be to use numbers (e.g. v0, v1,
+etc.), but you can use any system that makes sense to you.
+
+
Because HTCondor caches Docker images by tag, we strongly recommend that you
+never use the latest tag, and always build images with a new, unique tag that
+you then explicitly specify in new jobs.
+
+
To build and tag your image, open a Terminal (Mac/Linux) or Command
+Prompt (Windows) and navigate to the folder that contains your
+Dockerfile:
+
+
$ cd directory
+
+
+
(Replace directory with the path to the appropriate folder.)
+
+
Then make sure Docker is running (there should be an icon on
+your status bar, and running docker info shouldn’t indicate any errors) and run:
+
+
$ docker build -t username/imagename:tag .
+
+
+
Replace username with your Docker Hub username and replace
+imagename and tag with the values of your choice. Note the . at the end
+of the command (to indicate “the current directory”).
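One way to follow the unique-tag recommendation is to derive the tag from the build date and time. The sketch below only prints the build command it would run (a dry run); the username alice, the image name myimage, and the vYYYYMMDD-HHMMSS tag scheme are illustrative assumptions, not a CHTC requirement.

```shell
username="alice"       # hypothetical Docker Hub username
imagename="myimage"    # hypothetical image name

# Timestamp-based tag, for example v20240603-141500
tag="v$(date +%Y%m%d-%H%M%S)"
image="${username}/${imagename}:${tag}"

# Print the command instead of running it
echo "docker build -t ${image} ."
```

Whatever scheme you choose, record the exact tag so that you can specify it explicitly in your jobs.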
+
+
If you get errors, try to determine what you may need to add or change
+to your Dockerfile and then run the build command again. Debugging a Docker
+build is largely the same as debugging any software installation process.
+
+
5. Test Locally
+
+
This page describes how to interact with your new Docker image on your
+own computer, before trying to run a job with it in CHTC:
Once your image has been successfully built and tested, you
+can push it to DockerHub so that it will be available to run jobs in
+CHTC. To do this, run the following command:
+
+
$ docker push username/imagename:tag
+
+
+
(Where you once again replace username/imagename:tag with what you used in
+previous steps.)
+
+
The first time you push an image to DockerHub, you may need to run this
+command beforehand:
+
+
$ docker login
+
+
+
It should ask for your DockerHub username and password.
+
+
+
Reproducibility
+
+
If you have a free account on Docker Hub, any container image that you
+have pushed there will be scheduled for removal if it is not used (pulled) at least once
+every 6 months (See the Docker Terms of Service).
+
+
For this reason, and just because it’s a good idea in general, we recommend
+creating a file archive of your container image and placing it in whatever space
+you use for long-term, backed-up storage of research data and code.
+
+
To create a file archive of a container image, use this command,
+changing the name of the archive file and container to reflect the
+names you want to use:
+
docker save --output archive-name.tar username/imagename:tag
+
+
+
It’s also a good idea to archive a copy of the Dockerfile used to generate a
+container image along with the file archive of the container image itself.
+
+
+
7. Running Jobs
+
+
Once your Docker image is on Docker Hub, you can use it to run
+jobs on CHTC’s HTC system. See this guide for more details:
This section holds various example Dockerfiles that cover more advanced use cases.
+
+
Installing a Custom Python Package from GitHub
+
+
Suppose you have a custom Python package hosted on GitHub, but not available
+on PyPI.
+Since pip can install packages directly from git repositories, you could
+install your package like this:
+
+
FROM python:3.8
+
+RUN pip3 install git+https://github.com/<RepositoryOwner>/<RepositoryName>
+
+
where you would replace <RepositoryOwner> and <RepositoryName> with your
+desired targets.
Linux containers are a way to build a self-contained environment that
+includes software, libraries, and other tools. This guide shows how to
+submit jobs that use Docker containers.
+
+
Overview
+
+
Typically, software in CHTC jobs is installed or compiled locally by
+individual users and then brought along to each job, either using the
+default file transfer or our SQUID web server. However, another option
+is to use a container system, where the software is installed in a
+container image. Using a container to handle software can be
+advantageous if the software installation 1) has many dependencies, 2)
+requires installation to a specific location, or 3) “hard-codes” paths
+into the installation.
+
+
CHTC has capabilities to access and start containers and
+run jobs inside them. This guide shows how to do this for
+Docker containers.
+
+
1. Use a Docker Container in a Job
+
+
Jobs that run inside a Docker container will be almost exactly the same
+as “vanilla” HTCondor jobs. The main change is indicating which Docker
+container to use and an optional “container universe” option:
+
+
# HTC Submit File
+
+# Provide HTCondor with the name of the Docker container
+container_image = docker://user/repo:tag
+universe = container
+
+executable = myExecutable.sh
+transfer_input_files = other_job_files
+
+log = job.log
+error = job.err
+output = job.out
+
+request_cpus = 1
+request_memory = 4GB
+request_disk = 2GB
+
+queue
+
+
+
In the above, change the address of the Docker container image as
+needed based on the container you are using. More information on finding
+and making containers is below.
+
+
Integration with HTCondor
+
+
When your job starts, HTCondor will pull the indicated image from
+DockerHub, and use it to run your job. You do not need to run any
+Docker commands yourself.
+
+
Other pieces of the job (your executable and input files) should be just
+like a non-Docker job submission.
+
+
The only additional change may be that your
+executable no longer needs to install or unpack your software, since it
+will already be present in the Docker container.
+
+
2. Choose or Create a Docker Container Image
+
+
To run a Docker job, you will first need access to a Docker container
+image that has been built and placed onto the
+DockerHub website. There are two primary ways
+to do this.
+
+
A. Pre-existing Images
+
+
The easiest way to get a Docker container image for running a job is to
+use a public or pre-existing image on DockerHub. You can find images by
+getting an account on DockerHub and searching for the software you want
+to use.
An image supported by a group will be continuously updated and the
+versions will be indicated by “tags”. We recommend choosing a specific
+tag (or tags) of the container to use in CHTC.
Similarly, we recommend using container tags. Importantly, whenever you make a significant change
+to your container, you will want to use a new tag name to ensure that your jobs are getting an
+updated version of the container, and not an ‘old’ version that has been cached by DockerHub
+or CHTC.
+
+
3. Testing
+
+
If you want to test your jobs, you have two options:
+
+
+
We have a guide on exploring and testing Docker containers on your own computer here:
+
You can test a container interactively in CHTC by using a normal Docker job submit file and using the
+interactive flag with condor_submit:
+
[alice@submit]$ condor_submit -i docker.sub
+
+
This should start a session inside the indicated Docker container and connect you to it using ssh. Type exit to end the interactive job. Note: Files generated during your interactive job with Docker will not be transferred back to the submit node. If you have a directory on staging, you can transfer the files there instead; if you have questions about this, please contact a facilitator.
Linux containers are a way to build a self-contained environment that
+includes software, libraries, and other tools. This guide shows how to
+explore and test a Docker container on your own computer.
+
+
Overview
+
+
Note that all the steps below should be run on your own computer, not
+in CHTC.
If you’ve never used Docker before, and/or are getting ready to build your own
+container image, we recommend starting with the first part of the
+guide.
+
+
If you’ve explored Docker already or built your own image and you want to test if it
+will work successfully in CHTC’s HTC system,
+you can follow the directions in the second section.
+
+
A. Set Up Docker on Your Computer
+
+
If you haven’t already, create a DockerHub account and install
+Docker on your computer. You’ll want to look for the Docker Community
+Edition
+for your operating system. It sometimes takes some time for Docker to
+start, especially the first time. Once Docker starts, it won’t open a
+window; you’ll just see a little whale and container icon in one of your
+computer’s toolbars. In order to actually use Docker, you’ll need to
+open a command line program (like Terminal, or Command Prompt) and run
+commands there.
+
+
B. Explore Docker Containers
+
+
1. Get a Docker Container Image
+
+
We need to have a local copy of the Docker container image in order to
+test it. You can see what container images you already have on your
+computer by running:
+
+
$ docker image ls
+
+
+
If you just installed Docker on your computer
+and are using it for the first time, this list is probably empty.
+If you want to use a pre-made container from Docker Hub,
+you will need to “pull” it down to your computer.
+If you created a container on your computer, it should already
+be in the list of container images.
+
+
If using a container from Docker Hub, find the container and its name, which
+will be of the format: username/imagename:tag. Then pull a copy of the container
+image to your computer by running the following from either a Terminal
+(Mac/Linux) or Command Prompt (Windows):
+
+
$ docker pull username/image:tag
+
+
+
If you run docker image ls again, you should see the container you downloaded
+listed.
+
+
2. Explore the Container Interactively
+
+
To actually explore a container, run this command:
+
+
$ docker run -it --rm=true username/image:tag /bin/bash
+
+
+
This will start a running copy of the container and start a command line shell
+inside. You should see your command line prompt change to something like:
+
+
root@2191c1169757:/#
+
+
+
+
What Do All the Options Mean?
+
+
+
-it: interactive flag
+
--rm=true: after we exit, this will clean up the running container so Docker uses less disk space.
+
username/image:tag: which container to start
+
/bin/bash: tells Docker that when the container starts, we want a command line (bash) inside to run commands
+
+
+
+
If you explore the container using cd and ls, you’ll see that this is a whole,
+self-contained file system, separate from your computer. Try running commands with their
+ --help or --version options to see what’s installed. If you’re planning to create
+ your own container, try following a few of the installation instructions for the software
+ you want to use and see what happens.
+
+
3. Exit the Container
+
+
Once you’re done exploring, type exit to leave the container.
+
+
root@2191c1169757:/# exit
+
+
+
Note that any changes or
+commands you ran in the container won’t be saved! Once you exit, the
+running container is shut down and removed (although the container image will still be
+on your computer, which you can see if you type docker image ls again).
+
+
C. Simulate a CHTC Docker Job
+
+
The directions above were about simply exploring a container. If you want to
+simulate what happens in a CHTC job more specifically, we’ll want to do a few things:
+
+
+
create a test working directory, with needed files
+
have a list of commands to run or a script you want to use as the executable.
+
use some extra options when running the container.
+
+
+
1. Create Working Directory
+
+
For testing, we need a folder on your computer to stand in for the
+working directory that HTCondor creates for running your job. Create a folder
+for this purpose on your Desktop. The folder’s name shouldn’t include
+any spaces. Inside this folder, put all of the files that are normally
+inside the working directory for a single job – data, scripts, etc. If
+you’re using your own executable script, this should be in the folder.
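As a concrete illustration, the following sketch stages such a folder with a tiny stand-in executable. The folder name chtc-test, the script name exec.sh, and the script's contents are placeholders, and a temporary directory stands in for your Desktop so the example has no side effects.

```shell
base="$(mktemp -d)"            # stand-in for ~/Desktop
workdir="$base/chtc-test"      # folder name without spaces
mkdir -p "$workdir"

# Write a minimal placeholder executable script into the folder
cat > "$workdir/exec.sh" <<'EOF'
#!/bin/sh
echo "hello from the test job"
EOF
chmod +x "$workdir/exec.sh"

# Run the script from inside the folder, as a job would
cd "$workdir"
output="$(./exec.sh)"
echo "$output"
```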
+
+
Open a Windows Command Prompt or Mac/Linux Terminal to access that
+folder, replacing “folder” with the name of the folder you created.
+
+
+
Mac/Linux:
+
$ cd ~/Desktop/folder
+
+
+
+
+
Windows:
+
$ cd %HOMEPATH%\Desktop\folder
+
+
+
+
+
2. Plan What to Run
+
+
Once the container starts, you have a few options for testing your job:
+
+
+
Run Commands Directly
+
+
When you start the container, you’ll be able to run each command you
+ want to use, step-by-step. If you have multiple commands, these will eventually
+ need to be put into a shell script as your executable.
+
Example: Running multiple steps of a bioinformatics pipeline
+
+
+
Run an Executable
+
+
If you’ve already written a script with all your commands or code, you can
+ test this in the container.
+
Examples: Running a shell script with multiple steps, running a machine learning Python script
+
+
+
Run a Single Command
+
+
If you only want to run one command, using a program installed in the Docker
+ container, you can run this in the container.
+
Example: Running GROMACS from a container
+
+
+
+
+
3. Start the Docker Container
+
+
We’ll use a similar docker run command to start the Docker container,
+with a few extra options to better emulate how containers are run in
+the HTC system with HTCondor.
+
+
This command can be run verbatim except for the
+username, imagename and tag; these should be whatever you used to
+pull or tag the container image.
For Windows users, a window may pop up, asking for permission to share
+your main drive with Docker. This is necessary for the files to be
+placed inside the container. As in the previous section, the docker run command
+will start a running copy of the container and start a command line shell
+inside.
+
+
+
What Do All the Options Mean? Part 2
+
+
The options that we have added for this example are used in CHTC to make jobs run
+successfully and securely.
+
+
+
--user $(id -u):$(id -g): runs the container with more restrictive permissions
+
-v $(pwd):/scratch: Put the current working directory (pwd) into the container but call it /scratch.
+In CHTC, this working directory will be the job’s usual working directory.
+
-w /scratch: when the container starts, make /scratch the working directory
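Assembled from the options described above and the docker run command from the earlier section, the full command would look roughly as follows. This is a reconstruction from the listed options rather than a quoted CHTC command, so treat the exact ordering of the options as an assumption; the sketch prints the command instead of running it, and username/imagename:tag is a placeholder.

```shell
image="username/imagename:tag"   # placeholder; use the image you pulled or built

# Build the command string from the options explained above
cmd="docker run --user $(id -u):$(id -g) --rm=true -it -v $(pwd):/scratch -w /scratch ${image} /bin/bash"
echo "$cmd"
```

Run the printed command from inside your test working directory so that $(pwd) refers to the folder you prepared.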
+
+
+
+
4. Test the job
+
+
Your command line prompt should have changed to look like this:
+
+
I have no name!@5a93cb:/scratch$
+
+
+
We can now see if the job would complete successfully!
+
+
If you have a single command or list of commands to run, start running them one by one.
+If you have an executable script, you can run it like so:
+
+
I have no name!@5a93cb:/scratch$ ./exec.sh
+
+
+
If your “executable” is software already in the container, run the
+appropriate command to use it.
+
+
+
Permission Errors
+
+
The following commands may not be necessary, but if you see messages
+about “Permission denied” or a bash error about bad formatting, you
+may want to try one (or both) of the following (replacing exec.sh
+with the name of your own executable.)
+
+
You may need to add executable permissions to the script for it to run
+correctly:
+
+
I have no name!@5a93cb:/scratch$ chmod +x exec.sh
+
+
+
Windows users who are using a bash script may also need to run the
+following two commands:
+
+
I have no name!@5a93cb:/scratch$ cat exec.sh | tr -d \\r > temp.sh
+I have no name!@5a93cb:/scratch$ mv temp.sh exec.sh
+
+
+
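To illustrate what the tr command above does, here is a small self-contained sketch. It creates a throwaway script with Windows line endings, strips the carriage returns, and counts them before and after. The file names demo_crlf.sh and demo_fixed.sh are placeholders chosen for this example only.

```shell
#!/bin/bash
# Work in a throwaway directory so that no real files are touched.
workdir=$(mktemp -d)

# Create a two-line script that uses Windows (CRLF) line endings.
printf '#!/bin/bash\r\necho hello\r\n' > "$workdir/demo_crlf.sh"

# Hold a literal carriage return so grep can search for it.
cr=$(printf '\r')

# Count the lines that contain a carriage return before the fix.
before=$(grep -c "$cr" "$workdir/demo_crlf.sh")

# Delete every carriage return, as the tr command above does.
tr -d '\r' < "$workdir/demo_crlf.sh" > "$workdir/demo_fixed.sh"

# grep -c exits non-zero when the count is 0, so tolerate that here.
after=$(grep -c "$cr" "$workdir/demo_fixed.sh" || true)

echo "lines with carriage returns before: $before, after: $after"
```

If the count after the fix is 0, the script no longer contains Windows line endings.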
+
When your test is done, type exit to leave the container.
+
+
If the program didn’t work, try searching for the cause of the error
+messages, or email CHTC’s Research Computing Facilitators.
+
+
If your local test did run successfully, you are now ready to set up
+your Docker job to run on CHTC.
If your job is running a bash or shell script (includes the header
+#!/bin/bash), and it goes on hold, you might be experiencing a
+Windows/Linux incompatibility error. Files written in Windows (based on
+the DOS operating system) and files written in Mac/Linux (based on the
+UNIX operating system) use different invisible characters to mean "end
+of a line" in a file. Normally this isn't a problem, except when
+writing bash scripts; bash will not be able to run scripts if they have
+the Windows/DOS line endings.
+
+
To find why the job went on hold, look for the hold reason, either by
+running
+
+
[alice@submit]$ condor_q -af HoldReason
+
+
+
or by looking in the log file.
+
+
If a Windows/Linux incompatibility is the problem, the hold reason will
+look something like this:
+
+
Error from slot1_11@e189.chtc.wisc.edu: Failed to execute
+'/var/lib/condor/execute/slot1/dir_4086540/condor_exec.exe' with
+arguments 2: (errno=2: 'No such file or directory')
+
+
+
To check if this is the problem, you can open the script in the vi text
+editor, using its "binary" mode:
+
+
[alice@submit]$ vi -b hello-chtc.sh
+
+
+
(Replace hello-chtc.sh with the name of your script.) If you see ^M
+characters at the end of each line, those are the DOS line endings and
+that's the problem.
+(Type :q to quit vi)
+
+
Luckily, there is an easy fix! To convert the script to Unix line
+endings so that it will run correctly, you can run:
+
+
[alice@submit]$ dos2unix hello-chtc.sh
+
+
+
on the submit node and it will change the format for you. If you release
+your held jobs (using condor_release) or re-submit the jobs, you
+should no longer get the same error.
To help researchers effectively utilize computing resources, our
+Research Computing Facilitators (RCFs) not only guide the implementation of
+computational work on CHTC compute capacity, but can also
+point researchers to other on- and off-campus services related to
+research computing and data needs. Our primary activities include the
+following.
+
+
Regular Support
+
+
We are available to answer questions via an email “ticket” system. We
+aim to provide a first response (although not necessarily a solution!)
+within 1-2 business days.
+
+
In addition to email, we host drop-in “office hours” online twice a
+week. No appointment is needed, just show up during the available times!
+
+
To email us or drop by office hours, see the information on our get
+help page.
+
+
Course and Group Visits
+
+
The Facilitation Team is
+available to provide guest lectures and introductory presentations to
+campus courses, regular department or program seminars, or individual
+lab group meetings.
The Facilitation Team offers occasional training sessions for CHTC
+users. Upcoming training events are announced via the CHTC Users email
+list and are listed on the CHTC events page:
When submitting jobs to CHTC’s High Throughput Computing (HTC) system,
+there is a distinct location for staging data that is too large to be
+handled at scale via the default HTCondor file transfer mechanism. This
+location should be used for jobs that require input files larger than 100MB
+and/or that generate output files larger than 3-4GB.
+
+
To best understand the below information, users should already be
+familiar with:
+
+
+
Using the command-line to: navigate directories,
+create/edit/copy/move/delete files and directories, and run intended
+programs (aka “executables”).
USERS VIOLATING ANY OF THE POLICIES IN THIS GUIDE WILL
+HAVE THEIR DATA STAGING ACCESS AND/OR CHTC ACCOUNT REVOKED UNTIL CORRECTIVE
+MEASURES ARE TAKEN. CHTC STAFF RESERVE THE RIGHT TO REMOVE ANY
+PROBLEMATIC USER DATA AT ANY TIME IN ORDER TO PRESERVE PERFORMANCE.
+
+
+
A. Intended Use
+
+
Our large data staging location is only for input and output files that
+are individually too large to be managed by our other data movement
+methods, HTCondor file transfer or SQUID. This includes individual input files
+greater than 100MB and individual output files greater than 3-4GB.
+
+
Users are expected to abide by this intended use expectation and follow the
+instructions for using /staging written in this guide (e.g. files placed
+in /staging should NEVER be listed in the submit file, but rather accessed
+via the job’s executable (aka .sh) script).
+
+
B. Access to Large Data Staging
+
+
Anyone with a CHTC account whose data meets the intended use above can request
+space in our large data staging area. A Research Computing Facilitator will
+review the request and follow up. If appropriate, access will be granted via
+a directory in the system and a quota. Quotas are based on individual user needs;
+if a larger quota is needed, see our Request a Quota Change guide.
+
+
We can also create group or shared spaces by request.
+
+
C. User Data Management Responsibilities
+
+
As with all CHTC file spaces:
+
+
+
Keep copies: Our large data staging area is not backed up and has the
+possibility of data loss; keep copies of ANY and ALL data in /staging in another, non-CHTC
+location.
+
Remove data: We expect that users remove data from /staging AS
+SOON AS IT IS NO LONGER NEEDED FOR ACTIVELY-RUNNING JOBS.
+
Monitor usage and quota: Each /staging folder has both a size and “items” quota. Quota changes
+can be requested as described in our Request a Quota Change guide.
+
+
+
CHTC staff reserve the right to remove data from our large data staging
+location (or any CHTC file system) at any time.
+
+
D. Data Access Within Jobs
+
+
Staged large data will
+be available only within the CHTC pool, on a subset of our total
+capacity.
+
+
Staged data are owned by the user, and only the user’s own
+jobs can access these files (unless the user specifically modifies unix
+file permissions to make certain files available for other users).
+
+
2. Staging Large Data
+
+
In order to stage large data for use on CHTC’s HTC system:
+
+
+
Get a directory: Large data staging is available by request.
+
Reduce file counts: Combine and compress files that are used together.
+
Use the transfer server: Upload your data via our dedicated file transfer server.
+
Remove files after jobs complete: our data staging space is quota controlled and not backed up.
+
+
+
A. Get a Directory
+
+
Space in our large data staging area is granted by request. If you think you need
+a directory, fill out our quota request form.
+
+
The created directory will exist at this path: /staging/username
+
+
B. Reduce File Counts
+
+
Data placed in our large data /staging location
+should be stored in as few files as possible (ideally,
+one file per job), and will be used by a job only after being copied
+from /staging into the job working directory (see below).
+Similarly, large output should first be written to the
+job working directory, then compressed into a single file before being
+copied to /staging at the end of the job.
+
+
To prepare job-specific data that is large enough to require pre-staging
+and exists as multiple files or directories (or a directory of multiple
+files), first create a compressed tar package before placing the file in
+/staging (either before submitting jobs, or within jobs before
+moving output to /staging). For example:
+
+
$ tar -czvf job_package.tar.gz file_or_dir
+
+
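The packaging step can be tried out with the sketch below. It assumes a hypothetical directory named sample_inputs that holds a few files used together. It builds one compressed archive and then lists the archive contents without extracting them, which is a quick way to confirm what was packed before uploading.

```shell
#!/bin/bash
# Work in a throwaway directory so that no real files are touched.
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical input directory with three files that are used together.
mkdir sample_inputs
printf 'a\n' > sample_inputs/part1.txt
printf 'b\n' > sample_inputs/part2.txt
printf 'c\n' > sample_inputs/part3.txt

# Combine and compress them into a single file.
tar -czf job_package.tar.gz sample_inputs

# List the archive contents (without extracting) to confirm what was packed.
tar -tzf job_package.tar.gz

# Count the regular files in the archive (entries that do not end in "/").
packed_files=$(tar -tzf job_package.tar.gz | grep -vc '/$')
echo "files in archive: $packed_files"
```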
+
C. Use the Transfer Server
+
+
Movement of data into/out of /staging before and after jobs should
+only be performed via CHTC’s transfer server, as below, and not via a
+CHTC submit server. After obtaining a user directory within
+/staging and an account on the transfer server, copy relevant
+files directly into this user directory from your own computer:
+
+
+
Example scp command on your own Linux or Mac computer:
+
Using a file transfer application, like WinSCP, directly drag the large
+file from its location on your computer to a location within
+/staging/username/ on transfer.chtc.wisc.edu.
+
+
+
+
+
D. Remove Files After Jobs Complete
+
+
As with all CHTC file spaces, data should be removed from /staging AS
+SOON AS IT IS NO LONGER NEEDED FOR ACTIVELY-RUNNING JOBS. Even if it
+will be used in the future, it should be deleted from /staging and copied
+back at a later date. Files can be taken off of /staging using similar
+mechanisms as uploaded files (as above).
+
+
3. Using Staged Files in a Job
+
+
As shown above, the staging directory for large data is /staging/username.
+All interaction with files in this location should occur within your job’s
+main executable.
+
+
A. Accessing Large Input Files
+
+
To use large data placed in the /staging location, add commands to your
+job executable that copy input
+from /staging into the working directory of the job. Your program should then use
+files from the working directory. Be sure to remove the copied
+files from the working
+directory before the completion of the job (so that they’re not copied
+back to the submit server as perceived output).
+
+
Example, if executable is a shell script:
+
+
#!/bin/bash
+#
+# First, copy the compressed tar file from /staging into the working directory,
+# and un-tar it to reveal your large input file(s) or directories:
+cp /staging/username/large_input.tar.gz ./
+tar -xzvf large_input.tar.gz
+#
+# Command for myprogram, which will use files from the working directory
+./myprogram large_input.txt myoutput.txt
+#
+# Before the script exits, make sure to remove the file(s) from the working directory
+rm large_input.tar.gz large_input.txt
+#
+# END
+
+
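The copy, extract, run, and clean-up pattern can be rehearsed away from CHTC with the sketch below. It is an illustration only: two temporary directories stand in for /staging/username and the job working directory, and a simple line count stands in for myprogram.

```shell
#!/bin/bash
# Stand-ins (assumptions for this demo only): on CHTC these would be
# /staging/username and the job's working directory.
staging_dir=$(mktemp -d)
job_dir=$(mktemp -d)

# Prepare a small fake input archive in the stand-in staging directory.
printf 'line1\nline2\n' > "$staging_dir/large_input.txt"
tar -czf "$staging_dir/large_input.tar.gz" -C "$staging_dir" large_input.txt
rm "$staging_dir/large_input.txt"

# From here on, this mirrors what the job executable would do.
cd "$job_dir"
cp "$staging_dir/large_input.tar.gz" ./
tar -xzf large_input.tar.gz

# Placeholder for ./myprogram: count the input lines into a small output file.
wc -l < large_input.txt > myoutput.txt

# Remove the copied archive and the extracted input so that only real
# output is left in the working directory.
rm large_input.tar.gz large_input.txt

# Show what remains; only myoutput.txt should be listed.
ls "$job_dir"
```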
+
B. Moving Large Output Files
+
+
If jobs produce large (more than 3-4GB) output files, have
+your executable write the output file(s) to a location within
+the working directory, and then make sure to move this large file to
+the /staging folder, so that it’s not transferred back to the home directory, as
+all other “new” files in the working directory will be.
+
+
Example, if executable is a shell script:
+
+
#!/bin/bash
+#
+# Command to save output to the working directory:
+./myprogram myinput.txt output_dir/
+#
+# Tar and mv output to staging, then delete from the job working directory:
+tar -czvf large_output.tar.gz output_dir/ other_large_files.txt
+mv large_output.tar.gz /staging/username/
+rm other_large_files.txt
+#
+# END
+
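One possible refinement, sketched below under the same stand-in assumptions (temporary directories in place of /staging/username and the job working directory, placeholder output instead of myprogram), is to delete the originals only if the archive was created successfully.

```shell
#!/bin/bash
# Stand-ins for /staging/username and the job working directory.
staging_dir=$(mktemp -d)
job_dir=$(mktemp -d)
cd "$job_dir"

# Placeholder for ./myprogram: produce an output directory and an extra file.
mkdir output_dir
printf 'result\n' > output_dir/result.txt
printf 'extra\n' > other_large_files.txt

# Archive the output; move it and clean up only if tar succeeded.
if tar -czf large_output.tar.gz output_dir/ other_large_files.txt; then
    mv large_output.tar.gz "$staging_dir/"
    # Removing output_dir is optional housekeeping in this sketch.
    rm -r output_dir other_large_files.txt
else
    echo "tar failed; leaving output in place" >&2
fi

# Show what ended up in the stand-in staging directory.
ls "$staging_dir"
```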
+
+
C. Handling Standard Output (if needed)
+
+
In some instances, your software may produce very large standard output
+(what would typically be output to the command screen, if you ran the
+command for yourself, instead of having HTCondor do it). Because such
+standard output from your software will usually be captured by HTCondor
+in the submit file “output” file, this “output” file WILL still be
+transferred by HTCondor back to your home directory on the submit
+server, which may be very bad for you and others, if that captured
+standard output is very large.
+
+
In these cases, it is useful to redirect the standard output of commands
+in your executable to a file in the working directory, and then move it
+into /staging at the end of the job.
+
+
Example, if “myprogram” produces very large standard output, and is
+run from a script (bash) executable:
+
+
#!/bin/bash
+#
+# script to run myprogram,
+#
+# redirecting large standard output to a file in the working directory:
+./myprogram myinput.txt myoutput.txt > large_std.out
+#
+# tar and move large files to staging so they're not copied to the submit server:
+tar -czvf large_stdout.tar.gz large_std.out
+cp large_stdout.tar.gz /staging/username/subdirectory
+rm large_std.out large_stdout.tar.gz
+# END
+
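The redirection itself can be tested locally with the sketch below. It uses a shell function as a placeholder for myprogram and also shows how standard error can be captured in a separate file, which the example above does not do. The file name large_std.err is an assumption made for this illustration.

```shell
#!/bin/bash
# Stand-in for the job working directory.
job_dir=$(mktemp -d)
cd "$job_dir"

# Placeholder for ./myprogram: writes to standard output and standard error.
myprogram() {
    echo "progress message"
    echo "warning message" >&2
}

# ">" captures standard output only; "2>" captures standard error separately.
myprogram > large_std.out 2> large_std.err

# Compress the captured standard output so it takes less space when moved.
gzip large_std.out

# Show the resulting files: large_std.err and large_std.out.gz.
ls
```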
+
+
4. Submit Jobs Using Staged Data
+
+
In order to properly submit jobs using staged large data, always do the following:
+
+
+
Submit from /home: ONLY submit jobs from within your home directory
+ (/home/username), and NEVER from within /staging.
+
+
+
In your submit file:
+
+
+
No large data in the submit file: Do NOT list any /staging files in any of the submit file
+ lines, including: executable, log, output, error, transfer_input_files. Rather, your
+ job’s ENTIRE interaction with files in /staging needs to occur
+ WITHIN each job’s executable, when it runs within the job (as shown above)
+
Request sufficient disk space: Using request_disk, request an amount of disk
+space that reflects the total of a) input data that each job will copy into
+ the job working directory from /staging, and b) any output that
+ will be created in the job working directory.
+
Require access to /staging: Include the CHTC specific attribute that requires
+servers with access to /staging
+
+
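As a worked example of estimating the disk request, the sketch below adds up the space one job would need. All sizes are hypothetical, and the 20% safety margin is an arbitrary choice for this illustration rather than a CHTC recommendation.

```shell
#!/bin/bash
# Hypothetical sizes in MB for one job (replace with your own measurements).
archive_mb=800      # compressed input copied from /staging
extracted_mb=2500   # the same input after it is un-tarred
output_mb=1200      # output written before it is moved to /staging

# The archive and its extracted contents exist at the same time, so count both.
total_mb=$(( archive_mb + extracted_mb + output_mb ))

# Add a 20% safety margin, rounding up to the next whole MB.
request_mb=$(( (total_mb * 120 + 99) / 100 ))

echo "request_disk = ${request_mb}MB"
```

With these numbers the total is 4500 MB, and the margin brings the request to 5400 MB.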
+
See the below submit file, as an example, which would be submitted from
+within the user’s /home directory:
+
+
### Example submit file for a single job that stages large data
+# Files for the below lines MUST all be somewhere within /home/username,
+# and not within /staging/username
+
+executable = run_myprogram.sh
+log = myprogram.log
+output = $(Cluster).out
+error = $(Cluster).err
+
+## Do NOT list the large data files here
+transfer_input_files = myprogram
+
+# IMPORTANT! Require execute servers that can access /staging
+Requirements = (Target.HasCHTCStaging == true)
+
+# Make sure to still include lines like "request_memory", "request_disk", "request_cpus", etc.
+
+queue
+
+
+
+
Note: in no way should files on /staging be specified in the submit file,
+directly or indirectly! For example, do not use the initialdir option (see
+Submitting Multiple Jobs in Individual Directories)
+to specify a directory on /staging.
+
+
+
5. Checking your Quota, Data Use, and File Counts
+
+
You can use the command get_quotas to see what disk
+and items quotas are currently set for a given directory path.
+This command will also let you see how much disk is in use and how many
+items are present in a directory:
Alternatively, the ncdu command can also be used to see how many
+files and directories are contained in a given path:
+
+
[username@transfer ~]$ ncdu /staging/username
+
+
+
When ncdu has finished running, the output will give you a total file
+count and allow you to navigate between subdirectories for even more
+details. Type q when you're ready to exit the output viewer. More
+info here: https://lintut.com/ncdu-check-disk-usage/
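For a quick, non-interactive count, the standard find and du utilities can give similar information. The sketch below is a generic illustration and not a CHTC-specific tool. It uses a temporary directory as a stand-in for /staging/username so that it can be run anywhere.

```shell
#!/bin/bash
# Stand-in for /staging/username, populated with a few placeholder files.
target_dir=$(mktemp -d)
mkdir "$target_dir/run1"
printf 'x\n' > "$target_dir/run1/a.dat"
printf 'y\n' > "$target_dir/run1/b.dat"
printf 'z\n' > "$target_dir/c.dat"

# Count regular files and directories below the target (not the target itself).
file_count=$(find "$target_dir" -mindepth 1 -type f | wc -l)
dir_count=$(find "$target_dir" -mindepth 1 -type d | wc -l)

# Print the total disk usage in human-readable form.
du -sh "$target_dir"

echo "files: $file_count, directories: $dir_count"
```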
When submitting jobs to CHTC’s High Throughput Computing (HTC) system,
+there is a distinct location for staging data that is too large to be
+handled at scale via the default HTCondor file transfer mechanism
+but needs to be accessed outside of CHTC
+(for example, data for jobs that run on the OS Pool).
+
+
To best understand the below information, users should already be
+familiar with:
+
+
+
Using the command-line to: navigate directories,
+create/edit/copy/move/delete files and directories, and run intended
+programs (aka “executables”).
USERS VIOLATING ANY OF THE POLICIES IN THIS GUIDE WILL
+HAVE THEIR DATA STAGING ACCESS AND/OR CHTC ACCOUNT REVOKED UNTIL CORRECTIVE
+MEASURES ARE TAKEN. CHTC STAFF RESERVE THE RIGHT TO REMOVE ANY
+PROBLEMATIC USER DATA AT ANY TIME IN ORDER TO PRESERVE PERFORMANCE.
+
+
+
A. Intended Use
+
+
Our S3 data storage is only for input and output files that
+are individually too large to be managed by our other data movement
+methods, HTCondor file transfer or SQUID, and when these files are
+expected to be accessed outside of CHTC. This includes individual input files
+greater than 100MB and individual output files greater than 3-4GB.
+
+
Files in our S3 data storage are organized in storage units called
+“buckets.” You can think of an S3 bucket like a folder containing a
+set of data. Each bucket has a unique name of your choosing and can
+contain folders, executable files, data files, and most other types of
+files. S3 buckets are protected with a key that is unique to you
+(similar to a password) and, when provided with the key, buckets
+can be accessed from any machine with an internet connection. CHTC
+automatically creates and manages keys for users, so you do not have
+to provide your key when managing files in your S3 buckets on CHTC
+transfer servers or when submitting jobs on CHTC submit servers that
+transfer data from S3 buckets.
+
+
Users are expected to abide by this intended use expectation and follow the
+instructions for using S3 buckets written in this guide (e.g. files placed
+in S3 buckets should ALWAYS be listed in the submit file).
+
+
B. Getting Access to Create S3 Buckets
+
+
Anyone with a CHTC account whose data meets the intended use above
+can request access to create S3 buckets inside CHTC’s S3 data
+storage. A Research Computing Facilitator will review the request and
+follow up. If appropriate, S3 bucket creation will be enabled for your account and
+a quota will be set. Quotas are based on individual
+user needs; if a larger quota is needed, email chtc@cs.wisc.edu with
+your request.
+
+
C. User Data Management Responsibilities
+
+
As with all CHTC file spaces:
+
+
+
Keep copies: Our S3 buckets are not backed up and have the
+possibility of data loss; keep copies of ANY and ALL data in S3
+buckets in another, non-CHTC location.
+
Remove data: We expect that users remove data from S3 buckets AS
+SOON AS IT IS NO LONGER NEEDED FOR ACTIVELY-RUNNING JOBS.
+
Monitor usage and quota: Your account has both a size and
+number of files quota that applies across all buckets owned by your
+account. Quota changes can be requested by emailing chtc@cs.wisc.edu.
+
+
+
CHTC staff reserve the right to remove S3 buckets or revoke bucket
+creation permission at any time.
+
+
D. Data Access Within Jobs
+
+
Data in a CHTC S3 bucket can be accessed from jobs running almost
+anywhere (including most of OS Pool). HTCondor automatically matches and
+runs jobs that use S3 buckets only on machines that support S3 data
+transfers.
+
+
Data in CHTC S3 buckets are owned by the user (or a set of users), and
+only the user’s (or users’) own jobs can access these files.
+
+
2. Staging Large Data in S3 Buckets
+
+
In order to stage data in an S3 bucket for use on CHTC’s HTC system:
+
+
+
Get S3 bucket creation access: Bucket creation access is granted by request.
+
Create an S3 bucket: Create a bucket that will contain the data for your project.
+
Reduce file counts: Combine and compress files that are used together.
+
Use the transfer server: Upload your data to your bucket via our dedicated file transfer server.
+
Remove files after jobs complete: Data in S3 buckets are quota controlled and not backed up.
+
+
+
A. Get S3 Bucket Creation Access
+
+
CHTC S3 bucket creation access is granted by request. If you think you need
+to create S3 buckets, email CHTC’s Research Computing Facilitators (chtc@cs.wisc.edu).
+
+
B. Create an S3 Bucket
+
+
Buckets can be created on a CHTC submit server or the CHTC transfer server
+using the mc command:
+
+
[alice@transfer]$ mc mb chtc/my-bucket-name
+
+
+
Each bucket in CHTC must have a unique name, so be descriptive! We
+recommend creating a bucket per dataset or per batch of jobs.
+
+
C. Reduce File Counts
+
+
Data placed in S3 buckets should be stored in as few files as possible
+(ideally, one file per job). Similarly, large output should first be
+written to the job working directory, then compressed into a single
+file before being transferred back to an S3 bucket at the end of the job.
+
+
To prepare job-specific data that is large enough to require staging
+and exists as multiple files or directories (or a directory of multiple
+files), first create a compressed tar package before placing the file in
+an S3 bucket (either before submitting jobs, or within jobs before
+transferring output to the bucket). For example:
+
+
$ tar -czvf job_package.tar.gz file_or_dir
+
+
+
D. Use the Transfer Server
+
+
Movement of large data into/out of S3 buckets before and after jobs
+should be performed via CHTC’s transfer server, as below, and
+not via a CHTC submit server. After obtaining an account on the
+transfer server and creating an S3 bucket, copy relevant files directly into your
+home directory from your own computer:
+
+
+
Example scp command on your own Linux or Mac computer:
+
Using a file transfer application, like WinSCP, directly drag the large
+file from its location on your computer to a location within
+/home/username/ on transfer.chtc.wisc.edu.
+
+
+
+
+
Then, in an SSH session on the transfer server, copy files into your
+S3 bucket:
+
+
[alice@transfer]$ mc cp large-input.file chtc/my-bucket
+
+
+
E. Remove Files After Jobs Complete
+
+
As with all CHTC file spaces, data should be removed from S3 buckets AS
+SOON AS IT IS NO LONGER NEEDED FOR ACTIVELY-RUNNING JOBS. Even if it
+will be used again in the future, it should be deleted from the bucket and copied
+back at a later date. Files can be taken out of S3 buckets using similar
+mechanisms as uploaded files. In an SSH session on the transfer
+server, copy files from your bucket to your home directory:
+
+
[alice@transfer]$ mc cp chtc/my-bucket/large-output.file .
+
+
+
Then copy files from the transfer server to your own computer:
+
+
+
Example scp command on your own Linux or Mac computer:
+
Using a file transfer application, like WinSCP, directly drag the large
+file from its location within /home/username/ on
+transfer.chtc.wisc.edu to your computer.
+
+
+
+
+
To remove a file inside your S3 bucket, in an SSH session on the
+transfer server:
+
+
[alice@transfer]$ mc rm chtc/my-bucket/large-input.file
+[alice@transfer]$ mc rm chtc/my-bucket/large-output.file
+
+
+
To remove an entire bucket (only do this if you are certain the
+bucket is no longer needed):
+
+
[alice@transfer]$ mc rb chtc/my-bucket
+
+
+
3. Using Staged Files in a Job
+
+
A. Transferring Large Input Files
+
+
To use data placed in a CHTC S3 bucket, add files to your submit
+file’s transfer_input_files that point to the filename
+(e.g. large-input.file) inside your bucket (e.g. my-bucket) on
+CHTC’s S3 storage (s3dev.chtc.wisc.edu):
Intended Use:
+ The SQUID web proxy is best for cases where many jobs will use the
+ same large file (or few files), including large software. It is not
+ good for cases when each of many jobs needs a different large
+ input file, in which case our large data staging
+ location should be used. Remember that
+ you're always better off by pre-splitting a large input file into
+ smaller job-specific files if each job only needs some of the large
+ file's data. If each job needs a large set of many files, you
+ should create a .tar.gz file containing all the files, and this
+ file will still need to be less than 1 GB.
+
+
+
Access to SQUID:
+ is granted upon request to chtc@cs.wisc.edu. A user on CHTC submit
+ servers will be granted a user directory within /squid, which
+ users should transfer data into via the CHTC transfer server
+ (transfer.chtc.wisc.edu). As for all CHTC file space, users should
+ minimize the amount of data on the SQUID web proxy, and should clean
+ files from the /squid location regularly. CHTC staff reserve the
+ right to remove any file from /squid when needed to preserve
+ availability and performance for all users.
+
+
+
Advantages:
+ Files placed on the SQUID web proxy can be downloaded by jobs
+ running anywhere, because the files are world-readable.
+
+
Limitations and Policies:
+
+
SQUID cannot be used for job output, as there is no way to
+change files in SQUID from within a job.
+
SQUID is also only capable of delivering individual files up to
+1 GB in size.
+
A change you make to a file within your /squid directory may
+not take effect immediately on the SQUID web proxy if you use
+the same filename. Therefore, it is important to use a new
+filename when replacing a file in your /squid directory.
+
Jobs should still ALWAYS and ONLY be submitted from within the
+user's /home location.
+
Only the "http" address should be listed in the
+"transfer_input_files" line of the submit file. File
+locations starting with "/squid" should NEVER be listed in
+the submit file.
+
Users should only have data in /squid that is being used for
+currently-queued jobs; CHTC provides no backups of any data in
+CHTC systems, and our staff reserve the right to remove any data
+causing issues. It is the responsibility of users to keep copies
+of all essential data in preparation for potential data loss or
+file system corruption.
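One way to guarantee a new filename whenever a file's contents change is to include a checksum in the name. The sketch below is only an illustration of that idea: the file name reference.tar.gz is hypothetical, and a temporary directory stands in for /squid/username.

```shell
#!/bin/bash
# Stand-in for /squid/username with a placeholder file.
workdir=$(mktemp -d)
printf 'reference data v2\n' > "$workdir/reference.tar.gz"

# Derive a tag from the file's checksum so that changed contents
# always produce a different name (cksum is a POSIX utility).
tag=$(cksum "$workdir/reference.tar.gz" | cut -d ' ' -f 1)
new_name="reference_${tag}.tar.gz"

# Rename the file; the new name is what the http address would refer to.
mv "$workdir/reference.tar.gz" "$workdir/$new_name"
echo "new file name: $new_name"
```

A date or version number in the filename would serve the same purpose.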
+
+
+
+
Data Security:
+ Files placed in SQUID can only be edited by the owner of the user
+ directory within /squid, but will end up being world-readable on
+ the SQUID web proxy in order to be readily downloadable by jobs
+ (with the proper HTTP address); thus, large files that should be
+ "private" should not be placed in your user directory in /squid,
+ and should instead use CHTC's large data staging
+ space for large-file staging.
+
+
+
+
+
2. Using SQUID to Deliver Input Files
+
+
+
+
Request a directory in SQUID. Write to chtc@cs.wisc.edu describing the data you'd like to place in SQUID, and indicating your username and submit server hostname (e.g. submit-5.chtc.wisc.edu).
+
+
+
Place files within your /squid/username directory via a CHTC
+transfer server (if from your laptop/desktop) or on the submit
+server.
Have HTCondor download the file to the working job using the
+http://proxy.chtc.wisc.edu/SQUID address in the
+transfer_input_files line of your submit file:
Important: Make sure to replace "username" with your username
+in the above address. All other files should be staged before job
+submission.
+
+If your large file is a .tar.gz file that untars to include other
+files, remember to remove such files before the end of the job;
+otherwise, HTCondor will think that such files are new output that
+needs to be transferred back to the submit server. (HTCondor will
+not automatically transfer back directories.)
Due to the distributed configuration of the CHTC HTC pool, more often than not,
+your jobs will need to bring along a copy (i.e. transfer a copy) of
+data, code, packages, software, etc. from the submit server where the job
+is submitted to the execute node where the job will run. This requirement
+applies to any and all files that are needed to successfully execute and
+complete your job.
+
+
Any output that gets generated by your jobs is specifically written to
+the execute node on which the job ran. In order to get access to
+your output files, a copy of the output must be transferred back
+to a user-accessible location like the submit server.
+
+
The mechanism that you use for file transfers will depend on the size
+of the individual input and output files of your jobs. This guide
+specifically describes input and output file transfer for input files
+<100MB in size (and <500MB of total input file transfer) and output
+files <4GB in size using the standard solution built into HTCondor
+job scheduling. More information about file transfer on a system
+without a shared filesystem is available in the
+HTCondor manual.
+
+
+
+
Applicability
+
+
+
+
Intended use:
+Good for delivering any type of data to jobs, but with file-size
+limitations (see below). Remember that you can/should split up a large
+input file into many smaller files for cases where each job only needs a
+portion of the data. By default, the submit file executable,
+output, error, and log files are ALWAYS transferred.
+
+
+
Advantages:
+HTCondor file transfer is robust and is available on ANY of CHTC's
+accessible HTC resources including the UW Grid of campus pools, and the
+OS Pool.
+
+
+
Data Security:
+Files transferred with HTCondor transfer are owned by the job and
+protected by user permissions in the CHTC pool. When signaling your jobs
+to run on the UW Grid (Flocking) or the OS Pool (Glidein),
+your files will exist on someone else's computer only for the duration
+of each job. Please feel free to email us if you have data security
+concerns regarding HTCondor file transfer, as encryption options are
+available.
+
+
+
+
+
+
Transferring Input Files
+
+
To have HTCondor transfer small (<100MB) input files needed by
+your job, include the following attributes in your CHTC HTCondor submit files:
By default, the submit file executable, output, and
+error files are ALWAYS transferred.
+
+
Important Considerations
+
+
+
+
DO NOT use transfer_input_files for files within /staging;
+for files in /squid only http links (e.g. http://proxy.chtc.wisc.edu/SQUID/username/file) should be
+used instead of direct file paths. These policies are in place to prevent severe performance issues for your
+jobs and those of other users. Jobs should never be submitted
+from within /squid or /staging.
+
+
+
HTCondor's file transfer can cause issues for submit server performance
+when too many jobs are transferring too much data at the same time.
+Therefore, HTCondor file transfer is only good for input files up to
+~20 MB per file IF the number of concurrently-queued jobs will be 100
+or greater. Even when individual files are small, there are issues when
+the total amount of input data per-job approaches 500 MB. For cases
+beyond these limitations, one of our other CHTC file delivery methods
+should be used. Remember that creating a tar.gz file of directories
+and files can give your input and output data a useful amount of
+compression.
+
+
+
Comma-separated files and directories to-be-transferred should be
+listed with a path relative to the submit directory, or can be
+listed with the absolute path(s), as shown above for file3. The
+submit file executable is automatically transferred and does not
+need to be listed in transfer_input_files.
+
+
+
All files that are transferred to a job will appear within the top
+of the working directory of the job, regardless of how they are
+arranged within directories on the submit server.
+
+
+
A whole directory and its contents will be transferred when listed
+without the trailing forward slash ("/") after the directory name. When a directory is
+listed with the trailing forward slash ("/") after the directory name, only the directory
+contents will be transferred. Care should be taken when transferring whole directories
+so that only the files needed by your jobs will be transferred.
+Generally, we recommend creating a tar.gz file of directories
+and files to be used as job inputs - this will help streamline the process of input
+file transfer and help speed up transfer times by reducing the overall size of
+files that will be transferred.
+
+
+
Jobs will be placed on hold by HTCondor if any of the files or
+directories do not exist or if you have a typo.
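Before submitting, input sizes can be checked against the limits mentioned in this guide (100MB per input file and 500MB of total input per job). The sketch below is one possible way to do that. The placeholder files and the use of 1024-based megabytes are assumptions made for this illustration.

```shell
#!/bin/bash
# Throwaway directory with two small placeholder input files (1 KB and 2 KB).
workdir=$(mktemp -d)
head -c 1024 /dev/zero > "$workdir/input_a.dat"
head -c 2048 /dev/zero > "$workdir/input_b.dat"

# Limits in bytes, treating 1 MB as 1024 * 1024 bytes.
per_file_limit=$(( 100 * 1024 * 1024 ))
total_limit=$(( 500 * 1024 * 1024 ))

total=0
too_large=0
for f in "$workdir"/*.dat; do
    # wc -c reports the file size in bytes.
    size=$(wc -c < "$f")
    total=$(( total + size ))
    if [ "$size" -gt "$per_file_limit" ]; then
        echo "$f exceeds the per-file limit"
        too_large=$(( too_large + 1 ))
    fi
done

echo "total input bytes: $total"
if [ "$total" -gt "$total_limit" ]; then
    echo "total input exceeds the per-job limit"
fi
```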
when_to_transfer_output = ON_EXIT will instruct HTCondor to automatically transfer
+ALL new or modified files in the top level directory of the job (where it ran on the execute
+server), back to the job’s initial directory on the submit server. Please note: this behavior
+only applies to files in the job’s top-level working directory, meaning HTCondor will ignore
+any files created in subdirectories of the job’s main working directory. Several options exist for modifying
+this default output file transfer behavior - see below for some examples.
+
+
Only individual output files <4GB should be transferred back to your home directory
+using HTCondor’s default behavior described here. Large output files >4GB should instead
+use CHTC’s large data filesystem called staging; more information is available at
+Managing Large Data in HTC Jobs. To help reduce output file
+sizes, and help speed up file transfer times, we recommend creating a tar.gz file of all
+desired output before job completion (and to also delete the “un-tar'd”
+files so they are not also transferred back); see our example below.
+
+
+
+
Group Multiple Output Files For Convenience
+
+
If your jobs will generate multiple output files, we recommend combining all output into a compressed
+tar archive for convenience, particularly when transferring your results to your local computer from
+the submit server. To create a compressed tar archive, include commands in your bash executable script
+to create a new subdirectory, move all of the output to this new subdirectory, and create a tar archive.
+For example:
+
+
#! /bin/bash
+
+# various commands needed to run your job
+
+# create output tar archive
+mkdir my_output
+mv my_job_output.csv my_job_output.svg my_output/
+tar -czf my_job.output.tar.gz my_output/
+
+
+
The example above will create a file called my_job.output.tar.gz that contains all the output that
+was moved to my_output. Be sure to create my_job.output.tar.gz in the top-level directory of where
+your job executes and HTCondor will automatically transfer this tar archive back to your /home
+directory.
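To confirm what ended up in the archive, its contents can be listed before the job finishes. The sketch below reuses the file names from the example above; the empty files created with `touch` are placeholders for real output, and the temporary directory stands in for the job's working directory.

```shell
#!/bin/bash
# Sketch: create the output archive, remove the loose copies, and list
# the archive's contents. The two output files are empty placeholders.
set -e

cd "$(mktemp -d)"
touch my_job_output.csv my_job_output.svg   # stand-ins for real output

mkdir my_output
mv my_job_output.csv my_job_output.svg my_output/
tar -czf my_job.output.tar.gz my_output/

# Optional: the subdirectory is not transferred back anyway, but removing
# it frees scratch space on the execute server
rm -r my_output

# List what the archive contains
archive_listing="$(tar -tzf my_job.output.tar.gz)"
echo "$archive_listing"
```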
+
+
+
+
Select Specific Output Files to Transfer to /home
+
+
As described above, HTCondor will transfer ALL new or modified files in the top level
+directory of the job (where it ran on the execute server), back to the job’s initial directory
+on the submit server. If your jobs will produce multiple output
+files but you only need to retain a subset of these output files, we recommend deleting the unneeded
+output files or moving them to a subdirectory as a step in the bash
+executable script of your job - only the output files that remain in the top-level
+directory will be transferred back to your /home directory. This will help keep ample
+space free and available on your /home directory on the submit server and help prevent
+you from exceeding the disk quota.
+
+
For jobs that use large input files from /staging, you must include steps in your bash script
+to either remove these files or move them to a subdirectory before the job terminates. Otherwise,
+these large files will be transferred back to your /home directory. For more details, please
+see Managing Large Data in HTC Jobs.
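The cleanup steps described in this section might look like the sketch below. All file names are hypothetical and the files are empty stand-ins; `large_input.dat` plays the role of a file copied in from /staging, and `not_transferred` is an arbitrary subdirectory name.

```shell
#!/bin/bash
# Sketch: keep only selected files in the job's top-level directory.
# File names are hypothetical; touch creates empty stand-ins.
set -e

cd "$(mktemp -d)"
touch large_input.dat results.csv intermediate.tmp debug.txt

# Remove the copy of a large input file (for example, one copied from /staging)
rm large_input.dat

# Move output you do not need into a subdirectory; files in subdirectories
# are not transferred back by default
mkdir not_transferred
mv intermediate.tmp debug.txt not_transferred/

# Whatever is still at the top level is what would be transferred back
remaining="$(find . -maxdepth 1 -type f | sort)"
echo "$remaining"
```

Placing these steps at the end of the executable script, after the main commands have finished, keeps the cleanup from interfering with the job itself.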
+
+
In cases where a bash script is not used as the executable of your job and you wish to have only specific
+output files transferred back, please contact us.
+
+
+
+
Get Additional Options For Managing Job Output
+
+
Several options exist for managing output file transfers back to your /home directory and we
+encourage you to get in touch with us at chtc@cs.wisc.edu to
+help identify the best solution for your needs.
+
+
Request a Quota Change
+
+
If you find that you are in need of more space in your /home directory to handle the number
+of jobs that you want to run, please see our Request a Quota Change guide.
Please fill out the form below to provide us with
+information about the computing work you would like
+to do (to the best of your knowledge). If you are unsure of an answer to a
+question below, leave the question blank or indicate that you don't know.
+
+
After filling out this form, CHTC Facilitation staff will follow up to
+create your account and offer times for an initial consultation. The consultation
+is only mandatory for new groups to CHTC, but we strongly encourage anyone who
+is getting started to take advantage of this valuable opportunity to discuss your
+work one-on-one with a CHTC Research Computing Facilitator.
If you do not receive an automated email from chtc@cs.wisc.edu within a few hours of completing the form,
+ OR if you do not receive a response from a human within two business days (M-F), please email chtc@cs.wisc.edu.
See below for the various ways to get help when using CHTC services.
+
+
+
+
Get An Account
+
+
If you don’t have an account yet, please fill out our Request
+Form, and we’ll follow up with your account details
+or a request to meet. If you don’t have an account but just have general
+questions, feel free to send an email to chtc@cs.wisc.edu (see below).
+
+
Request a Quota Change
+
+
If you’d like to request a change in your quotas for one of our data
+storage locations, please see our Request a Quota Change guide.
+
+
Help Via Email
+
+
We provide support via email at the address
+chtc@cs.wisc.edu and this is a good,
+general way to reach us. You can typically
+expect a first response within 1-2 business days.
+
+
When emailing us for assistance in troubleshooting an issue, please provide which system you are using,
+an explanation of what you expected to happen versus what actually happened, and
+include relevant files (or provide the locations of them on the system), such as:
+
+
+
The job submit file (.sub)
+
The job executable (.sh) or list of commands used in an interactive job
+
Standard error and standard output files (usually .out or .err)
+
If on the HTC system, the HTCondor log file (.log)
+
+
+
We will use this information to give you more effective responses and solutions.
+
+
Office Hours
+
+
+
+
+
+
For users who already have accounts, we have drop-in office hours, online, during the following times:
+
+
+
Tuesday morning: 10:30 am - 12:00 pm.
+
Thursday afternoon: 3:00 - 4:30 pm.
+
+
+
To drop in, find the videoconference link in either your email or in the
+login message when you log into a CHTC server.
+
+
As always, if the times above don’t work for you, please email us
+at our usual support address to schedule a separate meeting.
We have a system status page at https://status.chtc.wisc.edu that we
+use to provide updates
+about CHTC system issues, including outages and scheduled maintenance. Major outages and maintenance
+issues are still communicated via the chtc-users email list, but minor issues and updates to
+ongoing issues will be communicated via the status page.
+
+
+
If you are experiencing an issue with the system, please check the status page! If you
+don’t see a corresponding incident, feel free to email us.
+
+
+
Make an Appointment
+
+
We are happy to arrange meetings outside of designated Office Hours, per
+your preference. Simply email us at the address above, and we will set
+up a time to meet!
+
+
More About Us
+
+
Support at CHTC is provided by the whole team, and led by the Research
+Computing Facilitation Team. Learn more about us here:
Anyone on the UW-Madison campus, and even off-campus collaborators,
+may use the CHTC. In order to get started, we need some information
+to understand how best to help you.
+Please fill out the form here,
+being as thorough as you can.
In order to submit jobs to our campus-wide collection of resources, you
+will need access to a submit node. There are several options for getting
+access to a submit node:
+
+
+
Use ours. We operate a submit node that
+is shared by many researchers. This is a great way to get started
+quickly, and it is sufficient if you do not need to run tens of
+thousands of jobs with heavy data transfer requirements.
+
Use your department's. Perhaps your department already has its
+own submit node, in which case you can contact your local
+administrator for an account. You will still need to provide all the
+info requested on the getting started form, so
+we can set up things on our end. The benefits of using a
+departmental or group submit node are: access to data on local file
+systems; limited impact from other, potentially new users; and,
+greater scalability in the number of simultaneous jobs you can run,
+as well as the amount of data you can transfer.
+
+
Set up a new submit node on a server. If you do not already have
+one and need access to data on local file systems, or if you believe
+that you will have a significant job and/or data volume, getting
+your own submit node is probably the best way to go. Here's an
+example system configuration that we've found works well for a
+variety of submit work loads. You can expect to spend around
+$4,000 - $5,000 for such a system.
+
+
Typical submit node configuration
+
+
+
A 1U rack-mount enclosure, like a Dell PowerEdge 410.
+
Two processors with 12 cores total, for example Intel Xeon
+E5645, 2.4GHz 6-core processors
+
24GB of 1.3 GHz RAM
+
Two drives for the operating system. 500GB each is enough. You
+can use mirroring or a RAID configuration like RAID-6 for
+reliability.
+
Two or more 2-3TB drives for data, depending on your needs.
+
+
+
Use your desktop. Depending on your department's level of
+system administration support, you may be able to have HTCondor
+installed on your desktop and configured to submit into our campus
+resources. Another option that is under development is
+Bosco, a
+user-installable software package that lets you submit jobs into
+resources managed by HTCondor, PBS or SGE.
+
+
+
Still not sure what option is right for you? No worries. This is one of
+the topics we discuss in our initial consultation. To schedule an
+initial consultation, fill out our getting started
+form.
A message will appear stating that the key pair is being generated.
+
+
A second message will appear prompting you to enter the location where the SSH keys should be stored:
+
Enter a file in which to save the key (/home/your_NetID/.ssh/id_ed25519):
+
+
+
Simply hit the enter key to accept the specified file path.
+
+
Note: If an SSH key already exists at the displayed path it will be overwritten by this action.
+This can be avoided by typing in an alternate path before pressing the enter key.
+
+
+
You will be prompted to create a passphrase. Type your desired passphrase and then hit enter. Repeat a second time when asked to confirm your passphrase.
+
+
Warning: If you leave the passphrase empty (hit enter without typing anything), a passphrase will not be created nor required for using the SSH connection. In principle, this means anyone with access to the private key can access and modify your GitHub account remotely.
+
+
+
A message will appear confirming the creation of the SSH key pair, as well as the paths and names of the private and public keys that were generated. Make note of these paths for use in the following steps.
Copy the contents of the public SSH key file (id_ed25519.pub) created in Part A. There are several ways of doing this.
+
+
+
If you provided an alternate file name in Step 3. of Part A., then the public SSH key will be the name of that file plus the .pub extension.
+
+
+
+
Print the contents of the file to the screen by entering the following command, replacing your_NetID with your actual NetID.
+
cat /home/your_NetID/.ssh/id_ed25519.pub
+
+
+
Use a terminal editor (nano, vi, etc.) to open and view the file
+
Use a file transfer method to transfer the file to your local computer (Transferring Files).
+
+
+
Next, log in to github.com using the same email that you used in Step 2. of Part A.
+
Go to your account settings by clicking on your profile icon in the top right corner of the webpage, then click on Settings within the drop-down menu. If your browser window is small, the Settings button can be found by clicking the menu button at the top left of the webpage.
+
Go to the SSH and GPG keys section. Under the SSH keys section, click New SSH key.
+
Paste the contents of the SSH public key from Step 1. into the Key textbox.
+
Name the SSH key using the Title textbox. We recommend “CHTC” plus the name of the login node. For example: “CHTC ap2001”.
+
Click Add SSH key. The SSH key will now appear in the SSH keys section in your GitHub account settings.
+
+
+
C. Accessing Your Private GitHub Repository from the Cluster
+
Once the SSH key has been added to your GitHub account, you can access your private repository using the repository’s SSH address.
+
+
+
In your web browser and while logged in to your GitHub account, go to the webpage for the private repository.
+
Click the <>Code button, then select the Local tab and then the SSH tab.
+
Copy the SSH address that is shown.
+
+
On the CHTC submit node, you can now access the repository using git commands by using the SSH address in place of the HTTPS address. For example,
If prompted for a passphrase when running commands with the SSH address, provide the passphrase you created in Step 4. of Part A.
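As a sketch of how the SSH address relates to the HTTPS address, the snippet below derives one from the other for a hypothetical repository `username/my-private-repo`. Neither the repository name nor the `sed` conversion comes from this guide; in practice you can simply copy the SSH address from the repository page as described above.

```shell
#!/bin/bash
# Sketch: derive the SSH address from an HTTPS address.
# The repository name is hypothetical; replace it with your own.
https_url="https://github.com/username/my-private-repo.git"

# https://HOST/OWNER/REPO.git  becomes  git@HOST:OWNER/REPO.git
ssh_url="$(printf '%s\n' "$https_url" | sed -E 's#^https://([^/]+)/#git@\1:#')"
echo "$ssh_url"

# With the SSH key added to your GitHub account, the repository can then
# be cloned using the SSH address (left commented out here):
# git clone "$ssh_url"
```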
+
+
+
From an interactive job
+
+
Because the interactive job takes place on a different node than the submit node, it will not know about the SSH key that you set up above. Use the following instructions to transfer and use the private identity key in the interactive job (see Compiling or Testing Code with an Interactive Job for more information on interactive jobs).
+
+
+
+
When creating the submit file for your interactive job, include the path to the private SSH key identity file as a value for the transfer_input_files keyword. This will ensure that the identity file is copied to the interactive job directory. For example,
Note: Make sure that you are transferring the private SSH key file, not the public. The public SSH key should have the .pub extension, while the private SSH key does not.
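A submit file along these lines could be written as in the sketch below. The file name `interactive.sub` and the resource requests are illustrative assumptions, and `your_NetID` must be replaced with your actual NetID; only the `transfer_input_files` line reflects the instruction above.

```shell
#!/bin/bash
# Sketch: write a minimal submit file for an interactive job that
# transfers the private SSH key. Resource requests are illustrative.
set -e

cd "$(mktemp -d)"
cat > interactive.sub <<'EOF'
# interactive.sub (hypothetical file name)
universe = vanilla

# Transfer the private key (no .pub extension); replace your_NetID
transfer_input_files = /home/your_NetID/.ssh/id_ed25519
should_transfer_files = YES
when_to_transfer_output = ON_EXIT

request_cpus = 1
request_memory = 1GB
request_disk = 1GB

queue
EOF

echo "Wrote interactive.sub; start the job with: condor_submit -i interactive.sub"
```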
+
+
+
Once your submit file is set up, start the interactive job using condor_submit -i and then the name of your submit file. When the interactive job has started, you will see that the private SSH key file is included in the initial directory. The SSH program, however, still needs to be told to use it.
+
+
Initialize an SSH agent using the command
+
+
eval "$(ssh-agent -s)"
+
+
+
+
Add the private SSH key to the SSH agent by using the ssh-add command followed by the name of the private SSH key file that you transferred. You will be prompted to enter the passphrase that you created when you created the SSH key pair. For example,
+
+
ssh-add id_ed25519
+
+
+
You will now be able to access the repository during the interactive job.
+
+
+
+
Additional Notes
+
+
+
If you forget the passphrase you created in Step 4. of Part A., you will need to repeat this guide to create a new SSH key pair to replace the previous one.
+
+
When using the SSH address to your repository with non-git commands, you may need to replace the colon (:) in the address with a forward slash (/). For example,
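One way to make that substitution in a shell is sketched here, using a hypothetical repository address. Which non-git tools need the slash form depends on the tool, so treat this as an illustration of the text change only.

```shell
#!/bin/bash
# Sketch: replace the first colon in an SSH address with a forward slash.
# The repository address is hypothetical.
ssh_address="git@github.com:username/my-private-repo.git"

# sed without the g flag replaces only the first match on the line
slash_address="$(printf '%s\n' "$ssh_address" | sed 's#:#/#')"
echo "$slash_address"
```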
Globus is a data management service that lets you
+move files between endpoints, computers that are connected to the
+Globus file transfer network.
+Globus is primarily useful when you need to move large amounts of data to or
+from somewhere that is already providing a Globus endpoint.
+For example, a collaborator might provide shared data through Globus,
+or might expect you to send data to them through Globus.
+
+
This guide will show you how to execute such transfers to and from CHTC using the CHTC
+Globus endpoint, which may be simpler than trying to move the files to your
+own computer first.
+
+
Prerequisites
+
+
All file transfer via Globus at CHTC requires:
+
+
+
access to a directory in the /staging or /projects folders
+
login access to the transfer.chtc.wisc.edu server.
+
+
+
Contact us at chtc@cs.wisc.edu if you need either of the above.
+
+
You will also need to be able to
+log in to the Globus web interface;
+you can use your UW-Madison NetID (if you have one, or similar) by selecting
+University of Wisconsin-Madison from the drop down and pressing “Continue”.
+
+
Using the CHTC Globus Endpoints
+
+
You can use the Globus web interface to transfer files to and from CHTC.
+In the web interface, you can select two endpoints and then initiate a transfer
+between them.
+
+
The first step is to find the CHTC Globus endpoints. They can be found in the Globus web interface
+by searching endpoints for “CHTC Staging” or “CHTC Projects”.
If you need the actual endpoint UUID, it is listed on the above pages near the bottom
+of the “Overview”.
+
+
To use an endpoint, you must first activate it.
+Activations are usually time-limited, and transfers can only proceed while
+both the source and destination endpoints are activated.
+Activating an endpoint generally requires logging in.
+You should log in using your UW-Madison NetID.
+You can see how long your activation will last on the endpoint information page
+in the Globus web interface.
+
+
To begin a file transfer, go to the
+File Manager.
+In the top-right corner of the page, make sure you are in the “two panel” view.
+Select the two endpoints you want to transfer between
+(they are called “Collections” on this page).
+You should see a directory listing appear in the middle of each of the panes;
+select a directory or file and click “Start” at the bottom of the page to
+move that directory or file to the other endpoint.
+The item will be moved to the currently-selected directory on the other endpoint.
+
+
Globus transfers are asynchronous, and you do not need to leave the web
+interface open while they run.
+You will receive email updates on the progress of the transfer, and you can
+view the status of in-progress and historical transfers
+on the Activity page.
+
+
You may find some of the “transfer settings”, available by clicking the
+“Transfer & Sync Options” dropdown, useful.
+In particular, sync will help reduce the amount of time it takes to transfer
+when some data has already been transferred.
+
+
Running a Personal Globus Endpoint
+
+
The CHTC Globus endpoint is a “Globus Connect Server”, designed for shared use
+on a dedicated machine.
+It is also possible to run
+Globus Connect Personal,
+a lighter-weight package that adds a Globus endpoint to your own computer,
+like a laptop or lab computer.
+Installers are available at that link for Mac, Linux, and Windows.
+
+
We only recommend using Globus Connect Personal if you are also working with
+some other Globus endpoint (not just CHTC and your computer).
+If you are just moving files between CHTC and your own computer, traditional
+file transfer tools like rsync will likely be more efficient.
GPUs (Graphics Processing Units) are a special kind of computer
+processor that are optimized for running very large numbers of simple
+calculations in parallel, which often can be applied to problems related
+to image processing or machine learning. Well-crafted GPU programs for
+suitable applications can outperform implementations running on CPUs by
+a factor of ten or more, but only when the program is written and
+designed explicitly to run on GPUs using special libraries like CUDA.
+For researchers who have problems that are well-suited to GPU
+processing, it is possible to run jobs that use GPUs in CHTC. Read on to
+determine:
CHTC has a set of GPUs that are available for use by any CHTC user with an
+account on our high throughput computing (HTC) system
+via the CHTC GPU Lab, which includes templates and a campus GPU community.
+
+
Our expectation is that most, if not all, CHTC users running GPU jobs should utilize
+the capacity of the GPU Lab to run their work.
+
+
+
+
Number of Servers | Names | GPUs / Server | GPU Type (DeviceName) | Hardware Generation (Capability) | GPU Memory (GlobalMemoryMB)
+2 | gpu2000, gpu2001 | 2 | Tesla P100-PCIE-16GB | 6.0 | 16GB
+4 | gpulab2000 - gpulab2003 | 8 | NVIDIA GeForce RTX 2080 Ti | 7.5 | 10GB
+2 | gpulab2004, gpulab2005 | 4 | NVIDIA A100-SXM4-40GB | 8.0 | 40GB
+10 | gpu2002 - gpu2011 | 4 | NVIDIA A100-SXM4-80GB | 8.0 | 80GB
+3 | gpu4000 - gpu4002 | 10 | NVIDIA L40 | 8.9 | 45GB
+1 | gpu4003 | 8 | NVIDIA H100 80GB HBM3 | 9.0 | 80GB
+
+
+
Special GPU Lab Policies
+
+
Jobs running on GPU Lab servers have time limits and job number limits
+(differing from CHTC defaults across the rest of the HTC System).
+
+
+
+
+
Job type | Maximum runtime | Per-user limitation
+Short | 12 hrs | 2/3 of CHTC GPU Lab GPUs
+Medium | 24 hrs | 1/3 of CHTC GPU Lab GPUs
+Long | 7 days | up to 4 GPUs in use
+
+
+
+
+
There are a certain number of slots in the GPU Lab reserved for interactive use. Interactive
+jobs that use GPU Lab servers are restricted to using a single GPU and a 4 hour runtime.
+
+
2. Other Capacity
+
+
There is additional dedicated and backfill GPU capacity available in CHTC and beyond;
+see GPU capacity beyond the GPU Lab for details.
+
+
B. Submit Jobs Using GPUs in CHTC
+
+
1. Choose GPU-Related Submit File Options
+
+
The following options are needed in your HTCondor submit file in order
+to access the GPUs in the CHTC GPU Lab and beyond:
+
+
+
Request GPUs (required): All jobs that use GPUs must request GPUs in their submit file (along
+with the usual requests for CPUs, memory, and disk).
+
request_gpus = 1
+
+
+
Request the CHTC GPU Lab: To use CHTC’s shared use GPUs, you need to opt-in to the GPU Lab. To
+do so, add the
+following line to your submit file:
+
+WantGPULab = true
+
+
+
Indicate Job Type: We have categorized three “types”
+of GPU jobs, characterized in the table above. Indicate which job type you would
+like to submit by using the submit file option below.
+
+GPUJobLength = "short"
+# Can also request "medium" or "long"
+
+
If you do not specify a job type, the medium job type will be used as the default. If
+ your jobs will run in less than 12 hours, it is advantageous to indicate that they are
+ “short” jobs because you will be able to have more jobs running at once.
+
+
+
Request Specific GPUs or CUDA Functionality (optional): If your software or code requires a certain “capability” of GPU (see table above) or a certain amount of memory,
+you can request them with these submit file options:
More information on these commands can be found in the HTCondor manual.
+
+
It may be tempting to add requirements for specific GPU servers or
+ types of GPU cards. However, when possible, it is best to write your
+ code so that it can run across GPU types and without needing the
+ latest version of CUDA.
+
+
+
Indicate Software or Data Requirements Using requirements: If your data is large enough to
+ use our /staging data system (see more information here),
+ or you are using modules or other software in our shared /software system, include
+ the needed requirements.
+
+
Indicate Shorter/Resumable Jobs: if your jobs are shorter than 4-6 hours, or have
+ the ability to checkpoint at least that frequently, we highly recommend taking
+ advantage of the additional GPU servers in CHTC that can run these kinds of jobs
+ as backfill! Simply add the following option to your submit file:
+
+is_resumable = true
+
+
+
For more information about the servers that you can run on with this option,
+ and what it means to run your jobs as “backfill” see
+ the section below on Accessing Research Group GPUs.
+
+
Complex GPU requirements: if your jobs have more complex requirements than
+the capability and memory options shown above, you can use a more general submit file
+option require_gpus to construct a complex, custom requirement. Contact the facilitators
+at chtc@cs.wisc.edu if you believe you need to use this option.
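For the capability and memory requests mentioned in the list above, a sketch of the submit file lines is shown below, written through a shell here-document so it can be tried as-is. It assumes the HTCondor submit commands `gpus_minimum_capability` and `gpus_minimum_memory` (the latter given in MB); the file name `gpu-job.sub` and the values are illustrative only, so check the HTCondor manual for the version running on your submit server.

```shell
#!/bin/bash
# Sketch: append capability and memory requests to a submit file.
# The file name and the values (8.0, 16000 MB) are illustrative only.
set -e

cd "$(mktemp -d)"
cat >> gpu-job.sub <<'EOF'
# Only match GPUs with compute capability 8.0 or higher (see the table above)
gpus_minimum_capability = 8.0
# Only match GPUs with at least this much GPU memory, in MB
gpus_minimum_memory = 16000
EOF

cat gpu-job.sub
```

Setting only minimums, rather than naming a specific card, leaves the job free to match any GPU that is new enough and large enough.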
+
+
+
2. Sample Submit File
+
+
A sample submit file is shown below. There are also example submit files and
+job scripts in this GPU Job Templates repository
+in CHTC’s GitHub organization.
+
+
# gpu-lab.sub
+# sample submit file for GPU Lab jobs
+
+universe = vanilla
+log = job_$(Cluster)_$(Process).log
+error = job_$(Cluster)_$(Process).err
+output = job_$(Cluster)_$(Process).out
+
+# Fill in with whatever executable you're using
+executable = run_gpu_job.sh
+#arguments =
+
+should_transfer_files = YES
+when_to_transfer_output = ON_EXIT
+# Uncomment and add input files that are in /home
+# transfer_input_files =
+
+# Uncomment and add custom requirements
+# requirements =
+
++WantGPULab = true
++GPUJobLength = "short"
+
+request_gpus = 1
+request_cpus = 1
+request_memory = 1GB
+request_disk = 1GB
+
+queue 1
+
+
+
+
3. Notes
+
+
It is important to still request at least one CPU per job to do the
+processing that is not well-suited to the GPU.
+
+
Note that HTCondor will make sure your job has access to the GPU; it will
+set the environment variable CUDA_VISIBLE_DEVICES to indicate which GPU(s)
+your code should run on. The environment variable will be read by CUDA to select the appropriate
+GPU(s). Your code should not modify this environment variable or manually
+select which GPU to run on, as this could result in two jobs sharing a GPU.
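To see which GPU(s) a job was given, the executable script can print the variable without changing it. This is a small sketch: the fallback text `none assigned` is an arbitrary placeholder for when the variable is unset, and `nvidia-smi -L` is only run if the NVIDIA tools happen to be available.

```shell
#!/bin/bash
# Sketch: report which GPU(s) HTCondor assigned, without changing the variable.
assigned_gpus="${CUDA_VISIBLE_DEVICES:-none assigned}"
echo "GPU(s) assigned to this job: $assigned_gpus"

# If the NVIDIA tools are on the PATH, also list the visible devices
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi -L || true
fi
```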
+
+
It is possible to request multiple GPUs. Before doing so, make sure you’re
+using code that can utilize multiple GPUs and then submit a test job to confirm
+success before submitting a bigger job. Also keep track of how long jobs
+are running versus waiting; the time you save by using multiple GPUs may
+not be worth the extra time that the job will likely wait in the queue.
+
+
C. GPU Capacity Beyond the CHTC GPU Lab
+
+
The following resources are additional CHTC-accessible servers with GPUs. They do not have the
+special time limit policies or job limits of the GPU Lab. However, some of them are
+owned or prioritized by specific groups. The implications of this
+on job runtimes are noted in each section.
+
+
Note that all GPU jobs need to include the request_gpus option in their submit file,
+even if they are not using the GPU Lab.
+
+
1. Access Research Group GPUs
+
+
Certain GPU servers in CHTC are prioritized for the
+research groups that own them, but are available to run other jobs when
+not being used by their owners. When running on these servers, jobs
+forfeit our otherwise guaranteed runtime of 72 hours, and have the potential to be interrupted. However, for
+shorter jobs or jobs that have implemented self-checkpointing, this is not a drawback and allowing jobs to run on these
+additional servers opens up more capacity.
+
+
Therefore, these servers are a good fit for GPU jobs that run in a few hours
+or less, or have implemented self-checkpointing (the capability to save progress
+to a file and restart from that progress). Use the is_resumable option shown
+above in the list of submit file options.
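The idea of saving progress to a file and restarting from it can be sketched as below. The file name `checkpoint.txt`, the five-step loop, and the `run_steps` helper are hypothetical placeholders. The sketch shows only the save-and-resume logic inside a script, not the HTCondor settings needed to carry a checkpoint file between runs.

```shell
#!/bin/bash
# Sketch of self-checkpointing: progress is saved to a file after each
# step, and a restarted job resumes from the last saved step.
# checkpoint.txt and the 5-step loop are hypothetical placeholders.
set -e

cd "$(mktemp -d)"
checkpoint_file="checkpoint.txt"
total_steps=5

run_steps() {
    local stop_after="$1"
    local last_done=0
    local step
    # Resume from saved progress if a checkpoint exists
    if [ -f "$checkpoint_file" ]; then
        last_done="$(cat "$checkpoint_file")"
    fi
    for step in $(seq $((last_done + 1)) "$stop_after"); do
        echo "running step $step"            # real work would go here
        echo "$step" > "$checkpoint_file"    # save progress
    done
}

run_steps 3                # simulate a job interrupted after step 3
run_steps "$total_steps"   # a restarted job picks up at step 4
final_step="$(cat "$checkpoint_file")"
echo "finished at step $final_step"
```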
+
+
2. Use the gzk Servers
+
+
These are servers that are similar to the GPU Lab servers with two important differences
+for running GPU jobs:
+
+
they do not have access to CHTC’s large data /staging file system
+
they do not have Docker capability
+
+
+
You do not need to do anything specific to allow jobs to run on these servers.
+
+
3. Using GPUs in CHTC’s OS Pool and the UW Grid
+
+
CHTC, as a member of the OSG Consortium, can access GPUs that
+are available on the OS Pool. CHTC is
+also a member of a campus computing network called the UW Grid, where groups on campus
+share computing capacity, including access to idle GPUs.
+
+
See this guide to know
+whether your jobs are good candidates for the UW Grid or OS Pool and then get in touch
+with CHTC’s Research Computing Facilitators to discuss details.
+
+
D. Using condor_status to explore CHTC GPUs
+
+
You can find out information about GPUs in CHTC through the
+condor_status command. All of our servers with GPUs have a TotalGPUs
+attribute that is greater than zero; thus we can query the pool to find
+GPU-enabled servers by running:
To print out specific information about a GPU server and its GPUs, you
+can use the “auto-format” option for condor_status and the names of
+specific server attributes. In general, when querying attributes using
+condor_status, a “GPUs_” prefix needs to be added to the attribute name.
+For example, the tables at the top of the guide can be mostly
+recreated using the attributes Machine, TotalGpus,
+GPUs_DeviceName and GPUs_Capability:
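A sketch of both kinds of query follows, assuming the standard `condor_status` options `-compact`, `-constraint`, and `-af`. It only runs the queries where `condor_status` is installed, for example on a CHTC submit server; the exact columns returned depend on the pool.

```shell
#!/bin/bash
# Sketch: query the pool for GPU servers. Requires condor_status on the
# PATH (for example, on a CHTC submit server); otherwise nothing is run.
gpu_constraint='TotalGpus > 0'

if command -v condor_status >/dev/null 2>&1; then
    # One line per server that has at least one GPU
    condor_status -compact -constraint "$gpu_constraint"

    # Selected attributes only, using the auto-format (-af) option
    condor_status -compact -constraint "$gpu_constraint" \
        -af Machine TotalGpus GPUs_DeviceName GPUs_Capability
else
    echo "condor_status not found; run this on a server with HTCondor installed"
fi
```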
In addition, HTCondor tracks other GPU-related attributes for each
+server, including:
+
+
+
+
Attribute | Explanation
+Gpus | Number of GPUs in an individual job slot on a server (one server can be divided into slots to run multiple jobs).
+TotalGPUs | The total number of GPUs on a server.
+(GPUs_)DeviceName | The type of GPU card.
+(GPUs_)Capability | Represents various capabilities of the GPU. Can be used as a proxy for the GPU card type when requiring a specific type of GPU. Wikipedia has a table showing the compute capability for specific GPU architectures and cards. More details on what the capability numbers mean can be found on the NVIDIA website.
+(GPUs_)DriverVersion | Not the version of CUDA on the server or the NVIDIA driver version, but the maximum CUDA runtime version supported by the NVIDIA driver on the server.
+(GPUs_)GlobalMemoryMb | Amount of memory available on the GPU card.
+
+
+
+
E. Prepare Software Using GPUs
+
+
Before using GPUs in CHTC you should ensure that the use of GPUs will
+actually help your program run faster. This means that the code or
+software you are using has the special programming required to use GPUs
+and that your particular task will use this capability.
+
+
If this is the case, there are several ways to run GPU-enabled software
+in CHTC:
+
+
+
Machine Learning
+ For those using machine learning code specifically, we have a guide
+with more specific recommendations here: Run Machine Learning Jobs on
+HTC
+
+
+
1. Compiled Code
+
+
You can use our conventional methods of creating a portable installation
+of a software package (as in our R/Python guides) to run on GPUs. Most
+of our build servers or GPU servers have copies of the CUDA Runtime that
+can be used to compile code. To access these servers, submit an
+interactive job, following the instructions in our Build Job
+Guide or by submitting a GPU job submit file with the
+interactive flag for condor_submit. Once on a build or GPU server, see
+what CUDA versions are available by looking at the path
+/usr/local/cuda-*.
+
+
Note that we strongly recommend software installation strategies that
+incorporate the CUDA runtime into the final installed code, so that jobs
+are able to run on servers even if a different version of the CUDA
+runtime is installed (or there’s no runtime at all!). For compiled code,
+look for flags that enable static linking or use one of the solutions
+listed below.
+
+
2. Docker
+
+
CHTC’s GPU servers have “nvidia-docker” installed, a specific version of
+Docker that integrates Docker containers with GPUs. If you can find or
+create a Docker image with your software that is based on the
+nvidia-docker container, you can use this to run your jobs in CHTC. See
+our Docker guide for how to use Docker in CHTC.
The CHTC GPU Lab mailing list is used to announce new GPU hardware availability and
+GPU-related events, solicit feedback from GPU users, and share best practices for
+GPU computing in CHTC. Any CHTC user can subscribe to the list by
+emailing chtc-gpu-lab+managers@g-groups.wisc.edu
+and asking to join.
+Their subscription request will be reviewed by the list administrators.
+
+
+
The CHTC GPU Lab is led by Anthony Gitter, Christina Koch, Brian Bockelman, and Miron Livny.
+
+
+
+
The original UW2020 project was led by Anthony Gitter, Lauren Michael, Brian Bockelman, and Miron Livny and
+funded by the Office of the Vice Chancellor for Research and Graduate
+Education and the Wisconsin Alumni Research Foundation.
+
+
+
For more information about the CHTC GPU Lab project contact Anthony Gitter.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
diff --git a/preview-calendar/uw-research-computing/guide-icons/bash-icon.svg b/preview-calendar/uw-research-computing/guide-icons/bash-icon.svg
new file mode 100644
index 000000000..74540e407
--- /dev/null
+++ b/preview-calendar/uw-research-computing/guide-icons/bash-icon.svg
@@ -0,0 +1,10569 @@
+
+
+
+
+
+
+
+
+
+
+]>
+
diff --git a/preview-calendar/uw-research-computing/guide-icons/checkmark.png b/preview-calendar/uw-research-computing/guide-icons/checkmark.png
new file mode 100644
index 000000000..455a5e46e
Binary files /dev/null and b/preview-calendar/uw-research-computing/guide-icons/checkmark.png differ
diff --git a/preview-calendar/uw-research-computing/guide-icons/conda-icon.png b/preview-calendar/uw-research-computing/guide-icons/conda-icon.png
new file mode 100644
index 000000000..09300e897
Binary files /dev/null and b/preview-calendar/uw-research-computing/guide-icons/conda-icon.png differ
diff --git a/preview-calendar/uw-research-computing/guide-icons/data.png b/preview-calendar/uw-research-computing/guide-icons/data.png
new file mode 100644
index 000000000..e1788bc18
Binary files /dev/null and b/preview-calendar/uw-research-computing/guide-icons/data.png differ
diff --git a/preview-calendar/uw-research-computing/guide-icons/gear.png b/preview-calendar/uw-research-computing/guide-icons/gear.png
new file mode 100644
index 000000000..b3275eeb5
Binary files /dev/null and b/preview-calendar/uw-research-computing/guide-icons/gear.png differ
diff --git a/preview-calendar/uw-research-computing/guide-icons/java-icon.png b/preview-calendar/uw-research-computing/guide-icons/java-icon.png
new file mode 100644
index 000000000..f9dd2ba5a
Binary files /dev/null and b/preview-calendar/uw-research-computing/guide-icons/java-icon.png differ
diff --git a/preview-calendar/uw-research-computing/guide-icons/julia-icon.png b/preview-calendar/uw-research-computing/guide-icons/julia-icon.png
new file mode 100644
index 000000000..0a97ce548
Binary files /dev/null and b/preview-calendar/uw-research-computing/guide-icons/julia-icon.png differ
diff --git a/preview-calendar/uw-research-computing/guide-icons/laptop_arrow.png b/preview-calendar/uw-research-computing/guide-icons/laptop_arrow.png
new file mode 100644
index 000000000..7cd84ebeb
Binary files /dev/null and b/preview-calendar/uw-research-computing/guide-icons/laptop_arrow.png differ
diff --git a/preview-calendar/uw-research-computing/guide-icons/matlab-icon.png b/preview-calendar/uw-research-computing/guide-icons/matlab-icon.png
new file mode 100644
index 000000000..15ede2640
Binary files /dev/null and b/preview-calendar/uw-research-computing/guide-icons/matlab-icon.png differ
diff --git a/preview-calendar/uw-research-computing/guide-icons/noun_gpu_2528527.png b/preview-calendar/uw-research-computing/guide-icons/noun_gpu_2528527.png
new file mode 100644
index 000000000..613d14caa
Binary files /dev/null and b/preview-calendar/uw-research-computing/guide-icons/noun_gpu_2528527.png differ
diff --git a/preview-calendar/uw-research-computing/guide-icons/noun_open book_1179297.png b/preview-calendar/uw-research-computing/guide-icons/noun_open book_1179297.png
new file mode 100644
index 000000000..4da3340e7
Binary files /dev/null and b/preview-calendar/uw-research-computing/guide-icons/noun_open book_1179297.png differ
diff --git a/preview-calendar/uw-research-computing/guide-icons/noun_people_1188645.png b/preview-calendar/uw-research-computing/guide-icons/noun_people_1188645.png
new file mode 100644
index 000000000..af772b071
Binary files /dev/null and b/preview-calendar/uw-research-computing/guide-icons/noun_people_1188645.png differ
diff --git a/preview-calendar/uw-research-computing/guide-icons/processor.png b/preview-calendar/uw-research-computing/guide-icons/processor.png
new file mode 100644
index 000000000..a02d1a67b
Binary files /dev/null and b/preview-calendar/uw-research-computing/guide-icons/processor.png differ
diff --git a/preview-calendar/uw-research-computing/guide-icons/python-icon.png b/preview-calendar/uw-research-computing/guide-icons/python-icon.png
new file mode 100644
index 000000000..2de41b35f
Binary files /dev/null and b/preview-calendar/uw-research-computing/guide-icons/python-icon.png differ
diff --git a/preview-calendar/uw-research-computing/guide-icons/r-icon.png b/preview-calendar/uw-research-computing/guide-icons/r-icon.png
new file mode 100644
index 000000000..a9ea75258
Binary files /dev/null and b/preview-calendar/uw-research-computing/guide-icons/r-icon.png differ
diff --git a/preview-calendar/uw-research-computing/guide-icons/servers.png b/preview-calendar/uw-research-computing/guide-icons/servers.png
new file mode 100644
index 000000000..cc666200a
Binary files /dev/null and b/preview-calendar/uw-research-computing/guide-icons/servers.png differ
diff --git a/preview-calendar/uw-research-computing/guide-icons/trouble.png b/preview-calendar/uw-research-computing/guide-icons/trouble.png
new file mode 100644
index 000000000..62f51c563
Binary files /dev/null and b/preview-calendar/uw-research-computing/guide-icons/trouble.png differ
diff --git a/preview-calendar/uw-research-computing/guides.html b/preview-calendar/uw-research-computing/guides.html
new file mode 100644
index 000000000..79d314ffe
--- /dev/null
+++ b/preview-calendar/uw-research-computing/guides.html
@@ -0,0 +1,784 @@
+
+
+
+
+
+
+Computing Guides
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Below is a list of guides for some of the most common tasks our users need to
+carry out as they begin and continue to use the resources at the CHTC.
+Some of these are general computing solutions; others are specific to HTCondor
+or to the configuration of CHTC computing resources.
+
+
+Guides will be added to the list as we can provide them. Please contact us
+(email at bottom of page) if you find any of the information to be incorrect.
+
+
+
+
+
User Expectations
+
+Read through these user expectations and policies before using CHTC services.
+
+
+
+
+
+
+
CHTC's guides for handling large data (Guide
+here) and software installation.
+
+
+
Overview
+
+
A high-memory job is one that requires a significantly larger amount of
+memory (also known as RAM) than a typical high throughput job, usually
+over 200 GB and up to 1-4 TB. In the following guide, we cover resources
+and recommendations for running high-memory work in CHTC. However,
+please make sure to email us if you believe you will need to run
+"high-memory" work for the first time, or are planning the execution
+of new "high-memory" work that is different from what you've run
+before. We'll happily help you with some personalized tips and
+considerations for getting your work done most efficiently.
Jobs that request over 200GB of memory in their submit file
+can run on our dedicated high memory machines. However, if your job
+doesn't need quite that much memory, it's good to request less, as
+doing so will allow your job(s) to run on more servers, since CHTC has
+hundreds of servers with up to 100 GB of memory and dozens of servers
+with up to 250 GB of memory.
+
+
+
+
B. Testing
+
+
Before running a full-size high-memory job, make sure to use a small
+subset of data in a test job. Not only will this give you a chance to
+try out the submit file syntax and make sure your job runs, but it can
+help you estimate how much memory and/or disk you will need for a job
+using your full data.
+
+
You can also use interactive jobs to test commands that will end up in
+your "executable" script. To run an interactive job, prepare your
+submit file as usual. Note that for an interactive job, you should use a
+smaller memory request (and possibly lower CPU and disk as well) than
+for the final job (so that the interactive job starts) and plan to
+simply test commands, not run the entire program. To submit an interactive
+job, use the -i flag with condor_submit:
+
+
[alice@submit]$ condor_submit -i submit.file
+
+
+
After waiting for the interactive job to start, this should open a bash
+session on an execute machine, which will allow you to test your
+commands interactively. Once your testing is done, make the appropriate
+changes to your executable, adjust your resource requests, and submit
+the job normally.
+
+
+
+
C. Consult with Facilitators
+
+
If you are unsure how to run high-memory jobs on CHTC, or if you're not
+sure if everything in this guide applies to you, get in touch with a
+research computing facilitator by emailing chtc@cs.wisc.edu.
+
+
+
+
3. Running High Memory Job
+
+
+
+
A. Submit File
+
+
The submit file shown in our Hello World example is
+a good starting point for building your high memory job submit file. The
+following are places where it's important to customize:
+
+
+
+
request_memory: It is crucial to make this request as accurate
+as you can by testing at a small scale if possible (see
+above). Online documentation/help pages and your colleagues'
+experience are other sources of information about required memory.
+
+
Long running jobs: If your high memory job is likely to run
+longer than our 3-day time limit, please email us for options on how
+to run for longer. In the past, high memory jobs received an extra
+time allowance automatically but this is no longer the case.
+
+
request_cpus: Sometimes, programs that use a large amount of
+memory can also take advantage of multiple CPUs. If this is the case
+for your program, you can request multiple CPUs. However, it is
+always easier to start jobs that request a smaller number of cores,
+rather than more. We recommend:
+
+
+
+
+
Requesting ___ of memory? | Request fewer than ___ CPUs
+up to 100 GB | 4
+100-500 GB | 8
+500 GB-1 TB | 16
+1-1.5 TB | 20
+1.5-2 TB | 20
+2 TB or greater | 32
+
+
+
+
+
If you think a higher CPU request would significantly improve your
+job's performance, contact a facilitator.
+
+
+
request_disk: Request the maximum amount of data your job will
+ever have within the job working directory on the execute node,
+including all output and input (which will take up space before some
+of it is removed from the job working directory at the end of the
+job).
+
+
Other requirements: if your job uses files from our large data
+space, or Docker for
+software, add the necessary requirements for
+these resources to your submit file.
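The CPU guidance in the table above can be sketched as a small lookup. This is only an illustration: the function name is hypothetical, memory is given in whole GB, 1 TB is treated as 1000 GB, and a request that falls exactly on a boundary is assigned to the lower row.

```shell
#!/bin/bash
# Hypothetical helper: print the CPU ceiling suggested by the table above
# for a given request_memory value, expressed in whole GB.
# Assumption: 1 TB = 1000 GB; boundary values fall into the lower row.
recommended_cpu_ceiling() {
    local mem_gb=$1
    if   [ "$mem_gb" -le 100 ];  then echo 4
    elif [ "$mem_gb" -le 500 ];  then echo 8
    elif [ "$mem_gb" -le 1000 ]; then echo 16
    elif [ "$mem_gb" -lt 2000 ]; then echo 20
    else                              echo 32
    fi
}

recommended_cpu_ceiling 200   # prints 8
```

A job asking for 200 GB would therefore stay below 8 CPUs unless a facilitator advises otherwise.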
+
+
+
Altogether, a sample submit file may look something like this:
+
+
### Example submit file for a single staging-dependent job
+
+universe = vanilla
+
+# Files for the below lines will all be somewhere within /home/username,
+# and not within /staging/username
+log = run_myprogram.log
+executable = run_Trinity.sh
+output = $(Cluster).out
+error = $(Cluster).err
+transfer_input_files = trinityrnaseq-2.0.6-installed.tar.gz
+should_transfer_files = YES
+
+# Require execute servers that have large data staging
+Requirements = (Target.HasCHTCStaging == true)
+
+# Memory, disk and CPU requests
+request_memory = 200GB
+request_disk = 100GB
+request_cpus = 4
+
+# Submit 1 job
+queue 1
+### END
+
+
+
+
+
B. Software
+
+
Like any other job, the best option for high memory work is to create a
+portable installation of your software. We have guides for scripting
+languages and using
+Docker, and can otherwise provide individual
+support for program installation during office hours or over
+email.
+
+
+
+
C. "Executable" script
+
+
As described in many of our guides (for
+software or for using large
+data), you will need to write a script
+that will run your software commands for you and that will serve as the
+submit file "executable". Things to note are:
+
+
+
If using files from our large data staging space, follow the
+recommendations in our guide.
+
If using multiple cores, make sure that you request the same number
+of "threads" or "processes" in your command as you requested in
+your submit file.
+
+
+
Altogether, a sample script may look something like this (perhaps called
+run_Trinity.sh):
+
+
#!/bin/bash
+# Copy input data from /staging to the present directory of the job
+# and un-tar/un-zip them.
+cp /staging/username/reads.tar.gz ./
+tar -xzvf reads.tar.gz
+rm reads.tar.gz
+
+# Set up the software installation in the job working directory, and
+# add it to the job's PATH
+tar -xzvf trinityrnaseq-2.0.6-installed.tar.gz
+rm trinityrnaseq-2.0.6-installed.tar.gz
+export PATH=$(pwd)/trinityrnaseq-2.0.6:$PATH
+
+# Run software command, referencing input files in the working directory and
+# redirecting "stdout" to a file. Backslashes are line continuation.
+Trinity --seqType fq --left reads_1.fq \
+--right reads_2.fq --CPU 4 --max_memory \
+20G > trinity_stdout.txt
+
+# Trinity will write output to the working directory by default,
+# so when the job finishes, it needs to be moved back to /staging
+tar -czvf trinity_out_dir.tar.gz trinity_out_dir
+cp trinity_out_dir.tar.gz trinity_stdout.txt /staging/username/
+rm reads_*.fq trinity_out_dir.tar.gz trinity_stdout.txt
+
+### END
+
+ Recreating Spack Installs on a New Operating System
+
+
+
If you had a software install on the HPC cluster before June 2024 and need to update it, read through
+the details in this guide. Unless your code is fairly simple, you will likely need to recompile it.
+
+
If you are installing something on the HPC cluster for the first time, refer to our main software
+guide to start: Use HPC Software
+
+
Remember to always compile your code/programs in an (interactive) Slurm job! How To
+
+
+
Not only does this help avoid stressing the resources of the login server, but the upgraded login server uses a newer CPU architecture than the worker nodes in the cluster.
+Most compilers auto-detect the CPU architecture and adapt the compilation to use that architecture.
+Attempting to use such compiled code on a different/older CPU architecture can lead to “Illegal instruction” errors, among others.
+
+
+
Modules
+
+
Most of the modules on the upgraded cluster have been kept, but with upgraded versions.
+The following table is a comparison of the modules on the old operating system (EL8) versus the new operating system (EL9).
+(Adapted from the output of module avail on the respective servers.)
+
+
You will likely need to recompile your code to use the new module versions.
+Remember to also update any module load commands that specify a particular version of the module,
+otherwise you may encounter “module(s) are unknown” errors.
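To find the module load commands that pin a version, a search along the following lines may help. It is a sketch that assumes versions are written in the common name/version form (for example gcc/11.3.0); the function name and the script file name are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: list "module load" lines that contain a slash,
# i.e. that likely pin a version in name/version form. Line numbers are
# printed so the lines are easy to locate and update.
find_versioned_module_loads() {
    grep -nE '^[[:space:]]*module[[:space:]]+load[[:space:]]+[^#]*/' "$@"
}
```

Running `find_versioned_module_loads submit-script.sh` prints each matching line with its line number. Lines such as `module load cmake`, which do not name a version, are not reported.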
+
+
Module comparison
+
+
+
+
+
Module name | Old version (EL8) | New version (EL9)
+abaqus | 2018-hotfix-1904 | TBD
+ansys | 2022r24 | 2024r1
+aocc | 3.2.0 | 4.2.0
+cmake | 3.27.7 | 3.27.9
+comsol | 6.0, 6.1, 6.2 | 6.2
+gcc | 11.3.0 | 13.2.0
+hdf5 (intel-oneapi-mpi) | 1.12.2 | dropped
+hdf5 (openmpi) | 1.12.2 | 1.14.3
+intel-oneapi-compilers | 2023.2.1 | 2024.1.0
+intel-oneapi-mkl | 2023.2.0 | 2024.0.0
+intel-oneapi-mpi | 2021.10.0 | 2021.12.1
+intel-tbb | 2021.9.0 | deprecated
+lmstat.comsol | 6.0 | TBD
+lumerical-fdtd | 2022-r2.4 | 2024-R1.2
+matlab | R2021b, R2022b | R2024a
+mvapich2 | 2.3.7-1 | deprecated
+mvapich | n/a | 3.0
+netcdf-c | 4.8.1 | 4.9.2
+netcdf-cxx4 | 4.3.1 | 4.3.1
+netcdf-fortran | 4.5.4 | 4.6.1
+openmpi (aocc) | 4.1.3 | dropped
+openmpi (gcc) | 4.1.3 | 5.0.3
+patchelf (gcc) | 0.17.2 | 0.17.2
+patchelf (intel) | 0.18.0 | dropped
+patchelf (oneapi) | 0.18.0 | 0.17.2
+petsc | 3.18.1 | 3.21.1
+pmix | n/a | 5.0.1
+
+
+
+
+
+
Different versions of module packages, or packages that are “dropped” or “deprecated”, may be manually installed by the user using Spack.
+
+
+
Spack
+
+
Spack is a package manager platform that allows users to install software without admin privileges.
+CHTC also uses Spack to install the software underlying the system-wide modules discussed above.
+
+
+
If you have not used Spack before, you can skip this section and go directly to the Set Up Spack on HPC guide.
+
+
+
Here is the general process for setting up your software on the upgraded EL9 system; detailed instructions are provided after the general process:
+
+
+
+
Identify the environments you currently have and which you want to reproduce on the upgraded system.
+
+
+
Remove your existing Spack folders.
+
+
+
Do a clean installation of Spack.
+
+
+
In an interactive job, create your Spack environment(s) and install the packages as you did previously.
+
+
+
Update your job submission scripts and/or recompile programs as needed to use the new Spack environment(s).
+
+
+
+
The following instructions assume that you previously installed Spack in your home (~/) directory for individual use.
+
+
1. Identify your environments
+
+
You can see your Spack environments with
+
+
spack env list
+
+
+
Activate an environment that you want to replicate with
+
+
spack env activate environment_name
+
+
+
Then list your package “specs” with the command
+
+
spack find
+
+
+
There is a section “==> Root specs” that lists the package specs you explicitly added when you created your environment.
+Save a copy of these specs somewhere safe, so that you can use them to replicate the environment later on.
+You can ignore the “installed packages” section, as that will certainly change on the new system.
+
+
Repeat the above steps for each environment you want to replicate on the upgraded system.
+
+
2. Remove your existing Spack folders
+
+
The easiest way to update Spack for the upgraded system is to remove the current Spack installation and reinstall from scratch.
+
+
+
Before proceeding, you may want to make a backup of each folder using
+
+
tar -czf folder_name.tar.gz ~/folder_name
+
+
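The backup step can be repeated for every Spack-related folder with a short loop. The following is a sketch only: the function name and the backup_ archive naming are hypothetical, and the folder names are the ones used in the removal command in this section.

```shell
#!/bin/bash
# Hypothetical helper: archive each Spack-related folder that exists
# under the given home directory before it is removed.
backup_spack_folders() {
    local home_dir=$1
    local folder
    for folder in spack spack_programs spack_modules .spack; do
        if [ -d "$home_dir/$folder" ]; then
            # -C keeps paths inside the archive relative to the home directory
            tar -czf "$home_dir/backup_${folder}.tar.gz" -C "$home_dir" "$folder"
            echo "backed up $folder"
        fi
    done
}
```

Calling `backup_spack_folders ~` writes one archive per existing folder into your home directory; folders that do not exist are skipped.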
+
+
For most users, the following commands should work:
+
+
cd ~/
+rm -rf spack spack_programs spack_modules .spack
+
+
+
The command may take a while to run.
+
+
3. Fresh install of Spack
+
+
Next, follow the instructions in our guide Set Up Spack on HPC to do a fresh installation of Spack.
+The commands in the guide have been updated for setting up Spack on the new operating system.
+
+
4. Recreate your environments
+
+
Follow the instructions in our guide Install Software Using Spack to create your desired environments
+using the “root specs” that you saved earlier.
+
+
NOTE: We’ve made a small but important change to this guide: you should always start an interactive Slurm job before creating or modifying a Spack environment.
+The login server uses different hardware than the execute servers, and the mismatch leads to Spack using the wrong settings for installing packages.
+Of course, as before, you should only install packages while in an interactive Slurm job.
+
+
Behind the scenes, we’ve made a few changes to the configuration that will hopefully make the package installation much smoother.
+
+
5. Update your workflow
+
+
Finally, remember to update your workflow to use the new Spack environments and the packages installed therein.
+
+
+
+
If you explicitly provide paths to packages installed using Spack, be sure to update those paths in your compiler configuration or in your job submission script.
+
+
+
If you used Spack to provide dependencies for manually compiling a program, remember to recompile the program.
+
+
+
If you changed the name of your environment, be sure to update the name in your job submission script.
The following assumes that you have been granted access to the HPC cluster
+and can log into the head node spark-login.chtc.wisc.edu. If this is not
+the case, please see the CHTC account application page or email
+the facilitation team at chtc@cs.wisc.edu.
+
+
View Job Performance with seff
+
+
The seff command will print out a summary of usage and efficiency metrics for
+a specific job. The usage and output look like this:
SLURM saves jobs information in a database that can be queried using the sacct command.
+
+
+
If you are having trouble viewing output from sacct, try running this command first:
+
+
[alice@login]$ sacct --start=2018-01-01
+
+
+
+
How To Select Jobs
+
+
By default, sacct shows only your jobs that ran or were submitted on the current
+date. See the following list for different ways to select groups of jobs to review. Some of the options – especially the time and user options – can be combined in the same query.
+
+
+
To display information about a specific job or list of jobs use -j or --jobs followed by a job number or comma separated list of job numbers.
+
+
+
[alice@login]$ sacct --jobs job1,job2,job3
+
+
+
+
+
To select information about jobs in a certain date range, use --start and --end. Without them, sacct will only return jobs from the current day.
+
+
+
[alice@login]$ sacct --start=YYYY-MM-DD
+
+
+
+
To select information about jobs in a certain time range, use --starttime and --endtime. The default start time is 00:00:00 of the current day, unless used with -j, in which case the default start time is Unix Epoch 0. The default end time is the time of running the command. Valid time formats are:
+
HH:MM[:SS] [AM|PM]
+MMDD[YY] or MM/DD[/YY] or MM.DD[.YY]
+MM/DD[/YY]-HH:MM[:SS]
+YYYY-MM-DD[THH:MM[:SS]]
+
To only show statistics relevant to the job allocation itself, not taking steps into consideration, use -X. This can be useful when trying to figure out which part of a job errored out.
+
+
[alice@login]$ sacct -X
+
+
+
+
A Sample sacct Query
+
+
For example, to view all of your jobs since January 1, 2024, printing
+out which partition you used, how many nodes, and what the final status of the job was,
+use:
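A query of this shape can be assembled as sketched here. The helper name is hypothetical and it only prints the command rather than running it; JobID, Partition, NNodes, and State are sacct format fields, and the start date is passed in the same form as the --start examples above.

```shell
#!/bin/bash
# Hypothetical helper: print (not run) a sacct command that lists jobs
# since a given start date with partition, node count, and final state.
build_sacct_query() {
    local start_date=$1
    echo "sacct --start=$start_date --format=JobID,Partition,NNodes,State"
}

build_sacct_query 2024-01-01
```

The printed command can be copied to the login node prompt, and further fields can be appended to the --format list as needed.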
The following assumes that you have been granted access to the HPC cluster
+and can log into the head node spark-login.chtc.wisc.edu. If this is not
+the case, please see the CHTC account application page or email
+the facilitation team at chtc@cs.wisc.edu.
+
+
1. Submitting Jobs Using SLURM
+
+
A. Submitting a Job
+
+
Jobs can be submitted to the cluster using a submit file, sometimes also
+called a “batch” file. The top half of the file consists of #SBATCH
+options which communicate needs or parameters of the job – these lines
+are not comments, but essential options for the job. The values for
+#SBATCH options should reflect the size of nodes and run time limits
+described here.
+
+
After the #SBATCH options, the submit file should contain the commands
+needed to run your job, including loading any needed software modules.
+
+
An example submit file is given below. It requests 1 node with 64 cores
+and 4 GB of memory (so 64 cores and about 4 GB of memory in total), on the
+shared partition. It also specifies a run time limit of 4.5 hours.
+
+
#!/bin/sh
+#This file is called submit-script.sh
+#SBATCH --partition=shared # default "shared", if not specified
+#SBATCH --time=0-04:30:00 # run time in days-hh:mm:ss
+#SBATCH --nodes=1 # require 1 node
+#SBATCH --ntasks-per-node=64 # cpus per node (by default, "ntasks"="cpus")
+#SBATCH --mem=4000 # RAM per node in megabytes
+#SBATCH --error=job.%J.err
+#SBATCH --output=job.%J.out
+# Make sure to change the above two lines to reflect your appropriate
+# file locations for standard error and output
+
+# Now list your executable command (or a string of them).
+# Example for code compiled with a software module:
+module load mpimodule
+srun --mpi=pmix -n 64 /home/username/mpiprogram
+
+
+
Once the submit file is created, it can be submitted using the sbatch command:
+
+
[alice@login]$ sbatch submit-script.sh
+
+
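The total core count implied by a submit file is the number of nodes multiplied by the tasks per node. As a rough check before submitting, the two values can be read back from the #SBATCH lines. This sketch assumes the options are written in the --option=value form used in the example above; both function names are hypothetical, and a missing option is treated as 1.

```shell
#!/bin/bash
# Hypothetical helper: read the value of one "#SBATCH --option=value" line.
# $1 = option name without leading dashes, $2 = submit file
sbatch_option() {
    sed -n "s/^#SBATCH[[:space:]]*--$1=\([^[:space:]#]*\).*/\1/p" "$2" | head -n 1
}

# Total cores = nodes x tasks per node (one CPU per task by default).
# Assumption: an option that is absent from the file counts as 1.
total_cores() {
    local nodes per_node
    nodes=$(sbatch_option nodes "$1")
    per_node=$(sbatch_option ntasks-per-node "$1")
    echo $(( ${nodes:-1} * ${per_node:-1} ))
}
```

For the example submit file, `total_cores submit-script.sh` prints 64, which matches the `-n 64` passed to srun.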
+
B. Optimizing Your Submit File
+
+
The new cluster has different partition names and different sized nodes. We always recommend requesting cores per node (instead of total cores), using a multiple of 32 cores as your request per node. Requesting multiple nodes is not advantageous if your jobs are smaller than 128 cores. We also now recommend requesting memory per core instead of memory per node, for similar reasons, using the --mem-per-cpu flag with units of MB. Here are our recommendations for different sized jobs:
C. Requesting an Interactive Job ("int" and "pre" partitions)
+
+
If you want to run your job commands yourself, as a test before submitting
+a job as described above, you can request an interactive job on the cluster.
+
+
There is a dedicated partition
+for interactive work called int; you may request up to 16 CPUs and 64GB of memory
+when requesting an interactive session in the "int" partition. By default,
+the session is limited to 60 minutes, though you can request up to 4 hours.
+Using another partition (like pre) will
+mean your interactive job is subject to the limits of that partition instead.
+
+
For simple testing or compiling
+
+
The command to request an interactive job is srun --mpi=pmix, and includes the partition
+in which you’d like to run the interactive job.
+
+
[alice@login]$ srun --mpi=pmix -n4 -N1 -p int --pty bash
+
+
+
+
Note: You will not be able to run MPI code in this interactive session.
+
+
+
The above example indicates a request for 4 CPUs (-n4) on a single
+node (-N1) in the "int" partition (-p int). Adding "-t 15" would
+indicate a request for 15 minutes, if desired, rather than the 60-minute
+default. After the interactive shell is created on a compute node with
+the above command, you'll have access to files on the shared file
+system and be able to execute code interactively as if you had directly
+logged in to that node. It is important to exit the interactive shell
+when you're done working by typing exit.
+
+
For running MPI code
+
+
To run an MPI program in an interactive session, you will need to (1) allocate the
+resources using salloc, then (2) use srun to run the MPI code, and finally (3)
+give up the allocated resources.
+
+
+
+
Request resources
+
+
[alice@login]$ salloc -n4 -N1 -p int
+
+
+
This command requests 4 CPUs (-n4) on a single node (-N1) in the "int"
+partition (-p int), and assigns the resources to a new terminal session
+on the login node. When the allocation has started, you will see a message
+like this:
+
salloc: Granted job allocation 18701
+ Guest on spark-a005.chtc.wisc.edu
+
+
+
To run code in this allocation, be sure to use srun as described in the next step!
+
+
+
Use resources
+
+
At this point, your terminal is still running on the login node. To run
+commands using the resources in the allocation, you will need to use srun.
This will execute the specified script using the allocated resources.
+When the srun calculation has finished, you will remain in the allocation
+session, allowing you to run srun multiple times in quick succession.
+
+
You can also use the allocated resources interactively with
+
[alice@login]$ srun --mpi=pmix --pty bash
+
+
+
which will start an interactive terminal session in your allocation (this
+is evident by the change in the command prompt from [alice@login] to
+[alice@spark-a###]). Keep in mind that you will not be able to use
+MPI inside the interactive session. You can exit the interactive session
+and return to the allocation by entering exit.
+
+
+
Give up resources
+
+
To end your allocation, simply enter exit. You will see a message like
+this:
+
exit
+salloc: Relinquishing job allocation 18701
+salloc: Job allocation 18701 has been revoked.
+
+
+
+
+
+
It can be difficult to remember whether or not you are currently using an
+allocation. A quick way of checking is to see if the SLURM_JOB_ID is set
+by entering echo $SLURM_JOB_ID. If you are in an allocation, this command
+will return the job ID number that corresponds to an entry in your SLURM queue
+(see below).
+
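The same check can be wrapped in a small function, sketched here under the assumption that SLURM_JOB_ID is only set inside an allocation or job. The function name is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: report whether the current shell is inside a SLURM
# allocation, based only on whether SLURM_JOB_ID is set and non-empty.
allocation_status() {
    if [ -n "${SLURM_JOB_ID:-}" ]; then
        echo "in allocation $SLURM_JOB_ID"
    else
        echo "not in an allocation"
    fi
}
```

Running `allocation_status` on the login node outside of salloc reports that no allocation is active.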
+
A more convenient option is to update your .bashrc file so that the command
+prompt changes when you are in an allocation. This can be done using the
+following commands:
Now when you run salloc, your command prompt will start with the corresponding
+SLURM job ID number. This will also be the case for the interactive srun
+command. For example,
+
[alice@login]$ salloc -n4 -N1 -p int
+salloc: Granted job allocation 18701
+ Guest on spark-a005.chtc.wisc.edu
+
+18701[alice@login]$ echo 'I am running an allocation.'
+I am running an allocation.
+18701[alice@login]$ srun --mpi=pmix --pty bash
+
+18701[alice@spark-a006]$ echo 'I am using the resources interactively.'
+I am using the resources interactively.
+18701[alice@spark-a006]$ exit
+exit
+18701[alice@login]$ exit
+exit
+salloc: Relinquishing job allocation 18701
+[alice@login]$
+
+
+
+
This can be undone by removing the two added lines from the .bashrc file
+ in your home directory.
+
+
+
+
More advanced users can manipulate their bash prompt further.
+The SLURM_JOB_ID variable is created for the allocation, and
+a SLURM_JOB_UID variable is created for the interactive srun.
+
+
+
+
2. Viewing Jobs in the Queue
+
+
To view your jobs in the SLURM queue, use the following command:
+
+
[alice@login]$ squeue -u username
+
+
+
Issuing squeue alone will show all user jobs in the queue. You can
+view all jobs for a particular partition with squeue -p shared.
This page documents some common and known issues encountered on the HPC system. While this page can be beneficial in troubleshooting, it does not contain a comprehensive list of errors.
+
+
Visit our Get Help page to find more resources for troubleshooting.
[Software] When compiling code, I get multiple errors such as "[library], needed by [library] not found" and "undefined reference to [library]".
+
+
Cause:
+
This occurs when you try to compile code on the login server. System-installed libraries are only for use on the execute servers, not the login server. When trying to use the modules, the compilation fails because the login server does not have the same libraries as the execute servers.
+
Solution:
+
Make sure you are compiling your code in an interactive job on the HPC.
The CHTC high-performance computing (HPC) cluster provides dedicated support for large,
+singular computations that use specialized software (e.g., MPI) to achieve internal
+parallelization of work across multiple servers of dozens to hundreds of cores.
+
+
Is high-performance computing right for me? Only computational work that
+fits the above description is appropriate for the HPC Cluster. Computational
+work that can complete on a single node in less than a few days will be
+best supported by our larger high-throughput computing (HTC) system (which also
+includes specialized hardware for extreme memory, GPUs, and other cases). For more
+information, please see Our Approach.
+
+
To get access to the HPC Cluster, please complete our
+New User Consultation Form. After your request is received,
+a Research Computing Facilitator will follow up to discuss the computational needs
+of your research and connect you with computing
+resources (including non-CHTC services) that best fit your needs.
The HPC Cluster consists of two login nodes and many compute (aka execute)
+nodes. All users log in at a login node, and all user files on the shared file system are accessible on all nodes.
+Additionally, all nodes are tightly networked (200 Gbit/s Infiniband) so
+they can work together as a single "supercomputer", depending on the
+number of CPUs you specify.
+
+
Operating System and Software
+
+
All nodes in the HPC Cluster are running CentOS 8 Stream Linux.
+
+
The SLURM scheduler version is 22.05.6.
+
+
To see more details of other software on the cluster, see the HPC Software page.
+
+
Login Nodes
+
+
The login node for the cluster is: spark-login.chtc.wisc.edu
+
+
For more details on logging in, see the “Connecting to CHTC” guide linked above.
+
+
Execute Nodes and Partitions
+
+
Only execute nodes will be used for performing your computational work.
+The execute nodes are organized into several "partitions", including
+the shared, pre, and int partitions which are available to
+all HPC users as well as research group specific partitions that consist
+of researcher-owned hardware and which all HPC users can access on a
+backfill capacity via the pre partition (more details below).
+
+
+
+
+
Partition | p-name | # nodes (N) | t-default | t-max | max cores/job | cores/node (n) | RAM/node (GB)
+Shared | shared | 45 | 1 day | 7 days | 320 | 64 or 128 | 512
+Interactive | int | 2 | 1 hr | 4 hrs | 16 | 64 or 128 | 512 (max 64 per job)
+Pre-emptable (backfill) | pre | 45 | 4 hrs | 24 hrs | 320 | 64 or 128 | 512
+Owners | unique | 19 | 24 hrs | 7 days | unique | 64 or 128 | 512
+
+
+
+
+
+
+
shared compute nodes each have 64 or 128 cores and 512 GB of RAM.
+Jobs submitted to this partition
+can request and use up to 7 days of running time.
+
+
+
int consists of two compute nodes and is intended for short and immediate interactive
+testing on a single node (up to 16 CPUs, 64 GB RAM). Jobs submitted to this partition
+can run for up to 4 hours.
+
+
+
pre (i.e. pre-emptable) is an under-layed partition encompassing all HPC Cluster
+nodes and is intended for more immediate turn-around of shorter, smaller, and/or
+interactive sessions requiring more than the 4 hour time limit of the int partition.
+Jobs submitted to pre are run as back-fill on any idle nodes, including researcher-owned
+compute nodes, meaning these jobs may be pre-empted by higher priority
+jobs. By default, pre-empted jobs will be re-queued (to run again) if they were submitted with
+an sbatch script.
+
+
+
+
Fair Share Allocation
+
+
To promote fair access to HPC computing resources, all users are limited to 10 concurrently
+running jobs (if you need to queue more, please get in touch). Additionally, users are restricted to a total of 720 cores
+across all running jobs (core limits do not apply on research group partitions of
+more than 720 cores).
+
+
When determining the order in which to run jobs, the following policies are applied, in order of significance
+to job priority determinations:
+
+
A. User priority decreases as the user accumulates hours of CPU time over the last 21 days, across
+all queues. This “fair-share” policy means that users who have run many/larger jobs in the near-past
+will have a lower priority, and users with little recent activity will see their waiting jobs start sooner.
+(The cluster does not have a strict “first-in-first-out” queue policy.)
+
+
B. Job priority increases with job wait time. After the history-based user priority calculation in (A),
+the next most important factor for each job’s priority is the amount of time that each job has already
+waited in the queue. For all the jobs of a single user, these jobs will most closely follow a “first-in-first-out” policy.
+
+
C. Job priority increases with job size, in cores. This least important factor slightly favors larger jobs, so that
+the scheduler can take advantage when large numbers of nodes happen to become available (requiring less
+wasted time to deliberately drain nodes for larger jobs). So, among a user’s jobs submitted at roughly the same time,
+a larger job may run first, if the number of nodes necessary for the larger job is already available.
+
+
Data Storage and Management
+
+
Data space in the HPC Cluster filesystem is not backed up and should be
+treated as temporary by users. Only files necessary for
+actively-running jobs should be kept on the filesystem, and files
+should be removed from the cluster when jobs complete. A primary copy of any
+essential files (e.g. software, submit files, input) should be kept in an
+alternate, non-CHTC storage location.
+
+
Each user will receive two primary data storage locations:
+
+
+
+
/home/username with an initial disk quota of 30GB
+and 250,000 items. Your home directory is meant to be used for files
+you use repeatedly, like submit file templates, source code, software
+installations, and reference data files.
+
+
+
/scratch/username with an initial disk quota of 100GB and
+250,000 items. Jobs should always be submitted and run out of
+/scratch. It is the space for all working data, including individual
+job inputs, job outputs, and job log/stderr/stdout files.
+
+
+
+
+
What about /software?
+
+
If you are installing software meant to be shared within a group,
+we can create a dedicated folder for you in the /software space.
+Email us (chtc@cs.wisc.edu) if this is you!
+
+
+
To check how many files and directories you have in
+your /home or /scratch directory see the
+instructions below.
+
+
Changes to quotas for either of these locations are available upon request
+per our Request a Quota Change guide. If you don't
+know how many files your installation creates, because it's more than
+the current items quota, simply indicate that in your request.
+
+
CHTC Staff reserve the right to remove any significant amounts of data
+on the HPC Cluster in our efforts to maintain filesystem performance
+for all users.
+
+
Local scratch space is available on each execute node in /local/$USER.
+This space is NOT automatically cleaned out, so if you use this space,
+be sure to remove the files before the end of your job script or
+interactive session.
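Since local scratch is not cleaned automatically, one way to avoid leaving files behind is to tie the cleanup to the end of the work itself. The following is a sketch only: the helper name run_in_local_scratch is hypothetical, and the base directory is passed as an argument so the same function can be tried outside the cluster.

```shell
# Hypothetical helper: run a command inside a fresh directory under BASE,
# then delete that directory no matter how the command ends.
run_in_local_scratch() (
    base=$1
    shift
    workdir=$(mktemp -d "$base/job.XXXXXX") || exit 1
    trap 'rm -rf "$workdir"' EXIT   # fires when this subshell exits
    cd "$workdir" || exit 1
    "$@"
)

# Example inside a job script (the program path is a placeholder):
#   run_in_local_scratch "/local/$USER" /home/username/my_program
```

The function body runs in a subshell, so the directory change and the trap do not affect the rest of the job script.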
+
+
Tools for managing home and software space
+
+
You can use the command get_quotas to see what disk
+and items quotas are currently set for a given directory path.
+This command will also let you see how much disk is in use and how many
+items are present in a directory:
When ncdu has finished running, the output will give you a total file
+count and allow you to navigate between subdirectories for even more
+details. Type q when you're ready to exit the output viewer. More
+info here: https://lintut.com/ncdu-check-disk-usage/
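If you only need a rough item count and not an interactive viewer, standard tools are enough. This sketch assumes that "items" corresponds to files plus directories. The helper name count_items is hypothetical, and the count will be off for file names that contain newlines.

```shell
# Hypothetical helper: count every file and directory below DIR (DIR itself excluded).
count_items() {
    find "$1" -mindepth 1 | wc -l | tr -d ' '
}

# Examples on the cluster:
#   count_items "/home/$USER"
#   count_items "/scratch/$USER"
```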
In CHTC, we install a minimal set of software for use
+on our systems. On the HPC Cluster, CHTC staff manage installations of
+the following types of programs:
+
+
+
Compilation tools and common dependencies (e.g. MPI, different GCC versions)
+
Software that requires a shared license (e.g. COMSOL)
+
+
+
Information on how to access CHTC-managed installations is in the next
+section of this guide. If you need to use a program not in that group, the instructions
+for creating your own installation follow.
+
+
If you have questions or concerns about installing your own software or
+the available dependencies, contact the facilitation team at chtc@cs.wisc.edu.
+
+
2. Using Pre-Installed Software in Modules
+
+
All software on the cluster that is installed by CHTC staff is available via
+a tool called “modules”.
+
+
A. See Available Software Modules
+
+
There are two ways to search through the software modules on the HPC cluster:
+
+
+
View all modules
+ This command will show all software modules available:
+
[alice@login]$ module avail
+
+
+
Search for specific modules
+ If you are searching for a specific software module, you can use the
+ module spider command with part of the software name. For example, to
+ search for Open MPI modules, you would type:
+
[alice@login]$ module spider openmpi
+
+
+
+
+
B. Access Software in Modules
+
+
Once you find a software module that you want to use, you need to “load” it
+into your command line environment to make it active, filling in module_name with
+the name you found through one of the above steps.
+
+
[alice@login]$ module load module_name
+
+
+
+
When to Load Modules
+
+
You can load modules to compile code (see below). If you do this, make sure to load
+the same modules as part of your job script before running the main command.
+
+
You can also load modules to run specific software. If done for interactive
+testing, this should be done in an interactive job; otherwise, the module
+should be loaded in the job submit file.
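One way to keep the compile-time and run-time environments consistent is to put the purge-and-load sequence in a small shell function and call it in both places. This is only a sketch: the function name load_build_modules is hypothetical, and module_name stands for whatever name module avail or module spider reported.

```shell
# Hypothetical helper: start from a clean environment, then load each named module.
load_build_modules() {
    module purge || return 1
    for m in "$@"; do
        module load "$m" || return 1
    done
    module list   # show what ended up loaded
}

# Call it with the same arguments before compiling and in the job script:
#   load_build_modules module_name
```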
+
+
+
C. Unload Software in Modules
+
+
If you no longer want to use a specific software installation, you can “unload”
+the software module with the following command:
+
+
[alice@login]$ module unload module_name
+
+
+
If you want to clear your command line environment and start over, run the following:
+
+
[alice@login]$ module purge
+
+
+
3. Installing Software on the Cluster
+
+
A. Overview
+
+
Unless you are using a licensed software program provided via modules, you
+are able to compile and install the software you need on the HPC Cluster.
+
+
If your software requires modules to compile, you must compile it in an interactive job.
+Interactive jobs are described in
+our HPC Job Submission Guide.
+Software should be installed to your /home/username
+directory. If using CHTC’s provided compilation tools via modules, make
+sure to load the needed modules before compiling and to load the same
+modules in your job submission.
+
+
For groups that would like to share software installations among group
+members, please contact us about getting a shared “group” directory.
+
+
If you are new to software installation, see the section below for
+a more step-by-step description of the process.
+
+
B. Step by Step Process
+
+
+
Download Source Code - download the source code for your desired program. We
+ recommend downloading it to your /home/username directory on the login node.
+ You should only need the source code until the software is properly installed, but if desired, you may keep a zipped copy of
+ the source code in /home.
+
Read the Docs - try to find the installation instructions, either online or
+ in the downloaded source code. In particular, you’ll want to note if there are
+ any special requirements for dependencies like MPI or the compiler needed.
+
Load Modules - if you are using software modules to help you build your
+ code, load them now. Keep track of what you use so that you can load them
+ in your job submit file later. We also recommend doing a module purge before
+ loading your compiling modules to make sure you’re starting from a clean environment.
+
Install - most scientific software follows the three step installation process
+ of configure - make - make install.
+
+
configure - this step checks for tools and requirements needed to compile
+ the code. This is the step where you set the final installation location of
+ a program. The option for setting this location is typically called the
+ “prefix”; a common syntax is: $ ./configure --prefix=/home/user.
+ This is where you will want to set the installation location to be your
+ /home directory.
+
make - this step compiles and links the code, turning it from human-readable
+ source code to compiled binary code. This is usually the most time consuming
+ step of the installation process.
+
make install - this step copies compiled files to the final installation location
+ (usually specified in the configure step).
+
+
+
Clean Up - the final installation should place all needed files into a
+ subdirectory of your /home directory. The source code and location where
+ you ran the compilation commands can be removed at this point.
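The three-step installation described above can be sketched as a single shell function. The function name build_and_install and the example paths are hypothetical, and your program's own instructions take precedence if they differ from the configure - make - make install pattern.

```shell
# Hypothetical helper: configure, compile, and install SRC_DIR into PREFIX.
build_and_install() {
    src_dir=$1
    prefix=$2      # use an absolute path, e.g. a subdirectory of /home/username
    (
        cd "$src_dir" || exit 1
        ./configure --prefix="$prefix" && make && make install
    )
}

# Example (placeholder paths), run after loading any needed modules:
#   build_and_install "$HOME/myprogram-src" "$HOME/myprogram"
```

The steps are chained with &&, so a failure in configure or make stops the sequence before anything is installed.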
+
+
+
+
+
4. Using Software in Jobs
+
+
The commands to run your software will go in the job’s submit file, as described
+in our HPC job submission guide.
+
+
If you used one of the software modules to compile your code, make sure you
+load it in your job’s submit file before running your main command.
+
+
You can access your software by including the path to its location in your
+/home directory, or by setting the PATH environment variable to include
+the software location and then running the command.
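For the PATH approach, a minimal sketch looks like the following. The directory name my_program is a placeholder, and the bin subdirectory is an assumption that holds for many configure-based installations but not all of them.

```shell
# Placeholder: the location you gave to --prefix when installing.
install_dir="$HOME/my_program"

# Put the program's executables ahead of anything else on PATH.
PATH="$install_dir/bin:$PATH"
export PATH
```

In a job's submit file, these lines would go before the main command, after any module load commands.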
CHTC uses Spack (https://github.com/spack/spack) for installing and managing software packages on the HPC cluster for all users to use, via the module command. Recently, Spack has developed a feature that allows for users to integrate their local installation of Spack with the system-wide installation. This means that when a user installs software with their local installation of Spack, they can automatically incorporate the system-wide packages to satisfy their software’s dependencies (similar to Conda and Miniconda).
+
+
This guide describes how to install and manage software using Spack, including how to install and use a specific compiler.
+
+
This guide assumes you or your group has already set up your local installation of Spack. If you have not installed Spack, follow the instructions in Setting Up Spack on HPC.
Once your local installation of Spack has been properly configured, you are now ready to install software using Spack.
+
+
Check the documentation for the program you want to install to see if it has instructions for installation using Spack. Even if your program can’t be installed using Spack, you can still use Spack to install the dependencies that your program needs.
+
+
+
Note: For a group installation of Spack, you will not be able to modify or remove the packages installed by a different user. We recommend that you consult with the rest of your group for permission before proceeding.
+
+
+
A. Start an Interactive Job
+
+
Before creating a Spack environment or installing packages using Spack, first start an interactive Slurm job:
When creating an environment, Spack automatically detects the hardware of the machine being used at the time and configures the packages as such.
+Since the login server uses newer hardware than the execute servers, packages configured on the login server (outside an interactive job) may not run correctly on the execute servers, so always create environments inside an interactive job.
+
+
+
B. Creating and Using a Spack Environment
+
+
Software installations with Spack should be done inside of a Spack environment, to help manage the shell and the paths to access programs and libraries needed for a particular software installation.
+
+
To create a Spack environment, run the command
+
+
spack env create yourEnvironmentName
+
+
+
where you should replace yourEnvironmentName with your desired name for the environment. You can then activate the environment with
+
+
spack env activate yourEnvironmentName
+
+
+
You will need to activate the environment when you wish to use the software that was installed in that environment.
+
+
+
You can see a list of your available environments using
+
+
spack env list
+
+
+
and you can see which environment you are currently using with
+
+
spack env status
+
+
+
To deactivate the environment, run
+
+
spack env deactivate
+
+
+
or close the terminal session.
+
+
+
C. Finding Program Packages in Spack
+
+
Once inside an active Spack environment, you can run the following command to see what packages are installed in the current environment
+
+
spack find
+
+
+
For a new environment, this will show that there are no programs installed. The output of this command will update after you install program packages in the environment.
+
+
To search for packages to install using Spack, use the command
+
+
spack list nameOfProgram
+
+
+
where you should replace nameOfProgram with the program that you are interested in finding. Spack will search for the package and print out a list of all the packages that match that name. (The first time you run this command, it may take several minutes while Spack downloads a current list of packages that can be installed.)
+
+
To learn more about an available package, use the exact name of the program and run
+
+
spack info exactNameOfProgram
+
+
+
This will print out information about the program, including a short description of the program, a link to the developer’s website, and the available versions of the program and its dependencies.
+
+
D. Adding Package Specifications to the Environment
+
+
Once you find the packages that you want to install, add their specifications to the environment using
+
+
spack add exactNameOfProgram
+
+
+
Spack will automatically decide which version of the program to use at installation time based on the other packages that you’ve added.
+
+
If you want a specific version of a package, you can specify it by appending @= to the end of the package name, followed by the version number. For example,
+
+
spack add python@=3.10
+
+
+
will tell the environment that you want to install version 3.10 of Python. There are additional ways of defining specifications for package versions, the compiler to be used, and dependencies. The documentation for Spack provides the details on how this is done.
Once you have identified the package(s) you would like to install and have added the specifications to your environment,
+
+
i. Create the local scratch directory
+
+
Using the default configuration from Setting Up Spack on HPC, Spack will try to use the machine’s local disk space for staging and compiling files before transferring the finished results to the final installation directory. Using this space will greatly improve the speed of the installation process. Create the local directory with the command
+
+
mkdir /local/yourNetID/spack_build
+
+
+
where you should replace yourNetID with your NetID. At the end of the session, remember to delete this directory so that other people can use the disk space in their jobs.
+
+
+
If the directory already exists, that means you forgot to remove it after one of your previous Spack installation sessions. Simply remove the directory and make it again.
If you’ve added the installation specifications to the environment, then you can check the installation plan using the command
+
+
spack spec -lI
+
+
+
(the first letter after the hyphen is a lowercase “L” and the second letter is an uppercase “i”).
+
+
+
This command identifies what dependencies Spack needs in order to install your desired packages along with how it will obtain them. Assuming there are no problems, it will print a list of the packages and their dependencies, where entries that begin with a green [+] have already been installed somewhere in your local Spack installation, those that begin with a green [^] reference the system installation, and those that begin with a gray - will need to be downloaded and installed.
+
+
+
Most users should see a bunch of packages with a green [^] in the first column.
+If you do not, then there are several possible explanations:
If you are satisfied with the results, then you can proceed to install the programs.
+
+
iii. Install the environment packages
+
+
Assuming that you are in an interactive Slurm session and have activated the desired environment containing the package specifications, you can run
+
+
spack install -j 4
+
+
+
to install the packages inside of the Spack environment, where the number that comes after -j needs to match the number that you noted from when you started the interactive session (the one after -n when you ran the srun command for the interactive session). You can also add the -v option to have the installation be verbose, which will cause Spack to print the compile and make outputs in addition to the standard Spack output.
+
+
Depending on the number and complexity of the programs you are installing, and how much can be bootstrapped from the system installation, the installation step can take anywhere from several minutes to several hours.
+
+
+
If something goes wrong or your connection is interrupted, the installation process can be resumed at a later time without having to start over from the beginning. Make sure that you are in an interactive Slurm session and that you have activated the Spack environment, then simply rerun the spack install command.
+
+
+
iv. Finishing the installation
+
+
After the installation has successfully finished, you should be able to see that the programs have been installed by running
+
+
spack find
+
+
+
which should list the programs under the compiler heading used for installing the programs.
+
+
You may need to deactivate and reactivate the environment in order to properly use the programs that have been installed.
Once you are satisfied that the programs have been installed properly, you can remove packages that are build-only (not used for running the packages you installed) using the command
+
+
spack gc
+
+
+
Finally, remove the local build directory that Spack used during the installation with
+
+
rm -rf /local/yourNetID/spack_build
+
+
+
and then enter exit to end the interactive session.
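The installation steps above can be collected into one function, sketched below under a few assumptions: the function name spack_install_env is hypothetical, the build directory defaults to /local/$USER/spack_build, and the thread count is read from SLURM_NTASKS, which Slurm sets to the -n value of the session. Run it only inside an interactive Slurm job.

```shell
# Hypothetical wrapper around the steps above.
spack_install_env() {
    env_name=$1
    build_dir=${2:-/local/$USER/spack_build}
    threads=${SLURM_NTASKS:-1}      # assumption: matches the -n given to srun

    # Recreate the local build directory, removing any leftover from an earlier session.
    rm -rf "$build_dir" && mkdir -p "$build_dir" || return 1

    spack env activate "$env_name" || { rm -rf "$build_dir"; return 1; }
    spack install -j "$threads"
    status=$?

    # Remove build-only packages after a successful install.
    [ "$status" -eq 0 ] && spack gc

    rm -rf "$build_dir"             # free the local disk for other users
    return "$status"
}

# Example: spack_install_env yourEnvironmentName
```

The build directory is removed whether or not the installation succeeds, and the function returns the exit status of spack install so that a failed run can be detected and repeated.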
F. Removing an Environment and Uninstalling Unneeded Packages
+
+
You may find it necessary to remove a Spack environment, or packages installed using Spack. To uninstall a package, simply run
+
+
spack uninstall yourPackageName
+
+
+
where you should replace yourPackageName with the name of the package that you want to remove. This command will only work for packages that you ‘added’ to the Spack environment, as described above.
+
+
To remove an environment, first make sure that you have deactivated the environment with
+
+
spack env deactivate
+
+
+
and then run
+
+
spack env rm yourEnvironmentName
+
+
+
where you should replace yourEnvironmentName with the name of the environment that you want to remove. Note that this will not necessarily remove the packages that were installed in the environment! After the environment has been removed, you can uninstall the packages that are no longer needed using the command
+
+
spack gc
+
+
+
2. Using Software Installed in Spack
+
+
If your account is configured correctly for using Spack, and the software has been installed inside of a Spack environment, then to use the software all you need to do is activate the corresponding environment. Simply use the command
+
+
spack env activate yourEnvironmentName
+
+
+
and Spack will update your shell accordingly. (Remember that you can see the available Spack environments by running the command spack env list). Once the environment has been activated, you should be able to use the packages just as normal. You can confirm you are using a command installed using Spack by running
+
+
which nameOfYourCommand
+
+
+
where you replace nameOfYourCommand with the name of the command. The command will output a path, and you should see something like spack/var/spack/environments/yourEnvironmentName/ in that path.
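The same check can be scripted, for example at the top of a job script. This sketch uses the shell's command -v in place of which and looks for the path fragment mentioned above. The function name from_spack_env is hypothetical.

```shell
# Hypothetical helper: succeed only if COMMAND resolves to a Spack environment path.
from_spack_env() {
    cmd_path=$(command -v "$1") || return 1
    case "$cmd_path" in
        */spack/var/spack/environments/*)
            echo "$1: $cmd_path"
            return 0 ;;
        *)
            echo "$1 is not from a Spack environment: $cmd_path" >&2
            return 1 ;;
    esac
}

# Example: from_spack_env nameOfYourCommand || exit 1
```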
+
+
For submitting jobs using Slurm, you will need to make sure that you activate the Spack environment near the beginning of your sbatch file before the srun command. For example,
+
+
#!/bin/sh
+# This file is called submit-script.sh
+#SBATCH --partition=shared # default "shared", if not specified
+#SBATCH --time=0-04:30:00 # run time in days-hh:mm:ss
+#SBATCH --nodes=1 # require 1 node
+#SBATCH --ntasks-per-node=64 # cpus per node (by default, "ntasks"="cpus")
+#SBATCH --mem-per-cpu=4000 # RAM per CPU in megabytes
+#SBATCH --error=job.%J.err
+#SBATCH --output=job.%J.out
+
+# v---Remember to activate your Spack environment!!
+spack env activate yourEnvironmentName
+
+srun --mpi=pmix -n 64 /home/username/mpiprogram
+
+
+
When you submit this job to Slurm and it executes the commands in the sbatch file, it will first activate the Spack environment, and then your program will be run using the programs that are installed inside that environment.
+
+
+
Some programs include explicit module load commands in their execution, which may override the paths provided by the Spack environment. If your program appears to use the system versions of the packages instead of the versions installed in your Spack environment, you may need to remove or modify these explicit commands. Consult your program’s documentation for how to do so. You may want to create your own custom modules and modify your program to explicitly load your custom modules. See Creating Custom Modules Using Spack for more information on how to create your own modules using Spack.
+
+
+
3. Installing and Using a Specific Compiler
+
+
By default, Spack will attempt to compile packages it installs using one of the system compilers, most likely with GCC version 11.3.0. Some programs, however, may need to be compiled using a specific compiler, or require that their dependencies be built using a specific compiler. While this is possible using Spack, the process for installing and using a compiler is a bit more complicated than that for installing “regular” packages as was described above.
+
+
In brief, you will first create a separate environment for installing the compiler. Then you will add that compiler to the list of available compilers that Spack can use. Finally, you can install your desired packages in a new environment as described above, but you will need to specify which compiler to use.
+
+
A. Install the Compiler in its Own Environment
+
+
i. Identify the compiler and version
+
+
The first step is to identify the compiler and version you need for your program. Consult your program’s documentation for the requirements that it has. Then follow the instructions in C. Finding Program Packages in Spack to find the package name and confirm the version is available.
+
+
ii. Create the compiler’s environment
+
+
Next, create and activate an environment for installing the desired compiler. For example,
where you should replace compilerName and compilerVersion with the name and version of the desired compiler.
+
+
iii. Add the compiler specification to its environment
+
+
Once you’ve activated the environment, add the exact specification for the compiler to the Spack environment with
+
+
spack add compilerName@=compilerVersion
+
+
+
where you need to replace compilerName and compilerVersion with the name and version of the compiler that you identified above.
+
+
iv. Install the compiler in its environment
+
+
Next, follow the instructions in E. Installing Packages in an Environment to install the desired compiler in this environment. Installing the compiler may take several hours, so consider increasing the number of threads to speed up the installation.
+
+
B. Add the Compiler to Spack
+
+
i. Identify the compiler’s installation path
+
+
After installing the compiler, you need to find its location. First, activate the compiler’s environment with spack env activate compilerName_compilerVersion. Next, use the following command to save the path to the compiler as the shell variable compilerPath:
where you need to replace compilerName and compilerVersion with the name and version of the compiler that you installed. You can print out the path using the command echo $compilerPath.
+
+
ii. Give the compiler’s path to Spack
+
+
Now that you know where the compiler is installed, deactivate the environment with spack env deactivate. Then run the following command to tell Spack to add the compiler to its list of available compilers:
+
+
spack compiler add $compilerPath
+
+
+
iii. Confirm compiler has been added to Spack
+
+
The command
+
+
spack compiler list
+
+
+
will print out the list of compilers that Spack can use, and should now show compilerName@compilerVersion in the results.
+
+
C. Install Packages Using the New Compiler
+
+
Once the compiler has been installed and recognized by Spack, you can now create and activate a new environment for installing your desired packages, following the instructions in Installing Software Using Spack.
+
+
To make sure the packages are installed using your desired compiler, you need to include the compiler when you add the package specification to the environment (D. Adding Package Specifications to the Environment). To include the compiler in the specification, you need to add the symbol % followed by the compiler name and version to the end of the spack add command. For example,
+
+
spack add python@=3.10 %gcc@=9.5.0
+
+
+
will use GCC version 9.5.0 to compile Python 3.10 when installing the package. As a general rule, you should use the same compiler for installing all of your packages within an environment, unless your program’s installation instructions say otherwise.
+
+
This guide describes how to create and use custom personal and shared modules for software packages installed using Spack. For instructions on how to install software using Spack for you and/or your research group, see our guide Installing Software Using Spack.
In order to load a software package using the module command, there must be a corresponding “module file” containing the information that the module command needs in order to load the software package. Spack will automatically generate the required content of the module files, but Spack will need to know where these module files should be saved. Similarly, the module command will need to know where the module files are stored.
+
+
If you followed the instructions in Setting Up Spack on HPC, then the default location of your module files is /home/yourNetID/spack_modules where yourNetID is your NetID.
+
+
+
If you are using a shared installation of Spack for a group, and if the person who set up the installation followed the instructions in Setting Up Spack on HPC, then the default location of the shared module files is likely /home/groups/yourGroupName/spack_modules. You can confirm this by running the command spack config get modules | grep -A 2 'roots' and examining the listed paths (make sure you do not have a Spack environment activated when you do so). If the paths are /home/$user/spack_modules, then you should follow the instructions in iii. Updating location of module files in Setting Up Spack on HPC before proceeding.
+
+
+
2. Using Custom Modules
+
+
Spack will automatically create module files for the packages that you explicitly install, in the location described above.
+
+
To update the module command with the default location of the new module files, run the command
+
+
module use ~/spack_modules
+
+
+
+
For a group installation of Spack, you’ll need to modify the module use command to specify the path to your group’s directory. The following should work if your group followed our instructions when setting up Spack:
+
+
module use /home/groups/yourGroupName/spack_modules
+
+
+
+
Now if you run module avail you should see your custom modules listed in the first section, with the system modules listed in the following section. You can then use the module load command as usual to load your custom module for use in the current terminal session.
+
+
Note: Spack will not automatically create module files for the “upstream” dependencies (packages already installed on the system). If your module load test does not work, follow the instructions in the next section to generate these additional module files.
+
+
To have your custom modules found automatically by the module command, add the above module use command to the end of your ~/.bash_profile file.
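If you script this step, it helps to make it safe to run more than once. The sketch below appends the module use line only when it is not already present. The function name add_module_use is hypothetical, and the paths in the example are the defaults described earlier in this guide.

```shell
# Hypothetical helper: append "module use DIR" to PROFILE unless that exact line exists.
add_module_use() {
    profile=$1
    modules_dir=$2
    line="module use $modules_dir"
    grep -qxF "$line" "$profile" 2>/dev/null || printf '%s\n' "$line" >> "$profile"
}

# Example for a personal installation:
#   add_module_use "$HOME/.bash_profile" "$HOME/spack_modules"
```

The change takes effect the next time you log in, or right away if you source the profile file.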
+
+
3. Creating Custom Modules Using Spack
+
+
You may need to manually create the custom module files, especially after editing any of the modules configuration for Spack. To create the module files, first activate the desired environment with
+
+
spack env activate yourEnvironmentName
+
+
+
(where you should replace yourEnvironmentName with your environment name) and then enter the following command:
+
+
spack module tcl refresh
+
+
+
Spack will print out a list of all the packages installed in the current environment, and you’ll be asked to confirm if you wish to create module files for all of these packages.
+
+
To remove old module files, or to update the directory structure, add the option --delete-tree, i.e.
+
+
spack module tcl refresh --delete-tree
+
+
+
If you tried to load a module but received the error(s) ‘Executing this command requires loading “{module file}” which failed while processing the following…‘, then you will need to generate the “upstream” module files in order to use your desired module. In this case, the following command should resolve the issue:
+
+
spack module tcl refresh --upstream-modules
+
+
+
+
Note: You should only run this command inside of an activated Spack environment, otherwise you will be prompted to create module files for ALL Spack packages, including those installed system-wide, regardless of whether they are required dependencies!
+
+
+
Lastly, note that Spack will not directly create module files for software installed independently of Spack (for example, using pip install).
+
+
4. Working with Multiple Environments
+
+
If you have more than one Spack environment that you wish to create modules for, we recommend that you modify the above procedure in order to better organize the list of modules.
+
+
For each environment that you wish to create module files for, activate the environment and then edit the configuration so that the module files are saved into a sub-directory named for that environment. For example,
You should similarly modify the following commands to account for the different paths.
+
+
+
Repeat the process for your other environments.
+
+
To use the modules for a particular environment, run the module use command but specify the path to the environment’s subdirectory. Continuing with our example,
+
+
module use ~/spack_modules/my-first-env
+
+
+
will update the module command with the location of the modules for using my-first-env.
+
+
If you want to switch environments, we recommend that you “unuse” the first environment and then “use” the second, i.e.
+
+
module unuse ~/spack_modules/my-first-env
+module use ~/spack_modules/my-second-env
+
+
+
While you can have more than one environment in “use” by the module command, this increases the chance of loading modules with conflicting dependencies that could result in unexpected behavior.
+
+
5. Using Hierarchy Based Modules
+
+
There are two “flavors” of the module system: tcl and lmod. We use tcl for managing the system modules, and have recommended using tcl throughout this guide. The main difference between the two “flavors” of modules is that tcl uses a “flat” directory structure (all the module files are located in the same central directory) whereas lmod uses a “hierarchy” directory structure (where the module files are grouped by their compiler or MPI version). The hierarchical structure of lmod can be very useful in organizing duplicate module files that differ only by how they were compiled.
+
+
To use the lmod style module files, you should first edit your modules configuration to enable lmod and disable tcl, then refresh your module files.
More advanced options regarding the naming and structure of the lmod module files can be configured by editing the modules.yaml (described in iii. Updating location of module files in Setting Up Spack on HPC). See the Spack documentation for more information on how to configure module files: https://spack.readthedocs.io/en/latest/module_file_support.html#.
CHTC uses Spack (https://github.com/spack/spack) to install and manage software packages on the HPC cluster, which all users can access via the module command. Recently, Spack has developed a feature that allows users to integrate their local installation of Spack with the system-wide installation. This means that when a user installs software with their local installation of Spack, they can automatically incorporate the system-wide packages to satisfy their software’s dependencies (similar to Conda and Miniconda).
+
+
This guide describes how to set up a local copy of Spack and integrate it with the system installation, either for an individual user or for a group of users. For instructions on how to install packages with Spack, see our other guide, Installing Software Using Spack.
+
+
+
If your group has already set up a shared group installation of Spack, you can skip to the end of this guide: 3. Using a Shared Group Installation.
and then activate Spack by sourcing the setup script with the . command
+
+
. spack/share/spack/setup-env.sh
+
+
+
That’s it! You can test that Spack has been installed by entering spack and you should see the help text print out. But before trying to install packages using your Spack installation, you should configure it to recognize the system installation of Spack.
+
+
+
This guide assumes that you ran the git clone command in your home directory, i.e. /home/yourNetID. If you did not, then run the following command to print the full path to your Spack installation.
+
+
echo $SPACK_ROOT
+
+
+
We will refer to this path as the SpackRootPath and you will need to use this path where noted in the instructions below.
+
+
+
B. Using Spack in Future Sessions (Individual)
+
+
Although Spack is now installed, you will need to rerun the setup command (the . command that sources setup-env.sh, shown above) in each new session in which you want to use it.
A more convenient option is to have your account run this command whenever you log in. Open the .bash_profile file in your home directory with a text editor (e.g. nano ~/.bash_profile) and add the command to the end of the file, using the full path to setup-env.sh. If you ran the git clone command in your home directory, then the line you add should be . /home/yourNetID/spack/share/spack/setup-env.sh
where you need to replace yourNetID with your NetID.
+
+
+
If Spack was not installed to your home directory, use the following command instead, where you need to replace SpackRootPath with the path that you noted above.
+
+
. SpackRootPath/share/spack/setup-env.sh
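If you prefer to add the line from the command line rather than in a text editor, it helps to avoid adding it twice. The following is a minimal sketch under that assumption; add_line_once is a hypothetical helper name, not a Spack or CHTC command.

```shell
#!/bin/bash
# Sketch: append a line to a file only if that exact line is not already present.
add_line_once() {
    local line="$1" file="$2"
    touch "$file"    # create the file if it does not exist yet
    # -x: match the whole line, -F: treat the text literally, -q: print nothing
    if ! grep -qxF -- "$line" "$file"; then
        printf '%s\n' "$line" >> "$file"
    fi
}

# Example (replace SpackRootPath with the path you noted above):
# add_line_once ". SpackRootPath/share/spack/setup-env.sh" ~/.bash_profile
```

Running the helper a second time with the same line leaves the file unchanged.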
+
+
+
+
C. Obtain the Provided Configuration Files (Individual)
+
+
To simplify the process of configuring your local installation of Spack, we have provided a folder with the necessary configuration files. All that you need to do is copy it to your home directory using the following command.
Your local Spack installation will automatically find the configuration files and will now recognize the packages that are installed system-wide. You can confirm this with the command
+
+
spack find
+
+
+
This should show a list of packages, including those you see when you run the module avail command. A total of ~120 packages should be listed.
+
+
You are now ready to use Spack for installing the packages that you need! See the instructions in Installing Software Using Spack.
+
+
2. Setting Up Spack for Group Use
+
+
The following instructions for a group installation of Spack assume that a shared directory has already been created for your group, and that you have access to this shared folder. We also recommend communicating with your colleagues before proceeding.
+
+
A. Downloading Spack (Group)
+
+
First, log in to the HPC cluster, and navigate to your group’s shared directory in /home with
+
+
cd /home/groups/yourGroupName
+
+
+
where you should replace yourGroupName with your group’s name. Note this path for use throughout this guide, and communicate it to your group members for configuring their access to the installation.
+
+
You can then install Spack following its documentation. Download the Spack code from their GitHub repository:
and then activate Spack by sourcing the setup script with the . command.
+
+
. spack/share/spack/setup-env.sh
+
+
+
That’s it! You can test that Spack has been installed by entering spack and you should see the help text print out. But before trying to install packages using your Spack installation, you should configure it to recognize the system installation of Spack.
+
+
+
This guide assumes that you ran the git clone command in your group’s home directory, i.e. /home/groups/yourGroupName. If you did not, then run the following command to obtain the full path to your Spack installation. We will refer to this path as the SpackRootPath and you will need to use this path where noted in the instructions below.
+
+
echo $SPACK_ROOT
+
+
+
+
B. Using Spack in Future Sessions (Group)
+
+
Although Spack is now installed, you will need to rerun the setup command (the . command that sources setup-env.sh, shown above) in each new session in which you want to use it.
A more convenient option is to have your account run this command whenever you log in. You and your group members should each open the .bash_profile file in your respective home directories with a text editor (e.g. nano ~/.bash_profile) and add the command to the end of the file, using the full path to setup-env.sh. For a group installation, the line should look like . /home/groups/yourGroupName/spack/share/spack/setup-env.sh
where you need to replace yourGroupName with the name of your group.
+
+
+
If Spack was not installed in your group’s home directory, use the following command instead, where you will need to replace SpackRootPath with the path that you noted above.
+
+
. SpackRootPath/share/spack/setup-env.sh
+
+
+
+
C. Obtain the Provided Configuration Files (Group)
+
+
i. Copy the configuration files
+
+
To simplify the process of configuring your local installation of Spack, we have provided a folder with the necessary configuration files. All that you need to do is copy it to your home directory using the following command.
and Spack should now recognize the packages that are installed system-wide. You can confirm this with the command
+
+
spack find
+
+
+
This should show a list of packages similar to what you see when you run the module avail command.
+
+
To ensure that the configuration files are found in future terminal sessions, you and your group members need to edit your respective ~/.bash_profile files to include the above export command. That is, use a command-line text editor to open the file at ~/.bash_profile and add the following line to the end of the file:
If you or someone in your group is interested in creating custom modules following the instructions in the guide Creating Custom Modules Using Spack, then you should update the location where the module files will be saved. You can update the location with the following commands
where you replace yourGroupName with your group’s name.
+
+
You are now ready to use Spack for installing the packages that you need! See the instructions in Installing Software Using Spack.
+
+
3. Using a Shared Group Installation
+
+
Users who want to use a shared group installation of Spack, but who did not set up the installation, only need to add lines to their ~/.bash_profile file that point to the shared group installation and its configuration files.
where yourGroupName should be replaced with the name of your group. Confirm the exact commands with the user who installed Spack for your group.
+
+
+
You should be able to find the requisite paths if necessary. For the first line, the command
+
find /home/groups/yourGroupName -type d -name spack | grep "share/spack"
+
+
+
should give the path you need; simply add “setup-env.sh” to the end of the path. For the second line, the command
+
find /home/groups/yourGroupName -type d -name .spack | sort -n | head -n 1
+
+
+
should give the path you need. If it doesn’t, try again without | sort -n | head -n 1 to see the full list of matches, and choose the appropriate one.
+
+
+
Source the .bash_profile with
+
. ~/.bash_profile
+
+
+
or else close the terminal and log in again.
+
+
+
+
Once configured, you can follow the instructions in our guide Installing Software Using Spack to install or use already-installed packages in Spack.
+
+
A. Switching Between Spack Installations
+
+
You can easily switch between different Spack installations by creating scripts containing the commands listed in Step 2. above, and then sourcing the one that you want to use.
+
+
For example, let’s say you want to use a personal installation of Spack for an independent research project, but want to use a group installation of Spack as part of a collaboration. In that case, you would create two scripts, load-my-spack.sh and load-group-spack.sh, and save them to some central location like ~/bin. In each script, you provide the path to the setup-env.sh file and the .spack configuration directory for the respective Spack installations. The example contents of these scripts are provided below, where you should replace yourNetID with your NetID and yourGroupName with the group name of your collaboration.
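As a rough sketch only, the two scripts could be built around a shared helper like the one below. It assumes that each Spack installation was cloned into a directory named spack, that its configuration directory is named .spack, and that the configuration directory is selected through Spack's SPACK_USER_CONFIG_PATH environment variable. The function name use_spack and the paths in the comments are illustrative assumptions; confirm the actual paths for your installations before relying on it.

```shell
#!/bin/bash
# Sketch: activate one Spack installation and point Spack at its configuration directory.
# use_spack is a hypothetical helper; it must be sourced (not executed) so that the
# changes apply to your current shell session.
use_spack() {
    local spack_root="$1" config_dir="$2"
    local setup_script="${spack_root}/share/spack/setup-env.sh"
    if [ ! -f "$setup_script" ]; then
        echo "Spack setup script not found: ${setup_script}" >&2
        return 1
    fi
    # Activate this Spack installation in the current shell.
    . "$setup_script"
    # Tell Spack which configuration directory to use (assumed variable, see above).
    export SPACK_USER_CONFIG_PATH="$config_dir"
}

# load-my-spack.sh could then end with (assumed paths):
#   use_spack /home/yourNetID/spack /home/yourNetID/.spack
# load-group-spack.sh could end with (assumed paths):
#   use_spack /home/groups/yourGroupName/spack /home/groups/yourGroupName/.spack
```

Each loader script would need to contain the function definition (or source a file that does) before calling it. You would then switch installations with, for example, . ~/bin/load-group-spack.sh.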
+Below is a list of guides for some of the most common tasks our users need to
+carry out as they begin and continue to use the HPC resources at the CHTC.
+
+
+
User Expectations
+
+Read through these user expectations and policies before using CHTC services.
+
+
+
+
+
+
+
Many executables require arguments to perform tasks. This page shows you basic examples of how different programming languages take arguments, and how wrapper scripts can be written to pass arguments.
+
+
What is an argument?
+
+
In a computational workflow, how do you tell the computer what script to run, what data file to read, or which parameters to use? These inputs are typically passed as arguments, which your program is configured to read.
+
+
For example, on the command line, the sleep program takes in one argument, a non-negative number, then pauses for that number of seconds. If we run the following code,
+
sleep 60
+
+
the computer pauses for 60 seconds. In this example:
+
+
sleep is the executable.
+
60 is the argument.
+
+
+
What is a wrapper script, and why should I write one?
+
+
While the above example is simple, what if you need something more complex, like a workflow? Your workflow might need some pre- or post-processing, if/else statements, or iterations.
+
+
+
Wrapper scripts are a way to package simple computational workflows in one executable script, allowing computations to be run in noninteractive batches.
+
+
+
Using arguments in different programming languages
+
+
Let’s see how different programming languages might take an argument. Each tab contains a different version of a simple program, echo-next, that prints the argument it is given (we’ll use data.csv) to the terminal. Example wrapper scripts and HTCondor submit files are also included.
+
+
+
+
+
+
+
+
+ Shell script
+
+
+
+
+
+
+ Julia
+
+
+
+
+
+
+ Python
+
+
+
+
+
+
+ R
+
+
+
+
+
+
+
+
Code
+
Our executable written in shell, echo-next.sh:
+
#!/bin/bash
+echo "$1"
+
+
+
We can use it on the command line after changing the file permissions to executable:
In bash, $1 references the first argument to a script. If you have more
+arguments, $2 will refer to the second, $3 to the third, etc. There are
+also ways to refer to the whole list of arguments if needed.
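As an illustration of referring to the whole argument list, the sketch below uses two standard bash features: $# (the number of arguments) and "$@" (all arguments, each kept as a separate word). The function name show_args is only an example name.

```shell
#!/bin/bash
# Sketch: print how many arguments were given, then each argument on its own line.
show_args() {
    local arg
    echo "count: $#"
    # Quoting "$@" keeps arguments that contain spaces intact.
    for arg in "$@"; do
        echo "arg: ${arg}"
    done
}

# Example: show_args data.csv "my file.txt"
```

With the two example arguments, the function reports a count of 2 and prints data.csv and my file.txt on separate lines.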
+
+
+
Wrapper script
+
+
We can now write a wrapper script! This script checks that one argument is passed, and also allows room for pre- and post-processing in this simple workflow.
+
+
Our wrapper script written in bash, wrapper.sh:
+
#!/bin/bash
+
+# Check if an argument is provided
+if [ $# -ne 1 ]; then
+ echo 'Please use one argument. Usage: wrapper.sh [arg]'
+ exit 1
+fi
+
+# Set filename variable to the first argument to the bash script
+filename=$1
+
+echo 'Pre-processing could go here.'
+
+# Run code
+./echo-next.sh "${filename}"
+
+echo 'Post-processing could go here.'
+
+
+
Ensure wrapper.sh is executable with chmod +x wrapper.sh before running it on the command line:
+
[user@login]$ chmod +x wrapper.sh
+[user@login]$ ./wrapper.sh
+Please use one argument. Usage: wrapper.sh [arg]
+[user@login]$ ./wrapper.sh data.csv
+Pre-processing could go here.
+data.csv
+Post-processing could go here.
+
+
+
Passing arguments with the HTCondor submit file
+
+
Now that we understand how arguments work in shell scripts and how to write simple wrapper scripts that pass those arguments, we can pass arguments in HTCondor’s submit file by specifying the arguments attribute, as shown in this excerpt:
The ARGS constant is an array – if you have multiple arguments, you can reference
+them using Julia’s usual notation (ARGS[1], ARGS[2], etc.).
+See Julia documentation on arguments for details.
+
+
We can use the script on the command line, assuming Julia is installed and on the PATH:
+
[user@login]$ julia echo-next.jl data.csv
+data.csv
+
+
+
Wrapper script
+
+
We can now write a wrapper script! This script checks that one argument is passed, and also allows room for pre- and post-processing in this simple workflow.
+
+
Our wrapper script written in bash, wrapper.sh:
+
#!/bin/bash
+
+# Check if an argument is provided
+if [ $# -ne 1 ]; then
+ echo 'Please use one argument. Usage: wrapper.sh [arg]'
+ exit 1
+fi
+
+# Set filename variable to the first argument to the bash script
+filename=$1
+
+echo 'Pre-processing could go here.'
+
+# Run code
+julia echo-next.jl "${filename}"
+
+echo 'Post-processing could go here.'
+
+
+
Ensure wrapper.sh is executable with chmod +x wrapper.sh before running it on the command line:
+
[user@login]$ chmod +x wrapper.sh
+[user@login]$ ./wrapper.sh
+Please use one argument. Usage: wrapper.sh [arg]
+[user@login]$ ./wrapper.sh data.csv
+Pre-processing could go here.
+data.csv
+Post-processing could go here.
+
+
+
Passing arguments with the HTCondor submit file
+
+
Now that we’ve understood how arguments work in Julia and how to write simple wrapper scripts that pass those arguments, we can pass arguments in HTCondor’s submit file by specifying the arguments attribute, as shown in this excerpt:
sys.argv is a list – if you have multiple arguments, you can reference each item in the
+list using Python’s usual notation (sys.argv[1], sys.argv[2], etc.).
+See Python documentation on sys.argv for details.
+
+
We can use the script on the command line, assuming Python is installed and on the PATH:
We can now write a wrapper script! This script checks that one argument is passed, and also allows room for pre- and post-processing in this simple workflow.
+
+
Our wrapper script written in bash, wrapper.sh:
+
#!/bin/bash
+
+# Check if an argument is provided
+if [ $# -ne 1 ]; then
+ echo 'Please use one argument. Usage: wrapper.sh [arg]'
+ exit 1
+fi
+
+# Set filename variable to the first argument to the bash script
+filename=$1
+
+echo 'Pre-processing could go here.'
+
+# Run code
+python3 echo-next.py "${filename}"
+
+echo 'Post-processing could go here.'
+
+
+
Ensure wrapper.sh is executable with chmod +x wrapper.sh before running it on the command line:
+
[user@login]$ chmod +x wrapper.sh
+[user@login]$ ./wrapper.sh
+Please use one argument. Usage: wrapper.sh [arg]
+[user@login]$ ./wrapper.sh data.csv
+Pre-processing could go here.
+data.csv
+Post-processing could go here.
+
+
+
Passing arguments with the HTCondor submit file
+
+
Now that we’ve understood how arguments work in Python and how to write simple wrapper scripts that pass those arguments, we can pass arguments in HTCondor’s submit file by specifying the arguments attribute, as shown in this excerpt:
The output of commandArgs is a vector – if you have multiple arguments, you can reference each
+item using R’s usual notation (args[1], args[2], etc.).
+See R documentation on commandArgs for details.
+
+
We can use the script on the command line, assuming R is installed and on the PATH:
We can now write a wrapper script! This script checks that one argument is passed, and also allows room for pre- and post-processing in this simple workflow.
+
+
Our wrapper script written in bash, wrapper.sh:
+
#!/bin/bash
+
+# Check if an argument is provided
+if [ $# -ne 1 ]; then
+ echo 'Please use one argument. Usage: wrapper.sh [arg]'
+ exit 1
+fi
+
+# Set filename variable to the first argument to the bash script
+filename=$1
+
+echo 'Pre-processing could go here.'
+
+# Run code
+Rscript echo-next.R "${filename}"
+
+echo 'Post-processing could go here.'
+
+
+
Ensure wrapper.sh is executable with chmod +x wrapper.sh before running it on the command line:
+
[user@login]$ chmod +x wrapper.sh
+[user@login]$ ./wrapper.sh
+Please use one argument. Usage: wrapper.sh [arg]
+[user@login]$ ./wrapper.sh data.csv
+Pre-processing could go here.
+data.csv
+Post-processing could go here.
+
+
+
Passing arguments with the HTCondor submit file
+
+
Now that we’ve understood how arguments work in R and how to write simple wrapper scripts that pass those arguments, we can pass arguments in HTCondor’s submit file by specifying the arguments attribute, as shown in this excerpt:
This page documents some common and known issues encountered on the HTC system. While this page can be beneficial in troubleshooting, it does not contain a comprehensive list of errors.
+
+
Visit our Get Help page to find more resources for troubleshooting.
We also recommend using the -y option to prevent installation from hanging due to interactive prompts.
+
+
+
+
[Container] When attempting to run a Docker container, it fails with the error message "[FATAL tini (7)] exec ./myExecutable.sh failed: Exec format error".
+
+
Cause:
+
The Docker container was likely built on an Apple computer with an ARM processor, which by default produces an image for the ARM architecture. Such an image cannot run on x86_64 (amd64) Linux machines.
+
Solution:
+
To resolve this, when building your Docker container, use the command:
+
docker build --platform linux/amd64 .
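If you are unsure whether your build machine is affected, you can check its architecture with uname -m. The following is a minimal sketch; the function name needs_amd64_flag is hypothetical. It treats any architecture other than x86_64/amd64 as one where the --platform flag matters. Passing the flag on an x86_64 machine should also be harmless.

```shell
#!/bin/bash
# Sketch: report whether this machine's architecture differs from amd64 (x86_64).
# Returns 0 (true) when the --platform linux/amd64 flag is needed, 1 otherwise.
needs_amd64_flag() {
    # Use the given architecture string, or ask the system if none is given.
    local arch="${1:-$(uname -m)}"
    case "$arch" in
        x86_64|amd64) return 1 ;;   # already an amd64 machine
        *)            return 0 ;;   # e.g. arm64 (Apple silicon) or aarch64
    esac
}

# Example:
# if needs_amd64_flag; then
#     echo "Build with: docker build --platform linux/amd64 ."
# fi
```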
+
+
+
+
+
[GPU] My GPU job has been in the queue for a long period of time and is not starting.
+
+
Cause:
+
Jobs default to using CentOS9, but most GPU nodes are currently running CentOS8.
+
Solution:
+
To your submit file, add the following line and resubmit:
This guide describes when and how to use software that is available as pre-installed modules on the HTC system, using MPI as an example.
+
+
To best understand the below information, users should already have an
+understanding of:
+
+
+
Using the command line to: navigate within directories,
+create/copy/move/delete files and directories, and run their
+intended programs (aka "executables").
At CHTC, we install a minimal set of software for use
+on our systems. On the HTC system, CHTC staff manage installations of
+the following types of programs:
+
+
+
Compilation tools and common dependencies (e.g. MPI, different GCC versions)
+
Software that requires a shared license (e.g. COMSOL)
+
+
+
Information on how to access CHTC-managed installations is in the next
+section of this guide.
+
+
2. Using Pre-Installed Software in Modules
+
+
All software on the HTC system that is installed by CHTC staff is available via
+a tool called “modules”.
+
+
A. See Available Software Modules
+
+
There are two ways to search through the software modules on the HTC system:
+
+
+
View all modules
+ This command will show all software modules available:
+
[alice@submit]$ module avail
+
+
+
Search for specific modules
+ If you are searching for a specific software module, you can use the
+ module spider command with part of the software name. For example, to
+ search for Open MPI modules, you would type:
+
[alice@submit]$ module spider openmpi
+
+
+
+
+
B. Load Software in Modules
+
+
Once you find a software module that you want to use, you need to “load” it
+into your command line environment to make it active, filling in module_name with the name you found through one of the above steps.
+
+
[alice@submit]$ module load module_name
+
+
+
+
When to Load Modules
+
+
You can load modules to compile code (see below). If you do this, make sure to load
+the same modules as part of your job script before running the main command.
+
+
You can also load modules to run specific software. If done for interactive
+testing, this should be done in an interactive job; otherwise, the module
+should be loaded in the job submit file.
+
+
+
C. Unload Software in Modules
+
+
If you no longer want to use a specific software installation, you can “unload”
+the software module with the following command:
+
+
[alice@submit]$ module unload module_name
+
+
+
If you want to clear your command line environment and start over, run the following:
+
+
[alice@submit]$ module purge
+
+
+
3. Installing Software on the HTC System
+
+
A. Overview
+
+
Unless you are using a licensed software program provided via modules, you
+are able to compile and install the software you need on the HTC System.
+
+
Compilation can be done via an interactive job as described in
+our HTC Compiling or Testing Code with an Interactive Job guide.
+If using CHTC’s provided compilation tools via modules, make
+sure to load the needed modules before compiling and to load the same
+modules in your job submission.
+
+
For groups that would like to share software installations among group
+members, please contact us about getting a shared “group” directory.
+
+
If you are new to software installation, see the section below for
+a more step-by-step description of the process.
Download Source Code - download the source code for your desired program.
+ You should only need the source code until the software is properly installed, but if desired, you may keep a zipped copy of
+ the source code in your workspace.
+
Read the Docs - try to find the installation instructions, either online or
+ in the downloaded source code. In particular, you’ll want to note if there are
+ any special requirements for dependencies like MPI or the compiler needed.
+
Load Modules - if you are using software modules to help you build your
+ code, load them now. Keep track of what you use so that you can load them
+ in your job submit file later. We also recommend doing a module purge before
+ loading your compiling modules to make sure you’re starting from a clean environment.
+
Install - most scientific software follows the three step installation process
+ of configure - make - make install.
+
+
configure - this step checks for tools and requirements needed to compile
+ the code. This is also the step where you set the final installation location of
+ a program. The option for setting this location is typically called the
+ “prefix”; a common syntax is: $ ./configure --prefix=/home/user.
+ Set the installation location to a directory within your
+ /home directory.
+
make - this step compiles and links the code, turning it from human-readable
+ source code to compiled binary code. This is usually the most time consuming
+ step of the installation process.
+
make install - this step copies compiled files to the final installation location
+ (usually specified in the configure step).
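The three installation steps above can be sketched as a single shell function. This is a minimal illustration, assuming the software follows the standard configure - make - make install pattern; the function name build_and_install and the example paths are hypothetical, and your software's documentation may call for extra options.

```shell
#!/bin/bash
# Sketch: run configure, make, and make install for source code in src_dir,
# installing into the directory given as prefix.
build_and_install() {
    local src_dir="$1" prefix="$2"
    # The parentheses run the steps in a subshell, so the cd does not change
    # your current directory. Each step runs only if the previous one succeeded.
    (
        cd "$src_dir" &&
        ./configure --prefix="$prefix" &&
        make &&
        make install
    )
}

# Example (hypothetical paths):
# build_and_install ~/myprogram-src /home/user/myprogram
```

Chaining the steps with && stops the process at the first failure, which makes it easier to see which step needs attention.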
+
+
+
+
+
+
4. Example: Using an MPI module in HTC Jobs
+
+
Below is the process of setting up HTC jobs that use the MPI modules to run. This process can be modified for other software available in modules as well.
+
+
Before you begin, review the discussion of MPI requirements and
+use cases below to make sure that our multi-core MPI capabilities
+are the right solution for your computing problem.
+
+
Once you know that you need to run multi-core jobs that use MPI on our
+HTC system, you will need to do the following:
Most jobs on CHTC's HTC system are run on one CPU (sometimes called a
+"processor", or "core") and can be executed without any special
+system libraries. However, in some cases, it may be advantageous to run
+a single program on multiple CPUs (also called multi-core), in order to
+speed up single computations that cannot be broken up and run as
+independent jobs.
+
+
Running on multiple CPUs can be enabled by the parallel programming
+standard MPI. For MPI jobs to compile and run, CHTC has a set of MPI
+tools installed to a shared location that can be accessed via software
+modules.
+
+
+
+
B. View MPI Modules on the HTC System
+
+
MPI tools are accessible on the HTC system through software "modules",
+which are tools to access and activate a software installation. To see
+which MPI packages are supported on the HTC system, you can type the following
+command from the submit server:
+
+
[alice@submit]$ module avail
+
+
+
Your software may require newer versions of MPI libraries than those
+available via our modules. If this is the case, send an email to
+chtc@cs.wisc.edu, to find out if we can install
+that library into the module system.
+
+
C. Submitting MPI jobs
+
+
+
+
1. Compile MPI Code
+
+
You can compile your program by submitting an interactive build job to
+one of our compiling servers. Do not compile code on the submit server,
+as doing so may cause performance issues. The interactive job is
+essentially a regular HTCondor job, but without an executable; you
+are the one running the commands instead (in this case, to compile the
+program).
+
+
Instructions for submitting an interactive build/compile job are
+available on our interactive submission guide.
+The only line in the submit file that you need to change is
+transfer_input_files to reflect all the source files on which your
+program depends. Otherwise, go through the steps described in that guide
+until immediately after running condor_submit -i.
+
+
Once your interactive job begins on one of our compiling servers, you
+can confirm which MPI modules are available to you by typing:
+
+
[alice@build]$ module avail
+
+
+
Choose the module you want to use and load it with the following
+command:
+
+
[alice@build]$ module load mpi_module
+
+
+
where mpi_module is replaced with the name of the MPI module you'd
+like to use.
+
+
After loading the module, compile your program. If your program is
+organized in directories, make sure to create a tar.gz file of
+anything you want copied back to the submit server. Once you type exit,
+the interactive job will end, and any *files* created during the
+interactive job will be copied back to the submit location for you.
+
+
If your MPI program is especially large (more than 100 MB, compiled), or
+if it can only run from the exact location to which it was installed,
+you may also need to take advantage of CHTC's shared software location
+or our public web proxy called Squid. Email CHTC's Research Computing
+Facilitators at chtc@cs.wisc.edu if this is the case.
+
+
+
+
2. Script For Running MPI Jobs
+
+
To run your newly compiled program within a job, you need to write a
+script that loads an MPI module and then runs the program, like so:
+
+
#!/bin/bash
+
+# The following three commands are **REQUIRED** to enable modules, and then to load the appropriate MPI module
+export PATH
+. /etc/profile.d/modules.sh
+module load mpi_module
+
+# Untar your program installation, if necessary
+tar -xzf my_install.tar.gz
+
+# Command to run your OpenMP/MPI program
+# (This example uses mpirun, other programs
+# may use mpiexec, or other commands)
+mpirun -np 8 ./path/to/myprogram
+
+
+
Replace mpi_module with the name of the module you used to compile
+your code, myprogram with the name of your program, and the value after -np (8 in this example) with the
+number of CPUs you want the program to use. There may be additional
+options or flags necessary to run your particular program; make sure to
+check the program's documentation about running multi-core processes.
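If you would rather not hard-code the process count, one option is to pass it to the script as an argument so that it stays in step with request_cpus. The sketch below assumes the count arrives as the first argument (for example through an arguments = 8 line in the submit file) and falls back to 8 otherwise. The function name run_mpi_program is illustrative and not part of CHTC's tooling.

```shell
#!/bin/bash

# Hypothetical helper: run the MPI program with a process count taken
# from the first argument (default: 8, as in the example above).
run_mpi_program() {
    local np="${1:-8}"
    mpirun -np "$np" ./path/to/myprogram
}

# In the job script, after loading the module and untarring, you could call:
# run_mpi_program "$1"
```

Whatever value you pass should match the request_cpus line of your submit file.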
+
+
+
+
3. Submit File Requirements
+
+
There are several important requirements to consider when writing a
+submit file for multicore jobs. They are shown in the sample submit file
+below and include:
+
+
+
+
Require access to MPI modules. To ensure that your job will have
+access to CHTC software modules, including MPI modules, you must
+include the following in your submit file.
+
+
requirements = (HasChtcSoftware == true)
+
+
+
+
The script you wrote above (shown as run_mpi.sh below) should be
+your submit file "executable", and your compiled program and any
+files should be listed in transfer_input_files.
+
+
+
+
A sample submit file for multi-core jobs is given below:
+
+
# multicore.sub
+# A sample submit file for running a single multicore (8 cores) job
+executable = run_mpi.sh
+# arguments = (if you want to pass any to the shell script)
+
+## Specify the name of HTCondor's log, standard error, and standard out files
+log = mc_$(Cluster).log
+output = mc_$(Cluster).out
+error = mc_$(Cluster).err
+
+# Tell HTCondor how to handle input files
+should_transfer_files = YES
+transfer_input_files = (this should be a comma-separated list of input files if needed)
+
+# Requirement for accessing new set of modules
+requirements = ( HasChtcSoftware == true )
+
+## Request resources needed by your job
+request_cpus = 8
+request_memory = 8GB
+request_disk = 2GB
+
+queue
+
+
+
After the submit file is complete, you can submit your jobs using
+condor_submit.
The CHTC high-throughput computing (HTC) cluster provides support for a variety of computational research tasks. The HTC system offers CPUs/GPUs, high-memory nodes, and other specialized hardware. Workflows that run well on this system include RNA/DNA sequencing, machine learning workflows, weather modeling, and Monte Carlo simulations.
+
+
To get access to the HTC System, please complete our
+New User Consultation Form. After your request is received,
+a Research Computing Facilitator will follow up to discuss the computational needs
+of your research and connect you with computing
+resources (including non-CHTC services) that best fit your needs.
Below are some of the default limits on CHTC’s HTC system. Note that as a large-scale
+computing center, we want you to be able to run at a large scale - often much larger
+than these defaults. Please contact the facilitation team whenever you encounter one
+of these limits so we can adjust your account settings or discuss alternative ways to
+achieve your computing goals.
+
+
+
Jobs with long runtimes. There is a default run limit of 72
+hours for each job queued in the HTC System, once it starts running.
+Jobs longer than this will be placed in HTCondor’s “hold” state.
+If your jobs will be longer, contact the CHTC facilitation team, and we’ll help you to determine the
+best solution.
+
Submitting many jobs from one submit file. HTCondor is designed
+to submit thousands (or more) jobs from one submit file. If you are
+submitting over 10,000 jobs per submit file or want to queue
+more than 50,000 total jobs as a single user,
+please email us as we have strategies to
+submit that many jobs in a way that will ensure you have as many
+jobs running as possible without also compromising queue performance.
+
Submitting many short jobs from one submit file. While HTCondor
+is designed to submit thousands of jobs at a time, many short jobs
+can overwhelm the submit server, resulting in other jobs taking much
+longer to start than usual. If you plan on submitting over
+1000 jobs per submit file, we ask that you ensure each job has a
+minimum run time of 5 minutes (on average).
+
The default disk quota is 20 GB in your /home directory, as a
+starting point. You can track your use of disk space and your quota value,
+using our Quota Guide. If you need more space
+for concurrent work, please send an email to chtc@cs.wisc.edu.
+
Submitting jobs with "large" files: HTCondor's
+normal file transfer mechanism ("transfer_input_files") is good for
+files up to 100MB in size (or 500MB total, per job). For jobs with larger
+files, please see our guide on File Availability
+Options, and contact us to make arrangements.
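For the short-jobs guideline above, one common approach is to group several short tasks into each job so that the job as a whole runs for at least 5 minutes. The following is a minimal sketch of that arithmetic; the function name tasks_per_job is hypothetical, and it assumes you already know the average runtime of one task in whole seconds.

```shell
#!/bin/bash

# Hypothetical helper: given the average runtime of one task in seconds,
# print how many tasks to group into one job so the job runs >= 5 minutes.
tasks_per_job() {
    local task_seconds="$1"
    local target_seconds=300   # 5-minute guideline from the list above
    # Integer ceiling division: ceil(target / task)
    echo $(( (target_seconds + task_seconds - 1) / task_seconds ))
}

tasks_per_job 20    # prints 15
tasks_per_job 45    # prints 7
```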
+
+
+
+
+
HTC Hardware and Configuration
+
+
The HTC System consists of several submit servers and many compute (aka execute)
+nodes. All users log in at a login node, and submit their workflow as HTCondor jobs that run on execute points.
+
+
+
+
HTC Operating System and Software
+
+
Submit servers in the HTC System are running CentOS 7 Linux.
+
+
Due to the distributed and independent nature of the HTC system’s execute points, there can be a variety of operating systems on the pool of execution point resources (especially for users that opt into running jobs on the globally available OSPool operated by the OSG). However, the default operating system is CentOS 8 Stream Linux unless users request to run on a different operating system using their HTCondor submit file.
+
+
The HTC system is a test bed for the HTCondor Software Suite, and thus is typically running the latest or soon-to-be-released versions of HTCondor.
+
+
To see more details of other software on the cluster, see our HTC Guides page.
+
+
+
+
HTC Submit Servers
+
+
There are multiple submit servers for the HTC system. The two most common submit servers are ap2001.chtc.wisc.edu and ap2002.chtc.wisc.edu (formerly submit1.chtc.wisc.edu and submit2.chtc.wisc.edu, respectively). All users will be notified which submit server they should log into when their account is created.
+
+
+
+
HTC Execute Nodes
+
+
Only execute nodes will be used for performing your computational work.
+
+
By default, when users submit HTCondor jobs, their jobs will only run on execute points owned and managed by CHTC staff. As of January 2024, there are approximately 40,000 CPU slots and 80+ GPU slots available in the CHTC execute pool.
+
+
Some users, particularly those requesting GPUs, may wish to access additional execute points so that they may have more jobs running simultaneously. HTC users can opt in to allowing their jobs to run on additional execute points not owned or managed by CHTC staff. There are two additional execute pools that users can opt into using: the UW Grid and the OSG’s OSPool. There are many advantages to opting into running on these execute pools, such as accessing more GPUs, accessing different computer architectures, and having more jobs running in parallel. However, because these machines are not managed by CHTC and thus are backfilling on hardware owned by other entities, it is recommended that users only opt into using these resources if they have short (<~10 hours), interruptible jobs. For more information, see the Scaling Beyond Local HTC Capacity guide.
+
+
Fair Share Allocation
+
+
To promote fair access to HTC computing resources, all users are subject to a fair-share policy. This “fair-share” policy means that users who have run many jobs in the near-past will have a lower priority, and users with little recent activity will see their waiting jobs start sooner.
+(The HTC system does not have a strict “first-in-first-out” queue policy.)
+
+
Resource requests will also impact the number of jobs a user has running. Smaller jobs (those requesting smaller amounts of CPUs, memory, and disk) as well as more flexible jobs (those requesting to use a variety of GPUs instead of a specific GPU type) are able to match to more execute points than larger, less flexible jobs. Thus, these jobs will start sooner and more jobs will run in parallel.
+
+
+
Data Storage and Management
+
+
Data space in the HTC system is not backed-up and should be
+treated as temporary by users. Only files necessary for
+actively-running jobs should be kept on the filesystem, and files
+should be removed from the system when jobs complete. A primary copy of any
+essential files (e.g. software, submit files, input) should be kept in an
+alternate, non-CHTC storage location.
+
+
CHTC Staff reserve the right to remove any significant amounts of data
+on the HTC System in our efforts to maintain filesystem performance
+for all users.
+
+
+
+
Tools for Managing /home and /staging Space
+
+
+
Check /home Quota and Usage
+
To see what disk and item quotas are currently set for your /home directory, use the
+quota -vs command. See the example below:
+
+
[alice@submit]$ quota -vs
+Disk quotas for user alice (uid 20384):
+ Filesystem space quota limit grace files quota limit grace
+ /dev/sdb1 12690M 20480M 30720M 161k 0 0
+
+
+
The output will list your total data usage under space on the /dev/sdb1 filesystem that manages user /home data:
+
+
space (MB): the amount of disk space you are currently using
+
quota (MB): your soft quota. This is the value we recommend you consider to be your “quota”.
+
limit (MB): the hard limit or absolute maximum amount of space you can use. This value is almost always 10GB larger than your soft quota, and is only provided as a helpful spillover space. Once you hit this hard limit value, you and your jobs will no longer be allowed to save data.
+
files: the number of files in your /home directory. /home does not typically restrict the number of files a user can have, which is why there are no values for file quota and limit
+
+
+
Each of the disk space values is given in megabytes (MB), which can be converted to gigabytes (GB) by dividing by 1024.
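As a worked example of that conversion, the sketch below turns the values from the sample quota -vs output into GB. The helper name mb_to_gb is hypothetical; it simply strips a trailing M and divides by 1024 using awk.

```shell
#!/bin/bash

# Hypothetical helper: convert a value such as "20480M" (or "20480")
# from megabytes to gigabytes, rounded to one decimal place.
mb_to_gb() {
    local mb="${1%M}"   # drop a trailing "M" if present
    awk -v mb="$mb" 'BEGIN { printf "%.1f\n", mb / 1024 }'
}

mb_to_gb 12690M   # space in use  -> 12.4
mb_to_gb 20480M   # soft quota    -> 20.0
mb_to_gb 30720M   # hard limit    -> 30.0
```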
+
+
+
+
Check /staging Quota and Usage
+
+
To see your /staging quota and usage, use the get_quotas command, passing the path to your /staging directory. For example,
+
[NetID@ap2001 ~]$ get_quotas /staging/NetID
+
+
+
If the output of this command is blank, it means you do not have a /staging directory. Contact CHTC staff to request one at any time.
+
+
+
+
Alternative Commands to Check Quotas
+
Alternatively, the ncdu command can also be used to see how many
+files and directories are contained in a given path:
When ncdu has finished running, the output will give you a total file
+count and allow you to navigate between subdirectories for even more
+details. Type q when you're ready to exit the output viewer. More
+info here: https://lintut.com/ncdu-check-disk-usage/
+
+
+
Request a Quota Increase
+
Increased quotas on either of these locations are available upon email
+request to chtc@cs.wisc.edu after a user has
+cleared out old data and run relevant test jobs to inform the request. In your request,
+please include both size (in GB) and file/directory counts. If you don't
+know how many files your installation creates, because it's more than
+the current items quota, simply indicate that in your request.
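If you want the file and directory counts for your request without opening an interactive viewer, a small function built on find can produce them. This is only a sketch: the name count_items is hypothetical, and on a very large directory tree the find commands may take a while to finish.

```shell
#!/bin/bash

# Hypothetical helper: print the number of regular files and of
# subdirectories beneath the given directory.
count_items() {
    local dir="$1"
    local files dirs
    files=$(find "$dir" -type f | wc -l)
    dirs=$(find "$dir" -mindepth 1 -type d | wc -l)
    # Arithmetic expansion strips any padding that wc adds on some systems
    echo "files=$((files)) dirs=$((dirs))"
}

# Example (path is illustrative): count_items /home/alice/my_install
```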
What if you want to submit a list of jobs, each with unique arguments? Instead of tediously
+creating separate submit files for each job, we can utilize attributes in the submit file to
+pass various arguments to multiple jobs. On this page, we will introduce two methods: using
+numerical arguments, and using custom arguments. While this guide shows specific scripts and
+examples, we hope that you can apply the underlying principles to your own jobs. For more
+general descriptions of using arguments and submitting multiple jobs, see:
Submit multiple jobs by leveraging $(Process)/$(ProcID) as numerical arguments
+
+
One of the default variables in an HTCondor submit file is $(Process) or $(ProcID). This is assigned an integer that numbers N instances of the calculation, starting from 0 and ending at N-1. $(Process)/$(ProcID) can be useful for distinguishing the output filenames of different calculations so that outputs are not overwritten (as can $(Cluster)/$(ClusterID)), but it may also be used as an argument for an executable.
+
+
In this exercise, we will use $(Process) to estimate the life expectancy within the years 2000-2009.
+
+
+
+
Create a new submit file, least_squares_process.sub.
+
+
# least_squares_process.sub - an example HTCondor submit file for passing arguments
+# with the $(Process) variable
+
+# Custom variables can be specified
+country = Brazil
+processplus = $(Process)+2000
+year = $INT(processplus,%d)
+
+# Specify your executable and your arguments
+# Usage: least_squares.py [CSV] [Country] [Year, optional]
+executable = least_squares.py
+arguments = gapminder-life-expectancy.csv $(country) $(year)
+
+# Specify the log, standard error, and standard output (or screen output) files
+log = $(country)_$(year).log
+error = $(country)_$(year).err
+output = $(country)_$(year).out
+
+# We need to also transfer the csv file for the calculation
+transfer_input_files = gapminder-life-expectancy.csv
+
+# Requirements for our calculation
+request_cpus = 1
+request_memory = 1GB
+request_disk = 1GB
+
+# Tell HTCondor to run 10 instances of our calculation
+queue 10
+
+
+
Notice the differences between this submit script and the previous one:
+
+
At the bottom of the script, queue 10 tells HTCondor to run 10 instances of our calculation. Each calculation will be assigned a number $(Process), which will range from 0 to 9.
+
We want to estimate life expectancy between 2000 and 2009, so we set a custom variable processplus = $(Process)+2000. This returns a string, e.g. “0+2000”, but this isn’t what we want! In the next line, we convert it to a useful integer value: year = $INT(processplus,%d), which will now range from 2000 to 2009.
+
In our arguments, we append our new variable $(year).
+
To prevent HTCondor from rewriting outputs from each calculation over each other, _$(year) is appended to the filenames of the log, error, and output files.
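To make the expansion concrete, the sketch below mimics in plain bash what queue 10 does with $(Process) in least_squares_process.sub. It does not call HTCondor; the function name expand_process_jobs is hypothetical and exists only to print the arguments each of the ten jobs would receive.

```shell
#!/bin/bash

# Illustration only: mimic how "queue 10" expands $(Process) into the
# arguments of each job in least_squares_process.sub.
expand_process_jobs() {
    local country="$1" count="$2"
    local process year
    for (( process = 0; process < count; process++ )); do
        year=$(( process + 2000 ))   # same arithmetic as $INT(processplus,%d)
        echo "Process ${process}: gapminder-life-expectancy.csv ${country} ${year}"
    done
}

expand_process_jobs Brazil 10
```

The first line printed corresponds to year 2000 and the last to year 2009.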
Once the job is fully complete, you can check your outputs to see if it worked as expected.
+
+
+
+
Submit multiple jobs with custom arguments using queue <variable> from <list>
+
+
Let’s say we want to perform our analysis on a few countries in the year 2024, but not all. Instead of creating separate submit files for each country, we can utilize HTCondor’s queue <variable> from <list> syntax.
+
+
+
+
Create a text file called countries.txt. Within it, paste the following:
+
+
Argentina
+Brazil
+Chile
+
+
+
+
Create a new submit script, least_squares_list.sub.
+
+
# least_squares_list.sub - an example HTCondor submit file for passing arguments
+
+# Specify your executable and your arguments
+# Usage: least_squares.py [CSV] [Country] [Year, optional]
+executable = least_squares.py
+arguments = gapminder-life-expectancy.csv $(country) 2024
+
+# Specify the log, standard error, and standard output (or screen output) files
+log = $(country)_2024.log
+error = $(country)_2024.err
+output = $(country)_2024.out
+
+# We need to also transfer the csv file for the calculation
+transfer_input_files = gapminder-life-expectancy.csv
+
+# Requirements for our calculation
+request_cpus = 1
+request_memory = 1GB
+request_disk = 1GB
+
+# Tell HTCondor to run instances of our calculation from a list
+queue country from countries.txt
+
+
+
Notice the differences between this submit file and the previous examples:
+
+
At the bottom of the submit file, we now use queue country from countries.txt. This tells HTCondor to iterate over countries.txt and in each iteration, set the variable country to the value on that line.
+
In our arguments line, we use the $(country) variable.
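The sketch below mimics, in plain bash, how queue country from countries.txt produces one job per line of the list. It does not call HTCondor; the function name expand_queue_from is hypothetical, and skipping blank lines is a choice made in this sketch rather than a statement about HTCondor's behavior.

```shell
#!/bin/bash

# Illustration only: mimic how "queue country from countries.txt" creates
# one job per line, with $(country) set to that line's value.
expand_queue_from() {
    local list_file="$1" country
    while IFS= read -r country || [ -n "$country" ]; do
        if [ -z "$country" ]; then
            continue   # skip blank lines in this sketch
        fi
        echo "arguments: gapminder-life-expectancy.csv ${country} 2024 -> output: ${country}_2024.out"
    done < "$list_file"
}

# Example: expand_queue_from countries.txt
```

With the three-line countries.txt above, this prints one line each for Argentina, Brazil, and Chile.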
Many executables require arguments to perform tasks. As a user, you may need to specify
+specific files or parameters for your calculations. This exercise will show how arguments are
+used in a simple calculation and walk through how to write an HTCondor submit file to pass
+these arguments to the executable. This guide builds on the concepts shown in Basic Scripting
+and Job Submission with Arguments. To see how to submit multiple jobs
+using arguments, see Passing Multiple Arguments to Multiple Jobs with One Submit
+File.
+
+
Understand and test the script with arguments
+
+
In this exercise, we will perform a linear least squares regression analysis on life expectancy data for a country. We will need to understand how the script utilizes arguments.
For this exercise, it’s not necessary to understand each line of code, but in summary, this code reads a .csv file and performs a linear least squares regression on a specified country’s data. Let’s see how to use this code.
+
+
First we need to make our code executable:
+
+
[user@ap2002]$ chmod +x least_squares.py
+
+
+
From the terminal, we can run the code to preview its usage:
From the returned line, we see how to use it. The executable is least_squares.py. The arguments are:
+
+
[CSV]: The .csv file containing our data
+
[Country]: The name of the country we want to analyze
+
[Year]: An optional argument that lets us estimate the life expectancy of the country for that year.
+
+
+
+
Let’s run the code to see what the output should look like. In the terminal, type:
+
+
[user@ap2002]$ ./least_squares.py gapminder-life-expectancy.csv Brazil
+
+
+
This should return the following:
+
+
Linear regression (y = mx + b):
+ m = 0.205 b = -348.495
+
+
+
+
If we put in the year 2000 as an optional argument:
+
+
[user@ap2002]$ ./least_squares.py gapminder-life-expectancy.csv Brazil 2000
+Linear regression (y = mx + b):
+m = 0.205 b = -348.495
+Estimated life expectancy for Brazil in the year 2000
+61.280
+
+
+
+
+
Write and submit an HTCondor submit file with arguments
+
+
Now that we know how to run our script and what to expect, let’s translate this into a job for HTCondor.
+
+
+
+
Create a submit file for the job called least_squares.sub.
+
+
# least_squares.sub - an example HTCondor submit file for passing arguments
+
+# Custom variables can be specified
+country = Brazil
+
+# Specify your executable and your arguments
+# Usage: least_squares.py [CSV] [Country] [Year, optional]
+executable = least_squares.py
+arguments = gapminder-life-expectancy.csv $(country)
+
+# Specify the log, standard error, and standard output (or screen output) files
+log = $(country).log
+error = $(country).err
+output = $(country).out
+
+# We need to also transfer the csv file for the calculation
+transfer_input_files = gapminder-life-expectancy.csv
+
+# Requirements for our calculation
+request_cpus = 1
+request_memory = 1GB
+request_disk = 1GB
+
+# Tell HTCondor to run 1 instance of our calculation
+queue
+
+
+
Important notes:
+
+
In this submit file, we created a custom variable called country set to the value “Brazil”, which we use later when specifying arguments.
+
We tell the job manager that our executable is least_squares.py.
+
In a separate line, we pass the arguments gapminder-life-expectancy.csv $(country) in the order that is required by least_squares.py.
+
We also need to transfer the csv with our data, which we do with transfer_input_files = gapminder-life-expectancy.csv
+
+
+
+
Submit the file.
+
+
[user@ap2002]$ condor_submit least_squares.sub
+
+
+
We can monitor the job with condor_q.
+
+
+
Once the job is completed, we can check Brazil.out (the file named by output = $(country).out) to confirm that the arguments were passed to least_squares.py and that it works as expected.
A wrapper script can be useful in jobs, enabling more complex operations and simple pre- and post- calculation commands. Wrapper scripts can also take and pass on arguments. Let’s see how to write a simple wrapper script for our calculation.
+
+
In this exercise, we will obtain data for multiple countries between the years 2024 and 2033 and return them in tarballs organized by country.
+
+
+
+
Create least_squares_range.sh.
+
+
#!/bin/bash
+
+# This wrapper script takes in four arguments:
+# Usage: ./least_squares_range.sh [CSV] [Country] [Start Year] [End Year]
+
+# Assign variables for readability
+CSV=$1
+Country=$2
+StartY=$3
+EndY=$4
+
+# Loop least_squares.py over start and end years
+for i in $(seq "$StartY" "$EndY");
+do
+  ./least_squares.py "${CSV}" "$Country" "$i" > "${Country}_${i}.txt"
+done
+
+# Create tarball
+tar -czf "${Country}.tar.gz" "${Country}"_*.txt
+
+# Delete only the text files created by the loop above
+rm "${Country}"_*.txt
+
+
+
In a shell script, arguments are assigned integers according to their order. The executable script itself, least_squares_range.sh, is assigned $0.
+
+
While not necessary, it’s useful to assign descriptive variables to input arguments to keep track of what’s happening in the wrapper script. Country=$2 assigns the variable $Country with the same value as the second argument. Note that there must be no spaces around the = sign.
+
+
The script then uses a simple for loop to run least_squares.py over a range of years and writes them to text files.
+
+
Once the loop is complete, the text files are consolidated into a tarball. Since this object is in the top-level directory of the job, it will automatically be transferred back to the submit server.
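Because the wrapper script relies on the order and number of its arguments, it can help to validate them before the loop starts. The following is a minimal sketch of such a guard; the function name check_range_args and its messages are hypothetical additions, not part of the script above.

```shell
#!/bin/bash

# Hypothetical guard for least_squares_range.sh: succeeds only when four
# arguments are given, both years are whole numbers, and the start year
# does not exceed the end year.
check_range_args() {
    local year
    if [ "$#" -ne 4 ]; then
        echo "Usage: $0 [CSV] [Country] [Start Year] [End Year]" >&2
        return 1
    fi
    for year in "$3" "$4"; do
        case "$year" in
            ''|*[!0-9]*) echo "Years must be whole numbers: '$year'" >&2; return 1 ;;
        esac
    done
    if [ "$3" -gt "$4" ]; then
        echo "Start year ($3) is after end year ($4)" >&2
        return 1
    fi
    return 0
}

# Near the top of the wrapper script, you could call:
# check_range_args "$@" || exit 1
```

Passing "$@" in quotes keeps each argument intact even if a country name contains a space.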
+
+
+
Create a new submit script, least_squares_range.sub.
+
+
# least_squares_range.sub - an example HTCondor submit file for passing arguments
+
+# Custom variables can be specified
+country = Brazil
+
+# Specify your executable and your arguments
+# Usage: ./least_squares_range.sh [CSV] [Country] [Start Year] [End Year]
+executable = least_squares_range.sh
+arguments = gapminder-life-expectancy.csv $(country) 2024 2033
+
+# Specify the log, standard error, and standard output (or screen output) files
+log = $(country)_24_33.log
+error = $(country)_24_33.err
+output = $(country)_24_33.out
+
+# We need to also transfer the csv file for the calculation
+transfer_input_files = gapminder-life-expectancy.csv, least_squares.py
+
+# Requirements for our calculation
+request_cpus = 1
+request_memory = 1GB
+request_disk = 1GB
+
+# Tell HTCondor to run instances of our calculation
+queue
+
+
+
Key highlights:
+
+
The executable is now our wrapper script, least_squares_range.sh
+
We edit the arguments line according to the usage of the wrapper script we wrote.
+
transfer_input_files now includes least_squares.py. We now need to specify this file to be transferred over, since it is no longer our executable.
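Once the job has finished and the tarball has been transferred back, you may want to confirm that it holds one text file per year. The sketch below lists and counts the entries without extracting them; the function names list_results and count_results are hypothetical.

```shell
#!/bin/bash

# Hypothetical check: list the files inside a returned tarball, sorted by name.
list_results() {
    local tarball="$1"
    tar -tzf "$tarball" | sort
}

# Hypothetical check: count the files inside a returned tarball.
count_results() {
    list_results "$1" | wc -l | tr -d ' '
}

# After the job finishes, you could run, for example:
# list_results Brazil.tar.gz
# count_results Brazil.tar.gz   # 10 files are expected for 2024 through 2033
```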
+
+
+Introduction to the High Throughput Computing Strategy
+
+Like nearly all large-scale compute systems, users of both CHTC's High Throughput and High Performance systems prepare their computational work and submit it as tasks called "jobs" to run on execution points.
+
+
+High Throughput Computing systems specialize in running many small, independent jobs (< ~20 CPUs/job). On the other hand, High Performance Computing systems specialize in running a few, very large jobs that run on more than one node (~30+ CPUs/job).
+
+
+It is best to keep this distinction in mind when setting up your jobs. On the HTC system, smaller jobs (i.e., those requesting smaller amounts of CPU, memory, and disk resources per job) find a slot to run on more easily. This means that your jobs will start sooner and more of them will run simultaneously. It is almost always beneficial to break up your analysis pipeline into smaller pieces so that more jobs are up and running more quickly.
+
+
+Unlike on the High Performance system, CHTC staff do not limit the number of jobs a user can have running in parallel on the HTC system, so it is to your advantage to strategize your workflow to take advantage of as many compute resources as possible.
+
+
+More detailed information regarding CHTC's HTC system can be found in the HTC Overview Guide.
+
+
+
Step Two
+
+
+Log on to an HTCondor HTC Access Point
+
+Once your request for an account has been approved by a Research Computing Facilitator, you will be emailed your login information.
+
+
+For security purposes, every CHTC user is required to be connected to either a University of Wisconsin internet network or campus VPN and to use two-factor authentication when logging in to their HTC access point (also called a "submit server").
+
+
+
+
Step Three
+
+
+Understand the Basics of Submitting HTCondor Jobs
+
+Computational work is run on the High Throughput Computing system's execution machines by submitting tasks as “jobs” to the HTCondor job scheduler. Before submitting your own computational work, it is necessary to understand how HTCondor job submission works. The following guide is a short step-by-step tutorial on how to submit basic HTCondor jobs: Practice: Submit HTC Jobs using HTCondor. It is highly recommended that every user follow this short tutorial as these are the steps you will need to know to complete your own analyses.
+
+
+
Step Four
+
+
+Learn to Run Many HTCondor Jobs using one Submit File
+
+After following this tutorial, we highly recommend users review the Easily Submit Multiple Jobs guide to learn how you can configure HTCondor to automatically pass files or parameters to different jobs, return output to specific directories, and other easily automated organizational behaviors.
+
+
+
+
Step Five
+
+
+Install your Software
+
+Our Software Solutions guides contain information about how to install and use software on the HTC system.
+
+
+Software Containers
+
+In general, we recommend installing your software into a "container" if your software relies on a specific version of R/Python, can be installed with `conda`, if your software has many dependencies, or if it already has a pre-existing container (which many common software packages do). There are many advantages to using a software container; one example is that software containers contain their own operating system. As a result, jobs with software containers have the most flexibility with where they run on CHTC or the OSPool. The CHTC website provides several guides on building, testing, and using software containers.
+
+
+Use Pre-installed Software in Modules
+
+CHTC's infrastructure team has provided a limited collection of software as modules, which users can load and then use in their jobs. This collection includes tools shared across domains, including COMSOL, ANSYS, ABAQUS, GUROBI, and others. To learn how to load this software into your jobs, see our Use Software Available in Modules and Use Licensed Software guides.
+
+
+Access Software Building Tools on CHTC's Software Building Machines
+
+The HTC system contains several machines designed for users to use when building their software. These machines have access to common compilers (e.g., gcc) that are necessary to install many software packages. To learn how to submit an interactive job to log into these machines to build your software, see Compiling or Testing Code with an Interactive Job.
+
+
+
Step Six
+
+
+Access your Data on the HTC System
+
+Upload your data to CHTC
+
+When getting started on the HTC system, it is typically necessary to upload your data files to our system so that they can be used in jobs. For users that do not want to upload data to our system, it is possible to configure your HTCondor jobs to pull/push files using `s3` file transfer, pull data using standard unix commands (`wget`), among other transfer mechanisms.
+
+
+To learn how to upload data from different sources, including your laptop, see:
+
+
+Choose a Location to Stage your Data
+
+When uploading data to the HTC system, users need to choose a location to store that data on our system. There are two primary locations: `/home` and `/staging`.
+
+
+`/home` is more efficient at handling "small" files, while `/staging` is more efficient at handling "large" files. For more information on what is considered "small" and "large" data files and to learn how to use files stored in these locations for jobs, visit our HTC Data guides.
+
+
+
+
Step Seven
+
+
+Run Test Jobs
+
+Once you have your data, software, code, and HTCondor submit file prepared, you should submit several test jobs. The table created by HTCondor in the `.log` file will help you determine the amount of resources (CPUs/GPUs, memory, and disk) your job used, which is beneficial for understanding future job resource requests as well as troubleshooting. The standard output `.out` file will contain all text your code printed to the terminal screen while running, while the standard error `.err` file will contain any error messages that your software printed while running.
+
+
+Things to look for:
+
+
Jobs being placed on hold (hold messages can be viewed using `condor_q jobID -hold`)
+
Jobs producing expected files
+
Size and number of output files (to make sure output is being directed to the correct location and that your quota is sufficient for all of your output data as you submit more jobs)
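To pull the resource table out of a job's `.log` file quickly, a grep with trailing context is usually enough. The sketch below assumes the table starts with a line containing "Partitionable Resources" followed by rows for Cpus, Disk, and Memory; that layout is an assumption about the log format, and the function name show_resource_usage is hypothetical. Jobs that request GPUs may have an extra row, in which case the context count would need to be larger.

```shell
#!/bin/bash

# Hypothetical helper: print the resource usage table(s) from an HTCondor
# job log. Assumes each table begins with a "Partitionable Resources" line
# followed by one row each for Cpus, Disk, and Memory.
show_resource_usage() {
    local log_file="$1"
    grep -A 3 "Partitionable Resources" "$log_file"
}

# Example (log file name is illustrative): show_resource_usage Brazil.log
```

Comparing the Usage column with the Request column is a reasonable starting point for adjusting request_memory and request_disk in later submissions.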
+
+
+
+
Step Eight
+
+ Submit Your Workflow
+
+Once your jobs succeed and you have confirmed your quota is sufficient to store the files your job creates, you are ready to submit your full workflow. For researchers interested in queuing many jobs or accessing GPUs, we encourage you to consider accessing additional CPUs/GPUs outside of CHTC. Information is provided in the following step.
+
+
+
Step Nine
+
+ Access Additional Compute Capacity
+
+ Researchers with jobs that run for less than ~10 hours, use less than ~20GB of data per job, and do not require CHTC modules, can take advantage of additional CPUs/GPUs to run their jobs. These researchers can typically expect to have more jobs running simultaneously.
+
+
+ When you opt into using this additional capacity, your jobs will run on hardware that CHTC does not own. Instead, your jobs will "backfill" on resources owned by research groups, UW-Madison departments and organizations, and a national scale compute system: the OSG's Open Science Pool. This allows researchers to access capacity beyond what CHTC can provide. To learn how to take advantage of additional CPUs/GPUs, visit Scale Beyond Local HTC Capacity.
+
+
+
Step Ten
+
+ Move Your Data off CHTC
+
+ Data stored on CHTC systems is not backed up. While CHTC staff try to maintain a stable compute environment, it is possible for unexpected outages to occur that may impact your data on our system. We highly recommend all CHTC users maintain copies of important scripts and input files on another compute system (your laptop, lab server, ResearchDrive, etc.) throughout their analysis. Additionally, as you complete your analysis on CHTC servers, we highly recommend you move your data off our system to a backed up storage location.
+
+
+ CHTC staff periodically delete data of users that have not logged in or submitted jobs in several months to clear up space for new users. Eventually, all users should expect their data to be deleted off CHTC servers and should plan accordingly. Data on CHTC is meant to be used for analyses actively being carried out - CHTC is not a long-term storage solution for your data storage needs.
+</details>
+
+
+
This guide walks you step-by-step through the construction and submission of a
+simple DAGMan workflow.
+We recommend this guide if you are interested in automating your job submissions.
Consider the case of two HTCondor jobs that use the submit files A.sub and B.sub.
+Let’s say that A.sub generates an output file (output.txt) that B.sub will analyze.
+To run this workflow manually, we would
+
+
+
Submit the first HTCondor job with condor_submit A.sub.
+
Wait for the first HTCondor job to complete successfully.
+
Submit the second HTCondor job with condor_submit B.sub.
+
+
+
If the first HTCondor job using A.sub is fairly short, then manually running this workflow is not a big deal.
+But if the first HTCondor job takes a long time to complete (maybe takes several hours to run, or has to wait for special resources),
+this can be very inconvenient.
+Instead, we can use DAGMan to automatically submit B.sub once the first HTCondor job using A.sub has completed successfully.
+This guide walks through the process of creating such a DAGMan workflow.
+
+
2. Structure of the DAG
+
+
In this scenario, our workflow could be described as a DAG consisting of two nodes (A.sub and B.sub) connected by a single edge (output.txt).
+To represent this relationship, we will define nodes A and B - corresponding to A.sub and B.sub, respectively - and connect them with a line pointing from A to B, like in this figure:
+
+
+
+
In order to use DAGMan to run this workflow, we need to communicate this structure to DAGMan via the .dag input file.
+
+
3. The Minimal DAG Input File
+
+
Let’s call the input file simple.dag.
+At minimum, the contents of the simple.dag input file are
+
+
# simple.dag
+
+# Define the DAG jobs
+JOB A A.sub
+JOB B B.sub
+
+# Define the connections
+PARENT A CHILD B
+
+
+
In a DAGMan input file, a node is defined using the JOB keyword, followed by the name of the node and the name of the corresponding submit file.
+In this case, we have created a node named A and instructed DAGMan to use the submit file A.sub for executing that node.
+We have similarly created node B and instructed DAGMan to use the submit file B.sub.
+(While there is no requirement that the name of the node match the name of the corresponding submit file, it is convenient to use a consistent naming scheme.)
+
+
To connect the nodes, we use the PARENT .. CHILD .. syntax.
+Since node B requires that node A has completed successfully, we say that node A is the PARENT while node B is the CHILD.
+Note that we do not need to define why node B is dependent on node A, only that it is.
+
+
4. The Submit Files
+
+
Now let’s define simple examples of the submit files A.sub and B.sub.
+
+
Node A
+
+
First, the submit file A.sub uses the executable A.sh, which will generate the file called output.txt.
+We have explicitly told HTCondor to transfer back this file by using the transfer_output_files command.
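
The submit file itself is not reproduced here, so the following is only a minimal sketch of what A.sub could contain. The executable and transfer_output_files lines follow the description above; the log, output, and error file names and the resource requests are assumptions chosen for illustration and should be adjusted for your own jobs.

```shell
# Write a hypothetical A.sub into the current directory.
# Only "executable" and "transfer_output_files" come from the text above;
# every other value is an assumed, typical setting.
cat > A.sub <<'EOF'
# A.sub (sketch)
executable = A.sh

log    = A.log
output = A.out
error  = A.err

# Ask HTCondor to bring output.txt back when the job finishes
transfer_output_files = output.txt

request_cpus   = 1
request_memory = 1GB
request_disk   = 1GB

queue
EOF
```
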
The executable file simply saves the hostname of the machine running the script:
+
+
#!/bin/bash
+
+# A.sh
+hostname > output.txt
+
+sleep 1m # so we can see the job in "running" status
+
+
+
Node B
+
+
Second, the submit file B.sub uses the executable B.sh to print a message using the contents of the output.txt file generated by A.sh.
+We have explicitly told HTCondor to transfer output.txt as an input file for this job, using the transfer_input_files command.
+Thus we have finally defined the “edge” that connects nodes A and B: the use of output.txt.
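
As with node A, the submit file is not reproduced here. A minimal sketch of B.sub, under the same assumptions, might look like the following. The executable, the transfer_input_files line, and the B.out output name follow the text of this guide; the remaining file names and resource requests are illustrative guesses.

```shell
# Write a hypothetical B.sub into the current directory.
# "executable", "transfer_input_files" and "output = B.out" follow the guide;
# the other values are assumptions.
cat > B.sub <<'EOF'
# B.sub (sketch)
executable = B.sh

log    = B.log
output = B.out
error  = B.err

# output.txt was returned by node A; send it along with this job
transfer_input_files = output.txt

request_cpus   = 1
request_memory = 1GB
request_disk   = 1GB

queue
EOF
```
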
The executable file contains the command for printing the desired message, which will be printed to B.out.
+
+
#!/bin/bash
+
+# B.sh
+echo "The previous job was executed on the following machine:"
+cat output.txt
+
+sleep 1m # so we can see the job in "running" status
+
+
+
The directory structure
+
+
Based on the contents of simple.dag, DAGMan expects the submit files A.sub and B.sub to be in the same directory as simple.dag.
+The submit files in turn expect A.sh and B.sh to be in the same directory as A.sub and B.sub.
+Thus, we have the following directory structure:
It is possible to organize each job into its own directory, but for now we will use this simple, flat organization.
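
The flat layout described here amounts to five files side by side: simple.dag, A.sub, B.sub, A.sh, and B.sh. A small helper, sketched below under the hypothetical name check_dag_files, can confirm that they are all present before you submit. It is a convenience for this example only and is not part of HTCondor.

```shell
# check_dag_files: report any file from the flat layout that is missing
# in the given directory (default: the current directory).
# The function name is hypothetical; the file list comes from this guide.
check_dag_files() {
    local dir="${1:-.}"
    local missing=0
    local f
    for f in simple.dag A.sub B.sub A.sh B.sh; do
        if [ ! -f "$dir/$f" ]; then
            echo "missing: $dir/$f" >&2
            missing=1
        fi
    done
    # Return 0 only when every expected file was found.
    return "$missing"
}
```

Calling check_dag_files with no arguments checks the current directory; a non-zero return status means at least one file is absent.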
+
+
5. Running the Simple DAG
+
+
To run the DAG workflow described by simple.dag, we use the HTCondor command condor_submit_dag:
+
+
condor_submit_dag simple.dag
+
+
+
The DAGMan utility will then parse the input file and generate an assortment of related files that it will use for monitoring and managing your workflow.
+Here is the output of running the above command:
+
+
[user@login DAG_simple]$ condor_submit_dag simple.dag
+
+Loading classad userMap 'checkpoint_destination_map' ts=1699037029 from /etc/condor/checkpoint-destination-mapfile
+-----------------------------------------------------------------------
+File for submitting this DAG to HTCondor : simple.dag.condor.sub
+Log of DAGMan debugging messages : simple.dag.dagman.out
+Log of HTCondor library output : simple.dag.lib.out
+Log of HTCondor library error messages : simple.dag.lib.err
+Log of the life of condor_dagman itself : simple.dag.dagman.log
+
+Submitting job(s).
+1 job(s) submitted to cluster 562265.
+-----------------------------------------------------------------------
+
+
+
The output shows the list of standard files that are created with every DAG submission along with brief descriptions.
+A couple of additional files, some of them temporary, will be created during the lifetime of the DAG.
+
+
6. Monitoring the Simple DAG
+
+
You can see the status of the DAG in your queue just like with any other HTCondor job submission.
There are a couple of things to note about the condor_q output:
+
+
+
The BATCH_NAME for the DAGMan job is the name of the input DAG file, simple.dag, plus the Job ID of the DAGMan scheduler job (562265 in this case): simple.dag+562265.
+
The total number of jobs for simple.dag+562265 corresponds to the total number of nodes in the DAG (2).
+
Only 1 node is listed as “Idle”, meaning that DAGMan has only submitted 1 job so far. This is consistent with the fact that node A has to complete before DAGMan can submit the job for node B.
+
+
+
+
Note that if you are very quick to run your condor_q command after running your condor_submit_dag command, then you may see only the DAGMan scheduler job. It may take a few seconds for DAGMan to start up and submit the HTCondor job associated with the first node.
+
+
+
To see more detailed information about the DAG workflow, use condor_q -nob -dag.
+For example,
+
+
[user@login DAG_simple]$ condor_q -dag -nob
+
+-- Schedd: ap2002.chtc.wisc.edu : <128.105.68.92:9618?... @ 12/14/23 11:27:03
+ ID OWNER/NODENAME SUBMITTED RUN_TIME ST PRI SIZE CMD
+562265.0 user 12/14 11:26 0+00:00:37 R 0 0.5 condor_dagman -p 0 -f -l . -Loc
+562279.0 |-A 12/14 11:26 0+00:00:00 I 0 0.0 A.sh
+
+
+
In this case, the first entry is the DAGMan scheduler job that you created when you first submitted the DAG.
+The following entries correspond to the nodes whose jobs are currently in the queue.
+Nodes that have not yet been submitted by DAGMan or that have completed and thus left the queue will not show up in your condor_q output.
+
+
7. Wrapping Up
+
+
After waiting enough time, this simple DAG workflow should complete without any issues.
+But of course, that will not be the case for every DAG, especially as you start to create your own.
+DAGMan has many more features for managing and submitting DAG workflows, including handling errors, combining DAG workflows, and restarting failed DAG workflows.
+ Overview: Submit Workflows with HTCondor's DAGMan
+
+
+
If your work requires jobs that run in a particular sequence, you may benefit
+from a workflow tool that submits and monitors jobs for you in the correct
+order. HTCondor has a built-in utility called “DAGMan” (short for “DAG Manager”, where DAG stands for
+directed acyclic graph, the typical picture of a workflow) that automates the
+job submission of such a workflow.
+
+
This talk (originally presented at HTCondor Week 2020) gives a good introduction
+to DAGMan and its most useful features:
DAGMan can be a powerful tool for creating large and complex HTCondor workflows.
+
+
What is DAGMan?
+
+
DAGMan is short for “DAG Manager”, and is a utility built into HTCondor for automatically running a workflow (DAG) of jobs,
+where the results of an earlier job are required for running a later job.
+This workflow is similar to a flowchart with a definite beginning and ending.
+More specifically, “DAG” is an acronym for Directed Acyclic Graph, a concept from the mathematical field of graph theory:
+
+
+
Graph: a collection of points (“nodes” or “vertices”) connected to each other by lines (“edges”).
+
Directed: the edges between nodes have direction, that is, each edge begins on one node and ends on a different node.
+
Acyclic: the graph does not have a cycle - or loop - where the graph returns to a previous node.
+
+
+
By using a directed acyclic graph, we can guarantee that the workflow has a defined ‘start’ and ‘end’.
+In DAGMan, each node in the workflow corresponds to a job submission (i.e., condor_submit).
+Each edge in the workflow corresponds to a set of files that are the output of one job submission and
+the input of another job submission.
+For convenience, we refer to such a workflow and the files necessary to execute it as “the DAG”.
+
+
The Basics of the DAG Input File
+
+
The purpose of the DAG input file (typically .dag) is to instruct DAGMan on the structure of the workflow you want to run.
+Additional instructions can be included in the DAG input file about how to manage the job submissions, rerun jobs (nodes),
+or run pre- or post-processing scripts.
+
+
In general, the structure of the .dag input file consists of one instruction per line, with each line starting with a keyword defining the type of instruction.
+
+
1. Defining the DAG jobs
+
+
To define a DAG job, we begin a new line with JOB then provide the name, the submit file, and any additional options. The syntax is
+
+
JOB JobName JobSubmitFile [additional options]
+
+
+
where you need to replace JobName with the name you would like the DAG job to have, and JobSubmitFile with the name or path of the corresponding submit file. Both JobName and JobSubmitFile need to be specified.
+
+
Every node in your workflow must have a JOB entry in the .dag input file. While there are other instructions that can reference a particular node, they will only work if the node in question has a corresponding JOB entry.
+
+
2. Defining the connections
+
+
To define the relationship between DAG jobs in a workflow, we begin a new line with PARENT then the name of the first DAG job, followed by CHILD and the name of the second DAG job. That is, the PARENT DAG job must complete successfully before DAGMan will submit the CHILD DAG job. In fact, you can define such relationships for many DAG jobs (nodes) at the same time. Thus, the syntax is
+
+
PARENT p1 [p2 ...] CHILD c1 [c2 ...]
+
+
+
where you replace p# with the JobName for each parent DAG job, and c# with the JobName for each child DAG job. The child DAG jobs will only be submitted if all of the parent DAG jobs are completed successfully. Each JobName you provide must have a corresponding JOB entry elsewhere in the .dag input file.
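
To make the multi-parent form concrete, the sketch below writes a small DAG input file in which two parents share two children. The node names A through D, their submit file names, and the file name fanin.dag are all placeholders invented for this illustration.

```shell
# Write a hypothetical DAG input file showing several parents and children
# on a single PARENT ... CHILD ... line.
cat > fanin.dag <<'EOF'
# fanin.dag (sketch)
JOB A A.sub
JOB B B.sub
JOB C C.sub
JOB D D.sub

# C and D are each submitted only after BOTH A and B succeed.
# This one line is equivalent to the four pairs:
#   PARENT A CHILD C    PARENT A CHILD D
#   PARENT B CHILD C    PARENT B CHILD D
PARENT A B CHILD C D
EOF
```
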
+
+
+
Technically, DAGMan does not require that each DAG job in a workflow is connected to another DAG job.
+This allows you to submit many unrelated DAG jobs at one time using DAGMan.
+
+
+
Note that in defining the PARENT-CHILD relationship, there is no definition of how they are related.
+Effectively, DAGMan does not need to know the reason why the PARENT DAG jobs must complete successfully in order to submit the CHILD DAG jobs.
+There can be many reasons why you might want to execute the DAG jobs in this order, although the most common reason
+is that the PARENT DAG jobs create files that are required by the CHILD DAG jobs.
+In that case, it is up to you to organize the submit files of those DAG jobs in such a way that the output of the PARENT DAG jobs
+can be used as the input of the CHILD DAG jobs.
+In the DAGMan Features section, we will discuss tools that can assist you with this endeavor.
+
+
Running a DAG Workflow
+
+
1. Submitting the DAG
+
+
Because the DAG workflow represents a special type of job, a special command is used to submit it. To submit the DAG workflow, use
+
+
condor_submit_dag example.dag
+
+
+
where example.dag is the name of your DAG input file containing the JOB and PARENT-CHILD definitions for your workflow.
+This will create and submit a “DAGMan job” that will in turn be responsible for submitting and monitoring the job nodes described in your DAG input file.
+
+
A set of files is created for every DAG submission, and the output of condor_submit_dag lists the files with a brief description.
+For the above submit command, the output will look like:
+
+
------------------------------------------------------------------------
+File for submitting this DAG to HTCondor : example.dag.condor.sub
+Log of DAGMan debugging messages : example.dag.dagman.out
+Log of HTCondor library output : example.dag.lib.out
+Log of HTCondor library error messages : example.dag.lib.err
+Log of the life of condor_dagman itself : example.dag.dagman.log
+
+Submitting job(s).
+1 job(s) submitted to cluster ######.
+------------------------------------------------------------------------
+
+
+
2. Monitoring the DAG
+
+
The DAGMan job is actually a “scheduler” job (described by example.dag.condor.sub), and its status and progress are saved to example.dag.dagman.out.
+Using condor_q or condor_watch_q, the DAGMan job will be under the name example.dag+######, where ###### is the Cluster ID of the DAGMan scheduler job.
+Each job submitted by DAGMan, however, will be assigned a separate Cluster ID.
+
+
For a more detailed status display, you can use
+
+
condor_q -dag -nobatch
+
+
+
If you want to see the status of just the DAGMan job proper, use
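
One way to do this, assuming the DAGMan job proper runs in HTCondor's scheduler universe (JobUniverse value 7), is to filter the queue with a constraint. The guard around the call is only there so the snippet fails gracefully on a machine without HTCondor installed.

```shell
# Show only scheduler-universe jobs (assumed to be JobUniverse == 7),
# which is where DAGMan jobs run.
CONSTRAINT='JobUniverse == 7'
if command -v condor_q >/dev/null 2>&1; then
    condor_q -nobatch -constraint "$CONSTRAINT"
else
    echo "condor_q not found; run this on an access point" >&2
fi
```
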
(Technically, this shows all “scheduler” type HTCondor jobs, but for most users this will only include DAGMan jobs.)
+
+
For even more details about the execution of the DAG workflow, you can examine the contents of the example.dag.dagman.out file.
+The file contains timestamped log information of the execution and status of nodes in the DAG, along with statistics.
+As the DAG progresses, it will also create the files example.dag.metrics and example.dag.nodes.log, where the metrics file contains the current statistics of the DAG and the log file is an aggregate of the individual nodes’ user log files.
+
+
If you want to see the status of a specific node, use
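
A likely form of that query, assuming DAGMan records each node's name in the DAGNodeName attribute of the job it submits, is a constraint on that attribute:

```shell
# DAGNodeName is the job attribute that DAGMan sets to the node's name.
NODE="YourNodeName"
CONSTRAINT="DAGNodeName == \"${NODE}\""
if command -v condor_q >/dev/null 2>&1; then
    condor_q -dag -nobatch -constraint "$CONSTRAINT"
else
    echo "condor_q not found; run this on an access point" >&2
fi
```
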
where YourNodeName should be replaced with the name of the node you want to know the status of.
+Note that this works only for jobs that are currently in the queue; if the node has not yet been submitted, or if it has completed and thus exited the queue, then you will not see the node using this command.
+To see if the node has completed, you should examine the contents of the .dagman.out file.
+A simple way to see the relevant log messages is to use a command like
+
+
grep "Node YourNodeName" example.dag.dagman.out
+
+
+
If you’d like to monitor the status of the individual nodes in your DAG workflow using condor_watch_q, then wait long enough for the .nodes.log file to be generated.
+Then run
+
+
condor_watch_q -file example.dag.nodes.log
+
+
+
Now condor_watch_q will update when DAGMan submits another job.
+
+
3. Removing the DAG
+
+
To remove the DAG, you need to condor_rm the Cluster ID corresponding to the DAGMan scheduler job.
+This will also remove the jobs that the DAGMan scheduler job submitted as part of executing the DAG workflow.
+A removed DAG is almost always marked as a failed DAG, and as such will generate a rescue DAG (see below).
+
+
DAGMan Features
+
+
1. Pre- and post-processing for DAG jobs
+
+
You can tell DAGMan to execute a script before or after it submits the HTCondor job for a particular node.
+Such a script will be executed on the submit server itself and can be used to set up the files needed for the HTCondor job, or to clean up or validate the files after a successful HTCondor job.
+
+
The instructions for executing these scripts are placed in the input .dag file.
+You must specify the name of the node the script is attached to and whether the script is to be executed before (PRE) or after (POST) the HTCondor job.
+Here is a simple example:
+
+
# Define the node (required) (example node named "my_node")
+JOB my_node run.sub
+
+# Define the script for executing before submitting run.sub (optional)
+SCRIPT PRE my_node setup.sh
+
+# Define a script for executing after run.sub has completed (optional)
+SCRIPT POST my_node cleanup.sh
+
+
+
In this example, when it is time for DAGMan to execute the node my_node, it will take the following steps:
+
+
+
Execute setup.sh (the PRE script)
+
Submit the HTCondor job run.sub (the node’s JOB)
+
Wait for the HTCondor job to complete
+
Execute cleanup.sh (the POST script)
+
+
+
All of these steps count as part of DAGMan’s attempt to execute the node my_node and may affect whether DAGMan considers the node to have succeeded or failed. For more information on PRE and POST scripts as well as other scripts that DAGMan can use, see the HTCondor documentation.
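
As an illustration of what the PRE script in this example might do, the sketch below creates a hypothetical setup.sh that refuses to proceed when an expected input file is absent and otherwise prepares a results directory. The names input.dat and results are assumptions. The relevant behavior is that, by default, a non-zero exit code from a PRE script means DAGMan does not submit the node's job and marks the node as failed.

```shell
# Create a hypothetical PRE script named setup.sh.
# It runs on the submit server before run.sub is submitted.
cat > setup.sh <<'EOF'
#!/bin/bash
# setup.sh (sketch): prepare files for the node's HTCondor job

# Assumed input file name; replace with whatever your job needs.
if [ ! -s input.dat ]; then
    echo "setup.sh: input.dat is missing or empty" >&2
    exit 1   # non-zero exit: DAGMan will not submit the node's job
fi

# Make sure the (assumed) results directory exists.
mkdir -p results
EOF
chmod +x setup.sh
```
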
+
+
2. Retrying failed nodes
+
+
You can tell DAGMan to automatically retry a node if it fails.
+This way you don’t have to manually restart the DAG if the job failed due to a transient issue.
+
+
The instructions for how many times to retry a node go in the input .dag file.
+You must specify the node and the maximum number of times that DAGMan should attempt to retry that node.
+Here is a simple example:
+
+
# Define the node (required) (example node named "my_node")
+JOB my_node run.sub
+
+# Define the number of times to retry "my_node"
+RETRY my_node 2
+
+
+
In this example, if the job associated with node my_node fails for some reason, then DAGMan will resubmit run.sub up to 2 more times.
+
+
You can also apply the RETRY statement to all nodes in the DAG by specifying ALL_NODES instead of a specific node name.
+For example,
+
+
RETRY ALL_NODES 2
+
+
+
As a general rule, you should not set the number of retry attempts to more than 1 or 2 times.
+If a job is failing repeatedly, it is better to troubleshoot the cause of that failure.
+This is especially true when you are applying the RETRY statement to all of the nodes in your DAG.
+
+
DAGMan considers the exit code of the last executed step when it considers the success or failure of the node overall.
+There are various possible combinations that can determine the success or failure of the node itself, as discussed in the HTCondor documentation.
+DAGMan only considers the success/failure of the node as a whole when deciding if it needs to attempt a retry.
+Importantly, if the .sub file for a node submits multiple HTCondor jobs, when any one of those jobs fails, DAGMan considers all of the jobs to have failed and will remove them from the queue.
+
+
Finally, note that DAGMan does not consider an HTCondor job with a “hold” status as being completed.
+In that case, you can include a command in the submit file to automatically remove a held job from the queue.
+When a job is removed from the queue, DAGMan considers that job to be failed (though as noted above, failure of the HTCondor job does not necessarily mean the node has failed).
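
One submit-file command commonly used for this purpose is periodic_remove. The sketch below stores such a line in a shell variable and prints it so that it can be pasted into a submit file. It relies on JobStatus value 5 meaning "held"; the one-hour threshold is an assumed value, and the variable name HELD_POLICY is hypothetical.

```shell
# A submit-file line (not a shell command) that removes a job once it has
# been held for more than an hour. JobStatus == 5 means "held"; the
# 3600-second threshold is an assumed value for illustration.
HELD_POLICY='periodic_remove = (JobStatus == 5) && ((time() - EnteredCurrentStatus) > 3600)'

# Print it so it can be pasted into your .sub file above the queue statement.
echo "$HELD_POLICY"
```
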
+
+
Generally, a DAG is considered failed if any one of its component nodes has failed.
+That does not mean, however, that DAGMan immediately stops the DAG.
+Instead, when DAGMan encounters a failed node, it will attempt to complete as much of the DAG as possible that does not require that node.
+Only then will DAGMan stop running the workflow.
+
+
When the DAGMan job exits from a failed DAG, it generates a report of the status of the nodes in a file called a “Rescue DAG” with the extension .rescue###,
+starting from .rescue001 and counting up each time a Rescue DAG is generated.
+The Rescue DAG can then be used by DAGMan to restart the DAG, skipping over nodes that are marked as completed successfully and jumping directly to the failed nodes that need to be resubmitted.
+The power of this feature is that DAGMan will not duplicate the work of already completed nodes, which is especially useful when there is an issue at the end of a large DAG.
+
+
DAGMan will automatically use a Rescue DAG if it exists when you use condor_submit_dag to submit the original .dag input file.
+If more than one Rescue DAG exists for a given .dag input file, then DAGMan will use the most recent Rescue DAG
+(the one with the highest number at the end of .rescue###).
+
+
# Automatically use the Rescue DAG if it exists
+condor_submit_dag example.dag
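
If you are unsure which Rescue DAG would be picked up, a small helper can list the highest-numbered one next to a given .dag file. The sketch below relies only on the .rescue### naming described above; the function name latest_rescue is hypothetical and not an HTCondor tool.

```shell
# latest_rescue: print the highest-numbered Rescue DAG for a .dag file,
# or return non-zero if none exists. The function name is hypothetical.
latest_rescue() {
    local dag="$1"
    local newest=""
    local f
    # The three-digit suffix is zero-padded, so glob order is numeric order.
    for f in "$dag".rescue[0-9][0-9][0-9]; do
        if [ -e "$f" ]; then
            newest="$f"
        fi
    done
    if [ -n "$newest" ]; then
        echo "$newest"
    else
        return 1
    fi
}
```

For instance, latest_rescue example.dag prints the Rescue DAG with the largest suffix found next to example.dag.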
+
+
+
+
If you do NOT want DAGMan to use an existing Rescue DAG, then you can use the -force option to start the DAG completely from scratch:
+
+
+
# Do NOT use the Rescue DAG if it exists
+condor_submit_dag -force example.dag
+
+
+
For more information on Rescue DAGs and how to explicitly control them, see the HTCondor documentation.
+
+
+
If the DAGMan scheduler job itself crashes (or is placed on hold) and is unable to write a Rescue DAG, then when the DAGMan job is resubmitted (or released), DAGMan will go into “recovery mode”.
+Essentially this involves DAGMan reconstructing the Rescue DAG that should have been written, but wasn’t due to the job interruption.
+DAGMan will then resume the DAG based on its analysis of the files that do exist.
+
+
+
More Resources
+
+
Tutorials
+
+
If you are interested in using DAGMan to automatically run a workflow, we highly recommend that you first go through our tutorial Simple Example of a DAG Workflow.
+This tutorial takes you step by step through the mechanics of creating and submitting a DAG.
+
+
Once you’ve understood the basics from the simple tutorial, you are ready to explore more examples and scenarios in our Intermediate DAGMan Tutorial.
+
+
Trainings & Videos
+
+
An introductory tutorial to DAGMan previously presented at HTCondor Week was recorded and is available on YouTube: HTCondor DAGMan Workflows tutorial.
+
+
More recently, the current lead developer of HTCondor’s DAGMan utility gave an intermediate tutorial: HTC23 DAGMan intermediate.
+Below is a list of guides for some of the most common tasks our users need to
+carry out as they begin and continue to use the HTC resources at the CHTC.
+
+
+
User Expectations
+
+Read through these user expectations and policies before using CHTC services.
+
+
+
+
+
+
+
Now move into the new directory to see the contents of the tutorial:
+
+
$ cd tutorial-dagman-intermediate
+
+
At the top level is a worked example of a “Diamond DAG” that summarizes the basic components of creating, submitting, and managing DAGMan workflows.
+In the lower-level additional_examples directory are more worked examples with their own READMEs highlighting specific features that can be used with DAGMan.
+Brief descriptions of these examples are provided in the Additional Examples section at the end of this tutorial.
+
+
Before working on this tutorial, we recommend that you read through our other DAGMan guides:
While any workflow that satisfies the definition of a “Directed Acyclic Graph” (DAG) can be executed using DAGMan, there are certain types that are the most commonly used:
+
+
+
Sequential DAG: all the nodes are connected in a sequence of one after the other, with no branching or splitting. This is good for conducting increasingly refined analyses of a dataset or initial result, or chaining together a long-running calculation. The simplest example of this type is used in the guide Simple Example of a DAGMan Workflow.
+
Split and recombine DAG: the first node is connected to many nodes of the same layer (split) which then all connect back to the final node (recombine). Here, you can set up the shared environment in the first node and use it to parallelize the work into many individual jobs, then finally combine/analyze the results in the final node. The simplest example of this type is the “Diamond DAG” - the subject of this tutorial.
+
Collection DAG: no node is connected to any other node. This is good for the situation where you need to run a bunch of otherwise unrelated jobs, perhaps ones that are competing for a limited resource. The simplest example of this type is a DAG consisting of a single node.
+
+
+
These types are by no means “official”, nor are they the only types of structure that a DAG can take. Rather, they serve as starting points from which you can build your own DAG workflow, which will likely consist of some combination of the above elements.
+
+
The Diamond DAG
+
+
As mentioned above, the “Diamond DAG” is the simplest example of a “split and recombine” DAG.
+In this case, the first node TOP is connected to two nodes LEFT and RIGHT (the “split”), which are then connected to the final node BOTTOM (the “recombine”).
+
+
+
+
To describe the flow of the DAG and the parts needed to execute it, DAGMan uses a custom description language in an input file, typically named <DAG Name>.dag.
+The two most important commands in the DAG description language are:
+
+
+
JOB <NodeName> <NodeSubmitFile> - Describes a node and the submit file it will use to run the node.
+
PARENT <NodeName1> CHILD <NodeName2> - Describes the edge starting from <NodeName1> and pointing to <NodeName2>.
+
+
+
These commands have been used to construct the Diamond DAG and are saved in the file diamond.dag.
+To view the contents of diamond.dag, run
+
+
$ cat diamond.dag
+
+
Before you continue, we recommend that you closely examine the contents of diamond.dag and identify its components.
+Furthermore, try to identify the submit file for each node, and use that submit file to determine the nature of the HTCondor job that will be submitted for each node.
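
For orientation, a diamond-shaped DAG can be expressed in a handful of lines. The sketch below writes one to a separate file, diamond_sketch.dag, so that it does not overwrite the tutorial's own diamond.dag. It assumes that every node reuses ./SleepJob/sleep.sub, the path this tutorial shows for the TOP node; the real file may differ and may contain additional commands.

```shell
# Write an illustrative diamond-shaped DAG to its own file.
# This is NOT the tutorial's diamond.dag; the shared submit file is assumed.
cat > diamond_sketch.dag <<'EOF'
# diamond_sketch.dag (illustrative only)
JOB TOP ./SleepJob/sleep.sub
JOB LEFT ./SleepJob/sleep.sub
JOB RIGHT ./SleepJob/sleep.sub
JOB BOTTOM ./SleepJob/sleep.sub

# Split: TOP feeds both LEFT and RIGHT
PARENT TOP CHILD LEFT RIGHT
# Recombine: BOTTOM waits for both LEFT and RIGHT
PARENT LEFT RIGHT CHILD BOTTOM
EOF
```
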
+
+
Submitting a DAG
+
+
To submit a DAGMan workflow to HTCondor, you can use one of the following commands:
+
+
$ condor_submit_dag diamond.dag
+ or
+$ htcondor dag submit diamond.dag
+
+
What Happens?
+
+
When a DAG is submitted to HTCondor, a special job is created to run DAGMan
+on your behalf. This job runs the DAGMan executable provided by the HTCondor Software Suite (HTCSS)
+in the access point (AP) job queue. This is an actual job that can be queried and acted upon.
+
+
You may also notice that lots of files are created. These files are all part
+of DAGMan and have various purposes. In general, the files that should
+always exist are as follows:
+
+
+
DAGMan job proper files
+
+
<DAG Name>.condor.sub - Submit file for the DAGMan job proper
+
<DAG Name>.dagman.log - Job event log file for the DAGMan job proper
+
<DAG Name>.lib.err - Standard error stream file for the DAGMan job proper
+
<DAG Name>.lib.out - Standard output stream file for the DAGMan job proper
+
+
+
Informational DAGMan files
+
+
<DAG Name>.dagman.out - General DAGMan process logging file
+
<DAG Name>.nodes.log - Collective job event log file for all managed jobs (Heart of DAGMan)
+
<DAG Name>.metrics - JSON formatted information about the DAG
+
+
+
+
+
Of these files, the two most important are the <DAG Name>.dagman.out and <DAG Name>.nodes.log.
+The .dagman.out file contains the entire history and status of DAGMan’s execution of your workflow.
+The .nodes.log file on the other hand is the accumulated log entries for every HTCondor job that DAGMan submitted,
+and DAGMan monitors the contents of this file to generate the contents of the .dagman.out file.
+
+
+
Note: these are not all the files that DAGMan can produce.
+Depending on the options and features you employ in your DAG input file, more files with different purposes can be created.
+
+
+
Monitoring DAGMan
+
+
The DAGMan job and the jobs in the DAG workflow can be found in the AP job queue
+and so the normal methods of job monitoring work.
+That also means that you can interact with these jobs, though in a more limited fashion than a regular job (see Running and Managing DAGMan for more details).
+
+
A plain condor_q command will show a condensed batch view of the jobs submitted, running, and managed by the DAGMan job proper.
+For more information about jobs running under DAGMan, use the -nobatch and -dag flags:
+
+
$ condor_q -nobatch -dag
+
+
You can also watch the progress of the DAG and the jobs running under it
+by running:
+
+
$ condor_watch_q
+
+
+
Note that condor_watch_q works by monitoring the log files of jobs that are in the queue, but only at the time of its execution.
+Additional jobs submitted by DAGMan while condor_watch_q is running will not appear in condor_watch_q.
+To see additional jobs as they are submitted, wait for DAGMan to create the .nodes.log file, then run
+
+
$ condor_watch_q -files *.log
+
+
+
+
For more detail about the status and progress of your DAG workflow, you can use the noun-verb command:
+
+
$ htcondor dag status DAGManJobID
+
+
where DAGManJobID is the ID for the DAGMan job proper.
+Note that the information in the output of this command does not update frequently, and so it is not suited for short-lived DAG workflows such as the current example.
+
+
When your DAG workflow has completed, the DAGMan job proper will disappear from the queue.
+If the DAG workflow completed successfully, then the .dag.dagman.out file should have a message that All jobs Completed!, though it may be difficult to find manually (try using grep "All jobs Completed!" *.dag.dagman.out instead).
+If the DAG workflow was aborted due to an error, then the .dag.dagman.out file should have the message Aborting DAG....
+Assuming that the DAGMan job proper did not crash, then regardless of the outcome, the final line of the .dag.dagman.out file should contain (condor_DAGMAN) pid ####### EXITING WITH STATUS #, where the number after STATUS is the exit code (0 if success, not 0 if failure).
+
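
The exit code can also be pulled out of the log with standard text tools instead of reading the file by eye. The sketch below relies only on the EXITING WITH STATUS wording quoted above; the function name dag_exit_status is hypothetical.

```shell
# dag_exit_status: print the exit code recorded on the last
# "EXITING WITH STATUS" line of a .dagman.out file.
# The function name is hypothetical.
dag_exit_status() {
    grep 'EXITING WITH STATUS' "$1" \
        | tail -n 1 \
        | sed 's/.*EXITING WITH STATUS \([0-9][0-9]*\).*/\1/'
}
```

For example, dag_exit_status diamond.dag.dagman.out prints 0 when the DAG finished successfully and a non-zero number otherwise.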
+
How DAGMan Handles Relative Paths
+
+
By default, the directory that DAGMan submits all jobs from is the same directory you are in when you run condor_submit_dag.
+This directory (let’s call it the submit directory) is the starting directory for any relative path in the .dag input file or in the node .sub files that DAGMan submits.
+
+
This can be observed by inspecting the sleep.sub submit file in the SleepJob sub-directory and by inspecting the diamond.dag input file.
+In the diamond.dag file, the jobs are declared using a relative path.
+For example:
+
+
JOB TOP ./SleepJob/sleep.sub
+
+
+
This tells DAGMan that the submit file for the JOB TOP is sleep.sub, located in the SleepJob sub-directory of the submit directory (.).
+Similarly, the submit file sleep.sub uses paths relative to the submit directory for defining the save locations for the .log, .out, and .err files, i.e.,
+
+
log = ./SleepJob/$(JOB).log
+
+
+
This behavior is consistent with submission of regular (non-DAGMan) jobs, e.g. condor_submit SleepJob/sleep.sub.
+
+
+
Contrary to the above behavior, the .dag.* log/output files generated by the DAGMan job proper will always be in the same directory as the .dag input file.
+
+
+
This is just the default behavior, and there are ways to make the location of job submission/management more obvious.
+See the HTCondor documentation for more details: File Paths in DAGs.
+
+
Additional Examples
+
+
Additional examples that cover various topics related to DAGMan are provided in the folder additional_examples with corresponding READMEs.
+The following order of the examples is recommended:
+
+
+
RescueDag - Example for DAGs that don’t exit successfully
+
PreScript - Example using a pre-script for a node
+
PostScript - Example using a post-script for a node
+
Retry - Example for retrying a failed node
+
VARS - Example of reusing a single submit file for multiple nodes with differing variables
This guide discusses how to run jobs on the CHTC using HTCondor.
+
+
Workflow Overview
+
+
The process of running computational workflows on CHTC resources follows this outline:
+
+
+
+
Terminology:
+
+
+
Access point is where you log in and stage your data, executables/scripts, and software to use in jobs.
+
HTCondor is job scheduling software that will run your jobs on the execution points.
+
Execution points are the set of resources your jobs run on. They are composed of servers, as well as other technologies, that provide the CPUs, memory, and disk space for the computations of your jobs.
+
+
+
Run Jobs using HTCondor
+
+
We are going to run the traditional ‘hello world’ program with a CHTC twist. In order to demonstrate the distributed resource nature of CHTC’s HTC System, we will produce a ‘Hello CHTC’ message 3 times, where each message is produced within its own ‘job’. Since you will not run execution commands yourself (HTCondor will do it for you), you need to tell HTCondor how to run the jobs for you in the form of a submit file, which describes the set of jobs.
+
+
+
Note: You must be logged into a CHTC Access Point for the following example to work.
+
+
+
Prepare job executable and submit file on an Access Point
+
+
+
+
First, create the executable script you would like HTCondor to run.
+For our example, copy the text below and paste it into a file called hello-world.sh (we recommend using a command line text editor) in your home directory.
+
+
#!/bin/bash
+#
+# hello-world.sh
+# My CHTC job
+#
+# print a 'hello' message to the job's terminal output:
+echo "Hello CHTC from Job $1 running on `whoami`@`hostname`"
+#
+# keep this job running for a few minutes so you'll see it in the queue:
+sleep 180
+
+
+
This script would be run locally on our terminal by typing hello-world.sh <FirstArgument>.
+However, to run it on CHTC, we will use our HTCondor submit file to run the hello-world.sh executable and to automatically pass different arguments to our script.
+
+
+
Prepare your HTCondor submit file, which you will use to tell HTCondor what job to run and how to run it.
+Copy the text below, and paste it into a file called hello-world.sub.
+This is the file you will submit to HTCondor to describe your jobs (known as the submit file).
+
+
# hello-world.sub
+# My HTCondor submit file
+
+# Specify your executable (single binary or a script that runs several
+# commands) and arguments to be passed to jobs.
+# $(Process) will be an integer number for each job, starting with "0"
+# and increasing for the relevant number of jobs.
+executable = hello-world.sh
+arguments = $(Process)
+
+# Specify the name of the log, standard error, and standard output (or "screen output") files. Wherever you see $(Cluster), HTCondor will insert the
+# queue number assigned to this set of jobs at the time of submission.
+log = hello-world_$(Cluster)_$(Process).log
+error = hello-world_$(Cluster)_$(Process).err
+output = hello-world_$(Cluster)_$(Process).out
+
+# This line *would* be used if there were any other files
+# needed for the executable to use.
+# transfer_input_files = file1,/absolute/pathto/file2,etc
+
+# Tell HTCondor requirements (e.g., operating system) your job needs,
+# what amount of compute resources each job will need on the computer where it runs.
+request_cpus = 1
+request_memory = 1GB
+request_disk = 5GB
+
+# Tell HTCondor to run 3 instances of our job:
+queue 3
+
+
+
By using the “$1” variable in our hello-world.sh executable, we are telling the script to read the first argument it receives and to insert that value in place of “$1”.
+
+
Therefore, when HTCondor runs this executable, it will pass the $(Process) value from the arguments line of the submit file to each job, and hello-world.sh will substitute that value for “$1”.
+
+
More information on special variables like “$1”, “$2”, and “$@” can be found here.
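As a quick illustration of how bash handles these positional parameters, here is a minimal sketch you can run locally. The function name greet is hypothetical and not part of the guide; it only mimics the echo line of hello-world.sh, and the loop stands in for the three values that $(Process) would take with queue 3.

```shell
#!/bin/bash
# Hypothetical helper mirroring the echo line in hello-world.sh.
# Inside a function (as in a script), $1 is the first argument,
# $# is the number of arguments, and $* joins all of them into one string
# ("$@" would keep them as separate words instead).
greet() {
    echo "Hello CHTC from Job $1 (received $# argument(s): $*)"
}

# Stand-in for the values HTCondor passes as $(Process) with "queue 3".
for process in 0 1 2; do
    greet "$process"
done
```

Each call receives exactly one argument, just as each job does when the submit file contains arguments = $(Process).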
+
+
+
Now, submit your job to HTCondor’s queue using condor_submit:
+
+
[alice@ap2002]$ condor_submit hello-world.sub
+
+
+
The condor_submit command actually submits your jobs to HTCondor. If all goes well, you will see output from the condor_submit command that appears as:
+
+
Submitting job(s)...
+3 job(s) submitted to cluster 36062145.
+
+
+
+
To check on the status of your jobs in the queue, run the condor_q command.
You can run it periodically to see the progress of your jobs.
+By default, condor_q shows jobs grouped into batches by batch name (if provided), or executable name.
+To show all of your jobs on individual lines, add the -nobatch option.
+
+
+
When your jobs complete after a few minutes, they’ll leave the queue.
+If you do a listing of your /home directory with the command ls -l, you should see something like:
+
+
[alice@submit]$ ls -l
+total 28
+-rw-r--r-- 1 alice alice 0 Apr 14 15:37 hello-world_36062145_0.err
+-rw-r--r-- 1 alice alice 60 Apr 14 15:37 hello-world_36062145_0.out
+-rw-r--r-- 1 alice alice 0 Apr 14 15:37 hello-world_36062145_0.log
+-rw-r--r-- 1 alice alice 0 Apr 14 15:37 hello-world_36062145_1.err
+-rw-r--r-- 1 alice alice 60 Apr 14 15:37 hello-world_36062145_1.out
+-rw-r--r-- 1 alice alice 0 Apr 14 15:37 hello-world_36062145_1.log
+-rw-r--r-- 1 alice alice 0 Apr 14 15:37 hello-world_36062145_2.err
+-rw-r--r-- 1 alice alice 60 Apr 14 15:37 hello-world_36062145_2.out
+-rw-r--r-- 1 alice alice 0 Apr 14 15:37 hello-world_36062145_2.log
+-rw-rw-r-- 1 alice alice 241 Apr 14 15:33 hello-world.sh
+-rw-rw-r-- 1 alice alice 1387 Apr 14 15:33 hello-world.sub
+
+
+
Useful information is provided in the user log, standard error, and standard output files.
+
+
HTCondor creates a transaction log of everything that happens to your jobs.
+Looking at the log file is very useful for debugging problems that may arise.
+Additionally, at the completion of a job, the .log file will print a table describing the amount of compute resources requested in the submit file compared to the amount the job actually used.
+An excerpt from hello-world_36062145_0.log, produced by the submission of the 3 jobs, will look like this:
+
+
…
+005 (36062145.000.000) 2023-04-14 12:36:09 Job terminated.
+ (1) Normal termination (return value 0)
+ Usr 0 00:00:00, Sys 0 00:00:00 - Run Remote Usage
+ Usr 0 00:00:00, Sys 0 00:00:00 - Run Local Usage
+ Usr 0 00:00:00, Sys 0 00:00:00 - Total Remote Usage
+ Usr 0 00:00:00, Sys 0 00:00:00 - Total Local Usage
+ 72 - Run Bytes Sent By Job
+ 265 - Run Bytes Received By Job
+ 72 - Total Bytes Sent By Job
+ 265 - Total Bytes Received By Job
+ Partitionable Resources : Usage Request Allocated
+ Cpus : 0 1 1
+ Disk (KB) : 118 1024 1810509281
+ Memory (MB) : 54 1024 1024
+
+ Job terminated of its own accord at 2023-04-14T17:36:09Z with exit-code 0.
+
+
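If you want to compare usage against requests without reading the table by eye, a small filter can pull out the relevant columns. The sketch below is an illustration only: summarize_usage is a hypothetical helper name, and the sample rows are copied from the excerpt above rather than read from a real log file.

```shell
#!/bin/bash
# Rows copied from the log excerpt in this guide.
log_excerpt='Partitionable Resources :    Usage  Request Allocated
   Cpus                 :        0        1         1
   Disk (KB)            :      118     1024 1810509281
   Memory (MB)          :       54     1024      1024'

# summarize_usage is a hypothetical helper: it reads log text on stdin and
# prints one "name usage=N request=N" line per resource row.
summarize_usage() {
    awk -F':' '/^[ \t]*(Cpus|Disk|Memory)/ {
        name = $1
        gsub(/^[ \t]+|[ \t]+$/, "", name)   # trim padding around the name
        split($2, cols, " ")                # cols: usage, request, allocated
        printf "%s usage=%s request=%s\n", name, cols[1], cols[2]
    }'
}

printf '%s\n' "$log_excerpt" | summarize_usage
```

On an Access Point you could feed it a real log instead, for example summarize_usage < hello-world_36062145_0.log.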
+
And, if you look at one of the output files, you should see something like this:
+Hello CHTC from Job 0 running on alice@e389.chtc.wisc.edu.
+
+
+
+
Congratulations. You’ve run an HTCondor job!
+
+
Important Workflow Elements
+
+
A. Removing Jobs
+
+
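To remove jobs, use condor_rm followed by a JobID, a ClusterID, or a username: a JobID removes a single job, a ClusterID removes every job in that cluster, and a username removes all jobs belonging to that user.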
+Example:
+
+
[alice@ap2002]$ condor_rm 845638.0
+
+
+
B. Importance of Testing & Resource Optimization
+
+
+
+
Examine Job Success. Within the log file, you can see information about the completion of each job, including a system error code (as seen in “return value 0”).
+You can use this code, as well as information in your “.err” file and other output files, to determine what issues your job(s) may have had, if any.
+
+
+
Improve Efficiency. Researchers with input and output files greater than 1GB should store them in their /staging directory instead of /home to improve file transfer efficiency.
+See our data transfer guides to learn more.
+
+
+
Get the Right Resource Requests
+Be sure to always add or modify the following lines in your submit files, as appropriate, and after running a few tests.
+
+
+
+
Submit file entry
+
Resources your jobs will run on
+
+
+
request_cpus = cpus
+
Matches each job to a computer "slot" with at least this many CPU cores.
+
+
+
request_disk = kilobytes
+
Matches each job to a slot with at least this much disk space, in units of KB.
+
+
+
request_memory = megabytes
+
Matches each job to a slot with at least this much memory (RAM), in units of MB.
+
+
+
+
+
Determining Memory and Disk Requirements.
+The log file also indicates how much memory and disk each job used, so that you can first test a few jobs before submitting many more with more accurate request values.
+When you request too little, your jobs will be terminated by HTCondor and set to “hold” status to flag that job as requiring your attention.
+To learn more about why a job has gone on hold, use condor_q -hold.
+When you request too much, your jobs may not match to as many available “slots” as they could otherwise, and your overall throughput will suffer.
+
+
+
+
You have the basics; now you are ready to run your OWN jobs!
+ We are the University of Wisconsin-Madison's core computational service provider for large scale computing.
+ CHTC services are open to UW-Madison staff, students, faculty, and external collaborators.
+
+ We offer both a High Throughput Computing system and a High Performance Computing cluster.
+ Access to CPUs/GPUs, high-memory servers, data storage capacity, as well as personalized consultations and classroom support,
+ are provided at no-cost.
+
+ Compiling or Testing Code with an Interactive Job
+
+
+
To best understand the below information, users should already have an
+understanding of:
+
+
+
Using the command line to: navigate within directories,
+create/copy/move/delete files and directories, and run their
+intended programs (aka "executables").
This guide provides a generic overview of steps required to install
+scientific software for use in CHTC. If you are using Python, R, or
+Matlab, see our specific installation and use guides here: Guides for
+Matlab, Python and R.
+
+
It is helpful to understand a little bit about normal “batch” HTCondor jobs
+before submitting interactive jobs. Just like batch jobs, interactive jobs
+can transfer input files (usually copies of source code or the software you
+want to install) and will transfer new/updated files in the main working directory
+back to the submit node when the job completes.
+
+
+
One exception to the file transfers working as usual is when running an interactive
+job that uses a Docker container. If any output files are generated inside an
+interactive Docker job, they will not be copied back to the submit node when you
+exit the interactive job. Contact the facilitation team for workarounds to this behavior.
+
+
+
+
+
1. Building a Software Installation
+
+
You are going to start an interactive job that runs on the HTC build
+servers. You will then install your packages to a folder and zip those
+files to return to the submit server.
+
+
+
+
A. Submit an Interactive Job
+
+
First, download the source code for your software to the submit server.
+Then create the following special submit file on the submit server,
+calling it something like build.sub.
+
+
Note that you’ll want to use +IsBuildJob = true to specifically match to CHTC’s servers designated for compiling code (which include Matlab compilers and other compiling tools you may need). Compiling servers do not include specialized resources like GPUs, extreme amounts of RAM/disk, etc.; to build/test software in these cases, submit an interactive job without +IsBuildJob.
+
+
# Software build file
+
+universe = vanilla
+log = interactive.log
+
+# In the latest version of HTCondor on CHTC, interactive jobs require an executable.
+# If you do not have an existing executable, use a generic linux command like hostname as shown below.
+executable = /usr/bin/hostname
+
+# change the name of the file to be the name of your source code
+transfer_input_files = source_code.tar.gz
+
++IsBuildJob = true
+# requirements = (OpSysMajorVer =?= 8)
+request_cpus = 1
+request_memory = 4GB
+request_disk = 2GB
+
+queue
+
+
+
The only thing you should need to change in the above file is the name
+of the source code tar.gz file - in the "transfer_input_files"
+line.
+
+
Once this submit file is created, you will start the interactive job by
+running the following command:
+
+
[alice@submit]$ condor_submit -i build.sub
+
+
+
The interactive build job should start in about a minute. Once it has
+started, the job has a time limit of four hours - if you need more time
+to compile a particular code, talk to CHTC's Research Computing
+Facilitators.
+
+
B. Install the Software
+
+
Software installation typically goes through a set of standard steps --
+configuration, then compilation (turning the source code into binary
+code that the computer can understand), and finally "installation",
+which means placing the compiled code into a specific location. In most
+install instructions, these steps look something like:
+
+
./configure
+make
+make install
+
+
+
There are two changes we make to this standard process. Because you are
+not an administrator, you will want to create a folder for the
+installation in the build job's working directory and use an option in
+the configuration step that will install the software to this folder.
+
+
In what follows, note that anything in italics is a name that you can
+(and should!) choose to be more descriptive. We use general names as
+an example; see the LAMMPS case study lower down to see what you might
+fill in for your own program.
+
+
+
+
In the interactive job, create a new directory to hold your final
+software installation:
+
+
[alice@build]$ mkdir program
+
+
+
+
You'll also want to un-tar the source code that you brought along,
+and cd into the source code folder.
+
+
[alice@build]$ tar -xzf source_code.tar.gz
+[alice@build]$ cd source_code/
+
+
+
+
Our next step will be to configure the installation. This involves
+changing into the un-tarred source code directory, and running a
+configuration script. It's at this step that we change the final
+installation location of the software from its default, to be the
+directory we created in the previous step. In a typical configure
+script, this option is called the "prefix" and is given by the
+--prefix flag.
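A minimal sketch of this configuration step is shown below. It assumes you are inside the un-tarred source directory, that the installation folder is named program as in the earlier step, and that $_CONDOR_SCRATCH_DIR points to the job's working directory (the LAMMPS case study later in this guide uses the same variable). The fallback to the parent directory is our own addition for trying the snippet outside a job.

```shell
#!/bin/bash
# Assumption: $_CONDOR_SCRATCH_DIR is the job's working directory; when it is
# unset (e.g. outside an HTCondor job), fall back to the parent directory.
install_dir="${_CONDOR_SCRATCH_DIR:-$(cd .. && pwd)}/program"

# Run the configure script only if the source tree provides one.
if [ -x ./configure ]; then
    ./configure --prefix="$install_dir"
else
    echo "No ./configure here; check the install docs for the prefix option."
fi
```

Using an absolute path for the prefix avoids surprises if later build steps change directories.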
Note that there are sometimes different options used. Some programs
+use a helper program called cmake as their configuration script.
+Often the installation instructions for a program will indicate what
+to use as a prefix option, or, you can often run the configure
+command with the --help flag, which will have all the options
+which can be added to the configure command.
+
+
+
After the configuration step, you'll run the steps to compile and
+install your program. This is usually these two commands:
+
+
[alice@build]$ make
+[alice@build]$ make install
+
+
+
+
After this step, you can cd back up to the main working directory.
+
+
[alice@build]$ cd ..
+
+
+
+
Right now, if we exit the interactive job, nothing will be
+transferred back because we haven't created any new files in
+the working directory, just the new sub-folder with our software
+installation. In order to transfer back our installation, we will
+need to compress it into a tarball file - not only will HTCondor
+then transfer back the file, it is generally easier to transfer a
+single, compressed tarball file than an uncompressed set of
+directories.
+
+
Run the following command to create your own tarball of your
+packages:
+
+
[alice@build]$ tar -czf program.tar.gz program/
+
+
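Before exiting, it can be worth confirming that the tarball really contains your installation. The sketch below demonstrates the check in a throwaway directory; the program/bin/tool file is a hypothetical stand-in for a real installation.

```shell
#!/bin/bash
# Work in a throwaway directory so the demo does not touch real files.
demo_dir="$(mktemp -d)"
cd "$demo_dir" || exit 1

# Stand-in for a real installation folder (hypothetical contents).
mkdir -p program/bin
echo 'echo "hello from my program"' > program/bin/tool

# Same command as in the guide.
tar -czf program.tar.gz program/

# List the archive contents and report its size before exiting the job.
tar -tzf program.tar.gz
ls -sh program.tar.gz
```

In a real build job you would only need the last two commands, run from the job's main working directory.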
+
+
+
We now have our packages bundled and ready for CHTC! You can now exit
+the interactive job and the tar.gz file with your software installation
+will return to the submit server with you (this sometimes takes a few
+extra seconds after exiting).
+
+
[alice@build]$ exit
+
+
+
+
+
2. Case Study, Installing LAMMPS
+
+
First download a copy of LAMMPS and copy it to the submit server -- in
+this example, we've used the "stable" version under "Download a
+tarball": LAMMPS download
+page
+
+
Then, make a copy of the submit file above on the submit server,
+changing the name of the file to be transferred to
+lammps-stable.tar.gz. Submit the interactive job as described.
+
+
While waiting for the interactive build job to start, take a look at the
+installation instructions for LAMMPS:
You'll see that the install instructions have basically the same steps
+as listed above, with two changes:
+
+
+
+
Instead of the "configure" step, LAMMPS is using the "cmake"
+command. This means that we'll need to find the equivalent to the
+--prefix option for cmake. Reading further down in the
+documentation, you can see that there's this option:
+
+
-D CMAKE_INSTALL_PREFIX=path
+
+
+
This is exactly what we need to set the installation prefix.
+
+
+
There are extra steps before the configure step -- that's fine,
+we'll just add them to our list of commands to run.
+
+
+
+
With all these pieces together, this is what the commands will look like
+to install LAMMPS in the interactive build job and then bring the
+installed copy back to the submit server.
+
+
Create the folder for the installation:
+
+
[alice@build]$ mkdir lammps
+
+
+
Unzip and cd into a build directory:
+
+
[alice@build]$ tar -xf lammps-stable.tar.gz
+[alice@build]$ cd lammps-stable
+[alice@build]$ mkdir build; cd build
+
+
+
Run the installation commands:
+
+
[alice@build]$ cmake -D CMAKE_INSTALL_PREFIX=$_CONDOR_SCRATCH_DIR/lammps ../cmake
+[alice@build]$ make
+[alice@build]$ make install
+
+
+
Move back into the main job directory and create a tar.gz file of the
+installation folder.
+
+
[alice@build]$ cd ../..
+[alice@build]$ tar -czf lammps.tar.gz lammps
+[alice@build]$ exit
+
To obtain your copy of the Java Development Kit (JDK), go to https://jdk.java.net/.
+Click the link for the JDK that is “Ready for use”.
+There will be a download link “tar.gz” under the “Builds” section for “Linux/x64”.
+You can then either (a) right-click the download link and copy the link address, sign in to the submit server, and use the wget command with that link,
+or (b) click the link to download to your computer, then manually upload the file from your computer to the submit server.
+
+
The example above uses file names for JDK 22 as of 2024-04.
+Be sure to change the file names for the version that you actually use.
+We recommend that you test and explore your setup using an interactive job.
+
+
Executable
+
+
A bash .sh file is used as the executable file in order to unpack and set up the JDK environment for use by your script.
+Here is the executable from the section above, with comments:
+
+
#!/bin/bash
+
+# Decompress the JDK
+tar -xzf openjdk-22_linux-x64_bin.tar.gz
+
+# Add the new JDK folder to the bash environment
+export JAVA_HOME=$PWD/jdk-22
+export PATH=$JAVA_HOME/bin:$PATH
+
+# Run your program
+java -jar program.jar
+
This approach may be sensitive to the operating system of the execution point.
+We recommend building a container instead, but are keeping these instructions as a backup.
+
+
+
+
+
More information
+
+
No CHTC machine has Julia pre-installed, so you must configure a portable copy of Julia to work on the HTC system.
+Using a container as described above is the easiest way to accomplish this.
+
+
Executable
+
+
When using a container, you can use a .jl script as the submit file executable, provided that the first line (the “shebang”) in the .jl file is
+
+
#!/usr/bin/env julia
+
+
+
with the rest of the file containing the commands you want to run using Julia.
+
+
Alternatively, you can use a bash .sh script as the submit file executable, and in that file you can use the julia command:
+
+
#!/bin/bash
+
+julia my-script.jl
+
+
+
In this case, remember to include your .jl file in the transfer_input_files line of your submit file.
+
+
Arguments
+
+
For more information on passing arguments to a Julia script, see the
+Julia documentation.
+
+
Option B: Create your own portable copy
+
+
Use a portable copy of Julia and create your own portable copy of your Julia packages
+
+
This approach may be sensitive to the operating system of the execution point. We recommend building a container instead, but are keeping these instructions as a backup.
+
+
+
+
Download the precompiled Julia software from https://julialang.org/downloads/.
+You will need the 64-bit, tarball compiled for general use on a Linux x86 system. The
+file name will resemble something like julia-#.#.#-linux-x86_64.tar.gz.
+
+
+
Tip: use wget to download directly to your /home directory on the
+submit server, OR use transfer_input_files = url in your HTCondor submit files.
+
+
+
+
Submit an “interactive build” job to create a Julia project and
+install packages, else skip to the next step.
Submit a job that executes a Julia script using the Julia precompiled binary
+with base Julia and Standard Library.
+
+
#!/bin/bash
+
+ # extract Julia binaries tarball
+ tar -xzf julia-#.#.#-linux-x86_64.tar.gz
+
+ # add Julia binary to PATH
+ export PATH=$_CONDOR_SCRATCH_DIR/julia-#.#.#/bin:$PATH
+
+ # run Julia script
+ julia my-script.jl
+
+
+
+
For more details on the job submission, see the section
+ below: Submit Julia Jobs
+
+
+
+
+
Install Julia Packages
+
+
If your work requires additional Julia packages, you will need to perform a one-time
+installation of these packages within a Julia project. A copy of the project
+can then be saved for use in subsequent job submissions. For more details,
+please see Julia’s documentation at Julia Pkg.jl.
+
+
Create An Interactive Build Job Submit File
+
+
To install your Julia packages, first create an HTCondor submit file for
+submitting an “interactive build” job, which is a job that will run
+interactively on one of CHTC’s servers dedicated for building
+(aka compiling) software.
+
+
Using a text editor, create the following file, which can be named build.sub
+
+
# Julia build job submit file
+
+universe = vanilla
+log = julia-build.log
+
+# In the latest version of HTCondor on CHTC, interactive jobs require an executable.
+# If you do not have an existing executable, use a generic linux command like hostname as shown below.
+executable = /usr/bin/hostname
+
+# have job transfer a copy of precompiled Julia software
+# be sure to match the name of the version
+# that you have downloaded to your home directory
+transfer_input_files = julia-#.#.#-linux-x86_64.tar.gz
+
++IsBuildJob = true
+
+request_cpus = 1
+request_memory = 4GB
+request_disk = 2GB
+
+queue
+
+
+
The only thing you should need to change in the above file is the name
+of the Julia tarball file in the "transfer_input_files" line.
+
+
Submit Your Interactive Build Job
+
+
Once this submit file is created, submit the job using the following command:
+
+
[alice@submit]$ condor_submit -i build.sub
+
+
+
It may take a few minutes for the build job to start.
+
+
Install Julia Packages Interactively
+
+
Once the interactive job starts, you should see the following
+inside the job’s working directory:
+
+
bash-4.2$ ls -F
+julia-#.#.#-linux-x86_64.tar.gz tmp/ var/
+
+
+
Run the following commands
+to extract the Julia software and add Julia to your PATH:
+
+
bash-4.2$ tar -xzf julia-#.#.#-linux-x86_64.tar.gz
+bash-4.2$ export PATH=$_CONDOR_SCRATCH_DIR/julia-#.#.#/bin:$PATH
+
+
+
After these steps, you should be able to run Julia from the command line, e.g.
+
+
julia --version
+
+
+
Now create a project directory to install your packages (we’ve called
+it my-project/ below) and tell Julia its name:
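A minimal sketch of this step, assuming you are in the job's working directory. It mirrors the JULIA_DEPOT_PATH setting used in the job script later in this guide, which points the same variable at $_CONDOR_SCRATCH_DIR/my-project.

```shell
# Create the project directory in the job's working directory ...
mkdir -p my-project
# ... and point Julia's depot (where packages are stored) at it.
export JULIA_DEPOT_PATH="$PWD/my-project"
```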
You can choose whatever name to use for this directory -- if you have
+different projects that you use for different jobs, you could
+use a more descriptive name than “my-project”.
+
+
We will now use Julia to install any needed packages to the project directory
+we created in the previous step.
+
+
Open Julia with the --project option set to the project directory:
+
+
bash-4.2$ julia --project=my-project
+
+
+
Once you’ve started up the Julia REPL (interpreter), start the Pkg REPL, used to
+install packages, by typing ]. Then install and test packages by using
+Julia’s add Package syntax.
If you have multiple packages to install they can be combined
+into a single command, e.g. (my-project) pkg> add Package1 Package2 Package3.
+
+
If you encounter issues getting packages to install successfully, please
+contact us at chtc@cs.wisc.edu.
+
+
Once you are done, you can exit the Pkg REPL by pressing the Backspace key (Delete on a Mac keyboard) and then
+typing exit()
+
+
(my-project) pkg>
+julia> exit()
+
+
+
Save Installed Packages For Later Jobs
+
+
To use this project, and the associated installed packages, in
+subsequent jobs, we need to have HTCondor return some files to
+the submit server by converting the my-project/ directory
+to a tarball, before exiting the interactive job session:
+
+
bash-4.2$ tar -czf my-project.tar.gz my-project/
+bash-4.2$ exit
+
+
+
After the job exits, you will be returned to your /home directory on the
+submit server (specifically, wherever you were located when you submitted
+the interactive build job). A copy of my-project.tar.gz will be present. Be
+sure to check the size of the project tarball before proceeding to subsequent job
+submissions. If the file is >100MB please contact us at chtc@cs.wisc.edu so
+that we can get you set up with access to our SQUID web proxy. More details
+are available on our SQUID guide: File Availability with SQUID
+
+
[alice@submit]$ ls
+build.sub julia-#.#.#-linux-x86_64.tar.gz julia-build.log
+my-project.tar.gz
+[alice@submit]$ ls -sh my-project.tar.gz
+
+
+
Submit Julia Jobs
+
+
To submit a job that runs a Julia script, create a bash
+script and HTCondor submit file following the examples in this section.
+These examples assume that you have downloaded a copy of Julia for Linux as a tar.gz
+file and if using packages, you have gone through the steps above to install them
+and create an additional tar.gz file of the installed packages.
+
+
Create Executable Bash Script
+
+
Your job will use a bash script as the HTCondor executable. This script
+will contain all the steps needed to unpack the Julia binaries and
+execute your Julia script (script.jl). Below are two example bash scripts,
+one which can be used to execute a script with base Julia, and one that
+will use packages installed in a Julia project (see Install Julia Packages).
+
+
Example Bash Script For Base Julia Only
+
+
If your Julia script can run without additional packages (other than base Julia and
+the Julia Standard library) use the example script directly below.
+
+
#!/bin/bash
+
+# julia-job.sh
+
+# extract Julia tar.gz file
+tar -xzf julia-#.#.#-linux-x86_64.tar.gz
+
+# add Julia binary to PATH
+export PATH=$_CONDOR_SCRATCH_DIR/julia-#.#.#/bin:$PATH
+
+# run Julia script
+julia script.jl
+
+
+
Example Bash Script For Julia With Installed Packages
+
+
#!/bin/bash
+
+# julia-job.sh
+
+# extract Julia tar.gz file and project tar.gz file
+tar -xzf julia-#.#.#-linux-x86_64.tar.gz
+tar -xzf my-project.tar.gz
+
+# add Julia binary to PATH
+export PATH=$_CONDOR_SCRATCH_DIR/julia-#.#.#/bin:$PATH
+# add Julia packages to DEPOT variable
+export JULIA_DEPOT_PATH=$_CONDOR_SCRATCH_DIR/my-project
+
+# run Julia script
+julia --project=my-project script.jl
+
+
+
Create HTCondor Submit File
+
+
After creating a bash script to run Julia, create a submit file
+to submit the job.
+
+
More details about setting up a submit file, including a submit file template,
+can be found in our hello world example page at Run Your First CHTC Jobs.
If your Julia script needs to use packages installed for a project,
+be sure to include my-project.tar.gz as an input file in julia-job.sub.
+For project tar.gz files that are <100MB, you can follow the below example:
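A minimal sketch of what julia-job.sub might contain, assuming the file names used in this guide (julia-job.sh, script.jl, my-project.tar.gz). The log file names and resource requests are placeholders of our own, not CHTC recommendations, and the #.#.# in the Julia tarball name must be replaced with your version. The snippet writes the file with a heredoc so that it can be pasted into a terminal.

```shell
#!/bin/bash
# Write a sketch of julia-job.sub; adjust the values after running test jobs.
cat > julia-job.sub <<'EOF'
# julia-job.sub

executable = julia-job.sh

log = julia-job_$(Cluster)_$(Process).log
error = julia-job_$(Cluster)_$(Process).err
output = julia-job_$(Cluster)_$(Process).out

# Julia tarball, project tarball, and the Julia script to run
transfer_input_files = julia-#.#.#-linux-x86_64.tar.gz, my-project.tar.gz, script.jl

# Placeholder requests; tune them using the usage table in the log file
request_cpus = 1
request_memory = 2GB
request_disk = 2GB

queue
EOF
```

If your script uses only base Julia, drop my-project.tar.gz from the transfer_input_files line.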
For project tar.gz files that are larger than 100MB, email a facilitator about
+using SQUID.
+
+
Modify the CPU/memory request lines to match what is needed by the job.
+Test a few jobs for disk space/memory usage in order to make sure your
+requests for a large batch are accurate! Disk space and memory usage can be found in the
+log file after the job completes.
+
+
Submit Your Julia Job
+
+
Once you have created an executable bash script and submit file, you can
+submit the job using condor_submit with the name of your submit file, e.g. condor_submit julia-job.sub.
This guide describes when and how to run jobs that use licensed software in
+CHTC’s high throughput compute (HTC) system.
+
+
To best understand the below information, users should already have an
+understanding of:
+
+
+
Using the command line to: navigate within directories,
+create/copy/move/delete files and directories, and run their
+intended programs (aka "executables").
A. CHTC's Licensed Software Policies on the HTC System
+
+
Our typical practice for software support in CHTC is for users to
+install and manage their own software installations. We have multiple
+guides to help users with common software
+programs and additional support is always
+available through CHTC's research computing
+facilitators.
+
+
However, certain software programs require paid licenses which can make
+it challenging for individual users to install the software and use the
+licenses correctly. As such, we provide support for software
+installation and use on our high throughput system. Installation of
+licensed programs is by request to and at the discretion of CHTC staff.
+
+
We always recommend using a free or open-source software alternative
+whenever possible, as certain software licenses restrict the amount of
+computing that can contribute to your research.
+
+
+
+
B. Viewing Licensed Software on the HTC System
+
+
Software with paid licenses that has been installed on the high
+throughput (HTC) system is accessible through software "modules",
+which are tools to access and activate a software installation. To see
+which software programs are available on the HTC system, run the
+following command on an HTC submit server:
+
+
[alice@submit]$ module avail
+
+
+
+
Note: you should never run a program directly on the submit server.
+Jobs that use licensed software/modules should always be submitted as
+HTCondor jobs as described below.
+
+
+
Note that not all software modules are available to all CHTC users. Some
+programs like ansys have a campus or shared license which makes them
+available to all CHTC users. Other software, like lumerical and
+abaqus, is licensed to a specific group and is only available to
+members of that group.
+
+
+
+
C. Submitting Jobs Using Licensed Software Modules
+
+
The following sections describe how to create a bash script executable
+and HTCondor submit file to run jobs that use software accessible via
+the modules.
+
+
+
+
1. Script For Running Jobs with Modules
+
+
To run a job that uses a licensed software installation on the HTC
+system, you need to write a script that loads the software module and
+then runs the program, like so:
+
+
#!/bin/bash
+
+# Commands to enable modules, and then load an appropriate software module
+export PATH
+. /etc/profile.d/modules.sh
+module load software
+
+# For Lumerical (the license requires a home directory)
+export HOME=$_CONDOR_SCRATCH_DIR
+
+# Command to run your software from the command line
+cmd -options input.file
+
+
+
Replace software with the name of the software module you want to use,
+found via the module avail command described above. Replace
+the final command with the syntax to run your software, with the
+appropriate options.
+
+
For example, to run a Comsol job, the script might look like this:
There are several important requirements to consider when writing a
+submit file for jobs that use our licensed software modules. They are
+shown in the sample submit file below and include:
+
+
+
+
Require access to the modules. To ensure that your job will have
+access to CHTC software modules you must include the following in
+your submit file.
+
+
requirements = (HasChtcSoftware == true)
+
+
+
+
Add a concurrency limit. For software with limited licenses, we have
+implemented concurrency limits, which control the number of jobs running
+at once in the HTC system. If your software is in the table below, use
+the concurrency limit name in your submit file like this:
So if you were planning to run a job that used one ANSYS license, you would
+use:
+
+
concurrency_limits = ANSYS_RESEARCH:1
+
+
+
Request accurate CPUs and memory. Run at least one test job and
+look at the log file produced by HTCondor to determine how much
+memory and disk space your jobs actually use. We recommend
+requesting the smallest number of CPUs where your job will finish in
+1-2 days.
+
The script you wrote above (shown as run_job.sh below) should be
+your submit file "executable", and any input files should be
+listed in transfer_input_files.
+
+
+
A sample submit file is given below:
+
+
# software.sub
+# A sample submit file for running a single job using software modules
+
+universe = vanilla
+log = job_$(Cluster).log
+output = job_$(Cluster).out
+error = job_$(Cluster).err
+
+# the executable should be the script you wrote above
+executable = run_job.sh
+# arguments = (if you want to pass any to the shell script)
+should_transfer_files = YES
+when_to_transfer_output = ON_EXIT
+transfer_input_files = (this should be a comma-separated list of input files if needed)
+
+# Requirement for accessing new set of software modules
+requirements = ( HasChtcSoftware == true )
+
+# If required, add the concurrency limit for your software and uncomment
+# concurrency_limits = LIMIT_NAME:num_of_licenses_used
+
+request_cpus = 1
+request_memory = 2GB
+request_disk = 2GB
+
+queue
+
+
+
After the submit file is complete, you can submit your jobs using
+condor_submit.
This guide provides some of our recommendations for success
+in running machine learning (specifically deep learning) jobs in CHTC.
+
+
+
This is a new how-to guide on the CHTC website. Recommendations and
+feedback are welcome via email (chtc@cs.wisc.edu) or by creating an
+issue on the CHTC website GitHub repository.
+
+
+
Overview
+
+
It is important to understand the needs of a machine learning job before submitting
+it and have a plan for managing software. This guide covers:
Before digging into the nuts and bolts of software installation in the next section,
+it is important to first consider a few other job requirements that might apply to
+your machine learning job.
+
+
A. Do you need GPUs?
+
+
CHTC has about 4 publicly available GPUs and thousands of CPUs. When possible, using
+CPUs will allow your jobs to start more quickly and to have many running at once. For
+certain calculations, GPUs may provide a different advantage as some machine learning
+algorithms are optimized to run significantly faster on GPUs. Consider whether you
+would benefit from running one or two long-running calculations on a GPU or if your
+work is better suited to running many jobs on CHTC’s available CPUs.
+
+
If you need GPUs for your jobs, you can see a summary of available GPUs in CHTC and
+how to access them here:
Note that you may need to use different versions of your software, depending on whether or
+not you are using GPUs, as shown in the software section of this guide.
+
+
B. How big is your data?
+
+
CHTC’s usual data recommendations apply for machine learning jobs. If your job is using
+an input data set larger than a few hundred MB or generating output files larger than
+a few GB, you will likely need to use our large data
+file share. Contact the CHTC Research Computing Facilitators to get access and
+read about the large data location here:
CHTC’s default job length is 72 hours. If your task is long enough that you will
+encounter this limit, contact the CHTC Research Computing Facilitators (chtc@cs.wisc.edu)
+for potential workarounds.
+
+
D. How many jobs do you want to submit?
+
+
Do you have the ability to break your work into many independent pieces? If so,
+you can take advantage of CHTC’s capability to run many independent jobs at once,
+especially when each job is using a CPU. See our guide for running multiple jobs here:
Many of the tools used for machine learning, specifically deep learning and
+convolutional neural networks, have enough dependencies that our usual installation
+processes work less reliably. The following options are the best way to handle the complexity
+of these software tools.
+
+
Please be aware of which CUDA library version you are using to run your code.
+
+
A. Using Docker Containers
+
+
CHTC’s HTC system has the ability to run jobs using Docker containers, which package
+up a whole system (and software) environment in a consistent, reproducible, portable
+format. When possible, we recommend using standard, publicly available
+Docker containers to run machine learning jobs in CHTC.
+
+
To see how you can use Docker containers to run jobs in CHTC, see:
PyTorch on Docker Hub - we recommend choosing the most recently published image that ends in -runtime.
+
+
+
If you cannot find a Docker container with exactly the tools you need, you can build your
+own, starting with one of the containers above. For instructions on how to build and
+test your own Docker container, see this guide:
The Python package manager conda is a popular tool for installing and
+managing machine learning tools.
+See this guide for information on how
+to use conda to provide dependencies for CHTC jobs.
+
+
Note that when installing TensorFlow using conda, it is important to install
+not the generic tensorflow package, but tensorflow-gpu. This ensures that
+the installation will include the cudatoolkit and cudnn dependencies
+required for TensorFlow’s GPU capability.
Note: Because Matlab is licensed software, you must add the following line to your submit file:
+
+
concurrency_limits = MATLAB:1
+
+
+
Failure to do so may cause your or other users’ jobs to fail to obtain a license from the license server.
+
+
+
+
+
More information
+
+
CHTC has a site license for Matlab that allows for up to 10,000 jobs to run at any given time across all CHTC users.
+This is why you must add the line concurrency_limits = MATLAB:1 to your submit files: it lets HTCondor keep track of which jobs are using or will use a license.
+
+
Following the instructions above, you are able to install a variety of Matlab Toolboxes when building the container.
+The Toolboxes available for each supported version of Matlab are described here: https://github.com/mathworks-ref-arch/matlab-dockerfile/blob/main/mpm-input-files/.
+Navigate to the text file for the version of interest, and look at the section named “INSTALL PRODUCTS”.
+The example recipes linked above provide instructions on how to specify the packages you want to install when building the container.
+
+
Executable
+
+
When using the Matlab container, we recommend the following process for executing your Matlab commands in an HTCondor job:
+
+
+
+
Put your Matlab commands in a .m script. For this example, we’ll call it my-script.m.
+
+
+
Create the file run-matlab.sh with the following contents:
+
+
#!/bin/bash
+
+matlab -batch "my-script"
+
+
+
Note that in the script, the .m extension has been dropped from the file name (uses "my-script" instead of "my-script.m").
+
+
+
In your submit file, set the .sh script as the executable and list the .m file to be transferred:
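As a minimal sketch of the two submit file lines this step refers to, using the file names from the steps above (the container settings and resource requests that a complete submit file needs are omitted here):

```
# Run the wrapper script, and send the Matlab script along with the job
executable = run-matlab.sh
transfer_input_files = my-script.m
```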
You can pass arguments from your submit file to your Matlab code via your executable .sh and the matlab -batch command.
+Arguments in your submit file are accessible inside your executable .sh script with the syntax ${n}, where n is the nth value passed in the arguments line.
+You can use this syntax inside of the matlab -batch command.
+
+
For example, if your Matlab script (my-script.m) is expecting a variable foo, you can add foo=${1} before calling my-script:
This will use the first argument from the submit file to define the Matlab variable foo.
+By default, such values are read in by Matlab as numeric values (or as a Matlab function/variable that evaluates to a numeric value).
+If you want Matlab to read in the argument as a string, you need to add apostrophes around the value, like this:
Here, the value of bar is defined as the second argument from the submit file, and will be identified by Matlab as a string because it’s wrapped in apostrophes ('${2}').
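The argument handling described above can be sketched with a small shell helper. This is only an illustration: the function name build_batch_command is hypothetical, and it assumes the submit file passes a numeric value as the first argument and a text value as the second. The command string is printed instead of being handed to Matlab, so the sketch runs on any machine.

```shell
#!/bin/bash

# Hypothetical helper: assemble the string handed to `matlab -batch`.
# $1 is left bare, so Matlab would read it as a number.
# $2 is wrapped in apostrophes, so Matlab would read it as a string.
build_batch_command() {
    printf "foo=%s;bar='%s';my-script" "$1" "$2"
}

# In a real job, the last line of the executable would be:
#   matlab -batch "$(build_batch_command "${1}" "${2}")"
# Here the command string is only printed.
build_batch_command 5 hello
echo
```

With the arguments 5 and hello, the printed string is foo=5;bar='hello';my-script.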
+
+
If you have defined your script to act as a function, you can call the function directly and pass the arguments directly as well.
+For example, if you have constructed your my-script.m as a function, then you can do
Again, by default Matlab will interpret the values of these variables as numeric values, unless you wrap the argument in apostrophes as described above.
+ Submitting Multiple Jobs in Individual Directories
+
+
+
This guide demonstrates how to submit multiple jobs, using a specific
+directory structure. It is relevant to:
+
+
+
Researchers who have used CHTC's "ChtcRun" tools in the past
+
Anyone who wants to submit multiple jobs, where each job has its own
+directory for input/output files on the submit server.
+
+
+
1. Software and Input Preparation
+
+
The first time you submit jobs, you will need to prepare a portable
+version of your software and a script (what we call the job's
+"executable") that runs your code. We have guides for preparing:
Choose the right guide for you and follow the directions for compiling
+your code (Matlab) or building an installation (Python, R). Also follow
+the instructions for writing a shell script that runs your program.
+These are typically steps 1 and 2 of the above guides.
+
+
2. Directory Structure
+
+
Once you've prepared your code and script, create the same directory
+structure that you would normally use with ChtcRun. For a single batch
+of jobs, the directories will look like this:
You'll want to put all your code and files required for every job in
+shared/ and individual input files in the individual job directories
+in an input folder. In the submit file below, it matters that the
+individual job directories start with the word "job".
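One way to set up this layout is sketched below. The helper name make_job_dirs, the project directory name, and the number of job directories are all placeholders chosen for illustration. The example builds the directories in a scratch location so that nothing in a real project is touched.

```shell
#!/bin/bash

# Hypothetical helper: create shared/ plus job1/ ... jobN/, each with an
# input/ folder, under the given project directory.
make_job_dirs() {
    local project="$1" count="$2" i
    mkdir -p "$project/shared"
    for i in $(seq 1 "$count"); do
        mkdir -p "$project/job$i/input"
    done
}

# Example: three job directories inside a temporary scratch directory
demo_dir=$(mktemp -d)
make_job_dirs "$demo_dir/project" 3
ls "$demo_dir/project"
```

Code and files needed by every job would then be copied into shared/, and each job's own files into its jobN/input/ folder.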
+
+
+
Note: the job directories need to be hosted in your /home directory
+on the submit node. The following instructions will not work for files
+hosted on /staging!
+
+
+
3. Submit File
+
+
+
Note: if you are submitting more than 10,000 jobs at once, you'll
+need to use a different submit file. Please email the CHTC Research
+Computing Facilitators at chtc@cs.wisc.edu if this is the case!
+
+
+
Your submit file, which should go in your main project directory, should
+look like this:
+
+
# Specify the HTCondor Universe (vanilla is the default and is used
+# for almost all jobs), the desired name of the HTCondor log file,
+# and the desired name of the standard error and standard output file.
+universe = vanilla
+log = process.log
+error = process.err
+output = process.out
+#
+# Specify your executable (single binary or a script that runs several
+# commands) and arguments
+executable = run_code.sh
+# arguments = arguments to your script go here
+#
+# Specify that HTCondor should transfer files to and from the
+# computer where each job runs.
+should_transfer_files = YES
+when_to_transfer_output = ON_EXIT
+# Set the submission directory for each job with the $(directory)
+# variable (set below in the queue statement). Then transfer all
+# files in the shared directory, and from the input folder in the
+# submission directory
+initialdir = $(directory)
+transfer_input_files = ../shared/,input/
+#
+# Tell HTCondor what amount of compute resources
+# each job will need on the computer where it runs.
+request_cpus = 1
+request_memory = 1GB
+request_disk = 1GB
+#
+# Create a job for each "job" directory.
+queue directory matching job*
+
+
+
You must change the name of the executable to your own script, and
+in certain cases, add arguments.
+
+
Note that the final line matches the pattern of your directory names
+created in the second step. You can use a different name for the
+directories (like data or seed), but you should use whatever word
+they share in the final queue statement in place of "job".
HTCondor has several convenient features for streamlining high-throughput
+job submission. This guide provides several examples
+of how to leverage these features to submit multiple jobs with a
+single submit file.
+
+
Why submit multiple jobs with a single submit file?
+
+
Users should submit multiple jobs using a single submit file, or where applicable, as few
+separate submit files as needed. Using HTCondor multi-job submission features is more
+efficient for users and will help ensure reliable operation of the login nodes.
+
+
Many options exist for streamlining your submission of multiple jobs,
+and this guide only covers a few examples of what is truly possible with
+HTCondor. If you are interested in a particular approach that isn’t described here,
+please contact CHTC’s research computing facilitators and we will
+work with you to identify options to meet the needs of your work.
+
+
+
Before you continue reading: While HTCondor is designed to submit many jobs at a
+time using a single submit file, the hardware of the submit server can be overwhelmed
+if there are a significant number of jobs submitted at once or rapidly starting and finishing.
+Therefore, plan ahead for the following two scenarios:
+
+
1) If you plan to submit 10,000+ jobs at a time, please let us
+ know, so we can provide options that will protect the queue’s performance.
+2) If you plan to submit 1000+ jobs, please make sure that each job
+ has a minimum run time of 10 minutes (on average). If your calculations are shorter than
+ 10 minutes, then modify your workflow to run multiple calculations per job.
+
+
+
+
+
1. Submit Multiple Jobs Using queue
+
+
All HTCondor submit files require a queue attribute (which must also be
+the last line of the submit file). By default, queue will submit one job, but
+users can also configure the queue attribute to behave like a for loop
+that will submit multiple jobs, with each job varying as predefined by the user.
+
+
Below are different HTCondor submit file examples for submitting batches of multiple
+jobs and, where applicable, how to indicate the differences between jobs in a batch
+with user-defined variables. Additional examples and use cases are provided further below:
+
+
+
queue <N> - will submit N jobs. Examples
+include performing replications, where the same job must be repeated N number
+of times, looping through files named with numbers, and looping through
+a matrix where each job uses information from a specific row or column.
+
queue <var> from <list> - will loop through a
+list of file names, parameters, etc. as defined in a separate text file (i.e. <list>).
+This queue option is very flexible and provides users with many options for
+submitting multiple jobs.
What makes these queue options powerful is the ability to use user-defined
+variables to specify details about your jobs in the HTCondor submit file. The
+examples below will include the use of $(variable_name) to specify details
+like input file names, file locations (aka paths), etc. When selecting a
+variable name, users must avoid built-in HTCondor submit file variables
+such as Cluster, Process, output, input, and arguments.
+
+
2. Use queue N in your HTCondor submit files
+
+
+
When using queue N, HTCondor will submit a total of N
+jobs, counting from 0 to N - 1 and each job will be assigned
+a unique Process id number spanning this range of values. Because
+the Process variable will be unique for each job, it can be used in
+the submit file to indicate unique filenames and filepaths for each job.
+
+
The most straightforward example of using queue N is to submit
+N number of identical jobs. The example shown below demonstrates
+how to use the Cluster and Process variables to assign unique names
+for the HTCondor error, output, and log files for each job in the batch:
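A minimal sketch of such a submit file is shown here, assuming a batch of 100 jobs and the file name job.sub. The remaining submit details (executable, resource requests, and so on) are left out.

```
# job.sub
# Submit 100 identical jobs, each with uniquely named HTCondor files

log = job_$(Cluster)_$(Process).log
error = job_$(Cluster)_$(Process).err
output = job_$(Cluster)_$(Process).out

# ... remaining submit details ...

queue 100
```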
For each job, the appropriate number, 0, 1, 2, ... 99 will replace $(Process).
+$(Cluster) will be a unique number assigned to the entire 100 job batch. Each
+time you run condor_submit job.sub, you will be provided
+with the Cluster number which you will also see in the output produced by
+the command condor_q.
+
+
If a uniquely named results file needs to be returned by each job,
+$(Process) and $(Cluster) can also be used as arguments, and anywhere
+else as needed, in the submit file:
$(Process) can be used to specify a unique row or column of information in a
+matrix to be used by each job in the batch. The matrix needs to then be transferred
+with each job as input. For example:
The above example assumes that your job is set up to use an argument to
+specify the row or column to be used by your software.
+
+
+
2C. Need N to start at 1
+
+
If your input files are numbered 1 - 100 instead of 0 - 99, or your matrix
+row starts with 1 instead of 0, you can perform basic arithmetic in the submit
+file:
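A sketch of that arithmetic, using HTCondor's $INT() macro function to evaluate the sum as an integer. Passing the result through arguments and using queue 100 are assumptions made for illustration.

```
# Shift the 0-based $(Process) value so numbering starts at 1
plusone = $(Process) + 1
NewProcess = $INT(plusone,%d)

arguments = $(NewProcess)

queue 100
```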
Then use $(NewProcess) anywhere in the submit file that you would
+have otherwise used $(Process). Note that there is nothing special about the
+names plusone and NewProcess, you can use any names you want as variables.
+
+
+
3. Submit multiple jobs with one or more distinct variables per job
+
+
Think about what’s different between each job that needs to be submitted.
+Will each job use a different input file or combination of software parameters? Do
+some of the jobs need more memory or disk space? Do you want to use a different
+software or script on a common set of input files? Using queue <var> from <list>
+in your submit files can make that possible! <var> can be a single user-defined
+variable or comma-separated list of variables to be used anywhere in the submit file.
+<list> is a plain text file that defines <var> for each individual job to be submitted in the batch.
+
+
Suppose you need to run a program called compare_states on
+the following set of input files: illinois.data, nebraska.data, and
+wisconsin.data, where each input file can be analyzed as a separate job.
+
+
To create a submit file that will submit all three jobs, first create a
+text file that lists each .data file (one file per line).
+This step can be performed directly on the login node, for example:
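One way to build the list is sketched below. The scratch directory and the empty placeholder files exist only so the sketch can run anywhere. On the login node you would run just the ls line, in the directory that already holds your .data files.

```shell
#!/bin/bash

# Set up a scratch directory with three empty placeholder .data files
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch illinois.data nebraska.data wisconsin.data

# Write one .data file name per line into states.txt
ls *.data > states.txt

# Show the resulting list
cat states.txt
```

The resulting states.txt contains illinois.data, nebraska.data, and wisconsin.data, one per line.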
Then, in the submit file, following the pattern queue <var> from <list>,
+replace <var> with a variable name like state and replace <list>
+with the list of .data files saved in states.txt:
+
+
queue state from states.txt
+
+
+
For each line in states.txt, HTCondor will submit a job and the variable
+$(state) can be used anywhere in the submit file to represent the name of the .data file
+to be used by that job. For the first job, $(state) will be illinois.data, for the
+second job $(state) will be nebraska.data, and so on. For example:
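A sketch of a submit file that uses $(state) in this way. It assumes that compare_states takes the name of its input file as its only argument, and it omits the remaining submit details.

```
executable = compare_states
arguments = $(state)
transfer_input_files = $(state)

# ... remaining submit details ...

queue state from states.txt
```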
Let’s imagine that each state .data file contains data spanning several
+years and that each job needs to analyze a specific year of data. Then
+the states.txt file can be modified to specify this information:
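As an illustration, a modified states.txt could pair each file name with a year. The years below are made-up placeholders, not values from a real data set.

```
illinois.data, 1995
illinois.data, 2005
nebraska.data, 1999
wisconsin.data, 2000
```

The submit file then names two variables in the queue statement. This sketch assumes compare_states accepts the file name and the year as its two arguments.

```
arguments = $(state) $(year)
transfer_input_files = $(state)

queue state,year from states.txt
```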
4A. Submitting Multiple Jobs in Different Directories with queue <variable> from list
+
+
One way to organize jobs is to assign each job to its own directory,
+instead of putting files in the same directory with unique names. To
+continue our "compare_states" example, suppose there's a directory
+for each state you want to analyze, and each of those directories has
+its own input file named input.data:
+
+
[user@state-analysis]$ ls -F
+compare_states illinois/ nebraska/ wisconsin/
+
+[user@state-analysis]$ ls -F illinois/
+input.data
+
+[user@state-analysis]$ ls -F nebraska/
+input.data
+
+[user@state-analysis]$ ls -F wisconsin/
+input.data
+
+
+
The HTCondor submit file attribute initialdir can be used
+to define a specific directory from which each job in the batch will be
+submitted. The default initialdir location is the directory from which the
+command condor_submit myjob.sub is executed.
+
+
Combining queue <var> from <list> with initialdir, each line of <list> will contain
+the path to a state directory, and initialdir will be set to this path for
+each job:
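A sketch of such a submit file. It assumes a text file named state-dirs.txt that lists illinois, nebraska, and wisconsin, one per line. The variable name state_dir is a placeholder.

```
executable = compare_states
initialdir = $(state_dir)
transfer_input_files = input.data

# ... remaining submit details ...

queue state_dir from state-dirs.txt
```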
Notice that executable = compare_states has remained unchanged in the above example.
+When using initialdir, only the input and output file paths (including the HTCondor log, error, and
+output files) will be changed by initialdir.
+
+
In this example, HTCondor will create a job for each directory in state-dirs.txt and use
+that state's directory as the initialdir from which the job will be submitted.
+Therefore, transfer_input_files = input.data can be used without specifying
+the path to this input.data file. Any output generated by the job will then be returned to the initialdir
+location.
+
+
+
4B. Submitting Multiple Jobs in Different Directories with queue <directory> matching *
+
+
This section demonstrates how to submit multiple jobs, using a specific
+directory structure where folder names have a string of text in common. It is relevant to anyone who wants to submit multiple jobs, where each job has its own directory for input/output files on the submit server.
+
+
Directory Structure
+For a single batch of jobs, the directories will look like this:
You'll want to put all your code and files required for every job in
+shared/ and individual input files in the individual job directories
+in an input folder. In the submit file below, it matters that the
+individual job directories start with the word "job". Your directories should all have a string of text in common, so that you can use the queue <directory> matching <commonString>* syntax to queue a job for each directory.
+
+
+
Note: the job directories need to be hosted in your /home directory
+on the submit node. The following instructions will not work for files
+hosted on /staging!
+
+
+
Submit File
+Your submit file, which should go in your main project directory, should
+look like this:
+
+
# Specify your executable (single binary or a script that runs several
+# commands) and arguments
+executable = run_code.sh
+# arguments = arguments to your script go here
+#
+# Specify the desired name of the HTCondor log file,
+# and the desired name of the standard error and standard output file.
+log = process.log
+error = process.err
+output = process.out
+#
+# Specify that HTCondor should transfer files to and from the
+# computer where each job runs.
+should_transfer_files = YES
+# Set the submission directory for each job with the $(directory)
+# variable (set below in the queue statement). Then transfer all
+# files in the shared directory, and from the input folder in the
+# submission directory
+initialdir = $(directory)
+transfer_input_files = ../shared/,input/
+#
+# Tell HTCondor what amount of compute resources
+# each job will need on the computer where it runs.
+request_cpus = 1
+request_memory = 1GB
+request_disk = 1GB
+#
+# Create a job for each "job" directory.
+queue directory matching job*
+
+
+
Note that the final line matches the pattern of your directory names that you previously
+created. You can use a different name for the
+directories (like data, sample, or seed), but you should use whatever word
+the directories have in common in the final queue statement in place of "job".
By default, CHTC-managed submit servers automatically add a job
+requirement that requires jobs to run on servers running our primary operating system unless otherwise specified by the user. There are two options to override this
+default:
Using a container to provide a base version of Linux will allow you to
+run on any nodes in the HTC system, and not limit you to a subset of nodes.
+
+
After finding a container with the desired version of Linux, just follow our instructions
+for Docker or Singularity/Apptainer jobs.
+
+
Note that the default Linux containers on Docker Hub are often missing commonly installed
+packages. Our collaborators in OSG Services maintain a few curated containers with a
+greater selection of installed tools that
+can be seen here: Base Linux Containers
+
+
Option 2: Requesting a Specific Operating System
+
+
At any time, you can require a specific operating system
+version (or versions) for your jobs. This option is more limiting because
+you are restricted to operating systems used by CHTC, and the number of nodes
+running that operating system.
+
+
Require CentOS Stream 8 (previous default) or CentOS Stream 9
+
+
To request that your jobs run on servers with CentOS 8 only, add the
+following line to your submit file:
+
+
chtc_want_el8 = true
+
+
+
To request that your jobs run on servers with CentOS 9 only, add
+the following line to your submit file:
+
+
chtc_want_el9 = true
+
+
+
+
Note that after May 1, 2024, CentOS Stream 9 will be the default and CentOS Stream 8 will be phased out
+by the middle of summer 2024. If you think your code relies on CentOS Stream 8, make sure to
+see our transition guide or talk to the facilitation
+team about a long-term strategy for running your work.
+
+
+
Use Both CentOS Stream 8 (previous default) and CentOS Stream 9 (current default)
+
+
To request that your jobs run on computers running either version of
+CentOS Linux, add the following requirements line to your submit file:
Note: these requirements are not necessary for jobs that use Docker containers;
+these jobs will run on servers with any operating system automatically.
+
+
+
The advantage of this option is that you may be able to access a
+larger number of computers in CHTC. Note that code compiled on a
+newer version of Linux may not run on older versions of Linux. Make
+sure to test your jobs specifically on both CentOS Stream 8 and CentOS Stream 9
+before using the option above.
+
+
Does your job already have a requirements statement? If so, you can
+add the requirements above to the pre-existing requirements by using
+the characters &&. For example, if your jobs already require large
+data staging:
+
+
requirements = (Target.HasCHTCStaging == true)
+
+
+
You can add the requirements for using both operating system versions like so:
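As a sketch of how the two conditions combine with &&, assume the operating system requirement is written with the HTCondor machine attribute OpSysMajorVer. The exact expression CHTC recommends may differ, so treat this line as illustrative.

```
requirements = (Target.HasCHTCStaging == true) && ((OpSysMajorVer == 8) || (OpSysMajorVer == 9))
```

The extra parentheses around the operating system clause keep the || from being applied to the staging requirement.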
The HTC system has two primary locations where users can store files: /home and /staging.
+
+
The mechanisms behind /home and /staging that manage data are different and are optimized to handle different file sizes. /home is more efficient at managing small files, while /staging is more efficient at managing larger files. It’s important to place your files in the correct location, as it will improve the speed and efficiency at which your data is handled and will help maintain the stability of the HTC filesystem.
+
+
Understand your file sizes
+
To know whether a file should be placed in /home or in /staging, you will need to know its file size (also known as the amount of “disk space” a file uses). There are many commands to print out your file sizes, but here are a few of our favorites:
+
+
Use ls with -lh flags
+
The command ls stands for “list” and, by default, lists the files in your current directory. The flag -l stands for “long” and -h stands for “human-readable”. When the flags are combined and passed to the ls command, it prints out the long metadata associated with the files and converts values such as file sizes into human-readable formats (instead of a computer readable format).
+
+
NetID@submit$ ls -lh
+
+
+
Use du -h
+
Similar to ls -lh, du -h prints out the “disk usage” of directories in a human-readable format.
+
+
NetID@submit$ du -h
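The size check can also be scripted. The helper below is a hypothetical sketch: the function name suggest_location is invented, and the 1 GB cutoff is an assumption for illustration, not an official CHTC threshold.

```shell
#!/bin/bash

# Hypothetical helper: suggest a storage location from a file's size.
# The 1 GB cutoff (1073741824 bytes) is an assumption for this sketch.
suggest_location() {
    local file="$1" limit_bytes=1073741824 size_bytes
    size_bytes=$(wc -c < "$file")
    if [ "$size_bytes" -lt "$limit_bytes" ]; then
        echo "/home"
    else
        echo "/staging"
    fi
}

# Example with a tiny temporary file
demo_file=$(mktemp)
printf 'small example\n' > "$demo_file"
suggest_location "$demo_file"
rm -f "$demo_file"
```

Adjust the cutoff to match the guidance you receive from the CHTC Research Computing Facilitators.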
+
+
+
Transferring Data to Jobs
+
The HTCondor submit file transfer_input_files = line should always be used to tell HTCondor what files to transfer to each job, regardless of whether that file originates from your /home or /staging directory. However, the syntax you use to tell HTCondor to fetch files from /home and /staging and transfer them to your running job will change:
When a job completes, by default, HTCondor will return newly created or edited files on the top level directory back to your /home directory.
+
+
To transfer files or folders back to /staging, in your HTCondor submit file, use
+transfer_output_remaps = "output1.txt = file:///staging/NetID/output1.txt", where output1.txt is the name of the output file or folder you would like transferred back to a /staging directory.
+
+
If you have more than one file or folder to transfer back to /staging, use a semicolon (;) to separate multiple files for HTCondor to transfer back like so:
+transfer_output_remaps = "output1.txt = file:///staging/NetID/output1.txt; output2.txt = file:///staging/NetID/output2.txt"
+
+
Make sure to only include one set of quotation marks that wraps around the information you are feeding to transfer_output_remaps =.
All CHTC machines have a base installation of Python 3.
+The exact versions and packages installed, however, can vary from machine to machine.
+You should be able to include simple Python commands in your calculations, e.g., python3 simple-script.py.
+
+
If you need a specific version of Python 3 or would like to install your own packages, we recommend that you use a container as described above.
+
+
The example recipes provided above for building your own container are intended for python packages that can be installed using python3 -m pip install.
+Additional software can be installed when building your own container.
+
+
For packages that need to be installed with conda install, see the guide on Conda.
+
+
Executable
+
+
When using a container, you can use a python .py script as the submit file executable, provided that the first line (the “shebang”) in the .py file is
+
+
#!/usr/bin/env python3
+
+
+
with the rest of the file containing the commands that you want to run using Python.
+
+
Alternatively, you can use a bash .sh script as the submit file executable, and in that file you can use the python3 command:
+
+
#!/bin/bash
+
+python3 my-script.py
+
+
+
In this case, remember to include your .py file in the transfer_input_files line of your submit file.
To request a change in quota(s) for data storage locations on CHTC systems, please fill out the form below.
+This form applies to the following locations for both individual and shared (group) directories:
+
+
+
+
+
Location
+
Purpose
+
More Information
+
+
+
+
+
HTC /home
+
For files less than 1 GB for jobs on the HTC system
For other locations, please email us at chtc@cs.wisc.edu.
+Remember, CHTC data locations are not for long-term storage and are NOT backed up.
+Please review our data policies on the Policies and Expectations for Users page.
+
+
How to Check Your Quotas
+
+
The form asks for the current quotas of the folders you wish to change.
+For individual directories, your quotas are printed on login.
+For group directories at HTC /staging, HPC /home, HPC /scratch, you can retrieve your quotas using the command
+
+
get_quotas /path/to/group/directory
+
+
+
Quota Request Form
+
+
The following link leads to a Qualtrics form that we use for requesting quota changes.
If you do not receive an automated email from chtc@cs.wisc.edu within a few hours of completing the form,
+ OR if you do not receive a response from a human within two business days (M-F), please email chtc@cs.wisc.edu.
No CHTC machine has R pre-installed, so you must configure a portable copy of R to work on the HTC system.
+Using a container as described above is the easiest way to accomplish this.
+
+
Executable
+
+
When using a container, you can use a .R script as the submit file executable, provided that the first line (the “shebang”) in the .R file is
+
+
#!/usr/bin/env Rscript
+
+
+
with the rest of the file containing the commands that you want to run using R.
+
+
Alternatively, you can use a bash .sh script as the submit file executable, and in that file you can use the Rscript command:
+
+
#!/bin/bash
+
+Rscript my-script.R
+
+
+
In this case, remember to include your .R file in the transfer_input_files line of your submit file.
This guide provides an introduction to running jobs outside of CHTC: why
+using these resources is beneficial, what resources are available, and
+how to use them.
Running on other resources in addition to CHTC has one huge benefit:
+size! The UW Grid and OSG include thousands of computers, in
+addition to what's already available in CHTC, including specialized
+hardware resources like GPUs. Most CHTC users who run
+on CHTC, the UW Grid, and the OSG can get more than 100,000 computer
+hours (more than 11 years of computing!) in a single day. Read on to
+learn more about these resources.
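The "more than 11 years" figure above can be checked with a quick calculation. This is a minimal sketch that assumes a 365-day year and a single core running continuously; the function name is illustrative only.

```python
HOURS_PER_YEAR = 24 * 365  # assumption: ignore leap days


def hours_to_years(hours: float) -> float:
    """Convert compute hours into years of continuous computing on one core."""
    return hours / HOURS_PER_YEAR


years = hours_to_years(100_000)
print(f"{years:.1f} years")
```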
+
+
+
+
A. UW Grid
+
+
What we call the "UW Grid" is a collection of all the groups and
+centers on campus that run their own high throughput computing pool that
+uses HTCondor. Some of these groups include departments (Biochemistry,
+Statistics) or large physics projects (IceCube, CMS). Through agreements
+with these groups, jobs submitted in CHTC can opt into running on these
+other campus pools if there is space.
+
+
We call sending jobs to other pools on campus flocking.
+
+
+
+
B. UW-Madison’s OSG Pool
+
+
CHTC maintains an OSG pool for the campus community, which includes
+resources contributed by campuses, national labs, and other institutions
+across and beyond the US.
+
+
When you send jobs to other institutions in our OSG pool, we call that gliding.
+
+
+
+
2. Job Qualifications
+
+
Not all jobs will run well outside of CHTC. Because these jobs are
+running all over the campus or country, on computers that don't belong
+to us, they have two major requirements:
+
+
+
+
Moderate Data Sizes: We can support input file sizes of up to
+20 GB per file per job. This covers input files that would normally be
+transferred out of a /home directory or use SQUID, in addition to larger
+files up to 20GB. Outputs per job can be of similar sizes. If your input or
+output files are larger than 1GB, or you have any other questions about
+handling data on resources beyond CHTC, please contact us!
+
+
+
Short or interruptible jobs: Your job can complete in under 10 hours
+-- either it finishes in that amount of time, or it
+self-checkpoints at least that frequently. If you would like to implement
+self-checkpointing for a longer code, we are happy to provide resources
+and guidance.
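The two requirements above can be written as a quick self-check. This is a sketch only: the function and constant names are hypothetical, and the thresholds are simply the 20 GB and 10 hour figures quoted in this guide, not values enforced by any CHTC tool.

```python
MAX_FILE_GB = 20  # per-file input/output limit quoted above
MAX_HOURS = 10    # run time (or checkpoint interval) quoted above


def can_run_beyond_chtc(largest_file_gb, runtime_hours, checkpoint_every_hours=None):
    """Return True if a job fits both guidelines for running outside CHTC.

    checkpoint_every_hours is None for jobs that do not self-checkpoint.
    """
    data_ok = largest_file_gb <= MAX_FILE_GB
    short_enough = runtime_hours < MAX_HOURS
    checkpoints_ok = (
        checkpoint_every_hours is not None
        and checkpoint_every_hours <= MAX_HOURS
    )
    return data_ok and (short_enough or checkpoints_ok)


print(can_run_beyond_chtc(largest_file_gb=2, runtime_hours=6))
print(can_run_beyond_chtc(largest_file_gb=2, runtime_hours=30))
print(can_run_beyond_chtc(largest_file_gb=2, runtime_hours=30, checkpoint_every_hours=4))
```

A job that passes this check may still be worth discussing with CHTC staff, for example when files are larger than 1 GB.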
+
+
+
+
+
+
3. Submitting Jobs to Run Beyond CHTC
+
+
If your jobs meet the characteristics above and you would like to use
+either the UW Grid or OS Pool to run jobs, in addition to CHTC, you can add
+the following to your submit file:
+
+
+
+
+
+WantFlocking = true
+
Also send jobs to other HTCondor Pools on campus (UW Grid). Good for jobs that are less than ~8 hours, on average, or checkpointing jobs.
+
+
+
+WantGlideIn = true
+
Also send jobs to the OS Pool. Good for jobs that are less than ~8 hours, on average, or checkpointing jobs.
+
+
+
+
+
To guarantee maximum efficiency, please do the following steps
+whenever submitting a new type of job to the UW Grid or OSG:
+
+
+
+
Test Your Jobs: You should run a small test (anywhere from
+10-100 jobs) outside CHTC before submitting your full workflow. To
+do this, take a job submission that you know runs successfully on
+CHTC. Then add the following options in the submit file and submit the
+test jobs:
+
+
requirements = (Poolname =!= "CHTC")
+
+
+
(If your submit file already has a requirements = line, you can
+append the Poolname requirement by using a double ampersand
+(&&) and then the additional requirement.)
Scaling Up: Once you have tested your jobs and they seem to be
+running successfully, you are ready to submit a full batch of jobs
+that uses CHTC and the UW Grid/OS Pool. REMOVE the Poolname
+requirement from the test jobs but leave the +WantFlocking and
++WantGlideIn lines.
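The test-then-scale-up steps above can be sketched as a small helper that prints the extra submit file lines for each phase. The function name and the `OpSysMajorVer == 8` requirement are hypothetical, chosen only for illustration; the `+WantFlocking`, `+WantGlideIn`, and `Poolname` lines are the ones given in this guide.

```python
def beyond_chtc_options(testing, existing_requirements=None):
    """Return the submit file lines for running beyond CHTC.

    testing=True adds the Poolname requirement so test jobs run only
    outside CHTC; testing=False leaves it out for the full batch.
    """
    lines = ["+WantFlocking = true", "+WantGlideIn = true"]
    pool_clause = '(Poolname =!= "CHTC")'
    if testing and existing_requirements:
        # Append to an existing expression with a double ampersand.
        lines.append(f"requirements = ({existing_requirements}) && {pool_clause}")
    elif testing:
        lines.append(f"requirements = {pool_clause}")
    elif existing_requirements:
        lines.append(f"requirements = {existing_requirements}")
    return lines


# Test phase: a hypothetical job that already has a requirements line.
print("\n".join(beyond_chtc_options(True, "OpSysMajorVer == 8")))
# Full batch: the Poolname clause is gone, the opt-in lines remain.
print("\n".join(beyond_chtc_options(False, "OpSysMajorVer == 8")))
```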
Software that is packaged in a "container" can
+be run on the HPC cluster. This guide assumes that you are starting with
+an existing Docker container and shows how to use it to run a job on the HPC cluster.
+
+
Note that you may need to install a version of MPI to your container
+when it is initially created. See the notes about this below.
+
+
The two steps to run a container on the HPC cluster:
+Notes about MPI and Containers
+==================
+
+
There are two ways to run a Singularity container integrated with MPI: hybrid
+mode and bind mode.
+
+
In hybrid mode, the container has its own copy of MPI that is compatible
+with a version of MPI already installed on the cluster.
+
+
In bind mode, the code in the container has been compiled with MPI that
+exists outside the container and there is no MPI installation in the container itself.
+Again, the version of MPI used needs to be compatible with one already installed
+on the cluster.
We assume that there is a Docker container (either found
+or created by you) online that you want to use. To use this container
+on the HPC cluster, it needs to be converted to a Singularity-format
+image file. To do this:
+
+
+
Log in to one of the HPC cluster login nodes.
+
Start an interactive job:
+
[alice@login]$ srun -n4 -N1 -p int --pty bash
+
+
+
Once the interactive job starts, you’ll need to unset a shell environment
+variable that prevents download of the Docker container.
+
[alice@int]$ unset HTTPS_PROXY
+
+
+
Then, save the Docker container to a Singularity image.
+
This command will, by default, pull the initial Docker container from
+Docker Hub. If your Docker container is stored elsewhere, or you are
+starting with a Singularity image, contact CHTC staff for specific instructions.
+
+
+
Once the Singularity command completes, type exit to leave the interactive job.
+
+
+
+
+
2. Using Singularity Container Images
+
+
To use a Singularity container in a job, the SLURM submit file will remain mostly the
+same; what will change is the job’s primary command at the end of the
+file. This command will run your primary program inside the container
+file you've downloaded. The main MPI command will still be part of the
+singularity command:
For example, if Alice wanted to run a script she had written
+(poisson.py) inside the downloaded fenics container, using 40 cores, she would use the
+following command at the end of her submit file:
The example shown above uses the “hybrid” model for running MPI, which assumes
+that there is a copy of MPI installed in the container that matches what already
+exists on the cluster.
+
+
If your container does not have its own copy of MPI installed, you need
+to use the “bind” model for running MPI, which requires an additional flag and
+the location of the main MPI directory:
On CHTC’s cluster, the GCC-based version of OpenMPI is installed at the path:
+`/software/chtc/easybuild/v2/software/OpenMPI/4.0.5-GCC-9.3.0/`
+So the command(s) to run the “Alice” example above would be:
More details on the differences between the “hybrid” and “bind” models
+for MPI and Singularity are here: https://sylabs.io/guides/3.8/user-guide/mpi.html
+In order to run jobs on the High Throughput Computing (HTC) system, researchers need to set up their software on the system.
+This guide introduces how to build software in a container (our recommended strategy), links to a repository with a selection of software installation “recipes”, and quick links to common software packages and their installation recommendations.
Click the link in the table below to jump to the instructions for the language/program/software that you want to use.
+More information is provided in the CHTC Recipes Repository and Containers sections.
+
+Conda
+Java
+Julia
+Matlab
+Python
+R
+
Quickstart: Conda
+
+
Option A (recommended)
+
+
Build a container with Conda packages installed inside:
This approach may be sensitive to the operating system of the execution point.
+We recommend building a container instead, but are keeping these instructions as a backup.
CHTC provides specific examples for software and workflows for use on our systems in our “Recipes” repository on GitHub:
+https://github.com/CHTC/recipes.
+
+
Links to specific recipes are used in the Software section for certain software packages and programming languages.
+
+
+
+
Containers
+
+
Many of the recipes in our Recipes repository involve building your own container.
+In this section, we provide a brief introduction to using containers to set up your own software to run on the High Throughput system.
+
+
What is a Container?
+
+
“A container is a standard unit of software that packages up code and all its dependencies so the application runs quickly and reliably from one computing environment to another.”
+– Docker
+
+
A container is a portable, self-contained operating system and can be easily executed on different computers regardless of their operating systems or programs.
+When building the container you can choose the operating system you want to use, and can install programs as if you were the owner of the computer.
+
+
+
+
As an analogy, you could consider a container to be like a camping backpack. Every time you plan to use it, you will need a standard set of gear, which you could pre-pack. Other items, like maps, food, or fuel would depend on where you’re going, but you would still have access to the standard gear.
+
+
In the same way, you build a container image by installing your software and any additional dependencies. Jobs that use containers can differ in their tasks or data, but they still have access to the installed software and environment.
+
+
+
+
While there are some caveats, containers are useful for deploying software on shared computing systems like CHTC, where you do not have permission to install programs directly.
+
+
“You can build a container using Apptainer on your laptop, and then run it on many of the largest HPC clusters in the world, local university or company clusters, a single server, in the cloud, or on a workstation down the hall.”
+– Apptainer
+
+
+
What is a Container Image?
+
+
A “container image” is the persistent, on-disk copy of the container.
+When we talk about building or moving or distributing a container, we’re actually talking about the file(s) that constitute the container.
+When a container is “running” or “executed”, the container image is used to create the run time environment for executing the programs installed inside of it.
+
+
+
Container Technologies
+
+
There are two container technologies supported by CHTC: Docker and Apptainer.
+Here we briefly discuss the advantages of each.
Docker is a commercial container technology for building and distributing containers.
+Docker provides a platform for distributing containers, called Docker Hub.
+Docker Hub can make it easy to share containers with colleagues without having to worry about the minutiae of moving files around.
+
+
On the HTC system, you can provide the name of your Docker Hub container in your submit file,
+and HTCondor will automatically pull (download) the container and use it to create the software environment for executing your job.
+Unfortunately, however, you are unable to build a Docker container and upload it to Docker Hub from CHTC servers,
+so your container must already exist on Docker Hub in a public repository.
+This requires that you have Docker installed on your computer so that you can build the container and upload it to Docker Hub.
Apptainer is an open-source container technology for building containers.
+Apptainer creates a single, stand-alone file that is the container image.
+As long as you have the container image file, you can use Apptainer to run your container.
+
+
On the HTC system, you can provide the name of your Apptainer file in your submit file,
+and HTCondor will use a copy of it to create the software environment for executing your job.
+You can use Apptainer to build the container image file on CHTC servers, so there is no need to install the container software on your own computer.
+
+
Use an Existing Container
+
+
If you or a colleague have already built a container for use on CHTC, it is fairly straightforward to modify your jobs to use the container environment as discussed below.
If the container you want to use is hosted on Docker Hub, find the container “address” and provide it in your submit file.
+The address typically has the convention of user/repository:tag, though official repositories such as Python are just repository:tag.
+In your submit file, use
+
+
container_image = docker://user/repository:tag
+
+
+
If the container you want to use is hosted in a different container registry, there should still be a container “address” to use,
+but now there will be a website prefix.
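The address conventions above can be illustrated with a small formatter. This is a sketch under stated assumptions: the function name is hypothetical, `registry.example.org` is a placeholder rather than a real registry, and the assumption that another registry's hostname goes directly after `docker://` should be checked against that registry's documentation.

```python
def container_image_line(repository, tag, user=None, registry=None):
    """Build a container_image submit file line from the address parts.

    user is omitted for official Docker Hub repositories; registry is
    the website prefix for images hosted outside Docker Hub.
    """
    parts = [part for part in (registry, user, repository) if part]
    return f"container_image = docker://{'/'.join(parts)}:{tag}"


# Docker Hub, user/repository:tag
print(container_image_line("repository", "tag", user="user"))
# Docker Hub official repository, repository:tag
print(container_image_line("python", "3.11"))
# Placeholder registry with a website prefix (assumed form)
print(container_image_line("repository", "tag", user="user", registry="registry.example.org"))
```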
For historical reasons, the Apptainer container file has the file extension .sif.
+The syntax for giving HTCondor the name of the container file depends on where it is located on the CHTC system.
You can build your own container with the operating system and software that you want to use.
+The general process is the same whether you are using Docker or Apptainer.
+
+
+
+
Consult your software’s documentation
+
+
Determine the requirements for installing the software you want to use.
+In particular you are looking for (a) the operating systems it is compatible with and (b) the prerequisite libraries or packages.
+
+
+
Choose a base container
+
+
The base container should at minimum use an operating system compatible with your software.
+Ideally the container you choose also has many of the prerequisite libraries/programs already installed.
+
+
+
Create your own definition file
+
+
The definition file contains the installation commands needed to set up your software.
+(The structure of the container “definition” file differs between Docker and Apptainer, but it is fairly straightforward to translate between the two.)
+
+
+
Build the container
+
+
Once the definition file has been written, you must “build” the container.
+The computer you use to build the container will run through the installation commands, almost as if you were actually installing the software on that computer,
+but will save the results into the container file(s) for later use.
+
+
+
Distribute the container
+
+
To use the container on CHTC servers, you’ll need to distribute the container to the right location.
+For Docker containers, this means “pushing” the container to Docker Hub or similar container registry.
+For Apptainer containers, this typically means copying the container .sif file to the /staging system.
A common question is whether the software installation process is repeated each time a container is used.
+The answer is “no”.
+The software installation process only occurs when the container is actually being built.
+Once the container has been built, no changes can be made to the container when being used (on CHTC systems).
+
+
+
Build your own Docker container
+
+
Please follow the instructions in our guide Build a Docker Container Image to build your own container using Docker.
+As mentioned above, you will need to have Docker installed on your own computer.
+This is so that you can push the completed container to Docker Hub.
+
+
You are unable to push containers from CHTC to Docker Hub, so please do not build Docker containers using CHTC!
+
+
Build your own Apptainer container
+
+
Please follow the instructions in our guide Use Apptainer Containers to build your own container using Apptainer.
+You can use CHTC servers to build the container, so there is no need to install any software on your computer.
The contents of the guide previously at this page are not currently
+supported in CHTC, although there are plans to re-integrate them in the
+future. For questions about running Tensorflow in CHTC, email CHTC's
+Research Computing Facilitators at chtc@cs.wisc.edu
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
diff --git a/preview-calendar/uw-research-computing/testing-and-scaling-up.md b/preview-calendar/uw-research-computing/testing-and-scaling-up.md
new file mode 100644
index 000000000..e58332ea6
--- /dev/null
+++ b/preview-calendar/uw-research-computing/testing-and-scaling-up.md
@@ -0,0 +1,131 @@
+
+[title]: - "Optimizing HTCondor Submit File Resource Requests"
+
+[TOC]
+
+# Overview
+
+Much of HTCondor's HTC power comes from the ability to run a large number of jobs simultaneously.
+To optimize your work with a high-throughput computing (HTC)
+approach, you will need to test and optimize the resource requests of those jobs by
+requesting only the amount of memory, disk, and CPUs truly needed.
+This is an important practice that will maximize your throughput by optimizing the
+number of potential 'slots' that your jobs can match to, reducing the overall
+turnaround time for completing a whole batch.
+
+If you have questions or are unsure if and how your work can be broken up, please contact us at
+[chtc@cs.wisc.edu](mailto:chtc@cs.wisc.edu).
+
+# Requesting the Number of CPUs, Memory, and Disk Space for the HTCondor Submit File
+
+In the HTCondor submit file, you must explicitly request the number of
+CPUs (i.e. cores), and the amount of disk and memory that the job needs
+to complete successfully.
+When you submit a job for the
+first time you may not know just how much to request and that's okay.
+Below are some suggestions for making resource requests for initial test
+jobs. **As always, reviewing the HTCondor `log` file from past jobs is
+a great way to learn about the resource needs of your jobs.**
+
+**Requesting CPU Cores**
+
+- For **requesting CPU cores, start by requesting a single CPU**. With single-CPU jobs, you will see
+your jobs start sooner. Ultimately you will be able to achieve
+greater throughput with single-CPU jobs compared to jobs that request
+and use multiple CPUs.
+
+ - **Keep in mind, requesting more CPU cores for a job
+ does not mean that your jobs will use more cpus.** Rather, you want to make sure
+ that your CPU request matches the number of cores (i.e. 'threads' or 'processes')
+ that you expect your software to use. (Most software uses only 1 CPU core by default.)
+
+ - There is limited support for multicore work in our high throughput system. For large-scale multicore jobs, contact a Research Computing Facilitator at [chtc@cs.wisc.edu](mailto:chtc@cs.wisc.edu).
+
+**Requesting Disk Space**
+
+- To inform initial disk requests always look at the size of your input
+files. At a minimum, you need to request enough disk to support all
+of the input files, executable, and the output you expect, but don't forget that the standard 'error' and 'output'
+files you specify will capture 'terminal' output that may add up, too.
+
+ - If many of your input and output files are compressed
+(i.e. zipped or tarballs) you will need to factor that into your
+estimates for disk usage as these files will take up additional space once uncompressed
+in the job.
+
+ - For your initial tests it is okay to request more disk than
+your job may need so that the test completes successfully. **The key
+is to adjust disk requests for subsequent jobs based on the results
+of these test jobs.**
+
+**Requesting Memory**
+
+- Estimating **memory requests** can sometimes be tricky. If you've performed the
+same or similar work on another computer, consider using the amount of
+memory (i.e. RAM) from that computer as a starting point. For instance,
+most laptop computers these days will have 8 or 16 GB of memory, which is okay to start
+with if you know a single job will succeed on your laptop.
+
+ - For your initial tests it is okay to request more memory than
+your job may need so that the test completes successfully. **The key
+is to adjust memory requests for subsequent jobs based on the results
+of these test jobs.** To fine tune your requests, make sure to run test jobs - see below for a recommended process.
+
+**Importance of Test Jobs**
+
+- Once you have run a small number of test jobs, **review the bottom of the HTCondor `log` files from your test jobs to see how many cpus and how much memory and disk space were used.** HTCondor will report
+the memory, disk, and cpu usage of your jobs in a table at the *bottom* of this file. You can use these values to inform the parameters for future jobs. For example, the bottom of a `.log` file may look like this:
+
+ Partitionable Resources : Usage Request Allocated
+ Cpus : 1 1 1
+ Disk (KB) : 860878 1048576 1808522
+ IoHeavy : 0
+ Memory (MB) : 960 1024 1024
+
+*Memory is listed in units of megabytes (MB) and disk usage is listed in units of kilobytes (KB). A quick Google search yields many calculators to help convert between different computing size measurements.*
+
+
+# Always Start With Test Jobs
+
+Submitting test jobs is an important first step for optimizing
+the resource requests of your jobs. We always recommend the following approach whether this is your first time
+using HTC or you are an experienced user starting a new workflow:
+
+**Step 1: Submit a single test job**
+ - Use a single test job to confirm the job was completed successfully and the results are what you expected.
+
+**Step 2: Submit a few (3-10) test jobs using a single submit file**
+ - Once you have a single test job that completes successfully, the next
+ step is to submit a small batch of test jobs (e.g. 3 - 10 jobs)
+ [**using a single submit file**](https://chtc.cs.wisc.edu/uw-research-computing/multiple-jobs). Use this small-scale
+ multi-job submission test to ensure that all jobs complete successfully, produce the
+ desired output, and do not conflict with each other when submitted together. Additionally, running test jobs provides an opportunity to review the `.log` files after each submission to optimize resource requests for future submissions as described above.
+
+**Step 3: Scale up**
+ - If your workflow requires submission of 500 jobs or fewer, proceed with submitting your entire batch of jobs. If you plan to submit
+ more than 500 jobs, we recommend submitting an intermediate test of 100-1,000 jobs to catch any
+ failures or holds that may mean your jobs have additional `requirements` they may need to specify
+ (and which CHTC staff can help you to identify, based upon your tests).
+
+Some general tips for test jobs:
+
+- Select smaller data sets or subsets of data for your first test jobs. Using
+smaller data will keep the resource needs of your jobs low which will help get
+test jobs to start, and complete, sooner, when you're just making sure that your submit file
+and other logistical aspects of job submission are as you want them.
+
+- If possible, submit test jobs that will reproduce results you've gotten
+using another system; this makes for a good "sanity check", as you'll be able
+to compare the results of the test to those previously obtained.
+
+- Give your test jobs, and associated HTCondor `log`, `error`, `output`,
+and `submit` files meaningful names so you know which results refer to which tests.
+
+- After initial tests complete successfully, scale up to larger or full-size
+data sets; **if your jobs may span a range of input file sizes, submit tests using the smallest
+and largest inputs to examine the range of resources that these jobs may need.**
+
+# Get Help
+
+For assistance or questions, please email the CHTC team at [chtc@cs.wisc.edu](mailto:chtc@cs.wisc.edu).
+
diff --git a/preview-calendar/uw-research-computing/testing-jobs.html b/preview-calendar/uw-research-computing/testing-jobs.html
new file mode 100644
index 000000000..7debc1a47
--- /dev/null
+++ b/preview-calendar/uw-research-computing/testing-jobs.html
@@ -0,0 +1,541 @@
+
+
+
+
+
+
+Importance of Testing
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
Running your code in test jobs before submitting large job batches is
+CRUCIAL to effective high-throughput computing.
+
+
Why Test?
+
+
Improving your own throughput
+
+
Spending the time and effort to run test jobs will pay off in more
+effective high-throughput computing in the following ways:
+
+
+
Better matching: Pinpointing the required amount of memory/disk
+space will allow your jobs to match to as many computers as
+possible. Requesting excessive amounts of memory and disk space will
+limit the number of slots where your jobs can run, as well as using
+unnecessary space that could be available for other users. That
+said...
+
Fewer holds or evictions: If you don't request *enough*
+memory or disk and your job exceeds its request, it can go on hold.
+Jobs that have not been tested and run over 72 hours are liable to
+be evicted without finishing.
+
Fewer wasted compute hours: the evictions and holds described
+above will be wasted compute hours, decreasing your priority in the
+high-throughput system while not returning any results.
+
Making good choices: knowing how big and how long your jobs are,
+and the size of input/output files will show you how to most
+effectively use CHTC resources. Jobs under 2 hrs or so? Allow your
+jobs to flock and glide to the UW Grid and OS Pool. Input
+files of more than 5 GB? You should probably be using the CHTC large
+file staging area. Longer jobs? Include a line in your submit file
+restricting your jobs to the CHTC servers that guarantee 72 hours.
+
+
+
Being a good citizen
+
+
CHTC's high-throughput system has hundreds to thousands of users,
+meaning that poor computing practices by one user can impact many other
+users. Users who submit jobs that don't finish or are evicted because
+of incorrect memory requests are using hours that could have been used
+by other people. In the worst case, untested code can cause other jobs
+running on an execute server to be evicted, directly harming someone
+else's research process. The best practices listed in these guides
+exist for a reason. Testing your code and job submissions to make sure
+they abide by CHTC recommendations will not only benefit your own
+throughput but make sure that everyone else is also getting a fair share
+of the resource.
+
+
What to Test
+
+
When running test jobs, you want to pay attention to at least the
+following five variables:
+
+
+
disk space
+
memory usage
+
length of job
+
input file size
+
output file size
+
+
+
Memory and disk space simply make sure that your jobs have the resources
+they need to run properly. Memory is the amount of RAM needed by your
+program when it executes; disk space is how much hard drive space is
+required to store your data, executables, and any output files.
+
+
Job length has a huge impact on where your jobs can run. Within a subset
+of CHTC servers, jobs are guaranteed to run for 72 hours. Jobs that run
+for longer than 72 hours will fail, unless they have implemented a
+self-checkpointing method that allows them to resume after being
+evicted. Jobs that are shorter, around 2-4 hours, are good candidates to
+run on the UW Grid and/or OS Pool.
+
+
Input and output file size will impact how your files will be
+transferred to and from the execute nodes. Large input files will need
+to be staged on a proxy server or shared file system; small input files
+can use HTCondor's built-in file transfer system. If you have questions
+about how to handle your data, please email
+chtc@cs.wisc.edu to get in touch with a research
+computing facilitator who can advise you.
+
+
In addition to these considerations, your script/program itself should
+be thoroughly tested on your own machine until it is as bug-free and
+correct as possible. If it uses any libraries or packages, you should
+know what they are and if they have any other dependencies.
+
+
How to Test
+
+
Interactive Jobs
+
+
One of the most useful tools for testing is HTCondor's interactive job
+feature. An interactive job is essentially a job without an executable;
+you are the one running the commands instead, through a bash shell
+session.
+
+
To request an interactive job:
+
+
+
+
Create a submit file as if you were submitting the job normally,
+with one change. Don't include an executable line; instead, list
+your executable file in the transfer_input_files line.
Submitting job(s).
+1 job(s) submitted to cluster 4347054.
+Waiting for job to start...
+
+
+
After a few minutes, the job should match and open an interactive
+session on an execute server, with all the files you listed in
+transfer_input_files. You are now on an execute server, much like
+one your jobs will be running on when you submit them to HTCondor.
+Here, you can try running your executable.
+
+
Once you are done, you can type exit to leave the interactive
+session. Note that any files you created during the session will
+be transferred back with you! Another useful tool can be to save
+your history to a file, using the following command:
+
+
$ history > history.txt
+
+
+
+
+
Scale testing
+
+
Once you know that your code works and you can successfully submit one
+job to be run by HTCondor, you should test a few jobs before submitting
+the full-size batch. After these few jobs complete, pay attention to the
+variables described above (memory, disk space, etc.) so you can edit
+your submit files before submitting your entire batch of jobs.
+
+
To find information about memory, disk space and time, look at a job's
+log file. Its name and where it is located may vary, depending on your
+submit process, but once you find it, you should see information like
+this:
+
+
001 (845638.000.000) 03/12 12:48:06 Job executing on host: <128.104.58.85:49163>
+...
+005 (845638.000.000) 03/12 12:48:06 Job terminated.
+ (1) Normal termination (return value 0)
+ Usr 0 00:00:00, Sys 0 00:00:00 - Run Remote Usage
+ Usr 0 00:00:00, Sys 0 00:00:00 - Run Local Usage
+ Usr 0 00:00:00, Sys 0 00:00:00 - Total Remote Usage
+ Usr 0 00:00:00, Sys 0 00:00:00 - Total Local Usage
+ 17 - Run Bytes Sent By Job
+ 92 - Run Bytes Received By Job
+ 17 - Total Bytes Sent By Job
+ 92 - Total Bytes Received By Job
+ Partitionable Resources : Usage Request Allocated
+ Cpus : 1 1
+ Disk (KB) : 12 1000000 26703078
+ Memory (MB) : 0 1000 1000
+
+
+
The table at the end of the log file shows how many resources you used
+and can be used to fine-tune your requests for memory and disk. If you
+didn't keep track yourself, the log file also lists when the job
+started to execute, and when it ended, thus the length of time required
+for completion.
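Reading these values by eye works for a few jobs; for a larger test batch, the same information can be pulled out with a short script. The sketch below parses an excerpt of the log shown above. It makes several assumptions that should be checked against your own logs: rows with only two numbers are treated as Request and Allocated (Usage left blank), timestamps use the `MM/DD HH:MM:SS` form shown here, and the year (absent from this log format) is supplied by the caller. The function names are illustrative.

```python
import re
from datetime import datetime

LOG = """\
001 (845638.000.000) 03/12 12:48:06 Job executing on host: <128.104.58.85:49163>
...
005 (845638.000.000) 03/12 12:48:06 Job terminated.
    Partitionable Resources :    Usage  Request Allocated
       Cpus                 :                 1         1
       Disk (KB)            :       12  1000000  26703078
       Memory (MB)          :        0     1000      1000
"""

ROW = re.compile(r"^\s*(Cpus|Disk \(KB\)|Memory \(MB\))\s*:\s*(.*)$")
EVENT = re.compile(r"^(001|005) \([\d.]+\) (\d\d/\d\d \d\d:\d\d:\d\d)", re.M)


def parse_resources(log_text):
    """Return {resource: {usage, request, allocated}} from the final table."""
    table = {}
    for line in log_text.splitlines():
        match = ROW.match(line)
        if not match:
            continue
        values = [float(v) for v in match.group(2).split()]
        if len(values) == 2:  # assumption: the Usage column was left blank
            values = [None] + values
        table[match.group(1)] = dict(zip(("usage", "request", "allocated"), values))
    return table


def run_time(log_text, year=2024):
    """Time between the last 'executing' (001) and 'terminated' (005) events."""
    stamps = {}
    for code, stamp in EVENT.findall(log_text):
        stamps[code] = datetime.strptime(f"{year}/{stamp}", "%Y/%m/%d %H:%M:%S")
    return stamps["005"] - stamps["001"]


resources = parse_resources(LOG)
for name, row in resources.items():
    print(name, row)
print("run time:", run_time(LOG))
```

Comparing `usage` against `request` for each row shows where a request can be lowered, or needs to be raised, before the full batch is submitted.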
Thank you for your account request! You should get an email with “[CHTC Requests]” in the subject line within the next
+few minutes and a research
+computing facilitator should be in touch within a business day.
+
+
If you don’t get an email with the “CHTC Requests” subject or hear from a research computing facilitator within a
+day, please check your spam folder; if they aren’t in spam,
+please contact us via the email address at the bottom of this page to confirm that we received your account request.
UW-Madison provides shared data storage for research called ResearchDrive. It
+is possible to transfer files directly between ResearchDrive and CHTC’s systems. The
+instructions in this guide may also work for accessing other data services on campus from CHTC; contact us if you
+would like to know more.
+
+
A. Pre-Requisites
+
+
In order to follow the steps in this guide, you need access to a ResearchDrive share, either as PI or member of your PI’s group, as well as a CHTC account. In what follows,
+we assume that you are transferring files to and from our HTC system, but you can
+use the same process to transfer files to and from the HPC cluster if you first log
+in to one of the HPC login nodes.
+
+
B. Transferring Files
+
+
To transfer data between ResearchDrive and CHTC, do the following:
+
+
+
Log in:
+
+
If you are transferring files to or from a /staging directory, log in to transfer.chtc.wisc.edu.
+
If you are transferring files to or from your /home directory, log into your usual submit server (typically ap2001.chtc.wisc.edu or ap2002.chtc.wisc.edu).
+
+
+
Choose a folder: Navigate to the folder in CHTC (/staging or /home), where you would like to transfer files.
+
Connect to ResearchDrive: Run the following command to connect to ResearchDrive, filling in the username of
+your PI:
+
If your CHTC account is not tied to your campus NetID or you are accessing a data
+storage service that doesn’t use your NetID, you’ll need to omit the -k flag above.
+
+
+
Choose a folder, part 2: If you type ls now, you’ll see the files in ResearchDrive, not CHTC.
+Navigate through ResearchDrive (using cd) until you are at the folder where you would
+like to get or put files.
+
Move files: To move files, you will use the get and put commands:
+
+
To move files from CHTC to ResearchDrive, run:
+
smb: \> put filename
+
+
+
To move files from ResearchDrive to CHTC, run:
+
smb: \> get filename
+
+
+
+
+
Finish: Once you are done moving files, you can type exit to leave the connection to ResearchDrive.
+
+
+
Transferring a Batch of Files
+
+
The steps described above work well for transferring a single file, or a tar archive of
+many files, at a time and are best for transferring a few files in a session. However,
+smbclient also provides options for transferring many individual files in a single command
+using the * wildcard character.
+
+
To transfer multiple files at once, first you must turn off the smbclient file transfer prompt,
+then use either mget or mput for your file transfer. For example, if you have multiple fastq.gz files
+to transfer to CHTC:
+
+
+
Log in:
+
+
If you are transferring files to or from a /staging directory, log in to transfer.chtc.wisc.edu.
+
If you are transferring files to or from your /home directory, log into your usual submit server (typically ap2001.chtc.wisc.edu or ap2002.chtc.wisc.edu).
+
+
+
Choose a folder: Navigate to the folder in CHTC (/staging or /home) where you would like to put the files.
+
Connect to ResearchDrive: Run the following command to connect to ResearchDrive, filling in the username of
+your PI:
+
Use mget instead of get
+ This command downloads a group of files that end with “fastq.gz” to CHTC.
+
smb: \> mget *.fastq.gz
+
+
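The batch download above can also be sketched as a single non-interactive command. This is a minimal dry run under stated assumptions: //SERVER/SHARE is a placeholder (not a real address), authentication options are left out, and the script only prints the command instead of connecting. It relies on smbclient's -c option, which takes a semicolon-separated list of commands, and on the prompt command, which toggles the per-file confirmation.

```shell
#!/bin/sh
# Dry run: prints the smbclient command instead of connecting.
# //SERVER/SHARE is a placeholder; use the connection details from the
# "Connect to ResearchDrive" step, including any login options you need.
share="//SERVER/SHARE"

# "prompt" toggles the per-file confirmation off;
# "mget" then downloads every file matching the wildcard.
smb_commands='prompt; mget *.fastq.gz'

batch_cmd="smbclient ${share} -c '${smb_commands}'"
echo "$batch_cmd"
```

In an interactive session, the equivalent is to type prompt at the smb: \> prompt before running mget or mput.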
+
+
+
As another example, use smbclient to transfer multiple tar.gz output files to ResearchDrive from CHTC
+after your jobs complete:
+
+
+
Log in:
+
+
If you are transferring files to or from a /staging directory, log in to transfer.chtc.wisc.edu.
+
If you are transferring files to or from your /home directory, log into your usual submit server (typically ap2001.chtc.wisc.edu or ap2002.chtc.wisc.edu).
+
+
+
Choose a folder: Navigate to the folder in CHTC (/staging or /home) where your output files are located.
+
Connect to ResearchDrive: Run the following command to connect to ResearchDrive, filling in the username of
+your PI:
+
To transfer files to and from CHTC, you will need the same username and
+hostname information you use for logging in, as well as an understanding of
+where your files are and where you would like them to go.
+
+
+
+
A. On the command line
+
+
On Mac, Linux, or modern Windows (10+) systems, you can use the "Terminal" application and
+the scp command to copy files between your computer and CHTC servers.
+
+
Your computer to CHTC
+
+
First, open the "Terminal" application and navigate to the directory
+with the files or directories you want to transfer. Then, use this
+command to copy these files to CHTC:
+
+
$ scp file username@hostname:/home/username
+
+
+
If you would like these files to end up in a different directory inside
+your home directory, just add it to the path at the end of the command.
+
+
CHTC to your computer
+
+
Open the "Terminal" application. Do NOT log into CHTC. Instead,
+navigate to where you want the files to go on your computer. Then, use
+this command to copy these files there:
+
+
$ scp username@hostname:/home/username/file ./
+
+
+
Again, for many files, it will be easiest to create a compressed tarball
+(.tar.gz file) of your files and transfer that instead of each file
+individually.
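
As an illustration of the two scp directions described above, the sketch below builds an upload command that targets a subdirectory of the home directory and a download command that copies a whole directory with -r. The username alice, the subdirectory project/input, and the results directory are hypothetical, and the script only prints the commands (a dry run), so nothing is transferred.

```shell
#!/bin/sh
# Hypothetical values for illustration; substitute your own.
user="alice"                    # your CHTC username
host="ap2001.chtc.wisc.edu"     # your usual submit server
dest_dir="project/input"        # assumed to already exist under /home/$user

# The destination is the home directory plus the subdirectory.
remote_path="/home/${user}/${dest_dir}"

# Upload: run from the directory on your computer that holds the file.
upload_cmd="scp data_files.tar.gz ${user}@${host}:${remote_path}/"

# Download: -r copies a directory and everything inside it.
download_cmd="scp -r ${user}@${host}:${remote_path}/results ./"

# Dry run: print the commands rather than executing them.
echo "$upload_cmd"
echo "$download_cmd"
```

Remove the echo lines and run the printed commands directly once the paths match your own account.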
+
+
+
+
B. Using a file transfer program (Windows/Mac)
+
+
Windows and Mac users can also use special programs to help them
+transfer files between their computers and CHTC. For Windows, we
+recommend WinSCP. It requires the
+same information as Putty (hostname, username), and once it's set up,
+looks like this:
+
+
+
+
The left window is a directory on your computer, the right window is
+your home directory in CHTC. To move files between the two, simply drag
+and drop.
+
+
There are other programs besides WinSCP that do this. Another that works
+on Mac and Windows is called Cyberduck.
+
+
+
+
C. Transferring Multiple Files
+
+
If you are transferring many files, it is advantageous to combine them
+into a single compressed file, which makes the transfer simpler.
+Place all the files you need in a directory, and then either zip it or
+use the "tar" command to compress it:
+
+
$ tar czf data_files.tar.gz file_directory/
+
+
+
To untar or unzip files on the submit server or head nodes, you can use
+either:
+
+
[alice@submit]$ tar xzf data_files.tar.gz
+
+
+
or
+
+
[alice@submit]$ unzip data_files.zip
+
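
The tar commands above can be tried end to end with the following sketch, which creates a small example directory, archives it, lists the archive contents, and extracts it elsewhere to confirm that nothing was lost. The file names and the unpacked directory are hypothetical, and everything runs in a temporary directory.

```shell
#!/bin/sh
set -eu
workdir=$(mktemp -d)
cd "$workdir"

# Create a small example directory (hypothetical file names).
mkdir file_directory
printf 'alpha\n' > file_directory/a.txt
printf 'beta\n' > file_directory/b.txt

# c = create, z = compress with gzip, f = name of the archive file.
tar czf data_files.tar.gz file_directory/

# t = list the archive contents without extracting them.
tar tzf data_files.tar.gz

# x = extract; -C extracts into a different directory.
mkdir unpacked
tar xzf data_files.tar.gz -C unpacked

# diff -r exits with a non-zero status if the two copies differ.
diff -r file_directory unpacked/file_directory && echo "archive round trip OK"
```

Listing the archive with tar tzf before a transfer is a quick way to check that it contains what you expect.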
+
+
+
+
2. Creating and Editing Files in CHTC
+
+
Once you have logged in to a CHTC server, you can edit files from the
+command line, by using a command line file editor. Some common editing
+programs are:
+
+
+
nano
+
vi
+
emacs
+
+
+
nano is the most beginner-friendly, and emacs is the most advanced.
+This Software Carpentry
+lesson describes
+how to use nano, and there are many other resources online with
+instructions for these text editors.
+
+
Some of the file transfer programs mentioned above
+allow you to edit files on CHTC servers through the interface.
This page lists important policies and expectations for using CHTC computing and
+data services. Our goal is to support a community of users and a variety of
+research. If an individual user is taking
+action that negatively impacts our services, we reserve the right to
+deactivate their account or remove files without notice.
+
+
Access and Use
+
+
CHTC services are free to use in support of UW-Madison’s research and
+teaching mission.
+
+
Accounts are linked to individuals and should NOT be shared. We are happy to make new
+accounts for individuals or group-owned spaces for sharing files. Accounts that we
+notice being shared will be immediately disabled and a meeting with the PI
+(faculty advisor) may be necessary to reinstate the account.
+
+
For more information on the process for obtaining an account, see our
+How to Request an Account guide.
+
+
Data Policies
+
+
CHTC data locations are not backed up, and users should
+treat CHTC compute systems as temporary storage locations for active,
+currently-queued computational work. Users should remove data from CHTC
+systems upon completion of a batch of computational work and keep copies of
+all essential files in a non-CHTC location. CHTC staff reserve the right
+to delete data from any CHTC data location at any time, to preserve
+systems performance, and are not responsible for data loss or file system
+corruption, which are possible in the absence of back-ups.
+
+
CHTC is not HIPAA-compliant and users should not bring HIPAA data into
+CHTC services. If you have data security concerns or any questions about
+data security in CHTC, please get in touch!
+
+
To request a change in the quotas for a storage location, please see
+our Request a Quota Change guide.
+
+
Export Control
+
+
Users agree not to access, utilize, store, or in any way run export controlled data, information,
+programs, etc. on CHTC software, equipment, or computing resources without prior review by the
+UW-Madison Export Control Office.
+
+
Export controlled information is subject to federal government rules on handling and viewing and has
+restrictions on who and where it may be accessed. A license can be required for access by foreign
+persons and in foreign jurisdictions so it’s important to ensure that all legal requirements are
+followed.
+If you have export controlled information that you would like to use on the CHTC, or you are unsure
+if the information you have is export controlled, please contact the Export Control Office at
+exportcontrol@grad.wisc.edu for guidance.
+
+
Note: The CHTC is not compliant with Controlled Unclassified Information (CUI) requirements.
+
+
User Expectations
+
+
Because our systems are shared by many CHTC users, everyone contributes to
+helping the systems run smoothly. The following are some best practices
+to get the most out of CHTC without harming other users. Our goal
+is always to help you get your work done - if you think the following recommendations
+limit your capacity to run work, please contact us to discuss alternatives.
+
+
Never run computationally intensive tasks on the login nodes for either
+system. As a rule of thumb, anything that runs for more than a few seconds, or
+is known to use a lot of cores or memory should not be run directly, but as a job.
+Small scripts and commands (to compress data, create directories,
+etc.) that run within a few minutes on the submit server are okay,
+but their use should be minimized when possible. If you have questions about this,
+please contact the facilitation team. CHTC staff reserve the right to kill any long-running or problematic processes on the
+head nodes and/or disable user accounts that violate this policy.
+
+
Avoid unsupervised scripts on the login nodes. Automating tasks via tools like
+cron, watch, or using a workflow manager (not including HTCondor’s DAGMan) on the login node is not allowed without prior
+discussion with the facilitation or infrastructure team.
+
+
+
(HTC system specific): Since use of watch with condor_q is prohibited,
+we recommend using condor_watch_q as an alternative for live updates on your jobs
+in the queue. condor_watch_q is more efficient and will not impair system performance.
+
+
+
Test your jobs. We recommend testing a small version of your overall workflow
+before submitting your full workflow. By testing a smaller version of your jobs,
+you can determine resource requests, runtimes, and whether you may need an increase
+in your user quota. Both our HTC and HPC systems use a fair-share policy and each
+researcher has a user priority. Submitting many jobs that fail or do not produce
+the expected output will decrease your user priority without helping you complete
+your research. User priorities naturally reset over time.
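
One way to build a small test version of a workflow is to run only the first few entries of the full input list. The sketch below assumes a workflow driven by a text file with one input per line; the file names all_inputs.txt and test_inputs.txt, the sample names, and the test size of 5 are hypothetical.

```shell
#!/bin/sh
set -eu
workdir=$(mktemp -d)
cd "$workdir"

# Stand-in for a full input list: 1000 hypothetical file names, one per line.
awk 'BEGIN { for (i = 1; i <= 1000; i++) printf "sample_%d.fastq.gz\n", i }' > all_inputs.txt

# Keep only the first 5 entries for a test submission.
head -n 5 all_inputs.txt > test_inputs.txt

test_count=$(wc -l < test_inputs.txt | tr -d ' ')
total_count=$(wc -l < all_inputs.txt | tr -d ' ')
echo "Testing ${test_count} of ${total_count} inputs"
```

Submitting jobs for test_inputs.txt first gives estimates of runtime and resource use before the full list is queued.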
✓ Chemical Reaction Predictions ✓ Computer Vision & Artificial Intelligence Decision Making
+
… and much more!
+
+
Who We Are
+
+We are the University of Wisconsin-Madison’s core computational resource provider for large scale computing. UW-Madison staff, students, faculty, and external collaborators are welcome to use the Center for High Throughput Computing's (CHTC) resources to carry out their computationally-intensive research.
+
+CHTC provides a variety of resources and tools to meet the demands of the University’s research community.
+
+We provide no-cost compute resources (CPUs/GPUs, high-memory servers, etc.), as well as no-cost personalized consultations and classroom support.
+
+
+
Your Home for Research Computing is Here
+
+As a leading research institution, UW-Madison needs a leading research computing center. Established in 2006, the Center for High Throughput Computing (CHTC) aims to bring the power of High Throughput Computing (HTC) to all fields of research and to allow the future of HTC to be shaped by insight from all fields. To advance this mission, the CHTC provides researchers access to state-of-the-art High Throughput Computing and High Performance Computing systems, as well as tools for data management and specialized hardware.
+
+Beyond CHTC's compute resources, CHTC’s Research Facilitation team helps researchers of all backgrounds identify their needs for large scale computing, practice the skills needed to do their work, and implement their workflows.
+
+They offer the following services, free-of-charge:
+• Office Hours twice a week
+• Personal consultations to provide individualized guidance
+• Workshops and other informational events
+• Yearly week-long HTC summer school
+• Guest presentations for courses, seminars, and other groups
+• Email support
+
+
diff --git a/preview-calendar/veritas.html b/preview-calendar/veritas.html
new file mode 100644
index 000000000..ad1ddfa0e
--- /dev/null
+++ b/preview-calendar/veritas.html
@@ -0,0 +1,389 @@
+
+
+
+
+
+
+VERITAS and OSG explore extreme window into the universe
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ VERITAS and OSG explore extreme window into the universe
+
+
Understanding the universe has always fascinated mankind. The VERITAS Cherenkov telescope array unravels its secrets by detecting very-high-energy gamma rays from astrophysical sources.
+
+
Gamma-ray astronomy studies the most energetic form of electromagnetic radiation, with photon energies in the GeV – TeV range (gigaelectronvolt to teraelectronvolt). The Very Energetic Radiation Imaging Telescope Array System (VERITAS) uses four 12-meter diameter imaging telescopes. The system uses the Imaging Atmospheric Cherenkov Telescope (IACT) technique to observe gamma rays that cause particle showers in Earth’s upper atmosphere.
+
+
Inside the system, there are four 499-pixel photomultiplier cameras, with each pixel 0.15 degrees in diameter, which record images of the showers by detecting the Cherenkov light emitted by particles in the air shower. The field of view of each camera is 3.5 degrees.
+
+
There are currently three operating IACT arrays: H.E.S.S., MAGIC, and VERITAS. VERITAS is sensitive to very-high-energy (VHE) gamma rays in the energy range between ~80 GeV up to several tens of TeV. It is one of the most sensitive instruments in that energy band.
+
+
+
+
A. Nepomuk Otte, Ph.D., is an assistant professor in the School of Physics at the Georgia Institute of Technology. He works in the Center for Relativistic Astrophysics and is a collaborator in VERITAS.
+
+
“We use the VERITAS telescopes in Arizona to study black holes, the remnants of exploding stars, pulsars, and other objects in the sky,” says Otte. “A high-energy gamma ray has a trillion times more energy than a light particle from the sun. When such a gamma ray hits the atmosphere, it produces millions of electrons and positrons that travel faster than the speed of light through the atmosphere.”
+
+
Otte explains that these charged particles emit a bluish flash of light as they zip through the atmosphere. The VERITAS telescopes collect that light and project it onto special cameras to take an image of the particle shower.
+
+
“In our analysis software, we compare the recorded images with simulated ones to find out if a shower was produced by an actual gamma ray or a cosmic ray, which would be a background event,” says Otte. “We also have to compare our events with simulated ones to reconstruct the energy and its origin in the sky—everything we need for a full reconstruction. For our analysis, it is crucial that we properly simulate our experiment to make sense of the data.”
+
+
Otte relies on the Open Science Pool to run the simulations. “Without simulations, we are blind because the characteristics of each recorded image depend on too many parameters to be described analytically,” says Otte. “We have to repeat every step of the experiment in the computer, from the gamma-ray interaction in the atmosphere up to the point where the digitized photon detector signals are written to disk about 100 million times. That is a very time-consuming process.” Otte then compares each recorded event with simulated ones. “The simulated events that best match the recorded event tell us what the energy of the recorded event was and whether it was a gamma ray or a cosmic ray.”
+
+
VERITAS began recording data ten years ago. Over that time span, VERITAS accumulated 10,000 hours of observations on more than 100 objects. Some objects were observed for more than 300 hours. The analysis of these large data sets is sensitive to even small differences between the experiment and the simulations, which was not important when VERITAS started. Two years ago, the VERITAS collaboration reworked the simulation models to account for these small differences by including more details about the experiment itself.
+
+
“We had to rewrite large fractions of our simulation code,” says Otte. “The added detail also meant we needed more computing power. In the past, we could do our simulations on a few hundred CPUs. Now, we need a hundred times more power because we want to simulate ten times more showers than before.”
+
+
OSG gives the VERITAS collaboration the computing power they need. “Using free cycles that others are not using is almost perfect for us,” says Otte. He and his group started using the OSG in August 2016. Initially, Otte wrote an XSEDE allocation application to use Stampede, and the XSEDE experts recommended OSG as a better fit for the project. “I knew about OSG, having used it for other experiments,” says Otte, “but the shared free cycles in this case was a huge help.”
+
+
Otte says the grand challenge in their field is to look at the air showers in the atmosphere; they see very few gamma rays and just a handful of them observed for over 100 hours. At the same time, millions of background events are recorded that are cosmic rays but look very similar to gamma-ray air showers. “So, our challenge is to dig needles out of a huge haystack,” says Otte. “This has been a huge challenge for decades.”
+
+
Distinguishing gamma-ray events from background events with very high efficiency became possible only in the late 1980s, once the power of image analysis of air showers was realized. The simulations tell what features to look for in the images to suppress the background events in the analysis.
+
+
“Simulations are crucial,” says Otte. “We could not make sense of the data without them. And now with bigger data sets it has become very important to also include aspects of the telescopes that did not matter before. For example, we have now recorded events with energies of several tens of TeV. These events are extremely rare, but we have them in our data. The images of these events are so bright that a lot of the camera pixels saturate. We had not included these saturation effects in our simulations before and thus made large errors in reconstructing the energy of these events.”
+
+
After the VERITAS analysis is done, the data are combined with observations in X-ray, radio, and optical and compared with models that try to explain what happens inside the source. One of the important science drivers for VERITAS is to find the origin of cosmic rays, which is a century-old puzzle. “The remnants of supernovae are prime candidates to accelerate cosmic rays,” says Otte. “In some cases, we resolve the expanding shell in gamma rays and can directly see where the cosmic rays come from.”
+
+
Otte uses the OSG mostly for the simulations. “We don’t need these massive computing resources like other experiments might,” says Otte. “Running the simulations is a single effort that takes a lot of time and a lot of computing resources. Buying resources for such a short time would not be sustainable. The sharing concept of OSG is perfect for us. We borrow the resources, do production, have the data on disk, and then do our science for the next few years on our local computing clusters at the universities.”
+
+
Without the OSG, Otte says they would be stuck with local clusters, and that would hold them back. Another important aspect for the VERITAS collaboration is that it has groups across the nation with computing resources, but only the OSG can combine them all into one big virtual computing cluster. That makes them far more productive. “With tens of terabytes of data,” says Otte, “the grid makes things much easier.”
+
+
“With the help of the OSG, we are exploring a new and very exciting window into the most extreme objects in our universe. Like black holes and exploding stars, we study the origin of dark matter, which makes up 25 percent of the universe, and we don’t even understand what it is. We explore the evolution of the universe and can even test the fabric of space-time. VERITAS is a very versatile tool and a world-leading instrument.”
+
+
“As we pursue our research, we develop new technologies and algorithms. These find use in other areas as well. For example, the photon detector technology we use is also used in apparatus for cancer screening and diagnostics. And our algorithms can apply to and be used for other large data sets.”