In part 2 of this series, I identified the absence of simple standalone Linux command line tools that support PREMIS implementation. To address this gap, I decided to create Premissh, a BASH script that can automatically generate PREMIS XML for one or more input files. This post outlines my thoughts and reflections on creating this tool.
The FOSS (Free and Open-Source Software) Ethos
When creating Premissh, I developed the code to be FOSS. This meant I’d make the source code publicly accessible and distribute it free of charge. This suited the core impetus of the project, since I wanted the tool to be accessible to organisations with restricted resources (see part 2), who might not adopt it if they had to pay a fee.
I also hoped that having the tool open-source would provide transparency in how it operated, facilitating better feedback and possibly collaboration with the digital curation community. I think the ideal outcome would be a transition to a situation where the community controls the tool’s development. This would enable the project to better address contemporary digital curation challenges, as different contributors could offer different perspectives and continue developing the tool beyond the finite time I can provide. However, I am unsure whether a project like Premissh could have the impact required to gain this significant level of community support (see this Open Preservation Foundation blog). This would depend on others being interested in the project and having sufficient coding skill to support it. Nevertheless, I think the potential benefits of this open approach make it preferable, so I have tried to steer the tool in this direction, leaving room for this level of engagement in the future.

How Premissh Works
At the time of writing this blog, Premissh has a basic level of functionality. To make it open-source and natively compatible with the Linux command line, I decided to write it primarily as a BASH script. It was designed to use two third-party applications, DROID and ExifTool, which extract relevant metadata from the target file. This metadata is subsequently converted into PREMIS XML using XSLT code (processed by Saxon). Ultimately, I think this initial development was successful, as I was able to get the script working so that it could produce a conformant PREMIS XML document from a file.
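To illustrate the extract-then-transform flow described above, here is a minimal sketch. It is not Premissh's actual code: the function names (run, describe_file), the DRY_RUN switch, the stylesheet name premis.xsl and the Saxon jar name saxon-he.jar are all assumptions made for illustration. The DROID step is left out, since how Premissh invokes it isn't covered here. The sketch relies only on ExifTool's -X option (RDF/XML output) and Saxon's -s:, -xsl: and -o: command line arguments.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of an extract-then-transform pipeline.
# All names below are assumptions, not taken from Premissh itself.

SAXON_JAR="${SAXON_JAR:-saxon-he.jar}"   # assumed path to the Saxon jar
STYLESHEET="${STYLESHEET:-premis.xsl}"   # hypothetical XSLT stylesheet

# Run a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Extract metadata from one file, then transform it into PREMIS XML.
describe_file() {
    local target="$1"
    local meta="${target}.exif.xml"       # intermediate ExifTool output
    local premis="${target}.premis.xml"   # final PREMIS document

    # Step 1: ExifTool's -X option writes the metadata as RDF/XML to stdout.
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf 'exiftool -X %s > %s\n' "$target" "$meta"
    else
        exiftool -X "$target" > "$meta"
    fi

    # Step 2: Saxon applies the stylesheet to the extracted metadata.
    run java -jar "$SAXON_JAR" "-s:$meta" "-xsl:$STYLESHEET" "-o:$premis"
}
```

With DRY_RUN=1 set, calling describe_file on a file prints the two commands without running them, which is a convenient way to check the pipeline before ExifTool and Saxon are installed.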
However, due to the short amount of time I could commit to the initial development of Premissh, it currently has several limitations. For example, the tool offers no options to change which metadata elements are included in the PREMIS XML output. This could be addressed by introducing a flag system, enabling users to select options that adjust the tool’s configuration, so it can be tailored to their particular needs. Furthermore, the tool would benefit from further testing, to identify any bugs in the code and rectify them. This highlights the high level of commitment required to build a robust tool. To address this, I think a project roadmap would be useful, to confirm which areas of development should be prioritised, so that the time I commit to developing the tool has the greatest impact. Nevertheless, these limitations point to avenues for Premissh to be developed further in the future, as discussed in part 4.
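As a rough idea of what such a flag system could look like, the sketch below uses the shell's built-in getopts. None of these flags exist in Premissh today: -e, -d and -o, along with the variable names, are hypothetical choices made for illustration only.

```shell
#!/usr/bin/env bash
# Hypothetical flag system; these options are NOT existing Premissh flags.
#   -e        leave out ExifTool-derived elements
#   -d        leave out DROID-derived elements
#   -o FILE   write the PREMIS XML to FILE

parse_flags() {
    # Defaults: include everything and write to premis.xml.
    INCLUDE_EXIF=1
    INCLUDE_FORMAT=1
    OUTPUT="premis.xml"

    OPTIND=1   # reset so the function can be called more than once
    while getopts "edo:" opt; do
        case "$opt" in
            e) INCLUDE_EXIF=0 ;;
            d) INCLUDE_FORMAT=0 ;;
            o) OUTPUT="$OPTARG" ;;
            *) echo "usage: [-e] [-d] [-o FILE] file..." >&2
               return 1 ;;
        esac
    done

    # Number of arguments consumed as flags; the caller shifts by this
    # amount to reach the list of target files.
    SHIFT_BY=$((OPTIND - 1))
}
```

A calling script would run parse_flags "$@" followed by shift "$SHIFT_BY", then loop over the remaining arguments as the target files. Keeping the defaults at "include everything" means existing behaviour would be unchanged for users who pass no flags.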

Should Digital Curators Learn to Code?
A key question I considered while creating Premissh is whether digital curators should learn to code. I am currently keen to advance my own knowledge of coding, since this directly relates to my professional interest in preserving complex digital objects, which include source code. As my coding skills have developed, I’ve found this knowledge has opened up new avenues for me to solve problems with digital technology (see this blog by Ross Spencer, for more on the innovative potential of coding for archives). Premissh is a great example of this, as I would not have been able to build this tool without knowledge of Linux and BASH scripting. Therefore, I think it’s enormously beneficial to have people with coding skills within the digital curation network, to provide a deeper understanding of the technologies we work with and to find solutions for the problems we face.
However, I do not think all digital curators need to code. Learning this skill is a significant undertaking, requiring engagement with specialised technical concepts and conventions, while also keeping abreast of rapid developments in the field (see this TechCrunch article). This can be an unnecessary commitment when pre-built tools already exist that sufficiently support the preservation and management of most types of digital objects. I do think coding is a very valuable skill in digital curation, but I want the community to embrace a diverse range of skills, backgrounds and positionalities. I think this will foster more creative collaboration, as we learn from each other through our different and changing perspectives.
See part 4 for the conclusion to this blog series and my final reflections on the project.
