Archiving an application’s data to a data vault takes time and effort, and it is costly. The data migration must be planned, executed, tested and documented. Decisions have to be made about which reports are required, and those reports then have to be built, documented and tested. What if there were a quicker, cheaper and better way to achieve this?
Great news – there is! Virtualization!!!
Well, Sort Of. Maybe.
Virtualization is certainly a valid technique for archiving applications and comes with many advantages. It is commonly used for production applications for cost, resiliency or efficiency reasons. Using specialized software, a virtual image of the application, its operating system and any associated database servers is created. The image is then deployed to a server running hypervisor software, which allows multiple virtual machines to operate on the same server.
For the archiving use case the same process of creating a virtual machine is followed, but typically the application’s virtual machine is only restored on demand.
Compared to a data-only archive there are many advantages, including:
- Full reporting functionality of the original system.
- Quicker archiving process.
- Proprietary data formats are readable by the native software.
So Far, So Good. What’s the Catch?
There’s always a catch. And in the case of virtualization there are many catches.
Sustainability
If an application’s data only has to be retained for a couple of years then long-term issues do not have to be considered. However, many applications are subject to years or decades of data retention, and a legal hold can extend even short retention periods. Once you are into a multi-year model you have to consider:
- How many versions of the hypervisor software will support all components of the archived application? How much support will Windows Server 2003 get in 2033 from a vendor releasing a new version of their hypervisor software?
- When you upgrade your hypervisor software how many archived applications’ virtual images will need to be restored for testing?
Periodic Testing
Continuing with the sustainability topic, in the Life Sciences industry we are obliged to demonstrate periodically that our archived data is still accessible. When data is archived to a specialized archiving system, only that archiving system has to be tested, and potentially hundreds of assets can be validated in a single action. By contrast, every virtualized system has to be restored individually to demonstrate that it is still viable, creating an additional ongoing cost for each virtualization.
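The difference in how the testing effort scales can be shown with a little arithmetic. The sketch below is illustrative only: the function names and the hour figures in the usage lines are hypothetical assumptions, not measurements, and it assumes that one test of the archiving system covers every asset stored in it.

```python
def archive_system_test_hours(num_assets: int, hours_per_system_test: float) -> float:
    """Effort when assets share one archiving system: a single test covers them all."""
    # The asset count only matters for the empty case; the effort does not grow with it.
    return hours_per_system_test if num_assets > 0 else 0.0


def virtualized_test_hours(num_assets: int, hours_per_restore: float) -> float:
    """Effort when each asset is its own virtual image: every image must be restored."""
    return num_assets * hours_per_restore


if __name__ == "__main__":
    # Hypothetical figures, chosen only to show the shape of the comparison.
    for n in (1, 10, 200):
        print(n, archive_system_test_hours(n, 40.0), virtualized_test_hours(n, 8.0))
```

With these made-up figures the single system test is the more expensive option for one asset, but the per-image restores overtake it quickly as the number of archived applications grows.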
Licensing
How long does your license for the application last? Unless you have purchased a perpetual license, you will need to obtain a valid license before the application can be run after it is restored.
You also have to consider the implications of any physical keys, such as dongles, that the application requires in order to operate.
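The licensing questions above can be captured as a simple pre-restore check. This is a minimal sketch under stated assumptions: the `ArchivedLicense` record and the `restore_blockers` function are hypothetical names invented for illustration, and it assumes the license terms and dongle status were recorded when the application was archived.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class ArchivedLicense:
    """Hypothetical record of license facts captured at archive time."""
    perpetual: bool
    expires_on: Optional[date] = None  # ignored when perpetual is True
    needs_dongle: bool = False
    dongle_available: bool = False


def restore_blockers(lic: ArchivedLicense, restore_date: date) -> List[str]:
    """Return the licensing reasons why a restore could not go ahead on restore_date."""
    blockers: List[str] = []
    if not lic.perpetual:
        # A missing expiry date is treated as a blocker because validity cannot be shown.
        if lic.expires_on is None or lic.expires_on < restore_date:
            blockers.append("license expired or expiry unknown")
    if lic.needs_dongle and not lic.dongle_available:
        blockers.append("physical key (dongle) not available")
    return blockers
```

An empty list means no licensing blocker was found. It does not mean the license terms actually permit running the application from an archive, which still has to be confirmed against the contract.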
Cybersecurity
Spinning up an application that has been archived for 10 years is the electronic version of opening Tutankhamen’s tomb (I know, I know … there is no curse. But it’s still a great analogy). The virtual image of the application will have had no security patches since it was created, so it cannot safely be restored onto the network.
Good restoration procedure is to install the hypervisor software onto a newly imaged computer, copy the virtual image to that computer, and then disconnect the computer from the network. Only then is the virtual image restored, and all access has to be done physically at the computer. Once the data retrieval is complete, the virtual image is dismounted from the hypervisor and the computer is fully reimaged to remove any trace of the virtual machine.
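Because the order of these steps is what provides the protection, it can help to treat the procedure as a checklist that refuses to be completed out of sequence. The sketch below is one possible way to model that, not a tool for any real hypervisor: the step names are labels paraphrasing the procedure above, and the `RestoreChecklist` class is a hypothetical helper.

```python
from typing import List, Optional

# Illustrative labels for the steps described above, in the required order.
RESTORE_STEPS = (
    "image_host_fresh",
    "install_hypervisor",
    "copy_image_to_host",
    "disconnect_network",
    "restore_image",
    "retrieve_data_at_console",
    "dismount_image",
    "reimage_host",
)


class RestoreChecklist:
    """Tracks completed steps and rejects any step attempted out of order."""

    def __init__(self) -> None:
        self.completed: List[str] = []

    def next_step(self) -> Optional[str]:
        """Return the next required step, or None when the procedure is finished."""
        if len(self.completed) < len(RESTORE_STEPS):
            return RESTORE_STEPS[len(self.completed)]
        return None

    def complete(self, step: str) -> None:
        """Record a step as done, raising ValueError if it is not the next one due."""
        expected = self.next_step()
        if expected is None:
            raise ValueError("all steps are already completed")
        if step != expected:
            raise ValueError(f"out of order: expected {expected!r}, got {step!r}")
        self.completed.append(step)
```

The key property is that `restore_image` cannot be recorded before `disconnect_network`, which mirrors the "only then" in the procedure.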
There is a common misconception that an application can be virtualized as an archiving method and then be accessed by the business in the same manner as when it was in production. This is simply not true – that describes a production application that is subject to normal security patching protocols.
Data Privacy Compliance
Data compliance regulations apply to data wherever it exists in your company’s landscape, and the fact that data has been archived does not exempt you from your legal obligations. These obligations can require that Personally Identifiable Information (PII) be destroyed (purged or obfuscated) upon request or after a set time, or be localized to the country of origin.
The destruction of the data might be assisted by the tools within the virtualized application, but most legacy applications will have little or no support for data privacy regulations. In that case the destruction of the data will require direct access to the restored database. Afterwards a new virtual image must be created, tested and archived, and the original virtual image must be deleted from the archive.
The localization challenge is almost insurmountable for applications holding PII from multiple countries. The application can only be restored to a single country, which could itself be a violation of PII regulations. Even once it is restored, segmenting the application into multiple instances would be a large effort requiring resources and knowledge that will no longer be available. On top of that come potential licensing issues from now running multiple instances.
In short, don’t virtualize applications with PII!!!
Compliance Training
If the application is subject to regulatory training then anyone accessing the application in years to come will also have to take the training. This requires that the training course is stored along with the archive, and the training records may have to be maintained outside of your corporate training application.
Counterpoints
Virtualization can be a valuable technique in certain circumstances. It is quicker and cheaper than standard data archives, provides full reporting capabilities and can interpret proprietary data sets.
Conclusion
In specific situations virtualization can be a valuable technique for archiving an application. However, longevity and licensing must be considered when making the decision, and there must be strong security controls on restored applications that limit access. Additionally, virtualized applications containing Personally Identifiable Information can become a significant liability.