According to Dictionary.com, the definition for Backup[i] is “a copy or duplicate version…retained for use in the event that the original is in some way rendered unusable” and Synchronize (Sync) is defined as “to occur at the same time or coincide or agree in time.”
Information management has constantly evolved since the advent of the computer industry in the fifties, and today, it remains a vital area for every business and business continuity plan[ii].
A critical part of this is minimizing the risk of data loss, limiting its impact, and ensuring that data can be restored quickly. Some risks are unavoidable, including physical hardware failure, theft, viruses, and disasters such as fire or flood. If no copies of the data exist, recovering the business information would require a significant amount of time and considerable expense.
Back to Basics
To do a data backup is to copy files (manually or automatically) from one location to another, usually from one physical drive (the “source”) to a backup location (the “target”) ideally situated in a different physical place in a secured environment.
The source and target files are identical until the moment the source files change, which renders the target files outdated and no longer a mirror of the source. To reconcile the difference, a new backup is run, which is a one-directional synchronization process that copies the files again from source to target.
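As a minimal sketch of this one-directional process, the Python function below copies a file from the source folder to the target folder whenever the target copy is missing or older. The name `run_backup` and the use of modification times to detect changes are illustrative assumptions, not the behavior of any particular backup product.

```python
import shutil
from pathlib import Path


def run_backup(source: Path, target: Path) -> list[str]:
    """Copy new or changed files from source to target (one direction only).

    Hypothetical helper for illustration. A file counts as changed when the
    target copy is missing or has an older modification time. Returns the
    relative paths that were copied.
    """
    copied = []
    for src_file in sorted(source.rglob("*")):
        if not src_file.is_file():
            continue  # skip directories; they are created on demand below
        rel = src_file.relative_to(source)
        dst_file = target / rel
        if (not dst_file.exists()
                or dst_file.stat().st_mtime < src_file.stat().st_mtime):
            dst_file.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src_file, dst_file)  # copy2 also copies timestamps
            copied.append(rel.as_posix())
    return copied
```

Nothing is ever copied from target back to source, and files deleted from the source are left in the target, which is what distinguishes this from the two-way sync described next.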
The bi-directional synchronization of files and data (‘syncing’) copies the files to both source and target locations and reconciles any differences to ensure a constant replica of the data exists in both places.
For example, if a file is added or changed in Location1, it is copied to Location2 when the sync is run. If a newer file exists in Location2, it is copied to Location1. Similarly, files deleted in Location1 will be deleted from Location2 and vice versa.
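The rules in this example can be sketched as a small planning function in Python. It is only an illustration under stated assumptions: `plan_sync`, the dictionaries of file names and modification times, and the `previous` set (the names present in both locations after the last sync, needed to tell a deletion apart from a new file) are all hypothetical and not taken from any specific sync tool.

```python
def plan_sync(loc1: dict[str, float], loc2: dict[str, float],
              previous: set[str]) -> list[tuple[str, str]]:
    """Plan a two-way sync; returns (action, file name) pairs.

    loc1 and loc2 map file names to modification times. previous holds
    the names that existed in both locations after the last sync.
    """
    actions = []
    for name in sorted(set(loc1) | set(loc2)):
        in1, in2 = name in loc1, name in loc2
        if in1 and in2:
            # Present in both places: the newer copy wins.
            if loc1[name] > loc2[name]:
                actions.append(("copy 1->2", name))
            elif loc2[name] > loc1[name]:
                actions.append(("copy 2->1", name))
        elif in1:
            # Only in Location1: deleted from Location2 if it was synced
            # before, otherwise it is new in Location1.
            actions.append(("delete from 1", name) if name in previous
                           else ("copy 1->2", name))
        else:
            actions.append(("delete from 2", name) if name in previous
                           else ("copy 2->1", name))
    return actions


loc1 = {"report.doc": 200.0, "new.txt": 150.0, "old.txt": 100.0}
loc2 = {"report.doc": 100.0, "notes.txt": 120.0, "photo.jpg": 300.0}
previous = {"report.doc", "old.txt", "photo.jpg"}
for action in plan_sync(loc1, loc2, previous):
    print(action)
# ('copy 1->2', 'new.txt')
# ('copy 2->1', 'notes.txt')
# ('delete from 1', 'old.txt')
# ('delete from 2', 'photo.jpg')
# ('copy 1->2', 'report.doc')
```

The sketch does not handle conflicts, such as a file that was changed in one location and deleted in the other since the last sync.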
Storage Devices and Locations
Backing up data was primarily an enterprise process until the nineties, when personal computers and mobile devices became accessible to the average person.
Computer backups started with punch cards[iii], which were replaced in the fifties by magnetic tape[iv]. Tape became the most widely used method, offering a reliable and low-cost backup solution for organizations and home users. It was the industry-standard medium because it could hold large amounts of data. Backups could be performed daily, weekly, or monthly, depending on how many tapes were available for rotation. However, this solution also had its pitfalls: running a backup or restoring the data was a slow process.
Floppy disks of all sizes were later used before the CD and DVD came to the fore. Hard drives were not considered suitable as a backup medium until the 1980s due to their large physical size, cost, and low storage capacity.
We are seeing constant development of new computers, laptops, and mobile devices, none of which have floppy drives; even CD and DVD drives are slowly being phased out. Today, backups are more commonly stored on hard drives, flash drives, company networks, and “in the Cloud[v]”.
The amount of data that needs to be backed up is a determining factor; for example, it would not be suitable to use a flash drive to back up a server, or practical to back up an entire system to an online location.
Although portable drives and devices are extremely popular, the risk with physical backup files has not changed over the years, as they can be damaged if dropped, or degrade if not stored correctly.
The four most common backup methods[vi] are:
- Full Backup
Stores a copy of all data and usually runs according to a predefined schedule. The data is compressed and the restoration process is relatively easy and straightforward. One aspect to note is that not all data changes between full backups, so multiple copies of the same unchanged data exist, taking up unnecessary storage space.
- Incremental Backup
Only files that are new or have changed since the last backup are copied, which saves on storage space and bandwidth; however, this requires more computing resources as files need to be compared before copying. The restoration process could be more challenging as specific files have to be located for recovery, and this may require searching through multiple backup sets.
Many organizations use a combination of full and incremental backups, by running a full backup over a weekend and incremental backups on the weekdays.
- Differential Backup
Stores new and changed files since the last full backup was run. For example, if the last full backup was created on Sunday and a new file added on Monday, the file will be included in each differential backup until Sunday, when the next full backup is run.
This method also runs a comparison between current and backed up files, and would require more storage space than an incremental backup.
- Virtual Full Backup
A database is used to manage the data backups. A full replica of the source data is made only once and does not need to be repeated as long as the target location is not changed or removed. The restoration process is similar to that of a full backup.
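The difference between the first three methods comes down to which point in time each one compares files against. The Python sketch below illustrates this under simplifying assumptions: `files_to_back_up` is a hypothetical function, files are represented only by name and modification time, and times are plain numbers (days since the Sunday full backup). The virtual full backup is not covered.

```python
def files_to_back_up(mtimes: dict[str, float], method: str,
                     last_full: float, last_backup: float) -> list[str]:
    """Select the files a backup run would copy.

    mtimes maps file names to modification times. last_full is the time
    of the last full backup; last_backup is the time of the most recent
    backup of any kind.
    """
    if method == "full":
        return sorted(mtimes)              # everything, changed or not
    if method == "incremental":
        cutoff = last_backup               # changes since the last backup
    elif method == "differential":
        cutoff = last_full                 # changes since the last full backup
    else:
        raise ValueError(f"unknown method: {method}")
    return sorted(name for name, t in mtimes.items() if t > cutoff)


# Full backup on Sunday (day 0), another backup on Monday night (day 1.5).
# a.txt was added on Monday (day 1), b.txt on Tuesday (day 2).
mtimes = {"unchanged.txt": -1.0, "a.txt": 1.0, "b.txt": 2.0}
print(files_to_back_up(mtimes, "incremental", last_full=0, last_backup=1.5))
# ['b.txt']
print(files_to_back_up(mtimes, "differential", last_full=0, last_backup=1.5))
# ['a.txt', 'b.txt']
print(files_to_back_up(mtimes, "full", last_full=0, last_backup=1.5))
# ['a.txt', 'b.txt', 'unchanged.txt']
```

On Tuesday the differential run picks up Monday's file again, which is why it needs more storage than the incremental run but fewer backup sets to restore from.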
Since the early 2000s, file synchronization solutions have become more popular among consumers and are also used extensively in business environments to ensure that selected data in different locations consists of identical, up-to-date files.
Syncing can be set up to run between:
- Computers/devices connected to a local area network (LAN);
- Computers/devices connected to the Internet (think of how iTunes[vii] syncs data across multiple Apple devices);
- Computers and external devices.
Syncing can be scheduled to run according to certain rules, e.g. when connected to WiFi, or to only sync at certain times.
This is an effective backup solution as only new or changed files are copied, but file syncing does have its risks.
With the Bring-Your-Own-Device (BYOD) concept[viii] becoming more fashionable, notable concerns are arising around the management and control of business data sprawled across many computers and devices that are connected to different cloud services.
People who use file sync applications such as iCloud[ix] or Dropbox[x] are exposing personal and business information. This is a significant risk[xi] for businesses where corporate information is stored online and not managed or controlled by the company’s IT department.
This is not dissimilar to how people expose information about themselves and their families on social media. Using consumer sync-and-share applications to hold live folder contents, financial records, and passwords increases the risk of identity theft and fraud[xii].
However, consumers still seem prepared to compromise personal and business security for the convenient and low cost solution of using online file syncing as their main backup medium. Unfortunately for businesses, these consumers could be their employees bringing the sync-and-share habits into the organization.
Users have much more control over their data today because of the cloud applications that enable creating, storing, and sharing of data. This comes with increased risks for businesses, which should extend strict policy measures to syncing and backing up to the cloud.
File synchronization simply copies data being created or modified to another device or location, so the disadvantage here is not being able to roll back to a point in time before the data was lost, like you would when restoring from a backup.
Also, if you delete a file by mistake and then run a sync, the other location will be updated by removing the deleted file. Luckily, some software does not automatically delete files from the second location (the “target” you are syncing to). Instead, users are warned if files exist in the synced drive but are missing from the original, allowing them to either delete the file from the synced drive or restore it to the original location.
One of the main benefits of syncing versus backup is that fewer files are copied every time you run a sync and unchanged files are not copied needlessly. This reduces the time needed for syncing compared to a backup, which makes it more practical for frequently scheduled operations.
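The safer behavior described here, warning the user rather than deleting straight away, can be sketched in Python as follows. Both function names and the `"delete"` / `"restore"` choices are hypothetical and only illustrate the idea; real sync software will present this differently.

```python
def review_missing(source: set[str], target: set[str]) -> list[str]:
    """Files that exist in the synced target but are missing from the source."""
    return sorted(target - source)


def resolve(name: str, choice: str) -> str:
    """Turn the user's decision about one flagged file into an action."""
    if choice == "delete":
        return f"delete {name} from target"
    if choice == "restore":
        return f"copy {name} from target back to source"
    raise ValueError(f"unknown choice: {choice}")


source = {"budget.xls", "plan.doc"}
target = {"budget.xls", "plan.doc", "contract.pdf"}
for name in review_missing(source, target):
    print(f"warning: {name} is missing from the original location")
    print(resolve(name, "restore"))
# warning: contract.pdf is missing from the original location
# copy contract.pdf from target back to source
```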
A multitude of backup solutions is available, and the choice usually depends on budget, security, ease of use, and time. Additional points to consider are:
- What type of data is being backed up, i.e. how sensitive is the information?
- How often will the data need to be accessed?
- How long do the backups need to last?
Online data storage is popular with consumers and small-to-medium businesses as no investment is required to build and support the infrastructure, and it may require only a small monthly cost.
Although traditional backup processes carry a higher cost for physical devices, they are best suited to large amounts of data, and entire systems can be backed up. However, the downside is that when backups are stored offsite, the data is not immediately available for recovery if required.
Online backups are available in real time and accessible from anywhere (when connected to the internet), and copies of the data are kept on servers for redundancy, so there is less risk of data loss. However, security is the biggest concern. Users and businesses need to be more cognizant of the data being sent to online locations.
Finally, whatever solution you choose, always test the backups because they are useless if corrupt or copied with errors!
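One simple way to test a backup is to compare a checksum of each source file with its backup copy. The Python sketch below assumes the backup is a plain folder that mirrors the source; `sha256_of` and `verify_backup` are illustrative names, and compressed or proprietary backup formats would need the vendor's own verification or a test restore instead.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_backup(source: Path, target: Path) -> list[str]:
    """Return relative paths whose backup copy is missing or differs."""
    problems = []
    for src_file in sorted(source.rglob("*")):
        if not src_file.is_file():
            continue
        rel = src_file.relative_to(source)
        dst_file = target / rel
        if (not dst_file.is_file()
                or sha256_of(src_file) != sha256_of(dst_file)):
            problems.append(rel.as_posix())
    return problems
```

An empty list means every source file has an identical copy in the backup. The check is only meaningful when run before the source changes again, and it does not replace an occasional full test restore.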