SMS Backup & Restore XML Converter - smstoxml

Table of Contents
  1. Background
    1. Backing up Emojis
      1. Emoji Problem
      2. Emoji Solution
      3. Parsing Issues
    2. Viewing Backups on Firefox
  2. smstoxml
    1. Usage
    2. Download

Background

Keeping backups of your SMS messages is useful for having copies of important conversations, or for moving messages off your device to external storage. The app I chose to use was SyncTech's SMS Backup & Restore. This app backs up SMS, MMS, and call logs to local storage, Google Drive, Dropbox, or OneDrive. You can even view the XML backups in your browser with the .xsl file provided.

Note: Browsers may block the XML files because of CORS settings. In Firefox, you can allow them by setting privacy.file_unique_origin to false in about:config.

Backing up Emojis

Emoji Problem

In the app settings, you can choose to store emojis and special characters, but the option warns that it "stores invalid XML characters" and that the backup won't be usable outside the app itself. At first I was fine with that: I didn't need to back up emojis and could leave the option disabled. It wasn't long, though, before I realized that emojis carried a small something the emoji-less backups didn't convey, and that I never thought I would miss.

Trying to view a backup with emojis on Firefox led to this problem:

XML Parsing Error: reference to invalid character number

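These numbers are supposed to be UTF-16 surrogate pairs that together represent a single emoji character, but surrogate values (55296 through 57343, i.e. U+D800 to U+DFFF) are outside XML's valid character set, so a character reference to one of them is rejected.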
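To make the surrogate pairs less abstract, here is a minimal Python sketch of how two such numbers combine into one emoji. The function name combine_surrogates is mine, for illustration only, and is not part of smstoxml.

```python
def combine_surrogates(high: int, low: int) -> str:
    """Combine a UTF-16 surrogate pair into the single character it encodes."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid high/low surrogate pair")
    # Standard UTF-16 decoding: 10 bits from each half, offset by 0x10000.
    code_point = 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)
    return chr(code_point)


# The pair from the example message in this README.
emoji = combine_surrogates(55357, 56846)
print(f"U+{ord(emoji):X}")  # U+1F60E
```

Each half on its own is not a character at all, which is why the XML parser refuses the individual references.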

Emoji Solution

Well, we can escape the ampersand of the invalid XML characters so they aren't parsed as invalid characters:

<sms address="..." body="Nice! Thank you!!! &amp;#55357;&amp;#56846;..." ...

That makes the XML parser happy and lets the browser display the messages, but now we just see a bunch of numbers instead of emojis, which isn't ideal. To fix that, we can add a little JavaScript to the .xsl file the app provides, which converts those escaped characters back into emojis.

Parsing Issues

Because I wanted to make a tool to manipulate the backups, I decided to use an XML parser instead of doing a simple regex replacement. The problem with fixing parsing issues in XML is that you can't actually parse the XML to fix it. The Python libraries I tried either couldn't parse the invalid XML or just stripped out what I was trying to fix, so I had to escape the invalid characters before feeding the file through the parser.
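The escape-then-parse idea can be sketched roughly as follows. This is an illustration under assumptions, not the actual smstoxml code: it assumes the app writes decimal character references (as in the example above), uses the standard library's xml.etree.ElementTree as the parser, and the helper names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Matches decimal character references such as "&#55357;".
CHAR_REF = re.compile(r"&#(\d+);")


def is_valid_xml_char(cp: int) -> bool:
    """True if cp is allowed by the XML 1.0 Char production."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def escape_invalid_refs(raw: str) -> str:
    """Turn references to invalid characters into plain text (&#N; -> &amp;#N;)."""
    def fix(match: re.Match) -> str:
        if is_valid_xml_char(int(match.group(1))):
            return match.group(0)  # leave valid references alone
        return "&amp;" + match.group(0)[1:]
    return CHAR_REF.sub(fix, raw)


# Made-up sample in the shape of a backup file.
raw = '<smses><sms body="Nice! &#55357;&#56846;" /></smses>'
root = ET.fromstring(escape_invalid_refs(raw))
body = root[0].get("body")
print(body)  # the references survive as literal text: Nice! &#55357;&#56846;
```

Once escaped, the file parses normally and the pair is carried along as ordinary text that can be turned back into an emoji later.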

Viewing Backups on Firefox

The .xsl file tells the browser how the XML backup should be displayed, but the default one is quite bland and just throws all the messages together ungrouped. Since I was already adding JavaScript to it for the emoji parsing, I decided to go a bit further: I added an option to filter by number/contact and improved the layout.

Old sms.xml layout:

Old sms.xsl results

New sms.xml layout:

New sms.xsl results

smstoxml

I wrote a Python script called smstoxml that fixes the invalid XML characters. smstoxml can manipulate both the SMS backup file and the call backup file. In addition to converting emojis to valid XML, smstoxml can:

  • remove contacts/numbers
  • remove no-duration calls +
  • keep only certain contacts/numbers
  • shrink embedded image sizes *
  • export embedded media *

* - only for SMS file

+ - only for call file

Usage

Converting Exported XML Backup

Running smstoxml on the exported XML file converts the emojis into valid XML, which can then be viewed in a browser:

python3 smstoxml.py sms.xml sms-converted.xml

Make sure that the modified sms.xsl is in the same directory as the XML backup for viewing.

Importing Messages Back onto the Device

Before restoring, convert the valid emoji characters back into the invalid XML the app expects so that it can restore the backup correctly:

python3 smstoxml.py --revert sms-converted.xml sms-reverted.xml
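Conceptually, reverting is the escape step run backwards. The sketch below is only my guess at the idea and not necessarily how --revert is implemented. It assumes the emojis were stored as escaped decimal surrogate references, and revert_escaped_refs is a hypothetical name.

```python
import re

# Matches escaped decimal references such as "&amp;#55357;".
ESCAPED_REF = re.compile(r"&amp;#(\d+);")


def revert_escaped_refs(text: str) -> str:
    """Restore escaped surrogate references (&amp;#N; -> &#N;)."""
    def restore(match: re.Match) -> str:
        cp = int(match.group(1))
        if 0xD800 <= cp <= 0xDFFF:  # only surrogates were escaped for emojis
            return f"&#{cp};"
        return match.group(0)       # anything else stays escaped
    return ESCAPED_REF.sub(restore, text)


converted = 'body="Nice! &amp;#55357;&amp;#56846;"'
print(revert_escaped_refs(converted))  # body="Nice! &#55357;&#56846;"
```

One caveat of this approach: a message that literally contained the text of a surrogate reference would also be rewritten.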

Replacing/Normalizing Contact Numbers

Sometimes the backup stores the same phone number in several different formats across messages.

The number +1-123-555-0123 may be found in the file in these formats for John Doe:

To normalize them for easier parsing, viewing, filtering, etc.:

python3 smstoxml.py sms.xml sms-converted.xml --replace-number "John Doe" "+11235550123"
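As a rough illustration of what normalizing by contact means, the sketch below rewrites the number on every message that belongs to one contact. It assumes each sms element carries contact_name and address attributes; the sample numbers and the replace_number function are made up for this example and are not taken from smstoxml.

```python
import xml.etree.ElementTree as ET


def replace_number(root: ET.Element, contact_name: str, new_number: str) -> int:
    """Set one address on every <sms> of the given contact; return the count."""
    changed = 0
    for sms in root.iter("sms"):
        if sms.get("contact_name") == contact_name:
            sms.set("address", new_number)
            changed += 1
    return changed


# Hypothetical backup fragment with inconsistent number formats.
sample = """<smses>
  <sms address="(123) 555-0123" contact_name="John Doe" body="hi" />
  <sms address="1235550123" contact_name="John Doe" body="hello" />
  <sms address="+15550000000" contact_name="Jane Doe" body="hey" />
</smses>"""

root = ET.fromstring(sample)
count = replace_number(root, "John Doe", "+11235550123")
print(count, [sms.get("address") for sms in root.iter("sms")])
```

Matching on the contact name rather than the number sidesteps having to recognize every formatting variant.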

Removing Messages

To delete messages from certain contacts, use:

python3 smstoxml.py sms.xml sms-converted.xml --filter-contact "John Doe" --filter-contact "Jane Doe" --filter-number "+11235550123" --remove-filtered
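The removal can be pictured as dropping every message that matches any of the filters. This is a simplified sketch, not the real implementation: it assumes sms elements sit directly under the root and carry contact_name and address attributes, and remove_matching is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def remove_matching(root: ET.Element, contacts=(), numbers=()) -> int:
    """Remove <sms> children matching any contact or number; return the count."""
    removed = 0
    # Iterate over a copy so removing children does not skip elements.
    for sms in list(root.findall("sms")):
        if sms.get("contact_name") in contacts or sms.get("address") in numbers:
            root.remove(sms)
            removed += 1
    return removed


# Hypothetical backup fragment.
sample = """<smses>
  <sms address="+11235550123" contact_name="John Doe" body="a" />
  <sms address="+15550000001" contact_name="Jane Doe" body="b" />
  <sms address="+15550000002" contact_name="Someone Else" body="c" />
</smses>"""

root = ET.fromstring(sample)
removed = remove_matching(root, contacts={"Jane Doe"}, numbers={"+11235550123"})
print(removed, [sms.get("contact_name") for sms in root.iter("sms")])
```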

Extracting Media

python3 smstoxml.py sms.xml --extract-media "media.zip"
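For a sense of what the export involves, here is a minimal sketch that copies embedded media into a zip archive. It rests on assumptions about the backup format: that MMS media is stored as part elements with a ct (content type) attribute and base64 text in a data attribute. The extract_media function and the generated file names are illustrative only.

```python
import base64
import io
import mimetypes
import xml.etree.ElementTree as ET
import zipfile


def extract_media(root: ET.Element, zip_target) -> int:
    """Write every base64 'data' attribute found on <part> into a zip file."""
    count = 0
    with zipfile.ZipFile(zip_target, "w") as archive:
        for part in root.iter("part"):
            data = part.get("data")
            if not data:
                continue  # text-only parts carry no payload
            ext = mimetypes.guess_extension(part.get("ct", "")) or ".bin"
            archive.writestr(f"media-{count:04d}{ext}", base64.b64decode(data))
            count += 1
    return count


# Hypothetical fragment; "aGVsbG8=" is base64 for b"hello", standing in for an image.
sample = """<smses><mms><parts>
  <part ct="text/plain" text="look at this" />
  <part ct="image/png" data="aGVsbG8=" />
</parts></mms></smses>"""

buffer = io.BytesIO()  # use a path such as "media.zip" to write to disk instead
exported = extract_media(ET.fromstring(sample), buffer)
print(exported)
```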

Download

smstoxml can be downloaded from GitHub.