Confluence XML Format

Last modified by Vincent Massol on 2026/04/08 13:50

Scope

This document presents the format of the Confluence export packages. It is targeted at:

  • technical people working on Confluence migration tools or tools involving Confluence exports
  • technical people running migrations who need to deeply investigate some issues
  • people curious about the Confluence export format

Introduction

A Confluence export package is a zip file containing an attachments folder, an exportDescriptor.properties file, and an entities.xml file. It also sometimes contains a config and a plugin-data folder as well as other files which we haven't been using so far.

We know about two kinds of Confluence backup packages:

  • A space backup package is produced from the settings of a space in Confluence. Its exportDescriptor.properties file contains the name of the space selected for export. The package contains information about the exported space, as well as the notifications, pages, attachments and permissions related to this space. It doesn't contain the attachments folder if the option to leave attachments out was selected.
  • A site backup package is produced from the global administration in Confluence. It contains all the spaces, as well as users and groups, but it doesn't contain the attachments folder.
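
As a quick sanity check before a migration, the presence of these entries can be tested with a few lines of Python. This is only a sketch: it assumes that entities.xml and exportDescriptor.properties sit at the root of the zip file and that attachment files are stored under an attachments/ prefix. The function name describe_package and the fake package built at the end are ours.

```python
import io
import zipfile

def describe_package(zip_bytes: bytes) -> dict:
    """Report which of the known entries a Confluence export zip contains."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as package:
        names = package.namelist()
    return {
        # Assumption: both files sit at the root of the archive.
        "has_entities": "entities.xml" in names,
        "has_descriptor": "exportDescriptor.properties" in names,
        # Assumption: attachment files are stored under an "attachments/" prefix.
        "has_attachments": any(n.startswith("attachments/") for n in names),
    }

# Build a tiny fake package in memory to exercise the function.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as fake:
    fake.writestr("entities.xml", "<hibernate-generic/>")
    fake.writestr("exportDescriptor.properties", "# placeholder content\n")
report = describe_package(buffer.getvalue())
```

A package without attachments (a site backup, or a space backup exported without them) would report has_attachments as False, as the fake package does here.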

The entities.xml file

Overview

This is an XML 1.0 UTF-8 file that looks like a dump of the Hibernate database of Confluence. It is close to the Confluence SQL schema, but not exactly the same. The differences probably come from their Hibernate configuration.

It starts with an XML prolog, and everything is contained in a hibernate-generic root node. This node has a datetime attribute that contains the date of the export in the YYYY-MM-DD HH:mm:ss format.

<?xml version="1.0" encoding="UTF-8"?>
<hibernate-generic datetime="2013-10-14 16:05:52">

The root hibernate-generic node contains object nodes, which describe all sorts of objects representing what is in a Confluence instance.
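
Since entities.xml can be very large, it can be useful to read the export date without loading the whole file. Here is a minimal sketch using the streaming parser of the Python standard library. The function name read_export_date and the shortened sample are ours.

```python
import io
import xml.etree.ElementTree as ET
from datetime import datetime

SAMPLE = b'''<?xml version="1.0" encoding="UTF-8"?>
<hibernate-generic datetime="2013-10-14 16:05:52">
<object class="Page" package="com.atlassian.confluence.pages">
<id name="id">753689</id>
</object>
</hibernate-generic>'''

def read_export_date(stream) -> datetime:
    """Return the export date found on the hibernate-generic root node."""
    # With "start" events, the root node is reported as soon as its opening
    # tag has been read, so we can stop before parsing the rest of the file.
    for _event, element in ET.iterparse(stream, events=("start",)):
        if element.tag != "hibernate-generic":
            raise ValueError("unexpected root node: " + element.tag)
        return datetime.strptime(element.get("datetime"), "%Y-%m-%d %H:%M:%S")
    raise ValueError("empty document")

export_date = read_export_date(io.BytesIO(SAMPLE))
```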

Common concepts

Here is what an object looks like, with its usual indentation as it appears in a typical entities.xml file from Confluence server:

<object class="Page" package="com.atlassian.confluence.pages">
<id name="id">753689</id>
<property name="position"/><collection name="children" class="java.util.Collection"><element class="Page" package="com.atlassian.confluence.pages"><id name="id">753692</id>
</element>
</collection>
<property name="space" class="Space" package="com.atlassian.confluence.spaces"><id name="id">786435</id>
</property>
<property name="title"><![CDATA[privatespace Home]]></property>
<collection name="bodyContents" class="java.util.Collection"><element class="BodyContent" package="com.atlassian.confluence.core"><id name="id">819224</id>
</element>
</collection>
<property name="version">1</property>
<property name="creatorName"><![CDATA[admin]]></property>
<property name="creationDate">2013-10-14 15:37:24.463</property>
<property name="lastModifierName"><![CDATA[admin]]></property>
<property name="lastModificationDate">2013-10-14 15:37:24.463</property>
<property name="versionComment"><![CDATA[]]></property>
<property name="contentStatus"><![CDATA[current]]></property>
<collection name="comments" class="java.util.Collection"><element class="Comment" package="com.atlassian.confluence.pages"><id name="id">753690</id>
</element>
</collection>
</object>

An object has a type defined by its class and package attributes. In theory, the package attribute cannot be ignored. In practice, it can.

It has an id (defined with an id node), and properties (defined with property and collection nodes), which depend on the type of the object.

Here are the 3 node types that can appear in an object node.

id nodes

This node defines the unique identifier of the object, which we will call id in the rest of this document. It appears exactly once per object. Although it probably cannot be relied upon, we've always seen it appear as the first child node of the object. It usually has a name attribute with the value "id", except for ConfluenceUserImpl objects where it has the value "key". BucketPropertySetItem objects are an exception: they use a composite-id node instead of an id node.

property nodes

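This node defines a property with a primitive type. It can also refer to another object, in which case it has class and package attributes and contains an id node, like the element nodes described in the next section.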

The name of the property is given by the name attribute.

Dates follow the YYYY-MM-DD HH:mm:ss.xxx format.

Strings (including enum values) are (apparently always) in a CDATA section, and numbers and dates are not.
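
In practice, XML parsers hand back the text of a CDATA section like any other text, so a parser does not have to care whether a value was in a CDATA section or not. The sketch below shows this with the Python standard library, together with one possible way to parse dates. The function name parse_confluence_date is ours.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

def parse_confluence_date(text: str) -> datetime:
    """Parse a date such as '2013-10-14 15:37:24.463'."""
    # %f accepts one to six fractional digits, so milliseconds are fine.
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S.%f")

title = ET.fromstring('<property name="title"><![CDATA[privatespace Home]]></property>')
date = ET.fromstring('<property name="creationDate">2013-10-14 15:37:24.463</property>')

title_value = title.text                       # CDATA content, as plain text
date_value = parse_confluence_date(date.text)  # a datetime object
```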

collection nodes

This node defines a property with a value that is a collection of objects.

The name of the property is given by the name attribute. The Java type of the collection is given by the class attribute. It can be an interface or a concrete type. From what we have seen, collections of objects always contain object ids that are put in <id name="id"> nodes, each of them inside an element node whose class and package attributes are equal to those of the object node it points to. Said differently, collection nodes contain element nodes, each one containing exactly one id node that contains the id of an object. Collections of enum values, such as allowedOperations in DirectoryMapping objects, are different: their element nodes have an enum-class attribute and directly contain the value.
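
To make the three node types more concrete, here is a sketch that turns an object node into a plain Python dictionary. It is an illustration, not the code of any existing tool: the names reference and parse_object and the shape of the resulting dictionary are ours, and the sample is a shortened version of the Page object shown above. Ids are kept as strings because user keys are not numbers.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<object class="Page" package="com.atlassian.confluence.pages">
<id name="id">753689</id>
<property name="space" class="Space" package="com.atlassian.confluence.spaces"><id name="id">786435</id>
</property>
<property name="title"><![CDATA[privatespace Home]]></property>
<collection name="comments" class="java.util.Collection"><element class="Comment" package="com.atlassian.confluence.pages"><id name="id">753690</id>
</element>
</collection>
</object>"""

def reference(node):
    """Turn an element-like node into a (class, id) pair, or None."""
    id_node = node.find("id")
    if id_node is None:
        return None
    return (node.get("class"), id_node.text)

def parse_object(node):
    """Convert an object node into a plain dictionary."""
    parsed = {"class": node.get("class"), "package": node.get("package"),
              "id": None, "properties": {}, "collections": {}}
    for child in node:
        if child.tag == "id":
            parsed["id"] = child.text
        elif child.tag == "property":
            ref = reference(child)
            # A property either points to another object or holds a primitive value.
            parsed["properties"][child.get("name")] = ref if ref else child.text
        elif child.tag == "collection":
            parsed["collections"][child.get("name")] = [
                reference(element) for element in child.findall("element")]
    return parsed

page = parse_object(ET.fromstring(SAMPLE))
```

With this shape, page["properties"]["space"] is the pair ("Space", "786435") and page["collections"]["comments"] is a list containing the pair ("Comment", "753690").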

Warning

The type advertised by the class attribute doesn't lie: you usually cannot rely on any order. There's no guarantee a java.util.Collection is ordered. The order in which collections are output probably depends on whatever the database engine used by Confluence returned, and then on the order in which the actual implementation of java.util.Collection used by Confluence returns elements when iterated over.

For example, attachments or child documents are not (necessarily) ordered by date or by version number. When parsing a collection, if you need a certain order, you need to implement a sort method that uses dates or versions (or, more likely, a combination of both, because it happens that one of the fields is missing) stored in properties of the pointed objects.
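
Such a sort could look like the following sketch. It assumes revisions have already been parsed into dictionaries with a version number and a lastModificationDate (this shape and the name revision_sort_key are ours, and the sample values are made up). Putting objects with a missing field first is an arbitrary choice, other policies are possible.

```python
from datetime import datetime

def revision_sort_key(revision: dict):
    """Order revisions by version, then by modification date.

    Objects with a missing field are sorted first (arbitrary policy).
    """
    version = revision.get("version")
    date = revision.get("lastModificationDate")
    return (
        version is not None, version if version is not None else 0,
        date is not None, date if date is not None else datetime.min,
    )

# Made-up revisions, in the arbitrary order a collection could give them.
revisions = [
    {"id": "3", "version": 2, "lastModificationDate": datetime(2013, 10, 14, 15, 7)},
    {"id": "1", "version": None, "lastModificationDate": datetime(2013, 10, 14, 15, 5)},
    {"id": "2", "version": 1, "lastModificationDate": None},
]
ordered_ids = [r["id"] for r in sorted(revisions, key=revision_sort_key)]
```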

See for instance:

Notes on order, ids, relationship and how many things can(not) be relied upon

  • Objects appear in no particular order that could be relied upon, or so it seems. It is very well possible that an object references another that has not yet been dumped.
  • Each object has a unique id across all types (e.g. we have not seen a Page object having the same id as a User object). We don't currently rely on this in the filter module, but this is not a promise.
  • The id order cannot be relied upon. An older object can have a greater id. We believe this can happen because some import / backup restore mechanism at Confluence doesn't preserve the ids (the handling of ids might be left to the database engine, and since they are dumped in backups in no particular order, they are not created in the database in the same order as before the backup, or something like this).
  • There is some unreliable duplication in how objects declare their relationship, and in particular their parents and children, and sometimes their ancestors. Usually, everything is there but we've noticed this cannot be relied upon. Sometimes, one way is missing for some reason. For this reason, one needs to implement both ways when parsing the export package.
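
One way to cope with the last point is to merge both declarations into a single index once all objects have been read, so that neither the order of the objects nor a missing declaration matters. The sketch below assumes pages have already been parsed into dictionaries with an optional parent id and an optional list of children ids. This shape, the name build_children_index and the id 753693 are ours.

```python
from collections import defaultdict

def build_children_index(pages: dict) -> dict:
    """Map each page id to the ids of its children, merging both declarations."""
    children = defaultdict(set)
    for page_id, page in pages.items():
        # Way 1: the child declares its parent.
        if page.get("parent") is not None:
            children[page["parent"]].add(page_id)
        # Way 2: the parent declares its children.
        for child_id in page.get("children", []):
            children[page_id].add(child_id)
    return {parent: sorted(ids) for parent, ids in children.items()}

pages = {
    # The child appears before its parent: order cannot be relied upon.
    "753692": {"parent": "753689"},
    # Hypothetical page whose parent property is missing.
    "753693": {},
    "753689": {"children": ["753692", "753693"]},
}
index = build_children_index(pages)
```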

Referencing a user

Objects are usually referenced using their id. For users, we found 3 ways it is done:

  • The InternalUser id, which is a regular number
  • The ConfluenceUserImpl id (what we call the "user key"), which appears to be a hexadecimal string
  • The user name, found in the name property of ConfluenceUserImpl and of InternalUser objects
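
A parser that needs to follow all three kinds of references can keep one index per kind. The following sketch assumes objects were parsed into dictionaries with class, id and properties entries (a shape of our own). Pairing the InternalUser id 229378 with the user name 47826731 is made up for the example.

```python
def build_user_indexes(objects):
    """Index user objects by user key, by InternalUser id and by name."""
    by_key, by_internal_id, by_name = {}, {}, {}
    for obj in objects:
        if obj["class"] == "ConfluenceUserImpl":
            by_key[obj["id"]] = obj
        elif obj["class"] == "InternalUser":
            by_internal_id[obj["id"]] = obj
        else:
            continue
        name = obj["properties"].get("name")
        if name:
            # Both kinds of user objects can carry the same name.
            by_name.setdefault(name, []).append(obj)
    return by_key, by_internal_id, by_name

objects = [
    {"class": "ConfluenceUserImpl", "id": "01f7c1cc638e0d8c0163d05ca6f60124",
     "properties": {"name": "47826731"}},
    # Made-up association between this InternalUser and the user above.
    {"class": "InternalUser", "id": "229378", "properties": {"name": "47826731"}},
    {"class": "Page", "id": "753689", "properties": {"title": "privatespace Home"}},
]
by_key, by_internal_id, by_name = build_user_indexes(objects)
```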

Usual creation and modification properties

Many object types have the following properties in common. We describe them here once to avoid repetition.

  • creatorName: the username of the creator of the page. Used by older versions of Confluence. See also creator, which is used by newer versions. In the general case, you'll have to check both properties.
  • creator: the user key of the creator of the page (and not of the revision!), which you can turn into a username using the corresponding ConfluenceUserImpl object. Used by more recent versions of Confluence. See also creatorName for the property used by older versions. In the general case, you'll have to check both properties. See also lastModifier and lastModifierName for the user who created this specific revision.
  • creationDate: the creation date of the first revision of the page. See also lastModificationDate.
  • lastModificationDate: the creation date of this specific page revision
  • lastModifierName: the username of the user who created this revision. Used by older versions of Confluence. See also lastModifier, used by more recent versions of Confluence. In the general case, you'll have to check both properties. See also creator and creatorName for the user who created the first revision of this page.
  • lastModifier: the user key of the user who created this revision. Used by more recent versions of Confluence. See also lastModifierName, used by older versions of Confluence. In the general case, you'll have to check both properties. See also creator and creatorName for the user who created the first revision of this page.
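
Checking both properties can be factored into a small helper. This is a sketch under assumptions: properties are given as a dictionary in which a reference to a user is a (class, user key) pair, and users_by_key maps user keys to dictionaries with a name entry. The helper name resolve_username and the user name jdoe are made up.

```python
def resolve_username(properties: dict, prefix: str, users_by_key: dict):
    """Return the username stored under <prefix> or <prefix>Name.

    prefix is "creator" or "lastModifier".
    """
    # Newer exports: a reference to a ConfluenceUserImpl object.
    ref = properties.get(prefix)
    if ref:
        user = users_by_key.get(ref[1])
        if user and user.get("name"):
            return user["name"]
    # Older exports: the username is stored directly.
    return properties.get(prefix + "Name") or None

# Hypothetical user: the name "jdoe" does not come from a real export.
users_by_key = {"01f7c1ca483b2b1c01483b2d4f4206cc": {"name": "jdoe"}}

old_style = {"creatorName": "admin"}
new_style = {"creator": ("ConfluenceUserImpl", "01f7c1ca483b2b1c01483b2d4f4206cc")}

old_creator = resolve_username(old_style, "creator", users_by_key)
new_creator = resolve_username(new_style, "creator", users_by_key)
no_modifier = resolve_username(old_style, "lastModifier", users_by_key)
```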

Known object types

bucket.user.propertyset.BucketPropertySetItem

<object class="BucketPropertySetItem" package="bucket.user.propertyset">
<composite-id><property name="entityName" type="string"><![CDATA[CWD_admin]]></property>
<property name="entityId" type="long">0</property>
<property name="key" type="string"><![CDATA[confluence.user.runtime.recent-changes.size]]></property>
</composite-id>
<property name="type">2</property>
<property name="booleanVal">false</property>
<property name="doubleVal">0.0</property>
<property name="stringVal"/><property name="textVal"/><property name="longVal">0</property>
<property name="intVal">30</property>
<property name="dateVal"/></object>

We don't yet use these objects. See https://docs.atlassian.com/ConfluenceServer/javadoc/8.2.0-m27/bucket/user/propertyset/BucketPropertySetItem.html

com.atlassian.confluence.core.BodyContent

<object class="BodyContent" package="com.atlassian.confluence.core">
<id name="id">819222</id>
<property name="body"><![CDATA[<p>Comment on homepage of space 2</p>]]></property>
<property name="content" class="Comment" package="com.atlassian.confluence.pages"><id name="id">753687</id>
</property>
<property name="bodyType">2</property>
</object>

The content of a comment or a page.

The body property contains the content in a CDATA section, in the syntax defined in the bodyType property. Here are the body types we know about:

  • 0: this is the old Confluence wiki syntax (the default)
  • 1: this is raw character data
  • 2: this is the XHTML storage format.

See also https://docs.atlassian.com/atlassian-confluence/6.6.0/com/atlassian/confluence/core/BodyType.html

The content property refers to the object whose body this BodyContent object describes. It works like the element nodes, with the class and package attributes and an <id name="id"> child. Here are the two types of content having body contents we know about:

  • com.atlassian.confluence.pages.Comment
  • com.atlassian.confluence.pages.Page

Warning

The Confluence XHTML syntax is an XML dialect that can contain CDATA sections. Since it's already stored in a CDATA section in the body property, their trick is to add a space to the CDATA end tag: ]]> becomes ]] >. You will have to pre-process this before parsing this value.
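
This pre-processing can be as simple as the sketch below. The function name restore_cdata_ends and the sample body are ours. Note that a blind replacement would also alter a "]] >" sequence that was genuinely present in the original content, a case we assume to be rare.

```python
import xml.etree.ElementTree as ET

def restore_cdata_ends(body: str) -> str:
    """Undo the escaping of nested CDATA end markers."""
    return body.replace("]] >", "]]>")

# Made-up body, as it would come out of an XML parser reading entities.xml.
stored = "<p>Before</p><pre><![CDATA[x > 1]] ></pre>"
restored = restore_cdata_ends(stored)

# The restored body can now be parsed (wrapped in a root node, because a
# body is a sequence of nodes rather than a complete document).
tree = ET.fromstring("<root>" + restored + "</root>")
nested_text = tree.find("pre").text
```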

com.atlassian.confluence.links.OutgoingLink

<object class="OutgoingLink" package="com.atlassian.confluence.links">
<id name="id">950286</id>
<property name="destinationPageTitle"><![CDATA[space1 Home]]></property>
<property name="destinationSpaceKey"><![CDATA[SPACE1]]></property>
<property name="sourceContent" class="Page" package="com.atlassian.confluence.pages"><id name="id">753668</id>
</property>
<property name="creatorName"><![CDATA[admin]]></property>
<property name="creationDate">2013-10-14 15:34:06.814</property>
<property name="lastModifierName"><![CDATA[admin]]></property>
<property name="lastModificationDate">2013-10-14 15:34:06.814</property>
</object>

Describes an outgoing link. We haven't used those so far. OutgoingLink objects have the usual creation and modification properties.

com.atlassian.confluence.mail.notification.Notification

<object class="Notification" package="com.atlassian.confluence.mail.notification">
<id name="id">983041</id>
<property name="page" class="Page" package="com.atlassian.confluence.pages"><id name="id">753668</id>
</property>
<property name="userName"><![CDATA[admin]]></property>
<property name="creatorName"><![CDATA[admin]]></property>
<property name="creationDate">2013-10-14 15:07:38.873</property>
<property name="lastModifierName"><![CDATA[admin]]></property>
<property name="lastModificationDate">2013-10-14 15:07:38.873</property>
<property name="digest">false</property>
<property name="network">false</property>
<property name="type" enum-class="ContentTypeEnum" package="com.atlassian.confluence.search.service"/></object>

A notification ("watch") setting. We don't yet use these objects. See https://docs.atlassian.com/atlassian-confluence/5.10.8/com/atlassian/confluence/mail/notification/Notification.html

com.atlassian.confluence.pages.Attachment

<object class="Attachment" package="com.atlassian.confluence.pages">
<id name="id">884739</id>
<property name="fileName"><![CDATA[Config.xml]]></property>
<property name="contentType"><![CDATA[text/xml]]></property>
<property name="content" class="Page" package="com.atlassian.confluence.pages"><id name="id">753668</id>
</property>
<property name="creatorName"><![CDATA[admin]]></property>
<property name="creationDate">2013-10-14 15:05:29.969</property>
<property name="lastModifierName"><![CDATA[admin]]></property>
<property name="lastModificationDate">2013-10-14 15:07:38.630</property>
<property name="fileSize">308</property>
<property name="comment"/><property name="attachmentVersion">1</property>
</object>

Describes an attachment. The structure looks quite like that of Page objects. See the Page section for the following fields: navigationType, contentStatus, space. Attachment objects also have the usual creation and modification properties.

Specific fields:

  • title: the file name of the attachment in the wiki (not in the backup package). See also fileName.
  • lowerTitle: the lower case version of the title property.
  • fileName: the older name of the title property.
  • contentType: the mime type of the file.
  • content: an element-like property pointing to the content containing this attachment. See also containerContent.
  • containerContent: the former name of the content property.
  • fileSize: the size of the file, in bytes.
  • comment: the user comment attached to this version of this attachment (note: XWiki doesn't have an equivalent feature at the time this was written)
  • attachmentVersion: an increasing number giving the revision number of this attachment. It's supposed to be unique per attachment. See also version.
  • version: another name for attachmentVersion. Supposedly the new name of the property.
  • originalVersion: an element-like value pointing to the last revision of the attachment. See also originalVersionId
  • originalVersionId: a numeric version of the originalVersion property. Sometimes this property is used instead of originalVersion. It is unclear when. This property can be present and empty. In this case, it should be handled as if it were not present at all.
  • historicalVersions: the older revisions of an attachment.
  • imageDetailsDTO: ???

Information

The actual files are only there on space exports if the attachment export was not disabled. In this case, they are in the attachments folder (see the next section).

Information

In practice, some kinds of corruptions may require you to sort attachments by both the version and the modification date, and detect and remove duplicates.
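
The following sketch combines the points above: it reads each piece of information from whichever property name is present, treats empty values as absent, then sorts the revisions and removes exact duplicates. The function names, the dictionary shape, the sample values and the id 900001 are ours, and the definition of a duplicate (same name, version and date) is an assumption.

```python
def first_present(properties: dict, *names):
    """Return the first non-empty value among several property names."""
    for name in names:
        value = properties.get(name)
        if value not in (None, ""):
            return value
    return None

def normalize_attachment(properties: dict) -> dict:
    """Read an attachment using whichever property names are present."""
    original = first_present(properties, "originalVersion", "originalVersionId")
    if isinstance(original, tuple):      # element-like value: (class, id)
        original = original[1]
    return {
        "name": first_present(properties, "title", "fileName"),
        "version": int(first_present(properties, "version", "attachmentVersion") or 0),
        "date": properties.get("lastModificationDate"),
        "original": original,
    }

def sorted_unique(attachments):
    """Sort revisions by (version, date) and drop exact duplicates."""
    seen, result = set(), []
    for att in sorted(attachments, key=lambda a: (a["version"], a["date"] or "")):
        key = (att["name"], att["version"], att["date"])
        if key not in seen:
            seen.add(key)
            result.append(att)
    return result

# Made-up revisions mixing both naming styles, with one duplicate.
raw = [
    {"fileName": "Config.xml", "attachmentVersion": "2",
     "lastModificationDate": "2013-10-14 15:07:38.630", "originalVersionId": ""},
    {"title": "Config.xml", "version": "1",
     "lastModificationDate": "2013-10-14 15:05:29.969",
     "originalVersion": ("Attachment", "900001")},
    {"title": "Config.xml", "version": "1",
     "lastModificationDate": "2013-10-14 15:05:29.969",
     "originalVersion": ("Attachment", "900001")},
]
result = sorted_unique([normalize_attachment(p) for p in raw])
```

Dates are compared as strings here, which works because the date format sorts chronologically when sorted alphabetically.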

com.atlassian.confluence.pages.Page

<object class="Page" package="com.atlassian.confluence.pages">
<id name="id">753692</id>
<property name="position"/><property name="parent" class="Page" package="com.atlassian.confluence.pages"><id name="id">753689</id>
</property>
<collection name="ancestors" class="java.util.List"><element class="Page" package="com.atlassian.confluence.pages"><id name="id">753689</id>
</element>
</collection>
<property name="space" class="Space" package="com.atlassian.confluence.spaces"><id name="id">786435</id>
</property>
<property name="title"><![CDATA[Private page 1]]></property>
<collection name="bodyContents" class="java.util.Collection"><element class="BodyContent" package="com.atlassian.confluence.core"><id name="id">819226</id>
</element>
</collection>
<property name="version">1</property>
<property name="creatorName"><![CDATA[admin]]></property>
<property name="creationDate">2013-10-14 15:37:52.357</property>
<property name="lastModifierName"><![CDATA[admin]]></property>
<property name="lastModificationDate">2013-10-14 15:37:52.357</property>
<property name="versionComment"><![CDATA[]]></property>
<property name="contentStatus"><![CDATA[current]]></property>
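</object>

Fragments taken from other Page objects show what the historicalVersions and originalVersion properties look like:
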
<collection name="historicalVersions" class="java.util.Collection"><element class="Page" package="com.atlassian.confluence.pages"><id name="id">753670</id>
</element>
<element class="Page" package="com.atlassian.confluence.pages"><id name="id">753675</id>
</element>
<element class="Page" package="com.atlassian.confluence.pages"><id name="id">753676</id>
</element>
<element class="Page" package="com.atlassian.confluence.pages"><id name="id">753677</id>
</element>
<element class="Page" package="com.atlassian.confluence.pages"><id name="id">753678</id>
</element>
<element class="Page" package="com.atlassian.confluence.pages"><id name="id">753684</id>
</element>
</collection>
<property name="originalVersion" class="Page" package="com.atlassian.confluence.pages"><id name="id">753668</id>
</property>

This describes a Confluence page revision. Here are the properties we know about:

  • position: an integer giving its position in the navigation menu of the space. See https://jira.xwiki.org/browse/CONFLUENCE-261
  • ancestors: a collection of ids referring to the parents of the page up to but excluding the space: its direct parent, the direct parent of its direct parent, and so on and so forth. We have not been relying on this property.
  • space: the id of the Space object describing the space in which the Page is
  • title: the title of the page, which is supposed to be unique in the whole space
  • lowerTitle: the lowercase version of the title, also supposed to be unique in the whole space
  • bodyContents: a collection that contains the id of the object describing the content of the page. This property is a collection but we have only ever seen exactly one element in this collection. It is unclear why a collection is used here.
  • version: a number which is the revision number of the page. It is supposed to be unique across a page and its historical revisions. In practice, we've seen duplicate versions in some exports. It is not clear where they come from, probably some sort of corruption. See for instance https://jira.xwiki.org/browse/CONFLUENCE-427
  • versionComment: a comment written by the user at save time to describe this version
  • contentStatus: contains the status of this Page. Here are the known values:
    • current: the page is current
    • draft: the page is a draft. We currently discard these pages.
    • deleted: the page was deleted. We currently discard these pages.
  • originalVersion: this property is set only on historical versions of the page, and points to the last version of the page. Only older revisions have this property, the last revision doesn't have it and that's how you know a Page object describes the last revision of a Page. See also originalVersionId.
  • originalVersionId: like originalVersion, used by older Confluence versions, directly a number instead of an element-like value.
  • navigationType: ???
  • historicalVersions: an unordered collection of the older revisions of the page. Only the last revision of the page has this property.
  • children or childrens: an unordered collection of the last versions of the direct children of the page (note: it's sometimes childrens with an s at the end, sometimes children without the s)
  • attachments: an unordered collection of the attachments, including their older versions
  • comments: an unordered collection of comments
  • outgoingLinks: an unordered collection of outgoing links
  • contentPermissionSets: a collection of permission sets of type ContentPermissionSet, which are sets of permissions applying to this content

Page objects also have the usual creation and modification properties.

Information

A page that has only one revision has neither an originalVersion property nor a historicalVersions property.
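
Putting the originalVersion and historicalVersions properties together, the revisions of each page can be grouped with something like the sketch below. It assumes Page objects were parsed into dictionaries in which references are plain ids (this shape and the name group_revisions are ours), and it uses both ways of declaring the relationship, since either can be missing.

```python
def group_revisions(pages: dict) -> dict:
    """Map the id of each last revision to the ids of its older revisions."""
    groups = {}
    for page_id, page in pages.items():
        original = page.get("originalVersion") or page.get("originalVersionId")
        if original:
            # Historical revision: attach it to its last revision.
            groups.setdefault(original, set()).add(page_id)
        else:
            # Presumably a last revision: also use its historicalVersions.
            groups.setdefault(page_id, set()).update(page.get("historicalVersions", []))
    # A revision listed in a historicalVersions collection is not a last
    # revision, even if its own originalVersion property is missing.
    historical = set().union(*groups.values())
    return {last: sorted(older) for last, older in groups.items()
            if last not in historical}

pages = {
    "753670": {"originalVersion": "753668"},
    "753668": {"historicalVersions": ["753670", "753675"]},
    "753675": {},   # originalVersion missing on this historical revision
    "753692": {},   # page with a single revision
}
revisions_by_page = group_revisions(pages)
```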

Information

The id of the last revision of a page is stable. In particular, a page that gets an additional revision still keeps this id because it is always the id of its last revision. That's what we call stable id or stableId in the XWiki Confluence project.

Our understanding is that when a revision is added, it is as if the following happened:

  1. Object N of the last revision is copied to a new object M
  2. Object M now describes the former last revision
  3. Object M gets an originalVersion property pointing to object N
  4. The historicalVersions property is removed from object M
  5. M is added to the historicalVersions property of object N

Page revisions appear to have their ids changed when they stop being the last revisions (again, so the page can keep its "original" id).

This is likely why you can notice that ids in the historicalVersions property are often higher than the id of the "original" object itself. But remember: ⚠ this is not guaranteed!
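
The model described above can be expressed as a toy simulation. It only illustrates our understanding and is in no way the actual Confluence code: the store dictionary, the id generator starting at 800000 and the function add_revision are all made up.

```python
import copy
import itertools

# Stand-in for whatever generates ids in Confluence (assumption).
_next_id = itertools.count(800000)

def add_revision(store: dict, page_id: int, new_title: str) -> int:
    """Apply the five steps to store, then update the last revision."""
    current = store[page_id]                   # object N
    archived = copy.deepcopy(current)          # 1. N is copied to M
    archived_id = next(_next_id)
    archived["originalVersion"] = page_id      # 3. M points to N
    archived.pop("historicalVersions", None)   # 4. M has no history of its own
    store[archived_id] = archived              # 2. M describes the former last revision
    current.setdefault("historicalVersions", []).append(archived_id)  # 5.
    # N keeps its id and now describes the new last revision.
    current["version"] += 1
    current["title"] = new_title
    return archived_id

store = {753668: {"title": "space1 Home", "version": 1}}
first_archived = add_revision(store, 753668, "space1 Home (edited)")
```

After the call, the page still has the id 753668, while its former content lives under a new, higher id, which matches what we usually observe in exports.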

Warning

Be careful: the term original may feel backwards. The first (oldest) version of a page is not the original version. It was the original version once in its life: when it was the current revision. The original version is the current version.
It makes sense if you think of "original" as referring to the "original" object that was copied each time a revision was created, and that kept the "original" id of the page.

com.atlassian.confluence.security.ContentPermission

<object class="ContentPermission" package="com.atlassian.confluence.security">
<id name="id">1048577</id>
<property name="type"><![CDATA[View]]></property>
<property name="userName"><![CDATA[admin]]></property>
<property name="groupName"/><property name="owningSet" class="ContentPermissionSet" package="com.atlassian.confluence.security"><id name="id">1015809</id>
</property>
<property name="creatorName"/><property name="creationDate">2013-10-14 15:41:26.893</property>
<property name="lastModifierName"><![CDATA[admin]]></property>
<property name="lastModificationDate">2013-10-14 15:41:26.893</property>
</object>

Other properties:

  • type: the name of the permission (View in the example above). Note: the content permission set owning this content permission also has a type property, with the same value.
  • owningSet: an element-like value pointing to the content permission set in which this content permission is
  • userName: the name of the user to which the permission applies.
  • groupName: the name of the group to which the permission applies.

ContentPermission objects also have the usual creation and modification properties.

com.atlassian.confluence.security.ContentPermissionSet

  <object class="ContentPermissionSet" package="com.atlassian.confluence.security">
    <id name="id">67043333</id>
    <property name="type"><![CDATA[View]]></property>
    <collection name="contentPermissions" class="java.util.SortedSet">
      <element class="ContentPermission" package="com.atlassian.confluence.security">
        <id name="id">67076114</id>
      </element>
      <!-- ... cut ... -->
      <element class="ContentPermission" package="com.atlassian.confluence.security">
        <id name="id">152338667</id>
      </element>
    </collection>
    <property name="owningContent" class="Page" package="com.atlassian.confluence.pages">
      <id name="id">66719934</id>
    </property>
    <property name="creationDate">2015-01-14 10:21:23.000</property>
    <property name="lastModificationDate">2019-07-31 09:54:26.000</property>
  </object>

ContentPermissionSet objects have the usual creation and modification properties.

Other properties:

  • type: the name of the permission of all the content permissions in this set. Note: The content permissions themselves also have a type property, with the same value.
  • owningContent: an element-like property pointing to the content to which the permissions of this content permission set apply
  • contentPermissions: a collection of content permissions

com.atlassian.confluence.security.SpacePermission

  <object class="SpacePermission" package="com.atlassian.confluence.security">
    <id name="id">617742337</id>
    <property name="space" class="Space" package="com.atlassian.confluence.spaces">
      <id name="id">622593</id>
    </property>
    <property name="type"><![CDATA[COMMENT]]></property>
    <property name="group"/>
    <property name="allUsersSubject"><![CDATA[anonymous-users]]></property>
    <property name="creator" class="ConfluenceUserImpl" package="com.atlassian.confluence.user">
      <id name="key"><![CDATA[01f7c1ca483b2b1c01483b2d4f4206cc]]></id>
    </property>
    <property name="creationDate">2023-11-10 14:11:48.022</property>
    <property name="lastModifier" class="ConfluenceUserImpl" package="com.atlassian.confluence.user">
      <id name="key"><![CDATA[01f7c1ca483b2b1c01483b2d4f4206cc]]></id>
    </property>
    <property name="lastModificationDate">2023-11-10 14:11:48.022</property>
  </object>

A space permission.

Other properties:

  • type: the name of the permission
  • space: the space which the space permission applies to
  • group: the name of the group to which the permission applies. Empty if it doesn't apply to a group
  • allUsersSubject: equal to anonymous-users if the permission applies to guests
  • userSubject: an element-like value with an <id name="key"> containing the key of the user, described by a ConfluenceUserImpl object, to which the permission applies

SpacePermission objects have the usual creation and modification properties.
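
To find out who a space permission applies to, the three subject properties can be checked in turn, as in this sketch. It assumes properties are given as a dictionary in which userSubject is a (class, user key) pair, and that users_by_key maps user keys to dictionaries with a name entry. The function name permission_subject and the returned pairs are ours.

```python
def permission_subject(properties: dict, users_by_key: dict):
    """Return a (kind, name) pair describing who a SpacePermission applies to."""
    if properties.get("group"):
        return ("group", properties["group"])
    if properties.get("allUsersSubject") == "anonymous-users":
        return ("anonymous", None)
    ref = properties.get("userSubject")        # (class, user key) pair
    if ref:
        user = users_by_key.get(ref[1], {})
        return ("user", user.get("name"))
    return ("unknown", None)

users_by_key = {"01f7c1cc638e0d8c0163d05ca6f60124": {"name": "47826731"}}

guest = permission_subject(
    {"type": "COMMENT", "group": None, "allUsersSubject": "anonymous-users"},
    users_by_key)
group = permission_subject({"type": "COMMENT", "group": "confluence-users"}, users_by_key)
user = permission_subject(
    {"type": "COMMENT",
     "userSubject": ("ConfluenceUserImpl", "01f7c1cc638e0d8c0163d05ca6f60124")},
    users_by_key)
```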

com.atlassian.confluence.setup.bandana.ConfluenceBandanaRecord

<object class="ConfluenceBandanaRecord" package="com.atlassian.confluence.setup.bandana">
<id name="id">43</id>
<property name="context"><![CDATA[_GLOBAL]]></property>
<property name="key"><![CDATA[__DEFAULT_SPACE_PERMISSIONS____GROUP_NAMES__]]></property>
<property name="value"><![CDATA[<set>
  <string>confluence-users</string>
</set>]]></property>
</object>

We don't yet use these objects.

com.atlassian.confluence.spaces.Space

  <object class="Space" package="com.atlassian.confluence.spaces">
    <id name="id">622593</id>
    <property name="name"><![CDATA[Great Internal Documentation]]></property>
    <property name="key"><![CDATA[Great]]></property>
    <property name="lowerKey"><![CDATA[great]]></property>
    <property name="description" class="SpaceDescription" package="com.atlassian.confluence.spaces">
      <id name="id">589825</id>
    </property>
    <property name="homePage" class="Page" package="com.atlassian.confluence.pages">
      <id name="id">589826</id>
    </property>
    <collection name="permissions" class="java.util.Collection">
      <element class="SpacePermission" package="com.atlassian.confluence.security">
        <id name="id">1277959</id>
      </element>
      <element class="SpacePermission" package="com.atlassian.confluence.security">
        <id name="id">1277960</id>
      </element>
      <!-- ... cut ... -->
      <element class="SpacePermission" package="com.atlassian.confluence.security">
        <id name="id">617742337</id>
      </element>
    </collection>
    <collection name="pageTemplates" class="java.util.Collection">
      <element class="PageTemplate" package="com.atlassian.confluence.pages.templates">
        <id name="id">110723073</id>
      </element>
      <element class="PageTemplate" package="com.atlassian.confluence.pages.templates">
        <id name="id">199458822</id>
      </element>
    </collection>
    <property name="creator" class="ConfluenceUserImpl" package="com.atlassian.confluence.user">
      <id name="key"><![CDATA[01f7c1ca483b2b1c01483b2d4db002d4]]></id>
    </property>
    <property name="creationDate">2008-04-23 11:24:41.000</property>
    <property name="lastModifier" class="ConfluenceUserImpl" package="com.atlassian.confluence.user">
      <id name="key"><![CDATA[01f7c1ca483b2b1c01483b2d4db002d4]]></id>
    </property>
    <property name="lastModificationDate">2009-07-10 10:16:53.000</property>
    <property name="spaceType">global</property>
    <property name="spaceStatus" enum-class="SpaceStatus" package="com.atlassian.confluence.spaces">CURRENT</property>
  </object>

A space. Its known properties are:

Space objects have the usual creation and modification properties.

com.atlassian.confluence.spaces.SpaceDescription

<object class="SpaceDescription" package="com.atlassian.confluence.spaces">
<id name="id">753665</id>
<property name="space" class="Space" package="com.atlassian.confluence.spaces"><id name="id">786433</id>
</property>
<property name="title"/><collection name="bodyContents" class="java.util.Collection"><element class="BodyContent" package="com.atlassian.confluence.core"><id name="id">819201</id>
</element>
</collection>
<property name="version">1</property>
<property name="creatorName"><![CDATA[admin]]></property>
<property name="creationDate">2013-10-14 14:53:25.489</property>
<property name="lastModifierName"><![CDATA[admin]]></property>
<property name="lastModificationDate">2013-10-14 14:53:25.489</property>
<property name="versionComment"><![CDATA[]]></property>
<property name="contentStatus"><![CDATA[current]]></property>
<collection name="labellings" class="java.util.Collection"><element class="Labelling" package="com.atlassian.confluence.labels"><id name="id">720901</id>
</element>
</collection>
</object>

The description of a space. For a description of its properties, see Page (the two types of objects are very similar). SpaceDescription objects have the usual creation and modification properties.

com.atlassian.confluence.user.ConfluenceUserImpl

    <object class="ConfluenceUserImpl" package="com.atlassian.confluence.user">
        <id name="key"><![CDATA[01f7c1cc638e0d8c0163d05ca6f60124]]></id>
        <property name="name"><![CDATA[47826731]]></property>
        <property name="lowerName"><![CDATA[47826731]]></property>
        <property name="email"/>
    </object>

An object that represents a user.

Information

The id is not a number but the user key, and the name attribute of the id tag is "key".

  • name: the name of the user
  • lowerName: the lowercase version of the name of the user
  • email: the email address of the user (optional)

com.atlassian.confluence.user.persistence.dao.ConfluenceRememberMeToken

<object class="ConfluenceRememberMeToken" package="com.atlassian.confluence.user.persistence.dao">
<id name="id">393217</id>
<property name="username"><![CDATA[admin]]></property>
<property name="createdTime">1381745929067</property>
<property name="token"><![CDATA[251b5b4649888218a9c81ddf30b66029b63f83d5]]></property>
</object>

We don't use these objects.

com.atlassian.confluence.user.PersonalInformation

<object class="PersonalInformation" package="com.atlassian.confluence.user">
<id name="id">753694</id>
<property name="username"><![CDATA[user1]]></property>
<property name="title"/><property name="version">1</property>
<property name="creatorName"><![CDATA[admin]]></property>
<property name="creationDate">2013-10-14 15:42:39.535</property>
<property name="lastModifierName"><![CDATA[admin]]></property>
<property name="lastModificationDate">2013-10-14 15:42:39.535</property>
<property name="versionComment"><![CDATA[]]></property>
<property name="contentStatus"><![CDATA[current]]></property>
</object>

We don't yet use these objects (which seem to have the usual creation and modification properties). See https://docs.atlassian.com/atlassian-confluence/6.6.0/com/atlassian/confluence/user/PersonalInformation.html

com.atlassian.crowd.embedded.hibernate2.HibernateMembership

<object class="HibernateMembership" package="com.atlassian.crowd.embedded.hibernate2">
<id name="id">294915</id>
<property name="parentGroup" class="InternalGroup" package="com.atlassian.crowd.model.group"><id name="id">163842</id>
</property>
<property name="userMember" class="InternalUser" package="com.atlassian.crowd.model.user"><id name="id">229378</id>
</property>
</object>

We don't yet use these objects.

com.atlassian.crowd.model.application.DirectoryMapping

<object class="DirectoryMapping" package="com.atlassian.crowd.model.application">
<id name="id">131073</id>
<property name="application" class="ApplicationImpl" package="com.atlassian.crowd.model.application"><id name="id">65537</id>
</property>
<property name="directory" class="DirectoryImpl" package="com.atlassian.crowd.model.directory"><id name="id">98305</id>
</property>
<property name="allowAllToAuthenticate">true</property>
<collection name="allowedOperations" class="java.util.Set"><element enum-class="OperationType" package="com.atlassian.crowd.embedded.api">UPDATE_ROLE_ATTRIBUTE</element>
<element enum-class="OperationType" package="com.atlassian.crowd.embedded.api">DELETE_USER</element>
<element enum-class="OperationType" package="com.atlassian.crowd.embedded.api">UPDATE_USER_ATTRIBUTE</element>
<element enum-class="OperationType" package="com.atlassian.crowd.embedded.api">CREATE_USER</element>
<element enum-class="OperationType" package="com.atlassian.crowd.embedded.api">DELETE_ROLE</element>
<element enum-class="OperationType" package="com.atlassian.crowd.embedded.api">CREATE_ROLE</element>
<element enum-class="OperationType" package="com.atlassian.crowd.embedded.api">CREATE_GROUP</element>
<element enum-class="OperationType" package="com.atlassian.crowd.embedded.api">UPDATE_USER</element>
<element enum-class="OperationType" package="com.atlassian.crowd.embedded.api">UPDATE_GROUP_ATTRIBUTE</element>
<element enum-class="OperationType" package="com.atlassian.crowd.embedded.api">UPDATE_GROUP</element>
<element enum-class="OperationType" package="com.atlassian.crowd.embedded.api">DELETE_GROUP</element>
<element enum-class="OperationType" package="com.atlassian.crowd.embedded.api">UPDATE_ROLE</element>
</collection>
</object>

We don't use these objects.

com.atlassian.crowd.model.user.InternalUser

  <object class="InternalUser" package="com.atlassian.crowd.model.user">
    <id name="id">163842</id>
    <property name="name"><![CDATA[UserName]]></property>
    <property name="lowerName"><![CDATA[username]]></property>
    <property name="active">true</property>
    <property name="createdDate">2016-05-10 15:00:02.760</property>
    <property name="updatedDate">2016-05-10 15:00:02.760</property>
    <property name="firstName"><![CDATA[User]]></property>
    <property name="lowerFirstName"><![CDATA[user]]></property>
    <property name="lastName"><![CDATA[Name]]></property>
    <property name="lowerLastName"><![CDATA[name]]></property>
    <property name="displayName"><![CDATA[User Name]]></property>
    <property name="lowerDisplayName"><![CDATA[user name]]></property>
    <property name="emailAddress"><![CDATA[[email protected]]]></property>
    <property name="lowerEmailAddress"><![CDATA[[email protected]]]></property>
  </object>

These objects only seem to be in Site backups, not space exports. They describe users supposedly registered directly in Confluence.

com.atlassian.crowd.model.group.InternalGroup

<object class="InternalGroup" package="com.atlassian.crowd.model.group">
<id name="id">163843</id>
<property name="name"><![CDATA[twistedgroup]]></property>
<property name="lowerName"><![CDATA[twistedgroup]]></property>
<property name="active">true</property>
<property name="local">false</property>
<property name="createdDate">2013-10-14 15:43:47.360</property>
<property name="updatedDate">2013-10-14 15:43:47.360</property>
<property name="description"/><property name="type" enum-class="GroupType" package="com.atlassian.crowd.model.group">GROUP</property>
<property name="directory" class="DirectoryImpl" package="com.atlassian.crowd.model.directory"><id name="id">98305</id>
</property>
</object>

These objects only seem to be in Site backups, not space exports. They describe a group of users. They are used when group imports are enabled. It's usually better to import groups from a central user directory like LDAP. 

com.atlassian.crowd.model.user.InternalUserAttribute

<object class="InternalUserAttribute" package="com.atlassian.crowd.model.user">
<id name="id">262152</id>
<property name="user" class="InternalUser" package="com.atlassian.crowd.model.user"><id name="id">229379</id>
</property>
<property name="directory" class="DirectoryImpl" package="com.atlassian.crowd.model.directory"><id name="id">98305</id>
</property>
<property name="name"><![CDATA[passwordLastChanged]]></property>
<property name="value"><![CDATA[1381758208148]]></property>
<property name="lowerValue"><![CDATA[1381758208148]]></property>
</object>

We don't yet use these objects.

The attachments folder

The folder is present in space exports unless attachment export was disabled. To our knowledge, attachments are not present in site exports. When the folder is present, version v of an attachment (given by its version or attachmentVersion property) with original id originalAttachmentId, belonging to content originalContentId, is expected to be at attachments/<originalContentId>/<originalAttachmentId>/<v>.
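The path layout can be expressed as a small helper. The following Python sketch only restates the rule above; the function name `attachment_path` and the ids used in the call are hypothetical.

```python
from pathlib import PurePosixPath


def attachment_path(original_content_id, original_attachment_id, version):
    """Build the expected location of an attachment version inside the package."""
    return PurePosixPath(
        "attachments",
        str(original_content_id),
        str(original_attachment_id),
        str(version),
    )


# Hypothetical ids, for illustration only.
print(attachment_path(65601, 65612, 2))
```

`PurePosixPath` is used so that the result always uses forward slashes, as zip entry names do, regardless of the platform running the code.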

The export descriptor file (exportDescriptor.properties)

The export descriptor file gives information about the export. It is a properties file, each line containing a key=value pair.
It starts with a comment containing the date at which the export was done.

#Mon Feb 10 10:23:09 UTC 2025
ao.data.version.min.com.atlassian.mywork.mywork-confluence-host-plugin=1.1.30
ao.data.version.com.atlassian.mywork.mywork-confluence-host-plugin=1000.0.0-fa970f983392
createdByVersionNumber=1000.0.0-fa970f983392
source=cloud
buildNumber=4515
ao.data.list=com.atlassian.mywork.mywork-confluence-host-plugin, com.atlassian.confluence.plugins.confluence-space-ia
spaceKey=attachhist
ao.data.version.min.com.atlassian.confluence.plugins.confluence-space-ia=5.0
defaultUsersGroup=confluence-users
ao.data.version.com.atlassian.confluence.plugins.confluence-space-ia=1000.0.0-fa970f983392
exportType=space
createdByBuildNumber=8401
timezoneId=UTC
inlineTasksFileIncluded=true
backupAttachments=true

Here are some interesting properties:

  • `source`: its value can be `server` or `cloud`. If it is absent, `server` can be assumed. It is interesting because Confluence Cloud and Confluence Server have significant differences.
  • `exportType`: its value is `all` for a site backup, and `space` for a space backup.
  • `spaceKey`: its value is the key of the exported space for space backups. Note: this can be used to handle https://jira.atlassian.com/browse/CONFSERVER-22853, a bug in Confluence Server that makes it export several spaces when only one was requested (see https://jira.xwiki.org/browse/CONFLUENCE-296).
  • `backupAttachments`: whether exporting attachments was enabled for this backup.
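The properties listed above can be read with a small parser. The sketch below is in Python and deliberately simplified: it ignores the escaping and line-continuation rules of the full Java properties format, which the sample above does not use. The function names and the keys of the returned dictionary are assumptions made for this example, and treating an absent `backupAttachments` as false is also an assumption.

```python
def parse_export_descriptor(text):
    """Parse the simple key=value lines of exportDescriptor.properties."""
    properties = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "!")):
            continue  # blank line or comment (such as the leading date)
        key, _, value = line.partition("=")
        properties[key.strip()] = value.strip()
    return properties


def describe_export(properties):
    """Summarize the properties discussed above."""
    return {
        # An absent source is assumed to mean server.
        "source": properties.get("source", "server"),
        "is_site_backup": properties.get("exportType") == "all",
        "space_key": properties.get("spaceKey"),
        # Assumption: an absent backupAttachments is treated as false here.
        "has_attachments": properties.get("backupAttachments") == "true",
    }


SAMPLE = """#Mon Feb 10 10:23:09 UTC 2025
source=cloud
spaceKey=attachhist
exportType=space
backupAttachments=true
"""

print(describe_export(parse_export_descriptor(SAMPLE)))
```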

Quirks and how to handle them

The entities.xml can have a leading space in its name

We've seen this sporadically. This makes the import fail fast. You'll need to remove the leading space from the filename. Should this happen more often, we shall add a workaround in Confluence XML.
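A tool reading the package could also look up the entry tolerantly. The following Python sketch assumes that the stray space is part of the entry name inside the zip archive; the function name `find_entities_entry` is hypothetical, and this is not what confluence-xml currently does.

```python
import io
import zipfile


def find_entities_entry(archive):
    """Return the name of the entities.xml entry, tolerating stray whitespace."""
    for name in archive.namelist():
        if name.strip() == "entities.xml":
            return name
    return None


# Build a small in-memory archive that reproduces the quirk.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr(" entities.xml", "<hibernate-generic/>")

with zipfile.ZipFile(buffer) as archive:
    entry = find_entities_entry(archive)
    print(repr(entry))
```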

Space exports sometimes contain several spaces

A bug in Confluence Server sometimes makes it export several spaces when only one was requested (https://jira.atlassian.com/browse/CONFSERVER-22853).

We work around this issue in confluence-xml by not importing the extraneous space by default (https://jira.xwiki.org/browse/CONFLUENCE-296).

Parsing entities.xml may not work out of the box because of the presence of control characters

We fully work around this issue by transparently stripping them at parse time. See:
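To give an idea of what such stripping involves, here is a minimal Python sketch that removes every character outside the `Char` production of the XML 1.0 specification before parsing. It is not the confluence-xml implementation: the names `ILLEGAL_XML_CHARS` and `strip_illegal_xml_chars` are assumptions, and a real importer would filter the input stream rather than a whole string in memory.

```python
import re
import xml.etree.ElementTree as ET

# Characters outside the XML 1.0 "Char" production:
# #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
ILLEGAL_XML_CHARS = re.compile(
    r"[^\t\n\r\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def strip_illegal_xml_chars(text):
    """Remove the characters that an XML 1.0 parser is required to reject."""
    return ILLEGAL_XML_CHARS.sub("", text)


dirty = '<property name="title">Hello\x0b wor\x00ld\uffff</property>'
clean = strip_illegal_xml_chars(dirty)
print(ET.fromstring(clean).text)
```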

The export can be corrupted in such a way that entities.xml contains illegal XML characters in body contents

XML parsers reject characters that are not allowed in XML, such as NULL characters or the Unicode noncharacter U+FFFF. With confluence-xml, you'll have a stack trace like this one:

3/8/2025 5:56:26 PM Failed to read package
org.xwiki.filter.FilterException: Failed to analyze the package index
 at org.xwiki.contrib.confluence.filter.input.ConfluenceXMLPackage.read(ConfluenceXMLPackage.java:893)
 at org.xwiki.contrib.confluence.filter.internal.input.ConfluenceInputFilterStream.preparePackage(ConfluenceInputFilterStream.java:391)
 at org.xwiki.contrib.confluence.filter.internal.input.ConfluenceInputFilterStream.readInternal(ConfluenceInputFilterStream.java:331)
 at org.xwiki.contrib.confluence.filter.internal.input.ConfluenceInputFilterStream.read(ConfluenceInputFilterStream.java:229)
 at org.xwiki.contrib.confluence.filter.internal.input.ConfluenceInputFilterStream.read(ConfluenceInputFilterStream.java:106)
 at org.xwiki.filter.input.AbstractBeanInputFilterStream.read(AbstractBeanInputFilterStream.java:79)
 at org.xwiki.filter.internal.job.FilterStreamConverterJob.runInternal(FilterStreamConverterJob.java:97)
 at org.xwiki.job.AbstractJob.runInContext(AbstractJob.java:246)
 at org.xwiki.job.AbstractJob.run(AbstractJob.java:223)
 at org.xwiki.filter.script.internal.ScriptFilterStreamConverterJob.run(ScriptFilterStreamConverterJob.java:75)
 at com.xwiki.confluencepro.internal.ConfluenceMigrationJob.runInternal(ConfluenceMigrationJob.java:159)
 at org.xwiki.job.AbstractJob.runInContext(AbstractJob.java:246)
 at org.xwiki.job.AbstractJob.run(AbstractJob.java:223)
 at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
 at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
 at java.base/java.lang.Thread.run(Thread.java:840)
3/8/2025 5:56:26 PM Exception thrown during job execution
org.xwiki.filter.FilterException: Failed to read package
 at org.xwiki.contrib.confluence.filter.internal.input.ConfluenceInputFilterStream.preparePackage(ConfluenceInputFilterStream.java:400)
 at org.xwiki.contrib.confluence.filter.internal.input.ConfluenceInputFilterStream.readInternal(ConfluenceInputFilterStream.java:331)
 at org.xwiki.contrib.confluence.filter.internal.input.ConfluenceInputFilterStream.read(ConfluenceInputFilterStream.java:229)
 at org.xwiki.contrib.confluence.filter.internal.input.ConfluenceInputFilterStream.read(ConfluenceInputFilterStream.java:106)
 at org.xwiki.filter.input.AbstractBeanInputFilterStream.read(AbstractBeanInputFilterStream.java:79)
 at org.xwiki.filter.internal.job.FilterStreamConverterJob.runInternal(FilterStreamConverterJob.java:97)
 at org.xwiki.job.AbstractJob.runInContext(AbstractJob.java:246)
 at org.xwiki.job.AbstractJob.run(AbstractJob.java:223)
 at org.xwiki.filter.script.internal.ScriptFilterStreamConverterJob.run(ScriptFilterStreamConverterJob.java:75)
 at com.xwiki.confluencepro.internal.ConfluenceMigrationJob.runInternal(ConfluenceMigrationJob.java:159)
 at org.xwiki.job.AbstractJob.runInContext(AbstractJob.java:246)
 at org.xwiki.job.AbstractJob.run(AbstractJob.java:223)
 at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
 at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
 at java.base/java.lang.Thread.run(Thread.java:840)
Caused by: org.xwiki.filter.FilterException: Failed to analyze the package index
 at org.xwiki.contrib.confluence.filter.input.ConfluenceXMLPackage.read(ConfluenceXMLPackage.java:893)
 at org.xwiki.contrib.confluence.filter.internal.input.ConfluenceInputFilterStream.preparePackage(ConfluenceInputFilterStream.java:391)
 ... 14 more

You will need to identify the problematic content and the problematic characters. Then, we know of two main ways to deal with this issue:

  1. Remove the bad characters from the pages in Confluence and then re-export without history (exporting with history is not possible, since the older versions of the pages still contain the problematic characters).
  2. Remove the bad characters by editing entities.xml.

One way of identifying the problematic characters is to run a failing import with a debugger connected to the XWiki instance, set breakpoints in ConfluenceXMLPackage (see the stack trace), and identify the affected content. Alternatively, work with a modified version of ConfluenceXMLPackage that logs the ids of the objects it parses and see where it stops. Then, extract the relevant body content and inspect it in an editor that shows special characters, or in a hexadecimal viewer or editor.
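When no debugger is at hand, a small script can also point at the offending places. The Python sketch below reports the line, column and code point of every character that XML 1.0 does not allow. The names `ILLEGAL_XML_CHARS` and `find_illegal_chars` are assumptions made for this example, and the script is not part of confluence-xml.

```python
import io
import re

# Characters outside the XML 1.0 "Char" production.
ILLEGAL_XML_CHARS = re.compile(
    r"[^\t\n\r\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def find_illegal_chars(lines):
    """Yield (line number, column, code point) for each illegal character."""
    for line_number, line in enumerate(lines, start=1):
        for match in ILLEGAL_XML_CHARS.finditer(line):
            yield line_number, match.start() + 1, f"U+{ord(match.group()):04X}"


# In practice, lines would come from something like:
#   open("entities.xml", encoding="utf-8", errors="surrogateescape", newline="\n")
# With surrogateescape, undecodable bytes show up as lone surrogates, which
# the pattern above also reports.
sample = io.StringIO("<body>fine</body>\n<body>bad\uffff\x00</body>\n")
hits = list(find_illegal_chars(sample))
print(hits)
```

Reading line by line keeps memory usage low, which matters since entities.xml can weigh hundreds of megabytes.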

You can try to use Unix tools to search for occurrences of the illegal characters you identified. For example, let's say we identified this sequence of problematic bytes: `\xEF\xBF\xBF` (the UTF-8 encoding of U+FFFF).

You can count the occurrences using the following command:

perl -nle '$c+=scalar(()=m/\xEF\xBF\xBF/g);END{print $c}' entities.xml

or:

unzip -p myexport.xml.zip entities.xml | perl -nle '$c+=scalar(()=m/\xEF\xBF\xBF/g);END{print $c}'

You can remove these characters in place using sed (make sure you keep a copy of the original entities.xml file):

sed -i 's/\xEF\xBF\xBF//g' entities.xml

and then add back entities.xml in a copy of the export zip archive.
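Putting the fixed file back can also be scripted. The following Python sketch copies an archive entry by entry and substitutes the content of one of them; the function name `replace_entry` is hypothetical. It reads each entry fully into memory, which is an assumption that may not hold for very large exports.

```python
import io
import zipfile


def replace_entry(source_zip, target_zip, entry_name, new_content):
    """Copy source_zip to target_zip, substituting the content of one entry."""
    with zipfile.ZipFile(source_zip) as source, \
            zipfile.ZipFile(target_zip, "w", zipfile.ZIP_DEFLATED) as target:
        for item in source.infolist():
            # Keep every entry as-is, except the one being fixed.
            data = new_content if item.filename == entry_name else source.read(item)
            target.writestr(item, data)


# Small in-memory demonstration with a made-up archive.
original = io.BytesIO()
with zipfile.ZipFile(original, "w") as archive:
    archive.writestr("entities.xml", b"<a>bad\x00</a>")
    archive.writestr("exportDescriptor.properties", b"exportType=space\n")

fixed = io.BytesIO()
replace_entry(original, fixed, "entities.xml", b"<a>bad</a>")

with zipfile.ZipFile(fixed) as archive:
    print(archive.namelist(), archive.read("entities.xml"))
```

Writing to a new archive rather than modifying the original in place keeps the untouched export available for comparison.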

Another example, with grep, where we had some content ending with 34004 NULL (\0) characters:

grep --only-matching -a -P '\x00' entities.xml | wc -l
# answer: 34004
sed -i 's/\x00//g' entities.xml

If you work on separate files instead of in place, you can inspect the difference of size before and after fixing:

ls -la original/entities.xml fixed/entities.xml
-rw-r--r-- 1 raph raph 530607291 11 mars  09:02 fixed/entities.xml
-rw-r--r-- 1 raph raph 530641295 11 mars  08:51 original/entities.xml

We have 530641295 - 530607291 = 34004, which matches the number of NULL characters removed. You can also check that only the problematic line(s) were modified by running diff:

diff original/entities.xml fixed/entities.xml
Information

For debugging and working with Confluence packages, see Exploring Confluence Exports.
