Confluence XML Format
Reference
- Scope
- Introduction
- The entities.xml file
- Overview
- Common concepts
- Known object types
- bucket.user.propertyset.BucketPropertySetItem
- com.atlassian.confluence.core.BodyContent
- com.atlassian.confluence.links.OutgoingLink
- com.atlassian.confluence.mail.notification.Notification
- com.atlassian.confluence.pages.Attachment
- com.atlassian.confluence.pages.Page
- com.atlassian.confluence.security.ContentPermission
- com.atlassian.confluence.security.ContentPermissionSet
- com.atlassian.confluence.security.SpacePermission
- com.atlassian.confluence.setup.bandana.ConfluenceBandanaRecord
- com.atlassian.confluence.spaces.Space
- com.atlassian.confluence.spaces.SpaceDescription
- com.atlassian.confluence.user.ConfluenceUserImpl
- com.atlassian.confluence.user.persistence.dao.ConfluenceRememberMeToken
- com.atlassian.confluence.user.PersonalInformation
- com.atlassian.crowd.embedded.hibernate2.HibernateMembership
- com.atlassian.crowd.model.application.DirectoryMapping
- com.atlassian.crowd.model.user.InternalUser
- com.atlassian.crowd.model.group.InternalGroup
- com.atlassian.crowd.model.user.InternalUserAttribute
- attachments folder
- The export descriptor file (exportDescriptor.properties)
- Quirks and how to handle them
Scope
This document presents the format of the Confluence export packages. It is targeted at:
- technical people working on Confluence migration tools or tools involving Confluence exports
- technical people running migrations who need to deeply investigate some issues
- people curious about the Confluence export format
Introduction
A Confluence export package is a zip file containing an attachments folder, an exportDescriptor.properties file, and an entities.xml file. It also sometimes contains a config and a plugin-data folder as well as other files which we haven't been using so far.
We know about two kinds of Confluence backup packages:
- A space backup package is produced from the settings of a space in Confluence. Its exportDescriptor.properties file contains the name of the space selected for export. The package contains information about the exported space, as well as the notifications, pages, attachments and permissions related to this space. It doesn't contain the attachments folder if the option to leave attachments out was selected.
- A site backup package is produced from the global administration in Confluence. It contains all the spaces, as well as users and groups, but it doesn't contain the attachments folder.
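As a rough illustration of the layout described above, the following Python sketch opens an export package with the standard zipfile module and reports which of the expected entries are present. The function name and the keys of the returned dictionary are ours, and the sketch assumes that the entries sit at the root of the zip file.

```python
import zipfile


def inspect_package(path_or_file):
    """Report which well-known entries an export package contains.

    Assumption: entries are stored at the root of the zip file.
    """
    with zipfile.ZipFile(path_or_file) as package:
        names = package.namelist()
    return {
        "entities": "entities.xml" in names,
        "descriptor": "exportDescriptor.properties" in names,
        # A folder may appear only implicitly, through the files it contains.
        "attachments": any(name.startswith("attachments/") for name in names),
    }
```

A package for which "attachments" is False is either a site backup or a space backup exported without attachments.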
The entities.xml file
Overview
This is an XML 1.0 UTF-8 file that looks like a dump of the Hibernate database of Confluence. It is close to the Confluence SQL schema, but not exactly the same. The differences probably come from their Hibernate configuration.
It starts with an XML prolog, and everything is contained in a hibernate-generic root node whose datetime attribute contains the date of the export in the YYYY-MM-DD HH:mm:ss format.
<?xml version="1.0" encoding="UTF-8"?>
<hibernate-generic datetime="2013-10-14 16:05:52">
The root hibernate-generic node contains object nodes, which describe all sorts of objects representing what is in a Confluence instance.
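Since entities.xml files can be very large, it can be handy to read the export date without parsing the whole file. Here is a minimal Python sketch doing this with the standard library. The function name is ours, and the sketch assumes the datetime attribute is present and follows the format given above.

```python
import xml.etree.ElementTree as ET
from datetime import datetime


def read_export_date(source):
    """Return the export date found on the hibernate-generic root node.

    `source` is a file name or a binary file object.
    """
    # With the "start" event, iterparse yields the root node as soon as its
    # opening tag has been read, so the rest of the file is not parsed.
    for _event, element in ET.iterparse(source, events=("start",)):
        if element.tag != "hibernate-generic":
            raise ValueError("unexpected root node: " + element.tag)
        return datetime.strptime(element.get("datetime"), "%Y-%m-%d %H:%M:%S")
    raise ValueError("empty document")
```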
Common concepts
Here is what an object looks like, with its usual indentation as it appears in a typical entities.xml file from Confluence server:
<object class="Page" package="com.atlassian.confluence.pages">
<id name="id">753689</id>
<property name="position"/><collection name="children" class="java.util.Collection"><element class="Page" package="com.atlassian.confluence.pages"><id name="id">753692</id>
</element>
</collection>
<property name="space" class="Space" package="com.atlassian.confluence.spaces"><id name="id">786435</id>
</property>
<property name="title"><![CDATA[privatespace Home]]></property>
<collection name="bodyContents" class="java.util.Collection"><element class="BodyContent" package="com.atlassian.confluence.core"><id name="id">819224</id>
</element>
</collection>
<property name="version">1</property>
<property name="creatorName"><![CDATA[admin]]></property>
<property name="creationDate">2013-10-14 15:37:24.463</property>
<property name="lastModifierName"><![CDATA[admin]]></property>
<property name="lastModificationDate">2013-10-14 15:37:24.463</property>
<property name="versionComment"><![CDATA[]]></property>
<property name="contentStatus"><![CDATA[current]]></property>
<collection name="comments" class="java.util.Collection"><element class="Comment" package="com.atlassian.confluence.pages"><id name="id">753690</id>
</element>
</collection>
</object>
An object has a type defined by its class and package attributes. In theory, the package attribute cannot be ignored. In practice, it can.
It has an id (defined with an id node), and properties (defined with property and collection nodes) which depend on the type of the object.
We list the 3 node types that can appear in an object node.
id nodes
This node defines the unique identifier of the object, which we will call id in the rest of this document. It appears exactly once per object. Although it probably cannot be relied upon, we've always seen it appear as the first child node of the object. It usually has a name attribute with the value "id", except for ConfluenceUserImpl objects where it has the value "key".
property nodes
This node defines a property with a primitive type.
The name of the property is given by the name attribute.
Dates follow the YYYY-MM-DD HH:mm:ss.xxx format.
Strings (including enum values) are (apparently always) in a CDATA section, and numbers and dates are not.
collection nodes
This node defines a property with a value that is a collection of objects.
The name of the property is given by the name attribute. The Java type of the collection is given by the class attribute. It can be an interface or a concrete type. From what we have seen, collections always contain object ids that are put in <id name="id"> nodes, themselves each put inside element nodes having class and package attributes equal to those of the object node they point to. Said differently, collection nodes contain element nodes, each one containing exactly one id node that contains the id of an object.
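To make the three node types more concrete, here is a minimal Python sketch that turns an object node into a plain dictionary. The function names and the shape of the result are ours. The sketch keeps every value as a string, skips collection elements that don't wrap an id node, and doesn't handle objects using a composite-id node.

```python
import xml.etree.ElementTree as ET


def parse_reference(node):
    """Return (class, id) for a property or element node wrapping an id node."""
    return (node.get("class"), node.find("id").text.strip())


def parse_object(obj):
    """Turn an object node into a dictionary (ids are kept as strings)."""
    result = {"class": obj.get("class"), "package": obj.get("package")}
    for child in obj:
        name = child.get("name")
        if child.tag == "id":
            result["id"] = child.text.strip()
        elif child.tag == "property":
            if child.find("id") is not None:
                # Element-like property pointing to another object.
                result[name] = parse_reference(child)
            else:
                # Primitive value; <property name="position"/> gives None.
                result[name] = child.text
        elif child.tag == "collection":
            result[name] = [
                parse_reference(element)
                for element in child.findall("element")
                if element.find("id") is not None
            ]
    return result
```

Calling parse_object(ET.fromstring(text)) on the Page example above gives a dictionary in which children is a list of (class, id) pairs and space is a single (class, id) pair.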
Notes on order, ids, relationship and how many things can(not) be relied upon
- Objects appear in no particular order that could be relied upon, or so it seems. It is very well possible that an object references another that has not yet been dumped.
- Each object seems to have an id that is unique across all object types (e.g. we have not seen a Page object having the same id as a User object). We don't currently rely on this in the filter module, but this is not a promise.
- The id order cannot be relied upon. An older object can have a greater id. We believe this can happen because some import / backup restore mechanism at Confluence doesn't preserve the ids (the handling of ids might be left to the database engine, and since they are dumped in backups in no particular order, they are not created in the database in the same order as before the backup, or something like this).
- There is some unreliable duplication in how objects declare their relationship, and in particular their parents and children, and sometimes their ancestors. Usually, everything is there but we've noticed this cannot be relied upon. Sometimes, one way is missing for some reason. For this reason, one needs to implement both ways when parsing the export package.
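The last point can be sketched as follows in Python: the parent-child relation is collected from both the parent property of the children and the children collection of the parents, so that it doesn't matter which one is missing or in which order the objects appear. The input shape (dictionaries with id, parent and children keys) is an assumption made for this sketch, not something defined by the export format.

```python
from collections import defaultdict


def build_children_index(pages):
    """Map each parent id to the set of ids of its children.

    `pages` is an iterable of dictionaries with an "id" key and optional
    "parent" (an id) and "children" (a list of ids) keys.
    """
    children_of = defaultdict(set)
    for page in pages:
        # Way 1: the child declares its parent.
        if page.get("parent") is not None:
            children_of[page["parent"]].add(page["id"])
        # Way 2: the parent declares its children.
        for child_id in page.get("children", ()):
            children_of[page["id"]].add(child_id)
    return dict(children_of)
```

Using sets makes the duplication harmless when both ways are present.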
Referencing a user
Objects are usually referenced using their id. For users, we found 3 ways it is done:
- The InternalUser id, which is a regular number
- The ConfluenceUserImpl id (what we call the "user key"), which appears to be a hexadecimal string
- The user name, found in the name property of ConfluenceUserImpl and of InternalUser objects
Usual creation and modification properties
Many object types have the following properties in common. We describe them here once to avoid repetition.
- creatorName: the username of the creator of the page. Used by older versions of Confluence. See also creator, which is used by newer versions. In the general case, you'll have to check both properties.
- creator: the user key of the creator of the page (and not of the revision!), which you can turn into a username using the corresponding ConfluenceUserImpl object. Used by more recent versions of Confluence. See also creatorName for the property used by older versions. In the general case, you'll have to check both properties. See also lastModifier and lastModifierName for the user who created this specific revision.
- creationDate: the creation date of the first revision of the page. See also lastModificationDate.
- lastModificationDate: the creation date of this specific page revision
- lastModifierName: the username of the user who created this revision. Used by older versions of Confluence. See also lastModifier, used by more recent versions of Confluence. In the general case, you'll have to check both properties. See also creator and creatorName for the user who created the first revision of this page.
- lastModifier: the user key of the user who created this revision. Used by more recent versions of Confluence. See also lastModifierName, used by older versions of Confluence. In the general case, you'll have to check both properties. See also creator and creatorName for the user who created the first revision of this page.
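Checking both properties can look like the following Python sketch. It assumes objects are available as dictionaries mapping property names to values, with creator and lastModifier holding a user key, and that a mapping from ConfluenceUserImpl keys to user names was built beforehand. These shapes and the function name are ours.

```python
def user_name(obj, prop, name_by_user_key):
    """Return the username behind the "creator" or "lastModifier" of an object.

    `prop` is "creator" or "lastModifier". The user key is tried first, then
    the <prop>Name property used by older versions of Confluence.
    Returns None if neither gives a result.
    """
    key = obj.get(prop)
    if key in name_by_user_key:
        return name_by_user_key[key]
    return obj.get(prop + "Name")
```

For instance, user_name(page, "lastModifier", names) gives the author of the revision, while user_name(page, "creator", names) gives the author of the first revision.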
Known object types
bucket.user.propertyset.BucketPropertySetItem
<object class="BucketPropertySetItem" package="bucket.user.propertyset">
<composite-id><property name="entityName" type="string"><![CDATA[CWD_admin]]></property>
<property name="entityId" type="long">0</property>
<property name="key" type="string"><![CDATA[confluence.user.runtime.recent-changes.size]]></property>
</composite-id>
<property name="type">2</property>
<property name="booleanVal">false</property>
<property name="doubleVal">0.0</property>
<property name="stringVal"/><property name="textVal"/><property name="longVal">0</property>
<property name="intVal">30</property>
<property name="dateVal"/></object>
We don't yet use these objects. See https://docs.atlassian.com/ConfluenceServer/javadoc/8.2.0-m27/bucket/user/propertyset/BucketPropertySetItem.html
com.atlassian.confluence.core.BodyContent
<object class="BodyContent" package="com.atlassian.confluence.core">
<id name="id">819222</id>
<property name="body"><![CDATA[<p>Comment on homepage of space 2</p>]]></property>
<property name="content" class="Comment" package="com.atlassian.confluence.pages"><id name="id">753687</id>
</property>
<property name="bodyType">2</property>
</object>
The content of a comment or a page.
The body property contains the content in a CDATA section, in the syntax defined in the bodyType property. Here are the body types we know about:
- 0: this is the old Confluence wiki syntax (the default)
- 1: this is raw character data
- 2: this is the XHTML storage format.
See also https://docs.atlassian.com/atlassian-confluence/6.6.0/com/atlassian/confluence/core/BodyType.html
The content property refers to the object this BodyContent object describes the body of and works like the element nodes, with the class and package attributes and the id name="id" child. Here are the two types of content having body contents we know about:
- com.atlassian.confluence.pages.Comment
- com.atlassian.confluence.pages.Page
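A converter typically needs to pick a parser depending on the body type. Here is a small Python sketch of this dispatch. The labels are ours, and treating a missing bodyType as 0 is an assumption based on 0 being described as the default.

```python
# Labels are ours; the numbers are the body types listed above.
BODY_TYPES = {0: "wiki", 1: "raw", 2: "xhtml"}


def body_syntax(body_content):
    """Return a label for the syntax of a BodyContent.

    `body_content` is a dictionary of the properties of the object,
    with values kept as strings.
    """
    raw = body_content.get("bodyType")
    # Assumption: a missing or empty bodyType means the default, 0.
    code = int(raw) if raw not in (None, "") else 0
    if code not in BODY_TYPES:
        raise ValueError("unknown body type: %d" % code)
    return BODY_TYPES[code]
```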
com.atlassian.confluence.links.OutgoingLink
<object class="OutgoingLink" package="com.atlassian.confluence.links">
<id name="id">950286</id>
<property name="destinationPageTitle"><![CDATA[space1 Home]]></property>
<property name="destinationSpaceKey"><![CDATA[SPACE1]]></property>
<property name="sourceContent" class="Page" package="com.atlassian.confluence.pages"><id name="id">753668</id>
</property>
<property name="creatorName"><![CDATA[admin]]></property>
<property name="creationDate">2013-10-14 15:34:06.814</property>
<property name="lastModifierName"><![CDATA[admin]]></property>
<property name="lastModificationDate">2013-10-14 15:34:06.814</property>
</object>
Describes an outgoing link. We haven't used those so far. OutgoingLink objects have the usual creation and modification properties.
com.atlassian.confluence.mail.notification.Notification
<object class="Notification" package="com.atlassian.confluence.mail.notification">
<id name="id">983041</id>
<property name="page" class="Page" package="com.atlassian.confluence.pages"><id name="id">753668</id>
</property>
<property name="userName"><![CDATA[admin]]></property>
<property name="creatorName"><![CDATA[admin]]></property>
<property name="creationDate">2013-10-14 15:07:38.873</property>
<property name="lastModifierName"><![CDATA[admin]]></property>
<property name="lastModificationDate">2013-10-14 15:07:38.873</property>
<property name="digest">false</property>
<property name="network">false</property>
<property name="type" enum-class="ContentTypeEnum" package="com.atlassian.confluence.search.service"/></object>
A notification ("watch") setting. We don't yet use these objects. See https://docs.atlassian.com/atlassian-confluence/5.10.8/com/atlassian/confluence/mail/notification/Notification.html
com.atlassian.confluence.pages.Attachment
<object class="Attachment" package="com.atlassian.confluence.pages">
<id name="id">884739</id>
<property name="fileName"><![CDATA[Config.xml]]></property>
<property name="contentType"><![CDATA[text/xml]]></property>
<property name="content" class="Page" package="com.atlassian.confluence.pages"><id name="id">753668</id>
</property>
<property name="creatorName"><![CDATA[admin]]></property>
<property name="creationDate">2013-10-14 15:05:29.969</property>
<property name="lastModifierName"><![CDATA[admin]]></property>
<property name="lastModificationDate">2013-10-14 15:07:38.630</property>
<property name="fileSize">308</property>
<property name="comment"/><property name="attachmentVersion">1</property>
</object>
Describes an attachment. The structure looks quite like that of Page objects. See the Page section for the following fields: navigationType, contentStatus, space. Attachment objects also have the usual creation and modification properties.
Specific fields:
- title: the file name of the attachment in the wiki (not in the backup package). See also fileName.
- lowerTitle: the lower case version of the title property.
- fileName: the older name of the title property.
- contentType: the mime type of the file.
- content: an element-like property pointing to the content containing this attachment. See also containerContent.
- containerContent: the former name of the content property.
- fileSize: the size of the file, in bytes.
- comment: the user comment attached to this version of this attachment (note: XWiki doesn't have an equivalent feature at the time this was written)
- attachmentVersion: an increasing number giving the revision number of this attachment. It's supposed to be unique per attachment. See also version.
- version: another name for attachmentVersion. Supposedly the new name of the property.
- originalVersion: an element-like value pointing to the last revision of the attachment. See also originalVersionId
- originalVersionId: a numeric version of the originalVersion property. Sometimes this property is used instead of originalVersion. It is unclear when. This property can be present and empty. In this case, it should be handled as if it were not present at all.
- historicalVersions: the older revisions of an attachment.
- imageDetailsDTO: ???
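Because several of these properties exist under two names, it can help to normalize an attachment before using it. The following Python sketch does this for a dictionary of properties. The function names and the keys of the result are ours, and trying the newer names first is our choice.

```python
def first_present(obj, *names):
    """Return the first non-empty value among the given property names."""
    for name in names:
        value = obj.get(name)
        if value not in (None, ""):
            return value
    return None


def normalize_attachment(obj):
    """Collapse the old and new property names of an Attachment."""
    return {
        "name": first_present(obj, "title", "fileName"),
        "container": first_present(obj, "content", "containerContent"),
        "version": first_present(obj, "version", "attachmentVersion"),
        # An empty originalVersionId is handled as if it were absent.
        "original": first_present(obj, "originalVersion", "originalVersionId"),
    }
```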
com.atlassian.confluence.pages.Page
<object class="Page" package="com.atlassian.confluence.pages">
<id name="id">753692</id>
<property name="position"/><property name="parent" class="Page" package="com.atlassian.confluence.pages"><id name="id">753689</id>
</property>
<collection name="ancestors" class="java.util.List"><element class="Page" package="com.atlassian.confluence.pages"><id name="id">753689</id>
</element>
</collection>
<property name="space" class="Space" package="com.atlassian.confluence.spaces"><id name="id">786435</id>
</property>
<property name="title"><![CDATA[Private page 1]]></property>
<collection name="bodyContents" class="java.util.Collection"><element class="BodyContent" package="com.atlassian.confluence.core"><id name="id">819226</id>
</element>
</collection>
<property name="version">1</property>
<property name="creatorName"><![CDATA[admin]]></property>
<property name="creationDate">2013-10-14 15:37:52.357</property>
<property name="lastModifierName"><![CDATA[admin]]></property>
<property name="lastModificationDate">2013-10-14 15:37:52.357</property>
<property name="versionComment"><![CDATA[]]></property>
<property name="contentStatus"><![CDATA[current]]></property>
</object>
<collection name="historicalVersions" class="java.util.Collection"><element class="Page" package="com.atlassian.confluence.pages"><id name="id">753670</id>
</element>
<element class="Page" package="com.atlassian.confluence.pages"><id name="id">753675</id>
</element>
<element class="Page" package="com.atlassian.confluence.pages"><id name="id">753676</id>
</element>
<element class="Page" package="com.atlassian.confluence.pages"><id name="id">753677</id>
</element>
<element class="Page" package="com.atlassian.confluence.pages"><id name="id">753678</id>
</element>
<element class="Page" package="com.atlassian.confluence.pages"><id name="id">753684</id>
</element>
</collection>
<property name="originalVersion" class="Page" package="com.atlassian.confluence.pages"><id name="id">753668</id>
</property>
This describes a Confluence page revision. Here are the properties we know about:
- position: an integer giving its position in the navigation menu of the space. See https://jira.xwiki.org/browse/CONFLUENCE-261
- ancestors: a collection of ids referring to the parents of the page up to but excluding the space: its direct parent, the direct parent of its direct parent, and so on and so forth. We have not been relying on this property.
- space: the id of the Space object describing the space in which the Page is
- title: the title of the page, which is supposed to be unique in the whole space
- lowerTitle: the lowercase version of the title, also supposed to be unique in the whole space
- bodyContents: a collection that contains the id of the object describing the content of the page. This property is a collection but we have only ever seen exactly one element in it. It is unclear why a collection is used here.
- version: a number which is the revision number of the page. It is supposed to be unique across a page and its historical revisions. In practice, we've seen duplicate versions in some exports. It is not clear where they come from, probably some sort of corruption. See for instance https://jira.xwiki.org/browse/CONFLUENCE-427
- versionComment: a comment written by the user at save time to describe this version
- contentStatus: contains the status of this Page. Here are the known values:
- current: the page is current
- draft: the page is a draft. We currently discard these pages.
- deleted: the page was deleted. We currently discard these pages.
- originalVersion: this property is set only on historical versions of the page, and points to the last version of the page. Only older revisions have this property, the last revision doesn't have it and that's how you know a Page object describes the last revision of a Page. See also originalVersionId.
- originalVersionId: like originalVersion, used by older Confluence versions, directly a number instead of an element-like value.
- navigationType: ???
- historicalVersions: an unordered collection of the older revisions of the page. Only the last revision of the page has this property.
- children or childrens: an unordered collection of the last versions of the direct children of the page (note: it's sometimes childrens with an s at the end, sometimes children without the s)
- attachments: an unordered collection of the attachments, including their older versions
- comments: an unordered collection of comments
- outgoingLinks: an unordered collection of outgoing links
- contentPermissionSets: a collection of permission sets of type ContentPermissionSet, which are sets of permissions applying to this content
Page objects also have the usual creation and modification properties.
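The rules above for telling the last revision of a page from its older revisions, and for dealing with the two spellings of the children property, can be sketched like this in Python. Pages are assumed to be dictionaries of properties, and the function names are ours. Treating an empty originalVersionId as absent is documented for attachments, and we assume here that it also holds for pages.

```python
def is_latest_revision(page):
    """True if a Page describes the last revision of a page."""
    # Assumption: an empty originalVersionId counts as absent, as for attachments.
    return (page.get("originalVersion") in (None, "")
            and page.get("originalVersionId") in (None, ""))


def should_import(page):
    """True for last revisions that are neither drafts nor deleted."""
    return (is_latest_revision(page)
            and page.get("contentStatus") not in ("draft", "deleted"))


def children_ids(page):
    """Return the children ids, whichever spelling of the property is used."""
    return list(page.get("children") or page.get("childrens") or [])
```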
com.atlassian.confluence.security.ContentPermission
<object class="ContentPermission" package="com.atlassian.confluence.security">
<id name="id">1048577</id>
<property name="type"><![CDATA[View]]></property>
<property name="userName"><![CDATA[admin]]></property>
<property name="groupName"/><property name="owningSet" class="ContentPermissionSet" package="com.atlassian.confluence.security"><id name="id">1015809</id>
</property>
<property name="creatorName"/><property name="creationDate">2013-10-14 15:41:26.893</property>
<property name="lastModifierName"><![CDATA[admin]]></property>
<property name="lastModificationDate">2013-10-14 15:41:26.893</property>
</object>
Other properties:
- type: the name of the permission (View in the example above). Note: the content permission set owning this content permission also has a type property, with the same value.
- owningSet: an element-like value pointing to the content permission set in which this content permission is
- userName: the name of the user to which the permission applies.
- groupName: the name of the group to which the permission applies.
ContentPermission objects also have the usual creation and modification properties.
com.atlassian.confluence.security.ContentPermissionSet
<object class="ContentPermissionSet" package="com.atlassian.confluence.security">
<id name="id">67043333</id>
<property name="type"><![CDATA[View]]></property>
<collection name="contentPermissions" class="java.util.SortedSet">
<element class="ContentPermission" package="com.atlassian.confluence.security">
<id name="id">67076114</id>
</element>
<!-- ... cut ... -->
<element class="ContentPermission" package="com.atlassian.confluence.security">
<id name="id">152338667</id>
</element>
</collection>
<property name="owningContent" class="Page" package="com.atlassian.confluence.pages">
<id name="id">66719934</id>
</property>
<property name="creationDate">2015-01-14 10:21:23.000</property>
<property name="lastModificationDate">2019-07-31 09:54:26.000</property>
</object>
ContentPermissionSet objects have the usual creation and modification properties.
Other properties:
- type: the name of the permission of all the content permissions in this set. Note: The content permissions themselves also have a type property, with the same value.
- owningContent: an element-like property pointing to the content to which the permissions of this content permission set apply
- contentPermissions: a collection of content permissions
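To show how ContentPermissionSet and ContentPermission objects fit together, here is a Python sketch collecting the restrictions applying to one piece of content. It assumes both kinds of objects were already read into dictionaries, with references reduced to plain ids. The function name and the shape of the result are ours.

```python
def restrictions_for(content_id, permission_sets, permissions_by_id):
    """Collect {type: [(userName, groupName), ...]} for one piece of content.

    `permission_sets` is an iterable of ContentPermissionSet dictionaries and
    `permissions_by_id` maps ids to ContentPermission dictionaries.
    """
    result = {}
    for permission_set in permission_sets:
        if permission_set.get("owningContent") != content_id:
            continue
        for permission_id in permission_set.get("contentPermissions", []):
            permission = permissions_by_id.get(permission_id)
            if permission is None:
                continue  # referenced, but not found in the export
            result.setdefault(permission_set["type"], []).append(
                (permission.get("userName"), permission.get("groupName"))
            )
    return result
```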
com.atlassian.confluence.security.SpacePermission
<object class="SpacePermission" package="com.atlassian.confluence.security">
<id name="id">617742337</id>
<property name="space" class="Space" package="com.atlassian.confluence.spaces">
<id name="id">622593</id>
</property>
<property name="type"><![CDATA[COMMENT]]></property>
<property name="group"/>
<property name="allUsersSubject"><![CDATA[anonymous-users]]></property>
<property name="creator" class="ConfluenceUserImpl" package="com.atlassian.confluence.user">
<id name="key"><![CDATA[01f7c1ca483b2b1c01483b2d4f4206cc]]></id>
</property>
<property name="creationDate">2023-11-10 14:11:48.022</property>
<property name="lastModifier" class="ConfluenceUserImpl" package="com.atlassian.confluence.user">
<id name="key"><![CDATA[01f7c1ca483b2b1c01483b2d4f4206cc]]></id>
</property>
<property name="lastModificationDate">2023-11-10 14:11:48.022</property>
</object>
A space permission.
Other properties:
- type: the name of the permission
- space: the space which the space permission applies to
- group: the name of the group to which the permission applies. Empty if it doesn't apply to a group
- allUsersSubject: equal to anonymous-users if the permission applies to guests
- userSubject: an element-like value with a <id name="key"> containing the key of the user, described by a ConfluenceUserImpl object, to which the permission applies
SpacePermission objects have the usual creation and modification properties.
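Here is a small Python sketch working out who a space permission applies to from the three properties above. The permission is assumed to be a dictionary of properties in which userSubject was reduced to the user key. The function name and the labels it returns are ours.

```python
def permission_subject(permission):
    """Return a (kind, value) pair telling who a SpacePermission applies to."""
    if permission.get("group"):
        return ("group", permission["group"])
    if permission.get("userSubject"):
        # The value is a user key, to be resolved using ConfluenceUserImpl objects.
        return ("user", permission["userSubject"])
    if permission.get("allUsersSubject") == "anonymous-users":
        return ("anonymous", None)
    # Nothing we know about identifies a subject; the caller decides what to do.
    return ("unknown", None)
```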
com.atlassian.confluence.setup.bandana.ConfluenceBandanaRecord
<object class="ConfluenceBandanaRecord" package="com.atlassian.confluence.setup.bandana">
<id name="id">43</id>
<property name="context"><![CDATA[_GLOBAL]]></property>
<property name="key"><![CDATA[__DEFAULT_SPACE_PERMISSIONS____GROUP_NAMES__]]></property>
<property name="value"><![CDATA[<set>
<string>confluence-users</string>
</set>]]></property>
</object>
We don't yet use these objects.
com.atlassian.confluence.spaces.Space
<object class="Space" package="com.atlassian.confluence.spaces">
<id name="id">622593</id>
<property name="name"><![CDATA[Great Internal Documentation]]></property>
<property name="key"><![CDATA[Great]]></property>
<property name="lowerKey"><![CDATA[great]]></property>
<property name="description" class="SpaceDescription" package="com.atlassian.confluence.spaces">
<id name="id">589825</id>
</property>
<property name="homePage" class="Page" package="com.atlassian.confluence.pages">
<id name="id">589826</id>
</property>
<collection name="permissions" class="java.util.Collection">
<element class="SpacePermission" package="com.atlassian.confluence.security">
<id name="id">1277959</id>
</element>
<element class="SpacePermission" package="com.atlassian.confluence.security">
<id name="id">1277960</id>
</element>
<!-- ... cut ... -->
<element class="SpacePermission" package="com.atlassian.confluence.security">
<id name="id">617742337</id>
</element>
</collection>
<collection name="pageTemplates" class="java.util.Collection">
<element class="PageTemplate" package="com.atlassian.confluence.pages.templates">
<id name="id">110723073</id>
</element>
<element class="PageTemplate" package="com.atlassian.confluence.pages.templates">
<id name="id">199458822</id>
</element>
</collection>
<property name="creator" class="ConfluenceUserImpl" package="com.atlassian.confluence.user">
<id name="key"><![CDATA[01f7c1ca483b2b1c01483b2d4db002d4]]></id>
</property>
<property name="creationDate">2008-04-23 11:24:41.000</property>
<property name="lastModifier" class="ConfluenceUserImpl" package="com.atlassian.confluence.user">
<id name="key"><![CDATA[01f7c1ca483b2b1c01483b2d4db002d4]]></id>
</property>
<property name="lastModificationDate">2009-07-10 10:16:53.000</property>
<property name="spaceType">global</property>
<property name="spaceStatus" enum-class="SpaceStatus" package="com.atlassian.confluence.spaces">CURRENT</property>
</object>
A space. Its known properties are:
- name: the pretty name of the space
- key: the space key, supposed to be unique
- lowerKey: the lowercase version of the key, also supposed to be unique
- description: an element-like value pointing to the space description, which is usually shown in tables listing spaces. We currently drop this, except for the labels. (we used to import the space description as the home page of the space, now we import the home page of the space itself)
- homePage: an element-like value pointing to the space home page. Note: a space may not have any home page, in which case all pages are orphans. confluence-xml will, by default, issue a minimal home page listing its children.
- permissions: a collection of space permissions (SpacePermission objects)
- pageTemplates: a collection of page templates (PageTemplate objects)
- spaceType: whether the space is global or personal (See https://docs.atlassian.com/atlassian-confluence/1000.107.0/com/atlassian/confluence/spaces/SpaceType.html)
- spaceStatus: whether the space is current or archived (See https://docs.atlassian.com/atlassian-confluence/1000.107.0/com/atlassian/confluence/spaces/SpaceStatus.html)
Space objects have the usual creation and modification properties.
com.atlassian.confluence.spaces.SpaceDescription
<object class="SpaceDescription" package="com.atlassian.confluence.spaces">
<id name="id">753665</id>
<property name="space" class="Space" package="com.atlassian.confluence.spaces"><id name="id">786433</id>
</property>
<property name="title"/><collection name="bodyContents" class="java.util.Collection"><element class="BodyContent" package="com.atlassian.confluence.core"><id name="id">819201</id>
</element>
</collection>
<property name="version">1</property>
<property name="creatorName"><![CDATA[admin]]></property>
<property name="creationDate">2013-10-14 14:53:25.489</property>
<property name="lastModifierName"><![CDATA[admin]]></property>
<property name="lastModificationDate">2013-10-14 14:53:25.489</property>
<property name="versionComment"><![CDATA[]]></property>
<property name="contentStatus"><![CDATA[current]]></property>
<collection name="labellings" class="java.util.Collection"><element class="Labelling" package="com.atlassian.confluence.labels"><id name="id">720901</id>
</element>
</collection>
</object>
The description of a space. For a description of its properties, see Page (the two types of objects are very similar). SpaceDescription objects have the usual creation and modification properties.
com.atlassian.confluence.user.ConfluenceUserImpl
<object class="ConfluenceUserImpl" package="com.atlassian.confluence.user">
<id name="key"><![CDATA[01f7c1cc638e0d8c0163d05ca6f60124]]></id>
<property name="name"><![CDATA[47826731]]></property>
<property name="lowerName"><![CDATA[47826731]]></property>
<property name="email"/>
</object>
An object that represents a user.
- name: the name of the user
- lowerName: the lowercase version of the name of the user
- email: the email address of the user (optional)
com.atlassian.confluence.user.persistence.dao.ConfluenceRememberMeToken
<object class="ConfluenceRememberMeToken" package="com.atlassian.confluence.user.persistence.dao">
<id name="id">393217</id>
<property name="username"><![CDATA[admin]]></property>
<property name="createdTime">1381745929067</property>
<property name="token"><![CDATA[251b5b4649888218a9c81ddf30b66029b63f83d5]]></property>
</object>
We don't use these objects.
com.atlassian.confluence.user.PersonalInformation
<object class="PersonalInformation" package="com.atlassian.confluence.user">
<id name="id">753694</id>
<property name="username"><![CDATA[user1]]></property>
<property name="title"/><property name="version">1</property>
<property name="creatorName"><![CDATA[admin]]></property>
<property name="creationDate">2013-10-14 15:42:39.535</property>
<property name="lastModifierName"><![CDATA[admin]]></property>
<property name="lastModificationDate">2013-10-14 15:42:39.535</property>
<property name="versionComment"><![CDATA[]]></property>
<property name="contentStatus"><![CDATA[current]]></property>
</object>
We don't yet use these objects (which seem to have the usual creation and modification properties). See https://docs.atlassian.com/atlassian-confluence/6.6.0/com/atlassian/confluence/user/PersonalInformation.html
com.atlassian.crowd.embedded.hibernate2.HibernateMembership
<object class="HibernateMembership" package="com.atlassian.crowd.embedded.hibernate2">
<id name="id">294915</id>
<property name="parentGroup" class="InternalGroup" package="com.atlassian.crowd.model.group"><id name="id">163842</id>
</property>
<property name="userMember" class="InternalUser" package="com.atlassian.crowd.model.user"><id name="id">229378</id>
</property>
</object>
We don't yet use these objects.
com.atlassian.crowd.model.application.DirectoryMapping
<object class="DirectoryMapping" package="com.atlassian.crowd.model.application">
<id name="id">131073</id>
<property name="application" class="ApplicationImpl" package="com.atlassian.crowd.model.application"><id name="id">65537</id>
</property>
<property name="directory" class="DirectoryImpl" package="com.atlassian.crowd.model.directory"><id name="id">98305</id>
</property>
<property name="allowAllToAuthenticate">true</property>
<collection name="allowedOperations" class="java.util.Set"><element enum-class="OperationType" package="com.atlassian.crowd.embedded.api">UPDATE_ROLE_ATTRIBUTE</element>
<element enum-class="OperationType" package="com.atlassian.crowd.embedded.api">DELETE_USER</element>
<element enum-class="OperationType" package="com.atlassian.crowd.embedded.api">UPDATE_USER_ATTRIBUTE</element>
<element enum-class="OperationType" package="com.atlassian.crowd.embedded.api">CREATE_USER</element>
<element enum-class="OperationType" package="com.atlassian.crowd.embedded.api">DELETE_ROLE</element>
<element enum-class="OperationType" package="com.atlassian.crowd.embedded.api">CREATE_ROLE</element>
<element enum-class="OperationType" package="com.atlassian.crowd.embedded.api">CREATE_GROUP</element>
<element enum-class="OperationType" package="com.atlassian.crowd.embedded.api">UPDATE_USER</element>
<element enum-class="OperationType" package="com.atlassian.crowd.embedded.api">UPDATE_GROUP_ATTRIBUTE</element>
<element enum-class="OperationType" package="com.atlassian.crowd.embedded.api">UPDATE_GROUP</element>
<element enum-class="OperationType" package="com.atlassian.crowd.embedded.api">DELETE_GROUP</element>
<element enum-class="OperationType" package="com.atlassian.crowd.embedded.api">UPDATE_ROLE</element>
</collection>
</object>
We don't use these objects.
com.atlassian.crowd.model.user.InternalUser
<object class="InternalUser" package="com.atlassian.crowd.model.user">
<id name="id">163842</id>
<property name="name"><![CDATA[UserName]]></property>
<property name="lowerName"><![CDATA[username]]></property>
<property name="active">true</property>
<property name="createdDate">2016-05-10 15:00:02.760</property>
<property name="updatedDate">2016-05-10 15:00:02.760</property>
<property name="firstName"><![CDATA[User]]></property>
<property name="lowerFirstName"><![CDATA[user]]></property>
<property name="lastName"><![CDATA[Name]]></property>
<property name="lowerLastName"><![CDATA[name]]></property>
<property name="displayName"><![CDATA[User Name]]></property>
<property name="lowerDisplayName"><![CDATA[user name]]></property>
<property name="emailAddress"><![CDATA[User.Name@example.com]]></property>
<property name="lowerEmailAddress"><![CDATA[user.name@example.com]]></property>
</object>
These objects only seem to be in site backups, not space exports. They describe users supposedly registered directly in Confluence.
com.atlassian.crowd.model.group.InternalGroup
<object class="InternalGroup" package="com.atlassian.crowd.model.group">
<id name="id">163843</id>
<property name="name"><![CDATA[twistedgroup]]></property>
<property name="lowerName"><![CDATA[twistedgroup]]></property>
<property name="active">true</property>
<property name="local">false</property>
<property name="createdDate">2013-10-14 15:43:47.360</property>
<property name="updatedDate">2013-10-14 15:43:47.360</property>
<property name="description"/><property name="type" enum-class="GroupType" package="com.atlassian.crowd.model.group">GROUP</property>
<property name="directory" class="DirectoryImpl" package="com.atlassian.crowd.model.directory"><id name="id">98305</id>
</property>
</object>
These objects only seem to be in site backups, not space exports. They describe a group of users. They are used when group imports are enabled. It's usually better to import groups from a central user directory like LDAP.
com.atlassian.crowd.model.user.InternalUserAttribute
<object class="InternalUserAttribute" package="com.atlassian.crowd.model.user">
<id name="id">262152</id>
<property name="user" class="InternalUser" package="com.atlassian.crowd.model.user"><id name="id">229379</id>
</property>
<property name="directory" class="DirectoryImpl" package="com.atlassian.crowd.model.directory"><id name="id">98305</id>
</property>
<property name="name"><![CDATA[passwordLastChanged]]></property>
<property name="value"><![CDATA[1381758208148]]></property>
<property name="lowerValue"><![CDATA[1381758208148]]></property>
</object>
We don't yet use these objects.
attachments folder
The folder is present in space exports unless the attachment export was disabled. To our knowledge, attachments are not present in site exports. When the folder is present, version v (version or attachmentVersion property) of the attachment with original id originalAttachmentId, attached to the content with id originalContentId, is expected to be at attachments/<originalContentId>/<originalAttachmentId>/<v>.
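The path layout described above can be sketched as follows. This is a minimal illustration: the function name `attachment_path` and the ids used in the example are hypothetical, only the `attachments/<originalContentId>/<originalAttachmentId>/<v>` layout comes from this document.

```python
from pathlib import PurePosixPath


def attachment_path(original_content_id, original_attachment_id, version):
    """Return the expected location of an attachment version in the export.

    Layout: attachments/<originalContentId>/<originalAttachmentId>/<v>
    """
    return PurePosixPath(
        "attachments",
        str(original_content_id),
        str(original_attachment_id),
        str(version),
    )


# Hypothetical ids, for illustration only.
print(attachment_path(123456, 789012, 3))
```

Zip entries always use forward slashes, which is why the sketch uses `PurePosixPath` rather than a platform-dependent path type.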
The export descriptor file (exportDescriptor.properties)
The export descriptor file gives information about the export. It is a properties file, each line containing a key=value pair.
It starts with a comment containing the date at which the export was done.
#Mon Feb 10 10:23:09 UTC 2025
ao.data.version.min.com.atlassian.mywork.mywork-confluence-host-plugin=1.1.30
ao.data.version.com.atlassian.mywork.mywork-confluence-host-plugin=1000.0.0-fa970f983392
createdByVersionNumber=1000.0.0-fa970f983392
source=cloud
buildNumber=4515
ao.data.list=com.atlassian.mywork.mywork-confluence-host-plugin, com.atlassian.confluence.plugins.confluence-space-ia
spaceKey=attachhist
ao.data.version.min.com.atlassian.confluence.plugins.confluence-space-ia=5.0
defaultUsersGroup=confluence-users
ao.data.version.com.atlassian.confluence.plugins.confluence-space-ia=1000.0.0-fa970f983392
exportType=space
createdByBuildNumber=8401
timezoneId=UTC
inlineTasksFileIncluded=true
backupAttachments=true
Here are some interesting properties:
- `source`: its value can be `server` or `cloud`. If it is absent, `server` can be assumed. It is interesting because Confluence Cloud and Confluence Server have significant differences.
- `exportType`: its value is `all` for a site backup, and `space` for a space backup.
- `spaceKey`: its value is the key of the exported space for space backups. Note: this can be used to handle https://jira.atlassian.com/browse/CONFSERVER-22853, a bug in Confluence Server that makes it export several spaces when only one was requested (see https://jira.xwiki.org/browse/CONFLUENCE-296).
- `backupAttachments`: whether exporting attachments was enabled for this backup.
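As an illustration of how these properties can be read, here is a minimal sketch. The function names are hypothetical, the parser is deliberately simplified (it ignores the escapes and line continuations of the full Java properties syntax), and treating a missing `backupAttachments` as "no attachments" is an assumption of this sketch, not something we have verified.

```python
def parse_export_descriptor(text):
    """Parse the key=value lines of exportDescriptor.properties into a dict.

    Simplified: skips comments and blank lines, no escape handling.
    """
    properties = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, _, value = line.partition("=")
        properties[key.strip()] = value.strip()
    return properties


def describe_export(properties):
    """Summarize the properties discussed above."""
    return {
        # When source is absent, server can be assumed.
        "source": properties.get("source", "server"),
        "is_space_export": properties.get("exportType") == "space",
        "space_key": properties.get("spaceKey"),
        # Assumption: a missing value is treated as "no attachments".
        "has_attachments": properties.get("backupAttachments") == "true",
    }


sample = """#Mon Feb 10 10:23:09 UTC 2025
source=cloud
spaceKey=attachhist
exportType=space
backupAttachments=true
"""
print(describe_export(parse_export_descriptor(sample)))
```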
Quirks and how to handle them
The entities.xml can have a leading space in its name
We've seen this sporadically. This makes the import fail fast. You'll need to remove the leading space from the filename. Should this happen more often, we shall add a workaround in Confluence XML.
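If you prefer not to rename the entry by hand, the fix can be sketched as below. This is not something confluence-xml does; the function name and the file names in the usage comment are hypothetical, and the sketch reads each entry fully into memory, which may be a problem for very large exports.

```python
import zipfile


def copy_with_fixed_entities_name(source, destination):
    """Copy a zip archive, renaming an ' entities.xml' entry to 'entities.xml'.

    source and destination can be paths or binary file-like objects.
    """
    with zipfile.ZipFile(source) as zin, zipfile.ZipFile(
        destination, "w", zipfile.ZIP_DEFLATED
    ) as zout:
        for info in zin.infolist():
            name = info.filename
            if name.strip() == "entities.xml":
                name = "entities.xml"
            # Reads the whole entry in memory: fine for a sketch,
            # possibly too costly for multi-gigabyte exports.
            zout.writestr(name, zin.read(info.filename))


# Usage (hypothetical file names):
# copy_with_fixed_entities_name("myexport.xml.zip", "myexport-fixed.xml.zip")
```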
Space exports sometimes contain several spaces
A bug in Confluence Server sometimes makes it export several spaces when only one was requested (https://jira.atlassian.com/browse/CONFSERVER-22853).
We work around this issue in confluence-xml by not importing the extraneous space by default (https://jira.xwiki.org/browse/CONFLUENCE-296).
Parsing entities.xml may not work out of the box because of the presence of control characters
We fully work around this issue by transparently stripping them at parse time. See:
- https://jira.xwiki.org/browse/CONFLUENCE-143 (BS characters, where BS means backspace)
- https://jira.xwiki.org/browse/CONFLUENCE-181 (character U+0002)
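For tools that need to do something similar, the general idea of stripping such characters can be sketched as follows. This is not the confluence-xml implementation, only an illustration based on the characters allowed by the XML 1.0 specification; the names `ILLEGAL_XML_CHARS` and `strip_illegal_xml_chars` are hypothetical.

```python
import re

# Characters allowed by the XML 1.0 "Char" production:
# #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
# The pattern matches everything else.
ILLEGAL_XML_CHARS = re.compile(
    r"[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def strip_illegal_xml_chars(text):
    """Return text without the characters an XML 1.0 parser would reject."""
    return ILLEGAL_XML_CHARS.sub("", text)


# Backspace (U+0008) and U+0002 are removed, tab is kept.
print(repr(strip_illegal_xml_chars("a\x08b\x02c\td")))
```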
The export can be corrupted so that the entities.xml file contains illegal XML characters in body contents
XML parsers reject characters that are not legal in XML, such as NULL characters or U+FFFF. With confluence-xml, you'll have a stack trace like this one:
3/8/2025 5:56:26 PM Failed to read package
org.xwiki.filter.FilterException: Failed to analyze the package index
at org.xwiki.contrib.confluence.filter.input.ConfluenceXMLPackage.read(ConfluenceXMLPackage.java:893)
at org.xwiki.contrib.confluence.filter.internal.input.ConfluenceInputFilterStream.preparePackage(ConfluenceInputFilterStream.java:391)
at org.xwiki.contrib.confluence.filter.internal.input.ConfluenceInputFilterStream.readInternal(ConfluenceInputFilterStream.java:331)
at org.xwiki.contrib.confluence.filter.internal.input.ConfluenceInputFilterStream.read(ConfluenceInputFilterStream.java:229)
at org.xwiki.contrib.confluence.filter.internal.input.ConfluenceInputFilterStream.read(ConfluenceInputFilterStream.java:106)
at org.xwiki.filter.input.AbstractBeanInputFilterStream.read(AbstractBeanInputFilterStream.java:79)
at org.xwiki.filter.internal.job.FilterStreamConverterJob.runInternal(FilterStreamConverterJob.java:97)
at org.xwiki.job.AbstractJob.runInContext(AbstractJob.java:246)
at org.xwiki.job.AbstractJob.run(AbstractJob.java:223)
at org.xwiki.filter.script.internal.ScriptFilterStreamConverterJob.run(ScriptFilterStreamConverterJob.java:75)
at com.xwiki.confluencepro.internal.ConfluenceMigrationJob.runInternal(ConfluenceMigrationJob.java:159)
at org.xwiki.job.AbstractJob.runInContext(AbstractJob.java:246)
at org.xwiki.job.AbstractJob.run(AbstractJob.java:223)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
at java.base/java.lang.Thread.run(Thread.java:840)
3/8/2025 5:56:26 PM Exception thrown during job execution
org.xwiki.filter.FilterException: Failed to read package
at org.xwiki.contrib.confluence.filter.internal.input.ConfluenceInputFilterStream.preparePackage(ConfluenceInputFilterStream.java:400)
at org.xwiki.contrib.confluence.filter.internal.input.ConfluenceInputFilterStream.readInternal(ConfluenceInputFilterStream.java:331)
at org.xwiki.contrib.confluence.filter.internal.input.ConfluenceInputFilterStream.read(ConfluenceInputFilterStream.java:229)
at org.xwiki.contrib.confluence.filter.internal.input.ConfluenceInputFilterStream.read(ConfluenceInputFilterStream.java:106)
at org.xwiki.filter.input.AbstractBeanInputFilterStream.read(AbstractBeanInputFilterStream.java:79)
at org.xwiki.filter.internal.job.FilterStreamConverterJob.runInternal(FilterStreamConverterJob.java:97)
at org.xwiki.job.AbstractJob.runInContext(AbstractJob.java:246)
at org.xwiki.job.AbstractJob.run(AbstractJob.java:223)
at org.xwiki.filter.script.internal.ScriptFilterStreamConverterJob.run(ScriptFilterStreamConverterJob.java:75)
at com.xwiki.confluencepro.internal.ConfluenceMigrationJob.runInternal(ConfluenceMigrationJob.java:159)
at org.xwiki.job.AbstractJob.runInContext(AbstractJob.java:246)
at org.xwiki.job.AbstractJob.run(AbstractJob.java:223)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
at java.base/java.lang.Thread.run(Thread.java:840)
Caused by: org.xwiki.filter.FilterException: Failed to analyze the package index
at org.xwiki.contrib.confluence.filter.input.ConfluenceXMLPackage.read(ConfluenceXMLPackage.java:893)
at org.xwiki.contrib.confluence.filter.internal.input.ConfluenceInputFilterStream.preparePackage(ConfluenceInputFilterStream.java:391)
... 14 more
You will need to identify the problematic content and the problematic characters. Then, we know of two main ways to deal with this issue:
- Remove the bad characters from the pages in Confluence and then re-export without history (exporting with history is not possible, since the older versions of the pages still contain the problematic characters).
- Remove the bad characters by editing entities.xml.
One way of identifying the problematic characters is to run a failing import with a debugger connected to the XWiki instance, set breakpoints in ConfluenceXMLPackage at the locations given by the stack trace, and identify the affected content. Alternatively, you can work with a modified version of ConfluenceXMLPackage that logs the ids of the objects it parses, and see where it stops. Then, you will need to extract the relevant body content and inspect it in an editor that lets you see special characters, or in a hexadecimal viewer or editor.
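The "see where parsing stops" approach can also be tried outside of XWiki. The following is a minimal sketch using Python's standard XML parser rather than ConfluenceXMLPackage; the function name is hypothetical and a different parser may not stop at exactly the same place as confluence-xml.

```python
import io
import xml.etree.ElementTree as ET


def last_object_before_error(stream):
    """Parse an entities.xml-like stream and report where parsing stops.

    Returns (last_object, error): last_object is the (class, id) of the
    last object fully parsed, error is the ParseError raised, or None if
    the whole stream was parsed.
    """
    last_object = None
    try:
        for _event, element in ET.iterparse(stream, events=("end",)):
            if element.tag == "object":
                last_object = (element.get("class"), element.findtext("id"))
                element.clear()  # release the children we no longer need
    except ET.ParseError as error:
        return last_object, error
    return last_object, None


# Made-up sample: the second object contains an illegal U+0002 character.
sample = (
    b'<root><object class="Page" package="p"><id name="id">1</id></object>'
    b'<object class="BodyContent" package="p"><id name="id">2</id>'
    b'<property name="body">bad\x02char</property></object></root>'
)
last, error = last_object_before_error(io.BytesIO(sample))
print(last, error)
```

The problematic content is usually in the object that follows the last one reported, and the `position` attribute of the error gives the line and column at which the parser gave up.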
You can try to use Unix tools to search for occurrences of the illegal characters you identified. For example, let's say we identified this sequence of problematic bytes: `\xEF\xBF\xBF` (the UTF-8 encoding of U+FFFF, which is not a legal XML character).
You can count the occurrences using the following command:
perl -nle '$c+=scalar(()=m/\xEF\xBF\xBF/g);END{print $c}' entities.xml
or, without extracting the archive first:
unzip -p myexport.xml.zip entities.xml | perl -nle '$c+=scalar(()=m/\xEF\xBF\xBF/g);END{print $c}'
You can remove these characters in place using sed (make sure you keep a copy of the original entities.xml file):
sed -i 's/\xEF\xBF\xBF//g' entities.xml
Then, put the fixed entities.xml back into a copy of the export zip archive.
Another example, with grep, where we had some content ending with 34004 NULL (\0) characters:
grep --only-matching -a -P '\x00' entities.xml | wc -l
# answer: 34004
sed -i 's/\x00//g' entities.xml
If you work on separate files instead of in place, you can inspect the difference of size before and after fixing:
ls -la original/entities.xml fixed/entities.xml
-rw-r--r-- 1 raph raph 530607291 11 mars 09:02 fixed/entities.xml
-rw-r--r-- 1 raph raph 530641295 11 mars 08:51 original/entities.xml
The difference is 530641295 - 530607291 = 34004 bytes, which matches the number of NULL characters that were removed. You can also check that only the problematic line(s) were modified by running diff:
diff original/entities.xml fixed/entities.xml
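The same count, removal and size check can be done in one pass with a short script. The following is a minimal sketch, not part of confluence-xml; the function name is hypothetical and the paths in the usage comment are the ones from the example above. It works line by line on bytes, so it assumes the sequence to remove does not contain a newline.

```python
import os


def remove_byte_sequence(source_path, fixed_path, sequence):
    """Copy source_path to fixed_path without `sequence`.

    Returns the number of occurrences removed.
    """
    if not sequence or b"\n" in sequence:
        raise ValueError("sequence must be non-empty and contain no newline")
    removed = 0
    with open(source_path, "rb") as source, open(fixed_path, "wb") as fixed:
        for line in source:
            removed += line.count(sequence)
            fixed.write(line.replace(sequence, b""))
    return removed


def size_difference(source_path, fixed_path):
    """Return how many bytes shorter the fixed file is."""
    return os.path.getsize(source_path) - os.path.getsize(fixed_path)


# Usage, with the paths from the example above:
# removed = remove_byte_sequence("original/entities.xml", "fixed/entities.xml", b"\x00")
# assert size_difference("original/entities.xml", "fixed/entities.xml") == removed * 1
```

Writing to a separate file keeps the original export untouched, and the size difference should equal the number of removed occurrences multiplied by the length of the sequence.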