Export Publishing Page Content to XML

We recently had a requirement to export all our variation content as an XML file for an external organisation. You can kind of do this out of the box (OOTB) in Site Content and Structure -> Export Variation, but it’s not nice. You have no control over the schema the content arrives in, or over the identifiers of the content, and it’s all wrapped in a CAB file with loads of other junk. (Not to mention that I’ve run into an install where the ISA server settings blocked that file. Schoolboy error!)

 

So, I set about writing a custom solution to the problem that gave me more control. I made the following decisions:

 

To control which content fields are exported using the same Translatable Columns settings that SharePoint uses when you export variations (_Layouts/TranslatableSettings.aspx).

 

To provide sufficient identifiers with the content so that I could conceivably use a similar route to re-import changed content if that requirement arises in future.

 

To download the content as a well-formed XML file.

 

To access the export feature via a Site Action menu option.

 

[Btw you might want to read this post in conjunction with https://jamiemcallister.com/post/Propagating-Document-Library-Items-In-MOSS-Variations-(in-the-same-way-as-Pages).aspx as there are some similarities in how the development was done.]

 

So, how is it all done? It goes like this:

 

Translatable Columns

This took a bit of digging, but I found that these are stored as an XML string within a property of SPList.RootFolder of the Relationships List. This code returns the XML:

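// The Relationships List ID is stored in the web's property bag
Guid guid = new Guid(publishingweb.AllProperties["_VarRelationshipsListId"].ToString());

// The translatable columns are held as XML in a property of that list's root folder
string translatableCols = publishingweb.Lists[guid].RootFolder.Properties["TranslateFields"] as string;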

  

The XML format didn’t suit what I wanted to do with it, so I created a generic SortedDictionary to hold the Column Name and Field GUID, and copied those values in:

 

SortedDictionary<string, Guid> translatableColumns = new SortedDictionary<string, Guid>();
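Copying the values across might look something like the sketch below. The post doesn't show the layout of the TranslateFields XML, so the `<Field Name="..." Id="..."/>` shape used here is purely an assumption, as are the class and method names; check the real value on your own farm and adjust the XPath and attribute names to suit.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

public static class TranslatableColumnReader
{
    // ASSUMPTION: each column is a <Field Name="..." Id="..."/> element.
    // The real TranslateFields format may differ; adjust the XPath and
    // attribute names after inspecting the actual property value.
    public static SortedDictionary<string, Guid> Read(string translatableCols)
    {
        SortedDictionary<string, Guid> columns = new SortedDictionary<string, Guid>();
        if (string.IsNullOrEmpty(translatableCols))
        {
            return columns; // no translatable columns configured
        }

        XmlDocument doc = new XmlDocument();
        doc.LoadXml(translatableCols);

        foreach (XmlElement field in doc.SelectNodes("//Field"))
        {
            string name = field.GetAttribute("Name"); // "" when the attribute is absent
            string id = field.GetAttribute("Id");
            if (name.Length == 0 || id.Length == 0)
            {
                continue; // skip entries missing either identifier
            }
            columns[name] = new Guid(id); // accepts braced and unbraced GUID strings
        }
        return columns;
    }
}
```

With that in place, the dictionary from the post would be filled with `translatableColumns = TranslatableColumnReader.Read(translatableCols);`.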

 

I then also created an XmlDocument to fire my content extracts into as I went along:

private XmlDocument contentExtract; 

Obtain the Content and its Identifiers 

If you’ve read any of my prior blog posts, you’ll have seen my recursive code for walking through webs. I used it again to visit every Publishing Page in the Site Collection. I also checked the Moderation Status of the pages, etc., so I knew I was getting only the content I wanted.

Once I had my publishing page instance (pp) it was time to retrieve the values of any fields detailed in translatableColumns:

foreach (Guid g in translatableColumns.Values)
{
    string pageContent = SPHttpUtility.HtmlEncode(pp.Fields[g].GetFieldValueAsHtml(pp.ListItem[g]));

    if (pageContent.Length > 0) // We don't need the empty ones
    {
        // (creation of newFieldNode and its two attributes omitted here)

        // Pop the field's internal name and the field Guid into XML attributes
        attributeFieldName.Value = pp.Fields[g].InternalName;
        attributeGuid.Value = g.ToString();

        // Pop the actual page content into the InnerText of the node
        newFieldNode.InnerText = pageContent;
    }
}
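The node and attribute plumbing left out of that loop could be sketched roughly as follows, using nothing but System.Xml. The element and attribute names (Field, FieldName, FieldGuid) and the helper itself are placeholders invented for this illustration, not the names from my actual solution.

```csharp
using System;
using System.Xml;

public static class ContentExtractBuilder
{
    // ASSUMPTION: "Field", "FieldName" and "FieldGuid" are illustrative names only.
    // Appends one field node, carrying its identifiers as attributes, under pageNode.
    public static XmlElement AppendField(XmlElement pageNode, string internalName, Guid fieldId, string content)
    {
        XmlDocument doc = pageNode.OwnerDocument;

        XmlElement newFieldNode = doc.CreateElement("Field");
        newFieldNode.SetAttribute("FieldName", internalName);
        newFieldNode.SetAttribute("FieldGuid", fieldId.ToString());

        // InnerText stores the raw string; markup characters are escaped
        // automatically when the document is serialised.
        newFieldNode.InnerText = content;

        pageNode.AppendChild(newFieldNode);
        return newFieldNode;
    }
}
```

One thing worth knowing: because InnerText escapes markup on the way out, content that has already been through HtmlEncode ends up encoded twice in the file (a `<` becomes `&amp;lt;`). That may be exactly what you want for a translation hand-off, but whoever consumes the file has to decode it twice to get the original HTML back.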

Download the Content as an XML File 

So, I’ve got my XML document now. The next step is to throw it out as an XML file. This is very easy because of the way I packaged the solution: the whole thing is an ASP.NET code-behind file. Andrew Connell detailed the technique here: http://www.andrewconnell.com/blog/articles/UsingCodeBehindFilesInSharePointSites.aspx My page is called ManageVariationsContent.aspx because I can’t stop my naming conventions getting verbose. :)

As such, I have access to the Response object, so I simply write:

// Ask the browser to treat the response as a file download
Response.AddHeader("Content-disposition", "attachment;filename=ContentDownload.xml");
Response.ContentType = "application/octet-stream";

// Send the XML as UTF-8 bytes, then finish the response
Response.BinaryWrite(System.Text.Encoding.UTF8.GetBytes(contentExtractXMLAsString));
Response.Flush();
Response.End();

 

When the code executes my browser receives the file and I can save or open it. Really neat!
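I haven't shown how contentExtractXMLAsString gets built from the XmlDocument. One way to skip the string altogether is to serialise straight to UTF-8 bytes, sketched below. The class and method names are made up for this example. Going via an XmlWriter over a stream also means the XML declaration says utf-8, whereas writing through a StringWriter would stamp it as utf-16.

```csharp
using System.IO;
using System.Text;
using System.Xml;

public static class ContentExtractSerializer
{
    // Serialises the document to UTF-8 bytes, ready for Response.BinaryWrite.
    public static byte[] ToUtf8Bytes(XmlDocument contentExtract)
    {
        XmlWriterSettings settings = new XmlWriterSettings();
        settings.Encoding = new UTF8Encoding(false); // UTF-8 without a byte order mark
        settings.Indent = true;                      // easier on whoever opens the file

        using (MemoryStream buffer = new MemoryStream())
        {
            using (XmlWriter writer = XmlWriter.Create(buffer, settings))
            {
                contentExtract.Save(writer);
            }
            // The writer is flushed on dispose, so the buffer is complete here.
            return buffer.ToArray();
        }
    }
}
```

The download code then becomes `Response.BinaryWrite(ContentExtractSerializer.ToUtf8Bytes(contentExtract));` with the headers left as they are.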

 

 Putting it into Site Actions 

All my code has been written as a Feature. To create my Site Actions menu item I place the following XML in my Element Manifest file (element.xml):

 

<CustomAction Id="YourGuid"
    GroupId="SiteActions"
    Location="Microsoft.SharePoint.StandardMenu"
    Sequence="1000"
    Rights="ManageSubwebs"
    Title="Your Title">
  <UrlAction Url="javascript:window.location='{SiteUrl}/ManageVariationsContent.aspx?source=' + window.location"/>
</CustomAction>

 

GroupId and Location are what place it in the Site Actions menu. Since this is a fairly privileged action I specified a Rights attribute so that only someone who can manage sub webs will ever see this Custom Action. (Check out the SPBasePermissions enum on MSDN to see what other values are available for that attribute.)

I pass the site url as a parameter in UrlAction as it gets used in some of the code on my application page – but it’s not of major importance.

 

 Finally 

If you compile this all up in a Feature and install/activate it, you’ll get a new Site Action with the title you specified. Going through to your action page allows you to trigger the extract and download an XML file with all your content.

 

This method gives you a lot of control over what you extract. It also presents many of the steps you might need to push the content back in again (perhaps after you had the variation content translated).

 

You could also bend the code that interrogates publishing page fields to work in a solution for ad-hoc translation when items are published to variations. (But we’ll leave that for another blog post!)