Wednesday, November 16, 2016

SQLServer : Reading flat data stored in XML column

Hello everyone,
It took me a while to get on to XML data in SQL Server. I recently encountered a scenario of reading flat data from an XML column.
Another scenario of reading data from an XML saved in a VARCHAR column.

Scenario01: Reading XML data from an XML value saved in the XML type column
So, as a first step, lets create a BookSpace table and save some XML data in it.

CREATE TABLE dbo.BookSpace (ID INT IDENTITY(1,1), BookSpaced XML)
GO
INSERT INTO dbo.BookSpace (BookSpaced)
SELECT CONVERT(XML, '<BookShelf xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<Author>
<AuthorName>Mokkapati NarasimhaSastry</AuthorName>
<AuthorNationality>Indian</AuthorNationality>
</Author>
<Book>
<BookName>Barrister Parvateesam</BookName>
<OriginalLanguage>Telugu</OriginalLanguage>
<YearPublished>1924</YearPublished>
</Book>
</BookShelf>'
) AS Book
UNION ALL
SELECT CONVERT(XML, '<BookShelf xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<Author>
<AuthorName>Chilakamarthi Lakshmi Narasimham</AuthorName>
<AuthorNationality>Indian</AuthorNationality>
</Author>
<Book>
<BookName>Ganapathi</BookName>
<OriginalLanguage>Telugu</OriginalLanguage>
<YearPublished>1920</YearPublished>
</Book>
</BookShelf>'
) AS Book
UNION ALL
SELECT CONVERT(XML, '<BookShelf xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<Author>
<AuthorName>Charles Dickens</AuthorName>
<AuthorNationality>English</AuthorNationality>
</Author>
<Book>
<BookName>Great Expectations</BookName>
<OriginalLanguage>English</OriginalLanguage>
<YearPublished>1861</YearPublished>
</Book>
</BookShelf>'
) AS Book
UNION ALL
SELECT CONVERT(XML, '<BookShelf xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<Author>
<AuthorName>Charles Dickens</AuthorName>
<AuthorNationality>English</AuthorNationality>
</Author>
<Book>
<BookName>A Christmas Carol</BookName>
<OriginalLanguage>English</OriginalLanguage>
<YearPublished>1843</YearPublished>
</Book>
</BookShelf>'
) AS Book
UNION ALL
SELECT CONVERT(XML, '<BookShelf xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<Author>
<AuthorName>Dan Brown</AuthorName>
<AuthorNationality>American</AuthorNationality>
</Author>
<Book>
<BookName>The Da Vinci Code</BookName>
<OriginalLanguage>English</OriginalLanguage>
<YearPublished>2003</YearPublished>
</Book>
</BookShelf>'
) AS Book
UNION ALL
SELECT CONVERT(XML, '<BookShelf xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<Author>
<AuthorName>Miguel de Cervantes Saavedra</AuthorName>
<AuthorNationality>Spanish</AuthorNationality>
</Author>
<Book>
<BookName>The Ingenious Gentleman Don Quixote of La Mancha</BookName>
<OriginalLanguage>Spanish</OriginalLanguage>
<YearPublished>1605</YearPublished>
</Book>
</BookShelf>'
) AS Book
GO
SELECT * FROM dbo.BookSpace (NOLOCK)
GO

We can read flat data from XML column as below:
SELECT BookSpaced.value('(BookShelf/Book/BookName)[1]','VARCHAR(100)') AS BookName
, BookSpaced.value('(BookShelf/Book/OriginalLanguage)[1]','VARCHAR(50)') AS OriginalLanguage
, BookSpaced.value('(BookShelf/Book/YearPublished)[1]','INT') AS YearPublished
, BookSpaced.value('(BookShelf/Author/AuthorName)[1]','VARCHAR(250)') AS AuthorName
, BookSpaced.value('(BookShelf/Author/AuthorNationality)[1]','VARCHAR(50)') AS AuthorNationality
FROM dbo.BookSpace (NOLOCK)
 

 

Now, our goal is to read the XML data as a flat data.
For that, the below query will help us. I gave the color notation in the query so that we can quickly get what attribute is used where.

SELECT BookSpaced.value('(BookShelf/Book/BookName)[1]','VARCHAR(100)') AS BookName
, BookSpaced.value('(BookShelf/Book/OriginalLanguage)[1]','VARCHAR(50)') AS OriginalLanguage
, BookSpaced.value('(BookShelf/Book/YearPublished)[1]','INT') AS YearPublished
, BookSpaced.value('(BookShelf/Author/AuthorName)[1]','VARCHAR(250)') AS AuthorName
, BookSpaced.value('(BookShelf/Author/AuthorNationality)[1]','VARCHAR(50)') AS AuthorNationality
FROM dbo.BookSpace

Our XML value is as below:
'<BookShelf xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <Author>
    <AuthorName>Miguel de Cervantes Saavedra
    <AuthorNationality>Spanish
  </Author>
  <Book>
    <BookName>The Ingenious Gentleman Don Quixote of La Mancha
    <OriginalLanguage>Spanish
    <YearPublished>1605
  </Book>
</BookShelf>'

In other words, it is

SELECT  XMLColumnName.value(  ‘(Node/SubNode1/Subnode2/…/FinalNode)[1]’,  ’Data Type’   ) AS OurAlias
FROM
Just to note that here the keyword “value” should be in all small letters as it is actually the value() method that performs XQuery against XML and returns data in the specified SQL datatype.

XMLColumn.value(XQuery, SQLType)
XQuery is the XQuery expression i.e. a string literal, that retrieves data inside the XML instance or can also be considered as node by node navigation till the data value.

SELECT BookSpaced.value('(BookShelf/Book/BookName)[1]','VARCHAR(100)') AS BookName
FROM dbo.BookSpace

In this example, XQuery is  '(BookShelf/Book/BookName)[1]'
SQLType is  'VARCHAR(100)'


Scenario02: Reading XML data from an XML saved in the varchar type column
So, as a first step, lets create a TextBookSpace table and save some XML data in it.

CREATE TABLE dbo.TextBookSpace (ID INT IDENTITY(1,1), BookSpaced VARCHAR(1000))
GO
INSERT INTO dbo.TextBookSpace (BookSpaced)
SELECT '<BookShelf xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <Author>
    <AuthorName>Mokkapati NarasimhaSastry</AuthorName>
    <AuthorNationality>Indian</AuthorNationality>
  </Author>
  <Book>
    <BookName>Barrister Parvateesam</BookName>
    <OriginalLanguage>Telugu</OriginalLanguage>
    <YearPublished>1924</YearPublished>
  </Book>
</BookShelf>'
AS Book
UNION ALL
SELECT '<BookShelf xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <Author>
    <AuthorName>Chilakamarthi Lakshmi Narasimham</AuthorName>
    <AuthorNationality>Indian</AuthorNationality>
  </Author>
  <Book>
    <BookName>Ganapathi</BookName>
    <OriginalLanguage>Telugu</OriginalLanguage>
    <YearPublished>1920</YearPublished>
  </Book>
</BookShelf>'
AS Book
UNION ALL
SELECT '<BookShelf xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <Author>
    <AuthorName>Charles Dickens</AuthorName>
    <AuthorNationality>English</AuthorNationality>
  </Author>
  <Book>
    <BookName>Great Expectations</BookName>
    <OriginalLanguage>English</OriginalLanguage>
    <YearPublished>1861</YearPublished>
  </Book>
</BookShelf>'
AS Book
UNION ALL
SELECT '<BookShelf xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <Author>
    <AuthorName>Charles Dickens</AuthorName>
    <AuthorNationality>English</AuthorNationality>
  </Author>
  <Book>
    <BookName>A Christmas Carol</BookName>
    <OriginalLanguage>English</OriginalLanguage>
    <YearPublished>1843</YearPublished>
  </Book>
</BookShelf>'
AS Book
UNION ALL
SELECT '<BookShelf xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <Author>
    <AuthorName>Dan Brown</AuthorName>
    <AuthorNationality>American</AuthorNationality>
  </Author>
  <Book>
    <BookName>The Da Vinci Code</BookName>
    <OriginalLanguage>English</OriginalLanguage>
    <YearPublished>2003</YearPublished>
  </Book>
</BookShelf>'
AS Book
UNION ALL
SELECT '<BookShelf xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <Author>
    <AuthorName>Miguel de Cervantes Saavedra</AuthorName>
    <AuthorNationality>Spanish</AuthorNationality>
  </Author>
  <Book>
    <BookName>The Ingenious Gentleman Don Quixote of La Mancha</BookName>
    <OriginalLanguage>Spanish</OriginalLanguage>
    <YearPublished>1605</YearPublished>
  </Book>
</BookShelf>'
AS Book
SELECT * FROM dbo.TextBookSpace (NOLOCK)

In this case, we can convert the VARCHAR column to XML type and follow the same process like using value() method to read the XML data.

SELECT BookSpaced.value('(BookShelf/Book/BookName)[1]','VARCHAR(100)') AS BookName
, BookSpaced.value('(BookShelf/Book/OriginalLanguage)[1]','VARCHAR(50)') AS OriginalLanguage
, BookSpaced.value('(BookShelf/Book/YearPublished)[1]','INT') AS YearPublished
, BookSpaced.value('(BookShelf/Author/AuthorName)[1]','VARCHAR(250)') AS AuthorName
, BookSpaced.value('(BookShelf/Author/AuthorNationality)[1]','VARCHAR(50)') AS AuthorNationality
FROM (SELECT ID, CONVERT(XML,BookSpaced) AS BookSpaced FROM dbo.TextBookSpace (NOLOCK)) AS Q


But mind you, sometimes, we may encounter strange situations of NULL data being resulted from the above query.
To handle those, we need to consider using substrings and charindex to get the data value we need.

For example, if we want to get the BookName and AuthorName, we can use the below query


SELECT SUBSTRING(BookSpaced,CHARINDEX('<BookName>',BookSpaced)+LEN('<BookName>'),CHARINDEX('</BookName>',BookSpaced) - CHARINDEX('<BookName>',BookSpaced)-LEN('<BookName>')) AS BookName
, SUBSTRING(BookSpaced,CHARINDEX('<AuthorName>',BookSpaced)+LEN('<AuthorName>'),CHARINDEX('</AuthorName>',BookSpaced) - CHARINDEX('<AuthorName>',BookSpaced)-LEN('<AuthorName>')) AS AuthorName
FROM dbo.TextBookSpace

 


To explain this query, for BookName column,
In our xml column the value is like …..<BookName>Ganapathi</BookName>…….

We are using substring function to achieve this. We gave the column name as the first parameter.

The second parameter is position from where the substring should be started.
We need to read the book name Ganapathi, for that we are finding the character position that starts with and subtracting the length of from it. Let us say, we got the value B1.

Note that can be replaced with any other node in the xml.

The third parameter is length to which the substring should be read. Here it is dynamic, we can’t guess the length of a book name.

So, we need to get the length of bookname dynamically. i.e. the starting position of the end node
MINUS the second parameter calculated.

With this we are at the closure of the discussion.

Will be meeting you soon with another scenario.

Thanks
SunilDutt N

 

No comments:

Post a Comment