Text to HTML Parser
Download File: textparser.zip (3kb) | SDK: Beta1
Introduction
If you have developed Web applications, you have probably noticed
that when you display multiple lines of data from a database, you
lose the spacing and formatting between the lines. In addition, some
applications, such as forums, let users post HTML content directly,
which can lead to serious problems. By posting HTML content I mean
that, for example, a user can post an HTML image tag like
<img src="http://myserver.com/mypic.jpg">
and when someone views the post, the actual image is displayed
instead of the tag! Someone could likewise post a link to a
maliciously coded page, making every user who views the post an easy
target, which has serious security implications.
Problem
The problem described above has 2 parts.
1) Formatting Problem: In HTML, any run of white space between
two characters is automatically collapsed into a single white
space. Also, the Carriage Return '\r' and Line Feed '\n'
characters have no effect on HTML formatting. As a result, a
multiple-line post is displayed as a single continuous line.
2) HTML Content: This can be either a problem or a boon, depending on the users of your application. When content from the database is displayed, the client browser's HTML engine parses any HTML in the data. As a result, tags are rendered as HTML instead of being displayed as text.
Solution
There is a common solution to both of the above problems: you have to
parse the text content from the database into the corresponding HTML.
1) Formatting Solution: In HTML, &nbsp; denotes an extra
(non-breaking) white space. So every 2 white spaces should be
substituted by a single white space followed by &nbsp;.
Also, every line terminator should be replaced by the break tag <br>,
which makes the next character start on a new line.
2) HTML Content: The solution to this is a bit tricky. In HTML, every valid tag is contained within the < and > brackets. So to make all the HTML tags in a post invalid, just change the < and > characters to their HTML counterparts &lt; and &gt; respectively. One other formatting change is that the double quotation mark " has to be changed into its HTML equivalent &quot;
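The substitutions described above can be sketched as a small, self-contained C# program. This is only an illustration under stated assumptions: the class name `EscapeSketch` and the method `EscapeForDisplay` are hypothetical and not part of the article's download. The sketch also escapes `&` first, a step the text above does not mention, so that entities already typed by the user are shown literally rather than interpreted by the browser.

```csharp
using System;
using System.Text;

// Hypothetical helper for illustration; not part of textparser.zip.
public static class EscapeSketch
{
    public static string EscapeForDisplay(string text)
    {
        StringBuilder sb = new StringBuilder(text);

        // Assumption beyond the article: escape '&' before anything else,
        // otherwise the entities inserted below would be escaped again.
        sb.Replace("&", "&amp;");

        // Make tags invalid by replacing the brackets and the double quote.
        sb.Replace("<", "&lt;");
        sb.Replace(">", "&gt;");
        sb.Replace("\"", "&quot;");

        // Every pair of spaces becomes a space followed by &nbsp;.
        sb.Replace("  ", " &nbsp;");

        // Line terminators become <br> (a bare '\r' is not handled here).
        sb.Replace("\r\n", "<br>");
        sb.Replace("\n", "<br>");

        return sb.ToString();
    }

    public static void Main()
    {
        // Prints: a &nbsp;b &lt;img src=&quot;x&quot;&gt;<br>next
        Console.WriteLine(EscapeForDisplay("a  b <img src=\"x\">\nnext"));
    }
}
```

On later .NET versions, `System.Net.WebUtility.HtmlEncode` covers the character-escaping part of this; it is mentioned here as an alternative, not as what the article's parser uses.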
Text to HTML Parser
On the .NET Platform the String object is immutable, i.e. once you
create a String object you cannot change its contents. Since our
parser needs to do some heavyweight string manipulation, I use the
StringBuilder class from the System.Text namespace, which provides a
mutable string object. Also, for streaming access to the textual
content, I use the StringReader and StringWriter classes from the
System.IO namespace.
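As a rough illustration of how these classes behave, here is a minimal sketch. The names `LineSketch` and `AddBreaks` are hypothetical, chosen only for this example. It shows that a `String` is left unchanged while a `StringBuilder` is modified in place, and that `StringReader.ReadLine` walks the text one line at a time regardless of whether lines end in `\r\n` or `\n`.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper for illustration; not part of textparser.zip.
public static class LineSketch
{
    // Appends <br> after every line of the input text.
    public static string AddBreaks(string text)
    {
        using (StringReader sr = new StringReader(text))
        using (StringWriter sw = new StringWriter())
        {
            string line;
            // ReadLine returns null once the end of the text is reached.
            while ((line = sr.ReadLine()) != null)
            {
                // StringWriter collects output in an internal StringBuilder.
                sw.Write(line);
                sw.Write("<br>");
            }
            return sw.ToString();
        }
    }

    public static void Main()
    {
        string original = "one";
        StringBuilder sb = new StringBuilder(original);
        sb.Replace("one", "two");      // changes the builder in place

        Console.WriteLine(original);   // prints: one (the string is untouched)
        Console.WriteLine(sb);         // prints: two

        // Prints: first<br>second<br>third<br>
        Console.WriteLine(AddBreaks("first\r\nsecond\nthird"));
    }
}
```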
Example: Normal post (See the problem!!)
Some sample text with lots of extra white spacing .. ... and some text on a new line. lastly the HTML textbox tag
Example: Parsed Text with HTML posting allowed (See the difference!)
Some sample text with lots of
extra white spacing .. ... and some text on a new line. lastly the HTML textbox tag
Example: Parsed Text with HTML posting disabled (exactly same as posted!)
Some sample text with lots of
extra white spacing .. ... and some text on a new line. lastly the HTML textbox tag <input type="text">
Code
1) parsetext method: The method to convert Text into HTML
public string parsetext(string text, bool allow)
{
    //Create a StringBuilder object from the string input
    //parameter
    StringBuilder sb = new StringBuilder(text) ;
    //Replace all double white spaces with a single white space
    //and &nbsp;
    sb.Replace("  "," &nbsp;");
    //Check if HTML tags are not allowed
    if(!allow)
    {
        //Convert the brackets into HTML equivalents
        sb.Replace("<","&lt;") ;
        sb.Replace(">","&gt;") ;
        //Convert the double quote
        sb.Replace("\"","&quot;");
    }
    //Create a StringReader from the processed string of
    //the StringBuilder
    StringReader sr = new StringReader(sb.ToString());
    StringWriter sw = new StringWriter();
    //Loop while next character exists
    while(sr.Peek()>-1)
    {
        //Read a line from the string and store it to a temp
        //variable
        string temp = sr.ReadLine();
        //Write the string with the HTML break tag
        //Note: the Write method writes to an internal StringBuilder
        //object created automatically
        sw.Write(temp+"<br>") ;
    }
    //Return the final processed text
    return sw.GetStringBuilder().ToString();
}
2) textparser.aspx - A sample consumer for the Text to HTML parser
<%@ Page Language="C#" %>
<%@ Import namespace="System.Text" %>
<%@ Import Namespace="System.IO" %>
<html>
<head>
<script language="C#" runat=server >
private void Post_Text(object sender, EventArgs e)
{
    //Check if there is some text inside the TextBox
    if(mess.Text!="")
    {
        //Check if option to Parse Text is selected
        if(parse.Checked)
        {
            //Check if option to allow posting of HTML content is selected
            if(htmlpost.Checked)
            {
                //Call the parsetext method
                //Pass the text content from the textbox and true so that
                //HTML tags are allowed and do not get converted to text
                postmess.Text=parsetext(mess.Text,true) ;
            }
            else
            {
                //Call the parsetext method
                //Pass the text content from the textbox and false so that
                //HTML tags get converted to text
                postmess.Text=parsetext(mess.Text,false) ;
            }
        }
        else
        {
            //Just post the text without any parsing
            postmess.Text=mess.Text ;
        }
    }
}
//Method to parse Text into HTML
public string parsetext(string text, bool allow)
{
    //Create a StringBuilder object from the string input
    //parameter
    StringBuilder sb = new StringBuilder(text) ;
    //Replace all double white spaces with a single white space
    //and &nbsp;
    sb.Replace("  "," &nbsp;");
    //Check if HTML tags are not allowed
    if(!allow)
    {
        //Convert the brackets into HTML equivalents
        sb.Replace("<","&lt;") ;
        sb.Replace(">","&gt;") ;
        //Convert the double quote
        sb.Replace("\"","&quot;");
    }
    //Create a StringReader from the processed string of
    //the StringBuilder object
    StringReader sr = new StringReader(sb.ToString());
    StringWriter sw = new StringWriter();
    //Loop while next character exists
    while(sr.Peek()>-1)
    {
        //Read a line from the string and store it to a temp
        //variable
        string temp = sr.ReadLine();
        //Write the string with the HTML break tag
        //Note: the Write method writes to an internal StringBuilder
        //object created automatically
        sw.Write(temp+"<br>") ;
    }
    //Return the final processed text
    return sw.GetStringBuilder().ToString();
}
</script>
</head>
<body>
<center>
<h3>Welcome to Saurabh's Text to HTML Parser</h3>
<br>
<form runat=server >
<table border=1>
<tr>
<td valign=top>Your message</td>
<td>
<asp:label text=" " id=postmess runat=server />
</td></tr>
<tr><td valign=top>Enter Message </td>
<td><asp:textbox Columns="50" Rows="20" TextMode="MultiLine" id=mess
runat=server /></td></tr>
<tr><td colspan=2>
<asp:checkbox id=parse text="Select to Parse the Text into HTML" runat=server/>
<br>
<asp:checkbox id=htmlpost
text="Select to allow posting of HTML content" runat=server />
</td></tr>
<tr><td colspan=2>
<asp:button onClick="Post_Text" text="Post Text" runat=server/></td></tr>
</table>
</form>
</center>
</body>
</html>

