Wednesday, December 28, 2005

C# inheritance

Inheritance can be approached under a number of programming methodologies. With something like the waterfall approach, inheritance is all planned out up front. With other, more dynamic methodologies it's not so obvious which methods should be overridden and which shouldn't be.

In Java, if a method is overridden in the child class, a call made through a reference of the parent type automatically executes the child's code. In C# the designers decided on a more literal approach to inheritance, much like C++, which is understandable considering that C++ compiles to IL alongside C# and VB.NET.
Take the following example. Suppose the class car is derived from automobile, and both define a method called TopSpeed. If the virtual and override keywords aren't used and somebody assigns a car to a variable of type automobile, then calling TopSpeed through that variable executes automobile's code, not car's.

I believe this is a case of bad language design. If a child class has a method with the same name as a parent's method, it should implicitly override the parent method. This raises the question: if one is writing libraries in the infrastructure of a project that are to be used by other developers, should all methods be made virtual? I believe that if you wish to allow flexible extensibility by your programmers then a strong argument might be made for it.
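For comparison, the implicit overriding argued for here is what Java does, and Python behaves the same way. The following is a minimal sketch in Python, not C#, so it only illustrates the "child always wins" behaviour; the class names follow the car example and the speeds are made-up figures.

```python
class Automobile:
    def top_speed(self):
        # Made-up figure, for illustration only.
        return 120


class Car(Automobile):
    # Same name as the parent's method: it overrides implicitly,
    # with no virtual/override keywords involved.
    def top_speed(self):
        return 180


def describe(vehicle):
    # 'vehicle' plays the role of a variable typed as the parent class.
    return vehicle.top_speed()


speed = describe(Car())
print(speed)
```

The call goes to the child's method even though describe only knows it has been handed some kind of automobile.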

Thursday, December 22, 2005

The importance of contextual metadata... (Part 1)

As the amount of information in the world increases, so does the need to be able to search that information better. The two main tools that we have to help with this are statistical metrics and metadata (ignoring things such as anchor weighting, since I'm assuming a non-web environment, and constant boosting of certain authors, since I'm assuming that all information is important).

When I say statistical metrics I'm referring to things such as this: in a search for "soy or linseed", statistically one word will occur more often than the other, so it should probably be rated more highly, and a document containing both soy and linseed is more important than one containing just one of the search terms.
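A rough sketch of that kind of scoring in Python follows. The text above doesn't say which of the two words should count for more, so this sketch assumes the rarer one does (an inverse document frequency style weight). The function name, the weighting formula and the sample documents are all made up for illustration.

```python
import math


def score_documents(documents, terms):
    """Rank documents for an OR query, giving rarer terms more weight."""
    total = len(documents)
    tokenised = {name: set(text.lower().split())
                 for name, text in documents.items()}

    # Weight each term by how few documents contain it (assumed scheme).
    weights = {}
    for term in terms:
        containing = sum(1 for words in tokenised.values() if term in words)
        weights[term] = math.log((total + 1) / (containing + 1)) + 1.0

    # A document's score is the sum of the weights of the terms it contains,
    # so a document matching both terms outranks one matching a single term.
    scores = {}
    for name, words in tokenised.items():
        score = sum(weights[term] for term in terms if term in words)
        if score > 0:
            scores[name] = score
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


documents = {
    "a": "soy milk and soy sauce",
    "b": "linseed oil",
    "c": "soy and linseed bread",
    "d": "soy beans",
}
ranking = score_documents(documents, ["soy", "linseed"])
print(ranking)
```

With this sample data the document containing both words comes first, followed by the one containing only the rarer word.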

The next aid to finding interesting results is metadata. Pre-existing metadata for a document is nice to have but is often incorrect or inaccurate, so judgements on how much weight to give pre-existing metadata must usually be made on a case by case basis (that is, after inspecting the data to be searched over).

Next we have created metadata. This data helps to define things about the document itself. For example, people or places can be extracted. This allows us to drill down on pre-existing values in searches. Other contextual information can be gathered from the text of the document, such as identifying the title of a document or a heading and making it more important.
What does a search for bush give us? A plant, a president, and a pro footballer. By recognising people we can limit those documents to a president and a pro footballer, by searching for bush inside the people metadata.
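The bush example can be sketched in a few lines of Python. The documents, the field names ("text" and "people") and the search function are all hypothetical; the point is only that searching the extracted people metadata drops the plant.

```python
# Made-up documents; "people" holds names extracted from the text.
documents = [
    {"id": "plant", "text": "A bush needs pruning in spring.", "people": []},
    {"id": "president", "text": "Bush signed the bill today.",
     "people": ["Bush (president)"]},
    {"id": "footballer", "text": "Bush scored in the final quarter.",
     "people": ["Bush (footballer)"]},
]


def search(documents, term, field="text"):
    """Return the ids of documents whose given field mentions the term."""
    term = term.lower()
    hits = []
    for doc in documents:
        value = doc[field]
        # Metadata fields are lists of values; plain text is a string.
        haystack = " ".join(value) if isinstance(value, list) else value
        if term in haystack.lower():
            hits.append(doc["id"])
    return hits


everything = search(documents, "bush")
people_only = search(documents, "bush", field="people")
print(everything, people_only)
```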

Attributes in .NET

Attributes were definitely a bit of a blank for me initially when it came to .NET. Why would you want to have metadata within your code? What would be the point of that? It turns out that it's not that silly, especially when we come to thinking about code in terms of reflection and runtime discovery.

Note: Java has the same abilities as .NET in this area, though they are not as well documented. http://www-128.ibm.com/developerworks/java/library/j-dyn0429/ explains how Java byte code has attributes, that each function is simply an attribute, and that custom attributes may exist. At this time, though, there doesn't appear to be any real way to utilise this information.

In .NET it's possible to use these attributes to find items in classes dynamically at runtime. This is of course especially useful if you'd like to discover and load classes at runtime. Think plugins...

First of all create a custom attribute. This is done simply by deriving a class from the System.Attribute class. You can use the AttributeUsage attribute to say that it's only valid for classes and interfaces.
Write a custom interface and apply your custom attribute to the interface declaration.
Implement your interface. It is now possible to find all the classes that implement that interface using reflection. Simply load the current assembly (Assembly.GetExecutingAssembly()) and get all the types. For each type attempt to get our custom attribute. Note that an attribute applied to an interface is not returned when you query the implementing class directly, so look at the interfaces the type implements (or apply the attribute to the classes as well). If the attribute is found then we know that the class implements our custom interface. (Also check that the type isn't abstract or an interface, so that you can actually create an instance of it.)
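The steps above are .NET specific, but the shape of the technique can be sketched in Python as a loose analogue, not as .NET code. Here a class decorator stands in for the custom attribute, an abstract base class stands in for the interface, and a dictionary of names stands in for the types of the assembly. Every name in the sketch (plugin, Exporter, CsvExporter, discover_plugins) is hypothetical.

```python
import abc
import inspect


def plugin(cls):
    """Stand-in for a custom attribute: tag the class so it can be found."""
    cls._is_plugin = True
    return cls


@plugin
class Exporter(abc.ABC):
    """Stand-in for the custom interface carrying the marker."""

    @abc.abstractmethod
    def export(self, data):
        """Return data rendered as a string."""


class CsvExporter(Exporter):
    # Inherits the marker from Exporter, so it is discoverable.
    def export(self, data):
        return ",".join(str(item) for item in data)


def discover_plugins(namespace):
    """Instantiate every tagged, concrete class found in a name mapping."""
    found = []
    for obj in namespace.values():
        if not inspect.isclass(obj):
            continue
        if not getattr(obj, "_is_plugin", False):
            continue
        if inspect.isabstract(obj):
            # Skip the "interface" itself: it cannot be instantiated.
            continue
        found.append(obj())
    return found


plugins = discover_plugins(
    {"Exporter": Exporter, "CsvExporter": CsvExporter, "helper": len}
)
print([type(p).__name__ for p in plugins])
```

For a real module the mapping could come from vars(some_module), which plays roughly the role that getting all the types of an assembly plays in .NET.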

And now you have seen how reflection can be used to find and dynamically instantiate classes. This method is used (though slightly differently) to find web service methods. You may also want to look at http://www.xml-rpc.net/, which uses the same sort of reflection to know which methods to marshal on a website. So there you have it. Attributes and reflection.

Wednesday, November 09, 2005

XML Python and Characters...

A little while ago Python moved from having only 8-bit strings that were treated as byte arrays to Unicode-supported strings. Personally I think if you're going to move, you have to decide on a direction and move.

On a more interesting note, Python's xml.dom.minidom provides support for parsing XML. When parsing a UTF-8 encoded string it converts the Text and CDATA nodes into Python Unicode strings. Nice, if its routines can guess information about the source correctly.
Now imagine you have a CDATA node with the contents "\r\r\n" (using the C programming language representations of the line feed and carriage return characters). A person would usually expect the XML parser to give a Unicode string representation of "\r\r\n". This is not the case. In fact the character string becomes "\n\n". (The XML 1.0 specification, section 2.11, does ask parsers to normalise line endings in this way, so the behaviour is not unique to minidom.) So if you're thinking of extracting textual data from an XML document in Python with the original line endings intact, I can only recommend staying away from Python's minidom (Python 2.3.5 under win32).
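The behaviour is easy to reproduce. This is a minimal sketch using only the standard library; the element name root is made up, and the snippet is written for a current Python rather than the 2.3.5 mentioned above.

```python
from xml.dom import minidom

# A CDATA section holding carriage return, carriage return, line feed.
source = "<root><![CDATA[\r\r\n]]></root>"

document = minidom.parseString(source)
cdata = document.documentElement.firstChild

# The parser has already normalised the line endings by this point.
text = cdata.data
print(repr(text))
```

Printing with repr makes the surviving characters visible, which plain printing of whitespace would hide.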

Saturday, November 05, 2005

Extracting Raw XHTML from an XML document...

One of the cited reasons that a person might want to use XHTML, or XML-safe HTML, is the simple extraction of a document or text fragment from within another document, ready for display. While this is a good idea, it may fall down in a number of places.

Yesterday I was misassigned a bug. Not my code, and we haven't started non-ownership fixing yet. Anyway, the bug was a simple highlighting bug. The letters were all squished together, e.g. "<em>tag1</em><em>tag2</em>"

Now the way this text looked in the XML was "<containingtag>...<em>tag1</em> <em>tag2</em>..</containingtag>". So why has the extraction using an XML DOM parser failed?

System.Xml.XmlDocument (Microsoft .NET).
An XML document doesn't need to treat whitespace between tags as important and by default it shouldn't. Hence the implementation of XmlDocument will eat the whitespace gap between tags.
So if we wish to use .InnerXml to get our snippet in a preserved state we'll need to do the following.

Set the PreserveWhitespace property on the document to true before loading the XML. This will allow you to get the inner XML from tags and preserve any whitespace you've placed between your HTML tags.

Friday, November 04, 2005

Nostalgia


I was having a little bit of nostalgia about my old housemate ginnly (Virginia) and, more importantly, flight. I said that I'd go for my pilot's licence after I got my motorcycle licence.
Well, I managed to get my bike licence a little while back, and I'm still on my L's and have had two accidents so far. So this here is a little bit of a kick in the pants to get me up and going, so that I'll start getting lessons again. I can only assume that she's still getting lessons down in Melbourne.
So once I'm not so horrendously broke it should be flying lessons once again.

Creation

With the creation of anything, in this case a blog, I feel that there should be a purpose or a statement. While this is a blog that allows me to write about whatever I feel, I do intend to use it as a place to write technical information about programming and the techniques around it. I'd also like to include information about myself, but I will endeavour to keep these two areas quite separate.