Tuesday, November 3, 2009

Troubleshooting Matlab and MySQL

Matlab's got a half-decent database toolbox, but it's usually a huge pain to get working at first.

You can see some other blogs here or here for the code but one thing that hasn't been pointed out clearly enough is how bad the error reporting is.

Take this, for example:

driver = 'com.mysql.jdbc.Driver';
jarfile = which('mysql-connector-java-5.0.8-bin.jar'); %Locate the connector jar on the Matlab path
javaaddpath(jarfile); %Add it to the dynamic Java class path
host = 'loclhost';
port = 3306;
dbname = 'MyTestDb';
user = 'tester';
password = 'god'; %The Plague wouldn't like this...
url = ['jdbc:mysql://' host ':' num2str(port) '/' dbname];
db = database(dbname, user, password, driver, url)


Looks OK, but it returns the error "JDBC Driver Error: com.mysql.jdbc.Driver. Driver Not Found/Loaded." That would of course send any normal person off on a wild goose chase -- Is the jar file corrupted? Is there supposed to be a 64-bit version? Maybe I should use the fixed Java class path instead of the dynamic one?

But it's just a stupid typo: 'loclhost' instead of 'localhost'. Most stupid connection string mistakes you can make (user names, port numbers, etc.) will result in the same missing driver error.
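Since the driver error can't be trusted, one option is to sanity-check the connection parameters yourself before calling database. Here's a minimal sketch, not a definitive recipe: it assumes Matlab's bundled JVM is available, and the hostOk and portOk names are just illustrative. Some networks resolve unknown host names to a catch-all address, so a passing host check doesn't prove the name is right.

```matlab
% Pre-flight checks before calling database(). Variable names are illustrative.
host = 'loclhost';   % the typo from the example above
port = 3306;

% 1) Does the host name resolve? Uses the JVM that ships with Matlab.
try
    java.net.InetAddress.getByName(host);
    hostOk = true;
catch
    hostOk = false;
end

% 2) Is the port a plausible TCP port number?
portOk = isnumeric(port) && isscalar(port) && port == fix(port) ...
    && port >= 1 && port <= 65535;

if ~hostOk
    fprintf('Host "%s" does not resolve; check the connection string.\n', host);
end
if ~portOk
    fprintf('Port %g is not a valid TCP port.\n', port);
end
```

If both checks pass and you still get the missing driver error, then it's probably worth looking at the jar file and the Java class path.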

Wednesday, April 15, 2009

Unique handles

I was surprised today to find that there's no built-in UNIQUE function for arrays of handle derived objects.
>> X = [myClass(); myClass(); myClass()]; %Three unique instances of myClass
>> X = [X X]; %Double up -- two copies of each unique instance
>> unique(X)

??? Error using ==> double
Conversion to double from myClass is not possible.

Error in ==> unique at 92
a = double(a);

When you're working with arrays of handle graphics handles, handles are actually doubles, so any standard array-based function (e.g. permute, kron, ismember) will work on an array of handle graphics handles. This works because you're actually working with each graphics object's globally unique identifier (GUID).

The more recently released handle objects are represented by doubles in the same way; it's just that the underlying GUID is abstracted away so that its actual numeric value is never seen. This is a nice abstraction in some ways: myObj.Prop = val is much easier to read and write than set(myObj, 'Prop', val).

The fact that handle objects are still represented as GUIDs in the Matlab interpreter is evident when you check the built-in handle methods:
>> methods handle
Methods for class handle:

addlistener  findobj   gt       lt
delete       findprop  isvalid  ne
eq           ge        le       notify

The relational operators gt, lt, ne, eq, ge, and le operate by calling the appropriate comparators on the GUIDs themselves. For some reason there are more handle methods listed in the documentation that aren't returned by the methods function: transpose, permute, reshape, and sort. Again, since arrays of handle objects are essentially arrays of GUIDs, these functions work exactly how we'd expect.

Of course, once we've got reshaping and relational operators, we can build most of the other functions we'd need. Here's a quick and dirty unique function:

function B = uniquehandles (A)
%UNIQUEHANDLES Find unique handle object instances

A = A(:); %Column vector
A = sort(A);
i = [true; A(1:end-1)~=A(2:end)];
B = A(i);

The problem here is obvious if you check out the code for the built-in Matlab unique function: it's dominated by error checking and code for maintaining dimensional consistency, and I really don't want to duplicate it all.

The weird thing is that, despite the documentation, it seems like almost all array functions (repmat, sortrows, num2cell, mat2cell, and so on) work with arrays of handles. Even some of the set functions work: ismember and intersect do, but unique, setdiff, and setxor don't.

This happens because ismember and intersect are polymorphic. There isn't a separate handle.ismember function; it "just works" because the GUID basis allows arrays of handles to be treated like numeric arrays. The other set functions attempt to explicitly cast the input array to double, which results in the oh-so-aggravating "cannot convert to double" error.
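Since eq is one of the built-in handle methods, you can also sketch an order-preserving variant that uses nothing but pairwise == comparisons, with no sort and no cast to double. This is only a rough sketch: the name uniquehandles_stable is made up, it's quadratic in the number of elements, and it doesn't do any of the dimension handling that the built-in unique does.

```matlab
function B = uniquehandles_stable(A)
%UNIQUEHANDLES_STABLE Unique elements in order of first appearance
%   Relies only on ==, so it should work for any array type that
%   implements eq with scalar expansion (handle objects, numerics).

A = A(:);                      % Column vector
keep = true(size(A));
for k = 2:numel(A)
    % Drop A(k) if it equals any element that came before it
    keep(k) = ~any(A(k) == A(1:k-1));
end
B = A(keep);
end
```

For example, uniquehandles_stable([3 1 3 2 1]) keeps the first 3, the first 1, and the 2, in that order.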

Tuesday, March 17, 2009

Matlab OOP Oversight

I've been using Matlab full-time for about 3 years now, and using its new OOP features intensively for about 6 months. It's quite nice, and simplifies a lot of coding, but there's one big glaring oversight in the way Matlab treats arrays of objects. I sat down the other night to write a dorky little rant about it that I could post to the Matlab newsgroup, but it turned into a bit of an essay, complete with citations and sample code, so I made a dorky little blog out of the thing.

Let me first quote from the Matlab documentation:

Super and Subclass Behavior

Subclass objects behave like objects of the super class because they are specializations of the super class. This fact facilitates the development of related classes that behave similarly, but are implemented differently.

A Subclass Object Is A Superclass Object

You usually can describe the relationship between an object of a subclass and an object of its superclass with a statement like:

The subclass is a superclass. For example: An Engineer is an Employee.

This relationship implies that objects belonging to a subclass have the same properties, methods, and events of the superclass, as well as any new features defined by the subclass.

A Subclass Object Can be Treated Like a Superclass Object

You can pass a subclass object to a super class method, but you can access only those properties that are defined in the super class. This behavior enables you to modify the subclasses without affecting the super class.

Two points about super and subclass behavior to keep in mind are:

  • Methods defined in the super class can operate on objects belonging to the subclass.

  • Methods defined by the subclass cannot operate on objects belonging to the super class.

Therefore, you can treat an Engineer object like any other Employee object, but an Employee object cannot pass for an Engineer object.

This is NOT true in the case of arrays of objects. Consider the following pseudo code:

E1 = [Employee(), Employee()];

E1 = 1x2 Employee

E2 = [Employee(), Engineer()];

??? Error using ==> horzcat
The following error occurred converting from Engineer to Employee:
Error using ==> Employee
Too many input arguments.

E2 should be an array of Employee objects, but the attempted concatenation raises an error. Despite what the documentation says, the Engineer object CANNOT be treated like any other Employee object.

There's a Matlab technical solution that suggests two possible work arounds to this issue:

  1. Create a converter method that casts the sub-class instance into a superclass instance

  2. Use cell arrays instead of arrays.

Option 1 is simply unviable -- casting to a superclass object loses all of the data specific to the subclass, and defeats the entire point of subclassing. Option 2 is a bad idea for reasons I'll go into further on.

This issue came up a year ago or so in the newsgroup, where Steve Lord wrote, in response to somebody troubled by the same problem:

"Greg " wrote in message
news:g1sb1g$9tu$1@fred.mathworks.com...
> Well the work around is to use a cell array instead.

Yes.

> Although I feel that two classes sharing the same base class
> should be able to be put in a matrix...

Let's say that A is an object that is both an animal and a dog, and B is an
object that is both an animal and a cat. Both the dog class and the cat
class have additional properties and/or methods that aren't in the animal
class.

What class should C = [A; B] be?

It can't be a dog array, unless you can convert the cat animal B into a dog
(which usually doesn't work too well.)
It can't be a cat array, since you can't convert A into a cat.
It can't be an animal array either, since each of A and B have pieces that
aren't in the animal class that would have to be removed to fit them into an
animal array.

It doesn't make sense for C to be an animal array, a dog array, or a cat
array. Therefore, IMHO it doesn't really make sense to allow concatenating
A and B.

--
Steve Lord
slord@mathworks.com

IMHO, the only reasonable answer to Steve's question "What class should C = [A; B] be?" is an animal array, not just for consistency with the documentation as described above, but for some more extensive reasons I'll go into below.

Let me first discuss the issue Steve raised, that for C to be an animal array, the pieces that A and B have that aren't in the animal class would have to be removed. First of all, in some cases there might not be any extra bits at all. For example, the Matlab documentation suggests using inheritance as a method of class aliasing:

classdef newclassname < oldclassname
end

Even though there aren't any additional properties or methods, we still can't create a mixed array: [oldclassname() newclassname()] raises the same error we saw earlier.

Obviously, in most cases derived classes will have additional properties and methods. Fortunately, there seems to be a context in which these "extra bits" are handled intuitively, robustly, and quietly. Consider an array of dynamicprops-derived objects:

classdef MyObj < dynamicprops
    properties
        DefaultProp = 1;
    end
end

>> obj1 = MyObj();
>> obj2 = MyObj();
>> obj1.addprop('ExtraProp1'); obj1.ExtraProp1 = 100;
>> obj2.addprop('ExtraProp1'); obj2.ExtraProp1 = 200;
>> obj2.addprop('ExtraProp2'); obj2.ExtraProp2 = 300;
>> objs = [obj1 obj2];
>> [objs.DefaultProp]

ans = 1 1

>> [objs.ExtraProp1]

ans = 100 200

>> [objs.ExtraProp2]

??? No appropriate method, property, or field ExtraProp2 for class MyObj.

>> objs(2).ExtraProp2

ans = 300


This is exactly how arrays of mixed class/subclass objects should behave. Properties which exist for all objects in the array are accessible, and properties which exist for only some subset of the array are inaccessible, unless that subset is explicitly indexed.

Now that I've established that arrays could contain mixed base/subclasses, let me explain why they should, beyond simple consistency with the documentation.

1) Ease of use

Cell arrays of objects are much more awkward and less intuitive to work with than regular arrays. Consider the following examples:

%Desired behavior - easy to read and write
employees = [Engineer(), SalesPerson()];
salaries = [employees.Salary];

%Current usage requires cell arrays, which are harder to read and write with
employees = {Engineer(), SalesPerson()};
salaries = cellfun(@(e) (e.Salary), employees);

%Even uglier:
employees = {Engineer(), SalesPerson()};
nEmployees = length(employees);
salaries = zeros(1, nEmployees);
for i = 1:nEmployees
    salaries(i) = employees{i}.Salary;
end


2) The current interface discourages proper OOP principles

Imagine you have a large suite of functions for handling some custom class Engineer. If you don't sufficiently plan ahead, you might not consider the possibility of subclassing Engineer. Down the road, doing so would require the extensive and frustrating re-writing of every line of code that operated on arrays of Engineers. This could lead to an inertial response to proposed extensions and defeats the entire purpose of object-oriented programming principles.


3) The current interface encourages bad OOP principles

Let's say there's been a proposal to subclass Engineer to TestEngineer. One potential way to avoid extensive rewriting of code would be to add an ExtraStuff property to the base class, and create a new class called TestEngineerExtraStuff. An instance of this class could then be stored in the ExtraStuff field of every Engineer instance that we would have otherwise subclassed to TestEngineer. I hope I don't need to explain why this is a horrible design decision, and yet the current OOP interface in Matlab encourages this behavior.


4) The community wants it!

Searching the newsgroup for "object array subclass" returns multiple cases (here, here, and here), as well as the aforementioned technical solution, where Matlab users expect their code to behave as I am promoting, and are told to use cell arrays instead. I imagine there are even more cases where users found workarounds on their own without posting about it on the newsgroup.


The problem of method call dispatching

The biggest problem with my proposal relates to what should be done with method calls to mixed class/subclass arrays, like employees.giveRaise(10). This is a bit confusing due to the difference in how Matlab treats property assignment/referencing and method calls.

Property references on arrays of objects return a comma-separated list, that is: MyObjArray.MyProp is equivalent to MyObjArray(1).MyProp, MyObjArray(2).MyProp, MyObjArray(3).MyProp,... whereas method calls act on the entire array, so MyObjArray.MyMethod() is NOT equivalent to MyObjArray(1).MyMethod(), MyObjArray(2).MyMethod(), ...
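Struct arrays expand field references into comma-separated lists in the same way, so they make a handy stand-in for seeing the difference without writing a class. In this small sketch the struct s and its MyProp field are just illustrative:

```matlab
% Field references on a struct array expand element by element,
% just like property references on an object array.
s = struct('MyProp', {10, 20, 30});   % 1x3 struct array
vals = [s.MyProp];                    % same as [s(1).MyProp, s(2).MyProp, s(3).MyProp]

% A function call, by contrast, receives the whole array at once.
n = numel(s);
```

Here vals collects one value per element, while numel is called exactly once on the full array.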

If this were consistent, there wouldn't be any problem for method calls -- each instance in the array would call its own methods by standard method scoping rules. As is, there are multiple ways of handling method dispatching, each with its own advantages and disadvantages:

  1. Find the first common ancestor in which the method is implemented. Thus employees.giveRaise(10) would call Employee.giveRaise(employees, 10), regardless of whether or not SalesPerson implemented its own giveRaise method. An error is raised if no such method is found.

  2. Call each instance's method independently no matter what, as in obj(1).method(), obj(2).method()... This is probably the simplest approach, although it would be pretty frustrating for classes derived from numerical bases used for computation.

  3. Break up the array into classes by each unique implementation, and dispatch each sub-array. Thus given

    employees = [Employee, Engineer, Engineer, SalesPerson, Engineer];

    employees.giveRaise(10) would call, in order:

    Employee.giveRaise(employees(1), 10)
    Engineer.giveRaise(employees([2, 3, 5]), 10)
    SalesPerson.giveRaise(employees(4), 10)


Personally, I'd be fine with any of these options, although the third one is probably overly complicated and difficult to debug. The first implementation is probably the most intuitive and useful choice.
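To give a feel for the bookkeeping the third option involves, here's a rough sketch of the grouping step. It uses a cell array of built-in values as stand-ins for Employee, Engineer, and SalesPerson instances (those classes aren't defined here), and the variable names are made up. Note that unique returns the groups sorted by class name rather than by first appearance.

```matlab
% Group a mixed collection by class, then dispatch once per group.
% The values below are stand-ins for Employee/Engineer/SalesPerson instances.
items = {1, 'a', 2, int8(3), 'b'};
classNames = cellfun(@class, items, 'UniformOutput', false);
[groups, firstIdx, idx] = unique(classNames);   % groups come back sorted by name
members = cell(1, numel(groups));
for g = 1:numel(groups)
    members{g} = find(idx(:).' == g);           % positions of this class in items
    % A real dispatcher would call that class's own method on the
    % sub-array here; this sketch only reports the grouping.
    fprintf('%s handles items %s\n', groups{g}, mat2str(members{g}));
end
```

Even in this toy form you can see where the debugging pain would come from: the order in which the sub-arrays get dispatched depends on how the grouping is done, not on anything visible at the call site.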


Implementation

Probably the best way to implement what I'm pitching here would be for The MathWorks to provide some abstract class SubclassConcatable. Custom classes would subclass this, and then work as I've described. Custom classes that don't subclass SubclassConcatable would behave as is currently standard, which would maintain backwards compatibility.


What do you think?

I've done my best to make a pretty comprehensive case here, but I've probably missed something obvious. Let me know!