Tuesday, March 17, 2009

Matlab OOP Oversight

I've been using Matlab full-time for about 3 years now, and using its new OOP features intensively for about 6 months. It's quite nice, and simplifies a lot of coding, but there's one big glaring oversight in the way Matlab treats arrays of objects. I sat down the other night to write a dorky little rant about it that I could post to the Matlab newsgroup, but it turned into a bit of an essay, complete with citations and sample code, so I made a dorky little blog out of the thing.

Let me first quote from the Matlab documentation:

Super and Subclass Behavior

Subclass objects behave like objects of the super class because they are specializations of the super class. This fact facilitates the development of related classes that behave similarly, but are implemented differently.

A Subclass Object Is A Superclass Object

You usually can describe the relationship between an object of a subclass and an object of its superclass with a statement like:

The subclass is a superclass. For example: An Engineer is an Employee.

This relationship implies that objects belonging to a subclass have the same properties, methods, and events of the superclass, as well as any new features defined by the subclass.

A Subclass Object Can be Treated Like a Superclass Object

You can pass a subclass object to a super class method, but you can access only those properties that are defined in the super class. This behavior enables you to modify the subclasses without affecting the super class.

Two points about super and subclass behavior to keep in mind are:

  • Methods defined in the super class can operate on objects belonging to the subclass.

  • Methods defined by the subclass cannot operate on objects belonging to the super class.

Therefore, you can treat an Engineer object like any other Employee object, but an Employee object cannot pass for an Engineer object.

This is NOT true in the case of arrays of objects. Consider the following pseudo code:

E1 = [Employee(), Employee()];

E1 = 1x2 Employee

E2 = [Employee(), Engineer()];

??? Error using ==> horzcat
The following error occurred converting from Engineer to Employee:
Error using ==> Employee
Too many input arguments.

E2 should be an array of Employee objects, but the attempted concatenation raises an error. Despite what the documentation says, the Engineer object CANNOT be treated like any other Employee object.

There's a Matlab technical solution that suggests two possible workarounds to this issue:

  1. Create a converter method that casts the subclass instance into a superclass instance.

  2. Use cell arrays instead of arrays.

Option 1 is simply unviable -- casting to a superclass object loses all of the data specific to the subclass, and defeats the entire point of subclassing. Option 2 is a bad idea for reasons I'll go into further on.

This issue came up a year ago or so in the newsgroup, where Steve Lord wrote, in response to somebody troubled by the same problem:

"Greg " wrote in message
news:g1sb1g$9tu$1@fred.mathworks.com...
> Well the work around is to use a cell array instead.

Yes.

> Although I feel that two classes sharing the same base class
> should be able to be put in a matrix...

Let's say that A is an object that is both an animal and a dog, and B is an
object that is both an animal and a cat. Both the dog class and the cat
class have additional properties and/or methods that aren't in the animal
class.

What class should C = [A; B] be?

It can't be a dog array, unless you can convert the cat animal B into a dog
(which usually doesn't work too well.)
It can't be a cat array, since you can't convert A into a cat.
It can't be an animal array either, since each of A and B have pieces that
aren't in the animal class that would have to be removed to fit them into an
animal array.

It doesn't make sense for C to be an animal array, a dog array, or a cat
array. Therefore, IMHO it doesn't really make sense to allow concatenating
A and B.

--
Steve Lord
slord@mathworks.com

IMHO, the only reasonable answer to Steve's question "What class should C = [A; B] be?" is an animal array, not just for consistency with the documentation as described above, but for some more extensive reasons I'll go into below.

Let me first discuss the issue Steve raised, that for C to be an animal array, the pieces that A and B have that aren't in the animal class would have to be removed. First of all, in some cases there might not be any extra bits at all. For example, the Matlab documentation suggests using inheritance as a method of class aliasing:

classdef newclassname < oldclassname
end

Even though there aren't any additional properties or methods, we still can't create a mixed array: [oldclassname() newclassname()] raises the same error we saw earlier.

Obviously, in most cases derived classes will have additional properties and methods. Fortunately, there seems to be a context in which these "extra bits" are handled intuitively, robustly, and quietly. Consider an array of dynamicprops-derived objects:

classdef MyObj < dynamicprops
    properties
        DefaultProp = 1;
    end
end

>> obj1 = MyObj();
>> obj2 = MyObj();
>> obj1.addprop('ExtraProp1'); obj1.ExtraProp1 = 100;
>> obj2.addprop('ExtraProp1'); obj2.ExtraProp1 = 200;
>> obj2.addprop('ExtraProp2'); obj2.ExtraProp2 = 300;
>> objs = [obj1 obj2];
>> [objs.DefaultProp]

ans = 1 1

>> [objs.ExtraProp1]

ans = 100 200

>> [objs.ExtraProp2]

??? No appropriate method, property, or field ExtraProp2 for class MyObj.

>> objs(2).ExtraProp2

ans = 300


This is exactly how arrays of mixed class/subclass objects should behave. Properties which exist for all objects in the array are accessible, and properties which exist for only some subset of the array are inaccessible, unless that subset is explicitly indexed.

Now that I've established that arrays could contain mixed base/subclasses, let me explain why they should, beyond simple consistency with the documentation.

1) Ease of use

Cell arrays of objects are much more awkward and less intuitive to work with than regular arrays. Consider the following examples:

%Desired behavior - easy to read and write
employees = [Engineer(), SalesPerson()];
salaries = [employees.Salary];

%Current usage requires cell arrays, which are harder to read and write with
employees = {Engineer(), SalesPerson()};
salaries = cellfun(@(e) (e.Salary), employees);

%Even uglier:
employees = {Engineer(), SalesPerson()};
nEmployees = length(employees);
salaries = zeros(1, nEmployees);
for i = 1:nEmployees
    salaries(i) = employees{i}.Salary;
end
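
If you're stuck with the cell array workaround, one way to cut down on the noise is to hide the cellfun call behind a small helper. This is only a sketch: collectProp is a name I made up, not a Matlab function, and it assumes every element has the requested property and that the value is a numeric scalar. Structs stand in for Engineer() and SalesPerson() so the snippet runs without any class files; dynamic field access with o.(name) works the same way on object properties.

```matlab
% Hypothetical helper (not part of Matlab): gather one scalar property
% from every element of a cell array into a plain numeric row vector.
collectProp = @(objs, propName) cellfun(@(o) o.(propName), objs);

% Stand-ins for Engineer() and SalesPerson(): structs with a Salary field.
employees = {struct('Salary', 50000), struct('Salary', 42000)};

% One readable line per property, instead of a cellfun at every call site.
salaries = collectProp(employees, 'Salary');
disp(salaries)
```

It's still a workaround: every function that receives employees has to know it is a cell array rather than an object array.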


2) The current interface discourages proper OOP principles

Imagine you have a large suite of functions for handling some custom class Engineer. If you don't sufficiently plan ahead, you might not consider the possibility of subclassing Engineer. Down the road, doing so would require the extensive and frustrating re-writing of every line of code that operated on arrays of Engineers. This could lead to an inertial response to proposed extensions and defeats the entire purpose of object-oriented programming principles.


3) The current interface encourages bad OOP principles

Let's say there's been a proposal to subclass Engineer to TestEngineer. One potential way to avoid extensive rewriting of code would be to add an ExtraStuff property to the base class, and create a new class called TestEngineerExtraStuff. An instance of this class could then be stored in the ExtraStuff field of every Engineer instance that we would have otherwise subclassed to TestEngineer. I hope I don't need to explain why this is a horrible design decision, and yet the current OOP interface in Matlab encourages this behavior.


4) The community wants it!

Searching the newsgroup for "object array subclass" returns multiple cases (here, here, and here), as well as the aforementioned technical solution, where Matlab users expect their code to behave as I am promoting, and are told to use cell arrays instead. I imagine there are even more cases where users found workarounds on their own without posting about it on the newsgroup.


The problem of method call dispatching

The biggest problem with my proposal relates to what should be done with method calls to mixed class/subclass arrays, like employees.giveRaise(10). This is a bit confusing due to the difference in how Matlab treats property assignment/referencing and method calls.

Property references on arrays of objects return a comma-separated list, that is: MyObjArray.MyProp is equivalent to MyObjArray(1).MyProp, MyObjArray(2).MyProp, MyObjArray(3).MyProp,... whereas method calls act on the entire array, so MyObjArray.MyMethod() is NOT equivalent to MyObjArray(1).MyMethod(), MyObjArray(2).MyMethod(), ...
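
Here's a small illustration of that difference. I'm using a struct array in place of an object array (my assumption, so that the snippet runs without defining a class); field references on struct arrays expand into comma-separated lists the same way property references do.

```matlab
% Struct array standing in for an array of Employee objects.
emp = struct('Salary', {100, 200, 300});   % 1x3 struct array

% A field/property reference expands to a comma-separated list, so these
% two lines build the same vector.
a = [emp.Salary];
b = [emp(1).Salary, emp(2).Salary, emp(3).Salary];

% A function call, by contrast, receives the whole array in a single call.
n = numel(emp);   % one call with a 1x3 input, not three separate calls
```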

If this were consistent, there wouldn't be any problem for method calls -- each instance in the array would call its own methods by standard method scoping rules. As is, there are multiple ways of handling method dispatching, each with its own advantages and disadvantages:

  1. Find the first common ancestor in which the method is implemented. Thus employees.giveRaise(10) would call Employee.giveRaise(employees, 10), regardless of whether or not SalesPerson implemented its own giveRaise method. An error is raised if no such method is found.

  2. Call each instance's method independently no matter what, as in obj(1).method(), obj(2).method()... This is probably the simplest approach, although it would be pretty frustrating for classes derived from numerical bases used for computation.

  3. Break up the array into sub-arrays, one per unique implementation, and dispatch each sub-array separately. Thus given

    employees = [Employee, Engineer, Engineer, SalesPerson, Engineer];

    employees.giveRaise(10) would call, in order:

    Employee.giveRaise(employees(1), 10)
    Engineer.giveRaise(employees([2, 3, 5]), 10)
    SalesPerson.giveRaise(employees(4), 10)


Personally, I'd be fine with any of these options, although the third one is probably overly complicated and difficult to debug. The first implementation is probably the most intuitive and useful choice.
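
To give a feel for the third option, here's a rough sketch of how it could be emulated today on top of a cell array. Everything here is hypothetical: dispatchByClass is my own name, it groups by exact class rather than by unique implementation (a simplification), and it throws away return values, so it only makes sense for handle classes or methods called for their side effects. It also uses the ~ placeholder for an unused output, which needs a reasonably recent Matlab release.

```matlab
function dispatchByClass(objs, methodName, varargin)
% DISPATCHBYCLASS  Hypothetical helper sketching dispatch option 3.
%   objs       - cell array of objects (mixed classes allowed)
%   methodName - name of the method to call, e.g. 'giveRaise'
%   varargin   - extra arguments forwarded to every call

classNames = cellfun(@class, objs, 'UniformOutput', false);

% unique returns the class names in sorted order; groupIdx maps each
% element of objs to its entry in uniqueNames.
[uniqueNames, ~, groupIdx] = unique(classNames);

for k = 1:numel(uniqueNames)
    members = find(groupIdx == k);
    % Elements in one group share a class, so they concatenate normally.
    group = [objs{members}];
    % feval dispatches on the class of group, so each class's own
    % implementation of methodName sees only its own instances.
    feval(methodName, group, varargin{:});
end
end
```

With the classes from the example above, the call would look like dispatchByClass({Employee(), Engineer(), Engineer(), SalesPerson(), Engineer()}, 'giveRaise', 10). Note that the groups are visited in sorted class-name order, not in order of first appearance, which is one more thing that would make this option harder to debug.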


Implementation

Probably the best way to implement what I'm pitching here would be for The MathWorks to provide some abstract class SubclassConcatable. Custom classes would subclass this, and then work as I've described. Custom classes that don't subclass SubclassConcatable would behave as is currently standard, which would maintain backwards compatibility.


What do you think?

I've done my best to make a pretty comprehensive case here, but I've probably missed something obvious. Let me know!




3 comments:

  1. Hi Darik,

    You can use handle to concatenate any classes, regardless of whether or not they have a common ancestor:

    [handle(A), handle(B)]

    Yair Altman
    http://UndocumentedMatlab.com

  2. Yair,

    Not sure how that works. If I call handle(MyClass()) I get:

    ??? Error using ==> handle
    Cannot convert to handle.

    Even though MyClass is derived from handle.

  3. This behaviour for MATLAB has been around a long time, long before the new object-oriented system came into being. Others have commented about MATLAB's arbitrary promotion and demotion of numeric data types when placing dissimilar data into arrays. You can try that out for yourself by creating some int8, single and double variables and then placing them into a vector in various orders. With the more recent versions of MATLAB you at least get a warning when you do this. In R13 they were simply cast with no notice. Interestingly, for a long time MATLAB has had a type of handle object which handles the situation that you describe in the way that you would expect. People with Simulink can test it easily. The Simulink.Parameter and Simulink.Signal classes are both subclasses of the Simulink.Data abstract class. If you create an instance of each and then place them in an array, the array will be of type Simulink.Data but the elements of the array will retain their original data type.

    >> a = Simulink.Parameter

    a =

    Simulink.Parameter (handle)
    Value: []
    RTWInfo: [1x1 Simulink.ParamRTWInfo]
    Description: ''
    DataType: 'auto'
    Min: -Inf
    Max: Inf
    DocUnits: ''
    Complexity: 'real'
    Dimensions: [0 0]

    >> b = Simulink.Signal

    b =

    Simulink.Signal (handle)
    RTWInfo: [1x1 Simulink.SignalRTWInfo]
    Description: ''
    DataType: 'auto'
    Min: -Inf
    Max: Inf
    DocUnits: ''
    Dimensions: -1
    Complexity: 'auto'
    SampleTime: -1
    SamplingMode: 'auto'
    InitialValue: ''

    >> c = [a,b]

    c =

    Simulink.Data: 1-by-2

    >> c(1)

    ans =

    Simulink.Parameter (handle)
    Value: []
    RTWInfo: [1x1 Simulink.ParamRTWInfo]
    Description: ''
    DataType: 'auto'
    Min: -Inf
    Max: Inf
    DocUnits: ''
    Complexity: 'real'
    Dimensions: [0 0]

    >> c(2)

    ans =

    Simulink.Signal (handle)
    RTWInfo: [1x1 Simulink.SignalRTWInfo]
    Description: ''
    DataType: 'auto'
    Min: -Inf
    Max: Inf
    DocUnits: ''
    Dimensions: -1
    Complexity: 'auto'
    SampleTime: -1
    SamplingMode: 'auto'
    InitialValue: ''

    >>

    I believe that Yair plans to cover these types of classes in an upcoming segment on undocumented-matlab.

    Donn
