Dictionaries v. Objects#
a story of shared abstractions
Shannon Behrens recently published a brief post on the use of dictionaries and objects to store named data. He raised the following question:
At what point do we transition from core data representations like dicts
to custom data representations with custom classes?
Choices in data representation interest me quite a bit. I think that the mechanisms we choose to store and share data strongly impact the longevity of the code that we write.
Let’s borrow Shannon’s book example to compare two common options of data representation
" Book object " " Plain Dictionary "
class Book(object):
def __init__(self, authors, title, isbn):
self.authors = authors
self.title = title
self.isbn = isbn
book = {
book = Book(["Neil Gaiman"], "authors": ["Neil Gaiman"],
"American Gods", "title": "American Gods",
"9780062113450") "isbn": "9780062113450"
}
}
Shannon advocates that you should “Keep it Simple” (right) for simple data structures but that, once things become more complex, you should transition to an object (left).
I think that most Python programmers would agree with this sentiment. It’s unclear (and thus interesting) exactly when this transition occurs though. Personally I use core data structures (like dict
) to represent almost all of my data. I create new classes very rarely. I follow this practice because I believe in the principle of shared abstractions. I’ll explain what I mean in a moment.
Advantages of Objects#
To start, lets state some of the virtues of objects (this is a very incomplete list.)
Historically they’re just more commonly used for complex data. It’s important to maintain coding standards that existing programmers find comfortable.
The attribute syntax is a lot nicer. Compare the following syntax to access an attribute:
book["authors"] # 11 characters to get attribute, 4 of them awkward to type book.authors # 8 characters to get attribute, all of them common
We can add functionality/methods to the objects in a controlled namespace. For example
book.open() .open clearly belongs to book open(book) open conflicts with standard Python function to open a file
The attribute syntax note is convenient but not a game-changer. The ability to associate a set of functions with the object (and reuse these functions on inherited classes) is really where objects shine. To emulate this behavior with core data structures you need to respect namespaces
import bookpy
bookpy.open(book)
And culturally many of us (myself included) are pretty bad at this.
Advantages of Dicts#
There are some small advantages to using dicts like the following
Literal syntax, we can jump right in and write down dicts without setting up a class first.
Efficient implementation (dicts are really fast)
But, like the object attribute syntax these are merely convenient.
The major reason to prefer core data structures is that they are an abstraction shared among all Python programmers. As a result you can depend on standard code to just work on your data.
For example we can serialize our data into JSON if, for example, we wanted to serialize it and send it over the internet.
>>> import json
>>> book = {
... "authors": ["Neil Gaiman"],
... "title": "American Gods",
... "isbn": "9780062113450"
... }
>>> json.dumps(book)
'{"title": "American Gods", "isbn": "9780062113450", "authors": ["Neil Gaiman"]}'
The json
library fails to transform our book
object however.
>>> import json
>>> book = Book(["Neil Gaiman"], "American Gods", "9780062113450")
>>> json.dumps(book)
TypeError: <__main__.Book object at 0x10e4410> is not JSON serializable
This is to be expected. The json
library developers have not seen our Book class. No one has but us.
While we may not think we need json
encoding it’s likely that if our project succeeds then someone will want to send our objects over the internet. The decision to make a custom class shuts our project off from the rest of the development ecosystem.
Custom classes inhibit interoperation with other libraries. This limits growth out to unanticipated applications. This is particularly tragic because our field thrives on connecting isolated components. Shared abstractions serve as the connection links, like the regular bumps and divots in LEGO pieces. Shared abstractions allow us to connect cool modules to create novel cool projects with ease. Otherwise we’re stuck hacking together missing technology (like a Book.write_json
method) when perfectly adequate solutions (like json.dumps
) already exist.
This example isn’t restricted to JSON encoding, in general we’re going to have to hack something together every time we want the Book objects to interact with foreign code. This problem compounds as we begin to rely on more and more external projects.
Best of Both Worlds (or, Fancy Solutions)#
You can implement shared abstractions on top of custom classes. For example you can implement the iterable
abstraction by implementing an __iter__
method on your class, allowing other foreign modules to leverage your custom data type without any trouble. The namedtuple
function in collections
might be a good fit for the book
application above.
Also, it’s worth noting that JavaScript handles this problem in a curious way. Objects and dictionaries are equivalent. It’s an interesting direction and something I suspect we could hack together in Python as well.
Finally, most objects in Python actually are just dictionaries. You can always operate on the object.__dict__
attribute.