PascalCase, camelCase and Underscores

Every programmer, or really anybody who writes code, has probably come across a myriad of coding styles, some pleasant, some not so pleasant. Within coding style as a whole sits a small but very significant segment: naming conventions. Coding style certainly covers a lot more than how you name your variables, functions, methods and classes, but it is easy to argue that naming convention is one of the biggest influences, if not the biggest, on how a piece of code looks. After all, it’s the first thing your eyes will notice: it is the very look of the code.

Over time, two naming conventions have become dominant: camelCase (and its cousin PascalCase), and under_scores. There is an interesting article here, titled “CamelCase vs underscores: Scientific showdown”, that does a sort of informal, semi-scientific study on which naming convention is superior.

I’m going to severely abuse Python here, to illustrate to an extreme extent, the two conventions:

# Names such as smtp_lib and send_mail are restyled on purpose to show each
# convention; the real standard-library module is smtplib.
class handle_email(object):
  def send_email(self, message):
    try:
      smtp_obj = smtp_lib.SMTP('smtp.server.org', 25)
      smtp_obj.send_mail('from@server.org', 'to@server.org', message)
      smtp_obj.quit()

    except smtp_exception:
      print("Error: unable to send email")

class HandleEmail(object):
  def sendEmail(self, message):
    try:
      smtpObj = smtpLib.SMTP('smtp.server.org', 25)
      smtpObj.sendMail('from@server.org', 'to@server.org', message)
      smtpObj.quit()

    except SMTPException:
      print("Error: unable to send email")

In case you didn’t visit the Whathecode article linked above, I’m going to take a leaf from the author and ask you to choose, instinctively and before reading on: which do you like? It’d be quite unlikely that both look equally good or equally bad to you, since everybody feels differently. If you voted for the first, then you’re clearly an under_score kind of person. If not, you’re a camelCase person.

I boil down my choice of which is better to one aesthetic factor and three key factors. This is my personal opinion, formed and thought through over the years, along with my justification for it. I am not touting it as fact. That said, I’d like to take a shot at:

  1. Which simply looks better?
  2. Which gives you the most information?
  3. Which is easier to code in?
  4. Which is better for comprehending code?

 

1. Which simply looks better?

This is perhaps the question least related to coding, as it deals with nothing more than aesthetics. It has more to do with what the eye (and brain) perceives as beauty than with actual issues such as comprehension and conciseness, which will be dealt with later. As such, the answer to this question is really just a matter of personal taste.

I personally will vote for the under_score convention here, as it sits well with my notion of what beautiful prose should look like — well-spaced, easy to pick apart key words, more spread out lines, and so on.

My opinion: under_score.

 

2. Which gives you the most information?

I’d argue that camelCase, together with PascalCase, has the highest fidelity of information. There are two reasons for that. Firstly, camelCase and PascalCase are distinctive, yet belong to the same naming convention. Hence, using camelCase to represent one set of things (say, function names, variable names), and PascalCase to represent another (say, class names, module names), gives immediate clues into the origin and purpose of a given “thing”. Once you’ve assimilated this and it has become second nature, you’ll be reading and comprehending code much faster, perhaps without even realizing it.

Secondly, the camelCase/PascalCase convention is the more condensed of the two conventions. numItems and MilitaryBoat are shorter than num_items and military_boat. On a single identifier, the length may not make much of a difference. However, a line of code rarely contains just one or two identifiers (with a few exceptions); it contains many. I believe that the amount of *relevant* information in a given line translates directly into how easy it is to comprehend the code. We, as humans, don’t have infinite abilities to keep everything in our heads, and a visual reference is very important in aiding understanding when we cannot memorize and connect everything mentally.

Hence, the conciseness of representation has a huge bearing on me. As a small point, many underscores on a line are also... ugly. Again, that’s personal ;)
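To put a rough number on the conciseness point, here is a minimal Python sketch that converts identifiers between the two conventions and compares line lengths. The helper names snake_to_camel and camel_to_snake are made up for this illustration, and the regular expressions only handle simple all-lowercase or simple camelCase names, not acronyms or digits.

```python
import re


def snake_to_camel(name):
    """Convert an under_score identifier to camelCase."""
    head, *tail = name.split("_")
    return head + "".join(word.capitalize() for word in tail)


def camel_to_snake(name):
    """Convert a camelCase or PascalCase identifier to under_score form."""
    # Insert an underscore before every capital that is not the first character.
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()


line = "remaining_chars = end_of_file_index - current_file_index"

# Rewrite every identifier that contains at least one underscore.
camel_line = re.sub(
    r"[a-z]+(?:_[a-z]+)+",
    lambda match: snake_to_camel(match.group()),
    line,
)

print(camel_line)  # remainingChars = endOfFileIndex - currentFileIndex
print(len(line) - len(camel_line), "characters saved on this line")
```

Each underscore removed saves exactly one character, so the saving grows with the number of multi-word identifiers on the line rather than with the line length itself.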

My opinion: camelCase/PascalCase.

 

3. Which is easier to code in?

Again, for this I’d go with camelCase/PascalCase. Hitting shift on a new word is just easier (at least on QWERTY keyboards) than hitting the darn underscore key. Even for decent touch-typists, which I consider myself to be, the error rate on the underscore key is far higher than on the shift key, because you can hardly miss the shift key.

Hence, camelCase/PascalCase is often much, much easier to type, fast.

My opinion: camelCase/PascalCase.

 

4. Which is better for comprehending code?

There’s no real difference between reading other people’s code and reading your own code after some period of time. Both rely on your ability to comprehend the code that you see. Taking comments out of the picture (they should not be there to replace or fix badly named identifiers anyway), I’d say that camelCase/PascalCase is, again, better. camelCase/PascalCase accentuates the interactions between identifiers, which is the (more) important issue, rather than figuring out the name of the identifier.

Let me make the counter-argument first: under_scores closely mimic the way words are crafted into sentences in English, and hence are certainly easier to read.

I don’t agree. Yes, it is easier to read, as words. It is not easier to comprehend, as code. Code is not about one or two lines, it’s about blocks. This is akin to paragraphs in a language, not sentences. A single line of code hardly has much meaning in the grand scheme of things. The act of introducing blanks (represented by underscores) splits words up so it’s easy to read them quickly, but you lose the sense that you are looking at a single identifier. Taken from a “sentence” point of view, you lose the ability to distinguish the interactions of identifiers with each other. Taken from a “paragraph” point of view, you have a bunch of harder “sentences” to read.

Consider this:

remaining_chars = end_of_file_index - current_file_index

and this:

remainingChars = endOfFileIndex - currentFileIndex

The interactions between the three identifiers (in this case, variables) are extremely clear in the second case, but less so in the first case. Throw in your curly braces, indentations, other forms of “whitespace”, and things become less and less clear. The obvious counter-argument is that IDEs will beautifully color identifiers, making my point moot. I disagree that it’s moot though. Yes, syntax highlighting does resolve the issue to a large extent. However, my gripe is that firstly, not all syntax highlighting schemes are appropriately designed. Secondly, there are often times when you browse through code that is not syntax highlighted. Thirdly, I prefer having two forms of pattern matching for my brain to hook on (shape and color), rather than one (just color).

Let’s try this again, with colors this time:

remaining_chars = end_of_file_index - current_file_index

and this:

remainingChars = endOfFileIndex - currentFileIndex

Different, yes. Sufficient, perhaps. But two levels of distinction are still better.

My opinion: camelCase/PascalCase.

 

My Personal Feel

My conclusion is this: camelCase/PascalCase has a good number of things going for it, sufficient to overcome the few issues that it poses.

I am not oblivious to some of the issues with this naming convention. I know that it sometimes doesn’t deal well with special words (e.g. URLCharacters), single letters (e.g. MyIPhone), and so on. However, I do feel that these can be worked around by choosing better names.
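The two awkward cases above can be seen by splitting an identifier back into words at its capitals. This is only a rough Python sketch: split_words and its regular expression are my own illustrative choices, and UrlCharacters is just one hypothetical example of a “better name”.

```python
import re

# One alternative per kind of "word":
#   [A-Z]+(?![a-z])  a run of capitals not followed by a lowercase letter (acronym)
#   [A-Z]?[a-z]+     an optional capital followed by lowercase letters
#   [0-9]+           a run of digits
WORD = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|[0-9]+")


def split_words(identifier):
    """Split a camelCase or PascalCase identifier into its words."""
    return WORD.findall(identifier)


print(split_words("endOfFileIndex"))  # ['end', 'Of', 'File', 'Index']
print(split_words("URLCharacters"))   # ['URL', 'Characters']
print(split_words("MyIPhone"))        # ['My', 'I', 'Phone']
print(split_words("UrlCharacters"))   # ['Url', 'Characters']
```

The acronym only splits cleanly because the pattern has a special case for runs of capitals, and the single letter in MyIPhone still comes out as a word of its own. Renaming sidesteps both problems without any special handling.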

I also realize that camelCase/PascalCase takes some time to get your brain around identifying capitals as word boundaries. However, once you’ve gotten that part down (which doesn’t take long, really), the benefits are apparent.

The human brain is an intricate and extremely powerful pattern matching machine. The introduction of a character that not just represents, but looks like, the blank space character impedes a critical function of our pattern matching abilities: the ability to know when we are looking at a single “thing”, an identifier, and when we are not (i.e. we are looking at interactions between identifiers).

In addition, camelCase/PascalCase naturally introduces greater variation in the way things are named, allowing for conscious or even subconscious deduction of the meaning of an identifier. I’m talking about the fact that when I see PascalCase, I immediately think class, and when I see camelCase, I immediately think function or variable (assuming that’s the convention of whatever you’re reading). Hence, when I read

Vehicle.getType()

I know what Vehicle and getType are. Immediately. No mousing over to check what the IDE tells me, no attempting to jump to function definitions, etc.

vehicle.get_type()

just doesn’t convey the same fidelity of information. IDE or not.
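As a rough illustration of that deduction, the Python sketch below guesses what an identifier is from nothing but its casing. It assumes the convention described above (PascalCase for classes, camelCase for functions and variables) and treats an all-lowercase name as carrying no casing clue. The function guess_kind and its patterns are hypothetical, not part of any library.

```python
import re

PASCAL = re.compile(r"[A-Z][A-Za-z0-9]*")                 # starts with a capital
CAMEL = re.compile(r"[a-z][a-z0-9]*[A-Z][A-Za-z0-9]*")    # lowercase start, at least one capital
LOWER = re.compile(r"[a-z][a-z0-9_]*")                    # all lowercase, underscores allowed


def guess_kind(identifier):
    """Guess what an identifier names, using only its casing."""
    if PASCAL.fullmatch(identifier):
        return "class"
    if CAMEL.fullmatch(identifier):
        return "function or variable"
    if LOWER.fullmatch(identifier):
        # A one-word lowercase name looks identical in both conventions,
        # so it lands here as well.
        return "no casing clue"
    return "unknown"


for name in ["Vehicle", "getType", "vehicle", "get_type"]:
    print(name, "->", guess_kind(name))
```

Under these assumptions, Vehicle and getType each get a distinct answer, while vehicle and get_type fall into the same bucket. A codebase that combines underscores with capitalized class names would of course recover the class clue.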

Comments
2 Responses to “PascalCase, camelCase and Underscores”
  1. You make some interesting points when discussing the readability from a “paragraph” point of view. Your argument of preferring two forms of pattern matching relates to the ‘redundancy gain’ principle in Human-Computer Interaction.

    Redundancy gain. If a signal is presented more than once, it is more likely that it will be understood correctly. This can be done by presenting the signal in alternative physical forms (e.g. color and shape, voice and print, etc.), as redundancy does not imply repetition. A traffic light is a good example of redundancy, as color and position are redundant.
    http://en.wikipedia.org/wiki/Human%E2%80%93computer_interaction#Perceptual_principles

    The readability of ‘paragraphs’ of course is a major aspect which isn’t measured in the study I reference in my article. We shouldn’t draw final conclusions from out-of-context studies, so your points are very valid. It does seem that over the past two years interest in identifier styles within academic research has increased. The International Conference on Program Comprehension (http://www.program-comprehension.org/) has published several related articles in the past few years, and when I find the time I will review them and see whether any interesting new results have been published. If you are interested in reading up on this yourself, check this article, and its citing documents: http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=5521745&tag=1

    I do find your last point to be pretty much irrelevant. Using underscores doesn’t prevent you from using capitals altogether. In fact, as far as I know those people that do use underscores generally still use camelcasing for classes.

    On a more general note, you might be interested in reading my article on “The Code Formatting Fallacy” which pretty much argues code formatting discussions should be made obsolete: http://whathecode.wordpress.com/2011/11/13/the-code-formatting-fallacy/

    I’ve also been planning for a long time to write about my particular Visual Studio setup with the help of ReSharper, in which I add additional semantics to coloring which I perceive as greatly improving the comprehension of my code. Finally, a small but important sidenote: I use camelcasing myself in .NET since I find following the conventions to be more important than my preferences. :) Another thing I’ll write about in the near future is how you shouldn’t restrict yourself to coding style conventions. Conventions go much broader than that and can greatly improve code comprehension at a higher level as well.

    Thank you for your interesting contribution!

  2. codejury says:

    Hi Steven, thanks for your very good references and insightful comments. I must say that was a refreshing read, especially since I know very little about the theories and formal studies performed in this area, and everything I’m saying is purely intuitive and/or personally empirical (if one can accept the slight paradox in such a term) :)

    With regard to the comprehensibility of code when using underscores vs. camelCase/PascalCase, the point I was trying to bring across is the inherent word-level separation within a single identifier that makes it harder to read. I certainly agree that one can combine underscores with camelCase/PascalCase to good effect. However, the underscores themselves break up the visual “togetherness” of a single identifier, as in myVar vs. my_var vs. my_Var. Hence, when strung together in complex or semi-complex arrangements (ignoring other aspects of good code writing style), it is visually harder to distinguish a single identifier from multiple identifiers at a glance.

    Also, thanks for pointing me to your article. Indeed your point is valid, in that code formatting can (and ought to) be made obsolete. I guess it’s just an unfortunate fact that these things take time, and that progress is made but not at astounding rates, from what I see in the world of programming and code editing. Also, one way of looking at the problem is the separation of editing technology vs. code parsing technology.

    In essence, we already have both, excellent editors/IDEs and compilers/translators which correctly parse a program. But they’re not combined. While some effort is made in this area, as you mentioned, much is left to be said about the text-editing *quality* of these editors. It may be a long time coming before the very best-of-breed editors gain such capabilities. The Realaxy editor could be one exception. I personally have not used it before but I take your word that it is certainly a capable one. However, that said, opinions differ so widely about what is a good editor that it’s probably not worth debating about that (yet) again. :P

    Finally, thank you also for your very interesting articles! :)
