Stop creating all those source code files!

By | 04/02/2015

Apologies, this is going to be a bit of a short rant, though it’s about something that’s been bothering me for quite a while now. We programmers tend to create too many files when we write source code! But why does this bother me?

Let’s start by looking at some facts. First of all, at execution time the machine does not care how many files the code was split into; it all lives in the same memory space. Secondly, all decent programming languages already offer ways of structuring code into logical units (functions, classes, namespaces, etc.) that can be validated automatically.

So why do we keep splitting code into different files, in different folders? In my opinion, there are only two valid reasons for splitting code into separate files: size limitations imposed by the compiler, interpreter or editor, and moving common code into a place where it can be shared so that it does not have to be copied and pasted. There is a third reason, however, which I think is the main reason we tend to split code into files: many of our tools used to deal poorly with code structure. But that is a thing of the past. Nowadays editors have features like code folding, Code Bubbles, EditorConfig, et cetera, so this cannot really be a reason anymore.

“But wait, what is the downside of splitting code into files for reasons other than the first two?” I hear you ask. Here’s my answer: it creates an additional structural level that is ambiguous and cannot be validated automatically without writing special tools for it. Suddenly you have to deal with questions along the lines of “should this code be in this file or that file?”, “will this create circular imports?” or “where do I have to put configuration values again?”. How can this be an acceptable additional headache for anyone working on a major (read: not just a simple automation script) project? Every major project already has tons of other, in my opinion more important, aspects to consider, namely those directly required for fulfilling the project’s purpose. Splitting introduces even more drawbacks: for example, you need to deal with filesystem restrictions (granted, these are pretty relaxed nowadays), and coming up with meaningful names, already a hard problem when programming, gets extended to yet another layer (“defines.h” or “constants.json” are not helpful names). And don’t forget that you are forcing this extra burden on your co-workers as well.

All this takes energy away from actually solving the problems posed by your project, and thus increases turn-around times and time-to-market.

Hence: Stop creating all those source code files!

25 thoughts on “Stop creating all those source code files!”

  1. Daniel Robert Speed

    Splitting files still reduces the potential for merges with file-based source control, and makes it clearer when a merge is needed. There are other things that tend to happen at the file level in various languages too, like dependency / import management. Having worked with a few packages where functions ended up requiring something else just because it was used somewhere in the same file, I can say it can be a bit of a pain in the ass to deal with. I think you need to better establish your motivation not to split things up.

  2. Thomas Dähling

    Dan: That is true. But in the case of merging, that’s a matter of providing a useful tool. When it comes to dependency / import management, it is a question of “does the code I am writing actually have to be a separate module?” But then you are quickly on the slippery slope of providing versioning, and so on, and so forth. Some languages have an ecosystem with decent support for it, some do not. Most often you do not need a separate module, and when you realize that you do, it is not too difficult to refactor (well, ok, that depends on how many shortcuts were taken in the implementation, but then it’s likely not a good module anyway).

    That being said: yes, there are times when splitting things into files is required, and that is all fine and dandy. But a lot of the time code gets split into files for some perceived reason of “organization” (think along the lines of classA.h/classA.cpp, etc.), which is simply not useful unless such organization is automatically enforced.

  3. Snorri Sturluson

    I wholeheartedly disagree – source files should be kept small, usually only one class per file. So much easier to deal with when working with others, using a source control system. Your IDE keeps track of where things are and what needs importing.

  4. Robert Babiak

    I agree with Dan. A well-organized file structure helps you categorize your code by the system it is part of. Having one file per XXX in system YYY is useful for organizational reasons. It also provides a table of contents for your code, so if you want to find XXX you have a good idea where to start looking.

  5. Thomas Dähling

    Snorri: the collaboration part is where I see current merge tools lacking, and splitting things across files is a workaround for the lack of better tools.

    One class per file: great. And one folder per namespace? That would probably be a good organization. But suddenly I have to rename things in multiple places (at the code level and at the filesystem level). And coming back to merges, we all know how well some version control systems handle that kind of operation…

    About the IDE: it does not tell me what code is supposed to go where.

  6. Vilhelm Sævarsson

    A modern IDE could put everything into some custom format, give you the impression that you are working in small logical units such as files, and still give you the benefits of having everything in the same place. There are strides towards this within the developer community, where the notion of a file is hidden and you work with some sort of custom logical units other than files and folders. Right now we have a lot of tools that rely on traditional setups, but I would assume that future tools will cater more towards getting rid of any separate hierarchical structures and rely on whatever jibes best with the IDE and the language.

  7. Thomas Dähling

    Robert: If I want to find XXX in my code I simply search the codebase for it, not the filesystem.

  8. Charles Palmer

    I agree with this disagreement for the reasons above but also because I find giant files a real burden to move around in. With lots of code folding it becomes a real visual mess and over time all sorts of changes happen forcing stuff that should be closely related to be miles apart. That is at least harder to achieve in a system where each file has a specific purpose.

  9. Thomas Dähling

    Charles: Code bubbles! (-; The problem is that there are too many tools that treat files as the organizational level that should be used, even though it is redundant. And as it tends to go with redundancy – sometimes it is useful, sometimes it is an unnecessary extra burden.

  10. Charles Palmer

    Code Bubbles work for a limited set of circumstances and aren’t available in every, or even many, IDEs. Aside from which, files and folders often are the units of organisation in many languages, and not in a redundant manner. I’m not sure if you’re arguing for a change in working practices or in the way languages organise things.

  11. Snorri Sturluson

    In an ideal world we wouldn’t have to care about what lives in what file and could focus on the constructs themselves, classes and methods and whatnot. In the real world, I want to keep files small and have clear guidelines on what goes where (generally one class per file, namespaces in folders).

  12. Thomas Dähling

    Charles: As stated in the article, I’m mostly just ranting about the extra overhead imposed on me when organizing stuff in files. There certainly are differences between individuals on how we work with code. For instance, personally I nowadays mostly navigate by function/variable/class/etc name around the code bases (jump to definition, etc).

    Also I really enjoy reading the disagreements. They are all valid points.

    What caused me to write this rant was that I ran across some code bases where, for no obvious reason, things were split between different files, each file less than 50-100 LOC.

  13. Robert Babiak

    Yes, that works well for someone who is familiar with the code base and already knows the general structure of the code. If you are new to the project, you have no idea where to start looking for something. If you know it is part of system YYY, then browsing the file names under that folder gives you a starting point and eliminates 95% of the possible places in the code base where it could be. Think of it from the point of view of a new programmer just out of university and boot camp. They have no understanding of the larger code structure, and are probably still trying to come to terms with the project vernacular. For instance, if I told you to fix a bug with the anchoring of a starbase, where would you start looking? You need a certain level of knowledge to be able to search effectively, whereas a directory structure with a folder named “space objects” will lead you to a subset of the code to start exploring.

  14. Vilhelm Sævarsson

    Most of the utilities that are used for building languages are command line tools that are split up between multiple executables, custom or otherwise. Since the file system is the most basic and common structure we have, it tends to leave a heavy mark on how the data is handled and organized. Splitting work into different processes is a large contributor to why Unix-based systems are viewed as more stable than Windows systems. Applications are not written as a monolithic mess, and application writers actually utilize the memory manager of the operating system and decouple execution across process boundaries. There are a lot of cool and nice ideas for IDEs that could lift all our worries and make programming more attractive than sex. If some programming back end like LLVM actually required you to use pipes to feed it data instead of paths on a file system, you would have a better time decoupling that idea from the rest of the programming paradigm. Until that happens, I foresee us having to use filesystems for quite some time.

  15. Thomas Dähling

    Robert: If I were a new programmer and someone told me that there is an issue with anchoring a starbase, I’d ask a fellow coworker for where to start looking and/or consult the API documentation (which is hopefully autogenerated from the source code comments).

  16. Robert Babiak

    I had kicked around the idea at one point of moving all the code into a database, where every class and function is a row in a table, and then building reference links between the pieces of code. This would allow you to quickly find what code uses a given function, and to automate some things like changing parameters. The IDE would know where this code was called from. You could also start tagging functions, so that all the functions that are part of a given interface are tagged as such. A tag search would then instantly bring up all the instances of that interface function, and it would also eliminate identically named (but unrelated) interface functions from your search. It would also allow for intrinsic collaboration and source control, along with an automated build system triggered by the commit of a code section, especially if compilation covered just the changed function and the database held the object code for every class. I had been considering it from the point of view of speeding up compile times for large C++ projects. I don’t think it would be feasible, or of much benefit for compile times.

  17. Robert Babiak

    In a perfect world that would be nice, but on some projects you don’t have co-workers, or asking a coworker means a forum post and a response from a different timezone. On many open source projects the only communication comes from forum posts or a wiki. You are left on your own to grok the code structure. The file structure helps you understand how the code is structured. A search doesn’t give you any relationships, just a list of keywords.

  18. Ævar Örn Kvaran

    I think your take on the subject is very strange, to say the least. Machines also do not care about classes, or barely even functions, but that’s not to say that they aren’t extremely helpful organizational units. I tend to like the rule of one class per file, keeping the responsibilities of each class limited. Even if you have all your code in one file and don’t have to answer the question “should this code be in this file or that file?”, you will still have to put it into some sort of organizational structure, so that problem remains unless you are advocating just sticking everything in the main function. Whether the units that are usually files should correspond to actual file system files is interesting to think about, but having them do so definitely helps reduce editor/IDE and source control dependencies. Folders corresponding to namespaces is not necessarily great either. Xcode, for example, has/had two options for organizing projects: all files in one folder with a virtual folder structure as part of the project file, or organizing into folders and thereby making namespaces (since Objective-C did not support them, folders were a good alternative when you needed them). However, merging that goddamned project file was a nightmare. Code Bubbles is a very interesting thing, but it’s so new (5 years) that very few (if any?) tools support it; I would have to try it out for myself to see how things fit in it.

  19. Thomas Dähling

    Robert: That is assuming that the file structure actually reflects the code structure, which is a dangerous assumption because it is not automatically enforced. Just like commented-out code never gets deleted, some files just get excluded from project files instead of being removed. So files alone do not help in discovering how a project works.

  20. Kristján Valur Jónsson

    Software does not fit into the strict hierarchical model of a filesystem. Do not try to create a mapping between the product and the file system, because that mapping always breaks and will just hinder you in the end, like trying to choose a suit to wear for life when you are ten. Use lots of files. Put stuff in them. Try to put them in generally sensible places using whatever “system” you come up with, such as organizing by subsystem, colour or time of day. Then use your IDE or editor to find the code when you need it.

  21. Thomas Dähling

    Ævar: Functions and classes have specific purposes directly related to solving the problem; functions reduce code duplication, for example. Files can serve the same purpose if I need to share a function between different projects.

  22. Ævar Örn Kvaran

    It is interesting to think about the limitations of file systems as organizational structures. I remember the problems I ran into trying to sort my music library into folders based on genre. It almost drove me insane, but tagging systems work much better for that purpose. However, until we have realistic alternatives, splitting stuff into many files is good IMO. Code folding is not that alternative, but bubbles might be a step in the right direction.

  23. Daniel Robert Speed

    I have to admit, I like a flat, library based code structure with low coupling, but on a codebase like Eve’s, it becomes a forest. I actually don’t think that’s a problem in itself, as long as you have ways of discovering what you’re looking for (search, tags, up-to-date code docs). After all, github is pretty much flat.

  24. Ævar Örn Kvaran

    Regarding files, I think the thing I hate the most is the semi case-insensitivity of Windows (and partially OS X) versus the often complete case sensitivity of programming languages/environments. I tend to be careful about it now, but ooh, has that burned me over the years.
