Git in Appsmith: The Details behind Our Implementation
Take a deep dive into how Git functions in Appsmith, including the problems we ran into and the solutions we found while implementing this functionality.
As we discussed in the previous article in this series, we’ve spent a lot of effort over the last year polishing Git in Appsmith so that it’s ready for you to use, just the way you’re used to using Git. Since Appsmith is open-source, we’re happy to share the ins and outs of this process and what we’ve learned while developing this feature.
The biggest problem we had to solve was moving from the standard [local repository]—[remote repository] model in Git to our [web client]—[Appsmith server]—[remote repository] model. Another major difficulty was reliably (and efficiently!) converting Appsmith applications that are serialized to a database into the source files that Git expects to work with. In this article, we will examine our Git implementation and the solutions to these problems.
The Git implementation is more complex in Appsmith because of the interaction between Appsmith servers and web clients and the conversion from database entities to source files. This extra step is hidden from the user.
Why Git instead of another VCS (or something home-grown)?
We chose Git because it is almost synonymous with “version control” in modern software engineering, and for good reason. Created in 2005 by Linus Torvalds, the creator of the Linux kernel, Git was the first successful open-source, decentralized source control solution. He gave a great talk about it in 2007, but we will highlight just a few differences between Git and its predecessors.
Git treats each commit as a snapshot of every tracked source file, identified by a SHA-1 hash, rather than giving each file its own version as most previous version control systems (VCSs) do. This makes a lot more sense, because developers don’t really care about individual files themselves, but rather the functionality of the entire application.
Another major improvement introduced by Git was cheap branching. In previous VCSs, branching was a huge hassle that often took hours or days of planning. In Git, a branch is just a pointer to a commit, which makes it extremely cheap to create. All Git has to do on the back end is store the SHA-1 value of the commit that the branch points to in a file: 40 hexadecimal characters, or around 40 bytes (not kilobytes, just bytes). This branching model, combined with the graph of commits that Git maintains on the back end, made parallel development the default.
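To make the “a branch is just a pointer” idea concrete, here is a minimal Python sketch that creates a branch the way a loose ref works: by writing a commit’s SHA-1 into a file under refs/heads. The function name create_branch and the stand-in commit hash are assumptions made for this illustration. Real Git can also keep refs in a packed-refs file and validates far more than this sketch does.

```python
import hashlib
import tempfile
from pathlib import Path

HEX_DIGITS = set("0123456789abcdef")


def create_branch(git_dir: Path, name: str, commit_sha: str) -> Path:
    """Create a branch by writing the commit's SHA-1 into a loose ref file.

    Hypothetical helper for illustration; it does not check that the
    commit object actually exists in the repository.
    """
    if len(commit_sha) != 40 or not set(commit_sha) <= HEX_DIGITS:
        raise ValueError("expected a 40-character lowercase hex SHA-1")
    ref_path = git_dir / "refs" / "heads" / name
    ref_path.parent.mkdir(parents=True, exist_ok=True)
    # The whole branch is this one line: 40 hex characters plus a newline.
    ref_path.write_bytes((commit_sha + "\n").encode("ascii"))
    return ref_path


if __name__ == "__main__":
    # A stand-in hash; a real one would identify an existing commit.
    stand_in_commit = hashlib.sha1(b"example commit").hexdigest()
    with tempfile.TemporaryDirectory() as tmp:
        ref = create_branch(Path(tmp) / ".git", "feature-x", stand_in_commit)
        print(ref.name, ref.stat().st_size, "bytes")
```

Creating a branch never copies any source files, which is why it costs so little compared with the older systems described above.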
The last major innovation from Git was the idea of decentralized development. Since Linus wanted an open-source solution to manage the Linux kernel, he shied away from previous solutions that relied on a centralized server. With previous VCSs, if you didn’t have an internet connection to the centralized server, you could not commit changes. But with Git, you could have an entire repository that you could commit to just on your local machine without any internet connection and then push your changes to a remote repository for redundancy and sharing when you wanted.
Git is backed by a strong open-source community, is widely supported by different major hosting providers like GitHub and Bitbucket, allows plugins to customize workflows, and has a nearly two-decade-long track record of doing the job of source control well. These are some of the main reasons why Git is so ubiquitous today.
How we implemented Git in Appsmith
This was a massive undertaking for the Appsmith team, but we need to acknowledge that we are really standing on the shoulders of giants.
Without Git itself, we would have to implement our own VCS paradigms (and probably relearn a lot of lessons that others have already learned). And without the JGit project, an implementation of Git in Java maintained by the Eclipse Foundation, we wouldn’t have a stable foundation for the required back-end operations in Git.
However, even though leaning on high-quality third-party libraries saved us a lot of development toil, we still had plenty of problems to solve to make Git and Appsmith compatible.
Aligning Appsmith and Git to work together effectively
Working with Git on an app platform like Appsmith needs to make sense to users who are comfortable working on the command line. The biggest challenges were moving from the [local repository]—[remote repository] model of Git to our [web client]—[Appsmith server]—[remote repository] model and transitioning from applications stored as documents in MongoDB to source files on a filesystem.
These difficulties led to the following design decisions to make sure that Git and Appsmith could function properly together:
The Appsmith server, not a user’s web client, stores the local repository.
This local repository still communicates normally with the remote repository (like GitHub) for push and pull operations.
Changes are transferred between the Appsmith server and the web client for users to see.
We maintain a filesystem on the Appsmith server that mirrors the entry in the application database.
Most of the issues we experienced were caused by the extra complexity of transferring data between the web client and the Appsmith server, a step that doesn’t exist in the traditional Git workflow.
Authentication issues
Early on, authentication issues were a major blocker, because we wanted to find authentication methods with the right balance of convenience and flexibility. We initially gravitated towards user-based authentication methods (such as OAuth), but later settled on key-based authentication so that we could support as many hosting providers as possible with a single solution.
Managing application source files on the Appsmith server
The major components of Appsmith applications are stored in Appsmith’s database. This includes everything that makes up an application: datasources, queries, JS Objects, widgets, and pages. This presents a few problems, because Git only works with files on a filesystem, so we need to take this data and write it to files reliably and safely.
Our first concern was the security of our users. One of the primary ways users accidentally leak credentials is by committing them to a VCS repository. For that reason, when storing datasources in source control, we store as much of the configuration as possible in a .json file that can be committed to Git, while excluding any secrets. We then store the encrypted secrets only in the Appsmith database where they are safe from accidental disclosure.
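The split between committable configuration and secrets could look roughly like the Python sketch below. This is not Appsmith’s server code: the field names in SECRET_FIELDS, the flat dictionary shape, and both function names are assumptions chosen for the example, since the real datasource schema isn’t described here.

```python
import json

# Hypothetical secret field names; the real datasource schema may differ.
SECRET_FIELDS = {"password", "apiKey", "privateKey"}


def split_datasource(config: dict) -> tuple[dict, dict]:
    """Return (committable, secrets) for a flat datasource config.

    Only the first dictionary is meant to be written to the repository;
    the second would stay (encrypted) in the database.
    """
    committable = {k: v for k, v in config.items() if k not in SECRET_FIELDS}
    secrets = {k: v for k, v in config.items() if k in SECRET_FIELDS}
    return committable, secrets


def to_committed_json(config: dict) -> str:
    """Serialize only the non-secret part, with stable key order."""
    committable, _ = split_datasource(config)
    return json.dumps(committable, indent=2, sort_keys=True)


if __name__ == "__main__":
    datasource = {
        "name": "orders-db",
        "host": "db.example.com",
        "port": 5432,
        "password": "s3cret",
    }
    print(to_committed_json(datasource))
```

Sorting the keys keeps the serialized file stable between writes, so a diff only shows values that actually changed. A nested configuration would need a recursive version of the same filter.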
We also serialize queries and JS Objects and write them to files. For a particular page in an application, each query has its own directory, where we store the query itself in a text file and any relevant metadata in a separate .json file. We write JS Objects to .js files containing the object code.
(a) The text file contains the actual query, while (b) the metadata.json file contains information about that query, including the datasource it is built on.
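As a sketch of this layout, the Python snippet below writes one query into its own directory under a page: a text file for the query body and a metadata.json file next to it. The queries directory name, the .txt extension, the metadata keys, and the write_query helper are all assumptions for illustration; only the pairing of a text file with metadata.json comes from the description above.

```python
import json
import tempfile
from pathlib import Path


def write_query(page_dir: Path, name: str, body: str, metadata: dict) -> Path:
    """Write one query as <page>/queries/<name>/{<name>.txt, metadata.json}.

    Hypothetical layout: directory and file names are illustrative only.
    """
    query_dir = page_dir / "queries" / name
    query_dir.mkdir(parents=True, exist_ok=True)
    # The query text lives on its own so diffs show the query directly.
    (query_dir / f"{name}.txt").write_text(body, encoding="utf-8")
    # Everything else about the query goes into a sibling JSON file.
    (query_dir / "metadata.json").write_text(
        json.dumps(metadata, indent=2, sort_keys=True) + "\n",
        encoding="utf-8",
    )
    return query_dir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        out = write_query(
            Path(tmp) / "pages" / "Home",
            "get_orders",
            "SELECT * FROM orders LIMIT 10;",
            {"datasource": "orders-db"},
        )
        print(sorted(p.name for p in out.iterdir()))
```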
Widgets are front-end React-based components that handle user interactions and trigger back-end queries and other actions. As of October 2023, Appsmith offers 40+ widgets out of the box. These widgets are stored in source control as a collection of .json files that describe the subcomponents of the overall widget.
This is a subset of the files stored in source control for one of the more complex widgets in Appsmith: the tabs widget.
All of these components (along with their associated metadata) are stored as text files, .js files, or .json files under a particular page directory in source control, corresponding to the different components of Appsmith applications. This filesystem is maintained by the Appsmith server, which transparently converts the contents of its database into source files for Git to use whenever changes are made.
Getting to the point where we could convert every single part of an application stored in Appsmith’s MongoDB database to and from source files like this was easier said than done. This is something that we had to continually polish to make sure all of the components and metadata were being imported and exported correctly throughout the entire application.
Performance
In a normal Git workflow, all development takes place on the machine holding the local repository and performance considerations over a network are only relevant for push and pull operations.
In Appsmith, however, development takes place in a user’s web client, so every change has to be transmitted to the Appsmith server, which holds the local repository. This means that all operations are network-dependent, which is why it was so important to make these operations as performant as possible.
Early on, many operations on the Appsmith server were simply too slow, which caused the Appsmith client to time out on certain requests. As a result, some operations were left partially completed or finished out of order, ultimately leading to corruption. This is the main reason that Git functionality was so unreliable in the early days, and we spent a lot of time fixing this issue.
Ignoring unchanged files
One operation that took particularly long was the conversion of the application from its database representation to its filesystem representation. These filesystem operations were I/O-heavy and therefore took a long time to complete, often causing timeouts. This made it look like operations were failing, even though they were working correctly on the server, just not quickly enough.
We realized that the most efficient way to resolve this was to avoid the write operation entirely, if possible. Each component of the application stored in the database has its own timestamp and we use this information to tell if that component has been altered since the previous commit. If it hasn’t, then we just skip the write operation for that component to save time.
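The timestamp check can be sketched in a few lines of Python. The Component class, its field names, and the write callback are assumptions made for this example, and the sketch assumes that component timestamps and the last commit time come from the same clock and are directly comparable.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Iterable


@dataclass
class Component:
    """Hypothetical stand-in for one application component in the database."""
    name: str
    updated_at: datetime
    content: str


def write_changed(
    components: Iterable[Component],
    last_commit_at: datetime,
    write: Callable[[Component], None],
) -> int:
    """Write only components modified after the previous commit.

    Returns the number of writes performed; everything else is skipped,
    which avoids the expensive filesystem I/O for unchanged components.
    """
    written = 0
    for component in components:
        if component.updated_at > last_commit_at:
            write(component)
            written += 1
    return written


if __name__ == "__main__":
    last_commit = datetime(2023, 10, 1, 12, 0)
    components = [
        Component("Table1", datetime(2023, 9, 30, 9, 0), "..."),
        Component("get_orders", datetime(2023, 10, 2, 8, 30), "..."),
    ]
    count = write_changed(components, last_commit, lambda c: print("writing", c.name))
    print(count, "of", len(components), "components written")
```

The saving grows with the size of the application: a small edit to one query no longer rewrites every widget and page on disk.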
Ignoring metadata
A more complicated problem was ignoring changes that didn’t make sense for a user. There is a massive amount of metadata for the various components of an application that wouldn’t necessarily make sense in a git diff, because it doesn’t correspond with changes that users made to the application.
The problem is that this data was intermixed with changes that users did make. We had to rework the architecture of the files that were stored in source control to pull out this metadata into its own file that could then be ignored in client requests to further speed up operations.
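Once that metadata lives in its own file, ignoring it becomes a simple filter on file names. The Python sketch below compares two snapshots of a repository, represented as dictionaries mapping paths to file contents, and reports only the paths a user would care about. The system-metadata.json file name, the snapshot representation, and the function name are assumptions for illustration, not the names Appsmith uses.

```python
from pathlib import PurePosixPath

# Hypothetical name for the file that holds non-user-facing metadata.
IGNORED_FILENAMES = {"system-metadata.json"}


def user_visible_changes(before: dict, after: dict) -> list:
    """Return sorted paths that were added, removed, or modified,
    skipping files that only hold metadata users never edit."""
    changed = []
    for path in sorted(set(before) | set(after)):
        if PurePosixPath(path).name in IGNORED_FILENAMES:
            continue
        # dict.get returns None for a missing path, so additions and
        # deletions are caught by the same comparison as edits.
        if before.get(path) != after.get(path):
            changed.append(path)
    return changed


if __name__ == "__main__":
    before = {
        "pages/Home/widgets/Table1.json": '{"rows": 10}',
        "pages/Home/system-metadata.json": '{"schema": 1}',
    }
    after = {
        "pages/Home/widgets/Table1.json": '{"rows": 25}',
        "pages/Home/system-metadata.json": '{"schema": 2}',
    }
    print(user_visible_changes(before, after))
```

Keeping user edits and metadata interleaved in one file would make this kind of filter impossible, which is why the files had to be restructured first.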
These improvements made requests to the Appsmith server about 4x faster. This turned our Git functionality from something that sometimes worked into something that could be relied on as part of mission-critical workflows.
The road ahead
There is still one major issue, although it affects user experience rather than functionality. Whenever we upgrade Appsmith to a new version (which happens regularly on the cloud-hosted version, since we’re always improving), the upgrade changes the underlying metadata of the application. When a user then makes their next commit, they see other (often major) changes to files they didn’t touch, in addition to the changes they made to their application.
This can be very confusing and cause the user’s changes to be drowned out by these other changes. We’re aware that there is a lot of customer pain around this, so we’re trying to better communicate these metadata changes to users by keeping automatic commits separate from user commits, but that is still a work in progress.
As we mentioned before, we also want to improve the output of git diff so that users can get an idea of the exact changes made rather than just where the changes were made in the application. At our recent Hackathon, one team created a demo of this functionality, although we still have a lot of work to do before it’s ready to be released.
We are focusing on auto-deployment functionality as well. We want users to be able to automatically create development, testing, and production branches to neatly extend to different environments in a CI/CD workflow. We see this as critical to any robust development process.
Further ahead, we will also be adding branch protections in the local Git repository. These can already be enabled in the remote repository, but we want to add that functionality to the local repository as well to prevent Appsmith applications from entering a bad state.
Git in Appsmith is a huge leap forward for productivity and reliability in internal apps
Modern software development practices would be impractical, if not impossible, without Git. Nobody would think to develop an application without first having robust and reliable source control in place. Just because internal app platforms allow users to create applications much more quickly and efficiently than other tools doesn’t mean they shouldn’t offer native Git integrations.
But the truth is that many internal app platforms don’t bother, because solving these performance issues is so difficult. That’s why solving them and shipping a reliable feature was a monumental achievement for our team.
And now that Git is available in Appsmith out of the box, your team can quickly iterate and test your apps, roll back if there’s a problem, and always see the current state of your codebase using the same industry-standard tools as in any other IDE.