Migrating from ACS3.x to ACS40

by Mike Bonnet (mikeb@arsdigita.com)
(Slightly mangled/anonymized by Andrew Grumet)

Why, oh, why?

The project started in June, 2000. Development started on ACS3.2, with no abstract URLs. Some ACS3.4 modules (calendar, address-book, file-storage) were added to the site, but had to be modified to work under ACS3.2. Eventually the site was converted to ACS 3.4 abstract URLs. The entire site made heavy use of ADP templating, and was built upon a custom content-repository.

In December, 2000, it was decided that the project would move to ACS40. The Tcl code and templates would obviously have to be rewritten, but it was very important that no data in the database got lost. A team of content editors from the Client had been adding content since the start of the project, and all 37000 items needed to be moved onto the new servers, along with user information for approximately 450 registered users. We installed a new ACS40 server on our development machine, and started the process of moving our code and data to the new server, and getting it working.

Herding users and groups to greener pastures

The first step was migrating from the ACS3.x users table to the ACS40 parties-persons-users system. Our users table also had an extra column for user bios. We wrote this PL/SQL script to recreate the existing users in ACS4. Note that acs3x is the old database and acs4x is the new database. We use the "not in (select ..." in the cursor definition to make sure that we don't try to create users with emails that already exist. Note that we compare lower(email) rather than just email because in ACS3.x passwords were case-sensitive, so User@ComPany.COM was different than user@company.com. However, ACS40 creates accounts with all lower-case emails automatically, so if you don't compare the lowered emails, you could end up trying to create users with duplicate emails, and a unique constraint would be violated.

Passwords were encrypted in the ACS3.x database, and ACS40 uses a different encryption scheme than ACS3.x, so there was no way to recover them. All users were created with an initial password of "changeme". The acs.add_user function expects a hashed password, and the seed that password was hashed with. We wrote a simple Tcl page that appends the salt to the end of the password and calls ns_sha1 on that string. The result is the password argument that add_user is called with.

Update: We decided that it was unacceptable to reset all the users passswords to "changeme", so we had to have a way for them to keep using their old passwords. We ran the update at the the end of the user migration script to bring their old passwords into the users table. We then hacked the ad_check_password function to check for both the old and new password formats. If either format works, the user is logged in. Users can continue to use their old passwords for as long as they want to, but we will probably encourage them to change them soon, so that they will be using the newer, more secure password-encrypting algorithm.

A significant number of the users had also uploaded their portraits. The script checks if they have a portrait, and inserts it into the ACS40 content repository if they do. The SQL is taken from the portrait upload page; it creates a content item, a revision of that item, and a row in the images table. The script also inserts a row in the custom user_bios table, with the new user_id and their existing bio.

The last step in the script is creating a user_party_map table, relating old user_ids to new party_ids. The email column is there for accuracy checking only.

In the course of figuring out how to migrate the users and portraits there were some errors, so we created a drop user script that would totally remove a party and any references to that party from the database. You can find that here.

Next we had to bring the existing groups over from ACS3.x. We decided there were only 4 groups that we needed, and wrote this script to bring them over. It uses the acs_group and membership_rel packages to create the new groups and add the correct members to them. Membership can then be checked by querying the group_member_map table.

Content? What content?

We now had to bring our content repository into the ACS4 tablespace. We decided that for now we would keep our own content repository data model, and perhaps migrate it to the ACS4 data model at a later time. This meant that we just had to import our existing tables and views. Fortunately, none of the tables in our content repository had the same name as the ACS content repository. Unfortunately, the tables had a lot of referential integrity constraints, including references to the users table. We decided to bring the tables over in batches, resolving referential constraints as we went along.

We used the queries here to find tables with no referential constraints, or only constraints referencing the same table, and migrated those. We then found only tables referencing already imported tables, and migrated those next. This worked until we reached tables that referenced the users table. Since we had created new users in ACS40, the user_ids would not be the same, and the referential constraint would be violated. The solution was to disable the constraint in the ACS3.x tablespace before export, import the tables into the new tablespace, update the tables setting the ACS3.x user_id to the associated ACS40 party_id (using user_party_map) and reenabling the constraint. This allowed all of the content to be imported and retain its existing relationships.

Once all of the tables had been migrated, we used the content repository creation scripts to reload all the necessary views and PL/SQL functions. A few of them had to be modified to work correctly with the new user scheme, but most worked out of the box.

Stop it, I'm Tcl-ish!

Now that the data model was migrated, the existing Tcl procedures had to be modified to work under ACS40. The first issue was that our code did a lot of explicit handle management. Since database handles were such precious commodities in ACS3.x, many of our procs would take a handle as an argument, or get their own handle from the subquery pool. Since the request processor gets its database handles from the subquery pool, this immediately caused errors, and all of our Tcl code needed to be converted to use the DB API.

Most of the conversion was very straight-forward. Ns_db gethandles were removed, set selection [ns_db select $db "sql text..."] became set sql_query "sql text...", and while {[ns_db getrow $db $selection]} { code block... } became db_foreach query_name $sql_query { code block... }. There were a couple of places where a proc behaved differently based on the result of ns_db getrow, and the query couldn't be translated directly to a db_foreach, but these could be handled by keeping the existing code and wrapping it in a db_with_handle db { code block... }. All queries were also converted to using bind variables. It didn't take long until all of our custom Tcl code was working under ACS40.

ACS vs. ADPs, or How I learned to stop worrying and love the templating system

The Project was written using ADP pages. Accessing a URL will source an ADP file that will ns_adp_include several other ADP files. These files will be a mixture of HTML and calls to procedures that return HTML. Many of these procedures will ns_adp_parse templates to build the HTML that they return. This scheme allowed us to achieve good separation of logic and presentation, and kept unnecessary files out of the webroot. ADP templates that only got included were moved out of the webroot so that users couldn't access them directly and potentially cause server errors or view data they should not have access to.

We decided to try to get our existing templates working under ACS40 as a first step before converting to the ACS40 templating system, and to verify that the data migration was complete and correct. After debugging some of the ADP pages and Tcl procs, it looked like the pages should have been working. The ADPs were getting sourced, according to the server logs, but Netscape would always return Document contains no data. We had heard rumors about the request processor having problems serving ADP pages that had no corresponding Tcl page. We created an empty Tcl page with the same name as an ADP page, and tried to access it using abstract URLs, but it still returned no data. We then put this line in the Tcl page: ns_return 200 text/html [ns_adp_parse -file <full path to file>], and it worked. After placing the required images in the right location under the web root, our pages looked just like they had under ACS3.x.

Everything would have been great at this point, and we could have all lived happily ever after, but we started to notice some strange behavior from our server. On pages that used the new ACS40 templating system, random bits of HTML would get inserted into places they shouldn't be. One of the first symptoms was that CSS files that were referenced in a <link rel> would show up in-line in the page. It progressed to the point where whole pages from the old templating system would get inserted into the middle of a page being served by the new templating system. We noticed that it only seemed to happen after a page from the old templating system broke, resulting in a server error. After much investigation, we're still not sure what's wrong. We believe it's an issue with AOLServer's internal ADP buffers not getting cleared, but we haven't been able to verify this. We've been told that we're not supposed to use ns_adp_include and ns_adp_parse when using the new templating system, so the bottom line is that we can't mix old and new templates, and all our pages need to be converted to the new templating system before our problems will go away.

So, now we're...

...in the process of converting our existing ADPs to the new ACS40 templating system.