Status of this document
Unique URLs are in core as of 1.6.1, and the implementation there is better than what is described on this page.
This should be viewed as the start of a specification, and is very much open for discussion. Use comments or edit the document (comments are easier to keep track of, though!).
Problem to be solved
ATutor courses do not have unique URLs, which prevents them from being indexed by search engines and prevents other websites from linking to specific content pages or tools in a given (public) course. Where you are in an ATutor install is determined by a user cookie.
Adding unique URLs to all ATutor content will make courses indexable and expand the potential user group of ATutor, specifically into the lightweight CMS audience.
For more background, see: http://www.atutor.ca/view/16/12245/1.html and http://www.atutor.ca/view/12/12246/1.html (note that after some thinking I've changed my mind about how this should be implemented)
Suggested solution
Pass a course=xx variable at the end of any URL in ATutor, as an identifier in addition to the user cookie. The course variable should be on by default.
A typical new URL would then be:
www.example.com/atutor/index.php?course=12
www.example.com/atutor/content.php?cid=175&course=12
(course=xx is added to all URLs, after ? or, when the URL already has a query string, after &)
etc., for all possible URLs
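As a rough illustration of the rule above, here is a minimal sketch of appending the course variable to an arbitrary URL. ATutor itself is written in PHP; Python is used here only to keep the example self-contained, and the function name add_course_param is hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def add_course_param(url: str, course_id: int) -> str:
    """Return url with course=<id> in its query string, keeping other parameters."""
    parts = urlsplit(url)
    # Drop any existing course parameter so it is never duplicated.
    params = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "course"
    ]
    params.append(("course", str(course_id)))
    # urlencode joins the pairs with "&"; urlunsplit adds the single "?".
    return urlunsplit(parts._replace(query=urlencode(params)))


print(add_course_param("/atutor/index.php", 12))
print(add_course_param("/atutor/content.php?cid=175", 12))
```

Going through a URL parser rather than string concatenation is what keeps the separator correct: the first parameter follows "?", every later one follows "&".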
How it should work
- for public courses: straightforward. All pages get a unique URL, and all are indexable.
- for protected and private courses: all pages get a unique URL, and users who are not logged in but are referred to a page are
- first redirected to the login / register page
- then redirected back to the page they were referred to
(note: this already works for login pages, as you can log in to a specific course with a URL like this: /atutor/bounce.php?course=1, to log in to the course with id=1 in an install)
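The redirect flow for protected and private courses could be sketched as follows. This is an illustration in Python rather than ATutor's PHP, and the login path /atutor/login.php, the return_to parameter, and both function names are assumptions made for the example, not existing ATutor names.

```python
from urllib.parse import parse_qs, quote, urlsplit

LOGIN_URL = "/atutor/login.php"  # hypothetical login page path


def next_location(requested_url: str, course_is_public: bool, logged_in: bool) -> str:
    """Return the URL the browser should end up at for this request."""
    if course_is_public or logged_in:
        return requested_url
    # Not logged in to a protected course: go to login, remembering the target.
    return f"{LOGIN_URL}?return_to={quote(requested_url, safe='')}"


def return_target(login_location: str) -> str:
    """After a successful login, recover the page the user originally asked for."""
    return parse_qs(urlsplit(login_location).query)["return_to"][0]


wanted = "/atutor/content.php?cid=175&course=12"
login = next_location(wanted, course_is_public=False, logged_in=False)
print(login)
print(return_target(login))
```

The requested URL is percent-encoded before it is embedded, so its own "?" and "&" cannot be confused with those of the login URL.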
Backwards compatibility
I believe this will work by default the way I describe it below, but make sure that old URLs do not break when the new variable is introduced.
Example: Many users already link to specific pages / tools in ATutor installs. For example, we often link to a specific test for a content page using the following URL:
/atutor/tools/take_test.php?tid=170
the new URL for this would be
/atutor/tools/take_test.php?tid=170&course=34
The old links should not give a 404 after the new URL system is introduced; this is why the user cookie is not removed.
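One way to read this requirement is that the course variable in the URL wins when present, and the cookie-based course is the fallback for old links. A minimal sketch of that lookup, in Python for illustration only (resolve_course_id and its arguments are hypothetical names):

```python
from typing import Mapping, Optional


def resolve_course_id(query: Mapping[str, str], cookie_course: Optional[int]) -> Optional[int]:
    """Prefer course=<id> from the URL; fall back to the course stored via the cookie."""
    raw = query.get("course")
    # Accept only plain ASCII digits so malformed values fall through to the cookie.
    if raw is not None and raw.isascii() and raw.isdigit():
        return int(raw)
    return cookie_course


# New-style link: the URL decides.
print(resolve_course_id({"tid": "170", "course": "34"}, cookie_course=7))
# Old-style link: the cookie still works, so nothing breaks.
print(resolve_course_id({"tid": "170"}, cookie_course=34))
```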
Potential issues
What happens if a course is exported and backed up, and the content includes references with the course id property in the URL?
Future development
Giving courses a slug, or nicer URLs, could be introduced later by adding a rewrite module. This would (could?) depend on mod_rewrite for Apache.
WordPress has solved this the "typical" way for CMSes, see: http://codex.wordpress.org/Using_Permalinks and the attached screenshot of a WordPress permalink configuration page. I believe the same principles should be used for ATutor.
Yes, I do agree in principle. Something like this was my initial suggestion in http://www.atutor.ca/view/12/12246/1.html - the reason I changed it was backward compatibility with older courses, and the scale of what needs to be done to get this right.
For instance, all images and objects that are inserted from the file manager get a link like IMG SRC="picture.jpg", meaning the file is placed on root - if we change the whole URL structure with mod_rewrite this will break, as the correct link would be something like IMG SRC="../../../picture.jpg" to refer to the file from install-dir/12/content/175 - if ATutor is installed on root, this would be different again, etc. etc. etc. The same goes for internal links in courses, and probably other things like backups / restores as well.
It might be worth it to get a proper link structure, though, but be aware that it will be a massive change in how ATutor works, and will most probably break something in some installations. Also, nothing can screw things up as much as mod_rewrite and .htaccess files - and I say that from experience.
..in this case, maybe we would need to create a completely new system for how you refer one "node" to another in ATutor, one that takes all this into account, including backup / restore and offline viewing?
I don't know of any systems that have done this, but Drupal creates aliases for its "nodes", so /?q=node/12 can get an alias like, say, /hardware/monitors, and both URLs will work. But that's not an optimal solution either.
So, in short, that's why I adjusted my spec. The end result of your suggestion, once everything works, would be better, but I fear that it might have more implications than we can foresee right now?
Regarding Google, I believe that the PageRank system ensures that if there are links (internal or external) to a page with ?= in the URL, that page is indexed. On the other hand, better URLs would for sure be indexed.
On Sunday I read some docs on mod_rewrite and tried to implement this. I can't say it worked as it should, but some points can already be made.
1. About the URL construction.
If we use mod_rewrite, there will be basically no difference between specifying the course ID in the path part and in the query part. That's because the very first step the RewriteRule directive takes is replacing e.g. "/course/12/index.php" with "/index.php?course=12". From that moment on, the two suggested URL schemes become indistinguishable.
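To make the equivalence concrete, here is a small Python sketch that emulates the path-to-query rewrite described above. It is not Apache configuration and not ATutor code; the /course/<id>/ prefix is taken from the example in the text, and the function name is hypothetical.

```python
import re

# Matches "/course/<digits>/<rest>", the path form used in the example above.
COURSE_PATH = re.compile(r"^/course/(\d+)(/.*)$")


def rewrite(path: str, query: str = "") -> str:
    """Move a leading /course/<id> path prefix into a course=<id> query parameter."""
    match = COURSE_PATH.match(path)
    if match is None:
        # Already in query form (or not a course URL): leave it alone.
        return path + ("?" + query if query else "")
    course_id, rest = match.groups()
    new_query = f"course={course_id}" + ("&" + query if query else "")
    return f"{rest}?{new_query}"


print(rewrite("/course/12/index.php"))
print(rewrite("/index.php", "course=12"))
```

Both calls produce the same internal URL, which is the point being made: after the rewrite step, the application cannot tell which scheme the visitor used.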
2. About the compatibility.
Vegard is right that compatibility has the highest priority, and it seems it could be achieved at low cost.
ATutor uses relative paths to files stored in the file manager. That's why, as long as <base href=...> uses permalinks, there should be no problems. The web server will adjust the paths to images just like it does for PHP files - which would result in an extra "?course=" being appended, but it will make no difference.
Or, we can even use an additional RewriteCond to prevent that suffix from being added for any files except PHP files.
vitals.inc.php (or some other file that is always included) should verify that a path contains the course part if we're inside a course and redirect the user's browser if it doesn't. This would keep old URLs accessible, while enforcing use of new ones as discussed in http://www.atutor.ca/view/12/12246/1.html#12323.
The extra "?course=" can be a problem for some files (e.g. faq/index_instructor.php responds to any unknown GET parameter with an error). So it should be handled and unset early during execution.
At this point, all the alterations to the URL will have been removed, and most existing files would stay happily unaware it had ever changed. So their code could remain untouched. The exception is URLs output; see below.
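The two responsibilities described for the always-included file (redirect old URLs inside a course, then remove the course parameter before page code sees it) might look roughly like this. This is a Python sketch under stated assumptions, not the contents of vitals.inc.php; the function name and return shape are invented for illustration.

```python
from typing import Dict, Optional, Tuple
from urllib.parse import urlencode


def preprocess(
    path: str, params: Dict[str, str], session_course: Optional[int]
) -> Tuple[Optional[str], Dict[str, str]]:
    """Return (redirect_url, cleaned_params); redirect_url is None if no redirect is needed."""
    if "course" not in params:
        if session_course is None:
            return None, dict(params)  # not inside a course: nothing to enforce
        # Old-style URL inside a course: send the browser to the new form.
        new_params = {**params, "course": str(session_course)}
        return f"{path}?{urlencode(new_params)}", dict(params)
    # New-style URL: strip the parameter so existing pages never see it.
    cleaned = {key: value for key, value in params.items() if key != "course"}
    return None, cleaned


print(preprocess("/faq/index_instructor.php", {"course": "12"}, 12))
print(preprocess("/content.php", {"cid": "175"}, 12))
```

In the first call the page receives an empty parameter set, so a script that rejects unknown GET parameters is unaffected.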
3. About downsides.
Though a simple approach works well on synthetic examples, I wasn't able to achieve a good result on a real ATutor install.
First and foremost, all the URLs generated by ATutor have to be replaced. This is not as easy as just editing some constants like AT_BASE_HREF and replacing ubiquitous $_SERVER['PHP_SELF'] with e.g. $_SERVER['REQUEST_URI']. It could be much more laborious.
We have to distinguish between pages that are course-dependent and those that are not. E.g. /documentation/* should not be prefixed, as it is the same whether or not it is viewed from within a course. It will probably make little difference for usability, but from a search engine viewpoint it may seem strange, and a site could, I guess, get penalized for serving exactly the same pages under different URLs.
The result may be achieved by adding a list of files that are course-independent into $_pages structure of include/lib/menu_pages.php, or a separate array. Then, each URL, before it's output, may be first passed to a special function checking the list and changing the path as needed, like
echo '<a href="' . enforce_new_urls($_SERVER['PHP_SELF']) . '?id=500&asc=value">' ...
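A possible shape for such a function, sketched in Python for illustration. The list contents, the /course/<id> prefix, and the explicit course_id argument are all assumptions; the function named in the text takes only the path, and where it would read the current course from is left open.

```python
from typing import Optional

# Hypothetical list of path prefixes that are the same in every course.
COURSE_INDEPENDENT_PREFIXES = ("/documentation/",)


def enforce_new_urls(path: str, course_id: Optional[int]) -> str:
    """Prefix course-dependent paths with /course/<id>; leave shared pages untouched."""
    if course_id is None or path.startswith(COURSE_INDEPENDENT_PREFIXES):
        return path
    return f"/course/{course_id}{path}"


print(enforce_new_urls("/tools/take_test.php", 12))
print(enforce_new_urls("/documentation/index.php", 12))
```

Keeping the exceptions in one list means a shared page has exactly one URL, which addresses the duplicate-page concern raised above.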
Also, some problems have been reported with rewriting HTTP methods other than GET. I didn't look into that, though.
4. About a similar approach to course items.
Implementing Greg's example ("www.example.com/atutor/12/content/175") would greatly improve readability.
Probably, it can be done just like the course permalinks, and fewer changes would be required. The thing to consider is whether a similar approach should be taken to any other data inside ATutor. That, in turn, may require a different approach if the above one appears to be non-scalable (just a guess).
I don't know the search engines' policies regarding URLs containing "?"s. On the contrary, I've seen a lot of such pages in Google search results. So if Google really does index such links, and doesn't weigh them lower, the rationale for content permalinks becomes quite different from that for course permalinks: they will serve primarily as "syntactic sugar" rather than being enablers for something really new, like the ability to be traversed by bots.
But if Google really undervalues or ignores such URLs, these content permalinks may well be worth implementing.
5. About custom slugs.
Course slugs are useful things, and once we declare that we'll try expanding to the CMS market, they would really be essential.
Though, as I said, I'd prefer to be able to turn them off in a large installation like ours.
That's because most of our instructors don't know English, and there are different transliteration rules from Cyrillic to Latin script. Transliterations made by such people are often funny or ambiguous, which, when used in slugs, would be bad for a university. In a large installation (we now have about 300 courses) there is also a potential for conflict between common course titles like "Physics" or "System programming". That might cause debates over naming priority.
So we'll have to either adopt a centralized university-wide policy on this issue, or (better) remove the ability to customize the URLs altogether.
However, in a more strictly managed or personal CMS, this feature would be nothing but a bonus.
I'll be glad to hear more suggestions and any preliminary results.
Thanks! I cannot comment much on the more technical aspects, but regarding search engines' policies on "?"s, I can say that such URLs are not automatically ignored. I believe the policy is that they are given less weight if they are not linked to from somewhere, but if they are linked to (say, from a course catalogue) they will be indexed, as search engines most often index by following links on a site. However, more "static looking" URLs are better.
But based on this, could a good road map for getting this into ATutor be the following:
- add the ?course_id=xx parameter to the URLs, and confirm that this is working.
- add a rewrite module / option so servers that support mod_rewrite can change "?course_id=12&cid=9874" to "/12/9874"
- add a new option that can take a course id and replace it with a custom slug, so you could take the "/12/9874" and change it to "/courseslug/9874"
This seems to be the way most apps have followed towards better URLs.
As Indie has mentioned, adding URL queries (?course_id=xx) to some pages would produce an error message. However, I believe this can be fixed quite easily.
About the rewrite part, there are basically two ways of doing it: the mod_rewrite way and the URL way. The first is the easiest, but it is platform dependent: mod_rewrite is only usable under the Apache server with mod_rewrite enabled. The second is a bit more complicated, as the application has to read the URL and match the path (/12/9874/) against rules that map it to a query (?course_id=12&cid=9874).
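The second, application-level approach could be sketched like this. It is written in Python for illustration only, and the single pattern below covers just the /12/9874/ shape used in the example; pretty_to_query is a hypothetical name.

```python
import re
from typing import Dict, Optional

# "/<course_id>" optionally followed by "/<cid>", with an optional trailing slash.
PRETTY = re.compile(r"^/(\d+)(?:/(\d+))?/?$")


def pretty_to_query(path: str) -> Optional[Dict[str, str]]:
    """Map a path like /12/9874/ to {'course_id': '12', 'cid': '9874'}, or None if no rule matches."""
    match = PRETTY.match(path)
    if match is None:
        return None
    params = {"course_id": match.group(1)}
    if match.group(2) is not None:
        params["cid"] = match.group(2)
    return params


print(pretty_to_query("/12/9874/"))
print(pretty_to_query("/index.php"))
```

Because the matching happens inside the application, this variant does not depend on any particular web server, at the cost of every request passing through the matching code.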
On top of the roadmap that Vegard has given, we should also add the ability to enable/disable this feature under admin/system preference.
I'm not sure I like the idea of adding more URL information. If anything, we want to reduce it. For the most part, Google does not traverse databases of information often found after the "?" in a URL. If anything, we'd implement a mod_rewrite strategy to clean up URLs and make them look like the URLs in the atutor.ca forum, which have no "?" in them and are thus fully indexed by Google.
A course url could be constructed to include various ids as directory names in the example above, something like
www.example.com/atutor/12
www.example.com/atutor/12/content/175
We could also replace the course id (12) with a slug defined by an instructor or admin, something like
www.example.com/atutor/my_course/content/175
where "my_course" is the slug defined by the instructor to represent the course.
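A minimal sketch of how the first path segment might be resolved when it can be either a numeric course id or a slug. Python is used for illustration; the slug table, the /atutor base path, and the function name are assumptions, and the "content" segment follows the example URLs above.

```python
from typing import Dict, Optional, Tuple


def parse_course_url(
    path: str, slugs: Dict[str, int], base: str = "/atutor"
) -> Optional[Tuple[int, Optional[int]]]:
    """Return (course_id, content_id) for a course URL, or None if it cannot be resolved."""
    if not path.startswith(base + "/"):
        return None
    segments = [segment for segment in path[len(base):].split("/") if segment]
    if not segments:
        return None
    first = segments[0]
    # A purely numeric segment is a course id; anything else is looked up as a slug.
    if first.isascii() and first.isdigit():
        course_id: Optional[int] = int(first)
    else:
        course_id = slugs.get(first)
    if course_id is None:
        return None
    content_id = None
    if len(segments) == 3 and segments[1] == "content" and segments[2].isdigit():
        content_id = int(segments[2])
    return course_id, content_id


slugs = {"my_course": 12}  # hypothetical slug table
print(parse_course_url("/atutor/12/content/175", slugs))
print(parse_course_url("/atutor/my_course/content/175", slugs))
```

With this rule the numeric and slug forms resolve to the same course, so both keep working; it also implies that a slug made only of digits would have to be disallowed.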