JP5113967B2

JP5113967B2 - Internet file system

Info

Publication number: JP5113967B2
Application number: JP2001516067A
Authority: JP
Inventors: Eric Sedlar; Michael Roberts
Original assignee: Oracle International Corporation
Priority date: 1999-08-05
Filing date: 2000-07-26
Publication date: 2013-01-09
Anticipated expiration: 2020-07-26
Also published as: AU2004203241B2; AU2004203240B2; AU6495400A; AU2004203242B2; AU2004203240A1; AU2004203243B2; AU2004203242A1; AU774090B2; EP1330727A2; WO2001011486A2; CA2379930A1; AU2004203249A1; AU2004203249B2; CA2379930C; AU2004203243A1; WO2001011486A3; AU2004203241A1; JP2003527659A

Description

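[0001]
[Priority Claim and Cross-Reference to Related Applications]
This application claims domestic priority under 35 U.S.C. § 119(e) from prior U.S. Provisional Patent Application Serial No. 60/147,538, entitled "Internet File System," filed August 5, 1999 by Eric Sedlar, the entire contents of which are incorporated herein by reference as if fully set forth herein.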
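[0002]
This application is related to U.S. Patent Application Serial No. 09/251,757, entitled "Hierarchical Indexing for Accessing Hierarchically Organized Information in a Relational System," filed February 18, 1999 by Eric Sedlar, the entire contents of which are incorporated herein by reference as if fully set forth herein.
[0003]
This application is related to U.S. Patent Application Serial No. 09/571,496, entitled "File System that Supports Transactions," filed May 15, 2000 by Eric Sedlar, the entire contents of which are incorporated herein by reference as if fully set forth herein.
[0004]
This application is related to U.S. Patent Application Serial No. 09/571,060, entitled "Stored Query Directories," filed May 15, 2000 by Eric Sedlar, the entire contents of which are incorporated herein by reference as if fully set forth herein.
[0005]
This application is related to U.S. Patent Application Serial No. 09/571,036, entitled "Event Notification System Tied to a File System," filed May 15, 2000 by Eric Sedlar, the entire contents of which are incorporated herein by reference as if fully set forth herein.
[0006]
This application is related to U.S. Patent Application Serial No. 09/571,492, entitled "Object File System with Typed Files," filed May 15, 2000 by Eric Sedlar, the entire contents of which are incorporated herein by reference as if fully set forth herein.
[0007]
This application is related to U.S. Patent Application Serial No. 09/571,568, entitled "On-the-fly Format Conversion," filed May 15, 2000 by Eric Sedlar, the entire contents of which are incorporated herein by reference as if fully set forth herein.
[0008]
This application is related to U.S. Patent Application Serial No. 09/571,696, entitled "Versioning in Internet File System," filed May 15, 2000 by Eric Sedlar and Michael J. Roberts, the entire contents of which are incorporated herein by reference as if fully set forth herein.
[0009]
This application is related to U.S. Patent Application Serial No. 09/571,508, entitled "Multi-Model Access to Data," filed May 15, 2000 by Eric Sedlar, the entire contents of which are incorporated herein by reference as if fully set forth herein.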
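[0010]
[Field of the Invention]
The present invention relates generally to electronic file systems and, more specifically, to a system that implements an operating system file system using a database system.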
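[0011]
[Background of the Invention]
Humans tend to organize information into categories, and the categories into which information is organized are themselves typically related to one another in some form of hierarchy. For example, individual animals belong to species, species belong to genera, genera belong to families, families belong to orders, and orders belong to classes.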
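[0012]
With the advent of computer systems, techniques for storing electronic information have been developed that largely reflect this human desire for hierarchical organization. Conventional operating systems, for example, provide file systems that use hierarchy-based organization principles. Specifically, in a typical operating system file system ("OS file system"), directories are arranged in a hierarchy and documents are stored in the directories. Ideally, the hierarchical relationships between the directories reflect some intuitive relationship between the meanings assigned to those directories. Similarly, each document is ideally stored in a directory based on some intuitive relationship between the contents of the document and the meaning assigned to the directory in which it is stored.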
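[0013]
FIG. 1 shows a typical mechanism by which a software application that creates and uses files (such as a word processor) stores those files in a hierarchical file system. Referring to FIG. 1, operating system 104 exposes an application programming interface (API) to application 102. The exposed API allows application 102 to call routines provided by the operating system. The portion of the OS API associated with the routines that implement the OS file system is hereafter referred to as the OS file API. Application 102 calls file system routines through the OS file API to retrieve data from, and store data on, disk 108. Operating system 104, in turn, makes calls to device driver 106, which controls access to disk 108, to cause files to be retrieved from and stored on disk 108.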
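[0014]
The OS file system routines implement the hierarchical organization of the file system. For example, the OS file system routines maintain information about the hierarchical relationships between files and give application 102 access to files based on their location within the hierarchy.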
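[0015]
In contrast to the hierarchical organization of electronic information, a relational database stores information in tables made up of rows and columns. Each row is identified by a unique RowID. Each column represents an attribute of a record, and each row represents a particular record. Data is retrieved from the database by submitting queries to the database management system (DBMS) that manages the database.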
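[0016]
FIG. 2 shows a typical mechanism by which a database application accesses information in a database. Referring to FIG. 2, database application 202 interacts with database server 204 through an API provided by database server 204 (the "database API"). The exposed API allows database application 202 to access data using queries constructed in a database language supported by database server 204. One language supported by many database servers is the Structured Query Language (SQL). To database application 202, database server 204 makes it appear that all data is stored in rows of tables. Transparently to database application 202, however, database server 204 actually interacts with operating system 104 to store the data as files in the OS file system. Operating system 104, in turn, makes calls to device driver 106 to cause the files to be retrieved from and stored on disk 108.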
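[0017]
Each type of storage system has its own advantages and limitations. Hierarchically organized storage systems are simple, intuitive, and easy to implement, and they are the standard model used by most application programs. Unfortunately, the simplicity of the hierarchical organization does not provide the support required for complex data retrieval operations. For example, retrieving all documents with a particular file name that were created on a particular date may require inspecting the contents of every directory. Because every directory must be searched, the hierarchical organization does nothing to facilitate the retrieval process.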
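[0018]
Relational database systems are well suited for storing large amounts of information and for accessing data in a highly flexible manner. Unlike hierarchically organized systems, relational database systems allow even data that matches complex search criteria to be retrieved easily and efficiently. However, formulating a query and submitting it to a database server is less intuitive than simply traversing a hierarchy of directories, and is beyond the technical comfort level of many computer users.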
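[0019]
Currently, application developers must choose whether the data created by their applications will be accessible through the hierarchical file system provided by the operating system or through the more complex query interface provided by a database system. In general, applications that do not require the complex search capabilities of a database system are designed to store their data using the more common and simpler hierarchical file system provided by the operating system. This simplifies both the design and the use of the application, but limits the flexibility and power with which the data can be accessed.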
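[0020]
Conversely, when complex search capabilities are required, applications are designed to access their data using the query mechanisms provided by a database system. This increases the flexibility and power with which the data can be accessed, but it also increases the complexity of the application from both the designer's and the user's perspective. Furthermore, a database system must be present, which imposes additional cost on the application user.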
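[0021]
In view of the foregoing, it is clearly desirable for applications to be able to access data using the relatively simple OS file API. It is further desirable for the same data to be accessible through the more powerful database API.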
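[0022]
[Summary of the Invention]
Techniques are provided for accessing data stored in a database. According to one technique, an application makes one or more calls to an operating system to access a file. The operating system includes routines that implement an operating system file system, and the one or more calls are made to those routines. In response to the one or more calls, one or more database commands are issued to a database server that manages the database. The database server executes the database commands to retrieve data from the database. A file is generated from the data and provided to the application.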
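[0023]
The present invention is described below, by way of example and not by way of limitation, with reference to the accompanying drawings, in which like reference numerals refer to like elements.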
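[0024]
[Detailed Description of the Preferred Embodiments]
A method and system are provided that allow the same set of data to be accessed through a variety of interfaces, including a database API and an OS file system API. In the following description, for the purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the present invention. It will be apparent to one skilled in the art, however, that the present invention may be practiced without these specific details. In other instances, well-known structures and devices are shown in block diagram form to avoid unnecessarily obscuring the present invention.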
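[0025]
Architectural Overview
FIG. 3 is a block diagram of the architecture of a system 300 implemented according to an embodiment of the invention. Like the system shown in FIG. 2, system 300 includes database server 204, which provides a database API through which database application 312 can access data managed by database server 204. From the perspective of every entity that accesses that data through the database API, the data is stored in relational tables that can be queried using the database language supported by database server 204 (for example, SQL). Transparently to these entities, database server 204 stores the data on disk 108. According to one embodiment, database server 204 implements disk management logic that lets it store data directly on disk, thereby avoiding the overhead associated with the OS file system of operating system 104. Accordingly, database server 204 may cause data to be stored on disk either by (1) calling the OS file system provided by operating system 104, or (2) bypassing operating system 104 and storing the data directly on disk.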
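[0026]
Unlike the system of FIG. 2, system 300 provides a translation engine 308, which translates I/O commands received from operating systems 304a and 304b into database commands that translation engine 308 issues to database server 204. When an I/O command calls for data to be stored, translation engine 308 issues database commands to database server 204 that cause the data to be stored in relational tables managed by database server 204. When an I/O command calls for data to be retrieved, translation engine 308 issues database commands to database server 204 that retrieve the data from relational tables managed by the database server. Translation engine 308 then provides the retrieved data to the operating system that issued the I/O command.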
[0027]
The fact that data passed to translation engine 308 is ultimately stored in relational tables managed by database server 204 is transparent to operating systems 304a and 304b. Because it is transparent to those operating systems, it is also transparent to applications 302a and 302b, which run on platforms that include them.
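[0028]
For example, assume that a user of application 302a selects a "Save File" option provided by application 302a. Application 302a makes a call through the OS file API that causes operating system 304a to save the file. Operating system 304a issues an I/O command to translation engine 308 to store the file. In response, translation engine 308 issues one or more database commands that cause database server 204 to store the data contained in the file in relational tables maintained by database server 204. Database server 204 may store this data directly on disk, or may call operating system 104 to store the data in the OS file system provided by operating system 104. If database server 204 calls operating system 104, operating system 104 responds by sending commands to device driver 106 that cause the data to be stored on disk 108.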
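[0029]
As another example, assume that a user of application 302a selects a "Load File" option provided by application 302a. Application 302a makes a call through the OS file API that causes operating system 304a to load the file. Operating system 304a issues an I/O command to translation engine 308 to load the file. Translation engine 308 issues one or more database commands that cause database server 204 to retrieve the data that makes up the requested file from the relational tables maintained by database server 204. To retrieve the data, database server 204 may read it directly from disk, or may call operating system 104 to retrieve it from OS files on disk 108. Once the data has been retrieved, the requested file is "constructed" from it. Specifically, the retrieved data is put into the format expected by application 302a, which requested the file. The constructed file is passed back to application 302a through translation engine 308 and operating system 304a.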
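[0030]
System 300 incorporates a number of novel features, which the following sections describe in greater detail. Specific embodiments are used only to illustrate these features, however, and the invention is not limited to those specific embodiments.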
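[0031]
OS File System Access to Relationally Stored Data
According to one aspect of the invention, system 300 allows applications to access data stored in a database through a conventional OS file API. In other words, conventional applications designed to load files by calling the standard OS file API provided by an operating system can load files that are constructed on the fly from data stored in relational tables. Furthermore, the fact that the data originates in relational tables is completely transparent to those applications.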
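[0032]
For example, assume that database application 312 issues a database command to insert a row of data into a table in a database maintained by database server 204. After the row has been inserted, application 302a, which is designed to access data only through the relatively simple OS file API provided by operating system 304a, issues an "Open File" command to operating system 304a. In response, operating system 304a issues an I/O command to translation engine 308, which responds by issuing one or more database commands to database server 204. By executing those database commands (typically in the form of a database query), database server 204 retrieves the row inserted by database application 312. A file of the file type expected by application 302a is constructed from the data in that row, and the constructed file is passed back to application 302a through translation engine 308 and operating system 304a.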
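[0033]
System 300 not only allows applications that support only conventional OS file system access to load relationally stored data, it also allows database applications to use conventional query techniques to access information stored by such applications. For example, assume that application 302a makes an OS call to save a file it has created. The "Save File" command is passed to database server 204 through operating system 304a and translation engine 308. Database server 204 receives the "Save File" command in the form of database commands issued by translation engine 308, and stores the data contained in the file in one or more rows of one or more tables in the database managed by database server 204. Once the data has been stored in the database in this manner, database application 312 can issue database queries to database server 204 to retrieve it.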
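[0034]
Emulating the OS File System Organization in the Database
As explained above, calls to the file system routines of operating systems 304a and 304b are ultimately translated into database commands that translation engine 308 issues to database server 204. According to one embodiment of the invention, performing these translations is simplified by emulating, within database server 204, the characteristics of the file systems implemented by operating systems 304a and 304b.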
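[0035]
With respect to the organizational model, most operating systems implement file systems that organize files in a file hierarchy. Consequently, the OS file system calls made by applications 302a and 302b will typically identify a file by its location within the OS file hierarchy. To simplify the translation of such calls into corresponding database calls, a mechanism is provided for emulating a hierarchical file system within a relational database system. One such mechanism is described in detail in U.S. Patent Application No. 09/251,757, entitled "HIERARCHICAL INDEXING FOR ACCESSING HIERARCHICALLY ORGANIZED INFORMATION IN A RELATIONAL SYSTEM," filed February 18, 1999 by Eric Sedlar, the entire contents of which are incorporated herein by reference.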
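[0036]
Specifically, the HIERARCHICAL INDEXING application describes techniques for emulating a hierarchically organized system by creating, maintaining, and using a hierarchical index to efficiently access information in a relational system based on pathnames. Each item that has any children in the emulated hierarchical system has an index entry in the index. The index entries are linked to one another in a way that reflects the hierarchical relationships among the items associated with them. Specifically, if a parent-child relationship exists between the items associated with two index entries, the index entry associated with the parent item has a direct link to the index entry associated with the child item.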
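[0037]
Consequently, a pathname is resolved by following the direct links between the index entries associated with the items in the pathname, in the order in which the file names appear in the pathname. Using an index whose entries are linked in this manner significantly speeds up accessing items by pathname and significantly reduces the number of disk accesses performed along the way.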
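[0038]
Hierarchical Index
A hierarchical index consistent with the invention supports the pathname-based access method of a hierarchical system, in which one moves from a parent item to its children as specified by a pathname. According to one embodiment, a hierarchical index consistent with the principles of the invention employs index entries containing three fields: RowID, FileID, and Dir_entry_list (stored as an array).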
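[0039]
FIG. 5 shows a hierarchical index 510 that can be used to emulate a hierarchical storage system within a database. FIG. 6 shows the particular file hierarchy that hierarchical index 510 emulates. FIG. 7 shows a file table 710 used to store the files shown in FIG. 6 in a relational database.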
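[0040]
Hierarchical index 510 is a table. Its RowID column contains system-generated IDs that specify disk addresses, enabling database server 204 to locate each row on disk. In some relational database systems, the RowID may be an implicitly defined field that the DBMS uses to locate data stored on a disk drive. The FileID field of an index entry stores the FileID of the file associated with that index entry.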
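[0041]
According to one embodiment of the invention, hierarchical index 510 stores index entries only for items that have children. In the context of an emulated hierarchical file system, therefore, the only items with index entries in hierarchical index 510 are directories that are parents of other directories and/or directories that currently store documents. Items that have no children (for example, Example.doc, Access, App1, App2, and App3 in FIG. 6) are preferably not included. The Dir_entry_list field of the index entry for a given file stores, in an array, an "array entry" for each of that file's child files.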
[0042]
For example, index entry 512 is for Windows directory 614. Word directory 616 and Access directory 620 are children of Windows directory 614. Accordingly, the Dir_entry_list field of index entry 512 for Windows directory 614 includes an array entry for Word directory 616 and an array entry for Access directory 620.
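[0043]
According to one embodiment, the information that the Dir_entry_list field stores for each child includes the child's file name and the child's FileID. For children that have their own entries in hierarchical index 510, the Dir_entry_list field also stores the RowID of the child's index entry. For example, Word directory 616 has its own entry in hierarchical index 510 (entry 514). Accordingly, the Dir_entry_list field of index entry 512 contains the name of directory 616 ("Word"), the RowID of the index entry for directory 616 in hierarchical index 510 ("Y3"), and the FileID of directory 616 ("X3"). As described in greater detail below, the information contained in the Dir_entry_list field makes pathname-based access to information faster and easier.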
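[0044]
Several key principles of the hierarchical index are as follows:
- The Dir_entry_list information in the index entry for a given directory is kept together in as few disk blocks as possible. This is because the most frequently used file system operations (pathname resolution and directory listing) need to look at many of the entries in a directory whenever that directory is referenced. In other words, directory entries should have a high locality of reference, because when a particular directory entry is referenced, other entries in the same directory are likely to be referenced as well.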
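[0045]
- The information stored in the index entries of the hierarchical index must be kept to a minimum so that as many entries as possible fit in a given disk block. Grouping directory entries together in an array means that the key identifying the directory that contains them need not be repeated: all entries in a directory share the same key.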
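[0046]
- The time needed to resolve a pathname should be proportional to the number of directories in the path, not to the total number of files in the file system. This allows users to keep frequently accessed files near the top of the file system tree, where access times are shorter.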
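[0047]
All of these elements are present in typical file system directory structures, such as the UNIX(R) system of inodes and directories. Using a hierarchical index as described herein reconciles those goals with a structure that a relational database can understand and query, allowing the database server to perform ad hoc searches of files in ways other than the one used for pathname resolution. To do this, the database concept of an index must be used: a copy of portions of the underlying information (in this case, the file data) arranged in a separate data structure, in a different form designed to optimize access by a particular method (in this case, pathname resolution in a hierarchical tree).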
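[0048]
Using the Hierarchical Index
How hierarchical index 510 can be used to access a file based on its pathname will now be described with reference to the flowchart of FIG. 8. For the purposes of explanation, assume that document 618 is accessed through its pathname. The pathname of this file is /Windows/Word/Example.doc, hereafter referred to as the "input pathname." Given this pathname, the pathname resolution process begins by locating, in hierarchical index 510, the index entry for the first name in the input pathname. In a file system, the first name in a pathname is the root directory. Accordingly, resolving a pathname within the emulated file system begins by locating index entry 508 for root directory 610 (step 800). Because every pathname resolution operation begins by accessing index entry 508 of the root directory, data indicating the location of that index entry may be kept at a convenient location outside hierarchical index 510, so that index entry 508 can be located quickly at the start of every search.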
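[0049]
Once index entry 508 for root directory 610 has been located, the DBMS determines whether any file names remain in the input pathname (step 802). If no file names remain, control proceeds to step 820, where the FileID stored in index entry 508 is used to look up the root directory's entry in file table 710.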
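[0050]
In this example, the file name "Windows" follows the root directory symbol "/" in the input pathname, so control proceeds to step 804. At step 804, the next file name (here, "Windows") is selected from the input pathname. At step 806, the DBMS looks in the Dir_entry_list column of index entry 508 to locate the array entry for the selected file name.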
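[0051]
In this example, the file name that follows the root directory in the input pathname is "Windows." Step 806 therefore involves searching the Dir_entry_list of index entry 508 for an array entry for the file name "Windows." If the Dir_entry_list does not contain an array entry for the selected file name, control proceeds from step 808 to step 810, where an error is generated indicating that the input pathname is invalid. In this example, the Dir_entry_list of index entry 508 does contain an array entry for "Windows," so control passes from step 808 to step 822.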
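[0052]
The information in the Dir_entry_list of index entry 508 indicates that one of the children of root directory 610 is indeed a file named "Windows." Furthermore, the array entry contains the following information about this child: its index entry is at RowID Y2, and its FileID is X2.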
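[0053]
At step 822, it is determined whether any file names remain in the input pathname. If none remain, control passes from step 822 to step 820. In this example, "Windows" is not the last file name, so control passes instead to step 824.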
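[0054]
Because "Windows" is not the last file name in the input path, the FileID information in the Dir_entry_list is not used during this pathname resolution operation. Because Windows directory 614 is only part of the specified path and not its target, file table 710 is not consulted at this point. Instead, at step 824, the RowID for "Windows" (Y2) found in the Dir_entry_list of index entry 508 is used to locate the index entry for Windows directory 614 (index entry 512).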
[0055]
Examining the Dir_entry_list of index entry 512, the system searches for the next file name in the input pathname (steps 804 and 806). In this example, the file name "Word" follows "Windows" in the input path, so the system searches the Dir_entry_list of index entry 512 for an array entry for "Word." Such an entry exists, indicating that "Windows" does have a child named "Word" (step 808). At step 822, it is determined that file names remain in the input path, so control proceeds to step 824.
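[0056]
Upon finding the array entry for "Word," the system reads the information in that array entry and determines that the index entry for Word directory 616 is at RowID Y3 in hierarchical index 510, and that the information specific to Word directory 616 is in row X3 of file table 710. Because Word directory 616 is only part of the specified path and not its target, file table 710 is not consulted. Instead, the system uses the RowID (Y3) to locate index entry 514 for Word directory 616 (step 824).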
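[0057]
At RowID Y3 of hierarchical index 510, the system finds index entry 514. At step 804, the next file name, "Example.doc," is selected from the input pathname. At step 806, the Dir_entry_list of index entry 514 is searched and an array entry for "Example.doc" is found (step 808), indicating that "Example.doc" is a child of Word directory 616. The system also finds that Example.doc has no index entry in hierarchical index 510, and that the information specific to Example.doc can be found in file table 710 using FileID X4. Because Example.doc is the target file (that is, the last file name in the input path), control passes to step 820, where the system uses FileID X4 to access the appropriate row in file table 710 and extracts the file body (a BLOB) stored in the body column of that row. In this way, the file Example.doc is accessed.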
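The walk through FIG. 8 above can be sketched as a short Python function over in-memory dictionaries standing in for hierarchical index 510 and file table 710. This is a sketch under stated assumptions: the RowID/FileID of the root (Y1/X1) and the FileID of Access (X5) are hypothetical, since the text does not give them, and the dictionaries replace real database tables.

```python
# Hierarchical index 510, keyed by RowID: (FileID, Dir_entry_list).
# Each Dir_entry_list maps a child name to (child FileID, child RowID or None).
# The root's Y1/X1 and Access's X5 are hypothetical values.
HIERARCHICAL_INDEX = {
    "Y1": ("X1", {"Windows": ("X2", "Y2")}),                       # entry 508, root 610
    "Y2": ("X2", {"Word": ("X3", "Y3"), "Access": ("X5", None)}),  # entry 512
    "Y3": ("X3", {"Example.doc": ("X4", None)}),                   # entry 514
}

# Kept outside the index so every resolution can start immediately (step 800).
ROOT_ROW_ID = "Y1"

# File table 710, keyed by FileID; "body" stands in for the BLOB column.
FILE_TABLE = {"X4": {"name": "Example.doc", "body": b"contents of Example.doc"}}


def resolve(path: str) -> str:
    """Return the FileID for an absolute path by walking the index as in FIG. 8."""
    names = [name for name in path.split("/") if name]
    row_id = ROOT_ROW_ID                        # step 800: start at the root entry
    file_id = HIERARCHICAL_INDEX[row_id][0]     # used directly if the path is just "/"
    for position, name in enumerate(names):     # steps 802/804: next name in the path
        dir_entry_list = HIERARCHICAL_INDEX[row_id][1]
        if name not in dir_entry_list:          # steps 806/808: look for an array entry
            raise FileNotFoundError(path)       # step 810: invalid input pathname
        file_id, child_row_id = dir_entry_list[name]
        if position == len(names) - 1:          # step 822: this was the last name
            break
        if child_row_id is None:                # an item without children cannot be a prefix
            raise FileNotFoundError(path)
        row_id = child_row_id                   # step 824: follow the RowID link
    return file_id                              # step 820 uses this FileID


def read_file(path: str) -> bytes:
    """Step 820: fetch the file body from the file table by FileID."""
    return FILE_TABLE[resolve(path)]["body"]
```

Only the index is consulted until the last name is reached; the file table is read once, at the end, which matches the point that intermediate directories never require a file-table lookup.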
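[0058]
Only hierarchical index 510 was used to access this file; no table scans were required. With typical block sizes and file name lengths, at least 600 directory entries fit in a single disk block, and a typical directory has fewer than 600 entries. The list of directory entries in a given directory will therefore typically fit in a single block. In other words, each index entry of hierarchical index 510, including its entire Dir_entry_list array, will typically fit in a single block and can be read in a single I/O operation.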
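[0059]
When moving from index entry to index entry in hierarchical index 510, some disk accesses may be needed if the various index entries reside in different disk blocks. If each index entry fits completely within a single block, however, the number of disk accesses will be at most the number of directories in the path. Even if the average index entry does not fit in a single disk block, the number of disk accesses per directory will be a constant term and will not grow with the total number of files in the file system.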
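[0060]
The foregoing description of techniques for emulating the hierarchical characteristics of some file systems is merely exemplary. Other techniques may be used to emulate the hierarchical characteristics of some file systems and protocols. Furthermore, some protocols may not have hierarchical characteristics at all. Thus, the invention is not limited to any particular technique for emulating the hierarchical characteristics of a protocol, nor is it limited to protocols that are inherently hierarchical.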
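[0061]
Emulating Other OS File System Characteristics in the Database
Besides their hierarchical organization, another characteristic of most OS file systems is that they maintain certain system information about the files they store. According to one embodiment, this characteristic is also emulated within the database system. Specifically, translation engine 308 issues commands that cause the "system" data for a file to be stored in a row of a file table (for example, file table 710) managed by database server 204. According to one embodiment, all or most of the file's contents are stored as a binary large object (BLOB) in one column of that row. In addition to the BLOB column, the file table includes columns for storing attribute values corresponding to those implemented in the OS file system. Such attribute values include, for example, the owner or creator of the file, the file's creation date, the file's last modification date, hard links to the file, the file name, the file size, and the file type.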
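[0062]
When translation engine 308 issues database commands that cause database server 204 to perform a file operation, those commands include statements that appropriately modify the attributes associated with the file involved in the operation. For example, in response to inserting a new row in the file table for a newly created file, translation engine 308 issues database commands that (1) store, in the row's "owner" column, a value indicating the user who is creating the file, (2) store, in the "creation date" column, a value indicating the current date, (3) store, in the "last modified" column, a value indicating the current date and time, and (4) store, in the "size" column, a value indicating the size of the BLOB. In response to subsequent operations on the file, the values in these columns are modified as those operations require. For example, when translation engine 308 issues a database command that modifies the contents of a file stored in a particular row, it also issues, as part of the same operation, a database command that updates that row's "last modified" value. If the modification changes the file size, translation engine 308 also issues a database command that updates that row's "size" value.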
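A minimal sketch of this attribute maintenance, using Python's built-in `sqlite3` module as a stand-in for database server 204. The table and column names and the `create_file`/`write_file` helpers are hypothetical; the point shown is that the content change and the attribute updates are issued together in one transaction.

```python
import sqlite3
from datetime import date, datetime

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE file_table (
        file_id       INTEGER PRIMARY KEY,
        name          TEXT NOT NULL,
        owner         TEXT NOT NULL,
        creation_date TEXT NOT NULL,
        last_modified TEXT NOT NULL,
        size          INTEGER NOT NULL,
        body          BLOB
    )""")


def create_file(name: str, owner: str, body: bytes) -> int:
    """Insert a new file row and set its system attributes in the same transaction."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT INTO file_table "
            "(name, owner, creation_date, last_modified, size, body) "
            "VALUES (?, ?, ?, ?, ?, ?)",
            (name,
             owner,                           # (1) the user creating the file
             date.today().isoformat(),        # (2) current date
             datetime.now().isoformat(),      # (3) current date and time
             len(body),                       # (4) size of the BLOB
             body))
    return cur.lastrowid


def write_file(file_id: int, body: bytes) -> None:
    """Replace the contents; last_modified and size change in the same operation."""
    with conn:
        conn.execute(
            "UPDATE file_table SET body = ?, last_modified = ?, size = ? WHERE file_id = ?",
            (body, datetime.now().isoformat(), len(body), file_id))
```

For brevity, the sketch rewrites `size` on every write rather than only when the size actually changes; the outcome is the same.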
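[0063]
Another characteristic of most OS file systems is the ability to provide security on a per-file basis. For example, Windows(R) NT, VMS, and some versions of UNIX(R) maintain access control lists that indicate the rights that various entities have with respect to each file. According to one embodiment of the invention, this characteristic is emulated within the database system by maintaining a "security table," each row of which holds content similar to one entry in an access control list. For example, a row in the security table may include a column for a value that identifies a file, another column for a value representing a permission type (for example, read, update, insert, execute, or modify permission), another column for a flag indicating whether that permission has been granted, and an owner column for a value representing the owner of that permission for the file. The owner may be a single user, identified by a user ID (userid), or a group, identified by a group ID (groupid). For a group, one or more additional tables are used to map the group ID to the user IDs of the group's members.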
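[0064]
Before issuing a database command to access a file stored in a file table managed by database server 204, translation engine 308 issues database commands that verify that the requesting user has permission to perform the requested type of access on the specified file. These pre-access database commands retrieve data from the security table to determine whether the requesting user is permitted to perform the access. If the retrieved data indicates that the user lacks the required permission, translation engine 308 does not issue the commands that would perform the requested operation. Instead, it returns an error message to the operating system from which the request originated. In response, the operating system sends the requesting application the same OS error message it would send if the application had attempted, without permission, to access a file in that operating system's own OS file system. Thus, even under error conditions, the fact that the data is stored in a relational database rather than an OS file system is transparent to the application.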
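The pre-access check can be sketched as a single query against a security table plus a group-membership table, again using `sqlite3` as a stand-in. The schema, the `is_permitted`/`check_access` helpers, and the choice of `PermissionError` with `EACCES` as the OS-style error are assumptions for illustration. The sketch treats a missing or non-granting row as "no access" and does not model any precedence between explicit denials and grants.

```python
import errno
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE security_table (
        file_id    INTEGER,  -- which file the row applies to
        permission TEXT,     -- e.g. 'read', 'update', 'insert', 'execute', 'modify'
        granted    INTEGER,  -- flag: 1 = granted, 0 = not granted
        owner      TEXT      -- a userid or a groupid
    );
    CREATE TABLE group_members (groupid TEXT, userid TEXT);
""")


def is_permitted(userid: str, file_id: int, permission: str) -> bool:
    """Pre-access query: is the permission granted to the user directly or via a group?"""
    row = conn.execute(
        """SELECT 1 FROM security_table
           WHERE file_id = ? AND permission = ? AND granted = 1
             AND (owner = ?
                  OR owner IN (SELECT groupid FROM group_members WHERE userid = ?))
           LIMIT 1""",
        (file_id, permission, userid, userid)).fetchone()
    return row is not None


def check_access(userid: str, file_id: int, permission: str) -> None:
    """On denial, raise the same kind of error a native file system would (EACCES)."""
    if not is_permitted(userid, file_id, permission):
        raise PermissionError(errno.EACCES, "Permission denied")
```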
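[0065]
Different operating systems store different types of system information about files. For example, one operating system may store an "archive" flag but no icon information, while another stores icon information but no archive flag. The particular set of system data maintained by a database system that implements the techniques described herein may vary from implementation to implementation. For example, database server 204 may store all of the system data supported by the OS file system of operating system 304a, but only some of the system data supported by the OS file system of operating system 304b. Alternatively, the database server may store all of the system data supported by both operating systems 304a and 304b, or only part of the system data supported by either one of them.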
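[0066]
As shown in FIG. 3, database server 204 stores files that originate from multiple distinct OS file systems. For example, operating system 304a may differ from operating system 304b, and both may differ from operating system 104. The OS file systems of operating systems 304a and 304b may have conflicting characteristics. For example, the OS file system of operating system 304a may allow file names to contain the character "/", while that of operating system 304b may not. According to one embodiment, in such situations translation engine 308 is configured to implement rules specific to each OS file system. Thus, when application 302a attempts to store a file whose name contains "/", translation engine 308 issues database commands that cause database server 204 to perform the operation. When application 302b attempts to store a file whose name contains "/", on the other hand, translation engine 308 raises an error.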
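[0067]
Alternatively, translation engine 308 may be configured to implement a single set of rules for all operating systems. For example, translation engine 308 may implement a rule under which an error is raised whenever a file name is invalid in even one of the operating systems it supports, even if the name is valid in the operating system that issued the command specifying it.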
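The two validation policies in [0066] and [0067] can be contrasted in a few lines of Python. The operating-system labels and the forbidden-character sets are hypothetical, chosen only to reproduce the "/" example from the text.

```python
# Hypothetical rule sets: characters each OS file system forbids in a file name.
FORBIDDEN_CHARS = {
    "os_304a": set(),   # assumed to allow "/" in file names
    "os_304b": {"/"},   # assumed to forbid it
}


def is_valid(os_name: str, filename: str) -> bool:
    """True if the name contains none of the characters this OS forbids."""
    return not (set(filename) & FORBIDDEN_CHARS[os_name])


def check_per_os(origin_os: str, filename: str) -> None:
    """Per-OS rules: only the issuing operating system's rules apply."""
    if not is_valid(origin_os, filename):
        raise ValueError(f"{filename!r} is not a valid name on {origin_os}")


def check_common(filename: str) -> None:
    """Single rule set: reject a name that is invalid on any supported operating system."""
    rejecting = [os_name for os_name in FORBIDDEN_CHARS if not is_valid(os_name, filename)]
    if rejecting:
        raise ValueError(f"{filename!r} is not valid on {', '.join(rejecting)}")
```

With per-OS rules, `"a/b"` is accepted from `os_304a` and rejected from `os_304b`; with the single rule set, it is rejected regardless of origin.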
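[0068]
Translating OS File System Calls into Database Queries
Because mechanisms exist for emulating OS file system characteristics within the database system, translation engine 308 can translate OS file system calls into database queries without losing the functionality expected by the applications making those calls. These applications make OS file system calls through the OS file API provided by the operating system on which they run. For example, for programs written in the "C" programming language, a source code file named "stdio.h" is used to specify the interface of an operating system's OS file API. Because stdio.h is included in the applications, they know how to invoke the routines that implement the OS file API.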
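[0069]
The specific routines that implement the OS file API may vary from operating system to operating system, but they typically include routines for the following operations: opening a file, reading from a file, writing to a file, seeking within a file, locking a file, and closing a file. In general, these I/O commands map to relational database commands as follows:
Open file = begin a transaction; resolve the pathname and locate the row containing the file
Write to file = update
Read from file = select
Lock file = lock the row associated with the file
Seek within file = update a counter
Close file = commit the transaction (the Windows(R) OS file system protocol requires the directory entry to be committed immediately, before the file data is written; other protocols do not)
As described in greater detail below, some file systems expect a file's name to be visible even before its contents have been received. For these file systems, the "Open file" I/O command corresponds to beginning a transaction to write the name, committing that transaction, and beginning a transaction to write the contents.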
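[0070]
According to one embodiment, a counter is used to track the "current location" within a file. In embodiments where files are stored as BLOBs, the counter may take the form of an offset from the beginning of the BLOB. When an "Open file" command is executed, a counter is created and set to a value indicating the start of the BLOB in question. The counter is then incremented as data is read from or written to the BLOB. A seek operation updates the counter to point to the location within the BLOB indicated by the seek operation's parameters. According to one embodiment, these operations are facilitated by the use of LOB locators, as described in U.S. Patent Application No. 08/962,482, entitled "LOB LOCATORS," filed October 31, 1997 by Nori et al., the entire contents of which are incorporated herein by reference.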
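The mapping in [0069] and the counter in [0070] can be sketched as a file-handle class backed by `sqlite3`. `DbFile`, the `files` table, and its columns are hypothetical, and the sketch makes simplifying assumptions: the counter is a plain Python integer rather than a LOB locator, and file locking is not modeled (SQLite has no row-level locks).

```python
import sqlite3


class DbFile:
    """Hypothetical file handle that maps OS file calls onto database operations."""

    def __init__(self, conn: sqlite3.Connection, path: str):
        self.conn = conn
        conn.execute("BEGIN")                               # open = begin transaction
        row = conn.execute(
            "SELECT rowid FROM files WHERE path = ?", (path,)).fetchone()
        if row is None:
            conn.execute("ROLLBACK")
            raise FileNotFoundError(path)
        self.rowid = row[0]                                 # ...and locate the file's row
        self.pos = 0                                        # counter: offset from BLOB start

    def read(self, n: int) -> bytes:                        # read = select
        (chunk,) = self.conn.execute(
            "SELECT substr(body, ?, ?) FROM files WHERE rowid = ?",
            (self.pos + 1, n, self.rowid)).fetchone()       # substr() is 1-based
        self.pos += len(chunk)                              # advance the counter
        return chunk

    def write(self, data: bytes) -> None:                   # write = update
        (body,) = self.conn.execute(
            "SELECT body FROM files WHERE rowid = ?", (self.rowid,)).fetchone()
        body = body[:self.pos] + data + body[self.pos + len(data):]
        self.conn.execute(
            "UPDATE files SET body = ? WHERE rowid = ?", (body, self.rowid))
        self.pos += len(data)                               # advance the counter

    def seek(self, offset: int) -> None:                    # seek = update the counter
        self.pos = offset

    def close(self) -> None:                                # close = commit
        self.conn.execute("COMMIT")


conn = sqlite3.connect(":memory:", isolation_level=None)  # BEGIN/COMMIT issued explicitly
conn.execute("CREATE TABLE files (path TEXT PRIMARY KEY, body BLOB NOT NULL)")
conn.execute("INSERT INTO files VALUES (?, ?)", ("/docs/a.txt", b"hello world"))
```

For example, opening `/docs/a.txt`, reading 5 bytes, seeking to offset 6, writing `b"WORLD"`, and closing leaves `b"hello WORLD"` committed in the table.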
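[0071]
In some operating systems, an OS lock may persist after the file is closed. To emulate this characteristic, the lock file command is translated into a request for a session lock. As a result, when "commit transaction" is executed in response to the command that closes the file, the lock on the row associated with the file is not automatically released. A lock established in this manner is released either explicitly, in response to a command that unlocks the file, or automatically, when the database session in which the lock was obtained ends.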
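[0072]
I/O Operations in Progress
When a file is created, the directory in which it is created is updated to indicate that the file exists. In some OS file systems, this change to the directory is committed before the new file has been completely generated. Some applications designed for those OS file systems take advantage of this characteristic. For example, an application may open a new file with a first file handle and proceed to write data into it. While the data is being written, the same application can open the file with a second file handle.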
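[0073]
Emulating this characteristic within a database presents a special problem, because in general no other transaction can see the changes made by a database transaction until that transaction commits. For example, assume that a first database transaction is started in response to a first "open" command. The first transaction updates the directory table to indicate that the file exists in a particular directory, and then updates the file table by inserting the row containing the file. If a second database transaction is started in response to a second "open" command issued by the same application, neither the change to the directory table nor the new row in the file table will be visible to the second transaction until the first transaction commits.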
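[0074]
According to one embodiment of the invention, the ability to see the directory entry of a file whose creation is still in progress is emulated within the database system by updating the directory table in a transaction separate from the one used to insert the file's row into the file table. Thus, in response to the first open command, translation engine 308 issues database commands that (1) begin a first transaction, (2) modify the directory table to indicate the existence of the new file, (3) commit the first transaction, (4) begin a second transaction, (5) insert a row for the file into the file table, and (6) commit the second transaction. Because the change to the directory table is committed separately from the change to the file table, a third transaction, started in response to the second open command, can see the entry in the directory table while the insertion into the file table is still in progress. If the second transaction fails, the directory is left with an entry for a file that has no content.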
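The six steps above can be demonstrated with two `sqlite3` connections to the same database file, one standing in for the transactions issued on the first open and one for the transaction started by the second open. The table names and the `begin_create` helper are hypothetical. WAL journaling is enabled (an assumption of this sketch, not of the text) so that the reader sees the last committed state without blocking the writer.

```python
import os
import sqlite3
import tempfile

db_path = os.path.join(tempfile.mkdtemp(), "ifs.db")
writer = sqlite3.connect(db_path, isolation_level=None)   # explicit BEGIN/COMMIT
writer.execute("PRAGMA journal_mode=WAL").fetchone()      # readers see committed state only
writer.execute("CREATE TABLE directory_table (parent TEXT, name TEXT)")
writer.execute("CREATE TABLE file_table (name TEXT, body BLOB)")
reader = sqlite3.connect(db_path, isolation_level=None)   # stands in for the second "open"


def begin_create(conn: sqlite3.Connection, parent: str, name: str) -> None:
    """First open of a new file: the directory change is committed on its own."""
    conn.execute("BEGIN")                                                      # (1)
    conn.execute("INSERT INTO directory_table VALUES (?, ?)", (parent, name))  # (2)
    conn.execute("COMMIT")                                                     # (3)
    conn.execute("BEGIN")                                                      # (4)
    conn.execute("INSERT INTO file_table VALUES (?, ?)", (name, b""))          # (5)
    # (6) COMMIT is issued later, once the file contents have been written.


begin_create(writer, "/docs", "report.txt")

# The other transaction already sees the directory entry...
visible_names = reader.execute(
    "SELECT name FROM directory_table WHERE parent = ?", ("/docs",)).fetchall()
# ...but not the file row, which is still uncommitted.
pending_rows = reader.execute("SELECT count(*) FROM file_table").fetchone()[0]

writer.execute("UPDATE file_table SET body = ? WHERE name = ?", (b"final text", "report.txt"))
writer.execute("COMMIT")                                                       # (6)
```

If the writer rolled back instead of committing at step (6), `directory_table` would keep the entry while `file_table` has no row, which is the "entry without content" case noted above.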
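[0075]
Translation Engine
According to one embodiment of the invention, translation engine 308 is designed in two layers, which are shown in FIG. 4. Referring to FIG. 4, translation engine 308 includes a protocol server layer and a DB file server 408 layer. DB file server 408 allows applications to access data stored in the database managed by database server 204 through an alternative API, referred to herein as the DB file API. The DB file API combines aspects of both the OS file API and the database API. Specifically, it supports file operations similar to those supported by conventional OS file APIs.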
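[0076]
Unlike an OS file API, however, the DB file API incorporates the database API concept of transactions. That is, the DB file API allows applications to specify that a set of file operations is to be performed as an atomic unit. The advantages of a transactional file system are described in greater detail below.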
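[0077]
DB File Server
DB file server 408 is responsible for translating DB file API commands into database commands. The DB file API commands received by DB file server 408 may come from the protocol server layer of translation engine 308, or directly from applications (for example, application 410) specifically designed to perform file operations by issuing calls through the DB file API.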
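[0078]
According to one embodiment, DB file server 408 is object-oriented. The routines supplied by DB file server 408 are therefore invoked by instantiating objects and calling the methods associated with those objects. In one implementation, DB file server 408 defines a "transaction" object class with the following methods: insert, save, update, delete, commit, and rollback. The DB file API provides an interface through which external entities can instantiate and use this transaction object class.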
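[0079]
Specifically, when an external entity (for example, application 410 or a protocol server) calls DB file server 408 to instantiate a transaction object, DB file server 408 sends database server 204 a database command that begins a new transaction. The external entity then invokes the methods of the transaction object. Each method invocation results in a call to DB file server 408, which responds by issuing the corresponding database commands to database server 204. All database operations performed in response to invocations of a given transaction object's methods are performed as part of the database transaction associated with that object.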
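[0080]
Significantly, the methods invoked on a single transaction object may involve multiple file operations. For example, application 410 may interact with DB file server 408 as follows. Application 410 instantiates a transaction object TXO1 by making a call through the DB file API. In response, DB file server 408 issues a database command that begins a transaction TX1 within database server 204. Application 410 invokes the update method of TXO1 to update a file F1 stored in the database managed by database server 204. In response, DB file server 408 issues database commands that cause database server 204 to perform the requested update as part of TX1. Application 410 then invokes the update method of TXO1 to update a second file F2 stored in the same database, and DB file server 408 again issues database commands that cause database server 204 to perform the update as part of TX1. Application 410 then invokes the commit method of TXO1, in response to which DB file server 408 issues a database command that causes database server 204 to commit TX1. Had the update to file F2 failed, the rollback method of TXO1 would have been invoked, and all changes made by TX1, including the update to file F1, would have been rolled back.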
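The TXO1/TX1 scenario can be sketched with a small transaction-object class over `sqlite3`. `TransactionObject` and the `files` table are hypothetical, the sketch covers a subset of the methods named in [0078] (the "save" method is omitted), and it treats an update that touches no row as a failed file operation.

```python
import sqlite3


class TransactionObject:
    """Hypothetical DB file API transaction object; each instance owns one transaction."""

    def __init__(self, conn: sqlite3.Connection):
        self.conn = conn
        conn.execute("BEGIN")            # instantiating the object begins the transaction

    def insert(self, name: str, body: bytes) -> None:
        self.conn.execute("INSERT INTO files (name, body) VALUES (?, ?)", (name, body))

    def update(self, name: str, body: bytes) -> None:
        cur = self.conn.execute("UPDATE files SET body = ? WHERE name = ?", (body, name))
        if cur.rowcount != 1:
            raise FileNotFoundError(name)  # a missing file counts as a failed operation

    def delete(self, name: str) -> None:
        self.conn.execute("DELETE FROM files WHERE name = ?", (name,))

    def commit(self) -> None:
        self.conn.execute("COMMIT")

    def rollback(self) -> None:
        self.conn.execute("ROLLBACK")


conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE files (name TEXT PRIMARY KEY, body BLOB)")
conn.executemany("INSERT INTO files VALUES (?, ?)", [("F1", b"old"), ("F2", b"old")])


def update_two_files(second_name: str) -> None:
    """Update F1 and a second file as one atomic unit, as in the TXO1/TX1 example."""
    txo1 = TransactionObject(conn)
    try:
        txo1.update("F1", b"new")
        txo1.update(second_name, b"new")
        txo1.commit()
    except Exception:
        txo1.rollback()                  # also undoes the update to F1
        raise
```

Calling `update_two_files("F2")` commits both updates; calling it with a name that does not exist leaves F1 unchanged because the whole transaction is rolled back.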
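[0081]
Although the techniques have been described here with reference to a DB file server that uses transaction objects, other implementations are possible. For example, objects within the DB file server could represent files rather than transactions. In such an implementation, file operations may be performed by invoking the methods of a file object and passing it data that identifies the transaction in which the operation is to be performed. Thus, the invention is not limited to DB file servers that implement any particular set of object classes.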
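[0082]
For the purposes of explanation, the embodiment shown in FIG. 4 depicts DB file server 408 as a process that executes outside database server 204 and communicates with it through the database API. According to an alternative embodiment, however, the functionality of DB file server 408 is built into database server 204. Incorporating DB file server 408 into database server 204 reduces the amount of inter-process communication generated while the DB file system is in use. The resulting database server therefore provides two alternative APIs for accessing the data it manages: the DB file API and the database API (SQL).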
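[0083]
Protocol Servers
The protocol server layer of translation engine 308 is responsible for translating between specific protocols and DB file API commands. For example, protocol server 406a translates I/O commands received from operating system 304a into DB file API commands that it sends to DB file server 408. Protocol server 406a also translates DB file API commands received from DB file server 408 into I/O commands that it sends to operating system 304a.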
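[0084]
In practice, there is no one-to-one correspondence between protocols and operating systems. Rather, many operating systems support more than one protocol, and many protocols are supported by more than one operating system. For example, a single operating system may provide native support for one or more network file protocols (SMB, FTP, NFS), e-mail protocols (SMTP, IMAP4), and web protocols (HTTP). Furthermore, the sets of protocols supported by different operating systems often overlap. For the purposes of illustration, however, a simplified environment is shown in which operating system 304a supports one protocol and operating system 304b supports another.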
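[0085]
I/O API
As described above, protocol servers are used to translate I/O commands into DB file commands. The interface between a protocol server and the OS file system with which it communicates is generically labeled the I/O API. However, the specific I/O API that a protocol server provides depends on both (1) the entity with which the protocol server communicates and (2) how the protocol server is to appear to that entity. For example, operating system 304a may be Microsoft Windows(R) NT, and protocol server 406a may be designed to appear to it as a device driver. In that case, the I/O API that protocol server 406a presents to operating system 304a would be a type of device interface understood by Windows(R) NT, and Windows(R) NT communicates with protocol server 406a as it would with any storage device. The fact that files stored to and retrieved from protocol server 406a are actually stored in and retrieved from a database maintained by database server 204 is completely transparent to Windows(R) NT.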
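[0086]
While some protocol servers used by translation engine 308 may present a device driver interface to their respective operating systems, others may appear as other types of entities. For example, operating system 304a may be the Microsoft Windows(R) NT operating system, with protocol server 406a presenting itself as a device driver, while operating system 304b may be the Microsoft Windows(R) 95 operating system, with protocol server 406b presenting itself as a Server Message Block (SMB) server. In the latter case, protocol server 406b would typically run on a different machine from operating system 304b, and communication between them would take place over a network connection.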
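[0087]
In the examples above, the I/O commands handled by the protocol servers originate from an OS file system. However, translation engine 308 is not limited to OS file system commands. Rather, a protocol server may be provided to translate between DB file commands and any type of I/O protocol. Besides the I/O protocols used by OS file systems, other protocols for which protocol servers may be provided include, for example, the File Transfer Protocol (FTP) and the protocols used by electronic mail systems (POP3 or IMAP4).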
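[0088]
Just as the interface provided by a protocol server that works with an OS file system is dictated by the particular OS, the interface provided by a protocol server that works with a non-OS file system varies with the entity that will issue the I/O commands. For example, a protocol server configured to receive I/O commands according to the FTP protocol would provide the API of an FTP server. Similarly, protocol servers configured to receive I/O commands according to the HTTP, POP3, and IMAP4 protocols would provide the APIs of an HTTP server, a POP3 server, and an IMAP4 server, respectively.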
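[0089]
Like OS file systems, each non-OS file protocol expects certain attributes to be maintained for its files. For example, whereas most OS file systems store data indicating when a file was last modified, an e-mail system stores, for each e-mail message, data indicating whether the message has been read. The protocol server for each protocol implements the logic required to ensure that the semantics of that protocol are emulated in the database file system.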
[0090]
Transactional File System
In database systems, operations are generally performed as part of a transaction. The database system performs all of the operations in a transaction as a single atomic operation: either all of them complete successfully, or none of them is performed. If an operation cannot be performed during a transaction, all operations previously performed in that transaction are undone, or "rolled back."
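[0091]
Unlike database systems, OS file systems are not transaction-based. Consequently, if a large file operation fails, the portions of the operation performed before the failure remain in effect. Failing to undo incomplete file operations can corrupt directory structures and files.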
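[0092]
According to one aspect of the invention, a transactional file system is provided. As described above, translation engine 308 converts I/O commands into database statements that are sent to database server 204. The series of statements that translation engine 308 sends to carry out a specified I/O operation is preceded by a begin transaction statement and ends with a close transaction statement. As a result, if any failure occurs while database server 204 is executing those statements, all changes that database server 204 made as part of the transaction up to the point of failure are rolled back.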
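[0093]
The events that cause a transaction to fail may vary with the system from which the I/O commands originate. For example, an OS file system may support the concept of signatures, in which a digital "signature" identifying the source of a file is attached to the file. A transaction started to store a signed file may fail, for example, if the signature of the file being stored is not the expected signature.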
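[0094]
On-the-Fly Intelligent File Conversion
According to one embodiment of the invention, files are processed before they are inserted into the relational database and are processed again when they are retrieved from it. FIG. 9 is a block diagram showing the functional components of DB file server 408 that are used to perform inbound and outbound file processing.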
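[0095]
Referring to FIG. 9, translation engine 308 includes a rendering unit 904 and a parsing unit 902. In general, parsing unit 902 is responsible for inbound processing of files, and rendering unit 904 is responsible for outbound processing. Each of these functional units will now be described in greater detail.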
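[0096]
Inbound File Processing
Inbound files are passed to DB file server 408 through the DB file API. Upon receiving an inbound file, parsing unit 902 identifies the file's type and then parses the file based on that type. During parsing, parsing unit 902 extracts structured information from the file. This structured information may include, for example, information about the file being parsed, or data representing logically distinct components or fields of the file. The structured information is stored in the database together with the file from which it was extracted. Queries can then be issued to the database server to select and retrieve files based on whether the extracted structured information satisfies particular search criteria.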
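[0097]
The specific techniques that parsing unit 902 uses to parse a document, and the structured data it produces, vary with the type of document passed to it. Therefore, before performing any parsing operation, parsing unit 902 identifies the document's file type. Various factors may be considered in determining a file's type. For example, in the DOS or Windows(R) operating systems, a file's type is often indicated by the extension in its file name. Thus, if a file name ends in ".txt", parsing unit 902 classifies the file as a text file and applies text-file-specific parsing techniques to it. Similarly, if a file name ends in ".doc", parsing unit 902 classifies the file as a Microsoft Word document and applies Microsoft Word-specific parsing techniques to it. The Macintosh Operating System, by contrast, stores a file's type information as an attribute maintained separately from the file.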
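[0098]
Another factor that parsing unit 902 may consider in determining the file type of a file is, for example, the directory in which the file is located. Thus, parsing unit 902 may be configured to classify and parse all files stored in the \WordPerfect\documents directory as WordPerfect documents, regardless of their filenames.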
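The classification factors described in the two preceding paragraphs can be sketched as a single lookup function. This is a minimal illustration, not the patented implementation: the rule tables, the type names, and the function name determine_file_type are hypothetical. A separately stored type attribute (as on the Macintosh) wins first, then a directory rule, then the filename extension.

```python
from pathlib import PureWindowsPath

# Hypothetical rule tables; a real system would load these from configuration.
DIRECTORY_RULES = {r"\WordPerfect\documents": "WordPerfect"}
EXTENSION_RULES = {".txt": "text", ".doc": "MS Word", ".htm": "HTML",
                   ".html": "HTML", ".xml": "XML"}

def determine_file_type(path, type_attribute=None):
    """Classify a file using the factors described in the text."""
    if type_attribute:                             # Macintosh-style separate attribute
        return type_attribute
    p = PureWindowsPath(path)
    parent = str(p.parent)
    for directory, file_type in DIRECTORY_RULES.items():
        if parent.lower() == directory.lower():    # directory rule overrides the name
            return file_type
    return EXTENSION_RULES.get(p.suffix.lower())   # None means "unknown type"
```

Returning None for an unknown type leaves the caller free to raise an error or fall back to a generic parser.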
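[0099]
Alternatively, both the file type of an inbound file and the file type required by the requesting entity may be specified by, or inferred from, information provided to DB file server 408. For example, when a web browser sends a message, the message typically includes information about the browser (e.g., the browser type and version). When a web browser requests a file through an HTTP protocol server, this information is conveyed to DB file server 408. Based on this information, rendering unit 904 can look up information about the capabilities of the browser and infer from those capabilities the best file type to deliver to the browser.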
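[0100]
As mentioned above, the specific parsing techniques used by parsing unit 902, and the types of structured data thereby generated, vary based on the type of file being parsed. For example, the structured data generated by parsing unit 902 may include embedded metadata, derived metadata, and system metadata. Embedded metadata is information embedded in the file itself. Derived metadata is information that is not contained in the file but can be derived by analyzing the file. System metadata is data about the file that is provided by the system from which the file originates.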
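[0101]
For example, assume that application 410 passes a Microsoft Word document to parsing unit 902. Parsing unit 902 parses the document to extract the information about the file that is embedded within it. Information embedded in a Microsoft Word document may include, for example, data indicating the author of the document, the category to which the document is assigned, and comments about the document.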
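[0102]
In addition to locating and extracting the embedded information of the Word document, parsing unit 902 may also derive information about the document. For example, parsing unit 902 may scan the Word document to determine the number of pages, paragraphs, and words it contains. Finally, the system from which the document originated may supply parsing unit 902 with data indicating the size, creation date, last modification date, and file type of the document.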
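[0103]
The more structured the file type of a document, the easier it is to extract specific items of structured data from the document. For example, HTML documents typically contain delimiters, or "tags", that mark the beginning and end of specific fields (title, heading 1, heading 2, and so on). Parsing unit 902 may use these delimiters when parsing an HTML document to produce items of metadata for some or all of the delimited fields. Similarly, XML files are highly structured, and an XML parser could extract separate items of metadata for some or all of the fields contained in an XML document.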
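As an illustration of using HTML tags as field delimiters, the sketch below uses Python's standard html.parser module to collect the text of selected tags as embedded metadata and a word count as derived metadata. The choice of fields (title, h1, h2) and the names TagFieldParser and parse_html are assumptions made for the example.

```python
from html.parser import HTMLParser

class TagFieldParser(HTMLParser):
    """Collect the text between selected HTML tags as metadata items."""
    FIELDS = {"title", "h1", "h2"}

    def __init__(self):
        super().__init__()
        self.metadata = {}     # field name -> list of text values (embedded metadata)
        self._open = None      # field whose text we are currently inside
        self.word_count = 0    # derived metadata: words in all text nodes

    def handle_starttag(self, tag, attrs):
        if tag in self.FIELDS:
            self._open = tag
            self.metadata.setdefault(tag, []).append("")

    def handle_endtag(self, tag):
        if tag == self._open:
            self._open = None

    def handle_data(self, data):
        self.word_count += len(data.split())
        if self._open:
            self.metadata[self._open][-1] += data

def parse_html(text):
    parser = TagFieldParser()
    parser.feed(text)
    parser.close()
    return parser.metadata, parser.word_count
```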
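[0104]
Once parsing unit 902 has generated structured data for a file, DB file server 408 issues database commands to database server 204 that cause the file to be inserted into a row of a file table (e.g., file table 710). According to one embodiment, the database commands thus issued store the file as a BLOB in one column of the row, and store the various items of structured data generated for the file in other columns of the same row.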
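[0105]
Alternatively, some or all of the structured data items for a file may be stored outside the file table. Under these circumstances, the rows that store structured data associated with a file typically include data that identifies the file. For example, assume that a Word document is stored in row R20 of the file table, and that the system metadata for the Word document (e.g., creation date and modification date) is stored in row R34 of a system attributes table. In this situation, both R20 of the file table and R34 of the system attributes table would typically include a FileID column that stores a unique identifier for the Word document. A query can then retrieve both the file and its system metadata by issuing a join statement that joins rows of the file table with rows of the system attributes table based on their FileID values. Techniques for storing file attributes in tables associated with file "classes" are described in greater detail below.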
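The FileID join can be sketched with SQLite through Python's sqlite3 module. The table and column names below and the sample row values are hypothetical; only the idea of joining the file table and a system attributes table on FileID comes from the text.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE files_table (FileID TEXT PRIMARY KEY, Name TEXT, Body BLOB);
    CREATE TABLE system_attributes (FileID TEXT REFERENCES files_table(FileID),
                                    CreationDate TEXT, ModificationDate TEXT);
""")
# Sample data: the Word document (row "R20") and its system metadata (row "R34").
db.execute("INSERT INTO files_table VALUES (?, ?, ?)", ("X20", "Example.doc", b"word bytes"))
db.execute("INSERT INTO system_attributes VALUES (?, ?, ?)", ("X20", "2000-01-01", "2000-02-01"))

def fetch_with_system_metadata(name):
    """Return the file and its system metadata by joining the two tables on FileID."""
    return db.execute("""
        SELECT f.Name, f.Body, s.CreationDate, s.ModificationDate
        FROM files_table AS f JOIN system_attributes AS s ON f.FileID = s.FileID
        WHERE f.Name = ?""", (name,)).fetchone()
```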
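[0106]
Outbound File Processing
Outbound files are constructed by rendering unit 904 based on information retrieved in response to database commands sent to database server 204. Once constructed, an outbound file is delivered through the DB file API to the entity that requested it.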
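[0107]
Significantly, the file type of the outbound file produced by rendering unit 904 (the target file type) need not be the same as the file type of the file that produced the data used to construct the outbound file (the source file type). For example, rendering unit 904 may construct a text file based on data that was originally stored in the database as a Word file.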
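[0108]
Furthermore, the entity requesting an outbound file may reside on an entirely different platform, and use an entirely different protocol, than the entity that produced the file from which the outbound file is constructed. For example, assume that protocol server 406b implements an IMAP4 server interface and protocol server 406a implements an HTTP server interface. Under these conditions, an e-mail document originating from an e-mail application may be stored in the database through protocol server 406b and retrieved from the database by a web browser through protocol server 406a. In this scenario, parsing unit 902 would invoke the parsing techniques associated with the file type of the e-mail (e.g., RFC 822), and rendering unit 904 would invoke a rendering routine that constructs an HTML document from the e-mail data retrieved from the database.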
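[0109]
Registration of Parsers and Renderers
As described above, the parsing techniques applied to a file are dictated by the type of the file. Similarly, the rendering techniques applied to a file are dictated by both the source type and the target type of the file. The number of file types that exist across all computer platforms is enormous. It is therefore impractical to build a parsing unit 902 that handles every known file type, or a rendering unit 904 that handles every possible conversion from one file type to another.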
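[0110]
According to one embodiment of the invention, the problems caused by the proliferation of file types are addressed by allowing type-specific parsing modules to be registered with parsing unit 902, and type-specific rendering modules to be registered with rendering unit 904. A type-specific parsing module is a module that implements the parsing techniques for a particular file type. For example, Word documents are parsed using a Word document parsing module, while POP3 e-mail documents are parsed using a POP3 e-mail parsing module.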
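[0111]
Like a type-specific parsing module, a type-specific rendering module is a module that implements techniques for converting data associated with one or more source file types into one or more target file types. For example, a type-specific rendering module may be provided for converting Word documents into text documents.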
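[0112]
Conversion may be necessary even when the source file type and the target file type are the same. For example, once an XML document has been parsed and inserted into the database, its contents may not be kept in a single BLOB but may instead be spread over many columns of many tables. In that case, XML is the source file type of the data even though the data is no longer stored as an XML file. A type-specific rendering module may be provided for constructing an XML document from that data.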
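[0113]
When parsing unit 902 receives an inbound file, it determines the file type of the file and determines whether a type-specific parsing module has been registered for that file type. If one has, parsing unit 902 calls the parsing routines provided by that type-specific parsing module. Those routines parse the inbound file to generate metadata, which is then stored in the database along with the file. If no type-specific parsing module has been registered for the file type, parsing unit 902 may either raise an error or apply a generic parsing technique to the file. Because the generic parsing technique has no knowledge of the file's contents, it is limited in the useful metadata it can generate for the file.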
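The registration-and-dispatch logic above can be sketched as a small in-memory registry. The decorator register_parser, the type key "text", and the metadata fields are hypothetical; the sketch only shows the lookup, the call to the registered routine, and the choice between an error and a generic fallback.

```python
from typing import Callable, Dict

Parser = Callable[[bytes], dict]
_parsers: Dict[str, Parser] = {}   # file type -> registered type-specific parser

def register_parser(file_type: str):
    """Decorator that registers a type-specific parsing module (hypothetical API)."""
    def wrap(fn: Parser) -> Parser:
        _parsers[file_type] = fn
        return fn
    return wrap

@register_parser("text")
def parse_text(content: bytes) -> dict:
    text = content.decode("utf-8", errors="replace")
    return {"lines": len(text.splitlines()), "words": len(text.split())}

def generic_parse(content: bytes) -> dict:
    # Knows nothing about the contents, so it can only report generic facts.
    return {"size": len(content)}

def parse_inbound(file_type: str, content: bytes, strict: bool = False) -> dict:
    parser = _parsers.get(file_type)
    if parser is None:
        if strict:
            raise LookupError(f"no parser registered for {file_type!r}")
        parser = generic_parse
    return parser(content)
```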
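[0114]
When rendering unit 904 receives a file request, it issues database commands to retrieve the data associated with the file. That data includes metadata indicating the source file type of the file. Rendering unit 904 then determines whether a type-specific rendering module has been registered for that source file type. If one has, rendering unit 904 invokes the rendering routines provided by that type-specific rendering module to construct the file, and supplies the constructed file to the entity requesting it.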
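[0115]
Various factors may be used to determine which target file type the type-specific rendering module should select. In some cases, the entity requesting a file may explicitly indicate the type of file it requires. For example, a text editor may be able to handle only text files. The text editor may request a file whose source file type is a Word document. In response to this request, the Word-specific rendering module is invoked, and it converts the Word document into a text file based on the requested target file type. The text file is then delivered to the text editor.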
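[0116]
In other cases, the entity requesting a file may support many file types. According to one embodiment, a type-specific rendering module incorporates logic that (1) identifies the set of file types supported by both the requesting entity and the type-specific rendering module, and (2) selects the best target file type from that set. The selection of the best target file type may take into account various factors, including characteristics specific to the file in question.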
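[0117]
For example, assume that (1) DB file server 408 receives a request for a file, (2) the source file type of the file indicates that it is a "BMP" image, (3) the request was initiated by an entity that supports "GIF", "TIF", and "JPG" images, and (4) the BMP-specific rendering module supports the target file types "GIF", "JPG", and "PCX". Under these conditions, the BMP-specific rendering module determines that both "GIF" and "JPG" are possible target file types. To choose between these two candidates, the BMP-specific rendering module may take into account information about the file, including its resolution and color depth. Based on this information, the BMP-specific rendering module may determine that JPG is the best target file type, and proceed to convert the BMP file into a JPG file. The resulting JPG file is then delivered to the requesting entity.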
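The two-step selection in the BMP example can be sketched as follows. Step (1), intersecting the two supported sets, follows the text directly; the tie-breaking rule in step (2) is an assumption for illustration only (it relies on the fact that GIF is limited to 256 colors, i.e. 8 bits per pixel), and the function name choose_target_type is hypothetical.

```python
def choose_target_type(requester_types, module_types, color_depth_bits):
    """Pick the best target type for a BMP source image (heuristic is an assumption)."""
    candidates = set(requester_types) & set(module_types)   # step (1): intersection
    if not candidates:
        return None
    # Step (2), hypothetical heuristic: GIF holds at most 256 colors (8 bits),
    # so deeper images prefer JPG and shallow images prefer GIF.
    preference = ["JPG", "GIF"] if color_depth_bits > 8 else ["GIF", "JPG"]
    for file_type in preference:
        if file_type in candidates:
            return file_type
    return sorted(candidates)[0]   # deterministic fallback for any other types
```

With the sets from the example and a 24-bit image, the candidates are GIF and JPG, and the sketch returns "JPG".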
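[0118]
According to one embodiment, type-specific parsing and rendering modules are registered by storing, in a database table, information indicating the capabilities of each module. For example, the entry for a type-specific rendering module may indicate that the module should be used when the source file type is XML and the requesting entity is a Windows-based web browser. The entry for a type-specific parsing module may indicate that the module should be used when the source file type is a GIF image.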
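[0119]
When DB file server 408 receives a command relating to a file through the DB file API, DB file server 408 determines the file type involved and the identity of the entity that issued the command. DB file server 408 then issues database commands to database server 204 that cause database server 204 to scan the table of registered modules and select the module appropriate for the current situation. For an inbound file, the appropriate parsing module is invoked to parse the file before it is inserted into the database. For an outbound file, the appropriate rendering module is invoked to construct the outbound file from the data retrieved from the database.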
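[0120]
According to an embodiment of the invention, the DB file system allows classes of files to be defined using object-oriented techniques, where each file type belongs to a file class and file classes can inherit attributes from other file classes. In such a system, the file class of a file may be one of the factors used to determine the appropriate parser and renderer for the file. The use of file classes is described in greater detail below.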
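[0121]
Stored Query Directories
As explained above, a hierarchical directory structure may be implemented in a database system using a file table 710 in which each row corresponds to a file. A hierarchical index 510 may be employed to efficiently locate the row associated with a specified file based on the pathname of the file.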
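[0122]
In the embodiment illustrated in FIGS. 5 and 7, the child files of each directory are explicitly enumerated. Specifically, the child files of each directory are listed in the Dir_entry_list of the index entry associated with that directory. For example, index entry 512 corresponds to the Windows directory 614, and the Dir_entry_list of index entry 512 explicitly lists "Word" and "Access" as child files of Windows directory 614.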
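[0123]
According to one aspect of the invention, a file system is provided in which the child files of some or all directories are determined dynamically, based on the search results of a stored query, rather than being explicitly enumerated. Such directories are referred to herein as stored query directories.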
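[0124]
For example, assume that a user of a file system wants to group all files having the extension .doc into a single directory. In a conventional file system, the user would create a directory, search for all files having the .doc extension, and then either move the files found by the search into the newly created directory or create hard links between the new directory and the files found. Unfortunately, the contents of the new directory accurately reflect the state of the system only as of the time the search was performed. If a file were renamed to a name without the .doc extension, the file would nevertheless remain in the directory. Furthermore, files with the .doc extension that are created in other directories after the new directory has been established would not be included in the new directory.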
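[0125]
Rather than defining the membership of the new directory statically, the membership of the directory may be defined by a stored query. A stored query that selects the files having the extension .doc may appear as follows: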
[0126]
Q1:
SELECT * FROM files_table
WHERE files_table.Extension = 'doc'
Referring to FIG. 7, when executed against table 710, query Q1 selects rows R4 and R12, which are the rows for the two documents entitled "Example.doc".
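[0127]
According to one embodiment of the invention, a mechanism is provided for linking queries such as query Q1 to directory entries in hierarchical index 510. When a directory entry containing such a link is encountered during traversal of hierarchical index 510, the query identified by the link is executed. Each file selected by the query is treated as a child of the directory associated with the directory entry, just as if the file were an explicit entry in the database table that stores the directory entries.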
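A minimal sketch of stored query directory resolution, assuming SQLite through Python's sqlite3 module. The hierarchical index is simplified to a Python dict of entries, each with a static child list and an optional stored query; the entry layout and the name list_children are hypothetical. A directory's children are its static entries plus whatever its stored query returns at listing time, so the listing tracks the current state of the table.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE files_table (FileID TEXT, Name TEXT, Extension TEXT)")
db.executemany("INSERT INTO files_table VALUES (?, ?, ?)", [
    ("X4", "Example.doc", "doc"), ("X9", "Notes.txt", "txt"), ("X12", "Example.doc", "doc")])

# Simplified hierarchical index entries: a static child list and/or a stored query.
index = {
    "Word": {"dir_entry_list": ["Documents"], "query": None},
    "Documents": {"dir_entry_list": [],
                  "query": "SELECT Name FROM files_table WHERE Extension = 'doc'"},
}

def list_children(directory):
    entry = index[directory]
    children = list(entry["dir_entry_list"])         # statically listed children
    if entry["query"] is not None:                   # stored query directory
        children += [name for (name,) in db.execute(entry["query"])]
    return children
```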
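[0128]
For example, assume that a user wants to create a directory "Documents" that is a child of Word 616, and wants the Documents directory to contain all files having the extension .doc. According to one embodiment of the invention, the user designs a query that specifies the selection criteria for the files that are to belong to the directory. In this example, the user may generate query Q1. The query is then stored in the database system.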
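[0129]
As with other types of directories, a row for the Documents directory is added to file table 710, and an index entry for the Documents directory is added to hierarchical index 510. In addition, the Dir_entry_list of the index entry for the Word directory is updated to indicate that the new Documents directory is a child of the Word directory. Rather than explicitly listing children in a Dir_entry_list, the new directory entry for the Documents directory contains a link to the stored query.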
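[0130]
FIGS. 10 and 11 respectively show the state of hierarchical index 510 and file table 710 after the appropriate entries have been created for the Documents directory. Referring to FIG. 10, index entry 1004 has been created for the Documents directory. Because the children of the Documents directory are determined dynamically based on the result set of the stored query, the Dir_entry_list field of index entry 1004 is null. Instead of statically enumerating child files, index entry 1004 contains a link to stored query 1002, which is executed to determine the child files of the Documents directory.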
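[0131]
In addition to the creation of index entry 1004 for the Documents directory, the existing index entry 514 for the Word directory is updated to indicate that Documents is a child of the Word directory. Specifically, a Dir_entry_list array entry is added to index entry 514 that specifies the name "Documents", the RowID of the index entry for the Documents directory (i.e., Y7), and the FileID of the Documents directory (i.e., X13).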
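[0132]
In the illustrated embodiment, two columns have been added to hierarchical index 510. Specifically, the stored query directory (SQD) column contains a flag that indicates whether a directory entry is for a stored query directory. In directory entries for stored query directories, the query pointer (QP) column stores a link to the stored query associated with the directory. In directory entries for directories other than stored query directories, the QP column is null.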
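[0133]
The nature of the link may vary from implementation to implementation. For example, according to one implementation, the link may be a pointer to the storage location at which the stored query is stored. According to another implementation, the link may simply be a unique stored query identifier that can be used to look up the stored query in a stored query table. The invention is not limited to any particular type of link.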
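[0134]
FIG. 11 illustrates file table 710 updated to include a row (R13) for the Documents directory. According to one embodiment, the same metadata that is maintained for conventional directories is also maintained for the Documents directory. For example, row R13 may include a creation date, a last modification date, and so on.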
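[0135]
FIG. 12 is a block diagram of a file hierarchy. The hierarchy shown in FIG. 12 is the same as that of FIG. 6, except that Documents directory 1202 has been added. When an application requests a display of the contents of Documents directory 1202, the database executes the query associated with Documents directory 1202. The query selects the files that satisfy it, and the results of the query are then presented to the application as the contents of Documents directory 1202. At the point in time shown in FIG. 12, the file system contains only two files that satisfy the query associated with Documents directory 1202. Both of these files are entitled Example.doc. Therefore, the two Example.doc files 618 and 622 are shown as children of Documents directory 1202.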
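[0136]
In many OS file systems, a directory cannot store two different files with the same name. Therefore, the presence of two files entitled Example.doc in Documents directory 1202 may violate the rules of the OS file system. Various techniques may be used to address this problem. For example, the DB file system may append characters to each filename to create unique filenames. Thus, Example.doc 618 may be presented as Example.doc1, while Example.doc 622 is presented as Example.doc2. Rather than appending characters that convey no particular information, the appended characters may be chosen to convey meaning. For example, the appended characters may indicate the path to the directory in which the file is statically located. Thus, Example.doc 618 may be presented as Example.doc_Windows_Word, while Example.doc 622 is presented as Example.doc_VMS_App4. Alternatively, stored query directories may simply be allowed to violate the rules of the OS file system.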
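Both renaming schemes can be sketched in a few lines. The input format, a list of (name, static_path) pairs with "/"-separated static paths, and the function name disambiguate are assumptions; names that are already unique are left unchanged.

```python
from collections import Counter

def disambiguate(children, meaningful=False):
    """Make child names unique within one stored query directory.

    children: list of (name, static_path) pairs, e.g. ("Example.doc", "/Windows/Word").
    """
    counts = Counter(name for name, _ in children)
    seen = Counter()
    result = []
    for name, static_path in children:
        if counts[name] == 1:
            result.append(name)                   # already unique: leave as-is
        elif meaningful:
            # Append the static location, e.g. Example.doc_Windows_Word.
            result.append(name + "_" + "_".join(p for p in static_path.split("/") if p))
        else:
            seen[name] += 1
            result.append(f"{name}{seen[name]}")  # Example.doc1, Example.doc2, ...
    return result
```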
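[0137]
In the embodiment shown in FIG. 10, the child files of a given directory are either all statically defined or all defined by a stored query. However, according to one embodiment of the invention, a directory may have some statically defined child files and some child files defined by a stored query. For example, rather than having a null Dir_entry_list, index entry 1004 may have a Dir_entry_list that statically identifies one or more child files. Thus, when an application asks the database system to identify the children of the Documents directory, the database server lists the union of the statically defined child files and the child files that satisfy stored query 1002.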
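[0138]
Significantly, the stored query that identifies the child files of a directory may select other directories as well as documents. Some or all of those other directories may themselves be stored query directories. Under some circumstances, the stored query of a particular directory may even select that directory itself, making the directory its own child.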
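[0139]
Because the child files of a stored query directory are determined on the fly, the listing of child files always reflects the current state of the database. For example, assume that the "Documents" stored query directory has been created as described above. Each time a new file with the extension .doc is created, the file automatically becomes a child of the Documents directory. Similarly, if the extension of a file changes from .doc to .txt, the file automatically ceases to qualify as a child of the Documents directory.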
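[0140]
According to one embodiment, the query associated with a stored query directory may select specific database records that become the child files of the directory. For example, a directory entitled "Employees" may be linked to a stored query that selects all rows from an Employees table in the database. When an application requests retrieval of one of these virtual employee files, the renderer uses data from the corresponding employee record to generate a file of the file type expected by the requesting application.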
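[0141]
Stored Query Documents
Just as stored queries can be used to identify the child files of a directory, stored queries can also be used to specify the contents of a document. FIGS. 7 and 11 show file table 710 with a Body column. For directories, the Body column is null. For documents, the Body column contains a BLOB that contains the document. For files whose contents are specified by a stored query, the Body column may contain a link to the stored query. When an application requests retrieval of a stored query document, the stored query linked to the row associated with the stored query document is executed. The contents of the document are then constructed based on the result set of the query. According to one embodiment, the process of constructing a document from query results is performed by a renderer, as described above.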
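[0142]
In addition to supporting documents whose contents are determined entirely by the results of a stored query, support may also be provided for documents in which some portions are determined by the results of a query and other portions are not. For example, the Body column of a document's row may contain a BLOB, while another column of the row contains a link to a stored query. When a request for the file associated with that row is received, the query may be executed, and the results of the query may be combined with the BLOB when the file is rendered.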
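[0143]
Multi-Level Stored Query Directories
As described above, stored queries may be used to dynamically select the child files of a directory. The child files of a directory all belong to the same level in the file hierarchy (i.e., the level immediately below the directory associated with the stored query). According to one embodiment, the stored query associated with a directory may define multiple levels below the directory. Directories associated with queries that define multiple levels are referred to herein as multi-level stored query directories.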
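[0144]
For example, a multi-level stored query directory may be associated with a query that selects all employee records in an employee table and groups those records by department and region. Under these conditions, a separate hierarchical level may be provided for each grouping key (department and region) and for the employee records. Specifically, the results of such a query may be represented as three distinct levels in the file hierarchy. The child files of the directory are defined by the first grouping criterion, which in this example is "department". Thus, the child files of the directory may be the various department values, such as "Dept1", "Dept2", and "Dept3". These child files are themselves presented as directories.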
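[0145]
The child files of the department directories are defined by the second grouping criterion, which in this example is "region". Thus, each department directory has a child file for each region value, such as "North", "South", "East", and "West". The region files are also presented as directories. Finally, the child files of each region directory are the files corresponding to the particular department/region combination associated with that region directory. For example, the children of the \Dept1\East directory would be the employees of Department 1 in the East region.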
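One way to picture a multi-level stored query directory is to turn the grouped query rows into nested levels, one level per grouping key plus a leaf level of employee files. This is a sketch under assumed names: the record fields dept, region, and name, and the helpers build_levels and resolve, are hypothetical.

```python
from collections import defaultdict

def build_levels(records, keys=("dept", "region")):
    """Turn query rows into a nested directory tree, one level per grouping key."""
    def tree():
        return defaultdict(tree)
    root = tree()
    for rec in records:
        node = root
        for key in keys:              # one directory level per grouping key
            node = node[rec[key]]
        node[rec["name"]] = rec       # leaf level: one file per employee record
    return root

def resolve(root, path):
    """Walk a backslash-separated path such as the \\Dept1\\East example."""
    node = root
    for part in filter(None, path.split("\\")):
        if part not in node:
            raise FileNotFoundError(path)
        node = node[part]
    return node
```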
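[0146]
Handling File Operations on Child Files of Stored Query Directories
As described above, the child files of a stored query directory are presented to applications in the same manner as the child files of a conventional directory. However, certain file operations that can be performed on the child files of conventional directories raise special issues when performed on the child files of stored query directories.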
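[0147]
For example, assume that a user enters input specifying that a child file of a stored query directory should be moved to another directory. This operation is problematic because the child file belongs to the stored query directory by virtue of the fact that it satisfies the criteria specified in the stored query associated with the directory. Unless the file is modified in a way that causes it to no longer satisfy those criteria, the file continues to qualify as a child file of the stored query directory.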
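[0148]
A similar problem arises when an attempt is made to move a file into a stored query directory. If the file is not already a child of the stored query directory, then the file does not satisfy the stored query associated with the stored query directory. Unless the file is modified in a way that causes it to satisfy the criteria specified by the stored query, the file should not become a child of the stored query directory.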
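[0149]
Various approaches may be taken to resolve these issues. For example, the DB file system may be configured to raise an error in response to operations that attempt to move files into or out of a stored query directory. Alternatively, the DB file system may respond to such attempts by deleting the file in question (or the database record that is presented as a file).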
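[0150]
In yet another approach, files moved into a stored query directory may be automatically modified so that they satisfy the criteria of the stored query associated with the directory. For example, assume that the stored query associated with a stored query directory selects all employees who are married. When a file corresponding to an employee record is moved into that stored query directory, the "married" field of the employee record is updated to indicate that the employee is married.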
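[0151]
Similarly, files moved out of a stored query directory may be automatically modified so that they no longer satisfy the criteria of the stored query associated with the directory. For example, if a file in the "married employees" stored query directory is moved out of that directory, the "married" field of the corresponding employee record is updated to indicate that the employee is not married.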
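The "modify the record on move" approach can be sketched with SQLite through Python's sqlite3 module. The employees table and the idea of storing a move_in and a move_out update alongside the stored query are assumptions made for the example; the text only requires that moving a file in or out changes the record so that it satisfies, or stops satisfying, the query.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE employees (name TEXT PRIMARY KEY, married INTEGER)")
db.executemany("INSERT INTO employees VALUES (?, ?)", [("alice", 1), ("bob", 0)])

# Hypothetical "married employees" stored query directory, plus the updates that
# make a record satisfy (move_in) or stop satisfying (move_out) its query.
MARRIED_DIR = {
    "query": "SELECT name FROM employees WHERE married = 1",
    "move_in": "UPDATE employees SET married = 1 WHERE name = ?",
    "move_out": "UPDATE employees SET married = 0 WHERE name = ?",
}

def children(sqd):
    return sorted(name for (name,) in db.execute(sqd["query"]))

def move_into(sqd, name):
    with db:                               # commit on success, roll back on error
        db.execute(sqd["move_in"], (name,))

def move_out_of(sqd, name):
    with db:
        db.execute(sqd["move_out"], (name,))
```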
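[0152]
When an attempt is made to move a file that does not satisfy the criteria of a stored query into the corresponding stored query directory, another approach is to update the index entry of the stored query directory so that the file is statically established as a child of the stored query directory. Under these circumstances, the stored query directory would have some child files that are children because they satisfy the stored query, and other child files that became children because they were manually moved into the stored query directory.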
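[0153]
Programmatically Defined Files
Stored query directories and stored query documents are examples of programmatically defined files. A programmatically defined file is an entity (e.g., a document or a directory) that is presented to the file system as a file, but whose contents and/or child files are determined by executing code. The code executed to determine the contents of the file may include a stored database query, as in the case of stored query files, and/or other code. According to one embodiment, the code associated with a programmatically defined file implements the following routines: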
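[0154]

The resolve_filename routine returns the file handle of the file that has the name "filename" and is a child of the programmatically defined file. The list_directory routine returns a list of all child files of the programmatically defined file. The fetch routine retrieves the contents of the programmatically defined file. The put routine inserts data into the programmatically defined file. The delete routine deletes the programmatically defined file.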
[0155]
According to one embodiment, a "resolve_pathname(path): file_handle" routine is also provided. The resolve_pathname routine receives a path and iteratively calls the resolve_filename function for each filename in the path.
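[0156]
According to one embodiment, the DB file system provides an object class that implements the routines listed above for conventional files (i.e., files that are not programmatically defined). For the purpose of explanation, that object class is referred to herein as the "directory class". To implement a programmatically defined file, a subclass of the directory class is established. The subclass inherits the routines of the directory class, but allows a programmer to override the implementations of those routines. The implementations provided by the subclass determine the operations that the DB file system performs in response to file operations involving the programmatically defined file.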
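The directory class and its overriding subclass can be sketched in Python. The routine names follow the text, but their signatures are assumptions (the source's own listing of the routines is not reproduced here), and an object reference stands in for a real file handle. QueryDirectory is a hypothetical subclass whose children come from a callable, while resolve_pathname is inherited unchanged and still calls the overridden resolve_filename for each path component.

```python
class Directory:
    """Conventional ("directory class") file whose children and contents are stored.

    Routine names follow the text; their signatures are assumptions.
    """
    def __init__(self, name, children=None, contents=b""):
        self.name = name
        self.children = {c.name: c for c in (children or [])}
        self.contents = contents

    def resolve_filename(self, filename):
        return self.children[filename]          # the object stands in for a file handle

    def list_directory(self):
        return sorted(self.children)

    def fetch(self):
        return self.contents

    def put(self, data):
        self.contents += data

    def delete(self):
        self.children.clear()
        self.contents = b""

    def resolve_pathname(self, path):
        node = self
        for filename in filter(None, path.split("/")):
            node = node.resolve_filename(filename)   # one call per path component
        return node


class QueryDirectory(Directory):
    """Programmatically defined directory: its children come from running code."""
    def __init__(self, name, query):
        super().__init__(name)
        self.query = query                      # any callable returning child files

    def _computed(self):
        return {c.name: c for c in self.query()}

    def resolve_filename(self, filename):
        return self._computed()[filename]

    def list_directory(self):
        return sorted(self._computed())
```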
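[0157]
Event Notification within a File System
According to one aspect of the invention, a file system is provided in which users are proactively notified upon the occurrence of certain file system events. Because users are notified proactively, they avoid the overhead of repeatedly polling to detect conditions indicating that an event of interest has occurred. The ability to be notified upon the occurrence of file system events is very useful, for example, when a particular file system event has significance to a user.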
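[0158]
For example, it is common for multiple copies of a document to be maintained (i.e., "cached") at different locations to provide more efficient access to the document. Under these conditions, when one of the copies is updated, the remaining copies become stale (i.e., they no longer reflect the current state of the document). Using the event notification techniques described below, when one copy is updated, the sites at which the other copies reside can be proactively notified of the update. Processes or users at those sites can then take whatever action is appropriate under the circumstances. In the case of a cache, the appropriate action may be, for example, to replace the cached version of the document with the updated version.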
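[0159]
As another example, a particular user may be responsible for reviewing all of a company's technical documents before they are published. The company's technical writers may have been instructed to store each technical document in a "ready for review" directory when the document is ready for review by that user. Without a proactive notification system, merely storing a technical document in the "ready for review" directory does not alert the user that a new document is ready for review. Rather, some additional work is required, such as the technical writer informing the user that the document is ready to be reviewed, or the user periodically checking the "ready for review" directory. In contrast, in a file system that implements the event notification techniques described herein, the act of placing a technical document into the "ready for review" directory can trigger the generation of a message notifying the user that a new technical document is ready for review.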
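[0160]
According to one embodiment of the invention, rules may be defined for proactively generating messages in response to file system events. Such events may include, for example, storing or creating a file in a particular directory, deleting a file from a particular directory, moving a file out of a particular directory, modifying or deleting a particular file, and linking a file into a particular directory. These file system operations are merely representative. The specific operations for which proactive notification rules can be created may vary from implementation to implementation. The invention is not limited to providing event notification support for any particular set of file system operations.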
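[0161]
According to one embodiment, an event_id is assigned to each file system event. A notification rule may then be created that specifies an event_id and a set of one or more subscribers. Once a rule has been registered with the file system, messages are automatically sent to the set of subscribers identified in the rule in response to the occurrence of the file system event identified by the rule's event_id.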
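[0162]
For example, a user may register an interest in knowing when files are added to a particular directory. To record this interest, the database server (1) inserts a row into a "registered rules" table, and (2) sets a flag associated with the directory to indicate that at least one rule has been registered for the directory. The row inserted into the registered rules table identifies the entity and indicates the event in which the entity is interested. The row may also include additional information, such as the protocol to be used to communicate with the entity. The flag indicating that a rule applies to the directory may be stored in the row of the file table associated with the directory, in the hierarchical index entry associated with the directory, or in both.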
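[0163]
When a file is inserted into a directory, the database server checks the flag associated with the directory to determine whether any rules have been registered for that directory. If rules have been registered for the directory, the registered rules table is searched to find the specific rules that apply to the directory. If the registered rules include rules that apply to the particular operation being performed on the directory, messages are sent to the interested entities identified in those rules. The protocol used to send a message to an entity may vary from entity to entity. For example, messages may be sent to some entities via CORBA, while messages may be sent to other entities in the form of HTML pages via HTTP.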
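A minimal in-memory sketch of the rule registration and the insert-time check described above. The event name "file_inserted", the Rule and FileSystem classes, and the outbox list (which stands in for real CORBA or HTTP delivery) are all hypothetical. The per-directory flag lets inserts into directories without rules skip the rule search entirely.

```python
from dataclasses import dataclass, field

@dataclass
class Rule:
    subscriber: str
    event_id: str            # e.g. "file_inserted" (hypothetical event name)
    directory: str
    protocol: str = "HTTP"   # how to reach this subscriber

@dataclass
class FileSystem:
    rule_flags: set = field(default_factory=set)   # directories with at least one rule
    rules: list = field(default_factory=list)      # the "registered rules" table
    outbox: list = field(default_factory=list)     # stands in for real delivery

    def register(self, rule):
        self.rules.append(rule)                    # (1) insert the rule row
        self.rule_flags.add(rule.directory)        # (2) set the directory flag

    def insert_file(self, directory, name):
        # (Storing the file itself is omitted from this sketch.)
        if directory not in self.rule_flags:       # cheap check: no rules, no search
            return
        for rule in self.rules:
            if rule.directory == directory and rule.event_id == "file_inserted":
                self.outbox.append((rule.protocol, rule.subscriber,
                                    f"{name} added to {directory}"))
```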
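[0164]
According to one embodiment, the notification mechanism is implemented in conjunction with a database-implemented file system, as described above, using a queuing mechanism such as the one described in U.S. patent application Ser. No. 08/961,597, entitled "APPARATUS AND METHOD FOR MESSAGE QUEUING IN A DATABASE SYSTEM," filed by Chandra et al. on October 31, 1997, the entire contents of which are incorporated herein by reference.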
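[0165]
According to one such embodiment, an event server that executes outside the database server is registered as a subscriber to a queue managed by the database server. The queue to which the event server subscribes is referred to herein as the file event queue. Entities interested in particular file system events register their interest with the event server. The event server communicates with the database server through the database API, and communicates with the interested entities through the protocols those entities support.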
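[0166]
When the database server performs an operation related to the file system, the database server places into the file event queue a message indicating the event_id associated with the operation. The queuing mechanism determines that the event server has registered an interest in the file event queue and sends the message to the event server. The event server searches its list of interested entities to determine whether any entity has registered an interest in the event identified in the message. The event server then sends a message indicating the occurrence of the file system event to all entities that registered an interest in that event.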
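[0167]
In embodiments that use an event server to forward messages to interested entities, the event server may be configured to support a specified maximum number of users. If the number of interested users exceeds the maximum, additional event servers are started to serve the additional users. As in the single-event-server case, each event server in a multiple-event-server system is registered as a subscriber to the file event queue.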
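[0168]
According to an alternative embodiment, entities interested in file system events register directly as subscribers to the file event queue. As part of their registration information, the entities indicate the event_ids of the file system events in which they are interested. When the queuing mechanism places a message into the file event queue, it does not automatically send the message to all queue subscribers. Rather, the queuing mechanism inspects the registration information to determine which entities have registered an interest in the particular event associated with the message, and selectively sends the message only to those entities. For entities that do not support the database API, the registration information includes information about the protocols those entities support. The queuing mechanism sends file event messages to these entities using the protocols listed in their registration information.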
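[0169]
File system event notification may be applied in a variety of contexts. For example, it is sometimes desirable to store, on a first machine, a cache of files that reside on a second machine. One currently available mechanism for implementing such a file cache is the "Briefcase" feature provided by the Microsoft Windows operating system. The Briefcase feature allows a user to create a special folder (a "briefcase") on one machine and to copy into the briefcase files that are stored on other machines. Each briefcase has an "Update" option that, when selected, causes the file system to compare the copies of the files in the briefcase with the copies of the files at their original locations. If the files do not have the same modification date, the file system allows the user to synchronize the two copies (typically by overwriting the older copy with the newer copy).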
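[0170]
Unlike the Briefcase mechanism, the file system event notification mechanism allows a file cache to be updated proactively, so that the file cache always reflects the current state of the files at their original locations. For example, a process that manages a file cache may register an interest in updates to the original copies of the files contained in the cache. The process is then automatically informed whenever any of the original files is updated, and can respond immediately by copying the updated file into the file cache. Similarly, the file system event notification mechanism may be used to mirror, on a first machine, one or more directories that reside on a second machine. To use the mechanism in this manner, the process that maintains the mirrored directories first makes copies of the directories and of all files contained in them, and then registers its interest in changes made to the directories and to the files they contain. When informed that a change has been made to a directory, the process makes the corresponding change to its copy of the directory. Similarly, when informed of a change to any file in a mirrored directory, the process makes the corresponding change to its copy of the file.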
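[0171]
For example, if a file is moved from a mirrored directory to a directory that is not mirrored, the process deletes its copy of the file from the mirrored directory and deregisters its interest in the file. Consequently, the process is no longer notified when the file is updated. Similarly, if a file is moved from a non-mirrored directory into a mirrored directory, the process is informed that the directory has changed. In response to that message, the process identifies the new file, makes a copy of the new file in the mirrored directory, and registers its interest in the new file.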
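[0172]
Version Management in a File System
In the workplace, a large undertaking on which many people work together over a long period of time is referred to as a "project". While working on a project, employees typically create many documents, each of which relates to the project in some way.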
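[0173]
Similarly, within computer systems, users often create many electronic documents that all relate to a project. For example, programmers located at many sites around the world may each work on different parts of the same computer program. The electronic documents they generate for the program, which typically include source code files, belong to a single project. Thus, in the context of this discussion, a project is a collection of related files.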
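[0174]
Typically, the files of a project are organized into particular folders. For example, FIG. 13 shows an example of how the files associated with the project "Big Project" may be organized into various folders. Referring to FIG. 13, folder 1302, entitled Big Project, was created to hold all files (directories and documents) associated with the project. The immediate children of Big Project 1302 are the folders source code 1304 and docs 1306. Source code 1304 contains two directories: LA code 1312, which stores source code 1316 and source code 1318 of programmers located in Los Angeles, and SF code 1314, which stores source code 1320 of programmers located in San Francisco. Docs 1306 contains two folders, specs 1308 and user manual 1310. Specs 1308 contains specs 1322 and specs 1324. User manual 1310 contains UM 1326.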
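[0175]
Files within a project often contain references (e.g., HTML links) to other files in the same project. These references typically identify the other documents by their full pathnames. Consequently, if a document is moved from one location in the directory hierarchy to another, or if the document is renamed, all references to the document become invalid.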
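[0176]
Because of inter-document references, new versions of files are typically stored under the same name and at the same location as the older versions they replace. In conventional file systems, this process overwrites the older versions of the files, making them impossible to recover. Unfortunately, there are many situations in which it is desirable to recover an older version of a file. For example, critical information may have been inadvertently deleted from the newer version. If the older version cannot be recovered, the user may have to spend considerable resources re-creating the lost material, if it can be re-created at all. Furthermore, it is often desirable to be able to reconstruct the history of changes to a file, to determine when a particular change was made, or to determine what had changed as of a particular point in time.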
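[0177]
According to one aspect of the invention, a versioning mechanism is provided in which a new version of a file is saved under the same name and at the same location in the directory hierarchy as the older version, without overwriting the older version. Rather than being overwritten, the older version is retained, and users can selectively retrieve older versions of the file. Furthermore, the older versions are retained at their original locations in the directory hierarchy. As described in greater detail below, a novel directory versioning technique is provided that allows the file system to maintain multiple versions of the same file, under the same name, at the same location within the directory hierarchy.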
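[0178]
Because creating a new version does not change the name or location of the original version, any reference to the first version of a file continues to point to the first version even after newer versions of the file have been created. Thus, inter-file references contained within documents continue to point to the correct versions of the referenced documents even when newer versions of the referenced documents are created. The fact that inter-file references remain valid through the versioning process (i.e., continue to refer to the correct versions of the referenced files) has a significant beneficial effect on the efficiency of file retrieval. Specifically, rather than requiring a lookup operation to find the appropriate version of a referenced file, the referenced file can be retrieved directly by following the reference to it that is contained in the other file.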
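[0179]
Similarly, determining the contents of a directory as of a particular point in time need not involve lookup operations. Because directories are themselves versioned, selecting a particular version of a directory simply selects the members of the directory. The selected version of the directory contains direct links to the correct files that belong to that version of the directory, and thus to the correct versions of those files.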
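[0180]
Techniques are also provided for tracking the relationships between versions of the same file even when the name of the file changes from version to version. As described in greater detail below, in addition to the name of the file, a FileID and a version number are maintained for each version of each file. If two files have the same FileID, they are different versions of the same file even if they have different names.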
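[0181]
According to one aspect of the invention, a mechanism is provided that allows users to select the "view" of a project that they wish to see. A view of a project represents the files of the project as they existed at a particular point in time. For example, the default view presented to a user may represent the most recent versions of all files. Another view may represent the versions of the files that were current as of one day ago. Yet another view may represent the versions of the files that were current as of one week ago.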
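[0182]
According to one embodiment, a version tracking mechanism is provided by storing a version number with each file in a project. For example, in a file system implemented in a database system that uses a file table such as file table 710, one column of the row associated with a file may store the version number of the file. Each time a file is created, a row for the file is inserted into file table 710, and a predetermined initial version number (e.g., 1) is stored in the version column of that row.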
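[0183]
When a file is updated, the previous version of the file is not overwritten. Instead, a new row is inserted into the file table for the new version of the file. The row for the new version contains the same FileID, Name, and Creation Date as the original row, but contains a higher version number (e.g., 2), a new Modification Date, and possibly a different file size. In addition, the BLOB that stores the contents of the new version reflects the update, while the BLOB of the original entry remains unchanged.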
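A minimal sketch of this row-per-version scheme, assuming SQLite through Python's sqlite3 module. The column names, the ISO-date strings used as modification dates, and the helper names are hypothetical. body_as_of adds an illustrative "view" lookup: it returns the version that was current at a given time.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE files_table (
    FileID TEXT, Version INTEGER, Name TEXT, CreationDate TEXT,
    ModificationDate TEXT, Body BLOB, PRIMARY KEY (FileID, Version))""")

def create_file(file_id, name, body, when):
    with db:
        db.execute("INSERT INTO files_table VALUES (?, 1, ?, ?, ?, ?)",
                   (file_id, name, when, when, body))

def update_file(file_id, body, when):
    """Insert a new version row; earlier rows (and their BLOBs) stay untouched."""
    with db:
        name, created, latest = db.execute(
            """SELECT Name, CreationDate, Version FROM files_table
               WHERE FileID = ? ORDER BY Version DESC LIMIT 1""", (file_id,)).fetchone()
        db.execute("INSERT INTO files_table VALUES (?, ?, ?, ?, ?, ?)",
                   (file_id, latest + 1, name, created, when, body))

def body_as_of(file_id, when):
    """Return the contents of the version that was current at time `when`."""
    row = db.execute("""SELECT Body FROM files_table
                        WHERE FileID = ? AND ModificationDate <= ?
                        ORDER BY Version DESC LIMIT 1""", (file_id, when)).fetchone()
    return row[0] if row else None
```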
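[0184]
According to one embodiment, when a file and the directory in which it resides both belong to a project, a change to the file effectively creates a new version of the directory. Thus, updating a file in a directory causes a file table row to be created not only for the new version of the file, but also for the new version of the directory. In an embodiment that uses a hierarchical index, an index entry for the new version of the directory is also added to the hierarchical index.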
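[0185]
If a directory and its parent directory both belong to the same project, creating a new version of the directory effectively creates a new version of the parent directory. Thus, new rows are also added to the file table and to the hierarchical index for the parent of the directory. This process continues, so that new versions are created for all directories that belong to the project and lie above the updated file in the file hierarchy.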
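[0186]
To illustrate how the versioning mechanism responds to an update of a file belonging to a project, assume that all files shown in FIG. 13 are at version 1, and that an update is made to code 1320. As shown in FIG. 14, the versioning mechanism responds to the update by creating a new version, code 1320′, without deleting the original version of code 1320. Because code 1320 belongs to SF code directory 1314, a new version, SF code directory 1314′, is created without deleting the original version. Because SF code directory 1314 belongs to source code directory 1304, a new version, source code directory 1304′, is created without deleting the original version. Finally, because source code directory 1304 belongs to big project directory 1302, a new version, big project 1302′, is created without deleting the original version.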
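[0187]
As shown in FIG. 14, when a new version of a parent file is created in response to a new version of a child file, the new version of the parent file has the same children that the parent had before the update, except that the new version of the updated file, rather than its original version, is a child. For example, the new version code 1320′ is a child of the new version SF code 1314′, and the new version SF code 1314′ is a child of the new version source code 1304′. However, the unchanged children of the original source code 1304 (e.g., LA code 1312) continue to be children of the new version source code 1304′. Similarly, the new version source code 1304′ is a child of the new version big project 1302′, while the unchanged children of the original big project 1302 (e.g., docs 1306) continue to be children of the new version big project 1302′.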
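[0188]
In embodiments in which the file system is implemented using a hierarchical index, the index entry created for the new version of a directory contains the same Dir_entry_list as the index entry for the previous version of the directory, except that the array entry for the updated child file is replaced by an array entry for the new version of the child file. If the updated child file is a child directory, the Dir_entry_list array entry in the new directory contains the RowID, within the hierarchical index, of the index entry for the new version of the child directory.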
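The copy-the-path behavior of FIGS. 13 and 14 amounts to path copying in an immutable tree. In this in-memory sketch (an illustration only; the function name update_path and the dict representation are assumptions), each directory is a dict standing in for an index entry's Dir_entry_list, and a shared Python object stands in for two Dir_entry_list entries that point to the same RowID.

```python
def update_path(node, path, new_leaf):
    """Return a new version of `node` in which the file at `path` is replaced.

    Directories are dicts mapping child names to child nodes; files are any
    other value. Every directory on the path is copied (a new version), and
    every other child is shared with the previous version, as in FIGS. 13 and 14.
    """
    if not path:
        return new_leaf
    head, *rest = path
    new_dir = dict(node)                    # copy this directory's entry list
    new_dir[head] = update_path(node[head], rest, new_leaf)
    return new_dir
```

The previous root keeps pointing at the previous versions, so each root version is a complete snapshot of the project.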
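[0189]
When a file belonging to a project is moved from one directory in the project to another directory in the project, no new version of the file is created, because the file itself has not changed. However, both the directory from which the file was moved and the directory into which it was placed have changed. Therefore, new versions are created for these directories and for all of their ancestor directories in the same project. FIG. 15 shows the new directories that would be created in response to code 1318 of FIG. 13 being moved from LA code 1312 to SF code 1314. Specifically, new versions LA code 1312′ and SF code 1314′ would be created. The new version LA code 1312′ does not have code 1318 as a child; instead, code 1318 becomes a child of the new version SF code 1314′. A new source code directory 1304′ is created and linked to the new versions LA code 1312′ and SF code 1314′. A new big project directory 1302′ is created and linked to the new source code 1304′ and to the original docs directory 1306.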
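[0190]
Using the versioning technique described above, each time a change is made to a project (e.g., big project 1302), a new version of the root directory of the project is created. The links emanating from each version of the root project directory link together all files that belonged to the project at a particular point in time, and the versions of the files so linked are the versions that existed at that point in time. For example, referring to FIG. 14, the links emanating from big project 1302 reflect the project as it existed before the update to code 1320, and the links emanating from big project 1302′ reflect the project as it existed immediately after the update to code 1320. Similarly, in FIG. 15, the links emanating from big project 1302 reflect the project as it existed before code 1318 was moved from LA code 1312 to SF code 1314, and the links emanating from big project 1302′ reflect the project as it existed immediately after that move.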
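[0191]
Tagging
Unfortunately, the versioning technique described above causes a significant proliferation of file versions, particularly of directories at the higher levels of a project. In some situations, such proliferation may be neither necessary nor desirable. Therefore, according to one embodiment of the invention, a mechanism is provided for "tagging" versions of files. Tagging a version of a file indicates that the version is to be retained. That is, rather than always retaining the older version of a file when a newer version is created, the older version is retained only if it has been tagged. Otherwise, it is replaced (overwritten) when the newer version is created.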
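[0192]
Referring to FIG. 13, assume that code 1320 has not been tagged. When code 1320 is updated, the old version of the code is simply replaced by the new version. Only if code 1320 has been tagged are separate new versions of code 1320, SF code 1314, source code 1304, and big project 1302 created, as shown in FIG. 14.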
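[0193]
In many cases, tags will be applied to all files in a project at the same time. For example, when a particular version of a software program is released, all of the source code used to create the released version of the program may be tagged at that time. As a result, regardless of subsequent revisions to the source code files, exactly the same set of source code associated with the released version remains available for later reference.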
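[0194]
In embodiments in which tags are always applied to a project as a whole, a single tag may be maintained for the root project directory. When a file is located using a tagged version of the root project directory, any change to the file results in the creation of a new version of the file, while the original version of the file is retained. Conversely, when a file is located using an untagged version of the root project directory, any change to the file simply overwrites the previous version of the file.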
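[0195]
According to another embodiment, applying a tag to a file effectively applies the tag to all files below that file in the file hierarchy. For example, assume that a tag is applied to LA code 1312. If code 1318 is moved out of LA code 1312, a new version of LA code 1312 is created. If code 1318 is updated, new versions of both code 1318 and LA code 1312 are created. In such an embodiment, when a file is located by traversing the file hierarchy through a tagged file, any change to the file results in the creation of a new version of the file. When a file is located without traversing any tagged file in the hierarchy, any change to the file overwrites the previous version of the file.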
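[0196]
Purge Count
Another technique for reducing version proliferation, which may be used instead of or in addition to tagging, involves maintaining a purge count. The purge count indicates the maximum number of versions that will be retained for any given file. When a new version is created for a file whose number of versions has already reached the purge count, the new version of the file overwrites the oldest retained version of the file. The purge count may be implemented on a per-system, per-project, or per-file basis. When implemented on a per-system basis, a single purge count applies to all files maintained in the file system. On a per-project basis, all files in a given project have the same purge count, but different projects may have different purge counts. On a per-file basis, a different purge count may be specified for each file.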
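[0197]
When used in combination with tagging, the purge count mechanism may be implemented in various ways. According to one embodiment, tagged versions are ignored when determining whether creating a new version of a file would exceed the purge count, and tagged versions are never deleted by the purge count mechanism. For example, assume that the purge count for a file is 5, that five versions of the file exist, and that one of those five versions is tagged. When the file is updated, the purge count mechanism determines that there are currently only four untagged versions of the file, and therefore creates another version of the file without deleting any existing version. If the same file is updated again, the purge count mechanism determines that there are five untagged versions of the file, and therefore deletes the oldest untagged version of the file when it creates the new version.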
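The tag-aware purge rule can be sketched as a small function over an oldest-first list of versions. The Version class and the function name add_version are hypothetical; the behavior follows the worked example above, where tagged versions neither count toward the purge count nor are ever purged.

```python
from dataclasses import dataclass

@dataclass
class Version:
    number: int
    tagged: bool = False

def add_version(versions, purge_count):
    """Append a new version, purging the oldest untagged one if the limit is reached.

    Tagged versions neither count toward the purge count nor are ever purged.
    `versions` is ordered oldest first and is modified in place.
    """
    next_number = max((v.number for v in versions), default=0) + 1
    untagged = [v for v in versions if not v.tagged]
    if len(untagged) >= purge_count:
        oldest = untagged[0]
        versions[:] = [v for v in versions if v is not oldest]
    versions.append(Version(next_number))
    return versions
```

Applied to the example (purge count 5, five versions, one tagged), the first update keeps all five old versions, and the second update removes only the oldest untagged one.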
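[0198]
Inter-Project Links
Each link has a source file (the file from which the link extends) and a target file (the file to which the link points). In a file hierarchy, the source file of a link is often a directory, and the target file of the link is a file within the directory. However, not all links are between directories and their children. For example, HTML files may contain hyperlinks to graphic images and to other HTML files. In a file system implemented using a hierarchical index, these hyperlinks can be handled in the same manner as directory-to-document links.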
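[0199]
A view of the file system indicates how each project in the file system existed at a particular point in time. However, the point in time associated with one project in a view may differ from the point in time associated with another project in the same view. This causes a problem when the source file of a link belongs to a different project than the target file of the link. For example, assume that a view specifies time T1 for project P1, which contains file F1, and a later time T2 for project P2, which contains file F2. Assume further that file F2 has a link to file F1. The link contained in the T2 version of F2 leads to the T2 version of P1, not to the T1 version of P1. However, because the view specifies T1 for P1, the T1 version of P1 should be used for any operation performed through that view on any file in P1.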
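[0200]
According to one embodiment of the invention, an "inter-project boundary" flag is maintained for each link. The inter-project boundary flag of a link indicates whether the source file and the target file of the link are in the same project. In a file system that uses a hierarchical index such as hierarchical index 510, the inter-project boundary flag may be stored, for example, in each array entry of the Dir_entry_list of an index entry.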
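[0201]
During traversal of the file hierarchy, the inter-project boundary flag of every link is checked before the link is followed. If the inter-project boundary flag of a link is set, the requested version time of the project to which the source-side file belongs is compared with the requested version time of the project to which the target-side file belongs. If the requested version times are the same, the link is traversed. If they differ, a search is performed for the version of the target file that corresponds to the requested version time of the project to which the target-side file belongs.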
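[0202]
For example, the inter-project boundary flag of the link between F2 and F1 would be set. Consequently, the requested version time of P2 is compared with the requested version time of P1. The requested version time of P2 is T2, which is not the same as T1, the requested version time of P1. Therefore, the correct version in P1 cannot be located simply by following the link. Instead, a search is performed to locate the version of P1 that corresponds to time T1.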
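[0203]
According to an alternative embodiment, no inter-project boundary flags are maintained. Instead, each time a link is encountered, the requested version time of the source file is compared with the requested version time of the target file. If the source file and the target file are in the same project, or are in different projects that have the same requested version time, the link is followed. Otherwise, a search is performed for the correct version of the target file.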
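For comparison, here is a sketch of the flag-free alternative under the same kind of hypothetical view and history structures. The only change is that requested times are compared every time a link is met, instead of only for flagged links.

```python
from bisect import bisect_right


def resolve_without_flag(source_project, target_project, target_file,
                         linked_version, view, history):
    """Flag-free link resolution (hypothetical helper).

    view maps project -> requested version time; history maps a file name
    to (sorted creation times, version ids). Both shapes are assumptions.
    """
    # Same project (which implies the same requested time), or different
    # projects with equal requested times: follow the link as stored.
    if source_project == target_project or view[source_project] == view[target_project]:
        return linked_version
    # Otherwise search the target file's history for the right version.
    times, version_ids = history[target_file]
    i = bisect_right(times, view[target_project])
    if i == 0:
        raise LookupError(f"no version of {target_file} at time {view[target_project]}")
    return version_ids[i - 1]
```

Dropping the flag saves a stored bit per link at the cost of a lookup of two requested times for every link traversed.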
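[0204]
Object-oriented file system
In recent years, object-oriented programming has become the standard programming paradigm. In object-oriented programming, the world is modeled in terms of objects. An object is a record combined with the procedures and functions that manipulate the record. All objects in an object class have the same fields ("attributes") and are manipulated by the same procedures and functions ("methods"). An object is said to be an "instance" of the object class to which it belongs.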
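[0205]
Occasionally, an application requires the use of object classes that are similar but not identical. For example, the object classes used to model both dolphins and dogs might include nose, mouth, length and age attributes. However, the dog object class may require a hair color attribute, while the dolphin object class requires a fin size attribute.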
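[0206]
To facilitate programming in situations where an application requires several object classes with similar attributes, object-oriented programming supports "inheritance". Without inheritance, a programmer would have to write one set of code for the dog object class and a second set of code for the dolphin object class. The code implementing the attributes and methods common to both object classes would appear redundantly in both. Duplicating code in this manner is very inefficient, particularly when the number of common attributes and methods is much greater than the number of unique attributes. Furthermore, code duplication between object classes complicates the process of revising the code, because a change made to a common attribute must be replicated at multiple places in the code to keep all object classes that have that attribute consistent.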
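[0207]
Inheritance makes it possible to establish a hierarchy among object classes. The attributes and methods of a given object class automatically become attributes and methods of the object classes that are based on it in the hierarchy. For example, an "animal" object class may be defined as having nose, mouth, length and age attributes, along with associated methods. To add these attributes and methods to the dolphin and dog object classes, a programmer can specify that the dolphin and dog object classes "inherit" the animal object class. In this situation, the dolphin and dog object classes are said to be "subclasses" of the animal object class, and the animal object class is said to be the "parent" class of the dog and dolphin object classes.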
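The animal example maps directly onto class inheritance in any object-oriented language. A minimal Python sketch follows; the attribute types and the `describe` method are illustrative assumptions, not part of the text.

```python
from dataclasses import dataclass


@dataclass
class Animal:
    # Common attributes and methods are written once, in the parent class.
    nose: str
    mouth: str
    length_m: float
    age_years: int

    def describe(self) -> str:
        # Hypothetical shared method, inherited unchanged by every subclass.
        return f"{type(self).__name__}: {self.length_m} m, {self.age_years} years"


@dataclass
class Dog(Animal):      # Dog "inherits" Animal and adds one attribute
    hair_color: str = "brown"


@dataclass
class Dolphin(Animal):  # Dolphin "inherits" Animal and adds one attribute
    fin_size_cm: float = 30.0
```

A change to `describe` or to the shared attributes now has to be made in only one place.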
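[0208]
According to one aspect of the invention, a mechanism is provided for applying the object-oriented paradigm, including inheritance, to a file system. Specifically, each file in the file system belongs to a class. The classes of the file system define, among other things, the types of information that the file system stores about each file. According to one embodiment, a base class is provided. Users of the file system may then register other classes, each of which may be defined as a subclass of the base class or of any previously registered class.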
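[0209]
When a new file class is registered with the file system, the file system is effectively extended to support a new type of file and new types of interaction with the file system. For example, most e-mail applications expect e-mail documents to have a "priority" property. If the file system does not provide storage for the priority property, an e-mail application may not operate correctly on e-mail documents stored in that file system. Similarly, an operating system may expect certain types of system information to be stored with files. If the file system does not store that information, the operating system may encounter problems. Registering a class that includes all of the attributes required to support a particular type of system or protocol (for example, a particular operating system, FTP, HTTP, IMAP4, and so on) makes accurate and transparent interaction with that system or protocol possible.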
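[0210]
To register a class, information about the class is supplied, including data that identifies the parent class of the class and describes every attribute that the class has but its parent class does not. The information may also specify particular methods that operate on instances of the class.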
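A minimal sketch of what registration information might look like in memory: each class records its parent, the attributes it adds, and optionally its methods. `ClassRegistry` and its methods are hypothetical, as is the choice to reject an attribute that an ancestor already defines.

```python
class ClassRegistry:
    """Hypothetical in-memory registry of file classes."""

    def __init__(self):
        # The base class is supplied by the file system itself.
        self._classes = {
            "Files": {"parent": None,
                      "attributes": {"name": "string", "created": "date",
                                     "modified": "date"},
                      "methods": set()},
        }

    def register(self, name, parent, attributes, methods=()):
        if name in self._classes:
            raise ValueError(f"class {name!r} is already registered")
        if parent not in self._classes:
            raise ValueError(f"unknown parent class {parent!r}")
        # Registration describes only attributes the ancestors do not have.
        clash = self.all_attributes(parent).keys() & attributes.keys()
        if clash:
            raise ValueError(f"already defined by an ancestor: {sorted(clash)}")
        self._classes[name] = {"parent": parent,
                               "attributes": dict(attributes),
                               "methods": set(methods)}

    def all_attributes(self, name):
        """Own plus inherited attributes, base class first."""
        chain = []
        while name is not None:
            chain.append(self._classes[name])
            name = self._classes[name]["parent"]
        merged = {}
        for cls in reversed(chain):
            merged.update(cls["attributes"])
        return merged
```

Registering `Document` under `Files` and `E-mail` under `Document` yields the inherited attribute list of the E-mail class without restating any of the Files or Document attributes.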
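[0211]
An object-oriented file system that allows users to register file classes, supports inheritance among file classes, and stores information about files based on the classes to which they belong may be implemented in various ways, depending on the context in which the file system itself is implemented. According to one embodiment, the object-oriented file system is provided in the context of a database-implemented file system as described above. Although various aspects of the object-oriented file system are described with reference to a database-implemented embodiment, the object-oriented file system techniques described herein are not limited to such embodiments.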
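[0212]
Database implementation of an object-oriented file system
According to one embodiment, the database-implemented file system provides a base class, and subclasses of the base class can be registered with the file system. Referring to FIG. 16, an exemplary set of file classes is shown. The base class is named "Files" and includes attributes that are generally common to all files, including name, creation date and modification date. Likewise, the methods of the Files class include methods for operations that can be performed on all files.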
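[0213]
According to one embodiment, the attributes of the Files class are the union of all attributes maintained by the operating systems with which the database-implemented file system will be used. For example, assume that the file system is implemented in a database maintained by database server 204, as shown in FIG. 3. The files stored in that file system originate from operating system 304a and operating system 304b, which do not necessarily support the same set of file attributes. Therefore, the set of attributes of the Files class of the file system implemented by database server 204 is the union of the sets of attributes supported by the two operating systems 304a and 304b.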
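[0214]
According to an alternative embodiment, the attributes of the Files class are the intersection of all attributes maintained by the operating systems with which the database-implemented file system is used. In such an embodiment, a subclass of the Files class may be registered for each operating system. The subclass registered for a given operating system extends the base Files class by adding every attribute supported by that operating system that is not already included in the base Files class.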
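The difference between the two embodiments comes down to set union versus set intersection over the attribute sets of the participating operating systems. In the sketch below, the attribute names assigned to operating systems 304a and 304b are invented for illustration; `per_os_subclasses` computes what each OS-specific subclass would add in the intersection embodiment.

```python
# Hypothetical attribute sets for the two client operating systems of FIG. 3.
OS_ATTRIBUTES = {
    "os_304a": {"name", "created", "modified", "archive_flag", "hidden_flag"},
    "os_304b": {"name", "created", "modified", "owner_uid", "mode_bits"},
}


def union_base(os_attrs):
    """Union embodiment: Files carries every attribute any OS maintains."""
    return set().union(*os_attrs.values())


def intersection_base(os_attrs):
    """Intersection embodiment: Files carries only the shared attributes."""
    return set.intersection(*os_attrs.values())


def per_os_subclasses(os_attrs):
    """Intersection embodiment: each OS subclass adds what Files lacks."""
    base = intersection_base(os_attrs)
    return {os_name: attrs - base for os_name, attrs in os_attrs.items()}
```

The union design keeps a single wide base table; the intersection design keeps the base table narrow and moves OS-specific columns into per-OS subclass tables.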
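[0215]
In the embodiment illustrated in FIG. 16, two subclasses of Files have been registered: a "Document" class and a "Folder" class. The Document class inherits all of the attributes and methods of the Files class and adds attributes specific to document files. In the illustrated embodiment, the Document class adds the attribute "size".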
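[0216]
The Folder class inherits all of the attributes and methods of the Files class and adds attributes and methods specific to folder files (that is, files, such as directories, that can contain other files). In the illustrated embodiment, the Folder class introduces a new attribute, "max_children", and a new method, "dir_list". The max_children attribute may indicate, for example, the maximum number of child files that a given folder may contain. The dir_list method may, for example, provide a list of all of the child files of a given folder.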
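[0217]
In the class hierarchy illustrated in FIG. 16, the Document class has two registered subclasses: E-mail and Text. Both subclasses inherit all of the attributes and methods of the Document class. In addition, the E-mail class includes three additional properties: Read_flag, priority and sender. The Text class has one additional attribute, CR_Flag, and one additional method, Type. CR_Flag may be a flag that indicates whether a text document contains "carriage return" characters. The Type method outputs a text document to an input/output device such as a computer monitor.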
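The FIG. 16 hierarchy can be mirrored as Python dataclasses. This is only a sketch: the default values, the in-memory `children` list standing in for a folder's stored children, the `content` field, and the method name `type_out` (used for the Type method so as not to shadow Python's built-in `type`) are assumptions.

```python
import sys
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Files:                    # base class: attributes common to all files
    name: str
    created: datetime
    modified: datetime


@dataclass
class Document(Files):
    size: int = 0


@dataclass
class Folder(Files):
    max_children: int = 1000    # default value is an assumption
    # Hypothetical in-memory stand-in for the folder's stored children.
    children: list = field(default_factory=list)

    def dir_list(self):
        """Return the names of all child files of this folder."""
        return [child.name for child in self.children]


@dataclass
class Email(Document):
    read_flag: bool = False
    priority: int = 3
    sender: str = ""


@dataclass
class Text(Document):
    cr_flag: bool = False
    content: str = ""           # hypothetical: text body held in memory

    def type_out(self, write=sys.stdout.write):
        """The Type method: send the text to an output device."""
        write(self.content)
```

An `Email` instance is also a `Document` and a `Files`, so code written against the base classes works on it unchanged.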
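[0218]
File classes and file formats
The internal structure of a file is referred to as the "format" of the file. Typically, the format of a file is dictated by the application that creates it. For example, a document created by one word processor may have a completely different format than a document created by another word processor, even if the two documents have the same semantic content. Some file systems maintain a mapping between document formats and filename extensions. For example, all files with filenames ending in .doc may be assumed to have been created by a particular word processor, and therefore to have the internal structure imposed by that word processor. In other file systems, information about the format of a document is maintained in a separate metafile associated with the document.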
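[0219]
Unlike a file format, the file class mechanism described herein is not concerned with the internal structure of documents. Rather, the file class of a file determines what information the file system maintains for the file and what operations the file system can perform on the file. For example, documents created by many different word processors may all be instances of the Document class. The file system can therefore maintain the same attribute information about those documents, and perform the same operations on them, even though their internal structures are completely different.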
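[0220]
Class tables
According to one embodiment, the object-oriented file system is implemented in a relational database system in which a relational table is created for each class of file. FIG. 17 shows an example of the tables that may be created for the classes illustrated in FIG. 16. Specifically, Files table 1702, Document table 1704, E-mail table 1706, Text table 1708 and the Folder table correspond to the Files class, the Document class, the E-mail class, the Text class and the Folder class, respectively.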
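[0221]
According to one embodiment, the class table for a given class contains rows for (1) files that belong to the given class and (2) files that belong to any descendant class of the given class. For example, in the illustrated system, the Files class is the base class. Every file in the file system is therefore a member of the Files class or of one of its descendant classes, so the Files table contains a row for every file in the file system. On the other hand, the E-mail class and the Text class are descendants of the Document class, but the Files class and the Folder class are not. Therefore, Document table 1704 contains rows for all files of class Document, E-mail or Text, but no rows for files of class Files or Folder.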
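[0222]
The table for each class includes columns for storing values of the attributes introduced by that class. For example, the Document class inherits the attributes of the Files class and adds the size attribute to them. Therefore, the Document table includes a column for storing size values. Similarly, the E-mail class inherits the attributes of the Document class and introduces the read_flag, priority and sender attributes. Therefore, E-mail table 1706 includes columns for storing read_flag values, priority values and sender values.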
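The table layout rule (one table per class, holding only the columns that class introduces) can be sketched with Python's built-in `sqlite3` module. The column spellings, the SQL types, and the presence of a `FileID` column in every table (so that the FileID-based lookup described later in the text has something to search) are assumptions; the `Derived_RowID` column corresponds to the Derived RowID column described later.

```python
import sqlite3

# Hypothetical description of the FIG. 16 hierarchy:
# class -> (parent class, attributes introduced by this class).
CLASSES = {
    "Files":    (None,       {"Name": "TEXT", "Created": "TEXT", "Modified": "TEXT"}),
    "Document": ("Files",    {"Size": "INTEGER"}),
    "Folder":   ("Files",    {"Max_children": "INTEGER"}),
    "Email":    ("Document", {"Read_flag": "INTEGER", "Priority": "INTEGER",
                              "Sender": "TEXT"}),
    "Text":     ("Document", {"CR_Flag": "INTEGER"}),
}


def create_class_tables(conn):
    for cls, (parent, introduced) in CLASSES.items():
        columns = ["RowID INTEGER PRIMARY KEY", "FileID TEXT NOT NULL UNIQUE"]
        if parent is None:
            columns.append("Class TEXT NOT NULL")  # only the base table records the class
        # Only attributes introduced by this class; inherited ones stay in ancestor tables.
        columns += [f"{name} {sql_type}" for name, sql_type in introduced.items()]
        columns.append("Derived_RowID INTEGER")    # link to this file's row in a subclass table
        conn.execute(f'CREATE TABLE "{cls}" ({", ".join(columns)})')
```

The E-mail table ends up with only its three own attribute columns plus the bookkeeping columns; its inherited Name and Size values live in the Files and Document tables.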
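[0223]
Five files are stored in the file system shown in FIG. 17. The file named File1 is stored at RowID X1 of Files table 1702. The FileID of File1 is F1. The class of File1 is the Files class, as indicated by the value stored in the Class column of row X1. Because File1 is an instance of the Files class, Files table 1702 is the only class table that contains information for File1. Consequently, the only attribute values stored for File1 are values for the attributes associated with the Files class.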
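[0224]
The file named File2 is stored at RowID X2 of Files table 1702. The FileID of File2 is F2. The class of File2 is the Document class, as indicated by the value stored in the Class column of row X2. Because File2 is an instance of the Document class, Files table 1702 and Document table 1704 both contain information for File2. That is, the attribute values stored for File2 are values for the attributes associated with the Document class, including the attributes inherited from the Files class.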
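[0225]
The file named File3 is stored at RowID X3 of Files table 1702. The FileID of File3 is F3. The class of File3 is the E-mail class, as indicated by the value stored in the Class column of row X3. Because File3 is an instance of the E-mail class, Files table 1702, Document table 1704 and E-mail table 1706 all contain information for File3. That is, the attribute values stored for File3 are values for the attributes associated with the E-mail class, including the attributes inherited from the Document class and the Files class.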
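[0226]
The file named File4 is stored at RowID X4 of Files table 1702. The FileID of File4 is F4. The class of File4 is the Text class, as indicated by the value stored in the Class column of row X4. Because File4 is an instance of the Text class, Files table 1702, Document table 1704 and Text table 1708 contain information for File4. That is, the attribute values stored for File4 are values for the attributes associated with the Text class, including the attributes inherited from the Document class and the Files class.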
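[0227]
The file named File5 is stored at RowID X5 of Files table 1702. The FileID of File5 is F5. The class of File5 is the Folder class, as indicated by the value stored in the Class column of row X5. Because File5 is an instance of the Folder class, Files table 1702 and the Folder table contain information for File5. That is, the attribute values stored for File5 are values for the attributes associated with the Folder class, including the attributes inherited from the Files class.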
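The pattern in these five examples follows from the row-placement rule: a file has a row in the table for its own class and in the table for every ancestor class. A small sketch, using hypothetical helpers `PARENT` and `tables_holding`, reproduces the table sets listed above.

```python
# Parent of each class in FIG. 16; None marks the base class.
PARENT = {"Files": None, "Document": "Files", "Folder": "Files",
          "E-mail": "Document", "Text": "Document"}

# Class of each of the five example files (from the Class column).
FILES = {"File1": "Files", "File2": "Document", "File3": "E-mail",
         "File4": "Text", "File5": "Folder"}


def tables_holding(file_class):
    """Class tables that hold a row for a file of file_class, base first."""
    chain = []
    while file_class is not None:
        chain.append(file_class)
        file_class = PARENT[file_class]
    return chain[::-1]
```

Read in the other direction, the same rule says which files appear in a given table: the Document table holds rows only for File2, File3 and File4.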
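[0228]
According to one embodiment of the invention, files in the class tables are accessed by traversing the hierarchical index, as described above with reference to FIGS. 5 and 8. Traversing the hierarchical index (as is done during pathname resolution) produces the RowID of the row in Files table 1702 that corresponds to the target file. The values of the Files class attributes are retrieved from that row. For files that belong to other classes, however, additional attributes may have to be retrieved from other class tables. For example, for File3, the creation date and modification date can be retrieved from row X3 of Files table 1702. However, to retrieve the size of File3, row Y2 of Document table 1704 must be accessed, and to retrieve the priority information for File3, row Q1 of E-mail table 1706 must be accessed.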
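[0229]
To facilitate retrieval of the various attribute values that belong to a file, the rows that contain those attributes are linked to one another. In the illustrated embodiment, the links are stored in a column labeled "Derived RowID". The value stored in the Derived RowID column of a particular file's row in a particular class's table points to that file's row in the table for a subclass of that class. For example, the Derived RowID column of Files table row X3 for File3 contains the value Y2, which is the RowID of File3's row in Document table 1704. Likewise, the Derived RowID column of Document row Y2 contains the value Q1, which is the RowID of File3's row in E-mail table 1706.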
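Attribute retrieval along the Derived RowID chain can be sketched as follows. The rows for File3 reuse the RowIDs from the text (X3, Y2, Q1), but the dictionaries standing in for tables and all attribute values (dates, size, sender, and so on) are invented sample data. The sketch uses the Class column to know which table each Derived RowID points into.

```python
# Parent of each class (FIG. 16); None marks the base class.
PARENT = {"Files": None, "Document": "Files", "Folder": "Files",
          "E-mail": "Document", "Text": "Document"}

# Hypothetical rows for File3, keyed by RowID; attribute values are made up.
TABLES = {
    "Files":    {"X3": {"FileID": "F3", "Class": "E-mail", "Name": "File3",
                        "Created": "2000-05-15", "Modified": "2000-05-16",
                        "Derived_RowID": "Y2"}},
    "Document": {"Y2": {"Size": 2048, "Derived_RowID": "Q1"}},
    "E-mail":   {"Q1": {"Read_flag": False, "Priority": 1,
                        "Sender": "alice@example.com", "Derived_RowID": None}},
}


def class_path(file_class):
    """Classes from the base class down to file_class."""
    path = []
    while file_class is not None:
        path.append(file_class)
        file_class = PARENT[file_class]
    return path[::-1]


def fetch_attributes(files_rowid, tables=TABLES):
    """Start at the Files row found by pathname resolution and follow
    Derived_RowID links down the class chain named by the Class column."""
    row_id = files_rowid
    attrs = {}
    for table in class_path(tables["Files"][files_rowid]["Class"]):
        row = tables[table][row_id]
        attrs.update({k: v for k, v in row.items() if k != "Derived_RowID"})
        row_id = row["Derived_RowID"]
    return attrs
```

Each step is a direct RowID lookup, so collecting all of File3's attributes costs one access per table on its class path.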
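[0230]
In the illustrated embodiment, the links between the rows for a particular file are one-way: they lead from the row in the parent class's table to the row in the subclass's table. These one-way links facilitate searches that start from a row in the base table (that is, the Files table), which is the case under most conditions. However, when a search starts from a row in another table, the related rows in the parent class tables cannot be located through the links. To find those related rows, the tables may be searched based on the FileID of the file of interest.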
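[0231]
For example, assume that a user has retrieved row Y2 of Document table 1704 and wishes to retrieve all of the other attribute values for File3. The row containing the E-mail-specific attribute values can be found by following the pointer in the Derived RowID column of row Y2, which points to row Q1 in E-mail table 1706. To find the remaining attributes, however, Files table 1702 is searched based on FileID F3. Such a search finds row X3, which contains the remaining attribute values of File3.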
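The mixed lookup in this example (follow the link downward, search by FileID upward) can be sketched with `sqlite3`. The three tables are cut-down, hypothetical versions of tables 1702, 1704 and 1706, populated only with File3's rows, and the attribute values are sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
# Hypothetical, cut-down versions of tables 1702, 1704 and 1706 holding File3.
conn.executescript("""
CREATE TABLE Files    (Row_ID TEXT PRIMARY KEY, FileID TEXT UNIQUE, Class TEXT,
                       Name TEXT, Derived_Row_ID TEXT);
CREATE TABLE Document (Row_ID TEXT PRIMARY KEY, FileID TEXT UNIQUE, Size INTEGER,
                       Derived_Row_ID TEXT);
CREATE TABLE Email    (Row_ID TEXT PRIMARY KEY, FileID TEXT UNIQUE, Priority INTEGER,
                       Sender TEXT, Derived_Row_ID TEXT);
INSERT INTO Files    VALUES ('X3', 'F3', 'E-mail', 'File3', 'Y2');
INSERT INTO Document VALUES ('Y2', 'F3', 2048, 'Q1');
INSERT INTO Email    VALUES ('Q1', 'F3', 1, 'alice@example.com', NULL);
""")


def attributes_from_document_row(conn, row_id):
    doc = conn.execute("SELECT * FROM Document WHERE Row_ID = ?", (row_id,)).fetchone()
    attrs = {"FileID": doc["FileID"], "Size": doc["Size"]}
    # Downward: the one-way Derived RowID link can simply be followed.
    # (This sketch models only the E-mail subclass of Document.)
    if doc["Derived_Row_ID"] is not None:
        mail = conn.execute("SELECT Priority, Sender FROM Email WHERE Row_ID = ?",
                            (doc["Derived_Row_ID"],)).fetchone()
        attrs.update(dict(mail))
    # Upward: no link points back, so search the Files table by FileID.
    base = conn.execute("SELECT Row_ID, Class, Name FROM Files WHERE FileID = ?",
                        (doc["FileID"],)).fetchone()
    attrs.update(dict(base))
    return attrs
```

Declaring `FileID` UNIQUE also makes SQLite build an index on it, so the upward search is an index lookup rather than a full table scan.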
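[0232]
According to an alternative embodiment, the links between related rows may be implemented in a way that allows all related rows to be located without a FileID lookup. For example, each class table may also have a Parent RowID column that contains the RowID of the related row in the parent class table. Thus, the Parent RowID column for row Y2 of Document table 1704 would point to row X3 in Files table 1702. Alternatively, the last row in the chain of one-way links may contain a pointer back to the related row in the Files table. Yet another option is to provide each class table with a column containing a pointer back to the related row in the Files table. In that case, row R1 of Text table 1708 and row Y3 of Document table 1704 would both contain pointers back to row X4 of Files table 1702.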
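A sketch of the first alternative, a Parent RowID column in every class table, using File4's rows (R1, Y3, X4) from the text. Starting from the Text row, all of File4's attributes are collected by walking upward, with no FileID search. The column names and attribute values are hypothetical.

```python
# Parent of each class table involved (FIG. 16).
PARENT = {"Files": None, "Document": "Files", "Text": "Document"}

# Hypothetical rows for File4 with an added Parent_RowID column;
# attribute values are made up for illustration.
TABLES = {
    "Files":    {"X4": {"FileID": "F4", "Class": "Text", "Name": "File4",
                        "Parent_RowID": None, "Derived_RowID": "Y3"}},
    "Document": {"Y3": {"FileID": "F4", "Size": 512,
                        "Parent_RowID": "X4", "Derived_RowID": "R1"}},
    "Text":     {"R1": {"FileID": "F4", "CR_Flag": True,
                        "Parent_RowID": "Y3", "Derived_RowID": None}},
}

LINK_COLUMNS = {"Parent_RowID", "Derived_RowID"}


def collect_upward(table, row_id, tables=TABLES):
    """Gather a file's attributes starting from any class-table row by
    walking Parent_RowID links toward the Files table (no FileID search)."""
    attrs = {}
    while table is not None:
        row = tables[table][row_id]
        attrs.update({k: v for k, v in row.items() if k not in LINK_COLUMNS})
        table, row_id = PARENT[table], row["Parent_RowID"]
    return attrs
```

The price of the two-way links is one extra column per class table and the need to keep both pointers consistent when rows are inserted or deleted.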
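[0233]
Subclass registration
As described above, a mechanism is provided for extending the class hierarchy of the file system by registering new classes. In general, the information provided during class registration includes data that identifies the parent class of the new class and data that describes the attributes added by the new class. Optionally, the information may also include data used to identify new methods that can be performed on instances of the new class.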
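[0234]
The registration information may be provided to the file system using any of a number of techniques. For example, a user may be presented with a graphical user interface that includes icons representing all of the registered classes, and the user may manipulate controls presented by the user interface to (1) select one of the classes as the parent of the new class, (2) name the new class, (3) define additional attributes for the new class, and (4) define new methods that can be performed on the new class. Alternatively, the user may supply the file system with a file containing the registration information for the new class. The file system parses the file to identify and extract the information, and creates a class file for the new class based on that information.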
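[0235]
According to one embodiment of the invention, class registration information is supplied to the file system in the form of an Extensible Markup Language (XML) file. The XML format is described in detail at www.oasis-open.org/cover/xml.html#contents and at the sites listed there. In general, XML uses tags that name fields and mark the beginning and end of each field, together with the values of those fields. For example, an XML document containing registration information for the "Folder" file class might contain the following information: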
[0236]

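In response to receiving this file class registration document, the file system creates a table for the new Folder class. The new table includes a column for each attribute defined in the registration information. In this example, only the max_children attribute is defined, and the data type specified for it is "integer". Therefore, the Folder table is created with a max_children column that holds integer values. In addition to the name and type of each attribute, various other information may be supplied for each attribute. For example, the registration information may indicate a range or maximum length for attribute values, and may indicate whether the column should be indexed or subject to uniqueness or referential constraints.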
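The following sketch uses a hypothetical registration document for the Folder class: apart from `dbi_classname` and its value `my_folder_methods`, which the text mentions, the tag names (`classdef`, `name`, `superclass`, `attribute`, `attr_name`, `datatype`, `indexed`) are assumptions. It parses the document with `xml.etree.ElementTree` and creates the corresponding table, plus an index where one is requested, with `sqlite3`.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical registration document. Only dbi_classname and its value
# my_folder_methods come from the text; every other tag is an assumption.
REGISTRATION = """
<classdef>
  <name>Folder</name>
  <superclass>Files</superclass>
  <dbi_classname>my_folder_methods</dbi_classname>
  <attribute>
    <attr_name>max_children</attr_name>
    <datatype>integer</datatype>
    <indexed>true</indexed>
  </attribute>
</classdef>
"""

# Hypothetical mapping from registration data types to SQLite column types.
SQL_TYPES = {"integer": "INTEGER", "string": "TEXT", "date": "TEXT"}


def register_class(conn, xml_text):
    """Parse a registration document and create the class table it describes.
    Identifiers are interpolated into SQL, so trusted input is assumed."""
    root = ET.fromstring(xml_text.strip())
    name = root.findtext("name")
    columns = ["RowID INTEGER PRIMARY KEY", "FileID TEXT NOT NULL UNIQUE"]
    indexed = []
    for attr in root.findall("attribute"):
        column = attr.findtext("attr_name")
        columns.append(f"{column} {SQL_TYPES[attr.findtext('datatype')]}")
        if attr.findtext("indexed") == "true":
            indexed.append(column)
    conn.execute(f'CREATE TABLE "{name}" ({", ".join(columns)})')
    for column in indexed:
        conn.execute(f'CREATE INDEX "{name}_{column}_idx" ON "{name}" ({column})')
    return {"name": name,
            "parent": root.findtext("superclass"),
            "methods": root.findtext("dbi_classname")}
```

A fuller version would also check the range, length and constraint options mentioned in the text before generating the DDL.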
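[0237]
The registration information also includes information about any methods supported by the new file class. According to one embodiment, new methods are specified by identifying a file that contains the routines associated with those methods. According to one embodiment, the routines associated with each file class are implemented in a JAVA(R) class. When a first file class is a subclass of a second file class, the JAVA(R) class that implements the methods of the first file class is a subclass of the JAVA(R) class that implements the methods of the second file class.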
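[0238]
In the XML example given above, the dbi_classname field of the registration information specifies the JAVA(R) class file for the Folder file class. Specifically, the registration information supplies the file name "my_folder_methods" for the dbi_classname field, indicating that the my_folder_methods JAVA(R) class implements the routines for the non-inherited methods of the Folder class. Because the Folder class is a subclass of the Files class, the my_folder_methods class is a subclass of the JAVA(R) class that implements the methods of the Files class. Consequently, the my_folder_methods class inherits the Files methods.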
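[0239]
In addition to defining new methods that the parent file class does not support, the routines for a child file class can override the implementations of methods defined in the parent class. For example, the Files class shown in FIG. 16 provides a "store" method, which the Folder class inherits. However, the implementation of the store method provided for the Files class may not be the implementation needed to store folders. The Folder class may therefore provide its own implementation of the store method, overriding the implementation provided by the Files class.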
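The override relationship can be sketched with ordinary subclassing. The text describes JAVA(R) classes; this sketch uses Python for consistency with the other examples, and the class names `FilesMethods` and `MyFolderMethods` (the latter standing in for my_folder_methods) and the returned strings are hypothetical.

```python
class FilesMethods:
    """Hypothetical stand-in for the class implementing the Files methods."""

    def store(self, file_name):
        return f"stored {file_name} as a single row"

    def move(self, file_name, dest):
        return f"moved {file_name} to {dest}"


class MyFolderMethods(FilesMethods):
    """Hypothetical stand-in for my_folder_methods."""

    def store(self, file_name):
        # Override: storing a folder also has to persist its child entries.
        return f"stored folder {file_name} and its child entries"

    def dir_list(self, children):
        # New method that the parent class does not provide.
        return sorted(children)
```

`move` is inherited unchanged, `store` is overridden, and `dir_list` is new, which are the three cases described for a child file class.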
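[0240]
Determining the class of a file
When the file system is asked to perform an operation on a file, it invokes the routine that implements the requested operation for the particular class to which the file belongs. As described above, the same operation may be implemented differently for different file classes, for example when a subclass has overridden the implementation provided by its parent class. Therefore, to ensure that the correct operation is performed, the file system must first identify the class of the file on which the operation is to be performed.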
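[0241]
For files that are already stored in the file system, identifying the class of a file may be trivial. For example, in the embodiment shown in FIG. 17, Files table 1702 includes a Class column that stores, for each row, data indicating the class of the file associated with that row. Thus, when a request is received to perform a "move" operation on File3, the Class column of row X3 is inspected to determine that File3 is of type E-mail, and the E-mail implementation of "move" should therefore be executed. If the E-mail file class overrides the implementation of the "move" method that it inherited, the E-mail implementation of "move" is the implementation provided for the E-mail file class. Otherwise, it is the implementation inherited by the E-mail class.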
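Class-based dispatch can be sketched as a lookup of the Class column followed by an ordinary method call, so that Python's inheritance rules pick either an overriding implementation or the inherited one. The method classes and the `CLASS_COLUMN` sample data are hypothetical; in this sketch E-mail overrides `move` and Text does not.

```python
class FilesMethods:
    """Hypothetical implementation of the Files methods."""
    def move(self, file_id, dest):
        return f"Files.move {file_id} -> {dest}"


class DocumentMethods(FilesMethods):
    pass                                  # inherits move unchanged


class EmailMethods(DocumentMethods):
    def move(self, file_id, dest):        # E-mail overrides move
        return f"E-mail.move {file_id} -> {dest}"


class TextMethods(DocumentMethods):
    pass                                  # Text keeps the inherited move


METHODS_FOR_CLASS = {"Files": FilesMethods, "Document": DocumentMethods,
                     "E-mail": EmailMethods, "Text": TextMethods}

# Class column of the Files table, keyed here by FileID (sample data).
CLASS_COLUMN = {"F1": "Files", "F2": "Document", "F3": "E-mail", "F4": "Text"}


def perform(operation, file_id, *args):
    file_class = CLASS_COLUMN[file_id]                # inspect the Class column
    implementation = METHODS_FOR_CLASS[file_class]()  # class-specific routines
    return getattr(implementation, operation)(file_id, *args)
```

A "move" of File3 therefore runs the E-mail override, while a "move" of File4 falls back to the Files implementation through inheritance.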
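[0242]
Identifying the class of a file can be more difficult when the file is not already stored in the file system. For example, when the file system is asked to store a file that does not yet exist in the file system, it cannot determine the class by inspecting the Files table. Under these conditions, various techniques may be used to identify the type of the file. According to one embodiment, the type of the file may be supplied explicitly in the file operation request. For example, if the request is made in response to a command issued through the command line of an operating system, one of the command-line arguments may indicate the file type of the file. For example, the command may be entered as "move a:\mydocs\file2 c:\yourdocs /class=document".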
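[0243]
Another technique for determining the class of a file is to base the determination on information contained in the file's name. For example, all files having certain extensions (for example, .doc, .wpd, .pwp, and so on) may be treated as members of a particular file class (for example, Document). When the file system is asked to perform an operation on such a file, the method implementations associated with that file class are used.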
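[0244]
Yet another technique for determining the class of a file is to base the determination on the file's location within the file system hierarchy. For example, all files created in a particular directory or set of directories may be assumed to belong to a particular file class, regardless of how the files are named. These and other techniques may be combined in various ways. For example, a file with a particular extension may be treated as a member of a first class unless the file is stored in a directory associated with a second class. If the file is stored in a directory associated with the second class, the file is treated as a member of the second class unless the file operation request explicitly identifies the file as a member of another file class.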
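One possible way to combine the three techniques, following the precedence given in the example above (an explicit class argument beats a directory rule, which beats an extension rule), is sketched below. The rule tables, the `/mail/inbox` directory, the fallback to the base class, and the use of POSIX-style paths are assumptions.

```python
from pathlib import PurePosixPath

# Hypothetical rule tables; class names are lower-case for easy comparison.
EXTENSION_CLASSES = {".doc": "document", ".wpd": "document", ".pwp": "document"}
DIRECTORY_CLASSES = {"/mail/inbox": "email"}
DEFAULT_CLASS = "files"  # assumed fallback: the base class


def explicit_class(args):
    """Return the value of a '/class=...' argument, if one was supplied."""
    for arg in args:
        if arg.lower().startswith("/class="):
            return arg.split("=", 1)[1].lower()
    return None


def determine_class(path, args=()):
    """Explicit request beats directory rule, which beats extension rule."""
    requested = explicit_class(args)
    if requested:
        return requested
    p = PurePosixPath(path)
    for directory, file_class in DIRECTORY_CLASSES.items():
        if PurePosixPath(directory) in p.parents:
            return file_class
    return EXTENSION_CLASSES.get(p.suffix.lower(), DEFAULT_CLASS)
```

A `.doc` file created under `/mail/inbox` is thus treated as e-mail unless the request says otherwise, matching the combined rule in the text.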
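[0245]
Hardware overview
FIG. 18 is a block diagram that illustrates a computer system 1800 upon which an embodiment of the invention may be implemented. Computer system 1800 includes a bus 1802 or other communication mechanism for communicating information, and a processor 1804 coupled with bus 1802 for processing information. Computer system 1800 also includes a main memory 1806, such as a random access memory (RAM) or other dynamic storage device, coupled to bus 1802 for storing information and instructions to be executed by processor 1804. Main memory 1806 may also be used for storing temporary variables or other intermediate information during execution of instructions to be executed by processor 1804. Computer system 1800 further includes a read only memory (ROM) 1808 or other static storage device coupled to bus 1802 for storing static information and instructions for processor 1804. A storage device 1810, such as a magnetic disk or optical disk, is provided and coupled to bus 1802 for storing information and instructions.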
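[0246]
Computer system 1800 may be coupled via bus 1802 to a display 1812, such as a cathode ray tube (CRT), for displaying information to a computer user. An input device 1814, including alphanumeric and other keys, is coupled to bus 1802 for communicating information and command selections to processor 1804. Another type of user input device is cursor control 1816, such as a mouse, a trackball, or cursor direction keys, for communicating direction information and command selections to processor 1804 and for controlling cursor movement on display 1812. This input device typically has two degrees of freedom in two axes, a first axis (e.g., x) and a second axis (e.g., y), which allows the device to specify positions in a plane.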
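[0247]
The invention is related to the use of computer system 1800 for implementing the techniques described herein. According to one embodiment of the invention, those techniques are implemented by computer system 1800 in response to processor 1804 executing one or more sequences of one or more instructions contained in main memory 1806. Such instructions may be read into main memory 1806 from another computer-readable medium, such as storage device 1810. Execution of the sequences of instructions contained in main memory 1806 causes processor 1804 to perform the process steps described herein. In alternative embodiments, hard-wired circuitry may be used in place of, or in combination with, software instructions to implement the invention. Thus, embodiments of the invention are not limited to any specific combination of hardware circuitry and software.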
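[0248]
The term "computer-readable medium" as used herein refers to any medium that participates in providing instructions to processor 1804 for execution. Such a medium may take many forms, including but not limited to non-volatile media, volatile media, and transmission media. Non-volatile media include, for example, optical or magnetic disks, such as storage device 1810. Volatile media include dynamic memory, such as main memory 1806. Transmission media include coaxial cables, copper wire and fiber optics, including the wires that make up bus 1802. Transmission media can also take the form of acoustic or light waves, such as those generated during radio-wave and infrared data communications.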
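[0249]
Common forms of computer-readable media include, for example, a floppy disk, a flexible disk, a hard disk, magnetic tape or any other magnetic medium, a CD-ROM or any other optical medium, punch cards, paper tape or any other physical medium with patterns of holes, a RAM, a PROM, an EPROM, a FLASH-EPROM or any other memory chip or cartridge, a carrier wave as described hereinafter, or any other medium from which a computer can read.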
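[0250]
Various forms of computer-readable media may be involved in carrying one or more sequences of one or more instructions to processor 1804 for execution. For example, the instructions may initially be carried on a magnetic disk of a remote computer. The remote computer can load the instructions into its dynamic memory and send them over a telephone line using a modem. A modem local to computer system 1800 can receive the data on the telephone line and use an infrared transmitter to convert the data to an infrared signal. An infrared detector can receive the data carried in the infrared signal, and appropriate circuitry can place the data on bus 1802. Bus 1802 carries the data to main memory 1806, from which processor 1804 retrieves and executes the instructions. The instructions received by main memory 1806 may optionally be stored on storage device 1810 either before or after execution by processor 1804.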
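[0251]
Computer system 1800 also includes a communication interface 1818 coupled to bus 1802. Communication interface 1818 provides a two-way data communication coupling to a network link 1820 that is connected to a local network 1822. For example, communication interface 1818 may be an integrated services digital network (ISDN) card or a modem that provides a data communication connection to a corresponding type of telephone line. As another example, communication interface 1818 may be a local area network (LAN) card that provides a data communication connection to a compatible LAN. Wireless links may also be implemented. In any such implementation, communication interface 1818 sends and receives electrical, electromagnetic or optical signals that carry digital data streams representing various types of information.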
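[0252]
Network link 1820 typically provides data communication through one or more networks to other data devices. For example, network link 1820 may provide a connection through local network 1822 to a host computer 1824 or to data equipment operated by an Internet Service Provider (ISP) 1826. ISP 1826 in turn provides data communication services through the worldwide packet data communication network now commonly referred to as the "Internet" 1828. Local network 1822 and Internet 1828 both use electrical, electromagnetic or optical signals that carry digital data streams. The signals through the various networks, and the signals on network link 1820 and through communication interface 1818, which carry the digital data to and from computer system 1800, are exemplary forms of carrier waves transporting the information.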
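[0253]
Computer system 1800 can send messages and receive data, including program code, through the network(s), network link 1820 and communication interface 1818. In the Internet example, a server 1830 might transmit requested code for an application program through Internet 1828, ISP 1826, local network 1822 and communication interface 1818. In accordance with the invention, one such downloaded application implements the techniques described herein.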
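[0254]
The received code may be executed by processor 1804 as it is received, and/or stored in storage device 1810 or other non-volatile storage for later execution. In this manner, computer system 1800 may obtain application code in the form of a carrier wave.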
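[0255]
In the foregoing specification, the invention has been described with reference to specific embodiments thereof. It will, however, be evident that various modifications and changes may be made thereto without departing from the broader spirit and scope of the invention. The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense.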
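[Brief Description of the Drawings]
[FIG. 1] A block diagram illustrating how, in a conventional application, data is stored through the file system provided by an operating system.
[FIG. 2] A block diagram illustrating how, in a conventional database application, data is stored through the database API provided by a database system.
[FIG. 3] A block diagram of a system in which the same set of data can be accessed through various interfaces, including a database API and an OS file system API.
[FIG. 4] A block diagram showing translation engine 308 in greater detail.
[FIG. 5] A block diagram of a hierarchical index.
[FIG. 6] A block diagram of a file hierarchy that can be emulated by a hierarchical index.
[FIG. 7] A block diagram of a file table that can be used to store files in a relational database, according to an embodiment of the invention.
[FIG. 8] A flowchart illustrating the steps of resolving a pathname using a hierarchical index.
[FIG. 9] A block diagram showing the database file server in greater detail.
[FIG. 10] A block diagram of a hierarchical index that includes entries for stored query directories.
[FIG. 11] A block diagram of a file table that includes rows for stored query directories.
[FIG. 12] A block diagram of a file hierarchy that includes stored query directories.
[FIG. 13] A block diagram of a file hierarchy.
[FIG. 14] A block diagram illustrating how the file hierarchy shown in FIG. 13 is updated in response to an update of a document, according to an embodiment of the versioning techniques described herein.
[FIG. 15] A block diagram illustrating how the file hierarchy shown in FIG. 13 is updated in response to a document being moved from one folder to another, according to an embodiment of the versioning techniques described herein.
[FIG. 16] A block diagram of a class hierarchy of file classes, according to an embodiment of the invention.
[FIG. 17] A block diagram of the relational tables used in a database-implemented file system that implements the file class hierarchy of FIG. 16, according to an embodiment of the invention.
[FIG. 18] A block diagram of a computer system on which embodiments of the invention may be implemented.
[0001]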
[Priority Claim and Cross-Reference to Related Applications]
This application claims domestic priority under 35 U.S.C. § 119(e) from prior U.S. Provisional Patent Application Serial No. 60/147,538, entitled "Internet File System," filed on August 5, 1999 by Eric Sedlar, the entire contents of which are incorporated by reference as if fully set forth herein.
[0002]
This application is related to U.S. Patent Application Serial No. 09/251,757, entitled "Hierarchical Indexing for Accessing Hierarchically Organized Information in a Relational System," filed on February 18, 1999 by Eric Sedlar, the entire contents of which are incorporated by reference as if fully set forth herein.
[0003]
This application is related to U.S. Patent Application Serial No. 09/571,496, entitled "File System that Supports Transactions," filed on May 15, 2000 by Eric Sedlar, the entire contents of which are incorporated by reference as if fully set forth herein.
[0004]
This application is related to U.S. Patent Application Serial No. 09/571,060, entitled "Stored Query Directories," filed on May 15, 2000 by Eric Sedlar, the entire contents of which are incorporated by reference as if fully set forth herein.
[0005]
This application is related to U.S. Patent Application Serial No. 09/571,036, entitled "Event Notification System Tied to a File System," filed on May 15, 2000 by Eric Sedlar, the entire contents of which are incorporated by reference as if fully set forth herein.
[0006]
This application is related to U.S. Patent Application Serial No. 09/571,492, entitled "Object File System with Typed Files," filed on May 15, 2000 by Eric Sedlar, the entire contents of which are incorporated by reference as if fully set forth herein.
[0007]
This application is related to U.S. Patent Application Serial No. 09/571,568, entitled "On-the-fly Format Conversion," filed on May 15, 2000 by Eric Sedlar, the entire contents of which are incorporated by reference as if fully set forth herein.
[0008]
This application is related to U.S. Patent Application Serial No. 09/571,696, entitled "Versioning in Internet File System," filed on May 15, 2000 by Eric Sedlar and Michael J. Roberts, the entire contents of which are incorporated by reference as if fully set forth herein.
[0009]
This application is related to U.S. Patent Application Serial No. 09/571,508, entitled "Multi-Model Access to Data," filed on May 15, 2000 by Eric Sedlar, the entire contents of which are incorporated by reference as if fully set forth herein.
[0010]
Field of the Invention
The present invention relates generally to electronic file systems, and more particularly to a system that implements an operating system file system using a database system.
[0011]
BACKGROUND OF THE INVENTION
Humans tend to organize information into categories, and the categories into which information is organized are themselves typically related to one another in some form of hierarchy. For example, an individual animal belongs to a species, the species belongs to a genus, the genus belongs to a family, the family belongs to an order, and the order belongs to a class.
[0012]
With the advent of computer systems, techniques for storing electronic information have been developed that largely reflect this human desire for hierarchical organization. Conventional operating systems, for example, provide file systems that use hierarchy-based organizing principles. Specifically, in a typical operating system file system ("OS file system"), directories are arranged in a hierarchy, and documents are stored in the directories. Ideally, the hierarchical relationships between the directories reflect some intuitive relationship between the meanings that have been assigned to the directories. Similarly, each document is ideally stored in a directory based on some intuitive relationship between the contents of the document and the meaning assigned to that directory.
[0013]
FIG. 1 illustrates a typical mechanism by which a software application that creates and uses files (such as a word processor) stores those files in a hierarchical file system. Referring to FIG. 1, operating system 104 exposes an application programming interface (API) to application 102. The exposed API allows application 102 to call routines provided by the operating system. Hereinafter, the portion of the OS API associated with the routines that implement the OS file system is referred to as the OS file API. Application 102 calls file system routines through the OS file API to retrieve data from, and store data on, disk 108. Operating system 104, in turn, makes calls to device driver 106, which controls access to disk 108, to retrieve files from and store files on disk 108.
[0014]
The OS file system routines implement the hierarchical organization of the file system. For example, the OS file system routines maintain information about the hierarchical relationships between files and give application 102 access to files based on their locations within the hierarchy.
[0015]
In contrast to hierarchically organized storage, a relational database stores information in tables made up of rows and columns. Each row is identified by a unique RowID. Each column represents an attribute of a record, and each row represents a particular record. Data is retrieved from the database by submitting queries to the database management system (DBMS) that manages the database.
[0016]
FIG. 2 illustrates a typical mechanism used by database applications to access information in a database. Referring to FIG. 2, database application 202 interacts with database server 204 through an API provided by database server 204 (the "database API"). The exposed API allows database application 202 to access data using queries constructed in a database language supported by database server 204. One language supported by many database servers is the Structured Query Language (SQL). To database application 202, database server 204 makes it appear as if all data is stored in rows of tables. Transparently to database application 202, however, database server 204 actually interacts with operating system 104 to store the data as files in the OS file system. Operating system 104 makes calls to device driver 106 to retrieve files from, or store files on, disk 108.
[0017]
Each type of storage system has advantages and limitations. A hierarchically organized storage system is simple, intuitive and easy to implement, and it is the standard model used by most application programs. Unfortunately, however, this simple hierarchical organization does not provide the support needed for complex data retrieval operations. For example, retrieving all documents that have a particular file name and were created on a particular date may require examining the contents of every directory. Because every directory must be searched, the hierarchical organization does nothing to speed up the search.
[0018]
Relational database systems are well suited to storing large amounts of information and to accessing data in highly flexible ways. Unlike a hierarchically organized system, a relational database system can easily and efficiently retrieve even data that satisfies complex search criteria. However, formulating a query and submitting it to a database server is less intuitive than simply navigating through a directory hierarchy, and it is beyond the technical comfort level of many computer users.
[0019]
Currently, application developers must choose whether the data created by their applications will be accessible through the hierarchical file system provided by the operating system or through the more complex query interface provided by a database system. In general, applications that do not require the complex search capabilities of a database system are designed to store their data using the more common and simpler hierarchical file system provided by the operating system. This simplifies both the design and the use of the application, but limits the flexibility and power with which the data can be accessed.
[0020]
In contrast, when complex search capabilities are required, applications are designed to access their data using the query mechanism provided by a database system. This increases the flexibility and power with which the data can be accessed, but it also increases the complexity of the application from both the designer's and the user's perspective. Furthermore, a database system must be present, which imposes additional cost on the application's users.
[0021]
In view of the above, it is clearly desirable for an application to be able to access data using a relatively simple OS file API. It is further desirable to be able to access the same data using a more powerful database API.
[0022]
Summary of the Invention
Techniques for accessing data stored in a database are provided. According to one technique, an application makes one or more calls to the operating system to access a file. The operating system includes routines that implement an operating system file system. The one or more calls are made to a routine that implements the operating system file system. In response to the one or more calls, one or more database commands are issued to a database server that manages the database. The database server executes the database command to retrieve data from the database. A file is generated from the data and provided to the application.
[0023]
The present invention will now be described by way of example and not limitation with reference to the accompanying drawings. In the figures, the same reference numerals represent the same elements.
[0024]
Detailed Description of the Preferred Embodiment
Methods and systems are provided that allow the same set of data to be accessed through various interfaces, including database APIs and OS file system APIs. In the following description, for the purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the present invention. It will be apparent, however, to one skilled in the art that the present invention may be practiced without these specific details. In other instances, well-known structures and devices are shown in block diagram form in order to avoid unnecessarily obscuring the present invention.
[0025]
Architectural overview
FIG. 3 is a block diagram of the architecture of a system 300 implemented according to one embodiment of the invention. Like the system shown in FIG. 2, system 300 includes a database server 204 that provides a database API through which database application 312 can access data managed by database server 204. From the perspective of every entity that accesses the data managed by database server 204 through the database API, that data is stored in tables that can be queried using a database language supported by database server 204 (e.g., SQL). Transparently to these entities, database server 204 stores the data on disk 108. According to one embodiment, database server 204 implements disk management logic that allows data to be stored directly on disk, thereby avoiding the overhead associated with the OS file system of operating system 104. Thus, database server 204 may store data on disk either (1) by calling the OS file system provided by operating system 104, or (2) by storing the data directly on disk, bypassing operating system 104.
[0026]
Unlike the system of FIG. 2, system 300 provides a translation engine 308 that translates I/O commands received from operating systems 304a and 304b into database commands, which translation engine 308 issues to database server 204. When an I/O command requests that data be stored, translation engine 308 issues database commands to database server 204 that cause the data to be stored in relational tables managed by database server 204. When an I/O command requests that data be retrieved, translation engine 308 issues database commands to database server 204 that retrieve the data from the relational tables managed by the database server. Translation engine 308 then provides the retrieved data to the operating system that issued the I/O command.
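The translation step can be sketched as a small class that turns "save file" and "load file" requests into SQL statements against a relational table. This is a minimal sketch that uses `sqlite3` as a stand-in for database server 204; the `files` table, its columns, and the method names are hypothetical, and a real translation engine would also have to handle directories, file metadata, and the other OS I/O commands.

```python
import sqlite3


class TranslationEngine:
    """Hypothetical sketch: maps OS-level file I/O commands onto SQL."""

    def __init__(self, conn):
        self.conn = conn
        # One row per file; the whole file body is kept in a BLOB column.
        conn.execute("CREATE TABLE IF NOT EXISTS files "
                     "(path TEXT PRIMARY KEY, content BLOB NOT NULL)")

    def save_file(self, path, data):
        """'Save file' I/O command -> insert or replace the file's row."""
        with self.conn:  # commit on success, roll back on error
            self.conn.execute(
                "INSERT OR REPLACE INTO files (path, content) VALUES (?, ?)",
                (path, bytes(data)))

    def load_file(self, path):
        """'Load file' I/O command -> SELECT, then build the file from the row."""
        row = self.conn.execute(
            "SELECT content FROM files WHERE path = ?", (path,)).fetchone()
        if row is None:
            raise FileNotFoundError(path)
        return bytes(row[0])
```

Because the data lives in an ordinary table, a row inserted directly with SQL (as database application 312 would) can be loaded through `load_file`, and a file saved through `save_file` can be queried with SQL.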
[0027]
The fact that data communicated to translation engine 308 is ultimately stored in relational tables managed by database server 204 is transparent to operating systems 304a and 304b. Because it is transparent to those operating systems, it is also transparent to applications 302a and 302b, which run on platforms that include them.
[0028]
For example, assume that the user of application 302a selects the "save file" option provided by application 302a. Application 302a makes a call through the OS file API that causes operating system 304a to save the file. Operating system 304a issues an I/O command to translation engine 308 to store the file. In response, translation engine 308 issues one or more database commands to database server 204 that cause database server 204 to store the data contained in the file in a relational table held by database server 204. Database server 204 may store this data directly on disk, or may call operating system 104 to store the data in the OS file system provided by operating system 104. If database server 204 calls operating system 104, operating system 104 responds by sending commands to device driver 106 to store the data on disk 108.
[0029]
As another example, assume that the user of application 302a selects the "load file" option provided by application 302a. Application 302a makes a call through the OS file API that causes operating system 304a to load the file. Operating system 304a issues an I/O command to translation engine 308 to load the file. Translation engine 308 issues one or more database commands to database server 204 that cause database server 204 to retrieve, from a relational table held by database server 204, the data that makes up the requested file. During the retrieval, database server 204 may retrieve the data directly, or may call operating system 104 to retrieve the data from OS files on disk 108. Once the data is retrieved, the requested file is "built" from the retrieved data. Specifically, the retrieved data is put into the format expected by application 302a, which requested the file. The file thus built is passed to application 302a through translation engine 308 and operating system 304a.
[0030]
System 300 incorporates a number of novel features, which the following sections describe in more detail. Specific embodiments are used to illustrate these features, but the invention is not limited to these specific embodiments.
[0031]
OS file system access to relationally stored data
In accordance with one aspect of the present invention, the system 300 allows applications to access data stored in a database through a conventional OS file API. That is, a conventional application designed to load files by calling the standard OS file API provided by its operating system can load a file that is built on the fly from data stored in a relational table. Furthermore, the fact that the file is generated from a relational table is completely transparent to the application.
[0032]
For example, assume that the database application 312 issues a database command that inserts a row of data into a table in the database held by the database server 204. Once the row is inserted, the application 302a, which is designed only to access data through the relatively simple OS file API provided by the operating system 304a, issues an “open file” command to the operating system 304a. In response, the operating system 304a issues an I/O command to the translation engine 308, which responds by issuing one or more database commands to the database server 204. By executing these database commands (typically in the form of a database query), the database server 204 retrieves the row inserted by the database application 312. A file of the file type expected by the application 302a is constructed from the data contained in the row, and the file thus constructed is returned to the application 302a through the translation engine 308 and the operating system 304a.
[0033]
Not only does the system 300 allow applications that support only conventional OS file system access to load relationally stored data; it also allows database applications to use conventional query techniques to access information stored by applications that support only conventional OS file system access. For example, assume that the application 302a makes an OS call to save a file it has created. The “save file” command is communicated to the database server 204 through the operating system 304a and the translation engine 308. The database server 204 receives the “save file” command in the form of database commands issued by the translation engine 308, and stores the data contained in the file in one or more rows of one or more tables in the database managed by the database server 204. Once the data is stored in the database in this manner, the database application 312 can issue a database query to the database server 204 to retrieve the data from the database.
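The two-way interoperability described above can be sketched in a few lines. This is a minimal illustration, not the patented translation engine: it assumes SQLite (through Python's standard `sqlite3` module) as a stand-in for the database server 204, and the table `files`, its `path` and `body` columns, and the helpers `save_file` and `load_file` are hypothetical names chosen for the example.

```python
import sqlite3

# Hypothetical file table: one row per file, contents kept in a BLOB column.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE files (path TEXT PRIMARY KEY, body BLOB)")

def save_file(db, path, data):
    """Roughly what a translation engine might issue for an OS 'save file' call."""
    with db:  # one database transaction, committed on success
        db.execute("INSERT OR REPLACE INTO files (path, body) VALUES (?, ?)", (path, data))

def load_file(db, path):
    """Roughly what it might issue for an OS 'open'/'load file' call."""
    row = db.execute("SELECT body FROM files WHERE path = ?", (path,)).fetchone()
    if row is None:
        raise FileNotFoundError(path)  # surfaced to the application as an OS error
    return row[0]

# Direction 1: a database application inserts a row with plain SQL ...
with db:
    db.execute("INSERT INTO files VALUES (?, ?)", ("/reports/q1.txt", b"revenue up"))
# ... and a file-API client loads it as an ordinary file.
print(load_file(db, "/reports/q1.txt"))   # b'revenue up'

# Direction 2: a file saved through the file API is queryable with SQL.
save_file(db, "/notes/todo.txt", b"call Bob")
print(db.execute(
    "SELECT path FROM files WHERE CAST(body AS TEXT) LIKE ?", ("%Bob%",)
).fetchall())                             # [('/notes/todo.txt',)]
```

A real translation engine would also build the file in whatever format the requesting application expects; here the stored bytes are returned unchanged.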
[0034]
Emulate OS file system organization in the database
As explained above, calls to the file system routines of the operating systems 304a and 304b are ultimately translated into database commands issued by the translation engine 308 to the database server 204. According to one embodiment of the present invention, the process of performing these conversions is simplified by emulating, within the database server 204, the file system features implemented by the operating systems 304a and 304b.
[0035]
With respect to organization, most operating systems implement a file system that organizes files in a file hierarchy. Thus, the OS file system calls made by the applications 302a and 302b will typically identify a file by its location in the OS file hierarchy. To simplify the conversion of such calls into corresponding database calls, a mechanism is provided for emulating a hierarchical file system in the relational database system. One such mechanism is described in U.S. patent application Ser. No. 09/251,757, entitled “Hierarchical Indexing for Accessing Hierarchically Organized Information in a Relational System”, filed by Eric Sedlar on February 18, 1999, the entire contents of which are incorporated herein by reference.
[0036]
Specifically, the “Hierarchical Indexing” application describes techniques for emulating a hierarchically organized system in a relational system by creating, maintaining, and using a hierarchical index to efficiently access information based on path names. Each item that has any children in the emulated hierarchical system has an index entry in the index. The index entries are linked together in a manner that reflects the hierarchical relationships among the items associated with them. Specifically, if a parent-child relationship exists between the items associated with two index entries, the index entry associated with the parent item has a direct link to the index entry associated with the child item.
[0037]
As a result, path name resolution is performed by following a direct link between index entries associated with items in the path name according to the sequence of file names in the path name. By using an index in which index entries are linked in this manner, the process of accessing items based on their pathname is significantly accelerated, and the number of disk accesses made during that process is significantly reduced.
[0038]
Hierarchical index
A hierarchical index consistent with the present invention supports path-name-based access methods that move from parent items to their children as specified by the path name. According to one embodiment, a hierarchical index consistent with the principles of the present invention employs index entries that include the following three fields: RowID, FileID, and Dir_entry_list (stored as an array).
[0039]
FIG. 5 shows a hierarchical index 510 that can be used to emulate a hierarchical storage system in a database. FIG. 6 shows a particular file hierarchy that the hierarchical index 510 emulates. FIG. 7 shows a file table 710 used to store the files shown in FIG. 6 in the relational database.
[0040]
The hierarchical index 510 is a table. Its RowID column contains system-generated IDs that specify disk addresses, enabling the database server 204 to locate rows on disk. In some relational database systems, the RowID may be an implicitly defined field that the DBMS uses to locate data stored on the disk drive. The FileID field of an index entry stores the FileID of the file associated with that index entry.
[0041]
In accordance with one embodiment of the present invention, the hierarchical index 510 stores index entries only for items that have children. Thus, in terms of the emulated hierarchical file system, the only items that have index entries in the hierarchical index 510 are directories that are parents of other directories and/or directories that currently store documents. Items that have no children (e.g., Example.doc, Access, App1, App2, and App3 in FIG. 6) are preferably not included. The Dir_entry_list field of the index entry for a given file stores, in an array, an “array entry” for each child file of the given file.
[0042]
For example, index entry 512 is for a Windows® directory 614. Word directory 616 and Access directory 620 are children of Windows® directory 614. Thus, the Dir_entry_list field of the index entry 512 for the Windows® directory 614 includes an array entry for the Word directory 616 and an array entry for the Access directory 620.
[0043]
According to one embodiment, the specific information that the Dir_entry_list field stores for each child includes the child's file name and the child's FileID. For children that have their own entry in the hierarchical index 510, the Dir_entry_list field also stores the RowID of the child's index entry. For example, the Word directory 616 has its own entry in the hierarchical index 510 (entry 514). Accordingly, the Dir_entry_list field of the index entry 512 includes the name of the directory 616 (“Word”), the RowID (“Y3”) of the index entry for the directory 616 in the hierarchical index 510, and the FileID (“X3”) of the directory 616. As described in more detail below, the information contained in the Dir_entry_list field makes access to information based on path names faster and easier.
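The index-entry layout can be modeled with two small record types. This is a sketch under stated assumptions: the class and function names are hypothetical, RowIDs are generated here as Y1, Y2, … in the order directories are visited, and the tree is a simplified subset of FIG. 6 in which the root FileID X1 and the Access FileID X5 are invented for illustration. It shows the rule that only items with children receive an index entry, and that each array entry carries a child RowID only when the child has its own entry.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Dict, List, Optional, Tuple

@dataclass
class ArrayEntry:
    name: str                     # child's file name
    file_id: str                  # child's FileID (key into the file table)
    row_id: Optional[str] = None  # RowID of the child's own index entry, if any

@dataclass
class IndexEntry:
    row_id: str
    file_id: str
    dir_entry_list: List[ArrayEntry] = field(default_factory=list)

# A tree node is (FileID, {child name: child node}).
Node = Tuple[str, Dict[str, "Node"]]

def build_index(root: Node) -> Dict[str, IndexEntry]:
    """Create index entries only for items that have children."""
    index: Dict[str, IndexEntry] = {}
    ids = count(1)

    def visit(node: Node) -> Optional[str]:
        file_id, children = node
        if not children:
            return None                      # leaf: no index entry
        row_id = f"Y{next(ids)}"             # hypothetical RowID scheme
        entry = IndexEntry(row_id, file_id)
        index[row_id] = entry
        for name, child in children.items():
            entry.dir_entry_list.append(ArrayEntry(name, child[0], visit(child)))
        return row_id

    visit(root)
    return index

tree: Node = ("X1", {                        # root "/" (FileID X1 is hypothetical)
    "Windows": ("X2", {
        "Word": ("X3", {"Example.doc": ("X4", {})}),
        "Access": ("X5", {}),                # FileID X5 is hypothetical
    }),
})
index = build_index(tree)
for e in index["Y2"].dir_entry_list:
    print(e)
# ArrayEntry(name='Word', file_id='X3', row_id='Y3')
# ArrayEntry(name='Access', file_id='X5', row_id=None)
```

Keeping a directory's whole `dir_entry_list` inside one record mirrors the locality goal discussed next: one read brings in every child of that directory.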
[0044]
Some key principles of hierarchical indexes are:
The Dir_entry_list information of the index entry for a given directory is kept together in as few disk blocks as possible. This is because the most frequently used file system operations (path name resolution and directory listing) need to look at many of the entries in a directory whenever that directory is referenced. In other words, directory entries should have high locality of reference, because when a particular directory entry is referenced, other entries in the same directory are often referenced as well.
[0045]
The information stored in the index entries of the hierarchical index must be kept to a minimum so that as many entries as possible fit in a single disk block. Grouping directory entries together in an array means that the entries do not need repeated keys to identify the directory that contains them; all entries in a directory share the same key.
[0046]
The time taken to resolve a path name should be proportional to the number of directories in the path, not to the total number of files in the file system. This allows users to reduce access time by keeping frequently accessed files near the top of the file system tree.
[0047]
All of these elements are present in typical file system directory structures, such as the inodes and directories of UNIX® systems. The hierarchical index described here maps these goals onto a structure that a relational database can understand and query, which allows the database server to perform ad hoc searches for files through access paths other than the one used for path name resolution. Doing so relies on the database concept of an index: a duplicate of part of the underlying information (here, the file data), arranged in a separate data structure that is organized differently in order to optimize a particular access method (here, path name resolution in a hierarchical tree).
[0048]
Using hierarchical indexes
How the hierarchical index 510 can be used to access a file based on its path name will now be described with reference to the flowchart of FIG. For illustrative purposes, assume that the document 618 is accessed via its path name. The path name of this file is /Windows®/Word/Example.doc, hereinafter referred to as the “input path name”. Given this path name, the path name resolution process begins by locating, in the hierarchical index 510, the index entry for the first name in the input path name. In a hierarchical file system, the first name in a path name is the root directory. Accordingly, the process for locating the file in the emulated file system begins by locating the index entry 508 for the root directory 610 (step 800). Because every path name resolution operation begins by accessing the root directory index entry 508, data indicating the location of the index entry 508 can be kept in a convenient location outside the hierarchical index 510, so that the index entry 508 can be located quickly at the start of every search.
[0049]
Once the index entry 508 for the root directory 610 has been located, the DBMS determines whether any more file names remain in the input path name (step 802). If no more file names remain, control proceeds to step 820, where the FileID stored in the index entry 508 is used to look up the root directory's entry in the file table 710.
[0050]
In this example, the file name “Windows®” follows the root directory symbol “/” in the input path name, so control proceeds to step 804. In step 804, the next file name (e.g., “Windows®”) is selected from the input path name. In step 806, the DBMS searches the Dir_entry_list field of the index entry 508 for the array entry for the selected file name.
[0051]
In this example, the file name that follows the root directory in the input path name is “Windows®”. Thus, step 806 involves searching the Dir_entry_list of the index entry 508 for the array entry with the file name “Windows®”. If Dir_entry_list does not contain an array entry for the selected file name, control proceeds from step 808 to step 810, where an error is generated indicating that the input path name is invalid. In this example, the Dir_entry_list of the index entry 508 includes an array entry of “Windows®”. Therefore, control passes from step 808 to step 822.
[0052]
The information in the Dir_entry_list of the index entry 508 indicates that one of the children of the root directory 610 is indeed a file named “Windows®”. In addition, the array entry in the Dir_entry_list contains the following information about this child: its index entry is located at RowID Y2, and its FileID is X2.
[0053]
In step 822, it is determined whether any more file names remain in the input path name. If none remain, control passes from step 822 to step 820. In this example, “Windows®” is not the last file name, so control passes to step 824 instead.
[0054]
Because “Windows®” is not the last file name in the input path, the FileID information included in the Dir_entry_list is not used during this path resolution operation. The file table 710 is not examined at this point, because the Windows® directory 614 is only part of the identified path and not its target. Instead, in step 824, the RowID (Y2) for “Windows®” found in the Dir_entry_list of the index entry 508 is used to locate the index entry for the Windows® directory 614 (index entry 512).
[0055]
By examining the Dir_entry_list of the index entry 512, the system searches for the next file name in the input path name (steps 804 and 806). In this example, the file name “Word” follows the file name “Windows®” in the input path. Thus, the system searches the Dir_entry_list of the index entry 512 for the “Word” array entry. Such an entry is present in the Dir_entry_list of the index entry 512, indicating that “Windows®” actually has a child named “Word” (step 808). In step 822, it is determined that there is still a file name in the input path, so control proceeds to step 824.
[0056]
Upon finding the array entry for “Word”, the system reads the information in that array entry and finds that the index entry for the Word directory 616 is located at RowID Y3 in the hierarchical index 510, and that information specific to the Word directory 616 is located in row X3 of the file table 710. Because the Word directory 616 is only part of the identified path and not its target, the file table 710 is not examined. Instead, the system uses the RowID (Y3) to locate the index entry 514 for the Word directory 616 (step 824).
[0057]
At RowID Y3 of the hierarchical index 510, the system finds the index entry 514. In step 804, the next file name, “Example.doc”, is selected from the input path name. In step 806, the Dir_entry_list of the index entry 514 is searched, and an array entry for “Example.doc” is found (step 808), indicating that “Example.doc” is a child of the Word directory 616. The system also finds that Example.doc has no index entry in the hierarchical index 510, and that information specific to Example.doc can be found in the file table 710 using FileID X4. Because Example.doc is the target file being accessed (i.e., the last file name in the input path), control passes to step 820, where the system uses FileID X4 to access the appropriate row in the file table 710 and extracts the file body (BLOB) stored in the body column of that row. In this way, the Example.doc file is accessed.
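The walk just described can be condensed into a short function. This is a minimal sketch, not the patent's implementation: the index is an in-memory dictionary keyed by RowID rather than a database table, the root's RowID Y1, the root FileID X1, and the Access FileID X5 are hypothetical values, and the comments tie each branch to the step numbers used above. The file table itself is not modeled; the function stops at step 820 and returns the FileID that would be used to fetch the row.

```python
from typing import Dict, Optional, Tuple

# RowID -> (FileID, Dir_entry_list); Dir_entry_list maps a child name to
# (child FileID, child RowID or None when the child has no index entry).
Index = Dict[str, Tuple[str, Dict[str, Tuple[str, Optional[str]]]]]

INDEX: Index = {
    "Y1": ("X1", {"Windows": ("X2", "Y2")}),                      # root (IDs hypothetical)
    "Y2": ("X2", {"Word": ("X3", "Y3"), "Access": ("X5", None)}),
    "Y3": ("X3", {"Example.doc": ("X4", None)}),
}
ROOT_ROW_ID = "Y1"  # kept outside the index so every search starts quickly

def resolve(index: Index, path: str, root_row_id: str = ROOT_ROW_ID) -> str:
    """Return the FileID for `path` by following RowID links between index entries."""
    names = [n for n in path.split("/") if n]
    file_id, dir_entries = index[root_row_id]          # step 800: root index entry
    for i, name in enumerate(names):                    # steps 802/804: next name
        if name not in dir_entries:                     # steps 806/808: search array
            raise FileNotFoundError(f"invalid path name: {path}")  # step 810
        file_id, row_id = dir_entries[name]
        if i == len(names) - 1:                         # step 822: last name reached
            break
        if row_id is None:
            # An intermediate name without an index entry has no children.
            raise FileNotFoundError(f"invalid path name: {path}")
        _, dir_entries = index[row_id]                  # step 824: follow the RowID
    return file_id                                      # step 820: FileID for file table

print(resolve(INDEX, "/Windows/Word/Example.doc"))      # X4
print(resolve(INDEX, "/"))                              # X1
```

The loop touches one index entry per directory in the path and never scans the file table, which is the property the following paragraphs rely on.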
[0058]
Only the hierarchical index 510 was used to access this file; no table scan was necessary. Given typical block sizes and file name lengths, at least 600 directory entries fit into one disk block, and a typical directory has fewer than 600 entries. Thus, the list of directory entries for a given directory typically fits into a single block. In other words, each index entry of the hierarchical index 510, including its entire Dir_entry_list array, typically fits into a single block and can therefore be read in a single I/O operation.
[0059]
When moving from index entry to index entry in the hierarchical index 510, multiple disk accesses may be necessary if the index entries reside in different disk blocks. However, if each index entry fits within a single block, the number of disk accesses is at most the number of directories in the path. Even if the average index entry does not fit into a single disk block, the number of disk accesses per directory is bounded by a constant and does not increase with the total number of files in the file system.
[0060]
The above description of techniques for emulating the hierarchical characteristics of some file systems is merely exemplary. Other techniques can also be used to emulate the hierarchical characteristics of file systems and protocols. In addition, some protocols may not have hierarchical characteristics at all. Thus, the present invention is not limited to any particular technique for emulating hierarchical characteristics, nor to protocols that are hierarchical in nature.
[0061]
Emulate other OS file system features in the database
Besides the hierarchical organization of OS file systems, another feature of most OS file systems is that they maintain certain system information about the files they store. According to one embodiment, this OS file system feature is also emulated in the database system. Specifically, the translation engine 308 issues commands that store the “system” data for a file in a row of a file table (e.g., the file table 710) managed by the database server 204. According to one embodiment, all or most of the file content is stored as a binary large object (BLOB) in one column of that row. In addition to the BLOB column, the file table includes columns for storing attribute values corresponding to those maintained by the OS file system. Such attribute values include, for example, the owner or creator of the file, the file's creation date, the file's last-modified date, hard links to the file, the file name, the file size, and the file type.
[0062]
When the translation engine 308 issues database commands that cause the database server 204 to perform file operations, those database commands include statements that appropriately update the attributes of the files involved in the operations. For example, when inserting a new row into the file table for a newly created file, the translation engine 308 issues database commands that (1) store a value indicating the user who is creating the file in the “owner” column of the row, (2) store a value indicating the current date in the “creation date” column of the row, (3) store a value indicating the current date and time in the “last modified” column, and (4) store a value indicating the size of the BLOB in the “size” column. In response to subsequent operations on the file, the values in these fields are changed as those operations require. For example, when the translation engine 308 issues a database command that modifies the contents of a file stored in a particular row, it also issues, as part of the same operation, a database command that updates the “last modified” value for that row. In addition, if the change alters the file size, the translation engine 308 also issues a database command to update the “size” value for that row.
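A file table that carries OS-style attributes next to the BLOB might look like the following. This is a hypothetical sketch using SQLite via Python's `sqlite3`: the column names and the helpers `create_file` and `write_file` are assumptions, timestamps are ISO-8601 strings, and the optional `now` argument exists only so the example is deterministic. The point it illustrates is that a content change, its new size, and its new last-modified time are written in one statement inside one transaction.

```python
import sqlite3
from datetime import datetime, timezone

db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE files (
    file_id  INTEGER PRIMARY KEY,
    name     TEXT NOT NULL,
    owner    TEXT NOT NULL,     -- user who created the file
    created  TEXT NOT NULL,     -- creation date
    modified TEXT NOT NULL,     -- last-modified date and time
    size     INTEGER NOT NULL,  -- size of the BLOB in bytes
    body     BLOB)""")

def _now():
    return datetime.now(timezone.utc).isoformat()

def create_file(db, name, owner, data, now=None):
    """Insert the row and set owner, creation date, last-modified, and size together."""
    now = now or _now()
    with db:
        cur = db.execute(
            "INSERT INTO files (name, owner, created, modified, size, body) "
            "VALUES (?, ?, ?, ?, ?, ?)",
            (name, owner, now, now, len(data), data))
    return cur.lastrowid

def write_file(db, file_id, data, now=None):
    """Replace the contents; last-modified and size change in the same transaction."""
    now = now or _now()
    with db:
        db.execute(
            "UPDATE files SET body = ?, modified = ?, size = ? WHERE file_id = ?",
            (data, now, len(data), file_id))

fid = create_file(db, "Example.doc", "alice", b"hello", now="2000-07-26T09:00:00")
write_file(db, fid, b"hello, world", now="2000-07-26T10:00:00")
print(db.execute(
    "SELECT owner, created, modified, size FROM files WHERE file_id = ?", (fid,)
).fetchone())
# ('alice', '2000-07-26T09:00:00', '2000-07-26T10:00:00', 12)
```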
[0063]
Another feature of most OS file systems is the ability to provide security on a per-file basis. For example, some versions of Windows® NT, VMS, and UNIX® maintain access control lists that indicate the rights that various entities have with respect to each file. According to one embodiment of the present invention, this OS file system feature is emulated in the database system by maintaining a “security table”, each row of which holds content similar to an entry in an access control list. For example, a row in the security table may include one column storing a value that identifies a file, another column storing a value representing a permission type (e.g., read, update, insert, execute, change permissions), another field storing a flag indicating whether the permission is granted, and an owner field storing a value representing the holder of the permission for the file. The owner may be a single user, specified by a user ID (userid), or a group, specified by a group ID (groupid). In the case of a group, one or more additional tables are used to map the group ID to the user IDs of the group's members.
[0064]
Before issuing a database command to access a file stored in a file table managed by the database server 204, the translation engine 308 issues database commands to verify that the user requesting access has permission to perform the requested access on the identified file. These pre-access database commands retrieve data from the security table to determine whether the requesting user is permitted to perform the access. If the data retrieved in this way indicates that the user does not have the requested permission, the translation engine 308 does not issue commands to perform the requested operation. Instead, the translation engine 308 returns an error message to the operating system from which the request originated. In response to this error message, the operating system sends the requesting application the same OS error message it would send if the application had attempted, without permission, to access a file held in the operating system's own OS file system. Thus, even under error conditions, the fact that the data is stored in a relational database rather than in an OS file system is transparent to the application.
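The security table and the pre-access check can be sketched as follows, again with SQLite standing in for the database server 204. The table layouts and the helpers `has_permission` and `read_file` are hypothetical. One policy choice is an assumption not stated in the text: if any matching row denies the permission, the denial wins (`MIN(granted)`), and no matching row at all also means denial. Raising Python's `PermissionError` stands in for returning the OS's usual "permission denied" error.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE files (file_id INTEGER PRIMARY KEY, body BLOB);
CREATE TABLE security (       -- one row per ACL-like entry
    file_id    INTEGER,
    owner      TEXT,          -- a user ID or a group ID
    permission TEXT,          -- e.g. 'read', 'update', 'insert', 'execute'
    granted    INTEGER        -- 1 = granted, 0 = denied
);
CREATE TABLE group_members (group_id TEXT, user_id TEXT);
""")

def has_permission(db, user, file_id, permission):
    """Pre-access query: check the user's own rows and rows for groups the user belongs to."""
    row = db.execute("""
        SELECT MIN(granted) FROM security
        WHERE file_id = ? AND permission = ?
          AND (owner = ? OR owner IN
               (SELECT group_id FROM group_members WHERE user_id = ?))""",
        (file_id, permission, user, user)).fetchone()
    return row[0] == 1  # NULL (no matching rows) or any explicit deny -> False

def read_file(db, user, file_id):
    if not has_permission(db, user, file_id, "read"):
        # Report the same kind of error the OS would for its own files.
        raise PermissionError(f"permission denied: file {file_id}")
    return db.execute("SELECT body FROM files WHERE file_id = ?", (file_id,)).fetchone()[0]

db.execute("INSERT INTO files VALUES (1, ?)", (b"secret",))
db.execute("INSERT INTO security VALUES (1, 'staff', 'read', 1)")
db.execute("INSERT INTO group_members VALUES ('staff', 'alice')")

print(read_file(db, "alice", 1))  # b'secret' (granted through group 'staff')
```

A call such as `read_file(db, "bob", 1)` raises `PermissionError`, because no row grants `bob` read access either directly or through a group.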
[0065]
Different operating systems store different types of system information about files. For example, one operating system may store an “archive” flag but not icon information, while another may store icon information but not an archive flag. The particular set of system data maintained by a database system that implements the techniques described herein may vary from implementation to implementation. For example, the database server 204 may store all of the system data supported by the OS file system of the operating system 304a, but only some of the system data supported by the OS file system of the operating system 304b. Alternatively, the database server may store all of the system data supported by both operating systems 304a and 304b, or only a portion of the system data supported by either of the operating systems 304a and 304b.
[0066]
As shown in FIG. 3, the database server 204 stores files generated by a number of different OS file systems. For example, the operating system 304a may differ from the operating system 304b, and both may differ from the operating system 104. The OS file systems of the operating systems 304a and 304b may have conflicting features. For example, the OS file system of the operating system 304a may allow file names to include the character “/”, whereas that of the operating system 304b may not. According to one embodiment, in such a situation, the translation engine 308 is configured to enforce OS-file-system-specific rules. Thus, when the application 302a attempts to store a file whose name contains the character “/”, the translation engine 308 issues database commands that cause the database server 204 to perform the operation. On the other hand, when the application 302b attempts to store a file whose name contains the character “/”, the translation engine 308 generates an error.
[0067]
Alternatively, the translation engine 308 may be configured to enforce a single set of rules for all operating systems. For example, the translation engine 308 may enforce a rule that generates an error whenever a file name is invalid in any one of the operating systems supported by the translation engine 308, even if the name is valid in the operating system that issued the command specifying it.
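The two validation policies differ only in which rule sets they consult. In this hypothetical sketch, each operating system's rules are reduced to a set of forbidden characters, keyed "304a" and "304b" after the operating systems in the text (304a allows "/", 304b rejects it); real file-name rules also cover reserved names, lengths, and encodings, which are omitted here.

```python
# Hypothetical rule sets: characters each operating system rejects in file names.
FORBIDDEN_CHARS = {
    "304a": set(),    # allows "/" in file names
    "304b": {"/"},    # rejects "/"
}

def is_valid_for_os(name: str, os_id: str) -> bool:
    """Per-OS policy: apply only the rules of the OS that issued the call."""
    return not (FORBIDDEN_CHARS[os_id] & set(name))

def is_valid_everywhere(name: str) -> bool:
    """Single-rule-set policy: reject any name that some supported OS would reject."""
    return all(is_valid_for_os(name, os_id) for os_id in FORBIDDEN_CHARS)

print(is_valid_for_os("a/b.txt", "304a"))   # True  -> store the file
print(is_valid_for_os("a/b.txt", "304b"))   # False -> generate an error
print(is_valid_everywhere("a/b.txt"))       # False, even for a call from 304a
```

The single-rule-set policy trades flexibility for portability: every stored name can later be opened from any supported operating system.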
[0068]
Convert OS file system calls into database queries
By building mechanisms that emulate OS file system features within the database system, the translation engine 308 can convert OS file system calls into database queries without losing the functionality expected by the applications making those calls. The OS file system calls made by these applications go through the OS file API provided by the operating system on which they execute. For example, for programs written in the “C” programming language, a header file named “stdio.h” specifies the interface of an operating system's OS file API. Because stdio.h is included in these applications, they know how to call the routines that implement the OS file API.
[0069]
The specific routines that implement the OS file API may vary from operating system to operating system, but they typically include routines for performing the following operations: open file, read from file, write to file, seek within file, lock file, and close file. In general, these I/O commands map to relational database commands as follows:
Open file = start transaction, resolve the path name, and locate the row containing the file
Write to file = update
Read from file = select
Lock file = lock the row associated with the file
Seek within file = update counter
Close file = complete transaction (the Windows® OS file system protocol requires the directory entry to be completed before the file data is written; other protocols do not)
As described in more detail below, some file systems expect file names to become visible even before the contents of the files are received. In the context of these file systems, the “open file” I/O command corresponds to starting a transaction to write the name, completing that transaction, and starting a transaction to write the contents.
[0070]
According to one embodiment, a counter is used to track the “current location” within a file. In embodiments where the file is stored as a BLOB, the counter may take the form of an offset from the beginning of the BLOB. When the “open file” command is executed, a counter is created and set to a value indicating the starting position of the BLOB in question. The counter is then incremented as data is read from or written to the BLOB. A seek operation updates the counter to point to the location in the BLOB indicated by the seek operation's parameters. According to one embodiment, these operations are facilitated by using LOB locators as described in U.S. patent application Ser. No. 08/962,482, entitled “LOB Locators”, filed October 31, 1997 by Nori et al., the entire contents of which are incorporated herein by reference.
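The call-to-command mapping and the BLOB offset counter can be combined in one small handle class. This is a minimal sketch assuming SQLite via Python's `sqlite3`: the class `DbFile`, the `files(path, body)` table, and the read-modify-write strategy for writes are illustrative choices, not the LOB-locator mechanism cited above. Row locking is not shown, because SQLite locks whole databases rather than rows.

```python
import sqlite3

class DbFile:
    """Hypothetical file handle mapping OS file calls onto SQL.

    open  -> BEGIN, then locate the row for the path
    read  -> SELECT a byte range starting at the counter
    write -> UPDATE the BLOB (read-modify-write, for simplicity)
    seek  -> move the counter
    close -> COMMIT
    """

    def __init__(self, db: sqlite3.Connection, path: str):
        self.db = db
        db.execute("BEGIN")                                   # open: start transaction
        row = db.execute("SELECT rowid FROM files WHERE path = ?", (path,)).fetchone()
        if row is None:
            db.execute("ROLLBACK")
            raise FileNotFoundError(path)
        self.rowid = row[0]
        self.pos = 0                                          # counter: offset into the BLOB

    def read(self, n: int) -> bytes:
        # substr() is 1-based and counts bytes when its argument is a BLOB.
        data = self.db.execute(
            "SELECT substr(body, ?, ?) FROM files WHERE rowid = ?",
            (self.pos + 1, n, self.rowid),
        ).fetchone()[0]
        self.pos += len(data)                                 # advance the counter
        return data

    def write(self, data: bytes) -> None:
        body = bytearray(self.db.execute(
            "SELECT body FROM files WHERE rowid = ?", (self.rowid,)).fetchone()[0])
        end = self.pos + len(data)
        if end > len(body):
            body.extend(b"\x00" * (end - len(body)))          # zero-fill past the old end
        body[self.pos:end] = data
        self.db.execute("UPDATE files SET body = ? WHERE rowid = ?",
                        (bytes(body), self.rowid))
        self.pos = end                                        # advance the counter

    def seek(self, offset: int) -> None:
        self.pos = offset                                     # seek: update the counter

    def close(self) -> None:
        self.db.execute("COMMIT")                             # close: complete transaction


db = sqlite3.connect(":memory:", isolation_level=None)       # we issue BEGIN/COMMIT ourselves
db.execute("CREATE TABLE files (path TEXT PRIMARY KEY, body BLOB)")
db.execute("INSERT INTO files VALUES (?, ?)", ("/a.txt", b"hello world"))

f = DbFile(db, "/a.txt")
f.seek(6)
f.write(b"there")
f.seek(0)
print(f.read(11))  # b'hello there'
f.close()
```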
[0071]
In some operating systems, an OS lock may persist even after the file is closed. To emulate this feature, a “lock file” command is converted into a session lock request. As a result, when a transaction is completed in response to a command to close the file, the lock on the row associated with that file is not automatically released. Locks established in this way are released either explicitly, in response to a command to unlock the file, or automatically, when the database session in which the lock was acquired ends.
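The difference between transaction-scoped and session-scoped locks can be illustrated with a small in-memory lock table. This is a hypothetical model (the names `LockTable`, `LockConflict`, and the scope strings are invented), not a database server's lock manager: completing a transaction releases only transaction-scoped locks, while session-scoped locks survive until an explicit unlock or the end of the session.

```python
class LockConflict(Exception):
    """Raised when another session already holds the lock (hypothetical)."""

class LockTable:
    """Hypothetical row-lock table with transaction- and session-scoped locks."""

    def __init__(self):
        self._locks = {}  # row id -> (session id, scope)

    def acquire(self, session, row, scope="transaction"):
        holder = self._locks.get(row)
        if holder is not None and holder[0] != session:
            raise LockConflict(f"row {row} is locked by session {holder[0]}")
        self._locks[row] = (session, scope)

    def holder(self, row):
        return self._locks.get(row, (None, None))[0]

    def complete_transaction(self, session):
        # "Close file": completing the transaction frees only transaction-scoped locks.
        self._release(session, lambda scope: scope == "transaction")

    def unlock(self, session, row):
        # Explicit "unlock file".
        if self.holder(row) == session:
            del self._locks[row]

    def end_session(self, session):
        # Ending the database session frees everything the session still holds.
        self._release(session, lambda scope: True)

    def _release(self, session, should_release):
        for row, (owner, scope) in list(self._locks.items()):
            if owner == session and should_release(scope):
                del self._locks[row]


locks = LockTable()
locks.acquire("s1", 42, scope="session")   # "lock file" -> session lock request
locks.complete_transaction("s1")            # "close file"
print(locks.holder(42))                     # s1 (the lock survived the close)
locks.end_session("s1")
print(locks.holder(42))                     # None
```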
[0072]
Ongoing I/O operations
When a file is created, the directory in which the file is created is updated to indicate the existence of the file. In some OS file systems, the directory is changed to point to the new file before the new file is completely generated. Some applications designed for these OS file systems take advantage of this feature. For example, an application may open a new file with a first file handle and proceed to write data into the file. While the data is being written, the same application can open the file with a second file handle.
[0073]
Emulating this feature in a database presents special problems because, typically, changes made by a database transaction are not visible to other transactions until that transaction is completed. For example, assume that a first database transaction is initiated in response to a first “open” command. The first transaction updates the directory table to indicate that the file exists in a particular directory, and then updates the file table to insert a row containing the file. If a second database transaction is initiated in response to a second “open” command issued by the same application, neither the change to the directory table nor the new row in the file table is visible to the second transaction until the first transaction is completed.
[0074]
According to one embodiment of the present invention, the ability to see the directory entry for a file that is still being created is emulated in the database system by performing the update of the directory table in a transaction separate from the one used to insert the file's row into the file table. Thus, in response to the first open command, the translation engine 308 issues database commands that (1) start a first transaction, (2) update the directory table to indicate the presence of the new file, (3) complete the first transaction, (4) start a second transaction, (5) insert a row for the file into the file table, and (6) complete the second transaction. Because the change to the directory table is completed separately from the change to the file table, a third transaction initiated in response to the second open command can see the entry in the directory table while the insertion into the file table is still in progress. If the second transaction fails, the directory is left with an entry for a file that has no content.
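The effect of splitting the work into two transactions can be observed with two SQLite connections to the same database file. This sketch assumes SQLite's default rollback-journal mode, in which a reader sees only committed data while another connection holds an open write transaction; the `directory` and `files` tables, the path `/docs`, and the function `demo` are hypothetical. The writer connection plays the first "open" call and the reader connection plays the second.

```python
import os
import sqlite3
import tempfile

def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "fs.db")
        # isolation_level=None lets us issue BEGIN/COMMIT explicitly.
        writer = sqlite3.connect(path, isolation_level=None)
        reader = sqlite3.connect(path, isolation_level=None)
        writer.executescript(
            "CREATE TABLE directory (dir TEXT, name TEXT);"
            "CREATE TABLE files (name TEXT, body BLOB);"
        )

        def snapshot():
            # What a new transaction on the reader connection can currently see.
            cur = reader.execute("SELECT name FROM directory WHERE dir = '/docs'")
            entries = cur.fetchall()
            cur.close()  # finish the read so its shared lock is released
            cur = reader.execute("SELECT name FROM files")
            rows = cur.fetchall()
            cur.close()
            return entries, rows

        # Transaction 1: record the directory entry and complete it immediately.
        writer.execute("BEGIN")
        writer.execute("INSERT INTO directory VALUES ('/docs', 'new.txt')")
        writer.execute("COMMIT")

        # Transaction 2: insert the file row; it stays open while content is written.
        writer.execute("BEGIN")
        writer.execute("INSERT INTO files VALUES ('new.txt', ?)", (b"partial",))
        during = snapshot()   # the entry is visible, the uncommitted row is not
        writer.execute("COMMIT")
        after = snapshot()

        writer.close()
        reader.close()
        return during, after

during, after = demo()
print(during)  # ([('new.txt',)], [])
print(after)   # ([('new.txt',)], [('new.txt',)])
```

Had both inserts been in one transaction, the first snapshot would have shown neither the directory entry nor the file row.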
[0075]
Translation engine
According to one embodiment of the invention, the translation engine 308 is designed in two layers, shown in FIG. 4. Referring to FIG. 4, the translation engine 308 includes a protocol server layer and a DB file server 408 layer. The DB file server 408 allows applications to access data stored in a database managed by the database server 204 through an alternative API, referred to herein as the DB file API. The DB file API combines aspects of both the OS file API and a database API. Specifically, the DB file API supports file operations similar to those supported by conventional OS file APIs.
[0076]
However, unlike an OS file API, the DB file API incorporates the transaction concept of a database API. That is, the DB file API allows an application to specify that a set of file operations be executed atomically. The advantages of a file system that supports transactions are described in more detail below.
[0077]
DB file server
The DB file server 408 is responsible for converting DB file API commands into database commands. The DB file API commands received by the DB file server 408 may come from the protocol server layer of the translation engine 308, or directly from applications (e.g., the application 410) that are specifically designed to perform file operations by making calls through the DB file API.
[0078]
According to one embodiment, the DB file server 408 is object oriented. Thus, the routines supplied by the DB file server 408 are invoked by instantiating objects and calling the methods associated with those objects. In one implementation, the DB file server 408 defines a “transaction” object class that includes the following methods: insert, save, update, delete, complete, and rollback. The DB file API provides an interface that allows external entities to instantiate and use this transaction object class.
[0079]
Specifically, when an external entity (for example, the application 410 or a protocol server) makes a call to the DB file server 408 to create an instance of a transaction object, the DB file server 408 sends a database command that causes the database server 204 to start a new transaction. The external entity then calls methods of the transaction object. Each method invocation results in a call to the DB file server 408, in response to which the DB file server 408 issues a corresponding database command to the database server 204. All database operations performed in response to method calls on a given transaction object are performed as part of the database transaction associated with that transaction object.
[0080]
Importantly, the methods invoked on a single transaction object may involve multiple file operations. For example, the application 410 may interact with the DB file server 408 as follows. The application 410 creates an instance of the transaction object TXO1 by making a call through the DB file API. In response, the DB file server 408 issues a database command that causes the database server 204 to start transaction TX1. The application 410 calls the update method of TXO1 to update a file F1 stored in the database managed by the database server 204. In response, the DB file server 408 issues a database command that causes the database server 204 to perform the requested update as part of transaction TX1. The application 410 then calls the update method of TXO1 again to update a second file F2 stored in the same database. In response, the DB file server 408 issues a database command that causes the database server 204 to perform this update as part of transaction TX1 as well. Thereafter, the application 410 calls the commit method of TXO1, and the DB file server 408 issues a database command that causes the database server 204 to commit TX1. If the update to file F2 fails, the rollback method of TXO1 is called instead, rolling back all changes made by TX1, including the update to file F1.
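The TXO1 scenario can be sketched in Python with sqlite3 standing in for the database server 204. The class `DBFileTransaction`, its method signatures, and the `files` table are hypothetical illustrations of the transaction object class described above; the save method is omitted for brevity. The connection must be opened with `isolation_level=None` so that the object's explicit BEGIN controls the transaction.

```python
import sqlite3

class DBFileTransaction:
    """Hypothetical stand-in for the DB file API "transaction" object class.

    Creating the object starts a database transaction; every method call runs
    inside it, so several file operations succeed or fail together.
    """

    def __init__(self, conn: sqlite3.Connection):
        self.conn = conn
        self.conn.execute("BEGIN")  # corresponds to starting TX1

    def insert(self, path: str, body: bytes) -> None:
        self.conn.execute("INSERT INTO files (path, body) VALUES (?, ?)", (path, body))

    def update(self, path: str, body: bytes) -> None:
        cur = self.conn.execute("UPDATE files SET body = ? WHERE path = ?", (body, path))
        if cur.rowcount != 1:
            raise FileNotFoundError(path)  # treat a missing file as a failed update

    def delete(self, path: str) -> None:
        self.conn.execute("DELETE FROM files WHERE path = ?", (path,))

    def commit(self) -> None:
        self.conn.execute("COMMIT")

    def rollback(self) -> None:
        self.conn.execute("ROLLBACK")

conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE files (path TEXT PRIMARY KEY, body BLOB)")
conn.execute("INSERT INTO files VALUES ('/F1', 'old')")

txo1 = DBFileTransaction(conn)
try:
    txo1.update("/F1", b"new")  # succeeds inside TX1
    txo1.update("/F2", b"new")  # F2 does not exist, so this update fails
    txo1.commit()
except FileNotFoundError:
    txo1.rollback()             # also undoes the update to F1
print(conn.execute("SELECT body FROM files WHERE path = '/F1'").fetchone())  # ('old',)
```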
[0081]
Although the technique has been described herein with reference to a DB file server that uses transaction objects, other implementations are possible. For example, a DB file server can use objects to represent files rather than transactions. In such implementations, file operations may be performed by invoking methods of file objects and passing them data that identifies the transaction in which each operation is to be performed. Thus, the present invention is not limited to DB file servers that implement any particular set of object classes.
[0082]
For illustrative purposes, the embodiment depicted in FIG. 4 shows the DB file server 408 as a process executing outside the database server 204 that communicates with the database server 204 through the database API. However, according to an alternative embodiment, the functionality of the DB file server 408 is incorporated into the database server 204. Incorporating the DB file server 408 into the database server 204 reduces the amount of inter-process communication generated while the DB file system is in use. The resulting database server thus provides two alternative APIs for accessing the data it manages: the DB file API and the database API (SQL).
[0083]
Protocol server
The protocol server layer of the translation engine 308 is responsible for converting between a specific protocol and DB file API commands. For example, the protocol server 406a converts I/O commands received from the operating system 304a into DB file API commands that it sends to the DB file server 408. The protocol server 406a also converts DB file API commands received from the DB file server 408 into I/O commands that it sends to the operating system 304a.
[0084]
In practice, there is no one-to-one correspondence between protocols and operating systems. Rather, many operating systems support more than one protocol, and many protocols are supported by more than one operating system. For example, a single operating system may provide native support for one or more network file protocols (SMB, FTP, NFS), e-mail protocols (SMTP, IMAP4), and web protocols (HTTP). In addition, the sets of protocols supported by different operating systems often overlap. However, for illustrative purposes, a simplified environment is shown in which the operating system 304a supports one protocol and the operating system 304b supports another.
[0085]
I/O API
As described above, a protocol server is used to convert I/O commands into DB file commands. The interface between a protocol server and the OS file system it communicates with is generically labeled the I/O API. However, the particular I/O API presented by a protocol server depends on both (1) the entity with which the protocol server communicates and (2) how the protocol server is to appear to that entity. For example, the operating system 304a may be Microsoft Windows® NT, and the protocol server 406a may be designed to appear as a device driver for Microsoft Windows® NT. In this case, the I/O API that the protocol server 406a presents to the operating system 304a is the type of device interface understood by Windows® NT, and Windows® NT communicates with the protocol server 406a in the same manner as it communicates with any storage device. The fact that files stored to and retrieved from the protocol server 406a are actually stored in and retrieved from the database maintained by the database server 204 is completely transparent to Windows® NT.
[0086]
Some protocol servers used by the translation engine 308 may present device driver interfaces to their respective operating systems, while other protocol servers may appear as other types of entities. For example, the operating system 304a may be Microsoft Windows® NT, with the protocol server 406a presenting itself as a device driver, while the operating system 304b is Microsoft Windows® 95, with the protocol server 406b presenting itself as a Server Message Block (SMB) server. In the latter case, the protocol server 406b typically runs on a different machine from the operating system 304b, and communication between the operating system 304b and the protocol server 406b takes place over a network connection.
[0087]
In the above examples, the source of the I/O commands handled by the protocol servers is an OS file system. However, the translation engine 308 is not limited to use with OS file system commands. Rather, a protocol server may be provided to convert between DB file commands and any type of I/O protocol. In addition to the I/O protocols used by OS file systems, protocols for which a protocol server can be provided include, for example, the File Transfer Protocol (FTP) and the protocols used by electronic mail systems (POP3 or IMAP4).
[0088]
Just as the interface presented by a protocol server that works with an OS file system is dictated by the particular OS, the interface presented by a protocol server that works with a non-OS file system varies with the entities that will issue I/O commands to it. For example, a protocol server configured to receive I/O commands according to the FTP protocol presents an FTP server API. Similarly, protocol servers configured to receive I/O commands according to the HTTP, POP3, and IMAP4 protocols present HTTP, POP3, and IMAP4 server APIs, respectively.
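A protocol server that presents an FTP server API can be sketched as a dispatcher that maps FTP verbs onto DB file API calls. RETR, STOR, and DELE and the reply codes are taken from the FTP specification (RFC 959); the classes `InMemoryDBFileAPI` and `FtpProtocolServer` are hypothetical, a dictionary stands in for the database, and, unlike real FTP, file data is returned alongside the reply rather than over a separate data connection.

```python
from typing import Dict, Optional, Tuple

class InMemoryDBFileAPI:
    """Hypothetical stand-in for the DB file API offered by DB file server 408."""

    def __init__(self) -> None:
        self.files: Dict[str, bytes] = {}

    def read(self, path: str) -> bytes:
        return self.files[path]  # KeyError if the file does not exist

    def write(self, path: str, data: bytes) -> None:
        self.files[path] = data

    def delete(self, path: str) -> None:
        del self.files[path]     # KeyError if the file does not exist


class FtpProtocolServer:
    """Presents an FTP-style command interface and translates verbs into DB file API calls."""

    def __init__(self, api: InMemoryDBFileAPI) -> None:
        self.api = api

    def handle(self, verb: str, path: str, data: Optional[bytes] = None) -> Tuple[str, bytes]:
        verb = verb.upper()
        try:
            if verb == "STOR":  # store a file
                self.api.write(path, data if data is not None else b"")
                return "226 Transfer complete", b""
            if verb == "RETR":  # retrieve a file
                return "226 Transfer complete", self.api.read(path)
            if verb == "DELE":  # delete a file
                self.api.delete(path)
                return "250 Requested file action okay, completed", b""
        except KeyError:
            return "550 Requested action not taken", b""
        return "502 Command not implemented", b""

server = FtpProtocolServer(InMemoryDBFileAPI())
server.handle("STOR", "/notes.txt", b"hello")
print(server.handle("RETR", "/notes.txt"))  # ('226 Transfer complete', b'hello')
```

An HTTP, POP3, or IMAP4 protocol server would follow the same pattern with that protocol's own commands in place of the FTP verbs.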
[0089]
Like OS file systems, each non-OS file protocol expects certain attributes to be maintained for its files. For example, whereas most OS file systems store data indicating the date a file was last modified, e-mail systems store, for each e-mail message, data indicating whether the message has been read. The protocol server for each particular protocol implements the logic required to ensure that the semantics of that protocol are emulated in the database file system.
[0090]
Transactional file system
Within a database system, operations are generally performed as part of a transaction. Database systems perform all operations that are part of a transaction as a single atomic operation. That is, either all of the operations are completed successfully or none of the operations are performed. If an operation cannot be executed during the execution of a transaction, all previously executed operations of that transaction are canceled or “rolled back”.
[0091]
In contrast to database systems, OS file systems are not transaction based. Therefore, if a large file operation fails partway through, the portion of the operation executed before the failure remains in effect. Failing to undo incomplete file operations can corrupt the directory structure and files.
[0092]
According to one aspect of the present invention, a transactional file system is provided. As described above, the translation engine 308 converts I/O commands into database statements that are sent to the database server 204. The sequence of statements sent by the translation engine 308 to perform a given I/O operation is preceded by a begin transaction statement and ends with an end transaction (commit) statement. As a result, if any failure occurs while the database server 204 is executing those statements, all changes that the database server 204 made as part of the transaction up to the point of failure are rolled back.
[0093]
The conditions that cause a transaction to fail can vary with the system from which the I/O command originated. For example, an OS file system may support the concept of a signature, in which a digital “signature” identifying the source of a file is attached to the file. A transaction initiated to store a signed file may fail if, for example, the file being stored does not carry the expected signature.
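The signature example can be sketched with sqlite3 and Python's `hmac` module. An HMAC over the file body stands in for a real digital signature, and the key, the `files` and `attrs` tables, and the function names are all hypothetical. The sketch shows the property described above: when the check fails, every statement already executed in the transaction is rolled back.

```python
import hashlib
import hmac
import sqlite3

# Hypothetical key: an HMAC stands in for a real digital signature in this sketch.
SECRET = b"demo-key"

class SignatureError(Exception):
    """Raised when a file's signature does not match the expected value."""

def sign(body: bytes) -> bytes:
    return hmac.new(SECRET, body, hashlib.sha256).digest()

def store_signed_file(conn: sqlite3.Connection, path: str, body: bytes, signature: bytes) -> None:
    # The connection's context manager commits if the block completes and
    # rolls back every statement in it if an exception escapes.
    with conn:
        conn.execute("INSERT INTO files (path, body) VALUES (?, ?)", (path, body))
        conn.execute("INSERT INTO attrs (path, signed) VALUES (?, 1)", (path,))
        # Checked last on purpose, to show that the earlier inserts are undone too.
        if not hmac.compare_digest(sign(body), signature):
            raise SignatureError(path)

conn = sqlite3.connect(":memory:")
conn.executescript("CREATE TABLE files (path TEXT, body BLOB); "
                   "CREATE TABLE attrs (path TEXT, signed INTEGER);")
store_signed_file(conn, "/good.txt", b"data", sign(b"data"))
try:
    store_signed_file(conn, "/bad.txt", b"data", sign(b"tampered"))
except SignatureError:
    pass
print(conn.execute("SELECT path FROM files").fetchall())  # [('/good.txt',)]
```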
[0094]
On-the-fly intelligent file conversion
According to one embodiment of the present invention, files are processed before being inserted into a relational database and are processed again when they are retrieved from the relational database. FIG. 9 is a block diagram showing the functional components of the DB file server 408 used to perform inbound and outbound file processing.
[0095]
Referring to FIG. 9, translation engine 308 includes a rendering unit 904 and a parsing unit 902. In general, the parsing unit 902 is responsible for performing inbound processing of the file, and the rendering unit 904 is responsible for performing outbound processing of the file. Each of these functional units will now be described in more detail.
[0096]
Inbound file processing
Inbound files are passed to the DB file server 408 via the DB file API. Upon receiving an inbound file, the parsing unit 902 identifies the file type of the file and then parses the file based on that file type. During parsing, the parsing unit 902 extracts structured information from the file. This structured information may include, for example, information about the file being parsed, or data representing logically distinct components or fields of the file. The structured information is stored in the database together with the file from which it was generated. Queries can then be issued to the database server to select and retrieve files based on whether the structured information extracted from them satisfies particular search criteria.
[0097]
The particular technique used by the parsing unit 902 to parse a document, and the structured data thus produced, vary with the type of document passed to the parsing unit 902. Therefore, before performing any parsing operation, the parsing unit 902 identifies the file type of the document. Various factors can be considered in determining the file type of a file. For example, in the DOS and Windows® operating systems, the file type of a file is often indicated by the extension of its file name. That is, if the file name ends with “.txt”, the parsing unit 902 classifies the file as a text file and applies a text-file-specific parsing technique to it. Similarly, if the file name ends with “.doc”, the parsing unit 902 classifies the file as a Microsoft Word document and applies a Microsoft Word-specific parsing technique to it. The Macintosh operating system, on the other hand, stores the file type information for a file as an attribute maintained separately from the file.
[0098]
Other factors that the parsing unit 902 can consider in determining the file type of a file include, for example, the directory in which the file is located. Accordingly, the parsing unit 902 can be configured to classify and parse all files stored in the \WordPerfect\document directory as WordPerfect documents, regardless of their file names.
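A classifier combining the two rules above (a directory rule that overrides the file name, then the DOS/Windows file-name extension) can be sketched with `pathlib.PureWindowsPath`, which compares Windows paths case-insensitively. The rule tables and type labels are hypothetical examples rather than a registry defined by the embodiment.

```python
from pathlib import PureWindowsPath

# Hypothetical rule tables; a real parsing unit might load these from the database.
EXTENSION_TYPES = {".txt": "text", ".doc": "msword"}
DIRECTORY_TYPES = {PureWindowsPath(r"\WordPerfect\document"): "wordperfect"}

def classify(path: str) -> str:
    p = PureWindowsPath(path)
    # A directory rule wins regardless of the file's name.
    for directory, file_type in DIRECTORY_TYPES.items():
        if directory in p.parents:
            return file_type
    # Otherwise fall back to the extension, ignoring case as DOS/Windows does.
    return EXTENSION_TYPES.get(p.suffix.lower(), "unknown")

print(classify(r"C:\docs\notes.TXT"))                # text
print(classify(r"\WordPerfect\document\notes.txt"))  # wordperfect
```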
[0099]
Alternatively, the file type of an inbound file, as well as the file type desired by a requesting entity, may be specified explicitly or inferred from information provided to the DB file server 408. For example, when a web browser sends a message, the message typically includes information about the browser (e.g., browser type, version, etc.). When a web browser requests a file through the HTTP protocol server, this information is passed to the DB file server 408. Based on this information, the rendering unit 904 can look up information about the capabilities of the browser and infer from those capabilities the best file type in which to deliver the file to the browser.
[0100]
As described above, the particular parsing technique used by the parsing unit 902, and the type of structured data it generates, vary with the type of file being parsed. For example, the structured data generated by the parsing unit 902 may include embedded metadata, derived metadata, and system metadata. Embedded metadata is information embedded in the file itself. Derived metadata is information that is not contained in the file but can be derived by analyzing it. System metadata is data about the file provided by the system that generated the file.
[0101]
For example, assume that application 410 passes a Microsoft Word document to parsing unit 902. Parsing unit 902 parses the document and extracts information about the file embedded in the file. The information embedded in the Microsoft Word document includes, for example, data indicating the author of the document, the category to which the document is assigned, and comments about the document.
[0102]
In addition to locating and extracting the information embedded in a Word document, the parsing unit 902 can derive information about the document. For example, the parsing unit 902 may scan the Word document to determine the number of pages, paragraphs, and words it contains. Finally, the system from which the document originated may supply the parsing unit 902 with data indicating the size, creation date, last modification date, and file type of the document.
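The three metadata categories can be kept apart as sketched below. Decoding a Word file is out of scope here, so the sketch assumes the document text has already been extracted, and it computes only paragraph and word counts (a page count would require layout information). The function names, the dictionary layout, and the sample values are hypothetical.

```python
import re
from typing import Any, Dict

def derive_metadata(text: str) -> Dict[str, int]:
    """Derived metadata: facts not stored in the file but computable by scanning it."""
    paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    return {"paragraphs": len(paragraphs), "words": len(text.split())}

def build_metadata(embedded: Dict[str, Any], text: str,
                   system: Dict[str, Any]) -> Dict[str, Dict[str, Any]]:
    """Keep the three categories apart so each can be stored in its own columns or table."""
    return {"embedded": embedded, "derived": derive_metadata(text), "system": system}

meta = build_metadata(
    embedded={"author": "E. Example", "category": "Reports"},  # from the file's own properties
    text="Quarterly report\n\nSales rose.\nCosts fell.\n",     # text extracted from the document
    system={"size": 4096, "created": "2000-01-01"},            # supplied by the originating system
)
print(meta["derived"])  # {'paragraphs': 2, 'words': 6}
```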
[0103]
The more structured the file type of a document, the easier it is to extract specific items of structured data from the document. For example, an HTML document typically contains delimiters, or “tags”, that identify the beginning and end of particular fields (title, heading 1, heading 2, etc.). The parsing unit 902 can use these delimiters to parse the HTML document, producing metadata items for some or all of the delimited fields. Similarly, XML files are highly structured, and an XML parser can extract a separate metadata item for some or all of the fields contained in an XML document.
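Tag-delimited extraction from HTML can be sketched with the standard-library `html.parser.HTMLParser`, which reports start tags, end tags, and text as separate events. The class name `FieldExtractor` and the choice of fields (`title`, `h1`, `h2`) are illustrative assumptions; a production parser would also have to cope with malformed markup.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional

class FieldExtractor(HTMLParser):
    """Collects the text inside selected tag-delimited fields as metadata items."""

    FIELDS = {"title", "h1", "h2"}

    def __init__(self) -> None:
        super().__init__()
        self.metadata: Dict[str, List[str]] = {}
        self._current: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag in self.FIELDS:  # HTMLParser reports tag names in lower case
            self._current = tag
            self.metadata.setdefault(tag, []).append("")

    def handle_endtag(self, tag):
        if tag == self._current:
            self._current = None

    def handle_data(self, data):
        if self._current is not None:
            self.metadata[self._current][-1] += data

parser = FieldExtractor()
parser.feed("<html><head><title>Q3 Report</title></head>"
            "<body><h1>Summary</h1><p>Body text</p><h2>Details</h2></body></html>")
parser.close()
print(parser.metadata)  # {'title': ['Q3 Report'], 'h1': ['Summary'], 'h2': ['Details']}
```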
[0104]
Once the parsing unit 902 has generated structured data for a file, the DB file server 408 issues database commands that cause the database server 204 to insert the file into a row of a file table (e.g., file table 710). According to one embodiment, the database commands issued in this manner store the file as a BLOB in one column of the row and store the various items of structured data generated for the file in other columns of the same row.
[0105]
Alternatively, some or all of the structured data items for a file can be stored outside the file table. In that case, the row that stores the structured data associated with a file typically contains data that identifies the file. For example, assume that a Word document is stored in row R20 of the file table and that the system metadata for that Word document (e.g., creation date, modification date, etc.) is stored in row R34 of a system attribute table. In this situation, both row R20 of the file table and row R34 of the system attribute table would typically include a FileID column that stores a unique identifier for the Word document. Both the file and its system metadata can then be retrieved by issuing a query containing a join statement that joins the row in the file table with the row in the system attribute table based on the FileID value. Techniques for storing file attributes in tables associated with a file “class” are described in more detail below.
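The FileID join can be sketched with sqlite3. The table and column names loosely follow the example above, but the schema and the sample values are hypothetical, chosen only to illustrate the layout.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- The file itself is stored as a BLOB; FileID ties it to rows in other tables.
    CREATE TABLE file_table (FileID TEXT PRIMARY KEY, Name TEXT, Body BLOB);
    CREATE TABLE system_attributes (
        FileID TEXT REFERENCES file_table (FileID),
        CreationDate TEXT,
        ModificationDate TEXT
    );
""")
# Sample data only.
conn.execute("INSERT INTO file_table VALUES ('X20', 'Report.doc', ?)", (b"word-bytes",))
conn.execute("INSERT INTO system_attributes VALUES ('X20', '2000-01-01', '2000-02-01')")

# One query returns the file and its system metadata by joining on FileID.
row = conn.execute("""
    SELECT f.Name, f.Body, s.CreationDate, s.ModificationDate
    FROM file_table AS f
    JOIN system_attributes AS s ON f.FileID = s.FileID
    WHERE f.Name = ?
""", ("Report.doc",)).fetchone()
print(row)  # ('Report.doc', b'word-bytes', '2000-01-01', '2000-02-01')
```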
[0106]
Outbound file processing
Outbound files are constructed by the rendering unit 904 based on information retrieved in response to database commands sent to the database server 204. Once constructed, an outbound file is delivered through the DB file API to the entity that requested it.
[0107]
Importantly, the file type of the outbound file produced by the rendering unit 904 (the target file type) need not be the same as the file type of the file that supplied the data used to construct it (the source file type). For example, the rendering unit 904 may construct a text file based on data that was originally stored in the database as a Word file.
[0108]
Furthermore, the entity that requests an outbound file may be on a completely different platform, and use a completely different protocol, than the entity that produced the file from which the outbound file is constructed. For example, assume that the protocol server 406b implements an IMAP4 server interface and the protocol server 406a implements an HTTP server interface. In this case, an e-mail document generated by an e-mail application can be stored in the database through the protocol server 406b and retrieved from the database by a web browser through the protocol server 406a. In this scenario, the parsing unit 902 invokes the parsing technique associated with the e-mail file type (e.g., RFC 822), and the rendering unit 904 invokes a rendering routine that builds an HTML document from the e-mail data retrieved from the database.
[0109]
Registering parsers and renderers
As described above, the parsing technique applied to a file is dictated by its file type. Similarly, the rendering technique applied to a file is dictated by both the source file type and the target file type. The number of file types that exist across all computer platforms is enormous. It is therefore impractical to build a parsing unit 902 that handles all known file types, or a rendering unit 904 that handles every possible conversion from one file type to another.
[0110]
According to one embodiment of the present invention, the problem caused by the proliferation of file types is addressed by allowing type-specific parsing modules to be registered with the parsing unit 902 and type-specific rendering modules to be registered with the rendering unit 904. A type-specific parsing module is a module that implements a parsing technique for a particular file type. For example, Word documents are parsed using a Word document parsing module, while POP3 e-mail documents are parsed using a POP3 e-mail parsing module.
[0111]
Like a type-specific parsing module, a type-specific rendering module is a module that implements a technique for converting data associated with one or more source file types into one or more target file types. For example, a type-specific rendering module may be provided to convert Word documents into text documents.
[0112]
Conversion may be required even if the source file type and the target file type are the same. For example, when parsed and inserted into a database, the contents of an XML document are not held in a single BLOB, but can spread across many columns of many tables. In that case, XML is the source file type of the data, even if the data is no longer stored as an XML file. A type specific rendering module may be provided to build an XML document from the data.
[0113]
When the parsing unit 902 receives an inbound file, the parsing unit 902 determines the file type of the file and whether a type-specific parsing module is registered for that file type. If one is registered, the parsing unit 902 calls the parsing routines provided by the type-specific parsing module. These parsing routines parse the inbound file to generate metadata, which is then stored in the database along with the file. If no type-specific parsing module is registered for the file type, the parsing unit 902 may generate an error or apply a generic parsing technique to the file. Because a generic parsing technique has no knowledge of the file's contents, the useful metadata it can generate for the file is limited.
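The register-then-dispatch behavior of the parsing unit can be sketched as a small in-memory registry. `ParsingUnit`, `register`, and `generic_parser` are hypothetical names; the embodiment stores registrations in a database table, whereas this sketch uses a dictionary, and it takes the generic-parser branch rather than the raise-an-error alternative.

```python
from typing import Callable, Dict

Parser = Callable[[bytes], dict]

def generic_parser(data: bytes) -> dict:
    # Knows nothing about the format, so the only useful metadata is the size.
    return {"size": len(data)}

class ParsingUnit:
    def __init__(self, fallback: Parser = generic_parser) -> None:
        self._parsers: Dict[str, Parser] = {}
        self._fallback = fallback

    def register(self, file_type: str, parser: Parser) -> None:
        """Register a type-specific parsing module for one file type."""
        self._parsers[file_type] = parser

    def parse(self, file_type: str, data: bytes) -> dict:
        # Use the type-specific module if one is registered, else the generic one.
        return self._parsers.get(file_type, self._fallback)(data)

unit = ParsingUnit()
unit.register("text", lambda d: {"size": len(d), "lines": d.count(b"\n") + 1})
print(unit.parse("text", b"a\nb"))   # {'size': 3, 'lines': 2}
print(unit.parse("bmp", b"\x00\x01"))  # {'size': 2}
```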
[0114]
When the rendering unit 904 receives a request for a file, the rendering unit 904 issues database commands to retrieve the data associated with the file. That data includes metadata indicating the source file type of the file. The rendering unit 904 then determines whether a type-specific rendering module is registered for that source file type. If one is registered, the rendering unit 904 calls the rendering routines provided by the type-specific rendering module to construct the file, and delivers the file so constructed to the requesting entity.
[0115]
Various factors can be used to determine which target file type the type-specific rendering module should select. In some cases, the entity requesting a file explicitly indicates the file type it requires. For example, a text editor may be able to handle only text files. Such a text editor may request a file whose source file type is a Word document. In response to this request, a Word-specific rendering module is invoked, which converts the Word document into a text file in accordance with the requested target file type. The text file is then delivered to the text editor.
[0116]
In other cases, the entity requesting a file may support multiple file types. According to one embodiment, the type-specific rendering module incorporates logic to (1) identify the set of file types supported by both the requesting entity and the type-specific rendering module, and (2) select the best target file type from that set. The selection of the best target file type can take into account various factors, including specific characteristics of the file in question.
[0117]
For example, assume that (1) the DB file server 408 receives a request for a file, (2) the source file type of the file indicates that the file is a “BMP” image, (3) the request was initiated by an entity that supports “GIF”, “TIF”, and “JPG” images, and (4) the rendering module specific to the BMP source type supports the target file types GIF, JPG, and PCX. Under these circumstances, the BMP-specific rendering module determines that both “GIF” and “JPG” are possible target file types. To choose between these two possible target file types, the BMP-specific rendering module may take into account information about the file, including its resolution and color depth. Based on this information, the BMP-specific rendering module may determine that JPG is the best target file type and proceed to convert the BMP file into a JPG file. The resulting JPG file is then delivered to the requesting entity.
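The two-step selection (intersect the supported types, then choose the best one) can be sketched as follows. The preference rule is a hypothetical heuristic that uses color depth only, resting on the fact that GIF is limited to a 256-color palette; a real rendering module could weigh resolution and other characteristics as well.

```python
from typing import Optional, Set

def choose_target_type(requester_types: Set[str], renderer_types: Set[str],
                       color_depth_bits: int) -> Optional[str]:
    """Pick a target type both sides support, using an illustrative heuristic."""
    candidates = requester_types & renderer_types  # step (1): common file types
    if not candidates:
        return None
    # Step (2): GIF holds at most 256 colors, so prefer JPG for deeper images.
    preference = ["JPG", "GIF"] if color_depth_bits > 8 else ["GIF", "JPG"]
    for file_type in preference:
        if file_type in candidates:
            return file_type
    return sorted(candidates)[0]  # any other common type, chosen deterministically

print(choose_target_type({"GIF", "TIF", "JPG"}, {"GIF", "JPG", "PCX"}, 24))  # JPG
print(choose_target_type({"GIF", "TIF", "JPG"}, {"GIF", "JPG", "PCX"}, 8))   # GIF
```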
[0118]
According to one embodiment, type-specific parsing and rendering modules are registered by storing information indicating the capabilities of each module in a database table. For example, the entry for a type-specific rendering module may indicate that the module should be used when the source file type is XML and the requesting entity is a Windows®-based web browser. The entry for a type-specific parsing module may indicate that the module should be used when the source file type is a .GIF image.
[0119]
When the DB file server 408 receives a command related to a file through the DB file API, the DB file server 408 determines the file type involved and the identity of the entity that issued the command. The DB file server 408 then issues database commands that cause the database server 204 to scan the table of registered modules and select the module appropriate for the current situation. For inbound files, the appropriate parsing module is called to parse the file before it is inserted into the database. For outbound files, the appropriate rendering module is called to construct the outbound file from the data retrieved from the database.
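A table-based registry of rendering modules and the lookup the database server performs can be sketched with sqlite3. The table layout, the module names, and the `'*'` wildcard convention for "any requesting entity" are hypothetical; the sketch prefers an entry that names the requester exactly over a wildcard entry.

```python
import sqlite3
from typing import Optional

def make_registry() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE renderer_registry (module TEXT, source_type TEXT, requester TEXT)")
    conn.executemany("INSERT INTO renderer_registry VALUES (?, ?, ?)", [
        ("xml_to_html", "XML", "windows-browser"),
        ("xml_to_text", "XML", "*"),  # '*' = any requesting entity (hypothetical convention)
        ("gif_renderer", "GIF", "*"),
    ])
    return conn

def select_renderer(conn: sqlite3.Connection, source_type: str, requester: str) -> Optional[str]:
    row = conn.execute(
        "SELECT module FROM renderer_registry "
        "WHERE source_type = ? AND requester IN (?, '*') "
        "ORDER BY requester = '*' "  # exact requester match (0) sorts before wildcard (1)
        "LIMIT 1",
        (source_type, requester),
    ).fetchone()
    return row[0] if row else None

registry = make_registry()
print(select_renderer(registry, "XML", "windows-browser"))  # xml_to_html
print(select_renderer(registry, "XML", "text-editor"))      # xml_to_text
```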
[0120]
According to one embodiment of the present invention, the DB file system allows file classes to be defined using object-oriented techniques, where each file type belongs to a file class and a file class can inherit attributes from other file classes. In such a system, the file class of a file can be one of the factors used to determine the appropriate parser and renderer for that file. The use of file classes is described in more detail below.
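One way a file class could determine the parser is to walk up the class's inheritance chain until a class with a registered parser is found. The hierarchy, the class names, and the parser names below are hypothetical, and this lookup rule is an assumption for illustration; the text above states only that the file class can be a factor in the choice.

```python
from typing import Dict, Optional

# Hypothetical class hierarchy: each file class names its parent class.
PARENT: Dict[str, Optional[str]] = {
    "WordDocument": "Document",
    "HTMLDocument": "Document",
    "Document": "File",
    "File": None,
}
PARSERS = {"HTMLDocument": "html_parser", "Document": "document_parser", "File": "generic_parser"}

def find_parser(file_class: str) -> str:
    """Walk up the class chain until a class with a registered parser is found."""
    cls: Optional[str] = file_class
    while cls is not None:
        if cls in PARSERS:
            return PARSERS[cls]
        cls = PARENT.get(cls)
    raise LookupError(f"no parser registered for {file_class}")

print(find_parser("WordDocument"))  # document_parser (inherited from Document)
print(find_parser("HTMLDocument"))  # html_parser
```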
[0121]
Stored query directory
As explained above, a hierarchical directory structure can be implemented in a database system using a file table 710 in which each row corresponds to a file. A hierarchical index 510 may be employed to efficiently locate the row associated with a given file based on the file's pathname.
[0122]
In the embodiment shown in FIGS. 5 and 7, the child files of each directory are explicitly listed. In particular, the child files of each directory are listed in the Dir_entry_list of the index entry associated with that directory. For example, the index entry 512 corresponds to the Windows® directory 614, and the Dir_entry_list of the index entry 512 explicitly lists “Word” and “Access” as child files of the Windows® directory 614.
[0123]
According to one aspect of the present invention, a file system is provided in which the child files of some or all directories are determined dynamically, based on the results of a stored query, rather than being listed explicitly. Such a directory is referred to herein as a stored query directory.
[0124]
For example, suppose a user of a file system wants to group all files with the extension .doc into a single directory. In a conventional file system, the user would create a directory, search for all files with the extension .doc, and then either move the files found by the search into the newly created directory or create hard links in the new directory to those files. Unfortunately, the contents of the newly created directory accurately reflect the state of the system only at the time the search was performed. A file whose name is later changed so that it no longer has a .doc extension remains in the directory. Furthermore, files with a .doc extension that are created in other directories after the new directory is established are not included in it.
[0125]
Rather than defining the membership of the new directory statically, the membership of the directory can be defined by a stored query. A stored query that selects the files with the extension .doc might read as follows:
[0126]
Q1:
SELECT * FROM files_table
WHERE files_table.Extension = 'doc'
Referring to FIG. 7, when executed against table 710, query Q1 selects rows R4 and R12, which are rows for the two documents entitled Example.doc.
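The difference between a one-time search and a stored query can be sketched with sqlite3. Rows R4 and R12 follow the example above, while R5, R20, and the simplified three-column `files_table` are hypothetical sample data. Q1 is adapted to select RowID so that the output is compact.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files_table (RowID TEXT, Name TEXT, Extension TEXT)")
conn.executemany("INSERT INTO files_table VALUES (?, ?, ?)",
                 [("R4", "Example", "doc"), ("R5", "Notes", "txt"), ("R12", "Example", "doc")])

# Stored query Q1, adapted to return only the RowID column.
Q1 = "SELECT RowID FROM files_table WHERE files_table.Extension = 'doc'"

def members():
    # Re-running the stored query yields the directory's current membership.
    return sorted(row[0] for row in conn.execute(Q1))

snapshot = members()  # what a one-time search would have captured
conn.execute("UPDATE files_table SET Extension = 'txt' WHERE RowID = 'R12'")  # renamed away from .doc
conn.execute("INSERT INTO files_table VALUES ('R20', 'New', 'doc')")          # new .doc file elsewhere
print(snapshot)   # ['R12', 'R4']
print(members())  # ['R20', 'R4']
```

The stale snapshot still lists R12 and misses R20, which are the two shortcomings of the conventional approach noted above.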
[0127]
In accordance with one embodiment of the present invention, a mechanism is provided for linking a query, such as query Q1, to a directory entry in the hierarchical index 510. When a directory entry containing such a link is encountered during traversal of the hierarchical index 510, the query specified by the link is executed. Each file selected by the query is treated as a child of the directory associated with the directory entry, just as if the file were listed explicitly in the hierarchical index 510.
[0128]
For example, suppose a user wants to create a directory “Documents” that is a child of Word 616, and wants this Documents directory to contain all files with the extension .doc. According to one embodiment of the present invention, the user formulates a query that specifies the selection criteria for the files that are to belong to the directory. In this example, the user may formulate query Q1. The query is then stored in the database system.
[0129]
As with other types of directories, a row for the Documents directory is added to the file table 710 and an index entry for the Documents directory is added to the hierarchical index 510. In addition, the Dir_entry_list of the index entry for the Word directory is updated to indicate that the new Documents directory is a child of the Word directory. Rather than explicitly listing its children in a Dir_entry_list, the new index entry for the Documents directory contains a link to the stored query.
[0130]
FIGS. 10 and 11 respectively show the state of the hierarchical index 510 and the file table 710 after appropriate entries are created for the Documents directory. Referring to FIG. 10, an index entry 1004 is created for the Documents directory. Since the children of the Documents directory are dynamically determined based on the stored query result set, the Dir_entry_list field of the index entry 1004 is null. Instead of statically enumerating child files, the index entry 1004 includes a link to a stored query 1002 that is to be executed to determine the child files in the Documents directory.
[0131]
In addition to creating an index entry 1004 for the Documents directory, the existing index entry 514 for the Word directory is updated to indicate that Documents is a child of the Word directory. Specifically, a Dir_entry_list array entry is added to the index entry 514 to specify the name “Documents”, the RowID of the index entry for the Documents directory (ie, Y7), and the FileID of the Documents directory (ie, X13).
[0132]
In the illustrated embodiment, two columns are added to the hierarchical index 510. Specifically, the stored query directory (SQD) column includes a flag indicating whether the directory entry is for a stored query directory. In the directory entry for the stored query directory, a link to the stored query associated with the directory is stored in a query pointer (QP) column. In a directory entry for a directory other than the stored query directory, the QP column is null.
[0133]
The nature of the link can vary from implementation to implementation. For example, according to one implementation, this link may be a pointer to the storage location where the stored query is stored. According to another implementation, this link can simply be a unique stored query identifier that can be used to look up stored queries in the stored query table. The present invention is not limited to any particular type of link.
[0134]
FIG. 11 shows the file table 710 updated to include a row (R13) for the Documents directory. According to one embodiment, the same metadata that is maintained for conventional directories is also maintained for the Documents directory. For example, row R13 may include a creation date, a last modified date, and so on.
[0135]
FIG. 12 is a block diagram of a file hierarchy. The hierarchy shown in FIG. 12 is the same as that of FIG. 6, with the addition of the Documents directory 1202. When an application requests the contents of the Documents directory 1202, the database executes the query associated with the Documents directory 1202, and the files selected by the query are presented to the application as the contents of the Documents directory 1202. At the point in time shown in FIG. 12, the file system contains only two files that satisfy the query associated with the Documents directory 1202, both titled Example.doc. These two Example.doc files, 618 and 622, are therefore shown as children of the Documents directory 1202.
[0136]
In many OS file systems, the same directory cannot contain two different files with the same name. Therefore, presenting two files titled Example.doc in the Documents directory 1202 may violate the rules of the OS file system. Various techniques can be used to address this problem. For example, the DB file system can create a unique file name by appending characters to each file name. Thus, Example.doc 618 can be presented as Example.doc1, while Example.doc 622 is presented as Example.doc2. Instead of appending characters that convey no particular information, the appended characters may be chosen to convey meaning. For example, they may indicate the path to the directory in which the file is statically located. That is, Example.doc 618 can be presented as Example.doc_Windows_Word, while Example.doc 622 is presented as Example.doc_VMS_App4. Alternatively, the stored query directory may simply be allowed to break the OS file system rules.
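The two renaming schemes can be sketched as follows. The function `present_names` is hypothetical, and the static parent paths /Windows/Word and /VMS/App4 are assumed from the suffixes shown above.

```python
from collections import Counter
from typing import List, Tuple


def present_names(children: List[Tuple[str, str]], use_path: bool = False) -> List[str]:
    """Map (name, static parent path) pairs to names that are unique in the listing."""
    totals = Counter(name for name, _ in children)
    seen: Counter = Counter()
    presented = []
    for name, parent in children:
        if totals[name] == 1:
            presented.append(name)  # already unique: shown unchanged
        elif use_path:
            # Meaningful suffix: the static location, e.g. /Windows/Word -> _Windows_Word
            suffix = "_".join(part for part in parent.split("/") if part)
            presented.append(f"{name}_{suffix}")
        else:
            # Meaningless but unique suffix: Example.doc1, Example.doc2, ...
            seen[name] += 1
            presented.append(f"{name}{seen[name]}")
    return presented


children = [("Example.doc", "/Windows/Word"), ("Example.doc", "/VMS/App4")]
print(present_names(children))                 # ['Example.doc1', 'Example.doc2']
print(present_names(children, use_path=True))  # ['Example.doc_Windows_Word', 'Example.doc_VMS_App4']
```

A numeric suffix can collide with a file that is genuinely named, for example, Example.doc1; a practical scheme would need to check for that case.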
[0137]
In the example shown in FIG. 10, the child files of a given directory are either all statically defined or all defined by a stored query. However, according to one embodiment of the present invention, a directory may have some statically defined child files and some child files defined by a stored query. For example, rather than having a null Dir_entry_list, the index entry 1004 may have a Dir_entry_list that statically identifies one or more child files. Thus, when an application requests the database system to identify the children of the Documents directory, the database server lists both the statically defined child files and the child files that satisfy the stored query 1002.
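A directory with both kinds of children might be listed as below. The names are hypothetical, and this sketch lists a row only once even if it is both statically linked and selected by the query; that de-duplication is an assumption that the text does not address.

```python
from typing import Callable, Dict, List


def list_children(static_ids: List[str],
                  query: Callable[[dict], bool],
                  file_table: Dict[str, dict]) -> List[str]:
    """Statically linked children first, then rows selected by the stored query."""
    children = list(static_ids)
    listed = set(static_ids)
    for row_id, row in file_table.items():
        # Skip rows that are already listed because they are statically linked.
        if row_id not in listed and query(row):
            children.append(row_id)
            listed.add(row_id)
    return children


file_table = {
    "r1": {"name": "Readme.txt"},
    "r2": {"name": "Example.doc"},
    "r3": {"name": "Spec.doc"},
}


def is_doc(row: dict) -> bool:
    # Hypothetical stored query: select every file whose name ends in .doc.
    return row["name"].endswith(".doc")


print(list_children(["r1", "r3"], is_doc, file_table))  # ['r1', 'r3', 'r2']
```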
[0138]
Importantly, a stored query that identifies the child files of one directory may select directories as well as documents. Some or all of the selected directories may themselves be stored query directories. Under certain circumstances, the stored query for a particular directory may even select that directory itself, making the directory a child of itself.
[0139]
Because the child files of a stored query directory are determined on the fly, the listing of child files always reflects the current state of the database. For example, assume that the “Documents” stored query directory has been created as described above. Each time a new file with the extension .doc is created, it automatically becomes a child of the Documents directory. Similarly, if a file's extension is changed from .doc to .txt, the file automatically ceases to qualify as a child of the Documents directory.
[0140]
According to one embodiment, the query associated with a stored query directory may select particular database records, each of which becomes a child file of the directory. For example, a directory entitled “Employees” may be linked to a stored query that selects all rows of the Employees table in the database. When an application requests retrieval of one of these virtual employee files, the renderer uses the data from the corresponding employee record to generate a file of the type expected by the requesting application.
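A rough sketch of an “Employees” stored query directory whose children are rendered from table rows follows. The table contents, the `emp<id>` file naming, and the JSON and plain-text output formats are all assumptions for illustration; the text does not specify how virtual file names or renderer output are chosen.

```python
import json
from typing import Dict, List

# Hypothetical rows of an Employees table.
EMPLOYEES: List[Dict] = [
    {"id": 101, "name": "Alice", "dept": "Dept1"},
    {"id": 102, "name": "Bob", "dept": "Dept2"},
]


def list_employees_dir() -> List[str]:
    # The stored query selects every row; each row appears as one virtual file.
    return [f"emp{row['id']}" for row in EMPLOYEES]


def render(file_name: str, file_type: str) -> str:
    """Generate the virtual file's contents from its record, in the requested type."""
    record = next(row for row in EMPLOYEES if f"emp{row['id']}" == file_name)
    if file_type == "json":
        return json.dumps(record, sort_keys=True)
    if file_type == "txt":
        return "\n".join(f"{key}: {value}" for key, value in record.items())
    raise ValueError(f"no renderer for file type {file_type!r}")


print(list_employees_dir())      # ['emp101', 'emp102']
print(render("emp101", "json"))  # {"dept": "Dept1", "id": 101, "name": "Alice"}
```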
[0141]
Stored query documents
Just as a stored query can be used to identify the child files of a directory, a stored query can also be used to define the contents of a document. Referring to FIGS. 7 and 11, these drawings show a file table 710 having a Body column. For directories, the Body field is null. For documents, the Body field contains the BLOB that holds the document. For files whose contents are specified by a stored query, the Body field may contain a link to the stored query. When an application requests retrieval of a stored query document, the stored query linked to the row associated with that document is executed, and the content of the document is constructed from the query results. According to one embodiment, the process of constructing a document from query results is performed by a renderer as described above.
[0142]
In addition to documents whose contents are fully determined by the results of a stored query, support may also be provided for documents whose contents are determined partly by query results and partly by other data. For example, the Body field of a document's row may contain a BLOB, while another field contains a link to a stored query. When a request is received for the file associated with that row, the query may be executed and its results combined with the BLOB when the file is rendered.
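Fetching a stored query document, with or without a static BLOB, might look like the following sketch. `DocRow`, `fetch`, and `headcount` are hypothetical, the stored query is simulated by a Python function, and placing the BLOB before the query results is an arbitrary choice.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class DocRow:
    body: Optional[str] = None                         # BLOB (static part)
    query: Optional[Callable[[], List[Tuple]]] = None  # link to a stored query


def fetch(row: DocRow) -> str:
    """Build document contents from the BLOB, the stored query results, or both."""
    parts = []
    if row.body is not None:
        parts.append(row.body)
    if row.query is not None:
        # Execute the stored query at request time; render one line per result row.
        parts.append("\n".join(", ".join(map(str, result)) for result in row.query()))
    return "\n".join(parts)


def headcount() -> List[Tuple]:
    # Hypothetical stored query; a real one would run SQL against the database.
    return [("Dept1", 3), ("Dept2", 5)]


print(fetch(DocRow(query=headcount)))                           # fully query-defined
print(fetch(DocRow(body="Headcount report", query=headcount)))  # BLOB plus query results
```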
[0143]
Multi-level stored query directory
As described above, stored queries may be used to dynamically select the child files of a directory. All child files of a directory belong to the same level of the file hierarchy (i.e., the level directly below the directory associated with the stored query). According to certain embodiments, a stored query associated with a directory may define multiple levels below the directory. A directory associated with a query that defines multiple levels is referred to herein as a multi-level stored query directory.
[0144]
For example, a multi-level stored query directory may be associated with a query that selects all employee records in the employee table and groups them by department and region. Under these conditions, a separate hierarchy level may be provided for each grouping key (department and region) and for the employee records themselves. Specifically, the results of such a query may be represented as three different levels of the file hierarchy. The child files of the directory are defined by the first grouping criterion, which in this example is “department”. Thus, the child files of the directory may correspond to the department values, i.e., “Dept1”, “Dept2”, and “Dept3”. These child files are themselves represented as directories.
[0145]
The child files of each department directory are determined by the second grouping criterion, which in this example is “region”. Therefore, each department directory has a child file for each region value, such as “North”, “South”, “East”, and “West”. The region files are also represented as directories. Finally, the child files of each region directory are the files for the employees in the department/region combination associated with that region directory. For example, the children of the \Dept1\East directory would be the employees of Dept1 in the East region.
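The grouping in this example can be sketched as a recursive function that produces one directory level per grouping key. The employee data and helper names are hypothetical, and for simplicity the sketch builds the whole tree eagerly, whereas a file system would more plausibly compute each level only when that directory is listed.

```python
from collections import defaultdict
from typing import Dict, List, Union

Tree = Union[Dict[str, "Tree"], List[str]]


def build_hierarchy(rows: List[dict], keys: List[str], leaf: str) -> Tree:
    """One directory level per grouping key; the last level lists the leaf files."""
    if not keys:
        return sorted(row[leaf] for row in rows)
    groups: Dict[str, List[dict]] = defaultdict(list)
    for row in rows:
        groups[row[keys[0]]].append(row)
    return {value: build_hierarchy(members, keys[1:], leaf)
            for value, members in sorted(groups.items())}


def resolve(tree: Tree, path: str) -> Tree:
    """Walk a backslash-separated path such as \\Dept1\\East."""
    node = tree
    for part in path.strip("\\").split("\\"):
        node = node[part]
    return node


employees = [
    {"name": "Ann", "dept": "Dept1", "region": "East"},
    {"name": "Raj", "dept": "Dept1", "region": "East"},
    {"name": "Kim", "dept": "Dept1", "region": "West"},
    {"name": "Lee", "dept": "Dept2", "region": "North"},
]
tree = build_hierarchy(employees, ["dept", "region"], "name")
print(resolve(tree, r"\Dept1\East"))  # ['Ann', 'Raj']
```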
[0146]
Handling file operations for child files of stored query directories
As described above, the child files of a stored query directory are presented to applications in the same manner as the child files of a conventional directory. However, certain file operations that can be performed on the child files of a conventional directory pose special problems when performed on the child files of a stored query directory.
[0147]
For example, suppose a user enters a command specifying that a child file of a stored query directory be moved to another directory. This operation creates a problem because the child file belongs to the stored query directory by virtue of meeting the criteria specified in the stored query associated with the directory. Unless the file is modified so that it no longer meets those criteria, it continues to qualify as a child file of the stored query directory.
[0148]
A similar problem occurs when an attempt is made to move a file into a stored query directory. If the file is not already a child of the stored query directory, then it does not satisfy the stored query associated with that directory, and it should not become a child of the directory unless it is modified in such a way that it satisfies the criteria specified by the stored query.
[0149]
Various approaches can be taken to solve these problems. For example, the DB file system may be configured to issue an error in response to an operation that attempts to move a file into or out of a stored query directory. Alternatively, the DB file system may delete the file in question (or the database record represented as a file) in response to such an attempt.
[0150]
In yet another approach, files moved into the stored query directory may be automatically changed so that they meet the criteria of the stored query associated with the directory. For example, assume that the stored query associated with the stored query directory selects all married employees. When a file corresponding to an employee record is moved to the stored query directory, the “married” field of the employee record is updated to indicate that the employee is married.
[0151]
Similarly, files moved out of a stored query directory may be automatically modified so that they no longer meet the criteria of the stored query associated with the directory. For example, if a file in the “married employees” stored query directory is moved out of that directory, the “married” field in the corresponding employee record is updated to indicate that the employee is not married.
[0152]
If an attempt is made to move a file that does not meet the stored query criteria into the corresponding stored query directory, an alternative approach is to update the index entry of the stored query directory so that the file is statically established as a child of the stored query directory. Under these circumstances, the stored query directory has some child files that are children because they satisfy the stored query and other child files that are children because they were manually moved into the stored query directory.
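Several of the move-handling approaches described above can be sketched together, using the married-employees example. The `MovePolicy` enum, `StoredQueryDir`, and the `make_match`/`make_nonmatch` callbacks are hypothetical; the sketch does not model the deletion approach, and it assumes that a statically linked child can be moved out simply by dropping its link.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Dict, Set


class MovePolicy(Enum):
    REJECT = "reject"         # report an error
    UPDATE_RECORD = "update"  # edit the record so it does (or no longer does) match
    STATIC_LINK = "static"    # move-in only: link the file statically


@dataclass
class StoredQueryDir:
    query: Callable[[dict], bool]
    make_match: Callable[[dict], None]     # hypothetical: edit record to satisfy query
    make_nonmatch: Callable[[dict], None]  # hypothetical: edit record to fail query
    static_children: Set[str] = field(default_factory=set)

    def children(self, records: Dict[str, dict]) -> Set[str]:
        matching = {key for key, rec in records.items() if self.query(rec)}
        return matching | self.static_children


def move_in(d: StoredQueryDir, key: str, records: Dict[str, dict], policy: MovePolicy) -> None:
    if key in d.children(records):
        return  # already a child; nothing to do
    if policy is MovePolicy.REJECT:
        raise PermissionError("cannot move a file into a stored query directory")
    if policy is MovePolicy.UPDATE_RECORD:
        d.make_match(records[key])    # e.g. set married = True
    else:
        d.static_children.add(key)    # child by static link, not by the query


def move_out(d: StoredQueryDir, key: str, records: Dict[str, dict], policy: MovePolicy) -> None:
    matches = d.query(records[key])
    if policy is MovePolicy.REJECT or (matches and policy is not MovePolicy.UPDATE_RECORD):
        raise PermissionError("file cannot leave this stored query directory")
    d.static_children.discard(key)    # drop any static link
    if matches:
        d.make_nonmatch(records[key])  # e.g. set married = False


records = {"e1": {"married": True}, "e2": {"married": False}, "e3": {"married": False}}
married = StoredQueryDir(query=lambda r: r["married"],
                         make_match=lambda r: r.update(married=True),
                         make_nonmatch=lambda r: r.update(married=False))
move_in(married, "e2", records, MovePolicy.UPDATE_RECORD)
move_in(married, "e3", records, MovePolicy.STATIC_LINK)
move_out(married, "e1", records, MovePolicy.UPDATE_RECORD)
print(sorted(married.children(records)))  # ['e2', 'e3']
```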
[0153]
Program-defined files
Stored query directories and stored query documents are examples of programmatically defined files. A programmatically defined file is an entity (such as a document or directory) that is presented to the file system as a file, but whose contents and/or child files are defined by executing code. The code executed to define the contents of the file may include a stored database query, as in the case of a stored query file, and/or other code. According to one embodiment, the code associated with a programmatically defined file implements the following routines:
[0154]
resolve_filename
list_directory
fetch
put
delete
The resolve_filename routine returns a file handle for the file that has the name “filename” and is a child of the programmatically defined file. The list_directory routine returns a list of all child files of the programmatically defined file. The fetch routine retrieves the contents of the programmatically defined file. The put routine inserts data into the programmatically defined file. The delete routine deletes the programmatically defined file.
[0155]
According to one embodiment, a “resolve_pathname (path): file_handle” routine is also provided. The resolve_pathname routine receives a path and iteratively calls the resolve_filename function for each filename in the path.
[0156]
According to one embodiment, the DB file system provides an object class that implements the routines listed above for conventional files (i.e., files that are not programmatically defined). For purposes of explanation, this object class is referred to herein as the “directory class”. A programmatically defined file is implemented by establishing a subclass of the directory class. The subclass inherits the routines of the directory class but allows the programmer to override their implementations. The implementation provided by the subclass determines the operations that the DB file system performs in response to file operations involving the programmatically defined file.
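The directory class and a subclass that overrides it can be sketched in Python as follows. The class names, the use of the object itself as the file handle, and the `delete(filename)` signature are assumptions; the text names the routines but does not give their exact signatures.

```python
from typing import Callable, Dict, List, Optional


class Directory:
    """Hypothetical 'directory class': default routines for conventional files."""

    def __init__(self, name: str, children: Optional[Dict[str, "Directory"]] = None,
                 content: bytes = b""):
        self.name = name
        self.children = children if children is not None else {}
        self.content = content

    def resolve_filename(self, filename: str) -> "Directory":
        return self.children[filename]  # the object itself serves as the handle

    def list_directory(self) -> List[str]:
        return sorted(self.children)

    def fetch(self) -> bytes:
        return self.content

    def put(self, data: bytes) -> None:
        self.content = data

    def delete(self, filename: str) -> None:  # assumption: deletes a named child
        del self.children[filename]

    def resolve_pathname(self, path: str) -> "Directory":
        # Calls resolve_filename once per path component, so each node's own
        # (possibly overridden) implementation decides how its name is resolved.
        node = self
        for part in path.split("/"):
            if part:
                node = node.resolve_filename(part)
        return node


class ComputedDirectory(Directory):
    """Program-defined directory: its children are produced by code on each call."""

    def __init__(self, name: str, generate: Callable[[], Dict[str, bytes]]):
        super().__init__(name)
        self.generate = generate

    def list_directory(self) -> List[str]:
        return sorted(self.generate())

    def resolve_filename(self, filename: str) -> Directory:
        return Directory(filename, content=self.generate()[filename])

    def put(self, data: bytes) -> None:
        raise PermissionError("contents are defined by code")

    def delete(self, filename: str) -> None:
        raise PermissionError("children are defined by code")


reports = ComputedDirectory("reports", lambda: {"a.txt": b"A", "b.txt": b"B"})
root = Directory("root", {"reports": reports})
print(root.resolve_pathname("/reports").list_directory())  # ['a.txt', 'b.txt']
print(root.resolve_pathname("/reports/b.txt").fetch())     # b'B'
```

Because `resolve_pathname` is inherited unchanged, a path can cross from conventional directories into program-defined ones without the caller knowing which is which.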
[0157]
Event notification in the file system
According to one aspect of the present invention, a file system is provided in which users are proactively notified when certain file system events occur. Because notification is proactive, there is no need to incur the overhead of repeatedly polling for conditions indicating that an event of interest has occurred. The ability to be notified of file system events is very useful, for example, when a particular file system event is significant to a user.
[0158]
For example, multiple copies of a document are commonly maintained ("cached") at different locations to make access to the document more efficient. Under these conditions, if one copy is updated, the remaining copies become stale (i.e., they no longer reflect the current state of the document). Using the event notification techniques described below, when one copy is updated, the sites that hold the other copies can be proactively notified of the update. The processes or users at those sites can then take whatever action is appropriate under the circumstances. In the case of a cache, for example, a suitable action may be to replace the cached version of the document with the updated version.
[0159]
As another example, a particular user may be responsible for reviewing all of a company's technical documents before they are published. The company's technical writers may have been instructed to store each technical document in a “ready for review” directory when it is ready for the user's review. Without a proactive notification system, simply storing a technical document in the “ready for review” directory does not make the user aware that a new document is ready for review. Rather, some additional work is required, such as the technical writer telling the user that the document is ready, or the user periodically checking the “ready for review” directory. In contrast, in a file system that implements the event notification techniques described here, placing a technical document into the “ready for review” directory can trigger the generation of a message notifying the user that a new technical document is ready for review.
[0160]
According to one embodiment of the invention, rules may be defined for proactively generating messages in response to file system events. Such events include, for example, storing or creating a file in a certain directory, deleting a file from a certain directory, moving a file out of a certain directory, changing or deleting a certain file, and linking a certain file to another directory. These file system operations are merely representative. The specific operations for which proactive notification rules can be created may vary from implementation to implementation, and the present invention is not limited to providing event notification support for any particular set of file system operations.
[0161]
According to one embodiment, an event_id is assigned to each file system event. A notification rule can then be created that identifies an event_id and a set of one or more subscribers. Once a rule is registered with the file system, a message is automatically sent to the set of subscribers identified in the rule whenever the file system event identified by the rule's event_id occurs.
[0162]
For example, a user may register an interest in knowing when a file is added to a particular directory. To record this interest, the database server (1) inserts a row into a “registered rules” table and (2) sets a flag associated with the directory to indicate that at least one rule has been registered for that directory. The row inserted into the registered rules table identifies the entity and indicates the events in which the entity is interested. The row may also contain additional information, such as the protocol to be used to communicate with the entity. The flag indicating that a rule applies to a directory may be stored in the file table row associated with the directory, in the hierarchical index entry associated with the directory, or in both.
[0163]
When inserting a file into a directory, the database server checks the flag associated with the directory to determine whether any rules are registered for that directory. If so, the registered rules table is searched to find the specific rules that apply to that directory. If the registered rules include rules that apply to the specific operation being performed on the directory, a message is sent to the interested entities identified by those rules. The protocol used to send a message to an entity can vary from entity to entity. For example, for some entities the message may be sent via CORBA, while for other entities the message may be sent in the form of an HTML page over HTTP.
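The flag check and rules-table lookup can be sketched as follows. `Rule`, `NotifyingFileSystem`, and the `outbox` list are hypothetical; the per-directory flag is modeled as a set rather than a column in the file table or index entry, and actual delivery over CORBA or HTTP is replaced by appending to a list.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass
class Rule:
    directory: str
    event_id: str  # e.g. "insert"
    entity: str
    protocol: str  # e.g. "CORBA" or "HTTP"


class NotifyingFileSystem:
    def __init__(self) -> None:
        self.registered_rules: List[Rule] = []        # the "registered rules" table
        self.flagged: Set[str] = set()                # directories with >= 1 rule (the flag)
        self.children: Dict[str, List[str]] = {}
        self.outbox: List[Tuple[str, str, str]] = []  # (protocol, entity, message)

    def register(self, rule: Rule) -> None:
        self.registered_rules.append(rule)  # (1) insert a row into the rules table
        self.flagged.add(rule.directory)    # (2) set the directory's flag

    def insert_file(self, directory: str, name: str) -> None:
        self.children.setdefault(directory, []).append(name)
        if directory not in self.flagged:
            return  # common case: flag unset, so the rules table is never searched
        for rule in self.registered_rules:
            if rule.directory == directory and rule.event_id == "insert":
                self.send(rule, f"{name} was added to {directory}")

    def send(self, rule: Rule, message: str) -> None:
        # Placeholder for protocol-specific delivery (CORBA, HTML over HTTP, ...).
        self.outbox.append((rule.protocol, rule.entity, message))


fs = NotifyingFileSystem()
fs.register(Rule("/ready_for_review", "insert", "reviewer", "HTTP"))
fs.insert_file("/ready_for_review", "spec.doc")
fs.insert_file("/drafts", "notes.doc")
print(fs.outbox)  # [('HTTP', 'reviewer', 'spec.doc was added to /ready_for_review')]
```

The flag keeps the cost of ordinary inserts low: the rules table is consulted only for directories known to have at least one rule.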
[0164]
According to one embodiment, the notification mechanism is implemented in a database-implemented file system, such as the one described above, using a queuing mechanism such as the one described in U.S. patent application Ser. No. 08/961,597, entitled “Apparatus and Method for Message Queuing in a Data Base System”, filed by Chandra et al. on Oct. 31, 1997, the entire contents of which are incorporated herein by reference.
[0165]
According to one such embodiment, an event server running outside the database server is registered as a subscriber to a queue managed by the database server. The queue to which the event server subscribes will be referred to herein as the file event queue. Entities interested in a particular file system event register that interest with the event server. The event server communicates with the database server via the database API, and communicates with interested entities via protocols supported by those entities.
[0166]
When the database server performs an operation related to the file system, a message indicating the event_id associated with the operation is placed in the database server file event queue. The queuing mechanism determines that the event server has registered interest in the file event queue and sends a message to the event server. The event server searches the list of interested entities to determine if any entity has registered interest in the event identified in the message. The event server then sends a message indicating the occurrence of the file system event to all entities that have registered interest in the event.
[0167]
In embodiments that use an event server to forward messages to interested entities, the event server may be configured to support a certain maximum number of users. If the number of interested users exceeds the maximum number, an additional event server is started to provide services to the additional users. As in the case of a single event server, each event server in a multiple event server system is registered as a subscriber to the file event queue.
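The event-server arrangement, including starting another server when the existing ones are full, might be sketched as below. All class and function names are hypothetical, the database-managed queue is replaced by an in-memory `deque`, and capacity is counted in distinct entities per server.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, List, Set, Tuple


class EventServer:
    """Hypothetical event server running outside the database server."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.entities: Set[str] = set()
        self.interests: Dict[str, Set[str]] = defaultdict(set)  # event_id -> entities
        self.delivered: List[Tuple[str, str]] = []              # (entity, event_id)

    def on_message(self, event_id: str) -> None:
        # Forward only to entities that registered interest in this event_id.
        for entity in sorted(self.interests.get(event_id, set())):
            self.delivered.append((entity, event_id))


class FileEventQueue:
    """Stand-in for the file event queue managed by the database server."""

    def __init__(self) -> None:
        self.messages: Deque[str] = deque()
        self.subscribers: List[EventServer] = []

    def enqueue(self, event_id: str) -> None:
        self.messages.append(event_id)

    def dispatch(self) -> None:
        # Every subscribed event server receives every message.
        while self.messages:
            event_id = self.messages.popleft()
            for server in self.subscribers:
                server.on_message(event_id)


def register_interest(fq: FileEventQueue, capacity: int, entity: str, event_id: str) -> EventServer:
    """Attach the entity to a server with room, starting a new server when all are full."""
    for server in fq.subscribers:
        if entity in server.entities or len(server.entities) < server.capacity:
            break
    else:
        server = EventServer(capacity)
        fq.subscribers.append(server)  # the new server also subscribes to the queue
    server.entities.add(entity)
    server.interests[event_id].add(entity)
    return server


fq = FileEventQueue()
for entity, event_id in [("a", "E1"), ("b", "E2"), ("c", "E1")]:
    register_interest(fq, 2, entity, event_id)
fq.enqueue("E1")
fq.dispatch()
print(len(fq.subscribers))                    # 2
print([s.delivered for s in fq.subscribers])  # [[('a', 'E1')], [('c', 'E1')]]
```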
[0168]
According to an alternative embodiment, entities interested in file system events are registered directly as subscribers to the file event queue. As part of the registration information, entities indicate the event_id of the file system event that they are interested in. When a queuing mechanism places a message in a file event queue, the queuing mechanism does not automatically send a message to all queue subscribers. Rather, the queuing mechanism examines the registration information to determine which entities have registered interest in a particular event associated with the message and selectively sends messages to only those entities. For entities that do not support database APIs, the registration information includes information about the protocols that these entities support. The queuing mechanism sends file event messages to these entities using the protocols listed in their registration information.
[0169]
File system event notification may be applied in various contexts. For example, it is sometimes desirable to store, on a first machine, a cache of files that reside on a second machine. One currently available mechanism for implementing such a file cache is the “briefcase” feature provided by the Microsoft Windows® operating system. The briefcase feature allows a user to create a special folder (the “briefcase”) on one machine and copy files stored on other machines into it. Each briefcase has an “update” option that, when selected, causes the file system to compare the copy of each file in the briefcase with the copy of the file at its original location. If the copies do not have the same modification date, the file system allows the user to synchronize them (typically by overwriting the older copy with the newer one).
[0170]
Unlike the briefcase mechanism, the file system event notification mechanism allows a file cache to be updated proactively, so that the cache always reflects the current state of the files at their original location. For example, the process that manages the file cache may register interest in updates to the original copies of the files contained in the cache. The process is then automatically notified whenever any of the original files is updated, and can respond immediately by copying the updated file into the file cache. Similarly, one or more directories residing on the second machine may be mirrored on the first machine using the file system event notification mechanism. To use the mechanism in this manner, the process that maintains a mirrored directory first makes a copy of the directory and all the files it contains, and then registers interest in changes made to the directory and to the files it contains. When notified that a change has been made to the directory, the process makes the corresponding change to its copy of the directory. Similarly, when notified of a change to any of the files in the mirrored directory, the process makes the corresponding change to its copy of that file.
[0171]
For example, if a file is moved from a mirrored directory to a directory that is not mirrored, the process deletes its copy of the file from the mirrored directory and unregisters its interest in the file, so that it is no longer notified when the file is updated. Similarly, if a file is moved from a non-mirrored directory into a mirrored directory, the process is notified that the directory has changed. In response to the message, the process identifies the new file, makes a copy of it in the mirrored directory, and registers its interest in the new file.
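A mirroring process driven by such notifications could be sketched like this. `Mirror` and its callback names are hypothetical; in the described mechanism the process learns of a moved-in file by being told that the directory changed and then identifying the new file, a step this sketch collapses into a single callback. The guard in `on_file_updated` stands in for the fact that, once interest is unregistered, no further notifications arrive for that file.

```python
from typing import Dict, Set


class Mirror:
    """Hypothetical process that mirrors one remote directory on the local machine."""

    def __init__(self, remote_files: Dict[str, str]) -> None:
        self.copy = dict(remote_files)                 # initial copy of every file
        self.registered: Set[str] = set(remote_files)  # files whose updates we follow

    def on_file_updated(self, name: str, contents: str) -> None:
        if name in self.registered:
            self.copy[name] = contents

    def on_moved_out(self, name: str) -> None:
        self.copy.pop(name, None)      # delete the local copy
        self.registered.discard(name)  # unregister interest in the file

    def on_moved_in(self, name: str, contents: str) -> None:
        self.copy[name] = contents     # copy the new file
        self.registered.add(name)      # register interest in the new file


mirror = Mirror({"a.txt": "v1"})
mirror.on_file_updated("a.txt", "v2")
mirror.on_moved_in("b.txt", "b1")
mirror.on_moved_out("a.txt")
mirror.on_file_updated("a.txt", "v3")  # no longer followed: ignored
print(mirror.copy)                     # {'b.txt': 'b1'}
```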
[0172]
Version control in the file system
In the workplace, a large-scale undertaking on which many people work together over a long period of time is called a “project”. When working on a project, employees typically create a number of documents, each of which relates to the project in some way.
[0173]
Similarly, within a computer system, users often create numerous electronic documents that all relate to one project. For example, programmers at many sites around the world may each work on different parts of the same computer program. The electronic documents they generate for the program, typically source code files, all belong to a single project. That is, in the context of this discussion, a project is a collection of related files.
[0174]
Typically, project files are organized into specific folders. For example, FIG. 13 shows how the files related to the project “Big Project” may be organized into various folders. Referring to FIG. 13, a folder 1302 entitled Big Project is created to hold all files (directories and documents) related to the project. The child files immediately below Big Project 1302 are the folders source code 1304 and docs 1306. Source code 1304 includes two directories: LA code 1312, which stores the source code 1316 and 1318 of the programmers located in Los Angeles, and SF code 1314, which stores the source code 1320 of the programmers located in San Francisco. Docs 1306 includes two folders: specs 1308 and user manual 1310. Specs 1308 includes specs 1322 and specs 1324, and user manual 1310 includes UM 1326.
[0175]
Often, files within one project will contain references (eg, HTML links) to other files within the same project. These references typically identify other documents using the full path name of the document. Thus, if a document is moved from one location to another in the directory hierarchy, or if the name of the document is changed, all references to the document become invalid.
[0176]
Because of these inter-document references, new versions of files are typically stored at the same location and under the same name as the older versions they replace. In conventional file systems, this overwrites the older version of the file, making it impossible to recover. Unfortunately, there are many cases in which it is desirable to recover an older version of a file. For example, critical information may have been inadvertently deleted from a newer version. If the older version cannot be recovered, the user may have to spend considerable resources trying to reproduce the lost material, and may not be able to reproduce it at all. In addition, it is often desirable to be able to reconstruct the change history of a file, for example to determine when a particular change was made or what changes were made at a particular point in time.
[0177]
According to one aspect of the invention, a versioning mechanism is provided in which a new version of a file is stored at the same location in the directory hierarchy, under the same name as the older version, without overwriting the older version. Instead, the older version is retained, and users can selectively retrieve older versions of the file. Moreover, older versions are retained at their original locations in the directory hierarchy. As described in more detail below, a new directory versioning technique is provided that allows a file system to hold multiple versions of the same file, with the same name, at the same location in the directory hierarchy.
[0178]
Because creating a new version does not change the name or location of the original version, any reference to the first version of a file continues to point to that version even after a newer version of the file is created. Thus, an inter-file reference contained in a document continues to point to the correct version of the referenced document even if a newer version of the referenced document is created. The fact that inter-file references remain valid through the versioning process (i.e., continue to refer to the correct version of the referenced file) has a considerable beneficial impact on the efficiency of file retrieval. Specifically, rather than requiring a lookup operation to find the appropriate version of a referenced file, the referenced file can be retrieved directly by following the reference contained in the referencing file.
[0179]
Similarly, no lookup operation is needed to determine the contents of a directory at a particular point in time. Because directories are themselves versioned, simply selecting a particular version of a directory selects the members that the directory had in that version. The selected version of the directory contains direct links to the files that belong to that version of the directory, and thus to the correct version of each file.
[0180]
In addition, a technique for tracking the relationship between versions of the same file even when the file name changes from version to version is provided. As will be described in more detail below, in addition to the name of the file, a FileID and version number are maintained for each version of each file. If two files have the same FileID, they are different versions of the same file, even if they have different names.
[0181]
According to one aspect of the invention, a mechanism is provided that allows a user to select the “view” of a project that the user wants to see. A project view represents the project files as they existed at a particular point in time. For example, the default view presented to the user may show the latest version of every file. Another view may show the versions of the files that were current one day ago, and yet another may show the versions that were current one week ago.
[0182]
According to one embodiment, version tracking is provided by storing a version number with each file in a project. For example, in a file system implemented in a database system that uses a file table such as file table 710, one column of the row associated with a file may store the version number of that file. Each time a file is created, a row for the file is inserted into file table 710, and a predetermined initial version number (e.g., 1) is stored in the version field of that row.
[0183]
When the file is updated, the previous version of the file is not overwritten. Instead, a new row is inserted into the file table for the new version of the file. The row for the new version contains the same FileID, Name, and Creation Date as the original row, but has a higher version number (e.g., 2), a new Modification Date, and possibly a different file size. In addition, the BLOB of the new row stores the updated file contents, while the BLOB of the original row remains unchanged.
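The row-per-version scheme can be sketched with an in-memory list standing in for the file table. Field names, integer timestamps, and the helper functions are assumptions, and the file size column is omitted. The hypothetical `as_of` helper shows how a point-in-time view could pick, for a given FileID, the highest version whose modification time is not later than the requested time.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Row:
    file_id: int
    name: str
    version: int
    created: int   # hypothetical integer timestamps
    modified: int
    body: bytes    # stands in for the BLOB column


file_table: List[Row] = []


def create(file_id: int, name: str, body: bytes, now: int) -> None:
    file_table.append(Row(file_id, name, 1, now, now, body))  # initial version 1


def update(file_id: int, body: bytes, now: int) -> None:
    latest = max((r for r in file_table if r.file_id == file_id), key=lambda r: r.version)
    # Insert a new row with the same FileID, name and creation date; the old row stays.
    file_table.append(replace(latest, version=latest.version + 1, modified=now, body=body))


def as_of(file_id: int, t: int) -> Row:
    """Return the version of the file that was current at time t."""
    rows = [r for r in file_table if r.file_id == file_id and r.modified <= t]
    return max(rows, key=lambda r: r.version)


create(7, "spec.doc", b"draft", now=100)
update(7, b"final", now=200)
print(as_of(7, 150).body, as_of(7, 250).body)  # b'draft' b'final'
```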
[0184]
According to one embodiment, if a file and the directory in which it resides belong to a project, a change to the file effectively creates a new version of the directory. Thus, updating a file in the directory not only creates a file table row for the new version of the file, but also creates a file table row for the new version of the directory. In one embodiment using a hierarchical index, an index entry for the new version of the directory will also be added to the hierarchical index.
[0185]
If a directory and its parent directory both belong to the same project, creating a new version of the directory effectively creates a new version of the parent directory, so new rows are also added to the file table and the hierarchical index for the parent directory. This process continues until new versions have been created for all directories that belong to the project and lie above the updated file in the file hierarchy.
[0186]
To show how the versioning mechanism responds to updates to files belonging to a project, assume that all files shown in FIG. 13 are version 1 and that an update has been made to code 1320. As shown in FIG. 14, the versioning mechanism responds to the update by creating a new version of code 1320′ without deleting the original version of code 1320. Code 1320 belongs to the SF code directory 1314. Therefore, a new version of the SF code directory 1314′ is created without deleting the original version. Since the SF code directory 1314 belongs to the source code directory 1304, a new version of the source code directory 1304′ is created without deleting the original version. Finally, since the source code directory 1304 belongs to the big project directory 1302, a new version of big project 1302′ is created without deleting the original version.
[0187]
As shown in FIG. 14, when a new version of a parent file is created in response to a new version of a child file, the new version of the parent file has the same children that the parent had before the update, except that the new version of the updated child, rather than its original version, is a child. For example, the new version of code 1320′ is a child of the new version of SF code 1314′, and the new version of SF code 1314′ is a child of the new version of source code 1304′. However, the unchanged child files of the original source code 1304 (e.g., LA code 1312) continue to be child files of the new version of source code 1304′. Similarly, the new version of source code 1304′ is a child of the new version of big project 1302′, while the unchanged child files of the original big project 1302 (e.g., docs 1306) continue to be children of the new version of big project 1302′.
[0188]
In an embodiment in which the file system is implemented using a hierarchical index, the index entry created for the new version of a directory contains the same Dir_entry_list as the index entry for the previous version of the directory, except that the array entry for the updated child file is replaced with an array entry for the new version of that child file. If the updated child file is a child directory, the corresponding array entry in the new Dir_entry_list contains the RowID, in the hierarchical index, of the index entry for the new version of the child directory.
[0189]
If a file that belongs to a project is moved from one directory in the project to another directory in the same project, the file itself has not changed, so no new version of the file is created. However, both the directory from which the file was moved and the directory into which it was placed have changed. New versions are therefore created for both of these directories and for all of their ancestors in the same project. FIG. 15 shows the new directories created in response to code 1318 of FIG. 13 being moved from LA code 1312 to SF code 1314. Specifically, new versions LA code 1312′ and SF code 1314′ are created. The new version LA code 1312′ does not have code 1318 as a child; instead, code 1318 is a child of the new version SF code 1314′. A new source code directory 1304′ is created and linked to LA code 1312′ and SF code 1314′. Finally, a new big project directory 1302′ is created and linked to the new source code directory 1304′ and to the original docs directory 1306.
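A move can be sketched the same way, with one extra requirement: the two changed directories share ancestors, so the rebuild must happen in a single pass to give source code and big project exactly one new version each, as in FIG. 15. This Python sketch is a minimal illustration under stated assumptions; the `Node` type, the helper names, and the path representation are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

Path = Tuple[str, ...]


@dataclass(frozen=True)
class Node:
    """One immutable version of a file or directory (hypothetical model)."""
    name: str
    version: int
    children: Tuple["Node", ...] = ()


def _rebuild(node: Node, here: Path, edits: Dict[Path, Callable]) -> Node:
    # A node needs a new version only if some edited directory lies at or below it.
    if not any(p[: len(here)] == here for p in edits):
        return node  # untouched subtree: shared with the previous version
    children = tuple(_rebuild(c, here + (c.name,), edits) for c in node.children)
    if here in edits:
        children = edits[here](children)
    return Node(node.name, node.version + 1, children)


def move_file(root: Node, src_dir: Path, dst_dir: Path, name: str) -> Node:
    """Move `name` from src_dir to dst_dir (paths relative to root) in one pass."""
    node = root
    for part in src_dir:
        node = next(c for c in node.children if c.name == part)
    moved = next(c for c in node.children if c.name == name)  # not re-versioned
    edits = {
        src_dir: lambda kids: tuple(k for k in kids if k.name != name),
        dst_dir: lambda kids: kids + (moved,),
    }
    return _rebuild(root, (), edits)


# The part of FIG. 13 named in the text, all at version 1.
code_1318 = Node("code_1318", 1)
code_1320 = Node("code_1320", 1)
la_code = Node("la_code", 1, (code_1318,))
sf_code = Node("sf_code", 1, (code_1320,))
source_code = Node("source_code", 1, (la_code, sf_code))
docs = Node("docs", 1)
big_project = Node("big_project", 1, (source_code, docs))

# Moving code 1318 from LA code to SF code.
big_project_v2 = move_file(big_project, ("source_code", "la_code"),
                           ("source_code", "sf_code"), "code_1318")
```

The moved file keeps its version 1 object, and docs is shared between the two root versions.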
[0190]
Using the versioning technique described above, each time a change is made to a project (e.g., big project 1302), a new version of the project's root directory is created. The links descending from each version of the root project directory connect all of the files that belonged to the project at a particular point in time, and the file versions reached through these links are the versions that existed at that point in time. For example, referring to FIG. 14, the links descending from big project 1302 reflect the project as it existed before the update to code 1320, while the links descending from big project 1302′ reflect the project as it existed immediately after that update. Similarly, in FIG. 15, the links descending from big project 1302 reflect the project as it existed before code 1318 was moved from LA code 1312 to SF code 1314, and the links descending from big project 1302′ reflect the project immediately after the move.
[0191]
Tagging
Unfortunately, the versioning technique described above causes a significant proliferation of directory versions, especially at the higher levels of a project. In some situations, this proliferation may be unnecessary or undesirable. Thus, according to one embodiment of the present invention, a mechanism is provided for “tagging” a version of a file. Tagging a version of a file indicates that the version should be retained. That is, rather than always keeping the older version of a file when a newer version is created, the file system keeps an older version only if it is tagged; otherwise, the older version is replaced (overwritten) when the newer version is created.
[0192]
Referring to FIG. 13, assume that code 1320 is not tagged. If code 1320 is updated, the new version of the code simply replaces the old version. Only when code 1320 is tagged are separate new versions of code 1320, SF code 1314, source code 1304, and big project 1302 created, as shown in FIG. 14.
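The overwrite-unless-tagged rule for a single file can be sketched in Python as follows. This is a minimal illustration, assuming a tag is a boolean on an individual version; the `Version` and `VersionedFile` names are hypothetical, and propagation of new versions to parent directories (as in FIG. 14) is omitted.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Version:
    content: str
    tagged: bool = False


@dataclass
class VersionedFile:
    """Versions of one file, oldest first; the last entry is current."""
    versions: List[Version] = field(default_factory=list)

    def tag(self) -> None:
        # Mark the current version as one that must be retained.
        self.versions[-1].tagged = True

    def update(self, content: str) -> None:
        if self.versions and not self.versions[-1].tagged:
            # Untagged current version: the update simply overwrites it.
            self.versions[-1] = Version(content)
        else:
            # Tagged (or no) current version: keep it and add a new version.
            self.versions.append(Version(content))


code_1320 = VersionedFile([Version("v1")])
code_1320.update("v2")   # v1 is untagged, so it is overwritten
code_1320.tag()          # v2 must be retained
code_1320.update("v3")   # v2 is kept and v3 becomes current
```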
[0193]
In many cases, tags will be applied to all files in a project at the same time. For example, if a particular version of a software program is released, all source code used to create the released version of the program may be tagged at that time. This allows the exact same set of source code associated with the released version to be available for later reference, regardless of subsequent revisions to the source code file.
[0194]
In embodiments where tags are always applied to a project as a whole, a single tag may be maintained for the root project directory. When a file is located through a tagged version of the root project directory, any change to that file creates a new version of the file while the original version is retained. Conversely, when a file is located through an untagged version of the root project directory, any change to that file simply overwrites the previous version of the file.
[0195]
According to another embodiment, applying a tag to a file effectively applies the tag to all files below that file in the file hierarchy. For example, assume that a tag is applied to LA code 1312. If code 1318 is moved out of LA code 1312, a new version of LA code 1312 is created. If code 1318 is updated, new versions of both code 1318 and LA code 1312 are created. In such an embodiment, if a file is located by traversing the file hierarchy through a tagged file, any change to that file creates a new version of the file. If a file is located without traversing any tagged file in the hierarchy, any change to that file overwrites its previous version.
[0196]
Purge count
Another approach to reducing version proliferation, which can be used instead of or in addition to tagging, involves maintaining a purge count. The purge count indicates the maximum number of versions that will be kept for any given file. If a new version is created for a file whose number of versions has already reached the purge count, the new version overwrites the oldest retained version of the file. The purge count may be implemented on a per-file-system, per-project, or per-file basis. On a per-file-system basis, a single purge count applies to all files maintained in the file system. On a per-project basis, all files in a given project have the same purge count, but different projects may have different purge counts. On a per-file basis, a different purge count can be specified for each file.
[0197]
When used in combination with tagging, the purge count mechanism can be implemented in various ways. According to one embodiment, tagged versions are ignored when determining whether creating a new version of a file will exceed the purge count, and tagged versions are never deleted. For example, assume that the purge count for a file is 5, that five versions of the file currently exist, and that one of these five versions is tagged. When the file is updated, the purge count mechanism determines that only four untagged versions of the file currently exist, and therefore creates a new version of the file without deleting any existing version. If the same file is updated again, the mechanism determines that five untagged versions exist, and therefore deletes the oldest untagged version of the file when it creates the new version.
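The embodiment above, in which tagged versions neither count toward the limit nor get deleted, can be sketched in Python and checked against the worked example. This is a minimal illustration; the `Version` tuple, the `add_version` helper, and the choice of which version is tagged are assumptions, since the text does not specify them.

```python
from typing import List, NamedTuple


class Version(NamedTuple):
    content: str
    tagged: bool = False


def add_version(versions: List[Version], content: str, purge_count: int) -> None:
    """Append a new version, honoring the purge count (versions oldest first).

    Tagged versions are never deleted and are not counted toward the limit.
    """
    untagged = [i for i, v in enumerate(versions) if not v.tagged]
    if len(untagged) >= purge_count:
        del versions[untagged[0]]  # drop the oldest untagged version
    versions.append(Version(content))


# The example in the text: purge count 5, five versions, one of them tagged.
history = [Version("v1"), Version("v2", tagged=True), Version("v3"),
           Version("v4"), Version("v5")]
add_version(history, "v6", purge_count=5)  # only 4 untagged: nothing deleted
after_first = [v.content for v in history]
add_version(history, "v7", purge_count=5)  # 5 untagged: v1 is deleted
after_second = [v.content for v in history]
```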
[0198]
Link between projects
Each link has a source file (the file from which the link originates) and a target file (the file to which the link points). In a file hierarchy, the source file of a link is often a directory, and the target file is a file in that directory. However, not all links are between a directory and its children. For example, an HTML file may include links to graphic images and hyperlinks to other HTML files. In a file system implemented using a hierarchical index, these hyperlinks can be handled in the same manner as directory-to-document links.
[0199]
A view of the file system shows how each project in the file system existed at a particular point in time. However, the point in time associated with one project in a view may differ from the point in time associated with another project in the same view. This creates a problem when the source file of a link belongs to a different project than its target file. For example, assume that a view identifies a time T1 for project P1, which includes file F1, and a later time T2 for project P2, which includes file F2. Further assume that file F2 has a link to file F1. The link included in the T2 version of F2 leads to the T2 version of P1, not the T1 version of P1. However, since the view identifies T1 for P1, the T1 version of P1 should be used for any operation performed on any file in P1 through that view.
[0200]
According to one embodiment of the invention, an “inter-project boundary” flag is maintained for each link. A link's inter-project boundary flag indicates whether the source and target files of the link are in the same project. In a file system that uses a hierarchical index such as hierarchical index 510, the inter-project boundary flag may be stored, for example, in each array entry of the Dir_entry_list of an index entry.
[0201]
During a traversal of the file hierarchy, the inter-project boundary flag of each link is checked before the link is followed. When a link's inter-project boundary flag is set, the required version time of the project to which the source file belongs is compared with the required version time of the project to which the target file belongs. If the required version times are the same, the link is traversed. If they differ, a search is performed for the version of the target file that corresponds to the required version time of the target file's project.
[0202]
For example, the inter-project boundary flag of the link from F2 to F1 is set. Therefore, the required version time of P2 is compared with the required version time of P1. The required version time of P2 is T2, which differs from T1, the required version time of P1. Consequently, the required version of P1 cannot be located by following the link. Instead, a search is performed to locate the version of P1 that corresponds to time T1.
[0203]
According to an alternative embodiment, no inter-project boundary flag is maintained. Instead, each time a link is encountered, the required version time of the source file is compared with the required version time of the target file. If the source and target files are in the same project, or are in different projects with the same required version time, the link is followed. Otherwise, a search is performed for the correct version of the target file.
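Both embodiments reduce to one decision per link, sketched below in Python. The `Link` fields, the `follow_link` helper, the string version times, and the caller-supplied `search` function are hypothetical. With `use_flag=True` the flag short-circuits same-project links; with `use_flag=False` the required times are always compared, which gives the same result because files in the same project share a required time.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Target = Tuple[str, str]  # (file name, version time)


@dataclass(frozen=True)
class Link:
    source_project: str
    target_project: str
    target_file: str
    target_time: str      # version time of the target this link points to
    inter_project: bool   # the inter-project boundary flag


def follow_link(link: Link, required_time: Dict[str, str],
                search: Callable[[str, str], Target],
                use_flag: bool = True) -> Target:
    """Return the (file, version time) that the view should use for the target."""
    if use_flag and not link.inter_project:
        return link.target_file, link.target_time  # same project: just follow
    source_t = required_time[link.source_project]
    target_t = required_time[link.target_project]
    if source_t == target_t:
        return link.target_file, link.target_time
    # Times differ: look up the target version matching its project's time.
    return search(link.target_file, target_t)


searches: List[Target] = []


def search_version(file_name: str, time: str) -> Target:
    """Stand-in for the version search; it only records what was requested."""
    searches.append((file_name, time))
    return file_name, time


# The example in the text: the view requires T1 for P1 and T2 for P2.
view = {"P1": "T1", "P2": "T2"}
f2_to_f1 = Link("P2", "P1", "F1", "T2", inter_project=True)
result = follow_link(f2_to_f1, view, search_version)
```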
[0204]
Object-oriented file system
In recent years, object-oriented programming has become a standard programming paradigm. In object-oriented programming, the world is modeled in terms of objects. An object is a record combined with the procedures and functions that manipulate the record. All objects in an object class have the same fields (“attributes”) and are manipulated by the same procedures and functions (“methods”). An object is said to be an “instance” of the object class to which it belongs.
[0205]
Occasionally, an application may require the use of object classes that are similar but not identical. For example, the object classes used to model dolphins and dogs may both include nose, mouth, length, and age attributes. However, the dog object class may require a hair color attribute, while the dolphin object class may require a fin size attribute.
[0206]
Object-oriented programming supports “inheritance” to facilitate programming in situations where an application requires multiple similar object classes. Without inheritance, the programmer would have to write one set of code for the dog object class and a second set of code for the dolphin object class. Code that implements the attributes and methods common to both object classes would appear redundantly in both. Duplicating code in this manner is very inefficient, especially when the number of common attributes and methods is much greater than the number of unique ones. Furthermore, duplicating code between object classes complicates revision, because a change made to a common attribute must be replicated at multiple locations in the code to maintain consistency among all object classes that have that attribute.
[0207]
Inheritance makes it possible to establish a hierarchy among object classes. The attributes and methods of a given object class automatically become attributes and methods of the object classes that are based on it in the hierarchy. For example, an “animal” object class may be defined as having nose, mouth, length, and age attributes along with associated methods. To add these attributes and methods to the dolphin and dog object classes, the programmer can specify that the dolphin and dog object classes “inherit” from the animal object class. In that case, the dolphin and dog object classes are said to be “subclasses” of the animal object class, and the animal object class is said to be the “parent” class of the dog and dolphin object classes.
[0208]
According to one aspect of the present invention, a mechanism is provided for applying object-oriented principles, including inheritance, to a file system. Specifically, each file in the file system belongs to a class. A file class defines, among other things, the type of information that the file system stores about files of that class. According to one embodiment, a base class is provided. Users of the file system may then register other classes, each of which is defined as a subclass of the base class or of any previously registered class.
[0209]
When a new file class is registered with the file system, the file system is effectively extended to support a new type of file and new types of interaction with the file system. For example, most email applications expect an email document to have a “priority” property. If the file system does not provide storage for the priority property, an email application may not work correctly with email documents stored in that file system. Similarly, certain operating systems may expect certain types of system information to be stored with a file. If the file system does not store that information, the operating system can encounter problems. By registering a class that includes all the attributes required to support a particular type of system or protocol (e.g., a particular operating system, FTP, HTTP, IMAP4, etc.), accurate and transparent interaction with that system or protocol becomes possible.
[0210]
To register a class, information about the class is provided. This information identifies the parent class of the class and describes the attributes that the class has but its parent class does not. The information may also specify particular ways of operating on instances of the class.
[0211]
An object-oriented file system that allows users to register file classes, supports inheritance between file classes, and stores information about each file based on the class to which the file belongs can be implemented in various ways. According to one embodiment, the object-oriented file system is provided in the context of a database-implemented file system as described above. Various aspects of the object-oriented file system are described below in connection with a database-implemented embodiment, but the object-oriented file system techniques described here are not limited to such an embodiment.
[0212]
Database implementation of object-oriented file system
According to one embodiment, the database-implemented file system is provided with a base class, and subclasses of the base class can be registered in the file system. FIG. 16 shows an exemplary set of file classes. The base class, named “Files”, contains attributes common to all files, including name, creation date, and modification date. Similarly, the methods of the Files class include methods for operations that can be performed on all files.
[0213]
According to one embodiment, the attributes of the Files class are the union of all attributes maintained by the operating systems with which the database-implemented file system will be used. For example, assume that the file system is implemented in a database maintained by server 204 as shown in FIG. The files stored in that file system originate from operating system 304a and operating system 304b, but these operating systems do not necessarily support the same set of file attributes. Therefore, the attribute set of the Files class of the file system implemented by database server 204 is the union of the attribute sets supported by the two operating systems 304a and 304b.
[0214]
According to an alternative embodiment, the attributes of the Files class are the intersection of all attributes maintained by the operating systems with which the database-implemented file system is used. In such an embodiment, a subclass of the Files class can be registered for each operating system. The subclass registered for a given operating system extends the base Files class by adding all of the attributes supported by that operating system that are not already included in the base Files class.
[0215]
In the example illustrated in FIG. 16, two subclasses of Files, a “Document” class and a “Folder” class, are registered. The Document class inherits all of the attributes and methods of the Files class and adds attributes specific to document files. In the illustrated embodiment, the Document class adds an attribute “size”.
[0216]
The Folder class inherits all of the attributes and methods of the Files class and adds attributes and methods that are specific to folder files (ie, files such as directories that can contain other files). In the illustrated embodiment, the Folder class introduces a new attribute “max_children” and a new method “dir_list”. The max_children attribute may, for example, indicate the maximum number of child files that can be included in a given folder. The dir_list method may, for example, provide a list of all child files of a given folder.
[0217]
In the class hierarchy illustrated in FIG. 16, the Document class has two registered subclasses, E-mail and Text. Both of these subclasses inherit all of the attributes and methods of the Document class. In addition, the E-mail class includes three additional attributes: Read_flag, priority, and sender. The Text class has one additional attribute, CR_Flag, and one additional method, Type. CR_Flag may be a flag indicating whether the text document includes “carriage return” symbols. The Type method outputs a text document to an input/output device such as a computer monitor.
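The attribute inheritance of FIG. 16 can be sketched as a small registry in Python. This is a minimal illustration under stated assumptions: the `FILE_CLASSES` dictionary and the helper names are hypothetical, only the attributes named in the text are listed, and methods (such as dir_list and Type) are left out.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical registry mirroring FIG. 16: class -> (parent, attributes it adds).
FILE_CLASSES: Dict[str, Tuple[Optional[str], List[str]]] = {
    "Files":    (None, ["name", "creation_date", "modification_date"]),
    "Document": ("Files", ["size"]),
    "Folder":   ("Files", ["max_children"]),
    "E-mail":   ("Document", ["read_flag", "priority", "sender"]),
    "Text":     ("Document", ["cr_flag"]),
}


def lineage(cls: str) -> List[str]:
    """Classes from the base class down to `cls`."""
    chain: List[str] = []
    current: Optional[str] = cls
    while current is not None:
        chain.append(current)
        current = FILE_CLASSES[current][0]
    return chain[::-1]


def all_attributes(cls: str) -> List[str]:
    """Inherited attributes first, then those the class itself introduces."""
    return [attr for c in lineage(cls) for attr in FILE_CLASSES[c][1]]
```

For example, `all_attributes("E-mail")` combines the Files attributes, the Document size attribute, and the three E-mail attributes.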
[0218]
File class and file format
The internal structure of a file is called the “format” of the file. Typically, the file format is determined by the application that creates the file. For example, a document created by one word processor may have the same semantic content as another document created by another word processor, but may have a completely different format. In some file systems, a mapping is maintained between the document format and the file name extension. For example, all files with file names ending in .doc are presumed to be files created by a particular word processor, and thus presumed to have an internal structure imposed by that word processor. In other file systems, information about the format of a document is maintained in a separate metafile associated with the document.
[0219]
In contrast to a file format, the file class mechanism described here is not related to the internal structure of a document. Rather, the class of a file determines what information the file system maintains for that file and what operations the file system can perform on it. For example, documents created by many different word processors can all be instances of the Document class. The file system therefore maintains the same attribute information for these documents and allows the same operations to be performed on them, even if their internal structures are completely different.
[0220]
Class table
According to one embodiment, the object-oriented file system is implemented in a relational database system in which a relational table is created for each file class. FIG. 17 shows an example of the tables that can be created for the classes illustrated in FIG. 16. Specifically, the Files table 1702, Document table 1704, E-mail table 1706, Text table 1708, and Folder table 1708 correspond respectively to the Files, Document, E-mail, Text, and Folder classes.
[0221]
According to one embodiment, the class table for a given class contains a row for (1) each file belonging to that class and (2) each file belonging to any descendant class of that class. For example, in the illustrated system, the Files class is the base class, so every file in the file system is a member of the Files class or of one of its descendant classes. The Files table therefore contains a row for every file in the file system. On the other hand, the E-mail and Text classes are descendants of the Document class, but the Files and Folder classes are not. Thus, the Document table 1704 includes rows for all files of class Document, E-mail, or Text, but does not include rows for files of class Files or Folder.
[0222]
The table for each class includes a column that stores values for attributes introduced by that class. For example, the Document class inherits the attributes of the Files class and adds a size attribute to these attributes. Therefore, the Document table includes a column for storing a size value for the size attribute. Similarly, the E-mail class inherits the attributes of the Document class and introduces read_flag, priority, and sender attributes. Therefore, the E-mail table 1706 includes columns for storing a read_flag value, a priority value, and a sender value.
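The row layout described above, one row per class in a file's lineage with each row holding only the columns that class introduces, can be sketched in Python. The class definitions, the `class_table_rows` helper, and the sample attribute values are hypothetical.

```python
from typing import Any, Dict, List, Optional

# Hypothetical class definitions after FIG. 16 (attributes named in the text only).
PARENT: Dict[str, Optional[str]] = {
    "Files": None, "Document": "Files", "Folder": "Files",
    "E-mail": "Document", "Text": "Document"}
ADDED: Dict[str, List[str]] = {
    "Files": ["name", "creation_date", "modification_date"],
    "Document": ["size"], "Folder": ["max_children"],
    "E-mail": ["read_flag", "priority", "sender"], "Text": ["cr_flag"]}


def lineage(cls: str) -> List[str]:
    """Classes from the base class down to `cls`."""
    chain: List[str] = []
    current: Optional[str] = cls
    while current is not None:
        chain.append(current)
        current = PARENT[current]
    return chain[::-1]


def class_table_rows(file_id: str, cls: str,
                     values: Dict[str, Any]) -> Dict[str, Dict[str, Any]]:
    """Split one file's attribute values into one row per class table.

    A file gets a row in the table of its own class and of every ancestor
    class; each row holds only the columns that class introduces.
    """
    rows: Dict[str, Dict[str, Any]] = {}
    for c in lineage(cls):
        row: Dict[str, Any] = {"FileID": file_id}
        row.update({attr: values[attr] for attr in ADDED[c]})
        rows[c] = row
    rows["Files"]["Class"] = cls  # the Files table records each file's class
    return rows


# File3 (class E-mail) with hypothetical attribute values.
file3 = class_table_rows("F3", "E-mail", {
    "name": "File3", "creation_date": "2000-01-01",
    "modification_date": "2000-01-02", "size": 2048,
    "read_flag": False, "priority": "high", "sender": "alice"})
```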
[0223]
Five files are stored in the file system shown in FIG. 17. The file named File1 is stored at RowID X1 in the Files table 1702. File1 has FileID F1. The class of File1 is the Files class, as indicated by the value stored in the Class column of row X1. Since File1 is an instance of the Files class, the Files table 1702 is the only class table that contains information for File1. Thus, the only attribute values stored for File1 are values for the attributes associated with the Files class.
[0224]
A file named File2 is stored at RowID X2 in the Files table 1702. The FileID of File2 is F2. The class of File2 is the Document class, as indicated by the value stored in the Class column of row X2. Since File2 is an instance of the Document class, the Files table 1702 and the Document table 1704 include information on File2. That is, the attribute values stored for File2 are values for the attributes associated with the Document class, including the attributes inherited from the Files class.
[0225]
A file named File3 is stored at RowID X3 in the Files table 1702. File3 has FileID F3. The class of File3 is the E-mail class, as indicated by the value stored in the Class column of row X3. Since File3 is an instance of the E-mail class, the Files table 1702, the Document table 1704, and the E-mail table 1706 all include information on File3. That is, the attribute values stored for File3 are values for the attributes associated with the E-mail class, including the attributes inherited from the Document class and the Files class.
[0226]
A file named File4 is stored at RowID X4 in the Files table 1702. File4 has FileID F4. The class of File4 is the Text class, as indicated by the value stored in the Class column of row X4. Since File4 is an instance of the Text class, the Files table 1702, the Document table 1704, and the Text table 1708 include information for File4. That is, the attribute values stored for File4 are values for the attributes associated with the Text class, including the attributes inherited from the Document class and the Files class.
[0227]
A file named File5 is stored at RowID X5 in the Files table 1702. File5 has FileID F5. The class of File5 is the Folder class, as indicated by the value stored in the Class column of row X5. Since File5 is an instance of the Folder class, the Files table 1702 and the Folder table 1708 include information for File5. That is, the attribute values stored for File5 are values for the attributes associated with the Folder class, including the attributes inherited from the Files class.
[0228]
According to one embodiment of the invention, files in the class tables are accessed by traversing the hierarchical index as described above in connection with FIGS. Traversing the hierarchical index (as is done in pathname resolution) produces the RowID of the row in the Files table 1702 that corresponds to the target file. The values of the Files class attributes are retrieved from that row. However, for files belonging to other classes, additional attribute values may have to be retrieved from other class tables. For example, for File3, the creation date and modification date can be retrieved from row X3 of the Files table 1702. However, retrieving the size of File3 requires accessing row Y2 of the Document table 1704, and retrieving the priority of File3 requires accessing row Q1 of the E-mail table 1706.
[0229]
To facilitate retrieval of the various attribute values belonging to a file, the rows containing these values are linked together. In the illustrated embodiment, the links are stored in the column labeled “Derived RowID”. The value stored in the Derived RowID column of a particular file's row in the table for a particular class points to that file's row in the table for the corresponding subclass. For example, the Derived RowID field of row X3 of the Files table, which is the row for File3, contains the value Y2, the RowID of File3's row in the Document table 1704. Similarly, the Derived RowID column of Document row Y2 contains the value Q1, the RowID of File3's row in the E-mail table 1706.
[0230]
In the illustrated embodiment, the links between the rows for a particular file are one-way, going from a row in the parent class table to a row in the subclass table. These one-way links facilitate searches that start from a row in the base table (i.e., the Files table), which is the case under most conditions. However, if a search starts from a row in another table, the related rows in the parent class tables cannot be located by following the links. To find these related rows, the tables may be searched based on the FileID of the file of interest.
[0231]
For example, suppose a user, starting from row Y2 of the Document table 1704, wants to retrieve all other attribute values for File3. The row containing the E-mail-specific attribute values can be found by following the pointer in the Derived RowID column of row Y2, which points to row Q1 in the E-mail table 1706. However, to retrieve the remaining attribute values, the Files table 1702 is searched based on FileID F3. Such a search finds row X3, which contains the remaining attribute values of File3.
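The retrieval just described (follow Derived RowID links downward, search ancestor tables by FileID upward) can be sketched in Python with in-memory tables. The RowIDs and FileIDs follow those given in the text for File3 and File4; the attribute values, the `derived` column name, and the helper functions are hypothetical.

```python
from typing import Any, Dict, List, Optional

PARENT: Dict[str, Optional[str]] = {
    "Files": None, "Document": "Files", "Folder": "Files",
    "E-mail": "Document", "Text": "Document"}

# Class tables keyed by RowID; "derived" plays the role of the Derived RowID column.
TABLES: Dict[str, Dict[str, Dict[str, Any]]] = {
    "Files": {
        "X3": {"FileID": "F3", "Class": "E-mail", "name": "File3", "derived": "Y2"},
        "X4": {"FileID": "F4", "Class": "Text", "name": "File4", "derived": "Y3"},
    },
    "Document": {
        "Y2": {"FileID": "F3", "size": 2048, "derived": "Q1"},
        "Y3": {"FileID": "F4", "size": 512, "derived": "R1"},
    },
    "E-mail": {"Q1": {"FileID": "F3", "priority": "high", "derived": None}},
    "Text": {"R1": {"FileID": "F4", "cr_flag": True, "derived": None}},
}


def lineage(cls: str) -> List[str]:
    """Classes from the base class down to `cls`."""
    chain: List[str] = []
    current: Optional[str] = cls
    while current is not None:
        chain.append(current)
        current = PARENT[current]
    return chain[::-1]


def _values(row: Dict[str, Any]) -> Dict[str, Any]:
    return {k: v for k, v in row.items() if k != "derived"}


def file_attributes(table: str, rowid: str) -> Dict[str, Any]:
    """Collect every attribute value of the file whose row is (table, rowid)."""
    start = TABLES[table][rowid]
    attrs = _values(start)
    # Upward: links are one-way, so ancestor tables are searched by FileID.
    parent = PARENT[table]
    while parent is not None:
        attrs.update(next(_values(r) for r in TABLES[parent].values()
                          if r["FileID"] == start["FileID"]))
        parent = PARENT[parent]
    # Downward: follow Derived RowID links through the subclass tables.
    chain = lineage(attrs["Class"])
    next_rowid = start["derived"]
    for sub in chain[chain.index(table) + 1:]:
        row = TABLES[sub][next_rowid]
        attrs.update(_values(row))
        next_rowid = row["derived"]
    return attrs
```

Starting from row X3 needs no FileID search; starting from row Y2 needs one search of the Files table, and both yield the same attribute set.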
[0232]
According to an alternative embodiment, the links between related rows may be implemented so that all related rows can be located without a FileID lookup. For example, each class table may also have a Parent RowID column that contains the RowID of the related row in the parent class table. In that case, the Parent RowID column of row Y2 in the Document table 1704 points to row X3 in the Files table 1702. Alternatively, the last row in the chain of one-way links may contain a pointer back to the related row in the Files table. Yet another option is to provide each class table with a column that contains a pointer back to the related row in the Files table. In that case, row R1 of the Text table 1708 and row Y3 of the Document table 1704 both include a pointer back to row X4 of the Files table 1702.
[0233]
Subclass registration
As mentioned above, a mechanism is provided for extending the class hierarchy of the file system by registering new classes. In general, the information provided in the class registration process includes data identifying the parent class of the new class and data describing the attributes added by the new class. Optionally, the information may also identify new methods that can be performed on instances of the new class.
[0234]
Registration information may be provided to the file system using any of a number of techniques. For example, the user may be presented with a graphical user interface that includes icons representing all of the registered classes, and the user may manipulate the controls of the user interface to (1) select one of the registered classes as the parent of the new class, (2) name the new class, (3) define additional attributes for the new class, and (4) define new methods that can be performed on instances of the new class. Alternatively, the user may give the file system a file containing the registration information for the new class. The file system parses the file to identify and extract the information, and creates a class file for the new class based on that information.
[0235]
According to one embodiment of the present invention, class registration information is provided to the file system in the form of an Extensible Markup Language (XML) file. The XML format is described in detail at www.oasis-open.org/cover/xml.htm1#contents and at the sites listed there. In general, XML uses tags that name fields and mark the beginning and end of each field and of its value. For example, an XML document containing registration information for the “Folder” file class may contain the following information:
[0236]

In response to receiving this file class registration document, the file system creates a table for the new class Folder. The table created in this way includes a column for each of the attributes defined in the registration information. In this example, only the max_children attribute is defined, and its specified data type is “integer”. Therefore, the Folder table is created with a max_children column that holds integer values. In addition to the name and type of each attribute, various other information may be provided. For example, the registration information may indicate a range or maximum length for the attribute value, whether the column should be indexed, or whether the column should be subject to uniqueness or referential constraints.
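Turning a registration document into a table definition can be sketched with Python's standard `xml.etree.ElementTree`. This is an illustration under stated assumptions: the element names below are hypothetical and are not taken from the registration document referred to above, and the type mapping, the FileID and Derived_RowID columns, and the DDL shape are illustrative choices.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Tuple

# Hypothetical mapping from registration data types to SQL column types.
SQL_TYPES = {"integer": "INTEGER", "string": "VARCHAR(4000)", "date": "DATE"}

# Hypothetical registration document; the element names are illustrative only.
FOLDER_REGISTRATION = """
<ClassRegistration>
  <name>Folder</name>
  <parent>Files</parent>
  <dbi_classname>my_folder_methods</dbi_classname>
  <attributes>
    <attribute><name>max_children</name><type>integer</type></attribute>
  </attributes>
</ClassRegistration>
"""


def parse_registration(xml_text: str) -> Tuple[Optional[str], Optional[str],
                                               Optional[str], str]:
    """Return (class name, parent class, method class, CREATE TABLE statement)."""
    root = ET.fromstring(xml_text.strip())
    cls = root.findtext("name")  # direct child only, not the attribute names
    # Every class table is assumed to carry FileID and a Derived RowID link column.
    columns = ["FileID VARCHAR(64) NOT NULL", "Derived_RowID VARCHAR(64)"]
    for attr in root.iter("attribute"):
        columns.append(f"{attr.findtext('name')} {SQL_TYPES[attr.findtext('type')]}")
    # Quote the table name so class names such as E-mail remain valid identifiers.
    ddl = f'CREATE TABLE "{cls}" ({", ".join(columns)})'
    return cls, root.findtext("parent"), root.findtext("dbi_classname"), ddl


cls, parent, method_class, ddl = parse_registration(FOLDER_REGISTRATION)
```

Further per-attribute options mentioned above (ranges, indexing, constraints) could be added as extra child elements and mapped to column clauses in the same loop.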
[0237]
The registration information also includes information about any methods supported by files of the new class. According to one embodiment, new methods are identified by identifying the files that contain the routines associated with those methods. According to one embodiment, the routines associated with each file class are implemented in a JAVA® class. If a first file class is a subclass of a second file class, the JAVA® class that implements the methods associated with the first file class is a subclass of the JAVA® class that implements the methods of the second file class.
[0238]
In the XML example given above, the dbi_classname field of the registration information specifies the JAVA® class file for the Folder file class. Specifically, the registration information supplies the file name “my_folder_methods” in the dbi_classname field, indicating that the my_folder_methods JAVA® class implements the routines for the non-inherited methods of the Folder class. Since the Folder class is a subclass of the Files class, the my_folder_methods class is a subclass of the JAVA® class that implements the methods of the Files class. The my_folder_methods class therefore inherits the Files methods.
[0239]
In addition to defining new methods that are not supported by the parent file class, the routines for a child file class can override the implementations of methods defined in the parent class. For example, the Files class shown in FIG. 16 provides a “store” method, which the Folder class inherits. However, the implementation of the store method provided for the Files class may not be the implementation required to store folders. The Folder class may therefore provide its own implementation of the store method, thereby overriding the implementation provided by the Files class.
[0240]
Determining file class
When the file system is asked to perform an operation on a file, it calls the routine that implements the requested operation for the particular class to which the file belongs. As mentioned above, the same operation may be implemented differently for different file classes, for example when a subclass overrides the implementation provided by its parent class. Therefore, to ensure that the correct operation is performed, the file system must first identify the class of the file on which the operation is to be performed.
[0241]
For files that are already stored in the file system, identifying the class of a file may be trivial. For example, in the example shown in FIG. 17, the Files table 1702 includes a Class column that stores, for each row, data indicating the class of the file associated with that row. Therefore, when a request to perform a “move” operation on File 3 is received, the Class field of row X3 is examined to determine that File 3 is of the E-mail type. Consequently, the E-mail implementation of the “move” operation is invoked. If the E-mail file class overrides the inherited implementation of the “move” method, this is the implementation provided for the E-mail file class; otherwise, it is the implementation that the E-mail class inherits.
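The lookup can be sketched as follows, assuming in-memory maps that stand in for the Class column of the Files table 1702 and for the registered class hierarchy. The FileDispatcher class, its methods, and the two-level Files/E-mail hierarchy are hypothetical simplifications of FIG. 16 and FIG. 17. The resolver walks up the hierarchy until it reaches a class that provides the requested method, which yields the overriding implementation when one exists and the inherited one otherwise.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

class DispatchDemo {
    public static void main(String[] args) {
        FileDispatcher fs = new FileDispatcher();
        fs.addClass("Files", null);       // assumed root class
        fs.addClass("E-mail", "Files");   // simplified hierarchy
        fs.register("Files", "move", f -> "generic move of " + f);
        fs.addFile("File 3", "E-mail");
        // E-mail does not override "move", so the inherited Files routine runs.
        System.out.println(fs.invoke("File 3", "move")); // generic move of File 3
        // Once an override is registered, the E-mail routine is chosen instead.
        fs.register("E-mail", "move", f -> "e-mail move of " + f);
        System.out.println(fs.invoke("File 3", "move")); // e-mail move of File 3
    }
}

// Hypothetical dispatcher; the maps stand in for database tables.
class FileDispatcher {
    // Files table stand-in: file name -> value of the Class column.
    private final Map<String, String> classColumn = new HashMap<>();
    // Class hierarchy: file class -> parent file class (null for the root).
    private final Map<String, String> parentOf = new HashMap<>();
    // Registered routines: file class -> (method name -> routine).
    private final Map<String, Map<String, Function<String, String>>> impls = new HashMap<>();

    void addFile(String file, String fileClass) { classColumn.put(file, fileClass); }

    void addClass(String fileClass, String parent) { parentOf.put(fileClass, parent); }

    void register(String fileClass, String method, Function<String, String> routine) {
        impls.computeIfAbsent(fileClass, k -> new HashMap<>()).put(method, routine);
    }

    // Read the file's Class value, then walk up the hierarchy to the first
    // class that provides (or overrides) the requested method.
    String invoke(String file, String method) {
        String cls = classColumn.get(file);
        if (cls == null) {
            throw new IllegalArgumentException("file not stored: " + file);
        }
        for (String c = cls; c != null; c = parentOf.get(c)) {
            Map<String, Function<String, String>> methods = impls.get(c);
            if (methods != null && methods.containsKey(method)) {
                return methods.get(method).apply(file);
            }
        }
        throw new UnsupportedOperationException(method + " not defined for " + cls);
    }
}
```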
[0242]
Identifying the class of a file can be more difficult if the file is not already stored in the file system. For example, when the file system is asked to store a file that does not yet exist in the file system, the file system cannot determine the class by examining the Files table. Under such conditions, various techniques may be used to identify the type of the file. According to one embodiment, the file type can be provided explicitly in the file operation request. For example, if the request is made in response to a command issued through the operating system command line, one of the command-line arguments may be used to indicate the file type of the file. For example, the command may be entered as “move a:\mydocs\file2 c:\yourdocs /class=document”.
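A sketch of extracting an explicitly requested class from the command line, assuming the /class= syntax of the example above and a case-insensitive match on the option name. The ExplicitClassOption class and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical parser for a command of the form
//   move <source> <destination> /class=<file class>
class ExplicitClassOption {
    static final String CLASS_OPTION = "/class=";

    // True if the argument starts with the class option, ignoring case.
    private static boolean isClassOption(String arg) {
        return arg.regionMatches(true, 0, CLASS_OPTION, 0, CLASS_OPTION.length());
    }

    // Returns the explicitly requested file class, if the command supplies one.
    static Optional<String> explicitClass(String[] args) {
        for (String arg : args) {
            if (isClassOption(arg)) {
                return Optional.of(arg.substring(CLASS_OPTION.length()));
            }
        }
        return Optional.empty();
    }

    // Returns the remaining arguments (command name and file paths).
    static List<String> operands(String[] args) {
        List<String> result = new ArrayList<>();
        for (String arg : args) {
            if (!isClassOption(arg)) {
                result.add(arg);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String[] cmd = {"move", "a:\\mydocs\\file2", "c:\\yourdocs", "/class=document"};
        System.out.println(explicitClass(cmd).orElse("(none)")); // document
        System.out.println(operands(cmd)); // [move, a:\mydocs\file2, c:\yourdocs]
    }
}
```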
[0243]
Another technique for determining the class of a file is to determine the class based on information contained in the file's name. For example, all files having certain extensions (for example, .doc, .wpd, or .pwp) may be treated as members of a specific file class (for example, Document). Thus, when the file system is asked to perform operations on these files, the method implementations associated with that file class are used.
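Extension-based classification can be sketched as a lookup table. The mapping of .doc, .wpd, and .pwp to Document follows the example in the text; the ExtensionClassifier name, the case-insensitive comparison, and returning an empty Optional when no class applies are assumptions.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

// Hypothetical extension-based classifier.
class ExtensionClassifier {
    private static final Map<String, String> CLASS_BY_EXTENSION = Map.of(
            "doc", "Document",
            "wpd", "Document",
            "pwp", "Document");

    // Returns the file class implied by the name's extension, if any.
    static Optional<String> classFor(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return Optional.empty(); // no extension to inspect
        }
        String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return Optional.ofNullable(CLASS_BY_EXTENSION.get(ext));
    }

    public static void main(String[] args) {
        System.out.println(classFor("report.WPD")); // Optional[Document]
        System.out.println(classFor("notes.txt"));  // Optional.empty
    }
}
```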
[0244]
Yet another technique for determining the class of a file is to determine the class based on the location of the file in the file system hierarchy. For example, all files created within a particular directory or set of directories may be presumed to belong to a particular file class, regardless of how the files are named. These and other approaches may be combined in various ways. For example, a file with a certain extension may be treated as a member of a first class unless the file is stored in a directory associated with a second class. If the file is stored in the directory associated with the second class, the file is treated as a member of the second class unless the file operation request explicitly identifies the file as a member of another file class.
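One possible combination of the three techniques is sketched below, using the precedence in the example above: an explicit class in the request first, then the class associated with the file's directory, then the class implied by its extension. The CombinedClassResolver name, the "/" path separator, matching only the immediate parent directory, and the default class Files are assumptions made for illustration.

```java
import java.util.Locale;
import java.util.Map;

class ResolverDemo {
    public static void main(String[] args) {
        CombinedClassResolver r = new CombinedClassResolver(
                Map.of("/mail", "E-mail"), Map.of("doc", "Document"), "Files");
        System.out.println(r.resolve("/home/a.doc", null));       // Document
        System.out.println(r.resolve("/mail/a.doc", null));       // E-mail
        System.out.println(r.resolve("/mail/a.doc", "Document")); // Document
    }
}

// Hypothetical resolver combining explicit, directory, and extension rules.
class CombinedClassResolver {
    private final Map<String, String> classByDirectory;
    private final Map<String, String> classByExtension;
    private final String defaultClass;

    CombinedClassResolver(Map<String, String> classByDirectory,
                          Map<String, String> classByExtension,
                          String defaultClass) {
        this.classByDirectory = classByDirectory;
        this.classByExtension = classByExtension;
        this.defaultClass = defaultClass;
    }

    // explicitClass may be null when the request does not name a class.
    String resolve(String path, String explicitClass) {
        if (explicitClass != null) {
            return explicitClass;                        // 1. explicit request
        }
        int slash = path.lastIndexOf('/');
        String dir = slash <= 0 ? "/" : path.substring(0, slash);
        String byDir = classByDirectory.get(dir);
        if (byDir != null) {
            return byDir;                                // 2. directory association
        }
        String name = path.substring(slash + 1);
        int dot = name.lastIndexOf('.');
        if (dot >= 0) {
            String ext = name.substring(dot + 1).toLowerCase(Locale.ROOT);
            String byExt = classByExtension.get(ext);
            if (byExt != null) {
                return byExt;                            // 3. extension
            }
        }
        return defaultClass;                             // 4. assumed fallback
    }
}
```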
[0245]
Hardware overview
FIG. 18 is a block diagram that illustrates a computer system 1800 upon which an embodiment of the invention may be implemented. Computer system 1800 includes a bus 1802 or other communication mechanism for communicating information, and a processor 1804 coupled with bus 1802 for processing information. Computer system 1800 also includes main memory 1806, such as random access memory (RAM) or other dynamic storage device, coupled to bus 1802 for storing information and instructions to be executed by processor 1804. Main memory 1806 may also be used to store temporary variables or other intermediate information during execution of instructions to be executed by processor 1804. Computer system 1800 further includes a read only memory (ROM) 1808 or other static storage device coupled to bus 1802 for storing static information and instructions for processor 1804. A storage device 1810, such as a magnetic disk or optical disk, is provided and coupled to bus 1802 for storing information and instructions.
[0246]
Computer system 1800 may be coupled via bus 1802 to a display 1812, such as a cathode ray tube (CRT), for displaying information to a computer user. An input device 1814, including alphanumeric and other keys, is coupled to bus 1802 for communicating information and command selections to processor 1804. Another type of user input device is cursor control 1816, such as a mouse, trackball, or cursor direction keys, for communicating direction information and command selections to processor 1804 and for controlling cursor movement on display 1812. This input device typically has two degrees of freedom in two axes, a first axis (e.g., x) and a second axis (e.g., y), that allow the device to specify positions in a plane.
[0247]
The invention is related to the use of computer system 1800 for implementing the techniques described herein. According to one embodiment of the invention, those techniques are implemented by computer system 1800 in response to processor 1804 executing one or more sequences of one or more instructions contained in main memory 1806. Such instructions may be read into main memory 1806 from another computer-readable medium, such as storage device 1810. Execution of the sequences of instructions contained in main memory 1806 causes processor 1804 to perform the process steps described herein. In alternative embodiments, hard-wired circuitry may be used in place of, or in combination with, software instructions to implement the invention. Thus, embodiments of the invention are not limited to any specific combination of hardware circuitry and software.
[0248]
The term “computer-readable medium” as used herein refers to any medium that participates in providing instructions to processor 1804 for execution. Such a medium may take many forms, including but not limited to, non-volatile media, volatile media, and transmission media. Non-volatile media includes, for example, optical or magnetic disks, such as storage device 1810. Volatile media includes dynamic memory, such as main memory 1806. Transmission media includes coaxial cable, copper wire and optical fiber, including the wiring that makes up bus 1802. Transmission media may also take the form of acoustic or light waves, such as those generated in radio wave and infrared data communications.
[0249]
Common forms of computer-readable media include, for example, a floppy (R) disk, a flexible disk, a hard disk, magnetic tape or any other magnetic medium, a CD-ROM or any other optical medium, punched cards, paper tape or any other physical medium with patterns of holes, a RAM, a PROM, an EPROM, a FLASH-EPROM or any other memory chip or cartridge, a carrier wave as described hereinafter, or any other medium from which a computer can read.
[0250]
Various forms of computer-readable media may be involved in carrying one or more sequences of one or more instructions to processor 1804 for execution. For example, the instructions may initially be carried on a magnetic disk of a remote computer. The remote computer can load the instructions into its dynamic memory and send the instructions over a telephone line using a modem. A modem local to computer system 1800 can receive the data on the telephone line and use an infrared transmitter to convert the data to an infrared signal. An infrared detector can receive the data carried in the infrared signal, and appropriate circuitry can place the data on bus 1802. Bus 1802 carries the data to main memory 1806, from which processor 1804 retrieves and executes the instructions. The instructions received by main memory 1806 may optionally be stored on storage device 1810 either before or after execution by processor 1804.
[0251]
Computer system 1800 also includes a communication interface 1818 coupled to bus 1802. Communication interface 1818 provides two-way data communication coupling to a network link 1820 that is connected to a local network 1822. For example, communication interface 1818 may be an integrated services digital network (ISDN) card or a modem that provides a data communication connection to a corresponding type of telephone line. As another example, communication interface 1818 may be a local area network (LAN) card that provides a data communication connection to a compatible LAN. Wireless links may also be implemented. In any such implementation, communication interface 1818 sends and receives electrical, electromagnetic, or optical signals that carry digital data streams representing various types of information.
[0252]
Network link 1820 typically provides data communication through one or more networks to other data devices. For example, network link 1820 may provide a connection through local network 1822 to a host computer 1824 or to data equipment operated by an Internet service provider (ISP) 1826. ISP 1826 in turn provides data communication services through the worldwide packet data communication network now commonly referred to as the “Internet” 1828. Local network 1822 and Internet 1828 both use electrical, electromagnetic, or optical signals that carry digital data streams. The signals through the various networks and the signals on network link 1820 and through communication interface 1818, which carry the digital data to and from computer system 1800, are exemplary forms of carrier waves transporting the information.
[0253]
Computer system 1800 can send messages and receive data, including program code, through the network(s), network link 1820, and communication interface 1818. In the Internet example, a server 1830 might transmit requested code for an application program through Internet 1828, ISP 1826, local network 1822, and communication interface 1818. In accordance with the invention, one such downloaded application implements the techniques described herein.
[0254]
The received code may be executed by processor 1804 as it is received, and/or stored in storage device 1810 or other non-volatile storage for later execution. In this manner, computer system 1800 may obtain application code in the form of a carrier wave.
[0255]
In the foregoing specification, the invention has been described with reference to specific embodiments thereof. However, it will be apparent that various changes and modifications may be made thereto without departing from the broader spirit and scope of the invention. The specification and drawings are accordingly to be regarded in an illustrative sense and not in a limiting sense.
[Brief description of the drawings]
FIG. 1 is a block diagram illustrating how a conventional application stores data through a file system provided by an operating system.
FIG. 2 is a block diagram illustrating how a conventional database application stores data through a database API provided by a database system.
FIG. 3 is a block diagram illustrating a system that can access the same set of data through various interfaces including a database API and an OS file system API.
FIG. 4 is a block diagram showing the translation engine 308 in more detail.
FIG. 5 is a block diagram showing a hierarchical index.
FIG. 6 is a block diagram illustrating a file hierarchy that can be emulated by a hierarchical index.
FIG. 7 is a block diagram illustrating a file table that can be used to store files in a relational database, in accordance with one embodiment of the present invention.
FIG. 8 is a flowchart illustrating steps for deriving a path name using a hierarchical index.
FIG. 9 is a block diagram showing the database file server in more detail.
FIG. 10 is a block diagram of a hierarchical index that includes an entry for a stored query directory.
FIG. 11 is a block diagram of a file table that includes a row for a stored query directory.
FIG. 12 is a block diagram showing a file hierarchical structure including a stored query directory.
FIG. 13 is a block diagram showing a file hierarchical structure.
FIG. 14 is a block diagram illustrating how the file hierarchy shown in FIG. 13 is updated in response to a document update, in accordance with one embodiment of the versioning technique described herein.
FIG. 15 is a block diagram illustrating how the file hierarchy shown in FIG. 13 is updated in response to moving a document from one folder to another, in accordance with one embodiment of the versioning technique described herein.
FIG. 16 is a block diagram illustrating a class hierarchy structure of a file class according to an embodiment of the present invention.
FIG. 17 is a block diagram showing relational tables used in a database-implemented file system that implements the file class hierarchy of FIG. 16, according to one embodiment of the present invention.
FIG. 18 is a block diagram that illustrates a computer system upon which an embodiment of the invention may be implemented.

Claims

1. A method of accessing data stored in a database, the method comprising:
a protocol server receiving one or more I/O commands from one or more routines in an operating system, wherein the protocol server is configured to operate in the operating system,
wherein the one or more routines generate the one or more I/O commands in response to one or more calls from a first application to the operating system to access a file;
the protocol server converting the one or more I/O commands into one or more DB file system commands;
a DB file server, in response to the one or more DB file system commands, generating one or more first database commands and issuing them to a database server that manages the database; and
the database server executing the one or more first database commands to retrieve first data from the database, and providing said file, generated from the first data, to the first application;
the method further comprising:
providing a DB file API to a second application;
the DB file server receiving one or more second DB file system commands directly from the second application via the DB file API;
generating one or more second database commands in response to the one or more second DB file system commands and issuing them to the database server; and
the database server executing the one or more second database commands to retrieve second data from the database, and providing a file generated from the second data to the second application.

2. The method of claim 1, wherein the step of providing the file to the first application is performed by the one or more routines in the operating system.

3. The method of claim 2, further comprising the DB file server receiving, via the DB file API, a call to perform a plurality of file operations, the plurality of file operations including at least a first file operation on a first file stored in the database and a second file operation on a second file stored in the database, the method further comprising:
the DB file server performing the plurality of file operations as a single transaction by performing the following steps:
if all of the plurality of file operations complete without failure, making permanent all changes made by the plurality of file operations; and
if any of the plurality of file operations fails, invalidating all changes made by all of the plurality of file operations.

4. The method of claim 3, wherein the plurality of file operations includes a plurality of write operations on a single file stored in the database.

5. The method of claim 3, wherein performing the plurality of file operations includes issuing one or more database statements to the database server, and wherein the database server performs the plurality of file operations by executing the one or more database statements.

6. The method of claim 4, wherein the plurality of write operations correspond to transferring the single file over a network connection for storage in the database.

7. The method of any one of claims 1 to 6, wherein the protocol server functions as a device driver interface.

8. One or more computer-readable media storing instructions for accessing data stored in a database, wherein the instructions, when executed by one or more processors, cause performance of the following steps:
a protocol server receiving one or more I/O commands from one or more routines in an operating system, wherein the protocol server is configured to operate in the operating system,
wherein the one or more routines generate the one or more I/O commands in response to one or more calls from a first application to the operating system to access a file;
the protocol server converting the one or more I/O commands into one or more DB file system commands;
a DB file server, in response to the one or more DB file system commands, generating one or more first database commands and issuing them to a database server that manages the database; and
the database server executing the one or more first database commands to retrieve first data from the database, and providing said file, generated from the first data, to the first application;
the steps further comprising:
providing a DB file API to a second application;
the DB file server receiving one or more second DB file system commands directly from the second application via the DB file API;
generating one or more second database commands in response to the one or more second DB file system commands and issuing them to the database server; and
the database server executing the one or more second database commands to retrieve second data from the database, and providing a file generated from the second data to the second application.

9. The one or more computer-readable media of claim 8, wherein the step of providing the file to the first application is performed by the one or more routines in the operating system.

10. The one or more computer-readable media of claim 9, wherein the steps further comprise the DB file server receiving, via the DB file API, a call to perform a plurality of file operations, the plurality of file operations including at least a first file operation on a first file stored in the database and a second file operation on a second file stored in the database, the steps further comprising:
the DB file server performing the plurality of file operations as a single transaction by performing the following steps:
if all of the plurality of file operations complete without failure, making permanent all changes made by the plurality of file operations; and
if any of the plurality of file operations fails, invalidating all changes made by all of the plurality of file operations.

11. The one or more computer-readable media of claim 10, wherein the plurality of file operations includes a plurality of write operations on a single file stored in the database.

12. The one or more computer-readable media of claim 10, wherein performing the plurality of file operations includes issuing one or more database statements to the database server, and wherein the database server performs the plurality of file operations by executing the one or more database statements.

13. The one or more computer-readable media of claim 11, wherein the plurality of write operations correspond to transferring the single file over a network connection for storage in the database.

14. The one or more computer-readable media of any one of claims 8 to 13, wherein the protocol server functions as a device driver interface.