Leslie Carr
Intelligence, Agents, Multimedia Group
University of Southampton, UK
lac@ecs.soton.ac.uk
Timothy Miles-Board
Intelligence, Agents, Multimedia Group
University of Southampton, UK
tmb@ecs.soton.ac.uk
Arouna Woukeu
Intelligence, Agents, Multimedia Group
University of Southampton, UK
aw1@ecs.soton.ac.uk
Gary Wills
Intelligence, Agents, Multimedia Group
University of Southampton, UK
gbw@ecs.soton.ac.uk
ABSTRACT
The Web is full of documents which must be interpreted by human readers and by software agents (search engines, recommender systems, clustering processes etc.). Although Web standards have addressed format obfuscation by using XML schemas and stylesheets to specify unambiguous structure and presentation semantics, interpretation is still hampered by the fundamental ambiguity of information in #PCDATA text. Even the most easily distinguishable kinds of knowledge such as article citations and proper nouns (referring to people, organisations, projects, products, technical concepts) have to be identified by fallible, post-hoc extraction processes. The WiCK project has investigated the writing process in a Semantic Web environment where knowledge services exist and actively assist the author. In this paper we discuss the need to make knowledge an explicit part of the document representation and the advantages and disadvantages of this step.
Categories and Subject Descriptors
I.2.1 [Artificial Intelligence]: Applications and Expert Systems - office automation; I.2.4 [Artificial Intelligence]: Knowledge Representation Formalisms and Methods; I.7.2 [Document and Text Processing]: Document Preparation - markup languages
General Terms
Human Factors
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
DocEng’04, October 28–30, 2004, Milwaukee, Wisconsin, USA. Copyright 2004 ACM 1-58113-938-1/04/0010...$5.00.
Wendy Hall
Intelligence, Agents, Multimedia Group
University of Southampton, UK
wh@ecs.soton.ac.uk
Keywords
Document Structure, Semantic Web, Knowledge Writing
1. BACKGROUND
The initial development of the World Wide Web drew from an established body of research into document processing; the current development of the Semantic Web is assimilating work from the Artificial Intelligence and Knowledge Management communities.
1.1 Document Processing 101
The computer model of documentation has evolved from 80-column ASCII files, through various kinds of presentational markup (TeX, troff) to so-called structural markup (LaTeX and SGML) [2]. The aim has been to enable computers to provide as rich a presentation style as possible for information (that uses fonts, graphics, colour and layout structures as effectively as possible to implement visual design) and then to make that presentation specification as independent of the document contents as possible. This is often achieved through the use of separately stored stylesheets referred to by the attributes of content elements.
The result is a document that can be interpreted efficiently by human eyes. The effect of the style separation (or parameterisation) is to allow the document to be reproduced in different display contexts (with different publication regimes, such as A4 report, gatefold booklet, web page, poster) or reprocessed for different information environments and databases. The use of XML on the Web reinforces this separation between content and presentation, and allows both human authors and software agents to focus on the information that is to be communicated, rather than on the way in which it is to be delivered.
1.2 Semantic Web 101
The aim of the Semantic Web [3] is to build on this content-neutral platform to provide an environment in which computational agents can unambiguously determine the meaning of a resource, to make the Web an environment in which software agents and humans can make better (reasoned) use of the available resources.
Figure 1: The context in which a document is read.
The key components of the Semantic Web are (a) agreed models (ontologies) of the objects and relationships contained in the documents, (b) formally specified ontology languages for unambiguously codifying these agreed models and (c) an annotation mechanism for identifying (parts of) Web documents with concepts from relevant ontologies.
1.3 Annotating for Presentation and Annotating for Semantics
Presentational and semantic processing are not dissimilar. The fundamental concepts involved with independent layout and formatting processes as developed through ODA and DSSSL and which are now embodied in CSS and XSL-FO are parts of community-agreed ontologies, expressing the concepts of a page, margin, border, block, paragraph etc. The particular languages in which they are expressed (CSS, XSL) are loosely analogous to the ontology languages of the Semantic Web in that they describe the composition and useful conjunction of the (display) concepts. RDF’s mechanism for conjoining concept and content (via structure addressing and namespace reference) has one counterpart in document processing: the external link (XLink or HyTime).
Stylesheets either achieve this implicitly by subverting particular content nodes (CSS uses style or class attributes) or by declaring a generalised content-visiting process (XSLT). It would not be impossible to interchange mechanisms and to identify individual elements for stylesheet processing with RDF. Conversely, it would be possible to define a concept attribute for content elements to explicitly add ontological semantics to the document content. In fact, this is an approach undertaken by the earliest Web ontology systems such as SHOE [14].
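The idea of a concept attribute on content elements can be made concrete with a minimal sketch. It assumes a hypothetical `concept` attribute and placeholder ontology URIs under `http://example.org/`; neither is taken from SHOE or any published vocabulary. The sketch only shows that, once semantics are attached to elements in the same way as a CSS `class`, a software agent can read them without any text analysis.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the `concept` attribute and the ontology URIs are
# illustrative assumptions, not a published vocabulary.
DOC = """<para>
  <span concept="http://example.org/onto#Person">Wendy Hall</span> leads the
  <span concept="http://example.org/onto#Project">WiCK</span> project.
</para>"""


def extract_concepts(xml_text):
    """Return (text, concept URI) pairs for every element carrying a concept attribute."""
    root = ET.fromstring(xml_text)
    return [(el.text, el.get("concept")) for el in root.iter() if el.get("concept")]


pairs = extract_concepts(DOC)
for text, concept in pairs:
    print(text, "->", concept)
```

Because the annotation lives on the element itself, no separate addressing step is needed to relate the concept to the text it describes.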
1.4 Adaptive and Active Documents
A document is often considered as structure plus content, and constructed at a low level from a number of different storage entities or XIncluded fragments (figure 1). Presentation specifications are mapped onto structure or specified in it. Hypertext links are specified similarly, but some link semantics may affect the content inclusion (e.g. XLink’s actuate="auto" show="embed").
All of these devices are available to the author, the visual designer or the website manager. Typically, the author is considered to have control over the content of the document rather than its display. However, the content itself may include directives which generate or manipulate document components. These may simply add timestamps, navigational panels or adverts, but they can also be used to adaptively insert or delete relevant subject material [4].
Active documents are documents which contain or are directly associated with executable program code or scripts. Placeless documents [7], for example, have active properties in addition to regular properties (title, author, etc.). As well as a name and value, each active property includes a third component, code, which runs in response to various actions upon the document (e.g. readContent, writeContent, deleteDocument). Active properties take functionality normally associated with specific applications, such as workflow or content conversion, and associate it directly with the document so that it travels around with the document (e.g. by email). Microsoft’s “smart documents” are an example of active documents: “available in Microsoft Office Word 2003 and Microsoft Office Excel 2003, smart documents contain programming logic that defines the way documents are used, and controls the way the data in the documents can be manipulated” [19].
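The three-part structure of an active property (name, value, code) can be illustrated with a small sketch. The class names, method names and the audit-trail property below are all hypothetical and are not the Placeless Documents API; the sketch only shows code attached to a document being run in response to the readContent and writeContent actions mentioned above.

```python
class ActiveProperty:
    """A property with a name, a value, and code that reacts to document actions."""

    def __init__(self, name, value, code):
        self.name = name
        self.value = value
        self.code = code  # callable(document, prop, action)


class Document:
    def __init__(self, content):
        self.content = content
        self.properties = []
        self.log = []

    def attach(self, prop):
        self.properties.append(prop)

    def _fire(self, action):
        # Every attached active property is notified of the action.
        for prop in self.properties:
            prop.code(self, prop, action)

    def read_content(self):
        self._fire("readContent")
        return self.content

    def write_content(self, text):
        self._fire("writeContent")
        self.content = text


def audit(document, prop, action):
    """Hypothetical property code: record each action in the document's own log."""
    document.log.append((prop.name, action))


doc = Document("draft")
doc.attach(ActiveProperty("audit-trail", "enabled", audit))
doc.write_content("final")
text = doc.read_content()
```

The behaviour is stored with the document object rather than in the application that opens it, which is the sense in which the functionality travels with the document.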
2. DOCUMENT PARADIGM FOR THE SEMANTIC WEB
However a document is constructed, and whatever flexibility is provided in its storage and construction, the defining feature of a document is an author; a document is the bounded communication of a human individual. It is the content identified at a particular time, with a particular meaning for a particular purpose. The way in which this meaning and purpose is represented within the document has a direct impact on the successful (or otherwise) interpretation of the content by its consumers.
In the case of traditional plain ASCII documents, although easily processable by both human and software agents, knowledge is represented implicitly and has to be extracted by analysing the text (e.g. by using NLP techniques); subsequently, interpretation of the document content is open to error and ambiguity. In the case of PostScript or Adobe PDF documents, software agents have to extract the textual content from drawing code before processing it. LaTeX macros and HTML markup may give software agents some clues as to the structural semantics of the text, but are ultimately ambiguous.
XML reduces ambiguity in documents by making the document structure explicit through a schema, for example DocBook. The interpretation of the document structure is therefore unambiguous, but the concepts and ideas within the text itself (the #PCDATA) are unlikely to have schematic equivalents. These need to be in place to facilitate the interpretation of the document at a finer granularity than its structural blocks.
RDF facilitates this by associating knowledge with text spans at the character level via XPointers. However, this pointing mechanism becomes a problem when the content of the target document is changed, potentially causing XPointers to reference the wrong text span, or to become unresolvable (‘broken’). This problem has been termed the editing problem by the Open Hypermedia community [6]. A straightforward solution to this problem is to embed the knowledge markup in the content itself, at least while the document is being authored or undergoing editorial changes, so that the knowledge survives (e.g., by moving with the text as content is inserted, deleted, or copied and pasted to a new location in the document).
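The editing problem can be demonstrated in a few lines. The sketch below is a simplification under stated assumptions: plain character offsets stand in for an XPointer range, and the inline `<k concept="...">` element is a hypothetical embedded markup, not a real format. After a single insertion at the start of the text, the external pointer selects the wrong characters while the embedded annotation still wraps the intended words.

```python
import re

original = "The WiCK project studies writing."

# External annotation: character offsets into the text (a stand-in for an XPointer).
start = original.index("WiCK")
external = {"span": (start, start + len("WiCK")), "concept": "onto:Project"}

# Embedded annotation: hypothetical inline markup carrying the same knowledge.
embedded = 'The <k concept="onto:Project">WiCK</k> project studies writing.'

# The author inserts a phrase at the beginning of the document.
prefix = "As reported earlier, "
edited_plain = prefix + original
edited_embedded = prefix + embedded

# The stored offsets are unchanged, so they now select unrelated characters.
s, e = external["span"]
stale_target = edited_plain[s:e]

# The embedded markup moved with the text and still identifies the right span.
match = re.search(r'<k concept="([^"]+)">([^<]+)</k>', edited_embedded)
surviving = (match.group(2), match.group(1))

print("external pointer now selects:", repr(stale_target))
print("embedded annotation selects:", surviving)
```

Repairing the external pointer would require tracking every edit to the target document, which is exactly the maintenance burden that embedding avoids during authoring.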
SHOE and XMP are examples of formats for embedding knowledge in documents. SHOE [14] extends HTML with a handful of extra tags for describing the document as a whole in relation to a domain ontology. Adobe’s XMP [1] allows RDF constructs to be embedded in HTML, PostScript and PDF documents, as well as TIFF, JPEG, GIF, PNG and all Adobe formats (Photoshop, Illustrator etc.). XMP-compliant applications provide built-in support for a functional set of XMP schemas including Dublin Core, rights management, and EXIF, although new schemas can also be defined. However, the XMP portion within the document representation is separate from the content, so in the case of a PDF for example, a fragile pointer mechanism would still be required to associate knowledge with specific spans of text during authoring/editing. One solution offered by a number of knowledge writing tools is to use an ad-hoc representational format for embedded knowledge during the authoring or editing of a document, which can be extracted to a formal representation (e.g. SemanticWord [21] to DAML+OIL, SemTalk [8] to RDFS/DAML) when the document is published. Such tools are the subject of the next section.
3. CREATING SEMANTIC WEB DOCUMENTS
For knowledge markup to become an explicit part of Semantic Web document representation, we also need to consider how this step can be supported during the process of authoring new documents.
In an attempt to make parts of the Web corpus amenable to machine-processing, much emphasis within the Semantic Web community has been on building tools for the manual annotation of existing documents [11]. However, recognising that post-hoc manual annotation is difficult, time-consuming, and error prone, researchers have turned to producing automatic or semi-automatic methods for adding knowledge annotations to existing documents in order to make this process more feasible [17, 23, 12]. Amilcare [5], for example, uses the generalised NLP rules learnt from an annotated test corpus to create new knowledge annotations on an unseen corpus. Amilcare has been used as a fully automatic process, and also as a ‘suggester’ where the annotations are revised by a human annotator (these revisions may in turn fine tune the extraction rules).
However, while this post-hoc mining exercise works well as an approach for ‘enabling’ a legacy corpus, it is not a paradigm that should be used in the case of creating new documents. Semantic annotation, overseen by the author, should be an essential and active component of the writing process itself, shifting the responsibility for knowledge markup from system/annotator to author.
The Writing in the Context of Knowledge (WiCK) project aims to produce tools which capture the knowledge and intentions of the author rather than just capturing the author’s keystrokes and attempting to guess afterwards what they actually mean; in short, to facilitate the author in communicating his or her ideas clearly and unambiguously to human and machine interpreters in the Semantic Web. A number of other approaches have also investigated this simultaneous authoring of content and semantic markup, which we describe briefly before turning to the contributions of the WiCK project itself.
CREAM [10] allows an author to build new documents by dragging and dropping knowledge fragments from an ontology browser into a text editor: for example, a dropped instance slot inserts a text rendering of the slot value; a dropped relationship slot inserts a short sentence complete with links to each of the related instances. In both cases, knowledge is embedded in the document alongside the inserted text.
Knowledge markup at authoring-time does not preclude the use of information extraction techniques such as that demonstrated by Amilcare. Extracted knowledge can be used to offer relevant services to the author in order to assist the writing task. For example, ARIA [18] supports email or web page authoring based on a semantically annotated photo database. By continuously monitoring the text typed by the author against a domain ontology, ARIA recommends photos from the database that seem appropriate to illustrate the various facets of the unfolding narrative.
The potential research and commercial benefits of bringing these knowledge-aware processes into the office arena have not gone unnoticed. Microsoft Word, for example, is the most often adopted product for authoring text documents [21]; authors can therefore adopt new knowledge-aware extensions without learning a new production environment and without sacrificing familiar features [22]. SemanticWord [21], a Microsoft Word-based environment, adds several toolbars to the standard interface which support the creation of semantic annotations according to selected ontologies (local or imported from the Semantic Web). Using these toolbars, authors can associate a text region with an instance of a class (an instance reference), or describe the content of a text region with a collection of triples; both types of annotation are embedded in the text and can be directly manipulated. Annotations are “carried over” in text cut/copy and paste operations, facilitating a level of knowledge reuse between documents. As well as knowledge-rich documents, the SemanticWord toolbars can also be used to create annotated templates, thus speeding up content and annotation production in frequently created documents. SemanticWord also offers a more proactive information extraction feature which the author experiences through the Microsoft Smart Tags interface. As with ARIA, the author’s keystrokes are monitored by an information extraction process which relates named entities in the text to ontology instances and types, visually highlighting the recognised text in the document. The author can then examine the highlighted entities and convert them into instance reference annotations.
Although provoking a range of reactions upon its release [15], Smart Tag technology has also been adopted by other office-based knowledge writing initiatives, including SemTalk [9] and OntoOffice [20]. As with SemanticWord, recognised concepts and instances are highlighted with Smart Tags. However, the kinds of action offered differ between systems: in SemTalk, for example, the author can access and edit the underlying ontological model; in OntoOffice, a search for context-relevant documents can be initiated.
Going beyond simply supporting knowledge writing in the context of an underlying ontology, the WiCK project has attempted to build on these initiatives by considering an office environment in which several knowledge-bases and knowledge-aware services exist and actively assist the author by providing targeted knowledge that would otherwise need to be searched for both manually and individually.
4. WICKOFFICE: A KNOWLEDGE WRITING ENVIRONMENT
As we have argued in previous sections, authors of new Semantic Web documents face an additional responsibility of explicitly embedding knowledge markup in their documents. However, just as authors’ worries about linking responsibilities when writing for the Web were alleviated by search engines such as Google, the presence of such knowledge-aware services (for example, SemanticWord’s knowledge markup as-you-type) and resources (for example, OntoOffice’s searchable document store) is essential. The WiCK project has investigated the writing process in a Semantic Web environment where knowledge repositories and services exist and actively assist the author in producing a document with explicit embedded knowledge. In order to demonstrate our approach, we consider a business-type scenario where an author is tasked to produce a funding proposal for a project. By analysing and subsequently modelling the knowledge ‘flow’ in this scenario we can demonstrate the benefits that a knowledge-aware office environment can provide.
4.1 Scenario
The task of writing a funding proposal is common in industrial and commercial environments; here, we consider a hypothetical funding proposal for a research project in an academic environment. The proposal is directed at the UK’s Engineering and Physical Sciences Research Council (EPSRC), which has a well-defined procedure for submitting, reviewing, and selecting proposals for funding, and provides a standard form (the Je-SRP1) and a comprehensive guidance document on how to fill out the form, create the supplementary documentation, and submit it for consideration (table 1).
The Je-SRP1 form itself serves as an administrative summary of the research proposal, collecting together the relevant information about the hosting organisation, project investigators, project partners (for joint proposals), referees, staff (including visiting researchers), and travel and equipment costs. The ‘meat’ of the proposal is contained in the supplementary document, the Case for Support, the composition of which is tightly defined in the guidance notes. The rules for the Case define the formatting (constraints on page length, font sizes etc.), the informa-
Figure 2: The proposed WiCKOffice knowledge writing environment.
long as the project lifetime). These simple constraints can easily be modelled as verification conditions on data entry, or as queries upon the knowledge-base to select an appropriate list of choices. De-constructing the form in this way therefore provides an outline proposal ontology, with the Guidance Notes document supplying the constraints.
Creating the Case for Support document is more involved, as the author is required to construct a text, rather than enter data into clearly labelled spaces on a form. However, the Guidance Notes document indicates very clearly the kind of information that is expected in each part of the document. Examining the bullet points which give instructions for Part 1 of the Case for Support, we can see what basic information is required from the knowledge-base, in addition to the kind of processing and analysis which would need to be performed on it:
“Provide a summary of the results and conclusions of recent work in the technological/scientific area which is covered by the research proposal. Include reference to both EPSRC funded work and non-EPSRC funded work. Details of relevant past collaborative work with industry and/or with other beneficiaries should be given...”
This specifies a literature review; the knowledge is described by the subject and research ontologies. A simple query of the knowledge-base or digital library would provide a list of potentially relevant papers, but a more advanced reasoning agent would be required in order to assist the author in evaluating the relative significance of the projects and papers.
Part 2 of the Case for Support requires a different kind of knowledge support, for instance within the Program and Methodology section: “Identify the overall aims of the project and the individual measurable objectives against which you would wish the outcome of the work to be assessed.” This information does not exist in the knowledge-base; it is invented as an integral part of the creation of a new research undertaking. However, authors may be assisted by seeing the aims and objectives of similar, recent or successful project proposals, especially if they do not have much experience of proposal writing to draw on. In other words, a lack of personal experience could be supplemented by directed browsing of an institutional memory.
This brief examination of the EPSRC Guidance Notes for project proposals shows how heavily the writing process (both apparently free-text content creation and information recall) is constrained and specified by the appropriate ontologies, opening the possibility of substantive help from a suitably equipped knowledge environment.
4.2 Proposed Architecture
Figure 2 illustrates our proposed knowledge-aware office environment, WiCKOffice, designed in response to the opportunities for functionality identified in the previous section. In this environment, knowledge is managed by two knowledge-bases, both based on the AKT 3store platform [13]. The AKT knowledge-base models the UK Higher Education computer science community (expressed using the AKT Reference Ontology), and hence provides a suitable research ontology for our purposes. A WiCK knowledge-base hosts the additional ontologies. Instances for the project and proposal ontologies are acquired from previous EPSRC project proposals; we envision Semantic Web agents trawling digital library archives and automatically constructing and populating the subject ontology.
WiCK extensions to the Microsoft Office environment (both VBA and COM-based) utilise key computational knowledge services to assist the writing task (in accordance with the ‘writing context’), and to update the knowledge-bases when the writing task is completed (for example, new proposals becoming part of the “institutional memory”). Explicit knowledge representation in the proposal documents makes the latter a straightforward process.
5. WICKOFFICE PROTOTYPE
Based on the opportunities for functionality identified in the previous section, our modelling and development efforts to date have produced a coherent WiCKOffice environment in which several knowledge services are available to authors. A knowledge fill-in service and knowledge recall service are motivated by the need to provide timely and convenient access to knowledge, which would otherwise have to be manually ‘looked up’ on the institutional intranet. A third service, in-line guidelines, also assists recall by exposing guidelines and constraints captured from a design specification (in this case, the EPSRC guidance notes), that are relevant to the part of the proposal document currently being worked on, via the Microsoft Office Assistant interface.
5.1 Filling In Forms
The knowledge fill-in service assists the author in filling in the Je-SRP1 form. For example, the author can specify the (partial) name of the Principal Investigator and instruct the service to retrieve appropriate (in context) instances from the knowledge-base to automatically fill in the remainder of the required information.
The majority of the information required to provide an assisted knowledge fill-in service for the Je-SRP1 form is already provided by the AKT Reference Ontology (the research ontology in our scenario). However, leveraging this service is not as simple as filling each part of the form with an appropriate instance selected from the research ontology: different parts of the Je-SRP1 form “share” data about the same concept. For example, information relating to the Principal Investigator must be entered in three different locations: section 1B (page 1) requires the PI’s title, name, organisation, department, and commitments to other projects; section 2B (page 12) requires the PI’s name (for the proposal declaration); and section 3B (page 13) requires the PI’s contact telephone number, email address, fax number, etc.
In line with our paradigm for Semantic Web documents (section 2), we have used Microsoft Office 2003’s new “smart documents” feature to add semantic structure to the otherwise unstructured Je-SRP1 template in the form of an XML Schema derived from the document ontology. The XML Schema identifies each ‘sub-form’ of the Je-SRP1 and groups together related sub-forms (thus, for example, describing the fact that information about the PI is shared by sub-forms 1B, 2B, and 3B). Each individual form field is marked up with three attributes: the ID of the sub-form to which the field belongs; a boolean is-search-field value indicating whether that field is a preferred search field (in the case of the Je-SRP1, the PI’s first name and surname are good search terms for a person instance in the research ontology; knowing the PI’s title may not be so helpful); and finally a filled-in-by attribute which identifies the slot of the matching knowledge instance which should be used to actually provide a value for the field.
Figure 3: Augmenting the Je-SRP1 template with explicit structural semantics facilitates assisted knowledge fill-in.
When the author partially fills in a sub-form (figure 4a) and presses the “Fill-In” button, the XML structure of the document is consulted to determine which fields are part of the current sub-form (and also which fields are part of other sub-forms that share data with the current sub-form). Fields in the current sub-form with an is-search-field attribute value of true are then used by the knowledge fill-in service to construct an RDQL query to extract matches from the research ontology. In the case that multiple instances match the query, these instances are presented to the author, who chooses the appropriate match. Finally, the filled-in-by attribute is used to map the slot values of the returned instance to each associated field (figure 4b); the URI of the matching instance from the research ontology is also embedded.
Figure 4: Using the knowledge fill-in service via the WiCKOffice toolbar. (a) Author fills in partial details. (b) All sub-forms sharing data with current sub-form are populated from matching instance.
Recently, the EPSRC rolled out its own assisted form filling system, the Je-S1 e-form, which provides some equivalent functionality to this service. Provided that each party has previously registered their details with the system, the author can select the host organisation, principal and co-investigators, referees and other staff from checklists and then download a partially completed Je-SRP1 form which contains all the required details of the selected parties, but still requires some unaided ‘mandraulic’ effort to complete in full. By contrast, we argue that the WiCKOffice approach of leveraging the functionality of multiple services operating over diverse knowledge sources (including, but not restricted to, employee data and information harvested from personal web pages and directories) not only allows authors to be aided in filling in all aspects of the Je-SRP1 form but also potentially offers wider applicability (adding new types of form requires only that form’s semantic structure be elicited according to the document ontology) than a data-based application.
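The fill-in steps described in this section (select the search fields of the current sub-form, build a query, then map slot values back through filled-in-by) can be sketched as follows. This is an illustration under several assumptions, not the WiCKOffice implementation: the XML element names, the `http://example.org/research#` namespace, the slot names and the sample instance are all hypothetical, the query string only approximates RDQL triple-pattern syntax, and the knowledge-base lookup is replaced by a hand-written stand-in.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the three attributes follow the description in the text,
# but the real WiCKOffice schema is not reproduced here.
FORM = """<form>
  <field sub-form="1B" is-search-field="true" filled-in-by="given-name">Les</field>
  <field sub-form="1B" is-search-field="true" filled-in-by="family-name">Carr</field>
  <field sub-form="1B" is-search-field="false" filled-in-by="title"></field>
  <field sub-form="3B" is-search-field="false" filled-in-by="email"></field>
</form>"""

NS = "http://example.org/research#"  # placeholder namespace, not the AKT ontology


def build_query(root, sub_form):
    """Build an RDQL-style query from the filled-in search fields of one sub-form."""
    patterns = []
    for field in root.iter("field"):
        value = (field.text or "").strip()
        if (field.get("sub-form") == sub_form
                and field.get("is-search-field") == "true" and value):
            patterns.append('(?p, <%s%s>, "%s")' % (NS, field.get("filled-in-by"), value))
    return "SELECT ?p WHERE " + ", ".join(patterns)


def fill_in(root, shared_sub_forms, instance):
    """Copy slot values of the chosen instance into every field that shares its data."""
    for field in root.iter("field"):
        slot = field.get("filled-in-by")
        if field.get("sub-form") in shared_sub_forms and slot in instance["slots"]:
            field.text = instance["slots"][slot]
    root.set("instance-uri", instance["uri"])  # embed the URI of the matched instance


root = ET.fromstring(FORM)
query = build_query(root, "1B")

# Stand-in for the instance the author would pick from the query results.
chosen = {
    "uri": NS + "person-42",
    "slots": {"given-name": "Les", "family-name": "Carr",
              "title": "Dr", "email": "pi@example.org"},
}
fill_in(root, {"1B", "2B", "3B"}, chosen)
filled = {f.get("filled-in-by"): f.text for f in root.iter("field")}
```

Keeping the search and mapping rules in attributes of the form itself means that supporting a new form is a matter of annotating its fields, with no change to the service code.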
5.2 Knowledge In The Right Place At The Right Time
The knowledge recall service assists the author in quickly and conveniently recalling appropriate knowledge from the research environment. Example (contextual) queries include “what papers relevant to this proposal have been published recently?”, or “what relevant projects has this person worked on?”. In response to such queries, appropriate knowledge from the knowledge-bases is selected and inserted directly into the document in the form of ‘potted’ summaries.
As with the knowledge fill-in service, the AKT Reference Ontology provides the majority of knowledge utilised by this service. In the current implementation, given the name of a recognised person, project or place, the knowledge recall service assists the writer in recalling facts about it. We have seen that recent incarnations of Microsoft Office already provide a mechanism for recognising terms and presenting available “actions” associated with that term to the user in the form of Smart Tags. However, in a Case for Support document, the author’s information requirements depend on the section or part of the document currently being worked on. For example, the author might expect that typing “Les Carr” in the Previous Research section would make available options to “auto-summarise” or browse those facets of Les Carr’s previous research history most relevant to the current proposal, whereas typing “Les Carr” in the References section would make available options to insert Les Carr’s most recent and relevant publications, and typing “Les Carr” in the Researcher Curriculum Vitae section would make available options to insert a “mini CV” with information appropriate to the proposal (with appropriate embedded knowledge markup in each case). However, prior to the release of Microsoft Office 2003, the actions made available through Smart Tags were static; Office 2003 allows the set of available actions to be determined dynamically when the author activates (clicks on) a Smart Tag [16].
An XML Schema derived from the document ontology is again used to make explicit the structural semantics of the Case for Support document. When the author activates a WiCK Smart Tag by clicking on a highlighted term in the text, the XML structure of the document is consulted to work out which part of the document the text appears in (e.g. Background, References), and the actions offered by available services which are appropriate to the type of knowledge required in that section are presented (figure 5).
Figure 5: Using the knowledge recall service, via the WiCKOffice Smart Tag. (a) Name recognised as author types. (b) Available actions in Previous Research section. (c) Available actions for recognised text “Wendy Hall” in References section.
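The section-dependent choice of actions can be sketched as a lookup keyed on the enclosing section and the type of the recognised term. Everything concrete below is an assumption made for illustration: the element names, the `type` attribute, and the action labels are hypothetical and do not reproduce the WiCKOffice schema or the Smart Tag API.

```python
import xml.etree.ElementTree as ET

# Hypothetical Case for Support structure with recognised terms marked inline.
CASE = """<case-for-support>
  <previous-research><term type="Person">Les Carr</term> has worked on ...</previous-research>
  <references><term type="Person">Wendy Hall</term></references>
</case-for-support>"""

# Actions registered per (section, entity type); labels are illustrative only.
ACTIONS = {
    ("previous-research", "Person"): ["Summarise previous research",
                                      "Browse research history"],
    ("references", "Person"): ["Insert recent relevant publications"],
    ("researcher-cv", "Person"): ["Insert mini CV"],
}


def actions_for(root, term_text):
    """Find the section containing the activated term and return the actions for it."""
    for section in root:
        for term in section.iter("term"):
            if term.text == term_text:
                return ACTIONS.get((section.tag, term.get("type")), [])
    return []


root = ET.fromstring(CASE)
print(actions_for(root, "Les Carr"))
print(actions_for(root, "Wendy Hall"))
```

Because the lookup happens when the term is activated, the same recognised name yields different actions as the author moves between sections, which matches the dynamic behaviour attributed to Office 2003 Smart Tags in the text.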
We therefore describe this service as providing knowledge in the right place (the author’s current location in the document) at the right time (when a name of a recognised person, place or project is typed by the author). Again we anticipate the wider applicability of this type of service beyond the specifics of our scenario; with appropriate knowledge sources, services, and discovery mechanisms in place this ‘right place, right time’ writing paradigm can be applied to other writing tasks.
5.3 Planned Future Services
Two further knowledge-based services are currently under development within the project proposal writing scenario. An augmented experience service provides the author with access to the “institutional memory” of previous research proposals, thereby augmenting the author’s own experience of proposal writing (“what works? what doesn’t work?”). For example, the author is assisted in evaluating the most important beneficiaries of the proposed research by being shown the beneficiaries put forward by other proposals (with an indication as to whether those proposals were subsequently approved or otherwise).
An assisted writing service attempts to assist the author in making higher-level decisions about relevant content to include in the proposal by suggesting appropriate instances from the subject ontology (for example, relevant projects, papers, resources) based on both the writing context and the text that the author has already written. For example, this service uses an internal reasoning engine to detect that although the author has referred to a number of knowledge acquisition-related projects in the Background section of the Case for Support, one particularly ‘significant’ project has not yet been mentioned, and so offers to create a summary of the project from the relevant instances in the knowledge-base (gathering details of key personnel and publications) and to insert the knowledge-annotated information into the appropriate sections of the Case document.
6. CONCLUSIONS AND FUTURE WORK
In order to allow documents to be unambiguously interpreted by both human readers and software agents, knowledge should be an explicit part of document representation. Rather than being the result of an imprecise, after-the-fact activity, knowledge elicitation in Semantic Web documents can be an exact, author-assisted process. However, instead of manifesting this additional responsibility as an extra process of annotation, knowledge elicitation can be an indistinguishable part of the authoring process. In fact the knowledge elicitation process can actually help the author (or editorial staff) rather than adding an extra burden, by providing a synthesis of a range of targeted background material that would otherwise need to be searched for both manually and individually. This paper has introduced our contribution to this process, WiCKOffice, a knowledge-writing environment. In the context of a project proposal writing scenario, WiCKOffice demonstrates that with a suitable set of ontologies and a supportive knowledge-aware environment, an author can be assisted in producing explicit knowledge documents.
[8] C. Fillies. On Visualizing the Semantic Web in MS Office. In Proceedings of the 6th International Conference on Information Visualisation (IV’02), London, England, pages 441–446, 2002.
[9] C. Fillies, G. Wood-Albrecht, and F. Weichardt. A Pragmatic Application of the Semantic Web using SemTalk. In Proceedings of the Eleventh International World Wide Web Conference, Honolulu, Hawaii, USA, pages 686–692, 2002.
[10] S. Handschuh and S. Staab. Authoring and Annotation of Web Pages in CREAM. In Proceedings of the Eleventh International World Wide Web Conference, Honolulu, Hawaii, USA, 2002.
[11] S. Handschuh, S. Staab, and A. Maedche. CREAM: Creating relational metadata with a component-based, ontology-driven annotation framework. In Proceedings of the First International Conference on Knowledge Capture (KCAP 2001), Victoria, B.C., Canada, Oct. 2001.
[12] S. Handschuh, S. Staab, and F. Ciravegna. S-CREAM - Semi-automatic CREAtion of Metadata. In Proceedings of the 13th International Conference on Knowledge Engineering and Knowledge Management (EKAW’02), 2002.
[13] S. Harris and N. Gibbins. 3store: Efficient Bulk RDF Storage. In Proceedings of the 1st International Workshop on Practical and Scalable Semantic Systems (PSSS’03), Sanibel Island, Florida, pages 1–15, 2003.
[14] J. Heflin, J. Hendler, and S. Luke. Reading Between the Lines: Using SHOE to Discover Implicit Knowledge from the Web. In Proceedings of the 1998 Conference on Artificial Intelligence, 1998.
[15] G. Hughes and L. Carr. Microsoft Smart Tags: Support, ignore or condemn them? In Proceedings of the ACM Hypertext 2002 Conference, Maryland, USA, pages 80–81, 2002.
[16] C. Kunicki. What’s New with Smart Tags in Office 2003. MSDN Office Talk, 2003. Available from http://msdn.microsoft.com/library/en-us/dnofftalk/html/office01022003.asp.
[17] T. Leonard and H. Glaser. Large scale acquisition and maintenance from the web without source access. In Proceedings of the Knowledge Markup and Semantic Annotation Workshop (K-CAP 2001), pages 97–101, 2001.
[18] H. Lieberman and H. Liu. Adaptive Linking between Text and Photos Using Common Sense Reasoning. In Proceedings of the Conference on Adaptive Hypermedia and Adaptive Web Systems, Malaga, Spain, pages 2–11, 2002.
[19] Microsoft Corporation. Smart Documents in Microsoft Office 2003: Summary Technical White Paper. http://www.microsoft.com/technet/prodtechnol/office/office2003/operate/smrtdcsy.mspx, 2003.
[20] ontoprise GmbH. OntoOffice Tutorial. http://www.ontoprise.de/documents/tutorial