Archive for the ‘Uncategorized’ Category

Talk: “A new lemmatizer that handles morphological changes in pre-, in- and suffixes alike” by Bart Jongejan

Thursday, November 11th, 2010

A new lemmatizer that handles morphological changes in pre-, in- and suffixes alike
Talk by Bart Jongejan, CST, University of Copenhagen, Tuesday, May 6, 2008, at 13.00-14.45, meeting room 7501, Forum, DSV, Kista.

In some Indo-European languages, such as English and the North Germanic languages, most words can be lemmatized by removing or replacing a suffix. In languages like German and Dutch, on the other hand, lemmatization regularly also involves removing, adding or replacing other types of affixes, or even combinations of such string operations (for example, German gemacht → machen).

The rules for the new lemmatizer are created by automatic training on a large sample of full-form/lemma pairs. An attempt was made to let the rules assign a word to more than one lemma where appropriate, but this had to be abandoned. The current implementation produces exactly one lemma per word when the lemmatization rules are applied, and relies on an optional built-in dictionary to supply additional correct lemmas, but only for known words.
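The talk does not describe how the trained rules are represented. Purely as an illustration, the sketch below assumes a hypothetical rule format that swaps one prefix and one suffix around an unchanged stem, and derives such a rule from a single full-form/lemma pair by treating the longest common substring as the stem. It is not the CST lemmatizer's actual training algorithm: it learns from one pair instead of a large sample, and it does not handle infix changes.

```python
from difflib import SequenceMatcher
from typing import NamedTuple, Optional


class AffixRule(NamedTuple):
    """Hypothetical rule: swap a prefix and a suffix around an unchanged stem."""
    old_prefix: str
    new_prefix: str
    old_suffix: str
    new_suffix: str


def derive_rule(full_form: str, lemma: str) -> AffixRule:
    """Derive one rule from a full-form/lemma pair, treating their longest
    common substring as the stem and everything around it as affixes."""
    m = SequenceMatcher(None, full_form, lemma).find_longest_match(
        0, len(full_form), 0, len(lemma)
    )
    return AffixRule(
        old_prefix=full_form[: m.a],
        new_prefix=lemma[: m.b],
        old_suffix=full_form[m.a + m.size :],
        new_suffix=lemma[m.b + m.size :],
    )


def apply_rule(rule: AffixRule, word: str) -> Optional[str]:
    """Return the lemma the rule predicts for `word`, or None if the rule does not fit."""
    if not (word.startswith(rule.old_prefix) and word.endswith(rule.old_suffix)):
        return None
    stem_end = len(word) - len(rule.old_suffix)
    if stem_end <= len(rule.old_prefix):  # no stem left between the affixes
        return None
    stem = word[len(rule.old_prefix) : stem_end]
    return rule.new_prefix + stem + rule.new_suffix


# Learn from one German pair, then apply the rule to an unseen participle.
rule = derive_rule("gemacht", "machen")
print(rule)                        # AffixRule(old_prefix='ge', new_prefix='', old_suffix='t', new_suffix='en')
print(apply_rule(rule, "gesagt"))  # sagen
```

A trained lemmatizer must choose among many competing rules for each word; the sketch only shows how a single rule can capture a prefix change and a suffix change at the same time.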

First test results indicate that the new lemmatizer probably has higher accuracy than the earlier CSTlemma software, even for languages with mainly suffix morphology, but that its errors can sometimes be “more wrong” than those made by CSTlemma.

AVID: Deidentifying Swedish Medical Records for Better Health Care, submitted April 15, 2008 to Vetenskapsrådet

Thursday, November 11th, 2010

Within hospital care there has been an explosion in the production of medical record data. A large amount of this data is unstructured free text that is almost never reused. Our research group will soon have access to more than one million medical records from the Stockholm City Council, and we already have access to 5,000 medical records in rheumatology. Unfortunately, the free text of the records very often contains misspellings, syntactic errors and many unknown abbreviations, which makes it difficult to process by computer. To use the free-text corpus for research purposes, it is also necessary to deidentify the texts, since they typically contain information that can identify the individual patient. In this project we will therefore normalise and deidentify the medical records, and we expect to reach 99 percent deidentification. Once this is done, we and the research community will be able to use human language technology tools, such as text mining and text extraction methods, to find previously uncharted relations between diseases, medical treatment, age, occupation, social situation, etc. One primary goal of this project is thus to make it possible for medical researchers to use the abundant digital textual information available in medical records. Such research has never been carried out in Sweden before, and it is unique in the kind and amount of textual data used.

Popular science description (originally in Swedish):

Deidentification of Patient Records for Better Health Care

In health care, physicians and nurses produce a very large number of digital patient records. The records contain information about the patient's general condition, symptoms, diagnosis and treatment. Together, these records hold valuable information, especially in their free-text parts, which are not used at all in medical research. We have previously carried out experiments on 5,000 deidentified patient records in rheumatology and found two problems:

One problem is that the records, although they have been deidentified so that they can be used in research, still contain information that may allow patients to be identified, since they refer to, among other things, the patient's occupation (a CEO position at Alfa Laval), or to family members and telephone numbers (the patient's husband Bengt-Åke can be reached at 08-123 4567). The other problem is that the record texts contain many misspellings and grammatical errors, as well as ambiguous abbreviations, which make them difficult for computer programs to process.
In this research project we therefore intend both to correct the misspellings in these patient records, giving concepts a uniform spelling, and to deidentify the text. Both spelling correction and deidentification will be carried out with fully automatic language technology methods. We will start from just over one million patient records, to which we will soon have access through Stockholms läns landsting (Stockholm County Council).

These patient record texts are the material on which our systems will be trained, so that they learn to recognise new concepts. The methods for automatic named entity recognition, and thereby deidentification, can be built using either rule-based or statistical methods. With these methods, person names, occupations, places, organisations, etc. can then be recognised automatically. Once this is done, we will have a large number of patient records whose content is perhaps up to 99 percent fully deidentified, enabling research on a unique material. We hope to make our cleaned patient record corpus and the language technology tools we develop available through Svensk Nationell Datatjänst (SND, the Swedish National Data Service) to achieve wider dissemination.
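A minimal rule-based sketch of this kind of recognition is shown below, applied to a sentence modelled on the husband-and-phone-number example mentioned earlier. It assumes a tiny hypothetical first-name list and an assumed regular expression for Swedish landline numbers; it is not the project's system. It only tags known first names and phone numbers, so occupations, places and organisations would need further rules or a statistical model.

```python
import re

# Hypothetical gazetteer; a real system would use large name lists or a trained model.
KNOWN_FIRST_NAMES = {"Bengt-Åke", "Anna", "Erik"}

# Assumed shape of a Swedish landline number such as "08-123 4567".
PHONE = re.compile(r"\b0\d{1,3}-\d{2,3} ?\d{2} ?\d{2}\b")

# Capitalised tokens, optionally hyphenated ("Bengt-Åke"), with Swedish letters included.
NAME_TOKEN = re.compile(r"\b[A-ZÅÄÖ][a-zåäö]+(?:-[A-ZÅÄÖ][a-zåäö]+)?\b")


def deidentify(text: str) -> str:
    """Replace phone numbers and known first names with placeholder tags."""
    text = PHONE.sub("[PHONE]", text)
    return NAME_TOKEN.sub(
        lambda m: "[NAME]" if m.group(0) in KNOWN_FIRST_NAMES else m.group(0), text
    )


print(deidentify("The patient's husband Bengt-Åke can be reached at 08-123 4567."))
# The patient's husband [NAME] can be reached at [PHONE].
```

Capitalised words that are not in the name list, such as "The", pass through unchanged, which is also why a gazetteer alone misses unseen names.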

The automatic spelling correction system is based on rules for how misspelled words in a text can be corrected. It uses both lexicons and abbreviation lists to correct the misspelled words in the patient records, and we will also use special medical word lists, such as FASS lists of drug names. The texts of the more than one million patient records can also be used to build new domain-specific word lists, letting the most common spellings of words “win over” the less common ones.
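The description does not give the correction rules themselves. The sketch below swaps in a simple, well-known technique for illustration: it generates every spelling one edit away from a word and lets the most frequent candidate in the corpus win, while leaving words found in a lexicon untouched. The counts and the one-word lexicon are made up, standing in for corpus frequencies and a FASS drug-name list.

```python
from collections import Counter

SWEDISH_LETTERS = "abcdefghijklmnopqrstuvwxyzåäö"


def edits1(word: str) -> set[str]:
    """All strings one deletion, transposition, substitution or insertion away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = {a + b[1:] for a, b in splits if b}
    transposes = {a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1}
    replaces = {a + c + b[1:] for a, b in splits if b for c in SWEDISH_LETTERS}
    inserts = {a + c + b for a, b in splits for c in SWEDISH_LETTERS}
    return deletes | transposes | replaces | inserts


def normalise(word: str, counts: Counter, lexicon: set[str]) -> str:
    """Keep lexicon words; otherwise let a more frequent neighbouring spelling win."""
    if word in lexicon:
        return word
    better = [w for w in edits1(word) if counts[w] > counts[word]]
    # Break frequency ties by string order so the result is deterministic.
    return max(better, key=lambda w: (counts[w], w)) if better else word


# Made-up counts standing in for spelling frequencies in the record corpus.
counts = Counter({"paracetamol": 120, "paracetomol": 5, "paracetamoll": 3})
lexicon = {"paracetamol"}  # e.g. a drug name taken from a FASS list
print(normalise("paracetomol", counts, lexicon))   # paracetamol
print(normalise("paracetamoll", counts, lexicon))  # paracetamol
```
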
The research that can be done on these patient records includes traditional search, both within one individual's complete record text and across many individuals. Most important of all, there will be a large body of material gathering valuable information about a large number of patients, which can be used to extract new information and knowledge.

The project has two goals: to create a large deidentified corpus of Swedish patient records for research purposes, and to give the research community access to the language technology tools developed in the project for deidentification and for working with similar text collections. As a result, it will in future be easy to create new deidentified text collections and to work with large, information-dense text collections.

Our project is unique in that it is the first time anyone will carry out deidentification and cleaning of just over one million patient record texts (in Swedish). Previous work has usually involved at most a few thousand patient records in English. This research is highly relevant because it will help health care make use of all the accumulated knowledge written in free text, together with “harder” measurements, and thereby find new knowledge for better health care.

DSV in China on a Roadshow to Recruit Master's Students, 18-29 Oct 2007

Thursday, November 11th, 2010

DSV is represented by Hercules Dalianis in the Roadshow delegation of 60 professors from Stockholm University and 14 other Swedish universities, travelling to Beijing and Shanghai to recruit master's students and to present Swedish research in many areas.
We have visited education fairs, the Chinese Academy of Social Sciences, the Ministry of Education, Peking University, Renmin University and Beihang University, all in Beijing. The students are enthusiastic and eager to start master's and even PhD studies. We have now arrived in Shanghai and will continue with meetings at Jiao Tong University, Fudan University and Tongji University.

Download RoadshowKina18-29okt2007_Hercules.pdf


The 3rd PoEM is ongoing!

Wednesday, November 10th, 2010
The 3rd International Conference on Practice of Enterprise Modeling (PoEM 2010) is hosted this year in Delft, the Netherlands, and Janis, Constantinos and Jelena are attending. As at previous PoEMs, the accepted papers are typically grounded in practice or empirical studies. In addition, some time is reserved for focused discussions. We have two contributions at the conference:
1. Towards a Unified Business Strategy Language: A Meta-model of Strategy Maps
Constantinos Giannoulis, Michael Petit, and Jelena Zdravkovic
2. Towards Defining a Competence Profile for the Enterprise Practitioner
Anne Persson and Janis Stirna

14th IEEE EDOC Conference

Friday, October 22nd, 2010

Next week (26-29 October) I will present a contribution at the EDOC conference: http://edoc2010.inf.ufes.br/

“A Model-Driven Approach for Designing E-Services Using Business Ontological Frameworks”
Abstract: A constant goal of enterprises of all sizes is to align their business with IT. The major concern is to design the technology to support the desired performance goals and business values. In e-business collaborations, services are becoming the cornerstones for modeling the offerings of the involved parties. However, business concepts, like value offerings, typically cannot be linked to technology levels, such as SOA and Web services. Business value models, formulated in terms of economic values, have been recognized as the basis for eliciting the actors in a business scenario and their relationships. Recently, several business ontological frameworks have been proposed to facilitate the design of business value models. Aiming towards an MDA-aligned approach, in our study we consider business value models for creating a service-centric Computation Independent Model (CIM). By utilizing well-defined mappings, the model is further transformed into a UML-based system model at the Platform Independent Model (PIM) level, capturing both the static and behavioral specifications of elicited e-services.
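The abstract does not list the mappings used. As a rough sketch only, the code below assumes a hypothetical mapping from a value model (CIM) to service interfaces (PIM): each actor that provides a value object gets one interface, each value object it offers becomes one operation, and the receiving actors are recorded as clients. The hotel/traveller scenario is invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ValueExchange:
    """CIM element: a provider offers a value object to a consumer."""
    provider: str
    consumer: str
    value_object: str


@dataclass
class ServiceInterface:
    """PIM element, roughly a UML interface with operations and client actors."""
    name: str
    operations: list[str] = field(default_factory=list)
    clients: set[str] = field(default_factory=set)


def cim_to_pim(exchanges: list[ValueExchange]) -> dict[str, ServiceInterface]:
    """Hypothetical mapping: one interface per providing actor and one
    operation per value object that the actor offers."""
    interfaces: dict[str, ServiceInterface] = {}
    for ex in exchanges:
        iface = interfaces.setdefault(ex.provider, ServiceInterface(f"{ex.provider}Service"))
        iface.operations.append("provide" + ex.value_object.title().replace(" ", ""))
        iface.clients.add(ex.consumer)
    return interfaces


model = cim_to_pim([
    ValueExchange("Hotel", "Traveller", "room booking"),
    ValueExchange("Traveller", "Hotel", "payment"),
])
print(model["Hotel"].operations)      # ['provideRoomBooking']
print(model["Traveller"].operations)  # ['providePayment']
```
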

Sundsvall 42

Tuesday, October 19th, 2010

Tomorrow I will present the paper Mot en processorienterad förvaltning (English: Towards a Process-Oriented Government*), by Gustaf and me, at the Sundsvall 42 conference.

The paper summarizes our work in the OST (Open Social Services) project financed by Vinnova. It discusses the prototype we developed for the emergency phone application process and the full-scale solution developed during the project.

The prototype can be downloaded from http://people.dsv.su.se/~petia/phone_demo_v_2.zip
A description of how to install and run the prototype is available here.
A movie from the prototype is available here.

The full-scale solution is available on Järfälla’s website (electronic identification is required).

Finally, the presentation for tomorrow is available here.

* The paper is written in Swedish. A related paper, presenting some of the results in English, is Business Process Management for Open E-Services in Local Government.

Hercules blogs on the new blog

Monday, October 18th, 2010

This is a test to see whether everything works.

I am linking to my regular home page:

http://people.dsv.su.se/~hercules

Research Project Application: A Universal Repository of Process Models, submitted to Vetenskapsrådet, April 15, 2008

Monday, April 21st, 2008

The rapid development of the Internet during the last decade has supported enterprises in building novel infrastructures, setting up virtual organisations, and operating in larger geographical spaces. To manage this new environment, enterprises need to align their IT infrastructures with their business processes. Interest in business process management using Process-Aware Information Systems (PAIS) has therefore increased rapidly. Solutions implemented in PAISs are often complex and time-consuming to develop. One way to address this problem is to use repositories of reusable process models. However, while repositories have proved successful in object-oriented and component-based development, similar success has not yet been achieved for PAIS. This is because we still lack a critical mass of process models within a single repository, and we lack transparency between different repositories. The main goal of this research is therefore to design the architecture of a universal process repository, i.e. a repository that is independent of process modelling languages, comprises a large number of existing process repositories, and is open to change and growth by any potential user. The long-term goal of the research is to lay the foundations for a Business Process Management Wikipedia, which will become a universal knowledge resource on process models that researchers can use for empirical investigations in the business process management area.
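The application describes the repository's goals but not its design. One possible reading, sketched below under that assumption, stores every model in a language-neutral form (activities and control-flow edges) tagged with its source repository and original modelling language, so that a single query searches all sources regardless of notation. All class and method names, and the example models, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcessModel:
    """Language-neutral view of a process model: named activities plus control-flow edges."""
    name: str
    source_language: str  # e.g. "BPMN" or "EPC"; kept as metadata only
    activities: frozenset[str]
    flows: frozenset[tuple[str, str]]


class UniversalRepository:
    """Hypothetical federation of existing repositories that any user can extend."""

    def __init__(self) -> None:
        self._models: dict[str, list[ProcessModel]] = {}

    def add(self, source: str, model: ProcessModel) -> None:
        """Register a model under the repository it came from."""
        self._models.setdefault(source, []).append(model)

    def find_by_activity(self, activity: str) -> list[tuple[str, str]]:
        """Search every source, whatever its modelling language; return (source, model name)."""
        needle = activity.casefold()
        return [
            (source, m.name)
            for source, models in self._models.items()
            for m in models
            if any(needle in a.casefold() for a in m.activities)
        ]


repo = UniversalRepository()
repo.add("repo-A", ProcessModel("Invoice handling", "BPMN",
         frozenset({"Receive invoice", "Approve invoice"}),
         frozenset({("Receive invoice", "Approve invoice")})))
repo.add("repo-B", ProcessModel("Purchasing", "EPC",
         frozenset({"Create order", "Approve invoice"}),
         frozenset({("Create order", "Approve invoice")})))
print(repo.find_by_activity("approve invoice"))
# [('repo-A', 'Invoice handling'), ('repo-B', 'Purchasing')]
```
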

Service Engineering

Tuesday, December 18th, 2007

Service engineering is an approach to the study, design and implementation of service systems in which specific constellations of organizations and technologies provide value for others in the form of electronic services.

Key to this approach is Service-Oriented Computing (SOC), a paradigm that uses services as the basic computing unit to support the development and composition of coarser-grained services in heterogeneous environments, which in turn can support flexible business processes and applications that span organizations.

SOC relies heavily on Service-Oriented Architecture (SOA), a logical way of designing a software system to provide services, via published and discoverable interfaces, either to end-user applications or to other services distributed in a network. In the scope of SOA, services are autonomous computational entities that can be used in a platform-independent way. Services can be described, published, discovered, and dynamically assembled.
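To make the describe/publish/discover/assemble cycle concrete, the sketch below uses a hypothetical in-memory registry in which each service description carries a capability name and a callable operation. Real SOA deployments publish descriptions over the network (for example as WSDL documents); nothing here models that, and the credit-check and order-booking services are invented.

```python
from dataclasses import dataclass
from typing import Callable

Message = dict[str, object]


@dataclass(frozen=True)
class ServiceDescription:
    """What a provider publishes: a name, the capability offered and an operation."""
    name: str
    capability: str
    operation: Callable[[Message], Message]


class ServiceRegistry:
    """Hypothetical in-memory registry standing in for a networked one."""

    def __init__(self) -> None:
        self._services: list[ServiceDescription] = []

    def publish(self, service: ServiceDescription) -> None:
        self._services.append(service)

    def discover(self, capability: str) -> list[ServiceDescription]:
        return [s for s in self._services if s.capability == capability]


def assemble(registry: ServiceRegistry, capabilities: list[str]) -> Callable[[Message], Message]:
    """Bind the first provider found for each capability into a sequential pipeline."""
    steps = []
    for capability in capabilities:
        found = registry.discover(capability)
        if not found:
            raise LookupError(f"no published service offers {capability!r}")
        steps.append(found[0].operation)

    def pipeline(message: Message) -> Message:
        for step in steps:
            message = step(message)
        return message

    return pipeline


registry = ServiceRegistry()
registry.publish(ServiceDescription(
    "CreditCheck", "check-credit", lambda m: {**m, "credit_ok": m["amount"] < 1000}))
registry.publish(ServiceDescription(
    "OrderBooking", "book-order", lambda m: {**m, "booked": bool(m["credit_ok"])}))
order_process = assemble(registry, ["check-credit", "book-order"])
print(order_process({"amount": 250}))  # {'amount': 250, 'credit_ok': True, 'booked': True}
```

Binding happens when `assemble` is called, so a different provider published under the same capability name could be swapped in without changing the composing code.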

SOA built with Web services is increasingly used in electronic business interactions. Web services employ common Internet technologies, thus enabling the use of a standards-based infrastructure.

So far, research and development of Web services has mainly focused on an operational perspective, such as standards for message exchange and service coordination. More important, however, is that Web services are used to expose valuable business functionality. In the long run, Web services that do not support some business value cannot be justified. This has recently shifted the focus to the large-scale design of external e-services at the business level, in the context of economic value exchanges. These high-level business services are in turn implemented by composing basic functions, in the form of Web services, into processes. Seeing this as the core relation between high-level, business-centered services and low-level, technology-centered services, it becomes natural to develop systems from a higher level of abstraction and leave specific technologies to handle the tedious details of the low-level services.

Besides the need to handle the increased complexity in the form of numerous business actors and their value exchanges, there is also a need for a structured approach to software service design that merges the IT and business perspectives. A well-defined alignment of software and business values benefits service requirements gathering, service design and service validation.

Current SYSLAB research in the area of service engineering focuses on the identification and design of goal- and business-aligned e-services.