
Seminar: Smarter Knowledge Search in Patient Records, Karolinska Institutet, Huddinge, May 13, 2009, 09.00-10.30

Thursday, November 11th, 2010

Time: Wednesday, May 13, 09.00-10.30
Place: CeFam, Alfred Nobels allé 12, Flemingsberg (the room next to the lunch room, floor 5)

Smarter knowledge search in patient records, Hercules Dalianis, Martin Hassel and Sumithra Velupillai
We will describe, from a text-linguistic perspective, part of the Stockholm EPR Corpus, which consists of over one million patient records from 2,000 clinics in Stockholm County Council (Stockholms läns landsting). We will also present some preliminary results from experiments carried out on the record texts:
1) An annotation standard and a gold standard for de-identifying the records.
2) Automatic ICD-10 code assignment (and validation of ICD-10 code assignment) for record text.
3) An exploration and hypothesis-generation experiment based on the text clustering tool Infomat, carried out on records from geriatric clinics.
News, Health Informatics Centre (Centrum för Hälsoinformatik), Karolinska Institutet
Nyheter Centrum för Hälsoinformatik, Karolinska institutet

Planning grant application to the Swedish Research Council (Vetenskapsrådet) titled Avidentifierad PatientKorpus (APK, De-identified Patient Corpus)

Thursday, November 11th, 2010

The aim of the planning project Avidentifierad PatientKorpus (APK) is to make available a large database of over one million patient records from Stockholm, covering the years 2006, 2007 and 2008 and coming from over 2,000 clinics in Stockholm County Council. The patient records contain structured data, such as the patients' sex, age, visit times, diagnosis codes and medications, but also running free text, which makes up the largest part of the record. The records are written in Swedish by clinical staff. We call this database the Stockholm EPR Corpus; it is the largest known database of patient records in Sweden, and possibly even in the world. We wish to make the Stockholm EPR Corpus available to a wider group of researchers in medicine, health informatics, epidemiology and language technology. In epidemiology, it becomes possible to link individuals in epidemiological registers (e.g. the Swedish Childhood Cancer Registry and the Swedish Twin Registry) directly to the corresponding patient in a patient record, and also to link biobanks directly to clinical data, thereby obtaining further valuable information for research. The Stockholm EPR Corpus is also valuable for language technologists who develop so-called text mining tools to find new and hidden relationships between symptoms, diagnoses, treatments and adverse effects, both in the free text and in the structured parts of the records.
The patient records in the Stockholm EPR Corpus are de-identified with respect to names and personal identity numbers, but they still contain information, for example in the free-text field, that could identify the patients. It is ethically very important that this information is never disclosed, and we will therefore use our de-identification tools to de-identify the text before the Stockholm EPR Corpus is made available. A question that then immediately arises is how much must be de-identified for the texts to remain useful while patient confidentiality is preserved; this measure must be worked out within the project. Within the project we will also develop definitions and guidelines for how to create a de-identified patient corpus.

My co-applicants are Dr. Martin Hassel; Dr. Anette Hulth, Swedish Institute for Infectious Disease Control (Smittskyddsinstitutet); and Professor Gunnar Nilsson, Karolinska Institutet.

GSLT-Retreat, Gullmarsstrand, on the Swedish west coast, January 26-28, 2009

Thursday, November 11th, 2010

Sumithra Velupillai and Hercules Dalianis took part in the annual retreat of GSLT, the Graduate School of Language Technology. There were over 50 participants at the retreat: PhD students, supervisors and alumni from all over Sweden; see here for the conference programme.

DSV/KTH-Stockholm University is part of GSLT. DSV has one more GSLT PhD student, Atelach Argaw, who is supervised by Associate Professor Lars Asker. Dr. Martin Hassel at DSV is also one of the GSLT supervisors, and he supervises Sumithra.

Sumithra presented two posters, one of them titled Mixing and Blending Syntactic and Semantic Dependencies. That research had been carried out during a GSLT PhD course in machine learning together with the PhD students Yvonne Samuelsson, Oscar Täckström, Johan Eklund, Mark Fishel and Markus Saers. This poster was also presented at the CoNLL workshop at Coling, in August 2008.

Hercules presented our patient data corpus, the Stockholm EPR corpus, and some experiments we have carried out on it.


Master thesis: Using parallel corpora and Uplug to create a Chinese-English dictionary. Defended December 10, 2008

Thursday, November 11th, 2010

Authors: Hao-chun Xing (EMIS) & Xin Zhang (EMIS)

Abstract

This master's thesis is about using parallel corpora and word alignment to
automatically create a bilingual Chinese-English dictionary. Such a dictionary
can be used in multilingual information retrieval or as a domain-specific
dictionary.

However, creating bilingual dictionaries is a difficult task, especially for
Chinese, which is very different from European languages. Unlike European
languages, written Chinese has no delimiters marking word boundaries. We
therefore needed Chinese word segmentation software to insert boundaries
between the Chinese words so that they could be matched with English words.
This was one of the most difficult issues in our project, and we spent half
of our time on it.
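To make the segmentation problem concrete, here is a minimal sketch of what a segmenter does. ICTCLAS, which the thesis used, is a statistical segmenter. The sketch below swaps in a much simpler, different technique: greedy forward maximum matching against a word list. The function name `fmm_segment`, the toy lexicon and the `max_len` parameter are hypothetical illustrations, not part of the thesis.

```python
def fmm_segment(text: str, lexicon: set[str], max_len: int = 4) -> list[str]:
    """Greedy forward maximum matching: at each position, take the longest
    lexicon word that matches; otherwise fall back to a single character."""
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking down to one character.
        for j in range(min(len(text), i + max_len), i, -1):
            candidate = text[i:j]
            if candidate in lexicon or j == i + 1:
                words.append(candidate)
                i = j
                break
    return words


# Hypothetical toy lexicon; a real segmenter needs a large dictionary or a statistical model.
lexicon = {"中华", "人民", "共和国", "中华人民共和国", "法律"}
print(fmm_segment("中华人民共和国法律", lexicon, max_len=7))  # → ['中华人民共和国', '法律']
```

With the default `max_len=4`, the same string comes out as 中华 / 人民 / 共和国 / 法律 because the longest entry can no longer be reached. Greedy matching is fast, but it cannot resolve genuinely ambiguous boundaries, which is one reason statistical segmenters are preferred in practice.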

Our parallel corpus consists of 104,563 Chinese characters (approximately
50,000-60,000 Chinese words) and 75,997 English words, mainly law texts. We
used ICTCLAS as the Chinese word segmentation software to pre-process the raw
Chinese text, and then used the word alignment system Uplug to process the
prepared parallel corpus and create the Chinese-English dictionary.
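The basic idea behind extracting dictionary entries from an aligned corpus can be illustrated with a much-simplified association measure. Words that repeatedly occur in corresponding sentences are likely translations. The sketch below scores each Chinese-English word pair by the Dice coefficient of its co-occurrence across sentence-aligned, already segmented pairs. This is not Uplug's actual algorithm; `extract_dictionary`, its thresholds and the toy data are hypothetical.

```python
from collections import Counter
from itertools import product


def extract_dictionary(pairs, min_dice=0.5, min_count=2):
    """Build a toy zh->en dictionary from sentence-aligned pairs.

    Each pair is (segmented Chinese sentence, English sentence), with words
    separated by spaces. Word pairs are scored by the Dice coefficient
    2 * C(zh, en) / (C(zh) + C(en)), where counts are per sentence pair.
    """
    zh_freq, en_freq, joint = Counter(), Counter(), Counter()
    for zh_sent, en_sent in pairs:
        zh_words = set(zh_sent.split())
        en_words = set(en_sent.lower().split())
        zh_freq.update(zh_words)
        en_freq.update(en_words)
        # Every Chinese word co-occurs with every English word in the pair.
        joint.update(product(zh_words, en_words))

    entries = {}
    # Sorting makes tie-breaking deterministic (set iteration order is not).
    for (zh, en), count in sorted(joint.items()):
        if count < min_count:
            continue
        dice = 2 * count / (zh_freq[zh] + en_freq[en])
        # Keep the best-scoring English candidate for each Chinese word.
        if dice >= min_dice and dice > entries.get(zh, ("", 0.0))[1]:
            entries[zh] = (en, dice)
    return entries


# Hypothetical, already segmented toy data (not from the thesis corpus).
pairs = [
    ("法律 规定", "The law provides"),
    ("法律 适用", "The law applies"),
    ("合同 规定", "The contract provides"),
]
print(extract_dictionary(pairs))
```

On this toy data, 法律 maps to "law" and 规定 to "provides", while the frequent word "the" is outscored. On real data, function words and rare words are exactly where such a simple measure breaks down. This is one reason dedicated aligners such as Uplug combine several alignment clues.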

Our dictionary contains 2,118 entries. We evaluated the results with the
assistance of nine native Chinese speakers. The average accuracy of the
dictionary is 74.1 percent.

Key Words: Parallel Corpora, Chinese Word Segmentation, Uplug, Word
Alignment Tool

Download master thesis

Seminar: Work on Electronic Health Records – from texts to improved health care, Dec 2, 9-12

Thursday, November 11th, 2010

9 – 10: Louhi project: Text mining of Finnish nursing narratives
[Slides text mining] [Slides parsing]
For more information about the project, click here.


10 – 11: DSV-KEA project: The Stockholm-EPR Corpus and some experiments.

[slides]

For more information about the project, click here.

11 – 12: LIME, Karolinska Institutet, Sabine Koch: Integrating electronic health records to bridge health and social care [slides]
For more information about the project, click here.

Tuesday, December 2, 2008, meeting room 6405, floor 6, Forum, DSV, Kista

(EPR = Electronic Patient Records)

Some notes on the presentations:
The Louhi project at the University of Turku carries out research on intensive care unit (ICU) patient records written in Finnish. According to the participants of the Louhi project, 20-30 percent of the clinicians' time is spent on documentation during the care process, and clinicians often need to enter the same information in several different medical record systems. The documentation for a single ICU patient can run to as many as 50 pages. Because it is very difficult to get an overview of this information, very little of it is reused. Almost none of the information in the EPRs is transferred to the discharge letter.

The DSV-KEA group presented the Stockholm-EPR corpus, which contains several hundred thousand patient records. A Gold corpus was created from the Stockholm-EPR corpus to help assess the quality of a de-identification system for Swedish patient records. The Gold corpus contains 100 patient records with approximately 4,200 PHIs (Protected Health Information), which is 2.5 percent of the total amount of information. Names of clinicians and patients together made up about 0.75 percent of the total amount of information.
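A gold standard of this kind is typically used by comparing a de-identification system's output with the manually annotated PHI and computing precision, recall and F1. A minimal sketch follows. It assumes exact matching on both span and PHI class, and a hypothetical `(record_id, start, end, phi_class)` annotation format. The class labels are illustrative and are not the annotation standard's actual tag set.

```python
def evaluate_deid(gold, predicted):
    """Exact-match precision, recall and F1 for PHI annotations.

    Each annotation is a hypothetical (record_id, start, end, phi_class)
    tuple; a prediction counts only if both span and class match.
    """
    gold, predicted = set(gold), set(predicted)
    true_pos = len(gold & predicted)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical annotations: two correct predictions, one miss, one false alarm.
gold = {(1, 0, 5, "First_Name"), (1, 10, 20, "Date"), (2, 3, 8, "Location")}
predicted = {(1, 0, 5, "First_Name"), (1, 10, 20, "Date"), (2, 30, 35, "Age")}
print(evaluate_deid(gold, predicted))
```

For de-identification, recall is usually the more critical figure, since every missed PHI is a potential privacy leak. Exact matching is the strictest choice; partial-overlap matching is a common, more lenient alternative.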

Sabine Koch presented a system from the Vinnova-supported project old@home. In the project, handheld devices were developed to support the health care process. In this system, everyone involved in the care process received customized information about the patient. The users are the patients themselves, the nurses, the home helpers, the relatives, etc. The system is now partly in use in Hudiksvall Municipality (Hudiksvalls kommun).

New course – Web mining, spring 2009, period 3, 7.5 hp

Thursday, November 11th, 2010

The Internet contains a huge amount of information, and it is growing at an ever-increasing pace. People, organizations and corporations from the whole world are continuously adding different types of information to the web in various languages. The web therefore contains potentially very interesting and valuable information. This course will investigate various techniques for processing the web in order to extract such information, refine it and make it more structured, thus making it both more valuable and more accessible. These techniques are often referred to as web mining techniques.

The domains within the Internet that we will study are databases, e-commerce web sites, wikis, virtual communities and blogs. Semantic web and Web 2.0 are two other concepts that are relevant for the course. Web mining is considered to contain three main areas: web content mining, web structure mining and web usage mining. Web structure mining is closely related to information search techniques, and web usage mining to opinion mining or sentiment analysis. Also related is the automatic construction of sociograms. Web content mining can, for example, be used to find the cheapest airline tickets by monitoring the web-based databases of all airlines and finding the lowest price across all of them.
Web mining techniques explored in the course are human language technology, machine learning, statistics, information retrieval and extraction, text mining, text summarization, automatic classification, clustering, wrapper induction, normalization of data, match cardinality of data in different databases, interface matching, schema matching, sentiment analysis, opinion mining, extraction of comparatives, forensic linguistics etc.

Read more here: Web Mining/Web-mining, WEBMIN/IV2038, 7.5 hp.

IMAIL-Intelligent e-services for eGovernment accepted by Vinnova

Thursday, November 11th, 2010

The project vision is to design and develop eGovernment services based on human language technology tools that facilitate efficient communication between government agencies and citizens and companies, which will lead to a transformed and improved government.
The overall goal of the demonstrator is to show how further development of today's tools and technologies can improve the communication between large organizations and people. The demonstrator will run at Försäkringskassan (Swedish Social Insurance Agency) and help automate the communication between the agency and the public by processing text-based inquiries, primarily e-mail queries.
Our tools and technologies will:
1. automate the answering of a large part of the incoming e-mail flow,
2. provide more timely answers to inquiries sent through electronic channels, and
3. change the workload of the administrators at Försäkringskassan so that their skills are put to better use.

The project will be carried out in cooperation with Försäkringskassan, Statistics Sweden (SCB) and the Human Language Technology Group at CSC/KTH.

Vinnova press release (in Swedish): Användarna i fokus i VINNOVA-satsning på e-förvaltning (Users in focus in VINNOVA's e-government initiative).

New project proposal to Vinnova: Beslutsstöd genom utforskning av patientjournaler (Decision Support through Exploration of Patient Records)

Thursday, November 11th, 2010

Summary:
In the project we intend to build a demonstrator from the existing tools that we developed within the Vinnova-funded KEA project. With the tool, the user will be able to explore fully de-identified patient records to find both visible and hidden relationships between diseases, diagnoses, diet, social situation, medication, etc. The input data used to achieve this are hundreds of thousands of patient records to which the research group at DSV/Stockholm University has been given access by Stockholm County Council (Stockholms läns landsting).
After the project period, the demonstrator will be commercialized so that it benefits health care and society.

New edition of Informationssökning på Internet (Information Searching on the Internet) by Våge, Dalianis and Iselid, Studentlitteratur

Thursday, November 11th, 2010

More information about the book

New Project Proposal to Vinnova – IMAIL-Intelligent e-mail answering service for eGovernment

Thursday, November 11th, 2010

We, that is Martin Hassel, Eriks Sneiders, Tessy Ceratto, Ola Knutsson (CSC), Viggo Kann (CSC) and Magnus Rosell (CSC), are preparing an application to Vinnova (deadline September 2, 2008) titled IMAIL-Intelligent mail answering service for eGovernment. The other partners are Försäkringskassan (Swedish Social Insurance Agency) and Euroling AB.

Abstract
The project vision is to design and develop eGovernment services that facilitate efficient communication between government agencies and citizens and companies, which will lead to a transformed and improved government.
The overall goal of the demonstrator is to show how further development of today's tools and technologies can improve the communication between large organizations and people. The demonstrator will run at Försäkringskassan and help automate the communication between the agency and the public by processing text-based inquiries, primarily e-mail queries.
Our tools and technologies will:
1. automate the answering of a large part of the incoming e-mail flow, and
2. provide more timely answers to inquiries sent through electronic channels.

Two-year project, 4.6 million SEK