Resource Summary Report

Name: Smoking NLP Challenge Data
Keywords: nlp datasets

Resource Name

Smoking NLP Challenge Data

RRID:SCR_008644

Smoking NLP Challenge Data (RRID:SCR_008644)

Resource Information

URL: https://www.i2b2.org/NLP/DataSets/Main.php

Proper Citation: Smoking NLP Challenge Data (RRID:SCR_008644)

Description: The data for the smoking challenge consisted exclusively of discharge summaries from Partners HealthCare, which were preprocessed, converted into XML format, and separated into training and test sets. i2b2 is a data warehouse containing clinical data on over 150,000 patients, including outpatient diagnoses, lab results, medications, and inpatient procedures, with ETL processes authored to pull data from EMR and finance systems. Institutional review boards of Partners HealthCare approved the challenge and the data preparation process. Pulmonologists annotated the data, classifying patients as Past Smokers, Current Smokers, Smokers, Non-smokers, or Unknown. Second-hand smokers were considered non-smokers. Other institutions involved include the Massachusetts Institute of Technology and the State University of New York at Albany. i2b2 is a passionate advocate for the potential of existing clinical information to yield insights that can directly impact healthcare improvement. In our many use cases (Driving Biology Projects) it has become increasingly obvious that the value locked in unstructured text is essential to the success of our mission. To enhance the ability of natural language processing (NLP) tools to prise increasingly fine-grained information from clinical records, i2b2 has previously provided sets of fully deidentified notes from the Research Patient Data Repository at Partners HealthCare for a series of NLP Challenges organized by Dr. Ozlem Uzuner. We are pleased to now make those notes available to the community for general research purposes. At this time we are releasing the notes (~1,000) from the first i2b2 Challenge as i2b2 NLP Research Data Set #1. A similar set of notes from the Second i2b2 Challenge will be released on the one-year anniversary of that Challenge (November 2010).

Synonyms: NLP Data Set #1C

Resource Type: data or information resource, database

Keywords: nlp datasets

Funding:

Resource Name: Smoking NLP Challenge Data

Resource ID: SCR_008644

Alternate IDs: nif-0000-32739

Record Creation Time: 2022-01-29 12:02:48

Record Last Update: 2025-05-06 11:06:30

This resource has parent organization: Informatics for Integrating Biology and the Bedside

Ratings and Alerts

Report Information

No rating or validation information has been found for Smoking NLP Challenge Data.

No alerts have been found for Smoking NLP Challenge Data.

Data and Source Information

Source: SciCrunch Registry
