This is the complete reference manual for web cellHTS2.
web cellHTS2 is a frontend for the R package cellHTS2. It is written as a web server application that provides a graphical user interface, so the only requirement for using the tool is access to a web browser. In contrast to the command-line interface, which can be difficult and time-consuming to learn, web cellHTS2 provides an intuitive and convenient user interface for analyzing high-throughput screening data sets.
web cellHTS2 can be accessed at: http://web-cellhts2.dkfz.de.
web cellHTS2 key features
- Analysis of high-throughput screening data on the fly, with results streamed directly to the browser or sent by email
- Web page with wizard-like techniques to select analysis parameters, plate configurations and further options
- Uploading of all file types supported by the cellHTS2 R package
- Plate Configurator, Filename Parsing and Text Forms allow easy generation of all required cellHTS2 configuration files
- An error detection mechanism detects many common input errors before the package is run
- Saving and loading of complete sessions with all input files and parameters saves time on repetitive data analysis and functions as a backup of your analyzed data
- Using the Rserve system as the R backend allows massively parallel usage of the analysis tool
1. System requirements
The following web browsers support the application web cellHTS2:
- Mozilla Firefox 3
- Safari 3
- Internet Explorer 8
Previous browser versions or other browser types were not tested and should therefore not be used, because they might produce errors when running the program.
If you cannot access one of the supported browsers, you can always use our Virtual Box image, which can be used out of the box from here.
For running the all-in-one executable jar version the following dependencies have to be met on your computer:
- R Version at least 2.10.0
- cellHTS2 Version 2.9.18
- RServe 0.6.0
- Java JRE 5
- Read and write permissions to a temporary folder
When a new web cellHTS2 session is started by going to http://web-cellHTS2.dkfz.de, the program first checks whether all necessary run-time dependencies are met. The checks cover both the client and the server side (note: if you run the all-in-one executable jar version on a home computer, possible server-side issues have to be fixed as well):
2. checks if the browser and its version are supported
3. Optional for the multiple file uploader: checks if Flash is installed and its version is supported
Server side checks:
1. checks if Ajax (XML-HTTP) requests/responses can be made
2. checks if the program's temporary folder can be accessed and is writable
3. checks if the R programming language and cellHTS2 are installed and the Rserve server can be accessed
2. Currently only Mozilla Firefox, Safari and Internet Explorer are supported, and only in the versions you can see here, so please switch to one of these supported web browsers.
3. You have to install the Adobe Shockwave/Flash plugin for your browser and enable it (this is not mandatory and is only needed for selecting multiple files for upload in the advanced file uploader).
If any of the dependencies can't be resolved, web cellHTS2 will be blocked and an info box will appear.
What to do if a server side error occurs:
If you run the all-in-one integrated jetty server executable jar several things can happen:
1. This is a rare phenomenon and happens if web cellHTS2 is accessed from behind a proxy which disables or cuts off XHTTP headers. For example, this has been reported when using the Windows ezproxy software or when an institute uses very restrictive proxy security settings. If the proxy settings can't be changed for any reason, the web cellHTS2 Virtual Box image can be downloaded from here. The image contains a complete Linux web cellHTS2 analysis environment with all dependencies included and can be started out of the box.
2. If the program's temporary folder can't be created or accessed, you can change the read/write permissions of that directory or select another one by editing the apps.properties file.
3. In order to proceed you have to install R, the cellHTS2 package and Rserve (the supported versions of those dependencies can be found here) and start Rserve on your server so that web cellHTS2 can connect to it.
2. web cellHTS2 wizard interface
web cellHTS2 provides an easy-to-use interface. It works like a typical wizard (as used for installing Mac or Windows software), where you navigate through the steps by clicking the "next" or "back" buttons at the bottom of each page. It is important never to use the browser's "back" and "next" buttons while working with this web interface.
Whenever a page is left by clicking on the web tool's "next" or "back" buttons, an error checking procedure checks the user input on that specific page. Depending on the result, the move to the next step is allowed or not. This error checking mechanism helps to identify possible input errors immediately rather than during the execution of the analysis.
Before the analysis job can be started, all available steps have to be passed through. The last page of the wizard contains the option to save the current session with all inputs. Loading a stored session is possible only in the first step.
The page also includes a menu bar which contains links to external resources as well as to a project page, where the latest source or binary packages can be downloaded and bugs can be submitted.
3. Wizard steps in detail
Step1: Selection of analysis type
In the first step of every analysis, the type of input data will be chosen. Either single- or dual-channel data analysis can be selected. In the latter case, a form will pop up where the channel names can be labeled.
As an alternative to the channel selection, a session file can be uploaded using the file upload form on this page. The generation of session files is described in step 12.
Upon uploading a valid session file, all subsequent steps are filled in with the stored input information, and you are automatically redirected to the last page, from where the analysis can be started immediately.
Step2 a: Uploading of screening data files using the normal file uploader
In the second step all screening data files can be uploaded and plate, replicate and channel number for every data file can be defined. Either single data files or multiple data files archived as a zip file can be uploaded in order to save time required for uploading them one by one. Furthermore, zip and single files can be uploaded in parallel.
All uploaded files can be discarded by clicking on the link "Drop all entries".
Upon uploading a data file, the program automatically checks whether the file is valid or not. A valid data file is defined as a text file with three columns per line, which are separated by tab:
If the uploaded files do not conform to these rules, an error message will be shown giving the exact position in the files where the error occurred.
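The check described above can be sketched in a few lines of Python. This is a minimal illustration, not the tool's own implementation: the function name `find_format_errors` and the sample values are made up, and the sketch only counts tab-separated columns per line without interpreting their content.

```python
from typing import List, Tuple


def find_format_errors(text: str, expected_columns: int = 3) -> List[Tuple[int, str]]:
    """Return (line_number, message) for every line that does not split
    into exactly `expected_columns` tab-separated fields."""
    errors = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        fields = line.split("\t")
        if len(fields) != expected_columns:
            errors.append(
                (line_number,
                 f"expected {expected_columns} tab-separated columns, "
                 f"found {len(fields)}")
            )
    return errors


# Hypothetical file content: the second line uses spaces instead of tabs.
sample = "plate1\tA01\t1.5\nplate1 A02 2.0\n"
for line_number, message in find_format_errors(sample):
    print(f"line {line_number}: {message}")
```

Running such a check locally before uploading can save a round trip, since the line number points directly at the offending row.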
When data files have been uploaded in the correct format, a table will appear showing the filenames and the corresponding plate, replicate and channel.
Plate, replicate or channel options can be edited by clicking into the table element of interest. A small text field will appear to edit the respective data. After finishing the editing, press "Return" in the text field or click with the mouse outside of the edited area.
The checkbox "Filename Parsing" controls whether the name(s) of the file(s) will be used to extract plate, replicate and/or channel numbers. Some of the patterns can recognize numbers and characters in your filenames to annotate them. The program tries to normalize the numbers: for example, filenames uploaded with the subpattern Set28, Set29, Set30 (which corresponds to the replicate number) are normalized relative to the smallest set number given, resulting in 1, 2, 3 as replicates.
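The normalization in the Set28, Set29, Set30 example can be sketched as follows. This is only an illustration under assumptions: the helper names and filenames are hypothetical, and the sketch shifts all numbers by the smallest one, which reproduces the 1, 2, 3 result for consecutive numbers. How the tool treats gaps in the numbering is not stated here.

```python
import re
from typing import List


def extract_numbers(filenames: List[str], pattern: str = r"Set(\d+)") -> List[int]:
    """Pull the number captured by `pattern` out of every filename."""
    numbers = []
    for name in filenames:
        match = re.search(pattern, name)
        if match is None:
            raise ValueError(f"pattern not found in {name!r}")
        numbers.append(int(match.group(1)))
    return numbers


def normalize_to_smallest(numbers: List[int]) -> List[int]:
    """Shift the numbers so that the smallest one becomes 1."""
    smallest = min(numbers)
    return [n - smallest + 1 for n in numbers]


# Hypothetical filenames containing the subpattern from the example.
names = ["screen_Set28.txt", "screen_Set29.txt", "screen_Set30.txt"]
replicates = normalize_to_smallest(extract_numbers(names))
```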
For filename parsing, several data file filename patterns are currently implemented which can be recognized automatically (a brute-force approach selects the pattern with the highest number of plate/replicate/channel hits in your filenames):
*FL=Fluc (Firefly Luciferase),RL=Rluc (Renilla Luciferase)
If you use other filename types that are common in your group and would be worth including in the filename parsing option, please feel free to send an email with some examples to b110-IT@dkfz.de.
If the filename parameters cannot be recognized by the parsing algorithm, it is possible to parse them manually by providing a special regular expression pattern in the text field "apply regexp" and pressing the "Update" parameter button.
There is a special syntax for regular expression filename parsing, so some basic knowledge of regular expressions is needed. Alternatively, send a request with the filenames to be parsed to b110-IT@dkfz.de.
Here x, y, z give the order of p, r or c (for plate, replicate, channel) as they appear in the Perl regexp (the regexp groups "()" from left to right).
For every x, y, z, further character-to-number converting expressions can be defined with the separator ":". Multiple character-to-number converting expressions can be separated using ",".
would parse filename "ABCD_3_0815_RL_1.TXT" to Plate:1, Replicate:3, Channel:2
and "EFGH_5_6667_FL_7.TXT" to Plate:7, Replicate:5, Channel:1
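The idea behind group-based parsing can be illustrated with Python's `re` module. Note that this is not the web cellHTS2 "apply regexp" syntax: the pattern, the group order tuple and the FL/RL mapping below are assumptions chosen so that the two filenames above yield the stated results.

```python
import re

# Hypothetical pattern: the groups, from left to right, capture
# replicate, channel code and plate.
PATTERN = re.compile(r"^[A-Z]+_(\d+)_\d+_(FL|RL)_(\d+)\.TXT$")
GROUP_ORDER = ("r", "c", "p")

# Character-to-number conversion for the channel group.
CHANNEL_CODES = {"FL": 1, "RL": 2}


def parse_filename(filename: str) -> dict:
    """Map a filename to plate, replicate and channel numbers."""
    match = PATTERN.match(filename)
    if match is None:
        raise ValueError(f"filename does not match the pattern: {filename!r}")
    values = dict(zip(GROUP_ORDER, match.groups()))
    return {
        "plate": int(values["p"]),
        "replicate": int(values["r"]),
        "channel": CHANNEL_CODES[values["c"]],
    }


first = parse_filename("ABCD_3_0815_RL_1.TXT")
second = parse_filename("EFGH_5_6667_FL_7.TXT")
```

Keeping the group order separate from the pattern mirrors the p, r, c ordering idea: the same pattern can be reused when the groups appear in a different order by changing only the tuple.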
Another option is to upload an existing plate list file in the standard cellHTS2 file format. It will be parsed, and all recognized plate, replicate and channel numbers will be assigned to your uploaded files.
The format of the plate list file is as follows:
When the "next" button is clicked, the server checks whether at least one file was uploaded, whether all data files are in the same format (e.g. 384) and whether all mandatory parameters for the selected type of analysis were filled in. For single-channel analysis, all files require at least an associated plate and replicate number. For dual-channel analysis, a channel number needs to be provided for all files, with an equal number of channel 1 and channel 2 files.
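The rules in this paragraph can be written down as a small validation routine. The sketch below is only an illustration: the dictionary keys (`name`, `wells`, `plate`, `replicate`, `channel`) and the message texts are hypothetical and do not reflect the server's internal data model.

```python
from collections import Counter
from typing import List


def check_upload_set(files: List[dict], dual_channel: bool) -> List[str]:
    """Return a list of problems; an empty list means the step may proceed."""
    if not files:
        return ["at least one data file must be uploaded"]

    problems = []
    # All data files must share one plate format (e.g. 384 wells).
    if len({f["wells"] for f in files}) > 1:
        problems.append("all data files must have the same plate format")

    for f in files:
        if f.get("plate") is None or f.get("replicate") is None:
            problems.append(f"{f['name']}: plate and replicate are required")
        if dual_channel and f.get("channel") not in (1, 2):
            problems.append(f"{f['name']}: channel 1 or 2 is required")

    if dual_channel:
        counts = Counter(f.get("channel") for f in files)
        if counts[1] != counts[2]:
            problems.append("channel 1 and channel 2 need the same number of files")
    return problems
```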
Step2 b: Uploading of screening data files using the advanced file uploader
Besides single data files or multiple files packed as a zip archive, which must be exactly in the data file format required for cellHTS2, data files which do not follow this standard, such as Excel files or csv files with a large variety of columns (including gene id, sequence, refseq accession number, ...), can be reformatted and imported into web cellHTS2 using the advanced file uploader tool. To use it, just click on the link "Advanced File Importer" above the traditional file uploader form.
A full manual is available here.
A quickstart tutorial is available here.
Step 3: Configuration of the plate wells
In this step, the layout of the plates is configured. The plate configurator grid contains the same number of wells and the same row-to-column proportion as the uploaded data files.
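As an illustration, the grid size can be derived from the number of wells. The sketch below assumes the 2:3 row-to-column proportion of standard 96-, 384- and 1536-well plates; the function name is hypothetical and this is not the configurator's own code.

```python
import math
from typing import Tuple


def plate_dimensions(well_count: int) -> Tuple[int, int]:
    """Return (rows, columns) for a plate with a 2:3 row-to-column proportion."""
    # With rows:columns = 2:3, wells = rows * 1.5 * rows, so rows^2 = wells * 2 / 3.
    rows = math.isqrt(well_count * 2 // 3)
    columns = rows * 3 // 2
    if rows * columns != well_count:
        raise ValueError(f"{well_count} wells do not form a 2:3 grid")
    return rows, columns


dimensions = {wells: plate_dimensions(wells) for wells in (96, 384, 1536)}
```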
On the right, two drop-down menus are available. In the upper menu the type of well can be defined and marked respectively in the grid:
positive, negative, other, empty or contaminated (the sample well type can be selected by clicking on an already defined well again).
After well type selection, the wells can be marked by clicking on a certain well in the grid. For visual differentiation there are six different colors for the available well types: positive: red, negative: blue, other: black, empty: white, contaminated: green, sample: grey.
To mark a complete row or column, just click on the row or column label (named A, B, C, D, ... or 1, 2, 3, 4, ...).
To delete a well, just click on a well which has already been defined with a certain type, and it will be erased (or rather: set to the sample well). If the contaminated well type is selected and wells are marked with it, the previous well type is preserved: when a contaminated well is deleted by clicking on it, the previous well type is restored.
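The click behavior described here can be modeled with two small dictionaries. This is a sketch of one possible data structure, not the configurator's actual implementation; the class and method names are hypothetical.

```python
SAMPLE = "sample"
CONTAMINATED = "contaminated"


class PlateLayout:
    """Tracks well types; wells that are not listed count as sample wells."""

    def __init__(self):
        self.types = {}                 # well id -> assigned well type
        self.before_contamination = {}  # well id -> type held before contamination

    def well_type(self, well: str) -> str:
        return self.types.get(well, SAMPLE)

    def _set(self, well: str, new_type: str) -> None:
        if new_type == SAMPLE:
            self.types.pop(well, None)
        else:
            self.types[well] = new_type

    def click(self, well: str, selected_type: str) -> None:
        current = self.well_type(well)
        if current == CONTAMINATED:
            # Deleting a contaminated well restores what was there before.
            self._set(well, self.before_contamination.pop(well, SAMPLE))
        elif selected_type == CONTAMINATED:
            # Marking as contaminated preserves the previous type.
            self.before_contamination[well] = current
            self._set(well, CONTAMINATED)
        elif current != SAMPLE:
            # Clicking an already defined well erases it.
            self._set(well, SAMPLE)
        else:
            self._set(well, selected_type)
```

For example, marking well A01 as positive, then as contaminated, and then clicking it once more leaves it positive again.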
In order to delete a complete plate layout (which means setting everything to the sample well), click on the "X" in the upper left corner of any selected plate. Please note that upon deletion of a certain plate, all other plates with the same plate number (and different replicate numbers) will be deleted too.
If the plate with the description "all" (see later) is selected from the drop-down menu and the "X" field is pressed, the complete layout for all plates (!) will vanish, so please be careful with this option.
The other drop-down menu allows switching between the plates of the uploaded data file set. The first plate in this list is always the "all" plate, which acts as the layout plate for every plate in the set. The "all" plate allows assigning the same content to identical positions in all plates of the set. The other plates represent all available plate and replicate combinations. These are important for defining contaminated wells separately on the basis of single replicates (the shortcuts in the drop-down menu are PL_X_REP_Y with X = plate number and Y = replicate number).
Wells which are available within specific plates only can be selected by switching to the respective plate. Positive, negative and other wells are applied to all replicates of one plate type, whereas contaminated wells are only displayed for single replicates.
Instead of, or in addition to, manually defining the wells in the grid of the plate configurator, an existing plate config file can be uploaded.
The interface is able to read plate config files in the cellHTS2 R-package file format and display them in the plate configurator editor. If positions were already marked manually, these will be overwritten; the two described selection methods can thus be combined.
Contaminated wells are not stored in the plate config file but in another file called the Screenlog file. Screenlog files can be saved as well, and a separate file upload field allows them to be submitted to the program.
The uploaded plate config and screenlog file will be made visible in the configuration grid.
The interface will validate if the uploaded plate config file is correct:
- Header line: consists of four lines where wells can be in 96-, 384- and 1536-well format.
- Data line: tab-separated lines where the content column can only be positive (pos), negative (neg), other or sample.
A valid screenlog file is shown in the following (the header order is fixed; no other columns are allowed, except Channel, which is mandatory for dual-channel screens):
To proceed to the next step, at least one well within all plates has to be marked with a certain well type.
Step 4-9 Parameter choice for the analysis run
Here, the statistical methods can be defined for application to the data. All sub-steps on these pages are straightforward by selecting the choices through clicking on the radio buttons. A short explanation for most of the available statistics methods is available here, for a detailed explanation please refer to the cellHTS2 R manual here.
In some rare cases, options are excluded depending on another selected option. For example, when "additive" is selected as the normalization scaling method (step 6), the log transformation value can only be set to NO (step 5).
These cases will produce an error message when clicking on the "next" button so that the conflict can be removed before proceeding.
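A dependency of this kind can be expressed as a simple rule check. The sketch below covers only the one conflict named above; the setting keys `scaling` and `log_transform` and the message text are hypothetical names chosen for illustration.

```python
from typing import Dict, List


def find_parameter_conflicts(settings: Dict[str, str]) -> List[str]:
    """Return messages for option combinations that exclude each other."""
    conflicts = []
    # Additive scaling (step 6) only allows log transformation NO (step 5).
    if settings.get("scaling") == "additive" and settings.get("log_transform") != "NO":
        conflicts.append(
            'scaling "additive" requires the log transformation to be set to NO'
        )
    return conflicts


messages = find_parameter_conflicts({"scaling": "additive", "log_transform": "YES"})
```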
If you select "YES" for the "Use viability threshold function" parameter, a text field will appear where you can add an R function for the summarization of dual-channel data (for more information refer to the manual here).
If no function is given and the parameter "YES" is selected, the function:
will be applied.
Some further help for the different normalization methods is available by clicking on the help links, which open a pop-up window.
Step 10-11 Upload of additional cellHTS2 files
These steps offer the possibility to upload annotation and/or description files. Neither is required. Uploaded files need to be in the appropriate format. Alternatively, a custom description file can be created through the description edit form, or the uploaded file can be edited.
Don't forget to press the create or update button whenever data were added or edited in the description form, or the changes will not be sent to cellHTS2.
Annotation file format:
- Header line: tab separated; the first three columns are fixed in order and name: "Plate", "Well", "GeneID". All further columns are optional and will simply be appended to the output.
- Data line: tab separated; the first three columns must be present
Description file format:
Description files have no strict rules; they only have to be divided into sections, with the section names in square brackets.
The program checks description files when uploading for the existence of one square bracket at the beginning of the file.
The following description file layout is necessary to allow the description editor form to read the plate config values.
Please note that dates are only recognized by the program when they have the form mm/dd/yy, e.g. 02/28/09 for 28th February 2009, which is the common US date notation.
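To check a date string against the mm/dd/yy form before entering it, a short Python sketch such as the following can be used. It relies on the standard `datetime.strptime` format codes; the function name is hypothetical, and how the program itself parses dates is not specified here.

```python
import re
from datetime import date, datetime

# Exactly two digits each for month, day and year, separated by slashes.
MM_DD_YY = re.compile(r"[0-9]{2}/[0-9]{2}/[0-9]{2}")


def parse_description_date(text: str) -> date:
    """Parse a date in mm/dd/yy notation; raise ValueError otherwise."""
    if MM_DD_YY.fullmatch(text) is None:
        raise ValueError(f"not in mm/dd/yy form: {text!r}")
    # strptime also rejects impossible dates such as month 13.
    return datetime.strptime(text, "%m/%d/%y").date()


parsed = parse_description_date("02/28/09")
```

A day-first string such as 28/02/09 fails here because 28 is not a valid month, which is the typical mistake when switching from European notation.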
Upon labeling the screen item in the description form editor and then running cellHTS2, this label will show up in the resulting zip filename as a prefix, e.g. "screen: DualChanData" will result in a result file DualChanData_RUN12345.zip.
If the system is used with "sending your results by email" (instead of showing a progress bar and streaming the results via html), the specification of the user's email address is mandatory.
Step 12: Start cellHTS2 analysis
Clicking on the link to start the analysis opens a new window. Either a progress bar appears together with a short text showing the current processing step and status, or a text notification appears indicating that the results will be sent after the job has been processed by the system. When the program has completed the analysis successfully, a download window appears and the results can be saved as a zip file.
Unfortunately, only a limited number of error detection steps can be implemented so there is still the possibility to receive an error message.
There will be a short report of what went wrong as well as a link to the R logfile on the server, which should be sent to the maintainer of this program (read the FAQ section: "In case of cellHTS2 error").
If the progress bar feature is enabled on the system and the processing queue is full when the job is submitted, a notification to wait will appear. In that case, don't (!) close the window; as soon as one slot in the queue becomes free, the job will be processed immediately.
Unpack the zip file. Among plenty of others, it contains three important files:
index.html which is the start of the HTML results and should be opened in a web browser.
R_OUTPUT.TXT contains all the R output and can be used to trace down bugs in your set-up.
R_OUTPUT.script is the R script used for the analysis. It can be used for running the analysis with the command-line R version.
If the progress bar feature is enabled (internal version), a valid screen ID also has to be provided in order to pass the error checking at this step.
Screen IDs at Boutros Labs consist of a four-digit number combined with a two-digit number, separated by a "-" hyphen. You can only start an analysis run if you have entered a valid screen ID.
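The screen ID format described above translates directly into a regular expression. The following sketch is an illustration only: the function name is hypothetical, the IDs used are made up, and the real check may apply further rules beyond the format.

```python
import re

# Four digits, a hyphen, two digits, and nothing else.
SCREEN_ID = re.compile(r"[0-9]{4}-[0-9]{2}")


def is_valid_screen_id(text: str) -> bool:
    """Check only the format of a screen ID, not whether the screen exists."""
    return SCREEN_ID.fullmatch(text) is not None


# Made-up IDs for illustration.
checks = {candidate: is_valid_screen_id(candidate)
          for candidate in ("1234-56", "123-45", "1234_56")}
```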
Upon clicking this button, all current input and uploads will be emptied and the process starts at step 1: begin a new analysis session.
save this session
To save the current session, click on the respective link. All input files, uploaded data, additional files and settings will be stored in an archive file, which will be sent to the user through a download pop-up window. This archive contains the complete processing information, so it can be used on another server where this web service is installed too (for how to load it, see step 1). Furthermore, all data are bundled into one file, which can be sent to other scientists or, in case of errors, to the maintainers of this application so they can quickly track down what went wrong.
- Q: What are the limitations of web cellHTS2 regarding uploaded file size, parallel jobs etc.?
- Limitations can be set up through a config file. Currently, for external usage at the DKFZ, we limit parallel R runs to a maximum of 1. For internal usage we allow 20 parallel runs. On some systems we have seen memory leaks in the Rserve binary when analysing big genome-wide screens.
- Q: I have deleted my files on my hard disk. Are the results of the analysis stored on the server?
- All uploaded data files and the result files of the analysis are stored for one month on the server. Please get in contact with the system administrators at email@example.com.
- Q: I have uploaded a Plate Configuration file but the PlateConfigurator view remains empty.
- Make sure your plate config file has only pos,neg,sample,other in the "content" column.
- Q: Where can I load or save a stored session?
- You can only load a stored session at the first step in the wizard. So if you are not at this step either use the "back" button or alternatively if you are already on the last page click the link "New Analysis" so that you will automatically be redirected to the first page.
- Saving a session can only be done in the last step where the job is started as well. This assures that only valid data is going to be saved and you can quickly start stored sessions later on.
- Q: I am using the description editor to label and describe my screen analysis but the html output does not reflect my description edit changes.
- Most likely the "create/update" button at the bottom of the description edit form was not pressed so that the description file could not be written properly.
- Q: I want to use the generated files, e.g. the plate list file, with the command-line tool or for uploading again into the single upload forms during all those single steps. How can I see/get all the files such as the plate list, plate conf, etc.?
- If you have set up your analysis parameters, click on "save this session" in the last step. The downloaded file is actually a zip file containing all generated data. If your OS doesn't detect it as a compressed file and it has no .ZIP extension, you have to add the extension to the filename and then use your favorite extraction tool to extract the configuration files of interest.
- Q: I am getting a cellHTS2 (or R) Error:
- In case an error message was produced while starting cellHTS2 R analysis you should at least check the short red error text which should appear under the error message. This gives a hint what exactly went wrong. Also check the current step output right under the progress bar which gives further information about the problematic step.
- Here is a script to validate the data files, which are very error-prone.
- Standard error messages are:
Error in pretty(s, n = ceiling(nrWell/10)) :
NA/NaN/Inf in foreign function call (arg 1)
This happens when using multichannel data with some older version combinations of R/cellHTS2. We suggest R 2.10 w/ cellHTS2 2.9.3 here. Please check that your cellHTS2 version is compatible with the R version you use.
Error in plot.window(...) : need finite 'ylim' values
This happens when using multichannel data and some older versions of R/cellHTS2. We suggest R 2.10 w/ cellHTS2 2.9.3 here.
Currently this function is implemented only for single-color data.
Channel 2 was assigned to one of your files in step 2.
Error in data.frame(title = title, shortTitle = title, thumbnail = img, :
object "imap" not found
The cause is not clear yet. It happens on large analyses, also with the plain cellHTS2 R package, so it seems to be a bug in the newest version of cellHTS2.
Error in readPlateList(PlateList, name = Name, path = Indir) : The following rows are duplicated in the plateList table: Plate Replicate Channel 10 2 2 1
This means that line 10 of your plate list file contains the same plate, replicate (and channel) combination as another entry in the plate list.
- If a problem occurs repeatedly, you should note the location of the log file, go back one step to the step 12 page and save the current session. Then send an email with the session archive zip file and the location of the R logfile to b110-IT@dkfz.de. Thanks in advance.
- Q: I am getting the error message: Rserver crashes....
- This is a really severe bug to track down; these errors mainly occur when analysing huge genomic data sets. Because Rserve is still in development, it can crash without logging the problem. It is therefore best to send the session you used to firstname.lastname@example.org so we can try to debug it in cellHTS2 (or contact the Rserve developers directly).
- Q: I am using the external version of the tool. I am waiting forever for an email notification.
- If the queue is not full, emails should be sent out within an estimated maximum of 20 minutes. If you still have not received an email after that, please get in contact with the program maintainers at b110-IT@dkfz.de and refer to the problem by providing your job ID.
- Q: I am using the external version of the tool. Although it is stated in the mail that the run was successful, the email notification mails do not contain any attachments.
- It is probable that you use the DKFZ webmail program, which for whatever reason is not able to display our attachments. Try viewing the result mail with a proper email client such as Outlook, Thunderbird, Exchange, Evolution, Netscape Messenger or Pegasus, and you will have access to the results file.
- Q: Every time I download a streamed result file the zip file is broken.
- We have seen this problem several times; it looks like something goes wrong in the communication between the web server and the browser (FF3). Try not clicking on "open with" in the download pop-up. Instead, use "save as" and extract and open the file manually later, which solved the issue for lots of people.