Rick is back from the IEMSS conference. The conference went well and the paper was well received. The proceedings can be downloaded here.
John is back from teaching his summer course in Germany.
Rick and Reggie filled in John on what has been happening in recent months.
The new server Crane is up and running.
The website has been migrated to Crane.
Initial experiments have been run on Crane.
Reggie is also working on a 3D GIS coverage project.
Initial benchmark experiments using no subsampling have produced a decline in classification accuracy for both the neural net and naive Bayes. This needs to be investigated.
KNN results on Crane will be available shortly.
3D GIS Nexrad coverage project is going well, but Reggie needs a way to include all pulse volumes for a sweep, not just those that return a reflectivity signal.
It might be possible to define the geometry in ArcGIS using Visual Basic.
It also would be worth contacting Robb Diehl or Manuel Suarez to see if they have any suggestions.
Nexrad II data (polarimetric and finer resolution) are becoming available. We should investigate what it will take to make our system compatible with the new data.
2007-04-24
I need to add more explanation to the results posted in the 'Experiments' section
We should experiment with pulse volume thresholding because we would rather have some questionable
biological sweeps classified as nonbiological than have nonbiological sweeps classified as
biological. Therefore, we should try requiring at least 80% confidence before classifying a sweep as biological.
In addition to 'Accuracy' it would also be useful to specify what percentage of the errors are errors of omission
and what percentage are errors of commission.
I should perform a formal benchmark to determine how much time is spent on training vs. classification
[Action Item] Collect information on false negatives vs. false positives.
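The thresholding idea and the omission/commission breakdown above can be sketched as follows. This is a minimal illustration, not the project's code: the function names, the label strings, and the sample confidences are assumptions made up for the example, and it assumes the classifier exposes a per-sweep confidence that the sweep is biological.

```python
def classify_with_threshold(p_biological, threshold=0.80):
    """Label a sweep 'biological' only if the confidence reaches the threshold."""
    return "biological" if p_biological >= threshold else "nonbiological"


def error_breakdown(truth, predicted, positive="biological"):
    """Split errors into omission (false negatives) and commission (false positives)."""
    pairs = list(zip(truth, predicted))
    omission = sum(1 for t, p in pairs if t == positive and p != positive)
    commission = sum(1 for t, p in pairs if t != positive and p == positive)
    errors = omission + commission
    return {
        "accuracy": 1 - errors / len(pairs),
        "omission": omission,
        "commission": commission,
        # Share of all errors, guarded against the no-error case.
        "omission_share": omission / errors if errors else 0.0,
        "commission_share": commission / errors if errors else 0.0,
    }


# Hypothetical labels and confidences, for illustration only.
truth = ["biological", "biological", "nonbiological", "nonbiological", "biological"]
confidence = [0.95, 0.70, 0.85, 0.10, 0.81]
predicted = [classify_with_threshold(c) for c in confidence]
report = error_breakdown(truth, predicted)
print(report)
```

Raising the threshold trades errors of commission for errors of omission, which is the direction preferred in the notes above.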
2007-02-05
Robb was unavailable
Reviewed John's comments from latest update
We should ask Robb to review the sweep
that consistently gets misclassified.
It would be good to perform an attribute optimization
experiment that tried all possible attribute combinations
(the previous experiment added attributes in order of
descending information gain until the accuracy degraded).
We should find out if Robb has had a chance to review
the classifications of the previously unclassified data.
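The exhaustive attribute optimization mentioned above could look roughly like this. It is a sketch under assumptions: the attribute names and the toy scoring function are hypothetical stand-ins, and in practice `evaluate` would train and test the classifier on the given subset. The search visits 2^n - 1 subsets for n attributes, so it is only practical for a small attribute set.

```python
from itertools import combinations


def best_attribute_subset(attributes, evaluate):
    """Try every non-empty attribute combination and keep the best-scoring one."""
    best_subset, best_score = None, float("-inf")
    for size in range(1, len(attributes) + 1):
        for subset in combinations(attributes, size):
            score = evaluate(subset)
            if score > best_score:  # strict '>' keeps the first of any ties
                best_subset, best_score = subset, score
    return best_subset, best_score


# Hypothetical attributes and a toy stand-in for cross-validated accuracy.
TOY_GAINS = {
    "reflectivity_mean": 0.20,
    "velocity_variance": 0.15,
    "spectrum_width_mean": -0.05,
}


def toy_accuracy(subset):
    return 0.5 + sum(TOY_GAINS[name] for name in subset)


subset, score = best_attribute_subset(list(TOY_GAINS), toy_accuracy)
print(subset, round(score, 3))
```

Unlike the earlier greedy experiment, this does not stop at the first drop in accuracy, so it can find combinations the information-gain ordering would miss.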
Discussed the possibility of increasing the number of training files
by breaking the sweeps into smaller windows, or combining all pulse
volumes into one large database (which we could then randomly sample)
Support Vector Machines were discussed
They typically achieve high accuracy, but may require unacceptable
computation time. Projecting our data to higher dimensions could be
a very complex and computationally expensive process.
We should focus on Neural Nets at this point and possibly
discuss Support Vector Machines again at a later date.
Discussed Using Neural Nets
We could try with and without the second order moments to
see how well the neural net finds its own set of features
We could try discretizing the range and use a single range
to describe the window of inputs we are feeding the NN.
Feature scaling should be performed before training the
Neural Net. Outliers should also be examined. If they make up
less than 1% of data, they can probably be removed from training
data.
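The scaling and outlier steps above might be sketched like this. It is illustrative only: z-score standardization and the cutoff of 3 standard deviations are assumptions (the notes do not say which scaling method or outlier definition to use), and the function names and sample values are made up.

```python
from statistics import mean, pstdev


def standardize(values):
    """Scale values to zero mean and unit variance (z-scores)."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def drop_outliers_if_rare(values, z_cutoff=3.0, max_fraction=0.01):
    """Remove outliers only when they make up less than max_fraction of the data."""
    z_scores = standardize(values)
    kept = [v for v, z in zip(values, z_scores) if abs(z) <= z_cutoff]
    flagged_fraction = 1 - len(kept) / len(values)
    if flagged_fraction < max_fraction:
        return kept, flagged_fraction
    # Too many flagged points to discard blindly; return the data unchanged.
    return list(values), flagged_fraction


# Hypothetical attribute values with a single extreme reading.
raw = [10.0] * 100 + [12.0] * 99 + [1000.0]
cleaned, flagged = drop_outliers_if_rare(raw)
scaled = standardize(cleaned)  # scale after outlier removal, before training
print(len(cleaned), round(flagged, 4))
```

Outliers are removed before the final scaling so that a few extreme values do not compress the range of the remaining training data.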
Next Meeting: Feb 26th
2007-01-22
John was unavailable
Reviewed slides from Robb
Teal is biological
Brown is non-biological
White is where no data exists (e.g. ground clutter)
Black is where data exists but has been removed by our algorithm
*Colors are bad for Rick
Discussed ambiguous velocity
Velocity values beyond the 'Unambiguous Velocity' boundary cannot be trusted.
In some instances the computer at the radar station may be unable to calculate
a velocity value but it will always return a value.
A very obvious boundary at this range was revealed by the slides. Biological echoes
were nearly always reported as non-biological beyond this boundary.
We might want to consider filtering out any data beyond the range of 'Unambiguous Velocity'
Doppler Dilemma (Result of the Laws of Nature)
Higher Pulse Repetition Frequency (PRF) = Higher Maximum Detectable Velocity = Shorter Unambiguous Range
Lower Pulse Repetition Frequency (PRF) = Longer Unambiguous Range = Lower Maximum Detectable Velocity
It would be interesting to find statistical averages for velocity in both 120km-145km range
and 145km-230km range.
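The Doppler dilemma noted above follows from the standard pulsed-radar relations: unambiguous range = c / (2 x PRF) and maximum unambiguous velocity = wavelength x PRF / 4, where PRF is the pulse repetition frequency. A small sketch makes the trade-off concrete. The 10 cm wavelength and the PRF values are round illustrative assumptions, not the settings of any particular radar or scan.

```python
SPEED_OF_LIGHT_M_S = 299_792_458.0


def unambiguous_range_m(prf_hz):
    """Maximum range (metres) from which an echo returns before the next pulse."""
    return SPEED_OF_LIGHT_M_S / (2.0 * prf_hz)


def max_unambiguous_velocity_m_s(prf_hz, wavelength_m):
    """Largest radial speed (m/s) measurable without aliasing."""
    return wavelength_m * prf_hz / 4.0


WAVELENGTH_M = 0.10  # assumed round value for an S-band radar

for prf_hz in (500.0, 1000.0):  # illustrative PRF values
    range_km = unambiguous_range_m(prf_hz) / 1000.0
    velocity = max_unambiguous_velocity_m_s(prf_hz, WAVELENGTH_M)
    print(f"PRF {prf_hz:.0f} Hz: range {range_km:.1f} km, velocity {velocity:.1f} m/s")
```

The product of the two limits equals c x wavelength / 8 regardless of PRF, which is why improving one always degrades the other.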
Slide KLCH_20052004_040528
Beam Obstruction
Anomalous Propagation
Slide KILX_20010907_000028
Weak echoes to west (probably dust or insects)
Small, weak strip to east is a gust front.
It would be useful if I had a way to visualize the data.
Robb has provided more training data and several examples of mixed data.
I need to run tests with Robb's new data (prediction: worse overall accuracy).
2007-01-08
Create a webpage to accompany entropy results; explain terms, moments and statistics.
Make changes to experiment display style.
Rename 'Count' to 'Number of Experiments'
Format Accuracy as Percentage (86.6% vs .866)
Fix Name Field (make sure original experiment xml files define it)
Attach Legend to 'Batch Files' Section
Replace Training File Lists with Aliases
Add Best Accuracy, Worst Accuracy, and Std. Dev to Summary
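The summary changes listed above could be produced along these lines. This is a rough sketch: the function name and sample accuracies are hypothetical, and the use of the sample standard deviation (rather than the population form) is an assumption.

```python
from statistics import stdev


def summarize(accuracies):
    """Build the summary row, with accuracies formatted as percentages."""
    return {
        "Number of Experiments": len(accuracies),
        "Best Accuracy": f"{max(accuracies):.1%}",
        "Worst Accuracy": f"{min(accuracies):.1%}",
        # Sample standard deviation needs at least two experiments.
        "Std. Dev": f"{stdev(accuracies):.1%}" if len(accuracies) > 1 else "n/a",
    }


# Hypothetical accuracies, for illustration only.
summary = summarize([0.866, 0.9, 0.8])
for label, value in summary.items():
    print(f"{label}: {value}")
```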
Provide Robb with Classified Output for Visualization
Provide Robb with Classifications for Unclassified Data