

Improve error message in spark tools when trying to access a local file from other nodes #1417

Open
akiezun opened this issue Jan 6, 2016 · 15 comments

Comments

@akiezun (Contributor) commented Jan 6, 2016

I'm almost certain this used to work.

./bin/gatk/gatk-launch FlagStatSpark -I file:///local/dev/akiezun/bin/gatk/src/test/resources/org/broadinstitute/hellbender/tools/valid.bam -- --sparkRunner SPARK --sparkMaster yarn-client

The error is:

java.lang.IllegalArgumentException: Wrong FS: file:/local/dev/akiezun/bin/gatk/src/test/resources/org/broadinstitute/hellbender/tools/valid.bam, expected: hdfs://dataflow01.broadinstitute.org:8020
    at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:654)
    at org.apache.hadoop.fs.FileSystem.makeQualified(FileSystem.java:474)
    at org.broadinstitute.hellbender.engine.spark.datasources.ReadsSparkSource.getHeader(ReadsSparkSource.java:181)
    at org.broadinstitute.hellbender.engine.spark.GATKSparkTool.initializeReads(GATKSparkTool.java:284)

It works fine when running with the LOCAL runner, or when the file is on HDFS.

When resolving this ticket, make sure to devise a way (or at least file a ticket) to prevent this from happening again, i.e. some way to detect this kind of problem.

@droazen droazen added this to the alpha-1 milestone Jan 6, 2016
@lbergelson (Member)

@akiezun Are you running this on your laptop or on one of the dataflow nodes? I suspect it will work on your laptop but not on the dataflow nodes.

I believe this is the inverse problem of #1324

@akiezun (Contributor, Author) commented Jan 6, 2016

This is on YARN on dataflow01.

@lbergelson (Member)

This same issue is coming up in a number of different ways. What's happening is that we ask for the default FileSystem and then try to open a path using whatever it gives back. The FileSystem it returns only knows how to open one type of path, either file:/// or hdfs:///. We could specify a particular file system explicitly, but I'm afraid there would be nasty consequences I don't understand, like getting the wrong HDFS block size or something like that.

    FileSystem fs = FileSystem.get(ctx.hadoopConfiguration());
    Path path = fs.makeQualified(new Path(filePath));
    if (fs.isDirectory(path)) {
        FileStatus[] bamFiles = fs.listStatus(path, new PathFilter() {
            private static final long serialVersionUID = 1L;
            @Override
            public boolean accept(Path path) {
                return path.getName().startsWith(HADOOP_PART_PREFIX);
            }
        });
        if (bamFiles.length == 0) {
            throw new UserException("No BAM files to load header from in: " + path);
        }
        path = bamFiles[0].getPath(); // Hadoop-BAM writes the same header to each shard, so use the first one
    }
    return SAMHeaderReader.readSAMHeaderFrom(path, ctx.hadoopConfiguration());
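For reference, the "Wrong FS" exception above comes from Hadoop comparing the path's scheme against the scheme of the filesystem it was handed. A minimal stdlib-only sketch of that comparison (this mimics the idea behind FileSystem.checkPath, it is not the actual Hadoop code):

```java
import java.net.URI;

// Sketch of the scheme check behind the "Wrong FS" error: a FileSystem rooted
// at one URI scheme (e.g. hdfs://) rejects paths carrying a different scheme
// (e.g. file:/). Class and method names here are illustrative only.
public class WrongFsCheck {

    /** Returns null if the path is usable by a filesystem rooted at fsUri,
     *  otherwise a "Wrong FS"-style message. */
    static String checkPath(URI fsUri, URI pathUri) {
        String fsScheme = fsUri.getScheme();
        String pathScheme = pathUri.getScheme();
        if (pathScheme == null || pathScheme.equalsIgnoreCase(fsScheme)) {
            return null; // relative path or matching scheme: OK
        }
        return "Wrong FS: " + pathUri + ", expected: " + fsUri;
    }

    public static void main(String[] args) {
        URI hdfs = URI.create("hdfs://dataflow01.broadinstitute.org:8020");
        // A file:/ path handed to an HDFS-rooted filesystem fails the check
        System.out.println(checkPath(hdfs, URI.create("file:/local/dev/valid.bam")));
        // An hdfs:// path with the same scheme passes (prints null)
        System.out.println(checkPath(hdfs, URI.create("hdfs://dataflow01.broadinstitute.org:8020/user/x.bam")));
    }
}
```

One way around the mismatch in real Hadoop code is to resolve the filesystem from the path itself (Path.getFileSystem(conf)) rather than asking for the configured default, though as noted above that may have side effects worth checking.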

@lbergelson (Member)

If you git bisect, can you find a commit where this does work?

@lbergelson (Member)

I was assuming that this would be fixed by #1433, but it only fixed the inverse of this problem. It's now possible to load HDFS files from the local runner using the full namenode path, e.g.

hdfs://dataflow01.broadinstitute.org/user/louisb/CEUTrio.HiSeq.WGS.b37.NA12878.20.21.bam

Loading files with the Spark runner and yarn-client is still failing.

We're getting a new error now though.

java.lang.IllegalArgumentException: unknown SAM format, cannot create RecordReader: file:/local/dev/akiezun/bin/gatk/src/test/resources/org/broadinstitute/hellbender/tools/valid.bam
    at org.seqdoop.hadoop_bam.AnySAMInputFormat.createRecordReader(AnySAMInputFormat.java:181)
    at org.apache.spark.rdd.NewHadoopRDD$$anon$1.<init>(NewHadoopRDD.scala:151)
    at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:124)
    at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:65)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
    at org.apache.spark.scheduler.Task.run(Task.scala:88)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

@lbergelson (Member)

We've been hit with a collective case of confusion. Of course this is going to fail: we're accessing a file on the master's local filesystem, which is not visible to the executors on the other nodes. Hence the explosion.

@droazen (Collaborator) commented Jan 27, 2016

Doh! Perhaps we could improve the error message in this case?

@lbergelson (Member)

Yes. We desperately need to improve the error messages for Hadoop-BAM.
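The check the ticket asks for could be done up front, before handing the path to Spark at all. A hedged sketch of that idea, stdlib only; the class name, method names, and message wording below are hypothetical, not GATK's actual API:

```java
import java.net.URI;

// Illustrative up-front validation: detect a master-local file: path being
// used with a cluster Spark master, and produce a friendly error instead of
// letting the executors blow up with "unknown SAM format" or "Wrong FS".
public class LocalPathCheck {

    /** True when the input looks like a local file paired with a cluster
     *  master (e.g. yarn-client), where executors cannot read it. */
    static boolean isLocalPathOnCluster(String inputPath, String sparkMaster) {
        String scheme = URI.create(inputPath).getScheme();
        boolean localFile = scheme == null || scheme.equals("file");
        boolean clusterMaster = !sparkMaster.startsWith("local");
        return localFile && clusterMaster;
    }

    static String friendlyMessage(String inputPath, String sparkMaster) {
        return "Input " + inputPath + " is on the local filesystem, but Spark master "
                + sparkMaster + " runs executors on other nodes that cannot see it. "
                + "Copy the file to HDFS (or another shared filesystem) and use an "
                + "hdfs:// path instead.";
    }

    public static void main(String[] args) {
        String in = "file:///home/akiezun/valid.bam";
        String master = "yarn-client";
        if (isLocalPathOnCluster(in, master)) {
            System.out.println(friendlyMessage(in, master));
        }
    }
}
```

The scheme-plus-master heuristic is deliberately simple; it would not cover a shared NFS mount visible from every node, so a real implementation might want an escape hatch for that case.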

@droazen droazen changed the title regression: can't run Spark tools on local files when using yarn Improve error message in spark tools when trying to access a local file from other nodes Mar 10, 2016
@droazen
Copy link
Collaborator
droazen commented Mar 10, 2016

Updated the title of this ticket to clarify that the task is now just to improve the error message, not to fix an actual bug.

@akiezun (Contributor, Author) commented Mar 18, 2016

This is the same as #1452. The bug is in Hadoop-BAM; let's not put a band-aid on it here.

@akiezun (Contributor, Author) commented Mar 18, 2016

Opened HadoopGenomics/Hadoop-BAM#79.

Blocked until that is fixed; not alpha-1.

@akiezun akiezun removed this from the alpha-1 milestone Mar 18, 2016
@tomwhite (Contributor)

@akiezun I created a fix for HadoopGenomics/Hadoop-BAM#79 in HadoopGenomics/Hadoop-BAM#99, which should fix this. Can you take a look?

@akiezun akiezun added this to the alpha-2 milestone Jul 1, 2016
@akiezun (Contributor, Author) commented Jul 6, 2016

Yes, though not today.

@akiezun akiezun modified the milestones: alpha-3, alpha-2 Jul 6, 2016
@akiezun (Contributor, Author) commented Jul 28, 2016

@tomwhite So I'm running on the latest GATK, which uses Hadoop-BAM 7.6.0 (which I think includes those fixes), and I still get this error. I'm running on a Dataproc cluster and my exact command line is:

./gatk-launch FlagStatSpark -I file:///home/akiezun/gatk/src/test/resources/org/broadinstitute/hellbender/tools/valid.bam -- --sparkRunner SPARK --sparkMaster yarn-client

The file /home/akiezun/gatk/src/test/resources/org/broadinstitute/hellbender/tools/valid.bam does exist locally on the master node.

Can you reproduce it too? What do you get?

@akiezun akiezun assigned droazen and unassigned akiezun Jul 29, 2016
@akiezun (Contributor, Author) commented Jul 29, 2016

Assigning to @droazen for dispatch.

@droazen droazen removed this from the alpha-3 milestone Aug 1, 2016
@droazen droazen assigned tomwhite and unassigned droazen Mar 20, 2017
@droazen droazen added this to the beta milestone Mar 20, 2017
@droazen droazen modified the milestones: 4.0 release, beta May 30, 2017
@tomwhite tomwhite removed their assignment Jun 13, 2017
@droazen droazen removed this from the Engine-4.0 milestone Oct 17, 2017