<div class="header center">
    <h2>
      Setting Up and Using MOSS
    </h2>
</div>

<h3 class="sub-header" id="intro">
  Introduction
</h3>
<p>
  MOSS (Measure Of Software Similarity) is a document-fingerprinting and matching service hosted at Stanford University. For 
  official information on it, check out <a target="_blank" rel="noopener noreferrer" href="https://theory.stanford.edu/~aiken/moss/">
  the official MOSS site.</a> To keep this guide reasonably brief, I will cover only the basics of how 
  to use MOSS and what happens in the process. 
</p>
<p>
  We use MOSS to find similarities in programming
  code between all student submissions for a given assignment. In other words, we collect all of the code files 
  we want to check and MOSS compares them against each other. MOSS cannot look for similar code online; it is not
  designed to do that. We can, however, search for code online <span class="italic">before</span> running our plagiarism 
  check, add the code we find to a single directory, and include that directory in our MOSS run. Believe it or not,
  it's actually pretty easy to find old or well-known assignment solutions online. If you or anyone you know has an account with 
  Chegg or CourseHero, those are the sites to look at. Keep in mind, students are cheating because <span class="bold">
  they are just totally clueless.</span> Their cheating will almost always reflect this, and much of their code will be directly
  copied and pasted. MOSS isn't a perfect artificial intelligence, but it is very sophisticated and works well as a quick, 
  "worst-case" scan to alert you to specific cases that you should then look into manually.
</p>

<p>
  This tutorial is divided into the following parts:
</p>
<ol>
  <li>
    <a href="tutorials/moss#intro">
      <span class="toc">Introduction</span>
    </a>
  </li>
  <li>
    <a href="tutorials/moss#setup">
      <span class="toc">Getting the Code and Setting Up</span>
    </a>
  </li>
  <li>
    <a href="tutorials/moss#test-run">
      <span class="toc">A Typical MOSS Run</span>
    </a>
  </li>
  <li>
    <a href="tutorials/moss#moss-prep">
      <span class="toc">Preparing Real Submissions from Canvas</span>
    </a>
  </li>
  <li>
    <a href="tutorials/moss#tips">
      <span class="toc">Closing Tips and Tricks</span>
    </a>
  </li>
</ol>

<h3 class="sub-header" id="setup">
    Getting the Code and Setting Up
</h3>
<p>
  The first thing to understand about MOSS is that <span class="italic">you</span> don't run the actual 
  code that checks all of the documents. MOSS is a service that is hosted on Stanford's servers. However,
  they provide you with a Perl script that will gather up your files and send them to this service for you.
  When the server finishes processing all of the files you've sent, the Perl script will print a URL that 
  holds all of the results of the MOSS scan. <span class="bold">[NOTE: These results only last for about 2 weeks, so if there 
  is anything important on there, save it to your computer.]</span>
</p>
<p>
  Since this is a computationally expensive service that is available for free on the internet, Stanford 
  understandably requires that you register a user ID with the MOSS service and insert the user ID into 
  your Perl script. This way, if you are spamming their service, they can block you. Luckily, 
  registering a user ID with them is easy: it only requires an email.
</p>
<p>
  To obtain a user ID (in addition to a copy of the Perl script with your new ID conveniently 
  inserted into it), simply email the following message to <code>moss@moss.stanford.edu</code> :
  <br>
  <br>
  <code>
    registeruser
  </code>
    <br>
  <code>
    mail <span class="italic">username@domain</span>
  </code>
  &lt;&lt;&lt;&lt; (where <span class="italic">username@domain</span> is your email address)
</p>
<p>
  These exact instructions can also be found 
  <a href="https://theory.stanford.edu/~aiken/moss/" target="_blank" rel="noopener noreferrer">
    here, on the official MOSS website
  </a>
   (under the Registering for Moss section). 
  After sending that email, you should receive a response from the MOSS server within a few minutes. Be patient with it: 
  if you haven't gotten a response from the server in the first minute, you 
  probably just have to wait a little longer.
</p>
<p>
  Upon receiving the response email, you should find a short introduction followed immediately by the plaintext Perl script with your 
  own user ID integrated into it. It should look something like this:
</p>

<app-code-editor
  [eid]="'00'"
  [title]="'moss (submission script)'"
  [height]="'300px'"
  [width]="'100%'"
  [file]="'assets/moss.pl'"
  [readOnly]="true"
  [theme]="'ambiance'">
</app-code-editor>

<p>
  You may notice that line 167 of this code snippet says:
  <br>
  <br>
  <code>
  $userid=987654321;
  </code>
  <br>
  <br>
  This is the default user ID that is posted on the MOSS website, and <span class="bold">it does NOT work.</span>
  However, the response email will actually use your own user ID in this spot, so this won't be a problem in your 
  custom-built script. Also note, this submission script is written in Perl. That means that the machine you run it on will need to have a Perl 
  interpreter installed. For those of you from IUPUI, the Tesla server already has Perl installed, so this is not 
  a problem.
</p>
<p>
   For ease of use, I'm going to describe how to add this file to your system PATH variable (so you don't have to move 
   it around every time you use it). If you haven't already, make a directory somewhere on your system called "Moss". 
   Then, copy and paste the submission script you received from the email response into a text editor and name the 
   file "moss" (no extension is used in this tutorial). Save it to the Moss directory you just created.
</p>
<h4 class="sub-header">
  Adding moss to PATH
</h4>
<p>
  Adding moss to the system PATH variable on a Linux/Unix system is fairly straightforward. First, you need to 
  configure your SCP program (like WinSCP or CyberDuck) to display hidden files. This is because you need to 
  edit a file in your main directory called <code>.bash_profile</code>. Open this file in a text editor. At some 
  point in this file, it should say:
  <br>
  <br>
  <code>
    PATH=$PATH:...
  </code>
  <br>
  <br>
  You need to add the Moss directory to that variable after the colon. For example, if I created the Moss directory 
  in my main user directory, I would have a path like:
  <br>
  <br>
  <code>
    PATH=$PATH:$HOME/Moss
  </code>
  <br>
  <br>
  Here's an example of how my <code>.bash_profile</code> looks (I have some other stuff in my PATH variable and 
  I'm doing some other unrelated tricks in there, so don't worry about those):

</p>

<div class="margin-80">
  <app-code-editor
    [eid]="'01'"
    [title]="'/home/<username>/.bash_profile'"
    [height]="'200px'"
    [width]="'50%'"
    [file]="'assets/sampleBashProfile.txt'"
    [readOnly]="true"
    [theme]="'ambiance'">
  </app-code-editor>
</div>

<p>
  If you are on a Windows system instead of a Linux/Unix system, then you can still add moss to the user PATH 
  variable. Instead of finding a <code>.bash_profile</code> file, you need to search Windows for "path". The first 
  option should say "Edit the system environment variables". That is the app you need. After opening it, click on 
  "Environment Variables..." in the bottom right corner. Then, in the section labeled "User variables for 
  &lt;username&gt;", there should be a listing called "Path". Double click on it and a new dialogue box should pop 
  up. In that box, click "New", then click "Browse". From here, find the Moss folder you created,
  add it to PATH, and save the changes.
  You may need to restart any command prompts you are using after this.
</p>
<p>
  At this point, the MOSS script should be set up on your machine and you are ready to start running plagiarism 
  checks in any directory.
</p>
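<p>
  As a quick sanity check of the PATH setup above, the following minimal Python sketch asks the operating system 
  whether a command named <code>moss</code> can be found. The function name <code>find_on_path</code> is hypothetical 
  and not part of MOSS. Note the assumptions: on Linux/Unix the script must also be marked executable 
  (for example with <code>chmod +x moss</code>) to be found, and on Windows the lookup depends on file extensions, 
  so an extensionless file may not be reported even when its folder is on PATH.
</p>
<pre class="codeblock margin-80">
```python
import shutil


def find_on_path(command="moss"):
    """Return the full path of `command` if it can be found on PATH, else None."""
    return shutil.which(command)


if __name__ == "__main__":
    location = find_on_path()
    if location is None:
        print("moss was not found on PATH; re-check the directory you added")
    else:
        print("moss found at " + location)
```
</pre>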

<h3 class="sub-header" id="test-run">
  A Typical MOSS Run
</h3>
<p>
  Now that the MOSS submission script is properly set up on your machine, let's examine what a typical run of the 
  program will look like. Let's assume that you've downloaded all of the students' files and have organized them 
  by placing each file in a directory named after its associated student (later, this tutorial
  will discuss how to get to this point easily). Take this example:
</p>
<pre>
<p class="consolas">
        submissions/
        ├──studentA/
        │   └──cardgameA.py
        ├──studentB/
        │   └──cardgameB.py
        └──studentC/
            └──cardgameC.py
</p>
</pre>
<p>
  In this example, we assume that you have opened a terminal and navigated to the <code>submissions/</code> directory. Within the 
  <code>submissions/</code> directory, there is a directory for every student (studentA, studentB, studentC) which contains all 
  of the source code for each student (in this case, each student has a single file associated with an assignment 
  called "Card Game"). Now, to run the MOSS submission script on these files and compare cardgameA.py, cardgameB.py, and 
  cardgameC.py, you would use the following command:
</p>
<code class="margin-80">
  moss -l python -d */*.py
</code>
<p>
  The <code>-l</code> (lowercase letter L) argument specifies that the language MOSS needs to parse is Python (hence, we follow the <code>
  -l</code> with <code>python</code>). The <code>-d</code> is a useful argument that specifies that the submission files 
  are separated by directory. This means that MOSS will not compare any file with any other file in the same directory. 
  This is useful because sometimes, in more complex assignments, students may have several files and you don't want to 
  trigger matches between 2 files that the same student submitted. We follow the <code>-d</code> with a list of files 
  that MOSS needs to look at. Note that we start the expression from wherever the current working directory is. This 
  means that we need to start our expression as if we are in the <code>submissions/</code> directory since that's where 
  we are located in the terminal. The MOSS submission script is loaded with other useful arguments and documentation in 
  the form of Perl comments, so check those out sometime. However, you really don't need much more than what is shown 
  in this example.
</p>
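<p>
  The command above relies on the shell to expand <code>*/*.py</code> into a list of files. On a shell that does not 
  expand wildcards, one option is to build the same argument list in Python. The following is only a sketch: 
  <code>build_moss_command</code> is a hypothetical helper name, and it uses nothing beyond the <code>-l</code> and 
  <code>-d</code> arguments described above. It assumes you run it from inside the <code>submissions/</code> directory.
</p>
<pre class="codeblock margin-80">
```python
import glob


def build_moss_command(language="python", pattern="*/*.py"):
    """Return the argument list for a directory-mode MOSS run.

    The wildcard is expanded here, relative to the current working directory,
    instead of relying on the shell to do it.
    """
    files = sorted(glob.glob(pattern))
    return ["moss", "-l", language, "-d"] + files


if __name__ == "__main__":
    command = build_moss_command()
    # This only prints the command. To actually submit, pass the list to
    # subprocess.run(command, check=True) with the moss script on PATH.
    print(" ".join(command))
```
</pre>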
<p>
  Upon running this command, you should see something along the lines of:
</p>

<pre class="codeblock margin-80">
Checking files . . .
OK
Uploading studentA/cardgameA.py ...done.
Uploading studentB/cardgameB.py ...done.
Uploading studentC/cardgameC.py ...done.
Query submitted.  Waiting for the server's response.

</pre>
<p>
  Eventually, the server will respond and print a URL at the bottom of the screen in the fashion of:
</p>
<code class="margin-80">
http://moss.stanford.edu/results/123456789
</code>
<p>
  Copy and paste this link into a web browser and it should take you to a results page. From there you can 
  inspect specific similarities between different files and judge for yourself whether plagiarism occurred. 
  Refer to the <code>-m</code> argument in the documentation (in the Perl script comments), which sets the maximum 
  number of times a passage may appear before MOSS ignores it, if you want to tune how sensitive the scan is. 
  However, from personal experience, I would say that adjusting it is not necessary, 
  as MOSS is mainly used as an over-compensating first pass. Any serious allegation of plagiarism should always 
  first be investigated by a human; do not rely on MOSS to tell you exactly who is and isn't cheating. Trying to find 
  just the right threshold to get MOSS to be spot on with 100% of its results is likely a waste of time.
</p>
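<p>
  If you capture the script's output (for example, by redirecting it to a file), the results URL can be pulled out 
  automatically. This is a minimal sketch under the assumption, taken from the sample output above, that the URL 
  is printed on its own line near the end. The name <code>extract_results_url</code> is hypothetical.
</p>
<pre class="codeblock margin-80">
```python
def extract_results_url(output):
    """Return the last line of the captured output that looks like a URL, or None."""
    for line in reversed(output.splitlines()):
        candidate = line.strip()
        if candidate.startswith("http://") or candidate.startswith("https://"):
            return candidate
    return None


if __name__ == "__main__":
    sample = (
        "Checking files . . .\n"
        "OK\n"
        "Query submitted.  Waiting for the server's response.\n"
        "http://moss.stanford.edu/results/123456789\n"
    )
    print(extract_results_url(sample))
```
</pre>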

<h3 class="sub-header" id="moss-prep">
  Preparing Real Submissions from Canvas
</h3>
<p>
  Now that you understand the basics of running MOSS, you may notice that it would be a pain to download each 
  file individually and make a separate directory for each student. For this purpose, I have prepared a Python 
  script to help. But before working with this script, you need to download the entire batch of assignment 
  submissions from Canvas. As a TA or Grader (or Teacher/Co-Instructor for that matter), you have access to both 
  the entire gradebook and special options on the assignment page for each assignment. Here, I will show you how 
  to obtain a zip file of every student submission for an assignment from the assignment page. It's pretty simple.
</p>

<img src="assets/downloadSubmissionsEdited.png" alt="Download Submissions button on the Assignment page" 
  class="center" height="471" width="800">

<p>
  If you go to the assignment description for a given assignment, 
  then you will see a "Download Submissions" link on the right side of the page. 
  In this case we are using the Change Maker assignment.  Due to responsive 
  design, sometimes this link can appear at the bottom of the page; however, it is typically on the right-hand 
  side. All you have to do is click on it and it will prepare a zip file of every student's submission. Upon 
  unzipping this file, the results may look something like this:
</p>

<img src="assets/canvasSubmissionsEdited.png" alt="Example of files inside the contents of a Canvas submission download" 
  class="center">

<p>
  I have blurred out the students' names for privacy. But, as you can see, these files are all clumped together, which 
  is not ideal for MOSS. We need to reorganize the entire contents of this directory to prepare it for a MOSS run. Copy 
  and paste the following Python script onto your machine (anywhere is fine): 
</p>

<div class="margin-80">
  <app-code-editor
    [eid]="'02'"
    [title]="'mossPrep.py'"
    [height]="'700px'"
    [width]="'60%'"
    [file]="'assets/mossPrep230.py'"
    [readOnly]="true"
    [theme]="'ambiance'">
  </app-code-editor>
</div>

<p>
  Once you have saved the mossPrep.py file, you will need to do a couple of things. First, you need to create a new 
  directory somewhere on your machine for all of the new files to be copied to. I recommend naming this directory 
  something like <code>moss-prepped/</code>. Second, you will need to change lines 17 and 18 of the script file.
</p>

<pre class="codeblock text-10 margin-80">
BASE_DIRECTORY = "C:/Path/to/unzipped/folder/FA18-ST-AAAA-00000-00000-Change_Maker_submissions/"
TARGET_DIRECTORY = "C:/Path/to/wherever/you/want/to/dump/the/results/moss-prepped/"
</pre>

<p>
  <code>BASE_DIRECTORY</code> is the main directory where your downloaded files are located and <code>
  TARGET_DIRECTORY</code> is the directory where the files will be copied to (this would be the new 
  <code>moss-prepped/</code> directory). <span class="bold">Note: YOUR FILE PATHS HERE MUST END WITH SLASHES.</span>
  Also, for simplicity's sake, I recommend you use absolute path names so you don't need to worry about where 
  this script is located with respect to the downloaded files.
</p>
<p>
  Once you've done all of that, do a quick double-check that your base and target paths are correct. Keep in mind that
  if you copy and paste file paths on Windows, they will use backslashes instead of forward slashes, so 
  you will need to switch these to forward slashes or double them up to escape them in the string. Once 
  everything looks correct, run the mossPrep.py script. It should provide feedback on its progress as it is running. 
  When it is done, the new <code>moss-prepped/</code> directory should be loaded with new directories, each one 
  corresponding to a different student. From here you can go to a terminal, navigate to the <code>moss-prepped/</code> directory, and run the same MOSS command as before: <code>moss -l python -d */*.py</code>
</p>
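<p>
  The two path rules above (forward slashes, and a slash at the end) are easy to get wrong when pasting from Windows 
  Explorer. The following small helper is one way to clean up a pasted path before assigning it to 
  <code>BASE_DIRECTORY</code> or <code>TARGET_DIRECTORY</code>. It is only an illustration: 
  <code>normalize_directory</code> is a hypothetical name and is not part of mossPrep.py.
</p>
<pre class="codeblock margin-80">
```python
def normalize_directory(path):
    """Convert backslashes to forward slashes and ensure exactly one trailing slash."""
    cleaned = path.replace("\\", "/")
    return cleaned.rstrip("/") + "/"


if __name__ == "__main__":
    # A raw string (r"...") keeps the pasted backslashes from being treated as escapes.
    print(normalize_directory(r"C:\Path\to\moss-prepped"))
```
</pre>
<p>
  With the raw string in the example, the helper prints <code>C:/Path/to/moss-prepped/</code>.
</p>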
<p>
  <span class="bold">Side Note: if any students submitted zip files, you will need to run this script and then
  extract the contents of their zip file from their new directory in <code>moss-prepped/</code>. You want 
  all of the student's files to be located flatly inside their respective subdirectory.</span>
</p>
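<p>
  If several students submitted zip files, extracting them by hand gets tedious. As a rough sketch of the step 
  described in the side note, the function below extracts every file from a zip archive directly into a student's 
  directory and discards any folder structure inside the archive. The name <code>extract_flat</code> is hypothetical, 
  and note one assumption in particular: if two files in the archive share a name, the later one overwrites the earlier one.
</p>
<pre class="codeblock margin-80">
```python
import os
import zipfile


def extract_flat(zip_path, student_dir):
    """Extract every file in zip_path directly into student_dir.

    Folder structure inside the archive is discarded. If two members share a
    file name, the later one overwrites the earlier one.
    """
    extracted = []
    with zipfile.ZipFile(zip_path) as archive:
        for member in archive.infolist():
            name = os.path.basename(member.filename)
            if member.is_dir() or name == "":
                continue  # skip folder entries
            target = os.path.join(student_dir, name)
            with archive.open(member) as source, open(target, "wb") as out:
                out.write(source.read())
            extracted.append(name)
    return extracted
```
</pre>
<p>
  For example, <code>extract_flat("moss-prepped/studentA/project.zip", "moss-prepped/studentA/")</code> would place 
  the archive's files directly inside <code>studentA/</code> (both paths here are made up for illustration).
</p>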

<h3 class="sub-heading" id="tips">
  Closing Tips and Tricks
</h3>
<p>
  All that being said, here are some remaining tips about using MOSS:
</p>
<ul>
  <li>
    <p><span class="italic">
      <span class="bold purple text-shadow">Don't rely on MOSS:</span> 
      MOSS is not a perfect tool. Do not rely on it to be an absolute decision algorithm for determining 
      academic dishonesty. Let it over-compensate, then manually investigate the matches that stand out. 
      Normally, the matches come in clusters, so always look for the highest cluster of matches and take a 
      look at those.
    </span></p>
  </li>
  <li>
    <p><span class="italic">
        <span class="bold purple text-shadow">Be ready to re-run the scan quickly:</span> 
      MOSS deletes results approximately every 2 weeks. So, I recommend saving the command you used to run 
      MOSS for each batch in a shell script file <code>(.sh)</code>. Sometimes, especially if you're using 
      directories loaded with files from the internet to match against, these MOSS commands can get complicated 
      and may be difficult to remember. Having a simple shell script that you can re-run is very convenient when 
      students come crying to the professor 3 weeks after they've been notified that they've been busted.
    </span></p>
  </li>
  <li>
    <p><span class="italic">
        <span class="bold purple text-shadow">Back up important results:</span> 
      On this note, MOSS is a free, third-party service. Do not expect it to always be available when you need it. 
      If there are students that you are going to send to the professor, go to the specific match page for those 
      students and download the results (use Web Page, Complete). This way, you don't have to risk not being able 
      to re-run the MOSS query for that batch. However, keep in mind that academic misconduct is serious. You really 
      shouldn't be trying to bust anyone unless you could hold up the 2 plaintext programs side-by-side (without any 
      MOSS highlighting) and convince someone that cheating occurred.
    </span></p>
  </li>
  <li>
      <p><span class="italic">
        <span class="bold purple text-shadow">Think like a cheater:</span> 
        It's very beneficial to evaluate students' code against code that appears online. To do this, you will 
        need to manually find assignment postings online and copy the results into different text files. You should 
        also paste the URL from the source page to each file you create so you can always retrace your steps. After 
        collecting all of the results, you can throw them all into the same directory (flattened out) and name 
        it something that's easy to spot in a MOSS results page. For example, I normally name this directory 
        <code>CHEAT_FILES/</code> and drop it in the same directory that has all of the student subdirectories. 
        In that case, you could use the same MOSS command and you will see "CHEAT_FILES" in pretty capital 
        letters on the results page in any place where someone matched against it. In terms of finding these postings 
        online, Chegg and CourseHero are HUGE with this sort of stuff. So if your department can obtain a Chegg 
        account and/or a CourseHero account, that would be a game changer. In general, finding online postings for 
        assignments can be easy if you use the right approach. Start by copying and pasting a couple of sentences 
        of the EXACT assignment description into Google (or DuckDuckGo if you don't like being tracked).
         You will be surprised to see how many accurate results pop 
        up with such a short (and potentially generic) phrase. Repeat this process for several chunks of the 
        assignment description until you feel you have collected most of the online postings. From there, it's just 
        a matter of throwing them in the batch and letting MOSS do its magic.
      </span></p>
  </li>
  <li>
    <p><span class="italic">
      <span class="bold purple text-shadow">Reverse-search strange matches:</span> 
      Oftentimes, students who cheat will match with students they've never met or seen in their entire lives. 
      This tends to be the result of 2 or more students copying from the same online site (perhaps a site you 
      missed previously). In this case, you can still successfully track down the site that the students obtained code 
      from. The process is the same as before (picking very small bits out of the file and searching for them), except 
      now you need to search for chunks out of the code files that the students submitted. Students often change 
      variable names, so focus your searches on parts of the code that don't use many variables, but still have 
      some uniqueness to them. I have busted a great many students with this "reverse-search" approach—it's a solid 
      tactic.
    </span></p>
  </li>
</ul>
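<p>
  For the "Think like a cheater" tip, keeping the source URL with each collected file is easier if it is done the same 
  way every time. The sketch below writes a found solution into a directory such as <code>CHEAT_FILES/</code> with the 
  URL as the first line. It is an illustration only: <code>save_online_solution</code> is a hypothetical name, the 
  <code>#</code> comment prefix assumes Python submissions, and the URL in the usage note is a placeholder.
</p>
<pre class="codeblock margin-80">
```python
import os


def save_online_solution(directory, file_name, source_url, code):
    """Write code to directory/file_name with the source URL as a leading comment."""
    os.makedirs(directory, exist_ok=True)
    path = os.path.join(directory, file_name)
    with open(path, "w", encoding="utf-8") as handle:
        # Assumes a language that uses "#" for comments, such as Python.
        handle.write("# Source: " + source_url + "\n")
        handle.write(code)
    return path
```
</pre>
<p>
  For example, <code>save_online_solution("CHEAT_FILES", "posting1.py", "https://example.com/posting", code_text)</code> 
  would create <code>CHEAT_FILES/posting1.py</code>, which the same <code>*/*.py</code> pattern then picks up.
</p>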