Installing Git
On Ubuntu you can install it with:
sudo apt install git
Microsoft Documentation: https://learn.microsoft.com/en-us/devops/develop/git/install-and-set-up-git
Git SCM Downloads: https://git-scm.com/downloads
Installing Git LFS
This enables the large file storage system. It needs to be done once per user account that uses it.
git lfs install
Documentation: https://git-lfs.com/
Cloning this Repository
You can use git to clone this repository.
git clone https://gitea.daball.me/No-Moss-3-Carbo-Landfill-Online-Library/no-moss-3-carbo-landfill-library.online.git
Pulling Git Source Updates
From inside the working directory, you can use git to pull source updates from the origin repository.
git pull
Pulling Git LFS Updates
From inside the working directory, download all the large file storage updates from the origin server.
git lfs fetch --all
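Note that git lfs fetch --all only downloads the LFS objects into local storage; to also replace the pointer files in your working tree with the downloaded content, run:
git lfs checkout
Alternatively, git lfs pull performs the fetch and checkout in one step.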
Installing Docker and Docker Compose (optional)
Installing Docker is beyond the scope of this document. You may choose to install Docker to simplify running this web site.
https://docs.docker.com/get-docker/
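Once Docker is installed, you can verify that the engine and the Compose plugin are available:
docker --version
docker compose version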
Creating a .env file
You will need a .env file if you want to use Docker Compose. Here's an example that will run the Docker fullstack configuration.
# You will need a .env file to use with the Docker containers. This is set up for localhost use with the Docker Compose fullstack.
# SITE_NAME is used for page generation.
SITE_NAME="No Moss 3 Landfill Online Library"
# SITE_URL is used for generating links for the search index. (If you leave this blank it should work using relative paths.)
SITE_URL="https://localhost"
# APP_HTTP_HOST is the hostname used to access the Node application (usually by a reverse proxy).
APP_HTTP_HOST="nm3clol"
# APP_HTTP_LISTEN_PORT is the TCP port used to access the Node application's HTTP interface (usually by a reverse proxy).
APP_HTTP_LISTEN_PORT=3000
# APP_HTTP_URL is the URL used to access the Node application (usually by a reverse proxy).
APP_HTTP_URL="http://$APP_HTTP_HOST:$APP_HTTP_LISTEN_PORT"
# SOLR_DOCS_HOST is the host for Apache Solr's core for indexed documents.
SOLR_DOCS_HOST="solr"
# SOLR_DOCS_PORT is the port for Apache Solr's core for indexed documents.
SOLR_DOCS_PORT=8983
# SOLR_DOCS_CORE_NAME is the core name for Apache Solr's core for indexed documents.
SOLR_DOCS_CORE_NAME="nm3clol_core"
# SOLR_DOCS_URL is the URL to access Apache Solr's core for indexed documents. It is used by Gulp and the Search feature.
SOLR_DOCS_URL="http://$SOLR_DOCS_HOST:$SOLR_DOCS_PORT/solr/$SOLR_DOCS_CORE_NAME"
# SOLR_LAW_HOST is the host for Apache Solr's core for indexed laws.
SOLR_LAW_HOST="$SOLR_DOCS_HOST"
# SOLR_LAW_PORT is the port for Apache Solr's core for indexed laws.
SOLR_LAW_PORT=$SOLR_DOCS_PORT
# SOLR_LAW_CORE_NAME is the core name for Apache Solr's core for indexed laws.
SOLR_LAW_CORE_NAME="vacode_core"
# SOLR_LAW_URL is the URL to access Apache Solr's core for indexed laws. It is used by Gulp (and, eventually, the Search feature).
SOLR_LAW_URL="http://$SOLR_LAW_HOST:$SOLR_LAW_PORT/solr/$SOLR_LAW_CORE_NAME"
# TIKA_HOST is the hostname of the host running Apache Tika.
TIKA_HOST="tika"
# TIKA_PORT is the port of the host running Apache Tika.
TIKA_PORT=9998
# TIKA_URL is the URL used to access the host running Apache Tika.
TIKA_URL="http://$TIKA_HOST:$TIKA_PORT"
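Note that whether variable references such as $APP_HTTP_HOST are interpolated inside a .env file depends on your Docker Compose version. You can check how the values resolve by rendering the effective configuration from the folder containing your compose file:
docker compose config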
Installing Solr with Tika and Tesseract
Installing Solr with Tika and Tesseract is beyond the scope of this document. Solr is the search engine I am currently using.
From inside the working directory, go into the solr folder and use Docker Compose to bring up Solr and Tika. Tesseract will be installed in the Tika instance.
cd solr
docker compose up
Take note of your Docker hostname. Docker should be exposing port 8983 for the solr instance and port 9998 for the tika instance.
Solr Test URL: http://localhost:8983/
Tika Test URL: http://localhost:9998/
If you have trouble accessing them from outside localhost, check that your firewall allows the web server host to reach the exposed Solr port. Also allow the Tika and Solr ports through your firewall for any npm workers running outside of Docker that need to request the plaintext of documents and submit them to the Solr search index.
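To sanity-check that both services are reachable from wherever the indexing will run (adjust the hostname if you are not on the Docker host):
curl "http://localhost:8983/solr/admin/cores?action=STATUS"
curl http://localhost:9998/tika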
Installing ffmpeg
Before you can re-encode the yt-dlp output, you will need to install ffmpeg and have the ffmpeg binary available in your PATH.
https://ffmpeg.org/download.html
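You can confirm ffmpeg is on your PATH with:
ffmpeg -version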
Installing yt-dlp
You need to install yt-dlp and have the yt-dlp binary available in your PATH.
Project: https://github.com/yt-dlp/yt-dlp
Standalone Binaries: https://github.com/yt-dlp/yt-dlp#release-files
Installation: https://github.com/yt-dlp/yt-dlp/wiki/Installation
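You can confirm the installation with:
yt-dlp --version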
Windows: Reloading Environment
On Windows you will need to log out and back in after updating your environment variables. Alternatively, in PowerShell you can reload your PATH environment variable using:
$env:Path = [System.Environment]::GetEnvironmentVariable("Path","Machine") + ";" + [System.Environment]::GetEnvironmentVariable("Path","User")
Downloading YouTube Archives
You may need to use your web browser to log in to YouTube. You can then copy each cookie value out of your web browser using the Developer Tools Network tab. Gather the correct values to build a youtube-cookies.txt file in your working directory using something like the youtube-cookies.example.txt template here:
# Netscape HTTP Cookie File
# This file is generated by yt-dlp. Do not edit.
.youtube.com TRUE / TRUE ... GPS ...
.youtube.com TRUE / FALSE ... PREF tz=...&f6=...&f7=...&hl=...
.youtube.com TRUE / TRUE ... SOCS ...
.youtube.com TRUE / TRUE ... VISITOR_INFO1_LIVE ...
.youtube.com TRUE / TRUE ... VISITOR_PRIVACY_METADATA ...
.youtube.com TRUE / TRUE ... YSC ...
youtube.com FALSE / FALSE ... APISID .../...
youtube.com FALSE / TRUE ... PREF tz=...&f6=...&f7=...
youtube.com FALSE / TRUE ... SAPISID .../...
youtube.com FALSE / FALSE ... SID ...
youtube.com FALSE / FALSE ... SIDCC ...
youtube.com FALSE / TRUE ... __Secure-1PAPISID .../...
youtube.com FALSE / TRUE ... __Secure-3PAPISID .../...
youtube.com FALSE / FALSE ... _gcl_au ...
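Assembling the cookie file by hand can be error-prone. As an alternative for one-off downloads, yt-dlp can also read cookies directly from an installed browser; for example (the browser name and URL below are placeholders):
yt-dlp --cookies-from-browser firefox <video-or-playlist-URL>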
Linux or macOS Terminal: You can use the sync-youtube-videos.sh script to download all of the YouTube videos.
chmod a+x ./sync-youtube-videos.sh
./sync-youtube-videos.sh
Windows Command Prompt: You can use the sync-youtube-videos.cmd batch script to download all of the YouTube videos.
sync-youtube-videos.cmd
Updating Virginia Code
You can update the local copy of the Virginia Code:
Linux or macOS Terminal: You can use the mirror-virginia-law.sh shell script to download all of the current Virginia Code. (Requires wget to be installed.)
chmod a+x ./mirror-virginia-law.sh
./mirror-virginia-law.sh
Windows PowerShell: You can use the mirror-virginia-law.ps1 PowerShell script to download all of the current Virginia Code.
.\mirror-virginia-law.ps1
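If PowerShell refuses to run the script due to the execution policy, you can bypass it for a single invocation:
powershell -ExecutionPolicy Bypass -File .\mirror-virginia-law.ps1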
Installing Node.js
In order to run the web app server, you must have Node.js installed.
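For example, on Ubuntu the distribution packages are usually sufficient; otherwise download an installer from https://nodejs.org/ or use a version manager such as nvm:
sudo apt install nodejs npm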
Installing NPM
Your Node.js should come with NPM. If you don't have it, see these directions:
https://docs.npmjs.com/downloading-and-installing-node-js-and-npm
Installing Web App Server Dependencies
You can use npm to install the web app server dependencies.
npm install
Before Running the Node Server
You may need to transpile the TypeScript to JavaScript.
npm run-script transpile:ts
(Optional) Clearing the Search Index
You can clear the search index using npm:
npm run-script index:clear
(Optional) Rebuilding the Search Index
You can clear and rebuild the search index in one go using npm:
npm run-script index:reindex
Incrementally Building the Document Search Index
You can scan all of the documents into the search index incrementally using npm. If the index is empty, all the documents will be scanned. If a file already exists in the index, its hash is checked before the document's text is re-extracted; the presumption is that a second scan would produce the same text, and the scan is computationally expensive. Using npm, incrementally build the document search index:
npm run-script index:docs
Incrementally Building the Laws Search Index
You can scan all of the laws into the search index incrementally using npm. The same hash check applies: laws already in the index are only rescanned if their hashes have changed. Using npm, incrementally build the laws search index:
npm run-script index:laws
Running the Node Server
You can run the web server using npm.
npm run-script server
Or you can use Node.js directly.
node app/server.js
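If you run the server outside of Docker, the container hostnames in the example .env (nm3clol, solr, tika) will not resolve. Assuming the Solr and Tika ports are published on the Docker host, pointing the host variables at localhost should work:
APP_HTTP_HOST="localhost"
SOLR_DOCS_HOST="localhost"
TIKA_HOST="localhost"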
Reverse Proxy using Caddy
If you want to use Caddy as a reverse proxy to the web application, try a Caddyfile like this:
www.no-moss-3-carbo-landfill-library.online {
    redir https://no-moss-3-carbo-landfill-library.online{uri}
}

no-moss-3-carbo-landfill-library.online {
    reverse_proxy localhost:3000
}
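You can validate the file and run Caddy against it explicitly:
caddy validate --config ./Caddyfile
caddy run --config ./Caddyfile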
Reverse Proxy using Nginx
If you want to use Nginx as a reverse proxy to the web application, try a configuration file like this:
server {
    listen 80;
    server_name www.no-moss-3-carbo-landfill-library.online;
    location / {
        rewrite ^/(.*) http://no-moss-3-carbo-landfill-library.online/$1 permanent;
    }
}

server {
    listen 80;
    server_name no-moss-3-carbo-landfill-library.online;
    location / {
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header Host $host;
        proxy_pass http://127.0.0.1:3000;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
    }
}
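After placing the configuration where your distribution expects it (for example /etc/nginx/conf.d/), test the syntax and reload:
sudo nginx -t
sudo systemctl reload nginx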
Securing Nginx
If you want a free SSL certificate for Nginx, try Certbot to obtain your certificates and keep them up-to-date.
https://letsencrypt.org/docs/client-options/
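For example, with Certbot's Nginx plugin installed, something like the following obtains certificates for both server names and updates the configuration for HTTPS:
sudo certbot --nginx -d no-moss-3-carbo-landfill-library.online -d www.no-moss-3-carbo-landfill-library.online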
Reverse Proxy Using IISNode
If you want to use IIS to access the Node instance, this can be greatly simplified by using IISNode.
IISNode requires the URL Rewrite module for IIS.
https://iis-umbraco.azurewebsites.net/downloads/microsoft/url-rewrite
Then you can set up IISNode.
https://github.com/Azure/iisnode
The included web.config is an example configuration for IISNode.
Post-Installation Considerations
If running the web server application outside of Docker or IISNode, consider using a daemonizer such as PM2.
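If you use PM2, a minimal setup (the process name nm3clol below is just a suggestion) might be:
npm install -g pm2
pm2 start app/server.js --name nm3clol
pm2 save
pm2 startup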